Matches in ScholarlyData for { ?s <https://w3id.org/scholarlydata/ontology/conference-ontology.owl#abstract> ?o. }
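For context, the basic graph pattern above binds ?s and ?o for every triple whose predicate is the conference-ontology abstract property. A minimal stdlib-only Python sketch of that pattern-matching step over an in-memory triple list (the triples and resource names below are hypothetical stand-ins, not actual ScholarlyData content, and this is not a real SPARQL engine):

```python
# Sketch of matching the basic graph pattern { ?s <...#abstract> ?o . }
# over an in-memory list of (subject, predicate, object) triples.
# The example triples are invented for illustration.

ABSTRACT = "https://w3id.org/scholarlydata/ontology/conference-ontology.owl#abstract"

triples = [
    ("ex:paper7", ABSTRACT, "Ontology matching is one of the key research topics ..."),
    ("ex:paper7", "ex:title", "Measuring Matching Stability"),
    ("ex:paper11", ABSTRACT, "Epistemic querying extends standard ontology inferencing ..."),
]

def match_abstracts(triples):
    """Return (?s, ?o) bindings for every triple with the abstract predicate."""
    return [(s, o) for s, p, o in triples if p == ABSTRACT]

bindings = match_abstracts(triples)
# Only the two triples whose predicate is ABSTRACT produce bindings.
```

A SPARQL engine performs this same filtering, but over an indexed store and with join support for multi-pattern queries.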
- 7 abstract "Ontology matching is one of the key research topics in the field of the Semantic Web. In the last few years, many matching methods have been proposed to generate matches between different ontologies either automatically or semi-automatically. To select appropriate ones, users need measures to judge whether a method can achieve similar compliance even on a dataset without reference matches, and whether such a method is reliable w.r.t. its output result along with the confidence. However, widely-used traditional measures like precision and recall fail to provide sufficient hints on these questions. In this paper, we design two novel evaluation measures to evaluate the stability of matching methods and one measure to evaluate the credibility of matching confidence values, which help to answer the above two questions. Additionally, we carry out a systematic comparison among several carefully selected methods using our new measures. We also report some interesting findings such as identifying potential defects of our subjects.".
- 11 abstract "Epistemic querying extends standard ontology inferencing by allowing for deductive introspection. We propose a technique for epistemic querying of OWL2 ontologies not featuring nominals and universal roles by a reduction to a series of standard OWL2 reasoning steps, thereby enabling the deployment of off-the-shelf OWL2 reasoning tools for this task. We prove formal correctness of our method, justify the omission of nominals and the universal role, and provide an implementation as well as evaluation results.".
- 17 abstract "We present methods that compute generalizations of concepts or individuals described in ontologies written in the Description Logic EL. These generalizations are the basis of methods for ontology design and are the core of concept similarity measures. The reasoning service least common subsumer (lcs) generalizes a set of concepts. Similarly, the most specific concept (msc) generalizes an individual into a concept description. For EL the lcs and the msc do not need to exist, if computed w.r.t. general EL-TBoxes. However, it is possible to find a concept description that is the lcs (msc) up to a certain role-depth. In this paper we present a practical approach for computing the role-depth bounded lcs and msc, based on the polynomial-time completion algorithm for EL and describe its implementation.".
- 2 abstract "In this paper, we investigate an extension of the description logic SHIQ (a knowledge representation formalism used for the Semantic Web) with transitive closure of roles occurring not only in concept inclusion axioms but also in role inclusion axioms. It was proved that adding transitive closure of roles to SHIQ without restriction on role hierarchies may lead to undecidability. We have identified a kind of role inclusion axiom that is responsible for this undecidability and we propose a restriction on these axioms to obtain decidability. Next, we present a tableaux-based algorithm that decides satisfiability of concepts in the new logic.".
- 9 abstract "The SPARQL query language is currently being extended by W3C with so-called entailment regimes, which define how queries are evaluated under more expressive semantics than simple entailment. We describe a sound and complete algorithm for the OWL Direct Semantics entailment regime. The queries of the regime are very expressive since variables can occur within complex class expressions and can also bind to class or property names. We propose several novel optimizations such as strategies for determining a good query execution order, query rewriting techniques, and show how specialized OWL reasoning tasks and the class and property hierarchy can be used to reduce the query execution time. We provide a prototypical implementation and evaluate the efficiency of the proposed optimizations. For standard conjunctive queries our system performs comparably to already deployed systems. For complex queries an improvement of up to three orders of magnitude can be observed.".
- 1 abstract "The W3C SPARQL working group is defining the new SPARQL 1.1 query language. The current working draft of SPARQL 1.1 focuses mainly on the description of the language. In this paper, we provide a formalization of the syntax and semantics of the SPARQL 1.1 federation extension, an important fragment of the language that has not yet received much attention. Besides, we propose optimization techniques for this fragment, provide an implementation of the fragment including these techniques, and carry out a series of experiments that show that our optimization procedures could significantly speed up the query evaluation process.".
- 13 abstract "Processing a join operation using MapReduce platforms such as Hadoop incurs communication and I/O costs due to data transfer between the Map and Reduce phases. This cost is prohibitive for RDF graph patterns, which typically involve several joins. Existing approaches to optimizing RDF graph pattern matching exploit the existence of star-join structures in RDF graph patterns and propose the use of bushy query execution plans instead of linear ones. However, some Hadoop-based data processing systems such as Yahoo’s Pig only support linear execution plans, and require significant modifications to the scheduler to support bushy plans. In this paper, we propose an approach for “sneaking in” bushy query execution plans into Hadoop by interpreting star joins as “groups of triples” known as TripleGroups. We present an alternative intermediate TripleGroup-based algebra called the Nested Triple Group Algebra (NTGA) for rewriting star-join queries as TripleGroup queries. We also propose a data representation format (RDFMap) that more efficiently supports TripleGroup-based processing than the existing relational tuple infrastructure. We present a comparative performance evaluation of the traditional Pig approach and RAPID+ (Pig extended with NTGA) for graph pattern matching queries on the BSBM benchmark dataset. Results show over 60% performance improvement of our approach over traditional Pig for some tasks.".
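The TripleGroup idea above, treating a star join as the set of triples sharing a subject, can be illustrated with a small stdlib-only sketch (the data and function names are invented for illustration and are not the RAPID+/NTGA implementation):

```python
from collections import defaultdict

# Illustrative sketch of forming "TripleGroups": triples that share a
# subject are collected together, so a star join around that subject
# becomes a single grouping pass instead of a chain of pairwise joins.

def triple_groups(triples):
    """Group (s, p, o) triples by subject; values are (p, o) pairs."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((p, o))
    return dict(groups)

triples = [
    ("v1", "price", 30), ("v1", "vendor", "A"),
    ("v2", "price", 20), ("v2", "vendor", "B"),
]

groups = triple_groups(triples)
# groups["v1"] now holds both properties of v1, i.e. the star around v1.
```

In a MapReduce setting this grouping corresponds to a single shuffle on the subject key, which is what lets the bushy star-join structure "sneak into" a linear execution plan.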
- 23 abstract "Entity-relationship-structured data is becoming more important on the Web. For example, large knowledge bases have been automatically constructed by information extraction from Wikipedia and other Web sources. Entities and relationships can be represented by subject-property-object triples in the RDF model, and can then be precisely searched by structured query languages like SPARQL. Because of their Boolean-match semantics, such queries often return too few or even no results. To improve recall, it is thus desirable to support users by automatically relaxing or reformulating queries in such a way that the intention of the original user query is preserved while returning a sufficient number of ranked results. In this paper we describe comprehensive methods to relax SPARQL-like triple-pattern queries in a fully automated manner. Our framework produces a set of relaxations by means of statistical language models for structured RDF data and queries. The query processing algorithms merge the results of different relaxations into a unified result list, with ranking based on any ranking function for structured queries over RDF-data. Our experimental evaluation, with two different datasets about movies and books, shows the effectiveness of the automatically generated relaxations and the improved quality of query results based on assessments collected on the Amazon Mechanical Turk platform.".
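One elementary relaxation operation for triple-pattern queries, replacing a constant with a fresh variable, can be sketched as follows (the pattern below is invented; the paper's framework additionally ranks relaxations with statistical language models, which this sketch omits):

```python
# Sketch of one simple relaxation step for a triple pattern: each
# constant in the pattern can be generalized to a fresh variable,
# producing broader patterns that may return more results.
# Variables are written with a leading '?', constants without.

def relax(pattern):
    """Return all one-step relaxations of an (s, p, o) triple pattern."""
    relaxations = []
    for i, term in enumerate(pattern):
        if not term.startswith("?"):          # only constants are relaxed
            relaxed = list(pattern)
            relaxed[i] = f"?v{i}"             # fresh variable per position
            relaxations.append(tuple(relaxed))
    return relaxations

pattern = ("?movie", "directedBy", "Coppola")
variants = relax(pattern)
# The two constants each yield one relaxed pattern.
```

A real relaxation framework would also substitute semantically related constants (e.g. a superclass or similar entity), not just variables, and score the variants before merging their results.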
- 26 abstract "The emergence of the Semantic Web has led to the creation of large semantic knowledge bases, often in the form of RDF databases. Improving the performance of RDF databases necessitates the development of specialized data management techniques, such as the use of shortcuts in the place of path queries. In this paper we deal with the problem of selecting the most beneficial shortcuts that reduce the execution cost of path queries in RDF databases given a space constraint. We first demonstrate that this problem is an instance of the quadratic knapsack problem. Given the computational complexity of solving such problems, we then develop an alternative formulation based on a bi-criterion linear relaxation, which essentially seeks to minimize a weighted sum of the query cost and of the required space consumption. As we demonstrate in this paper, this relaxation leads to very efficient classes of linear programming solutions. We utilize this bi-criterion linear relaxation in an algorithm that selects a subset of shortcuts to materialize. This shortcut selection algorithm is extensively evaluated and compared with a greedy algorithm that we developed in prior work. The reported experiments show that the linear relaxation algorithm manages to significantly reduce the query execution times, while also outperforming the greedy solution.".
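The greedy baseline the abstract compares against can be sketched as a benefit-per-space knapsack heuristic under a space budget (the candidate shortcuts and their numbers are invented; the paper's main contribution, the bi-criterion linear relaxation, is not reproduced here):

```python
# Greedy shortcut selection sketch: pick shortcuts in decreasing order
# of benefit-per-space ratio until the space budget is exhausted.
# Candidates are (name, query-cost benefit, space consumption) triples,
# all values hypothetical.

def greedy_select(candidates, budget):
    chosen, used = [], 0
    for name, benefit, space in sorted(
            candidates, key=lambda c: c[1] / c[2], reverse=True):
        if used + space <= budget:
            chosen.append(name)
            used += space
    return chosen

candidates = [("sc1", 100, 40), ("sc2", 90, 30), ("sc3", 20, 35)]
selected = greedy_select(candidates, budget=70)
# sc2 (ratio 3.0) and sc1 (ratio 2.5) fit the budget; sc3 does not.
```

The quadratic-knapsack formulation in the paper is harder than this because the benefit of one shortcut can depend on which others are materialized, which is exactly what the greedy ratio ignores.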
- 27 abstract "There is a comprehensive body of theory studying updates and schema evolution of knowledge bases, ontologies, and in particular of RDFS. In this paper we turn these ideas into practice by presenting a feasible and practical procedure for updating RDFS. Along the lines of ontology evolution, we treat schema and instance updates separately, showing that RDFS instance updates are not only feasible, but also deterministic. For RDFS schema updates, known to be intractable in the general abstract case, we show that they become feasible on real-world datasets. We present simple and feasible algorithms for both instance and schema updates.".
- 32 abstract "The evaluation of matching applications is becoming a major issue in the semantic web and it requires a suitable methodological approach as well as appropriate benchmarks. In particular, in order to evaluate a matching application under different experimental conditions, it is crucial to provide a test dataset characterized by a controlled variety of different heterogeneities among data that rarely occurs in real data repositories. In this paper, we propose SWING (Semantic Web INstance Generation), a disciplined approach to the semi-automatic generation of benchmarks to be used for the evaluation of matching applications. SWING is illustrated in the paper by presenting the specific benchmark we generated for the international instance matching contest at OAEI 2010 (called IIMB 2010) and by discussing the experimental results obtained on it with different matching algorithms.".
- 49 abstract "Skyline queries are a class of preference queries that are valuable for multi-criteria decision making scenarios. Such queries compute the pareto-optimal tuples from a set of tuples. This problem has received significant attention in the context of relational data where many techniques focus on answering queries over a single table. Consequently, for multi-relational skyline query scenarios, as would be the norm for RDF, the strategy for query evaluation would need to be a join-first-skyline-later strategy. However, such a split computational strategy limits the optimization opportunities that are useful for pruning search space via information passing between the join phase and the skyline phase. Other available techniques for multi-relational skyline queries assume storage and indexing techniques that are not typically used with RDF, thereby requiring a preprocessing step. In this paper, we present an approach for optimizing skyline queries over RDF data. The approach is based on the concept of a “Header Point” which maintains a concise summary of the visited region in the data space. This summary allows some fraction of non-skyline tuples to be pruned from the set advancing to the skyline processing phase, thus reducing the number of expensive dominance checks required in the skyline phase. We further present more aggressive pruning rules that result in the computation of near-complete skylines in significantly less time than the complete algorithm. A comprehensive performance evaluation of different algorithms is presented using datasets with different types of data distributions generated using a benchmark data generator.".
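The pareto-optimal (skyline) computation the abstract describes can be illustrated with a naive dominance-check sketch (this is the textbook nested-loop idea over invented data, not the paper's Header Point optimization):

```python
# Naive skyline sketch: keep every tuple not dominated by another.
# Here smaller values are assumed better in every dimension.

def dominates(a, b):
    """a dominates b if a <= b in all dimensions and < in at least one."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def skyline(tuples):
    return [t for t in tuples
            if not any(dominates(u, t) for u in tuples if u != t)]

# Hypothetical (price, distance) tuples.
hotels = [(50, 8), (60, 2), (40, 9), (70, 10)]
result = skyline(hotels)
# (70, 10) is dominated by (50, 8) and drops out; the rest are
# mutually incomparable and form the skyline.
```

Every tuple here is checked against every other, which is exactly the quadratic cost that pruning techniques such as the Header Point aim to reduce.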
- 52 abstract "This paper describes the design and implementation of Minimal RDFS semantics based on a backward chaining approach and implemented on a clustered RDF triplestore. The system presented, called 4sr, uses 4store as base infrastructure. In order to achieve a highly scalable system we implemented the reasoning at the lowest level of the quad store, the bind operation. The bind operation runs concurrently in all the data slices allowing the reasoning to be processed in parallel among the cluster. Throughout this paper we provide detailed descriptions of the architecture, reasoning algorithms, and a scalability evaluation with the LUBM benchmark. 4sr is a stable tool available under a GNU GPL3 license and can be freely used and extended by the community.".
- 7 abstract "This paper presents Grr, a powerful system for generating random RDF data, which can be used to test Semantic Web applications. Grr has a SPARQL-like syntax, which allows the system to be both powerful and convenient. It is shown that Grr can easily be used to produce intricate datasets, such as the LUBM benchmark. Optimization techniques are employed, which make the generation process efficient and scalable.".
- 8 abstract "To date, the application of high-performance computing resources to Semantic Web data has largely focused on commodity hardware and distributed memory platforms. In this paper we make the case that more specialized hardware can offer superior scaling and close to an order of magnitude improvement in performance. In particular we examine the Cray XMT. Its key characteristics, a large global shared memory and processors with a memory-latency tolerant design, offer an environment conducive to programming for the Semantic Web and have engendered results that far surpass the current state of the art. We examine three fundamental pieces requisite for a fully functioning semantic database: dictionary encoding, RDFS inference, and query processing. We show scaling up to 512 processors (the largest configuration we had available), and the ability to process 20 billion triples completely in memory.".
- 3 abstract "Modern scientific applications of sensor networks are driving the development of technologies to make heterogeneous sensor networks easier to deploy, program and use in multiple application contexts. One key requirement, addressed in this work, is the need for methods to detect events in real time that arise from complex correlations of measurements made by independent sensing devices. Because the mapping of such complex events to direct sensor measurements may be poorly understood, such methods must support experimental and frequent specification of the events of interest. This means that the event specification method must be embedded in the problem domain of the end-user, must support the user to discover observable properties of interest, and must provide automatic and efficient enaction of the specification. This paper proposes the use of ontologies to specify and recognise complex events that arise as selections and correlations (including temporal correlations) of structured digital messages, typically streamed from multiple sensor networks. Ontologies are used as a basis for the definition of contextualised complex events of interest which are translated to selections and temporal combinations of streamed messages. The method is implemented in software built as a plug-in to Protégé 4.0 that interfaces with a commercial Complex Event Processing (CEP) software package. The interface uses a domain-independent OWL 2.0 ontology of sensed phenomena and events to permit the user to define new events as selections and compositions of the initial events, with the ontology providing context for the events. Supported by description logic reasoning, the new event descriptions are translated to the native language of the CEP and executed under the control of the CEP. The software is currently deployed for micro-climate monitoring of experimental crop plants, where precise knowledge and control of growing conditions is needed to map phenotypical traits to the plant genome.".
- 5 abstract "Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g. flood emergency response. However, in order to interpret the readings from the sensors, the data needs to be put in context through correlation with other sensor readings, sensor data histories, and stored data, as well as juxtaposing with maps and forecast models. In this paper we use a flood emergency response planning application to identify requirements for a semantic sensor web. We propose a generic service architecture to satisfy the requirements that uses semantic annotations to support well-informed interactions between the services. We present the SemSorGrid4Env realisation of the architecture and illustrate its capabilities in the context of the example application.".
- 17 abstract "In today's highly dynamic economy, businesses have to adapt quickly to market changes, be they customer-, competition- or regulation-driven. Cloud computing promises to be a solution to the ever-changing computing demands of businesses. Current SaaS, PaaS and IaaS services are often found to be too inflexible to meet the diverse customer requirements regarding service composition and Quality-of-Service. We therefore propose an ontology-based optimization framework allowing Cloud providers to find the best-suited resource composition based on an abstract request for a custom service. Our contribution is three-fold. First, we describe an OWL/SWRL based ontology framework for describing resources (hard- and software) along with their dependencies, interoperability constraints and meta information. Second, we provide an algorithm that uses reasoning queries to derive a graph over all feasible resource compositions based on the abstract request. Third, we show how the graph can be transformed into an integer program, allowing the optimal solution to be found from a profit-maximizing perspective.".
- 4 abstract "The usefulness of the Web Ontology Language to describe domains of discourse and to facilitate automatic reasoning services has been widely acknowledged. However, the programmability of ontological knowledge bases is severely impaired by the different conceptual bases of statically typed object-oriented programming languages such as Java and C# and ontology languages such as the Web Ontology Language (OWL). In this work, a novel programming language is presented that integrates OWL and XSD data types with C#. The Zhi# programming language is the first solution of its kind to make XSD data types and OWL class descriptions first-class citizens of a widely-used programming language. The Zhi# programming language eases the development of Semantic Web applications and facilitates the use and reuse of knowledge in the form of ontologies. The presented approach was successfully validated to reduce the number of possible runtime errors compared to the use of XML and OWL APIs.".
- 5 abstract "RESTful services are increasingly gaining traction over WS-* ones. As with WS-* services, their semantic annotation can provide benefits in tasks related to their discovery, composition and mediation. In this paper we present an approach to automate the semantic annotation of RESTful services using a cross-domain ontology, two semantic resources (DBpedia and GeoNames), and additional external resources (suggestion and synonym services). We also present a preliminary evaluation in the geospatial domain that proves the feasibility of our approach in a domain where RESTful services are increasingly appearing, and highlights that it is possible to carry out this semantic annotation with satisfactory results.".
- 12 abstract "As the most popular microblogging platform, the vast amount of content on Twitter is constantly growing, so that the retrieval of relevant information (streams) is becoming more and more difficult every day. Representing the semantics of individual Twitter activities and modeling the interests of Twitter users would allow for personalization and thereby counter the information overload. Given the variety of topics people discuss on Twitter, semantic user profiles generated from Twitter posts moreover promise to be beneficial for other applications on the Social Web. However, automatically inferring the semantic meaning of tweets is a non-trivial problem. In this paper we investigate semantic user modeling based on Twitter activities. We introduce and analyze methods for linking Twitter posts with related news articles in order to contextualize Twitter activities. We then propose and compare strategies that exploit the semantics extracted from both tweets and related news articles to represent individual Twitter activities in a semantically meaningful way. A large-scale evaluation validates the applicability of our approach and shows that our methods relate tweets to news articles with high precision, clearly enrich the semantics of tweets, and have a strong impact on the construction of semantic user profiles for the Social Web.".
- 14 abstract "Social media presents unique challenges for topic classification, including the brevity of posts, the informal nature of conversations, and the frequent reliance on external hyperlinks to give context to a conversation. In this paper we investigate the usefulness of these external hyperlinks for determining the topic of individual posts. We focus our analysis on objects which have related metadata available on the Web, either via APIs or as Linked Data. Our experiments show that the inclusion of metadata from hyperlinked objects in addition to the original post content significantly improved classifier performance on two disparate datasets. We found that including selected metadata from APIs and Linked Data gave better results than including text from HTML pages. We also make use of the semantics of the data to compare the usefulness of different types of external metadata for topic classification in a social media dataset.".
- 20 abstract "Social Web platforms are quickly becoming the natural place for people to engage in discussing current events, topics, and policies. Analysing such discussions is of high value to analysts who are interested in assessing up-to-the-minute public opinion, consensus, and trends. However, we have a limited understanding of how content and user features can influence the amount of response that posts (e.g., Twitter messages) receive, and how this can impact the growth of discussion threads. Understanding these dynamics can help users to issue better posts, and enable analysts to make timely predictions on which discussion threads will evolve into active ones and which are likely to wither too quickly. In this paper we present an approach for predicting discussions on the Social Web, by (a) identifying seed posts, then (b) making predictions on the level of discussion that such posts will generate. We explore the use of post-content and user features and their subsequent effects on predictions. Our experiments produced an optimum F1 score of 0.848 for identifying seed posts, and an average measure of 0.673 for Normalised Discounted Cumulative Gain when predicting discussion levels.".
- 32 abstract "Semantic wikis enable collaboration between human agents and the creation of associated knowledge systems. In this way, data embedded in semantic wikis can be mined and the resulting knowledge patterns can be reused to extend and improve the structure of the wiki contents. This paper proposes a method for guiding the reengineering and improvement of the structure of semantic wikis. This method suggests the creation of categories and relations between categories using Formal Concept Analysis (FCA) and the Relational Concept Analysis (RCA) extension. FCA allows the design of a concept lattice while RCA provides relational attributes completing the content of formal concepts. The originality of the approach is to consider the wiki content from the FCA and RCA points of view and then to extract knowledge units from this content, allowing a factorization and a reengineering of the wiki structure. The method is general, does not depend on any particular domain, and can thus be generalized to every kind of semantic wiki. Examples are studied throughout the paper and experiments show the correctness and the usefulness of the method.".
- 8 abstract "Recent research has demonstrated how the widespread adoption of collaborative tagging systems yields emergent semantics. In recent years, much has been learned about how to harvest the data produced by taggers for engineering light-weight ontologies. For example, existing measures of tag similarity and tag relatedness have proven crucial stepping stones for making latent semantic relations in tagging systems explicit. However, little progress has been made on other issues, such as understanding the different levels of tag generality (or tag abstractness), which is essential, for example, for identifying hierarchical relationships between concepts. In this paper we aim to address this gap. Starting from a review of linguistic definitions of word abstractness, we first use several large-scale ontologies and taxonomies as grounded measures of term abstractness, including Yago, DMOZ, Wordnet and a taxonomy derived from Wikipedia. Then, we introduce and apply several folksonomy-based methods to measure the level of abstractness of given tags. We evaluate these methods by comparing them with the grounded measures. Our results suggest that the abstractness of tags in social tagging systems can be approximated with simple measures. We further corroborate this assumption with an exemplary user study. Our work has implications for a number of problems related to social tagging systems, including search, tag recommendation, and the acquisition of light-weight ontologies from tagging data.".
- 301 abstract "Most CMS platforms lack management of semantic information about their content, although a lot of research has been carried out in this area. The IKS project has introduced a reference architecture for Semantic Content Management Systems (SCMS). The objective is to merge the latest advancements in semantic web technologies with the needs of legacy CMS platforms. Apache Stanbol is a part of this SCMS reference implementation.".
- 303 abstract "Sgvizler is a small JavaScript wrapper for visualization of SPARQL result sets. It integrates well with HTML web pages by letting the user specify SPARQL SELECT queries directly in designated HTML elements, which are rendered to contain the specified visualization type on page load or on function call. Sgvizler supports a vast number of visualization types, most notably all of the major charts available in the Google Chart Tools, but also allows users to easily modify and extend the set of rendering functions, e.g., specified using direct DOM manipulation or external JavaScript visualization toolkits. Sgvizler is compatible with all modern web browsers.".
- 304 abstract "Recently, practical approaches for managing and supporting the life-cycle of semantic content on the Web of Data have made quite some progress. However, the currently least developed aspect of the semantic content life-cycle is the user-friendly manual and semi-automatic creation of rich semantic content. In this demo we will present the RDFaCE-Lite editor and will show: – how users can annotate textual content using vocabularies and named entities published on the Data Web. – how different NLP APIs can be combined in order to maximize precision and recall of the annotation process. – how the RDFaCE-lite annotation environment can be used within existing applications such as Blogs, CMSs, etc.".
- 312 abstract "This demo shows a mobile Android app that uses openly available geographic data and crowdsources parking availability information, in order to let its users conveniently find parking when coming to work or driving into town. The application builds on Linked Data, and publishes the crowdsourced parking availability data openly as well. Further, it integrates additional related data sources, such as events and services, to provide rich value-adding features that will act as an incentive for users to adopt the app.".
- 315 abstract "While it is easy to find statistics on almost every topic, coming up with an explanation for those statistics is a much more difficult task. This demo showcases the prototype tool Explain-a-LOD, which uses background knowledge from DBpedia for generating possible explanations for a statistic.".
- 316 abstract "The ScienceWISE system is a collaborative ontology editor and paper annotation tool designed to help researchers in their discovery process. In this paper, we describe the system currently deployed at sciencewise.info and the exposition of its data as Linked Data. During the “RDFization” process, we faced issues in encoding the knowledge base in SKOS and in finding resources on the LOD cloud to link to. We discuss these issues and the open challenges in implementing the remaining target features.".
- 321 abstract "The Large Knowledge Collider (LarKC) is a prominent development platform for Semantic Web reasoning applications. Guided by the initial goal of facilitating incomplete reasoning, LarKC has evolved into a unique platform that can be used for the development of robust, flexible, and efficient Semantic Web applications, also leveraging modern grid and cloud resources. In response to the numerous requests coming from the rapidly growing user community of LarKC, we have set up a demonstration package for LarKC that presents its main subsystems, development tools and graphical user interfaces. The demo aims at both early adopters and experienced users and serves the purpose of promoting Semantic Web reasoning and LarKC technologies to potentially new user communities.".
- 322 abstract "The application of methodologies for building ontologies has improved the ontology quality. However, such a quality is not totally guaranteed because of the difficulties involved in ontology modelling. These difficulties are related to the inclusion of anomalies or worst practices within the ontology development. Several authors have provided lists of typical anomalies detected in ontologies during the last decade. In this context, our aim in this paper is to describe OOPS! (OntOlogy Pitfall Scanner!), a tool for detecting pitfalls in ontologies.".
- 323 abstract "The ICE-Map Visualization was developed to graphically analyze the distribution of indexing results within a given Knowledge Organization System (KOS) hierarchy and allows the user to explore the document sets and the KOSs at the same time. In this paper, we demonstrate the use of the ICE-Map Visualization in combination with a simple automatic indexer to visualize the semantic overlap between a KOS and a set of documents.".
- 325 abstract "This demo presents a web application which implements a pipeline for searching and browsing through newspaper archives. It uses a combination of information extraction, enrichment and visualization algorithms to help the user grasp the large amounts of articles normally collected in archives. Illustrative results show the appropriateness of the proposed pipeline for searching and browsing news archives.".
- 331 abstract "In this paper we present a demo for the efficient detection of visitors’ attention in a museum environment, based on the application of intelligent complex event processing and semantic technologies. Semantics is used for the correlation of sensor data, via modeling the situations of interest and the background knowledge used for annotation. Intelligent complex event processing enables efficient real-time processing of sensor data, and its logic-based nature supports a declarative definition of attention situations.".
- 349 abstract "The OWLGrEd ontology editor allows graphical visualization and authoring of OWL 2 ontologies using a compact yet intuitive presentation that combines UML class diagram notation with textual Manchester syntax for expressions. We present an extension mechanism for OWLGrEd that allows adding custom information areas, rules and visual effects to the ontology presentation, thus enabling domain-specific OWL ontology visualizations. The usage of OWLGrEd and its extensions is demonstrated on ontology engineering examples involving custom annotation visualizations, advanced UML class diagram constructs and integrity constraints in semantic database schema design.".
- 351 abstract "Exposing data about customizable products is a challenging issue because of the number of features and options a customer can choose from, and the many constraints that exist between them. These constraints are not tractable without automatic reasoning. But the configuration process, which helps a customer to make her choice, one step at a time, is a traversal of a graph of partially defined products - that is, Linked Data. This natural yet fruitful abstraction for product customization results in a generic configuration API, in use at Renault, which has begun publishing data about its range in this way. Current achievements and prototypes of forthcoming developments are presented.".
- 357 abstract "The Linked Data cloud contains large amounts of RDF generated from databases. Much of this RDF, generated using tools such as D2R, is expressed in terms of vocabularies automatically derived from the schema of the original database. The generated RDF would be significantly more useful if it were expressed in terms of commonly used vocabularies. Defining the mapping from structured sources such as databases or spreadsheets to ontologies is labor intensive using today’s tools. For example, to define such mappings in R2R, users must write a mapping rule for each column, and each mapping is expressed in terms of graph patterns, which are hard to write. In this work, we present a semi-automatic approach for building mappings from structured sources to ontologies. Our system, Karma, automatically derives these mappings, and provides an easy-to-use interface that enables users to control the automated process to guide the system to produce the desired mappings. In our evaluation, users need to interact with the system less than once per column (on average) in order to construct the desired mapping rules. The system then uses these mapping rules to generate semantically rich RDF for the data sources.".
- 363 abstract "Citizens are increasingly aware of the influence of environmental and meteorological conditions on the quality of their life. This results in an increasing demand for personalized environmental information, i.e., information that is tailored to citizens’ specific context and background. In this demonstration, we present an environmental information system that addresses this demand in its full complexity in the context of the PESCaDO EU project. Specifically, we will show a system that supports the submission of user-generated queries related to environmental conditions. From the technical point of view, the system is tuned to discover reliable data on the web and to process these data in order to convert them into knowledge, which is stored in a dedicated repository. At run time, this information is transferred into an ontology-based knowledge base, from which information relevant to the specific user is deduced and communicated in the language of their preference.".
- 104 abstract "Oncolor is an association whose mission is to publish and share medical guidelines in oncology. Like many scientific information websites built in the early days of the Internet, its website deals with unstructured data that cannot be automatically queried and is becoming more and more difficult to maintain over time. Access to the online contents and the editing process can be improved by using Web 2.0 and Semantic Web technologies, which allow structured information bases to be built collaboratively in semantic portals. The work described in this paper reports on a migration from a static HTML website to a semantic wiki in the medical domain. This approach has raised various issues that had to be addressed, such as the introduction of structured data into the unstructured imported guidelines or the linkage of content to external medical resources. An evaluation of the result by end users is also provided, and the proposed solutions are discussed.".
- 138 abstract "Current search engines present search results in an ordered list even if semantic technologies are used for analyzing user queries and document contents. The semantic information that is used during search result generation mostly remains hidden from the user, although it significantly supports users in understanding why search results are considered relevant for their individual query. The approach presented in this paper utilizes visualization techniques to offer visual feedback about the reasons the results were retrieved. It represents the semantic neighborhood of search results, the relations between results and query terms, as well as the relevance of search results and the semantic interpretation of query terms, to foster search result comprehension. It also provides visual feedback for query enhancement. Therefore, not only are the search results visualized, but further information that arises during search processing is also used to improve the visual presentation and to offer more transparency in search result generation. The results of an evaluation in a real application scenario show that the presented approach considerably supports users in assessment and decision-making tasks and eases information seeking in digital semantic knowledge bases.".
- 162 abstract "Integrating experimental data and claims from the literature with hypotheses is an essential activity for the life scientist. Such a task is increasingly challenging given the ever growing volume of publications and data sets. Towards addressing this challenge, we previously developed HyQue, a prototype system for hypothesis formulation and evaluation. HyQue uses domain-specific rule sets to evaluate hypotheses based on well understood scientific principles. However, because scientists may use differing scientific premises when exploring a hypothesis, flexibility is required in both crafting and executing rule sets to evaluate hypotheses. Here, we report on an extension of HyQue that incorporates rules specified using the SPARQL Inferencing Notation (SPIN). Hypotheses, background knowledge, queries, results and now rule sets are represented and executed using Semantic Web technologies, enabling users to explicitly trace a hypothesis to its evaluation, including the data and rules used by HyQue. We demonstrate the use of HyQue to evaluate hypotheses about the yeast galactose gene system.".
- 196 abstract "Using ontologies in software applications is a challenging task due to the chasm between the logics-based world of ontologies and the object-oriented world of software applications. The logics-based representation emphasizes the meaning of concepts and properties, i.e., their semantics. The modeler in the object-oriented paradigm also takes into account the pragmatics, i.e., how the classes are used, by whom, and why. To enable a comprehensive use of logics-based representations in object-oriented software systems, a seamless integration of the two paradigms is needed. However, the pragmatic issues of using logic-based knowledge in object-oriented software applications have not yet been considered sufficiently. Rather, the pragmatic issues that arise in using an ontology, e.g., which classes to instantiate in which order, remain a task to be carefully considered by the application developer. In this paper, we present a declarative representation for designing and applying programming access to ontologies. Based on this declarative representation, we have built OntoMDE, a model-driven engineering toolkit that we have applied to several example ontologies. These ontologies have been selected in order to showcase the benefits of our approach and the OntoMDE toolkit over a range of different ontology characteristics in terms of complexity, level of abstraction, degree of formalization, provenance, and domain-specificity.".
- 223 abstract "We will explain how a Linked Open Drug Data (LODD) application based on diseases, drugs, and clinical trials can be used to improve the (ontology-based) clinical reporting process while, at the same time, improving the patient follow-up treatment process. Specific requirements of the radiology domain let us aggregate RDF results from several LODD sources such as DrugBank, Diseasome, DailyMed, and LinkedCT. The idea is to use state-of-the-art string matching algorithms which allow for a ranked list of candidates and confidences of the approximation of the distance between two diseases at query time. Context information must be provided by the clinician, who decides on the “related”-mappings of patient context and the links he wants to follow in order to retrieve disease and medication information. The industrial prototype is implemented as a faceted browsing tool which allows for all these aspects of interactive ontology alignment.".
- 228 abstract "In this paper, we argue that query relaxation over RDF data is an important but largely overlooked research topic: the Semantic Web standards allow for answering crisp queries over crisp data, but what of use-cases that require approximate answers for structured queries over RDF data? We introduce one such use-case extracted from an EADS project that aims to fuse together intelligence information for police post-incident analysis. The concrete application for query relaxation involves matching (possibly vague) descriptions of entities involved in crimes to structured descriptions thereof in the database. Here, the core research questions are: (i) how can we formalise potentially vague structured queries in a generic manner; (ii) how can we support approximate, structured query-answering over RDF? We first discuss the use-case, formalise the problem, and survey current literature for possible approaches. Next, we present a proof-of-concept framework for enabling relaxation of structured entity-lookup queries, evaluating different distance measures for performing relaxation. We argue that, beyond our specific use-case, query relaxation is important to many potential use-cases for Semantic Web techniques, and worthy of further attention.".
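The entity-lookup relaxation described in the abstract above can be sketched in a few lines: instead of requiring an exact match, rank stored entity descriptions by how many of the query's attribute constraints they violate. The attribute-count distance and all data below are invented for illustration, not the framework evaluated in the paper.

```python
# Hypothetical sketch of relaxed entity lookup over attribute-value
# descriptions; the distance measure and data are illustrative only.

def distance(query, entity):
    """Number of query constraints the entity violates."""
    return sum(1 for attr, value in query.items()
               if entity.get(attr) != value)

def relaxed_lookup(query, entities, k=3):
    """Rank entities by how few constraints they violate; return top k."""
    return sorted(entities, key=lambda e: distance(query, e))[:k]

people = [
    {"hair": "brown", "height": "tall", "car": "red"},
    {"hair": "brown", "height": "short", "car": "red"},
    {"hair": "blond", "height": "tall", "car": "blue"},
]
# A crisp query for this (possibly vague) description matches nobody
# exactly; relaxation still ranks the near-matches.
print(relaxed_lookup({"hair": "brown", "height": "tall", "car": "blue"},
                     people))
```

Swapping in other distance measures, as the paper evaluates, only changes the `distance` function.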
- 255 abstract "In this paper, we present the architecture and implementation of a Semantic Web knowledge system that is employed to learn driver preferences for Points of Interest (POIs) using a content-based approach. Initially, implicit and explicit feedback is collected from drivers about the places that they like. Data about these places is then retrieved from web sources and a POI preference model is built using machine learning algorithms. At a future time, when the driver searches for places that he/she might want to go to, his/her learnt model is used to personalize the results. The data that is used to learn preferences is represented as Linked Data with the help of a POI ontology, and retrieved from multiple POI search services by `lifting' it into RDF. This structured data is then combined with driver context and fed into a machine learning algorithm that produces a statistical model of the driver's preferences. This data and model are hosted in the cloud and are accessible via intelligent services and an access control mechanism to a client device such as an in-vehicle navigation system. We describe the design and implementation of such a system that is currently in use to study how a driver's preferences can be modeled by the vehicle platform and utilized for POI recommendations.".
- 26 abstract "Recently, home gardens and green interiors have been receiving attention due to growing environmental consciousness and interest in macrobiotics. However, it is not simple to grow greenery in a restricted urban space, and this occasionally results in overgrowth or extinction. Also, for interior / exterior use it is important to balance the greenery with a user's surroundings, but it is difficult for amateurs to imagine its future grown form. Therefore, we propose an Android application ``Green-Thumb Camera'' that queries the LOD cloud for a plant that fits the environmental conditions, based on sensor information from a smartphone, and overlays its grown form in the space using AR.".
- 29 abstract "This paper introduces RIF Assembler, a tool that reuses knowledge to automatically construct rule-based systems. Our novel approach is based on (a) annotating domain rules with metadata and (b) expressing assembly instructions as metarules that manipulate the annotations. We leverage the power of RIF as a rule interchange format, and RDF and OWL for rule annotations. RIF Assembler has applications in many scenarios. This paper presents two of them in increasing order of sophistication: a simplistic example related to the health care industry, and an actual usage in the steel industry that involves the construction of a decision-support system driven by business process descriptions.".
- 9 abstract "Statistics are very present in our daily lives. Every day, new statistics are published, showing the perceived quality of living in different cities, the corruption index of different countries, and so on. Interpreting those statistics, on the other hand, is a difficult task. Often, statistics collect only very few attributes, and it is difficult to come up with hypotheses that explain, e.g., why the perceived quality of living in one city is higher than in another. In this paper, we introduce Explain-a-LOD, an approach which uses data from Linked Open Data for generating hypotheses that explain statistics. We show an implemented prototype and compare different approaches for generating hypotheses by analyzing the perceived quality of those hypotheses in a user study.".
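The hypothesis-generation idea in the Explain-a-LOD abstract above can be illustrated with a minimal correlation-based sketch: join the statistic with attributes harvested from Linked Open Data and rank the attributes by how strongly they correlate with the statistic's values. Everything below (the data, the attribute names, the use of plain Pearson correlation) is a hypothetical sketch, not the actual Explain-a-LOD implementation.

```python
# Hypothetical sketch of correlation-based hypothesis generation:
# attributes harvested from Linked Open Data are ranked by
# |Pearson correlation| with the statistic's values.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def hypotheses(statistic, attributes):
    """statistic: {city: value}; attributes: {name: {city: value}}.
    Returns attribute names ranked by correlation strength."""
    cities = sorted(statistic)
    target = [statistic[c] for c in cities]
    scored = [(name, pearson([vals[c] for c in cities], target))
              for name, vals in attributes.items()]
    return sorted(scored, key=lambda s: -abs(s[1]))

quality = {"A": 9.0, "B": 7.0, "C": 4.0}   # e.g. perceived quality of living
attrs = {                                   # invented LOD-derived attributes
    "parks_per_capita": {"A": 30.0, "B": 22.0, "C": 10.0},
    "avg_temperature": {"A": 12.0, "B": 18.0, "C": 14.0},
}
print(hypotheses(quality, attrs)[0][0])  # 'parks_per_capita'
```

A top-ranked attribute then reads as a candidate hypothesis ("cities with more parks per capita have a higher perceived quality of living"), to be judged by users as in the paper's study.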
- 92 abstract "Ranges of customizable products are huge and complex because of the number of features and options a customer can choose from, and the many constraints that exist between them. This complexity can hinder the publishing of customizable product data on the web of e-business data, because the constraints are not tractable by agents lacking reasoning capabilities. But the configuration process, which helps a customer to make her choice, one step at a time, is a traversal of a graph of partially defined products - that is, Linked Data. With reasoning hosted on the server, its complexity is hidden from clients. This results in a generic configuration API, in use at Renault. As configurations can be completed to valid commercial offers, the corresponding ontology fits nicely with GoodRelations. Benefits in e-business-related use cases are presented: sharing configurations between media, devices and applications, range comparison based on customer's interests, ads, and SEO.".
- 302 abstract "If we want automated agents to consume the Web, they need to understand what a certain service does and how it relates to other services and data. The shortcoming of existing service description paradigms is their focus on technical aspects instead of the functional aspect—what task does a service perform, and is this a match for my needs? This paper summarizes our recent work on RESTdesc, a semantic service description approach that centers on functionality. It has a solid foundation in logics, which enables advanced service matching and composition, while providing elegant and concise descriptions, responding to the demands of automated clients on the future Web of Agents.".
- 313 abstract "This poster shows a mobile Android app that uses openly available geographic data and crowdsources parking availability information, in order to let its users conveniently find parking when coming to work or driving into town. The application builds on Linked Data, and publishes the crowdsourced parking availability data openly as well. Further, it integrates additional related data sources, such as events and services, to provide rich value-adding features that will act as an incentive for users to adopt the app.".
- 314 abstract "In this paper, we describe a semantic approach to scholarly identity and scientific attribution based on a trust extension of VIVO, an open-source semantic social network platform for scientists. The Publish Trust pilot demonstrates how researchers can extend and manage verified claims of authorship in a semantic framework using VIVO instances and open identity technologies.".
- 317 abstract "Reasoning is one of the essential application areas of the modern Semantic Web. At present, semantic reasoning algorithms face significant challenges when dealing with the emergence of Internet-scale knowledge bases comprising extremely large amounts of data. Traditional reasoning approaches have only been proven for small, closed, trustworthy, consistent, coherent and static data domains. As such, they are not well suited to data-intensive applications aiming at Internet scale. We introduce the Large Knowledge Collider, a platform solution that leverages the service-oriented approach to implement a new reasoning technique capable of dealing with the exploding volumes of the rapidly growing data universe, and able to take advantage of large-scale, on-demand elastic infrastructures such as high-performance computing or cloud technology.".
- 324 abstract "Although the Web is a great success, around 4.5 billion people, mainly in developing countries, are still unable to access its information. Currently, a number of efforts are being undertaken to bridge this so-called ’digital divide’ in the Web of Documents. At the same time, as engineers of the Web of Data, we have the opportunity not to let the “Digital Linked Data Divide” grow too large. As in the developed world, sharing and re-use of locally produced and consumed data can also increase its value in developing regions. We here describe our ongoing efforts to implement Linked Data-backed solutions for the rural Sahel regions, including voice-based interfaces to this data and use cases highlighting the opportunities of re-using the data.".
- 326 abstract "In this poster we present an approach to automatically obtain domain ontologies, relying on domain vocabularies elicited from Twitter lists and on the reuse of conceptualizations in existing knowledge bases. We tap into the relations established between list names, under which prominent users in the domain of study as well as in the microblogging platform have been classified, to harvest related concepts. The relations between concepts are identified from interlinked general-purpose knowledge bases.".
- 330 abstract "The information needs of researchers are increasingly personalized: tailored to the user's state of knowledge about different topics, dependent on the work context, and part of an interactive process in which users are engaged with the scientific information space. We aim at revolutionizing the way scientific information can be accessed. This vision is realized as SciNet, a framework that enables interactive scientific information access through personalized search and user profiling, monitoring the user's behaviour and allowing the user to interact with the underlying user and data models.".
- 334 abstract "Identification of Named Entities (NE) such as people, organisations and locations is fundamental to semantic annotation and is the starting point of more advanced text mining algorithms. For instance, sentiment analysis is widely used in finance to extract the latest signals and events from news that could affect stock prices. However, before extracting company-related sentiment, it is necessary to identify the documents containing the corresponding and unambiguous company entities. Humans usually resolve ambiguities based on context. We argue that Linked Data can be a valuable source for extending the already available context. We combine a state-of-the-art named entity tool with novel Linked Data-based similarity measures and show that our algorithm can improve disambiguation accuracy on a subset of Wikipedia user profiles.".
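The role of Linked Data as extended disambiguation context, as described in the abstract above, can be illustrated with a toy similarity-based disambiguator: each candidate entity is scored by the vocabulary overlap between the document context and the entity's Linked Data description. Jaccard similarity stands in for the paper's similarity measures, and all tokens and URIs below are invented.

```python
# Toy context-based disambiguation: pick the candidate entity whose
# Linked Data description shares the most vocabulary with the document
# context. Jaccard similarity is a simple stand-in measure.

def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def disambiguate(context_tokens, candidates):
    """candidates: {entity_uri: tokens from its Linked Data description}."""
    return max(candidates,
               key=lambda uri: jaccard(context_tokens, candidates[uri]))

context = ["apple", "iphone", "quarterly", "earnings"]
candidates = {
    "dbpedia:Apple_Inc.": ["apple", "iphone", "technology", "company"],
    "dbpedia:Apple": ["apple", "fruit", "tree", "orchard"],
}
print(disambiguate(context, candidates))  # dbpedia:Apple_Inc.
```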
- 335 abstract "We envision a publish/subscribe ontology system that is able to index millions of user subscriptions and filter them against ontology data that arrive in a streaming fashion. In this work, we propose a SPARQL extension appropriate for a publish/subscribe setting; our extension builds on the natural semantic graph matching of the language and supports the creation of full-text subscriptions. Subsequently, we propose a main-memory subscription indexing algorithm which performs both semantic and full-text matching at low complexity and minimal filtering time. Thus, when ontology data are published, matching subscriptions are identified and notifications are forwarded to users.".
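A minimal sketch of the subscription-indexing idea above, assuming triple-pattern subscriptions with wildcards and covering only the semantic-matching part (not the full-text matching or the SPARQL extension); all class and method names are hypothetical.

```python
# Minimal main-memory subscription index for triple-pattern
# subscriptions; None acts as a wildcard in any position.
from collections import defaultdict

class SubscriptionIndex:
    def __init__(self):
        # Subscriptions are bucketed on their predicate (None = wildcard),
        # so a published triple only touches two buckets.
        self.index = defaultdict(list)

    def subscribe(self, user, pattern):
        s, p, o = pattern
        self.index[p].append((user, pattern))

    def match(self, triple):
        """Users whose patterns match the published triple."""
        s, p, o = triple
        notified = []
        for key in (p, None):  # concrete-predicate and wildcard buckets
            for user, (ps, _, po) in self.index[key]:
                if ps in (None, s) and po in (None, o):
                    notified.append(user)
        return notified

idx = SubscriptionIndex()
idx.subscribe("alice", (None, "dc:title", None))      # any title triple
idx.subscribe("bob", ("ex:doc1", None, None))         # anything on doc1
print(idx.match(("ex:doc1", "dc:title", "Streams")))  # ['alice', 'bob']
```

Bucketing by predicate keeps filtering time low: only subscriptions that could possibly match the incoming triple are examined, in the spirit of the low-complexity matching the abstract claims.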
- 338 abstract "As large amounts of Linked Data are published on the Web, it is becoming apparent that the validity of published knowledge is not absolute, but often depends on time, location, topic, and other contextual attributes. Therefore, an increasingly perceived need for Semantic Web (SW) applications is the representation of the context of such knowledge and its formalization for use in reasoning and querying. Recognizing this problem, several extensions of RDF and OWL to support contextual qualification of knowledge have been proposed. Among these, we recently presented the Contextualized Knowledge Repository (CKR), a framework with a well-founded semantics based on established AI principles for contextual representation and reasoning. While our previous work has mainly focused on the formal definitions and implementation of the CKR framework, the proposed poster illustrates the practical applicability of CKR features in real-world SW applications. The poster presents a concrete example of CKR use from the point of view of the tasks of modelling, reasoning over and querying contextualized knowledge.".
- 341 abstract "We survey the current state of SKOS vocabularies on the Web. We identified 478 SKOS vocabularies, which were gathered through collections and web crawling. Analyses were then conducted that included investigation of the use of SKOS constructs; the use of SKOS semantic relations and lexical labels; and the structure of vocabularies in terms of the hierarchical and associative relations, branching factors and the depth of the vocabularies. Almost one-third of the SKOS vocabularies collected fall into the term lists category, with no use of any SKOS semantic relations. As concept labelling is core to SKOS vocabularies, a surprising find is that not all SKOS vocabularies use SKOS lexical labels, whether skos:prefLabel or skos:altLabel, for their concepts. The survey results can serve to provide a better understanding of the modelling styles of the SKOS vocabularies published on the Web, especially when considering the creation of applications that utilize these vocabularies.".
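Two of the survey's checks above, classifying a vocabulary as a plain term list when it uses no SKOS semantic relations, and testing whether SKOS lexical labels are used at all, can be re-created over a toy triple list. This is a hypothetical sketch, not the survey's actual tooling.

```python
# Hypothetical re-creation of two survey checks over a toy triple list:
# no SKOS semantic relations => the vocabulary is a plain term list;
# label usage is tested separately.

SEMANTIC_RELATIONS = {"skos:broader", "skos:narrower", "skos:related"}
LEXICAL_LABELS = {"skos:prefLabel", "skos:altLabel"}

def classify(triples):
    """triples: iterable of (subject, predicate, object) strings."""
    predicates = {p for _, p, _ in triples}
    return {
        "term_list": predicates.isdisjoint(SEMANTIC_RELATIONS),
        "uses_skos_labels": not predicates.isdisjoint(LEXICAL_LABELS),
    }

vocab = [
    ("ex:A", "skos:prefLabel", "Apples"),
    ("ex:B", "skos:prefLabel", "Bananas"),
]
print(classify(vocab))  # {'term_list': True, 'uses_skos_labels': True}
```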
- 347 abstract "In order to make research settings transparent and reproducible, there is a need to publish both the data and the methods behind the research. In this paper our contribution is to show how large amounts of remote sensing observation data about the Brazilian Amazon Rainforest have been published as Linked Spatiotemporal Data. Moreover, we show how these data can be further accessed and analyzed in the R statistical computing environment using openly available methods. All of this is a contribution towards Linked Science, where not just publications, but data, methods, tools, and all scientific assets are interconnected and shared online.".
- 354 abstract "Stream Reasoning is the combination of reasoning techniques with data streams. In this paper, we present our approach to enable rule-based reasoning on semantic data streams in a distributed manner.".
- 360 abstract "We present ANISE and illustrate the benefits of exploiting knowledge encoded in controlled vocabularies or ontologies to precisely visualize 3D medical images. ANISE receives 3D images annotated with existing medical ontologies, for example RadLex and the Foundational Model of Anatomy (FMA), and performs reasoning tasks to improve the effectiveness of the visualization of organs or tissues of interest present in the input data. We show the quality of ANISE's rendering results on Computed Tomography head data. Attendees will be able to observe the benefits of using semantic annotations and the precision achieved by the same visualization tool when these annotations are considered for volume rendering.".
- 366 abstract "SEALS is a project about the evaluation of semantic tools.".
- 367 abstract "PlanetData will support the EU community in conducting research in the large-scale data management area through the provision of data sets and access to tailored data management technology. The research brings together approaches to large-scale data management from different disciplines in order to create holistic solutions to the challenges faced when dealing with planetary-scale data: e.g. data streams, quality and provenance.".
- 368 abstract "Today, Emergency Management is more challenging than ever, as international crises and natural disasters are increasing in number and in complexity. Misunderstandings between the different actors in Emergency Management Services are becoming challenging, particularly when taking into account legal, linguistic and cultural differences. Despite the fact that emergency management systems have spread, this challenge has still not been resolved.".
- 369 abstract "In this paper, we report on the I-SEARCH EU (FP7 ICT STREP) project whose objective is the development of a multimodal search engine that supports multimodal in- and output, as well as multimodal query refinement. An important aspect of I-SEARCH is the so-called Rich Unified Content Description (RUCoD) format for the description of low and high level features of content objects—rich media presentations, enclosing different types of media. We have developed a tool called CoFetch for the creation of such content objects, which partly retrieves its data from the Linking Open Data cloud. During the session, we will present a live demonstration of the I-SEARCH search engine and CoFetch, and—via pre-defined use cases—show how we imagine multimodal search in the future. We are looking for networking opportunities with projects dealing with semantic annotation of multimedia archives and projects interested in RUCoD feature extraction techniques.".
- 370 abstract "The goal of the XLike project is to develop technology to monitor and aggregate knowledge that is currently spread across mainstream and social media, and to enable cross-lingual services for publishers, media monitoring and business intelligence.".
- 371 abstract "The ENVISION project provides an ENVIronmental Services Infrastructure with Ontologies that aims to support non ICT-skilled users in the process of discovery, annotation, analysis and composition of environmental services.".
- 372 abstract "The European research project di.me targets the integration of a user’s personal information sphere, in order to: i) provide users with a single entry point to its management; ii) increase their awareness of their digital footprints; iii) provide privacy-sensitive and context-aware recommendation and automation. (The entire paper is an abstract - first paragraph included here)".
- 374 abstract "Advances in remote sensing technologies have allowed us to send an ever-increasing number of satellites into orbit around Earth. As a result, Earth Observation data archives have been constantly increasing in size in the last few years (now reaching petabyte sizes), and have become a valuable source of information for many scientific and application domains (environment, oceanography, geology, archaeology, security, etc.). TELEIOS is a recent European project that addresses the need for scalable access to petabytes of Earth Observation data and the discovery of knowledge that can be used in applications. To achieve this, TELEIOS builds on scientific database technologies (array databases, SciQL, data vaults) and Semantic Web technologies (stRDF and stSPARQL) implemented on top of state-of-the-art database systems (Strabon and MonetDB). In this presentation/poster we outline the vision of TELEIOS (now in its second year) and present the results we have achieved so far.".
- 375 abstract "Interactive Knowledge Stack (IKS) is an FP7 EU-funded integrating project targeted at European small and medium enterprises that provide content and knowledge management technologies to end-users and organisations. Most such enterprises face hindrances, up to utter impediments, in leveraging the power of semantic technologies and Linked Data. As such, they remain tied to traditional data silo paradigms and are forced to handle intelligent content as old-fashioned closed worlds. Ultimately, this has a negative downstream impact on the thousands of end-users and organisations served by these providers. The IKS project responds to these needs by providing a reference architecture and an open-source implementation for a layered software platform that can semantically enhance multiple functional aspects of the content management lifecycle. The mission is to provide and promote an open, interoperable and unobtrusive platform that can be deployed alongside a CMS and customised according to the specific needs of vendors and adopters.".
- 10 abstract "The discovery of functionally matching services - often referred to as matchmaking - is one of the essential requirements for realizing the vision of the Internet of Services. In practice, however, the process is complicated by the varying quality of syntactic and semantic descriptions of service components. In this work, we propose COV4SWS.KOM, a semantic matchmaker that addresses this challenge through automatic adaptation to the description quality on different levels of the service structure. Our approach performs very well with respect to common Information Retrieval metrics, achieving top placements in the renowned Semantic Service Selection Contest, and thus marks an important contribution to the discovery of services in a realistic application context.".
- 102 abstract "Huge RDF datasets are currently exchanged in textual RDF formats, so consumers need to post-process them through RDF stores for local consumption, such as reasoning/integration and SPARQL querying. This is a painful task which requires great effort in terms of time and computational resources. A first approach to lightweight data exchange is a compact (binary) RDF serialization format called HDT. In this paper, we focus on enhancing the exchanged data in order to optimize consumption without the need of "unpacking" the data. This is done by post-processing the HDT, enhancing the data with additional structures which allow basic forms of SPARQL queries. Experiments show that i) the exchange efficiency outperforms universal compression, ii) post-processing becomes a fast process, and iii) the enhanced data provides competitive query performance at consumption.".
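The principle of querying the compact form without unpacking, as in the abstract above, can be illustrated with a toy analogue: dictionary-encode terms to integer IDs and keep a sorted ID-based triple index, so that basic triple patterns are answered by binary search on the compact representation. This is a simplified stand-in written for illustration, not HDT itself.

```python
# Toy analogue of querying compact RDF without unpacking it: terms are
# dictionary-encoded to integer IDs, triples kept sorted by (s, p, o) ID,
# and subject lookups reduce to binary search over the ID index.
import bisect

class CompactTriples:
    def __init__(self, triples):
        terms = sorted({t for triple in triples for t in triple})
        self.id = {t: i for i, t in enumerate(terms)}   # term -> ID
        self.term = terms                               # ID -> term
        self.spo = sorted((self.id[s], self.id[p], self.id[o])
                          for s, p, o in triples)

    def by_subject(self, s):
        """All triples with the given subject, found via binary search."""
        if s not in self.id:
            return []
        sid = self.id[s]
        start = bisect.bisect_left(self.spo, (sid, -1, -1))
        out = []
        for trip in self.spo[start:]:
            if trip[0] != sid:
                break
            out.append(tuple(self.term[i] for i in trip))
        return out

data = [("ex:s1", "ex:p", "ex:o1"), ("ex:s2", "ex:p", "ex:o2"),
        ("ex:s1", "ex:q", "ex:o3")]
ct = CompactTriples(data)
print(ct.by_subject("ex:s1"))
```

Terms are materialized back into strings only for the matching triples; everything else stays in the compact ID space, which is the effect the HDT enhancement aims for.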
- 110 abstract "In recent years, top-k query processing has attracted much attention because in large-scale scenarios, computing only the k best solutions is often sufficient and also the only affordable way to reach acceptable response times. Top-k query processing has been dealt with in different contexts. One line of research targets so-called join top-k, where the goal is to produce the k best final results by joining partial results. In this paper, we study join top-k in the Linked Data setting, where the partial results to be joined come from different sources. Because the only available access pattern in this setting is URI source lookup, processing queries requires entire sources to be retrieved. Targeting this scenario, we show how existing work on join top-k can be adapted to produce top-k results over Linked Data. We elaborate on strategies for book-keeping the scores of partial results and using them for better estimation of candidate result scores, i.e. to obtain tighter bounds for early termination. Based on experiments on real-world Linked Data, we show that the proposed top-k processing technique substantially improves runtime performance.".
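The bound-based early termination discussed in the abstract above can be sketched with a generic rank join over two score-sorted inputs: maintain an upper bound on any result not yet fully seen, and stop reading input as soon as the current top-k cannot be displaced. This is an HRJN-style sketch over invented inputs, not the paper's algorithm.

```python
# Generic rank-join sketch with early termination via score upper bounds
# (HRJN-style). Inputs are (key, score) lists sorted by score descending;
# the join is on key and the combined score is the sum.

def top_k_join(source_a, source_b, k):
    """Returns (top-k [(key, combined_score)], number of entries read)."""
    seen_a, seen_b, results = {}, {}, {}
    top_a, top_b = source_a[0][1], source_b[0][1]
    last_a, last_b = top_a, top_b
    ia = ib = reads = 0
    while ia < len(source_a) or ib < len(source_b):
        # Alternate sorted access between the two inputs.
        from_a = ib >= len(source_b) or (ia < len(source_a) and ia <= ib)
        if from_a:
            key, score = source_a[ia]; ia += 1
            seen_a[key], last_a = score, score
            if key in seen_b:
                results[key] = score + seen_b[key]
        else:
            key, score = source_b[ib]; ib += 1
            seen_b[key], last_b = score, score
            if key in seen_a:
                results[key] = seen_a[key] + score
        reads += 1
        # Upper bound on any result not yet fully seen.
        threshold = max(top_a + last_b, last_a + top_b)
        best = sorted(results.values(), reverse=True)[:k]
        if len(best) == k and best[-1] >= threshold:
            break  # no unseen combination can enter the top-k
    top = sorted(results.items(), key=lambda kv: -kv[1])[:k]
    return top, reads

a = [("x", 9), ("y", 8), ("z", 1)]
b = [("y", 9), ("x", 7), ("z", 2)]
print(top_k_join(a, b, 1))  # ([('y', 17)], 4): stopped after 4 of 6 reads
```

In the Linked Data setting of the paper, each read corresponds to retrieving an entire source via URI lookup, which is why tighter bounds (and thus earlier termination) translate directly into runtime savings.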
- 12 abstract "Representing and reasoning over mereotopological relations (parthood and location) in an ontology is a well-known challenge, because there are many relations to choose from and OWL has limited expressiveness in this regard. To address these issues, we structure mereotopological relations based on the KGEMT mereotopological theory. A correctly chosen relation counterbalances some weaknesses in OWL's representation and reasoning services. To achieve effortless selection of the appropriate relation, we hide the complexities of the underlying theory through automation of modelling guidelines in the new tool OntoPartS. It uses, mainly, the categories from DOLCE, which avoids lengthy question sessions, and it includes examples and verbalizations. OntoPartS was experimentally evaluated, which demonstrated that selecting and representing the desired relation was done efficiently and more accurately with OntoPartS.".
- 124 abstract "Using semantic web search engines, such as Watson, Swoogle or Sindice, to find ontologies is a complex process as it is often an exploratory activity. It generally requires formulating multiple queries, browsing many pages of results and assessing the returned ontologies against each other to obtain a relevant and adequate subset of ontologies for the intended use. Our hypothesis is that part of the difficulty related to searching ontologies comes from the lack of structure in the search results, where ontologies that are implicitly related to each other are presented as disconnected and shown on different result pages. In a previous work, to overcome this situation, we devised a software framework, Kannel, that detects and makes explicit relationships between ontologies in large ontology repositories. In this paper, we present a study that compares the use of the Watson ontology search engine with the use of its extension, Watson+Kannel, that provides explicit information regarding the various types of relationships between the result ontologies. We evaluate the benefit of Watson+Kannel by measuring through various indicators how these explicit relationships between ontologies are used to improve the user’s efficiency in ontology search, thus validating our hypothesis.".
- 132 abstract "The increasing availability of structured data in Resource Description Framework (RDF) format poses new challenges and opportunities for data mining. Existing approaches to mining RDF have focused on one specific data representation, one specific machine learning algorithm or one specific task, only. Kernels, however, promise a more flexible approach by providing a powerful framework for decoupling the data representation from the learning task. This paper focuses on how the well established family of kernel-based machine learning algorithms can be readily applied to instances represented as RDF graphs. We first review the problems that arise when conventional graph kernels are used for RDF graphs. We then introduce two versatile families of RDF graph kernels based on intersection graphs and intersection trees. These kernels can better exploit the inherent properties of RDF, while providing an easy to use interface between any RDF graph (including vocabulary extensions such as RDFS and OWL) and any kernel-based learning algorithm (which are available for solving many machine learning tasks). The flexibility of the approach is demonstrated on two common relational learning tasks: entity classification and link prediction. The results show that our novel RDF graph kernels with standard SVMs achieve competitive predictive performance when compared to specialized techniques for both tasks.".
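The kernel idea in the abstract above can be sketched minimally: represent each RDF graph as a set of triples and let the kernel count shared structure. The functions below count shared triples (a depth-1 stand-in for the intersection-graph and intersection-tree kernels of the paper, which count larger substructures) and cosine-normalize the result; they are an illustrative simplification, not the paper's kernels.

```python
import math

def intersection_kernel(g1, g2):
    """Count the triples shared by two RDF graphs, each given as a set of
    (subject, predicate, object) tuples. This is the simplest possible
    intersection-style kernel; the kernels in the paper count walks and
    trees in the intersection graph instead of single triples."""
    return len(set(g1) & set(g2))

def normalized_kernel(g1, g2):
    """Cosine-normalize the kernel so that k(g, g) == 1.0 for any
    non-empty graph g, which is what most SVM toolkits expect."""
    denom = math.sqrt(intersection_kernel(g1, g1) * intersection_kernel(g2, g2))
    return intersection_kernel(g1, g2) / denom if denom else 0.0
```

A kernel matrix built from such a function can be fed directly to any kernel-based learner (e.g. an SVM with a precomputed kernel).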
- 134 abstract "Traditional On-Line Analytical Processing (OLAP) tools have proven to be successful in analyzing large sets of enterprise data. For today's business dynamics, this highly curated data is sometimes not enough. External data (particularly web data) may be useful to enhance local analysis. In this paper we discuss the extraction of multidimensional data from web sources and its representation in RDFS. We introduce Open Cubes, an RDFS vocabulary for the specification and publication of multidimensional cubes on the Semantic Web, and show how classical OLAP operations can be implemented over Open Cubes using SPARQL 1.1, without the need of mapping the multidimensional information to the local database (the usual approach to multidimensional analysis of Semantic Web data). We show that our approach is plausible for the data sizes that can usually be retrieved to enhance local data repositories.".
- 136 abstract "The topic of study in the present paper is the class of RDF homomorphisms that substitute one predicate for another throughout a set of RDF triples, under the condition that the predicate in question is not also a subject or object. These maps turn out to be suitable for reasoning about similarities in information content between two or more RDF graphs. As such they are very useful, e.g., for migrating data from one RDF vocabulary to another. In this paper we address a particular instance of this problem and try to answer the question of when we are licensed to say that data is being transformed, reused or merged in a non-distortive manner. We place this problem in the context of RDF and Linked Data, and study it in relation to SPARQL construct queries.".
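The predicate-substitution homomorphism described in this abstract has a very direct reading in code. The sketch below rewrites every triple (s, old, o) to (s, new, o) after checking the stated precondition that the substituted predicate does not also occur as a subject or object; the function and variable names are illustrative.

```python
def substitute_predicate(triples, old, new):
    """Apply a predicate-substitution map over a set of RDF triples:
    rewrite every (s, old, o) to (s, new, o). The map is only defined
    when `old` never occurs in subject or object position, so that
    precondition is checked first."""
    if any(old in (s, o) for s, _, o in triples):
        raise ValueError(f"{old!r} also occurs as a subject or object")
    return {(s, new if p == old else p, o) for s, p, o in triples}
```

The same rewriting is what a SPARQL CONSTRUCT query expresses declaratively, e.g. `CONSTRUCT { ?s <new> ?o } WHERE { ?s <old> ?o }` combined with the unchanged remainder of the graph, which is why the abstract studies the problem in relation to construct queries.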
- 139 abstract "We present a survey of the current state of Simple Knowledge Organization System (SKOS) vocabularies on the Web. Candidate vocabularies were gathered through collections and web crawling, with 478 identified as complying with a given definition of a SKOS vocabulary. The analyses conducted included investigation of the use of SKOS constructs; the use of SKOS semantic relations and lexical labels; and the structure of vocabularies in terms of hierarchical and associative relations, branching factors and the depth of the vocabularies. Even though SKOS concepts are considered to be the core of SKOS vocabularies, we found that not all published SKOS vocabularies explicitly declare SKOS concepts. Almost one-third of the SKOS vocabularies collected fall into the category of "term lists", with no use of any SKOS semantic relations. As concept labelling is core to SKOS vocabularies, a surprising finding is that not all SKOS vocabularies use SKOS lexical labels, whether 'skos:prefLabel' or 'skos:altLabel', for their concepts. The branching factors and maximum depth of the vocabularies have no direct relationship to the size of the vocabularies. We also observed some common modelling slips in SKOS vocabularies. The survey is useful when considering, for example, converting artefacts such as OWL ontologies into SKOS, where a definition of typicality of SKOS vocabularies could be used to guide the conversion. Moreover, the survey results can serve to provide a better understanding of the modelling styles of the SKOS vocabularies published on the Web, especially when considering the creation of applications that utilize these vocabularies.".
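The structural measures this survey reports (maximum depth, branching factor) are straightforward to compute once a vocabulary's hierarchy is in hand. The sketch below does so for a toy tree-shaped hierarchy given as a mapping from each concept to its narrower concepts (a stand-in for following skos:narrower links); it is an illustration of the measures, not the survey's tooling.

```python
def hierarchy_stats(narrower):
    """Return (maximum depth, average branching factor) of a tree-shaped
    concept hierarchy. `narrower` maps each non-leaf concept to the list
    of its narrower concepts; leaves simply have no entry."""
    children = {c for kids in narrower.values() for c in kids}
    roots = [c for c in narrower if c not in children]  # top concepts

    def depth(c):
        kids = narrower.get(c, [])
        return 1 + max((depth(k) for k in kids), default=0)

    max_depth = max((depth(r) for r in roots), default=0)
    inner = [len(kids) for kids in narrower.values() if kids]
    branching = sum(inner) / len(inner) if inner else 0.0
    return max_depth, branching
```

Real SKOS vocabularies can contain polyhierarchies and cycles (one of the "modelling slips" the survey mentions), so a production version would need cycle detection rather than plain recursion.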
- 154 abstract "Twitter lists constitute a form of organising Twitter users into sets, and can be created and maintained by any user in Twitter. In this paper we describe an approach to characterising the emergent semantics in these lists, which consists in deriving semantic relations between lists and users by analyzing the co-occurrence of keywords in list names. We use the vector space model and Latent Dirichlet Allocation to obtain similar keywords according to co-occurrence patterns. These results are then compared to similarity measures relying on the WordNet synset hierarchy and to existing Linked Data sets. Results show that co-occurrence of keywords based on the members of the lists produces more synonyms and results more correlated with WordNet similarity measures.".
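The vector-space side of this approach can be sketched in a few lines: give each keyword a vector counting the other keywords it co-occurs with across list names, then compare keywords by cosine similarity. This is a minimal illustration under the assumption that each list name is reduced to a set of keywords; the paper's actual pipeline (and its LDA component) is richer.

```python
import math
from collections import Counter

def cooccurrence_vectors(lists):
    """Build a co-occurrence vector for each keyword: the vector counts
    how often the keyword appears together with every other keyword.
    `lists` is an iterable of keyword sets, one per list name."""
    vecs = {}
    for kws in lists:
        for w in kws:
            vecs.setdefault(w, Counter()).update(kws - {w})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Under this model, two keywords come out similar when they co-occur with the same companions (e.g. "nba" and "nfl" both co-occurring with "sports"), which is exactly the kind of emergent relation the paper compares against WordNet.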
- 157 abstract "With the growth of the Linked Data Web, time-efficient approaches for computing links between data sources have become indispensable. Most Link Discovery frameworks implement approaches that require two main computational steps. First, a link specification has to be explicated by the user. Then, this specification must be executed. While several approaches for the time-efficient execution of link specifications have been developed over the last few years, the discovery of accurate link specifications remains a tedious problem. In this paper, we present EAGLE, an active learning approach based on genetic programming. EAGLE generates highly accurate link specifications while reducing the annotation burden for the user. We present EAGLE and the framework within which it is implemented. We evaluate EAGLE against batch learning on three different data sets and show that it can detect specifications with an F-measure superior to 90% while requiring a small number of questions.".
- 158 abstract "Taxonomies are a useful mechanism to organize, evaluate, and search web content. As such, many popular classes of web applications, from product categorization, similar-product comparative pricing, and localized services, to vertical or enterprise search, utilize them. However, their manual generation and maintenance by experts is a time-costly and cumbersome procedure, often resulting in platform-dependent and static vocabularies. Hence much research currently focuses on more flexible and dynamic methods to develop them, as evidenced for example by the huge interest in folksonomies within the social media realm. We propose a new approach for constructing taxonomies. Our idea stems from the increased human involvement and desire to provide tags and annotate web content (e.g., in social media and product categorization applications). We define the required input from human users in the form of explicit structural information, that is, supertype-subtype relationships between concepts. Humans have a good understanding of such relationships. In this way, we harvest, via common annotation practices, the collective wisdom of users with respect to the (categorization of) web content they share and access. We further define the principles upon which crowdsourced taxonomy construction algorithms should be based. We show that the resulting problem is NP-hard. We provide heuristic algorithms and relevant optimizations that aggregate human input, resolve conflicting input, and produce taxonomies. Our algorithms are evaluated on real-world crowdsourcing experiments (where real users provide such information) and on real-world taxonomies.".
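To make the aggregation step concrete, the sketch below combines crowdsourced supertype-subtype assertions by majority vote: edges are accepted in order of decreasing support, and an edge is skipped when it would contradict better-supported edges by creating a cycle. This is a heuristic sketch of the general idea only, not the authors' algorithms.

```python
from collections import Counter

def aggregate_taxonomy(assertions):
    """Aggregate crowdsourced (supertype, subtype) assertions into a
    tree-shaped taxonomy: count votes per directed edge, then greedily
    accept edges by decreasing support, skipping any edge that would
    give a concept a second parent or create a cycle."""
    votes = Counter(assertions)
    parents = {}  # subtype -> accepted supertype

    def ancestors(c):
        while c in parents:
            c = parents[c]
            yield c

    taxonomy = []
    for (sup, sub), _ in votes.most_common():
        if sub in parents:
            continue  # already placed under a better-supported parent
        if sub == sup or sub in set(ancestors(sup)):
            continue  # would create a cycle, contradicting earlier edges
        parents[sub] = sup
        taxonomy.append((sup, sub))
    return taxonomy
```

In the test below, the minority assertion ("dog", "animal") loses to the majority edge ("animal", "dog") because accepting it would create a cycle.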
- 164 abstract "The linguistics community is building a metadata-based infrastructure for the description of its research data and tools. At its core is the ISOcat registry, a collaborative platform to hold a (to-be-standardized) set of data categories (i.e., field descriptors). Descriptors have definitions in natural language and few explicit interrelations. With the registry growing to many hundreds of entries, authored by many, it is becoming increasingly apparent that the rather informal definitions and their glossary-like design make it hard for users to grasp, exploit and manage the registry's content. In this paper, we take a large subset of the ISOcat term set and reconstruct from it a tree structure, following in the footsteps of schema.org. Our ontological re-engineering yields a representation that gives users a hierarchical view of linguistic, metadata-related terminology. The new representation adds to the precision of all definitions by making explicit information which is only implicitly given in the ISOcat registry. It also helps uncover and address potential inconsistencies in term definitions as well as gaps and redundancies in the overall ISOcat term set. The new representation can serve as a complement to the existing ISOcat model, providing additional support for authors and users in browsing, (re-)using, maintaining, and further extending the community's terminological metadata repertoire.".
- 171 abstract "The three most common approaches for deriving or predicting instantiated relations, i.e. triple statements (s, p, o), are information extraction, reasoning and relational machine learning. Information extraction uses sensory information, typically in the form of text, and extracts statements using various methods ranging from simple classifiers to the most sophisticated NLP approaches. Logical reasoning is based on a set of true statements and derives new statements via inference using higher-order logical axioms. Finally, machine learning exploits regularities in the data to predict the likelihood of new statements. In this paper we combine all three methods to exploit all sources of available information in a modular way, by which we mean that each approach, i.e. information extraction, reasoning and machine learning, can be optimized independently before being combined in an overall system. For relational machine learning, we present a novel approach based on hierarchical Bayesian multi-label learning which also sheds new light on common factorization approaches. We rank the probabilities for statements to be true in the sense that, given we are forced to make a decision, they indicate the best option. We consider the fact that an entity can belong to more than one ontological class and discuss aggregation. We extend the approach to model nonlinear dependencies between relationships and to support personalization. We validate our model using data from the Yago and DBpedia ontologies.".
- 173 abstract "Assessing Linked Data Mappings using Network Measures (Christophe Guéret, Paul Groth, Claus Stadler, and Jens Lehmann). Linked Data is at its core about the setting of links between resources. Links provide enriched semantics, pointers to extra information and enable the merging of data sets. However, as the amount of Linked Data has grown, there has been the need to automate the creation of links, and such automated approaches can create low-quality links or unsuitable network structures. In particular, it is difficult to know whether the links introduced improve or diminish the quality of Linked Data. In this paper, we present an extensible framework and a series of justified network measures that allow for the assessment of Linked Data mappings. We test the framework on a set of known good and bad links generated by common mapping systems and discuss which network measures show promise for use in assessment.".
- 174 abstract "As the size of Linked Open Data (LOD) increases, the search and access to the relevant LOD resources becomes more challenging. To overcome search difficulties, we propose a novel concept-based search mechanism for the Web of Data (WoD) based on the UMBEL concept hierarchy and a fuzzy-based retrieval model. The proposed search mechanism groups LOD resources with the same concepts into categories, called concept lenses, for more efficient access to the WoD. To achieve concept-based search, we use the UMBEL concept hierarchy to represent the context of LOD resources. A semantic indexing model is applied for efficient representation of UMBEL concept descriptions, and a novel fuzzy-based retrieval model is introduced for the categorization of LOD resources to UMBEL concepts. The proposed fuzzy-based model was evaluated on a particular benchmark (~10,000 mappings). The evaluation results show that we can achieve highly acceptable categorization accuracy and perform better than the vector space model.".
- 176 abstract "Within the cultural heritage field, proprietary metadata and vocabularies are being transformed into public Linked Data. These efforts have mostly been at the level of large-scale aggregators such as Europeana, where the original data is abstracted to a common format and schema. Although this approach ensures a level of consistency and interoperability, the richness of the original data is lost in the process. In this paper, we present a transparent and interactive methodology for ingesting, converting and linking cultural heritage metadata into Linked Data. The methodology is designed to maintain the richness and detail of the original metadata. We introduce the XMLRDF conversion tool and describe how it is integrated in the ClioPatria semantic web toolkit. The methodology and the tools have been validated by converting the Amsterdam Museum metadata to a Linked Data version. In this way, the Amsterdam Museum became the first ‘small’ cultural heritage institution with a node in the Linked Data cloud.".
- 192 abstract "Existing metadata schemes and content management systems used by museums focus on describing the heritage objects that the museum holds in its collection. These are used to manage and describe individual heritage objects according to properties such as artist, date and preservation requirements. Curatorial narratives, such as physical or online exhibitions, tell a story that spans heritage objects and carry a meaning that does not necessarily reside in the individual heritage objects themselves. Here we present curate, an ontology for describing curatorial narratives. It draws on structuralist accounts that distinguish the narrative from the story and the plot, as well as on a detailed analysis of two museum exhibitions and the curatorial processes that contributed to them. storyspace, our web-based interface and API to the ontology, is being used by curatorial staff in two museums to model curatorial narratives and the processes through which they are constructed.".
- 193 abstract "This paper proposes SCHEMA, an algorithm for automated mapping between heterogeneous product taxonomies in the e-commerce domain. It contributes towards effective aggregation of product information from different sources, in order to reduce search failures in online shopping. SCHEMA utilises word sense disambiguation techniques, based on the ideas from the algorithm proposed by Lesk, in combination with the semantic lexicon WordNet. It introduces a node matching function, based on inclusiveness of the categories in conjunction with the Levenshtein distance for class labels, for finding candidate map categories, and for assessing path-similarity. The final mapping quality score is calculated using the Damerau-Levenshtein distance and a node-dissimilarity penalty. The performance of SCHEMA was tested on three real-life datasets and compared with PROMPT and the algorithm proposed by Park & Kim. It is shown that SCHEMA improves considerably on both recall and F1-score, while maintaining similar precision.".
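Since SCHEMA's label matching rests on edit distances, the abstract above is a natural place to spell one out. The sketch below is a plain dynamic-programming implementation of the restricted Damerau-Levenshtein distance (edit distance with adjacent transpositions), the measure named in the abstract; it illustrates the measure itself, not SCHEMA's surrounding scoring.

```python
def damerau_levenshtein(a, b):
    """Restricted Damerau-Levenshtein distance between strings a and b:
    minimum number of insertions, deletions, substitutions and adjacent
    transpositions. O(len(a) * len(b)) time and space."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]
```

The transposition case is what separates this from plain Levenshtein: a swapped pair like "laptop"/"latpop" costs 1 edit rather than 2, which is forgiving toward the typos common in user-maintained product category labels.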
- 199 abstract "An ontology matching system can usually be run with different configurations that optimize the system's effectiveness, namely precision, recall, or F-measure, depending on the specific ontologies to be aligned. Changing the configuration has potentially high impact on the obtained results. We apply matching task profiling metrics to automatically optimize the system's configuration depending on the characteristics of the ontologies to be aligned. Using machine learning techniques, we can automatically determine the optimal configuration in most cases. Even using a small training set, our system predicts the best configuration in 94% of the cases. Our approach is evaluated using our extensible and configurable ontology matching system AgreementMaker.".
- 20 abstract "Efficient evaluation of complex SPARQL queries is still an open research problem. State-of-the-art engines are based on relational database technologies. We approach the problem from the perspective of Constraint Programming (CP), a technology designed for solving NP-hard problems. Such technology allows us to exploit SPARQL filters early-on during the search instead of as a post-processing step. We propose Castor, a new SPARQL engine based on CP. Castor performs very competitively compared to state-of-the-art engines.".
- 213 abstract "This paper describes POWLA, a formalism for representing linguistic corpora in OWL/DL. POWLA is based on data models currently developed by the NLP community to overcome the heterogeneity of linguistic annotation (Ide and Pustejovsky 2010), in particular PAULA, an XML standoff format developed out of early sketches of the Linguistic Annotation Framework (LAF, Ide and Romary 2004), which is currently developed within ISO TC37/SC4. These data models are defined as specializations of directed acyclic (hyper)graphs, and it is claimed that every kind of linguistic annotation can be represented as a directed (hyper)graph (Bird and Liberman 2001). Linguistic corpora can thus be naturally linearized in RDF. Unlike earlier approaches to model generic data models for linguistic annotations by means of Semantic Web standards (e.g., Cassidy 2010), POWLA augments the RDF linearization of linguistic data with a data model formalized in an OWL/DL ontology that defines data types for primary data, annotations and linguistic metadata, as well as consistency constraints on linguistic corpora. Unlike other approaches to model linguistic corpora in OWL/DL (e.g., Burchardt et al. 2008), POWLA is not specific to a particular type of annotation, but implements a generic data model. This genericity is illustrated here for the conversion of GrAF (the XML linearization of the Linguistic Annotation Framework, Ide and Suderman 2007) to POWLA. That POWLA preserves the linguistic information conveyed in the original GrAF data is shown by an experiment that emulates ANNIS-QL, a query language specifically designed for heterogeneous and richly annotated linguistic corpora (Chiarcos et al. 2008), by means of SPARQL macros on POWLA data. Finally, the paper identifies advantages and disadvantages of OWL/RDF linearizations of generic data models for linguistic corpora (and in particular POWLA) as compared to traditional XML standoff formats (Ide and Suderman 2007, Chiarcos et al. 2008).".
- 218 abstract "This paper explores the issue of detecting concepts for ontology learning from text. We investigate various metrics from graph theory and propose various voting schemes based on these metrics. The idea draws its roots from social choice theory, and our objective is to mimic consensus in automatic learning methods and increase the confidence in concept extraction through the identification of the best performing metrics, the comparison of these metrics with standard information retrieval metrics (such as TF-IDF) and the evaluation of various voting schemes. Our results show that three graph-based metrics, Degree, Reachability and HITS-hub, were the most successful in identifying relevant concepts contained in two gold standard ontologies.".
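A classic voting scheme from social choice theory, of the kind this abstract proposes applying to metric rankings, is the Borda count: each metric's ranking awards points by position and the totals decide the consensus ranking. The sketch below is a minimal illustration of that idea, not the paper's exact schemes.

```python
from collections import Counter

def borda_vote(rankings, top_n):
    """Combine candidate-concept rankings produced by different metrics
    with a Borda count: in a ranking of length m, the term at position
    p (0-based) receives m - p points; total points decide the final
    order. Returns the top_n terms of the consensus ranking."""
    scores = Counter()
    for ranking in rankings:
        for pos, term in enumerate(ranking):
            scores[term] += len(ranking) - pos
    return [term for term, _ in scores.most_common(top_n)]
```

A term that several metrics rank highly (here "ontology") wins even if no single metric puts everything in the same order, which is the consensus effect the paper aims to exploit.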
- 231 abstract "The amount of structured data is growing rapidly. Given a structured query that asks for some entities, the number of matching candidate results is often very high, and the problem of ranking these results has gained attention. Because results in this setting equally and perfectly match the query, existing ranking approaches often use features that are independent of the query. A popular one is based on the notion of centrality, derived via PageRank. In this paper, we adopt a learning-to-rank approach for this structured query setting, provide a systematic categorization of query-independent features that can be used for it, and finally discuss how to leverage information in access logs to automatically derive the training data needed for learning. In experiments using real-world datasets and human evaluation based on crowdsourcing, we show the superior performance of our approach over two relevant baselines.".
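The PageRank centrality mentioned above as a popular query-independent feature can be computed with plain power iteration. The sketch below is a minimal, self-contained version over an adjacency dict (illustrative names; a feature pipeline would run this once over the data graph and reuse the scores for every query).

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each node to its
    list of out-links. Assumes every node appears as a key; dangling
    nodes spread their rank uniformly. Returns a rank per node,
    summing to 1."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, links in graph.items():
            if links:
                share = damping * rank[n] / len(links)
                for m in links:
                    new[m] += share
            else:  # dangling node: distribute its rank to everyone
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank
```

Because the scores depend only on the graph, they can be precomputed offline, which is exactly what makes centrality attractive as a query-independent ranking feature.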
- 24 abstract "This paper proposes to apply the RDF framework to the representation of linguistic annotations. We argue that RDF is a suitable data model to capture multiple annotations on the same text segment and to integrate multiple layers of annotations. Besides the idea of using RDF for this purpose, the main contribution of the paper is an OWL ontology, called TELIX (Text Encoding and Linguistic Information eXchange), which models annotation content. This ontology builds on the SKOS-XL vocabulary, a W3C standard for representing lexical entities as RDF graphs. We extend SKOS in order to capture lexical relations between words (e.g., synonymy), as well as to support word sense disambiguation, morphological features and syntactic analysis, among others. Additionally, a formal mapping of feature structures to RDF graphs is defined, enabling complex composition of linguistic entities. Finally, the paper also suggests the use of RDFa as a convenient syntax that combines source texts and linguistic annotations in the same file.".
- 249 abstract "The Mathematics Subject Classification (MSC), maintained by the American Mathematical Society’s Mathematical Reviews (MR) and FIZ Karlsruhe’s Zentralblatt für Mathematik (Zbl), is a scheme for classifying publications in mathematics according to their subjects. While it is widely used, its traditional, idiosyncratic conceptualization and representation requires custom implementations of search, query and annotation support. This did not encourage people to create and explore connections of mathematics to subjects of related domains (e.g. science), and it made the scheme hard to maintain. We have reimplemented the current version of MSC2010 as a Linked Open Dataset using SKOS and are turning it into the new MSC authority. This paper explains the motivation, and details our design considerations and how we realized them in the implementation. We present in-the-field use cases and show how to scale existing solutions to take full advantage of the now complete LOD set. We conclude with a roadmap for bootstrapping the presence of mathematical and mathematics-based science, technology, and engineering knowledge on the Web of Data, where it has been noticeably underrepresented so far, starting from MSC/SKOS as a seed. We point out how e-science applications can take advantage of that.".