Matches in ScholarlyData for { ?s <https://w3id.org/scholarlydata/ontology/conference-ontology.owl#abstract> ?o. }
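For reference, the matches listed below correspond to a SELECT query built around the pattern in the heading. A minimal sketch of such a query is shown here; the endpoint URL and any ordering or limits are not given in this listing, so treat it as a reproduction sketch rather than the exact query used.

    SELECT ?s ?o
    WHERE {
      ?s <https://w3id.org/scholarlydata/ontology/conference-ontology.owl#abstract> ?o .
    }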
- 285 abstract "In this paper we present Prexto, an algorithm for comput- ing the perfect rewriting of unions of conjunctive queries over ontologies expressed in the description logic DL-LiteA. The main novelty of Prexto lies in the fact that it constitutes the ﰃrst technique for query rewriting over ontologies which fully exploits extensional constraints to optimize query rewriting. In addition, Prexto makes use of functional role axioms and of concept and role disjointness axioms to optimize the size of the rewritten query. We show that these optimizations allow Prexto to out- perform the existing query rewriting techniques for DL-Lite in practical cases.".
- 31 abstract "The World Wide Web currently evolves into a Web of Linked Data where content providers publish and link data as they have done with hypertext for the last 20 years. While the declarative query language SPARQL is the de facto for querying a-priory defined sets of data from the Web, no language exists for querying the Web of Linked Data itself. However, it seems natural to ask whether SPARQL is also suitable for such a purpose. In this paper we formally investigate the applicability of SPARQL as a query language for Linked Data on the Web. In particular, we study two query models: 1) a full-Web semantics where the scope of a query is the complete set of Linked Data on the Web and 2) a family of reachability-based semantics which restrict the scope to data that is reachable by traversing certain data links. For both models we discuss properties such as monotonicity and computability as well as the implications of querying a Web that is infinitely large due to data generating servers.".
- 40 abstract "As an essential part of the W3C’s semantic web stack and linked data initiative, RDF data management systems (also known as triplestores) have drawn a lot of research attention. The majority of these systems use value-based indexes (e.g., B+-trees) for physical storage, and ignore many of the structural aspects present in RDF graphs. Structural indexes, on the other hand, have been successfully applied in XML and semi-structured data management to exploit structural graph information in query processing. In those settings, a structural index groups nodes in a graph based on some equivalence criterion, for example, indistinguishability with respect to some query workload (usually XPath). Motivated by this body of work, we have started the SAINT-DB project to study and develop a native RDF management system based on structural indexes. In this paper we present a principled framework for designing and using RDF structural indexes for practical fragments of SPARQL, based on recent formal structural characterizations of these fragments. We then explain how structural indexes can be incorporated in a typical query processing workflow; and discuss the design, implementation, and initial empirical evaluation of our approach.".
- 41 abstract "The Music Ontology provides a framework for publishing structured music-related data on the Web, ranging from editorial data to temporal annotations of audio signals. It has been used extensively, for example in the DBTune project and on the BBC Music website. Until now it hasn’t been systematically evaluated and compared to other frameworks for handling music-related data. In this article, we design a ‘query-driven’ ontology evalution framework capturing the intended use of this ontology. We aggregate a large set of real-world music-related user needs, and evaluate how much of it is expressible within our ontological framework. This gives us a quantitative measure of how well our ontology could support a system addressing these real-world user needs. We also provide some statistical insights for comparison with related description frameworks and identify areas within the ontology that could be improved.".
- 47 abstract "OWL 2 is widely used to describe complex objects such as chemical molecules, but cannot represent `structural' features of chemical entities such as having a ring. Adding rules, and description graphs (DGs) has been suggested as a possible solution, but still exhibits several drawbacks. We present a radically different approach that we call Description Graph Logic Programs. Syntactically, our approach combines DGs, rules, and OWL 2 RL axioms, but the semantics is via a translation into logic programs interpreted under stable model semantics. The result is an expressive OWL 2 RL-compatible formalism that is well suited for modelling objects with complex structure.".
- 48 abstract "Concept recommendation is a widely used technique aimed to assist users to chose the right tags, improve their Web search experience and a multitude of other tasks. In finding potential problem solvers in Open Innovation (OI) scenarios, the concept recommendation is of a crucial importance as it can help to discover the right topics, directly or laterally related to an innovation problem. Such topics then could be used to identify relevant experts. In this paper, we propose two Linked Data-based concept recommendation methods for topic discovery. The first one – called hyProximity - exploits only the particularities of Linked Data structures, while the other one applies a well-known Information Retrieval method – called Random Indexing - to the linked data. We compare the performance of the two methods against the baseline in the gold standard-based and user study-based evaluations, using the real problems and solutions from an OI company.".
- 49 abstract "Finding collaborators that can complement one’s competence in collaborative research and innovation projects has become important with the advent of multidisciplinary challenges and collaborative R&D in general. In this paper we propose a method for suggesting potential candidates for collaborative solving of innovation challenges online, based on their competence, similarity of interest and social proximity with the user receiving the recommendations. We rely on Linked Data to derive a measure of semantic relatedness that we use to enrich both user profiles and innovation problems with additional relevant topics, thereby improving the performance of co-solver recommendation. We evaluate this approach against state of the art methods for query enrichment based on the distribution of topics in user profiles, and demonstrate its usefulness in recommending collaborators that are both complementary in competence and compatible with the user. Our experiments are grounded using data from the social networking service Twitter.com.".
- 6 abstract "This paper describes an approach for the task of named entity recognition in structured data containing free text as the values of its elements. We studied the recognition of the entity types of person, location and organization in bibliographic data sets from a concrete wide digital library initiative. Our ap-proach is based on conditional random fields models, using features designed to perform named entity recognition in the absence of strong lexical evidence, and exploiting the semantic context given by the data structure. The evaluation results support that, with the specialized features, named entity recognition can be done in free text in structured data with an acceptable accuracy. Our approach was able to achieve a maximum precision of 0.91 at 0.55 recall and a maximum recall of 0.82 at 0.77 precision. The achieved results were always higher than those obtained with Stanford Named Entity Recognizer, which was developed for well-structured text. We believe this level of quality in named entity recognition allows the use of this approach to support a wide range of information extraction applications in structured data.".
- 60 abstract "Description Logics -- the logic underpinning the Web Ontology Language OWL -- and rules are currently the most prominent paradigms used for modeling knowledge for the Semantic Web. While both of these approaches are based on classical logic, the paradigms also differ significantly, so that naive combinations result in undesirable properties such as undecidability. Recent work has shown that many rules can in fact be expressed in OWL. In this paper we extend this work to include some types of rules previously excluded. We formally define a set of first order logic rules, C-Rules, which can be expressed within OWL extended with role conjunction. We also show that the use of nominal schemas results in even broader coverage.".
- 7 abstract "The Linked Open Data continues to grow rapidly, but a limitation of much of the data that is being published is the lack of a semantic description. While there are tools that help users to quickly convert a database into RDF, they do not provide a way to easily map the data into an existing ontology. This paper presents an approach that allows users to interactively map their structured sources into an existing ontology and then use that mapping to generate RDF triples. This approach automatically generates a mapping from the data source into the ontology, but since the precise mapping is sometimes ambiguous, we allow the user to interactively refine the mappings. We implemented this approach in a system called Karma, and demonstrate that the system can map sources into an ontology with minimal user interaction and efficiently generate the corresponding RDF.".
- 73 abstract "The automated extraction of information from text and its transformation into a formal description is an important goal of in both Semantic Web research and computational linguistics. The extracted information can be used for a variety of tasks such as ontology generation, question answering and information retrieval. LODifier is an approach that combines deep semantic analysis with named entity recognition, word-sense disambiguation and controlled Semantic Web vocabularies in order to extract named entities and relations between them from text and to convert them into an RDF representation which is linked to DBpedia and WordNet. We present the architecture of our tool and discuss design decisions made. Evaluations of the tool give clear evidence of its potential for tasks like information extraction and computing document similarity.".
- 78 abstract "Online communities are prime sources of information. The Web is rich with forums and Question Answering (Q&A) communities where people go to seek answers to all kinds of questions. Most systems employ manual answer-rating procedures to encourage people to provide quality answers and to help users locate the best answers in a given thread. However, in the datasets we collected from three online communities, we found that half their threads lacked best answer markings. This stresses the need for methods to assess the quality of available answers to: 1) provide automated ratings to fill in for, or support, manu- ally assigned ones, and; 2) to assist users when browsing such answers by filtering in potential best answers. In this paper, we collected data from three online communities and converted it to RDF based on the SIOC ontology. We then explored an approach for predicting best answers us- ing a combination of content, user, and thread features. We show how the influence of such features on predicting best answers differs across communities. Further we demonstrate how certain features unique to some of our community systems can boost predictability of best answers.".
- 84 abstract "We tackle the problem of high cost associated with generating Linked Government Data (LGD). Our approach is centered around the idea of "self-service LGD". The self-service approach is enabled through an end-to-end publishing pipeline that enables transforming raw government data into interlinked RDF. The cost reduction involved is achieved through: (i) shifting the burden of Linked Data conversion towards the data consumer (ii) integrating all the steps involved into a single workbench (iii) providing graphical user interfaces (iv) enabling result sharing with sufficient provenance information. We present the implementation of the publishing pipeline and describe its application to a local government catalogue in Ireland resulting in a significant amount of Linked Data published.".
- 85 abstract "As commonly accepted identifiers for data instances in semantic datasets (such as ISBN codes or DOI identifiers) are often not available, discovering links between overlapping datasets on the Web is generally realised through the use of fuzzy similarity measures. Configuring such measures, i.e. deciding which similarity function to apply to which data properties with which parameters, is often a non-trivial task that depends on the domain, ontological schemas, and formatting conventions in data. Existing solutions either rely on the user's knowledge of the data and the domain or on the use of machine learning to discover these parameters based on training data. In this paper, we present a novel approach to tackle the issue of data linking which relies on the unsupervised discovery of the required similarity parameters. Instead of using labeled training data, the method takes into account several desired properties which the distribution of output similarity values should satisfy. The method includes these features into a fitness criterion used in a genetic algorithm to establish similarity parameters that maximise the quality of the resulting linkset according to the considered properties. We show in experiments using benchmarks as well as real-world datasets that such an unsupervised method can reach the same levels of performance as manually engineered methods, and how the different parameters of the genetic algorithm and the fitness criterion affect the results for different datasets.".
- 89 abstract "The Web content no longer consists of only general text documents, but increasingly structure domain specific data published in the Linked Open Data (LOD) cloud. Data collections in this cloud are, by definition, from dif- ferent domains and indexed with domain specific ontologies and schemas. Such data representation requires retrieval methods that can operate on structured data and semantic feature spaces and remain effective even for small domain specific collections. Unlike previous research, that has concentrated on extending text search by using ontologies as a source for query expansion, we introduce a re- trieval framework based on the well known vector space model of information retrieval to fully support retrieval for Semantic Web data described in Resource Description Framework (RDF) language. We propose an indexing structure, a ranking method, and a way to incorporate reasoning and query expansion in the framework. We evaluate the approach in ad-hoc search using a cultural heritage data collection. Compared to a baseline, experimental results show up to 77% improvement when a combination of reasoning and query expansion is used.".
- 118 abstract "We introduce a technique to determine implicit information in an RDF graph. In addition to taxonomic knowledge about concepts and properties typically expressible in languages such as RDFS and OWL, we focus on knowledge determined by arithmetic equations. The main use case is exploiting knowledge about functional dependencies among numerical properties expressible by means of such arithmetic equations. While some of this knowledge is expressible for instance in rule extensions to ontology languages, we provide a more flexible framework that treats property equations as first class citizens in the ontology language. The combination of ontological reasoning and property equations is realized by extending query rewriting techniques already successfully applied for ontology languages such as (the DL-fragment of) RDFS, or also OWL QL, respectively. We deploy this technique for rewriting SPARQL queries and discuss the feasibility of alternative implementations, such as rule-based approaches.".
- 130 abstract "DBpedia is a project aiming to represent Wikipedia content in RDF triples. It plays a central role in the Semantic Web, due to the large and growing number of resources linked to it. Nowadays, the English version covers around 1.7M Wikipedia pages, although the English Wikipedia contains almost 4M pages, showing a clear problem of coverage. In other languages (like French and Spanish) the coverege is even lower. The objective of this paper is to define a methodology to increase the coverage of DBpedia in different languages. The major problems that we have to solve concern the high number of classes involved in the DBpedia ontology and the lack of coverage for some classes in certain languages. In order to deal with these problems, we first extend the population of the classes for the different languages by connecting the corresponding Wikipedia pages through cross-language links. Then, we train a supervised classifier using this extended set as training data. We evaluated our system using a manually annotated test set, demonstrating that our approach can add around 1.7M new entities to DBpedia with high precision (90%) and recall (50%). The resulting resource will be made available through a SPARQL endpoint and a downloadable package.".
- 156 abstract "For community managers and hosts it is not only important to identify the current key topics of a community but also to assess the specificity level of the community for: a) creating sub-communities, and: b) anticipating community behaviour and topical evolution. In this paper we present an approach that empirically characterises the topical specificity of online community forums by measuring the abstraction of semantic concepts discussed within such forums. We present a range of concept abstraction measures that function over concept graphs - i.e. resource type-hierarchies and SKOS category structures - and demonstrate the efficacy of our method with an empirical evaluation using a ground truth ranking of forums. Our results show that the proposed approach outperforms a random baseline and that resource type-hierarchies work well when predicting the topical specificity of any forum with various abstraction measures.".
- 16 abstract "In the Web, wiki-like platforms allow the users to provide arguments in favor or against issues proposed by the other users. The increasing content of these wiki pages as well as the high number of revisions of these pages through pros and cons arguments make it difficult for community managers to understand and manage these discussions. In this paper, we propose an automatic framework to support the management of argumentative discussions in wiki-like platforms. Our framework is composed by (i) a natural language module, which automatically detects the arguments in natural language returning the relations among them, and (ii) an argumentation module, which provides the overall view of the argumentative discussion under the form of a directed graph highlighting the accepted arguments. Experiments on the history of Wikipedia show the feasibility of our approach.".
- 180 abstract "In the last years, basic NLP tasks: NER, WSD, relation extraction, etc. have been configured for Semantic Web tasks including ontology learning, linked data population, entity resolution, NL querying to linked data, etc. Some assessment of the state of art of existing Knowledge Extraction (KE) tools when applied to the Semantic Web is then desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of extracted data from other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output.".
- 187 abstract "In this paper, we present an approach for extending the existing concept of nanopublications --- tiny entities of scientific results in RDF representation --- to broaden their application range. The proposed extension uses English sentences to represent informal and underspecified scientific claims. These sentences follow a syntactic and semantic scheme that we call AIDA, which provides a uniform and succinct representation of scientific assertions. Such AIDA nanopublications are compatible with the existing nanopublication concept and enjoy most of its advantages such as information sharing, interlinking of scientific findings, and detailed attribution, while being more flexible and applicable to a much wider range of scientific results. We show that users are able to create AIDA sentences for given scientific results quickly and at high quality, and that it is feasible to automatically extract and interlink AIDA nanopublications from existing unstructured data sources. To demonstrate our approach, a web-based interface is introduced, which also exemplifies the use of nanopublications for non-scientific content, including meta-nanopublications that describe other nanopublications.".
- 204 abstract "Statistics published as Linked Data promise efficient extraction, transformation and loading (ETL) into a database for decision support. The predominant way to implement analytical query capabilities in industry are specialised engines that translate OLAP queries to SQL queries on a relational database using a star schema (ROLAP). A more direct approach than ROLAP is to load Statistical Linked Data into an RDF store and to answer OLAP queries using SPARQL. However, we assume that general-purpose triple stores -- just as typical relational databases -- are no perfect fit for analytical workloads and need to be complemented by OLAP-to-SPARQL engines. To give an empirical argument for the need of such an engine, we first compare our generated SPARQL to ROLAP SQL queries in terms of performance. Second, we measure the performance gain of RDF aggregate views that, similar to aggregate tables in ROLAP, materialise part of the data cube.".
- 205 abstract "Linked data has experienced accelerated growth in recent years. With the continuing proliferation of structured data, demand for RDF compression is becoming increasingly important. In this study, we introduce a novel lossless compression technique for RDF datasets, called Rule Based Compression (RB Compression) that compresses datasets by generating a set of new logical rules from the dataset and removing triples that can be inferred from these rules. Unlike other compression techniques, our approach not only takes advantage of syntactic verbosity and data redundancy but also utilizes semantic associations present in the RDF graph. Depending on the nature of the dataset, our system is able to prune more than 50% of the original triples without affecting data integrity.".
- 206 abstract "We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. An RDF model supports interoperability of these steps, and also provides a flexible way of including existing NLP tools and further knowledge sources. From 2000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows the precision to be 89% and 90% for concepts and relations respectively; recall was 75% with respect to a manually generated taxonomy for the same domain.".
- 207 abstract "The Linked Open Data (LOD) cloud contains tremendous amounts of interlinked instances, from where we can retrieve abundant knowledge. However, because of the heterogeneous and big ontologies, it is time consuming to learn all the ontologies manually and it is difficult to observe which properties are important for describing instances of a specific class. In order to construct an ontology that can help users easily access various data sets, we propose a semi-automatic ontology integration framework that can reduce the heterogeneity of ontologies and retrieve frequently used core properties for each class. The framework consists of three main components: graph-based ontology integration, machine learning based ontology schema extraction, and an ontology merger. By analyzing instances of the linked data sets, this framework acquires ontological knowledge and construct a high-quality integrated ontology, which is easily understandable and effective in knowledge acquisition from various data sets using simple SPARQL queries.".
- 209 abstract "In this paper we present the design of the Dynamic Linked Data Observatory: a long-term experiment to monitor the two-hop neighbourhood of a core set of one hundred thousand diverse Linked Data documents on a weekly basis. We present the methodology used for sampling the URIs to monitor, retrieving the documents, and further crawling part of the two-hop neighbourhood. Having now run this experiment for six months, we analyse the dynamics of the monitored documents over the data collected thus far. We look at the estimated lifespan of the core documents, how often they go on-line or off-line, how often they change; we further investigating domain-level trends. Next we look at changes within the RDF content of the core documents across the weekly snapshops, examining the elements (i.e., triples, subjects, predicates, objects, classes) that are most frequently added or removed. Thereafter, we look at how the links between dereferenceable documents evolves over time in the two-hop neighbourhood.".
- 219 abstract "With the advent of publicly available geospatial data, ontology-based data access (OBDA) over spatial data has gained increasing interest. Spatio-relational DBMSs are used to implement geographic information systems (GIS) and are fit to manage large amounts of data and geographic objects such as points, lines, polygons, etc. In this paper, we extend the Description Logic DL-Lite with spatial objects and show how to answer spatial conjunctive queries (SCQs) over ontologies---that is, conjunctive queries with point-set topological relations such as next and within---expressed in this language. The goal of this extension is to enable an off-the-shelf use of spatio-relational DBMSs to answer SCQs using rewriting techniques, where data sources and geographic objects are stored in a database and spatial conjunctive queries are rewritten to SQL statements with spatial functions. Furthermore, we consider keyword-based querying over spatial OBDA data sources; as ordinary web users are not familiar with formal query languages and the less with SCQ, this is a reasonable and common way to express a search intention. We show how to map queries expressed as simple lists of keywords that describe objects of interest to SCQs, using a meta-model for completing the SCQs with spatial aspects. We have implemented our lightweight approach to spatial OBDA in a prototype and show initial experimental results using data sources such as Open Street Maps and Open Government Data Vienna from an associated project. We show that for real-world scenarios, practical queries are expressible under meta-model completion, and that query answering is computationally feasible.".
- 34 abstract "Publicly available Linked Data repositories provide a multitude of information. By utilizing SPARQL, Web sites and services can consume this data and present it in a user-friendly form, e.g., in mash-ups. To gather RDF triples for this task, machine agents typically issue similarly structured queries with recurring patterns against the SPARQL endpoint. These queries usually differ only in a small number of individual triple pattern parts, such as resource labels or literals in objects. We present an approach to detect such recurring patterns in queries and introduce the notion of query templates, which represent clusters of similar queries exhibiting these recurrences. We describe a matching algorithm to extract query templates and illustrate the benefits of prefetching data by utilizing these templates. Finally, we comment on the applicability of our approach using results from real-world SPARQL query logs.".
- 40 abstract "With the increased use of ontologies in semantically-enabled applications, the issues of debugging and aligning ontologies has become increasingly important. The quality of the results of such applications is directly dependent on the quality of the ontologies and mappings between the ontologies they employ. A key step towards achieving high quality ontologies and mappings is discovering and resolving modeling defects, such as wrong or missing relations and mappings. In this paper we present a unified framework for aligning taxonomies, the most used kind of ontologies, and debugging taxonomies and their alignments, where ontology alignment is treated as a special kind of debugging. Our framework supports the detection and repairing of missing and wrong is-a structure in taxonomies, as well as the detection and repairing of missing (alignment) and wrong mappings between ontologies. Further, we implemented a system based on this framework and demonstrate its benefits through an experiment with ontologies from the Ontology Alignment Evaluation Initiative.".
- 43 abstract "In this paper, we describe a method for predicting the understandability level of inferences with OWL. Specifically, we present a model for measuring the understandability of a multiple-step inference based on the measurement of the understandability of individual inference steps. We also present an evaluation study which confirms that our model works relatively well for two-step inferences with OWL. This model has been applied in our research on generating accessible explanations for an entailment of OWL ontologies, to determine the most understandable inference among alternatives, from which the final explanation is generated.".
- 45 abstract "Despite unified data models, such as Resource Description Framework (Rdf) on structural level and the corresponding query language SPARQL, the integration and usage of Linked Open Data faces major heterogeneity challenges on the semantic level. Incorrect use of ontology concepts and class properties impede the goal of machine readability and knowledge discovery. For example, users searching for movies with a certain artist cannot rely on a single given property "artist", because some movies may be connected to that "artist " by the predicate "starring". In addition, the information need of a data consumer may not always be clear and her interpretation of given schemata may differ from the intentions of the ontology engineer or data publisher. It is thus necessary to either support users during query formulation or to incorporate implicitly related facts through predicate expansion. To this end, we introduce a data-driven synonym discovery algorithm for predicate expansion. We applied our algorithm to various data sets as shown in a thorough evaluation of different strategies and rule-based techniques for this purpose.".
- 52 abstract "With a growing number of ontologies used on the semantic web, agents can fully make sense of different datasets only if correspondences between those ontologies are known. Ontology matching tools have been proposed to find such correspondences. While the current research focus is mainly on fully automatic matching tools, some approaches have been proposed that involve the user in the matching process. However, there are currently no benchmarks and test methods to compare such tools. In this paper, we introduce a number of quality measures for interactive ontology matching tools, and we discuss means to automatically run benchmark tests for such tools. To demonstrate those evaluations, we show examples on assessing the quality of interactive matching tools which involve the user in matcher selection and matcher parametrization.".
- 66 abstract "We describe a semantic wiki system with an underlying controlled natural language grammar implemented in Grammatical Framework. The grammar restricts the wiki users into a well-defined subset of Attempto Controlled English (ACE) making the wiki content automatically translatable into the Web Ontology Language (OWL) to enable automatic reasoning over the wiki content. Additionally, the grammar facilitates a precise bidirectional automatic translation between ACE and language fragments of a number of other natural languages, to provide a multilingual interface to the wiki. The developed wiki environment thus allows users to build, query and view OWL knowledge bases via a user-friendly multilingual natural language interface. The underlying multilingual grammar is integrated into the wiki itself and can be collaboratively edited to extend the vocabulary of the wiki and modify its multilingual interface. This work demonstrates the combination of the existing technologies of Attempto Controlled English and Grammatical Framework, and is implemented as an extension of the existing ACE-based semantic wiki engine AceWiki.".
- 68 abstract "Schema information about resources in the Linked Open Data (LOD) cloud can be provided in a twofold way: it can be explicitly defined by attaching RDF types to the resources. Or it is provided implicitly via the definition of the resources’ properties. In this paper, we analyse the information theoretic proper- ties and the correlation between the two manifestations of schema information. To this end, we have extracted schema information regarding the types and prop- erties defined in the datasets segments provided for the Billion Triples Challenge 2012. We have conducted an in depth analysis and have computed various entropy measures as well as the mutual information encoded in the two types of schema information. Our analysis provides insights into the information encoded in the different schema characteristics. Two major findings are that implicit schema in- formation is far more discriminative and that a schema based on either types or properties alone will only capture between 63.5% and 88.1% of the schema infor- mation contained in the data. Based on these observations, we derive conclusions about the design of future schemas for LOD as well as potential application sce- narios.".
- 78 abstract "With the ever-growing amount of RDF data available across the Web, the discovery of links across datasets and deduplication of resources within knowledge bases have become tasks of central importance. Over the last years, several link discovery approaches have been developed to tackle the runtime and complexity problems that are intrinsic to link discovery. Yet, so far, the management of hardware resources for the execution of link discovery tasks has been payed little attention to. This paper aims to address exactly this research gap by investigating the use of hardware resources for link discovery. We implement the HR3 approach within three different paradigms of parallel computing. Based on a comparison of the runtimes of three different implementations, we address the following question: Under which conditions should which hardware be used to link or deduplicate knowledge bases? Our results show that certain tasks that seem to be predestined to being carried out in the cloud can actually be ran using standard massively parallel hardware. Moreover, our evaluation provides break-even points that can serve as guidelines for deciding on when to use which hardware for link discovery.".
- 82 abstract "Semantic analysis and annotation of textual information with appropriate semantic entities is an essential task to enable semantic search on the annotated data. For video resources textual information is rare at first sight. But in recent years the development of technologies for automatic extraction of textual information from audio visual content has advanced. Additionally, video portals allow videos to be annotated with tags and comments by authors as well as users. All this information taken together forms video metadata which is manyfold in various ways. By making use of the characteristics of the different metadata types context can be created to enable sound semantic analysis and to support accuracy of understanding the video's content. This paper proposes a description model for semantic analysis on video metadata taking into account different contextual factors.".
- 93 abstract "Ontology classification is the reasoning service that computes all subsumption relationships inferred in an ontology between concept, role, and attribute names in the ontology signature. OWL 2 QL is a tractable profile of OWL 2 for which ontology classification is polynomial in the size of the ontology TBox. However, to date, no efficient methods and implementations specifically tailored to OWL 2 QL ontologies have been developed. In this paper, we provide a new algorithm for ontology classification in OWL 2 QL, which is based on the idea of encoding the ontology TBox into a directed graph and reducing core reasoning to computation of the transitive closure of the graph. We have implemented the algorithm in the Quonto reasoner and extensively evaluated it over very large ontologies. Our experiments show that Quonto outperforms various popular reasoners in classification of OWL 2 QL ontologies.".
- 12 abstract "Sesam is an archive system developed for Hafslund, a Norwegian energy company. It achieves the often-sought but rarely-achieved goal of automatically enriching metadata by using semantic technologies to extract and integrate business data from business applications. The extracted data is also indexed with a search engine together with the archived documents, allowing true enterprise search.".
- 158 abstract "Museums around the world have built databases with metadata about millions of objects, their history, the people who created them, and the entities they represent. This data is stored in proprietary databases and is not readily available for use. Recently, museums embraced the Semantic Web as a means to make this data available to the world, but the experience so far shows that publishing museum data to the Linked Data cloud is difficult: the databases are large and complex, the information is richly structured and varies from museum to museum, and it is difficult to link the data to other datasets. This paper describes the lessons learned in publishing the data from the Smithsonian American Art Museum (SAAM). We highlight complexities of the database-to-RDF mapping process, discuss our experience linking the SAAM dataset to hub datasets such as DBpedia, and present our experience in allowing SAAM personnel to review the information to verify that it meets the high standards of the Smithsonian. Using our tools, we helped SAAM publish 5-star Linked Data of their complete holdings (41,000 objects, 8,000 artists), richly linked to DBpedia and the Getty Union List of Artist Names (ULAN), and verified to be of high quality.".
- 59 abstract "======== Abstract ======== The BnF_ (French national library) sees Semantic Web technologies as an opportunity to weave its data into the Web and to bring structure and reliability to existing information. The BnF is one of the most important heritage institutions in France, with a history going back to the 14th century and millions of documents, including a large variety of hand-written, printed and digital material, through millions of bibliographic records. Linked Open Data tools have been implemented through `data.bnf.fr`_, a project which aims at making the BnF data more useful on the Web. `data.bnf.fr`_ publishes data automatically merged from different in-house databases describing authors, works and themes. These concepts are given persistent URIs, as they are the nodal points to our resources and services. We provide **different views of the same information**: HTML and PDF views for humans and raw data in RDF and JSON for machines. This data is freely reusable under an **Open Licence**. The site, powered by the **open source platform** CubicWeb_, queries a **relational database** to generate both HTML and RDF data. Available online since July 2011, this service is under continuous development with several releases per year. After having gathered feedback from the public and users, we are now in a position to report on this use of Semantic Web technologies. The article will describe the features available to the end user and explain what difficulties where faced when implementing them. Obstacles where of many kinds: structure of the source data, business processes, information technology, etc. We will explain the importance of useful links for provenance information, and of persistency for archival purposes. We will discuss the methodology and the solutions we found, trying to show their strengths and weaknesses. We will summarize the happy or critical comments that were made by our external users and project team, both about the workflow and about the data model. We will share our insight on the ontologies and vocabularies we are using, including our view of the interaction between rich RDF-based ontologies, and light HTML embedded data such as `schema.org`_. The broader question of Libraries on the Semantic Web will be addressed so as to help run similar projects. .. _BnF: http://bnf.fr .. _CubicWeb: http://www.cubicweb.org .. _`schema.org`: http://schema.org .. _`data.bnf.fr`: http://data.bnf.fr".
- 81 abstract "Evolving complex artifacts as multilingual ontologies is a difficult activity demanding for the involvement of different roles and for guidelines to drive and coordinate them. We present the methodology and the underlying tool that have been used in the context of the Organic.Lingua project for the collaborative evolution of the multilingual Organic Agriculture ontology. Findings gathered from a quantitative and a qualitative evaluation of the experience are reported, revealing the usefulness of the methodology used in synergy with the tool.".
- 87 abstract "To date, the automatic exchange of product information between business partners in a value chain is typically done using Business-to-Business (B2B) catalog standards such as EDIFACT, cXML, or BMEcat. At the same time, the Web of Data, in particular the GoodRelations vocabulary, offers the necessary means to publish highly-structured product data in a machine-readable format. The advantage of the publication of rich product descriptions can be manifold, including better integration and exchange of information between Web applications, high-quality data along the various stages of the value chain, or the opportunity to provide more precise and more effective searches. In this paper, we (1) stress the importance of rich product master data for e-commerce on the Semantic Web, and (2) present a tool to convert BMEcat XML data sources into an RDF/XML data model anchored in the GoodRelations vocabulary. The benefits of our proposal are tested using product data collected from a set of 2625 on-line retailers of disparate sizes and domains.".
- 16 abstract "In many applications one has to fetch and assemble pieces of information coming from more than one web sources such as SPARQL endpoints. In this paper we describe the corresponding requirements and challenges, and then we present a process and a tool that we have developed (called MatWare) for constructing such semantic warehouses. We focus on domain-specific warehouses, where the focus is given on the aspects of quality, value and freshness. Subsequently, we present MatWare (MATerialized WAreshouse), a tool that we have developed for automating the construction of such warehouses, for assessing their value and for automating their reconstruction. Finally we report our experiences from using it for building, maintaining and evolving an operational semantic warehouse for the marine domain, that is currently in use by several applications ranging from e-infrastructure services to smart phone applications.".
- 189 abstract "R2RML defines a language to express mappings from relational data to RDF. That way, applications built on top of the W3C Semantic Technology stack can seamlessly integrate relational data using those mappings. A major obstacle to using R2RML, though, is the effort for manually curating the mappings. In particular in scenarios that aim to map data from huge and complex relational schemata to more abstract ontologies efficient ways to support the mapping creation are needed. In previous work we presented a mapping editor that aims to reduce the effort that users need to invest for mapping creation. While assisting users in mapping construction the editor also imposed a fixed editing approach, which turned out to be not optimal for all users and all kinds of mapping tasks. Most prominently, it is unclear on which of the two data models users should best start with the mapping construction. In this paper, we present the results of a comprehensive user study that evaluates different alternative editing approaches for constructing R2RML mapping rules. In the user study we measure the efficiency and quality of the mapping construction to find out which approach works better for users with different background knowledge and for different types of tasks in order to extend our editor.".
- 193 abstract "Recommender systems are an important technology component for many e-commerce applications. In short, they are technical means that suggest potentially relevant products and services to the users of a Web site, typically a shop. The recommendations are computed in advance or during the actual visit and use various types of data as input, in particular past purchases and the purchasing behavior of other users with similar preferences. One major problem with recommender systems is that the quality of recommendations depends on the amount, quality, and representativeness of the in-formation about items already owned by the visitor, e.g. from past purchases at that particular shop. For first-time visitors and customers migrating from other merchants, the amount of available information is often too small to generate good recommendations. In other words, shopping history data for a single user is fragmented and spread over multiple sites, and cannot be actively exposed by the user himself to additional shops. This creates strong lock-in effects for consumers, because they cannot migrate from one merchant to another without losing access to high-quality recommendations, and it creates strong market entry barriers for new merchants, since they do not have access to the customers’ shopping history with other shops. In this paper, we propose to use Semantic Web technology, namely GoodRelations and schema.org, to empower e-commerce customers to (1) collect and manage ownership information about products, (2) advertise interest in sharing this information with shop sites in exchange for better recommendations or oth-er incentives, and (3) expose the information to such shop sites directly from their browser. We then sketch how a shop site could use the ownership infor-mation to propose relevant products.".
- 205 abstract "Predictive reasoning, or the problem of estimating future observations given some historical information, is an important inference task for obtaining insight on cities and supporting efficient urban planning. This paper, focusing on transportation, presents how severity of road traffic congestion can be predicted using semanticWeb technologies. In particular we present a system which integrates numerous sensors (exposing heterogenous, exogenous and raw data streams such as weather information, road works, city events or incidents) to improve accuracy and consistency of traffic congestion prediction. Our prototype of semantics-aware prediction, being used and experimented currently by traffic controllers in Dublin City Ireland, works efficiently with real, live and heterogeneous stream data. The experiments have shown accurate and consistent prediction of road traffic conditions, main benefits of the semantic encoding.".
- 224 abstract "Analysts spend a disproportionate amount of time with financial data curation before they are able to compare company performances in an analysis. The Extensible Business Reporting Language (XBRL) for annotating financial facts is suited for automatic processing to increase information quality in financial analytics. Still, XBRL does not solve the problem of data integration as required for a holistic view on companies. Semantic Web technologies promise benefits for financial data integration, yet, existing literature lacks concrete case studies. In this paper, we present the Financial Information Observation System (FIOS) that uses Linked Data and multidimensional modelling based on the RDF Data Cube Vocabulary for accessing and representing relevant financial data. FIOS fulfils the information seeking mantra of ``overview first, zoom and filter, then details on demand'', integrates yearly and quarterly balance sheets, daily NASDAQ stock quotes as well as company and industry background information and helps analysts creating their own analyses with Excel-like functionality.".
- 244 abstract "The classification of products and services enables reliable and efficient electronic exchanges of product data across organizations. Many companies classify products (a) according to generic or industry-specific product classification standards, or (b) by using proprietary category systems. Such classification systems often contain thousands of product classes that are updated at regular intervals. This implies a large quantity of useful information which the e-commerce Web of Data could immediately benefit from. Thus, instead of building up product ontologies from scratch, which is costly, tedious, error-prone, and high-maintenance, it is generally easier to derive them from existing classifications. In this paper, we (1) describe a generic, semi-automated method for deriving OWL ontologies from product classification standards and proprietary category systems. Moreover, we (2) show that our approach generates logically and semantically correct vocabularies, and (3) present the practical benefit of our approach. The resulting product ontologies are compatible with the GoodRelations vocabulary for e-commerce and with schema.org and can be used to enrich product and offer descriptions on the Semantic Web, allowing for better product searches.".
- 31 abstract "Most libraries use the machine-readable cataloguing (MARC) format to encode and exchange metadata about the items they make available to their patrons. Traditional library systems have not published this data on the Semantic Web. However, some agile open source library systems have begun closing this gap by publishing structured data that uses the schema.org vocabulary to describe the bibliographic data, make offers for items available for loan, and link the items to their owning libraries. This article distills the lessons learned from implementing structured data in Evergreen, Koha, and VuFind; highlights emerging design patterns for publishing structured data in other library systems; and traces the influence these implementation experiences have had on the evolution of the schema.org vocabulary. Finally, we discuss the impact that "the power of the default" publishing of structured data could have on discoverability of library offerings on the Semantic Web.".
- 36 abstract "The Web democratized publishing -- everybody can easily publish information on a Website, Blog, in social networks or microblogging systems. The more the amount of published information grows, the more important are technologies for accessing, analysing, summarising and visualising information. While substantial progress has been made in the last years in each of these areas individually, we argue, that only the intelligent combination of approaches will make this progress truly useful and leverage further synergies between techniques. In this paper we develop a text analytics architecture of participation, which allows ordinary people to use sophisticated NLP techniques for analysing and visualizing their content, be it a Blog, Twitter feed, Website or article collection. The architecture comprises interfaces for information access, natural language processing and visualization. Different exchangeable components can be plugged into this architecture, making it easy to tailor for individual needs. We evaluate the usefulness of our approach by comparing both the effectiveness and efficiency of end users within a task-solving setting. Moreover, we evaluate the usability of our approach using a questionnaire-driven approach. Both evaluations suggest that ordinary Web users are empowered to analyse their data and perform tasks, which were previously out of reach.".
- 47 abstract "Knowledge Discovery in Databases (KDD) has evolved significantly over the past years and reached a mature stage offering plenty of operators to solve complex data analysis tasks. User support for building data analysis workflows, however, has not progressed sufficiently: the large number of operators currently available in KDD systems and interactions between these operators complicates successful data analysis. To help Data Miners we enhanced one of the most used open source data mining tools---RapidMiner---with semantic technologies. Specifically, we first annotated all elements involved in the Data Mining (DM) process---the data, the operators, models, data mining tasks, and KDD workflows---semantically using our eProPlan modelling tool that allows to describe operators and build a task/method decomposition grammar to specify the desired workflows embedded in an ontology. Second, we enhanced RapidMiner to employ these semantic annotations to actively support data analysts. Third, we built an Intelligent Discovery Assistant, eIDA, that leverages the semantic annotation as well as HTN planning to automatically support KDD process generation. We found that the use of Semantic Web approaches and technologies in the KDD domain helped us to lower the barrier to data analysis. We also found that using a generic ontology editor overwhelmed KDD-centric users. We, therefore, provided them with problem-centric extensions to protege. Last and most surprising, we found that our semantic modeling of the KDD domain served as a rapid prototyping approach for several hard-coded improvements of RapidMiner, namely correctness checking of workflows and quick-fixes, reinforcing the finding that even a little semantic modeling can go a long way in improving the understanding of a domain even for domain experts.".
- 106 abstract "To make digital resources on the web verifiable, immutable, and permanent, we propose a technique to include cryptographic hash values in URIs. We call them trusty URIs and we show how they can be used for approaches like nanopublications to make not only specific resources but their entire reference trees verifiable. Digital artifacts can be identified not only on the byte level but on more abstract levels such as RDF graphs, which means that resources keep their hash values even when presented in a different format. Our approach sticks to the core principles of the web, namely openness and decentralized architecture, is fully compatible with existing standards and protocols, and can therefore be used right away. Evaluation of our reference implementations shows that these desired properties are indeed accomplished by our approach, and that it remains practical even for very large files.".
- 108 abstract "Entity coreference is important to Linked Data integration. User involvement is considered as a valuable source of human knowledge that helps identify coreferent entities. However, the quality of user involvement is not always satisfying, which significantly diminishes the coreference accuracy. In this paper, we propose a new approach called coCoref, which leverages distributed human computation and consensus partition for entity coreference. Consensus partition is used to aggregate all distributed user-judged coreference results and resolve their disagreements. To alleviate user involvement, ensemble learning is performed on the consensus partition to automatically identify coreferent entities that users have not judged. We integrate coCoref into an online Linked Data browsing system, so that users can participate in entity coreference with their daily Web activities. Our empirical evaluation shows that coCoref largely improves the accuracy of user-judged coreference results, and reduces user involvement by automatically identifying a large number of coreferent entities.".
- 118 abstract "Processing streams rather than static files of Linked Data has gained increasing importance in the web of data. When processing data streams, system builders are faced with the conundrum of guaranteeing a constant maximum response time with limited resources and, possibly, no prior information on the data arrival frequency. One approach to address this issue is to delete data from a cache during processing – a process we call eviction. The goal of this paper is to show that data-driven eviction outperforms today’s dominant data-agnostic approaches such as first-in-first-out or random deletion. Specifically, we first introduce a method called Clock that evicts data from a join cache based on the likelihood estimate of contributing to a join in the future. Second, using the well-established SR-Bench benchmark as well as a data set from the IPTV domain, we show that Clock outperforms data-agnostic approaches, indicating its usefulness for resource-limited Linked Data stream processing.".
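To make the contrast between data-driven and data-agnostic eviction in result 118 concrete, here is a hedged sketch of a bounded join cache that evicts the entry with the lowest estimated likelihood of contributing to a future join. The scoring function is a stand-in supplied by the caller, not the paper's Clock estimator.

```python
from typing import Any, Callable, Dict, Hashable

class LikelihoodCache:
    """Bounded cache that evicts the entry least likely to join in the future."""

    def __init__(self, capacity: int, score: Callable[[Hashable, Any], float]):
        self.capacity = capacity
        self.score = score          # likelihood estimate; higher means keep longer
        self.items: Dict[Hashable, Any] = {}

    def put(self, key: Hashable, value: Any) -> None:
        if key not in self.items and len(self.items) >= self.capacity:
            # Data-driven eviction: drop the lowest-scored entry
            # instead of using FIFO or random deletion.
            victim = min(self.items, key=lambda k: self.score(k, self.items[k]))
            del self.items[victim]
        self.items[key] = value

    def get(self, key: Hashable, default=None):
        return self.items.get(key, default)

if __name__ == "__main__":
    # Toy score: pretend frequently observed subjects are more likely to join again.
    freq = {"s1": 10, "s2": 1, "s3": 5}
    cache = LikelihoodCache(2, lambda k, v: freq.get(k, 0))
    for s in ("s1", "s2", "s3"):
        cache.put(s, {"subject": s})
    print(sorted(cache.items))   # 's2' (lowest score) was evicted -> ['s1', 's3']
```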
- 124 abstract "Description Logics have been extensively studied from the viewpoint of decidability and computational tractability. Less attention has been given to their usability and the cognitive difficulties they present, in particular for those who are not specialists in logic. This paper reports on a study into the difficulties associated with the most commonly used Description Logic features. Psychological theories are used to take account of these. Whilst most of the features presented no difficulty to participants, the comprehension of some was affected by commonly occurring misconceptions. The paper proposes explanations and remedies for some of these difficulties. In addition, the time to confirm stated inferences was found to depend both on the maximum complexity of the relations involved and the number of steps in the argument.".
- 127 abstract "We present Dedalo, a framework which is able to exploit Linked Data to generate explanations for clusters. In general, any result of a Knowledge Discovery process, including clusters, is interpreted by human experts who use their background knowledge to explain them. However, for someone without such expert knowledge, those results may be difficult to understand. Obtaining a complete and satisfactory explanation becomes a laborious and time-consuming process, involving expertise in possibly different domains. Having said so, not only does the Web of Data contain vast amounts of such background knowledge, but it also natively connects those domains. While the efforts put in the interpretation process can be reduced with the support of Linked Data, how to automatically access the right piece of knowledge in such a big space remains an issue. Dedalo is a framework that dynamically traverses Linked Data to find commonalities that form explanations for items of a cluster. We have developed different strategies (or heuristics) to guide this traversal, reducing the time to get the best explanation. In our experiments, we compare those strategies and demonstrate that Dedalo finds relevant and sophisticated Linked Data explanations from different areas.".
- 133 abstract "Non-expert users need support to access linked data available on the Web. To this aim, keyword-based search is considered an essential feature of database systems. The distributed nature of the Semantic Web demands query processing techniques to evolve towards a scenario where data is scattered on distributed data stores. Existing approaches to keyword search cannot guarantee scalability in a distributed environment because, at runtime, they are unaware of the location of the data relevant to the query and thus cannot optimize join tasks. In this paper, we illustrate a novel distributed approach to keyword search over RDF data that exploits the MapReduce paradigm by switching the problem from graph-parallel to data-parallel processing. Moreover, our framework is able to consider ranking during the building phase to directly return the best (top-k) answers among the first (k) generated results, greatly reducing the overall computational load and complexity. Finally, a comprehensive evaluation demonstrates that our approach exhibits very good efficiency while guaranteeing a high level of accuracy, especially with respect to state-of-the-art competitors.".
- 134 abstract "In the past years, open-domain Information Extraction (IE) systems like Nell and ReVerb have achieved impressive results by harvesting massive amounts of machine-readable knowledge with minimal supervision. However, the knowledge bases they produce still lack a clean, explicit semantic data model. This, on the other hand, could be provided by full-fledged semantic networks like DBpedia or Yago, which, in turn, would benefit from the additional coverage provided by Web-scale IE. In this paper, we bring these two strains of research together, and present a method to align terms from Nell with instances in DBpedia. Our approach is unsupervised in nature and relies on two key components. First, we automatically acquire probabilistic type information for Nell terms given a set of matching hypotheses. Second, we view the mapping task as the statistical inference problem of finding the most likely coherent mapping – i.e., the maximum a posteriori (MAP) mapping – based on the outcome of the first component used as soft constraint. These two steps are highly intertwined: accordingly, we propose an approach that iteratively refines type acquisition based on the output of the mapping generator, and vice versa. Experimental results on gold-standard data indicate that our approach outperforms a strong baseline, and is able to produce ever-improving mappings consistently across iterations.".
- 139 abstract "Linked Data consumers may need explanations for debugging or for understanding the reasoning behind producing the data. They may also need to transform long explanations into shorter, more understandable ones. In this paper, we discuss an approach to explain reasoning over Linked Data. We introduce a vocabulary to describe explanation-related metadata and we discuss how publishing these metadata as Linked Data enables explaining reasoning over Linked Data. Finally, we present an approach to summarize these explanations taking into account user-specified explanation filtering criteria.".
- 140 abstract "Cross-Language Information Retrieval (CLIR) systems extend classic information retrieval mechanisms for allowing users to query across languages, i.e., to retrieve documents written in languages different from the language used for query formulation. In this paper, we present a CLIR system exploiting multilingual ontologies for enriching documents representation with multilingual semantic information during the indexing phase and for mapping query fragments to concepts during the retrieval phase. This system has been applied on a domain-specific document collection and the contribution of the ontologies to the CLIR system has been evaluated in conjunction with the use of both Microsoft Bing and Google Translate translation services. Results demonstrate that the use of domain-specific resources leads to a significant improvement of CLIR system performance.".
- 144 abstract "In many applications (like social or sensor networks) the information generated can be represented as a continuous stream of RDF items, where each item describes an application event (social network post, sensor measurement, etc). In this paper we focus on compressing RDF streams. In particular, we propose an approach for lossless RDF stream compression, named RDSZ (RDF Differential Stream compressor based on Zlib). This approach takes advantage of the structural similarities among items in a stream by combining a differential item encoding mechanism with the general purpose stream compressor Zlib. Empirical evaluation using several RDF stream datasets shows that this combination produces gains in compression ratios with respect to using Zlib alone.".
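Result 144 combines differential encoding of structurally similar stream items with a general-purpose compressor. The sketch below is a simplified stand-in for that idea, not the RDSZ implementation: each triple is delta-encoded against the previous one (repeated components replaced by a marker) and pushed through a single zlib stream, whose dictionary is shared across the whole stream via sync flushes.

```python
import zlib

def delta_encode(prev, triple):
    """Replace components repeated from the previous item with a '*' marker."""
    if prev is None:
        return triple
    return tuple("*" if cur == old else cur for cur, old in zip(triple, prev))

def compress_stream(triples):
    """Yield zlib-compressed chunks, one per (delta-encoded) stream item."""
    comp = zlib.compressobj()
    prev = None
    for triple in triples:
        line = " ".join(delta_encode(prev, triple)) + "\n"
        # Z_SYNC_FLUSH emits a chunk per item while keeping the zlib
        # dictionary shared across the whole stream.
        yield comp.compress(line.encode()) + comp.flush(zlib.Z_SYNC_FLUSH)
        prev = triple

if __name__ == "__main__":
    stream = [
        ("ex:sensor1", "ex:temperature", '"21.0"'),
        ("ex:sensor1", "ex:temperature", '"21.3"'),
        ("ex:sensor1", "ex:humidity", '"0.54"'),
    ]
    chunks = list(compress_stream(stream))
    raw = zlib.decompressobj().decompress(b"".join(chunks))
    print(raw.decode())   # repeated subject/predicate fields appear as '*'
```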
- 145 abstract "Ontology authoring is a non-trivial task for novice authors who are not proficient in logics. It is difficult either to specify the requirements for an ontology or to test whether they are satisfied. In this paper, we propose a novel approach to addressing this problem by leveraging the ideas of competency questions and test-driven software development. We first analyse real-world competency questions collected from two different domains. Analysis shows that many of them can be categorised into several frequent patterns that differ along a set of features. Then we employ the notion of presupposition from linguistics to describe the ontology requirements implied by competency questions, and show that these presuppositions can be tested automatically.".
- 146 abstract "In previous work it has been shown how an OWL 2 DL ontology O can be ‘repaired’ for an OWL 2 RL system ans—that is, how we can compute a set of axioms R that is independent from the data and such that ans, which is generally incomplete for O, becomes complete for all SPARQL queries when used with O ∪ R. However, the initial implementation and experiments were very preliminary and hence it is currently unclear whether the approach can be applied to large and complex ontologies. Moreover, the approach so far can only support instance queries. In the current paper we thoroughly investigate repairing as an approach to scalable (and complete) ontology-based data access. First, we present several non-trivial optimisations to the first prototype. Second, we show how (arbitrary) conjunctive queries can be supported by integrating well-known query rewriting techniques with OWL 2 RL systems via repairing. Third, we perform an extensive experimental evaluation obtaining encouraging results. In more detail, our results show that we can compute repairs even for very large real-world ontologies in a reasonable amount of time, that the performance overhead introduced by repairing is negligible in small to medium sized ontologies and noticeable but manageable in large and complex ones, and that the hybrid reasoning approach can very efficiently compute the correct answers for real-world challenging scenarios.".
- 154 abstract "Communities of academic authors are usually identified by means of standard community detection algorithms, which exploit ‘static’ relations, such as co-authorship or citation networks. In contrast with these approaches, here we focus on diachronic topic-based communities, i.e., communities of people who appear to work on semantically related topics at the same time. These communities are interesting because their analysis allows us to make sense of the dynamics of the research world, e.g., migration of researchers from one topic to another, new communities being spawned by older ones, communities splitting, merging, ceasing to exist, etc. To this purpose, we are interested in developing clustering methods that are able to correctly handle the dynamic aspects of topic-based community formation, prioritizing the relationship between researchers who appear to follow the same research trajectories. We thus present a novel approach called Temporal Semantic Topic-Based Clustering (TST), which exploits a novel metric for clustering researchers according to their research trajectories, defined as distributions of semantic topics over time. The approach has been evaluated through an empirical study involving 25 experts from the Semantic Web and Human-Computer Interaction areas. The evaluation shows that TST exhibits a performance comparable to the one achieved by human experts.".
- 168 abstract "In recent years, the Web has seen an increasing interest in legal issues concerning the use and re-use of online published material. In particular, several open issues affect the terms and conditions under which the data published on the Web is released to the users, and the users’ rights over such data. Though the amount of licensed material on the Web is increasing considerably, the problem of generating machine-readable license information is still unsolved. In this paper, we propose to adopt Natural Language Processing techniques to extract in an automated way the rights and conditions granted by a license, and we return the license in a machine-readable format using RDF and adopting two well-known vocabularies to model licenses. Experiments over a set of widely adopted licenses show the feasibility of the proposed approach.".
- 175 abstract "Networks of citations are a key tool for referencing, disseminating and evaluating research results. The task of characterising the functional role of citations in scientific literature is very difficult, not only for software agents but for humans, too. The main problem is that the mental models of different annotators hardly ever converge to a single shared opinion. The goal of this paper is to investigate how an existing reference model for classifying citations, namely CiTO (Citation Typing Ontology), is interpreted and used by annotators of scientific literature. We present some experiments capturing the cognitive processes behind users' decisions in annotating papers with CITO, and provide initial ideas to refine future releases of CiTO and to simulate readers' behaviour within CiTaLO, a tool for automatic classification of citations.".
- 192 abstract "In this paper we introduce Spartiqulation, a system that translates SPARQL queries into English text. Our aim is to allow casual end users of semantic applications with limited to no expertise in the SPARQL query language to interact with these applications in a more intuitive way. The verbalization approach exploits domain-independent template-based natural language generation techniques, as well as linguistic cues in labels and URIs.".
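A hedged, toy illustration of the template-based verbalization idea in result 192: triple patterns are rendered with a domain-independent template and labels are derived from URI local names (splitting camelCase, as the abstract's "linguistic cues in labels and URIs" suggests). This is not the Spartiqulation pipeline, and the template and helper names are assumptions.

```python
import re

def label(term: str) -> str:
    """Derive a readable label from a variable, literal, or URI local name."""
    if term.startswith("?"):
        return "something" if term == "?x" else "some " + term.lstrip("?")
    if term.startswith('"'):
        return term.strip('"')
    local = term.strip("<>").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    # Split camelCase and underscores into separate words.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", local).replace("_", " ").lower()

def verbalize(pattern) -> str:
    """Render one (subject, predicate, object) pattern with a simple template."""
    s, p, o = (label(t) for t in pattern)
    return f"{s} has {p} {o}"

if __name__ == "__main__":
    bgp = [
        ("?x", "<http://xmlns.com/foaf/0.1/name>", '"Alice"'),
        ("?x", "<http://example.org/worksFor>", "<http://example.org/ACME>"),
    ]
    print("Find everything such that "
          + " and ".join(verbalize(tp) for tp in bgp) + ".")
```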
- 20 abstract "Due to the distributed nature of Linked Data, many resources are referred to by more than one URI. This phenomenon, known as co-reference, increases the probability of leaving out implicit semantically related results when querying Linked Data. The probability of co-reference increases further when considering distributed SPARQL queries over a larger set of distributed datasets. Addressing co-reference in Linked Data queries, on one hand, increases the complexity of query processing. On the other hand, it requires changes in how statistics of datasets are taken into consideration. We investigate these two challenges of addressing co-reference in distributed SPARQL queries, and propose two methods to improve query efficiency: 1) a model named Virtual Graph, that transforms a query with co-reference into a normal query with pre-existing bindings; 2) an algorithm named Ψ, that intensively exploits parallelism, and dynamically optimises queries using runtime statistics. We deploy both methods in a distributed engine called LHD-d. To evaluate LHD-d, we investigate the distribution of co-reference in the real world, based on which we simulate an experimental RDF network. In this environment we demonstrate the advantages of LHD-d for distributed SPARQL queries in environments with co-reference.".
- 202 abstract "Linked Data comprises an unprecedented volume of structured data on the Web and is adopted by an increasing number of domains. However, the varying quality of published data forms a barrier to further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology for Linked Data quality assessment, which is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting and describe how the method is applied in the Natural Language Processing (NLP) area. NLP is, compared to other domains such as biology, a late Linked Data adopter. However, it has seen a steep rise of activity in the creation of data and ontologies. Data quality assessment has thus become an important need for NLP datasets. In our study, we analysed 11 datasets using the Lemon and NIF vocabularies in 277 test cases and point out common quality issues.".
- 210 abstract "Information on the temporal interval of validity for facts described by RDF triples plays an important role in a large number of applications. Yet, most of the knowledge bases available on the Web of Data do not provide such information in an explicit manner. In this paper, we present a generic approach which addresses this drawback by inserting temporal information into knowledge bases. Our approach combines two types of information to associate RDF triples with time intervals. First, it relies on temporal information gathered from the document Web by an extension of the fact validation framework DeFacto. Second, it harnesses the time information contained in knowledge bases. This knowledge is combined within a three-step approach which comprises the steps matching, selection and merging. We evaluate our approach against a corpus of facts gathered from Yago2 by using DBpedia and Freebase as input and different parameter settings for the underlying algorithms. Our results suggest that we can detect temporal information for facts from DBpedia with an F-measure of up to 70%.".
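Result 210 combines evidence from the document Web and from knowledge bases in a matching, selection and merging pipeline. The sketch below illustrates only the last two steps under a simplifying assumption: candidate validity intervals for the same fact are merged when they overlap, and the merged interval with the most accumulated evidence weight is selected. The function names and the weighting scheme are illustrative.

```python
def merge_intervals(candidates):
    """Merge overlapping (start, end, weight) candidates; weights accumulate."""
    merged = []
    for start, end, weight in sorted(candidates):
        if merged and start <= merged[-1][1]:            # overlaps previous interval
            prev_start, prev_end, prev_weight = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_weight + weight)
        else:
            merged.append((start, end, weight))
    return merged

def select_interval(candidates):
    """Pick the merged interval supported by the most (weighted) evidence."""
    return max(merge_intervals(candidates), key=lambda iv: iv[2])

if __name__ == "__main__":
    # Candidate validity intervals (start year, end year, evidence weight) for one
    # fact, e.g. gathered from documents and from the knowledge base itself.
    candidates = [(1998, 2003, 0.6), (2001, 2005, 0.8), (2010, 2012, 0.3)]
    print(select_interval(candidates))   # (1998, 2005, 1.4)
```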
- 211 abstract "A variety of query approaches have been proposed by the semantic web community to explore and query semantic data. Each was developed for a specific task and employed its own interaction mechanism; each query mechanism has its own set of advantages and drawbacks. Most semantic web search systems employ only one approach, thus being unable to exploit the benefits of alternative approaches. Motivated by a usability and interactivity perspective, we propose to combine two query approaches (graph-based and natural language) as a hybrid query approach. In this paper, we present NL-Graphs which aims to exploit the strengths of both approaches, while ameliorating their weaknesses. NL-Graphs was conceptualised and developed from observations, and lessons learned, in several evaluations with expert and casual users. The results of evaluating our approach with expert and casual users on a large semantic dataset are very encouraging; both types of users were highly satisfied and could effortlessly use the hybrid approach to formulate and answer queries. Indeed, success rates showed they were able to successfully answer all the evaluation questions.".
- 227 abstract "Twitter, due to its massive growth as a social networking platform, has been in focus for analyzing its user-generated content for personalization and recommendation tasks. A common challenge across these tasks is identifying user interests from tweets. Lately, semantic enrichment of Twitter posts to determine (entity-based) user interests has been an active area of research. The advantages of these approaches include interoperability, information reuse and the availability of knowledge bases to be exploited. However, exploiting these knowledge bases for identifying user interests still remains a challenge. In this work, we focus on exploiting hierarchical relationships present in knowledge bases to infer richer user interests expressed as a Hierarchical Interest Graph. We argue that the hierarchical semantics of concepts can enhance existing systems to personalize or recommend items based on a varied level of conceptual abstractness. We demonstrate the effectiveness of our approach through a user study which shows an average of approximately eight of the top ten weighted hierarchical interests in the graph being relevant to a given user’s interests.".
- 26 abstract "A primary challenge to Web data integration is coreference resolution, namely identifying entity descriptions from different data sources that refer to the same real-world entity. Increasingly, solutions to coreference resolution have humans in the loop. For instance, many active learning, crowdsourcing, and pay-as-you-go approaches solicit user feedback for verifying candidate coreferent entities computed by automatic methods. Whereas reducing the number of verification tasks is a major consideration for these approaches, very little attention has been paid to the efficiency of performing each single verification task. To address this issue, in this paper, instead of showing the entire descriptions of two entities for verification which are possibly lengthy, we propose to extract and present a compact summary of them, and expect that such length-limited comparative entity summaries can help human users verify more efficiently without significantly hurting the accuracy of their verification. Our approach exploits the common and different features of two entities that best help indicate (non-)coreference, and also considers the diverse information on their identities. Experimental results show that verification is 2.7--2.9 times faster when using our comparative entity summaries, and its accuracy is not notably affected.".
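A simplified sketch of the comparative-summary idea in result 26: given two entity descriptions as property–value maps, keep a small, length-limited selection of shared and conflicting features so a human can verify (non-)coreference quickly. The ranking used here (prefer shorter values) is a placeholder, not the paper's feature-selection model.

```python
def comparative_summary(desc_a, desc_b, limit=3):
    """Return up to `limit` common and `limit` conflicting (property, value) pairs."""
    common, conflicting = [], []
    for prop in sorted(set(desc_a) & set(desc_b)):
        if desc_a[prop] == desc_b[prop]:
            common.append((prop, desc_a[prop]))
        else:
            conflicting.append((prop, desc_a[prop], desc_b[prop]))
    # Placeholder ranking: prefer shorter values, which are faster to compare.
    common.sort(key=lambda pv: len(str(pv[1])))
    conflicting.sort(key=lambda pvv: len(str(pvv[1])) + len(str(pvv[2])))
    return common[:limit], conflicting[:limit]

if __name__ == "__main__":
    a = {"name": "J. Smith", "birthYear": "1970", "affiliation": "Univ. of X"}
    b = {"name": "J. Smith", "birthYear": "1972", "homepage": "http://example.org/js"}
    shared, differing = comparative_summary(a, b)
    print("shared:   ", shared)      # [('name', 'J. Smith')]
    print("differing:", differing)   # [('birthYear', '1970', '1972')]
```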
- 33 abstract "An increasing number of services and marketplaces are flourishing on the Internet, from human services advertised in Google Helpouts to cloud services available on the Amazon AWS marketplace. All services are, however, solely described in unstructured Web pages, which are suitable for manual browsing but are difficult to exploit for creating advanced software applications which can foster service trading by supporting automated search, selection, service level negotiation, and contracting. The Unified Service Description Language (USDL) aimed at formalising service descriptions to take the current Internet to a Web of services. Nonetheless, its rigid technological foundations, its high complexity, and the reduced extensibility of the model limited its adoption. Informed by past experience with USDL, we present in this paper Linked USDL, the next evolution of USDL, which adopts and exploits Linked Data to be scalable for the Web, to promote and simplify its adoption by reusing vocabularies and datasets, and to benefit from a high level of genericity and adaptability for domain-specific modelling.".
- 35 abstract "Generators for synthetic RDF datasets are very important for testing and benchmarking various semantic data management tasks (e.g. querying, storage, update, compare, integrate). However, the current generators, like the Univ-Bench Artificial data generator (UBA), do not support sufficiently (or totally ignore) blank node connectivity issues. Blank nodes are used for various purposes (e.g. for describing complex attributes), and a significant percentage of resources is currently represented with blank nodes. Moreover, several semantic data management tasks, like isomorphism checking (useful for checking equivalence), and blank node matching (useful in comparison/versioning/synchronization, and in semantic similarity functions), not only have to deal with blank nodes, but their complexity and optimality depends on the connectivity of blank nodes. To enable the comparative evaluation of the various techniques for carrying out these tasks, in this paper we present the design and implementation of a generator, called BGEN, which allows building data sets containing blank nodes with the desired complexity, controllable through various features (morphology, size, diameter, clustering coefficient). Finally, the paper reports experimental results concerning the efficiency of the generator, as well as results from using the generated datasets, that demonstrate the value of the generator.".
- 37 abstract "The performance of classification models relies heavily on the quality of training data. However, label imperfection is an inherent fault of training data, which cannot be handled manually in a big data environment. Various methods have been proposed to remove label noise in order to improve classification quality, with the side effect of reducing the amount of data. In this paper, we propose a knowledge-based approach for tackling mislabeled multi-class big data, in which a knowledge graph technique is combined with other data correction methods to perceive and correct the erroneous labels in big data. Experiments on a medical Q&A social data set show that our knowledge-graph-based approach can effectively improve data quality and classification accuracy. Furthermore, this approach can be applied in other data mining tasks requiring deep understanding.".
- 38 abstract "The materialization of implicitly given facts that can be derived from a dataset using semantic technologies is an important task performed by a reasoner. With respect to the answering time for queries and the growing amount of available data, scalable solutions that are able to process large datasets are needed. In previous work we described a rule-based reasoner implementation that uses massively parallel hardware to derive new facts based on a given set of rules. This implementation was limited by the size of the processable input data as well as by the number of parallel hardware devices used. In this paper we introduce further concepts for workload partitioning and distribution to overcome these limitations. Based on the introduced concepts, additional levels of parallelization can be proposed that benefit from the use of multiple parallel devices. Furthermore, we introduce a concept to reduce the number of invalid triple derivations such as duplicates. We evaluate our concepts by applying different rulesets to the real-world DBpedia dataset as well as to the synthetic Lehigh University Benchmark ontology (LUBM) with up to 1.1 billion triples. The evaluation shows that our implementation scales in a linear way and outperforms current state-of-the-art reasoners with respect to the throughput achieved on a single computing node.".
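Result 38 is about rule-based materialization on massively parallel hardware; as a small, purely sequential illustration of the underlying fixpoint idea (not the GPU implementation or its partitioning scheme), the sketch below applies two RDFS-style rules to a triple set until no new facts are derived, using a set to avoid duplicate derivations.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def materialize(triples):
    """Apply rdfs:subClassOf transitivity and type propagation to a fixpoint."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            if p == SUBCLASS:
                # (C subClassOf D), (D subClassOf E) => (C subClassOf E)
                new |= {(s, SUBCLASS, o2) for s2, p2, o2 in facts
                        if p2 == SUBCLASS and s2 == o}
                # (x type C), (C subClassOf D) => (x type D)
                new |= {(s2, RDF_TYPE, o) for s2, p2, o2 in facts
                        if p2 == RDF_TYPE and o2 == s}
        fresh = new - facts          # the set keeps derivations duplicate-free
        if fresh:
            facts |= fresh
            changed = True
    return facts

if __name__ == "__main__":
    data = {("ex:alice", RDF_TYPE, "ex:Student"),
            ("ex:Student", SUBCLASS, "ex:Person"),
            ("ex:Person", SUBCLASS, "ex:Agent")}
    for triple in sorted(materialize(data) - data):
        print(triple)   # alice typed as Person and Agent; Student subClassOf Agent
```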
- 39 abstract "DBpedia is a central hub of Linked Open Data (LOD). Being based on crowd-sourced contents and heuristic extraction methods, it is not free of errors. In this paper, we study the application of numerical outlier detection methods to DBpedia, using the Interquartile Range (IQR), Kernel Density Estimation (KDE), and various dispersion estimators, combined with different semantic grouping methods. Our approach reaches 87% precision, and has led to the identification of 11 systematic errors in the DBpedia extraction framework.".
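The IQR test mentioned in result 39 is easy to state concretely. A minimal sketch (semantic grouping, KDE and the other dispersion estimators are out of scope here): values outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR] are flagged as outlier candidates.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

if __name__ == "__main__":
    # e.g. population values for a semantically grouped set of small towns,
    # with one implausible extraction error.
    populations = [12_000, 15_500, 14_200, 13_800, 16_000, 9_800_000]
    print(iqr_outliers(populations))   # [9800000]
```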
- 43 abstract "In this paper we analyse the sensitivity of twelve prototypical linked data index models towards evolving data. Thus, we consider the reliability and accuracy of results obtained from an index in scenarios where the original data has changed after having been indexed. Our analysis is based on empirical observations over real world data covering a time span of more than one year. The quality of the index models is evaluated w.r.t. their ability to give reliable estimations of the distribution of the indexed data. To this end we use metrics such as perplexity, cross-entropy and Kullback-Leibler divergence. Our experiments show that all considered index models are affected by the evolution of data, but to different degrees and in different ways. We also make the interesting observation that index models based on schema information seem to be more stable than index models based on triples or context information.".
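To make the metrics listed in result 43 tangible, here is a hedged sketch comparing the distribution stored in an index against the distribution of the evolved data using cross-entropy, perplexity and Kullback-Leibler divergence over a shared vocabulary. The example distributions and the flooring of zero probabilities are illustrative; the concrete index models are abstracted away.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits; `p` is the current data, `q` the indexed estimate."""
    return sum(pi * math.log2(pi / max(q.get(k, 0.0), eps))
               for k, pi in p.items() if pi > 0)

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) in bits."""
    return -sum(pi * math.log2(max(q.get(k, 0.0), eps))
                for k, pi in p.items() if pi > 0)

def perplexity(p, q):
    return 2 ** cross_entropy(p, q)

if __name__ == "__main__":
    indexed = {"foaf:Person": 0.5, "foaf:Document": 0.3, "sioc:Post": 0.2}
    current = {"foaf:Person": 0.4, "foaf:Document": 0.2, "sioc:Post": 0.4}
    print("KL divergence:", round(kl_divergence(current, indexed), 4))
    print("perplexity:   ", round(perplexity(current, indexed), 4))
```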
- 50 abstract "Efficient federated query processing is of significant importance to tame the large amount of data available on the Web of Data. Previous works have focused on generating optimized query execution plans for fast result retrieval. However, devising source selection approaches beyond triple pattern-wise source selection has not received much attention. This work presents HiBISCuS, a novel hypergraph-based source selection approach to federated SPARQL querying. Our approach can be directly combined with existing SPARQL query federation engines to achieve the same recall while querying fewer data sources. We extend three well-known SPARQL query federation engines -- DARQ, SPLENDID, and FedX -- with HiBISCuS and compare our extensions with the original approaches on FedBench. Our evaluation shows that HiBISCuS can efficiently reduce the total number of sources selected without losing recall. Moreover, our approach significantly reduces the execution time of the selected engines on most of the benchmark queries.".
- 53 abstract "For effectively searching the Web of Data, ranking results is crucial. Top-k processing strategies have been proposed to allow an efficient processing of such ranked queries. These strategies aim at computing the k top-ranked results without complete result materialization. However, for many applications result computation time is much more important than result accuracy and completeness. Thus, there is a strong need for approximated ranked results. Unfortunately, previous work on approximate top-k processing is not well-suited for the Web of Data. In this paper, we propose the first approximate top-k join framework for Web data and queries. Our approach is very lightweight – necessary statistics are learned at runtime in a pay-as-you-go manner. We conducted extensive experiments on state-of-the-art RDF benchmarks. Our results are very promising: we could achieve up to 65% time savings while maintaining a high precision/recall.".
- 65 abstract "Large scale Linked Data is often based on relational databases and thereby tends to be modeled with rich object properties, specifying the exact relationship between two objects, rather than a generic is-a or part-of relationship. We study this phenomenon on government issued statistical data, where a vested interest exists in matching such object properties for data integration. We leverage the fact that while the labeling of the properties is often heterogeneous, e.g. ex1:geo and ex2:location, they link to individuals of semantically similar code lists, e.g. country lists. State-of-the-art ontology matching tools do not use this effect and therefore tend to miss the possible correspondences. We enhance the state-of-the-art matching process by aligning the individuals of such imported ontologies separately and computing the overlap between them to improve the matching of the object properties. The matchers themselves are used as black boxes and are thus interchangeable. The new correspondences found with this method lead to an increase of recall up to 2.5 times on real world data, with only a minor loss in precision.".
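The core signal in result 65, namely that two differently labelled object properties (e.g. ex1:geo and ex2:location) point into semantically similar code lists, can be approximated by comparing the sets of already-aligned values the properties link to, for example with a Jaccard overlap. The sketch below shows only that extra scoring step under illustrative names, not a full ontology matcher.

```python
def jaccard(a, b):
    """Set overlap in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def property_overlap(triples_a, prop_a, triples_b, prop_b, value_alignment):
    """Score two object properties by the overlap of their (aligned) value sets."""
    values_a = {o for s, p, o in triples_a if p == prop_a}
    values_b = {value_alignment.get(o, o) for s, p, o in triples_b if p == prop_b}
    return jaccard(values_a, values_b)

if __name__ == "__main__":
    ds1 = [("obs1", "ex1:geo", "ex1:DE"), ("obs2", "ex1:geo", "ex1:FR")]
    ds2 = [("obs9", "ex2:location", "ex2:Germany"), ("obs8", "ex2:location", "ex2:Spain")]
    # Instance-level alignment of the two code lists, e.g. produced by a matcher.
    alignment = {"ex2:Germany": "ex1:DE", "ex2:Spain": "ex1:ES"}
    print(property_overlap(ds1, "ex1:geo", ds2, "ex2:location", alignment))  # ~0.33
```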
- 67 abstract "The choice of which vocabulary to reuse when modeling and publishing Linked Open Data (LOD) is far from trivial. Until today, there is no study that investigates the different strategies of reusing vocabularies for LOD modeling. In this paper, we present the results of a survey with 79 participants that examines the most preferred strategies of vocabulary reuse for LOD modeling. Participants of our survey are LOD publishers and practitioners. Their task was to assess different vocabulary reuse strategies and explain their ranking decision. We found significant differences between the modeling strategies, which range from reusing popular vocabularies, minimizing the number of vocabularies, and staying within one domain vocabulary. A very interesting insight is that popularity in terms of how many data sources use a vocabulary is more important than the total occurrence of individual classes and properties in the LOD cloud. Overall, the results of this survey help in better understanding how data engineers reuse vocabularies and may be used to develop future vocabulary engineering tools.".
- 68 abstract "Background knowledge about the application domain can be used in event processing in order to improve processing quality. The idea of semantic enrichment is to incorporate background knowledge into events, thereby generating enriched events which in the next processing step can be better understood by event processing engines. In this paper, we present an efficient technique for event stream enrichment by planning multi-step event enrichment and processing. Our optimization goal is to minimize event enrichment costs while meeting application-specific service expectations. The event enrichment is optimized to avoid unnecessary event stream enrichment without missing any complex events of interest. Our experimental results show that by using this approach it is possible to reduce the knowledge acquisition costs.".
- 73 abstract "We propose an approach for modifying a declarative description of a set of entities (e.g., a SPARQL query) for the purpose of finding alternative declarative descriptions of the entities. Such a shift in representation can help to get new insights into the data, to discover related attributes, or to find a more concise description of the entities of interest. Allowing the alternative descriptions furthermore to be close approximations of the original entity set leads to more flexibility in finding such insights. Our approach is based on the construction of parallel formal concept lattices over different sets of attributes for the same entities. Between the formal concepts in the parallel lattices, we define mappings which constitute approximations of the extent of the concepts. In this paper, we formalize the idea of two types of mappings between formal concept lattices, provide an implementation of these mappings and evaluate their ability to find alternative descriptions on several real-world RDF data sets for one concrete setting of finding alternative descriptions for entities described by RDF class types which are based on properties.".
- 76 abstract "In this paper, we present a novel approach – called WaterFowl – for the storage of RDF triples that addresses some key issues in the contexts of big data and the Semantic Web. The architecture of our prototype, largely based on the use of succinct data structures, enables the representation of triples in a self-indexed, compact manner without requiring decompression at query answering time. Moreover, it is adapted to efficiently support RDF and RDFS entailment regimes thanks to an optimized encoding of ontology concepts and properties that does not require a complete inference materialization or extensive query rewriting algorithms. This approach requires making a distinction between the terminological and the assertional components of the knowledge base early in the process of data preparation, i.e., preprocessing the data before storing it in our structures. The paper describes the complete architecture of this system and presents some preliminary results obtained from evaluations conducted on our first prototype.".
- 80 abstract "Existing algorithms for signing graph data typically do not cover the whole signing process. In addition, they lack distinctive features such as signing graph data at different levels of granularity, iterative signing of graph data, and signing multiple graphs. In this paper, we introduce a novel framework for signing arbitrary graph data provided, e.g., as RDF(S), Named Graphs, or OWL. We conduct an extensive theoretical and empirical analysis of the runtime and space complexity of different framework configurations. The experiments are performed on synthetic and real-world graph data of different size and different number of blank nodes. We investigate security issues, present a trust model, and discuss practical considerations for using our signing framework.".
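As a rough illustration of the signing pipeline behind result 80 (serialize, canonicalize, hash, sign), the sketch below sorts an N-Triples-like serialization, hashes it, and attaches an HMAC as a stand-in for a real public-key signature. It deliberately ignores the hard parts the paper addresses, notably blank nodes, signing at different levels of granularity, and iterative signing, so all names and steps here are simplifying assumptions.

```python
import hashlib
import hmac

def canonical_bytes(triples):
    """A naive canonical form: sorted N-Triples-like lines (no blank-node handling)."""
    return "\n".join(sorted(f"{s} {p} {o} ." for s, p, o in triples)).encode()

def sign_graph(triples, key: bytes) -> str:
    """Hash the canonical form and 'sign' it (HMAC stands in for a signature)."""
    return hmac.new(key, canonical_bytes(triples), hashlib.sha256).hexdigest()

def verify_graph(triples, key: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign_graph(triples, key), signature)

if __name__ == "__main__":
    g = {("<ex:a>", "<ex:knows>", "<ex:b>"), ("<ex:b>", "<ex:name>", '"Bea"')}
    sig = sign_graph(g, b"secret-key")
    print(verify_graph(g, b"secret-key", sig))                                      # True
    print(verify_graph(g | {("<ex:a>", "<ex:age>", '"42"')}, b"secret-key", sig))   # False
```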
- 81 abstract "We present PRISSMA, a context-aware presentation layer for Linked Data. PRISSMA extends the Fresnel vocabulary with the notion of mobile context. Besides, it includes an algorithm that determines whether the sensed context is compatible with some context declarations. The algorithm finds optimal error-tolerant subgraph isomorphisms between RDF graphs using the notion of graph edit distance and is sublinear in the number of context declarations in the system.".
- 82 abstract "Over the last few years, devising efficient approaches to compute links between datasets has been regarded as central to achieve the vision behind the Data Web. Several unsupervised approaches have been developed to achieve this goal. Yet, so far, none of these approaches make use of replication of resources across several knowledge bases. In this paper, we present Colibri, an unsupervised approach that allows discovering links between datasets while improving the quality of the instance data in these datasets. A Colibri iteration begins by generating links between knowledge bases. Then, our approach makes use of these links to detect resources with probably erroneous or missing information. The erroneous or missing information detected by the approach is finally corrected or added. We evaluate our approach on benchmark datasets with respect to the F-score it achieves. Our results suggest that Colibri can improve the results of unsupervised machine-learning approaches for link discovery by up to 12% while correctly detecting and repairing erroneous resources.".
- 84 abstract "The increasing adoption of Linked Data principles has led to an abundance of datasets on the Web. However, take-up and reuse is hindered by the lack of descriptive information about the nature of the data, such as their topic coverage, dynamics or evolution. To address this issue, we propose an approach for creating linked dataset profiles. A profile consists of structured dataset metadata describing topics and their relevance. Profiles are generated through the configuration of techniques for resource sampling from datasets, topic extraction from knowledge bases and their ranking based on graphical models. To enable a good trade-off between scalability and representativeness of the generated data, appropriate parameters are determined experimentally. Our evaluation considers topic profiles of all accessible datasets from the Linked Open Data cloud and shows that our approach generates representative profiles even with comparably small sample sizes (10%) and outperforms established topic modelling approaches.".
- 87 abstract "Ontology versions are periodically released to ensure their usefulness and reliability over time. This potentially impacts dependent artefacts such as mappings and annotations. Dealing with this issue requires finely characterizing the changes to ontology entities between ontology versions. This article proposes to identify change patterns in attribute values when an ontology evolves, in order to track the textual statements describing concepts. We empirically evaluate our approach by using biomedical ontologies, for which new ontology versions are frequently released. Our results suggest the feasibility of the proposed techniques.".
- 93 abstract "Lexicon-based approaches to Twitter sentiment analysis are gaining much popularity due to their simplicity, domain independence, and relatively good performance. These approaches rely on sentiment lexicons, where a collection of words is marked with fixed sentiment polarities. However, words’ sentiment orientation (positive, neutral, negative) and/or sentiment strength could change depending on context and targeted entities. In this paper we present SentiCircle, a novel lexicon-based approach that takes into account the contextual and conceptual semantics of words when calculating their sentiment orientation and strength in Twitter. We evaluate our approach on three Twitter datasets using three different sentiment lexicons. Results show that our approach significantly outperforms two lexicon baselines. Results are competitive but inconclusive when comparing to the state-of-the-art SentiStrength, and vary from one dataset to another. SentiCircle outperforms SentiStrength in accuracy on average, but falls marginally behind in F-measure.".
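For contrast with the contextual approach in result 93, this is a hedged sketch of the plain lexicon baseline such work improves on: each token gets a fixed prior polarity and the tweet score is their sum, with no context- or target-dependent adjustment (which is exactly what the contextual semantics are meant to add). The tiny lexicon and threshold are illustrative.

```python
LEXICON = {"great": 1.0, "love": 1.0, "slow": -0.5, "terrible": -1.0}

def lexicon_score(text: str) -> float:
    """Sum fixed word polarities; ignores context, negation scope, and targets."""
    return sum(LEXICON.get(token, 0.0) for token in text.lower().split())

def polarity(score: float) -> str:
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    for tweet in ("love the new release", "the app is terrible and slow"):
        score = lexicon_score(tweet)
        print(f"{tweet!r}: {score:+.1f} -> {polarity(score)}")
```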
- 105 abstract "Serious games with 3D interfaces are Virtual Reality (VR) systems that are becoming common for the training of military and emergency teams. A platform for the development of serious games should allow the addition of semantics to the virtual environment and the modularization of the artificial intelligence controlling the behaviors of non-playing characters, in order to support a productive end-user development environment. In this paper, we report on the ontology design activity performed in the context of the PRESTO project, aiming to realize a conceptual model that abstracts developers from the graphical and geometrical properties of the entities in the virtual reality, as well as from the behavioral models associated with the non-playing characters. The feasibility of the proposed solution has been validated through real-world examples and discussed with the actors using the modeled ontologies in everyday practical activities.".
- 111 abstract "Enterprise Architecture (EA) models are established means for decision makers in organizations. They describe the business processes, the application landscape and IT infrastructure as well as the relationships between those layers. Current research focuses mainly on frameworks, modeling and documentation approaches for enterprise architectures. But once these models are established, methods for their analysis are rare. In this paper we propose the use of semantic web technologies in order to represent the enterprise architecture and perform analyses. We present an approach for transforming an existing EA model into an ontology. Using this knowledge base, simple questions can be answered with the query language SPARQL. The major benefits of semantic web technologies appear when defining and applying more complex analyses. Change impact analyses are important for estimating the effects and costs of a change to an EA model element. To show the benefits of semantic web technologies for EA, we implemented an approach to change impact analysis and executed it within a case study.".
- 164 abstract "The Web was originally developed to support collaboration in science. Although scientists benefit from many forms of collaboration on the Web (e.g., blogs, wikis, forums, code sharing, etc.), most collaborative projects are coordinated over email, phone calls, and in-person meetings. Our goal is to develop a collaborative infrastructure for scientists to work on complex science questions that require multi-disciplinary contributions to gather and analyze data, that cannot occur without significant coordination to synthesize findings, and that grow organically to accommodate new contributors as needed as the work evolves over time. Our approach is to develop an organic data science framework that is based on a task-centered organization of the collaboration, includes principles from the social sciences for successful online communities, and exposes an open science process. Our approach is implemented as an extension of a semantic wiki platform, and captures formal representations of task decomposition structures, relations between tasks and users, and other properties of tasks, data, and other relevant science objects. All these entities are captured through the semantic wiki user interface, represented as semantic web objects, and exported as linked data.".
- 19 abstract "A Decision Support System (DSS) in tunnelling domain deals with identifying pathologies based on disorders present in various tunnel portions and contextual factors affecting a tunnel. Another key area in diagnosing pathologies is to identify regions of the spread of pathologies. In practice, tunnel experts intuitively abstract such regions of interest and in doing so select tunnel portions that are susceptible to the same types of pathologies with some distance approximation. This complex diagnosis process is often subjective and poorly scales across cases and transport structures. In this paper, we introduce PADTUN system, a working prototype of DSS in tunnelling domain using semantic technologies. Ontologies are developed and used to capture tacit knowledge from tunnel experts. Tunnel inspection data are annotated with ontologies to take advantage of inferring capabilities offered by semantic technologies. In addition, an intelligent mechanism is developed to exploit abstraction and inference capabilities to identify regions of interest (ROI). PADTUN is developed in real-world settings offered by the NeTTUN EU Project and is applied in a tunnel diagnosis use case with Société Nationale des Chemins de Fer Français (SNCF), France. We show how the use of semantic technologies allows addressing the complex issues of pathology and ROI inferencing and matching experts’ expectations of decision support.".
- 47 abstract "Named Entity Resolution (NER) is an information extraction task that involves detecting mentions of named entities within texts and mapping them to their corresponding entities in a given knowledge source. Systems and frameworks for performing NER have been developed both by the academia and the industry with different features and capabilities. Nevertheless, what all approaches have in common is that their satisfactory performance in a given scenario does not constitute a trustworthy predictor of their performance in a different one, the reason being the scenario's different parameters (target entities, input texts, domain knowledge etc.). With that in mind, we describe in this paper a metric-based Diagnostic Framework that can be used to identify the causes behind the low performance of NER systems in industrial settings and take appropriate actions to increase it.".
- 56 abstract "In this paper we present an architecture and approach for publishing open linked data in the cultural heritage domain. We demonstrate our approach to building a system for both data publishing and consumption and show how user benefits can be achieved with semantic technologies. For domain knowledge representation the CIDOC-CRM ontology is used. As the main source of trusted data, the web portal of the Russian Museum is crawled. For data enrichment we selected DBpedia and the SPARQL endpoint of the British Museum. The evaluation shows the potential of semantic applications for data publishing in a contextual environment, semantic search and visualization, and automated enrichment according to the needs and expectations of art experts and regular museum visitors.".
- 68 abstract "A wealth of biomedical datasets is meanwhile published as Linked Open Data. Each of these datasets has a particular focus, such as providing information on diseases or symptoms of a certain kind. Hence, a comprehensive view can only be provided by integrating information from various datasets. Although, links between diseases and symptoms can be found, these links are far too sparse to enable practical applications such as a disease-centric access to clinical reports that are annotated with symptom information. For this purpose, we build a model of disease-symptom relations. Utilizing existing ontology mappings, we propagate semantic type information disease and symptom across ontologies. Then entities of the same semantic type from different ontologies are clustered and object properties between entities are mapped to cluster-level relations. The effectiveness of our approach is demonstrated by integrating all available disease-symptom relations from different biomedical ontologies resulting in a significantly increased linkage between datasets.".
- 75 abstract "Various studies have reported on inefficiencies of existing travel search engines, and on user frustration generated through hours of searching and browsing, often with no satisfactory results. Not only do the users fail to find the right offer in the myriad of websites, but they end up browsing through many offers that do not correspond to their criteria. The Semantic Web framework is a reasonable candidate to improve this. In this paper, we present a semantic travel offer search system named “RE-ONE”. We especially highlight its ability to help users formulate better search queries. An example of a permitted query is ‘in Croatia at the seaside where there is Vegetarian Restaurant’. To the best of our knowledge, our system is the first search system that leverages the semantic graph to construct indexes and to support the cognitive process of travel search. We conducted two experiments to evaluate the Query Auto-completion mechanism. The results showed that our system outperforms the Google Custom Search baseline. Queries freely conducted in RE-ONE are shown to be 63.4% longer and 27% richer in terms of the number of search criteria. RE-ONE better supports users’ cognitive processes by giving suggestions in greater accordance with users’ idea flow.".