Matches in ScholarlyData for { ?s <https://w3id.org/scholarlydata/ontology/conference-ontology.owl#abstract> ?o. }
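For reference, a minimal sketch of a complete SELECT query that would return such matches (the `conf:` prefix binding and the ordering are assumptions; the numeric identifiers shown in the listing below are shortened labels for the matched subjects, not something the query itself produces):

```sparql
# Sketch: list every resource that carries an abstract in the
# ScholarlyData conference ontology, together with the abstract text.
PREFIX conf: <https://w3id.org/scholarlydata/ontology/conference-ontology.owl#>

SELECT ?s ?o
WHERE {
  ?s conf:abstract ?o .
}
ORDER BY ?s
```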
- 91 abstract "The European Commision recently became interested in mapping digital social innovation in Europe. In order to understand this rapidly developing if little known area, a visual and interactive survey was made in order to crowd-source a map of digital social innovation, available at http://digitalsocial.eu. Over 900 organizations participated, and Linked Data was used as the backend with a Ruby on Rails Framework. The data was processed using SPARQL and network analysis, and a number of concrete policy recommendations resulted from the analysis. ".
- 11 abstract "Being promoted by major search engines such as Google, Yahoo!, Bing, and Yandex, Microdata embedded in web pages, especially using schema.org, has become one of the most important markup languages for the Web. However, actually deployed Microdata is most often not free from errors, which limits its practical use. In this paper, we use the WebDataCommons corpus of Microdata extracted from more than 250 million web pages for a quantitative analysis of common mistakes in Microdata provision. Since it is unrealistic that data providers will fix all the mistakes and provide clean and correct data, we discuss a set of heuristics that can be applied on the data consumer side to fix many of those mistakes in a post-processing step. We apply those heuristics to provide an improved Microdata corpus.".
- 116 abstract "There is a huge demand to be able to integrate heterogeneous data sources, which requires mapping the attributes of a source to the concepts and relationships defined in a domain ontology. In this paper, we present a new approach to the problem of identifying these mappings, which we call semantic labeling. Previous approaches map each data value individually, typically by learning a model based on features extracted from the data using supervised machine learning techniques. Our approach differs from existing approaches in that we take a holistic view of the data values corresponding to a semantic label and use techniques that treat this data collectively, which makes it possible to capture characteristic properties of the values associated with a semantic label as a whole. Our approach supports both textual and numeric data and proposes the top-k semantic labels along with their associated confidence scores. Our experiments show that the approach has higher label prediction accuracy, has lower time complexity, and is more scalable than existing systems.".
- 118 abstract "There is an increasing amount of (semi)-structured data found on the Web that exhibits a graph data model. Although it provides comprehensive details about individual entities, the entities themselves are not the item of interest, but rather an aggregated view of the data. The analysis of graph data is a common task, thus, there is a need for an analytical framework that facilitates this. In this work, we propose a graph aggregation operator, called Gagg, that is flexible with regards to the data’s structure and analysis requirements. We formalise the semantics of Gagg so it can be part of a wider graph processing algebra. We believe such an operator is core to such an algebra, where several optimisations are possible. We evaluate Gagg over the BSBM and SP2B benchmarks and we find that it improves the graph aggregation performance by a factor of three.".
- 119 abstract "The advances of the Linked Open Data (LOD) initiative are giving rise to a more structured Web of data. Indeed, a few datasets act as hubs (e.g., DBpedia) connecting many other datasets. They also made possible new Web services for entity detection inside plain text (e.g., DBpedia Spotlight), thus allowing for new applications that will benefit from a combination of the Web of documents and the Web of data. To ease the emergence of these new use-cases, we propose a query-biased algorithm for the ranking of entities detected inside a Web page. Our algorithm combine link analysis with dimensionality reduction. We use crowdsourcing for building a publicly available and reusable dataset on which we compare our algorithm to the state of the art. Finally, we use this algorithm for the construction of semantic snippets for which we evaluate the usability and the usefulness with a crowdsourcing-based approach.".
- 125 abstract "With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation for implementing a broad set of different data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated in the comprehensive data quality assessment framework Luzzu and evaluated in terms of performance and accuracy on Linked Open Datasets of broad relevance.".
- 13 abstract "Ontology learning (OL) aims at the (semi-)automatic acquisition of ontologies from sources of evidence, typically domain text. Recently, there has been a trend towards the application of multiple and heterogeneous evidence sources in OL. Heterogeneous sources provide benefits, such as higher accuracy by exploiting redundancy across evidence sources, and including complementary information. When using evidence sources which are heterogeneous in quality, amount of data provided and type, then a number of questions arise, for example: How many sources are needed to see significant benefits from heterogeneity, what is an appropriate number of evidences per source, is balancing the number of evidences per source important, and to what degree can the integration of multiple sources overcome low quality input of individual sources? This research presents an extensive evaluation based on an existing OL system. It gives answers and insights on the research questions posed for the OL task of concept detection, and provides further hints from experience made. Among other things, our results suggest that a moderate number of evidences per source as well as a moderate number of sources resulting in a few thousand data instances are sufficient to exploit the benefits of heterogeneous evidence integration.".
- 132 abstract "As of today, there exists no standard language for querying Linked Data on the Web, where navigation across distributed data sources is a key feature. A natural candidate seems to be SPARQL, which recently has been enhanced with navigational capabilities thanks to the introduction of property paths (PPs). However, the semantics of SPARQL restricts the scope of navigation via PPs to single RDF graphs. This restriction limits the applicability of PPs on the Web. To fill this gap, in this paper we provide formal foundations for evaluating PPs on the Web, thus contributing to the definition of a query language for Linked Data. In particular, we introduce a query semantics for PPs that couples navigation at the data level with navigation on the Web graph. Given this semantics we find that for some PP-based SPARQL queries a complete evaluation on the Web is not feasible. To enable systems to identify queries that can be evaluated completely, we establish a syntactic and, thus, decidable property of such queries.".
- 134 abstract "This paper explores the factors that influence the performance of hybrid named entity recognition (NER) approaches to microblogs, which combine state-of-the-art automatic techniques with human and crowd computing. We identify a set of content and crowdsourcing-related features (number of entities in a post, types of entities, skipped true-positive posts, average time spent to complete the tasks, and interaction with the user interface) and analyze their impact on the accuracy of the results and the timeliness of their delivery. Using CrowdFlower and a simple, custom built gamified NER tool we run experiments on three datasets from related literature and a fourth newly annotated corpus. Our findings show that crowd workers are adept at recognizing people, locations, and implicitly identified entities within shorter microposts. We expect them to lead to the design of more advanced NER pipelines, informing the way in which tweets are chosen to be outsourced or processed by automatic tools. Experimental results are published as JSON-LD for further use by the research community.".
- 141 abstract "Currently one of the challenges for the ontology alignment community is the user involvement in the alignment process. At the same time, the focus of the community has shifted towards large-scale matching which introduces an additional dimension to this issue. This paper aims to provide a set of requirements that foster the user involvement for large-scale ontology alignment tasks. Further, we present and discuss the results of a literature study for 7 ontology alignments systems as well as a heuristic evaluation and an observational user study for 3 ontology alignment systems to reveal the coverage of the requirements in the systems and the support for the requirements in the user interfaces.".
- 142 abstract "We present a method for a compact in-memory RDF dictionary that supports high frequency updates. Our method leverages the long common prefixes in RDF terms to compress them using a trie data structure. To overcome the memory inefficiency of tries, we present a highly memory efficient implementation which is especially tuned for RDF data. Our approach compacts the dictionary further by unifying the two independent tables that are normally used for encoding and decoding into a single table by mapping each string to a memory location from where it can be reached again instead of counter ID. To motivate our approach, we present empirical data on the amount of repetition that is present in both IRIs and literals of large realistic RDF graphs. An empirical analysis shows that our technique saves 50-59% memory compared to uncompressed conventional dictionaries while still offering comparable encoding/decoding performance.".
- 144 abstract "The decentral architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Hence, answering complex questions often required combining information from structured and unstructured data sources. We present HAWK, a novel entity search approach for Hybrid Question Answering based on combining Linked Data and textual data. The approach uses predicate-argument representations of questions to derive equivalent combinations of SPARQL query fragments and text queries. These are executed so as to integrate the results of the text queries into SPARQL and thus generate a formal interpretation of the query. We present a thorough evaluation of the framework, including an analysis of the influence of entity annotation tools on the generation process of the hybrid queries and a study of the overall accuracy of the system. Our results show that HAWK 0.68 respectively 0.61 F-measure within the training respectively test phases on the Question Answering over Linked Data (QALD-4) hybrid query benchmark.".
- 146 abstract "In this paper, we present VocBench, an open source web application for editing thesauri complying with the SKOS and SKOS-XL standards. VocBench has a strong focus on collaboration, supported by workflow management for content validation and publication. Dedicated user roles provide a clean separation of competences, addressing different specificities ranging from management aspects to vertical competences on content editing, such as conceptualization versus terminology editing. Extensive support for scheme management allows editors to fully exploit the possibilities of the SKOS model, as well as to fulfill its integrity constraints. We discuss thoroughly the main features of VocBench, detail its architecture, and then evaluate it under both a functional and user-appreciation ground, through a comparison with other similar tools and by performing an analysis of user questionnaires, respectively. Finally, we provide insights on future developments.".
- 151 abstract "The OntoLex W3C Community Group has been working for more than three years on a shared lexicon model for ontologies. Alongside the core specification, the group also developed additional modules for specific tasks and use cases. In many usage scenarios the discovery and exploitation of linguistically grounded ontologies may benefit from summarizing information about their linguistic expressivity. That situation is compound by the fact that OntoLex allows the independent publication of ontologies, lexica and lexicalizations linking them. While the VoID vocabulary already addressed the need for general metadata about interlinked datasets, it is unable by itself to represent the more specific metadata relevant to OntoLex. To solve this problem, we developed a module of OntoLex, named LIME (Linguistic Metadata), which extends VoID with a vocabulary of metadata about the ontology-lexicon interface. ".
- 156 abstract "This paper addresses the problem of failing RDF queries. Query relaxation is one of the cooperative techniques that allows providing user with alternative answers instead of an empty result. While previous work on query relaxation over RDF data have focused on defining new relaxation operators, in this paper techniques to find the parts of an RDF query which are responsible of its failure, are investigated. Finding such subqueries, named Minimal Failing Subqueries (MFS), is of great interest to efficiently perform the relaxation process. We propose two algorithmic approaches inspired by previous work conducted in the context of relational databases. The first approach (called LBA) intelligently leverages the lattice of subqueries of the initial RDF query while the second (called MBA) is based on a particular matrix that can be used to improve the performance of LBA. In addition, we also show how our approaches can compute maximal non-failing subqueries, called Maximal Success Subqueries (XSS). XSSs are subqueries with maximal number of triple patterns of the initial query that result in non-empty answers. To validate our approaches, a set of thorough experiments is conducted on the LUBM benchmark and a comparative study with other approaches is done.".
- 171 abstract "The exponential growth of the web and the extended use of semantic web tech-nologies has brought to the fore the need for quick understanding, flexible explo-ration and selection of complex web documents and schemas. To this direction ontology summarization aspires to produce an abridged version of the original ontology that highlights the most representative concepts of it. In this paper, we present RDF Digest, a novel platform that automatically produces summaries of RDF/S Knowledge Bases. In our work, a summary is a valid RDF/S document that includes the most representative concepts of the schema adapted to the corre-sponding data instances. To construct this graph our algorithm exploits the se-mantics and the structure of the schema and the distribution of the corresponding data/instances as well. The performed evaluation demonstrates the benefits of our approach and the considerable advantages gained.".
- 175 abstract "Indented tree has been widely used to organize information and visualize graph-structured data like RDF graphs. Given a starting resource in a cyclic RDF graph, there are different ways of transforming the graph into a tree representation to be visualized as an indented tree. It would be interesting to investigate whether and how these different representations influence the user's browsing experience. In this paper, we address this issue from the coherence aspect of tree representation. We aim to smooth the user's reading experience by visualizing an optimal indented tree in the sense of featuring the fewest reversed edges, which often cause confusion and interrupt the user's cognitive process due to lack of a general, effective way of presentation. To achieve this, we propose a two-step approach that is theoretically proved to generate such an optimal tree representation for a given RDF graph and a specified starting resource. We also empirically show the difference in coherence between tree representations of real-world RDF graphs generated by our approach and two baseline approaches that have been widely adopted. These different tree representations lead to significantly different user experience in our preliminary user study, which reports a considerable degree of dependence between coherence and user experience.".
- 176 abstract "Large-scale knowledge graphs such as those in the Linked Data cloud are typically represented as subject-predicate-object triples. However, many facts about the world involve more than two entities. While n-ary relations can be converted to triples in a number of ways, unfortunately, the structurally different choices made in different knowledge sources significantly impede our ability to connect them. They also make it impossible to query the data concisely and without prior knowledge of each individual source. We present FrameBase, a wide-coverage knowledge-base schema that uses linguistic frames to seamlessly represent and query n-ary relations from other knowledge bases, at different levels of granularity connected by logical entailment. It also opens possibilities to draw on natural language processing techniques for querying and data mining.".
- 178 abstract "OWL 2 EL is one of the tractable profiles of Web Ontology Language (OWL) which has been standardized by the W3C. OWL 2 EL provides sufficient expressivity to model large biomedical ontologies as well as streaming data such as traffic. Automated generation of ontologies from streaming data and text can lead to very large ontologies. Existing reasoners make use of only a single machine and are thus constrained by memory and computational power. There is a need to develop reasoning approaches which scale with the size of ontologies. We describe a distributed reasoning system that scales well using a cluster of commodity machines. We also apply our system to a use case on city traffic data and show that it can handle volumes which cannot be handled by current reasonably-sized single machine reasoners.".
- 184 abstract "In the context of Semantic Web, one of the most important issue related to the class-membership prediction task (by using inductive models) on ontological knowledge bases concerns with the class-imbalance of the training examples, mostly due to the heterogeneous nature and the incompleteness of the knowledge bases. An ensemble learning approach has been proposed to cope with this problem. However, the majority voting procedure, exploited for deciding the membership, does not consider explicitly the uncertainty and the conflict that may occur among the classifiers of an ensemble model. Moving from this observation, we propose to integrate the Dempster-Shafer (DS) Theory with ensemble learning by exploiting DS pooling operators for combining conflictual information. Specifically, we propose an algorithm for learning Evidential Terminological Random Forest models to be used for the class-membership prediction task. The algorithm extends Terminological Random Forest in the settings of the Dempster-Shafer Theory. An empirical evaluation showed that the resulting models performs better for datasets with a lot of positive and negative examples and have a less conservative behavior than the voting-based forests.".
- 191 abstract "Social media -- and microblogs in particular -- have emerged as high-value, high-volume content, which organisations increasingly wish to analyse automatically. These short, noisy, context-dependent, stylized and dynamic texts currently present a significant challenge for semantic annotation techniques, such as ontology-based named entity disambiguation. State-of-the-art systems, however, have largely ignored the richer, microblog-specific context, which humans draw upon when interpreting these short messages. This paper focuses specifically on quantifying the impact on entity disambiguation performance when readily available contextual information is included from URL content, hash tag definitions, and Twitter user profiles. In particular, including URL content significant improves the performance of a state-of-the-art DBpedia-based entity linking system. Similarly, user profile information for @mentions improves recall by over 10% with no adverse impact on precision. Another contribution lies in a new, publicly available corpus of tweets, which have been hand-annotated with DBpedia URIs, with high inter-annotator agreement.".
- 20 abstract "Learning cross-lingual semantic representations of relations from textual data is useful for tasks like cross-lingual information retrieval and question answering. So far, research has been mainly focused on cross-lingual entity linking, which is confined to linking between phrases in a text document and their corresponding entities in a knowledge base but cannot link to relations. In this paper, we present an approach for inducing clusters of semantically-related relations expressed in text, where relation clusters i) can be extracted from text of different languages, ii) are embedded in a semantic representation of the context, and iii) can be linked across languages to properties in a knowledge base. This is achieved by combining multi-lingual semantic role labeling (SRL) with cross-lingual entity linking followed by spectral clustering of the annotated SRL graphs. With our initial implementation we learned a cross-lingual library of relations from English and Spanish Wikipedia articles. To demonstrate its usefulness we apply it to cross-lingual question answering over linked data.".
- 26 abstract "Uniform Resource Identifiers (URIs) are one of the corner stones of the Web; They are also exceedingly important on the Web of data, since RDF graphs and Linked Data both heavily rely on URIs to uniquely identify and connect online entities. Due to their hierarchical structure and their string serialization, sets of related URIs typically contain a high degree of redundant information and are systematically dictionary-compressed or encoded at the back-end (e.g., in the triple store). The paper represents, to the best of our knowledge, the first systematic comparison of the most common data structures used to encode URI data. We evaluate a series of data structures in term of their read/write performance and memory consumption.".
- 28 abstract "More and more RDF data is exposed on the Web via SPARQL endpoints. With the recent SPARQL 1.1 standard, these datasets can be queried in novel and more powerful ways, e.g., complex analysis tasks involving grouping and aggregation, and even data from multiple SPARQL endpoints, can now be formulated in a single query. This enables Business Intelligence applications that access data from federated web sources and can combine it with local data. However, as both aggregate and federated queries have become available only recently, state-of-the-art systems lack sophisticated optimization techniques that facilitate efficient execution of such queries over large datasets. To overcome these shortcomings, we propose a set of query processing strategies and the associated Cost-based Optimizer for Distributed Aggregate queries (CoDA) for executing aggregate SPARQL queries over federations of SPARQL endpoints. Our comprehensive experiments show that CoDA significantly improves performance over current state-of-the-art systems.".
- 32 abstract "HDT a is binary RDF serialization aiming at minimizing the space overheads of traditional RDF formats, while providing retrieval features in compressed space. Several HDT-based applications, such as the recent Linked Data Fragments proposal, leverage these features for diverse publication, interchange and consumption purposes. However, scalability issues emerge in HDT construction because the whole RDF dataset must be processed in a memory-consuming task. This is hindering the evolution of novel applications and techniques at Web scale. This paper introduces HDT-MR, a MapReduce-based technique to process huge RDF and build the HDT serialization. HDT-MR performs in linear time with the dataset size and has proven able to serialize datasets up to 4.42 billion triples, preserving HDT compression and retrieval features. These results enable a new generation of HDT-based applications.".
- 35 abstract "Although recent developments have shown that it is possible to reason over large RDF datasets with billions of triples in a scalable way, the reasoning process can still be a challenging task with respect to the growing amount of available semantic data. By now, reasoner implementation that are able to process large scale datasets usually use a MapReduce based implementation that runs on a cluster of computing nodes. In this paper we address this circumstance by identifying the resource consuming parts of a reasoner process and providing a solution for a more efficient implementation in terms of memory consumption. As a basis we use a rule-based reasoner concept from our previous work. In detail, we are going to introduce an approach for a memory efficient RETE algorithm implementation. Furthermore, we introduce a compressed triple-index structure that can be used to identify duplicate triples and only needs a few bytes to represent a triple. Based on these concepts we show that it is possible to apply all RDFS rules to more than 1 billion triples on a single laptop reaching a throughput, that is comparable or even higher than state of the art MapReduce based reasoner. Thus, we show that the resources needed for large scale lightweight reasoning can massively be reduced.".
- 41 abstract "Instance matching concerns identifying pairs of instances that refer to the same underlying entity. Current state-of-the-art instance matchers use machine learning methods. Supervised learning systems achieve good performance by training on significant amounts of manually labeled samples. To alleviate the labeling effort, this paper presents a minimally supervised instance matching approach that is able to deliver competitive performance using only 2% training data and little parameter tuning. As a first step, the classifier is trained in an ensemble setting using boosting. Iterative semi-supervised learning is used to improve the performance of the boosted classifier even further, by re-training it on the most confident samples labeled in the current iteration. Empirical evaluations on a suite of six publicly available benchmarks show that the proposed system outcompetes optimization-based minimally supervised approaches in 1-7 iterations. The system's average F-Measure is shown to be within 2.5% of that of recent supervised systems that require more training samples for effective performance. ".
- 54 abstract "In order to reduce the cost of publishing queryable Linked Data, Triple Pattern Fragments (TPF) were introduced as a~simple interface to RDF triples. It allows for SPARQL query execution at low server cost, by partially shifting the load from the server to the client. The previously proposed client algorithm uses more HTTP requests than strictly necessary, and only makes partial use of the available metadata. In this paper, we propose a new query execution algorithm for a client communicating with a TPF server. Instead of using a greedy solution, we maintain an overview of the entire query to find the steps that are optimal for solving a given query. We show multiple cases in which our implementation produces a solution containing far fewer server calls, while not significantly increasing the cost in other cases. This improves the efficiency of common SPARQL queries against TPF interfaces, augmenting their viability compared to the more powerful, but more costly, SPARQL interface.".
- 63 abstract "In the context of Web of Things (WoT), embedded networks have to face the challenge of getting ever more complex. The complexity arises as the number of interchanging heterogeneous devices and different hardware resource classes always increase. When it comes to the development and the use of embedded networks in the WoT domain, Semantic Web technologies are seen as one way to tackle this complexity. For example, properties and capabilities of embedded devices may be semantically described in order to enable an effective search over different classes of devices, semantic integration may be deployed to integrate data produced by these devices, or embedded devices may be empowered to reason about semantic data in the context of a WoT application. Despite these possibilities, a wide adoption of Semantic Web or Linked Data technologies in the domain of embedded networks has not been established yet. One reason for this is an inefficient representation of semantic data. Serialisation formats of RDF data, such as for instance a plain-text XML, are not suitable for embedded devices. In this paper, we present an approach that enables constrained devices, such as microcontrollers with very limited hardware resources, to store and process semantic data. Our approach is based on the W3C Efficient XML Interchange (EXI) format. To show the applicability of the approach, we provide an EXI-based RDF Store and show associated evaluation results.".
- 70 abstract "Ad-hoc querying is crucial to access information from Linked Data, yet publishing queryable RDF datasets on the web is not a trivial exercise. The most compelling argument to support this claim is that the Web contains hundreds of thousands of data documents, while only 300 SPARQL endpoints are provided. Even worse, the SPARQL endpoints we do have are often unstable, may not comply with the standards, and may differ in supported features. In other words, hosting data online is easy, but publishing Linked Data via a queryable API such as SPARQL appears to be too difficult. As a consequence, in practice, there is no single uniform way to query the LOD cloud today. In this paper, we therefore combine a large-scale Linked Data publication project (LOD Laundromat) with a low-cost server-side interface (Triple Pattern Frag- ments), in order to bridge the gap between the web of downloadable data documents and the web of live queryable data. The result is a repeat- able, low-cost, open-source-based data publication process. To demon- strate its applicability, we made over 550.000 data documents available as data APIs, consisting of over 21 billion triples.".
- 72 abstract "A major challenge in information management today is the integration of huge amounts of data distributed across multiple data sources. One suggested approach to this problem is ontology-based data integration where legacy data systems are integrated via a common on- tology that represents a unified global view over all data sources. In many domains (e.g., biology, medicine) there exist established ontolo- gies to integrate data from existing data sources. However, data is often not natively born using these ontologies. Instead, much data resides in relational databases. Therefore, mappings that relate the legacy data sources to the ontology need to be constructed. Recent techniques and systems that automatically construct such mappings have been devel- oped. The quality metrics of these systems are, however, often only based on self-designed, highly biased benchmarks. This paper introduces a new publicly available benchmarking suite called RODI which is designed to cover a wide range of integration challenges in Relational-to-Ontology Data Integration scenarios. RODI provides a set of different relational data sources and ontologies as well as a scoring function with which the performance of relational-to-ontology mapping construction systems may be evaluated.".
- 83 abstract "We present a novel approach, called SPSC, for the efficient composition of semantic services in unstructured peer-to-peer (P2P) networks. With SPSC, the peers are jointly planning complex IOPE-chained workflows of services in OWL-S together with their respective signature variable bindings in order to answer a given composition request. For this purpose, each peer exploits its local observation-based knowledge about the semantic overlay to contribute to the actual semantic service workflow, as well as to support the further exploration of alternatives in a heuristically pruned search space within the given TTL. In particular, the local query routing decisions are based on the application of two strategies for a guided composition plan branching and memorization of potentially helpful services. We show that the composition planning process with SPSC is sound and provide a lower bound of its completeness in terms of total recall of solutions for a given composition request, if they exist in the network. Our experimental evaluation revealed that SPSC can achieve a high cumulative recall with relatively low traffic overhead.".
- 86 abstract "Scalability of the data access architecture in the Semantic Web is dependent on the establishment of caching mechanisms to take the load off of servers. Unfortunately, there is a chicken and egg problem here: Research, implementation, and evaluation of caching infrastructure is uninteresting as long as data providers do not publish relevant metadata. And publishing metadata is useless as long as there is no infrastructure that uses it. We show by means of a survey of live RDF data sources that caching metadata is prevalent enough already to be used in some cases. On the other hand, they are not commonly used even on relatively static data, and when they are given, they are very conservatively set. We point out future directions and give recommendations for the enhanced use of caching in the Semantic Web.".
- 89 abstract "The use of knowledge bases have been shown to improve performance in applications ranging from web search and event detection to entity recognition and disambiguation. More recently, knowledge bases have been used to address challenges in analyzing social data. A key challenge in this domain has been that of identifying the geographic footprint of online users in a social network such as Twitter. Existing approaches to predict the location of users, based on their tweets, solely rely on social media features or probabilistic language models. These approaches are purely data-driven and require large training dataset of geo-tagged tweets to build statistical models that predict the location of a user. As most Twitter users are reluctant to publish their location, the collection of geo-tagged tweets is a time intensive process. To address this issue, we present an alternative, knowledge-based approach to predict a Twitter user's location at the city level. We utilize Wikipedia as the source of our knowledge base by exploiting its hyperlink structure which alleviates the dependence on training data set. Our experiments, on a publicly available dataset demonstrate an improvement of 3\% in the accuracy of prediction, over the state of the art supervised techniques.".
- 90 abstract "With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means to enriching an input dataset. The few tools that aim at supporting this endeavour usually focus on supporting the manual definition of enrichment pipelines. In this paper, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against 8 manually defined enrichment pipelines and show that our approach can learn accurate pipelines even when provided with a small number of training examples.".
- 11 abstract "The BBC has a wealth of permanently available programmes across a wide range of subjects with very low usage. We wanted to create a route into these programmes which balanced the need for curated, high quality journeys between programmes and the limited resource available for that curation effort. I will demonstrate ADA, a system created to create consistent, meaningful high-quality links between programmes with limited user input.".
- 16 abstract "Linked Data is in many cases generated from (semi-)structured data. This generation is supported by several tools, a number of which use a mapping language to facilitate the Linked Data mappings. However, knowledge about this mapping language and other used technologies is required in order to use the tools, limiting their adoption by non-Semantic Web experts. We propose the RMLEditor: a graphical user interface that utilizes graphs to easily visualize the mappings that deliver the RDF representation of the original data. The required amount of knowledge of the underlying mapping language and the used technologies is kept to a minimum. The RMLEditor lowers the barriers to create Linked Data by aiming to also facilitate the editing of mappings for non-Semantic Web experts.".
- 18 abstract "In this demo paper, a SPARQL Query Recommendation Tool (called SQRT) based on query reformulation is presented. Based on three steps, Generalization, Specialization and Evaluation, SQRT implements the logic of reformulating a SPARQL query that is satisfiable w.r.t a source RDF dataset, into others that are satisfiable w.r.t a target RDF dataset. In contrast with the existing approaches, SQRT aims at recommending queries whose reformulations: i) reflect as much as possible the same intended meaning, structure, type of results and result size as the original query and ii) do not require to have a mapping between the two datasets. Based on a set of criteria to measure the similarity between the initial query and the recommended ones, SQRT demonstrates the feasibility of the underlying query reformulation process, ranks appropriately the recommended queries, and offers a valuable support for query recommendations over an unknown and unmapped target RDF dataset, not only assisting the user in learning the data model and content of an RDF dataset, but also supporting its use without requiring the user to have intrinsic knowledge of the data.".
- 2 abstract "As Linked Data gains traction, the proper support for its publication and consumption is more important than ever. Even though there is a multitude of tools for preparation of Linked Data, they are still either quite limited, difficult to use or not compliant with recent W3C Recommendations. In this demonstration paper, we present LinkedPipes ETL, a lightweight, Linked Data preparation tool. It is focused mainly on smooth user experience including mobile devices, ease of integration based on full API coverage and universal usage thanks to its library of data processing units (DPUs). We build on our experience gained by development and use of UnifiedViews, our previous Linked Data ETL tool, and present four use cases in which our new tool excels in comparison.".
- 21 abstract "Over the past several years the amount of published open data has increased significantly. The majority of them are tabular data and transforming them to linked data requires powerful and flexible approaches for data cleaning, preparation and RDF-ization. This paper introduces Grafterizer --- a software framework developed to support data workers and data developers in the process of converting raw tabular data into linked data. Its main components include a powerful software library and DSL for data cleaning and RDF-ization, a front-end framework for interactive specification of data transformations and a back-end for management and execution of data transformations. The proposed demonstration will focus on Grafterizer's powerful features for data cleaning and RDF-ization in a scenario using data about the risk of failure of transport infrastructure components due to natural hazards.".
- 22 abstract "Sentiment analysis over social streams offers governments and organisations a fast and effective way to monitor the publics' feelings towards policies, brands, business, etc. In this paper we present SentiCircles, a platform that captures feedback from social media conversations and applies contextual and conceptual sentiment analysis models to extract and summarise sentiment from these conversations. It provides a novel sentiment navigation design where contextual sentiment is captured and presented at term/entity level, enabling a better alignment of positive and negative sentiment to the nature of the public debate. ".
- 27 abstract "Despite the increasing availability of RDF datasets, searching and browsing semantic data is still a daunting task for mainstream users. With PepeSearch, it is easy to query an arbitrary triple store without previous knowledge of RDF/SPARQL. PepeSearch offers a form-based interface with simple and intuitive elements such as drop-down menus or sliders that are automatically mapped from the ontological structures of the target dataset. In this demonstration we will show how to set up a PepeSearch instance, how to formulate queries and how to retrieve results.".
- 29 abstract "AutoRDF is an original open source framework that facilitates handling RDF data from a software engineering point of view. Built on top of the Redland software package, it bridges the gap between semantic web ontology and legacy object oriented languages, by providing transparent access to RDF resources from within standard C++ objects. Its use of widespread C++11, Boost and Redland makes it suitable not only for the desktop and server, but also for low computing power embedded devices. This framework is a result of the IDFRAud research project, where it is used to handle complex domain specific knowledge and make it available on smartphone-class devices.".
- 30 abstract "Creating applications on top of ontologies brings many difficulties. The applications are either generic (and thus not appealing for end-users), or bound to ontology structure, change of which breaks the application. We present JOPA, a tool that formalizes the contract between the application and the ontology, combining advantages of both worlds. JOPA is a persistence framework for Java applications, providing formalized object-ontological mapping, transactions, access to multiple repository contexts. The system is demonstrated on a real use-case of a reporting tool that we design and develop within Czech national projects focused on aviation safety.".
- 36 abstract "In this demo, we present an ontology-based visual query system, namely OptiqueVQS, extended for a stream query language called STARQL in the context of use cases provided by Siemens AG.".
- 44 abstract "In the recent years, the number of data sources available poses challenges to the automation of the data integration process. Lots of data, both on the Web and within an organization’s intranet, is already available but the number and diversity of sources makes it increasingly challenging for the user to gain a comprehensive overview of the available data. The goal of our work is to support the iterative extension of data tables, by assisting the user in the process of finding appropriate records for her quest. To this end we propose (i) a data search and integration framework specifically tailored for tables and (ii) an initial Open Source implementation of the framework as a RapidMiner extension. We will demonstrate the usage of the framework with a publicly available table dataset extracted from Wikipedia.".
- 45 abstract "In the smart city domain, many projects and works are generating essential information. Open and efficient sharing of this information can be beneficial for all parties ranging from researchers, engineers or even governments. To our knowledge, there is currently no full-fledged semantic platform which properly models this domain, publishes such information and allows data extraction using a standard query language. To complement this, we developed and deployed the Smart City Artifacts Web Portal. In this paper, we present our approach used within this platform and summaries some of its technical features and applications.".
- 48 abstract "We present a live demo of a use case and a technical solution that addresses the problem of organizing the collaborative ontology development with deliverables including the diagrams and various views of the data model. The use case describes the real life situation, in which the geographically distributed team was challenged with a task of producing the open budget ontology and consequently was to select the tool set to support such development. The technical solution is based on the combination of 3 basic tools: Prot\'{e}g\'{e} - to provide a collaborative environment for ontology creation and modification, Ontodia.org - to visualize and publish results in a form of diagrams and GitHub - to host the repository of the project, whilst Ontodia is integrated with the last. The preliminary version of the produced ontology can be accessed at: \url{https://github.com/k0shk/pfontology}. Ontodia with GitHub integration capabilities is fully operational and can be tested here: \url {http://www.ontodia.org}. ".
- 5 abstract "In this demonstrator we introduce DataGraft – a Data-as-a-Service platform for hosted open data management. DataGraft provides data transformation, publish-ing and hosting capabilities that aim to simplify the data publishing lifecycle for data workers (i.e., open data publishers, linked data developers, data scientists). This demonstrator highlights the key features supported by the current DataGraft platform by exemplifying a data transformation and publishing use case from the domain of property-related data.".
- 54 abstract "Question answering (QA) systems focus at making sense out of data via a easy-to-use interface.However, these systems are very complex and integrate a lot of technology tightly.Previously presented QA systems are mostly singular and monolithic implementations.Hence, the reusability is limited. In contraction we follow the research agenda of establishing an ecosystem for components of QA systems which will enable the QA community to elevate reusability of such components and to intensify the research activities. In this paper, we present a reference implementation of the Qanary methodology for creating QA systems. Qanary is a vocabulary-driven approach built on top of linked data technology. Here, we present a fast-track approach following a light-weight, message-driven, component-oriented architecture. Hence, researchers will be enabled to establish new research ideas which will increase the efficiency and boost the research activities in the field of question answering.".
- 56 abstract "The Linked Data paradigm has changed how data on the Web is published, retrieved, and interlinked, thereby enabling modern question answering systems and contributing to the spread of open data. With the increasing size, interlinkage, and complexity of the Linked Data cloud, the focus is now shifting towards strategies and technologies to en- sure that Linked Data can also succeed as an infrastructure. This raises questions about the sustainability of query endpoints, the reproducibility of scientific experiments conducted using Linked Data, the lack of established quality metrics, as well as the need for improved ontology alignment and query federation techniques. One core issue that needs to be addressed is the trade-off between storing data and computing them on-demand. Data that is derived from already stored data, changes frequently in space and time, or is the output of some workflow, should be computed. However, such functionality is not readily available on the Linked Data cloud today. To address this issue, we have developed a transparent SPARQL proxy that enables the on-demand computation of Linked Data together with the provenance information required to understand how the data were derived. Here, we demonstrate how the proxy works under the hood by applying it to the computation of cardinal directions between geographic features in DBpedia.".
- 57 abstract "In this demo, we stretch the capabilities of RDF as a metadata representation format, and we use it to encode digital music. Digital music is broadly used today in many professional music production environments. For decades, MIDI (Musical Instrument Digital Interface) has been the standard for digital music exchange between musicians and devices, albeit not in a Web friendly way. Here, we show the potential of expressing digital music as Linked Data, using our midi2rdf suite of tools to convert and stream digital music in MIDI format to RDF. The conversion allows for lossless round tripping: we can reconstruct a MIDI file identical to the original using its RDF representation. The streaming uses a novel generative audio matching algorithm that enables us to broadcast, with very low latency, RDF triples describing MIDI events coming from arbitrary analog instruments.".
- 6 abstract "Existing interlinking tools focus on finding similarity relationships between entities of distinct RDF datasets by generating owl:sameAs links. These approaches address the detection of equivalence relations between entities. However, in some contexts, more complex relations are required, and the links to be defined follow more sophisticated patterns. This paper introduces Link++, an approach that enables the discovery of complex links in a flexible manner. Link++ enables the users to generate rich links by defining custom functions and linking patterns that fit their needs.".
- 9 abstract "There is a need for being able to effectively demonstrate the benefits of publishing Linked Data. There are already many datasets and they are no longer limited to research based data sources. Governments and even companies start publishing Linked Data as well. However, a tool, which would be able to immediately demonstrate the Linked Data benefits to those, who still need convincing, was missing. In this paper, we demonstrate LinkedPipes Visualization, a tool based on our previous work, the Linked Data Visualization Model. Using this tool, we show four simple use cases that immediately demonstrate the Linked Data benefits. We demonstrate the value of providing dereferenceable IRIs and using vocabularies standardized as W3C Recommendations on use cases based on SKOS and the RDF Data Cube Vocabulary, providing data visualizations on one click. LinkedPipes Visualization can be extended to support other vocabularies through additional visualization components.".
- 122 abstract "The maritime security domain is challenged by a number of data analysis needs focusing on increasing the maritime situation awareness, i.e., detection and analysis of abnormal vessel behaviors and suspicious vessel movements. The need for efficient processing of dynamic and/or static vessel data that come from different heterogeneous sources is emerged. In this paper we describe how we address the challenge of combining and processing real-time and static data from different sources using ontology-based data access techniques, and we explain how the application of Semantic Web technologies increases the value of data and improves the processing workflow in the maritime domain. ".
- 14 abstract "The publishing industry is undergoing major changes. These changes are mainly based on technical developments and related habits of information consumption. Wolters Kluwer already engaged in new solutions to meet these challenges and to improve all processes of generating good quality content in the backend on the one hand and to deliver information and software in the frontend that facilitates the customer's life on the other hand. JURION is an innovative legal information platform developed by Wolters Kluwer Germany (WKD) that merges and interlinks over one million documents of content and data from diverse sources such as national and European legislation and court judgments, extensive internally authored content and local customer data, as well as social media and web data (e.g. DBpedia). In collecting and managing this data, all stages of the Data Lifecycle are present – extraction, storage, authoring, interlinking, enrichment, quality analysis, repair and publication. Ensuring data quality is a key step in the JURION data lifecycle. In this industry paper we present two use cases for verifying quality: 1) integrating quality tools in the existing software infrastructure and 2) improving the data enrichment step by checking the external sources before importing them in JURION. We open-source part of our extensions and provide a screencast with our prototype in action.".
- 158 abstract "This paper presents the WarSampo system for publishing collections of heterogeneous, distributed data about the Second World War on the Semantic Web. WarSampo is based on harmonizing massive datasets using event-based modeling, which makes it possible to enrich datasets semantically with each others' contents. WarSampo has two components: First, a Linked Open Data (LOD) service WarSampo Data for Digital Humanities (DH) research and for creating applications related to war history. Second, a semantic WarSampo Portal has been created to test and demonstrate the usability of the data service. The WarSampo Portal allows both historians and laymen to study war history and destinies of their family members in the war from different interlinked perspectives. Published in November 2015, the WarSampo Portal had some 20,000 distinct visitors during the first three days, showing that the public has great interest in these kind of applications.".
- 21 abstract "Due to the increasing amount of Linked Data openly published on the Web, user-facing Linked Data Applications (LDAs) are gaining momentum. One of the major entrance barriers for Web developers to contribute to this wave of LDAs is the required knowledge of Semantic Web technologies such as the RDF data model and SPARQL query language. This paper presents an adaptive component-based approach together with its open source implementation for creating flexible and reusable Semantic Web interfaces driven by Linked Data. Linked Data-driven (LD-R) Web components abstract the complexity of the underlying Semantic Web technologies in order to allow reuse of existing Web components in LDAs, enabling Web developers who are not experts in Semantic Web to develop interfaces that view, edit and browse Linked Data. In addition to the modularity provided by the LD-R components, the proposed RDF-based configuration method allows application assemblers to reshape their user interface for different use cases, by either reusing existing shared configurations or by creating their proprietary configurations.".
- 234 abstract "Drug-Drug Interactions (DDIs) are a major cause of preventable adverse drug reactions (ADRs), causing a significant burden on the patients' health and the healthcare system. It is widely known that clinical studies cannot sufficiently and accurately identify DDIs for new drugs before they are made available on the market. In addition, existing public and proprietary sources of DDI information are known to be incomplete and/or inaccurate and so not reliable. As a result, there is an emerging body of research on in-silico prediction of drug-drug interactions. We present Tiresias, a framework that takes in various sources of drug-related data and knowledge as inputs, and provides DDI predictions as outputs. The process starts with semantic integration of the input data that results in a knowledge graph describing drug attributes and relationships with various related entities such as enzymes, chemical structures, and pathways. The knowledge graph is then used to compute several similarity measures between all the drugs in a scalable and distributed framework. The resulting similarity metrics are used to build features for a large-scale logistic regression model to predict potential DDIs. We highlight the novelty of our proposed approach and perform thorough evaluation of the quality of the predictions. The results show the effectiveness of Tiresias in both predicting new interactions among existing drugs and among newly developed and existing drugs.".
- 37 abstract "This paper describes the development of an OWL DL ontology from the so-cial science codebook (schema) developed by the Seshat: Global History Databank. The ontology describes human history as a set of over 1500 time series variables and supports expression of uncertainty, temporal scoping, annotations and bibliographic references. The Seshat ontology was developed to transition from traditional social science data collection and storage tech-niques to an RDF-based semantic representation in order to support auto-mated generation of data entry and validation tools, data quality manage-ment processes, rich interlinking with the web of data, and management of the data curation lifecycle. This ontology engineering and codebook translation exercise identified several pitfalls in modelling social science codebooks with semantic web technologies; provided insights into the practical application of OWL DL to complex, real-world modelling challenges; and has enabled the construction of new, RDF-based tools to support the large-scale data curation effort in Seshat. The Seshat ontology exhibits a set of ontology design patterns for model-ling annotated and bibliographically referenced data, uncertainty or temporal bounds in standard RDF triplestores without requiring custom reasoning or storage engines. These patterns and the lessons learned from translating the Seshat codebook into RDF provide some generally useful guidance for the development and deployment of RDF models in the social sciences. Our ontology has been developed to support the large scale data collection and curation effort involved in building the Seshat: Global History Data-bank. OWL-based quality management will assure the data is suitable for statistical analysis. Publication of Seshat as high-quality, linked open data will enable other researchers to build on it.".
- 49 abstract "Although several tools have been implemented to generate Linked Data from raw data, users still need to be aware of the underlying technologies and Linked Data principles to use them. Mapping languages enable to detach the mapping definitions from the implementation that executes them. However, no thorough research has been conducted on how to facilitate the editing of mappings. We propose the RMLEditor, a visual graph-based user interface, which allows users to easily define the mappings that deliver the RDF representation of the corresponding raw data. Neither knowledge of the underlying mapping language nor the used technologies is required. The RMLEditor aims to facilitate the editing of mappings, and thereby lowers the barriers to create Linked Data. The RMLEditor is developed for use by data specialists who are partners of (i) companies-driven pilot and (ii) a community group. The current version of the RMLEditor was validated: participants indicate that it is adequate for its purpose and the graph-based approach enables users to conceive the linked nature of the data.".
- 50 abstract "Cultural heritage institutions have recently started to explore the added value of sharing their data, opening to initiatives that are using the Linked Open Data cloud to integrate and enrich metadata of their cultural heritage collections. However, each museum and each collection shows peculiarities which make it difficult to generalize this process and offer one-size-fits-all solutions. In this paper, we report the integration, enrichment and interlinking activities of metadata from a small collection of verbo-visual artworks in the context of the Verbo-Visual-Virtual project. We investigate how to exploit Semantic Web technologies and languages combined with natural language processing methods to transform and boost the access to documents providing cultural information, i.e., event announcements, artist descriptions, collection notices. We also discuss the open challenges raised by working with a small collection including little-known artists and the definition of what an artwork is, for which additional data can be hardly retrieved from the Web.".
- 10 abstract "In this paper we present an ongoing work on building a repository of knowledge about objects typically found in homes, their usual locations and usage. We extract an RDF knowledge base by automatically reading text on the Web and applying simple inference rules. The obtained common sense object relations are ready to be used in a domestic robotic setting, e.g. “a frying pan is usually located in the kitchen”.".
- 12 abstract "The discovery of optimal or close to optimal query plans for SPARQL queries is a difficult and challenging problem for query optimisers of RDF engines. Despite the growing volume of work on optimising SPARQL query answering, using heuristics or data statistics (such as cardinality estimations) there is little effort on the use of OWL constructs for query optimisation. OWL axioms can be the basis for the development of schema-aware optimisation techniques that will allow significant improvements in the performance of RDF query engines when used in tandem with data statistics or other heuristics. The aim of this paper is to show the potential of this idea, by discussing a diverse set of cases that depict how schema information can assist SPARQL query optimisers.".
- 15 abstract "Associations, which are one of the key ingredients of human intelligence and thinking, are not easily accessible to the Semantic Web community. High quality RDF datasets of this kind are missing. In this paper we generate such a dataset by transforming 788 K free-text associations of the Edinburgh Associative Thesaurus (EAT) into RDF. Furthermore, we provide a verified mapping of strong textual associations from EAT to DBpedia Entities with the help of a semi-automatic mapping approach. Both generated datasets are made publicly available and can be used as a benchmark for cross-type link prediction and pattern learning. ".
- 17 abstract "In operations of increasingly complex telecommunication networks, characterization of a system state and choosing optimal operation in it are challenges. One possible approach is to utilize statistical and uncertain information in the network management. This paper gives an overview of our work in which a Markov Logic Network model (MLN) is used for mobile network analysis with an RDF-based faceted search interface to monitor and control the behavior of the MLN reasoner. Our experiments, based on a prototype implementation, indicate that the combination of MLN and semantic web technologies can be effectively utilized in network status characterization, optimization and visualization.".
- 19 abstract "Dem@Home is an ambient assisted living framework to support intelligent de-mentia care, by integrating a variety of ambient and wearable sensors together with sophisticated, interdisciplinary methods, such as image and semantic analy-sis. Semantic Web technologies, such as OWL 2, are extensively employed in the framework to represent sensor observations and application domain specifics as well as to implement hybrid activity recognition and problem detection solutions. Complete with tailored user interfaces, Dem@Home supports accurate monitor-ing of multiple aspects, such as physical activity, sleep, complex daily activities and problems, leading to adaptive interventions for the optimal care of dementia, validated in four home pilots.".
- 20 abstract "An increasing number of RDF datasets are available on the Web. In order to query these datasets, users must have information about their content as well as some knowledge of a query language such as SPARQL. Our goal is to facilitate the interrogation of these datasets. In this paper, we propose an approach for enabling users to search in RDF data using keywords. Moreover, our approach allows the definition of patterns to include some external knowledge during the search process which increases the quality of the results. ".
- 23 abstract "We discus the problem of explaining relationships between two unconnected entities in a knowledge graph. We frame it as a path ranking problem and propose a path ranking mechanism that utilizes features such as specificity, connectivity, and path cohesiveness. We also report results of a preliminary user evaluation and discuss a few example results.".
- 25 abstract "TempoWordNet (TWn) has recently been proposed as an extension of WordNet, where each synset is augmented with its temporal connotation: past, present, future or atemporal. However, recent uses of TWn show contrastive results and motivate the construction of a more reliable resource. For that purpose, we propose an iterative strategy that temporally extends glosses based on TWn{t} to obtain a potentially more reliable TWn{t+1}. Intrinsic and extrinsic evaluation results show improvements when compared to previous versions of TWn.".
- 3 abstract "Traditional RDF stream processing engines work completely server-side, which contributes to a high server cost. For allowing a large number of concurrent clients to do continuous querying, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this poster, we give the overview of a client-side RDF stream processing engine on top of TPF. Our experiments show that our solution significantly lowers the server load while increasing the load on the clients. Preliminary results indicate that our solution moves the complexity of continuously evaluating real-time queries from the server to the client, which makes real-time querying much more scalable for a large amount of concurrent clients when compared to the alternatives.".
- 32 abstract "Applications built on top of the Semantic Web are emerging as a novel solution in different areas, such as – among others – decision making and route planning. However, to connect results of these solutions – i.e., the semantically annotated data – with real-world applications, this semantic data needs to be connected to actionable events. A lot of work has been done (both semantically as non-semantically) to describe and define Web services, but there is still a gap on a more abstract level, i.e., describing interfaces independent of the technology used. In this paper, we present a data model, specification, and ontology to semantically declare and describe functions independently of the used technology. This way, we can declare and use actionable events in semantic applications, without restricting ourselves to programming language-dependent implementations. The ontology allows for extensions, and is proposed as a possible solution for semantic applications in various domains.".
- 33 abstract "Fast and correct identification of named entities in queries is crucial for query understanding and to map the query to information in structured knowledge base. Most of the existing work have focused on utilizing search logs and manually curated knowledge bases for entity linking and often involve complex graph operations and are generally slow. We describe a simple, yet fast and accurate, probabilistic entity-linking algorithm used in enterprise settings where automatically constructed, domain specific Knowledge Graphs are used. In addition to the linked graph structure, textual evidence from the domain specific corpus is also utilized to improve the performance.".
- 35 abstract "Many solutions have been developed to convert data to RDF. A common task during this conversion is applying data manipulation functions to obtain the desired output. Depending on the data format, one can rely on the underlying technology, such as RDBMS for relational databases or XQuery for XML, to manipulate – to a certain extent – the data while generating RDF. For CSV files, however, there is no such underlying technology. One has to resort to pre- or post-processing techniques when data manipulation is needed, which renders the process of generating RDF more complex (in terms of number of steps), and therefore also less traceable and transparent. Another solution is to declare functions in mappings. KR2RML provides data manipulation functions as part of the mapping, but due to its complex format, it is difficult to create or maintain mappings without their editor. In this paper, we propose a method to incorporate functions into mapping languages in a more amenable way. ".
- 39 abstract "Building rich axiomatic ontologies automatically is a step towards the realization of the Semantic Web. In this paper, we describe an automatic approach to extract complex classes’ axioms from Wikipedia definitions based on recurring syntactic structures. The objective is to enrich DBpedia concept descriptions with formal definitions. We leverage RDF to build a sentence representation and SPARQL to model patterns and their transformations, thus easing the querying of syntactic structures and the reusability of the extracted patterns. Our preliminary evaluation shows that we obtain satisfying results, which will be further improved.".
- 41 abstract "In an attempt to put a Semantic Web-layer that provides linguistic analysis and discourse information on top of digital content, we develop a platform for digital curation technologies. The platform offers language-, knowledge- and data-aware services as a flexible set of workflows and pipelines for the efficient processing of various types of digital content. The platform is intended to enable human knowledge workers to get a grasp and understand the contents of large document collections in an efficient way so that they can curate, process and further analyse the collection according to their sector-specific needs.".
- 42 abstract "Embedded markup based on Microdata, RDFa, and Microformats have become prevalent on the Web and constitute an unprecedented source of data. However, RDF statements extracted from markup are fundamentally different to traditional RDF graphs: entity descriptions are flat, facts are highly redundant and granular, and co-references are very frequent yet explicit links are missing. Therefore, carrying out typical entity-centric tasks such as retrieval and summarisation cannot be tackled sufficiently with state of the art methods. We present an entity summarisation approach that overcomes such issues through a combination of entity retrieval and summarisation techniques geared towards the specific challenges associated with embedded markup. We perform a preliminary evaluation on a subset of the Web Data Commons dataset and show improvements over existing entity retrieval baselines. In addition, an investigation into the coverage and complementary of facts from the constructed entity summaries shows potential for aiding tasks such as knowledge base population.".
- 47 abstract "This paper presents an approach for automatically validating candidate hierarchical relations extracted from parallel enumerative structures. It relies on the discursive properties of these structures and on the combination of external resources of different nature, a semantic network and a distributional resource. The results show an accuracy of between .5 and .67 depending on the experimental settings.".
- 51 abstract "We address the problem of ranking relationships in an automatically constructed knowledge graph. We propose a probabilistic ranking mechanism that utilizes entity popularity, entity affinity, and support from text corpora for the relationships. Results obtained from preliminary experiments on a standard dataset are encouraging and show that our proposed ranking mechanism can find more informative and useful relationships compared to a frequency based approach. ".
- 53 abstract "TheSemanticWebDogFood(SWDF)is the reference linked dataset of Semantic Web community about papers, people, organisations, and events related to its academic conferences. In this paper we analyse the existing problems, of generating, representing and maintaining Linked Data for the SWDF. Accordingly, we discuss a refactoring of the Semantic Web Conference Ontology by adopting best ontology design practices (e.g., Ontology Design Patterns, ontology reuse and interlink- ing). We regenerate metadata for a a set of conferences already existing in SWDF, using cLODg (conference Linked Open Data generator), an Open Source workflow which adopts the proposed refactoring.".
- 65 abstract "The increase of ICT infrastructure in hospitals offer opportunities for cost reduction by optimizing workflows, while maintaining quality of care. This work-in-progress poster details the AORTA system, which is a semantic platform to optimize transportation task scheduling and execution in hospitals. It provides a dynamic scheduler with an up-to-date view about the current context by performing semantic reasoning on the information provided by the available software tools and smart devices. Additionally, it learns semantic rules based on historical data in order to avoid future delays in transportation time.".
- 106 abstract "The problem of updating ontologies has received increased attention in recent years. In the approaches proposed so far, either the update language is restricted to (sets of) atomic updates, or, where the full SPARQL update language is allowed, the TBox language is restricted so that no inconsistencies can arise. In this paper we discuss directions to overcome these limitations. Starting from a DL-Lite fragment covering RDFS and concept disjointness axioms, we define three semantics for SPARQL update: under cautious semantics, inconsistencies are resolved by rejecting updates potentially introducing conflicts; under brave semantics, instead, conflicts are overridden in favor of new information where possible; finally, the fainthearted semantics is a compromise between the former two approaches, designed to accommodate as much of the new information as possible, as long as consistency with the prior knowledge is not violated. We show how these semantics can be implemented in SPARQL via rewritings of polynomial size and draw first conclusions from their practical evaluation.".
- 111 abstract " The Semantic Web is founded on a number of Formal Languages (FL) whose benefits are precision, lack of ambiguity, and ability to automate reasoning tasks such as inference or query answering. This however poses the challenge of mediation between machines and users because the latter generally prefer Natural Languages (NL) for accessing and authoring knowledge. In this paper, we introduce the design pattern N<A>F based on Abstract Syntax Trees (AST), Huet's zippers and Montague grammars to zip together a natural language and a formal language. Unlike question answering, translation does not go from NL to FL, but as symbol N<A>F suggests, from ASTs (A) of an intermediate language to both NL (N<A) and FL (A>F). ASTs are built interactively and incrementally through a user-machine dialog where the user only sees NL, and the machine only sees FL.".
- 113 abstract "In media monitoring tasks users have a clearly defined information need to find so far unknown statements regarding certain entities or relations mentioned in natural-language text. However, commonly used keyword-based search technologies are focused on finding relevant documents and cannot judge the novelty of statements contained in the text. In this work, we propose a new semantic novelty measure that allows to retrieve statements, which are both novel and relevant, from natural-language sentences, e.g., found in news articles. Relevance is defined by a semantic query of the user, while novelty is ensured by checking whether the extracted statements are related, but non-existing in a knowledge base containing the currently known statements. An evaluation performed on English news texts and on CrunchBase as the knowledge base demonstrates the effectiveness, unique capabilities and future challenges of this novel approach to novelty.".
- 117 abstract "Tracking the provenance of information published on the Web is of crucial importance for effectively supporting trustworthiness, accountability and repeatability in the Web of Data. Although extensive work has been done on computing the provenance for SPARQL queries, little research has been conducted forthe case of SPARQL updates. This paper proposes a new provenance model that borrows properties from both how and where provenance models, and is suitable for capturing the triple and attribute level provenance of data introduced via SPARQL updates. To the best of our knowledge, this is the first model that deals with the provenance of SPARQL updates using algebraic expressions, in the spirit of the well-established model of provenance semirings. Additionally, we present an algorithm that records the provenance of SPARQL update results, and a reconstruction algorithm that uses this provenance to identify a SPARQL update that is compatible to the original one, given only the recorded provenance. Our approach is implemented and evaluated on top of Virtuoso Database Engine.".
- 130 abstract "In this paper, we present an approach for automatically recommending categories for geospatial entities, based on already existing annotated entities. Our goal is to facilitate the annotation process in crowdsourcing map initiatives such as OpenStreetMap, so that more accurate annotations are produced for the newly created spatial entities, while at the same time increasing the reuse of already existing tags. We select and construct a set of training features to represent the attributes of the geospatial entities and to capture their relation with the categories they are annotated with. These features include spatial, textual and semantic properties of the entities. We evaluate four different approaches, namely SVM, kNN, clustering+SVM and clustering+kNN, on several combinations of the defined training features and we examine which configurations of the algorithms achieve the best results. The presented work is deployed in OSMRec, a plugin for the JOSM tool that is commonly used for editing content in OpenStreetMap.".
- 139 abstract "Product ads are a popular form of search advertizing offered by major search engines, including Yahoo, Google and Bing. Unlike traditional search ads, product ads include structured product specifications, which allow search engine providers to perform better keyword-based ad retrieval. However, the level of completeness of the product specifications varies and strongly influences the performance of ad retrieval. On the other hand, online shops are increasing adopting semantic markup languages such as Microformats, RDFa and Microdata, to annotate their content, making large amounts of product description data publicly available. In this paper, we present an approach for enriching product ads with structured data extracted from thousands of online shops offering Microdata annotations. In our approach we use structured product ads as supervision for training feature extraction models able to extract attribute-value pairs from unstructured product descriptions. We use these features to identify matching products across different online shops and enrich product ads with the extracted data. Our evaluation on three product categories related to electronics show promising results in terms of enriching product ads with useful product data.".
- 140 abstract "With the increasing volume of RDF data, the diverse links and the large amount of linked entities make it diffcult for users to traverse the Linked Data. As semantic link and class of linked entities are two key facets to help users navigate, clustering links and classes can offer effective ways of navigating over Linked Data. In this paper, we propose a co-clustering approach to provide users with an iterative entity navigation. It clusters both links and classes simultaneously utilizing both the relationship between link and class, and the intra-link relationship and intra-class relationship. We evaluate our approach on a real-world data set and the experimental results demonstrate the effectiveness of our approach. A user study is conducted in a prototype system to show that our approach provides useful support for iterative entity navigation.".
- 148 abstract "In recent years, an increasing number of semantic data sources are published on the Web. These sources are further interlinked to form the Linking Open Data (LOD) cloud. To make full use of these data sets, it is a strong demand to learn their data qualities. Researchers have proposed several metrics and developed tools to measure qualities of the data sets in LOD from different dimensions. However, there exist few studies on evaluating data set quality from users' usability perspective while usability has great impacts on the spreading and re-use of LOD data sets. On the other hand, usability is well-studied in the area of software quality. In the newly published standard ISO/IEC 25010, usability is further broadened to include the notion “quality in use'' besides the other two factors namely internal and external. In this paper, we firstly adapt the notions and the methods used in software quality to assessing the data set quality. Secondly, we formally define two quality dimensions namely Queriability and Informativity from the perspective of quality in use. The two proposed dimensions correspond to querying and answering, which are the most frequent usage scenarios for accessing LOD data sets. Then we provide a series of metrics to measure the two dimensions. At last, we apply the metrics to two representative data sets in LOD (i.e., YAGO and DBpedia). In the evaluating process, we select dozens of questions from both QALD and WebQuestions, and ask a group of users to construct queries as well as to check answers with the help of our usability testing tool. The findings during the assessment not only illustrate the capability of our method and metrics, but also give new insights of data quality of the two knowledge bases.".
- 161 abstract "Entity disambiguation is the task of mapping ambiguous terms in natural-language text to its entities in a knowledge base. It finds its application in the extraction of structured data in RDF (Resource Description Framework) from textual documents, but equally so in facilitating artificial intelligence applications, such as Semantic Search, Reasoning and Question & Answering. In this work, we propose DoSeR (Disambiguation of Semantic Resources), a (named) entity disambiguation framework that is knowledge-base-agnostic in terms of RDF (e.g. DBpedia) and entity-annotated document knowledge bases (e.g. Wikipedia). Initially, our framework automatically generates semantic entity embeddings given one or multiple knowledge bases. Then, DoSeR accepts documents with a given set of surface forms as input and collectively links them to an entity in a knowledge base with a graph-based approach. We evaluate DoSeR on seven different data sets against publicly available, state-of-the-art (named) entity disambiguation frameworks. Our approach outperforms the state-of-the-art approaches that make use of RDF knowledge bases and/or entity-annotated document knowledge bases by up to 10% F1 measure.".
- 163 abstract "Latent embedding models, based for example on matrix and tensor factorization, are the basis of state-of-the art statistical solutions for modelling Knowledge Graphs and Recommender Systems to predict new links between known entities and/or relations. To be able to perform predictions for new entities and/or relation types, however, the model often has to be retrained completely to derive the new latent embeddings. This could be a potential limitation for data sets when fast predictions for new entities and relation types are required. In this paper we propose approaches that can map new entities into the existing latent embedding space learned from factorization models. Without retraining of any kind, our model is solely based on the observable ---even incomplete--- features of the new entities e.g. a subset of observed links to other known entities. We show that these mapping approaches are efficient and are applicable to a wide variety of existing factorization models, including nonlinear models. We perform experiments on multiple real-world datasets and evaluate the performances from different aspects.".
- 166 abstract "This paper deals with an ontology-driven approach for semantic annotation of documents from a corpus where each document describes an entity of a same domain. Our goal is to annotate each document with concepts being too specific to be explicitly mentioned into the text. The only thing we know about the concepts is their label. They have no definitions. Moreover, their characteristics in the texts are incomplete. We propose an ontology-based approach, named SAUPODOC, aiming to perform this particular annotation process by combining several approaches. Indeed, SAUPODOC relies on a domain ontology relative to the field under study, which has a pivotal role, on its population with property assertions coming from documents and external resources, and its enrichment with formal specific concept definitions. Experiments have been carried out in two application domains, showing the benefit of the approach compared to well-known classifiers.".
- 168 abstract "In this paper, we investigate the Normalized Semantic Web Distance (NSWD), a semantics-aware distance measure between two concepts in a knowledge graph. Our measure advances the Normalized Web Distance, a recently established distance between two textual terms, to be more semantically aware. In addition to the theoretic fundamentals of the NSWD, we investigate its properties and qualities with respect to computation and implementation. We investigate three variants of the NSWD that make use of all semantic properties of nodes in a knowledge graph. Our performance evaluation based on the Miller-Charles benchmark shows that the NSWD is able to correlate with human similarity assessments on both Freebase and DBpedia knowledge graphs with values up to 0.69. Moreover, we verified the semantic awareness of the NSWD on a set of 20 unambiguous concept-pairs. We conclude that the NSWD is a promising measure with (1) a reusable implementation across knowledge graphs, (2) sufficient correlation with human assessments, and (3) awareness of semantic differences between ambiguous concepts.".
- 17 abstract "The Web of data is growing continuously with respect to both the size and number of the datasets published. Porting a dataset to five-star Linked Data however requires the publisher of this dataset to link it with the already available linked datasets. Given the size and growth of the Linked Data Cloud, the current mostly manual approach used for detecting relevant datasets for linking is obsolete. We study the use of topic modelling for dataset search experimentally and present TAPIOCA, a linked dataset search engine that provides data publishers with similar existing datasets automatically. Our search engine uses a novel approach for determining the topical similarity of datasets. This approach relies on probabilistic topic modelling to determine related datasets by relying solely on the metadata of datasets. We evaluate our approach on a manually created gold standard and with a user study. Our evaluation shows that our algorithm outperforms a set of comparable baseline algorithms including standard search engines significantly by 6% F1-score. Moreover, we show that it can be used on a large real world dataset with a comparable performance.".
- 174 abstract "Natural Language Query Formalization involves semantically parsing queries in natural language and translating them into their corresponding formal representations. It is a key component for developing question-answering (QA) systems on RDF data. The chosen formal representation language in this case is often SPARQL. In this paper, we propose a novel framework, called AskNow, where users can pose queries in English to a target RDF knowledge base (e.g. DBpedia), which are first normalized into an intermediary canonical syntactic form, called Normalized Query Structure (NQS), and then translated into SPARQL queries. NQS facilitates the identification of the desire (or expected output information) and the user-provided input information, and establishing their mutual semantic relationship. At the same time, it is sufficiently adaptive to query paraphrasing. We have empirically evaluated the framework with respect to the syntactic robustness of NQS and semantic accuracy of the SPARQL translator on standard benchmark datasets.".
- 177 abstract "Document retrieval is the task of returning relevant textual resources for a given user query. In this paper, we investigate whether the semantic analysis of the query and the documents, obtained exploiting state-of-the-art Natural Language Processing techniques (e.g., Entity Linking, Frame Detection) and Semantic Web resources (e.g., Yago, DBpedia), can improve the performances of the traditional term-based similarity approach. Our experiments, conducted on a recently released document collection, show that Mean Average Precision (MAP) increases of around 4 percentage points when combining textual and semantic analysis, thus suggesting that semantic content can effectively improve the performances of Information Retrieval systems.".
- 179 abstract "Geo-ontologies are becoming first-class artifacts in spatial data management because of their ability to represent places and points of interest. Several general-purpose geo-ontologies are available and widely employed to describe spatial entities across the world. The cultural, contextual and geographic differences between locations, however, call for more specialized and spatially-customized geo-ontologies. In order to help ontology engineers in (re)engineering geo-ontologies, spatial data analytics can provide interesting insights on territorial characteristics, thus revealing peculiarities and diversities between places. In this paper we propose a set of spatial analytics methods and tools to evaluate existing instances of a general-purpose geo-ontology within two distinct urban environments, in order to support ontology engineers in two tasks: (1) the identification of possible location-specific ontology restructuring activities, like specializations or extensions, and (2) the specification of new potential concepts to formalize neighborhood semantic models. We apply the proposed approach to datasets related to the cities of Milano and London extracted from LinkedGeoData, we present the experimental results and we discuss their value to assist geo-ontology engineering.".
- 184 abstract "It is very challenging to access the knowledge expressed within (big) data sets. Question answering (QA) aims at making sense out of data via a simple-to-use interface. However, QA systems are very complex and earlier approaches are mostly singular and monolithic implementations for QA in specific domains. Therefore, it is cumbersome and inefficient to design and implement new or improved approaches, in particular as many components are not reusable. Hence, there is a strong need for enabling best-of-breed QA systems, where the best performing components are combined, aiming at the best quality achievable in the given domain. Taking into account the high variety of functionality that might be of use within a QA system and therefore reused in new QA systems, we provide an approach driven by a core QA vocabulary that is aligned to existing, powerful ontologies provided by domain-specific communities. We achieve this by a methodology for binding existing vocabularies to our core QA vocabulary without re-creating the information provided by external components. We thus provide a practical approach for rapidly establishing new (domain-specific) QA systems, while the core QA vocabulary is re-usable across multiple domains. To the best of our knowledge, this is the first approach to open QA systems that is agnostic to implementation details and that inherently follows the linked data principles.".
- 188 abstract "Linked Data Fragment (LDF) approach promotes a new trade-off between performance and data availability for querying Linked Data. If data providers’ HTTP caches plays a crucial role in LDF performances, LDF clients are also caching data during SPARQL query processing. Unfortunately, as these clients do not collaborate, they cannot take advantage of this large decentralized cache hosted by clients. In this paper, we propose CyCLaDEs an overlay network based on LDF fragments similarity. For each LDF client, CyCLaDEs builds a neighborhood of LDF clients hosting related fragments in their cache. During query processing, LDF clients try to resolve the query in the neighborhood and eventually request the LDF server, if neighborhood cannot answer. Experimental results show that CyCLaDEs is able to handle a significant amount of LDF query processing and provide a more specialized cache on client-side.".
- 191 abstract "Assessing the relatedness of documents is at the core of many applications such as document retrieval and recommendation. Most similarity approaches operate on word-distribution based document representations - fast to compute, but problematic when documents differ in language, vocabulary or type and neglecting the rich relational knowledge available in Knowledge Graphs. In contrast, graph-based document models can leverage valuable knowledge about relations between entities - however, due to expensive graph operations, similarity assessments tend to become infeasible in many applications. This paper presents an efficient semantic similarity approach exploiting explicit hierarchical and transversal relations. We show in our experiments that (i) our similarity measure provides a significantly higher correlation with human notions of document similarity than comparable measures, (ii) this also holds for short documents with few annotations, (iii) document similarity can be calculated efficiently compared to other graph-traversal based approaches.".