Matches in ScholarlyData for { ?s <https://w3id.org/scholarlydata/ontology/conference-ontology.owl#abstract> ?o. }
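For reference, a minimal SELECT query that would produce subject/abstract pairs like the matches listed below. This is a sketch only: it assumes a standard SPARQL 1.1 endpoint over the ScholarlyData dump, and the prefix label `conf` and the `LIMIT` are illustrative choices rather than part of the original pattern.

```sparql
# Sketch: list resources and their abstracts using the predicate from the pattern above.
# The prefix label "conf" and LIMIT 100 are illustrative assumptions.
PREFIX conf: <https://w3id.org/scholarlydata/ontology/conference-ontology.owl#>

SELECT ?s ?o
WHERE {
  ?s conf:abstract ?o .
}
LIMIT 100
```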
- 10 abstract "Music is not a patio chair or a cutlery set, and it should not be recommended the same way. Music is a highly complex signal that is part of our cultural framework and has a prominent place in our everyday life. Yet dealing with the millions of songs we have access to and finding the ones that will enliven our day is essential. What could once be solved by advice from one’s best friends has evolved into a large-scale internet-era problem, and the music information retrieval community needs to step up to the challenge. We introduce the Million Song Dataset Challenge to improve the state-of-the-art and unify the community behind a common evaluation. The goal is to predict the songs a user will listen to, given the user’s listening history and full information (meta-data, content analysis) of all songs. We explain the need for such a contest, the data we use, our goals and design choices, and present baseline experimental results using simple, off-the-shelf recommendation algorithms.".
- 11 abstract "Reliable evaluation of Information Retrieval systems requires large amounts of relevance judgments. Making these annotations is quite complex and tedious for many Music Information Retrieval tasks, so performing such evaluations requires too much effort. A low-cost alternative is the application of Minimal Test Collection algorithms, which offer quite reliable results while significantly reducing the annotation effort. The idea is to incrementally select what documents to judge so that we can compute estimates of the effectiveness differences between systems with some degree of confidence. In this paper we show a first approach towards its application to the evaluation of the Audio Music Similarity and Retrieval task. An analysis with the MIREX 2009 and 2011 data shows that the judging effort can be reduced to about 30-40% to obtain results with 95% confidence.".
- 12 abstract "In this paper we present our work towards developing a large-scale web application for digitizing, recognizing (via optical music recognition), correcting, displaying, and searching printed music texts. We present the results of a recently completed prototype implementation of our workflow process, from document capture to presentation on the web. We discuss a number of lessons learned from this prototype. Finally, we present some open-source Web 2.0 tools developed to provide essential infrastructure components for making searchable printed music collections available online. Our hope is that these experiences and tools will help in creating next-generation globally accessible digital music libraries.".
- 15 abstract "This paper aims at leveraging microblogs to address two challenges in music information retrieval (MIR), similarity estimation between music artists and inferring typical listening patterns at different granularity levels (city, country, global). From two collections of several million microblogs, which we gathered over ten months, music-related information is extracted and statistically analyzed. We propose and evaluate four co-occurrence-based methods to compute artist similarity scores. Moreover, we derive and analyze culture-specific music listening patterns to investigate the diversity of listening behavior around the world.".
- 2 abstract "In this paper we propose a hybrid music recommender system, which combines usage and content data. We describe an online evaluation experiment performed in real time on a commercial music web site, specialised in content from the very long tail of music content. We compare it against two stand-alone recommenders, the first system based on usage and the second one based on content data. The results show that the proposed hybrid recommender shows advantages with respect to usage- and content-based systems, namely, higher user absolute acceptance rate, higher user activity rate and higher user loyalty.".
- 4 abstract "The emergence of social tagging websites such as Last.fm has provided new opportunities for learning computational models that automatically tag music. Researchers typically obtain music tags from the Internet and use them to construct machine learning models. Nevertheless, such tags are usually noisy and sparse. In this paper, we present a preliminary study that aims at refining (retagging) social tags by exploiting the content similarity between tracks and the semantic redundancy of the track-tag matrix. The evaluated algorithms include a graph-based label propagation method that is often used in semi-supervised learning and a robust principal component analysis (PCA) algorithm that has led to state-of-the-art result in matrix completion. The result indicates that robust PCA with content similarity constraint is particularly effective; it improves the robustness of tagging against three types of synthetic errors and boosts the recall rate of music auto-tagging by 7% in a real-world setting.".
- 5 abstract "Many sound-related applications use Mel-frequency cepstral coefficients (MFCC) to describe audio timbral content. Most of the research efforts dealing with MFCCs have been focused on the study of different classification and clustering algorithms, the use of complementary audio descriptors, or the effect of different distance measures. The goal of this paper is to focus on the statistical properties of the MFCC descriptor itself. For that purpose, we use a simple encoding process that maps a short-time MFCC vector to a dictionary of binary code-words. We study and characterize the rank-frequency distribution of such MFCC code-words, considering speech, music, and environmental sound sources. We show that, regardless of the sound source, MFCC code-words follow a shifted power-law distribution. This implies that there are a few code-words that occur very frequently and many that happen rarely, with no typical or characteristic code-word in the distribution. We also observe that the inner-structure of the most frequent code-words has characteristic patterns. For instance, we observe that close MFCC coefficients tend to have similar quantization values in the case of music signals. Finally, we study the rank-frequency distributions of individual music recordings and show that they present the same type of heavy-tailed distribution as found in the large-scale databases. This fact is exploited in two supervised semantic inference tasks: genre and instrument classification. In particular, we obtain similar classification results as the ones obtained by considering all frames in the recordings by just using 50 (properly selected) frames. Beyond this particular example, we believe that the fact that MFCC frames follow a power-law distribution could potentially have important implications for future applications dealing with audio signals.".
- 6 abstract "In this paper we compare the use of different musical representations for the task of version identification (i.e.~retrieving alternative performances of the same musical piece). We automatically compute descriptors representing the melody and bass line using a state-of-the-art melody extraction algorithm, and compare them to a harmony-based descriptor. The similarity of descriptor sequences is computed using a dynamic programming algorithm based on nonlinear time series analysis which has been successfully used for version identification with harmony descriptors. After evaluating the accuracy of individual descriptors, we assess whether performance can be improved by descriptor fusion, for which we apply a classification approach, comparing different classification algorithms. We show that both melody and bass line descriptors carry useful information for version identification, and that combining them increases version detection accuracy. Whilst harmony remains the most reliable musical representation for version identification, we demonstrate how in some cases performance can be improved by combining it with melody and bass line descriptions. Finally, we identify some of the limitations of the proposed descriptor fusion approach, and discuss directions for future research.".
- 9 abstract "This work builds on and responds to previous publications on adaptation of similarity measures to user voting data from the MagnaTagATune dataset. The similarity dataset presented by Stober and Nürnberger at AMR 2011 has been reproduced to test other approaches in a comparable way. On this set, we compare their two-level approach, defining similarity measures on individual facets and combining them in a linear model, to the Metric Learning to Rank (MLR) algorithm which adapts a measure that operates directly on low-level features. We compare the different algorithms, features and parameter spaces with regards to minimising constraint violations. Furthermore, the effectiveness of the MLR algorithm in generalising over unknown similarity data is evaluated on this dataset. We explore the effects of feature choice. Here, we found that the binary genre data showed little correlation with the similarity data, but combined with audio features it clearly improved generalisation.".
- 1 abstract "Community-based Question and Answering (CQA) services have brought users to a new era of knowledge dissemination by allowing users to ask questions and to answer other users’ questions. However, due to the fast increasing of posted questions and the lack of an effective way to find interesting questions, there is a serious gap between posted questions and potential answerers. This gap may degrade a CQA service’s performance as well as reduce users’ loyalty to the system. To bridge the gap, we present a new approach to Question Routing, which aims at routing questions to participants who are likely to provide answers. We consider the problem of question routing as a classification task, and develop a variety of local and global features which capture different aspects of questions, users, and their relations.Our experimental results obtained from an evaluation over the Yahoo! Answers dataset demonstrate high feasibility of question routing. We also perform a systematical comparison on how different types of features contribute to the final result and show that question-user relationship features play a key role in improving the overall performance.".
- 11 abstract "One of the important targets of community-based question answering (CQA)services, such as Yahoo! Answers, Quora and Baidu Zhidao, is to maintain and evenincrease the number of active answerers, that is the users who provide answersto open questions. The reasoning is that they are the engine behind satisfiedaskers, which is the overall goal behind CQA. Yet, this task is not an easy one.Indeed, our empirical observation shows that many users provide just one or twoanswers and then leave.In this work we try to detect answerers that are about to quit, a task known aschurn prediction, but unlike prior work, we focus on new users. To address thetask of churn prediction in new users, we extract a variety of features to modelthe behavior of Yahoo! Answers users over the first week of their activity, including personal information, rate of activity, and social interaction with other users. Several classifiers trained on the data show that there is a statisticallysignificant signal for discriminating between users who are likely to churn andthose who are not. A detailed feature analysis shows that the two most importantsignals are the total number of answers given by the user, closely related tothe motivation of the user, and attributes related to the amount of recognitiongiven to the user, measured in counts of best answers, thumbs up and positiveresponses by the asker.".
- 13 abstract "We develop an innovative approach to delivering relevant information using a combination of socio-semantic search and filtering approaches. The goal is to facilitate timely and relevant information access through the medium of conversations by mixing past community specific conversational knowledge and web information access to recommend and connect users and information together. Conversational Information Access is a socio-semantic search and recommendation activity with the goal to interactively engage people in conversations by receiving agent supported recommendations. It is useful because people engage in online social discussions unlike a solitary search; the agent brings in relevant information as well as identifies relevant users; participants provide feedback during the conversation that the agent uses to improve it's recommendations.".
- 2 abstract "Users tend to ask and answer questions in community question answering (CQA) service to seek information and share knowledge. A corollary is that myriad of questions and answers appear in CQA service. Accordingly, volumes of studies have been taken to explore the answer quality so as to provide a preliminary screening for better answers. However, to our knowledge, less attention has so far been paid to question quality in CQA. Knowing question quality provides us with finding and recommending good questions together with identifying bad ones which hinder the CQA service. In this paper, we are conducting two studies to investigate the question quality issue. The first study analyzes the factors of question quality and finds that the interaction between askers and topics results in the differences of question quality. Based on this finding, in the second study we propose a Mutual Reinforcement-based Label Propagation (MRLP) algorithm to predict question quality. We experiment with Yahoo! Answers data and the results demonstrate the effectiveness of our algorithm.".
- 5 abstract "Answer ranking is very important for cQA services due to the high variance in the quality of answers. Most existing works in this area focus on using various features or employing machine learning techniques to address this problem. Only a few of them noticed and involved user profile information in this particular task. In this work, we assume the close relationship between user profile information and the quality of their answers under the ground truth that user information records the user behaviors and histories as a summary. Thus, we exploited the effectiveness of three categories of user profile information, i.e. engagement-related, authority-related and level-related, on answer ranking in cQA. Different from previous work, we only employed the information which is easy to extract without any limitations, such as user privacy. Experimental results on Yahoo! Answers manner questions showed that our system by using the user profile information achieved comparable or even better results over the state-of-the-art baseline system. Moreover, we found that the picture existence of a user in cQA community contributed more than other information in the answer ranking task.".
- 6 abstract "Community Question Answering (CQA) services, such as Yahoo! Answers, are specifically designed to address the innate limitation of Web search engines by helping users obtain information from a community. Understanding the user intent of questions would enable a CQA system identify similar questions, find relevant answers, and recommend potential answerers more effectively and efficiently. In this paper, we propose to classify questions into three categories according to their underlying user intent: subjective, objective, and social. In order to identify the user intent of a new question, we build a predictive model through machine learning based on both text and metadata features. Our investigation reveals that these two types of features are conditionally independent and each of them is sufficient for prediction. Therefore they can be exploited as two views in co-training - a semi-supervised learning framework - to make use of a large amount of unlabelled questions, in addition to the small set of manually labelled questions, for enhanced question classification. The preliminary experimental results show that co-training works significantly better than simply pooling these two types of features together.".
- 8 abstract "Recently, query suggestions are quite useful in web searches. Most of them provide additional and correct terms based on the initial query entered by users. However, users can not be satisfied with them, when user has ambiguous or diverse information requests. In those cases, faceted query expansions along with their usage are quite efficient.In this paper, faceted query expansion methods using Community QuestionAnswering(CQA) resources are proposed. CQA is one of the social networkservice(SNS) which aims to share user knowledge. In a CQA site, users can post their questions in a suitable category. And others pick up and answer the questions according to the category framework. Thus, the "category" in CQA makes "facet" of query expansion. In addition, the season of question posting plays a important role in understanding its context. Thus, the "seasonality" makes another "facet" of query expansion.Therefore, we implement the two-dimensional faceted query expansion methods based on the results of LDA( Latent Dirichlet Allocation ) analysis to the CQA resources. Moreover, the question-articles deriving query expansion will be provided for choosing appropriate terms by users.Our sophisticated evaluations using actual and long-term CQA resources, such as "Yahoo! CHIEBUKURO", demonstrate that most part of CQA questions are posted in periodicity and bursts.".
- 9 abstract "Community Question Answering (CQA) websites provide a rapidly growing source of information in many areas. This rapid growth while offering new opportunities, puts forward new challenges. In most CQA implementations, there is little effort in directing new questions to the right group of experts. This means that experts are not be provided with questions matching their expertise, and therefore new matching questions may be missed and not receive a proper answer. We focus on finding experts for a newly posted question. We investigate the suitability of two statistical topic models for solving this issue and compare these methods against more traditional Information Retrieval approaches. We show that for a dataset constructed from the Stackoverflow website, these topic models outperform other methods in retrieving a candidate set of best experts for a question. We also show that the Segmented Topic Model gives consistently better performance compared to the Latent Dirichlet Allocation Model.".
- A-1004 abstract "The University of North Texas (UNT) Libraries recently revised their Metadata Input Guidelines in order to improve usability and accessibility for metadata writers, and to enhance the quality of metadata that drives new features in their digital systems. This paper describes important considerations in the revision process and also demonstrates the relationship between quality metadata and system functionality that ultimately benefits both metadata creators and system end-users. Keywords: metadata; input guidelines; schemas; system functionality; quality control; faceted searching".
- A-1005 abstract "The Federal Reserve wanted to use RSS to represent not only news, such as press releases, but also data, such as exchange rates. The Fed hoped to use one set of feeds to accommodate two different audiences for RSS, human readers (at one remove) and self-contained automated processes. While the different RSS specifications provide elements for traditional news items, they require extensions to handle data. Since central banks all tend to report the same sorts of information, the Fed joined with other central banks to create an extended specification that met their needs. This specification extends RSS 1.0, which is the more readily extended RSS specification. The extension uses elements from established metadata standards wherever it can, such as for language and audience, and adds elements when subjects are not found in those standards or are more particular to central banks, such as (monetary) currency. Although the central banks intend these new elements to be used primarily by machine processes, the element names have sufficient semantic transparency so that they can be understood by human readers.".
- A-1007 abstract "This paper introduces metadata issues in the framework of the WICRI project, a network of semantic wikis for communities in research and innovation. A wiki can be related to an institution, to a research field (mainly, environment or ICT at this time), or to a regional entity. Metadata and semantic items play a strategic role in maintaining the quality and consistency of the network. An important point concerns the “wiki way of working”, in which a metadata specialist and a scientist familiar with abstract formalisms can work together, at the same time, on the same pages. Some first experiments in designing metadata are presented. A wiki encyclopedia of metadata is proposed, and related technical issues are discussed.".
- A-1009 abstract "The current mandate to digitize all collections at the Smithsonian Institution along with the increasing need to share data and increase access to collections has made it essential to establish institutional metadata standards, including those for embedding metadata. This paper documents the ongoing process of establishing core embedded metadata within the institution through the work of the Smithsonian Embedded Metadata Group, which is pan-institutional in nature and includes museums, libraries, archives, and research institutes. The focus of the working group described within this paper is the creation of core embedded metadata fields for use in still images.".
- A-1010 abstract "To ensure that they can participate in the Semantic Web, libraries need to prepare their legacy metadata for use as linked data. eXtensible Catalog (XC) software facilitates converting legacy library data into linked data using a platform that enables risk-free experimentation and that can be used to address problems with legacy metadata using batch services. The eXtensible Catalog also provides “lessons learned” regarding the conversion of legacy data to linked data by demonstrating what MARC metadata elements can be transformed to linked data, and helping to suggest priorities for the cleanup and enrichment of legacy data. Converting legacy metadata to linked data will require a team of experts, including MARC-based catalogers, specialists in other metadata schemas, software developers, and Semantic Web experts to design and test normalization/conversion algorithms, develop new schemas, and prepare individual records for automated conversion. Library software applications that do not depend upon linked data may currently have little incentive to enable its use. However, given recent advances in registering legacy library vocabularies, converting national library catalogs to linked data, and the availability of open source software such as XC to convert legacy data to linked data, libraries may soon find it difficult to justify continuing to create metadata that is not linked data compliant. The library community can now begin to propose smart practices for using linked data, and can encourage library system developers to implement linked data. XC is demonstrating that implementing linked data, and converting legacy library data to linked data, are indeed achievable.".
- A-1013 abstract "Linked entity data in metadata records builds a foundation for the Semantic Web. Even though metadata records contain rich entity data, there is no linking between associated entities such as persons, datasets, projects, publications, or organizations. We conducted a small experiment using the dataset collection from the Hubbard Brook Ecosystem Study (HBES), in which we converted the entities and their relationships into RDF triples and linked the URIs contained in RDF triples to the corresponding entities in the Ecological Metadata Language (EML) records. Through a transformation program written in the Extensible Stylesheet Language (XSL), we turned a plain EML record display into an interlinked semantic web of ecological datasets. The experiment suggests the methodological feasibility of incorporating linked entity data into metadata records. The paper also argues for the need to change the scientific as well as the general metadata paradigm.".
- A-1018 abstract "Data growth in the environmental sciences has resulted in multidimensional datasets that are heterogeneous and extensive. Scientific academic research includes scalar, sensor, or vector data, which may be publicly available. The datasets generated extend to local environmental groups whose trained citizens contribute to the surveillance of local habitats and ecological conditions that can potentially enhance various data analyses on a national and international level. While the abundance of environmental data is growing, tools to select, compare, and utilize the growing number of datasets generated from multiple institutions and groups are not keeping pace. This paper focuses on planning the construction of a dataset visualization that concentrates on the use of metadata to facilitate the identification, selection, and comparison of dataset information. It is presented in the visualization framework at the School of Information Sciences called VIBE (Visual Information Browsing Environment) and plans to adapt the Dublin Core Metadata Element Set as a basis for its development. In the long term, visualization may emerge not only as a primary tool for modeling environmental scientific metadata, but also as a mechanism used at the incipience of environmental scientific discovery.".
- A-1024 abstract "According to the Singapore Framework, any development of a Dublin Core Application Profile (DCAP) has to include the creation of a domain model. The DC Scholarly Works Application Profile (SWAP) was the first one explicitly using the Functional Requirements for Bibliographic Records (FRBR) model in creating its domain model. FRBR has recently been extended with Functional Requirements for Authority Data (FRAD) and Functional Requirements for Subject Authority Data (FRSAD), thus forming the so-called FRBR family. This paper first further develops the SWAP domain model to incorporate the FRBR family models. Then a generalized FRBR-family-based DCAP domain model is presented to be used as the basis for specific domain application profiles.".
- A-1032 abstract "Question and answer sets are the core of clinical research. The [RD] PRISM (Patient Registry Item Specifications and Metadata for Rare Disease) project will provide a library of standardized questions across a broad spectrum of rare diseases that can be used for developing new registries and revising existing ones. Questions will be encoded using well-established clinical terminologies to enable cross-indication and cross-disease analyses, facilitate collaboration, and generate meaningful results for rare disease patients, physicians, and researchers. Encoded question and answer sets will also be indexed to facilitate information retrieval by subject matter, data type, and time interval. This project will outline issues and challenges related to indexing questions for future use and for data sharing, explore possible metadata and terminological standards for indexing them, and determine whether Dublin Core (DC) is a viable alternative to be explored in a library of standardized rare disease research questions.".
- A-1033 abstract "This article announces the availability of a crosswalk between ONIX 2.1 and MARC 21 developed by OCLC and illustrates how it is used in the OCLC Metadata for Publishers project. To accomplish the goal of merging library and publisher metadata and anticipating the need to mine MARC records for other purposes, the design of the crosswalk, the corresponding software, and the application take records apart and process the fields individually, creating data streams that match the intended use of the ONIX standard and resemble the pre-Internet paradigm of Electronic Data Interchange, or EDI, for describing materials and tracking them through a supply chain. Though this design works well enough to support commercial-grade processes, problems arise with mappings between physical descriptions in the two standards, which need to be more rigorously modeled or closely aligned. Nevertheless, the RDA/ONIX Framework, which is reviewed here, promises to reduce this obstacle.".
- A-1037 abstract "The Variations/FRBR project at Indiana University is experimenting with implementing the Functional Requirements for Bibliographic Records (FRBR) conceptual model in order to further research on next-generation library catalogs and promote the re-usability and interoperability of FRBR-based metadata. This paper describes the use of FRBR in some system implementations, discusses the first steps our project has taken to promote shared FRBRized data, and raises some issues related to representing FRBRized data in Dublin Core Application Profiles.".
- A-1041 abstract "The idea that metadata, particularly Dublin Core, could be usable as a Lego™-like construction kit has been a popular suggestion for over a decade. In this paper, we first explore what this metaphor originally meant – why the idea is so appealing, and what design lessons we might take from the idea. We take a look at how close we are today to that ideal, looking at examples of real-world metadata design projects, and suggest that at present the situation is often more analogous to a game of Tetris – that is, the construction kit is sometimes limited, time concerns are often an issue, and there is limited opportunity for creativity. We explore patterns of collaboration in existing projects, such as the Scholarly Works Application Profile development. Finally, we ask how what we know about the process of building a shared understanding and formalisation about a domain can help us come closer to the ideal of Dublin Core as an approachable puzzle-game or construction kit.".
- A-1043 abstract "The DCMI One-to-One Principle holds that related but conceptually different entities, such as a photograph and a digital image of that photograph, should be represented by separate metadata records. In practice, however, large numbers of practitioners do not adhere to this principle and commonly mix elements representing two related entities in a single metadata record. This paper explores reasons why this is the case, why it is problematic, how the principle itself would benefit from greater clarity, and some practical options for maintaining the principle in current systems, with the advantages and disadvantages of each. The paper focuses on the widespread application context of small to medium-sized cultural heritage institutions digitizing unique local resources, creating metadata using digital collection software packages such as CONTENTdm, and exposing only simple Dublin Core metadata for OAI harvesting and aggregating.".
- A-1044 abstract "This paper reports on a project named “General Rules of National Digital Library Metadata”, which builds a metadata application framework for the National Library of China (NLC). It aims to address the application of DC in Chinese digital libraries, developing a series of related standards, criteria and platforms to meet the requirements of describing, organizing, managing, serving and preserving Chinese digital objects. It supports producing, processing, organizing, releasing, preserving and managing information resources in the digital library system of the NLC, and thereby achieves interoperability and data sharing with other digital library systems to the greatest extent possible. The project outcomes have two parts: a metadata application framework and principles for the National Digital Library of China, based on the work of DCMI and other leading international metadata projects, and a conversion program. Through this project, we are trying to find best practices of metadata application for developing digital libraries in China.".
- A-1046 abstract "As the Dublin Core Metadata Initiative celebrates its 15th anniversary, the Government of Canada (GC) celebrates its 10th year of making information easier to find. The Government of Canada officially adopted the Dublin Core as its core metadata standard for Web resource discovery in 2001. Soon the Government of Canada started to develop domain-specific metadata beyond Web and resource discovery to meet wider information needs. Supported by standards and other policy instruments, rapid metadata developments were made in the areas of records management, Web content management, e-learning, executive correspondence and geospatial data. The Government of Canada has been an active participant in the DC-Government Working Group, and organized its own event, the Canadian Metadata Forum, in 2003 and 2005. More recently, the Government of Canada has adopted an enterprise information architecture (EIA) approach to metadata, within a larger information management strategy. The Government of Canada now has plans underway to develop other metadata domains, registries and repositories, its own namespace facility, and a vast awareness campaign to brand metadata as the “DNA of Government”.".
- A-996 abstract "The OCLC CONTENTdm Metadata Working Group was formed in response to research demonstrating the need for guidelines and best practices for creating quality Dublin Core metadata, which is useful to the primary user community, but also “shareable” outside of the local context. The CONTENTdm Metadata Working Group has worked since August 2009 in a test environment to identify best practices for creating Dublin Core metadata in CONTENTdm and mapping to MARC for sharing in WorldCat.org via the WorldCat Digital Collection Gateway, a self-service OAI harvesting tool. The first best practices document identified 12 core metadata fields and four recommended fields “as appropriate,” and provides guidelines for field content standardization and mapping. Concerns such as adherence to the Dublin Core one-to-one principle and the recording of original and digital dates and publishers are discussed, along with recommendations for configuring metadata for the Digital Collection Gateway, which was developed to increase the shareability of metadata.".
- A-997 abstract "PolicyArchive collects public policy research from over 700 known research publishers and makes these documents accessible in a navigable digital library. The contributions of thousands of publications from these providers enable in-depth secondary source materials to be utilized by policymakers, legislators, foundations, scholars, journalists, and educators. The functionality of this digital repository is discussed, including the use of terminologies, subject navigability, and Special Collections. PolicyArchive presents unique content with structured metadata which is openly accessible; the application of these principles not only provides coordinated access to previously unavailable resources, but also allows the reader to place a given document in multiple contexts. Analysis of this information environment illuminates ongoing digital library initiatives regarding the creation of navigable, accessible learning resources.".
- A-1002 abstract "Objective: To describe the gaps in existing vocabularies and taxonomies that are used to retrieve literature on health issues for LGBT adolescents in order to build a working ontology of appropriate terms. Methods: A formal literature search on healthcare concerns of LGBT adolescents was conducted, using indices to the literature of the health sciences, the social and psycho-social sciences, and the information sciences. Standard subject heading lists and controlled vocabularies such as MeSH®, the Thesaurus of Psychological Index Terms®, the ERIC Thesaurus®, and the MedlinePlus® Consumer Vocabulary contain no terms that adequately describe this literature, making precise retrieval difficult. A search using keywords and text words yielded 80 articles, and a careful reading of the articles prompted this effort to develop an ontology of “gay-sensitive” terms from the consumer informatics perspective. Results: A first-step model of LGBT terms, derived from the published research literature, is presented, which offers a more appropriate set of terms to use when searching the multi-disciplinary literature that reports current research on health concerns of LGBT adolescents. If an ontology can be developed, tested, and described for this topic, it will add to the sparse literature on consumer terminology for informatics applications.".
- A-1006 abstract "Individuals who wish to develop digital scholarly works and libraries that wish to provide access to their precious and fragile holdings have an interest in digitizing premodern manuscripts. These handmade objects are often beautiful and each one is unique. The features of zooming and light alteration available through digital photography and manipulation are assets to medieval scholars, because these methods can reveal more information for teaching and research. Digital collections of medieval manuscripts can be difficult to find or remain unpublished, in part because the description of these works is difficult. Involvement in the International Congress on Medieval Studies at Kalamazoo, Michigan has made it clear to librarians at WMU that both smaller institutions (which may hold only one or two items) and individual scholars wish to provide appropriate metadata for digitized manuscripts, but do not have the combination of technical and subject skills needed. Even with a good description of material from a bookseller or printed catalog, those unfamiliar with metadata schema and language may find it daunting, while libraries may lack a specialist in the terminology and skills of paleography and codicology. In addition, most existing large digital collections use TEI, which has a steep learning curve. The goal of this project is to develop a standardized and user-friendly Dublin Core application profile (Heery & Patel, 2000; Coyle & Baker, 2009) which uses elements from the European Networking Resources and Information Concerning Cultural Heritage (ENRICH) Specification (Burnard, 2008) and Dublin Core (DCMI, 2008) to create a metadata profile which works well with the long-standing conventions of premodern manuscript descriptive codicology and paleography (Ricci, 1935-1940; Ker, 1969). The ENRICH Specification includes elements standardizing traditional manuscript description with the goal of creating “seamless access to information about the vast collections of manuscripts and incunables distributed across major European libraries (Cummings & Burnard, 2009).” However, the size of the specification can be formidable, and requires knowledge of TEI and XML (a steep learning curve in itself), a searchable XML platform, and the expertise to set it up and sustain it. A Dublin Core application profile developed specifically for premodern manuscript description would allow for the creation of standardized, shareable metadata and Web-accessible digital images within easy-to-use digital collection management software such as CONTENTdm. Inclusion of defined ENRICH elements in the profile provides a “fill-in-the-blank” template informing non-specialists of descriptive metadata useful to medieval scholars. This, combined with suggested content standards and a simple glossary, should allow catalogers with limited subject expertise to provide some access to their materials in a way which conforms to the expectations of the target users, specialists in the interdisciplinary study of the premodern world. In addition, if the collection attracts scholars, the descriptive metadata can be easily updated and modified from their research. This poster will illustrate how digital images of manuscripts and traditional print catalog descriptions translate easily into the Premodern Manuscript Application Profile (PMAP), which is in development.".
- A-1008 abstract "Within the cultural heritage community, it is increasingly common to distinguish the tasks of identifying and addressing an object by using a location-independent Persistent Identifier (PI) such as a URN, a DOI, or a Handle linked to a URL describing the object's location in an institutional repository or a digital long-term preservation system run by a national library. This way, the problem that a digital object becomes inaccessible when the content provider moves it to a different location is solved, since the object can still be found using the PI. In order to resolve a PI to a URL with the object location, a resolution service is required, which is usually run by the national library acting as the legal deposit for the digital object. This requires the user to know which national library is responsible for the service, which is a problem for digital library portals collecting metadata from content providers in different cultural settings and from different nations. For the European cultural heritage portal Europeana, it was decided to implement a metaresolver – the Europeana Resolution Discovery Service (ERDS) – which collects all PI resolution requests and dispatches them to the proper national resolver service. The development of this metaresolver is part of the Europeana sister project EuropeanaConnect and is scheduled for completion in July 2010.".
- A-1015 abstract "AGROVOC is one of the most important resources covering the terminology of all subjects of interest to the Food and Agriculture Organization of the United Nations (FAO), including agriculture, forestry, fisheries, food and related domains. AGROVOC is a multilingual thesaurus developed by FAO and the Commission of the European Communities in the early 1980s. Since then it has continuously been updated by FAO in collaboration with partner organizations in different countries, and is now available online in 19 languages. AGROVOC is currently being converted from a traditional term-based knowledge organization system (KOS) to a concept-based system (Soergel, 2004), the AGROVOC Concept Server (CS). The CS allows the representation of more semantics such as specific relationships between concepts as well as relationships between their multilingual lexicalizations. Its functions include being a resource to help structure and standardize agricultural terminology in multiple languages for use by any number of different users and systems around the world. An enabling tool, the AGROVOC Concept Server Workbench (ACSW), has been developed by FAO in collaboration with Kasetsart University in Thailand and other partners. It supports the maintenance of the CS data in a distributed environment (Sini, 2008). One of the goals of the project is to set up a network of international experts who can share the collaborative maintenance and extension of the AGROVOC CS, and thus support the creation of agricultural knowledge much more efficiently. The ACSW is part of the larger Agricultural Ontology Service (AOS) initiative and the first major step towards an "Ontology Service" (Fisseha, 2001), which aims to provide semantic-based services to users in the agricultural domain. To cover all agriculture-related information, ACSW needs integrated vocabularies.".
- A-1016 abstract "This poster reports on a project that responds to institutional needs for the storage and preservation of teacher and student memories captured in digital learning material. It also proposes a system for the retrieval of digital learning material through adequate metadata, taking advantage of full-text content. The main objective of the project is to provide adequate storage, preservation and retrieval of digital learning material through appropriate metadata tags and the selection of pertinent digital collection management software. Some of the benefits are the exposure, sharing and preservation of digital learning materials produced by the university community and the use of open software. The barriers found were a lack of regular practices regarding copyright issues and institutional policies. The most important result of the project is a system for the storage and retrieval of digital learning material based on Greenstone, with a metadata application profile based on Dublin Core, Learning Object Metadata and local labels. Besides, this experience permits the conceptualization of a model for developing application profiles for other digital collections. In the future we hope to design, develop and offer to the community a social network system to facilitate the description, social tagging and sharing of digital learning material, which will improve the current system's characteristics.".
- A-1022 abstract "This poster describes how we created a set of collection-level metadata for Taiwan’s digital collections, seeking to explore new facets of knowledge organization, facilitating the searching of information for users and the administering of resources for collection managers.".
- A-1031 abstract "This case study proposes a scenario with three topic-related thesauri, which have been connected with bilateral cross-concordances as part of a major terminology mapping initiative in the project KoMoHe (Mayr et al., 2008). The thesauri have already been or will be converted to SKOS, and in order not to omit the relevant crosswalks, the mapping properties of SKOS will be used for modeling them adequately. The participating thesauri in this approach are: (i) TheSoz (Thesaurus for the Social Sciences, GESIS), which has been converted to SKOS in a first experimental version (Zapilko et al., 2009); an update is underway which will be oriented on the introduced SKOS extensions of the EUROVOC thesaurus (Smedt, 2009) and will additionally use SKOS-XL; (ii) STW (Standard-Thesaurus for Economics, ZBW), which has also been published in SKOS format (Neubert, 2009); and (iii) IBLK-Thesaurus (SWP). Currently, the conversion of vocabularies to SKOS is an active research area, but there are still unsolved and relevant issues which could not yet be treated satisfactorily. Our approach focuses on the application of existing crosswalks to the SKOS mapping properties and the establishment of a linked data application based on those connected thesauri.".
- A-994 abstract "The specific goal of this project is to examine and compare how library users access, use, and interact with two social discovery systems used in two Canadian public library systems. Transaction log analysis will be conducted to answer the following research questions: a) How do public library users interact with social discovery systems? Specifically, which enhanced catalogue features do they use, e.g., faceted navigation, user-contributed content such as tagging, reviews, and ratings, sorting features, etc., and with which frequency? b) How does usage between the two social discovery systems compare? Specifically, are there commonalities or differences between how public library users use different social discovery systems? and c) Does the use of social discovery systems change over time? Specifically, is the use of the features in social discovery systems consistent over time?".
- 1 abstract "We describe an on-line environment in which the ontology development process can be performed collaboratively in a Wiki-like fashion. To start the construction (or the extension) of an ontology, the user can exploit a domain corpus, from which the terminological component of the system automatically extracts a set of domain-specific key-concepts. These key-concepts are further disambiguated in order to be linked to existing external resources and obtain additional information such as the concept definition, the synonyms and the hypernyms. Finally, the user can easily select through the interface which concepts should be imported into the ontology. The system support several ontology engineering tasks, including (i) boosting the construction or extension of ontologies, (ii) terminologically evaluating and ranking ontologies, and (iii) ranking the concepts defined in an ontology according to their relevance with respect to the domain described by the corpus.".
- 10 abstract "RightField is a Java application that provides a mechanism for embedding ontology annotation support for scientific data in Microsoft Excel or Open Office spreadsheets. The result is semantic annotation by stealth, with an annotation process that is less error-prone, more efficient, and more consistent with community standards. By automatically generating RDF statements for each cell a rich, Linked Data querying environment allows scientists to search their data and other Linked Data resources interchangeably, and caters for queries across heterogeneous spreadsheets. RightField has been developed for Systems Biologists but has since adopted more widely. It is open source (BSD license) and freely available from http://www.rightfield.org.uk.".
- 109 abstract "We have implemented a novel approach for robust ontology design from natural language texts by combining Discourse Representation Theory (DRT), linguistic frame semantics, and ontology design patterns. We show that DRT-based frame detection is feasible by conducting a comparative evaluation of our approach and existing tools. Furthermore, we define a mapping between DRT and RDF/OWL for the production of quality linked data and ontologies, and present FRED, an online tool for converting text into internally well-connected and linked-data-ready ontologies in web-service-acceptable time.".
- 11 abstract "This demo builds on ATRUST, a probabilistic model of trust aimed to assist peers for query answering in semantic P2P systems. We illustrate the usage of ATRUST in a P2P bookmarking scenario where peers exchange URLs of articles about topics they are interested in. Unlike classical bookmarking systems (e.g. Delicious), in this ideal P2P bookmarking network information is no longer centralised, and peers need to query each other to gather new articles. Further, each peer uses her own taxonomy of categories for indexing URLs of articles. We highlight the gain in the quality of peers' answers ---measured with precision and recall--- when the process of query answering is guided by ATRUST. As a particular case, we show how trust overcomes homonymy. Moreover, a trust-based ranking of articles allows to distinguish the articles relevant to a category from the ones related to its homonymous categories.".
- 12 abstract "The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The motivation behind NIF is to allow NLP tools to exchange annotations about text documents in RDF. Other than more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. NIF wrappers have to be only created once for a particular tool and can subsequently interoperate with a potentially large number of other tools without additional adaptations. In this paper, we present 1. the currently implemented NIF Wrappers, which are available as free web services and 2. a GUI called the NIF Combinator, which allows to combine the output of the implemented NIF web services.".
- 3 abstract "LODE, the Live OWL Documentation Environment, is a service for the generation of human-readable documentation of OWL ontologies and RDFS vocabularies. It automatically extracts classes, object properties, data properties, named individuals, annotation properties, meta-modelling (punning), general axioms, SWRL rules and namespace declarations and renders them as an HTML page designed for easy brows- ing and navigation by means of embedded links. In this paper, we present an overview of the tool, in particular focusing on the features introduced in the latest available version.".
- 4 abstract "In this paper, we present the capability of our ontology matching tool YAM++. We show that YAM++ is able to discover mappings between entities of given two ontologies by using machine learning approach. Besides, we also demonstrate that if the training data are not available, YAM++ can discover mappings by using information retrieval techniques. Finally, we show that YAM++ is able to deal with multi-lingual ontologies matching problem.".
- 4 abstract "Using a foundational ontology for domain ontology development is beneficial in theory and practice. However, developers have difficulty with choosing the appropriate foundational ontology, and why. In order to solve this problem, a comprehensive set of criteria that influence foundational ontology selection has been compiled and the values for each parameter determined for DOLCE, BFO, GFO, and SUMO. This paper-based analysis is transformed into an easily extensible algorithm and implemented in the novel tool ONSET, which helps a domain ontology developer to choose a foundational ontology through interactive selection of preferences and scaling of importance so that it computes the most suitable foundational ontology for the domain ontology and explains why this selection was made. This has been evaluated in an experiment with novice modellers, which showed that ONSET greatly assists in foundational ontology selection.".
- 5 abstract "The high expressivity of OWL allows to express the same conceptualisation in different ways. In this demo paper we present tools supporting semi-automatic transformation of the modelling style of existing ontologies, which will alleviate the structural problems related to ontology matching, merging and processing by various tools. On the top of the PatOMat Transformation Framework (computational core of the approach), user-friendly visual tools have been designed: the Graphical User Interface for Pattern-based Ontology Transformation (GUIPOT), the Transformation Wizard for ontology adaptation to a content pattern (within the XDtools ontological engineering framework, as a plugin for the NeOn toolkit), and a Transformation Pattern Editor (TPE).".
- 66 abstract "Ontology quality can be affected by the difficulties involved in ontology modelling, which may lead to the appearance of anomalies in ontologies. This situation creates the need to validate ontologies, that is, to assess their quality and correctness. Ontology validation is a key activity in different ontology engineering scenarios such as development and selection. This paper contributes to the ontology validation activity by proposing a web-based tool, called OOPS!, independent of any ontology development environment, for detecting anomalies in ontologies. This tool will help developers improve ontology quality by automatically detecting potential errors.".
- 68 abstract "Simple Knowledge Organization System (SKOS) vocabularies are commonly used to represent lightweight conceptual vocabularies such as taxonomies, classifications and thesauri on the Web of Data. We identified 11 criteria for evaluating the validity and quality of SKOS vocabularies. We then analyzed 14 such vocabularies against the identified criteria and found most of them to contain structural errors. Our tool, Skosify, can be used to automatically validate SKOS vocabularies and correct many problems, helping to improve their quality and validity.".
- 7 abstract "UTILIS (Updating Through Interaction in Logical Information Systems), introduced in a research paper at EKAW'12, is an interactive process to help users create new objects in a RDF graph. While creating a new object, relaxation rules are applied to its current description to find similar objects, whose properties serve as suggestions to expand the description. UTILIS is implemented in Sewelis, a system that reconciles the expressiveness of querying languages (e.g., SPARQL), and the benefits of exploratory search found in faceted search. The same interaction principles are used for both exploration and creation of semantic data. We illustrate the UTILIS approach by applying Sewelis to the semantic annotation of comic panels, reusing the dataset that was used for a user evaluation.".
- 8 abstract "This demonstration will present a system, called I-CAW, which aggregates content from social spaces into a semantic-enriched data browser to promote informal learning. The work pioneers a new way to interact with social content using nudges (in the form signposts and prompts) that exploit ontlogies and semantically augmented content. The results of a user study with I-CAW suggest that semantic nudges are a fruitful way to enhance the interaction in semantic browsers in order to facilitate learning. The demonstration will offer hands-on experience with I-CAW following the settings from the user study.".
- 132 abstract "Knowledge management on the desktop is not a recent challenge. It has been around in one way or another ever since the desktop emerged as a life (and work)-changing device. It has been around even before that, foreseen by visionaries like Bush, Engelbart and Nelson. Their ideas for solutions have been taken up by many projects in the fields of PIM and KM. Semantic Web technologies have been regarded as a game-changer, and applying them to PIM has resulted in the Semantic Desktop. Many Semantic Desktops have been created over time, each focusing on specific or generic problems, on restricted areas like email or task management, or on providing a general solution. Mostly they have not received the uptake they envisioned. This paper describes representative Semantic Desktop systems. We explore their similarities and differences, the features they provide, as well as some common shortcomings and sensitive areas.".
- 54 abstract "One of the major obstacles for a wider usage of Web Data is the difficulty to obtain a clear picture of the available datasets. In order to reuse, link, revise or query a dataset published on the Web it is important to know the structure, coverage and coherence of the data. In order to obtain such information we developed LODStats - a statement-stream-based approach for gathering comprehensive statistics about datasets adhering to the Resource Description Framework (RDF). LODStats is based on the declarative description of statistical dataset characteristics. Its main advantages over other approaches are a smaller memory footprint and significantly better performance and scalability. We integrated LODStats into the CKAN dataset metadata registry and obtained a comprehensive picture of the current state of the Data Web.".
- 58 abstract "In this paper, we report on the experience gained during the implementation of a therapeutic process, i.e. a guideline, for automatically ventilating patients in Intensive Care Units. The Semantic Wiki KnowWE was used as a collaborative development platform for domain specialists, knowledge and software engineers, and reviewers. We applied the graphical guideline language DiaFlux to represent medical expertise about mechanical ventilation in a flowchart-oriented manner. Finally, this computerized guideline was embedded seamlessly into, and executed autonomously by, a mechanical ventilator. We experienced that the use of semantic wikis effectively supports collaborative, geographically distributed development and validation of knowledge-based systems. The graphically oriented knowledge representation of the therapeutic process was easily graspable for the domain specialists involved, and allowed interactive discussions as well as flexible and quick modifications.".
- 90 abstract "In this paper we introduce LODE, the Live OWL Documentation Environment, an online service that automatically generates a human-readable description of any OWL ontology (or, more generally, an RDF vocabulary), taking into account both ontological axioms and annotations, and ordering these with the appearance and functionality of a W3C Recommendations document. This documentation is presented to the user as an HTML page with embedded links for ease of browsing and navigation. We have tested LODE’s completeness and usability by recording the success of test users in completing tasks of ontology comprehension, and here present the results of that study.".
- 101 abstract "The number of available domain ontologies is increasing over time. However, there is still a huge amount of data stored and managed with RDBMSs. We propose a method for learning association rules from both sources of knowledge in an integrated way. The extracted patterns can be used for data analysis, knowledge completion and ontology refinement.".
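To make the rule-mining step concrete: a minimal sketch of extracting binary association rules with support and confidence. The transactions are made-up stand-ins for itemized records drawn from an RDBMS or from ontology instance data, and the thresholds are illustrative:

```python
# Naive one-to-one association rule extraction (X => Y) over transactions.
from itertools import permutations

transactions = [
    {"aspirin", "headache"},
    {"aspirin", "headache", "fever"},
    {"ibuprofen", "fever"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

min_support, min_confidence = 0.3, 0.6
for x, y in permutations(set().union(*transactions), 2):
    s = support({x, y})
    if s >= min_support:
        c = s / support({x})
        if c >= min_confidence:
            print(f"{x} => {y}  (support={s:.2f}, confidence={c:.2f})")
```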
- 57 abstract "In this position paper we identify nichesourcing, a specific form of human-based computation that harnesses the computational efforts of niche groups rather than the ``faceless crowd''. In the past six years, crowdsourcing has achieved mainstream recognition as a cheap and fast way to collect large quantities of data. However, experiences in different domains have shown that crowdsourcing has serious drawbacks with respect to supporting complex user tasks and achieving the right level of quality in the result. Moreover, crowdsourcing initiatives require continuous attention in terms of maintaining a highly motivated crowd. We claim that nichesourcing combines the strengths of the crowd with those of professionals, optimizing the result of human-based computation for certain tasks. We illustrate our claim using scenarios in two domains: cultural heritage and regreening in Africa. The contribution of this paper is to provide a definition of the main characteristics of nichesourcing as a natural extension of crowdsourcing and to outline research challenges for realizing nichesourcing applications.".
- 87 abstract "Mining social media for opinions is important to governments and businesses. Current approaches focus on sentiment and opinion detection. Yet, people also justify their views, giving arguments. Understanding arguments in social media would yield richer knowledge about the views of individuals and collectives. Extracting arguments from social media is difficult. Messages appear to lack indicators for argument, document structure, or inter-document relationships. In addition, social media have lexical variety, alternative spellings, multiple languages, and alternative punctuation. Social media also encompasses numerous genres. These aspects can confound the extraction of well-formed knowledge bases of argument. We first chart out the various aspects in order to isolate them for further analysis and processing.".
- 13 abstract "This poster addresses the EKAW 2012 knowledge management and special focus areas by presenting the requirements and architecture for a manageable dataset curation tool designed to enable low-overhead hosting of new public knowledge models. Our work contributes to the development of new methodologies and tools for knowledge management by presenting a knowledge administration process that reduces administrator effort while supporting distributed communities of administrators, authors and contributors. This is in contrast to most work to date on knowledge sharing, which focuses on easing the publication and consumption of the managed knowledge. At the heart of our architecture are components that translate, authorise and queue messy, real-world model update requests into SPARQL Update queries that can leverage previous research on ontology evolution.".
- 14 abstract "Verifying whether an ontology meets the set of established requirements is a crucial activity in ontology engineering. In this sense, methods and tools are needed (a) to transform (semi-)automatically functional ontology requirements into SPARQL queries, which can serve as unit tests to verify the ontology, and (b) to check whether the ontology fulfils the requirements. Thus, our purpose in this poster paper is to apply the SWIP approach to verify whether an ontology satisfies the set of established requirements.".
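A minimal sketch of the underlying idea, turning a functional requirement into a SPARQL ASK query that acts as a unit test. The ontology file, namespace and competency question are illustrative assumptions, not the SWIP approach itself:

```python
# A requirement ("every project must have a coordinator") as a unit test:
# the ASK query looks for a counterexample, so the test passes on False.
from rdflib import Graph

g = Graph()
g.parse("ontology.ttl", format="turtle")  # hypothetical ontology under test

ask_violation = """
PREFIX ex: <http://example.org/onto#>
ASK {
    ?p a ex:Project .
    FILTER NOT EXISTS { ?p ex:hasCoordinator ?c }
}
"""
assert not g.query(ask_violation).askAnswer, "requirement not fulfilled"
```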
- 15 abstract "In this poster paper we present an overview of knOWLearn, a novel approach for building domain ontologies in a semi-automatic fashion.".
- 17 abstract "Managing knowledge about scientific findings is crucial. First of all, knowing what has been done before is essential in targeting new research. Secondly, a collection of formalized facts can itself serve as a source for new research. In this paper we discuss and present an approach to represent such scientific findings regarding research about the environment, i.e. the Earth. We focus on spatial and temporal aspects, and use deforestation in the Brazilian Amazon Rainforest as a case to illustrate our approach.".
- 18 abstract "Organic.Lingua is a project aiming to deliver a Web portal for Sustainable Agricultural and Environmental Education that is able to provide automated multilingual services for facilitating the cross-lingual retrieval and the multilingual construction of agricultural content. In such a global picture a key role is played by the collaborative multilingual construction and evolution of conceptual models supporting the content lifecycle within the portal.".
- 20 abstract "The task of entity retrieval becomes increasingly prevalent as more and more structured information about entities is available on the Web in various forms such as documents embedding metadata (RDF, RDFa, Microdata, Microformats). International benchmarking campaigns, e.g., the Text REtrieval Conference or the Semantic Search Challenge, propose entity-oriented search tracks. This reflects the need for an effective search and discovery of entities. In this work, we present a multi-valued attribute model for entity retrieval which extends and generalises existing field-based ranking models. Our model introduces the concept of multi-valued attributes and enables attribute- and value-specific normalization and weighting. Based on this model we extend two state-of-the-art field-based rankings, i.e., BM25F and PL2F, and demonstrate through evaluations over heterogeneous datasets that this model significantly improves retrieval performance compared to existing models. Finally, we introduce query-dependent and query-independent weights specifically designed for our model which provide significant performance improvement.".
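The core of the model can be sketched compactly: each value of a multi-valued attribute is length-normalized on its own before term frequencies are combined with attribute weights, in the style of BM25F. The parameters, weights and toy entity below are illustrative, not the paper's tuned values:

```python
# BM25F-style scoring with per-value normalization for multi-valued attributes.
import math

def mva_score(term, entity, weights, avg_len, k1=1.2, b=0.75, N=1000, df=10):
    pseudo_tf = 0.0
    for attr, values in entity.items():
        for value in values:                    # one token list per value
            tf = value.count(term)
            norm = 1 - b + b * len(value) / avg_len[attr]  # per-value normalization
            pseudo_tf += weights[attr] * tf / norm
    idf = math.log((N - df + 0.5) / (df + 0.5))
    return idf * pseudo_tf / (k1 + pseudo_tf)

entity = {"label": [["linked", "open", "data"]],
          "subject": [["data"], ["web", "science"]]}
print(mva_score("data", entity,
                weights={"label": 2.0, "subject": 1.0},
                avg_len={"label": 3.0, "subject": 2.0}))
```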
- 21 abstract "This paper describes an approach and framework based on semantic web and linked data technologies to assess medical data collected by patient monitoring equipment. Our approach has been successfully applied to patient data in the neuro-intensive care domain.".
- 22 abstract "The infectious disease domain brings together many practitioners working at different levels of granularity, and thus with different perspectives (biological, clinical and epidemiological) on the same phenomenon. The biomedical perspective deals with pathogens (e.g. their life cycle) while the clinical perspective deals with hosts (e.g. healthy or infected by the pathogen). In the epidemiological perspective, an infectious disease is characterized according to the way (causes, contamination process, etc.) it spreads in a population over space and time. This knowledge is then used in epidemiological monitoring systems to control the evolution of disease spreading and to prevent its emergence. In this poster we provide an ontological analysis of the epidemiological monitoring domain that aims to show how an Infectious Disease Ontology can support infectious disease crisis management in various ways and at different phases of the monitoring process.".
- 23 abstract "In 2009, a large survey was conducted to collect journal articles and conference papers concerning the Wikipedia project. The findings of this survey were published but are, as of now, only partially analyzed. The results showed continuous growth in the number of journal articles related to Wikipedia. The number of conference papers grew until five years after Wikipedia's founding, but afterwards began to decrease year by year. With these numbers in mind, we began our survey and started to interpret the data and find reasons for these characteristics. The poster will show the results of a systematic review of journal articles and conference papers about the DBpedia project. DBpedia extracts structured content from Wikipedia and republishes this content in a semantically understandable way. This allows one to ask comprehensive and sophisticated questions on top of the Wikipedia data. DBpedia is one of the most prominent data sets in the Linked Open Data cloud and has been used in a plethora of articles since its foundation in 2007. The methodology comprises three main steps: (1) data retrieval, (2) processing and (3) analysis. The process of data retrieval included four sources: Google Scholar, Arnetminer, the Semantic Web Conference Corpus and a manual search. We based our work on criteria developed in the Wikipedia study. The main criterion was that the term dbpedia occurs in the title or – if supported by the source – the abstract of the publication. Furthermore, we reduced the set of publications to only conference papers and journal articles, which were all peer reviewed.".
- 24 abstract "Vocabularies are an integral part of Linked Data (LD). They are published as browsable services for humans, as data dumps, as LD services, and as SPARQL endpoints for machines. Vocabulary services, such as concept lookup via term searches and hierarchical browsing, are used by indexers describing documents and searchers looking for suitable keywords. Basing vocabulary services on SPARQL endpoints containing SKOS vocabularies brings several benefits. We present the ONKI Light on SPARQL system, demonstrating how vocabulary services can be implemented using only SPARQL endpoints.".
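A minimal sketch of such a vocabulary lookup implemented directly against a SPARQL endpoint, using SPARQLWrapper; the endpoint URL and search prefix are placeholders:

```python
# Term search over a SKOS vocabulary exposed only as a SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")  # hypothetical endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
    ?concept skos:prefLabel ?label .
    FILTER STRSTARTS(LCASE(STR(?label)), "agri")
}
LIMIT 10
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["concept"]["value"], row["label"]["value"])
```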
- 109 abstract "We have implemented a novel approach for robust ontology design from natural language texts by combining Discourse Representation Theory (DRT), linguistic frame semantics, and ontology design patterns. We show that DRT-based frame detection is feasible by conducting a comparative evaluation of our approach and existing tools. Furthermore, we define a mapping between DRT and RDF/OWL for the production of quality linked data and ontologies, and present FRED, an online tool for converting text into internally well-connected and linked-data-ready ontologies in web-service-acceptable time.".
- 11 abstract "Most knowledge sources on the Data Web were extracted from structured or semi-structured data. Thus, they encompass solely a small fraction of the information available on the document-oriented Web. In this paper, we present BOA, a bootstrapping strategy for extracting RDF from text. The idea behind BOA is to extract natural-language patterns that represent predicates found on the Data Web from unstructured data by using background knowledge from the Data Web. These patterns are used to extract instance knowledge from natural-language text. This knowledge is finally fed back into the Data Web, therewith closing the loop. The approach followed by BOA is quasi independent of the language in which the corpus is written. We demonstrate our approach by applying it to four different corpora and two different languages. We evaluate BOA on these data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with very high accuracy. Moreover, we provide the first multilingual repository of natural-language representations of predicates found on the Data Web.".
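The bootstrapping idea can be illustrated in a few lines: known (subject, object) label pairs for a predicate are located in raw sentences, and the string between them is kept as a candidate natural-language pattern. The pairs and sentences below are made-up examples, not BOA's actual pipeline:

```python
# Toy bootstrapped pattern extraction for a "capital of" predicate.
import re
from collections import Counter

known_pairs = [("Berlin", "Germany"), ("Paris", "France")]
sentences = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Paris lies in the north of France.",
]

patterns = Counter()
for subj, obj in known_pairs:
    for sent in sentences:
        m = re.search(re.escape(subj) + r"(.+?)" + re.escape(obj), sent)
        if m:
            patterns[m.group(1).strip()] += 1

print(patterns.most_common())  # "is the capital of" scores highest
```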
- 117 abstract "Although roles have been discussed by many researchers, there remains room for investigation to clarify their ontological characteristics. This paper focuses on roles which are dependent on a future or past event/process, such as candidate, departing passenger, murderer, and product. In order to deal with such kinds of roles based on an ontological theory of roles, we introduce a model of derived roles together with its temporal model. It could provide a computational model to represent the temporal characteristics of roles.".
- 125 abstract "In this paper, we propose the use of a minimal generic basis of association rules between terms (AR) in order to automatically enrich an existing domain ontology. For this purpose, three distance measures are defined to link the candidate terms identified from AR to the initial concepts in the ontology. The final result is a proxemic conceptual network which contains additional implicit knowledge. To evaluate our ontology enrichment approach, we propose a novel document indexing approach based on this proxemic network. The experiments carried out on the OHSUMED document collection of the TREC-9 filtering track and the MeSH ontology showed that our conceptual indexing approach can considerably enhance information retrieval effectiveness.".
- 126 abstract "Compared to other existing semantic role repositories, FrameNet is characterized by an extremely high number of roles or Frame Elements (FEs), which amount to 8,884 in the last resource release. This represents an interesting issue to investigate both from a theoretical and a practical point of view. In this paper, we analyze the semantics of frame elements by automatically assigning them a set of synsets characterizing the typical FE fillers. We show that the synset repository created for each FE can adequately generalize over the fillers, while providing more informative sense labels than just one generic semantic type. We also evaluate the impact of the enriched FE information on a semantic role labeling task, showing that it can improve classification precision, though at the cost of lower recall.".
- 19 abstract "This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. In order to better deal with web data of variable quality, the definition of a pseudo-key is presented. An RDF vocabulary for representing keys is also provided. An algorithm to discover keys and pseudo-keys is described. Experimental results show that even for a big dataset such as DBpedia, the runtime of the algorithm is still reasonable. Two applications are further discussed: (i) detection of errors in RDF datasets, and (ii) datasets interlinking.".
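A minimal sketch of the distinction between keys and pseudo-keys: a property is a key if no two subjects share a value for it, and a pseudo-key if the proportion of violating subjects stays below a threshold. The data and the quality measure are illustrative, not the paper's exact definitions:

```python
# Toy (pseudo-)key discovery over a list of triples.
from collections import defaultdict

triples = [
    ("s1", "isbn", "111"), ("s2", "isbn", "222"), ("s3", "isbn", "222"),
    ("s1", "title", "A"), ("s2", "title", "B"), ("s3", "title", "C"),
]

def key_quality(prop):
    by_value, subjects = defaultdict(set), set()
    for s, p, o in triples:
        if p == prop:
            by_value[o].add(s)
            subjects.add(s)
    violating = {s for group in by_value.values() if len(group) > 1 for s in group}
    return 1 - len(violating) / len(subjects) if subjects else 0.0

for prop in ("title", "isbn"):
    q = key_quality(prop)
    print(prop, "is a key" if q == 1 else f"is a pseudo-key (quality={q:.2f})")
```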
- 23 abstract "This paper presents a generic framework for assisting personalization and enrichment of the end-user experience, as well as for designing new functionalities in digital applications. The approach provides end-user profiles which are automatically generated using interaction traces corresponding to a set of temporally situated observed elements recorded in a given situation. A main contribution of this work is to extend existing inference services based on interaction traces with a declarative and generic approach. An ontology-based architecture formalized with Description Logics provides semantics for interaction traces, observed elements and their associated objects. Due to our ontology modularization approach, the framework permits the description of new forms of interactions or novel object characteristics. We present the architecture of our framework and its reasoning levels, provide a proof of concept on a medical Web application, and emphasize that actors of many kinds can benefit from the supported inferences.".
- 25 abstract "Named Entity Recognition (NER) is important for extracting information from highly heterogeneous web documents. Most NER systems have been developed for formal documents, but informal web documents usually contain noise and incorrect and incomplete expressions. The performance of current NER systems drops dramatically as informality increases in web documents, and a different kind of NER is needed. Here we propose a Ripple-Down-Rules-based Named Entity Recognition (RDRNER) system. This is a wrapper around the machine-learning-based Stanford NER system, correcting its output using rules added by people to deal with specific application domains. The key advantages of this approach are that it can handle the freer writing style that occurs in web documents and correct errors introduced by the web’s informal characteristics. In these studies, the Ripple-Down Rules approach, with low-cost rule addition, improved the Stanford NER system’s performance on informal web documents in a specific domain to the same level as its state-of-the-art performance on formal documents.".
- 27 abstract "One of the main obstacles hampering the adoption of semantic technologies is the lack of interest of users in creating semantic content. In this paper, we focus on the incentives that may be applied and embedded within an application in order to motivate users to create semantically annotated content. As an example, we show a semantically-enhanced Social Web Recommendation application, called Taste It! Try It!, integrated with a Linked Data source and a Social Network. We also discuss the findings from experiments run with 140 users.".
- 31 abstract "With the persistent deployment of ontological specifications in practice and the increasing size of the deployed ontologies, methodologies for ontology engineering are becoming more and more important. In particular, negative constraints are often neglected by human experts, even though they are crucial for increasing an ontology's deductive potential. We propose a novel, arguably cognitively advantageous methodology for identifying and adding missing negative constraints to an existing ontology. To this end, a domain expert navigates through the space of satisfiable class expressions with the aim of finding absurd ones, which can then be forbidden by adding the respective constraint to the ontology. We give the formal foundations of our approach, provide an implementation, called Possible World Explorer (PEW), and illustrate its usability by describing prototypical navigation paths using the example of the well-known pizza ontology.".
- 36 abstract "With existing tools, when creating a new object in the Semantic Web, users benefit neither from existing objects and their properties, nor from the already known properties of the new object. We propose UTILIS, an interactive process to help users add new objects. While creating a new object, relaxation rules are applied to its current description to find similar objects, whose properties serve as suggestions to expand the description. A user study conducted on a group of master students shows that students, even the ones disconcerted by the unconventional interface, used UTILIS suggestions. In most cases, they could find the searched element in the first three sets of properties of similar objects. Moreover, with UTILIS users did not create any duplicate whereas with the other tool used in the study more than half of them did.".
- 37 abstract "Information retrieval on RDF data benefits greatly from additional provenance information attached to the individual pieces of information. Currently, provenance information such as source, certainty, and temporal information on RDF statements can be used to rank search results according to one of those dimensions. In this paper, we consider the problem of aggregating provenance information from different dimensions in order to obtain a joint ranking over all dimensions. We relate this problem to the problem of preference aggregation in social choice theory and translate different solutions for preference aggregation to the problem of aggregating provenance rankings. By exploiting the algebraic structure of provenance rankings we characterize three different approaches for aggregating preferences, namely the lexicographical rule, the Borda rule and the plurality rule, in our framework of provenance aggregation.".
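Of the three aggregation rules, the Borda rule is the easiest to sketch: each provenance dimension ranks the same result set, a result earns points equal to the number of results ranked below it, and points are summed across dimensions. The rankings below are illustrative:

```python
# Borda-rule aggregation of per-dimension provenance rankings.
rankings = {
    "source":    ["r1", "r2", "r3"],
    "certainty": ["r2", "r1", "r3"],
    "time":      ["r2", "r3", "r1"],
}

scores = {}
for ranking in rankings.values():
    n = len(ranking)
    for pos, result in enumerate(ranking):
        scores[result] = scores.get(result, 0) + (n - 1 - pos)

print(sorted(scores, key=scores.get, reverse=True))  # joint ranking: r2, r1, r3
```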
- 53 abstract "The task of entity retrieval becomes increasingly prevalent as more and more structured information about entities is available on the Web in various forms such as documents embedding metadata (RDF, RDFa, Microdata, Microformats). The amount of structured data is such that supporting an effective search and discovery of entities is crucial for ensuring a satisfying user experience. Indeed, this need is reflected by international benchmarking campaigns, such as the Text REtrieval Conference or the Semantic Search Challenge, which propose entity-oriented search tracks. In this work, we present a multi-valued attribute model for entity retrieval which extends and generalises existing field-based ranking models. Our model introduces the concept of multi-valued attributes and enables attribute- and value-specific normalization and weighting. Based on our model, we extend two state-of-the-art field-based rankings, i.e., BM25F and PL2F, and demonstrate through evaluations on heterogeneous datasets that our model significantly improves retrieval performance compared to existing models. Finally, we introduce query-dependent and query-independent weights specifically designed for our model which provide significant performance improvement.".
- 55 abstract "The Semantic Web has seen a rise in the availability and usage of knowledge bases over the past years, in particular in the Linked Open Data initiative. Despite this growth, there is still a lack of knowledge bases that consist of high-quality schema information and instance data adhering to this schema. Several knowledge bases consist only of schema information, while others are, to a large extent, a mere collection of facts without a clear structure. The combination of rich schema and instance data would allow powerful reasoning, consistency checking, and improved querying, as well as provide more generic ways to interact with the underlying data. In this article, we present a light-weight method to enrich knowledge bases accessible via SPARQL endpoints with almost all types of OWL 2 axioms. This allows schemata to be created semi-automatically; we evaluate and discuss the approach using DBpedia.".
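The counting intuition behind such light-weight enrichment can be sketched as follows: query the endpoint for the types of all objects of a property and suggest an rdfs:range axiom when one type clearly dominates. The property and the 0.8 threshold are illustrative assumptions, not the authors' algorithm:

```python
# Suggest an rdfs:range axiom for a property from SPARQL-endpoint counts.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
SELECT ?type (COUNT(*) AS ?n) WHERE {
    ?s <http://dbpedia.org/ontology/author> ?o .
    ?o a ?type .
} GROUP BY ?type ORDER BY DESC(?n) LIMIT 5
""")
rows = sparql.query().convert()["results"]["bindings"]
total = sum(int(r["n"]["value"]) for r in rows)
if rows and int(rows[0]["n"]["value"]) / total > 0.8:  # illustrative threshold
    print("suggest rdfs:range", rows[0]["type"]["value"])
```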
- 61 abstract "Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems, such as search and knowledge management engines. However, scientific documents that are manually annotated with keyphrases are in the minority. This paper describes a new machine-learning-based automatic keyphrase annotation method for scientific documents, which utilizes Wikipedia as a thesaurus for candidate selection from documents’ content and deploys genetic algorithms to learn a model for ranking and filtering the most probable keyphrases. We have evaluated the performance of this method on a third-party dataset of research papers. Reported experimental results show that the performance of our method, evaluated in terms of inter-consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised methods.".
- 70 abstract "It is nowadays well-established that the construction of quality domain ontologies benefits from the involvement of multiple actors in the modelling process, possibly having different roles and skills. To be effective, the collaboration between these actors has to be fostered, enabling each of them to actively and readily participate in the development of the ontology, favouring as much as possible the direct involvement of the domain experts in the authoring activities. Recent works have shown that ontology modelling tools based on the wiki paradigm and technology can contribute to meeting these collaborative requirements. This paper investigates the effectiveness of wiki-enhanced approaches for collaborative ontology authoring in supporting the team work carried out by domain experts and knowledge engineers, as well as their impact on the whole process of collaborative ontology modelling.".
- 71 abstract "Linked Data querying over cached indexes of Web data often suffers from stale or missing results due to infrequent updates and partial coverage of available sources. Conversely, live decentralised approaches offer fresh results taken directly from the Web, but suffer from slow response times due to the expense of numerous remote lookups at runtime. We thus propose a hybrid query approach that improves upon both paradigms, offering fresher results from a broader range of sources than Linked Data caches while offering faster results than live querying. Our hybrid query engine takes a cached and live query engine as black boxes, where a hybrid query planner splits an input query and delegates the appropriate sub-queries to each interface. In this paper, we discuss the core query-planning issues and their main strengths and weaknesses. We also present coherence measures to quantify the coverage and freshness for cached indexes of Linked Data, and show how these measures can be used during query planning to maximise the trade-off between fresh results and fast query execution.".
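A toy version of a freshness measure in the spirit of those coherence measures (not the paper's actual definition): the share of cached triples for a source that still appear in the live document:

```python
# Freshness of a cached source: fraction of cached triples still live.
def freshness(cached: set, live: set) -> float:
    return len(cached & live) / len(cached) if cached else 1.0

cached = {("s", "p", "o1"), ("s", "p", "o2")}
live   = {("s", "p", "o1"), ("s", "p", "o3")}
print(freshness(cached, live))  # 0.5: half of the cache is stale
```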
- 75 abstract "The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The motivation behind NIF is to allow NLP tools to exchange annotations about text documents in RDF. Hence, the main prerequisite is that parts of the documents (i.e. strings) are referenceable by URIs, so that they can be used as subjects in RDF statements. In this paper, we present two NIF URI schemes for different use cases and evaluate them experimentally by benchmarking the stability of both NIF URI schemes in a Web annotation scenario. Additionally, the schemes are compared with other available schemes used to address text with URIs. The String Ontology, which is the basis for NIF, fixes the referent (i.e. a string in a given text) of the URIs unambiguously for machines and thus enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform.".
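One of the two schemes is offset-based and easy to sketch: the fragment identifier pins down the exact character span of the referent string, in the style of RFC 5147. The document URI and text are placeholders:

```python
# Mint an offset-based string URI in the style of an RFC 5147 fragment.
def char_offset_uri(doc_uri: str, begin: int, end: int) -> str:
    """Return a URI denoting the substring doc[begin:end]."""
    return f"{doc_uri}#char={begin},{end}"

doc = "NLP tools exchange annotations in RDF."
print(char_offset_uri("http://example.org/doc.txt", 0, 9))
# -> http://example.org/doc.txt#char=0,9, whose referent is:
print(doc[0:9])  # "NLP tools"
```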
- 8 abstract "Presentations play a crucial role in knowledge management within organizations, in particular to facilitate organizational learning and innovation. Much of the corporate strategy, direction and accumulated knowledge within organizations is encapsulated in presentations. In this paper, we investigate the limitations of current presentation tools for semi-structured knowledge representation and sharing within organizations. We address challenges such as the collaborative creation of presentations, tracking changes within them, and sharing and reusing existing presentations. Then we present SlideWiki, a crowdsourcing platform for the elicitation and sharing of corporate knowledge using presentations. With SlideWiki, users can author, collaborate on and arrange slides in organizational presentations by employing Web 2.0 strategies. Presentations can be organized hierarchically, so as to structure them reasonably according to their content. Following the wiki paradigm, all content in SlideWiki (i.e. slides, decks, themes, diagrams) is versioned, and users can fork and merge presentations the same way modern social coding platforms (e.g. Github, Bitbucket) allow. Moreover, SlideWiki supports social networking activities such as following and discussing presentations for effective knowledge management. The article also comprises an evaluation of our SlideWiki implementation involving real users.".
- 9 abstract "OWL 2 DL is a very expressive language and has many features for declaring complex object property expressions. Standard reasoning services for OWL ontologies assume the axioms in the `object property box' to be correct and according to the ontologist's intention. However, the more one can do, the higher the chance modelling flaws are introduced; hence, an unexpected or undesired classification or inconsistency may actually be due to a mistake in the object property box, not the class axioms. We identify the types of flaws that can occur in the object property box and propose corresponding compatibility services, SubProS and ProChainS, that check for meaningful property hierarchies and property chaining and propose how to revise a flaw. SubProS and ProChainS were evaluated with several ontologies, demonstrating they indeed do serve to isolate flaws and can propose useful corrections.".
- 95 abstract "The completion of a clinical trial depends on sufficient participant enrollment, which is often problematic due to the restrictiveness of eligibility criteria and the effort required to verify patient eligibility. The objective of this research is to support the design of eligibility criteria, to enable the reuse of structured criteria, and to provide meaningful suggestions for relaxing them based on previous trials. The paper presents the first steps: a method for the automatic comparison of criteria content, and a library of structured and ordered eligibility criteria that can be browsed with fine-grained queries. The structured representation consists of automatically identified contextual patterns and semantic entities. The comparison of criteria is based on predefined relations between the patterns, concept equivalences defined in medical ontologies, and finally on threshold values. The results are discussed from the perspective of the scope of the eligibility criteria covered by our library.".
- 96 abstract "The sheer complexity and number of functionalities embedded in many everyday devices already exceed the ability of most users to learn how to use them effectively. One approach to tackling this problem is to introduce ‘smart’ capabilities in technical products, to enable them to proactively assist and co-operate with humans and other products. In this paper we provide an overview of our approach to realizing networks of proactive and co-operating smart products, starting from the requirements imposed by real-world scenarios. In particular, we present an ontology-based approach to modeling proactive problem solving, which builds on and extends earlier work in the knowledge acquisition community on problem solving methods. We then move on to the technical design aspects of our work and illustrate the solutions, relating to semantic data management and co-operative problem solving, that are needed to realize our functional architecture for proactive problem solving in concrete networks of physical and resource-constrained devices. Finally, we evaluate our solution by showing that it satisfies the quality attributes and architectural design patterns that are desirable in collaborative multi-agent systems.".
- 97 abstract "Ontology engineering lacks methods for verifying that ontological requirements are actually fulfilled by an ontology. There is a need for practical and detailed methodologies and tools for carrying out testing procedures and for storing data about a test case and its execution. In this paper we first describe a methodology for conducting ontology testing, together with three examples of applying it to test specific types of requirements. Next, we describe a tool that practically supports the methodology. We conclude that there is a need to support users in this crucial part of ontology engineering, and that our proposed methodology is a step in this direction.".
- 1 abstract "In this paper, we present an approach for representing an email archive in the form of a network, capturing the communication among users and the relations between entities extracted from the textual parts of the email messages. We showcase the method on the Enron email corpus, from which we extract various entities and a social network. Extracted entities are organized in a graph in which emails are connected with named entities (NEs) extracted from them, such as people, email addresses and telephone numbers. Edges in the graph denote relations between NEs, representing occurrence in the same email part, paragraph, sentence or composite NE. We study the mathematical properties of the graph structure created by the proposed approach and describe our hands-on experience with processing such a structure. The Enron Graph corpus contains a few million nodes, making it a large corpus for experimenting with various graph-querying techniques, e.g. graph traversal or spreading activation. Due to its size, the use of traditional graph processing libraries can be problematic, as they keep the whole structure in memory. We describe our experience with the management of such data and with relation discovery among extracted entities. The described experience may be valuable for practitioners and highlights several research challenges.".
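The graph construction itself is straightforward to sketch with networkx; the email records below are made-up stand-ins for parsed Enron messages, and the two edge types shown are a simplification of the relations described above:

```python
# Build a multigraph of people and named entities from parsed emails.
import networkx as nx

emails = [
    {"from": "alice@enron.com", "to": ["bob@enron.com"],
     "entities": ["Houston", "555-0100"]},
    {"from": "bob@enron.com", "to": ["carol@enron.com"],
     "entities": ["Houston"]},
]

g = nx.MultiDiGraph()
for mail in emails:
    for recipient in mail["to"]:
        g.add_edge(mail["from"], recipient, relation="sent_email")
    for entity in mail["entities"]:
        g.add_edge(mail["from"], entity, relation="mentions")

print(g.number_of_nodes(), g.number_of_edges())  # 5 nodes, 5 edges
print(list(g.successors("alice@enron.com")))     # bob, Houston, 555-0100
```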
- 3 abstract "This paper describes a fully automated process of address book enrichment by means of information extraction from e-mail signature blocks. The main issues we tackle are signature block detection, named entity tagging, mapping with a specific person, standardizing the details and auto-updating the address book. We describe how the process was designed to handle multiple types of errors (human or computer-driven) while aiming at a 100% precision rate. Finally, we address the question of automatic updating in relation to users' rights over their own data.".