Matches in ESWC 2020 for { ?s <http://purl.org/spar/c4o/hasContent> ?o. }
Showing items 1 to 83 of 83, with 100 items per page.
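For context, a full query producing a listing like this might look as follows. This is a sketch: only the triple pattern and the 100-items-per-page setting come from the header above; the prefix declaration, projection, ordering, and paging clauses are assumptions.

```sparql
# Hypothetical reconstruction of the query behind this listing.
# Only the triple pattern and the page size are given in the header;
# the PREFIX, SELECT projection, and ORDER BY are assumptions.
PREFIX c4o: <http://purl.org/spar/c4o/>

SELECT ?s ?o
WHERE {
  ?s c4o:hasContent ?o .
}
ORDER BY ?s
LIMIT 100
OFFSET 0
```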
- Paper.11_Review.0 hasContent "The paper presents a solution for supporting complex decision making processes in part through the use of ontologies and reasoning. The solution uses semantic technologies and has been used in applications in "building refurbishment" and "IT security risk management" by real users with very promising results. As a result, the paper fits the In-Use Track criteria. I would like to see the paper published and presented at ESWC, but the paper's presentation can be improved to include more details and more clearly discuss the benefits and shortcomings of semantic technologies. It would help if you could use a single use case to more clearly describe each step involved in the process. I found Figure 2 very useful (as a side note, it has a strange black mark on some boxes so it's not fully readable), but that covers only Step 3. For Steps 1-2, your examples are very high-level and not fully understandable, and for the remaining steps you have screenshots that are harder to understand (and some are not in English). Overall, what I would have liked to understand is: 1) How critical is the role of using an ontology and the use of a reasoner? 2) What would the solution look like without semantic technologies, but with some automation? In your use cases, you are comparing non-automated ways with your automated way, and of course your solution is better. 3) What is the cost of setting up the solution for a new problem, and how feasible is it to repeat the process for a new application? At least some estimates would be useful, e.g., did you spend a month to set up semergy.net, or 5 years!? It would help if you could translate Figure 7 or add translations as notes on top of the figure. As for the usefulness of using a reasoner, in one or all of your use cases, can you give examples where a basic matching of criteria would fail or would be much more difficult? 
How would your solution compare with other alternative decision support solutions that are not ontology-based but based on some other form of knowledge acquisition? For example, the following work uses "Mind Maps" and transformation into an AI Planning model for decision support: http://www.cs.toronto.edu/~shirin/Sohrabi-AAAI-18.pdf".
- Paper.11_Review.1 hasContent "The paper "Supporting complex decision making by semantic technologies" proposes an ontology engineering method that supports researchers and practitioners in building the required ontologies independently of the application field, with the final goal of enabling sustainable decisions through stakeholder inclusion. The topic of the paper is interesting, and relevant to the In-Use track of the ESWC conference. Overall, the paper is clearly motivated, and the proposed methods are nicely discussed. Evaluations of the proposed methods are carried out in four European small- and medium-sized companies and two governmental institutions, proving their feasibility.".
- Paper.11_Review.2 hasContent "The paper presents a practical application adopting semantic technologies for supporting decision making. I appreciated this work since it is a very good example of an in-use contribution. The paper is easy to follow and it is clear in presenting how semantic technologies have been integrated into the system and what their role is. To make this paper complete, I warmly invite the author to save some space in the first part of the paper (e.g., page 8) in order to enrich the discussion part by including the main feedback collected from users, the main lessons learned, and more envisioned applications or uses of this work. I think that this is the only weakness of this paper. From the presentation perspective, the language is clear and I did not find typos to fix. ------------------ I thank the author for the effort in preparing the rebuttal. I confirm my score and I hope to see this paper accepted for ESWC.".
- Paper.28_Review.0 hasContent "The paper presents a contribution on Aspect-Based Sentiment Analysis, that is, sentiment analysis where a sentence may contain multiple opinions on a number of aspects, with potentially different polarity. The work builds on, and compares against, previous work of one of the authors. The previous approach, based on a given ontology exploited to improve the sentiment analysis task, is improved in this work through semi-automatic generation of the ontology based on external resources providing synsets. This work almost reaches the performance of the previous approach, but requires less time from the user to construct the ontology thanks to the semi-automatic generation. The paper needs some extra work to improve the presentation, and it is difficult to read in some points. In the following, some examples: On page 3, "finally the resulting ontology...": which ontology is it? Further, you are describing data (Section 3) without describing your approach first. Regarding the SemEval-2016 data, the authors mention that it has "a target word, aspect, sentiment and sentiment score", but a mapping to the terms used in Fig. 1 would be beneficial (e.g., where is the sentiment score in the provided example?). "The term extraction method that is used has a score based on domain pertinence (DP) and domain consensus (DC)": is this part of a new contribution, or are you just using the results of [12]? In general, it is not clear from the beginning which approach you are proposing, and whether it is a new one (which parts?) or whether you are applying an existing one to solve ABSA. The initial part of Sect. 4.1, that is, the core contribution of the paper (Semi-automatic Ontology Learning), does not go deep enough in describing the approach. Also, an example would be useful to the reader. Further, in this semi-automatic approach it is not clear which part is performed automatically and which one requires human intervention (and to what extent). For instance, where does the ontology in Fig. 4 come from? 
On the other hand, the subsection "Hierarchical Relations" seems to clearly describe a completely automatic step in building the ontology, described in a reproducible manner. Experiments: comparing only against [19] as a baseline may perhaps be fair, but it is not useful for understanding the effectiveness of the approach w.r.t. very different approaches to ABSA. Further, although Table 2 shows that the approach lets the user save some time constructing the ontology, it is not clear whether this is true in different domains. Also, the methodology used to get this result is not described clearly. Concluding, it is a potentially interesting and valuable work, but at this stage it may require some extra effort to improve readability. Minor: moreover, moreover, furthermore... too many close repetitions in Section 3.".
- Paper.28_Review.1 hasContent "The paper discusses a semi-automatic ontology builder approach for addressing the aspect-based sentiment analysis task. The paper is well written, but the novelty is not completely clear, since the number of approaches proposed in the literature on this topic is definitely huge. Anyway, I invite the authors to fix this work in two directions: related work and experiments. The related work section should include more recent contributions concerning the use of learning and argumentation techniques for aspect-based sentiment analysis. For example: - Mauro Dragoni, Célia da Costa Pereira, Andrea G. B. Tettamanzi, Serena Villata: Combining argumentation and aspect-based opinion mining: The SMACk system. AI Commun. 31(1): 75-95 (2018) Moreover, a discussion comparing the proposed contribution with respect to the papers cited within the related work section should be properly included where missing, or expanded. The authors are also invited to include in the related work section works adopting semantic techniques for detecting aspects and opinions. An example is the following paper: - Marco Federici, Mauro Dragoni: A Knowledge-Based Approach for Aspect-Based Opinion Mining. SemWebEval@ESWC 2016: 141-152 Concerning the evaluation, the authors are invited to test the algorithms also on the SemEval 2015 aspect-based benchmark. Moreover, results obtained by other SemEval campaign participants should be reported and discussed. ------------------ I thank the author for the effort in preparing the rebuttal. I confirm my score and I hope to see this paper accepted for ESWC.".
- Paper.39_Review.0 hasContent "Given that some Wikipedia pages (tagged as list pages) contain large lists of entities organized into categories or tables, this paper proposes a method to extract entities from these pages and identify their types. The motivation for this work is to extend general knowledge bases like DBpedia or YAGO with new entities in relation with their classes. The result of the process is a shared RDF knowledge graph called CaLiGraph and the addition of 700K entities, 7.5M type statements and 3.8M additional facts to DBpedia. Two kinds of list pages are exploited: pages with vertical enumerations and pages with tables. The authors propose a machine learning process in two stages: they first generate training data that will provide positive examples to a distant supervision algorithm, and then they represent this data as features to train this algorithm (a classifier) that learns list items and their types (called subject entities in the paper). The strength of the approach is the way training data is collected: a taxonomy of concepts is built by combining Wikipedia categories, DBpedia types and the Wikipedia list graph (list categories). This process, called Cat2Tax, has been presented in a previous paper. Given this taxonomy, the goal is to link each entity identified on a list page to the right subject type, or to decide that this entity is itself a subject type. The paper explains in a very clear way the complementarity of the three sources of taxonomic relations, and the way they are cleaned and combined to get a high-quality taxonomy. The lexical structure of the nodes in this graph is used to decide whether nodes and hypernym relations are meaningful or should be eliminated from the taxonomy. This resource is used to label entity mentions in Wikipedia list pages. 
If an entity in the list is identified in the taxonomy, its type and all its ascending nodes in the graph are used to label this entity, but also all the entities at the same level in the list page. A balanced set of positive and negative examples is built to train the classifier. Each example is represented with a set of features, some of which are specific to lists and others to tables. After generating the features, 7 classification algorithms are compared, with the highest scores for random forest and XGBoost. XGBoost is selected as it gets higher precision. Results are very promising, and analysed in terms of the distribution of entities added to DBpedia (a majority of places and species), the role of the features (page features are the most influential), and the number of type statements added according to the types. This work is very clearly presented. The process of crossing various sources to build a taxonomy and then learning to identify and type entities in tables and lists yields results of high quality. Results are positive and promising. The authors evaluated their correctness and precision, discussed them with acute analyses, and identified possible improvements like taking into account layout features, and including an entity disambiguation stage when linking entities to their mentions on the pages. Assertions to be clarified: In step 4, why is the DBpedia taxonomy considered as the reference? Is it better than YAGO's taxonomy, which is said to be checked more closely than DBpedia's? Is it because disjointness axioms are available with DBpedia and not with YAGO? About the entity facts identified in Section 5: for the reader not familiar with Cat2Tax, it is not clear that this algorithm generates relation axioms when building the taxonomy. Maybe you should add an example of a relation axiom in your example in step 1, when explaining the role of Cat2Tax. ____ After reading the author's answer to our comments _____ I appreciate the answers to the reviewers' questions and requests. 
I wish all this to be included in the final version.".
- Paper.39_Review.1 hasContent "In this paper, the authors present a two-phased approach for the extraction of entities from Wikipedia’s list pages. The paper is well structured, as expected for this kind of paper. The paper provides a concise formalization of the proposed approach. Minor comments: - In Section 2, the authors provide some insights about related approaches in the literature. However, this section lacks a critical comparison with, and discussion of the limitations of, the existing literature that would justify the newly proposed approach described in the paper. - I strongly recommend updating the cited bibliography. More than 50 percent of the cited papers aren't from the last five years (13/23).".
- Paper.43_Review.0 hasContent "Overall evaluation: 2 (accept)".
- Paper.69_Review.1 hasContent "Overall evaluation: 2 (accept)".
- Paper.80_Review.1 hasContent "Overall evaluation: 2 (accept)".
- Paper.120_Review.1 hasContent "Overall evaluation: 2 (accept)".
- Paper.196_Review.1 hasContent "Overall evaluation: 2 (accept)".
- Paper.43_Review.1 hasContent "The paper describes the Thing Description Ontology, an RDF axiomatization of the W3C's Thing Description information model, and its alignment with the Semantic Sensor Network Ontology. It also reports on an evaluation performed on the model, based on a set of (partial) implementations of it by 8 distinct organizations. After reading the paper I am confused as to what exactly its goal is. If it is to describe the development of the TD Ontology and how effective it is in facilitating semantic interoperability, then it should be a resource paper rather than an in-use one. If it is to report on the experiences and lessons learned during its implementation by the 8 organizations, then it does a mediocre job, as it presents only the model creators' side and not the implementers' one. The important question to answer in an in-use evaluation is how the users assess the effectiveness and usability of the ontology or system at hand, not the creators. I like that the authors evaluate the quality of the semantic tagging in Section 5.1, as this evaluation reveals several applicability issues. I am missing, though, the implementers' perspective! In general, I find the paper interesting but incomplete and imbalanced, at least for the in-use track. I would like fewer details on the characteristics of the TD ontology and its alignment to SSN, and more on the experiences of applying and using it, from both sides. ************** I have read the authors' response, and while I feel that the paper is not completely satisfactory for the in-use track, I won't mind accepting it. Thus, I am changing my recommendation to "weak accept".".
- Paper.48_Review.0 hasContent "This paper introduces a technique for linking vectorized maps to other RDF datasets that describe geographical entities, converting the geographical data into geo RDF data in the process and enriching individual geographical features with spatiotemporal metadata. The paper describes the approach and reports on preliminary evaluations of the prototype implementation based on a few performance metrics. This is a highly interesting and timely research question, and the proposed method, while not overly novel, seems convincing. The paper is clearly written and easy to follow. I have one concern, which should be easy to fix in a minor revision. The description of the reverse geocoding step should be more elaborate (Section 3.2). It is fairly unclear how ambiguities are resolved. There can be multiple objects in a given geo bounding box, with different types: for instance, polylines corresponding to roads and rivers, as well as areas corresponding to, e.g., buildings, lakes, parking, etc. Is there some typing from the shapefiles in the original vectorized data, and if so, is it used in the disambiguation process? And then, if so again, is it sufficient? For instance, the quality and level of detail of the vector data can also have an influence on the bounding box. Isn't this introducing uncertainty/ambiguity? Also, what happens if the reverse geocoding service returns nothing meaningful? Overall, it is unlikely that this step would yield 100% successful "tagging" of the vector data. How good/accurate a coverage can we expect this step to yield?".
- Paper.52_Review.0 hasContent "In this paper, the authors explored the importance of the geometrical space, namely hyperbolic space, for Knowledge Base Completion. They showed that the lagging performance of translational models compared to the bilinear ones is not an intrinsic characteristic of theirs but a restriction that can be lifted in hyperbolic space. Experimental results validated that the right choice of geometrical space is a critical decision for KBC. Positive points: 1) The motivation is clear and the related work is sufficient for their research topic. The authors mainly focus on shallow embeddings. They discuss the shortcomings of current techniques, e.g., RESCAL and TransE, and propose their model based on the benefits of hyperbolic space. The motivation is clear. 2) The paper is well written and easy to follow. The background of some concepts, i.e., hyperbolic space, is well introduced in the paper. Some lemmas are also shown to strengthen their model. The problem is clear, and the experimental results do answer their research question. 3) The code and its software documentation are available for reproducibility. Negative points: 1) The time/space complexity is not provided, which is very important for large-scale datasets. Current bilinear models, like RESCAL, actually scale very well for large datasets. The proposed HyperKG involves some Riemannian gradients. It is unclear how efficient HyperKG is. Some complexity analysis would convince the readers. 2) The datasets WD and WD++ are quite small. I am not sure whether such small datasets have statistical significance. Some larger datasets, e.g., datasets from the recommender systems area, could be introduced in the experiments. In summary, the overall idea of introducing geometrical spaces for Knowledge Base Completion is interesting. The code and its software documentation are available for reproducibility. I vote for acceptance at this time. 
=============================== After Rebuttal =============== The authors answered my question about the time and space complexity of HyperKG, which is important. I suggest the authors also add a subsection to discuss time and space complexity in the original paper. The dataset size is still an issue, even when using randomization tests. The authors point out that "recent studies have noted that many KB relations have very few facts". I keep my original score since the authors answered my questions.".
- Paper.52_Review.1 hasContent "----------- Strong Points ----------- - Novel approach. - Theoretic proofs and counterexamples are provided. - Strong evaluation. - Implementation and data available online. ----------- Weak Points ----------- - Difficult to find any. Summary: The paper addresses the important task of link prediction on knowledge bases (KB), i.e., automatic knowledge base completion (KBC). The presented approach adopts hyperbolic geometry to exploit scale-free structures of KBs in order to learn KB embeddings. The authors focus on a specific type of KB embedding models, i.e., translational models aiming to model vector translation between entities. The proposed model is also shown to be effective in capturing the logical consistency of the facts induced by the KB embeddings. The paper demonstrates how the performance gap between the translational and bilinear model families can be closed. The paper provides counterexamples showing that the Kazemi and Poole restrictions do not apply to translational models, i.e., TransE, when fact validity is based on implausibility scores below a non-zero threshold. Introduction and Related Work: The introduction, like the whole paper, is well written and introduces the problem, its specifics for KBs, and how they can be exploited through hyperbolic geometry. The authors did a very good job of narrowing down the problem and scope of the paper while putting it in the context of the existing body of work. Preliminaries and Proposed Algorithm: The introduced notation and preliminaries are well explained. Although I am not an expert on hyperbolic geometry and spaces, it was relatively easy for me to grasp the idea behind them, thanks to the efforts of the authors to provide a clear and easy-to-follow narrative. Experiments and Evaluation: A sound evaluation aimed at highlighting the specifics of the proposed model. 
The evaluation addresses the structural properties of the datasets as well as the novel regularisation scheme introduced due to the usage of the Poincaré-ball model. Critical Appreciation: Overall, I think this paper presents a sound, dense, and valuable contribution to the ESWC community and should be accepted. The paper addresses limitations in a set of KB models, i.e., translational ones, and provides strong empirical evidence that the performance of the TransE model family is not an intrinsic model property but a shortcoming that can be eliminated by the right choice of the geometric space. Moreover, the paper settles an existing disagreement in the recent body of work on KB embeddings, as it provides counterexamples showing that the Kazemi and Poole restrictions do not apply to TransE models. The authors also provide a theoretical proof that the relation regions captured by the proposed HyperKG are convex and can thus effectively represent QC rules; consequently, reasoning based on HyperKG embeddings would be logically consistent and deductively closed with respect to ontological rules. Minor comments: Page 7: … In our experiments, we noticed a tendency of the “word” vectors to … Page 8: … have shown that the FTransE … =============================== After Rebuttal =============== I keep my original score.".
- Paper.52_Review.2 hasContent "The paper investigates the impact of the mathematical nature of the embedding space on the performance of lower-dimensional embedding approaches to link prediction. Further, the authors investigate the possibility of representing certain types of logical rules in hyperbolic space. Knowledge base embedding is a very active area of research with a high potential impact on Semantic Web technologies. The paper is therefore potentially relevant for ESWC. Concerning the two contributions claimed by the authors (see above), I however only consider the second one (rule embedding) to be valid. The first contribution, the investigation of the impact of the nature of the embedding space on performance, has imho been superseded by recent work by Ruffinelli et al.*, which shows that the differences between the embedding models proposed so far were mainly due to differences in the training and evaluation protocol, and that the older translational approaches do not perform worse than more recent ones if the same training and evaluation protocol is applied. Further, the results reported in this paper are systematically worse than the ones achieved by Ruffinelli et al., even with the most outdated models. The ability to include rule-like structures in the embedding space is indeed an interesting result. In its present form, however, the paper only discusses this in a very brief and incomplete way. I therefore vote for rejection. * Daniel Ruffinelli, Samuel Broscheit, Rainer Gemulla: You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings. Proceedings of the International Conference on Learning Representations, ICLR 2020.".
- Paper.53_Review.0 hasContent "After rebuttal: The authors have addressed most of my comments and now the vocabulary is available, so I decided to raise my score to accept. ************* This paper describes an approach to semantically represent study results so users can aggregate and compare them more efficiently, and an application to ease search, exploration and comparison of studies in the domain of human cooperation. The paper is well written, easy to follow and highly relevant for the conference in general and this track in particular. I also believe the topic is quite novel and with a lot of potential usefulness, as it could really be helpful to make researchers more efficient when comparing their work or exploring the state of the art. Therefore, I think the paper would be a nice addition to the conference, but I also felt that the approach was not completely mature. I discuss below some of the points that I think have room for improvement: - The vocabulary is listed as a contribution, but I tried to resolve http://data.coda.org/coda/vocab and I got a 404. - The paper claims to help with hypothesis generation, but this is not explained or demonstrated in the evaluation. I suggest removing such claims. - The authors state that reusing existing vocabularies was not within the scope of this early release. However, if the contribution is the semantic representation, this point needs to be addressed, specifically because some of the terms have a very simple mapping to schema.org or the investigation, study, assay (ISA) model: https://isa-tools.org/ - Regarding the data model, I find it very confusing that DOI and Paper are concepts described with the same properties. A DOI represents a paper, so I would have that as the paper identifier, or import the DOI metadata and associate it with the paper. As it is, it can be interpreted that the title or author of a DOI is the title or author of the identifier, not of the target paper. 
In addition, some properties seem to have been introduced just to define a hierarchy, to group subproperties under them (e.g., scholarly prop). If these properties do not have any semantics associated with them, it is often recommended to delete them from the ontology. Instead, you can define a category and add it as an annotation property. - I think that the paper will become really strong after the user-based evaluation is published. The formative evaluation presented is appropriate, but I would like to know more about how the community converged on those 86 independent variables. This is often challenging, and I would like to see whether there are guidelines from the community and how they reached consensus on those variables. - Some example SPARQL queries showing how to retrieve contents would really help in understanding the data and the model behind it. - Where are the APIs used by the application? - Even though I understood the intent behind Table 4, I don't quite understand the table itself: how is the information shown in another paper relevant to the studies assessed by the experts in this experiment? - How would one submit a custom visualization to the system? Cosmetic issues: - The paper gets repetitive at some points with the contributions; I think that I have read them three times. And 'more automatically' does not sound correct to me; I would say "facilitate automation" or performing meta-analysis "semi-automatically". - There are a few typos, so I recommend a proofread. In particular, "softwares" is incorrect; instead use "software" or "software tools".".
- Paper.53_Review.1 hasContent "In this paper the authors develop a new approach that speeds up the writing of systematic reviews. The authors perform their exercise on a particular case: a dataset reporting experiments in the field of social and behavioural science. The authors first convert their dataset into a knowledge graph, using their own model, and then they design a web interface that allows exploring this data through several facets. The effects of such semanticisation definitely benefit the whole workflow. The paper is well written and easy to understand. The evaluation is sound, although I would have very much welcomed a more qualitative evaluation with a few researchers. However, I must admit that such an evaluation requires different planning, as you would need researchers to perform an actual systematic review. Minor issue: Page 11, there is a misalignment between Fig. 4d and Fig. 4e when they are referenced in the text.".
- Paper.60_Review.0 hasContent "The paper presents Piveau, a platform for "Large-scale Open Data Management". The authors provide several arguments on why a new platform for open data management is needed and how their solution compares with other open and popular solutions, and present some details of their implementation of the system. All the components of the system are open source and publicly available on GitHub. The solution relies on Semantic Web technologies and is deployed on https://www.europeandataportal.eu/, and so meets the requirements of the In-Use Track. I would like to see the paper published and presented at ESWC, although I have a few comments that I encourage the authors to address before final publication. My main comment is about the core hypothesis, which is: "a more sophisticated application of Semantic Web technologies can lower many barriers in Open Data publishing and reuse". I was happy to see that you define this core hypothesis early in the paper, but at the end I was somewhat disappointed that you haven't completely tested and validated this hypothesis. Throughout the paper you outline how Semantic Web technologies are used and how other solutions don't use such technologies, but you don't make it clear how the use of these technologies can help (or has helped) the end users of the system. - In Section 6, Table 1, you are comparing features, and, e.g., you are giving your solution 2 points for having a "Linked Data interface" through a "SPARQL endpoint". But the main question is: why is this needed and how can it help? If I can achieve my goals through a simple JSON-based REST API, then why SPARQL? Similarly, one of the semantic technologies you are using is meant to achieve "Quality Assurance" in comparison with the other solutions. Can you more clearly outline why this is important, how one could have achieved it without, e.g., using SHACL, and how crucial the role of Semantic Web technologies is here? 
- One suggestion is to use examples throughout the paper. You seem to have a really large deployment that you can use to derive examples. https://www.europeandataportal.eu/ shows 483,714 datasets. What would the solution for europeandataportal.eu look like if it wasn't based on Piveau? - Related to the above, it would be good to know whether RDF enables particular forms of reasoning that a standard JSON metadata-based solution cannot provide. - Is there a reason you do not mention the Socrata platform, which is a very widely used open data publication platform? I believe it is also open: https://github.com/socrata - Another core question / food for thought: why not use Wikidata for all the metadata, which has the additional benefit that you would contribute to a large and completely open knowledge base? And on the technical side, do you really need to rely on Virtuoso? Couldn't you contribute facts to Wikidata instead and query the public APIs?".
- Paper.60_Review.1 hasContent "The paper describes the architecture and technical choices behind Piveau, a platform for creating, sharing, curating and querying Open Data. The authors explain their thinking with respect to the choices made. As can quickly be seen, Piveau tries to attain many different goals; thus there are "many different solutions", and one could, apparently, have envisioned other "combinations of solutions" while still doing something meaningful and interesting. The strong point of the paper, in my view, is the comparison table (Table 1) on page 12. That really shows how the system relates to comparable ones. The strong point of the work itself is that Piveau appears to be the system behind https://www.europeandataportal.eu, with millions of RDF datasets and (the authors state) "tens of thousands of updates per day". I think this level of usefulness and real deployment, alone, would justify acceptance. However, I'm a bit unhappy with the paper *writing*. It starts with the introduction: it is hard to figure out exactly what the problem being solved is, and what the limitations or shortcomings of the state of the art were before this paper was written. The abstract talks of "barriers" and "limitations...", then of "bodies that encourage and foster...", then states "However, no existing solution for managing Open Data takes full advantage of these possibilities and benefits." This claim is too vague: which possibilities and benefits? I understand that, due to the platform having wide applications, it is hard to pinpoint ONE advantage. Yet I still think the authors should try to at least rewrite the abstract and introduction to clarify their contribution as much as it can be done. Then, the paper suffers from several typos: "verication", "tenths", "driven. 
[9]" (the citation should not be outside of the previous phrase to which it belongs), and a phrase without a predicate ("For instance the integration of synchronous third-party libraries into our asynchronous programming model.") The authors should really proofread it thoroughly to catch typos and improve the style. More annoyingly, the paper makes some claims or choices whose reasons or justification are not fully clear: 1. "Furthermore, there is no satisfactory and human-friendly method to present RDF in a user interface." This ignores significant efforts invested by the data visualization community to do just that (present RDF in user interfaces). Other methods are based on RDF graph summarization etc. I understand what the authors had in mind, but the claim here is too broad and needs to be nuanced. 2. The orchestration through PPL (at the end of 4.1) appears to use an ad-hoc model for orchestration, whereas very well-known standards exist for orchestrating Web services (think WSDL or BPML). Why was it necessary to invent something new? 3. The authors state that existing ETL platforms are not developed specifically for RDF. How big of an obstacle is that? Would it have been hard to tweak an existing platform to obtain an RDF-specific ETL one? Overall, I think the paper has value, but it is also annoying in some declarations that appear unjustified. It also suffers from "describing a lot of disconnected aspects", which I think follows from *implementing* a lot of orthogonal aspects. Thus, this second problem may be hard or impossible to solve in the paper. [After reading the authors' rebuttal] It is still not clear to me why micro-services and an ad-hoc orchestration approach are better (more flexible?... why, how?) than the standards in that area. I also find the authors' renewed statement of their contributions not very convincing. However, the paper clearly describes a fair amount of work, and I wouldn't fight against acceptance."".
- Paper.60_Review.2 hasContent "The paper describes the Open Data management solution "Piveau", which is a framework for publishing, harvesting, and managing dataset descriptions. The framework is deployed at the European data portal, which is impressive in size and functionality. Current Open Data publishing frameworks, such as CKAN or OpenDataSoft, only partially support RDF metadata (e.g., only export of RDF, flat data schema, etc.). The proposed solution, "Piveau", clearly demonstrates the advantages of using semantic web technologies for the Open Data use case: The solution is very well thought out and scalable (as shown by the application in the European data portal). In particular, the section on the impact of SW technologies (6.3) is an interesting read and gives us a list of open issues which have to be tackled to support such solutions. While I really like the work and, in my opinion, it clearly should get accepted, I have some points that could help to improve the paper: * The difference between datasets and metadata descriptions was not always clear. In my opinion you could make clearer that, e.g., the importer collects metadata descriptions (and not datasets). A clarification/definition of terms at the beginning and a consistent use throughout the paper (dataset vs data vs metadata) could help to make this easier to understand. * Why do you compare to uData? While it is clear that CKAN is the most popular software for Open Data publishing, I was missing an argument for why you selected uData (and not OpenDataSoft, Socrata, DCAN, etc.) * To what extent are there actually links to other resources? Importing existing (e.g. JSON-based) dataset descriptions obviously does not make it Linked Data. It could be mentioned in your critical assessment that the technology alone is not enough to get from Open Data to Linked Open Data. 
* While you already discuss related works with respect to management solutions, you could also include SW-based approaches which aim at harmonized/integrated Open Data metadata, e.g. [1, 2]. [1] Brickley, Dan, Matthew Burgess, and Natasha Noy. "Google Dataset Search: Building a search engine for datasets in an open Web ecosystem." The World Wide Web Conference. ACM, 2019. [2] Neumaier, Sebastian, Jürgen Umbrich, and Axel Polleres. "Automated quality assessment of metadata across open data portals." Journal of Data and Information Quality (JDIQ) 8.1 (2016): 2. p.3: recent efforts focusses on -> focus on"".
- Paper.62_Review.0 hasContent "++++ COMMENTS AFTER REVIEW ++++ While I still hold some reservations, the authors' response has been clear and convincing on some of the points I raised. Additionally, the required changes do not require a full revision of the work but only clarifications. I thus revise my evaluation to "weak accept". ++++ INITIAL REVIEW ++++ The paper reports on the latest evolutions and improvements in the OAEI evaluation campaign, introducing in particular the latest gold standards for the recently added Knowledge Graph track and discussing the results of a hidden task – an evaluation of a run of matchers over datasets with non-overlapping domains – that they performed over the last competing systems of the initiative. The paper is well and clearly written; it introduces the initiative for those who are new to it, briefly updates the reader on the latest additions to the evaluation, and discusses the benchmarks, the way they have been built, and the challenges characterizing the task as much as its evaluation. I have a doubt about the trustworthiness of the gold standards. If GS 2019 revealed so many matches, then a lot of them are missing from GS 2018. In particular, if the models have always been matched by experts (i.e., even in 2019, as claimed by the authors), why do the two dataset pairs that are present in both GSs (i.e. memoryalpha-memorybeta and memoryalpha-stexpanded) have such different results? Have they simply been improved in 2019? However, if GS 2019 shows 4 and 7 trivial matches (respectively, for the two pairs, i.e. 14 total – 10 = 4 and 13 – 6 = 7), why did these slip past the attention of the crowdworkers on such a small number of elements? Additionally, if the number of matched instances with the link method is decently reliable (at least as an order of magnitude), how can the poor numbers of negative matches from 2018 be of any support in the evaluation 
(as in section 3.1 it is said that only those have been adopted for the precision). I'm guessing they are mostly trivial negative matches, thus avoiding a phenomenon which might be smaller; however, I'm not so sure of that and I don't think it is understandable from this data. As these corpora of matches are used for the evaluation, noting such large differences across years raises some doubts about how much these can be called "gold standards". Later on, in the golden hammer bias section, in order to assess how reliable a 50-sample is for the golden hammer, a statistical test of confidence should have been carried out, repeating the experiments with different 50-match samples on the same systems and checking their variation. The observations on the golden hammer bias are also interesting, but it should, again, be statistically assessed how much those numbers in the domain-overlapping case are not a result of those overfitting matchers, but rather a purely proportional result of a matcher that "throws out some results" in an ocean populated with many positive matches versus one where there are none. I'm ambivalent: on the one side I value the importance of publicly disseminating the results of such a renowned initiative as OAEI; on the other side we must critically address flaws in the initiative itself or in its related dissemination, where the results might be questionable and the way of analyzing them not as complete as it could be. I'd really suggest, for future investigation, collaborating with an expert in statistics to understand the statistical significance of the results. Numbers are important, but their meaning (as we all know, working in the field of semantics) is even more so. TYPOs: * the reference to table 3.2 on page 
6 is actually table 4 MINOR REMARKS: The concept of a trivial match (which can – rightly, as the reviewer is aware – be guessed from the end of the paragraph where "same names" are mentioned) should be clarified for the reader, as they might not be aware of it."".
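The variability check the reviewer suggests (repeating the 50-sample precision estimate and observing its spread) can be sketched with a simple bootstrap. The data and function below are purely illustrative, not taken from the OAEI evaluation:

```python
import random

def bootstrap_precision_ci(labels, k=50, n_boot=1000, seed=42):
    """Approximate 95% interval for precision computed from k-sized resamples
    of judged matches. `labels` is a list of booleans (True = correct match);
    this merely illustrates the reviewer's suggested confidence check."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(labels) for _ in range(k)]
        estimates.append(sum(sample) / k)  # precision of one k-sized sample
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot)]

# Hypothetical pool: 80 correct and 20 incorrect judged matches.
# The point estimate is 0.8, but 50-sized samples vary noticeably around it.
lo, hi = bootstrap_precision_ci([True] * 80 + [False] * 20)
```

If the resulting interval is wide, a single 50-match sample is a shaky basis for the golden-hammer conclusions, which is exactly the reviewer's concern.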
- Paper.66_Review.0 hasContent "This paper introduces a novel (RDF) entity summarizer, DRESSED, that takes user feedback into account to improve the set of triples presented to users. The approach uses reinforcement learning under the hood. The paper describes the approach, and reports on two experiments that evaluate the performance of DRESSED (compared to other entity summarization approaches). The paper is nicely written, and addresses a very important problem. Entity summarization is key in many ways for the Semantic Web, including, but not limited to, the display of information to users when, e.g., browsing the Web of Data. The approach is described with an appropriate level of detail, and an open source implementation is made available by the authors. While there does not seem to be much documentation about how to set it up and run it, it is already a good thing that the software prototype is made available upfront. A few fairly minor concerns: - Figure 1: maybe nitpicking, but isn't it weird to consider the label an irrelevant triple? - There is little reference to/explanation of Figure 2, which limits its value. - The notation for significant/non-significant differences in Table 1 is unclear (circle and triangle). Is the first glyph referring to FACES-E and the second to IPS? This should be made clear. Also, if there were significant differences between FACES-E and IPS, it would be relevant to report them (using a less cryptic notation). - Section 4.2: give more detail about the random sampling of entities. Are the entities the same for all participants? What's their rdf:type? Is it a within-subject experiment, with all participants performing the three conditions? If so, is the presentation order of conditions counterbalanced to avoid asymmetric transfer? - Similar issue with Table 2 as with Table 1. Overall, the results are promising. 
This is an interesting piece of work, likely to be of interest to a broad audience at the conference, and I recommend acceptance. -------------POST REBUTTAL-------------------- The rebuttal addresses my concerns. Rating and recommendation remain unchanged. Please revise the paper accordingly."".
- Paper.69_Review.0 hasContent "Thank you for the clarifications in the rebuttal, in particular about the covered expressivity. The scope/limits have to be clearly stated in the paper because "complex queries" can have a very different meaning for different people (for me, these are really simple). # Strengths S1 - first QA dataset that comes with answer verbalizations, as an extension of LC-QuAD S2 - several machine learning models are given as baselines for future evaluation by the community S3 - the resources are available through a dedicated web page and a repository # Weaknesses W1 - the range of expressivity of the covered questions is not clearly defined W2 - the reusability of the dataset is limited by the fact that often many answer verbalizations are possible W3 - the dataset is large (5000 questions) but this may not be enough for machine learning The dataset should also include the raw answers, as a list, for easier reusability. # Summary The main proposed resource is an extension of the LC-QuAD dataset, which is a question-answer collection for evaluating Question-Answering (QA) approaches, with a new field that contains the verbalization of the answers. The verbalizations were first generated automatically based on templates, and then manually curated by following some style rules (e.g., active voice). The secondary resource is made of machine learning models based on neural networks that generate the answer verbalization from the question or formal query. They serve as baselines for the production of templates for answer verbalization. Scores are given in the paper for each model, and it is shown that there is ample room for improvement. # Discussion QA systems generally have low accuracy on open-domain questions, ranging from 20% to 80%. It is therefore important, when answers are returned by a QA system, to give insight into how the QA system came to those answers. 
The authors propose to generate a verbalization of the answers that reflects the intention of the formal query that was used to retrieve the answers. I agree with the authors that this is more natural than showing the formal query or even verbalizing the formal query before listing the answers, at least in a vocal dialogue. [W1] The authors claim that the dataset covers complex questions and not only factoid questions. However, from what I have seen, all questions use either an ASK query or a SELECT DISTINCT query with a single projection, either ?uri or COUNT(?uri). We need to know more precisely the range of questions: - how many projections in the SELECT clause? at most 1? (several projections would make the verbalization more useful and interesting) - which aggregators? only COUNT? - how many triple patterns at most? what's the distribution? - are there cycles in the graph patterns? - are there graph patterns with UNION, OPTIONAL or MINUS? - what about CONSTRUCT queries (more open questions) that would really make verbalization compulsory? I agree several features are clearly future work, but it seems fair to state clearly what is covered by the dataset, and what is future work. [W2] The main difficulty I see in the proposed approach is that many correct answer verbalizations are possible. On the contrary, there is in general a single correct answer set for a given question. Although I agree that verbalizing answers is a good idea, using your own verbalizations to evaluate other verbalizations seems a bit fragile. Therefore, your verbalizations can be useful as examples or as targets for machine learning, but if I come up with my own verbalizations, it is not clear how I can compare to yours. 
# Minor comments - Fig.1: I can't see QALD datasets, this should be added - Table 1: 33k -> 33K, 11k -> 11K - p.5: the users is --> the user is - p.6: publicity --> publicly - p.7: I would switch 'Generate' and 'Create' in the paragraph headers because the verbalization templates are manually created, while the initial verbalizations are automatically generated. - p.9: suitability --> sustainability - p.10: straight forward --> (in one word) - p.11: evaluation metrics: please give the range of values for each measure, and whether it is better to have lower values or higher values. Some of this information is given later but it would be better here, close to their definitions."".
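The expressivity questions raised in the review above (projections, aggregators, triple-pattern counts, UNION/OPTIONAL) could be answered with a crude per-query feature profile. The sketch below is regex-based and purely illustrative, not a real SPARQL parser and not part of the reviewed resource:

```python
import re

def profile(query):
    """Crude feature profile of a SPARQL query string (assumes an uppercase
    WHERE keyword). Illustrative only; a real analysis should use a proper
    SPARQL parser."""
    i = query.upper().find("WHERE")
    head, body = (query[:i], query[i:]) if i >= 0 else (query, "")
    return {
        "ask": query.strip().upper().startswith("ASK"),
        "projections": len(set(re.findall(r"\?\w+", head))),  # distinct projected variables
        "count": "COUNT" in head.upper(),                     # aggregation in the SELECT clause
        "union": "UNION" in body.upper(),
        "optional": "OPTIONAL" in body.upper(),
        "triple_patterns": body.count(" ."),                  # rough triple-pattern count
    }
```

Run over all 5000 queries, such a profile would answer the distribution questions above at a glance.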
- Paper.72_Review.0 hasContent "The paper describes the creation of ESBM, a benchmark for testing entity summarization. This is definitely a relevant problem for the semantic web community. ESBM is manually created and made publicly available with a permanent identifier on w3id.org. The main objective is to create a general-purpose resource, in comparison to state-of-the-art datasets. It is also intended that the resource shall be permanently available. The methodology followed by the authors was to sample prominent datasets (DBpedia and LinkedMDB). The result is a curated dataset from which ground truth summaries are produced manually. The paper describes in sufficient detail the methodology for creating these ground truth summaries. One concern that arises, regarding this resource, is that the ground truth is intrinsically biased. Further, it seems rather difficult to maintain and verify the dataset. Nevertheless, given the scarcity of benchmarks available for evaluating entity summarization, ESBM shall provide a reliable starting point for benchmarking, thus improving the state of the art."".
- Paper.72_Review.1 hasContent "The paper presents an interesting Entity Summarization BenchMark (ESBM). 1) The ESBM is said to overcome the limitations of existing approaches; however, these limitations are not covered (only referenced) in sufficient detail. 2) Similarly, the desired criteria are not presented clearly in the paper. Hence the paper is not self-contained. - I suggest adding an overview of both. In the results section, the mean of F1 with the ground truth results does not seem the best estimate. In case there is an exact match with one summary, it can be concluded that the summarizer works well, irrespective of how well/badly it matches the other summaries. Average F1 would be a good estimate for the Oracle, which is an intersection of the ground truths. For the results, it would be interesting to see the best F1, the median, and also the worst-case match with the ground truths for one or two of the best approaches."".
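The statistics requested above (best, median, and worst F1 of a summary against several ground-truth summaries) are straightforward to compute. A minimal sketch, using hypothetical triple sets rather than actual ESBM data:

```python
import statistics

def f1(pred, gold):
    """F1 between a predicted summary and one ground-truth summary (sets of triples)."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

def f1_stats(pred, ground_truths):
    """Best, median, mean and worst F1 of one summary against multiple ground truths."""
    scores = sorted(f1(pred, g) for g in ground_truths)
    return {
        "best": scores[-1],
        "median": statistics.median(scores),
        "mean": sum(scores) / len(scores),
        "worst": scores[0],
    }
```

As the reviewer notes, a perfect match with one annotator's summary ("best" = 1.0) can coexist with a very poor "worst" score, which the mean alone hides.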
- Paper.77_Review.0 hasContent "The paper introduces a partial domain adaptive model for relation extraction. The authors argue that the method better accounts for negative transfer and therefore helps to improve F1 scores for the target domains. The paper is a bit hard to understand, as the "Baghdad, US" example is not very intuitive, and throughout the paper it is unclear what types of relations this work actually supports: only "part of" (which I think is actually wrong in the case of Baghdad and the US) or other types of relations such as "sibling", "parent", etc. This is important for gauging the potential impact of this work. A more comprehensive example that is used throughout the paper would also help this work. In the introduction there is a lot of terminology that is later on not used very often (e.g., "GAN") or at all ("DS", "TL"), while PDA (guess: "partial domain adaptation") is not properly introduced. In general the paper is interesting, so I would suggest including some main improvements for a final version: write the introduction more focused on the actual task and have a nice running example for the whole work. Have the paper proofread due to some grammar/formulation issues."".
- Paper.77_Review.1 hasContent "This paper presents an approach that leverages domain adaptation and adversarial learning to improve relation extraction. The paper is well written and detailed. Sometimes I miss some concrete examples to illustrate the different steps, and it is not always easy to follow given the mixture of techniques applied and the number of acronyms. Some minor comments: - In the intro, the model collapse explanation is not clear; I would need more context or a more detailed example. - When Fig. 1 appears there is no previous description of PDA, and PDA is not referenced in the text at any point, only in the image caption. I am missing some explanation here. - In Section 2.2 it says that many studies have proved that domain adaptive models based on DL are better than DL models with sparse data. I would need more context here: studies only on relation extraction, or is this a more general assumption? - Fig. 2 is a bit hard to understand. - Section 4.1: I'm missing a more detailed explanation of the dataset (what kind of dataset it is); you are assuming that the reader knows the dataset. All in all, I recommend a rewrite trying to make the paper easier to understand for readers who are not experts in relation extraction."".
- Paper.80_Review.0 hasContent "----------- Strong Points ----------- - Domain-independent scoring function and threshold heuristic - Evaluation using textual, structured, and “dirty” datasets - Implementation and data available online ----------- Weak Points ----------- - Valley and elbow thresholding have rather similar results Summary: The paper addresses an important shortcoming of active learning for entity resolution, i.e. the cold start problem. The proposed method deals with the cold start problem by introducing unsupervised matching based on a novel domain-independent threshold heuristic to bootstrap active learning. The unsupervised matching uses datatype-specific similarity metrics to assign a similarity score to all record pairs. The threshold boundary “t” is then set to a value accounting for the elbow point of the cumulative similarity score distribution of all record pairs. The distances between the threshold value and the aggregated similarity scores of the pairs serve as confidence weights that are then used to provide the active learning with the most suitable pairs for every iteration, i.e. noisier pairs are supposed to affect the warm start less than more confident pairs. The method is evaluated and shows promising results on three different types of data, i.e. structured, textual and dirty. The evaluation experiments are well designed to measure the influence of the proposed thresholding heuristic, the bootstrapping and the warm start of the active learning. Introduction and Related Work: The introduction, like the whole paper, is well written and introduces the problem and its specifics. Related work is nicely structured along the three main points of the presented methodology: feature engineering, unsupervised matching, and active learning. Proposed Active Learning Methodology: The authors propose a two-step methodology. 
An unsupervised matching step consists of labeling pairs and assigning confidence weights using the elbow point threshold method. In the second step, the unsupervised labeled and weighted pairs are used in the warm start pool to bootstrap the training of the active learning random forest classifier and a heterogeneous committee of five different classifiers which includes the random forest classifier. The committee is used to select a pair from the noisy pool to be added to the labeled set after manual labeling in every iteration of the active learning. The labeled set is used to incrementally train new trees of the random forest classifier. This procedure allows for a “fading away” effect of the initial model learned in the warm start phase. Experiments and Evaluation: I appreciate the evaluation procedure aimed at highlighting the specifics of the proposed threshold heuristic. Nonetheless, I somewhat missed a comparison to other existing approaches to entity resolution. Such a comparison would have put the results in a different light. It would also have been interesting to see an evaluation addressing the effects of blocking non-matches that would eventually justify the selected threshold of 0.2. My main point of concern is, however, the very similar results of the valley and elbow threshold methods. For example, if we look at the deltas to the supervised F1-scores, we have three wins for the elbow, two wins for the valley and one for the static threshold. For the unsupervised setting, we see similar results, where for two datasets the difference between the two methods is in the third place after the decimal point. The results are presented in a clear, insightful way. The authors, however, may consider using a color other than yellow for the “no_boot” results, as the standard deviations are rather difficult to see in figures 5-7. 
From my point of view, the authors elegantly combine a set of existing methodologies and techniques in an interesting and innovative way to solve an important problem. The key point of the paper is the elbow point threshold heuristic. Overall, I think this paper presents a sound and valuable contribution to the ESWC community and should be accepted. =============================== After Rebuttal =============== I keep my original score."".
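For readers unfamiliar with the elbow heuristic discussed in the review above, one common formulation picks the point of the sorted score curve farthest from the chord joining its endpoints. The sketch below illustrates that generic technique under that assumption; it is not necessarily the authors' exact procedure:

```python
import math

def elbow_index(scores):
    """Index of the elbow of a descending-sorted score curve: the point with the
    largest perpendicular distance from the straight line (chord) joining the
    first and last points. A generic sketch, not the paper's implementation."""
    n = len(scores)
    x1, y1, x2, y2 = 0, scores[0], n - 1, scores[-1]
    denom = math.hypot(x2 - x1, y2 - y1)
    best_d, best_i = -1.0, 0
    for i, y in enumerate(scores):
        # distance from point (i, y) to the chord
        d = abs((y2 - y1) * i - (x2 - x1) * y + x2 * y1 - y2 * x1) / denom
        if d > best_d:
            best_d, best_i = d, i
    return best_i

# The threshold t is then the score at the elbow of the sorted similarities.
sims = [1.0, 0.95, 0.9, 0.2, 0.1, 0.05]  # hypothetical pair similarities
t = sims[elbow_index(sims)]
```

Pairs scoring above t are treated as likely matches; the distance to t serves as the confidence weight the review describes.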
- Paper.80_Review.2 hasContent "This paper proposes an approach to perform entity resolution, coping with one of the utmost problems in the Semantic Web, namely entity duplicates. The authors deal with this problem by proposing a work centered on the notion of a multi-dimensional similarity distance across entities (using different entity features; where a feature is not available, a -1 is assigned) that is tailored to a given dataset by means of a pre-processing analysis that identifies the optimal threshold. This information is then utilized by a Random Forest classifier that learns through an (active) learning process using as input unlabeled entities and those labeled by the thresholding process. This is an unsupervised process that does not require human annotations. The paper presents the problem statement nicely, the experimental setup involves 6 datasets, and the experimentation is made reproducible; this is a plus. However, there are some flaws that prevent me from accepting the paper (see below). Strengths: - good problem statement - experimental setup based on three different types of datasets (this points to a weakness) - the experiment is made reproducible thanks to the sharing of the code and data used for the evaluation, thus fostering reproducibility Weaknesses: - lack of statistical analysis of the datasets. In other words, the authors claim that the thresholding works well on 3 types of datasets (nominally different), but are they really different? - I see the elbow point as dataset-dependent. Presented as such, the measurement of the elbow is a pre-processing task. However, it depends very much on how "similar" entities are to each other. For instance, I'd expect that a dataset with numerous duplicates has a higher elbow point than a dataset with fewer duplicate entities, regardless of the size of the dataset itself. I suspect that such an assumption works well with DBpedia, since it models how the dataset has been created. 
It would not fit, for instance, other knowledge bases that are created for specific domains such as tourism or health. This might limit the applicability of this work, despite the authors stating the opposite (that it is domain-independent) in the value proposition. - It is not clear how the Query-by-Committee strategy works in detail. The intuition is to utilize an ensemble of classifiers to further filter the output of the active learning. If so, does this learning phase utilize the same exact set of data from the noisy pool? Generally, the entire methodology would benefit from an illustration and a mathematical derivation (as it is presented as a sequence of packages without clear interfaces, which makes it harder for a reader to understand). - In 3.4, are the results of the resolution process over the full datasets? Ideally, I'd expect to learn a model from the training set and apply it to a test set. This does not seem to be the case from the illustrations, since the performance improves at each iteration. However, this is neither clear nor illustrated in the figures, hence the doubt."".
- Paper.88_Review.0 hasContent "Comment after Rebuttal: Thank you for the clarifying comments. My decision (accept) remains unchanged. --- The authors present a benchmark created for the evaluation of systems that match tabular data to knowledge graphs, and explain the underlying generating algorithm. Furthermore, they give insights into the first edition of the SemTab challenge, in which they invited participants to evaluate their systems against the generated benchmark. All in all, the paper is well structured and nice to read. The necessary concepts are introduced properly and an appropriate amount of background information is provided. The topic itself is very relevant as - despite the existence of a couple of evaluation data sets - it is not guaranteed that all systems are evaluated against them in the same way. Additionally, the current benchmarks lack either appropriate size or quality of annotations. My wish for the next edition of the SemTab challenge would (as the authors already mentioned) also be a more realistic data set. Some ideas from my side (which you have very likely heard already) are: - The manual or automatic identification of some general "kinds" of noise that occur in the existing benchmarks; they can then be introduced into the generated data set on a random basis for a more realistic setting. - Include tables and/or entities that can't be mapped at all, as this is often the case in real-world scenarios (of course only if that is in the scope of the SemTab challenge). Realistic entities that can't be mapped could easily be created by generating the data set from a complete knowledge graph, but letting the participants only match against a part of the knowledge graph. Some minor remarks: - Page 8, Section 5.1: "CTA, CEA and CTA" -> "CTA, CEA and CPA" - Page 10, Section 5.1 (last sentence): You say that "AH is used as primary score to encourage perfect annotations". 
As far as I understand the metrics, the AP score rewards perfect mappings even more, doesn't it? - Page 14 (Evaluation platform): "One the one hand" -> "On the one hand" I suggest accepting the paper as it is well written, relevant, and does not contain any major or minor flaws."".
- Paper.88_Review.1 hasContent "The paper describes the datasets, the evaluation protocol, as well as the results of the SemTab 2019 Challenge, which was conducted at ISWC 2019 and attracted a significant number of participants. The paper falls into the category “Benchmarks” of the CfP and fulfils all the review criteria given in the slideset for submissions of the category Benchmarks. The SemTab 2019 Challenge is thus clearly eligible for the resource track. The design of the challenge as well as the text of the paper still have some flaws which prevented me from giving the paper a full accept: 1. The paper is not self-contained concerning the description of the benchmark datasets as well as the description of the procedure for producing the datasets. For a resource paper, it does not allocate enough space to the description of the actual resources (the datasets and gold standards) but uses a lot of space for the discussion of the specific benchmarking methodology, participating systems, and lessons learned from the 2019 campaign. The paper would gain as a resource paper if the second part were shortened and more space were allocated to: 1. detailed statistics about the datasets, e.g. a histogram of columns and cells per table, the amount of annotations in the gold standard, and the distribution of these annotations over tables of different sizes (as all tasks are easier for large tables); 2. a proper description of the “refined lookup approach” and not only a reference to [8], as well as statistics about the impact of the label refinement (how many % of the tables were affected?). 2. The paper should devote more space to the comparison of the generated datasets to existing benchmark datasets such as Limaye, T2DV2, and the datasets from [8], as this is essential for judging the results on the benchmark compared to existing results of other systems. 
The comparison would ideally also include more statistics about the datasets so that the reader would not need to refer to the original papers. This would, for instance, be much more interesting for the reader than the details on the RDFization of the tables currently provided in Section 4.4. 3. Other table-to-KB matching benchmarks use significantly deviating tasks, e.g. table to knowledge base class matching, table row to KB entity matching (which differs from the CEA task in that combinations of multiple values can also be used to identify an entity, e.g. name + birthdate), and matching of datatype properties (such as population or economic statistics). The paper should acknowledge the other tasks and should explicitly motivate the selection of the tasks for the challenge. 4. Please explain in more detail what is meant in section 5.1 by “the target cells, columns and column pairs were provided to the participants” and how this impacts the difficulty of the challenge compared to other benchmarks."".
- Paper.88_Review.2 hasContent "Post-rebuttal comments: I hereby acknowledge the authors' response and I wish to thank you for your clarifications. Regarding the description of the round 2 tables, thanks for clarifying that 83% comes from [8] and that 17% was synthetically generated ... but how did you select those 10k tables? The authors state that they correspond to a manageable and relatively clean subset of the dataset published in [8]. What were the selection criteria? How do you assess this cleanness? I expect the camera-ready version to clarify those points. There is a confusion between wiki redirect pages (which is what the authors are talking about) and wiki disambiguation pages (which can be used as a noisiness technique, as written by the authors). The expected clarification is whether a disambiguation page (not a redirect one) was considered a good match or not in the challenge? Original review: General comments: This paper presents the SemTab 2019 resource, composed of 4 sets of tables (summing up to roughly 15k tables) coming with semantic annotations with entities from the DBpedia knowledge graph. Those semantic annotations enable benchmarking of systems that must interpret web tables according to a given knowledge graph, and in particular the 3 sub-tasks named: CTA (guess the type of a column), CEA (disambiguate a cell value) and CPA (guess the property holding between the main focus of the table and another column). The resource also comes with a scorer to evaluate systems. This resource was used during one of the two ISWC 2019 Semantic Web Challenges, and the paper also reports the results of the different systems that competed against this benchmark dataset. Finally, this benchmark being mostly synthetic, the paper also describes the methodology used to generate the dataset as well as how to improve it. 
The resource is well-motivated, mentioning a number of applications that benefit from semantic table interpretation such as web search, QA systems and knowledge base construction. The related work is thorough and this resource is well situated with respect to previous efforts in providing annotated datasets for comparing systems aiming to annotate web tables. While one of the drawbacks of existing datasets is that they always use the same knowledge graph (DBpedia or Freebase), the authors of SemTab 2019 do not address this issue, e.g. by providing tables that could be annotated with multiple knowledge graphs in the ground truth. This should be more clearly addressed. Similarly, SemTab 2019 does not provide tables in which a large number of entities would not be present in a knowledge graph (so-called NIL values). While this is acknowledged in the paper, the authors should also better state this limitation for SemTab 2019, since it is mentioned as a criticism of the state of the art. The annotated tables are mostly synthetically generated. The pipeline for generating them is well described. The first step consists in profiling a knowledge graph using a set of generic queries. Numerous tools providing this functionality exist and should be mentioned, for example LOUPE, http://loupe.linkeddata.es/loupe/ (http://ceur-ws.org/Vol-1486/paper_113.pdf). A number of parameters are fixed when generating tables, such as: 2000 rows maximum per table; 7 columns maximum per table; etc. However, those parameters are not really discussed. In order to increase the challenge, the authors have introduced some noisiness in the data. The only technique mentioned is abbreviating the first name of a person. What other techniques do the authors consider applying in the future? Round 1 corresponds to the T2Dv2 dataset [19]. This dataset contains a small number of annotation errors, some of which have been discussed on the challenge forum. 
It is unclear why a proper adjudication phase has not been organized among the system participants. I strongly recommend that the authors update this part of the resource and correct the errors that have been identified by the system participants. Round 2 is composed of real tables manually annotated from [8] and of synthetically generated tables. However, the proportion of each is not mentioned in the paper. This should be addressed. Among the 12k tables, how many come from [8] and how many have been generated? Regarding the CEA task, why don't you prohibit disambiguation pages as valid annotations? Those are arguably not resources that aim to identify a real-world entity. It is also not clear whether all possible redirect pages were considered as equally valid annotations for a given entity. Regarding the CPA task, why can only one property be considered a valid annotation, given that knowledge graphs often contain a hierarchy of properties? Minor comments: * Page 2: the reference [12] would be better placed as a footnote, the homepage of the semantic web challenge not being a proper reference * Page 2: dbr:ernesto is not a resource existing in the DBpedia knowledge graph. dbr:Ernesto is an existing resource (https://dbpedia.org/resource/Ernesto) but it refers to many possible Ernestos (novel, film, people, fictional character). * Page 9: "Round 3 dataset was composed of 2,162 tables; they were 406,827 ..." (and not 406,727 according to Table 2) * Page 9: "Particpants" -> "Participants" * Page 10: Table 3, add a new row with the number of days for each round to ease the analysis of each round's duration * Page 12: do not use the term "Table 2d", which does not exist, but rather "Figure 2d", even if it is a table * Page 15: footnote 3, used for the "MaSI" and "ED" projects in the acknowledgment section, refers to nothing * Page 15: the reference [3] could be transformed into a footnote pointing to https://github.com/sem-tab-challenge/aicrowd-evaluator"".
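The CEA/CTA/CPA evaluation this review discusses boils down to comparing submitted annotations against a ground truth that may allow several valid targets per cell (e.g. redirect pages). A minimal sketch of CEA-style precision/recall/F1 scoring (hypothetical data layout; NOT the actual aicrowd-evaluator code):

```python
# Minimal sketch of CEA-style scoring. Cells are keyed by
# (table_id, row, column); the ground truth allows a set of valid
# IRIs per cell, the submission gives one IRI per annotated cell.
# Hypothetical structures for illustration only.

def score_cea(ground_truth, submission):
    """Return (precision, recall, f1) of a cell-entity annotation run."""
    correct = sum(
        1 for cell, iri in submission.items()
        if iri in ground_truth.get(cell, set())
    )
    precision = correct / len(submission) if submission else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: one correct cell, one incorrect cell, one unannotated cell
# counted against recall only via the ground-truth denominator.
gt = {("t1", 0, 0): {"dbr:Ernesto_(novel)"},
      ("t1", 1, 0): {"dbr:Paris", "dbr:Paris,_France"}}
sub = {("t1", 0, 0): "dbr:Ernesto_(novel)",
       ("t1", 1, 0): "dbr:Berlin"}
p, r, f = score_cea(gt, sub)  # p = r = f = 0.5
```

Treating the ground truth as a *set* per cell is exactly what makes the redirect/disambiguation question above matter: whichever IRIs are placed in that set define what counts as correct.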
- Paper.89_Review.0 hasContent "In this study the authors design an approach that can recommend potential collaborators. They use a large knowledge graph to train embeddings. I have one particular concern. When creating negative samples, considering that they are based on random replacement of the subject/object, is there a chance that you are creating instances that are certainly not true in the present but very likely in the future? For instance, if you create (A, co-author, B), this is definitely not true at the present time, as your knowledge graph evidences that such a triple does not exist. However, the fact that it currently does not exist does not mean that it will not happen in the future. By creating such samples, you are letting your model learn that such a triple is a bad sample, when in fact it simply has not occurred yet. At page 12, it would be interesting to see more details on the recommended authors: how many of them you kept because there was no co-authorship, no shared organisation, and so on. Minor: References 19 and 20 are the same. References 24 and 25 are the same. After the REBUTTAL: Thank you for addressing my concerns"".
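The reviewer's concern about negative sampling can be made concrete with a minimal sketch of standard random subject/object corruption (a generic illustration of the common technique, not the reviewed paper's implementation):

```python
import random

# Generic sketch of negative sampling by random corruption, as used by
# many KG-embedding pipelines. NOT the reviewed paper's code.

def corrupt(triple, entities, known_triples, rng):
    """Replace the subject or object with a random entity, rejecting
    candidates already present in the KG. Note: this only filters
    triples known *today*; a corrupted triple that happens to become
    true later is still emitted as a negative -- the false-negative
    risk the review points out."""
    s, p, o = triple
    while True:
        e = rng.choice(entities)
        cand = (e, p, o) if rng.random() < 0.5 else (s, p, e)
        if cand not in known_triples:
            return cand

kg = {("A", "co-author", "B"), ("B", "co-author", "C")}
entities = ["A", "B", "C", "D"]
rng = random.Random(42)
neg = corrupt(("A", "co-author", "B"), entities, kg, rng)
# neg is guaranteed absent from kg now, yet e.g. ("A","co-author","C")
# could still become a real co-authorship in the future.
```

The membership check against `known_triples` is exactly the closed-world assumption the reviewer questions: absence from the graph is treated as falsehood.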
- Paper.89_Review.1 hasContent "The paper introduces and evaluates the effectiveness of a novel loss function useful for learning to predict links in a knowledge graph by relying on relation and entity embeddings. In particular, the authors evaluate the proposed link prediction approach by relying on a scholarly knowledge graph and focusing on the recommendation of potential co-authorship. After introducing and motivating the general context of their work, the authors present the scholarly knowledge graphs they built in order to learn to recommend co-authorship links among Author entities: the structure, the data sources and the content (in terms of number of entities and relations) of each knowledge graph are described. Then an overview of current approaches to embedding-based knowledge graph link prediction is provided: in particular, the weak aspects of the Margin Ranking Loss function used to learn such embeddings are highlighted. A new loss function, the Soft Margin Loss, is introduced with the aim of mitigating some of the problems highlighted with respect to the Margin Ranking Loss function: in particular, the new loss function avoids the need to define a hard margin to separate the prediction scores of positive and negative samples, as is required with the Margin Ranking Loss function; such a hard margin could penalize training performance, especially when negative sampling techniques generate a non-negligible rate of false negative training samples. The link-prediction performance of the Soft Margin Loss function is compared with scenarios where embeddings are learnt using the Margin Ranking Loss function; several relation scoring functions are used. Both the scholarly knowledge graphs created by the authors and other link prediction datasets (FB15k and WN18) are exploited for evaluation purposes. 
The authors also present the results of a manual evaluation performed to validate the co-authorship recommendations proposed by their approach when trained over the scholarly knowledge graphs they created. The sensitivity of the proposed loss function with respect to variations in hyperparameters is also analyzed. --- The paper is fairly well written. It deals with a relevant topic in the area of semantic content recommendation: link prediction approaches over knowledge graphs based on entity and relation embeddings. In particular, a novel loss function to train these embeddings with improved performance is presented and evaluated quantitatively and qualitatively by considering scholarly knowledge graphs. Comments: - could you better explain the differences between the two scholarly knowledge graphs you consider (SKGOLS and SKGNEW) with respect to the evaluation of the considered approaches? Why would you expect that these two scholarly knowledge graphs provide distinct, maybe complementary, evaluation frameworks / scenarios with respect to the link prediction approaches considered or proposed? - in Section 5.2 "Quality and Soundness analysis", the manual evaluation performed and its results should be explained with greater clarity. Instead of "50 recommendations filtered for a closer look", it could be better to say that "the top 50 recommendations for each author have been manually reviewed in order to distinguish correct from incorrect ones with respect to the following set of criteria: 1. close match in research...". It is not clear that Table 4 contains the results of this manual filtering of automated recommendations. - could you explicitly specify in Tables 2 and 3 the meaning of underlined and bold numbers? - does formula (6) of the loss function miss a sigma / summation over all negative samples? MINORS: - Abstract: sixth line: "knowledge graph embedding (KGE) models have..." 
--> Knowledge Graph Embedding (uppercase first letters) - Figure 1: could you check if the direction of the "isPublished" relation / link (from Event to Paper) is correct? For completeness, should the direction of the "isAffiliatedIn" relation / link be specified (it is not)? - Section 3, "Preliminaries and Related Work": "A Kg is roughly represented" --> KG - Section 3, "Preliminaries and Related Work": "...defines an score function..." --> a score function - Section 3, "Preliminaries and Related Work": "...a loss function to adjust embedding." --> embeddings - Section 5, "Evaluation": "...evaluation methods have been performed in order to approve: 1) better performance and..." --> assess: 1)"".
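For readers less familiar with the loss functions this review discusses, the contrast can be sketched as follows (a generic formulation for illustration; the paper's exact Soft Margin Loss may differ in detail). The Margin Ranking Loss enforces a hard margin γ between the scores of positive and negative triples, while a soft variant replaces the hinge with a smooth penalty so that false-negative samples are punished less sharply:

```latex
% Margin Ranking Loss: hard margin \gamma between the positive score
% f(t^{+}) and the negative score f(t^{-}) (higher score = more plausible)
\mathcal{L}_{\mathrm{MRL}} \;=\; \sum_{t^{+}} \sum_{t^{-}}
    \max\!\bigl(0,\; \gamma + f(t^{-}) - f(t^{+})\bigr)

% A generic "soft" relaxation: the softplus decays smoothly instead of
% cutting off at a hard margin, so noisy negatives hurt training less
\mathcal{L}_{\mathrm{soft}} \;=\; \sum_{t^{+}} \sum_{t^{-}}
    \log\!\bigl(1 + \exp\bigl(f(t^{-}) - f(t^{+})\bigr)\bigr)
```

This also bears on the question about formula (6): in both formulations the outer sums run explicitly over the negative samples.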
- Paper.89_Review.2 hasContent "Overall evaluation: -1 (weak reject)"".
- Paper.102_Review.0 hasContent "This paper describes a case study of constructing a knowledge graph for building automation systems (BAS). The proposed work extends the Brick ontology and integrates some other ontologies to define a comprehensive vocabulary for describing the various requirements and operations of automatically operating a building, in particular its HVAC system. Positives: + The paper fits very well with the conference. + The paper is well-written and easy to follow. + The examples presented throughout the paper are especially helpful in making the contributions concrete. Negatives: - The paper lacks in-depth technical discussion of the employed vocabularies and techniques. For example, I was really looking forward to seeing a detailed discussion on "transformations and calculations" using SHACL rules in Section 4, as described at the end of Section 2. However, there is no such technical discussion at all. - There is no discussion of the value realised through this proposed knowledge graph, which has apparently been commissioned and deployed in a building. Without this discussion it is hard to evaluate the practical contributions made by this paper. Detailed comments: * The figures are not large enough to be read clearly. * Many of the figures are used to present fragments of the ontology or the knowledge graph. Instead of showing them as UML-like figures, I suggest giving the actual definitions of these classes, for example in DL syntax or Manchester syntax, which is more precise and also appropriate for the ESWC community. * Pg. 5, end of 1st paragraph of Sec. 3.1, "a in the meanwhile" is wrong. * Pg. 6, where are the green arrows that are supposed to represent the imports relationship? I can't see them. ====================== I thank the authors for providing the rebuttal. The rebuttal does not really address the questions raised by myself and the other reviewers. It is more a promise of what to add/address in the revision. 
Hence, I've decided to keep my original score."".
- Paper.102_Review.1 hasContent "1. The state-of-the-art is either limited or incomplete. Moreover, the authors did not mention the gaps in the existing literature which motivated this work. 2. To me, this work is low on novelty but high on application-specific use of existing technology. Given this, the authors need to explain clearly the biggest challenges in dealing with real-world BAS. 4. In the Introduction, Section 1, state clearly the existing approaches to BAS, the limitations of existing approaches, how the proposed approach bridges the gaps observed in existing approaches, and the key contributions of your work. Finally, please describe how the paper is organized. 5. There is no need for footnote 1. It is already explained in Section 1. 6. The authors need to give further technical details of the knowledge graph. How is the knowledge graph designed? How is the performance of the knowledge graph evaluated? 7. Data Ingestion, 2nd paragraph, "The IFC file is then automatically converted to ifcOWL..." How is it automatically converted? Please provide details of the parser used. 8. Data Ingestion, 2nd paragraph, are the triples indexed when storing them in the database? Please provide examples of triples. How is the metadata used for the triples designed, or from which standard is it taken? How often does the semantic model require updates? 9. Generally, semantically annotated information is rich in its knowledge representation, but such knowledge graphs can be very slow to query with increasing scale of data (particularly data gathered in real-world applications). Please state how computationally efficient your knowledge graph is when employed to solve real-world applications. 10. Please improve the quality of the figures to make them readable. 11. Section 3, Integrated Semantic Information Model, 2nd paragraph, "The semantic model is realized as an RDF-based knowledge graph..." How are this semantic model and the knowledge graph developed? 
Is it developed manually or automatically? 12. Section 3.1, Design Decisions and Related Work, 3rd paragraph, "....or information model from scratch, we analyzed existing solutions for suitability." Please provide details of what considerations were given when analyzing the ontologies. How was the suitability of the ontologies assessed? Did you have to make any adaptations or changes to the ontologies to meet your requirements? 13. How did you assess the correctness, completeness, and coverage of the information in your knowledge graph and the model? 14. How generic is the proposed model? Can you reuse the proposed model to formalize information content related to another, similar building?"".
- Paper.102_Review.2 hasContent "=== After rebuttal === I thank the authors for their rebuttal and have improved my score accordingly. ----- I discuss the review criteria of the track: === Relevance to the track === This track or the Industry track are the right targets for this paper. === Rigor in the methodology and analysis used to reach conclusions === I am not saying that the authors did not use rigor, but in the paper they did not provide much rationale for their decisions, e.g.: * "we analyzed existing solutions for suitability. And in the Brick ontologies, we found the most suitable candidate..." What were the criteria, and which were the other candidates? * The authors use Semantic Tagging, which is not a particularly well-defined technical term, nor a term I have encountered often in the SW community. Sometimes what other work presents as Semantic Tagging looks like sloppy modelling tailored to facilitate some sort of retrieval. What is the use of Semantic Tagging in the project? In other projects, Semantic Tags are assigned to unstructured data with the aim of adding some sort of structure. Given that you use semistructured data anyway, why not model more elaborately? === Originality === A lot of the things the authors use have been around, but are nicely combined by the authors. If anywhere, novelty could be found, I think, in the embedding into the workflow and the solution. There was an Industry talk at Semantics 2019 on Semantics and BIM: https://2019.semantics.cc/role-semantics-googles-smart-building-platform === Usefulness to developers, researchers, and practitioners === The paper reads a bit like a technical case study, so it is most useful to practitioners, I think. === Significance of the problem addressed === Increasing the energy efficiency of buildings is relevant. === Value of the use case in demonstrating benefits/challenges of semantic technologies === The paper shows the benefits of semantic technologies for the use case. 
=== Adoption by domain practitioners and general members of the public === The paper reports on the case of one building of the authors' company. There is no evidence of adoption outside of this one building. === Impact of the solution, especially in supporting the adoption of semantic technologies === The solution has the potential to further the uptake of semantic technologies in the building domain. === Applicability of the lessons learnt to other use cases === Lessons learnt can be found in Section 4, but the section is not marked as lessons learnt. The lessons learnt should be applicable in other domains. === Readability, Clarity and quality of the description === * The paper reads well * Section 3 could be restructured: ** Why is related work (3.1) a subsection of Section 3 and not a section on its own? ** Similarly, "Brick Criticism and Recommendations" could be factored out into a lessons learnt or discussion section * I printed the paper. A lot of the figures have a fairly tiny font; Fig. 8 has the smallest and is barely legible. * I find the symbols and colours used in Fig. 1 hard to understand. Why is REST dark blue and SPARQL light blue? How do the components communicate with the triple store (is it only Data Ingestion that uses SPARQL, and only sometimes)? Do the numbers indicate the workflow? Maybe the use of standardised symbols, e.g. from the UML component diagram, could make this figure easier to follow. * What is the difference between "Knowledge Graph", "Dataset" (residing in Jena, according to Fig. 1) and RDF Graph or RDF Dataset? * The authors did not give the specific prefix definitions; rather they say that they are from the Brick ontology. The presentation of the actual prefixes could help in determining which version of the Brick ontology the authors use. * The third dimension in Fig. 3 - what does it mean? ==== Typos: ==== * Let`s -> Let's (p. 
11) ==== Verdict ==== The case presented in the paper is interesting and would deserve acceptance. However, the paper could benefit greatly from: * emphasizing the novel part, that is, how semantics is embedded in the solution and how the solution is embedded into the workflow * restructuring Section 3"".
- Paper.106_Review.0 hasContent "The authors treat the very interesting problem of answering queries over web APIs with rewriting. The paper is well-written and the theorems convincing. An appendix with full proofs is available. I have read the authors' rebuttal, and I argue for accepting this paper."".
- Paper.107_Review.0 hasContent "This paper focuses on the problem of subsumption checking in KBs, under the limitation of having incomplete information in the KB, which makes it impossible to find concept equivalences that an expert would otherwise expect. The general approach is, instead of checking subsumption on the original concepts, to expand them, or better, unfold them into more complex but (hopefully) equivalent ones, using statistical natural language techniques. Therefore, instead of checking subsumption/equivalence of the original classes, it is performed on their unfolded versions. Even though the methods used for this general approach do not add much in terms of novelty if taken individually, I find the overall concept rather practical, as shown in the use cases presented in the paper. KBs, even those that are well curated, will never instantiate all possible ways of representing a concept using a large number of classes and predicates. In this way, the proposed approach shows a rather practical way of addressing a common problem in KBs. As the authors show in the results, there are logically still considerable amounts of false positives/negatives, and the on-the-fly concept construction still does not match the quality that experts would otherwise provide. Although in the e-health industry this can be rather problematic, in the mentioned use cases (e.g. a chatbot) the given results are already positive compared to what is usually delivered in terms of symptom data acquisition. Nevertheless, it is still not clear from these results how this would compare to a totally data-driven approach, e.g. based entirely on ML data matching. The use of KB-driven logical reasoning has of course other advantages, but a full comparison with an ML approach (which actually is being used in several e-health chatbot prototypes nowadays) would be of enormous interest. 
Having said so, I think this more comprehensive study would probably be out of the scope of what this paper is addressing. As the authors mention, the proposed approach does not need to be limited to healthcare. In this respect, I think the authors missed the opportunity to show a stronger version of this work by showing it in action with other KBs like DBPedia. They do mention this possibility in the paper, but I think it would have been important to include it, as it is well known that results can differ considerably across KBs, and pitfalls of the approach can be learned from those differences. Furthermore, applying the approach to more datasets strengthens the results and their credibility. This also applies to other elements of the evaluation, such as the queries (which were unfortunately not real due to patient protection, which is understandable); perhaps on a different domain with other KBs, this could have been tested in more 'real' conditions. ------------------ Thanks to the authors for their answers. With this information I confirm my assessment."".
- Paper.107_Review.1 hasContent "Symbolic reasoning methods cannot produce any derivation if the underlying database is incomplete. This paper addresses this limitation with a method that extracts new knowledge on-the-fly so that reasoning can be carried out anyway. The reasoning problem being investigated is subsumption checking under Description Logic. Whenever needed knowledge is missing during this process, this paper proposes a method that attempts to extract it from the labels of concepts. The paper describes the methodology used to extract knowledge from labels (using traversals of the dependency parse tree) and several heuristics to integrate such knowledge into the reasoning pipeline. I liked this idea, and the paper is clear and well-written. The various contributions are motivated with good examples and the evaluation (in the biomedical domain) is convincing. In my opinion, the paper fits the scope of this venue and should be accepted. I have nevertheless some questions/comments for the rebuttal. - I understand the idea of extracting knowledge from labels, but it would be even nicer if this knowledge were extracted from larger textual corpora. I have some doubts about the quality of dependency parsing on such small textual snippets. Did you perform any systematic experiment to evaluate the quality of this extraction? - You mention that some doctors have evaluated the results of your system. However, you do not mention how many they are (I would expect at least three people) nor report any statistic about their agreement. Can you clarify this issue? - The statements "Large KBs like DBPedia, YAGO, and Wikidata are usually stored in a distributed manner and are accessible only via SPARQL end-points. Hence the use of existing in-memory reasoning systems is not possible" and "triple-store of graph DBs do not reason over existential quantifiers" are not true. 
The knowledge bases that you cite can easily be stored on a single machine, and both RDFox and VLog support reasoning with existentially quantified rules. I would use different arguments to motivate your contribution in Section 5. --- I read the rebuttal and I confirm my score. However, I do not agree with the response that existentially quantified reasoning is not possible because the systems used in production do not support it. I still do not see any reason why reasoners like RDFox/VLog cannot be used. Anyway, revising that statement will probably fix this issue."".
- Paper.110_Review.0 hasContent "This paper introduces YAGO 4: a new dataset derived from Wikidata and schema.org. The main goal of the dataset is to provide a cleaner subset of types for entities, along with enforcing constraints on properties. Specifically, YAGO 4 selects six top-level classes, upon which disjointness constraints are defined; properties are taken from schema.org and associated with domain, range and cardinality constraints, defined using SHACL (and RDFS/OWL). The base classes are extended with the schema.org taxonomy, and the bioschema taxonomy; Wikidata entities are associated with this taxonomy based on a defined mapping from its types to Wikidata types, where additional unmapped Wikidata types meeting certain constraints are also included. Fresh IRIs are minted based on the English name of the entity in Wikipedia or Wikidata (where available), and temporal annotations are added using RDF*. The authors describe the construction of YAGO 4 based on streaming operators, and provide statistics on three flavours of the dataset, using (1) all of Wikidata's entities; (2) only entities with a Wikipedia article; (3) only entities with an English Wikipedia article. The final dataset contains 9702 classes, 57/15/5 million entities and 326/48/18 million facts (depending on the flavour). The call for the resources track has a special list of criteria that I will initially follow for this review. Later I will discuss aspects relating to the paper itself. # Potential impact On the one hand, YAGO 4 is an incremental contribution, effectively exporting Wikidata into a schema.org-like view; the added value here is essentially a mapping of schema.org terms to a subset of Wikidata terms along with some constraints/axioms. On the other hand, I do buy into the idea that YAGO 4 could serve as a research-friendly version of Wikidata. In particular, Wikidata has grown so large and diverse that it is becoming a challenge/obstacle to manage in research works. 
Having a smaller subset of Wikidata with a cleaner schema seems useful along these lines for certain research works (the core use-case of OWL 2 DL-compliance is acceptable even if not thoroughly convincing). # Reusability As an emerging resource, there is no evidence of usage provided. On the other hand, I did briefly review the webpage and the provided resources, and from what I saw, the resource is more-or-less ready for use. # Design & Technical quality The authors do follow best practices for Linked Data, provide a SPARQL endpoint, etc. The dataset reuses schema.org terms. I did not find a description in VoID (mentioned by the track's call; it could be useful to add). I did notice one technical issue that the authors may need to look into relating to the dereferencing of IRIs minted for the purposes of Linked Data publishing. Taking the example of: - http://yago-knowledge.org/resource/Douglas_Adams This 301 redirects to: - http://www.yago-knowledge.org/resource/Douglas_Adams ... not a problem in itself, but for large-scale access to YAGO 4 through dereferencing, this adds an extraneous request. Looking up the aforementioned IRI gives a 303 redirect to the following URL: - http://lod.openlinksw.com/describe/?uri=http://yago-knowledge.org/resource/Douglas_Adams Unfortunately, my attempts to get RDF in popular formats from this URL were not successful. I tried: - curl --header "Accept: application/rdf+xml" http://lod.openlinksw.com/describe/?uri=http://yago-knowledge.org/resource/Douglas_Adams - curl --header "Accept: text/turtle" http://lod.openlinksw.com/describe/?uri=http://yago-knowledge.org/resource/Douglas_Adams Both provided blank responses. # Availability No persistent IRIs are provided (I don't personally see that as a major issue, though perhaps a dump could be published on Zenodo or Figshare just in case). Licence information is provided. Code is available on Github. I am not sure if YAGO 4 has been registered in any catalogues. 
What I really miss here is a plan for the maintenance of the resource; more specifically, the future outlook is left vague. Are there plans to periodically update YAGO 4 with new data from Wikidata and new vocabulary from schema.org? How will this be handled? # The Paper The paper itself is relatively well-written and easy to follow. I did, however, find that the paper lacks details in certain parts. * I would have liked to have seen an example of the YAGO 4 description of an illustrative entity. * Where do the property terms come from? (It is explained later that they come from schema.org, but this is too late given that constraints on these properties are defined in earlier sections.) * How were the constraints defined? Manually? Exported from Wikidata? From schema.org? How many constraints are there? * When constraints are violated how are repairs made? Are all triples removed? Or are minimal repairs somehow applied? * How many classes come from Bioschema? How many from schema.org? How many (only) from Wikidata? (The 9700 figure seems to refer to all classes in the taxonomy, irrespective of source.) * Are the RDF* meta-data provided separately? How are these data published? * Which Wikidata dump is used? Is this the truthy version or the complete version? This is important to clarify as it refers to how ranked statements are exported, and also influences the importance of the temporal meta-data, as well as the constraints (e.g., do countries have one current population or several potentially historical populations). How are such issues handled? * A complete example of a dereferenceable IRI would be appreciated. * Table 1: number of triples would be appreciated. * How are the precisions of values handled? Such precisions are a key ingredient of Wikidata. 
I understand that the page limit is a factor, but I think most of these details could be provided without using much space (barring, perhaps, the entity example, but I think it would be very helpful to provide this in the paper as it would implicitly address many of the other doubts). # Other Comments I find it a bit strange to call this dataset YAGO 4 as it bears little relation to the YAGOs that came before. I guess the authors wish to leverage some of the name recognition of YAGO in order to bootstrap usage of the new dataset, but I would have preferred a different name given the radically different heritage of YAGO 4. # Overall Impression Focusing on the resource itself, YAGO 4 is essentially a schema.org "view" of Wikidata. Given that it is an emerging resource, it does not yet have proven usage to point to. On the other hand, I think it has the potential to be used at least for research purposes. Hence I lean towards accepting and would encourage the authors to consider the feedback provided here in order to improve the paper and the resource. # Minor Comments - "2 Billions of type-consistent triples" -> "2 billion type-consistent triples" (also lowercase all M(m)illion, B(b)illion, etc.) - "incurs that ... are tedious" -> "makes ... tedious" - "Moreover, there is little hope to run logical reasoners ..." The authors are guilty of overreaching here. We can run logical reasoners just fine using rule-based approaches (not based on satisfiability), para-consistent approaches, repair strategies, ..., ..., ... This strawman of there being no hope of reasoning over inconsistent data is counterproductive and needs to be stamped out. Rather the authors could say something like "inconsistencies add complications ...", "care must be taken when ...", etc. - "as the KB contains many small inconsistencies" Under what logic? Under what constraints/axioms? Wikidata does not have a notion of (in)consistency to the best of my knowledge. - "pointed [to by] the ..." 
or "pointed [out] in the ..." - "All the number[s]" - Footnote 2: This is a very strange way to define a "class" (that it has a super-class). Why not define classes as the values of P31? - "that [it] is not easy" - "Potentially, this could lead to millions of entities ..." While strictly true, it is perhaps misleading in that it gives the impression that any user off the street could effect such a change. In reality, to the best of my knowledge, most of these "central" parts of the Wikidata schema are semi-protected for edits, and hence not so easily changed. My issue is that the text here makes Wikidata seem a bit more of a "free-for-all" than it actually is. - "the the" - "If the first class on the path" Unclear. - "allows [for] efficiently selecting ... and get[ting] back" - "This works per property ..." Unclear. - "URIs are converted into literals xsd:anyURI" Which URIs? This is very likely a bad idea. I'm guessing this is perhaps for external IDs, but these are still IRIs that should be represented as such. The valid use of "xsd:anyURI" in RDF is vanishingly rare (e.g., giving the namespace *string* prefix of a vocabulary, which should conform to URI syntax). - If the dumps are N-Triples, how are the RDF* data published? ================================================================================== Regarding the rebuttal, I am quite satisfied with the authors' clarifications and remarks. Just regarding the concrete example in Figure 1, I should clarify that it would be great to have an example *early* in the paper to clarify a lot of doubts the reader might have. If accepted, I encourage the authors to add the details and clarifications requested by the reviews (even if, for space reasons, some such details are rather made available on a webpage or an extended version published online, and referenced from the camera-ready version)."".
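The repair question raised in this review (when constraints are violated, are all triples removed, or are minimal repairs applied?) can be illustrated with a small sketch. Everything below is hypothetical: the data, property names, and the drop-all-violations strategy are assumptions for illustration, not YAGO 4's documented behaviour.

```python
# Hypothetical sketch: one possible repair strategy for a functional-property
# constraint -- drop every triple of a functional property whose subject has
# more than one value. Data and property names are illustrative only.
from collections import defaultdict

def repair_functional(triples, functional_props):
    """Remove all triples (s, p, o) where p is functional and s has >1 value."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in functional_props:
            values[(s, p)].add(o)
    return [
        (s, p, o) for s, p, o in triples
        if p not in functional_props or len(values[(s, p)]) == 1
    ]

triples = [
    ("alice", "birthPlace", "Paris"),
    ("alice", "birthPlace", "France"),   # second value -> violation
    ("bob", "birthPlace", "Berlin"),
    ("alice", "knows", "bob"),
]
repaired = repair_functional(triples, {"birthPlace"})
# Both of alice's birthPlace triples are removed; bob's is kept.
```

A minimal-repair variant would instead keep one of the conflicting values (e.g. the preferred-rank statement), which is exactly the design choice the review asks the authors to clarify.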
- Paper.110_Review.1 hasContent "The paper presents version 4 of the widely-used YAGO knowledge base and motivates the design choices behind the knowledge base. In contrast to previous releases, YAGO 4 combines data from Wikidata with schema.org as its upper ontology. Like previous YAGO releases, YAGO 4 focuses on data quality and logical consistency at the price of coverage. By cleansing and increasing the logical consistency of Wikidata, YAGO 4 clearly adds value and I’m sure that the knowledge base will again be widely used. YAGO 4 is clearly a suitable resource for being presented in the ESWC resource track. The text of the paper still has the following weaknesses which should be resolved for the final version: 1. You say on page 4 that you delete 26% of the Wikidata facts using your constraints. Please explain in more detail what types of facts you delete, e.g. split the 116M deleted triples by the constraint type that deleted them. 2. You mention on page 4 that you validate literals using regular expressions, but do not state for which percentage of your datatype properties you have such regexes. 3. In Section 2.2 you mention the misfit between schema.org classes and Wikidata classes and explain that you delete 12M instances due to this misfit (7.5M meta-entities). Please name the top classes to which the remaining 4.5 million entities belong, so that the reader gets an idea which classes are not covered in YAGO 4. 4. In Section 2.3 you mention that you manually map 116 relations between YAGO and Wikidata. In Section 4.1 you say that you have 116 properties. How many of these properties are relations? How many are datatype properties? How many of the datatype properties do you validate (see comment above)? Please also split the facts in Table 1 into relations and datatype properties. 5. You are not the first effort to cleanse Wikidata and represent its content using a more consistent ontology.
You mention the related work only very superficially on page 3 and only vaguely mention that your design choices lead to different strengths and limitations. Knowing more about the differences between the resulting knowledge bases is crucial for people to decide which resource to use in their projects. Thus, please add a proper related work section to your paper and discuss the differences between your KB and the related work in more detail. Please also add statistics about DBpedia-Wikidata and Wikidata itself to Table 1 so that the reader can see the impact of the different design choices. ================================================================================== Regarding the rebuttal, I am also satisfied with the authors' clarifications and their plans to address many of our comments in the final version of the paper. I believe that YAGO 4 is a really useful resource which I expect to be as widely used as its predecessors. Given this as well as the authors' willingness to improve the text of the paper, I raise my rating from weak accept to accept."".
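Comment 2 above concerns regex validation of literals. A minimal sketch of how such per-datatype validation might look; the patterns below are assumed examples for illustration, not YAGO 4's actual regexes.

```python
import re

# Illustrative only: assumed per-datatype patterns, not YAGO 4's actual ones.
PATTERNS = {
    "xsd:date": re.compile(r"-?\d{4}-\d{2}-\d{2}"),
    "xsd:gYear": re.compile(r"-?\d{4}"),
}

def valid_literal(value, datatype):
    """Accept a literal if its datatype has no pattern, or the pattern matches."""
    pattern = PATTERNS.get(datatype)
    return pattern is None or bool(pattern.fullmatch(value))

assert valid_literal("1952-03-11", "xsd:date")
assert not valid_literal("March 1952", "xsd:date")   # would be deleted
assert valid_literal("anything", "xsd:string")       # no pattern -> accepted
```

The review's point is then easy to state: the fraction of datatype properties that actually have an entry in such a pattern table determines how much of the data this check can clean.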
- Paper.110_Review.2 hasContent "Comment after Rebuttal: Thank you for the clarifying comments which resolved many of my initial doubts. However, I am still unsure about the mappings that are used for the extraction of YAGO4. The authors did not clarify the workflow of creating the mappings in their rebuttal (e.g., did only one person create those mappings; has there been peer-reviewing / crowd-sourcing / data-driven checks; ..). As the mappings are the central part of the new knowledge graph and are responsible for its correctness, this should be made very clear. As most of my doubts have now been resolved by the rebuttal of the authors, I am raising my final evaluation to "weak accept". --------------------------------------- The authors present the YAGO4 knowledge graph which combines Wikidata instances and parts of its type system with the very constrained ontology of schema.org. The YAGO4 knowledge graph is considerably large in size with almost 10K classes and up to 57M individuals as well as 326M facts. Despite its size, the main benefit of the resource is the very restrictive ontology which uses SHACL constraints to ensure that the knowledge graph is in a consistent state. In general, I very much like the idea of the paper as Wikidata is in fact not properly accessible by a reasoner. Consequently, a logically consistent version of the knowledge graph has much potential for further use. The SHACL constraints provide a nice foundation for a consistent and extensible graph. However, these constraints may also make it difficult to extend the knowledge graph further as new entities or facts have to comply with every constraint (and if the constraints are too restrictive, then a part of valid information might be excluded from the graph). But this is not a problem as it is intended that way. Although the research idea is very interesting, I think the paper/resource in its current state still has room for improvement.
Among other things, it lacks a lot of detail in several places and contains some inconsistencies. Consequently, I evaluate the submission as "borderline paper". These are the problems that I see in particular: MISSING DETAILS ------------------ 1) I am missing a more detailed description of underlying resources (schema.org, Wikidata), so that the reader can better understand how you create the new knowledge graph out of them. For me, this is more relevant than, e.g., the description of previous YAGO versions (page 1) as they don't have an immediate impact on the current version. 2) The paper has only a rather superficial comparison of YAGO4 with related knowledge graphs. It should be more apparent how YAGO4 is different from other graphs (e.g. DBpedia), especially in terms of reasoning capabilities. 3) How were the mappings in sections 2.2 and 2.3 established? And in particular, how has it been made sure that the mappings are correct? As these are a central part of your knowledge graph, this should be made clear. 4) Page 3, "Disjointnesses": How do you resolve inconsistencies that come up due to the disjointnesses during the creation of YAGO4? For example: The class "Person" is disjoint with "CreativeWork", but the resource "Peter Pan" [1] in Wikidata is an instance of both. Which one do you keep (if any)? 5) Page 4, "Functional Constraints": How do you decide which of the properties are functional? And how do you resolve inconsistencies that may come up? For the property "birthPlace", for example, Wikidata lists only the most specific place of birth, but other knowledge bases like DBpedia assign multiple places of birth of varying granularity to a person. 6) Page 4, last paragraph: Has a portion of the removed facts been inspected and were the facts all actually "wrong"? What kinds of errors are fixed by removing them? 
7) Page 10, "Applications": The fact itself that previous versions of YAGO have been used in several projects doesn't say much about the current version as it is - at least to my understanding - rather different from all its previous versions. Seeing some kind of application (e.g. reapplying the new version of YAGO to some old project, or at least showing what it can do better than Wikidata with the improved reasoning capabilities) would be really great here. INCONSISTENCIES ------------------- 8) Page 5, Section 2.2: You state that only "leaf-level classes are taken from Wikidata", but in section 3 (page 8) you say that you include all classes into YAGO4 that are sub-classes of a class that has been mapped to schema.org. How is that possible if you only take the leaf-level classes from Wikidata? Or do you mean in section 2.2 that you map the leaf-level classes of schema.org to Wikidata? 9) Page 9, Table 1: What kind of sameAs links to Wikipedia do you extract? The numbers for the "W" and "E" flavours (42M and 25M) seem a little high to me given that these flavours have only 15M and 5M instances. READABILITY -------------- 10) Section 3 is, in my opinion, really hard to digest. At the start of the section some high-level information about the complete extraction procedure (maybe with an overview figure and a couple of examples) would aid my understanding more than a low-level description of the implemented algebraic operators. This might also help understanding the description of the extraction workflow on page 8. NITPICKS (FORMATTING, TYPOS, ..) ------------------------------------- 11) lowercase (Section 2: "person" and "thing") vs. CamelCase (Section 2.1: "BioChemicalEntity", "Event",..) classes 12) The same for properties (e.g. Section 2, first paragraph: "birthDate" vs. "capitalof") 13) Different formatting of facts (Section 2.3: "wd:Q42 wdt:P31 wd:Q5" vs. "yago:Douglas_Adams rdf:type schema:Person") 14) Page 8, second bullet point: "sub-classes" vs. 
"subclasses" 15) Page 8, fourth bullet point: "subclass of" in italics but "instance of" in quotes 16) Page 4, middle: "the range of the birthPlace property" should be "the range of the birthDate property" 17) Page 1: You give four sources for YAGO but do not cite the other mentioned knowledge graphs (e.g. DBpedia, BabelNet, NELL, KnowItAll) at all [1] https://www.wikidata.org/wiki/Q107190"".
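The disjointness question in point 4 of this review (which type of "Peter Pan" to keep) presupposes detecting such violations first. A toy sketch of the detection step; the class and instance data below are illustrative, not taken from YAGO 4 or Wikidata.

```python
# Sketch of detecting disjointness violations like the "Peter Pan" case in
# point 4 above. Classes and instance data are illustrative only.
def disjointness_violations(instance_of, disjoint_pairs):
    """Return entities typed with both classes of a disjoint pair."""
    violations = {}
    for entity, classes in instance_of.items():
        for a, b in disjoint_pairs:
            if a in classes and b in classes:
                violations.setdefault(entity, []).append((a, b))
    return violations

instance_of = {
    "Peter_Pan": {"Person", "CreativeWork"},   # fictional character: both types
    "Douglas_Adams": {"Person"},
}
disjoint = [("Person", "CreativeWork")]
violations = disjointness_violations(instance_of, disjoint)
# -> {'Peter_Pan': [('Person', 'CreativeWork')]}
```

Detection is the easy part; the review's actual question is the resolution policy, i.e. which of the two conflicting types (if either) the extraction pipeline keeps.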
- Paper.111_Review.0 hasContent "I acknowledge the rebuttal answers by the authors, though I am not fully happy about those. I am still not fully convinced about the relevance of this paper for the track. I also think that the evaluation results are not fully convincing and mature. These considerations preserve my initial score of weak accept. ------------------- This paper presents the results of an evaluation of the CoModIDE tool for graphical modular ontology engineering in an experiment with 21 subjects carrying out some typical (simple) modeling tasks. Having noted that, I believe that the paper does not fully fit into the scope of the Ontology and Reasoning track of ESWC. Overall, the paper is well written and structured. The methodology used in the presented research is mature and well presented. The data collected in the evaluation exercise is presented and discussed in fully sufficient detail. I cannot, however, say that all the results, as interpreted and presented in the discussion, including some generalizations, are well supported and convincing. My major concerns are: - Chosen evaluation measures and the size of the subject pool: The paper states that the tools were compared based on the task completion time (with the threshold of 20 min) and the correctness of the subjects' outputs. While reading Section 5.2, I wondered why completion time has been selected as a measure at all. It was not a quick chess tournament, I believe. A reasonably convincing argument needs to be given in support of this experiment design choice. Also, why has the 20-minute point been chosen as a threshold? I got the answer to my completion time question only two pages later - in Fig. 5. Still, it has not been made clear why a 20-minute slot to completion was chosen. As for Completeness, I feared that this approach is too simplistic and coarse, as presented.
Furthermore, I felt that there was a high risk that the results would be under-representative given the quite small pool of subjects (21 people). Generalization: In the conclusion to Section 6.2, the authors state that they "... believe that our results are generalizable, due to the strength of the statistical significance". I think that this statement is premature. A larger-scale experiment, e.g. via crowd-sourcing, is needed before you may state any generalization of a statistical sort. In my opinion, the pool of subjects is too small to be considered statistically representative. Some comments on the free-text responses and future work: Graph layout: This is so common for all diagram editors that you might have taken it for granted before asking. The reason for the use of the force-directed graph layout in VOWL is, most probably, a way to build better layouts. This is not perfect - as you described in the related work section. However, having nothing for improving the layout is not good. Future work: The presentation in Section 7.1 is the future work on the tool, but not on its evaluation. However, the paper focuses on the evaluation. So, is this (evaluation) line of research fully accomplished? There are several imperfections in the text and figures in terms of readability, reference numbering, and English. Examples: - References need to have consecutive numbering in the order of their citation appearance in the text. Hence, [17] cannot be the first citation. This is one of the indicators of a quick reshuffling of a previously written manuscript, btw. - P.2: "we have formulated for the following hypotheses" - remove "for" - P.4: "... provides an a graphical overview of an ontology's structure ..." - to be: a graphical overview of an ontology structure - P.4: "We also invite the reader to download and install CoModIDE themselves ..." - I would remove "themselves" - Fig.
1: text in the figure is not readable - Acronyms: acronyms have to be given in full at the place of their first appearance. Please check OPLa in p.5. Does OPLa stand for an Office of the Principal Legal Advisor? - P.7: "...the time taken to complete each tasks" - each task - P.8: "... we provided a 10 minute tutorial ..." - I would rather call it a briefing. You do not teach, but put people into the context of the activity and instruct. - P.9: "... quick and dirty ..." - please avoid emotional and colloquial expressions - Fig. 5: Please put H1 and H2 in the same tense. To conclude, I believe that the paper presents ongoing research and better fits into a relevant workshop as a work-in-progress report."".
- Paper.111_Review.1 hasContent "The paper presents and evaluates a Protégé plug-in called CoModIDE, developed by the authors, for visual and graphical ontology engineering. First, the authors present the plug-in and the modular ontology design process, as well as related work, especially ontology design patterns. In the main part of the paper, the authors describe a user-based evaluation of the developed tool. CoModIDE was already presented in a previous publication. This new submission expands on the description of the tool and provides a user-study based evaluation, which is, overall, sound and well described. The evaluation shows that the participants prefer the new plug-in over standard Protégé. Overall, this is a highly relevant and interesting submission that I definitely recommend to be presented at ESWC 2020. The submission is written in a very clear and descriptive way but there are a few typos and unusual constructions. If at all possible and feasible, it would be good to have the paper checked by a native speaker."".
- Paper.111_Review.2 hasContent "Update: Dear authors, thank you very much for answering all my open questions in a more than sufficient way. I agree with all of your points. I also think you have sufficiently answered the points raised by the other reviewers. This paper presents CoModIDE, a Protégé-based GUI for designing ontologies in a modular way, using ontology design patterns (ODPs). Section 2 presents high-level requirements and the features of the GUI. Section 3 points out limitations of related visual modelling interfaces and provides an introduction to ODPs. Section 4 presents in detail the method of the study that constitutes the main contribution of this paper: a user evaluation that leads to the findings that CoModIDE can be used more effectively and more efficiently than Protégé and leads to a higher user satisfaction. The tasks that users are requested to accomplish are made up but realistic. In Section 5, a comprehensive set of conclusions is drawn from a detailed statistical analysis of the study observations. Section 6 discusses the findings more broadly, also including the qualitative observations. In particular, the paper leaves no doubt that CoModIDE answers the research questions positively. Furthermore, the software is available for download and testing and comes with good documentation, and the research data are also available. Minor issues (see https://www.dropbox.com/s/60pvq18me25ca4s/eswc2020_paper_111.pdf?dl=0 for details): * Related graphical notations and editors for ontologies are covered in "Related Work", but Section 2.1 suggests that they do not exist. OK, they are not standardized, but still widely used. Also, VOWL indeed has a force-directed structure, but at least one can pin nodes. * If by "subsumption hierarchy" you mean rdfs:subClassOf, why is this an "advanced construct"? * When a pattern is applied in an ontology, what does it mean that "the IRIs of its constructs are updated with the target ontology namespace"?
To my understanding patterns hardly contain concrete IRIs, but rather placeholders to be instantiated with concrete IRIs. * The statement "This work observes that restrictions are easier to understand in a notation where they are displayed coupled to the types they apply to, rather than the relations they range over" makes me wonder to what extent this finding (about EER and UML) is applicable to OWL property restrictions. It would be nice if you could discuss this. * Why do you ask CV3 "I am familiar with Manchester Syntax"? If you asked this question because Protégé uses Manchester Syntax on its UI, then maybe many people were actually familiar with Protégé's UI syntax, but not aware that its name is "Manchester Syntax"? * The expected solution for Task B realizes explicit typing by a "hasType" property, whereas this choice is debatable and the same could also have been realized by introducing subclasses of Apparatus. Please justify your choice, e.g., by pointing to some ODP literature. * The in-depth statistical analysis is a strength of this paper, but not all readers are fully familiar with this background. Please explain for non-statisticians what you mean by "our limited sample size is not amenable to partitioning". * The conclusion says that "the answers to our a posteriori survey questions on this matter proved inconclusive" – where exactly can this be seen?"".
- Paper.116_Review.0 hasContent "After rebuttal: I thank the authors for their response. I decided to keep my score (accept). ****************** This paper describes an approach for generating a knowledge graph of software mentions in scientific papers from the social sciences. The approach includes disambiguation and enrichment using DBpedia and Wikidata. The evaluation shows that the approach has an F-score of 0.82 for detecting software mentions in the corpora. The paper is well written, easy to follow and highly relevant for the conference and track. I believe this is an important topic to measure both the impact of software and to properly credit authors for their work, and it's great to see that both the code used and the resultant knowledge graph are available online with examples to explore (even if the readme of the code still needs work to be reusable). In addition, the authors are straightforward with the limitations of the approach, which is very useful when comparing and assessing it for reuse. Therefore I think this paper should be accepted at ESWC 2020. I list below some comments, suggestions and questions that would be great to see addressed in the camera-ready version of the paper. - Given that some manual rules are needed for the approach, how dependent is the approach on the chosen domain? - The authors acknowledge that the graph has errors. However, there is no comment on how these errors would be fixed when detected by users. Is there a plan for a feedback mechanism? - The authors state that one benefit of the approach is for proper attribution to authors and citation. I don't see the difference between them; aren't we attributing the authors by properly citing their work? Maybe the authors of the paper are referring to tracking the impact of software? - The precision obtained for the SSC is in most cases very low.
Training with SSC with distant supervision does not really add much to the precision, which I guess is what the consumers of the KG will mostly care about. I would have liked to see some discussion on whether the extra effort is really worth the gain in those cases. - In the evaluation, the comparison against the state of the art is not really fair, because they used a different corpus and domain, although it's informative. Why not compare against simple classifiers as baselines? For example, a TF-IDF + binary classifier on whether a sentence is a software mention or not would have been easy to do with GSC. It would not tell you which software was mentioned, but it may have been a good alternative to SSC. - I am a little confused about using "String" as a class in the data model. String is usually a data type, and having it as a class does not sound right. It looks redundant to have a mention which then refers to a software, and I can think of a few alternatives that would produce a cleaner data model (specifically for querying): - 1) Extend schema:mentions with skg:mentionsSoftware (domain skg:SoftwareArticle, range skg:SoftwareApplication, both classes extensions of their respective schema.org classes). That way you can have a direct link between paper and software. - 2) Instead of String, call the class skg:SoftwareMention; it will be less confusing for users. - I would suggest the authors look at codemeta.org, an extension of schema.org for scientific software that includes some of the terms proposed by the authors to describe software. - Schema.org has the class SoftwareSourceCode, so the information about the repositories could be linked as well. - Content negotiation on the vocabulary (skg) does not work. I tried: 'curl -sH "accept:application/rdf+xml" https://data.gesis.org/softwarekg -L' with text/turtle and application/rdf+xml. In both cases, only HTML is returned. This means I cannot import this vocabulary in my application.
I didn't find a link to download the rdfs/owl file of the data model in the documentation. Since the paper does not emphasize the vocabulary as a contribution, I will not penalize this in my review, but I still think it should be addressed."".
- Paper.120_Review.0 hasContent "This paper proposes a new rule mining system called AMIE 3, which is an extension of AMIE. In order to find rules quickly, the authors propose to modify some of AMIE's heuristics and to introduce parallel processing. They then compare AMIE 3 with other systems in terms of computation time on several datasets. Although the computation time is shorter than that of the other systems, the authors do not evaluate the quality of the rules extracted by the proposed method. This is a fatal problem of the paper. The paper does not include an evaluation of the extracted rules. If the rules are not useful, the new algorithm is not meaningful even if we can cut the computational costs. Therefore, an evaluation of the rules is mandatory. In addition to the above point, there is some discussion about a problem of rules extracted by AMIE on the link prediction problem [1]. Therefore we have to carefully re-design support and confidence to find rules for actual use. The paper does not discuss this. Minor comment: When we propose an algorithm to reduce computational costs, we usually provide an analysis of the asymptotic order of the computational cost. P4. “that that” -> “that” [1] Takuma Ebisu, Ryutaro Ichise: Graph Pattern Entity Ranking Model for Knowledge Graph Completion. NAACL-HLT (1) 2019: 988-997"".
- Paper.120_Review.2 hasContent "Rule mining is an important problem in the Semantic Web and it has received considerable attention from this research community. AMIE has been an influential system and it is often used as a baseline in more recent works. This paper proposes a new version of this engine, explaining engineering and heuristic-driven techniques that improved its runtime by up to 15x. Speeding up the runtime is important to allow the application of this technique to the largest KGs, but the proposed solution does not alter the original algorithm but instead suggests "tricks" to improve the execution. The first two "tricks" are interesting, but the third one is simple (parallelization) while the fourth one (dictionary encoding) is a standard practice in many other scenarios (like SPARQL answering). Because of this, after reading the paper, I was somewhat disappointed with the novelty of this work. However, I do value the system contribution, and the evaluation is comprehensive and convincing. Therefore, I remain positive about this paper. If the other reviewers are also concerned about the novelty, then one possibility could be to move this paper to the resource track. I think it's a better fit there. I have more suggestions for improving this paper: 1) Add more citations, especially because you have the space for them. There are other works that perform fact-checking with rules. Also works on dictionary encoding should be cited. 2) Wikidata 2019 was used in only one experiment. It would be nice to include this dataset also for the other experiments, given its large size. 3) Consider tidying up the formal explanation of your method. For instance, - is 'x' in livesIn(x,Berlin) (page 3) a variable or a constant? - Instead of writing $|s : \exists o : r(s,o)|$ why not write $|\{ s : r(s,o) \in K\}|$? - Substitutions are partial mappings - From your definition of rules I did not understand if they are safe or not (every variable in the head appears in the body?).
- what's p^-? - in some cases p is used as predicate, in others as fact. This is confusing. ---- I read the rebuttal and I confirm my score."".
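The notational issues this reviewer raises around support counts (expressions like $|\{s : \exists o : r(s,o)\}|$) can be made concrete with a toy example. The KB and rule below are made up, and the standard-confidence definition shown is the common simplified AMIE-style formulation, not necessarily the paper's exact definition (AMIE's PCA confidence is more involved).

```python
# Toy sketch of rule support and (standard) confidence, AMIE-style.
# KB and rule are illustrative; not taken from the paper under review.
kb = {
    ("alice", "bornIn", "Berlin"),
    ("alice", "livesIn", "Berlin"),
    ("bob", "bornIn", "Paris"),
    ("bob", "livesIn", "London"),
    ("carol", "bornIn", "Berlin"),
    ("carol", "livesIn", "Berlin"),
}

def support_and_confidence(body_pred, head_pred):
    """Rule: head_pred(x, y) <= body_pred(x, y).
    support = #(x, y) satisfying body and head; confidence = support / #body."""
    body = {(s, o) for s, p, o in kb if p == body_pred}
    head = {(s, o) for s, p, o in kb if p == head_pred}
    support = len(body & head)
    return support, support / len(body)

sup, conf = support_and_confidence("bornIn", "livesIn")
# sup == 2 (alice, carol); conf == 2/3 (bob was born in Paris but lives in London)
```

Written as set cardinalities, this is support $= |\{(x,y) : \mathit{bornIn}(x,y) \wedge \mathit{livesIn}(x,y)\}|$ and confidence $=$ support $/ \, |\{(x,y) : \mathit{bornIn}(x,y)\}|$, which is the kind of unambiguous notation the reviewer asks for.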
- Paper.136_Review.0 hasContent "This paper presents StreamPipes Connect, a system that targets domain experts to facilitate the process of connecting new Industrial IoT time-series data sources and harmonizing data at the edge with an intuitive GUI. The system uses a master/worker paradigm where the master manages and controls all distributed workers (containers) at the edge for pre-processing data. The proposed system is evaluated by a user study with respect to its usability and user experience, and its performance. Results show that the system provides good usability for non-technical users (domain experts) and good user experience. Overall, the paper is well structured and easy to read. Some questions and comments --------------------------- - Section 3: Which triple store does the system use, and are there specific reasons or benefits for choosing that one? - Section 6.3: It would be better to make clear whether the reported usage refers to StreamPipes as a whole, including the StreamPipes Connect proposed in this paper, or not. If I understand correctly, the usage of StreamPipes Connect alone is difficult to measure since it is distributed as part of StreamPipes? - It would be great to discuss and share some of the challenges or lessons learned associated with the use of semantic technologies for the system. Minor things --------------------------- - Section 4.3 (P8): url for qudt ontology should be here instead of P11 - Section 5.3 (P11): Users able to refine - Section 6.2 (P14): all with a different lengths After rebuttal --------------------------- Thank you to the authors for clarifying the comments. I think the work is a good fit for the in-use track for sharing the experience of building practical tools using semantic technologies."".
- Paper.136_Review.1 hasContent "The paper describes the "StreamPipes Connect" tool. The tool provides different connectors to ease the ingestion and harmonization of heterogeneous sensor sources. The tool is available within the Apache StreamPipes framework. The paper is well-written and easy to follow. The tool and its components are described clearly. Moreover, the tool is already being adopted. The paper also includes an evaluation of the tool, which indicates its usefulness, especially for less technical users. A number of semantic IoT connectors have been suggested in recent years, for instance within the GraphofThings middleware and the BIG IoT API. The authors could add those to the references. I believe the novelty of the current paper is not in the mapping itself but rather in the lightweight edge pre-processing. The set of currently supported transformations is given in the paper. The set can be extended, but it's not clear if end users can actually suggest new transformation rules. I would clarify that in the text. A small performance evaluation is presented but I feel that it could be extended. In particular, I would be interested in seeing how different transformation pipelines perform. The current results show only the transformation rules in isolation. -------- Authors' response I acknowledge the response provided by the authors and I think they clarify most of the comments. I think there is an interesting contribution in the paper, but it's somewhat small given the other existing systems."".
- Paper.136_Review.2 hasContent "The paper introduces a tool, StreamPipes, designed to let users create data stream processing pipelines across different data sources. Semantic Technologies are used to power the adapters that connect to each data source. It is an interesting piece of work and a sound paper, but to better fit ESWC I would have liked to see more information about the specific challenges/motivations/value of using Semantic Web technologies here. In particular: * There is no indication of how identifiers can be consistently minted and re-used across sensors. If this is not an issue because no triples are created to connect across sensors, then where is the "Linked" of the linked data? * Following up on that, if the motivation for using JSON-LD and a triple store is the vocabulary (rather than linking resources), then I would argue the vocabulary presented in Listing 1.1 is very shallow semantics-wise. It does not seem that JSON-LD brings much value there. * And lastly, this issue extends to the rules which are specified and apparently then processed outside of the semantic stack. Why not leverage OWL and the triple store to express and process this? * In terms of related work I would have expected to see a note on DataCube at least. And possibly another on PROV-O. A reference and comparison to https://dl.acm.org/doi/10.1145/1526709.1526788 would also make sense. Lastly, and more broadly, it seems the authors missed the work on semantically enriched IoT done in the context of the project "SpitFire" by Manfred Hauswirth and team. * Next to the interesting user study that validates the pipes system itself, the authors could also have considered adding an ablation study on the impact of taking the triple store and JSON-LD out of the picture. ------------------------ Update after rebuttal ------------------------ I would like to thank the authors for the responses to my questions and those of the other reviewers.
In the light of these (and trusting that some of those answers will make it through the final version of the paper), I'd like to raise my overall score for the paper and recommend its publication."".
- Paper.138_Review.0 hasContent "The paper is a continuation of the work presented at ISWC 2019 related to the application of graph embeddings for the detection of synonyms in RDF graphs (citation [14] in the paper). The authors expand on that work and develop a novel approach by relying on rule mining techniques, particularly the AMIE+ rule miner - citation [7]. The main argument for adopting a strategy based on the generation of declarative statements (Horn clauses) is the explainability of the output, which better fits a workflow where humans may want to intervene and verify the quality of the output, for example in a knowledge graph maintenance activity. I strongly believe these types of techniques to be better, in general, than black-box algorithms when introduced in a human-centred workflow. This is why I think the paper suffers from a lack of discussion of the pragmatic value of the method. The examples shown are quite obvious and could be found without the need to compute Horn rules for 10 hours! How many synonym pairs were detected on DBpedia? 5? 10? 50? Also, how many of them could have been found without the support of this approach? I think these questions are crucial for demonstrating the value of the approach, and answers must be in the paper. Moreover, the evaluation is performed by analysing the precision @ 500, but then only the first 3 rules from 1 experiment are shown. A longer list of examples would certainly help in understanding the relevance of the contribution. The evaluation compares the approach with the previous results (in a good set of variants) and a more naive baseline based on closed object sets (assuming that properties that have the same type of objects are most probably synonyms). The results are very positive and demonstrate the benefit of the approach. However, the techniques are not particularly innovative, and it is quite surprising that these types of experiments come after the ones based on graph embeddings. 
The paper is well written and clear in most of its parts. A depiction of the general workflow would have helped. ---- I thank the authors for their clarifications, which resolve my concerns related to motivation and impact. I increased my score but ask the authors to perform the related changes for the camera-ready if accepted."".
- Paper.138_Review.1 hasContent "The paper describes a rule-based approach for detecting equivalent or synonymous properties in a KG. While this approach gives promising results compared to embedding approaches, the experiments with only two datasets, DBpedia and Wikidata, raise concerns about the applicability and scalability of this approach on different datasets. Since this approach is data-driven, more experiments are expected with real-world datasets having the problem, and an in-depth discussion of the computational complexity and hardware requirements would be desirable. Positive: - The rule-based approach for finding synonymous properties is well-presented and easy to follow. - The results on DBpedia are comparable to the best embedding approach, HolE. - The rule-based approach offers explanations over embedding approaches. - The datasets and source code are provided. Negative: - The previous work by the same authors had performed experiments with Freebase. However, they do not include it in this paper for comparison. What is the rationale behind not providing the experiments with Freebase? Is it not implemented? Or is the result not good? Since this approach is data-driven, we expect to see experiments on more diverse datasets. - DBpedia is the only dataset with a real problem, and the proposed approach gives comparable performance, not superior to HolE. - Wikidata doesn’t have synonymous properties, and the authors introduced synthetic synonyms into it. It would be more convincing to find datasets having real problems than to create a “fake” one, especially for a data-driven approach. - While the authors state that their approach is scalable, we do not see any analytical discussion of the scalability. Instead, the evaluation was performed on sampled datasets with reduced sizes of ~11-12M triples. - Scalability. 
The authors claim that both KG embedding and rule mining have problems with state-of-the-art hardware but never state which hardware they are using in their experiments. Both GPUs and RAM are getting cheaper and cheaper, and cloud options are also available. - What is the computational complexity of the rule mining process? How big are the joins? === After rebuttal ==== Thanks to the authors for addressing my concerns."".
- Paper.138_Review.2 hasContent "In this paper, the authors aim at detecting what they call synonymous properties in large knowledge graphs. They consider that two properties are synonymous if they share the same (formal) definition, which means in their case that the two properties are defined with two conjunctions of properties that match at least partially. This paper extends previous work dedicated to learning property definitions from large knowledge graphs using techniques like rules, frequent item sets or knowledge graph embeddings. The paper states very clearly the objectives of this work and the related work that was used to mine relations, as well as contrastive approaches like ontology matching. A very nice and relevant example provides very clear illustrations of the phenomenon to be captured, of the formal definitions that are mined, as well as of the support of each rule (given the triples that contain the relation). The authors propose the RuleAlign approach as a way to align relations, here in the same knowledge graph. The rule mining technique relies on mining property definitions using rule induction on definitions that are turned into Horn clauses. The evaluation of the confidence and support of each rule is a means to select the learned rules among all possible rules. The relations are matched using these definitions. Two relations are said to be synonymous when they refer to (almost) the same conjunction of relations. The evaluation of the approach compares 6 embedding implementations with RuleAlign and frequent item set algorithms as a baseline. The dataset to be mined is DBpedia. Several baselines are manually evaluated, with precision at k, with k going up to 500. Results show that RuleAlign outperforms other implementations, with results very close to those obtained with the best embedding solution (using HolE). The advantage of RuleAlign is that synonymy of relations is "explained" thanks to their definition. 
The paper as well as the results are of high quality. The paper is clear, well structured and well written. The state of the art is relevant. It is a nice contribution to knowledge graph exploitation. At the end of section 4, it would be nice to give a synthetic view of your approach, in the form of a kind of algorithm, of the process carried out by RuleAlign. _____ after the rebuttal phase _____ I thank the authors for their answers to the comments and requests of the reviewers. I hope that the final version of the paper will integrate the suggested changes."".
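The support/confidence selection that the three reviews above refer to can be illustrated with a minimal sketch of AMIE-style rule statistics. This is not the authors' RuleAlign code; the property names, toy triples, and the bidirectional-confidence criterion for synonym candidates are illustrative assumptions:

```python
# Minimal sketch of rule support/confidence for detecting synonymous
# properties, in the spirit of AMIE-style rule mining. Property names
# and thresholds are illustrative, not from the reviewed paper.

triples = {
    ("dbo:birthPlace", "Alice", "Berlin"),
    ("dbp:placeOfBirth", "Alice", "Berlin"),
    ("dbo:birthPlace", "Bob", "Paris"),
    ("dbp:placeOfBirth", "Bob", "Paris"),
    ("dbo:birthPlace", "Carol", "Rome"),
}

def pairs(prop):
    """Subject-object pairs for which `prop` holds in the graph."""
    return {(s, o) for (p, s, o) in triples if p == prop}

def rule_stats(head, body):
    """Support and confidence of the Horn rule body(x,y) => head(x,y)."""
    h, b = pairs(head), pairs(body)
    support = len(h & b)                      # pairs satisfying both sides
    confidence = support / len(b) if b else 0.0
    return support, confidence

# Two properties are synonym candidates if the rule holds with high
# confidence in both directions.
s1, c1 = rule_stats("dbo:birthPlace", "dbp:placeOfBirth")
s2, c2 = rule_stats("dbp:placeOfBirth", "dbo:birthPlace")
print(s1, c1)  # 2 1.0: every dbp:placeOfBirth pair is also a dbo:birthPlace pair
print(s2, c2)  # support 2, confidence 2/3 (Carol has no dbp:placeOfBirth)
```

The explainability argument made in the reviews is visible even here: a flagged pair comes with the rule and its support, rather than an opaque embedding similarity score.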
- Paper.150_Review.0 hasContent "This paper presents a method for recommending new properties to Wikidata editors that can be added to enrich existing Wikidata entities. The method is designed to be effective (provide relevant properties) and efficient (in terms of computational resources). It uses a trie-based representation of the property space to efficiently calculate support for rules that are then used to recommend new properties. The paper furthermore presents various "backoff" strategies to increase the performance. The evaluation compares the method to the currently implemented Wikidata property recommender and compares different backoff variants. The paper presents a clear research problem, improving a service for a crucial and central semantic resource. The method itself is an interesting and novel combination of existing trie methods to order properties in large knowledge graphs. The evaluation is extensive and well-described. The paper itself is well-written and easy to follow. A few points of concern: - In section 3, the recommendation task is defined to predict a relevant property, given existing properties of an item. This is a narrow definition of the task, since it would make sense that not only properties, but also values, could be considered. In fact, this is mentioned in the second-to-last sentence of the paper as future work. I appreciate that this is at least addressed, but I suggest doing this in section 3 already and explaining why this was not chosen in this work. - The evaluation only presents statistical comparisons between the different variants and to the existing recommender. What would clearly be an interesting extension is a human evaluation of the recommender. Are the suggested properties appreciated and/or used by the editors? This is more of a suggestion for future work, which can be addressed in this paper, especially as this paper was submitted to the Social and Human Aspects of the Semantic Web Track. 
All in all, the user is missing from this paper. - Related to this, even though multiple metrics are used, what is missing is some insight into the type of properties that *are* recommended in one version and not in another. This would give more insight into the behaviour of the method. - I am also missing some discussion about the applicability of the method to other knowledge graphs. The method is now tested on one specific one, but could be applied to arbitrary (RDF) knowledge graphs, as is claimed by the authors. It is now unclear what properties of Wikidata make this approach work. It might be the case that a specific distribution of properties is needed for this method. A second experiment on other graphs would increase the impact of the work. Overall, I think that although the research contribution is limited, the work is interesting and well-described, with a good evaluation. typos: p2 conclouding -> concluding p3 realative -> relative p14 more then -> more than ** After rebuttal: I'd like to thank the authors for their responses. I understand some of the choices that were made, as explained in the rebuttal. Overall, I still think that the paper presents an interesting method and is well-written. However, I still feel that for this track, the lack of users/humans is problematic, as well as the lack of research contribution. I will keep my score."".
- Paper.150_Review.1 hasContent "This paper presents a new approach to property recommendation for entities in Wikidata by using a tree-based data structure to store the support of frequent property sets. The proposed approach is evaluated against the current property recommender used for Wikidata and performs better on several metrics. Strengths 1) The paper presents a novel approach to property recommendation in knowledge bases such as Wikidata. 2) The SchemaTree recommendation algorithm outperforms the PropertySuggester. 3) The paper is well-written with minor grammar issues. Weaknesses 1) Although novel, the significance of the proposed approach looks limited given the incremental improvement (4%-6%) in results against the baseline. 2) It is unclear why the proposed approach was not compared with Balaraman et al. and Dessi & Atzori. 3) The presentation style of the paper can be improved by using consistent terminology (property, attribute, or predicate). ** After rebuttal: I thank the authors for their responses to reviewers' concerns. After going through the other reviews and the authors' responses I still think that the approach is novel; however, I will keep my score as it is due to the limitations of the evaluation and the significance of the research contribution."".
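The property-recommendation task discussed in the two reviews above can be sketched naively as ranking candidate properties by their support among items that already contain the observed property set. This deliberately omits the trie-based (SchemaTree) support counting and the backoff strategies that are the paper's actual contribution; the Wikidata-style property IDs and item data are made up:

```python
# Naive sketch of association-rule property recommendation: given the
# properties already on an item, rank candidates by co-occurrence.
# Illustrative only; the reviewed paper's contribution is doing this
# support counting efficiently with a trie (SchemaTree) plus backoffs.
from collections import Counter

items = [  # property sets of hypothetical existing items
    {"P31", "P21", "P569"},
    {"P31", "P21", "P569", "P570"},
    {"P31", "P21", "P106"},
    {"P31", "P569", "P106"},
]

def recommend(observed, k=2):
    """Rank properties by support among items containing all of `observed`."""
    matching = [ps for ps in items if observed <= ps]
    counts = Counter(p for ps in matching for p in ps if p not in observed)
    return [p for p, _ in counts.most_common(k)]

print(recommend({"P31", "P21"}))  # 'P569' ranks first (support 2 of 3 matches)
```

The scalability problem is also visible here: the naive version scans every item per query, which is exactly what a precomputed trie over frequent property sets avoids.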
- Paper.155_Review.0 hasContent "== Paper summary == This paper proposes a method to compute aggregate queries over SPARQL endpoints that support web preemption. Web preemption was proposed in previous work by the authors as a way for a server to pause a long running query, requiring the client to make a new request to continue executing the query; this technique allows for servers to control the cost of executing individual requests without having to terminate a query after a timeout, allowing for a more fair scheduling of client requests and also the ability to run more expensive queries using multiple requests. While previous work looked at features such as BGPs with union, where the client can simply take the (bag) union of results over different requests, this work focuses specifically on the case of aggregation, where the final results must be grouped and the aggregation function applied. However, rather than gathering all results and applying aggregation at the end, the authors propose a technique whereby the server returns the results of an intermediary aggregation over batches of partial results (computed within a single request), where these intermediate aggregations are themselves aggregated on the client across requests to compute the final result. For example, an average is computed by computing a sum and a count of the partial results of a request on the server, returning these to the client, who will subsequently divide the total sum and total count of all such pairs over all requests to get the final result; in the context of web preemption, such a technique can reduce the data to be transferred from the server to the client. These aggregation techniques under preemption are described in the paper for SUM, COUNT, MAX, MIN, AVERAGE, SUM DISTINCT, COUNT DISTINCT and AVERAGE DISTINCT, with the latter four aggregate functions being the most challenging (not being associative/commutative). 
Experiments are provided over BSBM (the Berlin SPARQL Benchmark) and DBpedia using an existing set of SPARQL aggregate queries (proposed in another work to compute VoID descriptions over endpoints). Comparisons are made between the authors' web preemption framework without the proposed optimisations (SaGe), with the proposed optimisations (SaGe-AGG), as well as a TPF client and Virtuoso. The results confirm some expected trends: Virtuoso is faster overall and transfers less data (but does not offer preemption), TPF performs the worst (not being able to do joins server-side), while SaGe-AGG performs better than SaGe, reducing the amount of data to be transferred and processed (particularly in the simpler non-distinct cases). The paper is highly relevant to the ESWC conference, and though it could have been submitted to a number of tracks, the choice of "Distribution and Decentralization" seems a good one. In terms of strengths and weaknesses … == Strengths == S1: I like the idea of web preemption. It is a relatively simple but practical idea that does raise some interesting technical questions. Though the idea was originally proposed in another paper, the optimisation of aggregate queries proposed in the current submission does seem to be a good topic for a paper at ESWC, and a step towards making preemption more applicable in practice. In summary, I really could see this line of work finding applications in SPARQL endpoints in practice. S2: The informal discussion in the paper is quite easy to follow. The approach is well motivated, existing literature is discussed adequately (to the best of my knowledge), etc. S3: The proposed approach, though quite straightforward, makes a lot of sense, and does lead to significant speed-ups. 
S4: Though the experiments could be more comprehensive (and some more details could be added, discussed later), in general I think the experiments that are presented are sufficient to get a good overall idea of the performance of the system in practice, with respect to an existing set of aggregation queries, two different datasets, and four different strategies. The discussion of results and subsequent conclusions also seem appropriate, address limitations of the current work, and are easy to follow. == Weaknesses == W1: While in S2 I mentioned that the informal discussion in the paper is quite easy to follow, the main weakness I detect in this paper is with respect to the formal definitions, which are (unfortunately) sloppy throughout the paper. This sloppiness makes the formal parts of the paper quite frustrating to read and understand. In general, I think I would not have understood what the authors were trying to define here were it not for the fact that the technique is relatively straightforward and I knew a priori (from frameworks such as Spark) how such aggregations can be decomposed in batches. I will try to be specific so these aspects can be improved. (I hope this is not interpreted as nitpicking but each issue listed resulted in real confusion for me and, taken together, affect the technical quality of the paper): - General: Throughout the paper, the authors add unnecessary quantifications on free variables in the set-builder notation. For example, they write { (x,y) | \forall x some condition holds and \exists y for which some other condition holds }. This is incorrect and really quite confusing. The variables on the left of the "such that" | are already universally quantified! The right hand side should rather define a boolean condition (predicate) on the variables appearing on the left-hand side, e.g., { (x,y) | some condition holds on x and y }. 
(Also the notation will often become a lot simpler defining the domain of the variable(s) on the left; e.g., rather than { x | x \in X and ... }, one can more simply write { x \in X | ... }.) - Definition 1: Compatibility was defined for a single variable, while it is used here for a set. Only knowing beforehand what I know can I guess that all variables in the set (rather than any) need to be compatible. This should be defined beforehand. - Definition 1: The projection operator is not introduced as part of the query language previously; I also dislike that while P is defined using syntax ("AND", "OPT", "UNION"), the authors here mix syntax and relational operators (like \pi). - Definition 2: While \mu is defined earlier as a partial mapping from variables to terms, here it is unioned with a single term. Being more specific, the authors do not associate the results of the aggregation function with a variable, which is inconsistent with the definition of \mu and incompatible with later definitions. - "For mapping-at-a-time operators, ... is bounded by O(|Q| x log_b(|D|)). What is D? (I guess the RDF graph?) What is b? (Actually b can just be removed since changing base leaves a constant factor.) What does it mean to suspend and resume a query, precisely? What happens to the iterators and the programme state? Is the time considered for resumption that taken to return the first result? What are the number of operators required to evaluate Q? Is this the same as the number of operators in Q? (These were not defined precisely.) Such a specific bound seems to require more details to make any sense; without such details, I don’t see the point of stating such a bound. - Can a "full mapping operator" also be a "mapping-at-a-time operator"? I think the intent is that these are disjoint sets of operators, but the authors define that a full mapping operator must serialise all results, which seems to be the case for other queries, like JOIN. 
I guess they mean that that operator must be applied after all intermediate results are materialised to compute the final results? Perhaps this just requires a clearer way to phrase the text. - "However, the operator used to evaluate SPARQL aggregation is a full-mapping operator ... hence it cannot be suspended and resumed in constant time." Why does this implication hold? Why is constant time important here? (Even mapping-at-a-time operators are not suspended and resumed in constant time.) How is constant time defined (in data complexity or combined complexity)? - I have the same doubts about the core problem statement. I don't understand what precisely the suspend-and-resume task consists of, or what precisely is meant by "constant time". - Footnote 6: I don't follow. Does this assume no unbound (group) variables? I assume so but then, how can the number of group keys be greater than the number of intermediate solutions? How does having all variables in the group condition affect this property? Perhaps I misunderstood something. - Definition 3: Again, the quantification in the text is strange, mixing existential (if for *some* grouping variables ...) and universal (and for *all* non-empty multisets ...) in a way that I cannot wrap my head around. Is it not necessary to define that \Omega_1 and \Omega_2 have the same set of variables, and that V is a (non-empty) subset of that set? - Definition 3: I found the k \mapsto v notation very confusing as I don't know if k is the key, or k is an aggregation variable. Previous definitions used <k,\Omega> for grouped keys (not \mapsto), and definition 2 dropped variables for the results of aggregation functions; only from the examples later can I guess that k is a variable here. (Also why state k = k' rather than just use k in both instances?) - Definition 3: This does not seem to work, in my understanding, if a key is only in \Omega_1 or \Omega_2, in which case the key is dropped. 
- "CT(X) = { x | x \in X }" In other words, CT(X) = X? - "is decomposed as ..." This seems incorrect. Won't this compute the distinct values for V in \Omega_1 summed with the distinct values for V in \Omega_2? Take V = { ?a }, \gamma(V,CT(?o),\Omega_1) = { (:A,1) } and \gamma(V,CT(?o),\Omega_2) = { (:A,3) }. The union gives { (:A,1) , (:A,3) }. The count gives 2 rather than { (:A,3) } or { (:A,4) } as might be expected. - Definition 4: "and \omega_i \in [[P]]_G such that [[P]]_G = \bigcup_{i=1}^{i=n} \omega_i" This seems incorrect. If \omega_i in [[P]]_G, then it must be an element of [[P]]_G; if [[P]]_G = \bigcup_{i=1}^{i=n} \omega_i, then omega_i must be a subset of [[P]]_G. The only way this might be consistent is if [[P]]_G contains subsets of itself, which is probably not the intention. - Algorithm 1: what happens in Merge if Y contains a grouping key not seen previously in X? It seems that such a key should be added as an initial value, or combined with a zero value. Without this process, does this algorithm ever add anything to Z? - "on a single set of mappings, which can be done in constant time" Should this not be "a single mapping"? Again the question arises: "constant time" with respect to what? It does not seem constant in the number of variables, and merge (called from the non-interruptible section) reads through all solutions in X, so it's not constant in the intermediate results/data either. - "the preemptable SPARQL aggregation iterator is fully stateless". Again I don't really understand what this means. The iterator I_p needs a state to remember where it is (unless the server forgets this and the client passes this information somehow in the request?). W2: Algorithm 1 looks naive performance-wise: a nested loop is used to check through all solutions in X for one compatible in Y. 
Rather, a data structure can trivially be used to look up such solutions in X in O(1) or O(log |X|) time, as it seems X can grow reasonably large (depending on the quantum defined). I was assuming this was just for simplicity of presentation, but the paper does not actually state that any optimisations are applied beyond the stated algorithm. W3: In general, the methods proposed are quite straightforward; the techniques for computing aggregations in this batch-wise streaming fashion are (as the authors allude to) known from frameworks such as MapReduce (combine/reduce) and Spark (aggregateByKey). Their application in this setting is indeed novel, but overall, this novelty is fairly straightforward. == Other comments == As a side observation, I think the authors should include some idea of which SPARQL queries use which aggregation operators (an analysis of the effect of different aggregation operators on performance -- aside from the case of DISTINCT -- is missing). Also, perhaps experiments could be done with aggregate queries taken from logs for Wikidata, DBpedia, etc. (it might also be interesting to know what percentage of queries in these logs use aggregation). == Verdict == All in all, I would be willing to accept the paper were W3 the only concern as, in general, I think such contributions should be accepted if they have the potential for practical impact (which this work has). Primarily I am most worried by W1, and thereafter by W2. With respect to W1, in particular, I think the extensiveness of the issue goes beyond the scope of a camera-ready revision, in that I think after corrections are made, the paper should be reviewed again to ensure technical correctness. Hence I lean towards a reject. 
I do believe this work can have practical impact and hope to see an improved version of this paper published soon, revising the aforementioned issues in the formal presentation, as well as using data structures in the server-side merge; it would also strengthen the paper to include experiments for other sources of aggregate queries (perhaps taken from SPARQL logs) and to provide results comparing the same queries with different operators. == Minor comments == - The plural of "quantum" is "quanta". Please revise throughout. - "After [a] quantum of time" - "to finally compute[] groups" - "which [allows for performing] the decomposition and [recombination]" - "that process[es] queries" - "as the TPF server[] only processes" - "An RDF triple $... \times T[]$" (stray parenthesis) - What about SAMPLE or GROUP CONCAT? - "HAVING BY" -> "HAVING". Also support for HAVING is never discussed? - "[]Preemption adds an overhead" - "To [address] these challenges" - "a query $Q$ is bound[ed by]" - "where $|Q[|]$ is the number" - "is usually [much] bigger" - "Reducing data transfer [requires] reduc[ing] ..." - "on [the] server side and recombine[] partial" - "[where] $n$ is the number of quant[a]" - "as a mapping-at-a-time[] operator" - "under the same condi[]tions" - "duration of the quantum [seriously impacts]" - "[A l]arge quantum" - "However, [a] large quantum" - "and [W]ikidata" - Algorithm 1: Would be good to add "Server-Side" to the caption for clarity. - "over all aggregation results in [X] ... to merge them with their equivalent in [Y] using the diff[e]rent" - "and then suspend[s]" - "and the [DISTINCT modifier]" - "SAGE server $S$[]," - "following [] the Web preemption model" - "SPARQL aggregation[] results" - Table 2: better to right align numeric columns. Also I do not like the scientific notation used for a single number (better to write it in full). 
- "It run[s] with [the] same configuration" - "between the start[] of the query and the production of [the final] results" - "it supports [] projection and joins [on the] server side" - "[significantly improves] data transfer" - "increasing the quantum [significantly improves the] execution times" - "quantum of 30s [] compared with Virtuoso" - "As ex[pe]cted" - "better performance[] in data transfer" - "on [the] server" - "aggregate[] push down" - "does not [support] intra-query" =================================================================== Post-rebuttal: I am quite satisfied by the authors' comments in the rebuttal. W1 was my main concern, and while I think there are potentially many (minor-ish) fixes to make for the camera ready version, I trust the authors to be able to clean up the notation (as they have already begun to do). I appreciate the clarification regarding W2. I agree that W3 is not a weakness per se in that well-executed, straightforward contributions are important and can make for very nice papers. I think though that in such cases a higher standard should apply to the paper and the work, which contradicts W1 in this case. Web pre-emption is an interesting approach and this is a useful contribution in that direction. With the clarifications of the rebuttal, I will improve my score to Weak Accept. I do urge the authors though to carefully correct the paper for the camera-ready version."".
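The batch-wise aggregation decomposition that the review above walks through (AVG carried as a (sum, count) pair per group, merged across quanta on the client) can be sketched as follows. This is an illustrative reconstruction, not the SaGe-AGG implementation; note that the merge keeps keys that appear in only one batch, the edge case the review raises about Algorithm 1:

```python
# Sketch of decomposable aggregation under web preemption: the server
# reduces each quantum's partial results to a mergeable (sum, count)
# state per group; the client merges states across quanta and finalizes.
# Illustrative only; group keys and values are made up.

def partial_avg(batch):
    """Server side: reduce one batch of (key, value) pairs to (sum, count)."""
    state = {}
    for key, value in batch:
        s, c = state.get(key, (0, 0))
        state[key] = (s + value, c + 1)
    return state

def merge(x, y):
    """Client side: merge two partial states; unseen keys start from (0, 0)."""
    out = dict(x)
    for key, (s, c) in y.items():
        s0, c0 = out.get(key, (0, 0))
        out[key] = (s0 + s, c0 + c)
    return out

def finalize(state):
    """Compute the final AVG per group from the merged (sum, count) pairs."""
    return {key: s / c for key, (s, c) in state.items()}

b1 = partial_avg([("a", 2), ("a", 4), ("b", 1)])   # first quantum
b2 = partial_avg([("a", 6), ("c", 10)])            # second quantum
print(finalize(merge(b1, b2)))  # {'a': 4.0, 'b': 1.0, 'c': 10.0}
```

As the review also notes, the DISTINCT variants do not decompose this way: partial distinct counts do not add (a count of 1 from each of two batches may describe the same value), so the partial state must instead carry the set of distinct values seen so far.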
- Paper.155_Review.1 hasContent "This paper describes the design and implementation of an operator for the SaGe framework that can handle aggregate queries, taking into account the execution model used by this framework, which is based on the Web preemption model (allowing SPARQL queries to be suspended and resumed so that such queries can be completed even in cases where other centralised models would hit a timeout or the convoy effect would appear, while also reducing the amount of data transfer required by other models like that of Triple Pattern Fragments). The paper is well written (only a good number of typos are present, but this does not diminish the understanding) and easy to follow, since concepts are well explained and formalised, and algorithms are well explained and discussed. The approach taken is intuitive, benefiting from the fact that aggregate queries can be easily decomposed (especially in the cases of SUM, MIN and MAX, although with clear approaches to be followed as well for the COUNT and AVG cases). The operator implementation is also made available on GitHub, in a well documented repository that also contains all the data and results, which is highly appreciated by this reviewer. There are only two major aspects that may be improved in the future: - On the one hand, it is not sufficiently clear what the assumptions are under which aggregate queries can be used in this context and where this approach can be applied, since a footnote mentions the fact that OPTIONALs cannot be used in some parts of the queries. Is this the only restriction for this operator to be applicable? Or are there any other restrictions? 
The queries used for evaluation are not sufficiently rich to determine whether there may be other options, since they are rather simple, and although based on a real use case, they may be very biased towards the generation of VoID-like data instead of the typical other cases where aggregates need to be used (e.g., data-warehouse-like queries, like those that can be evaluated on RDF DataCube data). - The experiments are sufficiently well designed, but in my opinion more work could have been put into them: in terms of the variety of configurations for the different approaches presented, in terms of the types of data used and queries evaluated (see above for the discussion on DataCube), and in the usage of usual evaluation and comparison procedures (cold vs. warm queries). In any case, the experimentation is sufficient to support the initial claims. However, as an additional point, the discussion of the comparison in the case of DBpedia is not sufficient: there are many comments related to the specific implementation of other operators and optimisations that may affect query evaluation time in different systems, which is right, but this may make somebody think that the comparison is completely unrigorous, since the comparisons are not applicable at all given the high differences in implementation between different systems. I would have liked to see the behaviour, for instance, under similar general query plans in all systems. - The fragment of DBpedia used is unclearly stated. 
Some typos: - finally computes --> finally compute - which ables --> which is able - to evaluated --> to evaluate - recombines partial --> recombine partial - a partial aggregations --> a partial aggregation - q5 seems to be missing at the end of the second paragraph of page 9 - PostrgreSQL - supports f projection --> supports projection Note after rebuttal: I acknowledge having read the rebuttal, and as long as those aspects that are responded to by the authors are considered in the final paper if it is accepted, I would be happy to see this paper accepted."".
- Paper.155_Review.2 hasContent "The submission follows up on previous work on Web preemption, the concept of allowing a query processor to suspend and resume query execution in order to implement a more fair resource allocation between clients than the first-come-first-served model. The contribution of the current paper is the theoretical foundation and the prototype implementation of an extension that treats aggregate queries (min, max, avg, count, sum), and its comparative empirical evaluation against both the work it extends and against Virtuoso, the state of the art in SPARQL query processing. The paper compares favourably against the former, but not against the latter. This is expected, and should not be considered a failure, as Virtuoso is a production-grade server that implements many optimizations that are outside the scope of this work, and the comparison is always interesting to see. I would recommend that in future work, the authors consider including in their evaluation protocol the simultaneous execution of multiple queries, as the behaviour when allocating resources between multiple clients is their strong point. There will be workloads where their approach outperforms Virtuoso on metrics relevant to fairness, such as "median time to first response between all clients". Editorial: The abstract claims that experimental results demonstrate outperforming existing approaches by "several orders of magnitude" in terms of execution time and the amount of transferred data. This statement raises expectations that are not met in the paper: a factor of 100 is a significant improvement, but I don't recommend promising "several orders of magnitude" and delivering 10^2. Minor editorial: Bottom of page 6: "more bigger" Bottom of page 10: "waits for its non-interruptible section to complete and then suspend query execution" -> "... suspends query execution" In reference 24, fix the .bib so that HBase maintains its capitalization."".
- Paper.159_Review.0 hasContent "The paper is on incremental entity resolution for knowledge graph completion. The paper is appropriate for the track submitted. The contributions of the paper are the following: * New methods for incremental entity resolution as extensions to the FAMER framework of the authors. * Implementation of the techniques using Apache Flink * Detailed experimental evaluation of the proposed approaches. The paper is well written and easy to read. The contributions are important and the results of the experimental evaluation are very good. Major comments: * Given the title of your paper, I would expect a short discussion of what knowledge graph completion is, what the challenges are and how your contributions address these challenges. This is now done only superficially in the paper. * In the related work section, please mention other state of the art frameworks for ER such as JedAI (Papadakis et al. PVLDB 2018) and others. * Explain with a few phrases which functionality of Apache Flink motivated you to use it for your implementation. Minor comments: * Some of the references have e. a. in the author names which is probably a bibtex error. * Footnote 1 needs fixing."".
- Paper.169_Review.0 hasContent "This paper addresses challenges related to the so-called Industry 4.0, which refers to the transformation of manufacturing systems and processes with information systems. The main contribution of this work is the development, documentation, and evaluation of a knowledge graph for this domain. The main goal is to support newcomers and experts in the tasks related to the development of Industry 4.0 systems. The paper is well written, well organized, and the authors do a reasonably good job at fitting in an evaluation of their proposal via the documentation of three use cases, as well as a technical evaluation based on standard metrics. This last part is the most difficult to understand due to a general lack of detail -- if accepted, I recommend that the authors provide a more in-depth discussion when presenting the work at the conference. Typos: for instance from by DBpedia --> for instance from DBpedia potential consumers groups --> potential consumer groups can easily results in inconsistencies --> can easily result in inconsistencies"".
- Paper.169_Review.1 hasContent "The paper concerns the Industry 4.0 Knowledge Graph (I40KG), which depicts the status of standards, reference frameworks and concerns for Industry 4.0. The potential application of the resource is to enable newcomers and experts to get the most recent knowledge on how to implement Industry 4.0 systems. Some chosen application scenarios are also presented in the paper. The motivation of the paper is clear and well described. The idea behind the I40KG is also valid, as this method of aggregating diverse resources (which should be “living” resources) may ensure its timeliness. The research goal is provided within the paper and the structure of the paper is aligned to meet the goal. The paper does not include a detailed methodology for the development of the graph; however, some insights into the research process are given. It would be appreciated if insights were provided into the assessment of the completeness of the approach. Related work: related work on knowledge base design and knowledge graph maintenance seems to be missing. The maintenance process is crucial to keep the graph up to date and fulfill the requirements specified. In my opinion, the description of the processes of updating, extending and maintaining the graph is the weakest part of the paper. Quality: how was the Knowledge Graph evaluated for completeness? How do we know that no important guidelines are missing and that all knowledge is covered? How is the provenance of diverse resources described when presenting data to the user? What happens if some of the resources are no longer valid? Visualization: it is appreciated that I40KG comes with visualizations enabling non-semantic experts’ interaction with the graph. However, more detail on the architecture of the elements that enable the production of visualizations would be needed."".
- Paper.169_Review.2 hasContent "This work proposes a knowledge graph of industrial standards, norms and reference frameworks related to Industry 4.0 (the digital transformation of manufacturing systems). This resource is relevant for systems architects, component developers, system integrators, machine manufacturers, etc. It’s a specialized resource and I’m not sure it will be used by the Semantic Web community or by academic researchers, but it’s useful in an industrial context. The resource is created following the FAIR principles: it is published at a persistent URI, published under an open licence (Creative Commons), publicly available, and registered in a GitHub repository. There is no big novelty, since the resource is an extension of an existing ontology (Bader et al. 2019). However, the extension is important: the previous 40 entities are enhanced to 300 entities, and it provides relations to external sources, applying machine-readable data interlinking of the textual, normative, and informative resources. I find the overall quality of the paper and of the work to be good, but I have a concern about the impact of the resource as mentioned above."".
- Paper.179_Review.0 hasContent "The authors present an approach to semi-automatically learn and populate an ontology utilizing Czech language texts. The texts are first preprocessed and analyzed for surface forms representing entities and the respective phrases representing the relations between the entities. Based on this extracted information, several rules are applied to formulate the relationships. These relationships are presented to the user for approval/rejection. The approach provides two different modes: either a text is analyzed for a newly created ontology, and instances, classes and properties have to be learned from the provided text; or the ontology already contains classes/instances, the text is analyzed for specific entities found in the vocabulary, and the user is presented with links between these extracted entities according to the underlying ontology. The paper is concluded with an evaluation of the applicability of the formulated rules/patterns regarding recall and precision. Overall, it is suitable for supporting the development of entity linking for new languages, especially when the language differs in syntax from the Germanic or Romance families of languages. However, the authors seem to start from the very beginning of entity linking. For instance, they did not discuss previous work on the Czech language by Michal Konkol: Konkol M. (2015) First Steps in Czech Entity Linking. In: Král P., Matoušek V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. The description of the identification of entities within textual information seems a bit naive, as they describe that surface forms consisting of combined nouns are preferred over annotating each term separately. Here, the authors could have invested some more time in previous approaches, even if they are in a different language. The authors present a set of 9 rules to be utilized for ontology learning. 
Unfortunately, there is no comparison to existing approaches (again, even for other languages). Is this set complete? Is it sufficient to find (almost) all relations mentioned in a text? Also, this issue is not included in the evaluation. Recall and precision are evaluated for each pattern separately, but there is no information on whether all relationships contained in a text could be extracted using these rules. In addition, the authors could have mentioned results for other pattern-based approaches in ontology learning. In this way, the achieved results and the scientific contribution could be estimated, although there is no other approach for the Czech language."".
- Paper.180_Review.0 hasContent "This paper presents two resources: (1) Astrea-KG, a knowledge graph representing mappings between OWL constraints and their equivalent SHACL constraints; and (2) Astrea, a tool for automatically generating SHACL shapes from ontologies using Astrea-KG. As usual, this resource submission is reviewed according to the following dimensions: potential impact, reusability, design and technical quality, as well as availability. Overall I think the submission is borderline due to a number of weaknesses in reusability and technical quality. === Potential Impact === The resource certainly plugs a gap in the state of the art and should be of interest to the Semantic Web community. Given the increasing interest in SHACL to complement OWL as a schema-level modeling language for linked data, I believe the submitted resources can accelerate the adoption of Semantic Web technologies. A comparison with existing work of similar scope has also been made. For the latter, however, I find that the comparison with the work of Knublauch (reference #11) should have been elaborated more. What do the authors mean by "the use of patterns was not considered"? === Reusability === There is not yet evidence of usage beyond the resource creators. Documentation is scattered, making it a bit difficult to obtain. The API documentation returns a JSON string that is virtually unreadable by humans. There is potential for extensibility, though this is not discussed clearly by the authors. What I find missing is documentation about the mapping implementation that is published together with the mapping. Yes, there is an HTML page for the vocabulary terms, but the explanation of the mapping implementation is a bit lacking: one needs to read through the paper (common users may miss this) to understand that the query in the mapping implementation should be applied to the source pattern in order to obtain the target pattern. 
=== Design & Technical Quality === The methodology followed during the creation of the resources seems sound to me. No obvious reuse was done, though. Schema diagrams are provided in the paper. The one on the resource website can only be accessed by going to https://w3id.org/def/astrea, which is not explicitly mentioned in the paper, nor linked from the resource's website. There are a few points at which improvements could be made. - There is non-uniformity in the use of namespaces in the KG. Some URIs are in the w3id.org namespace, while some others are in the http://astrea.helio.linkeddata.es/ namespace, and hence do not satisfy the linked data principles. What's the reason for this non-uniformity? - The URIs in http://astrea.helio.linkeddata.es/ are not resolvable. - The property isMappedBy should have been named isMappedTo. - The authors claim that the OWL constructs and the corresponding SHACL shapes are equivalent, but there is no explanation of why this holds. Given that there are 157 mappings proposed, how do we know that the mappings are indeed correct? Note that OWL 2 uses the open-world assumption, while SHACL essentially employs the closed-world assumption. Equivalence between them is thus not necessarily straightforward. === Availability === The URI http://astrea.linkeddata.es is accessible, though for some reason I sometimes got "site is unreachable" as a response. The URI http://astrea.helio.linkeddata.es/ (which appears in the KG) is not accessible (I received a response of "the site is unreachable"). The DOI https://doi.org/10.5281/zenodo.3571009 resolves to a Zenodo record giving me the KG. The KG uses the RDF Turtle syntax, which is an open standard. Open license information is mentioned in the paper, but not at any of the resource addresses. An API and download are provided, but it is unclear if the KG is registered in any of the community registries. The software is available on GitHub. The sustainability plan is unclear beyond the claim that continuous updates will be performed. 
******* AFTER REBUTTAL ******** I thank the authors for their response. My concerns are addressed by the response and the promised improvements in the paper. I update my score accordingly."".
- Paper.186_Review.0 hasContent "I am happy with the clarifications made by the authors. The rebuttal shows their capability and willingness to amend the paper in order to address my comments, mostly focusing on various claims that I found to not be carefully written or fully supported in the previous version of the paper. I have updated my score. ############### This submission investigates how to best use ElasticSearch (ES) for retrieving RDF triples, in order to achieve the best accuracy on several entity-based tasks. Several aspects are being tested: field separation, field weighting, index extensions with properties beyond the triple, and off-the-shelf ES similarity metrics. What I enjoyed about the paper is its pragmatic approach of exploring how the ES's rich functionality set can be tuned to RDF data and tasks. Another strong aspect of the paper is the multifaceted evaluation in section 5. These aspects, together with the fact that performing keyword search over RDF is an open challenge, the fairly clear paper story and the public release of the code, make this paper decent and worth considering for acceptance. I do, however, have a few comments about various claims made in this paper. This, IMHO, weakens its position and contribution. While I trust that these points are generally addressable, I have serious doubts about whether they can happen for the camera-ready version. 1) The authors claim to be the first ones to use ElasticSearch for retrieval of RDF triples and to investigate various indexing, querying and retrieval approaches. I have to disagree with this point. LOTUS (Ilievski et al., 2015; 2016), which was built on top of LodLaundromat data, also uses ElasticSearch to index and retrieve RDF triples, and investigates 32 retrieval options. 
This is not to say that the two approaches are the same: the present paper focuses on the systematic investigation of existing functionality for accurate retrieval; LOTUS focused on scalability and was built on the assumption that the 'best' retrieval is application-dependent. In any case, this should be integrated in the paper, and the relation to LOTUS (and potentially other ES RDF engines) should be made clear. 2) I find the related work to be long and not very concise. There are two pages explaining approach after approach without a direct comparison to the approach in the present paper, and then the positioning is briefly outlined in two paragraphs (which should probably be revisited according to point 1). I would suggest that this section be rewritten in a concise and focused way, explaining the general ideas in the two directions covered and, directly, how this work relates to them. 3) While this paper does a very nice job of exploring and measuring the accuracy of different configurations, I missed the general picture. Several sections point to this from various perspectives (requirements, challenges, approach), but they are not mappable to each other. It would really help the paper if the main hypotheses and aspects were summarized fairly early in the paper, potentially aided with a scheme/table, and ideally already pointing to the results tables. It would also help if these several sections were integrated with each other better (making pointers, aligning points, etc.) 4) Besides the matter discussed in point 1, I find other claims made in the paper to be insufficiently supported/obvious. Specifically, is it justified to say that the analysis in section 5 is 'extensive' (also considering the systematicity note in point 3)? The SDM system in table 6 is on average 0.03 points better than your system - which is comparable to the improvements that you observe in the previous tables - is it fair to say it is a 'slight' improvement? 
And overall, section 5 claims 'high' performance - again, unsure whether this is justified. Minor comments: * On several occasions, the authors claim that the result is 'as expected' - where do these expectations come from? As it is, they seem fairly ad hoc. Addressing point 3 above would help here, I think. * does one really need to be aware of the schema to rely on rdfs:label and rdfs:comment (in practice, it seems like these RDFS constructs are commonly used and can almost be assumed) * section 2.3 discusses five types of objects and explains how type i (URIs) are indexed - how are types ii-v indexed? * the approach indexes triples - how about indexing 'statements' in general (e.g., quads)? * The last paragraph of 4.4 is quite dense and hard to follow - please rewrite * the approach does not seem very scalable - the performance of the baseline model (which is comparable to LOTUS) is similar to LOTUS, but the size of the index is 10x smaller. Can the authors comment on this? A comment on efficiency would be nice in the summary in 5.4 anyway. * How exactly are lists evaluated - please expand on this in 5.2 * please say what DL and b-connected are * it would be nice to have a demo where users can play and get a feeling for the system behavior. * Ideally, the paper should be black-and-white readable (I'd suggest adapting figure 1 to enable this)"".
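The field-separation idea that this review's paper explores can be sketched roughly as follows. The field names and the label-extraction rule here are assumptions for illustration, not the paper's actual Elasticsearch schema: a triple becomes a multi-field document so that subject, predicate, and object fields can be weighted differently at query time, with a catch-all field as an unweighted fallback.

```python
# Sketch: turning an RDF triple into a multi-field document for indexing.
# Field names ("subject", "predicate", "object", "catch_all") and the
# label-extraction heuristic are illustrative assumptions only.

def uri_label(uri):
    """Crude human-readable label from a URI's local name."""
    return uri.rsplit("/", 1)[-1].rsplit("#", 1)[-1].replace("_", " ")

def triple_to_doc(s, p, o):
    """Build the per-triple document with separated fields."""
    doc = {
        "subject": uri_label(s),
        "predicate": uri_label(p),
        # literals are indexed as-is; URIs get their label extracted
        "object": uri_label(o) if o.startswith("http") else o,
    }
    doc["catch_all"] = " ".join(doc.values())  # unweighted fallback field
    return doc

doc = triple_to_doc(
    "http://dbpedia.org/resource/Berlin",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "Berlin",
)
print(doc["predicate"])  # label
```

Such a document would then be handed to an Elasticsearch index, and field weights (e.g., boosting "subject" over "catch_all") tuned per task, which is the space of configurations the review describes.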
- Paper.186_Review.1 hasContent "In this paper, the authors propose a keyword search approach over RDF datasets. The paper describes a practical approach to a specific problem tackled in the last few years. It is well structured for this kind of paper. The paper is easy to read and follow. The proposed tool is publicly available on GitHub, a key aspect to guarantee reproducibility of the results. Recommendations: - Section 4 (proposed approach) uses only four pages out of sixteen. From my point of view, this section should receive more attention, but this isn't mandatory. - The cited bibliography should be updated: more than 50 percent of the references (16/27) are not from the last five years."".
- Paper.191_Review.0 hasContent "The paper presents a modular evaluation framework for graph embedding techniques. The paper starts with the definition of graph embeddings and some introductory pages / related work, which also include an overview of the various related tasks. The evaluation framework is well described, with a UML diagram and example usage included. The largest section of the paper includes an overview of the available tasks: classification, regression, clustering, entity relatedness, document similarity and semantic analogies. The various tasks are well explained, with datasets, structure, size, model, configuration, metric, ranges and interpretation or optimum being available. The evaluation reads rather like a narrative and includes some details on available use cases. Overall, even though the evaluation can be improved upon, the work feels solid."".
- Paper.191_Review.1 hasContent "The paper presents a framework for evaluating graph embeddings on several tasks. This framework includes both machine learning tasks (e.g., classification, regression, clustering) and semantic tasks (e.g., entity relatedness, document similarity). The paper is well written and clear. The framework is well explained and the given examples help to understand the use cases. The choices of tasks and parameters are reasonably justified. However, while I understand the rationale given for not including link prediction, I think that one of the main advantages of these kinds of frameworks is being able to perform comprehensive evaluations that include all popular tasks. Therefore, it may be useful to also include this task in the future. I could not find the permanent URL for the resource at the beginning of the paper. The authors should include one in the camera-ready. The related work is well done and fairly comprehensive. However, I would suggest briefly mentioning also the work relevant to the evaluation of ontologies/knowledge graphs via relevant tasks (basically the “application and usage impact” category of the survey “Zablith et al. Ontology evolution: a process-centric survey”, e.g., https://www.zora.uzh.ch/id/eprint/174974/1/00-iswc2019-pernischova-dc.pdf, http://oro.open.ac.uk/55536/1/ISWC2018_Research.pdf). It would benefit the paper to include more details about the scalability of the framework. How does it handle parallelization? How long did it take for the evaluation discussed in section 5? What kind of machine did you use? How would the size of the vectors affect the computational time for the various tasks? The evaluation is not particularly comprehensive, since it focuses only on two tasks (classification and regression). Why not showcase all the tasks supported by the system? Figure 3 is not very readable. Please improve it or change it to a table. 
Finally, I would suggest choosing a name for the framework, to make it easier for people to refer to it. In conclusion, the framework presented in the paper appears to be quite a useful resource for the Semantic Web and Machine Learning communities."".
- Paper.191_Review.2 hasContent "The paper describes a framework for KG embedding evaluation. It takes the generated vector representations and uses them for various tasks, reporting the corresponding results to see how these vectors perform in different tasks. Many different tasks are already implemented, and a new embedding technique can be directly tested with this framework. The related work section is well written and also shows the differences to the proposed approach. Section three gives an overview of the framework and its extension points. In the following, I describe my procedure when installing and using the proposed software: I tried installing the software with pip but I received a FileNotFoundError, because line 10 of setup.py reads the file pip_readme.md, which is not contained in the package but only in the GitHub repo. The authors should fix this to allow others an easy installation. Moreover, many users currently have Anaconda installed; maybe in the future a version on Anaconda would also be helpful. With my updated version of the pip package, I installed it in Python 2, which works fine, but in my new conda environment with Python 3.8.1 I got a UnicodeDecodeError which I did not analyze further. Since Python 2 has already reached its end of life [1], new software should target Python 3 anyway. Afterwards I wanted to test main_00.py from the example folder. Unfortunately I couldn't find the file country_vectors.txt. Thus I proceeded with main_01.py. There it couldn't load the FrameworkManager. The reason was that in all examples the import statement is ```from evaluation_manager.manager import FrameworkManager``` but should be ```from evaluation_framework.manager import FrameworkManager``` After these changes, my script ran. The results directory is not generated in the correct place (at least when installing with pip), because in file evaluationManager.py, line 228 [2], the results directory is derived from the file path of evaluationManager.py. 
This results in a directory like "{{pythonpath}}\\lib\\site-packages\\evaluation_framework\\../results" where nobody would have a look. Thus I recommend changing it to "os.getcwd()" for the current working directory. When looking at the results folder, only a log file is generated, which states that the data files are missing. This is correct, because the pip package contains no data files (like /evaluation_framework/Classification/data/Cities.tsv). Thus these files also need to be included in the pip package. After manually copying these files, I first got some result files which are mostly empty. In the log I saw the error message: "Classification : Problems in merging vector with gold standard" The reason was that in the objectFrequencyS.txt file no URIs from the gold standard appeared. I thought that an example would cover such things. After trying out the software, I asked myself where to get the KG for which I should generate the embedding. Based on the gold standard files, I assume this is DBpedia. Maybe I have overlooked it, but this should be clearly mentioned somewhere (and probably also in the GitHub readme). More importantly, not only the specific version of DBpedia is necessary, but also which files [3] can be used, to actually allow a comparison between the embedding techniques. I know that this need not be fixed by the evaluation framework as long as each embedding uses the same files, but a general recommendation would be good. The software is not yet ready to be easily used, but I think the authors can update it very quickly. If this is done, the framework makes it easy to compare different KG embedding methods, and I think that this fills an important gap. Some minor points: - Figure 1 can be converted to grayscale to allow a black and white print - It would also help to point out that this work is an extension 
to [4] - page 4: "do not state it further" -> "do not state if further" ; "It takes in input a file" -> "It takes as input a file" - page 10: dbo:SportsTeam extends over the line width because of \texttt [1] https://www.python.org/doc/sunset-python-2/ [2] https://github.com/mariaangelapellegrino/Evaluation-Framework/blob/master/evaluation_framework/evaluationManager.py#L228 [3] https://wiki.dbpedia.org/downloads-2016-10 [4] Pellegrino M.A., Cochez M., Garofalo M., Ristoski P. (2019) A Configurable Evaluation Framework for Node Embedding Techniques. In: Hitzler P. et al. (eds) The Semantic Web: ESWC 2019 Satellite Events. ESWC 2019. Lecture Notes in Computer Science, vol 11762. Springer, Cham. After reading the rebuttal, I update my overall evaluation to accept. The technical details are resolved."".
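The results-directory fix recommended in the review above could look roughly like this. This is a sketch based on the reviewer's description of evaluationManager.py, not the project's actual patch:

```python
import os

# The review reports that the results directory was derived from the module's
# own file path (so results land inside site-packages when pip-installed);
# the reviewer's suggested fix anchors it at the current working directory.

def results_dir_buggy(module_file):
    """Roughly what line 228 of evaluationManager.py did, per the review."""
    return os.path.join(os.path.dirname(module_file), "../results")

def results_dir_fixed():
    """Reviewer's recommendation: anchor results at os.getcwd()."""
    return os.path.join(os.getcwd(), "results")

# The buggy version ends up next to the installed package:
print(results_dir_buggy(
    "/site-packages/evaluation_framework/evaluationManager.py"
))  # /site-packages/evaluation_framework/../results
```

The review's other fix, correcting the example imports to `from evaluation_framework.manager import FrameworkManager`, is a straight rename and needs no sketch.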
- Paper.193_Review.0 hasContent "This paper describes a method for taxonomy induction on knowledge graphs. The method is based on working with the knowledge graph as if it contained a collection of documents (subjects) which have annotations (tags) of the form (property, object), so they can adapt and apply tag induction methods. The authors justify the need for extracting taxonomies from knowledge graphs and summarize the state of the art by analyzing methods for taxonomy induction and tag induction. The paper contains a description of the method and its application to three datasets: Life, DBpedia and WordNet. The description of the experiment is not clear to me. It seems that in the three cases there are properties that define the taxonomy in the resource, and that these properties are used as tags for the induction of the taxonomy, which could bias the results. The data used and the experiments have not been shared, nor has their reproducibility been facilitated. The results are compared with state-of-the-art methods. The authors have used their own implementation of methods like Heymann and Garcia-Molina / Schmitz, which is a valuable effort, but it is not clear if they are reproducing the methods correctly. This is also a limitation regarding the scalability analysis and the fact that some of these methods do not finish for some datasets. The text mentions the comparison with the tag induction methods, but Table 1 also includes the results of class taxonomy induction methods like Völker and Niepert, which is the one obtaining the best results. There is no discussion about these other methods, which are only applied to DBpedia; would they be applicable to the other ones? The results in Table 1 show that the method works better on some datasets but not on all; why?"".
- Paper.193_Review.1 hasContent "This work introduces a new method to induce a class taxonomy from knowledge graphs. The problem is interesting and relevant to the community. The main contribution is the idea of re-forming triples into document-tag tuples so that the commonly used word-frequency and co-occurrence techniques from NLP can be applied to knowledge graphs. Experimental results also demonstrate the merits of the proposed technique over traditional methods. Pros: (1). Compared with popular machine learning techniques, the proposed work is straightforward and easy to implement, without sacrificing performance. (2). The experiments are very well designed. For example, the experiment and the following discussion on selecting the decay factor are not limited to the dataset used in the paper, but provide insightful guidance to future users applying the approach. Also, the experiments on dataset size would guide future users in making decisions using the approach as well. Cons: There are still some issues that could be improved in order to better demonstrate the work: (1). The method seems to be a deterministic approach, based on the description of the algorithm. Then why "we ran the methods five times on each dataset...."? (2). Any explanation why 'Heymann and Garcia-Molina was not able to terminate sufficiently fast enough for us to obtain results on the Life dataset'? Also, it would be great if the authors could discuss why the proposed method terminates faster than traditional methods. (3). I would suggest also discussing the validity of the assumption "subclasses will co-occur in document with their superclasses more often than with classes they are not logical descendants of", especially in the context of knowledge graphs. (4). There are also many typos across the paper; the authors have to proofread it. Here are just some examples: (a). "high distribution across many topics" --> I guess you mean "high frequency" (b). "lower the more distant ..." 
--> "lower than the more distant..." (c). "It is preferred evaluation" --> 'it is a preferred evaluation' (d). "to derive the the harmonic mean between ..." --> duplicated "the""".
- Paper.193_Review.2 hasContent "The paper presents a method for inducing class hierarchies from knowledge graphs. The authors claim that the method is "simple" and "scalable to large datasets". The method is based on ideas from tag hierarchy induction, i.e. counting classes and their co-occurrences. The approach is demonstrated on different use cases with known datasets (Life, DBpedia, Wikidata) and it is evaluated against other tag hierarchy induction methods. I like the "simplicity" of the approach, allowing for performance and scalability while still showing convincing results. The paper is well written, the experiments have been clearly described, and the data used for the experiments is publicly available."".
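The co-occurrence-counting idea behind tag hierarchy induction, which these reviews discuss, can be sketched minimally as follows. This illustrates the assumption that superclasses occur in more documents and co-occur with their subclasses; it is an illustration of the general technique, not the paper's actual algorithm, and all names here are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Minimal sketch of tag-hierarchy induction by co-occurrence: a more
# general class appears in more documents, and a subclass co-occurs
# mostly within documents of its superclass.

def induce_parents(docs):
    """docs: list of tag sets. For each tag, pick as parent candidate a
    strictly more frequent tag, preferring the highest co-occurrence."""
    freq = Counter(tag for doc in docs for tag in doc)
    cooc = Counter()
    for doc in docs:
        for a, b in combinations(sorted(doc), 2):
            cooc[(a, b)] += 1
    parents = {}
    for tag in freq:
        candidates = [
            (cooc[tuple(sorted((tag, other)))], freq[other], other)
            for other in freq
            if other != tag and freq[other] > freq[tag]
        ]
        if candidates:  # roots (most frequent tags) get no parent
            parents[tag] = max(candidates)[-1]
    return parents

docs = [{"Animal", "Mammal", "Dog"},
        {"Animal", "Mammal", "Cat"},
        {"Animal", "Bird"}]
print(induce_parents(docs)["Mammal"])  # Animal
```

Applied to a knowledge graph as the reviewed paper proposes, each subject would play the role of a document and its (property, object) annotations the role of tags.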
- Paper.196_Review.0 hasContent "- The problem statement is not quite clear, and even less precise - The problem statement should come with an expected theoretical guarantee of closeness between the sample and the entire data set - It should also include those guarantees for specific tasks related to RDF data profiles, e.g. QA, semantics, ... - The presentation should also be broader, in the sense of shedding some light on how profiles are represented (all kinds of things are included there, as stated at the bottom of page 2) and how they are used in combination with RDF graphs for some tasks, e.g. QA - The goal of capturing semantics through the sample is only superficially stated and treated - The introduction states as a specific objective to avoid misestimations (page 2, middle), but not much is said about how and to what (precise) extent - There is a certain contribution here; some people may find it useful, and there are experiments, but the treatment is superficial"".
- Paper.230_Review.0 hasContent "Thanks for your replies, which just reinforce my positive impression of this work. ----------------------------- SUMMARY: This paper proposes a meta-dataset for a very large set of owl:sameAs links in combination with a previously published error metric for such links and a previously published large LOD Cloud subset, LOD-a-lot. This combination enables applications to decide whether to follow explicit and implicit identity links. In addition to a number of highly important requirements for such an approach, the paper provides a number of interesting use cases for the utility and utilization of the proposed dataset with error metric. The paper is extremely well written in terms of content, line of argumentation, evaluation and discussion of related work, as well as style of presentation. Furthermore, the suggested approach might grant LOD applications/the reuse of LOD resources a new level of reliability and provides a vitally needed alternative to simply defining semantically less expressive equivalence or similarity relations in the LOD. Even though such approaches already exist, as the authors point out, the presented solution is easy to reuse, adapt to specific needs, and widely applicable. Questions to authors: - Could you maybe define/exemplify the notion of community in 3.3? Might be nice to briefly mention it - On p. 9, what does "which ever comes lexicographically first/last" mean? MINOR COMMENTS (in order of appearance): 7. Low-cost. Since it it => Since it is Raad et al. 2018 [18] => Raad et al. [18] Table 4.1. => Table 1 p.11 => encoding of fb:m.05b6w1g seems strange Buistra et al. 2011 [3] => Buistra et al. [3] Lopez et al. 2013 [13] => Lopez et al. [13] cannot be more trustworthy that either => than error degree of error => error degree two categories of approach => approaches"".
- Paper.237_Review.0 hasContent "This paper presents an architecture to generate a portable Question Answering (QA) system over RDF data. The architecture is an extension of a previous QA system, focused on the portability problem. The system allows non-SPARQL-expert users to query RDF datasets using natural language. The architecture is based on machine learning algorithms and confidences. I believe this is an interesting problem. The article is well organized and motivated. However, I am not sure about the real contribution of the proposed extension. It seems that the main contribution of this paper is a new step in the previous workflow, where the user has to evaluate the query results (as shown in figure 6). This evaluation is then used to re-train the system to obtain better solutions. If I have not misunderstood, does the non-expert user have to know the results, or at least part of them, when issuing the query? I believe that in a real scenario an end-user does not know the results of a query in advance. On the other hand, experiments show that the proposed architecture improves F-measure significantly. Then, I am mainly concerned about which users carried out the experiments and what level of previous knowledge they had about the RDF dataset. I think the authors should clarify these questions. After reading the rebuttal response I have changed my decision to weak accept."".
- Paper.247_Review.0 hasContent "This paper describes a technique that optimizes the execution of stream reasoning programs in LARS by keeping track of formulas that will not hold and thus should not be considered during the reasoning process. This work is an incremental contribution to a previous paper by the authors, in which the same type of LARS program execution was optimized by considering those formulas that would be guaranteed to hold for a given time interval. In the subset of LARS chosen by the authors, it is possible to establish formulas that may hold for all time instants of a given window, or that hold for at least some time instants of a window. With this information, the authors propose an optimization technique for the cases where these formulas are guaranteed not to hold, and can therefore be excluded from the reasoning process. The technique is well explained, and it is based on the LARS framework, which has the advantage of providing rich semantics and a number of expressive operators for stream reasoning, often offering more features than other approaches under the stream reasoning umbrella, such as RDF stream processors or CEP-based solutions. Nevertheless, given that this specific technique is essentially an incremental contribution with respect to the previous paper, the degree of novelty is not especially high. In contrast, the fact that the authors actually re-implemented Laser is a nice technical contribution, although more on the engineering side. Another issue relates to the motivation of this work. Although the authors mention in the introduction some potential uses for this type of reasoning, the rest of the paper does not follow any of these motivating examples and goes directly into solving the proposed challenge. While the optimization is totally reasonable, the lack of a real motivating use case raises the question of the concrete impact of these techniques on actual stream reasoning problems.
The paper would benefit from a clearer motivation taken from more realistic use cases in which it is clear that handling these 'impossible derivations' has a substantial impact. This problem is also found in the evaluation. While it is fair to show the best- and worst-case scenarios with the proposed microbenchmarks, the reader may have the impression of an experimental setup designed only to validate the paper's hypotheses, with no connection to real use cases and real problems. It is understandable, as the authors mention, that some of the benchmarks out there do not really handle many of the rich features of LARS programs. However, the authors may need to find better ways of showing the utility of this interesting work while getting closer to real-life datasets and reasoning problems. The paper is well written and the technique is well described, including a clear formalization, examples, and a detailed description of the main algorithms. ----- Thanks to the authors for the response. I disagree that this is the best that could be done in terms of evaluation. The authors can argue that the microbenchmarks are fair enough or somehow sufficient to validate their hypothesis. I still think that the risk of bias is high, but I also understand the difficulty and the tons of work that it would take to build a comprehensive evaluation scenario. I keep my scores for this solid manuscript."".
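The review's notion of formulas "guaranteed not to hold" within a window can be sketched in a few lines. This is a hypothetical Python illustration of LARS-style box/diamond checks over a time window, not the authors' actual algorithm; the function names and the integer-timestamp model are assumptions:

```python
def box_holds(occurrences, window_start, window_end):
    """Box: the atom must occur at every time point of the window."""
    return all(t in occurrences for t in range(window_start, window_end + 1))

def diamond_holds(occurrences, window_start, window_end):
    """Diamond: the atom must occur at some time point of the window."""
    return any(window_start <= t <= window_end for t in occurrences)

def box_expired_until(occurrences, now, n):
    """If the atom was absent at some past time point t0 inside the
    current window [now - n, now], Box(atom) cannot hold for any
    evaluation time whose window still contains t0, i.e. for all
    times <= t0 + n.  Returns that horizon, or None if no gap exists.
    """
    gaps = [t for t in range(now - n, now + 1) if t not in occurrences]
    if not gaps:
        return None
    return max(gaps) + n
```

The point of `box_expired_until` is exactly the optimization the review describes: once a gap is observed at time t0, derivations depending on Box(atom) can be skipped (rather than re-checked) until the window slides past t0.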
- Paper.251_Review.0 hasContent "A solid paper targeting a typical research situation. The description of the state of the art is complete and well written. The idea of exploiting knowledge to improve shilling attacks is original. The description of the concept and the work done is comprehensible and coherent. With some missing articles, there is a slight sloppiness in the English writing. This makes for a solid experiment with a solid outcome, which is the reason for accepting the paper. What I'm missing, though, is what that actually means. Are the differences between the attacks big enough to make a difference, e.g. in social media and political campaigns? What would be defensive strategies in RS against the knowledge-based approach? Is there any relation between how intelligent the CF-RS is and how much this affects the change from ShA to SAShA? Knowing a system and knowing the similarity to the target items, couldn't the system avoid the random items and concentrate only on similar items without revealing the target item? It is one thing to target the improvement of state-of-the-art attacks, but with this background, it may be worthwhile to explore new attacks based on semantic knowledge. In this case, the challenge is discoverability."".
- Paper.251_Review.1 hasContent "This paper proposes a semantic-aware method, named SAShA, for attacking collaborative filtering recommendation models, and investigates the impact of publicly available knowledge graph data on generating fake profiles. Novelty: The novelty of this work comes from exploiting publicly available semantic information to develop more effective shilling attack strategies against CF models in terms of overall prediction shift and overall hit ratio. Soundness: The experiments have been designed, conducted and analysed rigorously; they are convincing and support the stated claims. The study evaluated SAShA on two real-world datasets by extending three baseline shilling attacks with different semantic types of features. In detail, the authors have extended random, love-hate and average attacks by considering ontological, categorical and factual knowledge graph features extracted from DBpedia. Design and execution of the evaluation of the work: This research performed an extensive experimental evaluation in order to investigate whether SAShA is more effective than baseline attacks against collaborative filtering models, taking into account the impact of various semantic features. Experimental results on two real-world datasets show the usefulness of the proposed strategy. Clarity and quality of presentation: The paper is written clearly and is organised in a very good structure. Grounding in the literature: A comprehensive review of related work has been provided, and the differences between it and this research have been discussed appropriately. Appropriateness: This paper contributes to addressing theoretical, analytical, and empirical aspects of using the Semantic Web in recommendation models. Overall Evaluation: I really enjoyed reading this paper and I vote to accept it. (3: strong accept)"".
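For readers unfamiliar with the attacks discussed in the two reviews above, the classic average attack, and the semantic filler selection that SAShA layers on top of it, can be sketched as follows. This is a hypothetical Python sketch; the function names, the category-overlap similarity, and the toy data are assumptions, not the paper's implementation:

```python
def average_attack_profile(ratings, target_item, filler_items, r_max=5.0):
    """Classic average shilling attack: the fake profile rates the
    target item with the maximum rating and each filler item with its
    mean observed rating, so the profile resembles a normal user.
    ratings: {item: [observed ratings]}.
    """
    profile = {target_item: r_max}
    for item in filler_items:
        obs = ratings[item]
        profile[item] = sum(obs) / len(obs)
    return profile

def semantic_fillers(target_item, similarity, candidates, k):
    """Semantic-aware filler selection (the SAShA idea, sketched):
    pick the k candidates most similar to the target according to a
    knowledge-graph-derived similarity, instead of sampling at random.
    """
    return sorted(candidates, key=lambda i: similarity(target_item, i),
                  reverse=True)[:k]

# Toy similarity from shared DBpedia-like categories (hypothetical data).
cats = {"m1": {"Action"}, "m2": {"Action", "SciFi"}, "m3": {"Romance"}}
sim = lambda a, b: len(cats[a] & cats[b])
fillers = semantic_fillers("m1", sim, ["m2", "m3"], k=1)
profile = average_attack_profile({"m2": [4, 5, 3]}, "m1", fillers)
```

Here the semantic selection picks "m2" (which shares a category with the target) over the unrelated "m3", which is the mechanism both reviewers are probing when they ask how much the semantic features actually change the attack's effectiveness.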