Matches in ESWC 2020 for { ?s ?p ?o. }
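The dump below lists one triple per line in a simple dashed form. As a minimal sketch (the parsing helper is hypothetical, not part of the dataset), such lines can be read back into (subject, predicate, object) tuples:

```python
# Minimal sketch: parse dashed triple lines like
#   "- Gerhard_Weikum holdsRole Author.110.2."
# into (subject, predicate, object) tuples. Object literals may
# contain spaces, so we split at most twice and strip the final dot.

def parse_triple(line: str):
    body = line.lstrip("- ").rstrip()
    if body.endswith("."):
        body = body[:-1]
    s, p, o = body.split(" ", 2)
    return (s, p, o)

triples = [
    parse_triple("- Gerhard_Weikum type Person."),
    parse_triple('- Gerhard_Weikum name "Gerhard Weikum".'),
]
```

Note that quoted literals keep their surrounding quotes; a fuller parser would also unescape them.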
- Author.110.2 type RoleDuringEvent.
- Author.110.2 label "Gerhard Weikum, 2nd Author for Paper 110".
- Author.110.2 withRole PublishingRole.
- Author.110.2 isHeldBy Gerhard_Weikum.
- b0_g286 first Author.110.3.
- b0_g286 rest nil.
- Thomas_Pellissier_Tanon type Person.
- Thomas_Pellissier_Tanon name "Thomas Pellissier Tanon".
- Thomas_Pellissier_Tanon label "Thomas Pellissier Tanon".
- Thomas_Pellissier_Tanon holdsRole Author.110.1.
- Gerhard_Weikum type Person.
- Gerhard_Weikum name "Gerhard Weikum".
- Gerhard_Weikum label "Gerhard Weikum".
- Gerhard_Weikum holdsRole Author.110.2.
- Gerhard_Weikum holdsRole Author.166.3.
- Gerhard_Weikum holdsRole Author.274.3.
- Author.166.3 type RoleDuringEvent.
- Author.166.3 label "Gerhard Weikum, 3rd Author for Paper 166".
- Author.166.3 withRole PublishingRole.
- Author.166.3 isHeldBy Gerhard_Weikum.
- Author.274.3 type RoleDuringEvent.
- Author.274.3 label "Gerhard Weikum, 3rd Author for Paper 274".
- Author.274.3 withRole PublishingRole.
- Author.274.3 isHeldBy Gerhard_Weikum.
- Aidan_Hogan type Person.
- Aidan_Hogan name "Aidan Hogan".
- Aidan_Hogan label "Aidan Hogan".
- Aidan_Hogan holdsRole Paper.110_Review.0_Reviewer.
- Aidan_Hogan holdsRole Paper.155_Review.0_Reviewer.
- Aidan_Hogan mbox mailto:aidhog@gmail.com.
- Paper.110_Review.0_Reviewer type RoleDuringEvent.
- Paper.110_Review.0_Reviewer label "Aidan Hogan, Reviewer for Paper 110".
- Paper.110_Review.0_Reviewer withRole ReviewerRole.
- Paper.110_Review.0_Reviewer withRole NonAnonymousReviewerRole.
- Paper.110_Review.0_Reviewer isHeldBy Aidan_Hogan.
- Paper.155_Review.0_Reviewer type RoleDuringEvent.
- Paper.155_Review.0_Reviewer label "Aidan Hogan, Reviewer for Paper 155".
- Paper.155_Review.0_Reviewer withRole ReviewerRole.
- Paper.155_Review.0_Reviewer withRole NonAnonymousReviewerRole.
- Paper.155_Review.0_Reviewer isHeldBy Aidan_Hogan.
- Paper.110_Review.0 type ReviewVersion.
- Paper.110_Review.0 issued "2001-01-27T15:22:00.000Z".
- Paper.110_Review.0 creator Paper.110_Review.0_Reviewer.
- Paper.110_Review.0 hasRating ReviewRating.1.
- Paper.110_Review.0 hasReviewerConfidence ReviewerConfidence.4.
- Paper.110_Review.0 reviews Paper.110.
- Paper.110_Review.0 issuedAt easychair.org.
- Paper.110_Review.0 issuedFor Conference.
- Paper.110_Review.0 releasedBy Conference.
- Paper.110_Review.0 hasContent "This paper introduces YAGO 4: a new dataset derived from Wikidata and schema.org. The main goal of the dataset is to provide a cleaner subset of types for entities, along with enforcing constraints on properties. Specifically, YAGO 4 selects six top-level classes, upon which disjointness constraints are defined; properties are taken from schema.org and associated with domain, range and cardinality constraints, defined using SHACL (and RDFS/OWL). The base classes are extended with the schema.org taxonomy, and the bioschema taxonomy; Wikidata entities are associated to this taxonomy based on a defined mapping from its types to Wikidata types, where additional unmapped Wikidata types meeting certain constraints are also included. Fresh IRIs are minted based on the English name of the entity in Wikipedia or Wikidata (where available), and temporal annotations are added using RDF*. The authors describe the construction of YAGO 4 based on streaming operators, and provide statistics on three flavours of the dataset, using (1) all of Wikidata's entities; (2) only entities with a Wikipedia article; (3) only entities with an English Wikipedia article. The final dataset contains 9702 classes, 57/15/5 million entities and 326/48/18 million facts (depending on the flavour). The call for the resources track has a special list of criteria that I will initially follow for this review. Later I will discuss aspects relating to the paper itself. # Potential impact On the one hand, YAGO 4 is an incremental contribution, effectively exporting Wikidata into a schema.org-like view; the added value here is essentially a mapping of schema.org terms to a subset of Wikidata terms along with some constraints/axioms. On the other hand, I do buy into the idea that YAGO 4 could serve as a research-friendly version of Wikidata. In particular, Wikidata has grown so large and diverse that it is becoming a challenge/obstacle to manage in research works. 
Having a smaller subset of Wikidata with a cleaner schema seems useful along these lines for certain research works (the core use-case of OWL 2 DL-compliance is acceptable even if not thoroughly convincing). # Reusability As an emerging resource, there is no evidence of usage provided. On the other hand, I did briefly review the webpage and the provided resources, and from what I saw, the resource is more-or-less ready for use. # Design & Technical quality The authors do follow best practices for Linked Data, provide a SPARQL endpoint, etc. The dataset reuses schema.org terms. I did not find a description in VoID (mentioned by the track's call; it could be useful to add). I did notice one technical issue that the authors may need to look into relating to the dereferencing of IRIs minted for the purposes of Linked Data publishing. Taking the example of: - http://yago-knowledge.org/resource/Douglas_Adams This 301 redirects to: - http://www.yago-knowledge.org/resource/Douglas_Adams ... not a problem in itself, but for large-scale access to YAGO 4 through dereferencing, this adds an extraneous request. Looking up the aforementioned IRI gives a 303 redirect to the following URL: - http://lod.openlinksw.com/describe/?uri=http://yago-knowledge.org/resource/Douglas_Adams Unfortunately, my attempts to get RDF in popular formats from this URL were not successful. I tried: - curl --header "Accept: application/rdf+xml" http://lod.openlinksw.com/describe/?uri=http://yago-knowledge.org/resource/Douglas_Adams - curl --header "Accept: text/turtle" http://lod.openlinksw.com/describe/?uri=http://yago-knowledge.org/resource/Douglas_Adams Both provided blank responses. # Availability No persistent IRIs are provided (I don't personally see that as a major issue, though perhaps a dump could be published on Zenodo or Figshare just in case). Licence information is provided. Code is available on Github. I am not sure if YAGO 4 has been registered in any catalogues. 
What I really miss here is a plan for the maintenance of the resource; more specifically, the future outlook is left vague. Are there plans to periodically update YAGO 4 with new data from Wikidata and new vocabulary from schema.org? How will this be handled? # The Paper The paper itself is relatively well-written and easy to follow. I did, however, find that the paper lacks details in certain parts. * I would have liked to have seen an example of the YAGO 4 description of an illustrative entity. * Where do the property terms come from? (It is explained later that they come from schema.org, but this is too late given that constraints on these properties are defined in earlier sections.) * How were the constraints defined? Manually? Exported from Wikidata? From schema.org? How many constraints are there? * When constraints are violated how are repairs made? Are all triples removed? Or are minimal repairs somehow applied? * How many classes come from Bioschema? How many from schema.org? How many (only) from Wikidata? (The 9700 figure seems to refer to all classes in the taxonomy, irrespective of source.) * Are the RDF* meta-data provided separately? How are these data published? * Which Wikidata dump is used? Is this the truthy version or the complete version? This is important to clarify as it refers to how ranked statements are exported, and also influences the importance of the temporal meta-data, as well as the constraints (e.g., do countries have one current population or several potentially historical populations). How are such issues handled? * A complete example of a dereferenceable IRI would be appreciated. * Table 1: number of triples would be appreciated. * How are the precisions of values handled? Such precisions are a key ingredient of Wikidata. 
I understand that the page limit is a factor, but I think most of these details could be provided without using much space (barring, perhaps, the entity example, but I think it would be very helpful to provide this in the paper as it would implicitly address many of the other doubts). # Other Comments I find it a bit strange to call this dataset YAGO 4 as it bears little relation to the YAGOs that came before. I guess the authors wish to leverage some of the name recognition of YAGO in order to bootstrap usage of the new dataset, but I would have preferred a different name given the radically different heritage of YAGO 4. # Overall Impression Focusing on the resource itself, YAGO 4 is essentially a schema.org "view" of Wikidata. Given that it is an emerging resource, it does not yet have proven usage to point to. On the other hand, I think it has the potential to be used at least for research purposes. Hence I lean towards accepting and would encourage the authors to consider the feedback provided here in order to improve the paper and the resource. # Minor Comments - "2 Billions of type-consistent triples" -> "2 billion type-consistent triples" (also lowercase all M(m)illion, B(b)illion, etc.) - "incurs that ... are tedious" -> "makes ... tedious" - "Moreover, there is little hope to run logical reasoners ..." The authors are guilty of overreaching here. We can run logical reasoners just fine using rule-based approaches (not based on satisfiability), para-consistent approaches, repair strategies, ..., ..., ... This strawman of there being no hope of reasoning over inconsistent data is counterproductive and needs to be stamped out. Rather the authors could say something like "inconsistencies add complications ...", "care must be taken when ...", etc. - "as the KB contains many small inconsistencies" Under what logic? Under what constraints/axioms? Wikidata does not have a notion of (in)consistency to the best of my knowledge. - "pointed [to by] the ..." 
or "pointed [out] in the ..." - "All the number[s]" - Footnote 2: This is a very strange way to define a "class" (that it has a super-class). Why not define classes as the values of P31? - "that [it] is not easy" - "Potentially, this could lead to millions of entities ..." While strictly true, it is perhaps misleading in that it gives the impression that any user off the street could effect such a change. In reality, to the best of my knowledge, most of these "central" parts of the Wikidata schema are semi-protected for edits, and hence not so easily changed. My issue is that the text here makes Wikidata seem a bit more of a "free-for-all" than it actually is. - "the the" - "If the first class on the path" Unclear. - "allows [for] efficiently selecting ... and get[ting] back" - "This works per property ..." Unclear. - "URIs are converted into literals xsd:anyURI" Which URIs? This is very likely a bad idea. I'm guessing this is perhaps for external IDs, but these are still IRIs that should be represented as such. The valid use of "xsd:anyURI" in RDF is vanishingly rare (e.g., give the namespace *string* prefix of a vocabulary, which should confirm to URI syntax). - If the dumps are N-Triples, how are the RDF* data published? ================================================================================== Regarding the rebuttal, I am quite satisfied with the authors' clarifications and remarks. Just regarding the concrete example in Figure 1, I should clarify that it would be great to have an example *early* in the paper to clarify a lot of doubts the reader might have. If accepted, I encourage the authors to add the details and clarifications requested by the reviews (even if, for space reasons, some such details are rather made available on a webpage or an extended version published online, and referenced from the camera-ready version)."".
- Paper.110_Review.1_Reviewer type RoleDuringEvent.
- Paper.110_Review.1_Reviewer label "Anonymous Reviewer for Paper 110".
- Paper.110_Review.1_Reviewer withRole ReviewerRole.
- Paper.110_Review.1_Reviewer withRole AnonymousReviewerRole.
- Paper.110_Review.1 type ReviewVersion.
- Paper.110_Review.1 issued "2001-01-28T13:05:00.000Z".
- Paper.110_Review.1 creator Paper.110_Review.1_Reviewer.
- Paper.110_Review.1 hasRating ReviewRating.2.
- Paper.110_Review.1 hasReviewerConfidence ReviewerConfidence.5.
- Paper.110_Review.1 reviews Paper.110.
- Paper.110_Review.1 issuedAt easychair.org.
- Paper.110_Review.1 issuedFor Conference.
- Paper.110_Review.1 releasedBy Conference.
- Paper.110_Review.1 hasContent "The paper presents version 4 of the widely-used YAGO knowledge base and motivates the design choices behind the knowledge base. In contrast to previous releases, YAGO 4 combines data from Wikidata with using schema.org as upper ontology. As previous YAGO releases, YAGO 4 focuses on data quality and logical consistency for the prize of coverage. By cleansing and increasing the logical consistency of Wikidata, YAGO 4 clearly adds value and I’m sure that the knowledge base will again be widely used. YAGO 4 is clearly a suitable resource for being presented in the ESWC resource track. The text of the paper still has the following weaknesses which should be resolved for the final version: 1. You say on page 4 that you delete 26% of the Wikidata facts using your constraints. Please explain in more detail what types of facts you delete, e.g. split the 116M deleted triples by constraint type which deleted them. 2. You mention on page 4 that you validate literals using regular expressions, but do not state for which percentage of your datatype properties you have such regexes. 3. In Section 2.2 you mention the misfit between schema.org classes and Wikidata classes and explain that you delete 12M instances dues to this misfit (7.5 meta-entities). Please name the top classes to which the remaining 4.5 million entities belong, so that the reader gets an idea which classes are not covered in YAGO 4. 4. In Section 2.3 you mention that you manually map 116 relations between Yago and Wikidata. In Section 4.1 you say that you have 116 properties. How many of these properties are relations? How many are datatype properties? How many of the datatype properties do you validate (see comment above)? Please also split the facts in Table 1 into relations and datatype properties. 5. You are not the first effort to cleanse Wikidata and represent its content using a more consistent ontology. 
You mention the related work only very superficially on page 3 and only vaguely mention that your design choices lead to different strengths and limitations. Knowing more about the differences between the resulting knowledge bases is crucial for people to decide which resource to use in their projects. Thus, please add a proper related work section to your paper and discuss the differences between your KB and the related work in more detail. Please also add statistics about DBpediaWikidata and Wikidata itself to Tables 1 so that the reader can see the impact of the different design choices. ================================================================================== Regarding the rebuttal, I am also satisfied with the authors' clarifications and their plans to address many of our comments in the final version of the paper. I believe that YAGO 4 is a really useful resource which I expect to be as widely used as its predecessors. Given this as well as the author's willingness to improve the text of the paper, I raise my rating from weak accept to accept."".
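The review's first point asks for deleted facts to be split by the constraint type that removed them. A hypothetical sketch of such a tally (the constraint names and sample facts are invented for illustration):

```python
# Hypothetical sketch: tally removed facts by the constraint that
# removed them, as the review requests. Constraint names are invented,
# not taken from the YAGO 4 pipeline.
from collections import Counter

removed = [
    ("wd:Q1 capitalOf wd:Q2", "domain"),
    ("wd:Q3 birthDate 'abc'", "regex"),
    ("wd:Q4 birthDate '1990'", "functional"),
    ("wd:Q5 capitalOf wd:Q6", "domain"),
]
by_constraint = Counter(kind for _, kind in removed)
# by_constraint == Counter({'domain': 2, 'regex': 1, 'functional': 1})
```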
- Paper.110_Review.2_Reviewer type RoleDuringEvent.
- Paper.110_Review.2_Reviewer label "Anonymous Reviewer for Paper 110".
- Paper.110_Review.2_Reviewer withRole ReviewerRole.
- Paper.110_Review.2_Reviewer withRole AnonymousReviewerRole.
- Paper.110_Review.2 type ReviewVersion.
- Paper.110_Review.2 issued "2001-01-29T13:43:00.000Z".
- Paper.110_Review.2 creator Paper.110_Review.2_Reviewer.
- Paper.110_Review.2 hasRating ReviewRating.1.
- Paper.110_Review.2 hasReviewerConfidence ReviewerConfidence.4.
- Paper.110_Review.2 reviews Paper.110.
- Paper.110_Review.2 issuedAt easychair.org.
- Paper.110_Review.2 issuedFor Conference.
- Paper.110_Review.2 releasedBy Conference.
- Paper.110_Review.2 hasContent "Comment after Rebuttal: Thank you for the clarifying comments which resolved many of my initial doubts. However, I am still having an unsure feeling about the mappings that are used for the extraction of YAGO4. The authors did not clarify the workflow of creating the mappings in their rebuttal (e.g., did only one person create those mappings; has there been peer-reviewing / crowd-sourcing / data-driven checks; ..). As the mappings are the central part of the new knowledge graph and are responsible for its correctness, this should be made very clear. As most of my doubts have still been resolved by the rebuttal of the authors, I am raising my final evaluation to "weak accept". --------------------------------------- The authors present the YAGO4 knowledge graph which combines Wikidata instances and parts of its type system with the very constrained ontology of schema.org. The YAGO4 knowledge graph is considerably large in size with almost 10K classes and up to 57M individuals as well as 326M facts. Despite its size, the main benefit of the resource is the very restrictive ontology which uses SHAQL constraints to ensure that the knowledge graph is in a consistent state. In general, I very much like the idea of the paper as Wikidata is in fact not properly accessible by a reasoner. Consequently, a logically consistent version of the knowledge graph has much potential for further use. The SHAQL constraints provide a nice foundation for a consistent and extensible graph. However, these constraints may also make it difficult to extend the knowledge graph further as new entities or facts have to comply to every constraint (and if the constraints are too restrictive, then a part of valid information might be excluded from the graph). But this is not a problem as it is intended that way. Although the research idea is very interesting, I think the paper/resource in its current state has still room for improvement. 
Among other things, it lacks a lot of detail in several places and contains some inconsistencies. Consequently, I evaluate the submission as "borderline paper". These are the problems that I see in particular: MISSING DETAILS ------------------ 1) I am missing a more detailed description of underlying resources (schema.org, Wikidata), so that the reader can better understand how you create the new knowledge graph out of them. For me, this is more relevant than, e.g., the description of previous YAGO versions (page 1) as they don't have an immediate impact on the current version. 2) The paper has only a rather superficial comparison of YAGO4 with related knowledge graphs. It should be more apparent how YAGO4 is different from other graphs (e.g. DBpedia), especially in terms of reasoning capabilities. 3) How were the mappings in sections 2.2 and 2.3 established? And in particular, how has it been made sure that the mappings are correct? As these are a central part of your knowledge graph, this should be made clear. 4) Page 3, "Disjointnesses": How do you resolve inconsistencies that come up due to the disjointnesses during the creation of YAGO4? For example: The class "Person" is disjoint with "CreativeWork", but the resource "Peter Pan" [1] in Wikidata is an instance of both. Which one do you keep (if any)? 5) Page 4, "Functional Constraints": How do you decide which of the properties are functional? And how do you resolve inconsistencies that may come up? For the property "birthPlace", for example, Wikidata lists only the most specific place of birth, but other knowledge bases like DBpedia assign multiple places of birth of varying granularity to a person. 6) Page 4, last paragraph: Has a portion of the removed facts been inspected and were the facts all actually "wrong"? What kinds of errors are fixed by removing them? 
7) Page 10, "Applications": The fact itself that previous versions of YAGO have been used in several projects doesn't say much about the current version as it is - at least to my understanding - rather different from all its previous versions. Seeing some kind of application (e.g. reapplying the new version of YAGO to some old project, or at least showing what it can do better than Wikidata with the improved reasoning capabilities) would be really great here. INCONSISTENCIES ------------------- 8) Page 5, Section 2.2: You state that only "leaf-level classes are taken from Wikidata", but in section 3 (page 8) you say that you include all classes into YAGO4 that are sub-classes of a class that has been mapped to schema.org. How is that possible if you only take the leaf-level classes from Wikidata? Or do you mean in section 2.2 that you map the leaf-level classes of schema.org to Wikidata? 9) Page 9, Table 1: What kind of sameAs links to Wikipedia do you extract? The numbers for the "W" and "E" flavours (42M and 25M) seem a little high to me given that these flavours have only 15M and 5M instances. READABILITY -------------- 10) Section 3 is, in my opinion, really hard to digest. At the start of the section some high-level information about the complete extraction procedure (maybe with an overview figure and a couple of examples) would aid my understanding more than a low-level description of the implemented algebraic operators. This might also help understanding the description of the extraction workflow on page 8. NITPICKS (FORMATTING, TYPOS, ..) ------------------------------------- 11) lowercase (Section 2: "person" and "thing") vs. CamelCase (Section 2.1: "BioChemicalEntity", "Event",..) classes 12) The same for properties (e.g. Section 2, first paragraph: "birthDate" vs. "capitalof") 13) Different formatting of facts (Section 2.3: "wd:Q42 wdt:P31 wd:Q5" vs. "yago:Douglas_Adams rdf:type schema:Person") 14) Page 8, second bullet point: "sub-classes" vs. 
"subclasses" 15) Page 8, fourth bullet point: "subclass of" in italics but "instance of" in quotes 16) Page 4, middle: "the range of the birthPlace property" should be "the range of the birthDate property" 17) Page 1: You give four sources for YAGO but do not cite the other mentioned knowledge graphs (e.g. DBpedia, BabelNet, NELL, KnowItAll) at all [1] https://www.wikidata.org/wiki/Q107190"".
- b0_g288 first Author.111.2.
- b0_g288 rest b0_g289.
- b0_g289 first Author.111.3.
- b0_g289 rest nil.
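The b0_* nodes above encode an RDF collection (an author list) as a linked list of first/rest pairs terminated by nil. A small sketch of walking such a list back into an ordered Python list, using the blank-node names from the dump:

```python
# Walk an RDF collection (first/rest linked list) into an ordered list.
# The node and member names mirror the b0_* blank nodes in the dump.

def rdf_list_members(node, first, rest):
    members = []
    while node != "nil":
        members.append(first[node])
        node = rest[node]
    return members

first = {"b0_g288": "Author.111.2", "b0_g289": "Author.111.3"}
rest = {"b0_g288": "b0_g289", "b0_g289": "nil"}
members = rdf_list_members("b0_g288", first, rest)
# members == ["Author.111.2", "Author.111.3"]
```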
- Author.111.3 type RoleDuringEvent.
- Author.111.3 label "Pascal Hitzler, 3rd Author for Paper 111".
- Author.111.3 withRole PublishingRole.
- Author.111.3 isHeldBy Pascal_Hitzler.
- Pascal_Hitzler type Person.
- Pascal_Hitzler name "Pascal Hitzler".
- Pascal_Hitzler label "Pascal Hitzler".
- Pascal_Hitzler holdsRole Author.111.3.
- Pascal_Hitzler holdsRole Author.218.7.
- Author.218.7 type RoleDuringEvent.
- Author.218.7 label "Pascal Hitzler, 7th Author for Paper 218".
- Author.218.7 withRole PublishingRole.
- Author.218.7 isHeldBy Pascal_Hitzler.
- Vadim_Ermolayev type Person.
- Vadim_Ermolayev name "Vadim Ermolayev".
- Vadim_Ermolayev label "Vadim Ermolayev".
- Vadim_Ermolayev holdsRole Paper.111_Review.0_Reviewer.
- Vadim_Ermolayev mbox mailto:vadim@ermolayev.com.