Matches in ESWC 2020 for { <https://metadata.2020.eswc-conferences.org/rdf/submissions/Paper.110_Review.0> ?p ?o. }
Showing items 1 to 10 of 10, with 100 items per page.
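For reference, a sketch of the full query behind this listing, assuming a standard SPARQL SELECT form; the SELECT clause and LIMIT are my reconstruction from the graph pattern and page size above, not the endpoint's actual query:

```sparql
# Reconstructed query: list all properties and values of the review resource.
SELECT ?p ?o
WHERE {
  <https://metadata.2020.eswc-conferences.org/rdf/submissions/Paper.110_Review.0> ?p ?o .
}
LIMIT 100
```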
- Paper.110_Review.0 type ReviewVersion.
- Paper.110_Review.0 issued "2001-01-27T15:22:00.000Z".
- Paper.110_Review.0 creator Paper.110_Review.0_Reviewer.
- Paper.110_Review.0 hasRating ReviewRating.1.
- Paper.110_Review.0 hasReviewerConfidence ReviewerConfidence.4.
- Paper.110_Review.0 reviews Paper.110.
- Paper.110_Review.0 issuedAt easychair.org.
- Paper.110_Review.0 issuedFor Conference.
- Paper.110_Review.0 releasedBy Conference.
- Paper.110_Review.0 hasContent:

This paper introduces YAGO 4: a new dataset derived from Wikidata and schema.org. The main goal of the dataset is to provide a cleaner subset of types for entities, along with enforcing constraints on properties. Specifically, YAGO 4 selects six top-level classes, upon which disjointness constraints are defined; properties are taken from schema.org and associated with domain, range and cardinality constraints, defined using SHACL (and RDFS/OWL). The base classes are extended with the schema.org taxonomy and the Bioschemas taxonomy; Wikidata entities are associated with this taxonomy based on a defined mapping from the taxonomy's types to Wikidata types, where additional unmapped Wikidata types meeting certain constraints are also included. Fresh IRIs are minted based on the English name of the entity in Wikipedia or Wikidata (where available), and temporal annotations are added using RDF*. The authors describe the construction of YAGO 4 based on streaming operators, and provide statistics on three flavours of the dataset, using (1) all of Wikidata's entities; (2) only entities with a Wikipedia article; (3) only entities with an English Wikipedia article. The final dataset contains 9,702 classes, 57/15/5 million entities, and 326/48/18 million facts (depending on the flavour).

The call for the resources track has a special list of criteria that I will initially follow for this review. Later I will discuss aspects relating to the paper itself.

# Potential impact

On the one hand, YAGO 4 is an incremental contribution, effectively exporting Wikidata into a schema.org-like view; the added value here is essentially a mapping of schema.org terms to a subset of Wikidata terms, along with some constraints/axioms. On the other hand, I do buy into the idea that YAGO 4 could serve as a research-friendly version of Wikidata. In particular, Wikidata has grown so large and diverse that it is becoming an obstacle to manage in research works. Having a smaller subset of Wikidata with a cleaner schema seems useful along these lines for certain research works (the core use-case of OWL 2 DL compliance is acceptable, even if not thoroughly convincing).

# Reusability

As an emerging resource, there is no evidence of usage provided. On the other hand, I did briefly review the webpage and the provided resources, and from what I saw, the resource is more or less ready for use.

# Design & Technical quality

The authors do follow best practices for Linked Data, provide a SPARQL endpoint, etc. The dataset reuses schema.org terms. I did not find a description in VoID (mentioned by the track's call; it could be useful to add). I did notice one technical issue that the authors may need to look into, relating to the dereferencing of IRIs minted for the purposes of Linked Data publishing. Taking the example of:

- http://yago-knowledge.org/resource/Douglas_Adams

This 301-redirects to:

- http://www.yago-knowledge.org/resource/Douglas_Adams

... not a problem in itself, but for large-scale access to YAGO 4 through dereferencing, this adds an extraneous request. Looking up the aforementioned IRI gives a 303 redirect to the following URL:

- http://lod.openlinksw.com/describe/?uri=http://yago-knowledge.org/resource/Douglas_Adams

Unfortunately, my attempts to get RDF in popular formats from this URL were not successful. I tried:

- curl --header "Accept: application/rdf+xml" "http://lod.openlinksw.com/describe/?uri=http://yago-knowledge.org/resource/Douglas_Adams"
- curl --header "Accept: text/turtle" "http://lod.openlinksw.com/describe/?uri=http://yago-knowledge.org/resource/Douglas_Adams"

Both provided blank responses.
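For anyone wishing to reproduce this, the full chain of redirects described above can be traced in a single command; a sketch, assuming curl's standard behaviour (`-I` issues HEAD requests, which some servers may treat differently from the GETs used above; `-L` follows redirects):

```sh
# Trace the 301 -> 303 chain and show each hop's response headers.
curl -sIL -H "Accept: text/turtle" "http://yago-knowledge.org/resource/Douglas_Adams"
```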
# Availability

No persistent IRIs are provided (I don't personally see that as a major issue, though perhaps a dump could be published on Zenodo or Figshare, just in case). Licence information is provided. Code is available on GitHub. I am not sure if YAGO 4 has been registered in any catalogues. What I really miss here is a plan for the maintenance of the resource; more specifically, the future outlook is left vague. Are there plans to periodically update YAGO 4 with new data from Wikidata and new vocabulary from schema.org? How will this be handled?

# The Paper

The paper itself is relatively well-written and easy to follow. I did, however, find that the paper lacks details in certain parts.

* I would have liked to see an example of the YAGO 4 description of an illustrative entity.
* Where do the property terms come from? (It is explained later that they come from schema.org, but this is too late given that constraints on these properties are defined in earlier sections.)
* How were the constraints defined? Manually? Exported from Wikidata? From schema.org? How many constraints are there?
* When constraints are violated, how are repairs made? Are all triples removed? Or are minimal repairs somehow applied?
* How many classes come from Bioschemas? How many from schema.org? How many (only) from Wikidata? (The 9,700 figure seems to refer to all classes in the taxonomy, irrespective of source.)
* Are the RDF* meta-data provided separately? How are these data published? (See the sketch after this list.)
* Which Wikidata dump is used? Is this the truthy version or the complete version? This is important to clarify, as it refers to how ranked statements are exported, and also influences the importance of the temporal meta-data, as well as the constraints (e.g., do countries have one current population or several potentially historical populations?). How are such issues handled?
* A complete example of a dereferenceable IRI would be appreciated.
* Table 1: the number of triples would be appreciated.
* How are the precisions of values handled? Such precisions are a key ingredient of Wikidata.
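Regarding the RDF* question above: the kind of temporal annotation at issue looks roughly like the following Turtle* sketch. The specific terms and values here are illustrative assumptions on my part, not necessarily YAGO 4's actual vocabulary:

```turtle
@prefix yago:   <http://yago-knowledge.org/resource/> .
@prefix schema: <http://schema.org/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical example: an embedded triple carrying a temporal qualifier.
<< yago:Douglas_Adams schema:spouse yago:Jane_Belson >>
    schema:startDate "1991-11-25"^^xsd:date .
```

Embedded triples of this kind cannot be expressed in plain N-Triples, hence the question of how such annotations are serialised and published.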
I understand that the page limit is a factor, but I think most of these details could be provided without using much space (barring, perhaps, the entity example, but I think it would be very helpful to provide this in the paper, as it would implicitly address many of the other doubts).

# Other Comments

I find it a bit strange to call this dataset YAGO 4, as it bears little relation to the YAGOs that came before. I guess the authors wish to leverage some of the name recognition of YAGO in order to bootstrap usage of the new dataset, but I would have preferred a different name given the radically different heritage of YAGO 4.

# Overall Impression

Focusing on the resource itself, YAGO 4 is essentially a schema.org "view" of Wikidata. Given that it is an emerging resource, it does not yet have proven usage to point to. On the other hand, I think it has the potential to be used at least for research purposes. Hence I lean towards accepting, and would encourage the authors to consider the feedback provided here in order to improve the paper and the resource.

# Minor Comments

- "2 Billions of type-consistent triples" -> "2 billion type-consistent triples" (also lowercase all M(m)illion, B(b)illion, etc.)
- "incurs that ... are tedious" -> "makes ... tedious"
- "Moreover, there is little hope to run logical reasoners ...": the authors are guilty of overreaching here. We can run logical reasoners just fine using rule-based approaches (not based on satisfiability), para-consistent approaches, repair strategies, and so on. This strawman of there being no hope of reasoning over inconsistent data is counterproductive and needs to be stamped out. Rather, the authors could say something like "inconsistencies add complications ...", "care must be taken when ...", etc.
- "as the KB contains many small inconsistencies": under what logic? Under what constraints/axioms? Wikidata does not have a notion of (in)consistency to the best of my knowledge.
- "pointed [to by] the ..." or "pointed [out] in the ..."
- "All the number[s]"
- Footnote 2: this is a very strange way to define a "class" (that it has a super-class). Why not define classes as the values of P31?
- "that [it] is not easy"
- "Potentially, this could lead to millions of entities ...": while strictly true, this is perhaps misleading in that it gives the impression that any user off the street could effect such a change. In reality, to the best of my knowledge, most of these "central" parts of the Wikidata schema are semi-protected for edits, and hence not so easily changed. My issue is that the text here makes Wikidata seem a bit more of a "free-for-all" than it actually is.
- "the the"
- "If the first class on the path": unclear.
- "allows [for] efficiently selecting ... and get[ting] back"
- "This works per property ...": unclear.
- "URIs are converted into literals xsd:anyURI": which URIs? This is very likely a bad idea. I'm guessing this is perhaps for external IDs, but these are still IRIs that should be represented as such. The valid use of xsd:anyURI in RDF is vanishingly rare (e.g., giving the namespace *string* prefix of a vocabulary, which should conform to URI syntax).
- If the dumps are N-Triples, how are the RDF* data published?

==================================================================================

Regarding the rebuttal, I am quite satisfied with the authors' clarifications and remarks. Just regarding the concrete example in Figure 1, I should clarify that it would be great to have an example *early* in the paper, to clarify a lot of doubts the reader might have. If accepted, I encourage the authors to add the details and clarifications requested by the reviews (even if, for space reasons, some such details are rather made available on a webpage or in an extended version published online, referenced from the camera-ready version).