Matches in ESWC 2020 for { <https://metadata.2020.eswc-conferences.org/rdf/submissions/Paper.88_Review.2> ?p ?o. }
Showing items 1 to 10 of 10, with 100 items per page.
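The ten matches below are the result of evaluating the basic graph pattern in the page title against the conference metadata graph. A minimal SPARQL sketch of that lookup follows; the endpoint address and the LIMIT (taken from the page size above) are assumptions, not details stated on this page.

    # Hypothetical sketch of the lookup behind this listing; the endpoint URL is an assumption.
    SELECT ?p ?o
    WHERE {
      <https://metadata.2020.eswc-conferences.org/rdf/submissions/Paper.88_Review.2> ?p ?o .
    }
    LIMIT 100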
- Paper.88_Review.2 type ReviewVersion.
- Paper.88_Review.2 issued "2001-01-29T09:37:00.000Z".
- Paper.88_Review.2 creator Paper.88_Review.2_Reviewer.
- Paper.88_Review.2 hasRating ReviewRating.2.
- Paper.88_Review.2 hasReviewerConfidence ReviewerConfidence.5.
- Paper.88_Review.2 reviews Paper.88.
- Paper.88_Review.2 issuedAt easychair.org.
- Paper.88_Review.2 issuedFor Conference.
- Paper.88_Review.2 releasedBy Conference.
- Paper.88_Review.2 hasContent "Post-rebuttal comments:

  I hereby acknowledge the authors' response and I wish to thank you for your clarifications.

  Regarding the description of the Round 2 tables, thanks for clarifying that 83% comes from [8] and that 17% was synthetically generated ... but how did you select those 10k tables? The authors state that it corresponds to a manageable and relatively clean subset of the dataset published in [8]. What were the selection criteria? How do you assess this cleanliness? I expect the camera-ready version to clarify those points.

  There is a confusion between wiki redirect pages (which is what the authors are talking about) and wiki disambiguation pages (which can be used as a noising technique, as written by the authors). The expected clarification is whether a disambiguation page (not a redirect one) was considered a good match or not in the challenge.

  Original review:

  General comments: This paper presents the SemTab 2019 resource, composed of 4 sets of tables (summing up to roughly 15k tables) that come with semantic annotations with entities from the DBpedia knowledge graph. These semantic annotations make it possible to benchmark systems that must interpret web tables according to a given knowledge graph, and in particular the 3 sub-tasks named CTA (guess the type of a column), CEA (disambiguate a cell value) and CPA (guess the property holding between the main focus of the table and another column). The resource also comes with a scorer to evaluate systems. This resource was used during one of the two ISWC 2019 Semantic Web Challenges, and the paper also reports the results of the different systems that competed against this benchmark dataset. Finally, this benchmark being mostly synthetic, the paper also describes the methodology used to generate the dataset as well as how to improve it.

  The resource is well motivated, mentioning a number of applications that benefit from semantic table interpretation such as web search, QA systems and knowledge base construction. The related work is thorough, and this resource is well situated with respect to previous efforts in providing annotated datasets for comparing systems that aim to annotate web tables. While one of the drawbacks of existing datasets is that they always use the same knowledge graph (DBpedia or Freebase), the authors of SemTab 2019 do not address this issue, e.g. by providing tables that could be annotated with multiple knowledge graphs in the ground truth. This should be more clearly addressed. Similarly, SemTab 2019 does not provide tables in which a large number of entities would not be present in a knowledge graph (so-called NIL values). While this is acknowledged in the paper, the authors should also state this limitation of SemTab 2019 more clearly, since it is mentioned as a criticism of the state of the art.

  The annotated tables are mostly synthetically generated, and the pipeline for generating them is well described. The first step consists of profiling a knowledge graph using a set of generic queries. Numerous tools providing this functionality exist and should be mentioned, for example LOUPE, http://loupe.linkeddata.es/loupe/ (http://ceur-ws.org/Vol-1486/paper_113.pdf). A number of parameters are fixed when generating tables, such as a maximum of 2,000 rows per table and a maximum of 7 columns per table; however, those parameters are not really discussed. In order to increase the challenge, the authors have introduced some noise in the data. The only technique mentioned is abbreviating the first name of a person. What other techniques do the authors consider applying in the future?

  Round 1 corresponds to the T2Dv2 dataset [19]. This dataset contains a small number of annotation errors, some of which have been discussed on the challenge forum. It is unclear why a proper adjudication phase was not organized among the system participants. I strongly recommend that the authors update this part of the resource and correct the errors that have been identified by the system participants. Round 2 is composed of real tables manually annotated from [8] and of synthetically generated tables. However, the proportion of each is not mentioned in the paper; this should be addressed. Among the 12k tables, how many come from [8] and how many have been generated?

  Regarding the CEA task, why don't you prohibit disambiguation pages as valid annotations? Those are arguably not resources that aim to identify a real-world entity. It is also not clear whether all possible redirect pages were considered equally valid annotations for a given entity. Regarding the CPA task, why can only one property be considered a valid annotation, given that knowledge graphs often contain a hierarchy of properties?

  Minor comments:
  * Page 2: the reference [12] would be better placed as a footnote, the homepage of the Semantic Web Challenge not being a proper reference.
  * Page 2: dbr:ernesto is not a resource that exists in the DBpedia knowledge graph. dbr:Ernesto is an existing resource (https://dbpedia.org/resource/Ernesto), but it refers to many possible Ernestos (novel, film, people, fictional character).
  * Page 9: "Round 3 dataset was composed of 2,162 tables; they were 406,827 ..." (and not 406,727 according to Table 2).
  * Page 9: "Particpants" -> "Participants".
  * Page 10: Table 3, add a new row with the number of days for each round to ease the analysis of each round's duration.
  * Page 12: do not use the term "Table 2d", which does not exist, but rather "Figure 2d", even if it is a table.
  * Page 15: footnote 3, used for the "MaSI" and "ED" projects in the acknowledgment section, refers to nothing.
  * Page 15: the reference [3] could be transformed into a footnote pointing to https://github.com/sem-tab-challenge/aicrowd-evaluator".
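The review notes that the first step of the table-generation pipeline is profiling a knowledge graph with a set of generic queries, the kind of functionality also offered by tools such as LOUPE. As an editorial illustration only, here is a minimal sketch of one such profiling query; it is not the query used in the reviewed paper, and the target endpoint and LIMIT value are assumptions.

    # Hypothetical class-frequency profiling query (illustration only, not the paper's query).
    # Counts how many instances each class has in the target knowledge graph,
    # e.g. when run against the public DBpedia endpoint https://dbpedia.org/sparql.
    SELECT ?class (COUNT(?s) AS ?instances)
    WHERE {
      ?s a ?class .
    }
    GROUP BY ?class
    ORDER BY DESC(?instances)
    LIMIT 100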