Matches in ESWC 2020 for { ?s ?p ?o. }
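The listing below is the set of matches for the basic graph pattern shown above. As a minimal sketch, assuming the ESWC 2020 metadata is available locally as an RDF dump (the filename `eswc2020.ttl` is a placeholder, not part of the dataset), a listing in this style could be reproduced with rdflib:

```python
# Minimal sketch: reproduce a "- subject predicate object." listing from an RDF dump.
# The filename "eswc2020.ttl" is hypothetical; it is not part of the dataset below.
from rdflib import Graph

g = Graph()
g.parse("eswc2020.ttl", format="turtle")  # hypothetical local dump

# The pattern { ?s ?p ?o . } matches every triple in the graph.
for s, p, o in g.query("SELECT ?s ?p ?o WHERE { ?s ?p ?o . }"):
    # n3() renders URIs with known prefixes and literals with quotes.
    print(f"- {s.n3(g.namespace_manager)} {p.n3(g.namespace_manager)} {o.n3(g.namespace_manager)}.")
```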
- Paper.88_Review.1 releasedBy Conference.
- Paper.88_Review.1 hasContent "The paper describes the datasets, evaluation protocol as well as the results of the SemTab 2019 Challenge, which was conducted at ISWC 2019 and attracted a significant number of participants. The paper falls into the category “Benchmarks” of the CfP and fulfils all the review criteria given in the slideset for submissions of the category Benchmarks. The SemTab 2019 Challenge is thus clearly eligible for the resource track. The design of the challenge as well as the text of the paper still has some flaws which prevented me from giving the paper a full accept: 1. The paper is not self-contained concerning the description of the benchmark datasets as well as the description of the procedure of producing the datasets. For a resource paper, it does not allocate enough space to the description of the actual resources (the datasets and gold standards) but uses a lot of space for the discussion of the specific benchmarking methodology, participating systems, and lessons learned from the 2019 campaign. The paper would gain as a resource paper if the second part were shortened and more space were allocated to: 1. detailed statistics about the datasets, e.g. a histogram of columns and cells per table, the number of annotations in the gold standard as well as the distribution of these annotations over tables of different sizes (as all tasks are easier for large tables); 2. a proper description of the “refined lookup approach” and not only a reference to [8], as well as statistics about the impact of the label refinement (what percentage of the tables was affected?). 2. The paper should devote more space to the comparison of the generated datasets to existing benchmark datasets such as Limaye, T2Dv2, and the datasets from [8], as this is essential for judging the results on the benchmark compared to existing results of other systems. The comparison would ideally also include more statistics about the datasets so that the reader would not need to refer to the original papers. This would for instance be much more interesting for the reader than the details on the RDFization of the tables currently provided in Section 4.4. 3. Other table-to-KB matching benchmarks use significantly deviating tasks, e.g. table to knowledge base class matching, table row to KB entity matching (which differs from the CEA task in that combinations of multiple values can also be used to identify an entity, e.g. name + birthdate), matching of datatype properties (such as population or economic statistics). The paper should acknowledge the other tasks and should explicitly motivate the selection of the tasks for the challenge. 4. Please explain in more detail what is meant in Section 5.1 with “the target cells, columns and column pairs were provided to the participants” and how this impacts the difficulty of the challenge compared to other benchmarks.".
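The per-table statistics this review asks for (a histogram of columns and cells per table) are straightforward to compute once the benchmark tables are available locally. A minimal sketch, assuming the tables are CSV files in a directory named `tables/` (both the path and the format are assumptions, not details from the paper):

```python
# Minimal sketch of the dataset statistics the reviewer requests: columns and cells per table.
# The directory "tables/" and the CSV format are assumptions, not taken from the paper.
import glob
import pandas as pd

rows = []
for path in glob.glob("tables/*.csv"):
    df = pd.read_csv(path)
    rows.append({"table": path, "columns": df.shape[1], "cells": df.shape[0] * df.shape[1]})

stats = pd.DataFrame(rows)
print(stats["columns"].value_counts().sort_index())  # histogram of columns per table
print(stats["cells"].describe())                     # distribution of cells per table
```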
- Paper.88_Review.2 type ReviewVersion.
- Paper.88_Review.2 issued "2001-01-29T09:37:00.000Z".
- Paper.88_Review.2 creator Paper.88_Review.2_Reviewer.
- Paper.88_Review.2 hasRating ReviewRating.2.
- Paper.88_Review.2 hasReviewerConfidence ReviewerConfidence.5.
- Paper.88_Review.2 reviews Paper.88.
- Paper.88_Review.2 issuedAt easychair.org.
- Paper.88_Review.2 issuedFor Conference.
- Paper.88_Review.2 releasedBy Conference.
- Paper.88_Review.2 hasContent "Post-rebuttal comments: I hereby acknowledge the authors' response and I wish to thank you for your clarifications. Regarding the description of the round 2 tables, thanks for clarifying that 83% comes from [8] and that 17% was synthetically generated ... but how did you select those 10k tables? The authors state that it corresponds to a manageable and relatively clean subset of the dataset published in [8]. What were the selection criteria? How do you assess this cleanness? I expect the camera-ready version to clarify those points. There is a confusion between wiki redirect pages (which is what the authors are talking about) and wiki disambiguation pages (which can be used as a noisiness technique as written by the authors). The expected clarification is whether a disambiguation page (not a redirect one) was considered as a good match or not in the challenge. Original review: General comments: This paper presents the SemTab 2019 resource, composed of 4 sets of tables (summing up to roughly 15k tables) coming with semantic annotations with entities from the DBpedia knowledge graph. Those semantic annotations make it possible to benchmark systems that must interpret web tables according to a given knowledge graph, and in particular the 3 sub-tasks named CTA (guess the type of a column), CEA (disambiguate a cell value) and CPA (guess the property holding between the main focus of the table and another column). The resource also comes with a scorer to evaluate systems. This resource was used during one of the two ISWC 2019 Semantic Web Challenges, and the paper also reports the results of the different systems that competed against this benchmark dataset. Finally, this benchmark being mostly synthetic, the paper also describes the methodology used to generate the dataset as well as how to improve it. The resource is well motivated, mentioning a number of applications that benefit from semantic table interpretation such as web search, QA systems and knowledge base construction. The related work is thorough and this resource is well situated with respect to previous efforts in providing annotated datasets for comparing systems aiming to annotate web tables. While one of the drawbacks of existing datasets is that they always use the same knowledge graph (DBpedia or Freebase), the authors of SemTab 2019 do not address this issue, e.g. by providing tables that could be annotated with multiple knowledge graphs in the ground truth. This should be more clearly addressed. Similarly, SemTab 2019 does not provide tables in which a large number of entities would not be present in a knowledge graph (so-called NIL values). While this is acknowledged in the paper, the authors should also better state this limitation for SemTab 2019 since it is mentioned as a criticism of the state of the art. The annotated tables are mostly synthetically generated. The pipeline for generating them is well described. The first step consists in profiling a knowledge graph using a set of generic queries. Numerous tools providing this functionality exist and should be mentioned, for example a tool such as LOUPE, http://loupe.linkeddata.es/loupe/ (http://ceur-ws.org/Vol-1486/paper_113.pdf). A number of parameters are fixed when generating tables, such as: 2,000 rows maximum per table; 7 columns maximum per table; etc. However, those parameters are not really discussed. In order to increase the challenge, the authors have introduced some noisiness in the data. The only technique mentioned is abbreviating the first name of a person. What other techniques do the authors consider applying in the future? Round 1 corresponds to the T2Dv2 dataset [19]. This dataset contains a small number of annotation errors, some of which have been discussed on the challenge forum. It is unclear why a proper adjudication phase has not been organized among the system participants. I strongly recommend that the authors update this part of the resource and correct the errors that have been identified by the system participants. Round 2 is composed of real tables manually annotated from [8] and of synthetically generated tables. However, the proportion of each is not mentioned in the paper. This should be addressed. Among the 12k tables, how many come from [8] and how many have been generated? Regarding the CEA task, why don't you prohibit disambiguation pages as valid annotations? Those are arguably not resources that aim to identify a real-world entity. It is also not clear if all possible redirect pages were considered as equally valid annotations for a given entity. Regarding the CPA task, why can only one property be considered a valid annotation, given that knowledge graphs often contain a hierarchy of properties? Minor comments: * Page 2: the reference [12] would be better put as a footnote, the homepage of the Semantic Web Challenge not being a proper reference * Page 2: dbr:ernesto is not a resource existing in the DBpedia knowledge graph. dbr:Ernesto is an existing resource (https://dbpedia.org/resource/Ernesto) but it refers to many possible Ernestos (novel, film, people, fictional character). * Page 9: "Round 3 dataset was composed of 2,162 tables; they were 406,827 ..." (and not 406,727 according to Table 2) * Page 9: "Particpants" -> "Participants" * Page 10: Table 3, add a new row with the number of days for each round to ease the analysis of each round's duration * Page 12: do not use the term "Table 2d" which does not exist but rather "Figure 2d" even if it is a table * Page 15: footnote 3, used for the "MaSI" and "ED" projects in the acknowledgment section, refers to nothing * Page 15: the reference [3] could be transformed into a footnote pointing to https://github.com/sem-tab-challenge/aicrowd-evaluator".
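The redirect-vs-disambiguation distinction raised in this review can be checked directly against DBpedia. A minimal sketch using the public SPARQL endpoint (illustrative only; this is not the challenge's scorer or validation code):

```python
# Minimal sketch: distinguish a DBpedia redirect page from a disambiguation page.
# Illustrative only; not the validation code used by the SemTab challenge.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://dbpedia.org/sparql"

def page_kind(resource_uri: str) -> dict:
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)

    sparql.setQuery(f"""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        ASK {{ <{resource_uri}> dbo:wikiPageRedirects ?target }}
    """)
    is_redirect = sparql.query().convert()["boolean"]

    sparql.setQuery(f"""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        ASK {{ <{resource_uri}> dbo:wikiPageDisambiguates ?target }}
    """)
    is_disambiguation = sparql.query().convert()["boolean"]

    return {"redirect": is_redirect, "disambiguation": is_disambiguation}

# Example from the review: dbr:Ernesto points to several candidates, i.e. a disambiguation page.
print(page_kind("http://dbpedia.org/resource/Ernesto"))
```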
- Author.89.1 type RoleDuringEvent.
- Author.89.1 label "Mojtaba Nayyeri, 1st Author for Paper 89".
- Author.89.1 withRole PublishingRole.
- Author.89.1 isHeldBy Mojtaba_Nayyeri.
- b0_g240 first Author.89.2.
- b0_g240 rest b0_g241.
- Author.89.2 type RoleDuringEvent.
- Author.89.2 label "Sahar Vahdati, 2nd Author for Paper 89".
- Author.89.2 withRole PublishingRole.
- Author.89.2 isHeldBy Sahar_Vahdati.
- b0_g241 first Author.89.3.
- b0_g241 rest b0_g242.
- Author.89.3 type RoleDuringEvent.
- Author.89.3 label "Xiaotian Zhou, 3rd Author for Paper 89".
- Author.89.3 withRole PublishingRole.
- Author.89.3 isHeldBy Xiaotian_Zhou.
- b0_g242 first Author.89.4.
- b0_g242 rest b0_g243.
- Author.89.4 type RoleDuringEvent.
- Author.89.4 label "Hamed Shariat Yazdi, 4th Author for Paper 89".
- Author.89.4 withRole PublishingRole.
- Author.89.4 isHeldBy Hamed_Shariat_Yazdi.
- b0_g243 first Author.89.5.
- b0_g243 rest nil.
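The Author.89.* roles above are chained into an ordered list through first/rest pairs terminated by nil, i.e. the standard RDF collection pattern. A minimal sketch of walking such a chain with rdflib, assuming the predicates are rdf:first/rdf:rest (the list head holding Author.89.1 is not shown in this excerpt, so the starting node below is picked generically):

```python
# Minimal sketch: recover the ordered author list from the first/rest chain above.
# Assumes the chain uses rdf:first / rdf:rest; the dump filename is hypothetical.
from rdflib import Graph, RDF
from rdflib.collection import Collection

g = Graph()
g.parse("eswc2020.ttl", format="turtle")  # hypothetical local dump, as before

# Any node in the chain works as a starting point; Collection walks it to rdf:nil.
start = next(g.subjects(RDF.first, None))
for role in Collection(g, start):
    print(role)  # e.g. Author.89.2, Author.89.3, ...
```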
- Mojtaba_Nayyeri type Person.
- Mojtaba_Nayyeri name "Mojtaba Nayyeri".
- Mojtaba_Nayyeri label "Mojtaba Nayyeri".
- Mojtaba_Nayyeri holdsRole Author.89.1.
- Mojtaba_Nayyeri holdsRole Author.243.2.
- Author.243.2 type RoleDuringEvent.
- Author.243.2 label "Mojtaba Nayyeri, 2nd Author for Paper 243".
- Author.243.2 withRole PublishingRole.
- Author.243.2 isHeldBy Mojtaba_Nayyeri.
- Sahar_Vahdati type Person.
- Sahar_Vahdati name "Sahar Vahdati".
- Sahar_Vahdati label "Sahar Vahdati".
- Sahar_Vahdati holdsRole Author.89.2.
- Xiaotian_Zhou type Person.
- Xiaotian_Zhou name "Xiaotian Zhou".
- Xiaotian_Zhou label "Xiaotian Zhou".
- Xiaotian_Zhou holdsRole Author.89.3.
- Hamed_Shariat_Yazdi type Person.
- Hamed_Shariat_Yazdi name "Hamed Shariat Yazdi".
- Hamed_Shariat_Yazdi label "Hamed Shariat Yazdi".
- Hamed_Shariat_Yazdi holdsRole Author.89.4.
- Hamed_Shariat_Yazdi holdsRole Author.243.3.
- Author.243.3 type RoleDuringEvent.
- Author.243.3 label "Hamed Shariat Yazdi, 3rd Author for Paper 243".
- Author.243.3 withRole PublishingRole.
- Author.243.3 isHeldBy Hamed_Shariat_Yazdi.
- Paper.89_Review.0_Reviewer type RoleDuringEvent.
- Paper.89_Review.0_Reviewer label "Anonymous Reviewer for Paper 89".
- Paper.89_Review.0_Reviewer withRole ReviewerRole.
- Paper.89_Review.0_Reviewer withRole AnonymousReviewerRole.
- Paper.89_Review.0 type ReviewVersion.
- Paper.89_Review.0 issued "2001-01-30T13:44:00.000Z".
- Paper.89_Review.0 creator Paper.89_Review.0_Reviewer.
- Paper.89_Review.0 hasRating ReviewRating.2.
- Paper.89_Review.0 hasReviewerConfidence ReviewerConfidence.3.
- Paper.89_Review.0 reviews Paper.89.
- Paper.89_Review.0 issuedAt easychair.org.
- Paper.89_Review.0 issuedFor Conference.
- Paper.89_Review.0 releasedBy Conference.
- Paper.89_Review.0 hasContent "In this study the authors design an approach that can recommend potential collaborators. They use a large knowledge graph to train embeddings. I have one particular concern. When creating the negative samples, considering that they are based on random replacement of the subject/object, is there a chance that you are creating instances that are certainly not true in the present but very likely in the future? For instance, if you create (A, co-author, B), this is definitely not true at the present time, as your knowledge graph evidences that such a triple does not exist. However, the fact that it currently does not exist does not mean that it will not happen in the future. By creating such a sample, you are letting your model learn that such a triple is a bad example, when in fact it simply has not occurred yet. On page 12, it would be interesting to see more details about the recommended authors and how many of them you kept because there was no co-authorship, they did not belong to the same organisation, and so on. Minor: References 19 and 20 are the same; references 24 and 25 are the same. After the REBUTTAL: Thank you for addressing my concerns.".
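The concern in this review is about generating negatives by randomly replacing the subject or object of an existing triple: a corrupted triple absent from the graph today (e.g. a co-authorship that has not happened yet) may still be plausible, so it can be a false negative. A generic sketch of that sampling step, with the usual filter against known triples (an illustration of the technique being discussed, not the authors' actual code):

```python
# Generic sketch of uniform negative sampling for KGE training: corrupt the subject
# or object of a known triple. Illustrative only; not the procedure from Paper 89.
import random

def corrupt(triple, entities, known_triples, max_tries=100):
    """Return a corrupted (head, relation, tail) that is not in the graph, or None."""
    head, rel, tail = triple
    for _ in range(max_tries):
        if random.random() < 0.5:
            candidate = (random.choice(entities), rel, tail)  # replace the subject
        else:
            candidate = (head, rel, random.choice(entities))  # replace the object
        if candidate not in known_triples:
            # The reviewer's point: "absent from the graph" does not mean "false";
            # a future co-authorship would make this a false negative.
            return candidate
    return None

# Example usage (toy data):
entities = ["A", "B", "C"]
known = {("A", "co-author", "B")}
print(corrupt(("A", "co-author", "B"), entities, known))
```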
- Paper.89_Review.1_Reviewer type RoleDuringEvent.
- Paper.89_Review.1_Reviewer label "Anonymous Reviewer for Paper 89".
- Paper.89_Review.1_Reviewer withRole ReviewerRole.
- Paper.89_Review.1_Reviewer withRole AnonymousReviewerRole.
- Paper.89_Review.1 type ReviewVersion.
- Paper.89_Review.1 issued "2001-01-15T14:40:00.000Z".
- Paper.89_Review.1 creator Paper.89_Review.1_Reviewer.
- Paper.89_Review.1 hasRating ReviewRating.2.
- Paper.89_Review.1 hasReviewerConfidence ReviewerConfidence.4.
- Paper.89_Review.1 reviews Paper.89.
- Paper.89_Review.1 issuedAt easychair.org.
- Paper.89_Review.1 issuedFor Conference.
- Paper.89_Review.1 releasedBy Conference.
- Paper.89_Review.1 hasContent "The paper introduces and evaluates the effectiveness of a novel loss function useful for learning to predict links in a knowledge graph by relying on relation and entity embeddings. In particular, the authors evaluate the proposed link prediction approach by relying on a scholarly knowledge graph and focusing on the recommendation of potential co-authorship. After introducing and motivating the general context of their work, the authors present the scholarly knowledge graphs they built in order to learn to recommend co-authorship links among Author nodes: the structure, the data sources and the content (in terms of number of entities and relations) of each knowledge graph are described. Then an overview of current approaches to embedding-based knowledge graph link prediction is provided: in particular, the weak aspects of the Margin Ranking Loss function used to learn such embeddings are highlighted. A new loss function, the Soft Margin Loss, is introduced with the aim of mitigating some of the problems highlighted with respect to the Margin Ranking Loss function: in particular, the new loss function avoids the need to define a hard margin to separate prediction scores of positive and negative samples, as is required if we use the Margin Ranking Loss function; indeed, such a hard margin could penalize training performance, especially when negative sampling techniques generate a non-negligible rate of false negative training samples. The link-prediction performance of the Soft Margin Loss function is compared with scenarios where embeddings are learnt by using the Margin Ranking Loss function: several relation scoring functions are used. Both the scholarly knowledge graphs created by the authors and other link prediction datasets (FB15k and WN18) are exploited for evaluation purposes. The authors also present the results of a manual evaluation performed to validate the co-authorship recommendations proposed by their approach when trained over the scholarly knowledge graphs they created. The sensitivity of the proposed loss function with respect to variations in hyperparameters is also analyzed. --- The paper is fairly well written. It deals with a relevant topic in the area of semantic content recommendation: link prediction approaches over knowledge graphs based on entity and relation embeddings. In particular, a novel loss function to train these embeddings with improved performance is presented and evaluated quantitatively and qualitatively by considering scholarly knowledge graphs. Comments: - could you better explain the differences between the two scholarly knowledge graphs you consider (SKGOLS and SKGNEW) with respect to the evaluation of the considered approaches? Why would you expect that these two scholarly knowledge graphs provide distinct, maybe complementary evaluation frameworks / scenarios with respect to the link prediction approaches considered or proposed? - in Section 5.2 "Quality and Soundness analysis", the manual evaluation performed and its results should be explained with greater clarity. Instead of "50 recommendations filtered for a closer look", it could be better to say that "the top 50 recommendations for each author have been manually reviewed in order to distinguish correct from incorrect ones with respect to the following set of criteria: 1. close match in research...". It is not clear that Table 4 contains the results of this manual filtering of automated recommendations. - could you explicitly specify in Tables 2 and 3 the meaning of underlined and bold numbers? - does the formula (6) of the loss function miss a sigma / summation over all negative samples? MINORS: - Abstract: sixth line: "knowldge graph embedding (KGE) models have..." --> Knowledge Graph Embedding (uppercase first letter) - Figure 1: could you check if the direction of the "isPublished" relation / link (from Event to Paper) is correct? For completeness, should the direction of the "isAffiliatedIn" relation / link be specified (it is not)? - Section 3, "Preliminaries and Related Work": "A Kg is roughly represented" --> KG - Section 3, "Preliminaries and Related Work": "...defines an score function..." --> a score function - Section 3, "Preliminaries and Related Work": "...a loss function to adjust embedding." --> embeddings - Section 5, "Evaluation": "...evaluation methods have been performed in order to approve: 1) better performance and..." --> assess: 1)".
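For reference, a generic formulation of the two losses this review discusses, where $f$ is a distance-style score (lower means more plausible), $S^{+}$ are observed triples, $S^{-}$ their corruptions, and $\gamma$ the margin. This is only a common textbook form, added to illustrate the reviewer's question about summing over negative samples; the paper's actual formula (6) and its Soft Margin Loss may differ:

```latex
% Generic reference forms only; not necessarily formula (6) from the paper.
\begin{align}
  \mathcal{L}_{\mathrm{MR}}   &= \sum_{(h,r,t)\in S^{+}} \sum_{(h',r',t')\in S^{-}}
     \max\!\bigl(0,\; \gamma + f(h,r,t) - f(h',r',t')\bigr) \\
  \mathcal{L}_{\mathrm{soft}} &= \sum_{(h,r,t)\in S^{+}} \sum_{(h',r',t')\in S^{-}}
     \log\!\bigl(1 + \exp\bigl(\gamma + f(h,r,t) - f(h',r',t')\bigr)\bigr)
\end{align}
```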
- Paper.89_Review.2_Reviewer type RoleDuringEvent.
- Paper.89_Review.2_Reviewer label "Anonymous Reviewer for Paper 89".
- Paper.89_Review.2_Reviewer withRole ReviewerRole.
- Paper.89_Review.2_Reviewer withRole AnonymousReviewerRole.
- Paper.89_Review.2 type ReviewVersion.
- Paper.89_Review.2 issued "2001-01-18T13:12:00.000Z".
- Paper.89_Review.2 creator Paper.89_Review.2_Reviewer.
- Paper.89_Review.2 hasRating ReviewRating..
- Paper.89_Review.2 hasReviewerConfidence ReviewerConfidence.4.
- Paper.89_Review.2 reviews Paper.89.