Data Portal @ linkeddatafragments.org

ESWC 2020


Matches in ESWC 2020 for { ?s ?p "…" }:

The paper describes the datasets, the evaluation protocol, and the results of the SemTab 2019 Challenge, which was conducted at ISWC 2019 and attracted a significant number of participants. The paper falls into the category “Benchmarks” of the CfP and fulfils all the review criteria given in the slide set for submissions in that category. The SemTab 2019 Challenge is thus clearly eligible for the resource track. The design of the challenge as well as the text of the paper still have some flaws which prevented me from giving the paper a full accept:

1. The paper is not self-contained concerning the description of the benchmark datasets and the procedure used to produce them. For a resource paper, it does not allocate enough space to the description of the actual resources (the datasets and gold standards), but uses a lot of space for the discussion of the specific benchmarking methodology, the participating systems, and the lessons learned from the 2019 campaign. The paper would gain as a resource paper if the second part were shortened and more space were allocated to: (1) detailed statistics about the datasets, e.g. a histogram of columns and cells per table, the number of annotations in the gold standard, and the distribution of these annotations over tables of different sizes (as all tasks are easier for large tables); (2) a proper description of the “refined lookup approach”, rather than only a reference to [8], as well as statistics about the impact of the label refinement (what percentage of the tables was affected?).

2. The paper should devote more space to the comparison of the generated datasets with existing benchmark datasets such as Limaye, T2Dv2, and the datasets from [8], as this is essential for judging the results on this benchmark against existing results of other systems. The comparison would ideally also include more statistics about those datasets, so that the reader would not need to refer to the original papers. This would, for instance, be much more interesting for the reader than the details on the RDFization of the tables currently provided in Section 4.4.

3. Other table-to-KB matching benchmarks use significantly different tasks, e.g. table-to-KB class matching, table-row-to-KB-entity matching (which differs from the CEA task in that combinations of multiple values, e.g. name + birth date, can be used to identify an entity), and the matching of datatype properties (such as population or economic statistics). The paper should acknowledge these other tasks and explicitly motivate the selection of the tasks for the challenge.

4. Please explain in more detail what is meant in Section 5.1 by “the target cells, columns and column pairs were provided to the participants” and how this impacts the difficulty of the challenge compared to other benchmarks.
