Matches in ESWC 2020 for { <https://metadata.2020.eswc-conferences.org/rdf/submissions/Paper.69_Review.0> ?p ?o. }
Showing items 1 to 10 of 10.
- Paper.69_Review.0 type ReviewVersion.
- Paper.69_Review.0 issued "2001-01-29T21:06:00.000Z".
- Paper.69_Review.0 creator Paper.69_Review.0_Reviewer.
- Paper.69_Review.0 hasRating ReviewRating.1.
- Paper.69_Review.0 hasReviewerConfidence ReviewerConfidence.3.
- Paper.69_Review.0 reviews Paper.69.
- Paper.69_Review.0 issuedAt easychair.org.
- Paper.69_Review.0 issuedFor Conference.
- Paper.69_Review.0 releasedBy Conference.
- Paper.69_Review.0 hasContent "Thank you for the clarifications in the rebuttal, in particular about the covered expressivity. The scope/limits have to be clearly stated in the paper because "complex queries" can have very different meanings for different people (for me, these are really simple).

  # Strengths
  S1 - first QA dataset that comes with answer verbalizations, as an extension of LC-QuAD
  S2 - several machine learning models are given as baselines for future evaluation by the community
  S3 - the resources are available through a dedicated web page and a repository

  # Weaknesses
  W1 - the range of expressivity of the covered questions is not clearly defined
  W2 - the reusability of the dataset is limited by the fact that many answer verbalizations are often possible
  W3 - the dataset is large (5,000 questions) but this may not be enough for machine learning

  The dataset should also include the raw answers, as a list, for easier reusability.

  # Summary
  The main proposed resource is an extension of the LC-QuAD dataset, a question-answer collection for evaluating Question-Answering (QA) approaches, with a new field that contains the verbalization of the answers. The verbalizations were first generated automatically from templates and then manually curated following style rules (e.g., active voice). The secondary resource consists of neural-network machine learning models that generate the answer verbalization from the question or the formal query; they serve as baselines for the production of answer verbalizations. Scores are given in the paper for each model, and it is shown that there is ample room for improvement.

  # Discussion
  QA systems generally have low accuracy on open-domain questions, ranging from 20% to 80%. It is therefore important, when a QA system returns answers, to give insight into how it arrived at them.
  The authors propose to generate a verbalization of the answers that reflects the intention of the formal query used to retrieve them. I agree with the authors that this is more natural than showing the formal query, or even than verbalizing the formal query before listing the answers, at least in a spoken dialogue.

  [W1] The authors claim that the dataset covers complex questions and not only factoid questions. However, from what I have seen, all questions use either an ASK query or a SELECT DISTINCT query with a single projection, either ?uri or COUNT(?uri). We need to know the range of questions more precisely:
  - how many projections in the SELECT clause? at most 1? (several projections would make the verbalization more useful and interesting)
  - which aggregators? only COUNT?
  - how many triple patterns at most? what is the distribution?
  - are there cycles in the graph patterns?
  - are there graph patterns with UNION, OPTIONAL or MINUS?
  - what about CONSTRUCT queries (more open questions), which would really make verbalization compulsory?

  I agree that several features are clearly future work, but it seems fair to state clearly what is covered by the dataset and what is future work.

  [W2] The main difficulty I see in the proposed approach is that many correct answer verbalizations are possible, whereas there is in general a single correct answer set for a given question. Although I agree that verbalizing answers is a good idea, using your own verbalizations to evaluate other verbalizations seems a bit fragile. Your verbalizations can therefore be useful as examples or as targets for machine learning, but if I come up with my own verbalizations, it is not clear how I can compare them to yours.
  # Minor comments
  - Fig. 1: I can't see the QALD datasets; these should be added
  - Table 1: 33k -> 33K, 11k -> 11K
  - p.5: the users is --> the user is
  - p.6: publicity --> publicly
  - p.7: I would switch 'Generate' and 'Create' in the paragraph headers, because the verbalization templates are manually created, while the initial verbalizations are automatically generated.
  - p.9: suitability --> sustainability
  - p.10: straight forward --> straightforward (in one word)
  - p.11: evaluation metrics: please give the range of values for each measure, and whether lower or higher values are better. Some of this information is given later, but it would be better here, close to the definitions.".
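The expressivity questions the review raises under [W1] lend themselves to a mechanical check over the dataset's SPARQL queries. A minimal Python sketch of such a profiler follows; the sample queries and the naive triple-pattern counting heuristic are illustrative assumptions, not taken from the actual LC-QuAD data.

```python
from collections import Counter

def query_features(query):
    """Return (form, uses_count, n_triple_patterns) for a simple SPARQL query.

    Assumes brace-free basic graph patterns, as in the factoid-style
    queries the review describes; real queries would need a proper parser.
    """
    form = "ASK" if query.lstrip().upper().startswith("ASK") else "SELECT"
    uses_count = "COUNT(" in query.upper()
    # Rough triple-pattern count: split the WHERE body on ' . ' separators.
    body = query[query.index("{") + 1 : query.rindex("}")]
    n_patterns = len([p for p in body.split(" . ") if p.strip()])
    return form, uses_count, n_patterns

if __name__ == "__main__":
    # Made-up sample queries in the shapes the review mentions.
    samples = [
        "ASK WHERE { ?uri a <http://dbpedia.org/ontology/River> }",
        "SELECT DISTINCT ?uri WHERE { ?uri <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Dan_Brown> }",
        "SELECT DISTINCT COUNT(?uri) WHERE { ?x <http://dbpedia.org/ontology/team> ?uri . ?x a <http://dbpedia.org/ontology/Player> }",
    ]
    # Distribution of query shapes, as the review requests.
    for features, count in Counter(map(query_features, samples)).items():
        print(features, count)
```

Run over the full dataset, the resulting distribution would directly answer the questions about projections, aggregators, and triple-pattern counts.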