
ESWC 2020


I am happy with the clarifications made by the authors. The rebuttal shows their capability and willingness to amend the paper in order to address my comments, which mostly focused on various claims that I found not to be carefully written or fully supported in the previous version of the paper. I have updated my score.

###############

This submission investigates how best to use ElasticSearch (ES) for retrieving RDF triples, in order to achieve the best accuracy on several entity-based tasks. Several aspects are tested: field separation, field weighting, index extensions with properties beyond the triple, and off-the-shelf ES similarity metrics.

What I enjoyed about the paper is its pragmatic approach of exploring how ES's rich functionality set can be tuned to RDF data and tasks. Another strong aspect of the paper is the multifaceted evaluation in section 5. These aspects, together with the fact that performing keyword search over RDF is an open challenge, the fairly clear paper story, and the public release of the code, make this paper decent and worth considering for acceptance.

I do, however, have a few comments about various claims made in this paper. This, IMHO, weakens its position and contribution. While I trust that these points are generally addressable, I have serious doubts about whether they can be addressed for the camera-ready version.

1) The authors claim to be the first to use ElasticSearch for retrieval of RDF triples and to investigate various indexing, querying and retrieval approaches. I have to disagree with this point. LOTUS (Ilievski et al., 2015; 2016), which was built on top of LodLaundromat data, also uses ElasticSearch to index and retrieve RDF triples, and investigates 32 retrieval options. This is not to say that the two approaches are the same: the present paper focuses on a systematic investigation of existing functionality for accurate retrieval, whereas LOTUS focused on scalability and was built on the assumption that the 'best' retrieval is application-dependent. In any case, this should be integrated in the paper, and the relation to LOTUS (and potentially other ES RDF engines) should be made clear.

2) I find the related work to be long and not very concise. There are two pages explaining approach after approach without a direct comparison to the approach in the present paper, and then the positioning is briefly outlined in two paragraphs (which should probably be revisited according to point 1). I would suggest that this section be rewritten in a concise and focused way, explaining the general ideas in the two directions covered and, directly, how this work relates to them.

3) While this paper does a very nice job of exploring and measuring the accuracy of different configurations, I missed the general picture. Several sections point to this from various perspectives (requirements, challenges, approach), but they are not mappable to each other. It would really help the paper if the main hypotheses and aspects were summarized fairly early in the paper, potentially aided by a scheme/table, and ideally already pointing to the results tables. It would also help if these sections were better integrated with each other (making pointers, aligning points, etc.).

4) Besides the matter discussed in point 1, I find other claims made in the paper to be insufficiently supported or obvious. Specifically, is it justified to say that the analysis in section 5 is 'extensive' (also considering the systematicity note in point 3)? The SDM system in table 6 is on average 0.03 points better than your system, which is comparable to the improvements that you observe in the previous tables; is it fair to say it is a 'slight' improvement? And overall, section 5 claims 'high' performance; again, I am unsure whether this is justified.

Minor comments:

* On several occasions, the authors claim that a result is 'as expected'; where do these expectations come from? As it is, they seem fairly ad hoc. Addressing point 3 above would help here, I think.
* Does one really need to be aware of the schema to rely on rdfs:label and rdfs:comment? In practice, these RDFS constructs are commonly used and can almost be assumed.
* Section 2.3 discusses five types of objects and explains how type i (URIs) are indexed; how are types ii-v indexed?
* The approach indexes triples; what about indexing 'statements' in general (e.g., quads)?
* The last paragraph of 4.4 is quite dense and hard to follow; please rewrite.
* The approach does not seem very scalable: the performance of the baseline model (which is comparable to LOTUS) is similar to LOTUS, but the size of the index is 10x smaller. Can the authors comment on this? A comment on efficiency would be nice in the summary in 5.4 anyway.
* How exactly are lists evaluated? Please expand on this in 5.2.
* Please say what DL and b-connected are.
* It would be nice to have a demo where users can play with the system and get a feeling for its behavior.
* Ideally, the paper should be readable in black-and-white (I'd suggest adapting figure 1 to enable this).
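For readers unfamiliar with the ElasticSearch knobs the review keeps referring to (field separation, field weighting, and off-the-shelf similarity modules), the following is a minimal, hypothetical sketch of what such a configuration could look like. The field names, boost values, and similarity parameters are illustrative assumptions, not taken from the paper under review.

```python
# Illustrative sketch (NOT the paper's actual configuration): indexing RDF
# triples in Elasticsearch with separated fields, a non-default similarity
# module, and per-field weighting at query time.

# Index settings/mapping: one document per triple, with subject, predicate
# and object kept in separate fields so each can be weighted independently.
index_settings = {
    "settings": {
        "index": {
            "similarity": {
                # Off-the-shelf similarity module (LM Dirichlet) configured
                # as an alternative to the default BM25.
                "lm_dirichlet": {"type": "LMDirichlet", "mu": 2000}
            }
        }
    },
    "mappings": {
        "properties": {
            "subject":   {"type": "text"},
            "predicate": {"type": "text"},
            # The object field uses the custom similarity module above.
            "object":    {"type": "text", "similarity": "lm_dirichlet"},
            # Hypothetical index extension beyond the triple itself,
            # e.g. the rdfs:label of the subject resource.
            "subject_label": {"type": "text"},
        }
    },
}

def weighted_query(keywords: str, object_boost: float = 2.0) -> dict:
    """Build a multi_match query that boosts the object field relative to
    the other triple components (the boost value is an assumption)."""
    return {
        "query": {
            "multi_match": {
                "query": keywords,
                "fields": [
                    "subject",
                    "predicate",
                    f"object^{object_boost}",
                    "subject_label",
                ],
            }
        }
    }

q = weighted_query("semantic web")
```

With a running cluster, these dictionaries would be passed to `es.indices.create(...)` and `es.search(...)` via the official Python client; here they are only constructed in memory to show which dimensions (field separation, weighting, similarity) can be varied.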
