ESWC 2020

Paper.116_Review.0 hasContent "After rebuttal: I thank the authors for their response. I decided to keep my score (accept).

******************

This paper describes an approach for generating a knowledge graph of software mentions in scientific papers from the social sciences. The approach includes disambiguation and enrichment using DBpedia and Wikidata. The evaluation shows that the approach achieves an F-score of .82 for detecting software mentions in the corpora.

The paper is well written, easy to follow and highly relevant for the conference and track. I believe this is an important topic, both for measuring the impact of software and for properly crediting authors for their work, and it's great to see that both the code used and the resulting knowledge graph are available online with examples to explore (even if the readme of the code still needs work to be reusable). In addition, the authors are straightforward about the limitations of the approach, which is very useful when comparing and assessing it for reuse. Therefore I think this paper should be accepted at ESWC 2020.

I list below some comments, suggestions and questions that it would be great to see addressed in the camera-ready version of the paper.

- Given that some manual rules are needed for the approach, how dependent is the approach on the chosen domain?
- The authors acknowledge that the graph has errors. However, there is no comment on how these errors would be fixed when detected by users. Is there a plan for a feedback mechanism?
- The authors state that one benefit of the approach is proper attribution to authors and citation. I don't see the difference between them; aren't we attributing the authors by properly citing their work? Maybe the authors of the paper are referring to tracking the impact of software?
- The precision obtained for the SSC is in most cases very low. Training with SSC with distant supervision does not really add much to the precision, which I guess is what the consumers of the KG will mostly care about. I would have liked to see some discussion of whether the extra effort is really worth the gain in those cases.
- In the evaluation, the comparison against the state of the art is not really fair, because it used a different corpus and domain, although it is informative. Why not compare against simple classifiers as baselines? For example, a TF-IDF + binary classifier on whether a sentence contains a software mention or not would have been easy to do with GSC. It would not tell you which software was mentioned, but it may have been a good alternative to SSC.
- I am a little confused about using "String" as a class in the data model. String is usually a data type, and having it as a class does not sound right. It looks redundant to have a mention which then refers to a software, and I can think of a few alternatives that would produce a cleaner data model (specifically for querying):
  - 1) Extend schema:mentions with skg:mentionsSoftware (domain skg:SoftwareArticle, range skg:SoftwareApplication, both classes being extensions of their respective schema.org classes). That way you can have a direct link between paper and software.
  - 2) Instead of String, call the class skg:SoftwareMention; it will be less confusing for users.
- I would suggest that the authors look at codemeta.org, an extension of schema.org for scientific software that includes some of the terms proposed by the authors to describe software.
- Schema.org has the class SoftwareSourceCode, so the information about the repositories could be linked as well.
- Content negotiation on the vocabulary (skg) does not work. I tried: 'curl -sH "accept:application/rdf+xml" https://data.gesis.org/softwarekg -L' with text/turtle and application/rdf+xml. In both cases, only HTML is returned. This means I cannot import this vocabulary in my application. I didn't find a link to download the RDFS/OWL file of the data model in the documentation. Since the paper does not emphasize the vocabulary as a contribution, I will not penalize this in my review, but I still think it should be addressed.".
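
To make the baseline suggested in the review concrete (TF-IDF features plus a binary classifier that decides whether a sentence contains a software mention), here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the authors' method: the training sentences are hypothetical toy examples rather than GSC data, the function names are invented for this sketch, and a nearest-centroid rule stands in for whichever binary classifier one would actually choose.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(sentences):
    """Smoothed inverse document frequency for every token in the training set."""
    n = len(sentences)
    df = Counter(tok for s in sentences for tok in set(tokenize(s)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf(text, idf):
    """L2-normalised sparse TF-IDF vector; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for tok, value in vec.items():
            total[tok] += value
    return {tok: value / len(vectors) for tok, value in total.items()}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(value * b.get(tok, 0.0) for tok, value in a.items())


def train(sentences, labels):
    """Fit IDF weights and one centroid per class (mention / no mention)."""
    idf = fit_idf(sentences)
    pos = centroid([tfidf(s, idf) for s, y in zip(sentences, labels) if y])
    neg = centroid([tfidf(s, idf) for s, y in zip(sentences, labels) if not y])
    return idf, pos, neg


def predict(model, sentence):
    """True if the sentence is closer to the 'software mention' centroid."""
    idf, pos, neg = model
    vec = tfidf(sentence, idf)
    return dot(vec, pos) > dot(vec, neg)


# Hypothetical toy data, NOT taken from GSC or SSC.
train_sentences = [
    "Data were analysed using SPSS version 22",
    "All statistical analyses were performed in Stata 14",
    "Regression models were estimated with R version 3.5",
    "Participants were recruited from three secondary schools",
    "The survey was conducted between March and June",
    "Respondents reported their household income and education",
]
train_labels = [True, True, True, False, False, False]

model = train(train_sentences, train_labels)
print(predict(model, "Models were estimated using Stata version 15"))
print(predict(model, "Participants were recruited from local schools"))
```

As the review notes, a sentence-level baseline of this kind only says whether a sentence mentions software, not which software is mentioned, so it would complement rather than replace the entity-level evaluation.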