Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/542> ?p ?o. }
Showing items 1 to 12 of 12 with 100 items per page.
- 542 creator anthony-hartley.
- 542 creator bogdan-babych.
- 542 type InProceedings.
- 542 label "Sensitivity of Automated MT Evaluation Metrics on Higher Quality MT Output: BLEU vs Task-Based Evaluation Methods".
- 542 sameAs 542.
- 542 abstract "We report the results of our experiment on assessing the ability of automated MT evaluation metrics to remain sensitive to variations in MT quality as the average quality of the compared systems goes up. We compare two groups of metrics: those which measure the proximity of MT output to some reference translation, and those which evaluate the performance of some automated process on degraded MT output. The experiment shows that proximity-based metrics (such as BLEU) lose sensitivity as the scores go up, but performance-based metrics (e.g., Named Entity recognition from MT output) remain sensitive across the scale. We suggest a model for explaining this result, which attributes stable sensitivity of performance-based metrics to measuring cumulative functional effect of different language levels, while proximity-based metrics measure structural matches on a lexical level and therefore miss higher-level errors that are more typical for better MT systems. Development of new automated metrics should take into account possible decline in sensitivity on higher-quality MT, which should be tested as part of meta-evaluation of the metrics.".
- 542 hasAuthorList authorList.
- 542 hasTopic Linguistics.
- 542 isPartOf proceedings.
- 542 keyword "Evaluation methodologies".
- 542 keyword "Machine Translation, SpeechToSpeech Translation".
- 542 title "Sensitivity of Automated MT Evaluation Metrics on Higher Quality MT Output: BLEU vs Task-Based Evaluation Methods".
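
The listing above is the result of matching the triple pattern `{ <subject> ?p ?o. }`, so every row shares one subject and varies only in predicate and object. As a rough illustration, the sketch below builds an equivalent SPARQL query string and groups a few of the listed rows by predicate. The function names `triple_pattern_query` and `group_by_predicate` are hypothetical helpers introduced here, the short predicate names are the abbreviated labels shown in the listing rather than full IRIs, and no endpoint is contacted.

```python
from collections import defaultdict

# Subject IRI taken from the triple pattern at the top of the listing.
SUBJECT = "https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/542"


def triple_pattern_query(subject_iri: str) -> str:
    """Return a SPARQL SELECT query for the pattern { <subject> ?p ?o }."""
    return f"SELECT ?p ?o WHERE {{ <{subject_iri}> ?p ?o . }}"


# A subset of (predicate, object) pairs copied from the listing.
# Abbreviated labels are used as shown on the page (an assumption: the
# full predicate and object IRIs are not visible in the listing).
PAIRS = [
    ("creator", "anthony-hartley"),
    ("creator", "bogdan-babych"),
    ("type", "InProceedings"),
    ("hasTopic", "Linguistics"),
    ("keyword", "Evaluation methodologies"),
    ("keyword", "Machine Translation, SpeechToSpeech Translation"),
]


def group_by_predicate(pairs):
    """Collect objects under their predicate, preserving listing order."""
    grouped = defaultdict(list)
    for predicate, obj in pairs:
        grouped[predicate].append(obj)
    return dict(grouped)


if __name__ == "__main__":
    print(triple_pattern_query(SUBJECT))
    for predicate, objects in group_by_predicate(PAIRS).items():
        print(predicate, objects)
```

Grouping makes it easy to see which predicates are multi-valued for this paper: `creator` and `keyword` each appear twice in the listing, while the others shown here appear once.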