Recently I got a paper accepted into ACL 2022 in Dublin. Happy to be able to travel after a long time!
Towards Better Characterization of Paraphrases
Timothy Liu, De Wen Soh
Abstract: To effectively characterize the nature of paraphrase pairs without expert human annotation, we propose two new metrics: word position deviation (WPD) and lexical deviation (LD). WPD measures the degree of structural alteration, while LD measures the difference in vocabulary used. We apply these metrics to better understand the commonly-used MRPC dataset and study how it differs from PAWS, another paraphrase identification dataset. We also perform a detailed study on MRPC and propose improvements to the dataset, showing that it improves generalizability of models trained on the dataset. Lastly, we apply our metrics to filter the output of a paraphrase generation model and show how it can be used to generate specific forms of paraphrases for data augmentation or robustness testing of NLP models.
https://aclanthology.org/2022.acl-long.588/
https://github.com/tlkh/paraphrase-metrics
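
For a rough feel of what the two metrics capture, here is a simplified Python sketch. It is not the exact formulation from the paper or the code in the repository: the function names, the whitespace tokenizer, the Jaccard-style overlap used for LD, and the first-occurrence position matching used for WPD are all assumptions made for illustration.

```python
def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization with edge punctuation stripped.
    tokens = (w.strip(".,;:!?\"'()") for w in sentence.lower().split())
    return [t for t in tokens if t]


def lexical_deviation(s1, s2):
    # Illustrative LD: 1 minus the Jaccard overlap of the two vocabularies.
    # 0.0 means the same set of words, 1.0 means no words in common.
    a, b = set(tokenize(s1)), set(tokenize(s2))
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def relative_positions(tokens):
    # Map each word to the normalized position (0.0 to 1.0) of its first occurrence.
    denom = max(len(tokens) - 1, 1)
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, i / denom)
    return positions


def word_position_deviation(s1, s2):
    # Illustrative WPD: mean absolute shift in normalized position
    # over the words that appear in both sentences.
    p1 = relative_positions(tokenize(s1))
    p2 = relative_positions(tokenize(s2))
    shared = p1.keys() & p2.keys()
    if not shared:
        return 0.0
    return sum(abs(p1[w] - p2[w]) for w in shared) / len(shared)


if __name__ == "__main__":
    a = "The cat chased the dog."
    b = "The dog chased the cat."
    print("LD :", lexical_deviation(a, b))
    print("WPD:", word_position_deviation(a, b))
```

In this sketch, a pair that reuses the same words in a different order has low LD and non-zero WPD, which is the kind of distinction that could be used to filter paraphrases by type.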

Timothy Liu's Blog
