Research Post
Word lists have become available for most of the world’s languages, but only a small fraction of such lists contain cognate information. We present a machine-learning approach that automatically clusters words in multilingual word lists into cognate sets. Our method incorporates a number of diverse word similarity measures and features that encode the degree of affinity between pairs of languages. The output of the classification algorithm is then used to generate cognate groups. The results of the experiments on word lists representing several language families demonstrate the utility of the proposed approach.
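The pipeline described above (score word pairs, decide which pairs are cognate, then form groups) can be illustrated with a minimal sketch. It is not the authors' implementation: a single similarity measure (normalized edit distance) and a fixed threshold stand in for the trained classifier and its diverse features, language-pair affinity features are omitted, and grouping by connected components is an assumed strategy. The names `similarity`, `cluster_cognates`, and `threshold` are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cluster_cognates(words: dict, threshold: float = 0.6) -> list:
    """Group languages whose words for one meaning look alike.

    `words` maps a language code to its word form. Assumption: a pair
    counts as cognate when similarity >= threshold (a stand-in for a
    trained pairwise classifier). Groups are the connected components
    of the resulting pair graph, found with union-find.
    """
    parent = {lang: lang for lang in words}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(words, 2):
        if similarity(words[a], words[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for lang in words:
        groups.setdefault(find(lang), set()).add(lang)
    return list(groups.values())


# Toy word list for the meaning "father" (illustrative data only).
father = {"de": "vater", "nl": "vader", "en": "father",
          "fi": "isä", "et": "isa"}
groups = cluster_cognates(father)
```

Because components are transitive, two words can land in one group through an intermediate form even when their own pairwise score falls below the threshold; that is a property of this sketch's grouping step, not a claim about the published method.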
Acknowledgements
We thank Eric Holman, Søren Wichmann, and other members of the ASJP project for sharing their cognate-annotated data sets. We also thank Shane Bergsma for insightful comments. Format conversion of the Comparative Indo-European Database was performed by Qing Dou. This research was partially funded by the Natural Sciences and Engineering Research Council of Canada.
Feb 26th 2023