alineR: an R Package for Optimizing Feature-Weighted Alignments and Linguistic Distances

Authors Sean S. Downey, Guowei Sun, Peter Norquest
Journal/Conference Name The R Journal
Paper Abstract Linguistic distance measurements are commonly used in anthropology and biology when quantitative and statistical comparisons between words are needed. This is common, for example, when comparisons between linguistic and genetic data are required. Such comparisons can provide insight into historical population patterns and they provide general insight into evolutionary processes. However, the most commonly used linguistic distances are derived from edit distances, which do not weight phonetic features that may, for example, represent smaller-scale patterns in linguistic evolution. Thus, computational methods for calculating feature-weighted linguistic distances are needed for linguistic, biological, and evolutionary applications; additionally, the linguistic distances presented here are generic and may have broader applications in fields such as text mining and search. To facilitate similar research, we are making alineR available as an open-source R software package that performs feature-weighted linguistic distance calculations. The package includes a supervised learning methodology that uses a genetic algorithm and manually determined alignments to estimate 13 linguistic parameters including feature weights and a skip penalty. Here we present the package and use it to demonstrate a supervised learning methodology to estimate the optimal linguistic parameters for a sample of Austronesian languages. Our results show that the methodology can estimate these parameters for both simulated language data and for real language data, that optimizing feature weights improves alignment accuracy by approximately 29%, and that optimizing these parameters affects the resulting distance measurements. Availability: alineR is available on CRAN.
Date of publication 2017
Code Programming Language R
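
The abstract contrasts plain edit distances with feature-weighted distances. The base-R sketch below illustrates that contrast on toy data. It does not use alineR and is not the package's algorithm: the feature table, the weights, the skip penalty, and the function names `sub_cost` and `weighted_distance` are all hypothetical, chosen only to show why weighting phonetic features can separate word pairs that an unweighted edit distance treats as equally distant.

```r
# Toy phonetic feature table (hypothetical values, for illustration only).
# Rows are segments, columns are features.
features <- rbind(
  p = c(syllabic = 0, voice = 0, nasal = 0, place = 1.00),
  b = c(syllabic = 0, voice = 1, nasal = 0, place = 1.00),
  m = c(syllabic = 0, voice = 1, nasal = 1, place = 1.00),
  t = c(syllabic = 0, voice = 0, nasal = 0, place = 0.85),
  a = c(syllabic = 1, voice = 1, nasal = 0, place = 0.00)
)

# Hypothetical feature weights; these are the kind of parameters that
# the paper estimates, but the values here are arbitrary.
weights <- c(syllabic = 1, voice = 1, nasal = 1, place = 1)

# Substitution cost: weighted feature difference, scaled to [0, 1].
sub_cost <- function(a, b, features, weights) {
  w <- weights[colnames(features)]
  sum(w * abs(features[a, ] - features[b, ])) / sum(w)
}

# Dynamic-programming edit distance with feature-weighted substitutions
# and a constant skip (insertion/deletion) penalty.
weighted_distance <- function(w1, w2, features, weights, skip = 1) {
  s1 <- strsplit(w1, "")[[1]]
  s2 <- strsplit(w2, "")[[1]]
  n <- length(s1)
  m <- length(s2)
  d <- matrix(0, n + 1, m + 1)
  d[, 1] <- (0:n) * skip
  d[1, ] <- (0:m) * skip
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + skip,                                    # skip in w2
        d[i + 1, j] + skip,                                    # skip in w1
        d[i, j] + sub_cost(s1[i], s2[j], features, weights)    # substitute
      )
    }
  }
  d[n + 1, m + 1]
}

# Unweighted Levenshtein distance from base R (utils::adist).
plain <- as.numeric(adist("pat", c("bat", "mat")))

# Feature-weighted distances for the same two pairs.
weighted <- c(
  weighted_distance("pat", "bat", features, weights),
  weighted_distance("pat", "mat", features, weights)
)

print(plain)
print(weighted)
```

In this toy setup the unweighted distance is the same for both pairs, while the weighted distance is smaller for "pat"/"bat" (the initial segments differ only in voicing) than for "pat"/"mat" (they differ in voicing and nasality). Changing `weights` or `skip` changes the resulting distances, which is why estimating those parameters matters.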