WordKit: a Python Package for Orthographic and Phonological Featurization

Authors Walter Daelemans, Stéphan Tulkens, Dominiek Sandra
Journal/Conference Name LREC 2018 5
Paper Abstract The modeling of psycholinguistic phenomena, such as word reading, with machine learning techniques requires the featurization of word stimuli into appropriate orthographic and phonological representations. Critically, the choice of features impacts the performance of machine learning algorithms, and can have important ramifications for the conclusions drawn from a model. As such, being able to featurize words with a variety of feature sets, without having to resort to different tools, is beneficial in terms of development cost. In this work, we present wordkit, a Python package that allows users to switch between feature sets and featurizers with a uniform API, allowing for rapid prototyping. To the best of our knowledge, this is the first package to integrate a variety of orthographic and phonological featurizers. The package is fully compatible with scikit-learn, and hence can be integrated into a variety of machine learning pipelines. Furthermore, the package is modular and extensible, allowing for the future integration of a large variety of feature sets and featurizers. The package and documentation can be found at github.com/stephantul/wordkit.
Date of publication 2018
Code Programming Language Python
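
The abstract's idea of switching featurizers behind a uniform API can be illustrated with a minimal sketch. It uses only the Python standard library, and the class names (OneHotCharacterFeaturizer, BigramCountFeaturizer) are hypothetical, chosen for illustration; they are not wordkit's actual classes or signatures. The sketch covers two simple orthographic feature sets only and follows the fit/transform convention used by scikit-learn transformers.

```python
# Hypothetical sketch: these classes are NOT wordkit's API. They only
# illustrate two featurizers sharing a fit/transform interface.


class OneHotCharacterFeaturizer:
    """Encode each word as concatenated one-hot letter vectors (slot-based)."""

    def fit(self, words):
        words = list(words)
        # Alphabet and maximum word length are learned from the fitted words.
        self.alphabet_ = sorted(set("".join(words)))
        self.index_ = {ch: i for i, ch in enumerate(self.alphabet_)}
        self.max_len_ = max(len(w) for w in words)
        return self

    def transform(self, words):
        size = len(self.alphabet_)
        vectors = []
        for word in words:
            vec = [0] * (size * self.max_len_)
            # Letters beyond max_len_ or outside the alphabet are ignored.
            for pos, ch in enumerate(word[: self.max_len_]):
                if ch in self.index_:
                    vec[pos * size + self.index_[ch]] = 1
            vectors.append(vec)
        return vectors

    def fit_transform(self, words):
        words = list(words)
        return self.fit(words).transform(words)


class BigramCountFeaturizer:
    """Encode each word as counts of its letter bigrams."""

    def fit(self, words):
        bigrams = sorted({w[i:i + 2] for w in words for i in range(len(w) - 1)})
        self.index_ = {bg: i for i, bg in enumerate(bigrams)}
        return self

    def transform(self, words):
        vectors = []
        for word in words:
            vec = [0] * len(self.index_)
            for i in range(len(word) - 1):
                j = self.index_.get(word[i:i + 2])
                if j is not None:  # unseen bigrams are skipped
                    vec[j] += 1
            vectors.append(vec)
        return vectors

    def fit_transform(self, words):
        words = list(words)
        return self.fit(words).transform(words)


if __name__ == "__main__":
    words = ["cat", "cab", "tab"]
    # The calling code is identical for both featurizers.
    for featurizer in (OneHotCharacterFeaturizer(), BigramCountFeaturizer()):
        X = featurizer.fit_transform(words)
        print(type(featurizer).__name__, len(X), len(X[0]))
```

Because both classes expose the same methods, downstream code can swap one feature set for another without other changes, which is the property the abstract attributes to wordkit.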