Arbel Groshaus has just wrapped up his Winter 2024 co-op term as an RA at TLOW, working on bibliographies, etymologies, and OED sources. Here he describes a project to build an etymological web to parse relationships in texts.
Hello Life of Words fans! I would like to introduce a project that I’ve been working on for a few months now: an etymological web, built from English Wiktionary data, that shows how millions of terms across thousands of languages are related. In this post, I’ll give an overview of how I got the data and what I’m doing with it.
Data sources
The project began with an idea from Dr. Williams: a metric for the “etymological relatedness” of a text, that is, how often etymologically connected words appear close together. The more obscure the connection, the higher the score. Thus apple and applesauce (an obvious connection) would score zero, whereas nerve, neuron, and sinew (all from Proto-Indo-European *snéh₁wr̥, much more interesting!) would score very highly.
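To make the idea concrete, here is a minimal sketch of how such a score *might* be computed. It is not Dr. Williams’ actual metric. Everything in it is an assumption for illustration: the hypothetical `ROOTS` table stands in for lookups against the etymological web, the `window` size is arbitrary, and “obscurity” is approximated as zero when one word contains the other and otherwise as one minus the words’ spelling similarity.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy data: each word maps to the set of ancestors it descends from.
# In the real project this information would come from the Wiktionary-derived web.
ROOTS: dict[str, set[str]] = {
    "apple": {"apple"},
    "applesauce": {"apple", "sauce"},
    "nerve": {"*snéh₁wr̥"},
    "neuron": {"*snéh₁wr̥"},
    "sinew": {"*snéh₁wr̥"},
}


def obscurity(a: str, b: str) -> float:
    """Assumed obscurity measure: 0 if one word contains the other (an obvious
    link), otherwise 1 minus their surface spelling similarity."""
    if a in b or b in a:
        return 0.0
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def relatedness(tokens: list[str], roots: dict[str, set[str]], window: int = 5) -> float:
    """Sum the obscurity of every pair of distinct words that share an ancestor
    and occur within `window` tokens of each other (hypothetical scoring)."""
    score = 0.0
    for i, j in combinations(range(len(tokens)), 2):  # all index pairs with i < j
        if j - i > window:
            continue  # too far apart to count as "close together"
        a, b = tokens[i], tokens[j]
        if a == b:
            continue  # a repeated word is not an etymological connection
        if roots.get(a, set()) & roots.get(b, set()):
            score += obscurity(a, b)
    return score


print(relatedness("the apple and the applesauce".split(), ROOTS))
print(relatedness("nerve neuron and sinew".split(), ROOTS))
```

Under these assumptions the apple/applesauce sentence scores 0.0, because the connection is visible in the spelling, while the nerve/neuron/sinew sentence scores above zero. A real version would need a better notion of obscurity (for instance, how many steps back through the web the shared ancestor lies), but the shape of the calculation stays the same: find nearby pairs, check for a shared ancestor, and weight each match.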