[Languagechange News] Release of diachronic word embeddings

Hengchen, Simon P I simon.hengchen at helsinki.fi
Fri Dec 20 14:43:41 CET 2019


Hi everyone,

I'm writing to let you know that we have released diachronic word embeddings for Finnish, Swedish, Dutch, and English. The embeddings, available at https://zenodo.org/record/3585027, were trained on newspaper data; the file sizes for each time slice are listed below. For all four languages (FI, SV, NL, EN) we release independently trained models, which need to be aligned before they can be compared across time slices. For FI, SV, and NL we also release incrementally trained models, which do not need to be aligned.
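For readers who have not aligned embeddings before, here is a minimal sketch of one common approach, orthogonal Procrustes. This is an assumption for illustration, not necessarily the method described in the README. The function name `procrustes_align` is hypothetical, and the sketch assumes you have already extracted two matrices whose rows are the vectors of the same words, in the same order, from two time slices.

```python
import numpy as np


def procrustes_align(base, other):
    """Rotate `other` into the vector space of `base`.

    Both arguments are (n_words, dim) arrays whose rows correspond to the
    same words in the same order (i.e. the shared vocabulary of two slices).
    Returns the rotated copy of `other` and the rotation matrix used.
    """
    # The orthogonal matrix R minimising ||other @ R - base|| (Frobenius norm)
    # is U @ Vt, where U, S, Vt is the SVD of other.T @ base.
    u, _, vt = np.linalg.svd(other.T @ base)
    rotation = u @ vt
    return other @ rotation, rotation


if __name__ == "__main__":
    # Toy check: a randomly rotated copy of `base` should be mapped back onto it.
    rng = np.random.default_rng(0)
    base = rng.normal(size=(100, 16))
    q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
    aligned, _ = procrustes_align(base, base @ q)
    print(np.allclose(aligned, base))
```

Because the rotation is orthogonal, cosine similarities within a slice are unchanged; only the coordinate system is brought in line with the reference slice.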

All details are available in the README in the Zenodo repository, and I'm happy to answer any questions you might have.

Best,
Simon


[simon@taito-login3 SGNS]$ du -h fi*
12M     fi_1820_SGNS_corpus_file.gensim
89M     fi_1840_SGNS_corpus_file.gensim
797M    fi_1860_SGNS_corpus_file.gensim
7.0G    fi_1880_SGNS_corpus_file.gensim
22G     fi_1900_SGNS_corpus_file.gensim

[simon@taito-login3 SGNS]$ du -h sv*
1.6M    sv_1740_SGNS_corpus_file.gensim
44M     sv_1760_SGNS_corpus_file.gensim
124M    sv_1780_SGNS_corpus_file.gensim
228M    sv_1800_SGNS_corpus_file.gensim
678M    sv_1820_SGNS_corpus_file.gensim
1.6G    sv_1840_SGNS_corpus_file.gensim
4.5G    sv_1860_SGNS_corpus_file.gensim
6.5G    sv_1880_SGNS_corpus_file.gensim
113M    sv_1900_SGNS_corpus_file.gensim

[simon@taito-login3 SGNS]$ du -h nl*
6.8M    nl_1620_SGNS_corpus_file.gensim
7.9M    nl_1640_SGNS_corpus_file.gensim
43M     nl_1660_SGNS_corpus_file.gensim
78M     nl_1680_SGNS_corpus_file.gensim
138M    nl_1700_SGNS_corpus_file.gensim
243M    nl_1720_SGNS_corpus_file.gensim
287M    nl_1740_SGNS_corpus_file.gensim
431M    nl_1760_SGNS_corpus_file.gensim
825M    nl_1780_SGNS_corpus_file.gensim
1.2G    nl_1800_SGNS_corpus_file.gensim
1.8G    nl_1820_SGNS_corpus_file.gensim
3.1G    nl_1840_SGNS_corpus_file.gensim
5.2G    nl_1860_SGNS_corpus_file.gensim
13G     nl_1880_SGNS_corpus_file.gensim

[simon@taito-login3 SGNS]$ du -h en*
4.3M    en_1620_SGNS_corpus_file.gensim
11M     en_1640_SGNS_corpus_file.gensim
11M     en_1660_SGNS_corpus_file.gensim
106M    en_1680_SGNS_corpus_file.gensim
409M    en_1700_SGNS_corpus_file.gensim
1.7G    en_1720_SGNS_corpus_file.gensim
834M    en_1740_SGNS_corpus_file.gensim
2.4G    en_1760_SGNS_corpus_file.gensim
5.3G    en_1780_SGNS_corpus_file.gensim
5.5G    en_1800_SGNS_corpus_file.gensim
15G     en_1820_SGNS_corpus_file.gensim
42G     en_1840_SGNS_corpus_file.gensim
65G     en_1860_SGNS_corpus_file.gensim
88G     en_1880_SGNS_corpus_file.gensim
26G     en_1900_SGNS_corpus_file.gensim
21G     en_1920_SGNS_corpus_file.gensim
6.3G    en_1940_SGNS_corpus_file.gensim
