[Languagechange News] CFP: RuShiftEval - a shared task on automatic detection of semantic shift for Russian

Lidia Pivovarova lidia.pivovarova at helsinki.fi
Wed Jan 20 20:29:27 CET 2021

The *RuShiftEval <https://competitions.codalab.org/competitions/28340>* shared
task is aimed at comparing various methods for detecting word
meaning shift in diachronic corpora. In 2020, two shared tasks for
semantic shift detection were organized: SemEval Task 1 for English,
German, Swedish and Latin [1] and DIACR-Ita for Italian [2].
*RuShiftEval* is the first attempt to organize a similar event with
Russian data. The main novelty of our setting is that we deal with three
time periods naturally stemming from the history of the Russian language:

   1. Pre-Soviet (1700-1916)
   2. Soviet (1918-1990)
   3. Post-Soviet (1991-2016).

The shared task is co-located with Dialogue 2021, the 27th International
Conference on Computational Linguistics and Intellectual Technologies. The
system description papers will be published in the proceedings of the
Dialogue conference, indexed in Scopus.

The task is formulated as a ranking problem: a list of about 100 Russian
words should be ranked according to the strength of their meaning change:

   1. between pre-Soviet and Soviet periods;
   2. between Soviet and post-Soviet periods;
   3. between pre-Soviet and post-Soviet periods.

A lower rank means a stronger change; a higher rank means higher
semantic similarity between word usages in different time periods.

During the evaluation phase, participants will be provided with a list of
target words. They will have to submit a tab-separated file, where each
line contains the word and three non-negative values, corresponding to
semantic change between the aforementioned periods: for example, from 1
("the senses are unrelated") to 4 ("the senses are identical"). These
values will be used to build rankings and Spearman Rank Correlation will be
computed between those ranks and the ranks based on human annotation. The
final score will be the average of three Spearman correlation coefficients.
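
The scoring described above can be sketched in plain Python. This is only an
illustration under stated assumptions, not the official scorer: the function
names, the tie handling (tied values share their average rank) and the toy
words and values are invented for the example.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def final_score(predicted, gold):
    """Average the Spearman correlation over the three period pairs.

    Both arguments map a word to a tuple of three values, assumed to be in
    the order: pre-Soviet/Soviet, Soviet/post-Soviet, pre-Soviet/post-Soviet.
    """
    words = sorted(gold)
    per_pair = [
        spearman([predicted[w][k] for w in words], [gold[w][k] for w in words])
        for k in range(3)
    ]
    return mean(per_pair)


# Toy data, invented for illustration only.
gold = {"word_a": (1.0, 2.0, 3.0), "word_b": (2.0, 3.0, 1.0), "word_c": (3.0, 1.0, 2.0)}
predicted = {"word_a": (1.2, 2.1, 3.5), "word_b": (2.4, 3.3, 1.1), "word_c": (3.9, 1.0, 2.2)}
print(final_score(predicted, gold))
```

Because only the ranks matter, predictions do not need to be on the 1 to 4
scale as long as they order the words in the same direction as the human
annotation.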

Each participating team will be able to submit up to 10 answers in the
Evaluation phase, and up to 1000 answers in the Development phase. During
the Development phase (after February 1, 2021), we will publish a small
development set (12 manually annotated Russian words), and you will be able
to submit your predictions to Codalab to get an estimate of your
performance (no true labels will be published). Before February 1, the
Practice phase is running, in which you can submit your predictions for the
words from the RuSemShift test set. This dataset is public, so the true
labels are known to everyone. This phase can be used to sanity check your
submission routines.
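
One way to sanity check a submission routine locally is to write the
tab-separated lines and read them back before uploading. The sketch below is
a hypothetical helper, not part of any official toolkit: the function names,
the absence of a header row and the column order (word, then the three
period pairs in the order listed above) are assumptions.

```python
def format_submission(scores):
    """Turn {word: (v1, v2, v3)} into tab-separated text, one word per line."""
    lines = []
    for word, values in scores.items():
        if len(values) != 3:
            raise ValueError(f"{word}: expected 3 values, got {len(values)}")
        if any(v < 0 for v in values):
            raise ValueError(f"{word}: values must be non-negative")
        lines.append("\t".join([word] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


def parse_submission(text):
    """Read the text back, applying the same checks, to verify a round trip."""
    scores = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"expected 4 tab-separated fields: {line!r}")
        values = tuple(float(f) for f in fields[1:])
        if any(v < 0 for v in values):
            raise ValueError(f"{fields[0]}: values must be non-negative")
        scores[fields[0]] = values
    return scores


# Toy predictions, invented for illustration only.
toy = {"word_a": (1.0, 2.5, 4.0), "word_b": (3.2, 3.9, 2.7)}
text = format_submission(toy)
assert parse_submission(text) == toy
```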

Important: we use COMPARE as the measure approximating the degree of
semantic change. Essentially, COMPARE is the mean semantic relatedness between
word usages from two different time periods, based on human judgments. The
lower it is, the stronger the degree of semantic change. See [3] for
more detail.
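
As a rough illustration of this measure, the sketch below averages
relatedness judgments for cross-period usage pairs and orders words from
strongest to weakest change. It is a simplification under assumptions: the
function names are hypothetical, the judgments are invented, and all
annotators' judgments for a word are simply pooled into one list; see [3]
for the actual procedure.

```python
from statistics import mean


def compare_score(judgments):
    """Mean relatedness over usage pairs drawn from two different periods."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    return mean(judgments)


def rank_by_change(judgments_by_word):
    """Order words by ascending COMPARE, so the strongest change comes first."""
    scores = {w: compare_score(j) for w, j in judgments_by_word.items()}
    return sorted(scores, key=scores.get)


# Invented judgments on the 1 ("unrelated") to 4 ("identical") scale.
toy_judgments = {
    "word_a": [4, 4, 3, 4],  # usages stay close: little change
    "word_b": [1, 2, 1, 2],  # usages drift apart: strong change
}
print(rank_by_change(toy_judgments))
```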

Semantic shift should be detected based on word usage in the Russian National
Corpus (RNC). The shuffled version of the corpus, split into three sub-periods,
is freely available <https://rusvectores.org/static/corpora/>. Each
participant will have to sign a license agreement to get access to the
corpus.

The test and development sets will be manually annotated using
crowdsourcing. The annotation roughly follows the DURel workflow described
in [3]. It has been previously used to produce test sets for semantic
shift detection, including RuSemShift for Russian [4]. *RuSemShift* is
freely available <https://github.com/juliarodina/RuSemShift> and could be
used as a training set, as it was annotated using the same corpora as the
evaluation dataset. Note that it is not yet known whether training data
actually helps semantic change detection. Finding this out is one of the
purposes of this shared task.

After the end of the evaluation period, the test and development sets with
full annotation will be made publicly available.

   - January 20, 2021 - the shared task is announced, the Practice phase
   starts
   - February 1, 2021 - the development set is published, the Development
   phase starts
   - February 22, 2021 - the test set is published, the Evaluation phase
   starts
   - February 28, 2021 - the submission period and the Evaluation phase end
   - March 1, 2021 - the shared task results are announced; the
   Post-Evaluation phase starts


Andrey Kutuzov, University of Oslo

Lidia Pivovarova, University of Helsinki

Valery Solovyev, Kazan Federal University

[1] Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., &
Tahmasebi, N. (2020). SemEval-2020 Task 1: Unsupervised Lexical Semantic
Change Detection. Proceedings of the Fourteenth Workshop on Semantic
Evaluation, ACL, 2020.

[2] Basile, Pierpaolo, et al. "DIACR-Ita @ EVALITA2020: Overview of the
EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task." Proceedings of
the 7th evaluation campaign of Natural Language Processing and Speech tools
for Italian (EVALITA 2020), Online. CEUR.org (2020).

[3] Schlechtweg, Dominik, Sabine Schulte im Walde, and Stefanie Eckmann.
"Diachronic Usage Relatedness (DURel): A Framework for the Annotation of
Lexical Semantic Change." Proceedings of the 2018 Conference of the North
American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 2 (Short Papers). 2018.

[4] Rodina, Julia, and Andrey Kutuzov. "RuSemShift: a dataset of historical
lexical semantic change in Russian." COLING 2020.