Cross-language information retrieval (CLIR) at TREC 2022

For some languages, such as English, an abundance of data is available for retrieving information on specific topics. For many less widely used languages, far less information can be found on any given topic. Cross-language information retrieval (CLIR) addresses this problem by allowing the language in which a user expresses an information need to differ from the language of the retrieved documents that contain the required information.
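To make the idea concrete, the following is a minimal sketch of one classic CLIR strategy, query translation followed by monolingual term matching. It is an illustration only and not the method used in the NeuCLIR track: the lexicon, the documents, and the function names `translate_query` and `search` are all hypothetical, and the toy documents are assumed to be already lowercased and lemmatized.

```python
from collections import Counter

# Hypothetical toy English-to-Russian lexicon (assumption for illustration);
# a real system would use machine translation or multilingual neural encoders.
LEXICON = {
    "weather": ["погода"],
    "moscow": ["москва"],
    "election": ["выборы"],
}

# Hypothetical Russian documents, assumed to be lowercased and lemmatized.
DOCUMENTS = {
    "doc1": "погода москва сегодня холодный",
    "doc2": "выборы проходить весна",
    "doc3": "москва готовить выборы",
}


def translate_query(query):
    """Map each English query term to its Russian translations.

    Terms missing from the lexicon are silently dropped.
    """
    terms = []
    for token in query.lower().split():
        terms.extend(LEXICON.get(token, []))
    return terms


def search(query):
    """Rank documents by the number of translated query term occurrences."""
    terms = translate_query(query)
    scores = {}
    for doc_id, text in DOCUMENTS.items():
        counts = Counter(text.split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(search("weather Moscow"))
```

The English query never has to match the document language directly: only the translated terms are compared against the Russian text. Real systems additionally have to deal with translation ambiguity, morphology, and out-of-vocabulary terms, which this sketch ignores.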

TREC (Text REtrieval Conference) is an information retrieval evaluation campaign that publishes task definitions and evaluation datasets for participating groups to work on. In 2022, TREC hosts the NeuCLIR track, which targets cross-language information retrieval with Chinese, Persian and Russian as target languages.

In this thesis or project, you would aim to advance CLIR methods by participating in the TREC NeuCLIR track.

Relevant Literature and Links

TREC NeuCLIR track: https://neuclir.github.io/
Nie J-Y (2010) Cross-Language Information Retrieval. In: Synthesis Lectures on Human Language Technologies, https://www.morganclaypool.com/doi/abs/10.2200/S00266ED1V01Y201005HLT008
Costello C, Yang E, Lawrie D, Mayfield J (2022) Patapsco: A Python Framework for Cross-Language Information Retrieval Experiments. In: ECIR'22, https://arxiv.org/abs/2201.09996

Thesis

Degree

B.Sc./M.Sc.

Student

n/a

Supervisors

Christin Kreutz, Philipp Schaer

Information Retrieval Research Group

Technische Hochschule Köln