Cross-language information retrieval (CLIR) at TREC 2022
For some languages, such as English, there is an abundance of data on almost any specific topic. For many less widely spoken languages, far less information can be found on a given topic. Cross-language information retrieval (CLIR) addresses this problem by allowing the language of a user's information need (the query) to differ from the language of the retrieved documents that contain the required information.
TREC (Text REtrieval Conference) is an information retrieval evaluation campaign. It publishes task definitions and evaluation datasets, and individual researchers and teams work on solving those tasks. In 2022, TREC hosts the NeuCLIR track, which targets cross-language information retrieval with Chinese, Persian and Russian as the target (document) languages.
In this thesis or project, you could try to advance methods for CLIR by participating in the TREC NeuCLIR track.
Relevant Literature and Links
- TREC NeuCLIR track: https://neuclir.github.io/
- Nie J-Y (2010) Cross-Language Information Retrieval. In: Synthesis Lectures on Human Language Technologies, https://www.morganclaypool.com/doi/abs/10.2200/S00266ED1V01Y201005HLT008
- Costello C, Yang E, Lawrie D, Mayfield J (2022) Patapsco: A Python Framework for Cross-Language Information Retrieval Experiments. In: ECIR'22, https://arxiv.org/abs/2201.09996