Continuous Evaluation in Information Retrieval

As the information era progresses, the sheer volume of information calls for sophisticated retrieval systems. Evaluating these systems is key to ensuring the reliability and relevance of the retrieved information. When established evaluation methods are used, the measured quality is generally presumed to be dependable. However, it is often forgotten that most evaluations are only snapshots in time and that their validity may hold only for a short period.

Moreover, each evaluation method makes assumptions about the circumstances of a search and therefore has different characteristics. Reliable evaluation is critical to maintain the desired quality of an IR system and to preserve the confidence of its users. We therefore investigate how the evaluation environment (EE) evolves over time and how this might affect the effectiveness of retrieval systems. In addition, attention is paid to the differences between evaluation methods and how they can work together in a continuous evaluation framework.

A literature review was conducted to identify changing components, which are then modeled in an extended EE. As a case study, the effect of document and qrel updates on the effectiveness of IR systems is investigated through reproducibility experiments on the LongEval shared task. As a result, 11 changing components are identified together with initial measures to quantify how they change, the temporal consistency of five IR systems is precisely quantified through reproducibility and replicability measures, and the findings are integrated into a continuous evaluation framework. Ultimately, this work contributes to more holistic evaluation in IR.

Thesis

Degree
M.Sc.
Author
Jüri Keller