Profile
Timo Breuer is a research associate at the Institute of Information Science at TH Köln (University of Applied Sciences). He is part of the team led by Prof. Philipp Schaer and works in the area of information retrieval and recommender systems.
The DFG-funded project STELLA is a cooperation between GESIS, ZB MED, and TH Köln.
Within this project, Timo will develop a living lab platform for search systems.
He received both his B.Sc. and M.Sc. in Media Technology. His studies focused on
software development, especially in the field of image processing.
These studies and several industrial projects sparked his interest in machine learning and
eventually led him to the topics of information retrieval and recommender systems.
List of Publications
Browsing and Searching Metadata of TREC.
In:
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, series SIGIR '24, pages 313–323.
Association for Computing Machinery, New York, NY, USA, 2024.
Timo Breuer, Ellen M. Voorhees and Ian Soboroff.
[doi] [pdf]
[abstract]
[BibTeX]
Information Retrieval (IR) research is deeply rooted in experimentation and evaluation, and the Text REtrieval Conference (TREC) has been playing a central role in making that possible since its inauguration in 1992. TREC's mission centers around providing the infrastructure and resources to make IR evaluations possible at scale. Over the years, a plethora of different retrieval problems were addressed, culminating in data artifacts that remained as valuable and useful tools for the IR community. Even though the data are largely available from TREC's website, there is currently no resource that facilitates a cohesive way to obtain metadata information about the run file - the IR community's de facto standard data format for storing rankings of system-oriented IR experiments. To this end, the work at hand introduces a software suite that facilitates access to metadata of experimental resources, resulting from over 30 years of IR experiments and evaluations at TREC. With a particular focus on the run files, the paper motivates the requirements for better access to TREC metadata and details the concepts, the resources, the corresponding implementations, and possible use cases. More specifically, we contribute a web interface to browse former TREC submissions. Besides, we provide the underlying metadatabase and a corresponding RESTful interface for more principled and structured queries about the TREC metadata.
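For readers unfamiliar with the run file mentioned in the abstract: it is a plain, whitespace-separated ranking format with six columns (query id, the literal "Q0", document id, rank, score, and run tag). The following minimal Python sketch shows how such a file could be parsed; it is an illustration only, not part of the paper's software suite, and the file name in the usage comment is hypothetical.

from collections import defaultdict

def read_run(path):
    """Parse a TREC-style run file into {query_id: [(doc_id, rank, score), ...]}."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            # Conventional columns: query_id Q0 doc_id rank score run_tag
            query_id, _q0, doc_id, rank, score, _tag = line.split()
            run[query_id].append((doc_id, int(rank), float(score)))
    return run

# Hypothetical usage: run = read_run("my_system.run")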
Toward Evaluating the Reproducibility of Information Retrieval Systems with Simulated Users.
In:
Proceedings of the 2nd ACM Conference on Reproducibility and Replicability, series ACM REP '24, pages 25–29.
Association for Computing Machinery, New York, NY, USA, 2024.
Timo Breuer and Maria Maistro.
[doi] [pdf]
[abstract]
[BibTeX]
Reproducibility is a fundamental part of scientific progress. Compared to other scientific fields, computational sciences are privileged as experimental setups can be preserved with ease, and regression experiments allow the validation of computational results by bitwise similarity. When evaluating information access systems, the system users are often considered in the experiments, be it explicit as part of user studies or implicit as part of evaluation measures. Usually, system-oriented Information Retrieval (IR) experiments are evaluated with effectiveness measurements and batches of multiple queries. Successful reproduction of an Information Retrieval (IR) system is often determined by how well it approximates the averaged effectiveness of the original (reproduced) system. Earlier work suggests that the naïve comparison of average effectiveness hides differences that exist between the original and reproduced systems. Most importantly, such differences can affect the recipients of the retrieval results, i.e., the system users. To this end, this work sheds light on what implications for users may be neglected when a system-oriented Information Retrieval (IR) experiment is prematurely considered reproduced. Based on simulated reimplementations with comparable effectiveness as the reference system, we show what differences are hidden behind averaged effectiveness scores. We discuss possible future directions and consider how these implications could be addressed with user simulations.
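To illustrate the distinction the abstract draws between averaged and per-topic effectiveness: two runs can have nearly identical mean scores while differing considerably on individual topics. The following minimal Python sketch assumes paired per-topic scores (e.g., nDCG or AP) for the original and the reimplemented system; it is an illustration only, not the paper's experimental code.

import math

def mean(xs):
    return sum(xs) / len(xs)

def per_topic_rmse(orig_scores, repro_scores):
    """Root mean square error between paired per-topic scores; it exposes
    differences that a comparison of averages alone can hide."""
    return math.sqrt(mean([(o - r) ** 2 for o, r in zip(orig_scores, repro_scores)]))

# Hypothetical usage with per-topic score lists orig and repro:
# print(mean(orig) - mean(repro))      # may be close to zero
# print(per_topic_rmse(orig, repro))   # may still be large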
Validating Synthetic Usage Data in Living Lab Environments.
Journal of Data and Information Quality, 16(1):1-33, 2024.
Timo Breuer, Norbert Fuhr and Philipp Schaer.
[doi] [pdf]
[BibTeX]
Context-Driven Interactive Query Simulations Based on Generative Large Language Models.
In:
ECIR 2024.
2024.
Björn Engelmann, Timo Breuer, Jana Isabelle Friese, Philipp Schaer and Norbert Fuhr.
[pdf]
[BibTeX]
Teaching Information Retrieval with a Shared Task Across Universities: First Steps and Findings.
In:
Proceedings of LWDA’24: Lernen, Wissen, Daten, Analysen. September 23–25, 2024, Würzburg, Germany, series CEUR Workshop Proceedings (CEUR-WS.org).
2024.
Maik Fröbe, Christopher Akiki, Timo Breuer, Thomas Eckart, Annemarie Friedrich, Lukas Gienapp, Jan Heinrich Merker, Martin Potthast, Harrisen Scells, Philipp Schaer and Benno Stein.
[pdf]
[BibTeX]
Evaluation of Temporal Change in IR Test Collections.
In:
Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, series ICTIR '24, pages 3–13.
Association for Computing Machinery, New York, NY, USA, 2024.
Jüri Keller, Timo Breuer and Philipp Schaer.
[doi] [pdf]
[abstract]
[BibTeX]
Information retrieval systems have been evaluated using the Cranfield paradigm for many years. This paradigm allows a systematic, fair, and reproducible evaluation of different retrieval methods in fixed experimental environments. However, real-world retrieval systems must cope with dynamic environments and temporal changes that affect the document collection, topical trends, and the individual user's perception of what is considered relevant. Yet, the temporal dimension in IR evaluations is still understudied. To this end, this work investigates how the temporal generalizability of effectiveness evaluations can be assessed. As a conceptual model, we generalize Cranfield-type experiments to the temporal context by classifying the change in the essential components according to the create, update, and delete operations of persistent storage known from CRUD. From the different types of change, different evaluation scenarios are derived, and their implications are outlined. Based on these scenarios, renowned state-of-the-art retrieval systems are tested and it is investigated how the retrieval effectiveness changes on different levels of granularity. We show that the proposed measures can be well adapted to describe the changes in the retrieval results. The experiments conducted confirm that the retrieval effectiveness strongly depends on the evaluation scenario investigated. We find that not only the average retrieval performance of single systems but also the relative system performance are strongly affected by the components that change and to what extent these components changed.
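The CRUD-based view of collection change described in the abstract can be made concrete with a small sketch. The snapshot representation below (a mapping from document id to text) is an assumption for illustration only, not the paper's implementation.

def classify_document_changes(old_docs, new_docs):
    """Classify documents of two test-collection snapshots into
    created, updated, and deleted sets (the C, U, and D of CRUD)."""
    created = new_docs.keys() - old_docs.keys()
    deleted = old_docs.keys() - new_docs.keys()
    updated = {doc_id for doc_id in old_docs.keys() & new_docs.keys()
               if old_docs[doc_id] != new_docs[doc_id]}
    return created, updated, deleted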
Leveraging Prior Relevance Signals in Web Search.
In: G. Faggioli, N. Ferro, P. Galuscáková and A. G. S. de Herrera, editors,
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024, volume 3740, series CEUR Workshop Proceedings, pages 2396-2406.
CEUR-WS.org, 2024.
Jüri Keller, Timo Breuer and Philipp Schaer.
[doi]
[BibTeX]
Replicability Measures for Longitudinal Information Retrieval Evaluation.
In:
Experimental IR Meets Multilinguality, Multimodality, and Interaction - 15th International Conference of the CLEF Association, CLEF 2024, Grenoble, France, September 9–12, 2024, Proceedings, Part I.
Springer Cham, 2024.
Jüri Keller, Timo Breuer and Philipp Schaer.
[pdf]
[BibTeX]
SIGIR 2024 Workshop on Simulations for Information Access (Sim4IA 2024).
In:
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, series SIGIR '24, pages 3058–3061.
Association for Computing Machinery, New York, NY, USA, 2024.
Philipp Schaer, Christin Katharina Kreutz, Krisztian Balog, Timo Breuer and Norbert Fuhr.
[doi] [pdf]
[abstract]
[BibTeX]
Simulations in various forms have been used to evaluate information access systems, like search engines, recommender systems, or conversational agents. In the form of the Cranfield paradigm, a simulation setup is well-known in the IR community, but user simulations have recently gained interest. While user simulations help to reduce the complexity of evaluation experiments and help with reproducibility, they can also contribute to a better understanding of users. Building on recent developments in methods and toolkits, the Sim4IA workshop aims to bring together researchers and practitioners to form an interactive and engaging forum for discussions on the future perspectives of the field. An additional aim is to plan an upcoming TREC/CLEF campaign.
Bibliometric Data Fusion for Biomedical Information Retrieval.
In:
ACM/IEEE Joint Conference on Digital Libraries, JCDL 2023, Santa Fe, NM, USA, June 26-30, 2023, pages 107-118.
IEEE, 2023.
Timo Breuer, Christin Katharina Kreutz, Philipp Schaer and Dirk Tunger.
[doi] [pdf]
[BibTeX]
Simulating Users in Interactive Web Table Retrieval.
In: I. Frommholz, F. Hopfgartner, M. Lee, M. Oakes, M. Lalmas, M. Zhang and R. L. T. Santos, editors,
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023, Birmingham, United Kingdom, October 21-25, 2023, pages 3875-3879.
ACM, 2023.
Björn Engelmann, Timo Breuer and Philipp Schaer.
[doi] [pdf]
[BibTeX]
Evaluating Temporal Persistence Using Replicability Measures.
In: M. Aliannejadi, G. Faggioli, N. Ferro and M. Vlachos, editors,
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023, volume 3497, series CEUR Workshop Proceedings, pages 2441-2457.
CEUR-WS.org, 2023.
Jüri Keller, Timo Breuer and Philipp Schaer.
[doi]
[BibTeX]
An in-depth investigation on the behavior of measures to quantify reproducibility.
Information Processing and Management, 60(3):103332, 2023.
Maria Maistro, Timo Breuer, Philipp Schaer and Nicola Ferro.
[doi] [pdf]
[BibTeX]
ir_metadata: An Extensible Metadata Schema for IR Experiments.
In:
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3078-3089.
ACM, 2022.
Timo Breuer, Jüri Keller and Philipp Schaer.
[doi] [pdf]
[BibTeX]
Relevance assessments, bibliometrics, and altmetrics: a quantitative study on PubMed and arXiv.
Scientometrics, 127(5):2455-2478, 2022.
Timo Breuer, Philipp Schaer and Dirk Tunger.
[doi] [pdf]
[BibTeX]
Validating Simulations of User Query Variants.
In: M. Hagen, S. Verberne, C. Macdonald, C. Seifert, K. Balog, K. Nørvåg and V. Setty, editors,
Advances in Information Retrieval - 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022, Proceedings, Part I, volume 13185, series Lecture Notes in Computer Science, pages 80-94.
Springer, 2022.
Timo Breuer, Norbert Fuhr and Philipp Schaer.
[doi] [pdf]
[BibTeX]
A Living Lab Architecture for Reproducible Shared Task Experimentation.
In:
Information between Data and Knowledge, pages 348-362.
Werner Hülsbusch, Glückstadt, 2021.
Timo Breuer and Philipp Schaer.
[pdf]
[abstract]
[BibTeX]
No existing evaluation infrastructure for shared tasks currently supports both reproducible on- and offline experiments. In this work, we present an architecture that ties together both types of experiments with a focus on reproducibility. The readers are provided with a technical description of the infrastructure and details of how to contribute their own experiments to upcoming evaluation tasks.
Evaluating Elements of Web-based Data Enrichment for Pseudo-Relevance Feedback Retrieval.
In:
Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Twelfth International Conference of the CLEF Association (CLEF 2021), series Lecture Notes in Computer Science.
Springer Nature, 2021.
Timo Breuer, Melanie Pest and Philipp Schaer.
[pdf]
[BibTeX]
repro_eval: A Python Interface to Reproducibility Measures of System-Oriented IR Experiments.
In: D. Hiemstra, M. Moens, J. Mothe, R. Perego, M. Potthast and F. Sebastiani, editors,
Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part II, volume 12657, series Lecture Notes in Computer Science, pages 481-486.
Springer, 2021.
Timo Breuer, Nicola Ferro, Maria Maistro and Philipp Schaer.
[doi] [pdf]
[BibTeX]
Living Lab Evaluation for Life and Social Sciences Search Platforms - LiLAS at CLEF 2021.
In: D. Hiemstra, M. Moens, J. Mothe, R. Perego, M. Potthast and F. Sebastiani, editors,
Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part II, volume 12657, series Lecture Notes in Computer Science, pages 657-664.
Springer, 2021.
Philipp Schaer, Johann Schaible and Leyla Jael García Castro.
[doi] [pdf]
[BibTeX]
Overview of LiLAS 2021 - Living Labs for Academic Search.
In: K. S. Candan, B. Ionescu, L. Goeuriot, B. Larsen, H. Müller, A. Joly, M. Maistro, F. Piroi, G. Faggioli and N. Ferro, editors,
Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Twelfth International Conference of the CLEF Association (CLEF 2021), volume 12880, series Lecture Notes in Computer Science.
2021.
Philipp Schaer, Timo Breuer, Leyla Jael Castro, Benjamin Wolff, Johann Schaible and Narges Tavakolpoursaleh.
[pdf]
[BibTeX]
Overview of LiLAS 2021 - Living Labs for Academic Search (Extended Overview).
In: G. Faggioli, N. Ferro, A. Joly, M. Maistro and F. Piroi, editors,
Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, series CEUR Workshop Proceedings.
2021.
Philipp Schaer, Timo Breuer, Leyla Jael Castro, Benjamin Wolff, Johann Schaible and Narges Tavakolpoursaleh.
[pdf]
[BibTeX]
How to Measure the Reproducibility of System-oriented IR Experiments.
In: J. Huang, Y. Chang, X. Cheng, J. Kamps, V. Murdock, J.-R. Wen and Y. Liu, editors,
SIGIR, pages 349-358.
ACM, 2020.
Timo Breuer, Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Philipp Schaer and Ian Soboroff.
[doi] [pdf]
[BibTeX]
Relations Between Relevance Assessments, Bibliometrics and Altmetrics.
In:
Proceedings of the 10th International Workshop on Bibliometric-enhanced Information Retrieval co-located with 42nd European Conference on Information Retrieval, BIR@ECIR 2020, Lisbon, Portugal, April 14th, 2020 [online only], pages 101-112.
2020.
Timo Breuer, Philipp Schaer and Dirk Tunger.
[doi] [pdf]
[BibTeX]
Reproducible Online Search Experiments.
In:
Advances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14-17, 2020, Proceedings, Part II, pages 597-601.
2020.
Timo Breuer.
[doi] [pdf]
[BibTeX]
Evaluation Infrastructures for Academic Shared Tasks.
Datenbank-Spektrum, 20(1):29-36, 2020.
Johann Schaible, Timo Breuer, Narges Tavakolpoursaleh, Bernd Müller, Benjamin Wolff and Philipp Schaer.
[doi] [pdf]
[abstract]
[BibTeX]
Academic search systems aid users in finding information covering specific topics of scientific interest and have evolved from early catalog-based library systems to modern web-scale systems. However, evaluating the performance of the underlying retrieval approaches remains a challenge. An increasing number of requirements for producing accurate retrieval results have to be considered, e.g., close integration of the system's users. Due to these requirements, small to mid-size academic search systems cannot evaluate their retrieval system in-house. Evaluation infrastructures for shared tasks alleviate this situation. They allow researchers to experiment with retrieval approaches in specific search and recommendation scenarios without building their own infrastructure. In this paper, we elaborate on the benefits and shortcomings of four state-of-the-art evaluation infrastructures for search and recommendation tasks with respect to the following requirements: support for online and offline evaluations, domain specificity of shared tasks, and reproducibility of experiments and results. In addition, we introduce an evaluation infrastructure concept design aiming at reducing the shortcomings in shared tasks for search and recommender systems.
Dockerizing Automatic Routing Runs for The Open-Source IR Replicability Challenge (OSIRRC 2019).
In: R. Clancy, N. Ferro, C. Hauff, J. Lin, T. Sakai and Z. Z. Wu, editors,
OSIRRC@SIGIR, volume 2409, series CEUR Workshop Proceedings, pages 31-35.
CEUR-WS.org, 2019.
Timo Breuer and Philipp Schaer.
[pdf]
[BibTeX]
Replicability and Reproducibility of Automatic Routing Runs.
In: L. Cappellato, N. Ferro, D. E. Losada and H. Müller, editors,
CLEF (Working Notes), volume 2380, series CEUR Workshop Proceedings.
CEUR-WS.org, 2019.
Timo Breuer and Philipp Schaer.
[pdf]
[BibTeX]
STELLA: Towards a Framework for the Reproducibility of Online Search Experiments.
In: R. Clancy, N. Ferro, C. Hauff, J. Lin, T. Sakai and Z. Z. Wu, editors,
OSIRRC@SIGIR, volume 2409, series CEUR Workshop Proceedings, pages 8-11.
CEUR-WS.org, 2019.
Timo Breuer, Philipp Schaer, Narges Tavakolpoursaleh, Johann Schaible, Benjamin Wolff and Bernd Müller.
[pdf]
[BibTeX]