Project Overview

The AutoLLMSelect project aims to:

  1. Publish a comprehensive analysis of LLM benchmark datasets that facilitates robust and unbiased LLM benchmarking.
  2. Take the first steps towards a robust, explainable, and evolving framework for automated LLM selection, based on a multi-disciplinary approach, that reduces the cost of comparing a large LLM portfolio on ML datasets.
  3. Evaluate the applicability of the framework on a use case from the field of sustainable development.

Because of the high complexity of the problem, the project will deliver a proof-of-concept on a selected LLM portfolio, dataset portfolio, and set of performance metrics, based on the data available in public benchmarks. The framework is intended to evolve: it could later be extended with new LLMs, benchmark datasets, ML tasks, and performance metrics, contributed both by our team and by the community.

The project runs for two years and is coordinated by the Jožef Stefan Institute (JSI) in Ljubljana, Slovenia, where it is hosted by the Computer Systems Department. A two-month research stay will take place at the Machine Learning Lab, Department of Computer Science, University of Freiburg, Germany.

Objectives

Comprehensive benchmarking

Publish a robust and transparent complementary analysis of a diverse portfolio of LLM benchmark datasets in order to identify their similarities and differences.

Automated LLM selection

Develop a proof-of-concept framework that predicts LLM performance, measured by a selected metric, on a previously unseen dataset. The framework is designed to be easily extended with new LLMs, ML tasks, and performance metrics.
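As an illustration only, predicting performance on an unseen dataset can be sketched as a nearest-neighbour lookup over dataset descriptors. This is not the project's actual method: the meta-features, dataset names, LLM names, scores, and the functions `predict_scores` and `select_llm` are all hypothetical placeholders chosen for the example.

```python
from math import sqrt

# Hypothetical meta-features describing each known benchmark dataset.
# All values are made-up placeholders, not real benchmark data.
KNOWN_DATASETS = {
    "dataset_a": [0.2, 0.9],
    "dataset_b": [0.8, 0.1],
    "dataset_c": [0.3, 0.8],
}

# Hypothetical scores (one selected metric) of each LLM on each known dataset.
KNOWN_SCORES = {
    "llm_x": {"dataset_a": 0.70, "dataset_b": 0.40, "dataset_c": 0.60},
    "llm_y": {"dataset_a": 0.50, "dataset_b": 0.80, "dataset_c": 0.50},
}


def euclidean(u, v):
    """Euclidean distance between two equally sized feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_scores(new_features, k=2):
    """Predict each LLM's score as its mean score on the k most similar known datasets."""
    nearest = sorted(
        KNOWN_DATASETS,
        key=lambda name: euclidean(KNOWN_DATASETS[name], new_features),
    )[:k]
    return {
        llm: sum(scores[name] for name in nearest) / len(nearest)
        for llm, scores in KNOWN_SCORES.items()
    }


def select_llm(new_features, k=2):
    """Return the LLM with the highest predicted score for the unseen dataset."""
    predicted = predict_scores(new_features, k)
    return max(predicted, key=predicted.get)
```

Calling `select_llm([0.25, 0.85])` picks the model that scored best on the two most similar known datasets, without running any LLM on the new dataset. Adding a new LLM or dataset in this sketch only means adding an entry to the two dictionaries, which mirrors the extensibility goal stated above.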

Proof-of-concept and sustainability

Evaluate the framework's capacity to select an LLM that performs well at associating indicators with the Sustainable Development Goals of the United Nations 2030 Agenda for Sustainable Development.

Contribution

Reduced costs

Reduce financial costs and energy consumption for comparing a large portfolio of LLMs on a dataset of interest.

Reduced effort

Reduce the need for domain-specific fine-tuning; where fine-tuning is still required, the time it takes would be shorter.

Enhanced reusability

Advocate thorough evaluation and reuse of already available LLMs before training or fine-tuning new ones.

Participants

Dr. Ana Gjorgjevikj

Researcher

Prof. Dr. Tome Eftimov

Supervisor

Computer Systems Department, Jožef Stefan Institute

Prof. Dr. Barbara Koroušić Seljak

Co-supervisor

Computer Systems Department, Jožef Stefan Institute

Prof. Dr. Frank Hutter

Supervisor during secondment

Machine Learning Lab, Department of Computer Science, University of Freiburg