Evaluating Models with Alauda AI

If you followed the example described in the LLM Compressor with Alauda AI documentation and want to run the evaluation steps demonstrated in the Notebook, you must perform several additional manual steps.

At present, Alauda AI does not provide full, built-in support for model evaluation. As a result, these steps must be completed manually within the JupyterLab environment.

Installing required dependencies

In JupyterLab, open the Launcher page, select the Terminal tile, and run the following commands to install the required dependencies:

~/.venv/bin/python -m pip install vllm==0.8.5 -i https://pypi.tuna.tsinghua.edu.cn/simple &&
~/.venv/bin/python -m pip install compressed_tensors==0.10.2 -i https://pypi.tuna.tsinghua.edu.cn/simple &&
~/.venv/bin/python -m pip install --force-reinstall "numpy<2.0" -i https://pypi.tuna.tsinghua.edu.cn/simple &&
~/.venv/bin/python -m pip install lm-eval -i https://pypi.tuna.tsinghua.edu.cn/simple
  1. When using GPUs, installing the vllm framework is recommended to accelerate evaluation. The preinstalled torch 2.6.0 in the workbench is compatible with this version of vllm.
  2. To avoid incompatibilities, the compressed_tensors version is pinned.
  3. To prevent dependency conflicts, numpy is restricted to versions earlier than 2.0.
  4. Required: lm-eval is the core dependency used for model evaluation.
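
Before moving on, you can confirm that the pins resolved as expected by running a short check with the venv interpreter. This is a minimal sketch; the expected versions simply restate the pins above.

# Sanity check: print the installed versions of the packages pinned above.
# Run with the venv interpreter, e.g. ~/.venv/bin/python check_versions.py
from importlib.metadata import version

for pkg in ("vllm", "compressed-tensors", "numpy", "lm-eval"):
    print(pkg, version(pkg))
# Expected: vllm 0.8.5, compressed-tensors 0.10.2, numpy below 2.0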

Creating a custom evaluation task

NOTE

As of the latest release, the lm_eval library cannot load custom evaluation task files located outside its built-in tasks directory. To enable this capability, you must manually apply a small patch to the lm_eval source code.

Edit the following file:

~/.venv/lib/python3.11/site-packages/lm_eval/tasks/__init__.py

Locate approximately line 683 and update the code as shown below. For additional context, see this upstream pull request: PR #3436.

# Path.relative_to() raises ValueError when yaml_path is not under
# lm_eval's built-in tasks directory; fall back to the absolute path
# so that custom task YAMLs can be loaded.
try:
    relative_yaml_path = yaml_path.relative_to(lm_eval_tasks_path)
except ValueError:
    relative_yaml_path = yaml_path

The Notebook examples reference an evaluation task named my-wikitext. This task is not provided by default and must be defined manually by creating a my-wikitext.yaml file.

The built-in evaluation tasks in lm_eval hard-code their dataset definitions with relative paths, so the framework automatically downloads the datasets from Hugging Face. Because Hugging Face is not accessible from mainland China, you must define a custom evaluation task that points to a local dataset instead.

The following example shows a sample my-wikitext.yaml configuration (the preprocess_wikitext helpers it references are sketched after the note below):

task: my-wikitext
dataset_path: /home/jovyan/wikitext_document_level
dataset_name: wikitext-2-raw-v1
output_type: loglikelihood_rolling
training_split: train
validation_split: validation
test_split: test
doc_to_text: ""
doc_to_target: !function preprocess_wikitext.wikitext_detokenizer
process_results: !function preprocess_wikitext.process_results
should_decontaminate: true
doc_to_decontamination_query: "{{page}}"
metric_list:
  - metric: word_perplexity
  - metric: byte_perplexity
  - metric: bits_per_byte
metadata:
  version: 1.0
  1. Prepare the local dataset by following the Prepare and Upload a Dataset and Clone Models and Datasets in JupyterLab sections of the LLM Compressor with Alauda AI documentation.
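
The doc_to_target and process_results entries reference a preprocess_wikitext.py module, which lm_eval imports from the same directory as my-wikitext.yaml. The sketch below is adapted from the built-in wikitext task in lm_eval; the detokenizer rules are abbreviated here, so refer to the upstream wikitext task for the complete set.

import re

def wikitext_detokenizer(doc):
    # Undo the extra whitespace and "@" escapes that the raw wikitext
    # corpus inserts around punctuation and numbers (abbreviated rules).
    string = doc["page"]
    string = string.replace("s '", "s'")
    string = string.replace(" @-@ ", "-")
    string = string.replace(" @,@ ", ",")
    string = string.replace(" @.@ ", ".")
    string = string.replace(" ,", ",")
    string = string.replace(" .", ".")
    string = string.replace(" !", "!")
    string = string.replace(" ?", "?")
    string = re.sub(r"\(\s*([^\)]*?)\s*\)", r"(\1)", string)
    return string

def process_results(doc, results):
    # lm_eval passes one rolling loglikelihood per document; the perplexity
    # metrics are normalized by the word and byte counts of the original page.
    (loglikelihood,) = results
    words = len(re.split(r"\s+", doc["page"]))
    num_bytes = len(doc["page"].encode("utf-8"))
    return {
        "word_perplexity": (loglikelihood, words),
        "byte_perplexity": (loglikelihood, num_bytes),
        "bits_per_byte": (loglikelihood, num_bytes),
    }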

After completing these steps, you can proceed with the model evaluation sections in either the data-free compressor notebook or the calibration compressor notebook.
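
For reference, the sketch below shows one way to invoke the custom task programmatically through lm_eval's Python API. The tasks directory and model path are placeholders; substitute the directory containing my-wikitext.yaml and preprocess_wikitext.py, and the path to your compressed model.

import lm_eval
from lm_eval.tasks import TaskManager

# Index custom task YAMLs in addition to the built-in ones.
# /home/jovyan/tasks is a placeholder for the directory holding
# my-wikitext.yaml and preprocess_wikitext.py.
task_manager = TaskManager(include_path="/home/jovyan/tasks")

results = lm_eval.simple_evaluate(
    model="vllm",  # use "hf" if running without the vllm backend
    model_args="pretrained=/home/jovyan/<your-compressed-model>,dtype=auto",
    tasks=["my-wikitext"],
    task_manager=task_manager,
)
print(results["results"]["my-wikitext"])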