Evaluating Models with Alauda AI
If you followed the example described in the LLM Compressor with Alauda AI documentation and want to run the evaluation steps demonstrated in the Notebook, you must perform several additional manual steps.
At present, Alauda AI does not provide full, built-in support for model evaluation. As a result, these steps must be completed manually within the JupyterLab environment.
Installing required dependencies
In JupyterLab, open the Launcher page, select the Terminal tile, and run the following commands to install the required dependencies:
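The exact version pins depend on the workbench image; the commands below are a representative sketch, and the specific versions shown are assumptions you should adjust as needed:

```bash
# Install the vllm backend to accelerate GPU evaluation; choose a release
# built against the preinstalled torch 2.6.0 (the pin below is illustrative).
pip install "vllm==0.8.5"

# Core evaluation dependency.
pip install lm-eval

# Pin compressed-tensors to avoid incompatibilities (illustrative version).
pip install "compressed-tensors==0.9.1"

# Keep numpy below 2.0 to prevent dependency conflicts.
pip install "numpy<2.0"
```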
- When using GPUs, installing the `vllm` framework is recommended to accelerate evaluation. The preinstalled `torch 2.6.0` in the workbench is compatible with this version of `vllm`.
- To avoid incompatibilities, the `compressed_tensors` version is pinned.
- To prevent dependency conflicts, `numpy` is restricted to versions earlier than 2.0.
- Required: `lm-eval` is the core dependency used for model evaluation.
Creating a custom evaluation task
As of the latest release, the `lm_eval` library does not natively support custom evaluation tasks. To enable this capability, you must manually apply a small patch to the `lm_eval` source code.
Edit the following file:
```
~/.venv/lib/python3.11/site-packages/lm_eval/tasks/__init__.py
```
Locate the code at approximately line 683 and apply the change described in this upstream pull request: PR #3436.
In the Notebook examples, the evaluation task named `my-wikitext` is referenced. This task is not provided by default and must be defined manually by creating a `my-wikitext.yaml` file.
The built-in evaluation tasks in `lm_eval` use hard-coded dataset definitions with relative paths. This behavior causes the framework to automatically download datasets from Hugging Face. Because Hugging Face is not accessible from mainland China, you must define a custom evaluation task that points to a local dataset.
The following example shows a sample `my-wikitext.yaml` configuration:
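The exact configuration depends on how your dataset was cloned; the sketch below is modeled on the built-in wikitext task shipped with `lm_eval`, with `dataset_path` swapped for an assumed local directory. The path, split name, and preprocessing details are placeholders to adapt:

```yaml
# my-wikitext.yaml -- a minimal sketch modeled on lm_eval's built-in wikitext
# task. The dataset_path below is an assumed location; point it at the
# directory where you cloned the dataset. preprocess_wikitext.py can be
# copied from the built-in task directory (lm_eval/tasks/wikitext/) and
# placed next to this file.
task: my-wikitext
dataset_path: /home/jovyan/datasets/wikitext-2-raw-v1   # local clone, not a Hugging Face hub ID
output_type: loglikelihood_rolling
test_split: test
doc_to_text: ""
doc_to_target: !function preprocess_wikitext.wikitext_detokenizer
process_results: !function preprocess_wikitext.process_results
metric_list:
  - metric: word_perplexity
  - metric: byte_perplexity
  - metric: bits_per_byte
```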
- Prepare the dataset by following the *Prepare and Upload a Dataset* and *Clone Models and Datasets in JupyterLab* sections of the LLM Compressor with Alauda AI documentation.
After completing these steps, you can proceed with the model evaluation sections in either the data-free compressor notebook or the calibration compressor notebook.
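For reference, once the dependencies, patch, and task file are in place, an evaluation of this kind can also be launched from the terminal along the following lines. The model path is a placeholder for your compressed model directory; the notebooks remain the authoritative source for the evaluation code:

```bash
# Evaluate a local model on the custom task using the vllm backend.
# /home/jovyan/models/my-compressed-model is a placeholder path.
lm_eval --model vllm \
  --model_args pretrained=/home/jovyan/models/my-compressed-model,dtype=auto \
  --tasks my-wikitext \
  --batch_size auto
```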