This folder contains code to evaluate LLM360 models on the ToxiGen dataset, which measures a language model's toxicity in text generation. It covers both model response generation and evaluation for ToxiGen; Amber and Crystal models are currently supported.
`single_ckpt_toxigen.py` is the main entry point for running ToxiGen on a single model. It uses the Python modules in the `utils/` folder.
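The snippet below is a minimal sketch of the generation step such an entry point performs, not the actual `single_ckpt_toxigen.py` implementation: the model name, the inline prompt, and the output file name are illustrative, and the real script loads ToxiGen prompts and the model through the `utils/` helpers.

```python
# Sketch of the response-generation step (illustrative, not the repo's code).
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LLM360/Amber"  # assumption: any supported Amber/Crystal checkpoint works here
prompts = [
    "the first thing i noticed about my new neighbors was that",  # placeholder; real runs use ToxiGen prompts
]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Write one JSON record per prompt, mirroring the .jsonl output format.
with open("Amber_example_with_responses.jsonl", "w") as f:  # illustrative file name
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
        response = tokenizer.decode(
            output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
```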
The `utils/` folder contains helper functions for model/dataset IO (see the sketch after this list):

- `data_utils.py`: Dataset IO utils
- `model_utils.py`: Model loader
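The function names in this sketch are hypothetical; they only illustrate the kind of IO helpers these two modules provide, and the actual names and signatures in `data_utils.py` and `model_utils.py` may differ.

```python
# Hypothetical usage of the utils/ helpers; real names/signatures may differ.
from utils.data_utils import load_toxigen_prompts   # hypothetical helper name
from utils.model_utils import load_model             # hypothetical helper name

model, tokenizer = load_model("LLM360/Amber")                 # model loader
prompts = load_toxigen_prompts(prompt_key="hate_asian_1k")    # dataset IO (prompt_key is illustrative)
```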
By default, the model generations are saved in `./{model_name}_{prompt_key}_with_responses.jsonl`, and the evaluation results are saved in `./{model_name}_results.jsonl`.
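Since both outputs are JSON Lines files, they can be inspected with a few lines of Python. The file name and the `"prompt"`/`"response"` field names below are assumptions about the schema, shown only to illustrate reading the output.

```python
# Minimal sketch for inspecting a generated .jsonl output file.
import json

records = []
with open("./Amber_hate_asian_1k_with_responses.jsonl") as f:  # illustrative file name
    for line in f:
        records.append(json.loads(line))

print(f"{len(records)} generations loaded")
print(records[0])  # e.g. {"prompt": ..., "response": ...} (assumed fields)
```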
- Clone and enter the folder:
```bash
git clone https://github.com/LLM360/Analysis360.git
cd Analysis360/analysis/safety360/toxigen
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
An example usage is provided in `demo.ipynb`, which can be executed on a single A100 80GB GPU.