🐢 Open-Source Evaluation & Testing for AI & LLM systems
-
Updated
Jan 7, 2025 - Python
🐢 Open-Source Evaluation & Testing for AI & LLM systems
A curated list of awesome responsible machine learning resources.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Deliver safe & effective language models
Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
Aligning AI With Shared Human Values (ICLR 2021)
[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
RuLES: a benchmark for evaluating rule-following in language models
[AAAI 2025] Official repository of Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection
Code accompanying the paper Pretraining Language Models with Human Preferences
An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
📚 A curated list of papers & technical articles on AI Quality & Safety
Toolkits to create a human-in-the-loop approval layer to monitor and guide AI agents workflow in real-time.
Attack to induce LLMs within hallucinations
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
[CCS'24] SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models
Add a description, image, and links to the ai-safety topic page so that developers can more easily learn about it.
To associate your repository with the ai-safety topic, visit your repo's landing page and select "manage topics."