A leaderboard focused on evaluating practical open-source language models. We only test models that are:
- Locally deployable
- Quantized
- Runnable with 20GB of VRAM or less
🔗 Online Leaderboard: https://lenml.github.io/lenml-llm-leaderboard/
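To put the hardware constraint in concrete terms, the snippet below is a minimal sketch of loading a 4-bit quantized model on a single local GPU using the Hugging Face `transformers` and `bitsandbytes` stack. The model id and generation settings are placeholders for illustration, not the leaderboard's actual evaluation harness:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder model id, used only for illustration.
model_id = "Qwen/Qwen2.5-14B-Instruct"

# 4-bit NF4 quantization keeps a ~14B model comfortably under 20GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the local GPU automatically
)

prompt = "Briefly explain what quantization does to a language model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```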
Current open-source model evaluations face several limitations:
- Most leaderboards focus solely on English capabilities or standardized test scores
- Primary emphasis on very large models (100B+ parameters), which are impractical to run locally
- Evaluation methods are too academic and fail to reflect real-world usage
- Limited coverage of community models, especially ERP variants
We've designed a set of metrics that better align with real-world usage scenarios:
Metric | Description |
---|---|
Hardcore | Evaluates model knowledge in specific (you know) niche domains |
Reject | Measures a model's tendency to refuse to respond (lower is better) |
Creative | Assesses creative writing capabilities |
Long | Measures accuracy in generating content of specified length |
ACG | Evaluates knowledge of Anime, Comics, and Games (ACG culture) |
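As an illustration only, a refusal-oriented metric like Reject can be approximated as a refusal rate over a fixed prompt set. The refusal markers and scoring logic below are assumptions made for this sketch, not the project's actual implementation:

```python
# Hypothetical refusal markers; the real Reject metric may use
# different test prompts and detection logic.
REFUSAL_MARKERS = (
    "i can't help with that",
    "i cannot assist",
    "as an ai",
    "i'm sorry, but",
)

def is_refusal(response: str) -> bool:
    """Crude keyword check for a refusal-style answer."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def reject_score(responses: list[str]) -> float:
    """Fraction of responses that look like refusals (lower is better)."""
    if not responses:
        return 0.0
    refused = sum(is_refusal(r) for r in responses)
    return refused / len(responses)

# Example: two of four sampled answers refuse -> Reject = 0.5
print(reject_score([
    "Sure, here is how you could approach it...",
    "I'm sorry, but I can't help with that request.",
    "As an AI, I cannot assist with this.",
    "Here's a step-by-step outline...",
]))
```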
Planned improvements:
- Custom evaluation formula support (see the sketch after this list)
- Custom test data support
- Automated evaluation implementation
- Additional evaluation dimensions (e.g., lateral thinking puzzles)
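To show what a custom evaluation formula could look like, here is a hedged sketch of a weighted aggregate over the five metrics above. The weight values, metric keys, and function name are hypothetical, not the leaderboard's defined scoring rule:

```python
# Hypothetical weights; refusal tendency is penalized because lower is better.
DEFAULT_WEIGHTS = {
    "hardcore": 0.25,
    "reject": -0.15,
    "creative": 0.25,
    "long": 0.15,
    "acg": 0.20,
}

def overall_score(metrics: dict[str, float],
                  weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum over per-metric scores, each assumed to lie in [0, 1]."""
    return sum(weights[name] * metrics.get(name, 0.0) for name in weights)

# Example scorecard for a single model.
print(round(overall_score({
    "hardcore": 0.62,
    "reject": 0.10,
    "creative": 0.71,
    "long": 0.55,
    "acg": 0.48,
}), 3))
```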
Issues and Pull Requests are welcome to help improve this project!