SCoRe: Training Language Models to Self-Correct via Reinforcement Learning

BY571/SCoRe

SCoRe

Minimal implementation of the paper Training Language Models to Self-Correct via Reinforcement Learning

Environment Setup

1. Create and Activate Conda Environment

To set up the environment for this project, follow these steps:

  1. Create a new conda environment named "score" with Python 3.9:

    conda create -n score python=3.9
    
  2. Activate the environment:

    conda activate score
    

2. Install Dependencies

Install the required packages using the requirements.txt file:

pip install -r requirements.txt

Run on the Toy Problem

python score_toy.py

Run SCoRe on the Math Problem

python score_math.py

The script dataset_relabel.py was used to add the final-answer pattern to the dataset: 'Final Answer: The final answer is $answer$. I hope it is correct.'
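
As a rough illustration of how the final-answer pattern can be written and parsed back, here is a minimal sketch. The helper names `format_final_answer` and `extract_final_answer` are hypothetical and are not taken from dataset_relabel.py; the sketch only assumes the literal pattern quoted above, with the answer wrapped in `$...$`.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration only; not the code in dataset_relabel.py.
# The regex mirrors the literal pattern quoted in this README.
FINAL_ANSWER_RE = re.compile(
    r"Final Answer: The final answer is \$(.*?)\$\. I hope it is correct\.",
    re.DOTALL,
)


def format_final_answer(answer: str) -> str:
    """Wrap an answer string in the final-answer pattern."""
    return f"Final Answer: The final answer is ${answer}$. I hope it is correct."


def extract_final_answer(text: str) -> Optional[str]:
    """Return the last answer found in the text, or None if the pattern is absent."""
    matches = FINAL_ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None


if __name__ == "__main__":
    solution = "2 + 2 = 4. " + format_final_answer("4")
    print(extract_final_answer(solution))  # prints 4
```

Taking the last match is a design choice for this sketch: if a response repeats the pattern, for example after a self-correction attempt, the most recent answer is the one returned.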
