Notebook for the code snippets in Mamba: A shallow dive
This code sets up and runs a training process for a 🐍 Mamba model using PyTorch, Hugging Face's Transformers, Accelerate, and W&B (Weights & Biases). Here's a breakdown of the key components and steps, with illustrative code sketches for each step after the list:
- **Imports and Initial Setup:** The script imports the necessary libraries and modules, such as PyTorch, Transformers, Accelerate, and W&B. It also imports the model class (`MambaLMHeadModel`) from `mamba_ssm.models.mixer_seq_simple`.
- **Accelerator and W&B Initialization:** The `Accelerator` is initialized for distributed and mixed-precision training, and W&B (a tool for tracking and visualizing machine learning experiments) is initialized in disabled mode.
- **Model Loading and Modification:**
  - **Model Path and Loading:** The model is loaded from a specified path using the `MambaLMHeadModel.from_pretrained` method.
  - **Monkey-Patching the `forward` Method:** The model's `forward` method is modified (monkey-patched) with a custom loss computation, so that it returns a cross-entropy loss whenever labels are provided.
  - **Tokenizer Initialization:** A tokenizer (`AutoTokenizer`) is loaded and configured with padding and end-of-sequence tokens.
- **Token Embeddings Update:** The model's token embeddings are resized to accommodate the additional and special tokens added to the tokenizer.
- **Dataset Preparation:**
  - **Loading Dataset:** A dataset is loaded from a specified source.
  - **Tokenization:** The dataset is tokenized with the prepared tokenizer; the `tokenize` function converts raw text into a format suitable for the model.
  - **Multithreading:** Tokenization is parallelized across `os.cpu_count()` workers to utilize all available CPU cores.
- **Data Collation:** A `collate` function is defined to transform a list of tokenized elements into a format suitable for batch processing during training.
- **Training Configuration:**
  - **Batch Size, Gradient Accumulation, and Learning Rate:** Key training hyperparameters, such as batch size, gradient accumulation steps, and learning rate, are set.
  - **TrainingArguments:** Hugging Face's `TrainingArguments` class configures batch sizes, logging, the evaluation strategy, and the learning-rate scheduler.
- **Trainer Setup:** A `Trainer` object is created with the model, tokenizer, training arguments, data collator, and the datasets for training and evaluation.
- **Model Training:** Finally, the model is trained by calling `Trainer.train()`.
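The sketches below flesh out each of these steps. They are illustrative reconstructions rather than the notebook's exact code: checkpoint names, dataset choices, and hyperparameters are placeholders, and the sketches reuse names (`model`, `tokenizer`, `tokenized`, and so on) so they can be read as one script. First, the imports plus the Accelerate and W&B setup:

```python
import torch
import wandb
from accelerate import Accelerator
from transformers import AutoTokenizer, Trainer, TrainingArguments
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Accelerate handles device placement and mixed precision for the run
accelerator = Accelerator()

# W&B is initialized but switched off, so no metrics are uploaded
wandb.init(mode="disabled")
```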
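Loading the checkpoint and monkey-patching `forward` might look roughly like this. The stock `forward` of `MambaLMHeadModel` only returns logits, so the wrapper adds a shifted cross-entropy loss when labels are passed; the checkpoint name and the exact loss arithmetic are assumptions:

```python
import torch.nn.functional as F

# Hypothetical checkpoint -- substitute the path used in the post
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-370m")

original_forward = model.forward

def forward_with_loss(input_ids, labels=None, **kwargs):
    """Wrap the stock forward so the Hugging Face Trainer can obtain a loss."""
    output = original_forward(input_ids, **kwargs)
    if labels is None:
        return output
    logits = output.logits
    # Shift so that position i predicts token i + 1
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
    # Returning (loss, logits) is enough for Trainer.compute_loss
    return (loss, logits)

model.forward = forward_with_loss
```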
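Tokenizer setup and the token-embedding resize, sketched under two assumptions: the tokenizer is the GPT-NeoX-20B one commonly paired with Mamba checkpoints, and since `MambaLMHeadModel` does not ship Transformers' `resize_token_embeddings` helper, the resize is done by hand on attribute names taken from `mamba_ssm`:

```python
import torch.nn as nn

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # assumed tokenizer
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS for padding

# Grow the input embedding (and the tied lm_head) to cover any added tokens
new_vocab_size = len(tokenizer)
old_embed = model.backbone.embedding          # attribute path assumed from mamba_ssm
d_model = old_embed.embedding_dim
if new_vocab_size > old_embed.num_embeddings:
    new_embed = nn.Embedding(new_vocab_size, d_model)
    new_embed.weight.data[: old_embed.num_embeddings] = old_embed.weight.data
    model.backbone.embedding = new_embed
    model.lm_head = nn.Linear(d_model, new_vocab_size, bias=False)
    model.lm_head.weight = model.backbone.embedding.weight  # keep weights tied
```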
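Dataset loading and parallel tokenization, with a stand-in dataset (the summary above does not pin one down):

```python
import os
from datasets import load_dataset

# Placeholder corpus -- swap in the dataset the post actually uses
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(examples):
    # Truncate to a fixed context length so examples can be batched
    return tokenizer(examples["text"], truncation=True, max_length=1024)

tokenized = dataset.map(
    tokenize,
    batched=True,
    num_proc=os.cpu_count(),                       # one worker per CPU core
    remove_columns=dataset["train"].column_names,  # keep only token columns
)
```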
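A minimal `collate` function, assuming plain causal-LM training where the padded inputs double as next-token labels:

```python
def collate(batch):
    # Pad every example in the batch to the same length
    padded = tokenizer.pad(batch, return_tensors="pt")
    input_ids = padded["input_ids"]
    # Reuse the inputs as labels; a fuller version would mask padding with -100
    return {"input_ids": input_ids, "labels": input_ids.clone()}
```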
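Hyperparameters and `TrainingArguments`; the numbers are illustrative, not the post's values:

```python
batch_size = 4                      # illustrative values only
gradient_accumulation_steps = 4
learning_rate = 5e-5

training_args = TrainingArguments(
    output_dir="mamba-finetune",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    learning_rate=learning_rate,
    lr_scheduler_type="cosine",
    evaluation_strategy="steps",
    eval_steps=200,
    logging_steps=10,
    num_train_epochs=1,
    report_to="wandb",              # goes nowhere while W&B is disabled
)
```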
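Finally, the `Trainer` ties everything together and kicks off training; the split names are assumptions, and it is the monkey-patched `forward` that lets the stock `Trainer` compute a loss for this non-Transformers model:

```python
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    data_collator=collate,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],   # assumed split name
)

trainer.train()
```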
Overall, the script is a comprehensive setup for training a large language model (`MambaLMHeadModel`) with specific configurations and customizations for tokenization, data collation, and training arguments.