Huggingface Autotrain
Your training can be interrupted at any point. Please make sure you save your checkpoints regularly and restore from a checkpoint when training is interrupted. For more information, please refer to this section
In this tutorial, we will look at how to fine-tune a large language model (LLM). We will use the Huggingface Autotrain framework to fine-tune the Falcon-7B model with the Alpaca sample dataset from Huggingface.
Huggingface Autotrain also supports fine-tuning other types of models, such as image models. Interested readers can check its documentation and experiment with other models on EverlyAI.
Step 1 Implement Code
We will create two files: everlyai_entrypoint.sh to install dependencies and run the fine-tuning, and dataset_downloader.py to download the dataset from Huggingface and store it in the EverlyAI file storage.
We only need to download the dataset once. Once it is saved to EverlyAI file storage, it is accessible on all instances within the same region.
#!/bin/bash
# everlyai_entrypoint.sh: install dependencies, download the dataset, and start fine-tuning.

# Install Huggingface Autotrain.
pip install -U autotrain-advanced
# Download dataset.
pip install datasets
mkdir -p /everlyai/fs/autotrain
python3 dataset_downloader.py
# Start the training process. Run from the EverlyAI file storage so that
# training outputs are written there and persist.
cd /everlyai/fs
autotrain llm \
--train \
--model 'tiiuae/falcon-7b' \
--project-name 'everlyai-autotrain' \
--data-path /everlyai/fs/autotrain/ \
--text-column text \
--lr 2e-4 \
--batch-size 1 \
--epochs 4 \
--block-size 1024 \
--warmup-ratio 0.1 \
--lora-r 16 \
--lora-alpha 32 \
--lora-dropout 0.045 \
--weight-decay 0.01 \
--gradient-accumulation 4 \
--mixed-precision fp16 \
--peft \
--quantization int4
# dataset_downloader.py: download the Alpaca dataset and store it in EverlyAI file storage.
from datasets import load_dataset

# Load the Alpaca dataset from Huggingface.
dataset = load_dataset("tatsu-lab/alpaca")

# Save the training split as a CSV file for autotrain to read.
train = dataset["train"]
train.to_csv("/everlyai/fs/autotrain/train.csv", index=False)
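As an optional sanity check (not part of the two tutorial files), you can confirm that the exported CSV contains the text column that the --text-column flag above refers to. This sketch assumes pandas is available, which it is as a dependency of datasets.
import pandas as pd

# Optional check: the CSV must contain the 'text' column used by --text-column.
df = pd.read_csv("/everlyai/fs/autotrain/train.csv")
print(df.columns.tolist())       # expect 'text' among the columns
print(len(df), "training rows")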
Within the same directory, run the following command to zip the code.
zip code.zip everlyai_entrypoint.sh dataset_downloader.py
The final directory looks like this.
- everlyai_entrypoint.sh
- dataset_downloader.py
- code.zip
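Optionally, before uploading, you can confirm that both scripts were packed into the archive. The check below is a convenience sketch using Python's standard-library zipfile module.
import zipfile

# List the files inside code.zip; both scripts should appear.
with zipfile.ZipFile("code.zip") as zf:
    print(zf.namelist())  # expect everlyai_entrypoint.sh and dataset_downloader.py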
Step 2 Create a project
Now we can submit the fine-tuning job to EverlyAI.
Visit the Projects page and click the Create Project
button. On the next page, enter the following configurations.
- change job type to model training
- change code type to local code
- select the zip file generated in step 1
Then click the Create button. An example is shown below.
Step 3 Check model checkpoints
Once the project completes, we can go to the EverlyAI file storage and find all the training artifacts there.
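As a rough sketch of how those artifacts might be used for inference, the snippet below loads the LoRA adapter with the peft and transformers libraries. The adapter path is an assumption based on the --project-name above and on the training being started from /everlyai/fs; adjust it to wherever the checkpoints actually appear in your file storage.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumed output location: the project directory created by autotrain under /everlyai/fs.
adapter_dir = "/everlyai/fs/everlyai-autotrain"

# Load the base model with the fine-tuned LoRA adapter applied.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir)
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

inputs = tokenizer("Give three tips for staying healthy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))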