
Huggingface Autotrain

⚠️

Your training can be interrupted at any point. Please make sure you save checkpoints regularly and restore from a checkpoint when training is interrupted. For more information, please refer to this section.

In this tutorial, we will look at how to fine-tune a large language model (LLM). We will use the Huggingface Autotrain framework to fine-tune the Falcon 7B model with the Alpaca sample dataset from Huggingface.

💡

Huggingface Autotrain supports fine-tuning other kinds of models as well, such as image models. Interested readers can check its documentation and experiment with other models on EverlyAI.

Step 1 Implement Code

We will create two files: everlyai_entrypoint.sh, which installs dependencies and runs the fine-tuning, and dataset_downloader.py, which downloads the dataset from Huggingface and stores it in EverlyAI file storage.

💡

We only need to download the dataset once. Once it is saved to EverlyAI file storage, it is accessible on all instances within the same region.

everlyai_entrypoint.sh
pip install -U autotrain-advanced
 
# Download dataset.
pip install datasets
mkdir -p /everlyai/fs/autotrain
python3 dataset_downloader.py
 
# Start the training process.
cd /everlyai/fs
autotrain llm \
--train \
--model 'tiiuae/falcon-7b' \
--project-name 'everlyai-autotrain' \
--data-path /everlyai/fs/autotrain/ \
--text-column text \
--lr 2e-4 \
--batch-size 1 \
--epochs 4 \
--block-size 1024 \
--warmup-ratio 0.1 \
--lora-r 16 \
--lora-alpha 32 \
--lora-dropout 0.045 \
--weight-decay 0.01 \
--gradient-accumulation 4 \
--mixed-precision fp16 \
--peft \
--quantization int4
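Note that with --batch-size 1 and --gradient-accumulation 4, the optimizer sees an effective batch size of 4. A quick sketch of how these two flags combine (plain arithmetic, mirroring the values above):

```python
# Effective batch size = per-device batch size * gradient accumulation steps.
# These values mirror the flags in everlyai_entrypoint.sh above.
batch_size = 1          # --batch-size
grad_accumulation = 4   # --gradient-accumulation
effective_batch_size = batch_size * grad_accumulation
print(effective_batch_size)  # 4
```

Raising --gradient-accumulation is a common way to simulate a larger batch when GPU memory only fits a small per-device batch.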
dataset_downloader.py
from datasets import load_dataset 
 
# Load the dataset
dataset = load_dataset("tatsu-lab/alpaca") 
train = dataset['train']
train.to_csv('/everlyai/fs/autotrain/train.csv', index=False)
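Before launching training, it is worth confirming that the saved CSV actually contains the column that the --text-column flag expects. A minimal sanity-check sketch (it uses a tiny in-memory sample with the Alpaca column layout; in practice, point read_csv at /everlyai/fs/autotrain/train.csv):

```python
import io
import pandas as pd

# In practice, read the real file instead:
# df = pd.read_csv("/everlyai/fs/autotrain/train.csv")
# Here we use a small in-memory sample with the same columns as Alpaca.
sample = io.StringIO(
    "instruction,input,output,text\n"
    "Say hi,,Hi!,Below is an instruction that describes a task.\n"
)
df = pd.read_csv(sample)

# AutoTrain's --text-column flag expects this column to exist.
assert "text" in df.columns, "missing 'text' column"
print(df["text"].iloc[0])
```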

Within the same directory, run the following command to zip the code.

zip code.zip everlyai_entrypoint.sh dataset_downloader.py

The final directory looks like below.

  • everlyai_entrypoint.sh
  • dataset_downloader.py
  • code.zip

Step 2 Create a project

Now we can submit the fine-tuning job to EverlyAI.

Visit the Projects page and click the Create Project button. On the next page, enter the following configurations.

  • Change the job type to model training.
  • Change the code type to local code.
  • Select the zip file generated in step 1.

Then click the Create button.

Step 3 Check model checkpoints

Once the project completes, we can go to the EverlyAI file storage and find all the training artifacts there.
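Intermediate checkpoints are typically written as checkpoint-&lt;step&gt; directories inside the project folder (everlyai-autotrain here, per the --project-name flag). When restoring after an interruption, you generally want the highest-numbered one. A small helper sketch for locating it (the /everlyai/fs path below is an assumption based on the entrypoint script above):

```python
import os
import re
from typing import Optional

def latest_checkpoint(project_dir: str) -> Optional[str]:
    """Return the path of the highest-numbered checkpoint-<step> directory, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    best_step, best_path = -1, None
    if not os.path.isdir(project_dir):
        return None
    for name in os.listdir(project_dir):
        m = pattern.match(name)
        if m and int(m.group(1)) > best_step:
            best_step = int(m.group(1))
            best_path = os.path.join(project_dir, name)
    return best_path

# In practice:
# latest_checkpoint("/everlyai/fs/everlyai-autotrain")
```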