Fine-Tuned Meta's Llama 3.1 8B Parameter Model

Fine-tuned the Llama 3.1 8B model for instruction-following, prioritizing efficiency and speed for mathematical reasoning.

Discovery phase

It started when I was searching for a research topic in LLMs and generative AI. I used Perplexity and Grok to find popular yet new research topics, and one of them was fine-tuning for a specific use case without any GPU costs and even 40-60% faster. It hit me instantly, and I jumped right into the docs and techniques for fine-tuning models. My first fine-tuned model was trained on the Apache dataset; later I used the same syntax and skills to build a custom instruction-tuned model for mathematical reasoning on the GSM8K dataset.

Through research papers, articles, and videos, I gathered information about the pain points, the issues, and the why behind fine-tuning. These insights shaped the foundation.

Tools & Techniques Used

Hugging Face | Google Colab | Python & Libraries | QLoRA | Unsloth | Gradio

Category

LLM Fine-Tuning | Generative AI

Ideation Development

Based on the insights gathered during the research phase, I began exploring possible solutions and techniques. After watching videos and reading various articles, I found that Unsloth was a game changer. This allowed me to translate an abstract idea into a concrete project: fine-tuning the Llama 3.1 8B model for instruction-following, prioritizing efficiency and speed.
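To give a feel for why adapter-based fine-tuning is cheap, here is a minimal sketch that counts the weights a LoRA adapter would train on a model shaped like Llama 3 8B. The layer sizes follow the published model configuration, but the rank of 16 and the choice of adapted projections are assumptions for illustration, not necessarily the settings used in this project.

```python
# Rough count of trainable LoRA parameters for a Llama-3-8B-shaped model.
# The rank and the list of adapted projections are assumptions, not the
# project's actual settings.

HIDDEN = 4096          # model (embedding) dimension
INTERMEDIATE = 14336   # feed-forward inner dimension
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32

# (input features, output features) of each linear layer that gets an adapter
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank: int) -> int:
    """LoRA adds two small matrices per layer: A (rank x in) and B (out x rank)."""
    per_block = sum(rank * (n_in + n_out) for n_in, n_out in PROJECTIONS.values())
    return per_block * NUM_LAYERS


if __name__ == "__main__":
    trainable = lora_trainable_params(rank=16)
    total = 8_030_000_000  # approximate parameter count of the base model
    print(f"trainable adapter parameters: {trainable:,}")
    print(f"share of the base model: {trainable / total:.2%}")
```

Under these assumptions the adapter holds about 42 million weights, roughly half a percent of the base model, which is why the optimizer state and gradients fit on a free Colab GPU.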

LLM Fine-Tuning

QLoRA

Model Quantization


Short Summary

1. Fine-tuned an 8-billion-parameter large language model (Llama 3.1) on the GSM8K dataset using parameter-efficient fine-tuning (LoRA) via Unsloth, achieving 66% strict accuracy on complex, multi-step mathematical reasoning tasks. Perplexity stayed low (between 2.5 and 4.0), indicating high confidence and fluency in generating mathematical syntax.
2. Reduced the model's footprint by applying 4-bit quantization (GGUF) via llama.cpp, compressing the network to 4.92 GB to enable high-speed, offline CPU inference on standard local hardware via GPT4All.
3. Observed significant learning convergence, with the training loss dropping from 1.55 to 0.73.
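The two evaluation numbers in the summary can be made concrete with a short sketch. It assumes that "strict accuracy" means an exact string match on the final answer and that model outputs end with the same `#### <number>` marker GSM8K uses in its reference solutions. The helper names are hypothetical and this is not the project's actual evaluation script.

```python
import math
import re

# Matches the final answer line used in GSM8K reference solutions: "#### 1,234"
FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text: str):
    """Return the number after the last '####' marker, or None if absent."""
    matches = FINAL_ANSWER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")  # drop thousands separators


def strict_accuracy(predictions, references) -> float:
    """Fraction of predictions whose final answer string equals the reference.

    Strict here means no numeric tolerance: "18.0" does not match "18",
    and a response with no '####' marker counts as wrong.
    """
    correct = 0
    for pred, ref in zip(predictions, references):
        answer = extract_final_answer(pred)
        if answer is not None and answer == extract_final_answer(ref):
            correct += 1
    return correct / len(references)


def perplexity(token_log_probs) -> float:
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    preds = ["She earns 10 + 8 = 18 dollars.\n#### 18", "The answer is 5"]
    refs = ["10 + 8 = 18\n#### 18", "2 + 3 = 5\n#### 5"]
    print(strict_accuracy(preds, refs))  # 0.5: the second output has no marker
```

If the reported training loss is a mean cross-entropy in nats, the same formula links it to perplexity: a loss of 0.73 corresponds to a training perplexity of about 2.08, and 1.55 to about 4.71. That is a separate quantity from the 2.5 to 4.0 range quoted for evaluation.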

Applications & Use Cases

Key applications include zero-shot and chain-of-thought mathematical reasoning over grade-school math problems, building specialized chatbots, and reducing reliance on lengthy prompts to get consistent, efficient results.
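As an illustration of what a zero-shot chain-of-thought prompt can look like, here is a minimal sketch. The instruction wording and the section headers are assumptions made for this example; they are not the template the model was actually trained on.

```python
# Hypothetical zero-shot chain-of-thought prompt; the wording is an assumption,
# not the template the model was trained on.
INSTRUCTION = (
    "Solve the following grade-school math problem. "
    "Reason step by step, then give the final answer on its own line "
    "in the form '#### <number>'."
)


def build_prompt(question: str) -> str:
    """Zero-shot: the prompt contains no worked examples, only the task."""
    return (
        f"### Instruction:\n{INSTRUCTION}\n\n"
        f"### Input:\n{question.strip()}\n\n"
        "### Response:\n"
    )


if __name__ == "__main__":
    print(build_prompt("A pen costs 3 dollars. How much do 4 pens cost?"))
```

Because a fine-tuned model has already seen many prompts in one fixed format, a short template like this can stand in for the long, example-heavy prompts a base model would otherwise need.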

This technique can provide significant operational value by automating time-consuming tasks, improving overall efficiency and memory usage for anyone from solopreneurs to small startups, while operating at minimal cost with accelerated, memory-efficient training.
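A back-of-the-envelope calculation shows where the memory savings come from. This sketch assumes roughly 8.03 billion parameters for the base model and treats 1 GB as 10^9 bytes; both are assumptions, and the function names are made up for illustration.

```python
# Back-of-the-envelope size estimate. PARAMS is the approximate parameter count
# of an 8B Llama model and GB is taken as 10**9 bytes; both are assumptions.
PARAMS = 8.03e9
GB = 1e9


def size_gb(bits_per_weight: float) -> float:
    """Storage needed for the weights alone at a given precision."""
    return PARAMS * bits_per_weight / 8 / GB


def effective_bits(file_size_gb: float) -> float:
    """Average bits per weight implied by a file of the given size."""
    return file_size_gb * GB * 8 / PARAMS


if __name__ == "__main__":
    print(f"16-bit weights: {size_gb(16):.2f} GB")
    print(f"bits per weight at 4.92 GB: {effective_bits(4.92):.2f}")
```

Under these assumptions, 16-bit weights need about 16 GB, while the 4.92 GB file reported in the summary works out to roughly 4.9 bits per weight. That is a little above 4, plausibly because quantized formats also store scaling metadata and keep some tensors at higher precision.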
