Fine-Tuned Meta's Llama 3.1 8B Parameter Model
Fine-tuned the Llama 3.1 8B model for instruction following, prioritizing efficiency and speed for mathematical reasoning.

Discovery phase
It started when I was searching for a research topic on LLMs and generative AI. I used Perplexity and Grok to find popular yet new research topics, and one of them was fine-tuning for a specific use case without any GPU costs and with training that is even 40-60% faster. It hit me instantly, and I jumped right into the docs and techniques for fine-tuning models. My first fine-tuned model was trained on the Apache dataset; later I used the same syntax and skills to build a custom instruction-fine-tuned model for mathematical reasoning on the GSM8K dataset.
Through research papers, articles, and videos, I gathered information about the pain points, the issues, and the why behind fine-tuning. These insights shaped the foundation of the project.
Tools & Techniques Used
Hugging Face | Google Colab | Python & Libraries | QLoRA | Unsloth | Gradio
Category
LLM Fine-Tuning | Generative AI
Ideation & Development
Based on the insights gathered during the research phase, I began exploring possible solutions and techniques. After watching videos and reading various articles, I found that Unsloth was a game changer. This allowed me to translate an abstract idea into a concrete project: fine-tuning the Llama 3.1 8B model for instruction following, prioritizing efficiency and speed.
LLM Fine-Tuning
QLoRA
Model Quantization
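
To make the efficiency argument behind LoRA and QLoRA concrete, here is a minimal sketch that counts the trainable adapter parameters for a model with the publicly documented Llama 3.1 8B dimensions. The rank of 16 and the set of adapted projection layers are assumptions for illustration (they mirror common Unsloth example settings); the actual values used in this project are not stated above.

```python
# Sketch: how many parameters LoRA adapters add to a Llama 3.1 8B-sized model.
# A LoRA adapter on a (d_in x d_out) linear layer adds two small matrices,
# A (d_in x r) and B (r x d_out), so it contributes r * (d_in + d_out) weights.

# Published Llama 3.1 8B dimensions.
HIDDEN = 4096          # model (embedding) width
INTERMEDIATE = 14336   # MLP inner width
KV_DIM = 1024          # 8 key/value heads * 128 head dim (grouped-query attention)
NUM_LAYERS = 32

# ASSUMPTION: adapters are attached to all seven linear projections per layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights added by one LoRA adapter of the given rank."""
    return rank * (d_in + d_out)


def total_lora_params(rank: int) -> int:
    """Trainable weights across all adapted projections in all layers."""
    per_layer = sum(lora_params(d_in, d_out, rank)
                    for d_in, d_out in PROJECTIONS.values())
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    rank = 16  # ASSUMPTION: illustrative rank, not taken from this project
    trainable = total_lora_params(rank)
    base = 8.03e9  # approximate base parameter count
    print(f"trainable LoRA parameters: {trainable:,}")
    print(f"share of base model: {trainable / base:.2%}")
```

Under these assumptions only about half a percent of the weights are trained, which is why the approach fits on a free or low-cost Colab GPU, especially when the frozen base weights are held in 4-bit form as in QLoRA.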

Short Summary
1. Fine-tuned an 8-billion-parameter large language model (Llama 3.1) on the GSM8K dataset using parameter-efficient fine-tuning (LoRA) via Unsloth, achieving 66% strict accuracy on complex, multi-step mathematical reasoning tasks. The perplexity distribution was low (between 2.5 and 4.0), indicating high confidence and fluency in generating mathematical syntax.
2. Reduced the model's footprint by applying 4-bit quantization (GGUF) via llama.cpp, compressing the network to 4.92 GB to enable fast, offline CPU inference on standard local hardware via GPT4All.
3. Observed clear learning convergence, with the training loss dropping from 1.55 to 0.73.
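
The numbers in this summary can be related with a little arithmetic. The sketch below assumes perplexity is defined in the usual way, as the exponential of the mean per-token cross-entropy loss (natural log), and assumes a base parameter count of roughly 8.03 billion and that "GB" means 10^9 bytes. How the project actually measured perplexity, and on which split, is not stated, so treat this as an illustration rather than a reproduction of the reported results.

```python
import math

# ASSUMPTION: approximate parameter count of Llama 3.1 8B.
ASSUMED_PARAMS = 8.03e9


def perplexity_from_loss(mean_ce_loss: float) -> float:
    """Perplexity as exp(mean token-level cross-entropy, in nats)."""
    return math.exp(mean_ce_loss)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse of perplexity_from_loss."""
    return math.log(perplexity)


def bits_per_weight(file_size_bytes: float, num_params: float) -> float:
    """Average storage cost per parameter of a quantized model file."""
    return file_size_bytes * 8 / num_params


if __name__ == "__main__":
    # Training loss at the start and end of fine-tuning, from the summary.
    for loss in (1.55, 0.73):
        print(f"loss {loss:.2f} -> perplexity {perplexity_from_loss(loss):.2f}")

    # Loss values that correspond to the reported perplexity range.
    for ppl in (2.5, 4.0):
        print(f"perplexity {ppl:.1f} -> loss {loss_from_perplexity(ppl):.2f}")

    # Effective bits per weight of the 4.92 GB GGUF file (GB taken as 10^9 bytes).
    bpw = bits_per_weight(4.92e9, ASSUMED_PARAMS)
    print(f"about {bpw:.2f} bits per weight")
```

The result of a little under 5 bits per weight is slightly above a nominal 4 bits, which is expected: 4-bit GGUF formats store per-block scale factors and typically keep some tensors at higher precision.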
Applications & Use Cases
Key applications include mathematical reasoning over grade-school math problems with zero-shot prompting and chain-of-thought reasoning, building specialized chatbots, and reducing reliance on lengthy prompts to get consistent, efficient results.
This technique can provide significant operational value by automating time-consuming tasks, improving overall efficiency and memory usage for users ranging from solopreneurs to small startups. It operates at minimal cost, with accelerated, memory-efficient training.
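
For the grade-school math use case described above, here is a minimal sketch of a zero-shot chain-of-thought prompt and a strict, exact-match scorer. GSM8K reference solutions end with a line of the form "#### <number>"; the sketch assumes the fine-tuned model has learned to end its output the same way. The prompt wording and the function names are hypothetical, and "strict accuracy" is interpreted here as an exact string match on the final number, which may differ from the metric actually used in this project.

```python
import re
from typing import List, Optional

# Matches the GSM8K final-answer marker, e.g. "#### 1,234" or "#### -3.5".
ANSWER_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def build_zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot prompt (no worked examples) that asks for step-by-step reasoning."""
    return (
        "Solve the following grade-school math problem. "
        "Reason step by step, then give the final answer on its own line "
        "in the form '#### <number>'.\n\n"
        f"Question: {question}\nAnswer:"
    )


def extract_final_answer(text: str) -> Optional[str]:
    """Return the last '#### <number>' value with thousands separators removed."""
    matches = ANSWER_RE.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def strict_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose final answer exactly matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        answer = extract_final_answer(pred)
        # A missing marker counts as wrong; "18.0" vs "18" also counts as wrong.
        if answer is not None and answer == extract_final_answer(ref):
            correct += 1
    return correct / len(references)


if __name__ == "__main__":
    preds = ["16 - 3 - 4 = 9 eggs, 9 * 2 = 18\n#### 18", "I think the answer is 5"]
    refs = ["She sells 9 eggs for $2 each.\n#### 18", "#### 5"]
    print(build_zero_shot_cot_prompt("How much does Janet make per day?"))
    print(f"strict accuracy: {strict_accuracy(preds, refs):.2f}")
```

Scoring this strictly means a response with correct reasoning but a missing or malformed final line is counted as wrong, so a figure obtained this way is a conservative estimate of reasoning ability.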












