Latest Posts
Top AI/LLM learning resource in 2025
The blog is organized into three main segments. 📝 Notebooks: below is a collection of notebooks and articles dedicated to LLMs.

Tools:
- 🧐 LLM AutoEval: Evaluate your LLMs automatically using RunPod. (Notebook)
- 🥱 LazyMergekit: Merge models effortlessly using MergeKit with a single click. (Notebook)
- 🦎 LazyAxolotl: Fine-tune models in the cloud…
Improve ChatGPT with Knowledge Graphs
ChatGPT has shown impressive capabilities in processing and generating human-like text. However, it is not without its imperfections. A primary concern is the model's propensity to produce inaccurate or obsolete answers, often called "hallucinations." The New York Times recently highlighted this issue in their article, "Here's What Happens When Your Lawyer Uses ChatGPT."…
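Since this excerpt only teases the idea, here is a minimal sketch of the underlying pattern: retrieve fresh facts from a knowledge graph and prepend them to the prompt, so the model answers from them rather than from stale training data. The triple store and helper names below are hypothetical stand-ins, not the article's actual code.

```python
# Minimal sketch: ground a ChatGPT prompt with facts from a knowledge graph.
# The triple store and the helpers below are hypothetical stand-ins; the
# article may use a different graph backend and prompt format.
KNOWLEDGE_GRAPH = {
    # (subject, relation) -> object
    ("Avatar 2", "release_date"): "December 16, 2022",
    ("Avatar 2", "director"): "James Cameron",
}

def retrieve_facts(entity: str) -> list[str]:
    """Collect all triples about an entity as plain-text statements."""
    return [
        f"{subj} {rel.replace('_', ' ')}: {obj}"
        for (subj, rel), obj in KNOWLEDGE_GRAPH.items()
        if subj == entity
    ]

def grounded_prompt(question: str, entity: str) -> str:
    """Prepend up-to-date facts so the model answers from them,
    not from potentially obsolete training data."""
    facts = "\n".join(retrieve_facts(entity))
    return f"Using only these facts:\n{facts}\n\nAnswer: {question}"

print(grounded_prompt("When was Avatar 2 released?", "Avatar 2"))
```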
Uncensor any LLM with abliteration
The third generation of Llama models introduced fine-tuned (Instruct) versions that excel in understanding and following instructions. However, these models are heavily censored: they are designed to refuse requests deemed harmful with responses such as "As an AI assistant, I cannot help you." While this safety feature is crucial for preventing misuse, it limits the model's…
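As a rough illustration of what abliteration does under the hood, the sketch below computes a "refusal direction" from cached activations and projects it out of a weight matrix. All tensors here are random placeholders assuming a 4096-dimensional residual stream; the article's actual implementation differs in detail.

```python
import torch

# Illustrative sketch of abliteration's core step, not the article's code.
# Assume cached residual-stream activations for harmful and harmless
# prompts at some layer (shapes: [n_prompts, d_model]).
harmful_acts = torch.randn(128, 4096)   # placeholder activations
harmless_acts = torch.randn(128, 4096)  # placeholder activations

# The "refusal direction" is the normalized difference of mean activations.
refusal_dir = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
refusal_dir = refusal_dir / refusal_dir.norm()

def ablate(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of the layer's output along `direction`,
    so the layer can no longer write that direction into the residual
    stream: W' = W - d d^T W."""
    proj = torch.outer(direction, direction)  # [d_model, d_model]
    return weight - proj @ weight

# Applied to e.g. an output projection whose rows live in d_model space.
W = torch.randn(4096, 11008)  # placeholder weight matrix
W_abliterated = ablate(W, refusal_dir)
```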
Create Mixtures of Experts with MergeKit
Since the release of Mixtral, the Mixture of Experts (MoE) architecture has become popular. This architecture offers an interesting tradeoff: higher performance at the cost of increased VRAM usage. While Mixtral and other MoE models are pre-trained from scratch, another method of creating MoEs has recently appeared. Thanks to Arcee's MergeKit…
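For a sense of what such a merged "frankenMoE" looks like in practice, here is a hedged sketch of a MergeKit MoE config written from Python. The key names (base_model, experts, source_model, positive_prompts) follow MergeKit's MoE schema, but the model IDs are placeholders and the exact CLI invocation may vary across MergeKit versions.

```python
import pathlib

# Hedged sketch of a frankenMoE config for MergeKit's MoE mode.
# Model IDs are placeholders; positive_prompts steer the router so that
# matching inputs are sent to the corresponding expert.
config = """\
base_model: mistralai/Mistral-7B-v0.1
experts:
  - source_model: example-org/math-7b     # placeholder model ID
    positive_prompts:
      - "solve this equation"
  - source_model: example-org/code-7b     # placeholder model ID
    positive_prompts:
      - "write a Python function"
"""

pathlib.Path("moe_config.yaml").write_text(config)
# Then build the merge, e.g.: mergekit-moe moe_config.yaml ./merged-moe
```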
Merge Large Language Models with MergeKit
Model merging is a technique that combines two or more LLMs into a single model. It's a relatively new and experimental method to create new models cheaply (no GPU required). Model merging works surprisingly well and has produced many state-of-the-art models on the Open LLM Leaderboard. In this tutorial, we will implement it using the…
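One popular merge method is SLERP (spherical linear interpolation). As a self-contained illustration, assuming we merge two flattened weight tensors of equal shape, a minimal NumPy version might look like this; MergeKit's production implementation works tensor by tensor and handles edge cases more carefully.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors,
    flattened to vectors. Falls back to plain lerp when nearly colinear."""
    v0_n = v0 / np.linalg.norm(v0)
    v1_n = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    if abs(dot) > 0.9995:            # nearly parallel: linear interpolation
        return (1 - t) * v0 + t * v1
    omega = np.arccos(dot)           # angle between the two directions
    so = np.sin(omega)
    return np.sin((1 - t) * omega) / so * v0 + np.sin(t * omega) / so * v1

# Merge two hypothetical weight vectors halfway:
w_a, w_b = np.random.randn(1024), np.random.randn(1024)
w_merged = slerp(0.5, w_a, w_b)
```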
ExLlamaV2: The Fastest Library to Run LLMs
Quantizing Large Language Models (LLMs) is the most popular approach for reducing the size of these models and speeding up inference. Among these techniques, GPTQ delivers amazing performance on GPUs. Compared to unquantized models, this method uses almost 3 times less VRAM while providing a similar level of accuracy and faster generation. It became so…
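The VRAM claim is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below counts weight storage only; the real ratio lands below the ideal 4x because quantized models also store per-group scales and often keep some layers in higher precision, which is consistent with the "almost 3 times" figure.

```python
# Weights-only VRAM estimate, ignoring activations, KV cache, and
# quantization overhead such as per-group scales.
def weight_gb(n_params: float, bits: float) -> float:
    return n_params * bits / 8 / 1e9

params_7b = 7e9
print(f"FP16:       {weight_gb(params_7b, 16):.1f} GB")  # ~14.0 GB
print(f"GPTQ 4-bit: {weight_gb(params_7b, 4):.1f} GB")   # ~3.5 GB
```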
Decoding Strategies in Large Language Models
In the fascinating world of large language models (LLMs), much attention is given to model architectures, data processing, and optimization. However, decoding strategies like beam search, which play a crucial role in text generation, are often overlooked. In this article, we will explore how LLMs generate text by delving into the mechanics of greedy search…
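As a quick example of the two strategies named here, the snippet below contrasts greedy search and beam search via Hugging Face transformers with GPT-2 (any causal LM would do); this is a minimal sketch, not the article's full walkthrough.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("I have a dream", return_tensors="pt")

# Greedy search: pick the single most likely token at every step.
greedy = model.generate(**inputs, max_new_tokens=30, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)

# Beam search: keep the 5 most likely partial sequences at every step.
beams = model.generate(**inputs, max_new_tokens=30, num_beams=5,
                       do_sample=False,
                       pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(beams[0], skip_special_tokens=True))
```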
Quantize Llama models with GGUF and llama.cpp
Due to the massive size of Large Language Models (LLMs), quantization has become an essential technique to run them efficiently. By reducing the precision of their weights, you can save memory and speed up inference while preserving most of the model’s performance. Recently, 8-bit and 4-bit quantization unlocked the possibility of running LLMs on consumer…
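To make "reducing the precision of their weights" concrete, here is a toy NumPy sketch of blockwise 4-bit quantization: each block of weights is stored as small integers plus one float scale. Real GGUF formats such as Q4_K_M use more elaborate block layouts, but the principle is the same.

```python
import numpy as np

def quantize_4bit(block: np.ndarray):
    """Map a block of float weights to integers in [-8, 7] plus a scale."""
    scale = np.abs(block).max() / 7.0
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from integers and scale."""
    return q.astype(np.float32) * scale

weights = np.random.randn(32).astype(np.float32)  # one block of 32 weights
q, scale = quantize_4bit(weights)
error = np.abs(weights - dequantize(q, scale)).mean()
print(f"mean absolute rounding error: {error:.4f}")
```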
4-bit LLM Quantization with GPTQ
Recent advancements in weight quantization allow us to run massive large language models on consumer hardware, like a LLaMA-30B model on an RTX 3090 GPU. This is possible thanks to novel 4-bit quantization techniques with minimal performance degradation, like GPTQ, GGML, and NF4.

🧠 Optimal Brain Quantization

For every layer \( \ell \) in the…
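The excerpt cuts off at the layer-wise setup; for context, the optimization problem that Optimal Brain Quantization (and GPTQ after it) solves for each layer is to find quantized weights \( \widehat{W}_\ell \) that minimize the output distortion on the layer inputs \( X_\ell \):

\[
\arg\min_{\widehat{W}_\ell} \; \bigl\| W_\ell X_\ell - \widehat{W}_\ell X_\ell \bigr\|_2^2
\]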