
Latest Posts

  • Demystifying LLMs: A Deep Dive into Large Language Models

    This blog post will delve into the intricacies of LLMs, exploring their inner workings, capabilities, future directions, and potential security concerns.

  • Building an LLM in 2024: A Detailed Guide

    This guide delves into the process of building an LLM from scratch, focusing on the often-overlooked aspects of training and data preparation. We’ll also touch on fine-tuning, inference, and the importance of sharing your work with the community.

  • LISA: A Simple But Powerful Way to Fine-Tune LLM Efficiently

    LISA introduces a surprisingly simple yet effective strategy for fine-tuning LLMs. It builds on a key observation about LoRA: the weight norms across layers are heavily skewed. The bottom (embedding) and top (head) layers dominate the updates, while the middle layers contribute very little.
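
    Below is a minimal sketch of the idea in PyTorch, assuming a LLaMA/Mistral-style module layout (model.model.embed_tokens, model.model.layers, model.lm_head); it illustrates layerwise sampling and is not the authors' implementation.

    ```python
    import random

    def lisa_select_layers(model, n_active_layers=2):
        """Freeze every transformer block, then randomly unfreeze a few.
        The embedding and LM head stay trainable, mirroring the observation
        that the bottom and top layers carry most of the update."""
        for p in model.parameters():
            p.requires_grad = False
        for p in model.model.embed_tokens.parameters():
            p.requires_grad = True
        for p in model.lm_head.parameters():
            p.requires_grad = True
        layers = model.model.layers
        for idx in random.sample(range(len(layers)), n_active_layers):
            for p in layers[idx].parameters():
                p.requires_grad = True

    # During training, re-sample the active layers every K optimizer steps:
    # if step % K == 0:
    #     lisa_select_layers(model, n_active_layers=2)
    ```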

  • QMoE: Bringing Trillion-Parameter Models to Commodity Hardware

    This blog post delves into QMoE, a novel compression and execution framework that tackles the memory bottleneck of massive MoEs. QMoE introduces a scalable algorithm that compresses trillion-parameter MoEs to less than 1 bit per parameter, utilizing a custom format and bespoke GPU decoding kernels for efficient end-to-end compressed inference.
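
    To make the sub-1-bit figure concrete, here is a back-of-the-envelope memory calculation (the 1.6T parameter count and the ~0.8 bits/parameter rate are ballpark numbers for the paper's largest model, used purely for illustration):

    ```python
    # Rough memory math for a trillion-parameter MoE (illustrative numbers).
    params = 1.6e12                      # parameters in a trillion-scale MoE

    bf16_gb = params * 16 / 8 / 1e9      # 16 bits per parameter
    qmoe_gb = params * 0.8 / 8 / 1e9     # ~0.8 bits per parameter after compression

    print(f"bf16 weights:      {bf16_gb:,.0f} GB")   # ~3,200 GB
    print(f"sub-1-bit weights: {qmoe_gb:,.0f} GB")   # ~160 GB
    ```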

  • AnimateDiff: Paper Explained

    Introducing AnimateDiff, a framework that lets you animate your personalized text-to-image (T2I) models without complex, model-specific tuning. You can breathe life into your unique creations and turn them into smooth, visually appealing animations.

  • Jamba: A hybrid model (GPT + Mamba) by AI21 Labs

    Jamba boasts a 256K-token context window, allowing it to consider a vast amount of preceding information when processing a task. This extended context window is particularly beneficial for tasks requiring a deep understanding of a long conversation or passage.

  • DBRX: A New State-of-the-Art Open LLM by Databricks

    DBRX uses a transformer-based, decoder-only architecture with a fine-grained Mixture-of-Experts (MoE) design: each token is routed to a small subset of many smaller expert networks (16 experts, 4 active per token) rather than passing through a single massive feed-forward block.
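
    To give a feel for what fine-grained MoE routing means, here is a generic top-k router sketch in PyTorch (sizes are illustrative; this is not DBRX's actual code):

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        """Generic top-k Mixture-of-Experts layer: each token is sent to
        k of n_experts small feed-forward networks, and their outputs are
        combined with the router's gate weights."""
        def __init__(self, d_model=512, d_ff=1024, n_experts=16, k=4):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, n_experts, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):                              # x: (tokens, d_model)
            gates = F.softmax(self.router(x), dim=-1)      # (tokens, n_experts)
            weights, idx = gates.topk(self.k, dim=-1)      # (tokens, k)
            weights = weights / weights.sum(-1, keepdim=True)
            out = torch.zeros_like(x)
            for slot in range(self.k):
                for e in range(len(self.experts)):
                    mask = idx[:, slot] == e               # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
            return out
    ```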

  • Fine-tune an Instruct model over raw text data

    This experiment looks for a lighter approach that sits between the limits of a 128K context window and the cost of fine-tuning a model on billions of tokens, aiming instead for something in the realm of tens of millions of tokens. For a smaller-scale test, I’ll fine-tune Mistral’s 7B Instruct v0.2 model on The Guardian’s manage-frontend…
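
    For readers who want to try something similar, a minimal LoRA-based continued-training setup with Hugging Face transformers, datasets, and peft could look like the following (the corpus file, hyperparameters, and LoRA settings are placeholders, not the post's exact configuration):

    ```python
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model_id = "mistralai/Mistral-7B-Instruct-v0.2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # LoRA keeps the trainable parameter count small enough for a single GPU.
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

    # "corpus.txt" is a placeholder for the raw text you want the model to absorb.
    dataset = load_dataset("text", data_files="corpus.txt")["train"]
    dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                          remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                               gradient_accumulation_steps=8, num_train_epochs=1,
                               learning_rate=2e-5, bf16=True, logging_steps=10),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    ```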

  • Crew AI Tutorial

    In artificial intelligence, multi-agent systems (MAS) built with CrewAI represent a shift towards more dynamic and complex problem-solving. This blog dives into the essence of multi-agent systems, explains why they matter in today’s technological landscape, and explores the CrewAI framework as a possible way to build them.
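
    For a taste of what the framework looks like in code, here is a minimal two-agent crew (the roles, goals, and tasks are made up for illustration; check the CrewAI docs for current APIs):

    ```python
    from crewai import Agent, Task, Crew

    researcher = Agent(
        role="Researcher",
        goal="Collect key facts about multi-agent systems",
        backstory="An analyst who digs up accurate, up-to-date information.",
    )
    writer = Agent(
        role="Writer",
        goal="Turn research notes into a short, readable summary",
        backstory="A technical writer who explains complex ideas simply.",
    )

    research_task = Task(
        description="Gather the main ideas behind multi-agent systems.",
        expected_output="A bullet list of key points.",
        agent=researcher,
    )
    write_task = Task(
        description="Write a three-paragraph summary from the research notes.",
        expected_output="A concise summary in plain English.",
        agent=writer,
    )

    crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
    result = crew.kickoff()
    print(result)
    ```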