
Latest Posts

  • Unveiling Smaug: The Open-Source LLM King

    Abacus.ai has introduced Smaug, an enhanced version of Alibaba's Qwen-72B that now leads the open-source field as the first open-source model to achieve an average score of 80 across various benchmarks. It also suggests that we have finally found a method that narrows the gap…

  • Paper – GPT-NeoX

    GPT-NeoX-20B is an autoregressive language model trained on the Pile, and the largest dense autoregressive model with publicly available weights at the time of submission. Model Architecture GPT-NeoX-20B’s architecture largely follows that of GPT-3 with a few notable deviations. It has 44 layers, a hidden dimension size of 6144, and 64 heads. Rotary Positional… (A configuration sketch using these numbers appears after this list.)

  • Paper – UniLMv2

    UniLMv2 introduces a novel training procedure, the pseudo-masked language model (PMLM), which learns inter-relations between corrupted tokens and their context via autoencoding and intra-relations between masked spans via partially autoregressive modeling, significantly advancing language models across diverse NLP tasks. Overview of PMLM pre-training. The model parameters are shared across the LM objectives.… (A toy illustration of the two factorizations appears after this list.)

  • Paper – UniLM

    The UNIfied pre-trained Language Model (UniLM) is pre-trained on three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. A shared Transformer network with task-specific self-attention masks controls which context each prediction conditions on, so the model can be fine-tuned for both natural language understanding and generation tasks. Methodology Overview of unified LM pre-training.… (A sketch of the three attention masks appears after this list.)

  • Paper – Zephyr

    Zephyr is a 7B LLM that applies distilled Direct Preference Optimization (dDPO) to AI Feedback (AIF) preference data, significantly improving intent alignment in chat-based language modeling without requiring human annotation. Method The approach follows stages similar to InstructGPT's. Distilled Supervised Fine-Tuning (dSFT) Starting with a raw LLM, it first needs… (A minimal sketch of the DPO objective appears after this list.)

  • Paper – CodeFusion

    Auto-regressive models for code generation have a limitation: they do not easily allow reconsidering tokens generated earlier. CodeFusion is a 75M-parameter pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. Architecture Architecture diagram for CodeFusion showing the Encoder (E), Denoiser (N) and the… (A simplified sketch of the denoising loop appears after this list.)

  • Paper – Llemma

    Llemma is an LLM for mathematics, formed by continued pretraining of Code Llama on Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code. Llemma is capable of tool use and formal theorem proving without any further finetuning. Data Proof-Pile-2, a 55B-token mixture of scientific papers, web data containing mathematics, and mathematical…

  • Paper – GPT4V

    GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user. Incorporating additional modalities (such as image inputs) into LLMs is a key frontier in artificial intelligence research and development. Similar to GPT-4, the GPT-4V pre-trained model was first trained to predict the next word in a document, using…

  • Paper – GPT4

    GPT-4 is a large-scale, multimodal Transformer-based model pre-trained to predict the next token in a document; it accepts image and text inputs and produces text outputs. GPT-4 is trained using both publicly available data (such as internet data) and data licensed from third-party providers. The post-training alignment process, i.e. fine-tuning using Reinforcement Learning…
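
Code sketches for the excerpts above

The GPT-NeoX excerpt lists the model's core dimensions: 44 layers, a hidden size of 6144, 64 attention heads, and rotary positional embeddings. Below is a minimal configuration sketch of that geometry using Hugging Face's GPTNeoXConfig; every value the excerpt does not state (feed-forward width, context length, rotary fraction, vocabulary size) is an assumption, not a claim about the released checkpoint.

```python
# Minimal sketch of the GPT-NeoX-20B geometry described in the excerpt.
# Only the layer count, hidden size, head count, and use of rotary embeddings
# come from the excerpt; the remaining values are assumed defaults.
from transformers import GPTNeoXConfig

config = GPTNeoXConfig(
    num_hidden_layers=44,          # 44 transformer layers
    hidden_size=6144,              # hidden dimension size
    num_attention_heads=64,        # 64 attention heads
    intermediate_size=4 * 6144,    # assumed 4x feed-forward expansion
    max_position_embeddings=2048,  # assumed context length
    rotary_pct=0.25,               # assumed fraction of head dims given RoPE
    vocab_size=50432,              # assumed tokenizer vocabulary size
)
print(config)
```

Instantiating `GPTNeoXForCausalLM(config)` from this configuration would allocate roughly 20B randomly initialised parameters, so in practice one loads the published checkpoint instead.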
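
For the UniLMv2 excerpt, here is a toy illustration of the two factorizations PMLM trains jointly: autoencoding predicts each corrupted token independently from the unmasked context, while partially autoregressive modeling predicts masked spans block by block, each block conditioned on the context plus the blocks already generated. The token names and block order below are invented for the example.

```python
# Toy illustration of the PMLM factorizations (token names are made up).
tokens = ["x1", "x2", "x3", "x4", "x5", "x6"]
blocks = [["x4", "x5"], ["x2"]]    # masked spans, in a sampled prediction order
context = ["x1", "x3", "x6"]       # tokens left unmasked

# Autoencoding (MLM-style): each masked token is predicted independently,
# conditioned only on the unmasked context.
ae_factors = [f"p({tok} | {', '.join(context)})"
              for block in blocks for tok in block]

# Partially autoregressive: spans are predicted one block at a time, each block
# conditioned on the context plus all previously generated blocks.
pa_factors, seen = [], list(context)
for block in blocks:
    pa_factors.append(f"p({', '.join(block)} | {', '.join(seen)})")
    seen += block

print("autoencoding:            ", " * ".join(ae_factors))
print("partially autoregressive:", " * ".join(pa_factors))
```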
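
For the UniLM excerpt, the sketch below builds the three self-attention masks that let one shared Transformer cover unidirectional, bidirectional, and sequence-to-sequence prediction: each entry is True where a query position may attend to a key position. The helper name and segment lengths are assumptions for illustration, not the authors' code.

```python
import torch

def unilm_masks(src_len: int, tgt_len: int):
    """Attention masks over a packed [source; target] sequence."""
    n = src_len + tgt_len
    # Bidirectional LM: every token attends to every token.
    bidirectional = torch.ones(n, n, dtype=torch.bool)
    # Unidirectional (left-to-right) LM: token i attends to tokens <= i.
    unidirectional = torch.tril(torch.ones(n, n)).bool()
    # Sequence-to-sequence LM: source tokens attend to the whole source;
    # target tokens attend to the whole source plus preceding target tokens.
    seq2seq = torch.zeros(n, n, dtype=torch.bool)
    seq2seq[:src_len, :src_len] = True
    seq2seq[src_len:, :src_len] = True
    seq2seq[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len)).bool()
    return bidirectional, unidirectional, seq2seq

bi, uni, s2s = unilm_masks(src_len=3, tgt_len=2)
print(s2s.int())  # row = query position, column = key position
```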
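
For the Zephyr excerpt, here is a minimal sketch of the Direct Preference Optimization objective at the heart of the dDPO step, written over precomputed sequence log-probabilities; in Zephyr the chosen/rejected pairs come from AI feedback rather than human annotators. The function and argument names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a tensor of summed per-sequence log-probabilities for the
    chosen and rejected completions under the policy being trained and under
    the frozen reference model (the dSFT model in Zephyr's pipeline).
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to prefer the chosen completion over the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of two preference pairs (the log-probabilities are made up).
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-15.0, -11.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-14.0, -10.5]))
print(loss.item())
```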
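
For the CodeFusion excerpt, a heavily simplified sketch of the inference idea: the whole program starts as Gaussian noise and is iteratively denoised conditioned on the encoded natural-language query, so earlier tokens can still change at every step. The modules, conditioning scheme, and noise schedule below are stand-ins, not the paper's architecture or sampler.

```python
import torch
import torch.nn as nn

d_model, prog_len, steps = 512, 128, 50

# Stand-in encoder E and denoiser N (the real model also has a decoder D
# that maps the denoised embeddings back to code tokens).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
denoiser = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)

def generate(nl_embeddings: torch.Tensor) -> torch.Tensor:
    """nl_embeddings: (1, nl_len, d_model) embedded natural-language query."""
    condition = encoder(nl_embeddings)       # E(s): encode the query once
    x = torch.randn(1, prog_len, d_model)    # the full program starts as noise
    for t in range(steps, 0, -1):
        # Predict a cleaner program state; conditioning here is done by simply
        # concatenating the encoded query along the sequence axis (assumption).
        denoised = denoiser(torch.cat([condition, x], dim=1))[:, -prog_len:]
        x = denoised + ((t - 1) / steps) * torch.randn_like(denoised)  # crude schedule
    return x

with torch.no_grad():
    program_embeddings = generate(torch.randn(1, 16, d_model))
print(program_embeddings.shape)
```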