Latest Posts
Unveiling Smaug: The Open-Source LLM King
Abacus.ai has introduced Smaug, an enhanced version of Alibaba's Qwen-72B, which now stands as the king of the open-source field: it is the first open-source model to achieve an average score of 80 across various benchmarks. It also serves as strong evidence that we have at last found a method that narrows the divide…
Paper – GPT-NeoX
GPT-NeoX-20B is an autoregressive language model trained on the Pile, and the largest dense autoregressive model that had publicly available weights at the time of submission. Model Architecture: GPT-NeoX-20B's architecture largely follows that of GPT-3 with a few notable deviations. It has 44 layers, a hidden dimension size of 6144, and 64 heads. Rotary Positional…
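As a rough, illustrative sketch of the hyperparameters quoted above (not the authors' actual configuration format), the shape of a GPT-NeoX-20B-style config could be captured in a small Python dataclass; the class and field names below are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class GPTNeoXConfig:
    """Illustrative GPT-NeoX-20B-style hyperparameters; names are assumed, values are from the excerpt."""
    num_layers: int = 44                 # 44 Transformer layers
    hidden_size: int = 6144              # hidden dimension size
    num_attention_heads: int = 64        # attention heads
    head_dim: int = 6144 // 64           # 96-dimensional heads (hidden_size / num_attention_heads)
    use_rotary_embeddings: bool = True   # rotary positional embeddings, as mentioned in the excerpt

config = GPTNeoXConfig()
print(config.head_dim)  # -> 96
```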
Paper – UniLMv2
UniLMv2 introduces a novel training procedure, the pseudo-masked language model (PMLM), which enables efficient learning of inter-relations between corrupted tokens and context via autoencoding, as well as intra-relations between masked spans via partially autoregressive modeling, significantly advancing the capabilities of language models in diverse NLP tasks. Figure: Overview of PMLM pre-training; the model parameters are shared across the LM objectives.…
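As a conceptual sketch only (not the paper's implementation), the code below illustrates the two factorizations PMLM combines: under autoencoding each masked token is predicted from all unmasked context, while under partially autoregressive modeling masked spans are predicted block by block, with earlier spans becoming context for later ones. The function and variable names are assumptions.

```python
def pmlm_factorizations(tokens, spans):
    """Illustrative: list which context positions each prediction conditions on
    under autoencoding (AE) vs. partially autoregressive (PAR) modeling."""
    masked = {i for start, end in spans for i in range(start, end)}
    visible = [i for i in range(len(tokens)) if i not in masked]

    # Autoencoding: every masked token is predicted independently,
    # conditioned on all unmasked context positions.
    ae_steps = [(i, list(visible)) for i in sorted(masked)]

    # Partially autoregressive: spans are predicted block by block; each block
    # conditions on the unmasked context plus the spans predicted before it.
    par_steps, revealed = [], list(visible)
    for start, end in spans:
        block = list(range(start, end))
        par_steps.append((block, sorted(revealed)))
        revealed.extend(block)
    return ae_steps, par_steps

ae, par = pmlm_factorizations("the cat sat on the mat".split(), spans=[(1, 2), (4, 6)])
```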
Paper – UniLM
The UNIfied pre-trained Language Model (UNILM) is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. It employs a shared Transformer network and uses specific self-attention masks to control what context the prediction conditions on, and can thus be fine-tuned for both natural language understanding and generation tasks. Methodology: Overview of unified LM pre-training.…
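The task-specific self-attention masks are the key mechanism, and they are easy to visualize; below is a small NumPy sketch (a toy reconstruction, not the authors' code) where rows are query positions, columns are key positions, and 1 means attention is allowed.

```python
import numpy as np

def unilm_masks(src_len: int, tgt_len: int):
    """Toy UNILM-style self-attention masks for a sequence of src_len + tgt_len tokens."""
    n = src_len + tgt_len

    # Bidirectional LM: every token may attend to every other token.
    bidirectional = np.ones((n, n), dtype=int)

    # Unidirectional (left-to-right) LM: causal, lower-triangular mask.
    unidirectional = np.tril(np.ones((n, n), dtype=int))

    # Sequence-to-sequence LM: source attends bidirectionally within the source;
    # target attends to the full source plus its own left context.
    seq2seq = np.zeros((n, n), dtype=int)
    seq2seq[:src_len, :src_len] = 1   # source -> source
    seq2seq[src_len:, :src_len] = 1   # target -> source
    seq2seq[src_len:, src_len:] = np.tril(np.ones((tgt_len, tgt_len), dtype=int))  # target -> target prefix
    return bidirectional, unidirectional, seq2seq

bi, uni, s2s = unilm_masks(src_len=3, tgt_len=2)
print(s2s)
```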
Paper – Zephyr
Zephyr is a 7B LLM that uses distilled Direct Preference Optimization (dDPO) on AI Feedback (AIF) preference data to achieve superior intent alignment in chat-based language modeling without requiring human annotation. Method: The approach follows similar stages as InstructGPT. Distilled Supervised Fine-Tuning (dSFT): Starting with a raw LLM, it first needs…
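The dDPO stage optimizes the standard DPO objective on the AIF preference pairs; a minimal PyTorch sketch of that loss is shown below. It is illustrative only and assumes you already have summed log-probabilities of each chosen and rejected response under both the policy and the frozen dSFT reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (illustrative sketch).
    All inputs are per-example response log-probabilities of shape (batch,)."""
    # Implicit rewards: how much the policy prefers each response relative to the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Push the margin between chosen and rejected rewards to be large.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
```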
Paper – CodeFusion
Auto-regressive models for code generation have a limitation: they do not easily allow reconsidering earlier generated tokens. CodeFusion is a 75M-parameter pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. Architecture: Architecture diagram for CodeFusion showing the Encoder (E), Denoiser (N) and the…
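As a toy sketch of the idea (stand-in stubs, not the CodeFusion model itself): generation starts from Gaussian noise over the embeddings of the whole program, a denoiser repeatedly refines the full sequence conditioned on the encoded natural-language prompt, and a decoder finally maps the refined embeddings to code tokens, so no position is ever frozen the way earlier tokens are in autoregressive decoding.

```python
import numpy as np

SEQ_LEN, DIM, STEPS = 16, 32, 10
rng = np.random.default_rng(0)

def encode_nl(prompt):
    """Stub for the encoder E: map the natural-language prompt to a conditioning vector."""
    local = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return local.normal(size=DIM)

def denoise(x, t, cond):
    """Stub for the denoiser N: one refinement step over the full noisy program embedding."""
    return 0.9 * x + 0.1 * cond  # placeholder for a learned denoising network

def decode(x):
    """Stub decoder: map final embeddings to discrete code tokens."""
    return x.sum(axis=-1).astype(int).tolist()

cond = encode_nl("reverse a linked list")
x = rng.normal(size=(SEQ_LEN, DIM))   # start from pure noise over the complete program
for t in reversed(range(STEPS)):      # every step revisits *all* positions,
    x = denoise(x, t, cond)           # so earlier tokens can still be reconsidered
tokens = decode(x)
```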
Paper – Llemma
Llemma is an LLM for mathematics, formed by continued pretraining of Code Llama on Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code. Llemma is capable of tool use and formal theorem proving without any further finetuning. Data: Proof-Pile-2, a 55B-token mixture of scientific papers, web data containing mathematics, and mathematical…
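Because Llemma is released as an ordinary causal language model, trying it on a math prompt takes a few lines with Hugging Face `transformers`; the checkpoint id `EleutherAI/llemma_7b` and the generation settings below are assumptions for illustration, not details from the post.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/llemma_7b"  # assumed repository id; substitute the release you use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Problem: Compute the derivative of x^3 * sin(x).\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```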
Paper – GPT-4V
GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user. Incorporating additional modalities (such as image inputs) into LLMs is a key frontier in artificial intelligence research and development. Similar to GPT-4, the GPT-4V pre-trained model was first trained to predict the next word in a document, using…
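To try image inputs yourself, a minimal sketch with the OpenAI Python client is below; the model id and image URL are placeholders, and the vision-capable model names available to your account may differ.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: use whichever vision-capable GPT-4 model id you have access to
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```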
Paper – GPT-4
GPT-4 is a large-scale, multimodal, Transformer-based model pre-trained to predict the next token in a document; it can accept image and text inputs and produce text outputs. GPT-4 is trained using both publicly available data (such as internet data) and data licensed from third-party providers. The post-training alignment process, i.e. fine-tuning using Reinforcement Learning…