Paper – GPT-3

GPT-3 is an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model. It demonstrates that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches (a prompt-construction sketch of this few-shot setup appears at the end of this section).

Model and Architectures

GPT-3 uses the same model architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization described in that paper, with the exception that it uses alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer.
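To make the architectural points concrete, here is a minimal PyTorch sketch of a pre-normalization transformer block with GPT-2-style modified initialization. This is an illustrative reconstruction, not the paper's actual code: dense attention stands in for the alternating dense/sparse patterns, causal masking is omitted for brevity, and the 0.02/√(2N) residual-weight scaling follows common GPT-2 reimplementations of the "modified initialization".

```python
# A minimal pre-norm transformer block sketch (assumed, not the paper's code).
import math
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_layers: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Modified initialization (GPT-2 style): scale residual-path
        # projection weights by 1/sqrt(N), N = number of residual layers
        # (two per block: attention and MLP).
        for proj in (self.attn.out_proj, self.mlp[2]):
            nn.init.normal_(proj.weight, std=0.02 / math.sqrt(2 * n_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-normalization: LayerNorm is applied *before* each sublayer,
        # then the result is added back to the residual stream.
        # (Causal attention masking omitted here for brevity.)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))
        return x

x = torch.randn(1, 8, 64)                          # (batch, seq, d_model)
print(PreNormBlock(64, 4, n_layers=12)(x).shape)   # torch.Size([1, 8, 64])
```

The design choice pre-norm captures: normalizing the input to each sublayer rather than its output keeps the residual stream unmodified by LayerNorm, which stabilizes training as depth grows.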
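And for the few-shot setting referenced above, here is a minimal sketch of how a task is specified purely through the prompt, with no gradient updates. The `build_few_shot_prompt` helper, the translation task, and the `=>` separator are illustrative assumptions, not the paper's evaluation harness; the paper's actual prompts vary by task.

```python
# Hypothetical prompt builder illustrating few-shot in-context learning.
def build_few_shot_prompt(description: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Concatenate a task description, K solved examples, and the query."""
    lines = [description, ""]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the frozen model completes this line
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
)
print(prompt)
# The prompt is fed to the language model as-is, and the answer is read
# off the autoregressive continuation. K=2 here; GPT-3 is evaluated with
# as many demonstrations as fit in its context window.
```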