Paper – Mistral

Mistral 7B is an LLM engineered for superior performance and efficiency. It leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to handle sequences of arbitrary length at reduced inference cost. Mistral 7B outperforms the best open 13B model (Llama 2) across all evaluated benchmarks, and the best released 34B model (Llama 1) in reasoning, mathematics, and code generation.
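
To make the SWA idea concrete, here is a minimal NumPy sketch of a causal sliding-window attention step: each position attends only to the previous W tokens, so per-token cost scales with W rather than with the full sequence length. This is an illustrative toy, not Mistral's implementation; the function name and shapes are hypothetical, though the paper's actual window size is W = 4096.

```python
# Minimal sketch of sliding window attention (illustrative, not Mistral's code).
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Causal attention where position i attends only to positions
    [i - window + 1, i]. q, k, v have shape (seq_len, d)."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (seq_len, seq_len) attention logits
    idx = np.arange(seq_len)
    # Allowed pairs: causal (j <= i) and within the window (j > i - window).
    allowed = (idx[None, :] <= idx[:, None]) & (idx[None, :] > idx[:, None] - window)
    scores = np.where(allowed, scores, -np.inf)  # mask disallowed positions
    # Row-wise softmax; each row has at least one finite entry (j = i).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: 8 tokens, window of 3 -- the last token attends to tokens 5, 6, 7.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
k = rng.standard_normal((8, 16))
v = rng.standard_normal((8, 16))
out = sliding_window_attention(q, k, v, window=3)
print(out.shape)  # (8, 16)
```

Because each layer only looks back W tokens, information still propagates further through depth: after k layers, a token can indirectly attend to roughly k * W positions, which is how the model handles sequences longer than the window.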