
Paper – Alpaca

Alpaca is fine-tuned from Meta’s LLaMA 7B model on 52K instruction-following demonstrations generated in the style of self-instruct using text-davinci-003. On the self-instruct evaluation set, Alpaca shows many behaviors similar to OpenAI’s text-davinci-003, while being surprisingly small and cheap to reproduce.

Alpaca is intended only for academic research; any commercial use is prohibited. Three factors inform this decision:

  1. Alpaca is based on LLaMA, which has a non-commercial license, so this restriction is necessarily inherited.
  2. The instruction data is based on OpenAI’s text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI.
  3. Adequate safety measures have not been designed, so Alpaca is not ready to be deployed for general use.

Training

Training a high-quality instruction-following model under an academic budget poses two important challenges: obtaining a strong pretrained language model and obtaining high-quality instruction-following data.

The first challenge is addressed by the recent release of Meta’s LLaMA models.

For the second challenge, instruction-following demonstrations were generated by building on the self-instruct method. The process started from the 175 human-written instruction-output pairs in the self-instruct seed set, then prompted text-davinci-003 to generate more instructions using the seed set as in-context examples (a sketch of one generation step follows). Run through the OpenAI API, this process produced 52K unique instructions with corresponding outputs.
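
As a minimal sketch, one generation step looks roughly like the following, assuming the OpenAI Python SDK (v1). The seed-file name and format, the prompt template, and the sampling parameters are illustrative assumptions rather than the authors’ exact recipe, and text-davinci-003 has since been retired.

```python
import json
import random

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical local copy of the 175 human-written seed tasks,
# one JSON object per line with "instruction" and "output" fields.
with open("seed_tasks.jsonl") as f:
    seed_tasks = [json.loads(line) for line in f]

def build_prompt(examples):
    """Format a few seed pairs as in-context examples and ask for a new one."""
    header = "Come up with a new instruction and its output.\n\n"
    demos = "\n\n".join(
        f"Instruction: {t['instruction']}\nOutput: {t['output']}"
        for t in examples
    )
    return header + demos + "\n\nInstruction:"

# One generation step: sample in-context examples from the seed set
# and ask the model to continue with a new instruction-output pair.
prompt = build_prompt(random.sample(seed_tasks, 3))
response = client.completions.create(
    model="text-davinci-003",  # the model named in the paper; now retired
    prompt=prompt,
    max_tokens=512,
    temperature=1.0,
)
print(response.choices[0].text)
```

The full pipeline repeats this step in batches, parsing and deduplicating the results until the pool reaches 52K unique instructions.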

Equipped with this instruction-following dataset, the LLaMA models were then fine-tuned.
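
The fine-tuning amounts to standard supervised training on the concatenated instruction-response text. The sketch below uses Hugging Face Transformers; the checkpoint name, data file, prompt template, and hyperparameters are illustrative assumptions, not the authors’ exact configuration.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "huggyllama/llama-7b"  # assumed checkpoint; LLaMA access terms apply
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# "alpaca_data.json" is a hypothetical local copy of the 52K demonstrations.
dataset = load_dataset("json", data_files="alpaca_data.json")["train"]

def tokenize_example(ex):
    # Concatenate instruction, optional input, and output into one sequence.
    text = f"Instruction: {ex['instruction']}\n"
    if ex.get("input"):
        text += f"Input: {ex['input']}\n"
    text += f"Response: {ex['output']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize_example, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="alpaca-ft",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    # Causal-LM collator: labels are the input ids, shifted inside the model.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice, fine-tuning even the 7B model at these settings requires multiple GPUs or parameter-efficient methods to fit in memory.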

Preliminary Evaluation

Alpaca is evaluated by conducting human evaluations on the inputs from the self-instruct evaluation set. A blind pairwise comparison between text-davinci-003 and Alpaca 7B found that the two models exhibit very similar performance: Alpaca won 90 of the 179 comparisons, versus 89 for text-davinci-003.

Known limitations

  • Alpaca also exhibits several common deficiencies of language models, including hallucination, toxicity, and stereotypes. Hallucination in particular seems to be a common failure mode for Alpaca, even compared to text-davinci-003.
  • Furthermore, Alpaca can be used to generate well-written outputs that spread misinformation.

Paper

Alpaca: A Strong, Replicable Instruction-Following Model