Llama-2-70b is almost as strong at factuality as gpt-4, and considerably better than gpt-3.5-turbo.
Fleszarjacek

Fleszarjacek @fleszar

About: Programmer, Database, Python, Java, Data Science, Data Analyst

Joined: Aug 9, 2023


Publish Date: Aug 25 '23

We compared Llama 2 7b, 13b and 70b (chat-hf fine-tuned) against OpenAI gpt-3.5-turbo and gpt-4. We used a 3-way verified, hand-labeled set of 373 news report statements and presented each model with one correct and one incorrect summary of each statement. Each LLM had to decide which of the two summaries was factually correct.
https://link.medium.com/ugIcBrTXxCb
