3 points | by ytpete 6 hours ago
1 comment
- When GPT-4 was asked to evaluate resume executive summaries, it preferred ones written by GPT over human-written ones more than 93% of the time.
- A similar "bias" was exhibited by other models, including LLaMA 3.3 and DeepSeek V3.
- Even when human annotators judged the human-written summary to be higher quality, leading LLMs still preferred their own writing 67-82% of the time.
- Preference was stronger in larger models.
- In several cases, LLMs also preferred their own writing over that of other LLMs.
There's a pretty decent longer summary in this thread where I first heard about the article: https://x.com/heynavtoor/status/2048088874686300431