1 point | by gpvos 7 hours ago
1 comment
The underlying data looks scarce. If there are only a few questions per "category" of bullshit, they can easily be gamed to favor one model over another.