I don't know what the Y-axis is supposed to be on that Wharton AI capabilities graph, but I am not really convinced that Opus 4.6 has more than double the intelligence/capability/whatever of GPT 5.1 Max.
IIRC that graph tracks capabilities as time_to_solve a task for humans (i.e. the model can now handle tasks that usually take a human ~8h). Which, depending on what tasks you look at, could be a reasonable finding. I could see Opus 4.6 handling tasks that take ~8h for humans, and that 5.1 couldn't previously handle (with 5.1 being "limited" at 4h tasks let's say). It is a bit arbitrary, but I think this is what they're tracking.
"It is a bit arbitrary, but I think this is what they're tracking."
I don't know if they can get their numbers right this way, but this seems like a far more useful metric than theoretical capabilities.
https://metr.org/time-horizons/ on a linear scale. Clickbait garbage article, like most of his in the last year.
…yeah, that’s where you see the exponential?
But they do explain the improvement of AI driving 2017-2021 vs 2022-2026.
Well, curve shape aside, the high watermark might be lower than where it tapers off.
https://news.ycombinator.com/item?id=46199723
News flash: predicting the future is hard
The individual who is the best at predicting the future is predicting ASI and full labor automation by 2040:
https://xcancel.com/peterwildeford/status/202963666232244661...
Past results are no guarantee of future performance.
Hmmm, this is quite an interesting take by Scott.
Lindy's Law is not actually a law and many exact minds will be provoked by the very name; it also fails spectacularly in certain contexts (e.g. lifetime of a single organism, though not necessarily existence of entire species).
But at the same time, I am willing to take its invocation in the context of AI somewhat seriously. There is an international arms race with China, which has less compute, but more engineers and scientists. This sort of intellectual arms race does not exhaust itself easily.
A similar space race in the 1950s and 1960s progressed from the first unmanned spaceflight to a moonwalk in a mere 12 years, which is probably less than what it takes to approve a bicycle lane in Chicago now.
"There is an international arms race with China"
I keep seeing this. Where did it come from? Has China said that they intend to attack other countries using AI? Have other countries declared that they intend to attack China with AI?
Also, why does anyone believe that AI could actually be that dangerous, given its inherently unpredictable and unreliable performance? I would be terrified to rely on AI in a life-or-death situation.
AI in war is like Palantir's whole business model. You have a system that can effectively deal with ambiguity and has superhuman performance on reasoning plus superhuman physical abilities via embodiment…
Inherently unpredictable and unreliable performance is quite the feature of human beings as well.
https://www.forbes.com/sites/greatspeculations/2025/11/25/wh...
It was a metaphor. I meant, and later clarified, an intellectual arms race.
BTW your handle is an actual Czech word, minus a diacritic sign ("křupan"), and a somewhat amusing one. It basically means hillbilly. Not that it matters, just FYI.
https://xkcd.com/605/
"Exponentials all tend to become sigmoids but you can't predict exactly when" is a true statement, but I'm not sure it needed an article.
This doesn't say much, and the author fights their own points a couple times, suggesting that they maybe didn't think through what they wanted to write until they were in the middle of writing it and started realizing their assumptions didn't match what they expected the data to say.
I really don't get the point of what I just read.
If you use the log scale you'll see that the time horizon of Opus 4.6 was as expected...
As expected by the exponential. The Wharton study was predicting when the exponential would turn into a sigmoid.
Everything is linear on a log-log scale with a fat marker.
A lot of words to say "The initial part of a sigmoidal curve is not very informative about the parameters of the sigmoid function in question."
That is true, but I generally enjoy reading a lot of words from Scott, who has a talent for writing.
The entire plot of the Lord of the Rings could probably be compressed into less than 10 kB of text too.
Edit: this seems to be a controversial comment, but IMHO a blog of Scott Alexander's type is an art form, not just a communication channel.
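For anyone who wants to poke at the "initial part of a sigmoidal curve is not very informative" point from upthread, here is a rough sketch. It assumes a plain logistic curve, and every number in it (the two ceilings, the growth rate, the sample times) is made up for illustration, not taken from the Wharton or METR data. Two logistics whose ceilings differ by 100x are built to share the same early exponential, then compared early and late.

```python
import math


def logistic(t, ceiling, rate=1.0, scale=1.0):
    """Logistic curve that looks like scale * exp(rate * t) while far below its ceiling.

    All parameters are hypothetical; nothing here is fitted to real benchmark data.
    """
    return ceiling / (1.0 + (ceiling / scale) * math.exp(-rate * t))


def relative_gap(a, b):
    """Relative difference between two positive values, in [0, 1)."""
    return abs(a - b) / max(a, b)


# Two made-up ceilings, 100x apart.
LOW_CEILING = 100.0
HIGH_CEILING = 10_000.0

# Early on, both curves are still far below their ceilings.
early_gaps = [
    relative_gap(logistic(t, LOW_CEILING), logistic(t, HIGH_CEILING))
    for t in (0.0, 0.5, 1.0)
]

# Much later, the low-ceiling curve has saturated and the other has not.
late_gap = relative_gap(logistic(10.0, LOW_CEILING), logistic(10.0, HIGH_CEILING))

if __name__ == "__main__":
    print("early relative gaps:", [round(g, 4) for g in early_gaps])
    print("late relative gap:", round(late_gap, 4))
```

With these made-up parameters the early gaps stay within a few percent while the late gap is nearly total, so modest measurement noise on the early points would be enough to hide a 100x difference in where the curve tops out.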