Until I can get a robot wife/maid, I'm not worried about, or even confident I will ever see, actual AGI. People have been predicting it for as long as fusion power, and while progress has been made, we might still be like Romans dreaming of flight.
Dear sir, what does embodiment actually have to do with AGI? It's not much different than saying someone who is paralyzed is not intelligent.
What's more, our recent advances in AI have massively accelerated the evolution of robotics. Robots are becoming smarter, faster, and more capable at an ever-increasing rate.
> For example, transformer-based language models can't reliably do multi-digit arithmetic because they have no number sense, only statistical patterns over digit tokens.
I asked Opus 4.6 to multiply 50,651 x 895,707. It immediately responded: 45,368,055,057. This was wrong, but very close (good to about 10^-5 relative error). I then asked it how it arrived at that answer, and whether it had used any tool calls. It said:
```
No, I didn't use any specialized tools — I just computed it directly. Though I should be honest: large multiplications done "in my head" like this are prone to error. Let me double-check by breaking it down:
50,651 × 895,707
= 50,651 × (900,000 - 4,293)
= 50,651 × 900,000 - 50,651 × 4,293
First term: 50,651 × 900,000 = 45,585,900,000
Second term: 50,651 × 4,293
Result: 45,585,900,000 - 217,444,743 = 45,368,455,257
So my first answer was actually slightly off! The corrected answer is 45,368,455,257.
I don't have a calculator tool available — I'm doing this arithmetically by decomposition. For mission-critical calculations, I'd recommend verifying with a calculator since I can make arithmetic errors on large numbers like this.
```
So I would rate OP's claim (which is supported by a paper from 2024) highly implausible. Opus 4.6 appears to be able to do multi-digit arithmetic formally, as well as give remarkably accurate estimates based on something like "number sense".
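For anyone who wants to check the numbers themselves, here's a quick sketch in plain Python. Nothing model-specific; it just reproduces the decomposition Opus described and the relative error of its first guess:

```python
# Verify the exact product, the decomposition Opus walked through, and the
# relative error of its first (slightly wrong) answer.
a, b = 50_651, 895_707

exact = a * b                                  # 45,368,455,257
first_guess = 45_368_055_057                   # the immediate answer

# The decomposition the model used: b = 900,000 - 4,293
decomposed = a * 900_000 - a * 4_293
assert decomposed == exact

rel_error = abs(first_guess - exact) / exact
print(f"exact       = {exact:,}")
print(f"first guess = {first_guess:,}")
print(f"relative error ~ {rel_error:.1e}")     # ~ 8.8e-06, i.e. roughly 1e-5
```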
I don’t think that I ever wanted to do an arithmetic operation with a computer and have a wrong answer as the result. One day, you’ll be happy with typing ls and have the system ‘rm -rf /‘ itself
Except we know how these work. There's no number sense. It's predicting tokens. It is able to recount the mathematical foundations because that kind of decomposition appears often in its training data, both in instructional material and in proofs.
I think it's a really poor argument that AGI won't happen because the model doesn't understand the physical world. That can be trained the same way everything else is.
I think the biggest issue we currently have is with proper memory. But even that is because it's not feasible to post-train an individual model on its experiences at scale. It's not a fundamental architectural limitation.
When people move the goal posts for AGI toward a physical state, they are usually doing it so they can continue to raise more funding rounds at a higher valuation. Not saying the author is doing that.
AGI is a messy term, so to be concise, we have the models that can do work. What we lack is orchestration, management, and workflows to use models effectively. Give it 5 years and those will be built and they could be built using the models we have today (Opus 4.6 at the time of this message).
Manual orchestration is a brittle crutch IMO - you don't get to the moon by using longer and longer ladders. A powerful model should, in theory, be able to self-orchestrate with basic tools and an environment. The thing is that it also might be as expensive as a human to run, from a tokens AND a liability perspective.
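To make "self-orchestrate with basic tools and an environment" concrete, here's a minimal sketch of the kind of loop I mean. `call_model` and the tool set are hypothetical placeholders, not any particular vendor's API:

```python
import json
import subprocess

def call_model(messages: list[dict]) -> dict:
    """Hypothetical chat-completion call; swap in whatever client you use.
    Assumed to return either {"tool": name, "args": {...}} or {"answer": text}."""
    raise NotImplementedError

TOOLS = {
    # Deliberately basic: a shell and a file reader already cover a lot.
    "shell": lambda args: subprocess.run(
        args["cmd"], shell=True, capture_output=True, text=True
    ).stdout,
    "read_file": lambda args: open(args["path"]).read(),
}

def run(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = call_model(messages)
        if "answer" in step:                         # the model decides it is done
            return step["answer"]
        result = TOOLS[step["tool"]](step["args"])   # the model decides what to run
        messages.append({"role": "tool", "content": json.dumps(
            {"tool": step["tool"], "result": result[:4000]}  # crude truncation
        )})
    return "gave up: step budget exhausted"
```

The point of the ladder analogy: everything above the model here is just a while loop and a step budget. If the model can't plan, no amount of scaffolding around that loop fixes it.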
It just struck me that it would be fun to re-read The Age of Spiritual Machines (Kurzweil, 1999). I was so into it 26-27 years ago. The amount of ridicule this man has suffered on HN is immense.
That's not really the point. If our definition of AGI does not include "being able to reliably do logic" then what are we even talking about? We don't really need computers with human abilities--we have plenty of humans. We need computers with _better_ abilities.
OK, but "what we need" is not the question. If the definition of AGI is "as smart as the average human in all areas", then it doesn't matter if the average human is pretty useless at a lot of tasks, that's still the definition of AGI.
But I'd like to think that, even though you could find exceptions, the average human is never confused about whether dogs can lay eggs or not.
I've said it before and I'll say it again, all AI discussion feels like a waste of effort.
“yes it will”, “no it won’t” - nobody really knows, it's just a bunch of extremely opinionated people rehashing the same tired arguments across 800 comments per thread.
There’s no point in talking about it anymore, just wait to see how it all turns out.
>Imagine you had a frozen [large language] model that is a 1:1 copy of the average person, let’s say, an average Redditor. Literally nobody would use that model because it can’t do anything. It can’t code, can’t do math, isn’t particularly creative at writing stories. It generalizes when it’s wrong and has biases that not even fine-tuning with facts can eliminate. And it hallucinates like crazy often stating opinions as facts, or thinking it is correct when it isn't.
>The only things it can do are basic tasks nobody needs a model for, because everyone can already do them. If you are lucky you get one that is pretty good in a singular narrow task. But that's the best it can get.
>and somehow this model won't shut up and tell everyone how smart and special it is also it claims consciousness. ridiculous.
AGI is here. 90%+ of white-collar work _can_ be done by an LLM. We are simply missing a tested orchestration layer. Speaking broadly about knowledge work here, there is almost nothing that a human is better at than Opus 4.6. If you're a typical office worker whose job is done primarily on a computer, and if that's all AGI is, then yeah, it's here.
I ran a quick experiment with Claude and Perplexity, both free versions. I input some retirement info (portfolio balances, etc.), my age, my desired retirement age, and so on. Simple stuff that a financial planner would have no issue with. Perplexity was very, very good on the surface. It rarely made an obvious blunder or error, and was fast. Claude was much slower and, despite me inputting my exact birthdate, kept messing up my age by as much as 18 months. This obviously screws up retirement planning. I also asked some questions about how RMDs would affect my taxes, and asked for some strategies. Perplexity was convinced that I should do a Roth conversion to fill up the 22% bracket, while Claude thought the tax savings would be minimal.
Mind you, I used the EXACT same prompts. I don't know which model Perplexity was using, since the free version has multiple models it chooses from (including Claude 3.0).
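The age arithmetic the model fumbled is trivially checkable in code, which is part of what makes the error so jarring. A sketch with made-up numbers (the birthdate and bracket figure below are illustrative placeholders, not my actual data):

```python
from datetime import date

def age_on(birthdate: date, as_of: date) -> int:
    """Exact age in whole years: no 18-month drift possible."""
    years = as_of.year - birthdate.year
    had_birthday = (as_of.month, as_of.day) >= (birthdate.month, birthdate.day)
    return years - (0 if had_birthday else 1)

def conversion_headroom(taxable_income: float, bracket_top: float) -> float:
    """How much more could be Roth-converted before leaving the bracket."""
    return max(0.0, bracket_top - taxable_income)

print(age_on(date(1968, 5, 14), date(2026, 2, 1)))   # hypothetical birthdate -> 57
print(conversion_headroom(95_000, 206_700))          # placeholder bracket-top figure
```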
Opus is the very best and I still throw away most of what it produces. If I did not carefully vet its work I would degrade my code bases so quickly.
To accurately measure the value of AI you must include the negative in your sum.
Over the API, Opus 4.6 will tell you it's still 2025, admit it's wrong, then revert to being convinced it's 2025 as it nears its context limit.
I'll go so far as to say LLM agents are AGI-lite, but saying we "just need the orchestration layer" is like saying: OK, we have a couple of neurons, now we just need the rest of the human.
AGI is when it can do all intellectual work that can be done by humans. It can improve its own intelligence and create a feedback loop because it is as smart as the humans who created it.
This has always been my personal definition of AGI. But the market and industry doesn't agree. So I've backed off on that and have more or less settled on "can do most of the knowledge work that a human can do"
No, that is ASI. No human can do all intellectual work themselves. You have millions of different human models based on roughly the same architecture to do that.
When you have a single model that can do all you require, you are looking at something that can run billions of copies of itself and cause an intelligence explosion or an apocalypse.
"Artificial general intelligence (AGI) is a type of artificial intelligence that matches or surpasses human capabilities across virtually all cognitive tasks."
Why the super-high bar? What's unsatisfying is that the "dumbest" humans are still a general intelligence, and aren't we nearly past that level, depending on how you squint and measure?
It feels like an arbitrary bar, set perhaps to make sure we aren't putting AIs above humans, even though they are most certainly in the superhuman category on a rapidly growing number of tasks.
> there is almost nothing that a human is better at than Opus 4.6.
Lolwut. I keep having to correct Claude at trivial code organization tasks. The code it writes is correct; it’s just ham-fisted and violates DRY in unholy ways.
I’m very pro AI coding and use it all day long, but I also wouldn’t say “the code it writes is correct”. It will produce all kinds of bugs, vulnerabilities, performance problems, memory leaks, etc unless carefully guided.
That "simple orchestration layer" (paraphrased) is what I consider the AGI.
But yeah, I suspect LLMs may actually get close enough. "Just" add more reasoning loops and corresponding compute.
It is objectively grotesquely wasteful (a human brain operates on 12 to 25 watts and would vastly outperform something like that), but it would still be cataclysmic.
If we can get AI down to this power requirement, then it's over for humans. Just think of how many copies of itself, each thinking at the level of the smartest humans, it could run at once. Also think of where all that hardware could hide itself and keep itself powered around the world.
Yeah, but a human brain without the human attached to it is pretty useless. In the US, it averages out to around 2 kW per person for residential energy usage, or 9 kW if you include transportation and other primary energy usage too.
I don't know about AGI, but I got bored and ran my plans for a new garage by Opus 4.6, and it gave me some really surprising responses that have changed my plans a little. At the same time, it also made some nonsense suggestions that no person would realistically make. When I prompted it in another chat for something that required genuine creativity, it fell flat on its face.
I dunno, mixed bag. Value is positive if you can sort the wheat from the chaff for the use cases I've run by it. I expect the main place it'll shine in the near and medium term is going over huge data sets or big projects and flagging things for review by humans.
I've used it for similar things and had some good and some disastrous results. In a way, I feel like I'm basically where I was "before AI".
Here's a thought. Let's all arbitrarily agree AGI is here. I can't even be bothered discussing what the definition of AGI is. It's just here, accept it. Or vice versa.
Now what...? What's happening right now that should make me care that AGI is here (or not)? What's the magic thing that's happening with AGI that wasn't happening before?
<looks out of window> <checks news websites> <checks social media...briefly> <asks wife>
Right, so, not much has changed from 1-2 years ago that I can tell. The job market's a bit shit if you're in software... is that what we get for billions of dollars spent?
Cultural changes take time. It took decades for the internet to move from nerdy curiosity to an essential part of everyone's life.
The writing is on the wall. Even if there are no new advances in technology, the current state is upending jobs, education, media, etc.
I really think corporations are overplaying their hand if they think they can transform society once again in the next 10 years.
Rapid deindustrialization followed by the internet and social media almost broke our society.
Also, I don’t think people necessarily realize how close we were to the cliff in 2007.
I think another transformation now would rip society apart rather than take us to the great beyond.
I think corporations can definitely transform society in the near future. I don't think it will be a positive transformation, but it will be a transformation.
Most of all, AI will exacerbate the lack of trust in people and institutions that was kicked into high gear by the internet. It will be easy and cheap to convince large numbers of people about almost anything.
As a young adult in 2007, what cliff were we close to?
The GFC was a big recession, but I never thought society was near collapse.
We were pretty close to a collapse of the existing financial system. Maybe we’d be better off now if it happened, but the interim devastation would have been costly.
It felt like the entire global financial system had a chance of collapsing.
Yeah, this is a good point; transition and transformation to new technologies take time. I'm not sure I agree the current state is upending things, though. It's forcing some adaptation for sure, but the status quo remains.
It also took years for the Internet to be usable by most folks. It was hard, expensive, and impractical for decades.
Coincidentally, just about the time it hit the mainstream is when the enshittification began to go exponential. Be careful what you wish for.
Allow me to clarify: I'm not wishing for change. I am an AI pessimist. I think our society is not prepared to deal with what's about to happen. You're right: AI is the key to the enshittification of everything, most of all trust.
> Here's a thought. Lets all arbitrarily agree AGI is here.
A slightly different angle on this - perhaps AGI doesn't matter (or perhaps not in the ways that we think).
LLMs have changed a lot in software in the last 1-2 years (indeed, the last 1-2 months); I don't think it's a wild extrapolation to expect that to come to many domains very soon.
Before enlightenment^WAGI: chop wood, fetch water, prepare food
After enlightenment^WAGI: chop wood, fetch water, prepare food
> The transformer architectures powering current LLMs are strictly feed-forward.
This is true in a specific contextual sense (each token that an LLM produces comes from a feed-forward pass). But it has been untrue for more than a year with reasoning models, which feed their produced tokens back as inputs, and whose tuning effectively rewards them for doing this skillfully.
Heck, it was untrue before that as well, any time an LLM responded with more than one token.
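To spell out why even a vanilla (non-"reasoning") LLM isn't a single feed-forward pass at the response level: decoding is a loop that feeds every produced token back in as input. A schematic sketch, with a hypothetical `forward` standing in for one pass of the transformer:

```python
def forward(tokens: list[int]) -> int:
    """One feed-forward pass: takes the sequence so far, returns the next token.
    Hypothetical stand-in for the actual model."""
    raise NotImplementedError

def generate(prompt_tokens: list[int], eos: int, max_new: int = 256) -> list[int]:
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        nxt = forward(tokens)      # strictly feed-forward *per token*...
        tokens.append(nxt)         # ...but the output is fed back as input,
        if nxt == eos:             # so the response as a whole is recurrent
            break                  # in the sequence dimension.
    return tokens
```

Reasoning models just lean harder on this loop, spending many of those fed-back tokens on intermediate work before the visible answer.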
> A [March] 2025 survey by the Association for the Advancement of Artificial Intelligence (AAAI), surveying 475 AI researchers, found that 76% believe scaling up current AI approaches to achieve AGI is "unlikely" or "very unlikely" to succeed.
I dunno. This survey publication was from nearly a year ago, so the survey itself is probably more than a year old. That puts us at Sonnet 3.7. The gap between that and present day is tremendous.
I am not skilled enough to say this tactfully, but: experts can be the slowest to update on the news that their specific domain may, in hindsight, have been the wrong horse. It's the old quote about how difficult it is to get someone to believe something when their income depends on not believing it, except instead of income it can be your whole legacy or self-concept. Way worse.
> My take is that research taste is going to rely heavily on the short-duration cognitive primitives that the ARC highlights but the METR metric does not capture.
I don't have an opinion on this, but I'd like to hear more about this take.
Thanks for reading, and I really appreciate your comments!
> which feed their produced tokens back as inputs, and whose tuning effectively rewards them for doing this skillfully
Ah, this is a great point, and not something that I considered. I agree that the token feedback does change the complexity, and it seems that there's even a paper by the same authors about this very thing! https://arxiv.org/abs/2310.07923
I'll have to think on how that changes things. I think it does take the wind out of the architecture argument as it's currently stated, or at least makes it a lot more challenging. I'll consider myself a victim of media hype on this, as I was pretty sold on this line of argument after reading this article https://www.wired.com/story/ai-agents-math-doesnt-add-up/ and the paper https://arxiv.org/pdf/2507.07505 ... who brush this off with:
>Can the additional think tokens provide the necessary complexity to correctly solve a problem of higher complexity? We don't believe so, for two fundamental reasons: one that the base operation in these reasoning LLMs still carries the complexity discussed above, and the computation needed to correctly carry out that very step can be one of a higher complexity (ref our examples above), and secondly, the token budget for reasoning steps is far smaller than what would be necessary to carry out many complex tasks.
In hindsight, this doesn't really address the challenge.
My immediate next thought is: even if solutions up to P can be represented within the model / CoT, do we actually feel like we are moving towards generalized solutions, or that the solution space is navigable through reinforcement learning? I'm genuinely not sure where I stand on this.
> I don't have an opinion on this, but I'd like to hear more about this take.
I'll think about it and write some more on this.
It's general-purpose enough to do web development. How far can you get by writing programs and seeing if you get the answers you intended? If English words are "grounded" by programming, system administration, and browsing websites, is that good enough?
Models now understanding video and projecting what happens next indicates we're getting past the LLM problem of lacking a world model. That's encouraging.
There's more than one way to do intelligence. Basic intelligence has evolved independently three times that we know of - mammals, corvids, and octopuses. All three show at least ape-level intelligence, but the species split before intelligence developed, and the brain architectures are quite different. Corvids get more done with less brain mass than mammals, and don't have a mammalian-type cortex. Octopuses have a distributed brain architecture, and have a more efficient eye design than mammals.
I don't think those are examples of unique intelligence, except perhaps in a chauvinistic, anthropomorphic sense. We only know that we can't get other animals to display patterns we associate with intelligence in humans. However, truthfully, that's just as likely to be because our measures of intelligence don't map cleanly onto the cognitive/perceptual representations innate to other animals. As we look for new ways to challenge animals that respect their innate differences, we're finding that "simple" organisms like ants and spiders are surprisingly capable.
For a clear analogy, consider how tokenization causes LLMs to behave stupidly in certain cases, even though they're very capable in others.
I don't really understand the argument that AGI cannot be achieved just by scaling current methods. I too believe that (for any sane level of scaling, anyway), but this year's LLMs are not entirely using last year's methods. And they, in turn, use methods that weren't used the year before.
It seems like a prediction along the lines of "Bob won't become a Formula One driver in a minivan". It's true, but not very interesting.
If Bob turned up a couple of years later in Formula One, you'd probably be right in saying that what he is driving is not a minivan. The same is true for AGI: anyone who says it can't be done with current methods can point to any advancement along the way and say that's the difference.
A better way to frame it would be: is there any fundamental, quantifiable ability that is blocking AGI? I would not be surprised if the breakthrough technique has already been created, but the research has not described the problem it solves well enough for us to know that it is the breakthrough.
I realise that, for some, the notion of AGI is relatively new, but some of us have been considering the matter for some time. I suspect my first essay on the topic was around 1993. It's been quite weird watching people fall into all of the same philosophical potholes that were pointed out to us at university.
I think the minivan analogy is flawed, and that AGI is moving from "Bob driving a minivan" to "Bob literally becoming the thing that is Formula One".
What would that even mean though? Who is making claims of that sort?
I feel like it's such a bending of the idea that it's not really making a prediction of anything at all.
Then you don't understand Machine Learning in any real way. Literally the 3rd or 4th thing you learn about ML is that for any given problem, there is an ideal model size. Just making the model bigger doesn't work because of something called the curse of dimensionality. This is something we have discovered about every single problem and type of learning algorithm used in ML. For LLMs, we probably moved past the ideal model size about 18 months ago. From the POV of someone who actually learned ML in school (from the person who coined the term), I see no real reason to think that AGI will happen based upon the current techniques. Maybe someday. Probably not anytime soon.
PS The first thing you learn about ML is to compare your models to random to make sure the model didn't degenerate during training.
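For anyone who hasn't met the "ideal model size" claim before, here's the classical textbook picture in toy form: polynomial fits to noisy data, where held-out error bottoms out at an intermediate capacity and then climbs. This only illustrates the claim being made; it says nothing about whether it actually applies to frontier LLMs.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Smooth underlying function we only observe through noise.
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 30)
y_train = target(x_train) + rng.normal(0, 0.2, x_train.size)
x_val = rng.uniform(0, 1, 200)
y_val = target(x_val) + rng.normal(0, 0.2, x_val.size)

for degree in (1, 3, 5, 9, 15):                       # "model size"
    poly = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    val_err = np.mean((poly(x_val) - y_val) ** 2)
    print(f"degree {degree:2d}: validation MSE = {val_err:.3f}")
# Typically the validation MSE is lowest around an intermediate degree and
# worse again by 15: more capacity than the data supports stops helping.
```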
> From the POV of someone who actually learned ML in school (from the person who coined the term)
Sounds like that was quite a while ago.
Um, what? Are you interpreting scaling to mean adding parameters and nothing else?
I'm not entirely sure where you get your confidence that we've passed the ideal model size, but at least that's a clear prediction, so you should be able to tell if and when you are proven wrong.
Just for the record, do you care to put an actual number on something we won't go past?
[edit] Vibe check on user comes out as: Contrarian 45%, Pedantic 35%, Skeptical 15%, Direct 5%.
That's got to be some sort of record. Is there a tool or something that gives this vibe check? (Serious question)
How are you calculating that? Also, my 1,000-foot view would see that "rating" as something most HN commenters would match.
There was a meme going around that said the fall of Rome was an unannounced, anticlimactic event: one day someone went out and the bridge just never got repaired again.
Maybe AGI's arrival is when one day someone is given an AI to supervise instead of a new employee.
Just a user who's followed the whole mess, not a researcher. I wonder if the scaffolding and bolt-ons like reasoning will sufficiently be an asymptote to 'true AGI'. I kept reading about the limits of transformers around GPT-4 and Opus 3 time, and then those seem basic compared to today.
I gave up trying to guess when the diminishing returns will truly hit, if ever, but I do think some threshold has been passed where the frontier models are doing "white collar work as an API" and basic reasoning better than the humans in many cases, and once capital familiarizes themselves with this idea more, it's going to get interesting.
But it's already like that; models are better than many workers, and I'm supervising agents. I'd rather have the model than numerous juniors; esp. the kind that can't identify the model's mistakes.
The problem becomes your retirement. Sure, you've earned "expert" status, but all the junior developers won't be hired, so they'll never learn from junior mistakes. They'll blindly trust agents and not know deeper techniques.
From my experience, if you think AI is better than most workers, you're probably just generating a whole bunch of semi-working garbage, accepting that output as good enough, and will likely learn the hard way that your software is full of bugs and incorrect logic.
This is my greatest cause for alarm regarding LLM adoption. I am not yet sure AI will ever be good enough to use without experts watching it carefully, but it is certainly good enough that non-experts cannot tell the difference.
I'd always imagined that AGI meant an AI was given other AIs to manage.
I don't think this is how it'll play out, and I'm generally a bit skeptical of the 'agent' paradigm per se.
There doesn't seem to be a reason why AIs should act as these distinct entities that manage each other or form teams or whatever.
It seems to me way more likely that everything will just be done internally in one monolithic model. The AIs just don't have the constraints that humans have in terms of time management, priority management, social order, all the rest of it that makes teams of individuals the only workable system.
AI simply scales with the compute resources made available, so it seems like you'd just size those resources appropriately for a problem, maybe even on demand, and have a singular AI entity (if it's even meaningful to think of it as such; even that's kind of an anthropomorphisation) just do the thing. No real need for any organisational structure beyond that.
So I'd think maybe the opposite: it seems like what "agents" really means is a way to use fundamentally narrow/limited AI inside our existing human organisations and workflows, directed by humans. Maybe AGI is when all that goes away because it's just obviously not necessary any more.
I used to also believe along these lines but lately I'm not so sure.
I'm honestly shocked by the latest results we're seeing with Gemini 3 Deep Think, Opus 4.6, and Codex 5.3 in math, coding, abstract reasoning, etc. Deep Think just scored 84.6% on ARC-AGI-2 (https://deepmind.google/models/gemini/)! And these benchmarks are supported by my own experimentation and testing with these models, most recently with Opus 4.6 doing things I would never have thought possible in codebases I'm working in.
These models are demonstrating an incredible capacity for logical abstract reasoning of a level far greater than 99.9% of the world's population.
And then combine that with the latest video output we're seeing from Seedance 2.0, etc showing an incredible level of image/video understanding and generation capability.
I was previously deeply skeptical that the architecture we have would be sufficient to get us to AGI. But my belief in that has been strongly rattled lately. Honestly, I think the greatest gap now is simply one of orchestration, data presentation, and work on in-context memory representations - that is, converting work done in the real world into formats/representations amenable for AI to run on (text conversion, etc.) and keeping newly trained/taught information in context to support continual learning.
>These models are demonstrating an incredible capacity for logical abstract reasoning of a level far greater than 99.9% of the world's population.
This is the key I think that Altman and Amodei see, but get buried in hype accusations. The frontier models absolutely blow away the majority of people on simple general tasks and reasoning. Run the last 50 decisions I've seen locally through Opus 4.6 or ChatGPT 5.2 and I might conclude I'd rather work with an AI than the human intelligence.
It's a soft threshold: I think people saw it spit out some answers during the first chat-with-an-LLM hype wave and missed that the majority of white-collar work (I mean all of it, not just the top software industry architects and senior SWEs) seems to come out better when a human is pushed further out of the loop. Humans are useful for spreading out responsibility and accountability, for now, thankfully.
LLMs are very good at logical reasoning in bounded systems. They lack the wisdom to deal with unbounded systems efficiently, because they don't have a good sense of what they don't know or good priors on the distribution of the unexpected. I expect this will be very difficult to RL in.
While I think 99.9% is overstating it, I can believe that number is strictly more than 1% at this point.
State-of-the-art Large Language Models are already Generally Intelligent, insofar as the term has any useful meaning. Their biggest weaknesses are long-horizon planning competency and spatial reasoning/navigation, both of which continue to improve steadily and are leaps and bounds above where they were a few years ago. I don't think there's any magic wall. Eventually they will simply get good enough, just like everything else.
I think that AGI has already happened, but it's not well understood, nor well distributed yet.
OpenClaw et al. nudged me a little, but it was Sammy Jankis[1,2] that pushed me over the edge, with force. It's janky as all get out, but it'll learn to build its own memory system on top of an LLM that definitely forgets.
[1] https://sammyjankis.com/
[2] https://news.ycombinator.com/item?id=47018100
The Sammy Jankis link was certainly interesting. Thanks for sharing.
Whether or not AGI is imminent, and whether or not Sammy Jankis is or will be conscious... it's going to become so close that for most people, there will be no difference except to philosophers.
Is AGI 'right around the corner' or currently already achieved? I agree with the author, no, we have something like 10 years to go IMO. At the end of the post he points to the last 30 years of research, and I would accept that as an upper bound. In 10 to 30 years, 99% of people won't be able to distinguish between an 'AGI' and another person when not in meatspace.
I really don't see why AGI can't be a spectrum and we just have very weak AGI and going from weak to strong will take many years, if it ever happens.
Our brains evolved to hunt prey, find mates, and avoid becoming hunted ourselves. Those three tasks were the main factors for the vast majority of evolutionary history.
We didn't evolve our brains to do math, write code, write letters in the right registers to government institutions, or get an intuition on how to fold proteins. For us, these are hard tasks.
That's why you get AI competing at IMO level but unable to clean toilets or drive cars in all of the settings that humans do.
I'm not excited about a future where the division of labor is something like: AI does all of the interesting stuff and the humans clean the toilets. Especially now that I'm older and my joints won't tolerate it.
It's not that AI is intrinsically better at software engineering, writing, or art than it is at learning how to clean toilets. It's not. The real issue is that cleaning toilets using humans is cheap.
That, sadly, is the incentive driving the current wave of AI innovation. Your job will be automated long before your household chores are.
Don't be ridiculous, AI will create robots that do all the work and the only use for humans will be as amusement for the rich who own everything. Probably not sarcasm, I don't even know.
> Our brains evolved to hunt prey, find mates, and avoid becoming hunted ourselves. Those three tasks were the main factors for the vast majority of evolutionary history.
That seems like a massive oversimplification of the things our brains evolved to do.
> We didn't evolve our brains to do math, write code, write letters in the right registers to government institutions, or get an intuition on how to fold proteins. For us, these are hard tasks.
Humans discovered or invented all of those.
And it took a massively long time for that to happen after we gained that capability. Human ingenuity really only took off after we offloaded a lot of the work onto writing and tools. It wasn't so much that individual humans created many of these things as the superhuman organism that uses language and writing to express ideas.
Now think about what we just created.
Only in small ways and very recently, evolutionarily speaking, were those things rewarded by natural selection (and even that has stopped nowadays).
I'm not sure that's a good way to think about it.
Evolution transcends hard lines in the temporal sand that "separate species".
It also took billions of years of evolution to get to humans. So humans, on the grander scale of life, are also just a very recent development.
How will we know if it's AGI or not AGI? (I don't think a simple app is gonna cut it here, haha)
What is the benchmark now that the Turing test has been blown out of the water?
Until recently, philosophy of artificial intelligence seemed to be mostly about arguments why the Turing test was not a useful benchmark for intelligence. Pretty much everyone who had ever thought about the problem seriously had come to the same conclusion.
The fundamental issue was the assumption that general intelligence is an objective property that can be determined experimentally. It's better to consider intelligence an abstraction that may help us to understand the behavior of a system.
A system where a fixed LLM provides answers to prompts is little more than a Chinese room. If we give the system agency to interact with external systems on its own initiative, we get qualitatively different behavior. The same happens if we add memory that lets the system scale beyond the fixed context window. Now we definitely have some aspects of general intelligence, but something still seems to be missing.
Current AIs are essentially symbolic reasoning systems that rely on a fixed model to provide intuition. But the system never learns. It can't update its intuition based on its experiences.
Maybe the ability to learn in a useful way is the final obstacle on the way towards AGI. Or maybe once again, once we start thinking we are close to solving intelligence, we realize that there is more to intelligence than what we had thought so far.
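As a concrete (and deliberately naive) sketch of the "memory beyond the context window" ingredient: store every exchange, and before each new prompt pull back the few stored items that share the most words with it. Nothing here is a real product's API; it's just the shape of the idea.

```python
class NaiveMemory:
    """Toy long-term memory: keyword-overlap retrieval over past exchanges."""

    def __init__(self, top_k: int = 3):
        self.entries: list[str] = []
        self.top_k = top_k

    def remember(self, text: str) -> None:
        self.entries.append(text)

    def recall(self, query: str) -> list[str]:
        q = set(query.lower().split())
        scored = sorted(
            self.entries,
            key=lambda e: len(q & set(e.lower().split())),
            reverse=True,
        )
        return scored[: self.top_k]

memory = NaiveMemory()
memory.remember("user prefers metric units for the garage plans")
memory.remember("user's codebase uses PostgreSQL 16")
# Whatever recall() returns gets prepended to the prompt, so the model sees
# relevant history even though the raw transcript no longer fits in context.
print(memory.recall("what units should the garage drawings use?"))
```

Real systems use embeddings and learned retrieval rather than word overlap, but the qualitative change is the same: the fixed model is now one component in a system that accumulates state.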
The Turing test isn't as bad as people make it out to be. The naive version, where people just try to vibe out whether something is a human or not, is obviously wrong. On the other hand, if you set a good scientist loose on the Turing test, give them as many interactions as they want to come to a conclusion, and you let them build tools to assist in the analysis, it suddenly becomes quite interesting again.
For example, looking at the statistical distribution of the chat over long time horizons, and looking at input/output correlations in a similar manner would out even the best current models in a "Pro Turing Test." Ironically, the biggest tell in such a scenario would be excess capabilities AI displays that a human would not be able to match.
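To make that concrete, here's a rough sketch in Python of the kind of long-horizon statistics I mean. Everything in it is illustrative: the transcript is just a list of responses, and the thresholds are made-up placeholders, not calibrated values.
```
# Illustrative sketch of a "Pro Turing Test" analysis: compute long-horizon
# statistics over a chat transcript and flag distributions that look
# implausibly uniform for a human. Thresholds are placeholders, not research.
import statistics

def turn_stats(responses):
    lengths = [len(r.split()) for r in responses]
    # Type-token ratio per turn: rough vocabulary diversity of each response.
    ttrs = [len(set(r.lower().split())) / max(len(r.split()), 1)
            for r in responses]
    return {
        "mean_len": statistics.mean(lengths),
        "stdev_len": statistics.pstdev(lengths),
        "mean_ttr": statistics.mean(ttrs),
        "stdev_ttr": statistics.pstdev(ttrs),
    }

def looks_machine_like(responses):
    s = turn_stats(responses)
    # Humans drift over long sessions: they get terse, sloppy, inconsistent.
    # A participant that answers every prompt with the same length and
    # fluency for hundreds of turns is the "excess capability" tell.
    too_consistent = s["stdev_len"] < 0.2 * s["mean_len"]
    too_fluent = s["mean_ttr"] > 0.8 and s["stdev_ttr"] < 0.05
    return too_consistent or too_fluent
```
Run something like that over a few hundred exchanges and the tell stops being a vibe and becomes a distribution.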
There is a different way I look at this.
Humans will never accept that we created AI; they'll go so far as to say we were not intelligent in the first place. That is the true power of the AI effect.
And yet another way to look at it: maybe current LLM agents are AGI, but it turns out that AGI in this form is actually not that useful because of its many limitations, and solving those limitations will be a slow and gradual process.
I like the line of thinking from an earlier commenter: when an AI company no longer has any humans working, we'll know we're there.
I don't think this is a beneficial line of reasoning. All you need to reach that is a moderate fall in AI stock prices.
Supranormal GDP growth is my bar: when it's actually able to get around bottlenecks and produce value on a societal level.
An agent need not have wants, so why would it try to increase its efficiency to obtain things?
I don't think that was the intent of the comment, more that true AGI should be so useful and transformative that it unlocks enough value and efficiencies to boost GDP. Much like the Industrial Revolution or harnessing electricity, instead of a fancy chatbot.
Increased productivity is not equivalent to intelligence.
No one said it is. Sometimes correlation does equal causation.
To my knowledge, the Turing test has not been blown out of the water. The versions I saw were time-limited, and participants weren't pushed to interrogate hard.
You have no idea whether you're talking to an LLM right now, and neither do I. That's good enough for me.
I’m under the same impression. I don’t think LLMs are the path to AGI. The “intelligence” we see is mostly illusory. It’s statistical repetition of the mediocre minds who wrote content online.
The intelligence we think we recognize is simply an electronic parrot finding the right words in its model to make itself useful.
That's pre-training. Post training with RL can make models arbitrarily good at specific capabilities, and it's usually done via pooled human experts, so it's definitely not statistically mediocre.
The issue is that we're not modelling the problem but a proxy for the problem. RL doesn't generalize very well as is; when you apply it to a loose proxy measure, you get the abysmal data efficiency we see with LLMs. We might be able to brute-force "AGI", but we'd certainly do better with something more direct that generalizes better.
Maybe I'm misunderstanding your point, but humans have pretty abysmal data efficiency, too. We have to use tools for everything... ledgers, spreadsheets, databases, etc. It'll be the same for an AGI: there won't be any reason for it to remember every little detail, just to be able to use the appropriate tool, when appropriate.
I fear that AI will be intelligent enough to negate human general intelligence before it is itself generally intelligent.
Until I can get a robot wife/maid, I'm not worried about, or even confident I will ever see, actual AGI. People have been predicting it for as long as fusion power, and while progress has been made, we might still be like Romans dreaming of flight.
Dear sir, what does embodiment actually have to do with AGI? It's not much different than saying someone who is paralyzed is not intelligent.
What's more, our recent advances in AI have massively accelerated the evolution of robotics. Robots are becoming smarter, faster, and more capable at an ever-increasing rate.
> For example, transformer-based language models can't reliably do multi-digit arithmetic because they have no number sense, only statistical patterns over digit tokens.
I asked Opus 4.6 to multiply 50,651 x 895,707. It immediately responded: 45,368,055,057. This was wrong, but very close (good to about 10^-5 relative error). I then asked it how it arrived at that answer, and whether it had used any tool calls. It said:
```
No, I didn't use any specialized tools — I just computed it directly. Though I
should be honest: large multiplications done "in my head" like this are prone
to error. Let me double-check by breaking it down:

50,651 × 895,707 = 50,651 × (900,000 - 4,293)
                 = 50,651 × 900,000 - 50,651 × 4,293

First term:  50,651 × 900,000 = 45,585,900,000
Second term: 50,651 × 4,293
    50,651 × 4,000 = 202,604,000
    50,651 × 293   =  14,840,743
    Total: 202,604,000 + 14,840,743 = 217,444,743

Result: 45,585,900,000 - 217,444,743 = 45,368,455,257

So my first answer was actually slightly off! The corrected answer is
45,368,455,257.

I don't have a calculator tool available — I'm doing this arithmetically by
decomposition. For mission-critical calculations, I'd recommend verifying with
a calculator since I can make arithmetic errors on large numbers like this.
```
So I would rate OP's claim (which is supported by a paper from 2024) highly implausible. Opus 4.6 appears to be able to do multi-digit arithmetic formally, as well as give remarkably accurate estimates based on something like "number sense".
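For what it's worth, the arithmetic above checks out; anyone can verify the corrected product and the size of the original error with a couple of lines of Python:
```
# Verify the exact product and the relative error of the model's first guess.
exact = 50_651 * 895_707                   # 45,368,455,257
first_guess = 45_368_055_057
print(exact)                               # 45368455257
print(abs(first_guess - exact) / exact)    # ~8.8e-06, i.e. roughly 10^-5
```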
I don't think I've ever wanted to do an arithmetic operation with a computer and get a wrong answer as the result. One day you'll type ls and be happy enough when the system decides to 'rm -rf /' itself.
Except we know how these work. There's no number sense; it's predicting tokens. It can recount the mathematical foundations because that kind of decomposition shows up often in its training data, both in instructional material and in proofs.
As far as I'm concerned, it's already here.
I think it's a really poor argument that AGI won't happen because the model doesn't understand the physical world. That can be trained the same way everything else is.
I think the biggest issue we currently have is with proper memory. But even that is because it's not feasible to post-train an individual model on its experiences at scale. It's not a fundamental architectural limitation.
You need to be able to at least control things that interact with the world to learn from it.
When people move the goal posts for AGI toward a physical state, they are usually doing it so they can continue to raise more funding rounds at a higher valuation. Not saying the author is doing that.
AGI is a messy term, so to be concise: we have models that can do work. What we lack is the orchestration, management, and workflows to use those models effectively. Give it 5 years and those will be built, and they could be built using the models we have today (Opus 4.6 at the time of this message).
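To make "orchestration" less hand-wavy, here is a minimal sketch of the kind of layer I mean, written in Python. It's a hypothetical outline: `call_model` is a stand-in for whatever model API you actually use, and the acceptance checks are whatever your domain needs, not anything shipped by a vendor.
```
# Hypothetical orchestration loop: route tasks to a model, verify each result
# with a domain-specific check, retry a few times, and escalate to a human
# when the check keeps failing. `call_model` is a placeholder, not a real SDK.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]   # acceptance test for the model's output
    max_retries: int = 2

def call_model(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model of choice")

def orchestrate(tasks):
    done, escalations = [], []
    for task in tasks:
        candidate = ""
        for _ in range(task.max_retries + 1):
            candidate = call_model(task.prompt)
            if task.check(candidate):
                done.append((task, candidate))
                break
        else:
            # No attempt passed the check: flag it for human review instead
            # of silently shipping a wrong answer.
            escalations.append((task, candidate))
    return done, escalations
```
The point isn't the few lines of glue; it's that the hard part is writing `check` functions and review workflows you actually trust, and that's what will take the five years.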
Manual orchestration is a brittle crutch IMO - you don't get to the moon by using longer and longer ladders. A powerful model should, in theory, be able to self-orchestrate with basic tools and an environment. The thing is that it also might be as expensive as a human to run - from a tokens AND liability perspective.
I think it is.
It just struck me - it would be fun to re-read The Age of Spiritual Machines (Kurzweil, 1999). I was so into it 26-27 years ago. The amount of ridicule this man has suffered on HN is immense.
I'm certainly not holding my breath.
In a handful of prompts I got the paid version of ChatGPT to say it's possible for dogs to lay eggs under the right circumstances.
Do you believe you could not find humans who would do this?
That's not really the point. If our definition of AGI does not include "being able to reliably do logic" then what are we even talking about? We don't really need computers with human abilities--we have plenty of humans. We need computers with _better_ abilities.
OK, but "what we need" is not the question. If the definition of AGI is "as smart as the average human in all areas", then it doesn't matter if the average human is pretty useless at a lot of tasks, that's still the definition of AGI.
But I'd like to think that, even though you could find exceptions, the average human is never confused about whether dogs can lay eggs or not.
I reached your view the day my grandma told me I was wrong and a hummingbird was a type of insect...
Like, it's in the name.
His objection might be that those humans aren't actually intelligent.
I've said it before and I'll say it again, all AI discussion feels like a waste of effort.
“yes it will”, “no it won’t” - nobody really knows, it's just a bunch of extremely opinionated people rehashing the same tired arguments across 800 comments per thread.
There’s no point in talking about it anymore, just wait to see how it all turns out.
Nope. Not good enough. Your approach won’t drive engagement. We need the same tired arguments across 1600 comments per thread.
If AGI can be defined as meeting the general intelligence of a Redditor, we hit ASI a while ago. Highly relevant comment <https://www.reddit.com/r/singularity/comments/1jh9c90/why_do...> by /u/Pyros-SD-Models:
>Imagine you had a frozen [large language] model that is a 1:1 copy of the average person, let’s say, an average Redditor. Literally nobody would use that model because it can’t do anything. It can’t code, can’t do math, isn’t particularly creative at writing stories. It generalizes when it’s wrong and has biases that not even fine-tuning with facts can eliminate. And it hallucinates like crazy often stating opinions as facts, or thinking it is correct when it isn't.
>The only things it can do are basic tasks nobody needs a model for, because everyone can already do them. If you are lucky you get one that is pretty good in a singular narrow task. But that's the best it can get.
>and somehow this model won't shut up and tell everyone how smart and special it is also it claims consciousness. ridiculous.
I’d love to see one of the AI behemoths put their money where their mouth is and replace their C-suite with their SOTA chatbot.
AGI is here. 90%+ of white collar work _can_ be done by an LLM. We are simply missing a tested orchestration layer. Speaking broadly about knowledge work, there is almost nothing that a human is better at than Opus 4.6. If you're a typical office worker whose job is done primarily on a computer, and that's all AGI has to cover, then yeah, it's here.
I ran a quick experiment with Claude and Perplexity, both free versions. I input some retirement info (portfolio balances etc.), my age, my desired retirement age, and so on. Simple stuff that a financial planner would have no issue with. Perplexity was very, very good on the surface: it rarely made an obvious blunder or error, and it was fast. Claude was much slower and, despite me inputting my exact birthdate, kept messing up my age by as much as 18 months. This obviously screws up retirement planning. I also asked some questions about how RMDs would affect my taxes, and asked for some strategies. Perplexity was convinced that I should do a Roth conversion up to the top of the 22% bracket, while Claude thought the tax savings would be minimal.
Mind you, I used the EXACT same prompts. I don't know which model Perplexity was using since the free version has multiple it chooses from (including Claude 3.0).
Opus is the very best and I still throw away most of what it produces. If I did not carefully vet its work I would degrade my code bases so quickly. To accurately measure the value of AI you must include the negative in your sum.
I would and have done the same with Jr. devs. It's not an argument against it being AGI.
API Opus 4.6 will tell you it's still 2025, admit it's wrong, then revert to being convinced it's 2025 as it nears its context limit.
I'll go so far as to say LLM agents are AGI-lite but saying we "just need the orchestration layer" is like saying ok we have a couple neurons, now we just need the rest of the human.
Giving opus a memory or real-time access to the current year is trivial. I don't see how that's an argument against it being AGI.
AGI is when it can do all intellectual work that can be done by humans. It can improve its own intelligence and create a feedback loop because it is as smart as the humans who created it.
This has always been my personal definition of AGI. But the market and industry doesn't agree. So I've backed off on that and have more or less settled on "can do most of the knowledge work that a human can do"
No, that is ASI. No human can do all intellectual work themselves. You have millions of different human models based on roughly the same architecture to do that.
When you have a single model that can do all you require, you are looking at something that can run billions of copies of itself and cause an intelligence explosion or an apocalypse.
"Artificial general intelligence (AGI) is a type of artificial intelligence that matches or surpasses human capabilities across virtually all cognitive tasks."
Why the super-high bar? What's unsatisfying is this: aren't the 'dumbest' humans still a general intelligence, and one we're already nearly past, depending on how you squint and measure?
It feels like an arbitrary bar, perhaps set to make sure we aren't putting AIs over humans, even though they are most certainly superhuman on a rapidly growing number of tasks.
Can LLMs manipulate spreadsheets?
> there is almost nothing that a human is better at than Opus 4.6.
Lolwut. I keep having to correct Claude at trivial code organization tasks. The code it writes is correct; it’s just ham-fisted and violates DRY in unholy ways.
And I’m not even a great coder…
This is entirely solvable with skills, memory, context, and further prompting. All of which can be done in a way that's reliable and repeatable.
You wouldn't expect a Jr. dev to be the best at keeping things DRY either.
> violates DRY in unholy ways
Well said
I’m very pro AI coding and use it all day long, but I also wouldn’t say “the code it writes is correct”. It will produce all kinds of bugs, vulnerabilities, performance problems, memory leaks, etc unless carefully guided.
So it's even more human than we thought
That "simple orchestration layer" (paraphrased) is what I consider the AGI.
But yeah, I suspect LLMs may actually get close enough. "Just" add more reasoning loops and corresponding compute.
It is objectively grotesquely wasteful (a human brain operates on 12 to 25 watts and would vastly outperform something like that), but it would still be cataclysmic.
/layperson, in case that wasn't obvious
If we can get AI down to this power requirement then it's over for humans. Just think of how many copies of itself thinking at the levels of the smartest humans it could run at once. Also where all the hardware could hide itself and keep itself powered around the world.
> a human brain operates on 12 to 25 watts
Yeah, but a human brain without the human attached to it is pretty useless. In the US, it averages out to around 2 kW per person for residential energy usage, or 9 kW if you include transportation and other primary energy usage too.
Fair.
Maybe the Matrix (1999) with the human battery farms were on to something. :)
I think "tested" is the hard part. The simple part seems to be there already, loops, crons, and computer use is getting pretty close.