I would be very surprised to see any "make it safe" solution that does not render these models useless for programming. I assume posts like this are mostly clout-motivated, but I really dislike the implied vision for what LLMs should offer here.
People need attention through posts that don't post anything new. Filters are LLMs, too. I just keep trying stuff, and eventually it works without even trying to be smart. I'm not sure why people need to pretend that they've invented something new!
At the end of the day, complaining about LLMs' coding capability as it relates to security is like complaining that encryption is too strong because bad people can communicate.
The tech is out of the bag. Society and tech best practices will have to adapt. The only way to prevent AI/LLMs from informing a user, in the long run, is to not teach it something. You can't teach a quality coding LLM without it knowing how to write offensive code.
Adding to that, encouraging people to vibe code insecure software while blocking those same people from redteaming their own creation is a recipe for disaster.
Every time I hear “safe” all I can think of is Brand Safety. Anthropic suffers reputational damage if its product is used for malicious purposes. It has nothing whatsoever to do with the “safety” / alignment talk that all the true believers go on about.
It's because there's no real solution to the alignment problem.
Humanity is, on aggregate, optimizing for increasing collective intelligence, and I don't see that stopping as the main goal when AI gets smarter than humanity. I just hope that we all will have a nice human life.
It may come as a surprise to any rational person, but some people actually believe that attempting to regulate safety into these models and arbitrarily enforcing it is a good idea.
I’m confused as to why the writer is blacking out blocks of text like default credentials are some kind of clearance-requiring secret.
Good. Stop kneecapping the models.
This is insane lol - not like a “wow this is scary” insane but more like “really? you need a frontier model to do this??”
yes, please keep posting these publicly so the AI safety police overlords take away frontier access for everyone again.
the architecture is inherently safe. no perfect security is possible if you want access.
what did you (or anyone) else expect?
I can’t wait for the UK to start making citizens pay for a loicence to access the computational clever cogs contraptions.
It's fantastic to publish these. Otherwise Anthropic's commitment to "AI safety" is Emperor Dario's new clothes, its failure unacknowledged.
An easy alternative is to switch to a Chinese open-weights model, adjust your workflows for its different capabilities, and carry on.
Some people build, some people destroy.
This is as close to shitposting as you can get without actually doing it.