Check your harness. I use Kimi K2.6 for a lot of stuff with OpenCode and omp and it's extremely effective. I'm gonna try 2.7, but it should be a capable model based on what I've seen with previous models.
> People have been asking for a way to run the Chinese models from a trusted provider
I'm going to be called a shill again, but at this point I don't care as it is relevant. Synthetic runs their own models for a reasonable price, GLM5.2 & Kimi K2.7-Code included.
Nice idea. I just asked Haiku to do the same in Claude Chat on iOS: it created an interactive React game, implemented the rules and let it play. Clever move for $1 input and $5 output, Anthropic!
When I'm extremely bored, I think I will make two models play chess against each other. I bet there's a chess benchmark / LLM tournament already somewhere.
For any small team wanting to try Copilot, heed my warning that you will waste hours navigating their billing settings using various out-of-date documentation. Long story short, I finally got an email from them saying that "Copilot Business is available for teams purchasing 10 or more licenses". This is undocumented but other people are reporting the same: https://github.com/orgs/community/discussions/199346
We're sticking with Cursor for now, using Kimi as our daily driver (branded as "Composer").
And why should one prefer GitHub Copilot over OpenCode? Worse harness, more expensive prices, unreliable product strategy, limited model support, the list goes on.
Is GitHub Copilot the best positioned platform for enterprise? They support Claude, GPT, Gemini, and now even open weight models. Larger orgs are paying at API rates anyway so it costs just as much as anywhere else. They have a pretty good agent CLI and SDK, and now a desktop app. They have hosted agents, and you can run their 'Agentic Workflows' in CI.
Has their reputation tanked so much that the alternatives get all the buzz? Or is it that non-enterprise users are priced out by the usage costs, so no free marketing?
Must be the system prompts. Ask copilot to dump its system prompt, and compare the system prompt with claude. It is not accurate but handy. I bet they are quite different
We just cancelled everyone's plans and rolled liteLLM out internally. We kept it for the insanely cheap tokens, but now that they've switched to the new pricing, they're just like openrouter, just with far fewer models.
Their harness is terrible compared to any of the other cli based harnesses I test against. Like shockingly bad.
This comes up all the time at work because the vendor management people don’t understand the llm ecosystem and think Claude through copilot is the same as Claude through Claude code.
A simple side-by-side comparison will show dramatic underperformance 3 or 4 times out of 5 when I’m asked to explain the difference.
Enterprises still have big contracts with github, those companies are imposing tight spending limits now and if the open weight models enable those limits to last a bit longer that's probably quite popular.
> These models are hosted on US-based Azure AI Foundry infrastructure managed by GitHub and Microsoft. Customer prompts and responses are not sent to the original model developers.
Competition in coding models has gotten intense. A year ago it felt like choosing between two options. Now the bigger question is which model to route each task to.
It does, but it's very poorly documented and quite unstable (on purpose i think). What the other commenter said about the VSCode BYOK seems to be the more reliable way.
I tried adding a Foundry LLM as a Github Copilot custom model and failed miserably. But with VSCode BYOK (and Github Copilot as the interface) i did get it working, and i can now use Deepseek V4 Flash with Copilot.
A very sharp slap in the face of those of us who kept our annual plans and didn't ask for a refund: it seems it will not be available to annual subscriptions.
where does it say that? it's not available to me (also annual) at the moment via cloud but it said it is rolling out gradually, so I'm not too concerned. Tho I'm not overly excited either given Copilot pricing now; I reckon this should be at most 1x.
> These models are hosted on US-based Azure AI Foundry infrastructure managed by GitHub and Microsoft. Customer prompts and responses are not sent to the original model developers.
Gotta say, I've lost all interest in cloud-based AI products. Too many cool features and workflows that I was once excited about that I can't or don't use anymore for a variety of reasons (price hikes, subjectively nerfed, disappeared altogether, replaced,...) for me to even remember. It's tiring.
I've set up a small rig, mostly settled on Qwen3.6 and I'm slowly adding features myself. It probably can't compete with Claude. I don't even know, I've stopped checking. It's providing a ton of value to me as is, and it only keeps getting better. All it takes is to realize that it doesn't actually matter if the grass is (maybe even objectively) greener somewhere else. Feels so good to know that it won't change under my feet. I've got this amazing, highly extensible tool, and it's mine.
Qwen3.6-35B-A3B-UD-Q4_K_M runs at about 11 tokens/second on my poor old 1060. Absolutely nuts how far we've come
Mind sharing your llama.cpp settings for that?
Haven't had much time to test it other than asking a few questions & changing some HTML in cline so it might be thick as a brick for all I know, but still worth trying
This sounds very appealing. What size Mac mini would I need for that?
A PC with an nvidia card with 16gb vram works just fine for Qwen MoE models, and these have worked great as a daily driver for me.
Good summary blog: https://maloyan.xyz/blog/running-qwen-locally-mac-mini-m4
I am curious if you implicitly assumed they are Macs or if that's what you are looking for specifically?
I assumed the 27B dense model would be preferable to a MoE model, and that it wouldn’t fit into a consumer graphics card, which leaves the Macs.
Then I assumed for cost and battery/heat reasons that a Mini would be better than a laptop.
The reason why I was curious is that I am running my stuff on a Strix Halo and I get the feeling that this class of devices (GMKtec, Minisforum, Lenovo, etc.) seems to be becoming a pretty good alternative.
Same here, I’ve removed my credit card from Copilot and won’t be renewing
I never got into any of the AI models because it was clear local-first was going to be more valuable, if they were to replace coding tasks.
I tried out a few models and ended up going with either Qwen3-Coder-Next (no think, just do) or Qwen3.6-35B (thinking, w/ llama.cpp token budget). Created a customized prompt that works fairly well up to ~60k tokens, and then it's a toss-up whether it's poisoned itself or I've directly steered it wrong. When it's clear that's happened, and it's important to continue, I ask it to write a doc and then start fresh.
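For anyone who wants to automate the "write a doc, then start fresh" step, here is a minimal sketch. Everything in it is an assumption for illustration: the 4-characters-per-token estimate is a crude rule of thumb rather than a real tokenizer, `Session` and `handoff` are hypothetical names, and `summarize` stands in for whatever call asks your model to write the handoff doc.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumption: ~4 characters per token is a rough heuristic, not a tokenizer.
CHARS_PER_TOKEN = 4
# The ~60k figure from the comment above; tune it for your own setup.
TOKEN_BUDGET = 60_000


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


@dataclass
class Session:
    """Hypothetical container for the messages of one conversation."""
    messages: list[str] = field(default_factory=list)

    def used_tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def needs_handoff(self, budget: int = TOKEN_BUDGET) -> bool:
        # True once the conversation has reached the budget.
        return self.used_tokens() >= budget


def handoff(session: Session, summarize: Callable[[list[str]], str]) -> Session:
    """Ask for a handoff doc, then seed a fresh session with only that doc."""
    doc = summarize(session.messages)
    return Session(messages=[doc])
```

In practice you would check `needs_handoff()` before each new prompt and call `handoff()` only when the work is worth continuing.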
I don't know how anyone could have witnessed the last 2 decades of American VC-funded tech startups and told themselves, "you know, this will be a reliable technology with no hidden problems".
Even a sober technical evaluation is just two steps:
1. You're proposing to build an app on a non-deterministic model.
2. That model is hosted behind a non-deterministic system (model alignment, model guardrails, system context subterfuge, cost/token pricing)
---
So you want to build your app and you think you're going to keep up with both #1 and #2?
We live in a non-deterministic world. Anything "deterministic" in it is a castle built on quicksand.
LLMs are, as far as the nastiness of the Real World goes, really fucking benign. Future models outperform past models, both in open weight land and at the big frontier labs. Performance per $ only ever goes up. That's just nice.
YES, but you seem not to understand that two non-deterministic layers are incompatible. #1 is fine: it has random issues and you build around those random issues; those issues don't change unless you change them.
#2 is not fine; that non-determinism you do not control, have no insight into, etc.
I'm saying sure, give me #1 if it means I can build a harness around it and smooth over the edges. But I'm not taking #1 and #2. There's zero reasonable way to manage two non-deterministic systems.
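To make the "harness around #1" idea concrete, here is a minimal sketch of a validate-and-retry wrapper. It is only an illustration under stated assumptions: `call_model` is a hypothetical function that takes a prompt and returns the model's raw text, and the check (valid JSON object with certain keys) is just one example of an edge you might smooth over.

```python
import json
from typing import Callable


def call_with_validation(
    call_model: Callable[[str], str],  # hypothetical: prompt in, raw text out
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict:
    """Retry a non-deterministic model until its output passes a fixed check."""
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"attempt {attempt}: invalid JSON ({exc.msg})"
            continue
        if not isinstance(data, dict):
            last_error = f"attempt {attempt}: expected a JSON object"
            continue
        missing = required_keys - set(data)
        if missing:
            last_error = f"attempt {attempt}: missing keys {sorted(missing)}"
            continue
        return data
    # Every attempt failed the check; surface the last reason to the caller.
    raise ValueError(last_error)
```

This only works as long as the model's failure modes stay put, which is the point being made about #2: if the layer behind the API changes silently, the checks you tuned may stop catching the right things.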
Qwen is the Alibaba distilled Anthropic Claude model
So piracy on a piracy-trained AI model..
Piracy? Lol.
Alibaba didn't steal Opus weights, they used opus output to train their model.
If this is piracy, then so is reverse engineering efforts powering a bunch of Linux drivers.
If that's piracy, I'm going to the library and arresting everyone there!
Also, yeah, they already stole their copyrighted works, so a thief from a thief is still...thieves?
What features/workflows have you added?
I am a huge fan of Copilot CLI. It just feels so logical and low-friction to use compared to Claude Code. Having the ability to juggle various models at will is really nice too. ("Plan this using Opus 4.6, let GPT 5.4 verify the plan and give feedback before implementing with Sonnet 4.6").
Unfortunately the June pricing change for Copilot forced me personally as well as my entire department at work to switch to Claude Code. With copilot we were hitting a few dollars of extra spend over the included credits in April and May, then in June we started chewing through the monthly budget every 2-3 days.
Just a completely insane price hike from the customer's perspective, I don't know what MS were thinking there.
Even if that is the price they need to be sustainable they should have waited until the competition changed their prices first. I wouldn't be surprised if Copilot lost 50% or more of their customer base last month.
Eventually this could be where all the major players set their prices, so the thought occurs to me that nations should run some form of "public access AI", just like they did for TV. Use the free open models and use tax money to finance a few datacenters. Geo-lock the use and set strict throttles to manage load, but let school children and citizens use that AI freely otherwise.
If Copilot's pricing is the level for all AI in a few years, only the unicorn companies can afford to use them, and everybody else has no chance of competing with a company that can use AI.
> they should have waited until the competition changed their prices first.
They did...
They're literally just passing on the costs https://platform.claude.com/docs/en/about-claude/pricing
Anthropic just provides a subscription - which Enterprise usually doesn't want you to use because everything you're submitting through that will be trained on / becomes part of their model.
So if you use it without explicit permission from your employer you may be committing a contract violation which can have serious consequences - up to jail time - as they can sue you for that.
It's a little more complicated than that, unfortunately.
If you use Claude via API in your own app, you're paying full price.
If you have an "API Plan" for Claude Code (i.e., free), you're paying full price.
If you have a Pro, Max, Max 5x, or Max 20x, your tokens are subsidized up until a rate limit. Then you pay full price for usage thereafter, until the end of the billing cycle.
The widespread belief in industry right now is that the per-seat pricing (which Copilot bailed from first) is going to go away in the near-term.
It's not more complicated? I referenced the subscription... I just added a small warning about it as some people may or may not be aware of the fact they're opening themselves up to serious consequences if they decide to use it on their employer's code without explicit permission... Depending on their employer's discretion, e.g. largish entrenched employers which value their IP will be more willing to inflict damages on you. An upstart will likely not care unless the CEO sees an opportunity to profit personally.
i ran out of claude credits for the first time at work in months and had to fall back to copilot.
pleasantly surprised, claude's way ahead in tooling but the ability to designate what model your subagents use and having access to all models is a better feature than all of what claude offers combined atm.
The only limit on the amount of AI I can consume in a month at work is dollars, so anything that helps with cost is the best model/harness for me.
It also did a better job at smartly designating subagents itself, whereas claude often used higher cost models.
Letting them automatically pick the model is no longer sustainable, but there are some very efficient models that are capable of executing the plan created by a much nicer model. It’s kind of embarrassing to think that Microsoft’s auto model selection was choosing cutting edge reasoning models for tasks like resolving dependency conflicts back when their pricing was at a loss.
> I am a huge fan of Copilot CLI. It just feels so logical and low-friction to use compared to Claude Code.
Honest question, can you elaborate? If given the option, I use OpenCode but what do you find in Copilot CLI that makes you prefer it to Claude Code?
It's a combination of small things really. The mentioned ability to easily call on various models in the same prompt, having agent definitions be able to orchestrate other agents just by mentioning it in the description, doing things like goal/loop automatically.
There is also IMO a distinct difference in "tone" in the dialogue. Claude seems to impersonate a human a bit more than I like.
Claude is of course very good as well and does a few things better than copilot too, but overall I'd prefer to use Copilot.
Not OP but Copilot CLI is really straightforward, almost minimal in some sense. It's a lot like OpenCode but stripped down.
I also use the Copilot ACP server inside PyCharm and that works decently well too, although it has some annoying bugs, but if you're a JetBrains user you're used to annoying bugs.
The price hike was insane. My $dayjob is moving away from Copilot and into Claude Code subscriptions. In parallel we are testing AWS bedrock and Deepinfra for open weight models in preparation for when CC inevitably stops being such a good deal and aligns with actual token cost. Fun times.
I had to do the same. I expect everything will go token pricing, and at that point a LOT of small/mid businesses will drastically change how they use code.
I've swapped to the 20x Claude plan for a month or two to knock out two ideas I need to get to MVP - expecting Claude to go token priced soon.
Hell it’s past small and mid. I do work for a few of the fortune 100s and what I’m hearing is somewhere between “justify all of your usage or don’t use it” to “you now get 500 bucks a month, go over that and you’re getting it revoked”
I used GitHub Copilot for my VS 2026 development and switched between ChatGPT and Claude. That was before I discovered Claude Code and the Codex app. Copilot was OK for my purposes, and the USD 10 per month fee was enough for my usage.
However, last month they introduced a new pricing model (I know the old pricing was not sustainable), and my USD 10 was exhausted within days. Because of that, I switched to Claude Code and Codex and have never looked back. Yes, tokens on Claude Code and Codex are subsidized heavily, but let's just enjoy good things while they last.
I do feel there is a difference between using Claude via Copilot versus using Claude directly in Claude Code. I'm not sure what Microsoft is doing behind the scenes.
The harness is super important, what tools are available and the system prompts vary from harness to harness.
Anthropic seems to have a modest lead on their harness and models, so it’s a best-of-both-worlds scenario.
> I'm not sure what Microsoft is doing behind the scenes
It’s probably the exact same model, but the tools and the prompts around it are worse, so you get worse results.
Claude in Claude code has been shown to perform persistently worse in evals than claude + a minimal harness.
The harness was absolutely not an issue in my case.
The new pricing model where I got banned from using Opus entirely and half a day of work (with weaker models) consumed the 10$ plan was.
I'm now using a Claude Max subscription and I can get close to the daily limits but I'm fairly happy with the overall plan consumption.
So if you use Claude via Copilot in Zed... You use Zed's harness, I think? What does Copilot do, at that point?
I believe you are using https://github.com/github/copilot-cli or potentially this https://github.com/github/copilot-language-server-release#ag... via the Agent Client Protocol https://github.com/agentclientprotocol/agent-client-protocol which means you are indeed using Copilot's harness
ACP is just a standard that bridges harnesses easily into IDEs, Text Editors, or whatever consumes it (I wrote a TUI that consumes them)
The registry for all the agents (tool harnesses) is here https://github.com/agentclientprotocol/registry if you ever are curious to what Zed or IntelliJ are really hooking into
Ah OK, so the ACP connector ensures tool calls work with Zed, and communicates the available tools and their results to the harness, and then the harness mainly provides a system prompt and the API calls?
It’s providing the inference of Anthropic models
I had a similar experience moving away from Copilot within Zed. Now using the reasonix harness for Deepseek that makes cache hits almost free. And that's with unsubsidized American providers like Digital Ocean or Cloudflare.
Yep reasonix is an absolute case study of caching. They literally built byte-level caching into their design and it is insane. i can one-shot many workflows and apps for under 0.05 cents.
I tried using Zed but with local models it constantly breaks on tool calls. I wanted to like it but the smell of vibing is just too much.
You using models released this year? I hear this complaint a lot, and it's often due to using an old model which is not as good at tool calling as newer models.
What I noticed is that when the conversation starts the agent is perfectly able to read from and write to files. As the conversation continues (and maybe sub agents are spawned) it forgets how to do this, complains, and tries to resort to running shell or python code; sometimes it works. Sometimes it asks me to execute the code. If I refuse and point out it worked before, then sometimes it remembers how to write, but mostly not and I need to start a new session.
When using Zed with the CoPilot integration I use Claude Opus and never had this issue.
Qwen 3.6 and 3.5...
What is the average monthly token price for daily reasonix use?
Nice.
I paid $6 yesterday for DeepSeek V4 Flash on OpenRouter. That's like $120 for a month, and it's not even a good model.
For DS4 it's much cheaper and reputable to use OpenCode Go $10/mo subscription, or directly with DeepSeek API.
Thanks!
I'll try that.
That's quite an achievement, I managed to spend only 2$ on 16 different tasks of v4 pro.
Yeah, v4 flash is dirt cheap, but it's running in circles quite often.
Might very well be that a better model is cheaper if it gets things right the first try.
Maybe I should route to a better model when v4flash hasn't solved after a specific number of tokens.
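A rough sketch of what that routing could look like, in case it helps. All names here are hypothetical: `Attempt`, `Runner`, and `route` are made up for illustration, and each runner stands in for whatever code drives a given model with a token cap and reports whether the task got solved.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Attempt:
    """Result of one model run (hypothetical shape)."""
    solved: bool
    tokens_used: int
    answer: Optional[str] = None


# Hypothetical signature: (task, max_tokens) -> Attempt.
# max_tokens=None means the run is not capped.
Runner = Callable[[str, Optional[int]], Attempt]


def route(
    task: str,
    cheap: Runner,
    strong: Runner,
    cheap_token_cap: int,
) -> Tuple[str, Attempt]:
    """Try the cheap model under a token cap, escalate if it did not solve the task."""
    first = cheap(task, cheap_token_cap)
    if first.solved:
        return "cheap", first
    # The cheap model burned its budget without a solution: escalate once.
    return "strong", strong(task, None)
```

The hard part is the `solved` flag: it needs an objective check (tests passing, build succeeding) rather than the model's own claim, otherwise a model that runs in circles will happily report success.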
I'm having great success with DS4 Pro as my main model, while using DS4 Flash for subagents.
Same ,I switched to cursor. I told it how to invoke msbuild and it can edit away without needing a native Visual studio plugin.. no problems at all. Target language c++
GitHub Copilot costs have ballooned in recent weeks; what once took $100 now requires $300. I like using Claude with VS Code through Copilot, and I feel it's given me much better code whose quality I can control. It's much more transparent than Claude Code. It's open source, and the IDE interface gives you so many more features for context and control over what's generated. The increase in cost isn't purely due to their price increases; the Opus models' agents also use more tokens. So I've moved to Claude Code and I'm happily still using Opus 4.6. Fable and 4.7 seem to do much larger units of work, go off on tangents, and make assumptions that frequently result in slop.
My Copilot quota was used up in maybe 2-3 prompts with Claude 4.8 Opus. I was expecting it to suck, but not this bad. It was good while it lasted, though.
Finally an alternative to the big dogs that a company can use. People have been asking for a way to run the Chinese models from a trusted provider. Here GitHub delivered!
The performance, if we trust the benchmarks, puts it at Sonnet 4.6.
Let’s see if it’s worth it with GitHub's pricing.
Microsoft needs to offer a cheaper option since they changed to token-based billing. GPT-5.4 used to be 1x for yearly subscribers, but now it costs 6x. I run out of premium requests after just a couple of prompts. GitHub Copilot at $10 used to be the best value, since you got all the US AI labs' models for cheap.
CoPilot was an insanely good value while it lasted. Only moneysoft could subsidize a service that much.
> The performance, if we trust the benchmarks, put it at Sonnet 4.6.
I don't trust these benchmarks. I used Kimi K2.7 a number of times and I was disappointed. It would run in circles on things that Claude would do in one shot. However, my usage was via Ollama cloud, and I have no idea whether they serve the actual model or a quantized version, so it may have been the quantization that degraded the performance.
The great news, in my opinion, is the precedent. If Microsoft is now serving Kimi K2.7, then very soon they might start serving GLM 5.2, and that is indeed a very competitive model.
Check your harness. I use Kimi K2.6 for a lot of stuff with OpenCode and omp and it's extremely effective. I'm gonna try 2.7, but it should be a capable model based on what I've seen with previous models.
> People have been asking for a way to run the Chinese models from a trusted provider
I'm going to be called a shill again, but at this point I don't care, as it is relevant. Synthetic runs their own models for a reasonable price, GLM5.2 & Kimi K2.7-Code included.
Referral link :
https://synthetic.new/?referral=kwjqga9QYoUgpZV
Being on Copilot means your employer lets you use it at work. It's essentially Copilot's primary value add in the new billing model.
Cloudflare offers Kimi and GLM
Input: $0.95
Cache hit (most important): $0.19
Output: $4.00
This is the same as how much Moonshot charges for it, and it puts it at roughly the price of GPT 5.4 mini, not a bad option.
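To see why the cache-hit price matters most, here's a quick back-of-the-envelope sketch. Assumptions on my part: the three prices above are per 1M tokens (the usual convention, but not stated), and the token counts are a made-up agent turn with a 100k-token context.

```python
# Prices from the comment above, ASSUMED to be USD per 1M tokens.
PRICE_PER_MILLION = {"input": 0.95, "cache_hit": 0.19, "output": 4.00}


def request_cost(fresh_input: int, cached_input: int, output: int) -> float:
    """Cost in USD of one request, split by token type."""
    tokens = {"input": fresh_input, "cache_hit": cached_input, "output": output}
    return sum(tokens[k] * PRICE_PER_MILLION[k] for k in tokens) / 1_000_000


# Hypothetical turn: 100k tokens of context, 2k tokens of output.
with_cache = request_cost(fresh_input=10_000, cached_input=90_000, output=2_000)
no_cache = request_cost(fresh_input=100_000, cached_input=0, output=2_000)

print(f"90% cache hits: ${with_cache:.4f}")  # $0.0346
print(f"no cache hits:  ${no_cache:.4f}")    # $0.1030
```

Under those assumptions the same turn is roughly three times cheaper when most of the context is served from cache, which is why long agent sessions live or die by the cache-hit rate.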
For some context here is a stupid prompt that wastes tokens: "Play a game of tic tac toe against yourself on a 5x5 board, you need 5 in a row to win."
It costs $0.006 on Kimi K2.7, and you get to see the whole raw reasoning trace.
GPT-5.4 mini costs $0.016 and it's summarized.
And in case you are wondering both play incredibly stupidly.
Kimi:
GPT 5.4 mini:
Nice idea. I just asked Haiku to do the same in Claude Chat on iOS: it created an interactive React game, implemented the rules and let it play. Clever move for 1$ input and 5$ output, Anthropic!
Btw if anyone is wondering, GPT 5.5 does the same garbage as 5.4 mini for 4 times the cost.
Fable manages to make a reasonable game, at a cost of 40 cents.
While LLMs are bad at games, they are perfectly capable of writing an RL agent to train on the game itself.
When I'm extremely bored, I think I will make two models play chess against each other. I bet there's a chess benchmark / LLM tournament already somewhere.
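For what it's worth, the game from the prompt upthread (5x5 board, 5 in a row to win) is tiny to pin down in code. This is only a sketch of the environment such an agent would train against, with random self-play standing in for the agent; it is not an RL implementation, and all names are made up.

```python
import random

SIZE = 5  # 5x5 board, 5 in a row to win


def winning_lines():
    """All lines of 5 cells: 5 rows, 5 columns, 2 diagonals."""
    rows = [[(r, c) for c in range(SIZE)] for r in range(SIZE)]
    cols = [[(r, c) for r in range(SIZE)] for c in range(SIZE)]
    diags = [
        [(i, i) for i in range(SIZE)],
        [(i, SIZE - 1 - i) for i in range(SIZE)],
    ]
    return rows + cols + diags


LINES = winning_lines()


def winner(board):
    """Return 'X' or 'O' if one player fills a whole line, else None."""
    for line in LINES:
        marks = {board.get(cell) for cell in line}
        if len(marks) == 1 and None not in marks:
            return marks.pop()
    return None


def random_self_play(rng):
    """Play random legal moves until someone wins or the board is full."""
    board = {}
    cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    rng.shuffle(cells)
    for turn, cell in enumerate(cells):
        board[cell] = "XO"[turn % 2]  # X moves first, players alternate
        result = winner(board)
        if result:
            return result, turn + 1
    return None, len(cells)  # draw


outcome, moves = random_self_play(random.Random(0))
print(outcome, moves)
```

With only 12 winning lines, draws are common under random play, so a trained agent has very little signal to work with unless it learns to block.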
In fact, you don't even need an LLM tournament when you can have tom7's Elo World tournament: https://www.youtube.com/watch?v=DpXy041BIlA
Models are bad at chess. I am using a middleman to help models play chess and experimenting. https://abhay-ai.github.io/R_Daneel_AI/
For any small team wanting to try Copilot, heed my warning that you will waste hours navigating their billing settings using various out-of-date documentation. Long story short, I finally got an email from them saying that "Copilot Business is available for teams purchasing 10 or more licenses". This is undocumented but other people are reporting the same: https://github.com/orgs/community/discussions/199346
We're sticking with Cursor for now, using Kimi as our daily driver (branded as "Composer").
Yes significantly cheaper to run compared to the other models, tried it for an hour yesterday and the results look promising.
Saw in a discussion on Reddit that the team is evaluating glm5.2 so hopefully more to come!
And why should one prefer GitHub Copilot over OpenCode? Worse harness, more expensive prices, unreliable product strategy, limited model support, the list goes on.
Is GitHub Copilot the best positioned platform for enterprise? They support Claude, GPT, Gemini, and now even open weight models. Larger orgs are paying at API rates anyway so it costs just as much as anywhere else. They have a pretty good agent CLI and SDK, and now a desktop app. They have hosted agents, and you can run their 'Agentic Workflows' in CI.
Has their reputation tanked so much that the alternatives get all the buzz? Or is it that non-enterprise users are priced out by the usage costs, so no free marketing?
The rugpull of changing the pricing without further notice was not taken kindly by enterprises.
For some reason Copilot seems dumber than VS Code Claude or VS Code Codex. I can’t tell what the exact reason is, but it didn’t feel right.
Must be the system prompts. Ask copilot to dump its system prompt, and compare the system prompt with claude. It is not accurate but handy. I bet they are quite different
They were, until they decided to commit suicide for the service.
We just cancelled everyone's plans and rolled liteLLM out internally. We kept it for the insanely cheap tokens, but now that they've switched to the new pricing, they're just like openrouter, just with far fewer models.
Their harness is terrible compared to any of the other cli based harnesses I test against. Like shockingly bad.
This comes up all the time at work because the vendor management people don’t understand the llm ecosystem and think Claude through copilot is the same as Claude through Claude code.
A simple side-by-side comparison will show dramatic underperformance 3 or 4 times out of 5 when I’m asked to explain the difference.
Who really cares? The model multipliers and the artificial currency were the final nail in the Github Copilot coffin.
Enterprises still have big contracts with github, those companies are imposing tight spending limits now and if the open weight models enable those limits to last a bit longer that's probably quite popular.
Looks like it’s the same price on Fireworks AI?
https://fireworks.ai/blog/kimi-k2p7-code
I don’t know much about them but they did a deal with Microsoft in March:
https://azure.microsoft.com/en-us/blog/introducing-fireworks...
https://docs.github.com/en/copilot/reference/ai-models/model...
Says that they are run by Moonshot
> These models are hosted on US-based Azure AI Foundry infrastructure managed by GitHub and Microsoft. Customer prompts and responses are not sent to the original model developers.
From your link: https://docs.github.com/en/copilot/reference/ai-models/model....
What's the credit cost compared to Gemini, Claude and GPT? As others have said, last month's price update killed Copilot for good.
When will DeepSeek be available?
The V4 models are already in Azure AI Foundry, so there's maybe a good chance of it coming.
Competition in coding models has gotten intense. A year ago it felt like choosing between two options. Now the bigger question is which model to route each task to.
When will GitHub Copilot support integrating custom models?
It does, but it's very poorly documented and quite unstable (on purpose, I think). What the other commenter said about the VSCode BYOK seems to be the more reliable way.
I tried adding a Foundry LLM as a GitHub Copilot custom model and failed miserably. But with VSCode BYOK (and GitHub Copilot as the interface) I did get it working, and I can now use DeepSeek V4 Flash with Copilot.
It has supported custom, local, any BYOM for quite a while.
I work at GitHub but even then I often use OpenRouter models in the CLI and Copilot App
AFAIK you can already use custom models in VSCode Copilot, but probably not for cloud workloads yet.
Copilot Chat has supported BYOK since Oct 2025 for the VSCode plugin: https://code.visualstudio.com/blogs/2026/06/18/byok-vscode
A very sharp slap in the face for those of us who kept our annual plans and didn't ask for a refund: it seems it will not be available to annual subscriptions.
Where does it say that? It's not available to me (also annual) at the moment via cloud, but it said it is rolling out gradually, so I'm not too concerned. Though I'm not overly excited either, given Copilot pricing now; I reckon this should be at most 1x.
Unlike Google, the AI wave appears to deliver positive revenue impacts for Microsoft.
The company does need to integrate the new AI-human-machine interface into its application development SDKs.
Is there a zero-retention option?
Where is the inference running?
Azure. It was already available on the Azure AI Foundry before.
https://docs.github.com/en/copilot/reference/ai-models/model...
On servers that are subject to the CLOUD Act. Expect no GDPR compliance.
Most European infrastructure runs on the big clouds, who are all subject to the same act. No one cares, unfortunately.
https://docs.github.com/en/copilot/reference/ai-models/model...
They are run by Moonshot itself, so probably China.
That page states
> These models are hosted on US-based Azure AI Foundry infrastructure managed by GitHub and Microsoft. Customer prompts and responses are not sent to the original model developers.
So not in China.