Creator here. Solo developer, building in public.
Some context that didn’t fit above: I published the full benchmark data and the tool to reproduce it because I think the AI code gen space has a transparency problem. Everyone claims their tool is better; nobody shows data.
The “dev time” metric is the one I’m most proud of. It estimates how long you’d spend debugging the output before it’s production-ready. A model can score 95% on correctness and still hand you code with broken imports and failing tests; that’s 15 minutes of your time. The Loop’s goal is zero.
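To make that concrete, here’s a rough sketch of how a metric like this could be computed: assign each class of detected issue an estimated fix time and sum over the issues found in the output. The categories and minute weights below are illustrative guesses on my part, not the actual crtx formula.

    # Hypothetical dev-time estimator. Categories and weights are
    # illustrative assumptions, not the real crtx implementation.
    FIX_MINUTES = {
        "broken_import": 3.0,   # missing or renamed module
        "failing_test": 10.0,   # per failing test case
        "syntax_error": 2.0,
        "lint_warning": 0.5,
    }

    def estimated_dev_time(issues: list[str]) -> float:
        """Sum the estimated minutes to fix each detected issue."""
        # Unknown issue types get a default 5-minute estimate.
        return sum(FIX_MINUTES.get(issue, 5.0) for issue in issues)

    # One broken import plus one failing test ~= 13 minutes.
    print(estimated_dev_time(["broken_import", "failing_test"]))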
Website with more details: https://crtx-ai.com
Happy to answer questions about the benchmark methodology, the gap-closing system, or the architecture. And if anyone runs crtx benchmark --quick with their own keys, I’d genuinely love to see the results.