We gave the same code audit to Claude Opus 4.8 and MiniMax M3. Same codebase, same prompt, 17 known bugs planted in advance. MiniMax M3 caught 13 of them for $0.07. The cheapest Claude run caught the same 13 for $1.30. A few things stood out: → Every run caught the big blockers (missing auth, unsafe outbound requests, a worker that could double-send). → Raising the reasoning level didn't improve results consistently. Claude at max cost 67% more than xhigh and returned nothing better. → MiniMax M3 had the lowest cost per issue by a wide margin. The real takeaway is matching the run to the job. MiniMax M3 for high-volume audits, Claude at xhigh for the most thorough single pass. Open-weight models are closing the gap fast.
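For anyone who wants to check the "cost per issue" claim, here is a minimal sketch of the arithmetic. It uses only the two runs whose cost and hit count are quoted above; the dictionary keys and the function name are illustrative, not part of any benchmark harness.

```python
# Figures quoted in the post: planted bugs caught and total cost per run (USD).
runs = {
    "MiniMax M3": {"issues_found": 13, "cost_usd": 0.07},
    "Claude (cheapest run)": {"issues_found": 13, "cost_usd": 1.30},
}


def cost_per_issue(cost_usd: float, issues_found: int) -> float:
    """Average spend per planted bug that the run caught."""
    return cost_usd / issues_found


per_issue = {
    name: cost_per_issue(r["cost_usd"], r["issues_found"])
    for name, r in runs.items()
}

for name, value in per_issue.items():
    print(f"{name}: ${value:.4f} per issue")

# How many times more the cheapest Claude run costs per caught issue.
ratio = per_issue["Claude (cheapest run)"] / per_issue["MiniMax M3"]
print(f"Cheapest Claude run: about {ratio:.1f}x the cost per issue")
```

Because both runs caught the same 13 bugs, the per-issue ratio is the same as the ratio of total cost, roughly 18.6x.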
About us
Kilo is the all-in-one agentic engineering platform for software developers. Build, ship, and iterate faster with the most popular open source coding agent. 3M+ Kilo Coders. 25T+ tokens processed. Engineers ship faster when their tools work with them, not against them. Check out kilo.ai
- Website: https://kilo.ai/
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: San Francisco, California
- Type: Privately Held
Locations
- Primary: 455 Market St, San Francisco, California 94111, US
Updates
-
Three new models hit Kilo this week, and they're all chasing the same thing: more context, more speed, lower cost. → MiniMax M3 has a 1M context window and is natively multimodal. → Step 3.7 Flash gets you 256k context with three reasoning levels. → NVIDIA's Nemotron 3 Ultra is a 550B open-weight model that Kilo users have already pushed to #1 on OpenRouter, beating Hermes and OpenClaw combined. Step 3.7 Flash and Nemotron 3 Ultra are both free in Kilo right now. The pattern across all three is the part to actually pay attention to. Open-weight models are closing the gap with frontier models fast!
-
This is what happens when the people closest to the problem get the tools to solve it themselves.
I built my own social media analytics intelligence system from scratch. In Kilo. Naturally. It's a full relational database that tracks posts across four platforms, with daily performance snapshots, competitive intelligence on competitors, campaign attribution, and a unified cross-platform query layer. Most analytics tools only show you what happened, so I built one to show you why and compared to what. And I made it sexy.
What it actually does:
→ Cross-platform normalization: Every platform structures data differently (LinkedIn counts clicks as engagement, X doesn't; impression definitions vary). I built a schema that makes them comparable without losing platform-specific nuance.
→ Time-series tracking: Daily snapshots of follower growth, platform totals, engagement patterns. It's not just "here's this month". I can actually query growth curves, spot inflection points, and see how changes in content strategy correlate with metric shifts.
→ Competitive benchmarking: Monthly tracking of competitors. Follower counts, post volume, engagement rates, top-performing content, so I'm not just measuring my performance. I'm measuring it against the whole field.
→ Campaign attribution: Many-to-many tracking across posts and platforms. A single campaign might span 30 posts across X and LinkedIn over two weeks. The system connects them and rolls up the total impact.
→ Content intelligence: 13 content bucket categories. Which types perform better where? What's the engagement-per-impression by bucket? What should I do more of vs. less of?
The stack: Supabase (PostgreSQL backend), TypeScript, proper foreign keys, separation of per-post metrics from daily platform aggregates. The hardest part wasn't the code because Kilo handled that. It was designing a schema flexible enough to handle platform differences but rigid enough to give clean answers.
But here's the coolest part: the database even has an AI assistant that knows the entire schema, understands my analytics context, and can write queries on the fly. I can ask "which content buckets performed best last month" or "how did my engagement rate compare to my top three competitors" in plain English, and it writes the SQL, runs it, and explains what it means. It's like having a data analyst on call 24/7 who actually understands my social strategy. I used Kilo to build the analytics system I wish existed. ~32,000 lines of code later, I'm still not a software engineer, but apparently I don't need to be one anymore either. What I produced really ain’t bad for a marketer!
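The many-to-many campaign attribution described in this post can be sketched roughly as below. This is not the author's schema: every table name, column name, and row of sample data is made up for illustration, and Python's built-in sqlite3 stands in for the PostgreSQL backend so the example runs without a server.

```python
import sqlite3

# In-memory SQLite as a stand-in for PostgreSQL. All names and rows below are
# hypothetical; the post does not publish its actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    platform TEXT NOT NULL,
    impressions INTEGER NOT NULL,
    engagements INTEGER NOT NULL
);
CREATE TABLE campaigns (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Join table: a campaign spans many posts, and a post can sit in many campaigns.
CREATE TABLE campaign_posts (
    campaign_id INTEGER NOT NULL REFERENCES campaigns(id),
    post_id INTEGER NOT NULL REFERENCES posts(id),
    PRIMARY KEY (campaign_id, post_id)
);
""")

# Made-up sample data: three posts, two of which belong to one campaign.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [(1, "X", 1000, 50), (2, "LinkedIn", 2000, 150), (3, "LinkedIn", 500, 10)],
)
conn.execute("INSERT INTO campaigns VALUES (1, 'launch')")
conn.executemany("INSERT INTO campaign_posts VALUES (?, ?)", [(1, 1), (1, 2)])

# Roll up the total impact of each campaign across platforms.
rollup = conn.execute("""
SELECT c.name, COUNT(p.id), SUM(p.impressions), SUM(p.engagements)
FROM campaigns c
JOIN campaign_posts cp ON cp.campaign_id = c.id
JOIN posts p ON p.id = cp.post_id
GROUP BY c.id
""").fetchall()

print(rollup)  # [('launch', 2, 3000, 200)]
```

The join table is what lets one campaign cover posts on both X and LinkedIn without duplicating post rows, and the same pattern would carry over to PostgreSQL with minor type changes.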
-
Kilo reposted this
Organizations are seeing their GitHub Copilot bills jump 3-10x overnight. On June 1, Copilot moved to usage-based billing. A cost that enterprises treated as fixed for three years is now a meter that runs with every agentic task, and the bill became impossible to predict. Alex, Ari, Olesya, and Brian are at the Gartner Summit this week, and it's the conversation in every hallway. We've been saying this for a year. The era of subsidized, all-you-can-eat AI was always going to end, and betting your whole roadmap on one vendor's pricing was always a risk. This is the final cherry on top of a year of vendor rug pulls. The best lever for keeping agentic costs sane is matching the task to the right model. Route the hard orchestration work to a frontier model and send the rest to cheaper open-weight ones. Doing that by hand on every prompt is hopeless, which is the whole point of what we're building. With Kilo you get 500+ models from 60+ providers in one place. Auto Model routing picks the right model per task, so there's no need to babysit a meter or switch models by hand. And you get full visibility into spend, down to the individual developer. Open source, transparent pricing, and no markup. When a vendor reprices, you get to keep your workflows. If this one's hitting home, I wrote up the breakdown here: https://lnkd.in/gADik2VC
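To make "matching the task to the right model" concrete, here is a toy sketch of rule-based routing. It is not how Kilo's Auto Model routing works, which the post does not describe. The task labels, the model tier names, and the rule itself are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    kind: str  # hypothetical labels such as "orchestration" or "boilerplate"


# Placeholder tier names, not real model identifiers.
FRONTIER_MODEL = "frontier-model"
OPEN_WEIGHT_MODEL = "open-weight-model"

# Assumption: only these kinds of task justify the more expensive model.
HARD_KINDS = {"orchestration", "architecture"}


def pick_model(task: Task) -> str:
    """Send hard tasks to the frontier tier and everything else to the cheaper tier."""
    return FRONTIER_MODEL if task.kind in HARD_KINDS else OPEN_WEIGHT_MODEL


tasks = [
    Task("Plan a multi-service migration", "orchestration"),
    Task("Rename a variable across a file", "boilerplate"),
    Task("Write unit tests for a helper", "tests"),
]

for task in tasks:
    print(f"{task.description} -> {pick_model(task)}")
```

A real router would need a way to classify tasks in the first place and would likely weigh cost, latency, and context size, none of which this sketch attempts.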
-
Ari spent yesterday vibe coding from his phone on a plane and somehow still found time to tell half the Gartner APPS Summit to go try Qwen 3.6 Plus. When someone recommends a model that hard from 30,000 feet, we pay attention. 👀
Been sending this to friends from the Gartner APPS Summit today who were curious about open-weight LLMs but not sure where to start. So I thought why not share more widely. I strongly recommend giving Alibaba Cloud Qwen models a try. Qwen 3.6 Plus has become a daily driver for a lot of us for both coding and knowledge work. It's already incredibly affordable, and we currently have it in Kilo at 50% off (but still with fast inference / high throughput). Yes, there are newer OSS models but this one is basically guaranteed to impress you. Give it a spin and let me know your thoughts! And yes, I’m vibe coding from my phone on a plane right now 😀
-
Kilo reposted this
This is a bigger deal than it seems if you, like me, are a fan of model agnostic tools. A lot of orgs, especially US pubsec, have had reservations about adopting open weights models because a lot of the best ones (till now) have been Chinese. High-quality US-based open weights will change this dynamic and make model agnostic tooling way more feasible in all kinds of regulated industries with stringent supply chain requirements.
Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.
-
Half of grep is just guessing what the original author named the thing. Customer? User? Kunde? Person? Codebase indexing is back in Kilo Code, generally available today. Describe the concept, get the files and line ranges back in one call. No vocabulary game. It came back because the community brought it back. The core implementation was built by contributor shssoichiro, ported into the v7 architecture, and hardened through weeks of prerelease feedback from people testing it in real repos. Opt-in, local-friendly, and honest about the tradeoffs. Grep still wins when you know the exact symbol. Semantic search wins when you don't.
-
The era of free compute is over, and most teams haven't priced in what that means yet. GitHub Copilot moved to usage-based billing on June 1. Anthropic has been steering people off flat subscriptions for a while. The pattern is the same everywhere: subsidized, all-you-can-eat AI was never sustainable, and the bill is coming due. Here's the part that matters for engineering teams. If your entire workflow is tied to one provider, their next repricing isn't a line item. It's a dependency you didn't choose. When the economics shift, you find out what your tier still includes, which models got deprecated, and how much runway you actually have. Kilo bet the other way from day one. Transparent pricing, BYOK, 500+ models, no surcharge. Model choice isn't a premium feature you unlock at checkout. It's the thing that keeps a repricing from becoming a fire drill. Full breakdown in the carousel. 👇
-
New in the Kilo Marketplace: the Architect agent. Most planner agents take a vague prompt, guess at what you meant, and hand you a polished plan that falls apart the moment you build it. Architect doesn't. When the request is underspecified, it grills you first: what matters, what's constrained, what can wait, what success looks like. Then it writes a plan you can actually execute. Its core was inspired by Matt Pocock's Grill-Me skill. Install it free and try it on a real task. We'd love to hear where it helps and where it gets in the way. How to use it + the full story in the comments.
-
Kilo reposted this
On Monday, GitHub Copilot’s new pricing went into effect. Even their own pricing calculator couldn’t predict the size of the new bills for enterprise users. This is hands-down the top topic at Gartner APPS Summit today.