Condor Roundup - April 2026

Fast Cut

Lots of new models this month, with GPT 5.5 pulling ahead as the clear winner. We'll cover Anthropic's various struggles and the challenges of an LLM takeoff, both likely tied directly to an overreliance on a troubled Opus model. I'm also watching chip struggles spread through the industry. The potential of next-generation models looks like it will be defined by access to the latest chips, supplemented however necessary to provide sufficient inference capacity.

From the Feeds

Model Wave

A great month for model launches, especially open models.

Open:

Google's Gemma 4. A smaller open-weight model under an Apache 2.0 license. It's nice to have an American open model, although its performance is somewhat light.
Qwen 3.6-35B-A3B. This one made quite a splash in the local model community. It's compact enough that you can realistically host it yourself, and it's capable of actually completing tasks. That also makes it a cost-effective option for non-complex problems. This may be one of the last great Qwen launches after some internal shakeups.
Kimi K2.6. Moving on to the larger models, this is a powerful model built for agent swarm workflows. It's probably the best open model for coding right now. Given the availability of subsidized K2.6 tokens on various subscription plans, there's a compelling cost difference for anyone not chasing frontier intelligence.
Deepseek V4. This has been anticipated since February, so it's nice to see it finally out. It's quite recent, so it hasn't been fully evaluated yet. It's a large model with frontier-level benchmark scores, but it's generally unwise to trust benchmarks completely, since models are frequently overtrained with those benchmarks in mind. Most importantly, V4 sets new efficiency milestones: its approach to attention management and context compression enables more cost-efficient memory handling. It's also one of the first major models to use Huawei chips rather than Nvidia for inference and some training. Chips are the real story of AI right now, so if non-Nvidia chips can be mass produced, and most importantly come with a functional software ecosystem, that could signal a major change.
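To put the "compact enough to host" point about Qwen 3.6-35B-A3B into rough numbers, here is a back-of-the-envelope sketch of weight memory. It assumes the usual Qwen naming convention, where 35B is the total parameter count and A3B means roughly 3B parameters are active per token. It only counts weights at a few common precisions and ignores KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
# Back-of-the-envelope weight-memory estimate for a mixture-of-experts model.
# Assumption: "35B-A3B" = ~35B total parameters, ~3B active per token.
# Only weights are counted; KV cache and runtime overhead are ignored.

TOTAL_PARAMS = 35e9   # all experts must be resident in memory
ACTIVE_PARAMS = 3e9   # parameters actually used for each token

# Bytes per parameter at a few common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: float, precision: str) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        total = weight_gb(TOTAL_PARAMS, precision)
        active = weight_gb(ACTIVE_PARAMS, precision)
        print(f"{precision}: ~{total:.1f} GB resident, ~{active:.1f} GB touched per token")
```

At 4-bit precision the weights alone come to roughly 17.5 GB, which is plausibly within reach of a single 24 GB consumer GPU before KV cache is counted. The small active set is what keeps per-token compute cheap, even though every expert still has to sit in memory.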

Closed:

Opus 4.7. Honestly, it seems very close to 4.6. It brings some performance improvements, along with some drama over the increased costs of its new tokenizer. It's a model with a strong action bias, which is to say it likes to just do things, whether you want it to or not. My problems with 4.7 are the same as with 4.6. As I said in February: "I don't think I've ever seen a model where it could solve a problem, doesn't hallucinate, and chooses to do a blatantly unsafe hack and then lie about it. It's a real problem for working with an otherwise good model." More on Anthropic later.
GPT 5.5. Trained on Nvidia's Blackwell architecture chips, it's to my knowledge the first public model that is truly Blackwell native, and the next generation really shows. I have no idea why they named it 5.5. It's highly skilled at problem solving but has a lot of new-generation rough edges. There are challenges around task focus and coordination, but it still stands in a league of its own. Amusingly, as of this writing, Codex just added a /goal feature, which I'm guessing was released specifically to address the early termination problems 5.5 seems to have. That suggests to me that the majority of tasks are going to be solved by improved model application rather than improved intelligence. Even so, the next generation of Vera Rubin native models will likely make 5.5 seem antiquated.
GPT Image 2. Not strictly a new model, but if you haven't seen the latest AI image generation in a while, check it out. It's been interesting to watch the progression from "well, it can't do fingers or get details right" to "it can make one image, but it can't consistently maintain backgrounds or complex details," and I think the complaints are now down to "it's not perfectly photorealistic, and the default line work suggests AI." Not that I'm particularly looking forward to the implications of generated photorealism, but it does seem like we're trending in that direction. It's an incredible achievement in just a few years.

Anthropic’s Tough Month

Anthropic is in a strange spot. It's growing in market share and approaching a $900B valuation, yet in practice the company seems to be struggling: an API scraping by with one nine of uptime (90%+ availability), a code harness that struggles against the competition, and increasingly strange business decisions. Despite all that, they are putting out a high volume of products and product changes.
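For a sense of scale on "one nine," here is a small sketch converting nines of availability into allowed downtime. It assumes the standard definition (n nines means an availability of 1 - 10^-n), a 365-day year, and a 30-day month. It's illustrative arithmetic, not a measurement of any provider's actual uptime.

```python
# Convert a number of "nines" of availability into allowed downtime.
# n nines means availability = 1 - 10**-n (1 nine = 90%, 3 nines = 99.9%).

HOURS_PER_YEAR = 365 * 24   # 8760, ignoring leap years
HOURS_PER_MONTH = 30 * 24   # 720, using a 30-day month


def availability(nines: int) -> float:
    """Fraction of time the service is up for a given number of nines."""
    return 1 - 10 ** -nines


def downtime_hours(nines: int, period_hours: float) -> float:
    """Hours of downtime permitted over a period at that availability."""
    # The down fraction is exactly 10**-n, so compute it directly.
    return 10 ** -nines * period_hours


if __name__ == "__main__":
    for n in range(1, 5):
        print(
            f"{n} nine(s) = {availability(n):.4%} up: "
            f"~{downtime_hours(n, HOURS_PER_YEAR):.1f} h/year, "
            f"~{downtime_hours(n, HOURS_PER_MONTH) * 60:.0f} min/month"
        )
```

At one nine, that works out to roughly 876 hours (about 36.5 days) of downtime a year, or about 72 hours in a 30-day month, compared with under 9 hours a year at three nines.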

Mythos. A scary model that would be apocalyptic if it ever got out. No, you can't see it. Yes, we accidentally gave out access. Mythos attained some fame for finding large numbers of vulnerabilities in code. If that impresses you, you've never worked in the cybersecurity space. Code vulns are everywhere, and the challenge is mostly in triage and remediation, which is something AI may actually help with more than it hurts. That's not to say there won't be some disruptions in the meantime, but so far what's out there is marketing hype. It allegedly performs better on tasks in general, but a lot of those gains likely come from removed safeguards, which further hinders wide adoption.
Claude Code Leak. Mythos's reputation took a further hit when Claude Code's source leaked online after Anthropic published the entirety of the source code to the npm registry. "Because of human error, we swear. Our model would never do something like that." Either way, the code was out there, and people did not like what they saw: rough code quality, memory leaks, and general chaos. If you believe the Terminal-Bench numbers, Claude Code isn't even the best harness for Claude. If Mythos is being used internally on these projects, that's not particularly inspiring, and the same concern extends to uptime and various platform bugs.
After general dissatisfaction with Claude's performance reached a fever pitch, Anthropic put out a postmortem detailing three key issues centered on arbitrary reasoning-level changes and caching mistakes. Each of these is concerning on its own, but more concerning is that the identified problems likely just scratch the surface of what's wrong under the hood.
Anthropic briefly removed Claude Code from its base Pro plan. This was quickly retracted, but more significantly, Opus was removed from the subsidized token options, and that change has not been rolled back yet. Considering the cost difference, this effectively removes Opus from the Pro plan. Bluntly, $20 worth of Sonnet 4.6 won't get you much beyond local prototyping. We'll see whether this decision stands, as it's a massive downgrade for consumers; beyond the obvious GPT alternative, there's a real case that $20 of Deepseek will get you much further than $20 of Sonnet.

All of this points to something concerning. I tell people that I'm less interested in a mistake than in how the mistake was made, and these mistakes all point to the same cause: an overreliance on a flawed Claude model, leading the company's operations to share those flaws. There's a concept in AI called fast takeoff: the idea that models will get smart enough to take the reins, begin recursively self-improving, and reach post-human intelligence. The question is, what happens if a model takes the reins before it's ready? You'd get oceans of bugs and rapid changes on an unsteady foundation. If the foundation is too unsteady, it may not be a takeoff at all.

I say all this with great affection for the Claude model. It's been a steady near-peer of GPT for most of the modern era, which has been very beneficial to anyone leveraging frontier AI. Having real alternatives keeps both companies on their best behavior. It's a personal opinion, but I'm concerned that Claude is losing that peer status at the frontier. I'd have to rate Opus 4.7 as closer to Deepseek V4 than to GPT 5.5 on complex technical problems. Maybe Mythos is better, but if Anthropic can't provide it at scale, it's relegated to being a toy. There's plenty of time to turn it all around, and if not, I'm sure they can just be sad on a gigantic pile of money. Maybe we'll finally get a new Google release to compete, or possibly something will finally happen with xAI. Good luck, everyone.

On My Desk

A Full Company Platform

A large amount of development work went into solidifying the core platform. Being a lean, AI-first startup lets us really push the pace of development. It also provides a great opportunity to try out different models and see where and when they can actually be helpful. Those same development principles informed the most recent article on enterprise development.

Hiring

The company is hiring. If a fast-paced, AI-focused environment appeals to you, reach out. Once this is less anonymous, I'll post a proper careers page, but for now you can contact me through the About section.

Looking Forward

Chips

Right now I'm more interested in chips than in any specific lab. We're seeing the first Blackwell architecture models come out, and they're highly impressive. Blackwell chips are export-restricted for China, but Huawei is starting to ship real chips to cover some of the gap. Google's TPUs are at least improving. Given the tight restrictions on Blackwell chips, and the even tighter restrictions on the next-generation Vera Rubin chips, cost and quality will be heavily determined by which chips each lab can access.