
Coding Challenges AI Can't Solve (and How to Design Them)

If an AI can write any arbitrarily complex algorithm, what is it that the engineer actually does? If you can answer that clearly, you can design challenges that test for it. We think the work breaks down into three distinct skills.

1. Understanding intent

Business directives are almost always vague and underspecified. The engineer's real job is to take something ambiguous, turn it into a concrete specification, and understand the implications of the choices made along the way.

Here's a simple example. Say you're tasked with building a "delete my account" feature. On the surface it sounds straightforward, but what does it actually mean? Should it delete a row from a database? What side effects might that have? If the account has shared content with other users, should that content be deleted too, or just anonymized? Are there legal requirements to purge the data from your servers, or conversely, requirements to retain certain logs? You can't even begin to ask the right questions here without a solid understanding of how systems work and how they connect to each other.
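To make those choices concrete, here's a minimal sketch of what an explicit deletion flow might look like, assuming a toy in-memory data model (the dict shapes, field names, and retention rules are all invented for illustration):

```python
def delete_account(account: dict, posts: dict, audit_log: list) -> None:
    """Delete a user account, with every side effect spelled out as a choice.

    Assumed toy model: `account` has 'user_id', 'email', and 'post_ids';
    `posts` maps post IDs to post dicts with an 'author' field.
    """
    # Choice: anonymize shared content rather than delete it, so other
    # users' threads stay intact.
    for post_id in account["post_ids"]:
        posts[post_id]["author"] = "deleted-user"

    # Choice: keep a minimal, PII-free audit record (a retention
    # requirement in this hypothetical) while purging personal data
    # such as the email address.
    audit_log.append({"event": "account_deleted", "user_id": account["user_id"]})
    account["email"] = None


account = {"user_id": "u1", "email": "alice@example.com", "post_ids": ["p1"]}
posts = {"p1": {"author": "alice", "body": "hello"}}
audit_log = []
delete_account(account, posts, audit_log)
```

Each comment marks a point where a different answer would have been equally easy to code, which is exactly why someone has to consciously pick one.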

2. Silent decisions

This is a subtler version of the same problem, taken one level deeper. It's possible to make decisions in code without anyone realizing a decision was made. These choices might be perfectly fine, but it matters that someone notices they happened.

Picture this: you ask an AI to implement a shopping cart. It writes clean, well-structured code for adding items, removing items, and calculating totals. All the tests pass. Everything looks good. But buried in the implementation are assumptions. Nobody considered what happens when the same user has the app open on two devices at once, or what happens if an item goes out of stock between being added to the cart and checkout. The AI didn't make a wrong decision about concurrency. It made no decision about concurrency, and that absence is itself a decision.
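One way to turn that absent decision into an explicit one is optimistic versioning: each client states which version of the cart it last saw, and a stale write is rejected instead of silently winning. A minimal sketch (the class and its API are hypothetical, not from any real cart library):

```python
class ConflictError(Exception):
    """Raised when a client writes against a stale view of the cart."""


class Cart:
    def __init__(self):
        self.items = {}   # sku -> quantity
        self.version = 0  # bumped on every successful mutation

    def add_item(self, sku: str, qty: int, expected_version: int) -> None:
        # Decision made explicit: two devices can't silently overwrite
        # each other; the second writer gets a conflict and must re-read.
        if expected_version != self.version:
            raise ConflictError("cart changed since this client last read it")
        self.items[sku] = self.items.get(sku, 0) + qty
        self.version += 1


cart = Cart()
cart.add_item("sku-42", 2, expected_version=0)      # device A succeeds
try:
    cart.add_item("sku-42", 1, expected_version=0)  # device B is stale
    conflict = False
except ConflictError:
    conflict = True
```

Last-write-wins might also be a fine answer; the point is that someone chose it on purpose.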

This is a pattern that shows up constantly. Code can be correct in the narrow sense — it does exactly what was asked, the tests pass, and a code review focused on implementation quality would approve it without hesitation. The failures happen upstream, in the gap between what was asked and what should have been specified.

3. Orchestrating agents

Once you know what's intended and you can express it precisely, you still need to actually build the thing. In practice, this means giving the AI detailed, structured guidance, especially for multi-step or complex tasks where there are many reasonable ways to approach the problem. And sometimes the AI simply can't complete a task on its own. You might need to give it specialized tools, set up a harness so it can inspect intermediate results, or build a way for it to verify its own output. Knowing when and how to do that is a skill in itself.
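As a sketch of what such a harness might look like, here's a generic retry loop that feeds verification failures back to the model; `generate` and `verify` are placeholder callables standing in for whatever agent API and checker you actually use:

```python
def run_with_verification(generate, verify, max_attempts=3):
    """Ask the model for output, check it independently, retry with feedback.

    `generate(feedback)` produces a candidate (feedback is None on the
    first attempt); `verify(candidate)` returns (ok, feedback).
    """
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate
    raise RuntimeError("no verified result within the attempt budget")


# Toy stand-ins: this "model" only succeeds once it has seen feedback.
calls = []

def fake_generate(feedback):
    calls.append(feedback)
    return "good" if feedback else "bad"

def fake_verify(candidate):
    return candidate == "good", "output failed the check"

result = run_with_verification(fake_generate, fake_verify)
```

The structure is trivial; the engineering skill is in deciding what `verify` should actually check.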

Designing the challenge

We start every challenge with a real-world scenario and a desired business outcome. Rather than narrowly specifying how the problem should be solved, we create an environment where multiple solutions are possible and some work meaningfully better than others.

The initial task shouldn't be too hard to get working. The real challenge is in solving it well, and in understanding what the chosen approach implies for edge cases, maintenance, and future requirements.

From there, we add a follow-up task that brings those implications to the surface. If a candidate's initial solution made silent assumptions, this is where those assumptions get tested.

Finally, we force the candidate to close the loop on their own work. For example, we might dump the system's output into thousands of lines of unstructured logs, meaning the candidate has to write a parser just to verify that what they built actually works. We're not interested in whether someone can produce code. We want to see whether they can take ownership of an outcome.
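A tiny example of what that verification step might involve, assuming a made-up log format in which successful work is reported as `processed order <id>`:

```python
import re

def verify_orders_processed(log_text: str, expected_ids: set) -> bool:
    """Scan unstructured logs and confirm every expected order was
    processed exactly once. The log format here is invented for
    illustration; a real challenge would be messier."""
    seen = re.findall(r"processed order (\S+)", log_text)
    return len(seen) == len(set(seen)) and set(seen) == expected_ids


logs = """\
worker-3 starting up
worker-3: processed order A12
[debug] heartbeat ok
worker-1: processed order B7
"""
```

The parser itself is not the point; noticing that the outcome can't be trusted without one is.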

Figuring out what the code should be doing has always been the hard part. That's what we test for. If your hiring process is still measuring how fast someone can invert a binary tree from memory, you're selecting for a skill that gets less valuable every month while ignoring the one that gets more valuable every day.
