
The pilot that never ends

Every enterprise AI pilot has the same failure mode, and it is not technical. Here is what actually kills them, and how to design a pilot that either ships or fails honestly, instead of lingering forever.

There is a pattern I have seen at least a dozen times in the last eighteen months, and it is starting to look like a law of nature.

A large company decides to "do something with AI." A vendor or a consultant, sometimes me, gets hired to run a pilot. The pilot has a narrow scope, a defined success metric, a timeline of six to twelve weeks. The work gets done. The demo is impressive. The metric gets hit. The executives nod. The slide deck is circulated. Everyone is happy.

And then nothing happens.

The pilot does not go to production. It does not spread to other teams. It does not get renewed, killed, or escalated. It enters a limbo state where it is "successful" and also "not really being used by anyone." Six months later someone you have never met asks in a quarterly review, "Whatever happened to that AI thing?" Someone else says, "I think it's still running." That is the last time anyone talks about it.

This is the pilot that never ends. It is not a technical failure. It is a political and structural failure, and if you are building AI systems for enterprises in 2026, it is the single biggest thing that will kill your revenue and your reputation. Nobody in the technical community is talking about it because we all want to believe the problem is the model.

The problem is not the model. The problem is that the pilot was designed to impress the people who approved the budget, not to survive the people who have to adopt it. "Impress" and "adopt" are not the same job.

§ 01

The anatomy of a pilot that dies

Let me describe this in enough detail that once you see it, you will see it everywhere.

The pilot is sponsored by an executive, usually at the VP or C-level, who is excited about AI and has budget. They pick a use case that is "high-visibility" because it will make for a good demo. The use case is often one that nobody on the operational side asked for. It is something the sponsor thinks would be impressive, not something the people doing the work think would be useful. This is the first problem, and it is already fatal.

The pilot is then built by an external team (consultants, a vendor, the AI "center of excellence") who are competent and fast. They ship on time. They hit the metric. The metric is usually something like "accuracy of the extraction," because that is measurable in a sandbox. The metric is almost never "percentage of the real workflow that has been replaced by this system, three months after launch, with the real users, under real conditions," because that metric is much harder to measure and much scarier to stake your budget on.

At the end of the pilot, the thing is handed off to an operational team that has not been involved in building it. This team is busy. This team has a quarterly roadmap that does not include "adopt the new AI thing." This team does not know how to debug it when it fails, does not know who to call, does not feel any ownership over it. The thing sits in their environment. It runs. It does something. Nobody turns it off. Nobody uses it. Six months later it is still technically "deployed" and functionally dead.

The pilot has succeeded on paper and failed in reality. Everyone involved has an incentive to pretend otherwise, because nobody wants to be the person who killed it, and nobody wants to admit they did not get adoption. So the pilot lingers. Then the next pilot starts, with a different use case, and the cycle repeats.

This is not a rare failure mode. It is the modal outcome of enterprise AI pilots in 2026. If you are running them and it is not happening to you, either you are doing something specific to avoid it, or it is happening and you have not looked hard enough.

§ 02

The six things that make pilots survive

  1. The sponsor is not the executive. The sponsor is the line manager whose team is going to use the thing. The executive signs the check. The manager defines success. This is a small organizational move that changes everything, because now the person deciding what gets built is the person who will have to live with it every day.
  2. The pilot is scoped to a workflow that the end users would rank in the top three of their "things I hate doing this week." Not top ten. Top three. If the thing the AI system replaces is not actively painful to the people doing it, they will not adopt it. The cost of learning a new tool is higher than the cost of continuing to do the painful thing the way they already know how.
  3. The success metric is adoption, not accuracy. Specifically: "eighty percent of the workflow in question, across at least three real users, has been replaced by this system by week six, and those three users prefer the new way." That is the metric. Accuracy is a prerequisite, not the goal. If the accuracy is bad, adoption will not happen, so accuracy gets solved as a side effect. But if the accuracy is great and the adoption is bad, the project is a failure, and you should say so. (A sketch of how you might compute this metric, together with the kill criterion from item six, follows this list.)
  4. The end users are involved in the build from week one. Not "interviewed at the beginning." Involved. They review the system, they break it, they complain about the parts they do not like, they suggest changes, they get the changes. If they are not involved, you are building for the sponsor, and the sponsor is not going to use it, so the thing you build will not match what the users need. This sounds obvious. It is obvious. Almost no pilots do it.
  5. The hand-off is not a hand-off. The team that builds the system is the team that maintains it for the first ninety days after launch. This is the most unpopular thing I tell enterprise clients, and it is also the most important. If you build it and throw it over the wall, it dies. If you build it and run it live for ninety days, with the real users, with the real failures, fixing bugs as they appear, training the users, showing them how to debug, documenting the weird cases, then by day ninety you have a system that is embedded in the daily work of those users, and they cannot give it up. That is what adoption actually looks like. Not a PowerPoint slide. An embedded workflow.
  6. And this is the hardest one: the pilot has a kill criterion as well as a success criterion. "If by week twelve adoption is below fifty percent, we shut it down." The kill criterion exists because if you do not have one, the pilot will linger in limbo, which is worse than either success or failure. Limbo is the expensive outcome. Kill it on time, learn the lesson, and deploy the budget somewhere useful. The willingness to kill a pilot is the thing that separates mature AI programs from immature ones.
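To make items three and six concrete, here is a minimal sketch of what measuring them could look like in code. Everything in it is an assumption: `WorkflowEvent`, its fields, the helper names, and the thresholds stand in for whatever your own logging actually captures. The only point is that both the success criterion and the kill criterion get computed from real usage data, not asserted on a slide.

```python
from dataclasses import dataclass

# Hypothetical record: one row per completed instance of the target workflow.
# In a real pilot this comes from usage logs, not from a demo environment.
@dataclass
class WorkflowEvent:
    user: str      # who did the work
    week: int      # pilot week, starting at 1
    via_ai: bool   # True if the new system handled it, False if the old way

def adoption_rate(events: list[WorkflowEvent], through_week: int) -> dict[str, float]:
    """Per-user share of workflow instances handled by the new system."""
    totals: dict[str, int] = {}
    via_ai: dict[str, int] = {}
    for e in events:
        if e.week <= through_week:
            totals[e.user] = totals.get(e.user, 0) + 1
            via_ai[e.user] = via_ai.get(e.user, 0) + int(e.via_ai)
    return {u: via_ai[u] / totals[u] for u in totals}

def pilot_verdict(events: list[WorkflowEvent]) -> str:
    """Apply the week-six success criterion and the week-twelve kill criterion."""
    week6 = adoption_rate(events, through_week=6)
    on_target = [u for u, rate in week6.items() if rate >= 0.80]
    if len(on_target) >= 3:
        # "Prefer the new way" is not in the logs; you ask the three users directly.
        return "success: >=80% adoption for at least three real users by week six"
    week12 = adoption_rate(events, through_week=12)
    overall = sum(week12.values()) / len(week12) if week12 else 0.0
    if overall < 0.50:
        return "kill: adoption below fifty percent at week twelve"
    return "undecided: keep measuring, but do not let this run past week twelve"
```

The detail worth copying is not the arithmetic, it is that the verdict function has a kill branch. If your measurement can only ever report success or "keep going," you have built the limbo state into the instrumentation itself.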

Do those six things and your pilot will either become a real system or an honest failure. Both of those are good outcomes. The middle, the pilot that never ends, is the only bad one.

§ 03

A word for the vendors

A last word for the vendors, consultants, and agencies reading this. When you pitch a pilot, you are being invited into a political structure that is set up to produce limbo. The executive sponsor wants the demo. The operational team wants to be left alone. The incentives are aligned for you to ship the impressive-looking thing and disappear with the fee.

Do not do this. If you want to be the vendor that enterprises trust for the next decade, be the one that insists on the adoption metric, insists on the end-user involvement, insists on the ninety-day embedded run, and insists on the kill criterion. Your first client will be uncomfortable. They will also be your reference for the next ten.

This is how I sell pilots, and it is how Kingstone runs the larger engagements. The clients who say yes to these terms are the ones who are serious. The ones who balk wanted a demo, and I would rather find that out on the sales call than at the end of the pilot.

So if you are running an AI pilot right now, ask yourself: what is the success metric, and is it adoption or accuracy? If it is accuracy, you are in the trap. Rewrite the metric while you still can. The difference between a pilot that ships and a pilot that lingers is exactly that one rewrite.