An MIT study landed in mid-2025 with a number that travelled fast: 95% of corporate generative-AI initiatives fail to deliver measurable ROI. It got quoted in every boardroom, usually as evidence that AI is overhyped. That is the wrong lesson. The interesting question is not why 95% fail but what the 5% do differently. We have worked inside enough projects from both groups to answer it, and the answer is uncomfortable for anyone hoping the fix is a better model.
Almost none of it is about the model. The 5% and the 95% are usually building on the same foundation models, the same clouds, the same frameworks. The difference is operational discipline applied before a single line of code is written.
Why the 95% fail
The failure modes are boringly consistent:
- No baseline metric. The project launches without anyone recording what the process cost, took, or scored before AI. With no baseline, there is no way to prove a gain — so even successful systems cannot demonstrate ROI, and the budget dries up.
- The use case was chosen for novelty, not value. Someone wanted to “do something with GenAI,” so they picked the demo-friendly idea rather than the boring, high-value, repetitive workflow that would actually move a number.
- No accountable owner. The project belongs to “innovation” or a committee. Nobody’s objectives depend on it shipping. Diffuse ownership produces diffuse results.
- Weak data foundations. The data is scattered, dirty, or locked behind access barriers, and the team discovers this halfway through. The project stalls in data remediation it never budgeted for.
- The pilot was never built for production. It was a notebook, a demo, a proof of concept with no path to a real system — no monitoring, no evaluation, no security review. It impresses in a meeting and dies on the way to deployment.
Notice what is not on that list: model quality, prompt engineering, choice of vector database, GPU availability. Teams obsess over those because they are the fun part. None of them is why projects fail.
What the 5% do differently
The successful pattern is almost mechanical. It is not clever. It is disciplined.
- They pick one bounded, high-value use case. Not a portfolio of five experiments — one workflow that, if automated or augmented, moves a metric the business already cares about. Depth over breadth.
- They write down the baseline and the target before building. “This process currently takes 14 minutes per case at a 9% error rate; we are targeting under 4 minutes at under 5%.” Now the project is falsifiable. Now ROI is provable.
- They assign one accountable owner. A named person whose objectives include the outcome, with the authority to make decisions and the budget to finish.
- They invest in data and evaluation before scaling. They fix the data for the one use case, build a golden evaluation set, and only expand once the first system is measurably working.
- They treat the pilot as a production candidate from day one. Monitoring, logging, security, and oversight are designed in, not bolted on. The pilot either becomes the product or proves it should not — both are useful outcomes.
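The baseline-and-target step can be made concrete in a few lines. The sketch below is illustrative only: the `MetricGoal` class and its field names are hypothetical, and the post-launch measurements are invented for the example. The baseline and target figures come from the sample sentence above, and the code assumes lower is better for both metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricGoal:
    """One metric with its pre-project baseline and agreed target (lower is better)."""
    name: str
    baseline: float
    target: float

    def met(self, measured: float) -> bool:
        # The target is phrased as "under X", so the comparison is strict.
        return measured < self.target

    def improvement(self, measured: float) -> float:
        # Fractional reduction relative to the recorded baseline.
        return (self.baseline - measured) / self.baseline


# Baseline and target taken from the example sentence in the text.
goals = [
    MetricGoal("minutes_per_case", baseline=14.0, target=4.0),
    MetricGoal("error_rate", baseline=0.09, target=0.05),
]

# Hypothetical post-launch measurements, invented for illustration.
measured = {"minutes_per_case": 3.5, "error_rate": 0.06}

results = {g.name: g.met(measured[g.name]) for g in goals}
project_succeeded = all(results.values())

for g in goals:
    print(
        f"{g.name}: target met={results[g.name]}, "
        f"improvement={g.improvement(measured[g.name]):.0%}"
    )
print("project succeeded:", project_succeeded)
```

With these invented numbers the speed target is met and the error target is not, so the project as scoped has not yet succeeded. Neither statement could be made without the recorded baseline and target.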
The partner effect
One finding from the study is worth dwelling on: engagements involving external partners succeeded about 67% of the time, while internal-only builds succeeded roughly a third as often. That is not an advertisement for consultants; it is a clue about why projects succeed. Outside partners tend to force the unglamorous discipline that internal teams skip: they insist on a baseline, they scope ruthlessly, they will not start without an owner, and they have seen enough pilots die to refuse to build one that cannot reach production. The value is the discipline, not the headcount. A disciplined internal team beats an undisciplined partner every time.
A test you can run this week
Take any AI project currently running in your organisation and ask four questions:
- What is the baseline metric, written down, from before the project started?
- Who is the single accountable owner, by name?
- What is the target number, and by when?
- Is there a path to production, or is this a permanent pilot?
If you cannot answer all four crisply, the project is probably headed for the 95%. The good news is that every one of those gaps is fixable in a week, before more money goes in. The failure rate is high not because AI does not work, but because most organisations skip the cheap, dull discipline that makes it pay. The 5% simply did the homework first.
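For teams that track projects in code or a spreadsheet export, the four questions above can be recorded as a small checklist. This is a minimal sketch under stated assumptions: the `ProjectCheck` class, its field names, and the sample answers are all hypothetical, and a blank or missing answer is treated as a gap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectCheck:
    """Answers to the four questions; None or blank means 'not answered'."""
    baseline_metric: Optional[str] = None
    owner_name: Optional[str] = None
    target_and_deadline: Optional[str] = None
    production_path: Optional[str] = None

    def gaps(self) -> list[str]:
        # A field counts as a gap if it is missing or only whitespace.
        return [
            name
            for name, value in vars(self).items()
            if value is None or not value.strip()
        ]


# Hypothetical answers for one project, invented for illustration.
check = ProjectCheck(
    baseline_metric="14 minutes per case, 9% error rate",
    owner_name="",  # nobody named yet
    target_and_deadline=None,  # no target number or date agreed
    production_path="security review booked; monitoring designed",
)

missing = check.gaps()
print("gaps to close:", missing)
```

An empty list from `gaps()` means all four questions have an answer on record. It does not judge whether the answers are good ones; that still takes a person.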