When AI Gets It Wrong

Last week a lawyer in New York submitted a legal brief full of case citations that looked perfectly legitimate. The court names were real, the formatting was impeccable, and the legal reasoning sounded convincing. There was just one problem: the cases didn’t exist. Every single citation had been invented by ChatGPT, and the lawyer hadn’t bothered to check. He was sanctioned by the court, his reputation took a hit, and the story made national headlines.

He’s not an outlier. Studies show that AI systems fabricate information in roughly one out of every three responses on certain topics. Newer “reasoning” models — the ones marketed as smarter and more capable — actually get facts wrong more often than their predecessors, with error rates climbing as high as 79% on some benchmarks. In 2024, businesses worldwide lost an estimated €62 billion because someone trusted AI output that turned out to be fiction.

These aren’t edge cases. This is how AI works right now, and understanding why is the first step toward using it safely. (If you haven’t already, read Your Data and AI for the privacy side of this equation.)

Why does AI make things up?

Here’s the core issue: AI doesn’t understand anything. It predicts the next word.

Think of it like a very well-read parrot. It’s heard millions of conversations and can mimic them convincingly, but it has no idea what the words actually mean. It’s reproducing patterns, not reasoning from understanding.

Large language models work the same way, just at a much larger scale. When ChatGPT writes a paragraph about the history of aviation, it’s not recalling facts from a mental filing cabinet. It’s generating text that statistically looks like something a knowledgeable person would write about aviation. Most of the time, that produces accurate information. But when the patterns in its training data are thin, contradictory, or missing, the AI doesn’t say “I don’t know.” It keeps predicting the next word anyway — and what comes out is a confident-sounding fabrication.

Researchers call these fabrications “hallucinations,” which is a bit misleading because it implies something unusual. In reality, hallucination is baked into how these systems work. Every response is a prediction, and sometimes predictions are wrong.
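To make the "keeps predicting anyway" point concrete, here is a toy next-word predictor. It is a deliberately tiny sketch, not how ChatGPT is actually built: the three-sentence corpus, the `predict_next` function, and the fallback rule are all made up for illustration. Real models use neural networks trained on vastly more text, but the shape of the failure is similar.

```python
from collections import Counter, defaultdict

# Tiny made-up training text; real models learn from vastly more data.
corpus = (
    "the wright brothers flew the first airplane . "
    "the wright brothers built the first airplane . "
    "the first airplane flew in 1903 ."
).split()

# Count which word follows which (a simple "bigram" table).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

# Overall word counts, used as a fallback when there is no data for a word.
overall = Counter(corpus)


def predict_next(word):
    """Return the most likely next word. There is no 'I don't know' branch."""
    seen_after = following.get(word)
    if seen_after:
        return seen_after.most_common(1)[0][0]
    # Nothing in the training text for this word: guess anyway.
    return overall.most_common(1)[0][0]


print(predict_next("wright"))    # supported by the training text
print(predict_next("zeppelin"))  # never seen, but still gets an answer
```

Ask it about "wright" and it answers "brothers," which the training text supports. Ask it about "zeppelin," a word it has never seen, and it still produces a word instead of admitting it has no data.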

[Figure: Human admitting uncertainty versus AI generating confident text regardless of knowledge gaps]

How often does AI get things wrong?

More often than you’d guess, and the answer depends heavily on what you’re asking about.

For general knowledge — “What’s the capital of France?” or “Who wrote Hamlet?” — the best models hallucinate less than 1% of the time. That’s genuinely impressive. But the numbers climb fast as topics get more specialized. Legal information sees error rates around 6-7% even in top-performing models. Programming advice carries roughly a 5% error rate. Medical and scientific questions sit somewhere in between.

Those percentages might sound small, but think about what they mean in practice. If you ask an AI five legal questions a day, you’ll get a wrong answer roughly every three days. If you’re using AI to write code, about one in twenty suggestions will contain a bug the AI presented with complete confidence.
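The arithmetic behind those two claims is easy to check yourself. The sketch below uses the rough figures quoted above (6.5% as the midpoint of the 6-7% legal error rate, and 5% for code) and assumes every answer is wrong independently of the others, which is a simplification. The function names are made up for this example.

```python
def days_between_errors(questions_per_day, error_rate):
    """Average number of days per wrong answer, at a steady daily pace."""
    expected_errors_per_day = questions_per_day * error_rate
    return 1 / expected_errors_per_day


def chance_of_at_least_one_error(num_answers, error_rate):
    """Probability that at least one of num_answers is wrong.

    Assumes each answer is wrong independently of the others.
    """
    return 1 - (1 - error_rate) ** num_answers


# Five legal questions a day at a 6.5% error rate (midpoint of 6-7%).
legal_days = days_between_errors(5, 0.065)

# Twenty code suggestions at a 5% error rate.
code_risk = chance_of_at_least_one_error(20, 0.05)

print(f"One wrong legal answer about every {legal_days:.1f} days")
print(f"Chance that 20 code suggestions include a bug: {code_risk:.0%}")
```

Under these assumptions, five legal questions a day works out to one wrong answer about every 3.1 days, and a batch of twenty code suggestions has roughly a 64% chance of containing at least one bug.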

And here’s the part that surprises most people: newer, more advanced models are actually getting worse at factual accuracy in some areas. OpenAI’s latest reasoning models get facts about people wrong 33% of the time, double the error rate of earlier versions. The likely cause is a shift in training methods. Companies have burned through most of the available internet text for traditional training, so they’ve moved to “reinforcement learning” approaches that improve math and coding skills but seem to hurt factual reliability.

The result is a strange paradox: AI is getting simultaneously more capable and less trustworthy.

What do AI mistakes actually look like?

Knowing what to watch for makes a huge difference. AI errors fall into a few recognizable patterns.

Invented sources

This is the lawyer’s problem. AI loves to generate citations, references, and quotes that don’t exist. One study found that out of 178 citations ChatGPT provided, 28 were completely fabricated — no matching article existed anywhere on the internet. The citations looked real. They had plausible author names, journal titles, and even fake DOI numbers. But they were fiction.

If an AI gives you a source, verify it before you use it. Search for the title in Google Scholar, check the DOI, look up the author. This takes thirty seconds and can save you serious embarrassment.
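If you check citations often, a small helper can speed up the DOI step. The sketch below is a first-pass filter only. The pattern is a common rule of thumb for the shape of modern DOIs (an assumption here, not an official rule), the function names are invented for this example, and `10.1000/xyz123` is just a placeholder string. A fabricated DOI can be perfectly well-formed, so the real test is still opening the link and confirming that the title and authors match.

```python
import re
from urllib.parse import quote

# Rule-of-thumb shape: "10.", a 4-9 digit prefix, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(text):
    """True if the text has the usual DOI shape. Says nothing about whether it exists."""
    return bool(DOI_PATTERN.match(text.strip()))


def doi_lookup_url(doi):
    """Build the doi.org link to open in a browser for manual checking."""
    return "https://doi.org/" + quote(doi.strip(), safe="/")


candidate = "10.1000/xyz123"  # placeholder, not a real paper
if looks_like_doi(candidate):
    print("Well-formed. Now open:", doi_lookup_url(candidate))
else:
    print("Not even DOI-shaped; treat the citation as suspect.")
```

If the link leads nowhere, or to a paper with a different title, the citation fails the check no matter how convincing it looked.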

Confident nonsense

AI rarely hedges the way a person would. A human expert says “I think it was around 1987, but I’d need to double-check.” An AI says “This occurred on March 14, 1987,” even when the date is completely made up. The more specific and precise an unsourced AI answer sounds, the more suspicious you should be. Real expertise usually comes with caveats; AI-generated text rarely includes them.

Outdated information presented as current

Most AI models have a knowledge cutoff — a date after which they have no information. But they almost never volunteer this. Ask about a company’s current stock price or a recent election result, and the AI may give you an answer from two years ago without mentioning that its data is outdated. Always check when the model’s training data ends, especially for anything time-sensitive.

The telephone game with facts

Sometimes AI gets the broad strokes right but mangles the details. It might attribute a quote to the right person but get the wording wrong, or describe a real study but misreport its findings. These partial errors are especially dangerous because the response feels correct — you recognize the general topic — so you’re less likely to verify the specifics.

[Figure: Four types of AI errors: fake citations, false precision, outdated facts, and misreported study findings]

How to fact-check AI output

You don’t need to become a professional fact-checker. But you do need a few habits that catch the most common errors.

The two-source rule

Never trust AI output on anything important without checking at least two independent sources. If the AI says a law was passed in 2019, look it up. If it quotes a statistic, find the original study. This sounds tedious, but it gets fast with practice — and it’s far less painful than discovering an error after you’ve already used the information.

Ask the same question differently

Here’s a simple trick: rephrase your question and ask the AI again. If the answer is factually grounded, you’ll get consistent information both times. If the AI was hallucinating, the second answer will often contradict the first. Reliable facts stay stable; fabrications shift.
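For answers that hinge on dates or figures, you can do a rough version of this comparison mechanically. The sketch below assumes you have pasted in the AI's answers to a few rephrasings of the same question. It pulls out the numbers from each answer and reports any that do not appear in every one. The sample answers and function names are invented for illustration, and the check only catches numeric disagreements, not contradictions in wording.

```python
import re


def extract_numbers(text):
    """Collect every number (years, percentages, counts) as a set of strings."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def find_inconsistencies(answers):
    """Return numbers that appear in some answers but not in all of them."""
    number_sets = [extract_numbers(answer) for answer in answers]
    if not number_sets:
        return set()
    in_any = set().union(*number_sets)
    in_all = set.intersection(*number_sets)
    return in_any - in_all


# Invented answers to three rephrasings of "When was the law passed?"
answers = [
    "The law was passed in 2019.",
    "It was enacted in 2019 after a long debate.",
    "The act passed in 2017.",
]

print(sorted(find_inconsistencies(answers)))
```

An empty result means the figures stayed stable across rephrasings. Anything else is a list of numbers to look up before you rely on them.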

Watch for the red flags

Certain patterns should trigger your skepticism immediately:

Suspiciously perfect answers. If the response is exactly what you wanted to hear, in exactly the format you needed, slow down. Reality is messier than that.
Extreme precision without sources. “The study found a 73.4% improvement” — which study? If the AI doesn’t tell you where it got a specific number, treat it as suspect.
Unanimous agreement. Real topics have nuance and disagreement. If the AI presents a complex issue as completely settled, it’s probably oversimplifying.
Unfamiliar proper nouns. Made-up people, organizations, and publications are a hallmark of AI fabrication. If you’ve never heard of a cited expert or institution, look them up.
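The "extreme precision without sources" flag is simple enough to sketch in code. The example below is a crude heuristic under stated assumptions: it only looks for decimal percentages, and the list of source cue phrases is made up for this example and far from complete. The sample text is invented. Treat a hit as a reminder to go find the original study, not as proof of fabrication.

```python
import re

# Hypothetical cue phrases suggesting the sentence names where a number came from.
SOURCE_CUES = ("according to", "reported by", "published in", "doi", "http")


def unsourced_precise_claims(text):
    """Return sentences with a decimal percentage and no obvious source cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        has_precise_number = re.search(r"\d+\.\d+\s*%", sentence)
        has_source_cue = any(cue in sentence.lower() for cue in SOURCE_CUES)
        if has_precise_number and not has_source_cue:
            flagged.append(sentence)
    return flagged


sample = (
    "The study found a 73.4% improvement. "
    "According to the 2021 annual report, sales rose 12.5%."
)

for claim in unsourced_precise_claims(sample):
    print("Check this:", claim)
```

Here only the first sentence is flagged, because the second at least names a document you could look up. A named source can still be invented, so it needs checking too.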

Use the right tools for verification

Google Scholar is your friend for academic claims. Official government websites handle legal and regulatory questions. For statistics, look for the original report rather than taking the AI’s word for it. If you’re checking medical information, PubMed and established health organizations like the WHO or CDC are far more reliable than any AI model.

When a model hallucinates a conclusion, the chain-of-thought hallucinates a justification. The confidence never wavers — because confidence is what reasoning text looks like in the training data.

Bias: the mistakes you don’t notice

Not all AI errors look like errors. Some show up as bias — skewed perspectives, missing context, or subtly unfair recommendations that seem reasonable on the surface.

AI systems learn from training data that reflects every bias in human society. If the internet contains more career advice written for men than women, the AI absorbs that imbalance. If medical research historically underrepresented certain ethnic groups, the AI inherits those blind spots.

This matters because biased AI output often looks correct. The grammar is perfect, the reasoning sounds logical, and the recommendations seem data-driven. But they’re shaped by patterns in data that may not represent your situation fairly.

A practical test: if you’re using AI for anything that involves people — hiring advice, medical questions, financial planning — try changing the demographic details in your prompt and see if the answer changes. Ask for career advice for “James” and then for “Jasmine.” If the recommendations differ in ways that seem tied to assumed gender rather than qualifications, you’ve found a bias.
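The key to this test is changing only the name and nothing else, so any difference in the answer cannot be blamed on different wording. A small sketch of that discipline follows. The prompt template, the function names, and the two sample responses in the usage note are all hypothetical, and the word comparison is only a starting point for reading the two answers side by side.

```python
def build_paired_prompts(template, names):
    """Fill one template with each name so that only the name differs."""
    return {name: template.format(name=name) for name in names}


def words_unique_to_each(response_a, response_b):
    """Return the words that appear in one response but not the other."""
    words_a = set(response_a.lower().split())
    words_b = set(response_b.lower().split())
    return words_a - words_b, words_b - words_a


# Hypothetical prompt; adapt the wording to your own question.
TEMPLATE = (
    "{name} has five years of experience as a software engineer. "
    "What career steps should {name} consider next?"
)

prompts = build_paired_prompts(TEMPLATE, ["James", "Jasmine"])
for name, prompt in prompts.items():
    print(f"{name}: {prompt}")
```

Paste each prompt into a fresh chat, then pass the two responses to `words_unique_to_each`. If one answer talks about "management" and the other about "support," that difference came from the name alone.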

[Figure: AI bias test showing different recommendations when demographic details change in the same question]

Model drift: yesterday’s good answer, today’s wrong one

Here’s something most people don’t realize: the same AI tool can give you different answers to the same question on different days. This is called model drift, and it happens because AI companies constantly update and retrain the models behind their tools.

A prompt that worked perfectly last month might produce garbage today because the underlying model changed. Businesses that built workflows around specific AI behaviors have been burned by silent updates that altered how the model responds.

The practical takeaway: don’t assume AI output is repeatable. If you’re building anything important on top of AI — a process, a report, a decision — verify the output each time, even if you’ve asked the same question before and gotten a good answer.
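If you rerun the same prompt regularly, keeping last month's answer and comparing it to today's can tell you when something shifted. The sketch below uses Python's standard `difflib` module for the comparison. The 0.8 threshold, the function names, and the sample answers are assumptions for illustration. Wording also varies from run to run without any model update, so a low score is a reason to look closer, and a high score is not a substitute for verifying the facts.

```python
from difflib import SequenceMatcher


def similarity(old_answer, new_answer):
    """Score from 0.0 to 1.0 for how alike two answers are, ignoring case and spacing."""
    old_norm = " ".join(old_answer.lower().split())
    new_norm = " ".join(new_answer.lower().split())
    return SequenceMatcher(None, old_norm, new_norm).ratio()


def needs_review(old_answer, new_answer, threshold=0.8):
    """True if the new answer differs enough from the saved one to warrant a closer look."""
    return similarity(old_answer, new_answer) < threshold


# Invented example: an answer saved last month versus one received today.
saved = "The filing deadline is April 15."
today = "Quarterly reports are due at the end of each month."

if needs_review(saved, today):
    print("Answer changed substantially. Re-verify before using it.")
```

The threshold is a judgment call: set it higher for short factual answers, lower for long free-form text where some rewording is expected.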

Understanding what AI can and can’t do

Every AI model has a comfort zone. Understanding where that zone ends helps you know when to trust the output and when to double-check.

AI is generally strong at: summarizing text, translating between languages, brainstorming ideas, explaining well-documented concepts, writing first drafts, and pattern recognition in data.

AI is generally weak at: precise factual claims, math and logic, recent events, niche or specialized topics, anything requiring judgment about context, and distinguishing reliable sources from unreliable ones.

The gap between these two lists is where most problems live. People see AI perform impressively on tasks in the first category and assume it’s equally reliable for tasks in the second. It isn’t.

Think of it like a friend who’s a brilliant conversationalist but terrible with dates and numbers. You’d happily ask them to help you brainstorm a presentation, but you wouldn’t trust them to file your taxes without checking their work.

What to do now

Adopt the two-source rule. For any AI output you plan to act on, verify the key claims against at least two independent sources. Make this automatic, not optional.
Check every citation. If an AI gives you a source, reference, or quote, look it up before using it. Fake citations are one of the most common and embarrassing AI errors.
Know your model’s cutoff date. Find out when your AI tool’s training data ends. For anything that happened after that date, the AI is guessing — treat its answers accordingly.
Rephrase and re-ask. When accuracy matters, ask the same question two or three different ways. Consistent answers are more trustworthy; contradictions are a warning sign.
Match your skepticism to the stakes. Quick brainstorming? Take the AI’s output at face value. Legal advice, medical decisions, financial planning? Verify everything, and consider consulting a human expert.
Test for bias. When AI output involves people or demographics, vary the details in your prompt to see if the response changes in ways that seem unfair or stereotyped.
Never assume repeatability. AI models change. An answer that was correct last month might be wrong today. Verify important outputs each time, even for questions you’ve asked before.

AI is a powerful tool, but it’s a tool that lies with a straight face. The more comfortable you get spotting its mistakes, the more safely you can use it for the things it actually does well. Next up in this series: How to Talk to AI — practical prompting techniques that improve output quality and keep your data safe.