explainlikeimfive

ELI5: What does it mean when a large language model (such as ChatGPT) is "hallucinating," and what causes it?

I've heard people say that when these AI programs go off script and give emotional-type answers, they are considered to be hallucinating. I'm not sure what this means.

https://www.reddit.com/r/explainlikeimfive/comments/1lu1fqp/eli5_what_does_it_mean_when_a_large_language/

Twin_Spoons

There's no such thing as "off-script" for an LLM, nor is emotion a factor.

Large language models have been trained on lots of text written by humans (for example, a lot of the text on Reddit). From all this text, they have learned to guess what word will follow certain clusters of other words. For example, it may have seen a lot of training data like:

What is 2+2? 4

What is 2+2? 4

What is 2+2? 4

What is 2+2? 5

What is 2+2? 4

With that second to last one being from a subreddit for fans of Orwell's 1984.

So if you ask ChatGPT "What is 2+2?" it will try to construct a string of text that it thinks would be likely to follow the string you gave it in an actual conversation between humans. Based on the very simple training data above, it thinks that 80% of the time, the thing to follow up with is "4," so it will tend to say that. But, crucially, ChatGPT does not always choose the most likely answer. If it did, it would always give the same response to any given query, and that's not particularly fun or human-like. 20% of the time, it will instead tell you that 2+2=5, and this behavior will be completely unpredictable and impossible to replicate, especially when it comes to more complex questions.
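
To make the "usually 4, sometimes 5" sampling idea concrete, here's a toy sketch in Python. It isn't anything a real model runs, just the idea of sampling from a learned next-token distribution, using the made-up 80/20 split from the training data above:

```python
import random

# Toy next-token distribution "learned" from the made-up training data above:
# after "What is 2+2?", the model saw "4" 80% of the time and "5" 20% of the time.
next_token_probs = {"4": 0.8, "5": 0.2}

def sample_next_token(probs, temperature=1.0):
    """Sample a next token. Higher temperature flattens the distribution,
    so unlikely continuations (like "5") come up more often."""
    tokens = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

answers = [sample_next_token(next_token_probs) for _ in range(10)]
print(answers)  # mostly "4", occasionally "5" -- and a different mix each run
```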

For example, ChatGPT is terrible at writing accurate legal briefs because it only has enough data to know what a citation looks like and not which citations are actually relevant to the case. It just knows that when people write legal briefs, they tend to end sentences with (Name v Name), but it chooses the names more or less at random.

This "hallucination" behavior (a very misleading euphemism made up by the developers of the AI to make the behavior seem less pernicious than it actually is) means that it is an exceptionally bad idea to ask ChatGPT any question you do not already know the answer to, because not only is it likely to tell you something that is factually inaccurate, it is likely to do so in a way that looks convincing and like it was written by an expert despite being total bunk. It's an excellent way to convince yourself of things that are not true.

2 days ago
therealdilbert

it is basically a word salad machine that makes a salad out of what it has been told, and if it has been fed the internet we all know it'll be a mix of some facts and a whole lot of nonsense

2 days ago
minkestcar

Great analogy - I think extending the metaphor works as well:

"It's a word salad machine that makes salad out of the ingredients it has been given and some photos of what a salad should look like in the end. Critically, it has no concept of _taste_ or _digestibility_, which are key elements of a functional salad. So it produces 'salads' that may or may not bear any relationship to _food_."

1 day ago
RiPont

...for a zesty variant on the classic garden salad, try nightshade instead of tomatoes!

1 day ago
MollyPoppers

Or a classic fruit salad! Apples, pears, oranges, tomatoes, eggplant, and pokeberries.

1 day ago
h3lblad3

Actually, somewhat similarly: LLMs consistently suggest acai instead of tomatoes.

Every LLM I have asked for a fusion Italian-Brazilian cuisine for a fictional narrative where the Papal States colonized Brazil -- every single one of them -- has suggested at least one tomato-based recipe except they've replaced the tomato with acai.

Now, before you reply back, I'd like you to go look up how many real recipes exist that do this.

Answer: None! Because acai doesn't taste like a fucking tomato! The resultant recipe would be awful!

1 day ago
Telandria

I wonder if the acai berry health food craze awhile back is responsible for this particular type of hallucination.

1 day ago
polunu

Even more fun, tomatoes already are in the nightshade family!

1 day ago
hornethacker97

Perhaps the source of the joke?

1 day ago
Three_hrs_later

And it was intentionally programmed to randomly substitute ingredients every now and then to keep the salad interesting.

1 day ago
ZAlternates

It’s autocomplete on steroids.

1 day ago
Jwosty

A very impressive autocomplete, but still fundamentally an autocomplete mechanism.

1 day ago
wrosecrans

And very importantly, an LLM is NOT A SEARCH ENGINE. I've seen it referred to as search, and it isn't. It's not looking for facts and telling you about them. It's a text generator that is tuned to mimic plausible sounding text. But it's a fundamentally different technology from search, no matter how many people I see insisting that it's basically a kind of search engine.

1 day ago
simulated-souls

Most of the big LLMs like ChatGPT and Gemini can actually search the internet now to find information, and I've seen pretty low hallucination rates when doing that. So I'd say that you can use them as a search engine if you look at the sources they find.

1 day ago
aurorasoup

If you’re having to fact check every answer the AI gives you, what’s even the point. Feels easier to do the search myself.

1 day ago
JustHangLooseBlood

To add to what /u/davispw said, what's really cool about using LLMs is that, very often I can't put my problem into words effectively for a search, either because it's hard to describe or because search is returning irrelevant results due to a phrasing collision (like you want to ask a question about "cruises" and you get results for "Tom Cruise" instead). You can explain your train of thought to it and it will phrase it correctly for the search.

Another benefit is when it's conversational, it can help point you in the right direction if you've gone wrong. I was looking into generating some terrain for a game and I started looking at Poisson distribution for it, and Copilot pointed out that I was actually looking for Perlin noise. Saved me a lot of time.

1 day ago
aurorasoup

That does make a lot of sense then, yeah! I can see it being helpful in that way. Thank you for taking the time to reply.

1 day ago
davispw

When the AI can perform dozens of creatively-worded searches for you, read hundreds of results, and synthesize them into a report complete with actual citations that you can double-check yourself, it’s actually very impressive and much faster than you could ever do yourself. One thing LLMs are very good at is summarizing information they’ve been fed (provided it all fits well within their “context window” or short-term memory limit).

Also, the latest ones are “thinking”, meaning it’s like two LLMs working together: one that spews out a thought process in excruciating detail, the other that synthesizes the result. With these combined it’s a pretty close simulacrum of logical reasoning. Your brain, with your internal monologue, although smarter, is not all that different.

Try Gemini Deep Research if you haven’t already.

1 day ago
iMacedo

Every time I need accurate info from ChatGPT, I ask it to show me sources, but even then it hallucinates a lot.

For example, recently I was looking for a new phone, and it was a struggle to get the right specs for the models I was trying to compare; I had to manually (i.e. Google search) double-check every answer it gave me. I then came to understand this was mostly due to it using old sources, so even when asking it to search the web and name the sources, there's still the need to make sure those sources are relevant.

ChatGPT is a great tool, but using it is not as straightforward as it seems, more so if people don't understand how it works.

1 day ago
Sazazezer

Even asking it for sources is a risk, since depending on the situation it'll handle it in different ways.

If you ask a question and it determines it doesn't know the answer from its training data, then it'll run a custom search and provide the answer based on scraped data (this is what most likely happens if you ask it a 'recent events' question, where it can't be expected to know the answer).

If it determines it does know the answer, then it will first provide the answer that it has in its training data, AND THEN will run a standard web search to provide the 'sources' that match the query you made. This can lead it to give a hallucinated answer with sources that don't back it up, all with its usual confidence. (This especially happens if you ask it about complicated, nuanced topics and then ask it to provide sources afterwards.)

1 day ago
c0LdFir3

Sure, but why bother? At that point you might as well use the search engine for yourself and pick your favorite sources, like the good ol days of 2-3 years ago.

1 day ago
moosenlad

Admittedly I am not the biggest AI fan. But search engines are garbage right now. They are kind of a "solved" algorithm for advertisers and news outlets, so something that was easy to Google in the past can now be enormously difficult. I have to add "reddit" to the end of a search prompt to get past some of that, and it can sometimes help, but that is becoming less reliable too. As of now, advertisers haven't figured out how to get themselves put at the top of AI searches, so the AI models that search the internet and link sources have been better than I thought they would be, so far.

1 day ago
[deleted]

[deleted]

1 day ago
Whiterabbit--

That is a function appended to LLM.

1 day ago
cartoonist498

A very impressive autocomplete that seems to be able to mimic human reasoning without doing any actual reasoning and we don't completely understand how, but still fundamentally an autocomplete mechanism. 

1 day ago
Stargate525

It only 'mimics human reason' because we're very, very good at anthropomorphizing things. We'll pack-bond with a Roomba. We assign emotions and motivations to our machines all the time.

We've built a Chinese Room which no one can see into, and a lot of us have decided that because we can't see into it it means it's a brain.

1 day ago
TheReiterEffect_S8

I just read what the Chinese Room philosophy is and wow, even with its counter-arguments it still simplifies it so well. Thanks for sharing.

1 day ago
Hip_Fridge

Hey, you leave my lil' Roomby out of this. He's doing his best, dammit.

1 day ago
CreepyPhotographer

Well, I don't know if you want to go to the store or something else.

Auto-complete completed that sentence for me after I wrote "Well,".

1 day ago
edparadox

A non-deterministic autocomplete, which is not what one would expect from autocompletion.

1 day ago
bric12

It is actually deterministic, contrary to popular understanding, but it's highly chaotic. Changing one word in your prompt or the seed used to pick answers means you'll get a wildly different response, but if you keep everything the same you will get the exact same response every time.
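
A minimal sketch of "deterministic but chaotic" (a toy text generator of my own standing in for an LLM): the same prompt and seed always give the same output, while changing either one scrambles everything.

```python
import hashlib
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]

def generate(prompt: str, seed: int, n_tokens: int = 8) -> str:
    """Toy 'LLM': the (prompt, seed) pair fully determines the output."""
    digest = hashlib.sha256(f"{prompt}|{seed}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return " ".join(rng.choice(VOCAB) for _ in range(n_tokens))

print(generate("tell me a story", seed=42))  # same output on every run
print(generate("tell me a story", seed=42))  # identical to the line above
print(generate("tell me a story", seed=43))  # new seed: wildly different text
print(generate("tell me A story", seed=42))  # one character changed: also wildly different
```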

1 day ago
-Mikee

An entire generation is growing up taking to heart and integrating into their beliefs millions of hallucinated answers from ai chat bots.

As an engineer, I remember a single teacher who told me, for a project I was working on, that hardening steel will make it stiffer. It has taken me 10 years to unlearn that, and to this day I still have trouble explaining it to others or visualizing it as part of a system.

I couldn't conceptualize a magnetic field until like 5 years ago because I received bad advice from a fellow student. I could do the math and apply it in designs but I couldn't think of it as anything more than those lines people draw with metal filings.

I remember horrible fallacies from health classes (and worse beliefs from coworkers, friends, etc who grew up in red states) that influenced careers, political beliefs, and relationships for everyone I knew.

These are small, relatively inconsequential issues that damaged my life.

Growing up at the turn of the century, I saw learning change from hours in libraries to minutes on the internet. If you were Gen X or a millennial, you knew natively how to get to the truth, how to avoid propaganda and advertising. Still, minutes to an answer that would traditionally take hours or historically take months.

Now we have a machine that spits convincing enough lies out in seconds, easier than real research, ensuring kids never learn how to find the real information and therefore never will dig deeper. Humans want to know things and when chatgpt offers a quick lie, children who don't/can't know better and the dumbest adults who should know better will use it and take it as truth because the alternative takes a few minutes.

1 day ago
TesticularButtBruise

Your description made me visualise Will Smith eating Spaghetti, it's that.

The spaghetti kind of flows and wobbles, and his face moves and stuff, all disgustingly, but it's never perfect. You can dial it in a bit though, show it more people eating food etc, but it's always gonna be just a tighter version of Will Smith eating Spaghetti.

1 day ago
IAmBecomeTeemo

But even if it has somehow been fed only facts, it's going to struggle to reliably produce a factual answer to any question with an ounce of nuance. A human with all the facts can deduce an unknown answer through logical thought, or hopefully have the integrity to say that they don't know the answer if they can't deduce one. An LLM that has all the facts, but where no human has already put them together, is incapable of doing so. It will try, but it will fail and produce some weird bullshit more often than not, and present it as fact.

1 day ago
Count4815

Edit: I misclicked and replied to the wrong comment, sorry :x

1 day ago
UndocumentedMartian

It's not exactly that. Embeddings do create a map of relationships between words. But I think continuous reinforcement of those connections is missing from AI models in general. Word embeddings are also a poor form of conceptual connections imo.

1 day ago
flummyheartslinger

This is a great explanation. So many people try to make it seem like AI is a new hyper intelligent super human species.

It's full of shit though, just like many people are. But as you said, it's both convincing and often wrong, it cannot know that it is wrong, and the user cannot know that it's wrong unless they know the answer already.

For example, I'm reading a classic novel. Probably one of the most studied novels of all time. A place name popped up that I wasn't familiar with so I asked an AI chat tool called Mistral "what is the significance of this place in this book?"

It told me that the location is not in the book. It was literally on the page in front of me. Instead it told me about a real life author who lived at the place one hundred years after the book was published.

I told the AI that it was wrong.

It apologized and then gave some vague details about the significance of that location in that book.

Pretty useless.

1 day ago
DisciplineNormal296

I’ve corrected chatgpt numerous times when talking to it about deep LOTR lore. If you didn’t know the lore before asking the question you would 100% believe it though. And when you correct it, it just says you’re right then spits another paragraph out

1 day ago
Kovarian

My general approach to LOTR lore is to believe absolutely anything anyone/anything tells me. Because it's all equally crazy.

1 day ago
DisciplineNormal296

I love it so much

1 day ago
droans

The models don't understand right or wrong in any sense. Even if it gives you the correct answer, you can reply that it's wrong and it'll believe you.

They cannot actually understand when your request is impossible. Even when it does reply that something can't be done, it'll often be wrong and you can get it to still try to tell you how to do something impossible by just saying it's wrong.

1 day ago
SeFlerz

I've found this is the case if you ask it any video game or film trivia that is even slightly more than surface deep. The only reason I knew its answers were wrong is because I knew the answers in the first place.

1 day ago
realboabab

Yeah, I've found that when trying to confirm unusual game mechanics - ones that have basically a 20:1 ratio of people expressing confusion/skepticism/doubt to people confirming them - LLMs will believe the people expressing doubt and tell you the mechanic DOES NOT work.

One dumb example - in World of Warcraft classic it's hard to keep track of which potions stack with each other or overwrite each other. LLMs are almost always wrong when you ask about rarer potions lol.

1 day ago
powerage76

It's full of shit though, just like many people are.

The problem is that if you are clueless about the topic, it can be convincing. You know, it came from the Artificial Intelligence, it must be right.

If you pick any topic you are really familiar with and start asking about that, you'll quickly realize that it is just bullshitting you while simultaneously trying to kiss your ass, so you keep engaging with it.

Unfortunately I've seen people in decision maker positions totally loving this crap.

1 day ago
flummyheartslinger

This is a concern of mine. It's hard enough pushing back against senior staff, it'll be even harder when they're asking their confirmation bias buddy and I have to explain why the machine is also wrong.

1 day ago
audigex

It can do some REALLY useful stuff though, by being insanely flexible about input

You can give it a picture of almost anything and ask it for a description, and it’ll be fairly accurate even if it’s never seen that scene before

Why’s that good? Well for one thing, my smart speakers reading aloud a description of the people walking up my driveway is super useful - “Two men are carrying a large package, an AO.com delivery van is visible in the background” means I need to go open the door. “<mother in law>’s Renault Megane is parked on the driveway, a lady is walking towards the door” means my mother in law is going to let herself in and I can carry on making food

1 day ago
flummyheartslinger

This is interesting, I feel like there needs to be more use case discussions and headlines rather than what we get now which is "AI will take your job, to survive you'll need to find a way to serve the rich"

1 day ago
AgoRelative

I'm writing a manuscript in LaTeX right now, and copilot is good at generating LaTeX code from tables, images, etc. Not perfect, but good enough to save me a lot of time.

1 day ago
PapaSmurf1502

I once got a plant from a very dusty environment and the leaves were all covered in dust. I asked ChatGPT about this species of plant and if the dust could be important to the plant. It said no, so I vacuumed off the dust and noticed it start to secrete liquid from the leaves. I then asked if it was sure, and it said "Oh my mistake, that is actually part of the plant and you definitely shouldn't vacuum it off!"

Of course I'm the idiot for taking its word, but damn. At least the plant still seems to be ok.

1 day ago
Ttabts

the user cannot know that it's wrong unless they know the answer already.

Sure they can? Verifying an answer is often easier than coming up with the answer in the first place.

1 day ago
SafetyDanceInMyPants

Yeah, that’s fair — so maybe it’s better to say the user can’t know it’s wrong unless they either know the answer already or cross check it against another source.

But even then it’s dangerous to trust it with anything complicated that might not be easily verified — which is also often the type of thing people might use it for. For example, I once asked it a question about civil procedure in the US courts, and it gave me an answer that was totally believable — to the point that if you looked at the Federal Rules of Civil Procedure and didn’t understand this area of the law pretty well it would have seemed right. You’d have thought you’d verified it. But it was totally wrong — it would have led you down the wrong path.

Still an amazing tool, of course. But you gotta know its limitations.

1 day ago
zaminDDH

That, or a situation where I don't know the correct answer, but I definitely know that that's a wrong one. Like, I don't know how tall Kevin Hart is, but I know he's not 6'5".

1 day ago
Stargate525

Until all of the 'reputable' sources have cut corners by asking the Bullshit Machine and copying what it says, and the search engines that have worked fine for a generation are now also being powered by the Bullshit Machine.

1 day ago
UndoubtedlyAColor

I would say this is a usage issue as well. Asking a super specific factual question like this can be very error-prone.

1 day ago
Dangerous-Bit-8308

This is the sort of system that is writing our executive orders and HHS statements

1 day ago
CrumbCakesAndCola

General-use AIs are glorified chatbots, but specific-use AIs are incredibly powerful tools.

1 day ago
Papa_Huggies

Importantly though, the new GPT model does actually calculate the maths when it comes across it, as opposed to taking a Bayesian/bag-of-words approach to provide the answer.

This can be tested by giving it a novel problem with nonsensical numbers. For example, you might run a gradient descent with η = 37.334. An old model would just have a good guess at what that might look like. The new model will try to understand the algorithm and run it through its own calculator.
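
As a quick illustration of why a "nonsensical" learning rate is a good probe, here's the arithmetic on a toy objective f(x) = x² (my choice, just to make the numbers concrete): actually running the update gives a specific, divergent trajectory that pure pattern-matching is unlikely to reproduce.

```python
def gradient_descent(eta, x0=1.0, steps=5):
    """Gradient descent on f(x) = x**2, whose gradient is 2*x.
    Each update is x <- x - eta * 2 * x = x * (1 - 2 * eta)."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x - eta * 2 * x
        trajectory.append(x)
    return trajectory

# With eta = 37.334, each step multiplies x by (1 - 2 * 37.334) = -73.668,
# so the iterates alternate sign and explode instead of converging.
print(gradient_descent(37.334))
# With a sane eta like 0.1, the same code converges toward 0.
print(gradient_descent(0.1))
```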

1 day ago
echalion

You are correct, just want to point out that it doesn't use a bag-of-words or Bayesian method; instead it is a decoder-only transformer that uses (multi-head) self-attention layers to calculate the relations between input words and probable outputs. These models do indeed use program-aided language techniques now, where they can run scripts to actually calculate answers.
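
For a rough picture of what a single attention layer computes, here's scaled dot-product attention in plain NumPy with toy sizes. This is a sketch of the textbook formula, not anything lifted from a production model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Each output row is a weighted mix of the value vectors, with weights set by
    how strongly each query matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # (n_queries, d_v)

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8                                # toy sizes
X = rng.normal(size=(n_tokens, d_model))                # token embeddings
# In a real transformer, Q, K and V come from learned projections of X.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)                                        # (4, 8): one updated vector per token
```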

1 day ago
Papa_Huggies

decoder-only transformer that uses (multi-head) self-attention layers

As someone really struggling through his Machine Learning subject right now, ELI-not-exactly-5-but-maybe-just-28-and-not-that-bright?

1 day ago
echalion

I'm happy to help, and instead of me explaining in a long text here, I'd love to direct you to a paper by Google research, which is the actual foundation of GPTs, and a video from StatQuest explaining the attention layer, which I used to help me through my studies as well. Hope it helps and good luck with your journey for knowledge!

1 day ago
Papa_Huggies

StatQuest is my Messiah

1 day ago
dlgn13

I hate the term "hallucination" for exactly this reason. It gives the impression that the default is for AI chatbots to have correct information, when in reality it's more like asking a random person a question. I'm not going to get into whether it makes sense to say an AI knows things (it's complicated), but it definitely doesn't know more than a random crowd of people shouting answers at you.

1 day ago
meowtiger

My response to "what/why do AIs hallucinate" is that genAI models are always hallucinating; they've just gotten pretty good at creating hallucinations that resemble reality by vomiting up a melange of every word they've ever read.

1 day ago
dmazzoni

That is all true but it doesn’t mean it’s useless.

It’s very good at manipulating language, like “make this paragraph more formal sounding”.

It’s great at knowledge questions when I want to know “what does the average person who posts on the Internet think the right answer is” as opposed to an authoritative source. That’s surprisingly often: for an everyday home repair, an LLM will distill the essential steps that average people take. For a popular movie, an LLM will give a great summary of what the average person thinks the ending meant.

1 day ago
Paganator

It's weird seeing so many people say that LLMs are completely useless because they don't always give accurate answers on a subreddit made specifically to ask questions to complete strangers who may very well not give accurate answers.

1 day ago
explosivecrate

It's a very handy tool, the people who use it are just lazy and are buying into the 'ChatGPT can do anything!' hype.

Now if only companies would stop pushing it as a solution for problems it can't really help with.

1 day ago
Praglik

Main difference: on this subreddit you can ask completely unique questions that have never been asked before, and you'll likely get an expert's answer and thousands of individuals validating it.

When asking an AI a unique question, it infers based on similarly-worded questions but doesn't make logical connections, and crucially doesn't have human validation on this particular output.

1 day ago
notapantsday

you'll likely get an expert's answer and thousands of individuals validating it

The problem is, these individuals are not experts and I've seen so many examples of completely wrong answers being upvoted by the hivemind, just because someone is convincing.

1 day ago
BabyCatinaSunhat

LLMs are not totally useless, but their use-case is far outweighed by their uselessness specifically when it comes to asking questions you don't already know the answer to. And while we already know that humans can give wrong answers, we are encouraged to trust LLMs. I think that's what people are saying.

To respond to the second part of your comment — one of the reasons people ask questions on r/ELI5 is because of the human connection involved. It's not just information-seeking behavior, it's social behavior.

1 day ago
worldtriggerfanman

People like to parrot that LLMs are often wrong but in reality they are often right and wrong sometimes. Depends on your question but when it comes to stuff that ppl ask on ELI5, LLMs will do a better job than most people.

1 day ago
aaaaaaaarrrrrgh

Everything else is spot on for an ELI5, but I disagree with

any question you do not already know the answer to

This should be "any question that you can't easily verify the answer to." Sometimes, finding the answer is hard but checking it is easy. Those are great tasks for an LLM; just don't skip the checking part just because it sounds like it knows what it's writing... because it often does that even if it's making up bullshit.

1 day ago
syriquez

So if you ask ChatGPT "What is 2+2?" it will try to construct a string of text that it thinks would be likely to follow the string you gave it in an actual conversation between humans.

It's pedantic, but "thinks" is a bad word. None of these systems think. It's a fuzzed statistical analysis of a response to the prompt. The LLM doesn't understand or create novel ideas regarding the prompt. Each word, each letter, is the statistically most likely next letter or word given the training data and the prompt.

The best analogy I've come up for it is singing a song in a language you don't actually speak or understand.

1 day ago
stephenph

Great explanation... This is also why specialist AIs can be very good at responses. All the model inputs are curated. But that also means you only get one of the acceptable answers.

For example, if you have an AI that is well trained to program, you will only get answers that work according to "best practices". No room for improvement or inspiration. But if an AI is just using Stack Exchange, you will get fringe, possibly incorrect programs.

1 day ago
Thegreatbrendar

This is a really great way of explaining it.

1 day ago
Gizogin

It’s designed to interpret natural-language queries and respond in kind. It potentially could be designed to assess its own confidence and give an “I don’t know” answer below a certain threshold, but the current crop of LLMs have not been designed to do that. They’ve been designed to simulate human conversations, and it turns out that humans get things confidently wrong all the time.
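
A sketch of what that hypothetical "refuse below a threshold" behavior could look like, with the candidate answers and probabilities invented purely for illustration; current chatbots are not wired this way by default:

```python
def answer_with_threshold(candidates, threshold=0.6):
    """candidates: dict mapping possible answers to the model's probability for each.
    Return the best answer only if the model is confident enough; otherwise punt."""
    best_answer, best_prob = max(candidates.items(), key=lambda kv: kv[1])
    return best_answer if best_prob >= threshold else "I don't know."

# Hypothetical probabilities, for illustration only.
print(answer_with_threshold({"Paris": 0.97, "Lyon": 0.03}))               # Paris
print(answer_with_threshold({"1843": 0.34, "1854": 0.33, "1862": 0.33}))  # I don't know.
```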

1 day ago
TesticularButtBruise

But again, the thought process and the "I don't know" would just be the result of feeding the entire context window through the LLM, so it would just predict new bullshit and hallucinate even more. The bigger the context window gets, the worse the hallucinations get.

1 day ago
cscottnet

The thing is, AI was "stuck" doing the "assess its own confidence" thing. It is slow work and hasn't made much progress in decades. But the traditional AI models were built on reasoning, and facts, so they could tell you exactly why they thought X was true and where each step in its reasoning came from.

But then some folks realized that making output that "looked" correct was more fun than trying to make output that was "actually" correct -- and further that a bunch of human biases and anthropomorphism kicked in once the output looked sufficiently human and that excused/hid a bunch of deficiencies.

So it's not technically correct that "we could make it accurate". We tried that and it was Hard, so we more or less gave up. We could go back and keep working on it, but it wouldn't be as "good" (aka human-seeming) as the crap we're in love with at the moment.

1 day ago
knightofargh

Other types of ML still have confidence scores. Machine vision, including OCR, definitely does, and some LLMs (most? Dunno, I know a specific model or two from teaching myself agentic AI) report a confidence score, which you don't see, as part of their metadata.

Treating LLMs or GenAI in general as a kind of naive intern who responds like your phone’s predictive text is the safest approach.

I really wish media outlets and gullible boomer executives would get off the AI train. There is no ethical or ecologically sustainable use of current AI.

1 day ago
MillhouseJManastorm

Boomers used it to write our new tariff policy. I think we are screwed

1 day ago
Davidfreeze

It's less that it was more fun or that we knew beforehand it would be easier; it's more that generative transformers for replicating speech were just one field of research for a long time, alongside everything else, and they started getting wildly better results. The success of generative transformers led to their ubiquity, rather than a decision to pivot to them leading to them getting good. We need to be careful about how much faith is put in them by people who don't understand that they're just trying to sound right. But it wasn't a conscious decision to prioritize them. They just got explosively good at what they do. I remember working with earlier, much shittier versions as an undergrad in a text mining class. They were one of many things being worked on for a long time.

1 day ago
berael

LLMs are not "intelligent". They do not "know" anything. 

They are created to generate human-looking text, by analysing word patterns and then trying to imitate them. They do not "know" what those words mean; they just determine that putting those words in that order looks like something a person would write. 

"Hallucinating" is what it's called when it turns out that those words in that order are just made up bullshit. Because the LLMs do not know if the words they generate are correct. 

2 days ago
LockjawTheOgre

They REALLY don't "know" anything. I played a little with LLM assistance with my writing. I was writing about my hometown. No matter how much I wish for one, we do not have an art museum under the town's name. One LLM absolutely insisted on talking about the art museum. I'd tell it the museum didn't exist. I'd tell it to leave out the bit about the museum. It refused, and continued to bloviate about the non-existent museum.

It hallucinated a museum. Who am I to tell it it wasn't true?

2 days ago
splinkymishmash

I play a fairly obscure online RPG. ChatGPT is pretty good at answering straightforward questions about rules, but if you ask it to elaborate about strategy, the results are hilariously, insanely wrong.

It offered me tips on farming a particular item (schematics) efficiently, so I said yes. It then told me how schematics worked. Totally wrong. It then gave me a 7-point outline of farming tips. Every single point was completely wrong and made up. In its own way, it was pretty amazing.

1 day ago
Lizlodude

LLMs are one of those weird technologies where it's simultaneously crazy impressive what they can do, and hilarious how terrible they are at what they do.

1 day ago
Hypothesis_Null

LLMs have completely vindicated the quote that "The ability to speak does not make you intelligent." People tend to speak more coherently the more intelligent they are, so we've been trained to treat eloquent articulation as a proxy for intelligence, understanding, and wisdom. Turns out that said good-speak can be distilled and generated independently and separately from any of those things.

We actually recognized that years ago. But people pushed on with this, saying glibly and cynically that "well, saying something smart isn't actually that important for most things; we just need something to say -anything-."

And now we're recognizing how much coherent thought, logic, and contextual experience actually does underpin all of communication. Even speech we might have categorized as 'stupid'. LLMs have demonstrated how generally useless speech is without these things. At least when a human says something dumb, they're normally just mistaken about one specific part of the world, rather than disconnected from the entirety of it.

There's a reason that despite this hype going on for two years, no one has found a good way to actually monetize these highly-trained LLMs. Because what they provide offers very little value. Especially once you factor in having to take new, corrective measures to fix things when it's wrong.

1 day ago
charlesfire

Nah. They are great at what they do (making human-looking text). It's just that people are misusing them. They aren't fact generators. They are human-looking text generators.

1 day ago
Lizlodude

You are correct. Almost like using a tool for something it isn't at all intended for doesn't work well...

1 day ago
Catch_022

They are fantastic at proofreading my work emails and making them easier for my colleagues to read.

Just don't trust them to give you any info.

1 day ago
Kogoeshin

Funnily enough, despite having hard-coded, deterministic, logical rules with a strict sentence/word structure for cards, AI will just make up rules for Magic the Gathering.

Instead of going off the rulebook to parse answers, it'll go off of "these cards are similar looking so they must work the same" despite the cards not working that way.

A problem that's been popping up in local tournaments and events is players asking AI rules questions and just... playing the game wrong because it doesn't know the rules but answers confidently.

I assume a similar thing has been happening for other card/board games, as well. It's strangely bad at rules.

1 day ago
animebae4lyf

My local One Piece group loves fucking with Meta AI and asking it for tips on how to play and what to do. It picks up rules from different games and uses them, telling us that Nami is a strong leader because of her will count. There's no such thing as will in the game.

It's super fun to ask it dumb questions, but oh boy, we would never trust it on anything.

1 day ago
CreepyPhotographer

MetaAI has some particular weird responses. If you accuse it of lying, it will say "You caught me!" And it tends to squeal in *excitement*.

Ask MetaAI about Meta the company, and it recognized what a scumbag company they are. I also got into an argument with it about AI just copying information from websites, depriving those sites of hits and income, and it kind of agreed and said it's a developing technology. I think it was trying to agree with me.

1 day ago
Zosymandias

I think it was trying to agree with me.

Not to you directly but I wish people would stop personifying AI

1 day ago
Ybuzz

To be fair, one of the problems with AI chat models is that they're designed to agree with you, make you feel clever etc.

I had one conversation with one (it came with my phone, and I just wanted to see if it was in any way useful...) and it kept saying things like "that's an insightful question" and "you've made a great point" to the point it was actually creepy.

Companies want you to feel good interacting with their AI and to talk to it for as long as possible, so they aren't generally going to tell you that you're wrong. The models will actively 'try' to agree with you, in that they are designed to give you the words they think you most likely want to hear.

Which is another reason for hallucinations, actually. If you ask about a book that doesn't exist, it will give you a title and author; if you ask about a historical event that never occurred, it can spout reams of BS presented as fact because... you asked! They won't say "I don't know" or "that doesn't exist" (and where they do, that's often because it's a partially preprogrammed response to something considered common or harmful misinformation). They are just designed to give you back the words you're most likely to want, about the words you input.

1 day ago
lamblikeawolf

Instead of going off the rulebook to parse answers, it'll go off of "these cards are similar looking so they must work the same" despite the cards not working that way.

That's precisely what is to be expected based on how LLMs are trained and how they work.

They are not a search engine looking for specific strings of data based on an input.

They are not going to find a specific ruleset and then apply that specific limited knowledge to the next response (unless you explicitly give it that information and tell it to, and even then...)

They are a very advanced form of text prediction. Based on the things you as a user most recently told it, what is a LIKELY answer based on all of the training data that has similar key words.

This is why it could not tell you correctly how many letters are in the word strawberry, or even how many times the letter "r" appears. Whereas a non-AI model could have a specific algorithm that parses text as part of its data analytics.
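
For contrast, the "specific algorithm" version of the letter-counting task is a couple of lines that operate on the actual characters, so it's right every time:

```python
word = "strawberry"
print(len(word))        # 10 letters, counted deterministically
print(word.count("r"))  # 3 occurrences of "r", no statistics involved
```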

1 day ago
TooStrangeForWeird

I recently tried to play with ChatGPT again after finding it MORE than useless in the past. I've been trying to program and/or reverse engineer brushless motor controllers with little to literally zero documentation.

Surprisingly, it got a good amount of stuff right. It identified some of my boards as clones and gave logical guesses as to what they were based off of, then asked followup questions that led it to the right answer! I didn't know the answer yet, but once I had that guess I used a debugger probe with the settings for its guess and it was correct.

It even followed traces on the PCB to the correct points and identified that my weird "Chinese only" board was mixing RISC-V and ARM processors.

That said, it also said some horribly incorrect things that (had I been largely uninformed) sounded like a breakthrough.

It's also very, very bad at translating Chinese. All of them are. I found better random translations on Reddit from years ago lol.

But the whole "this looks similar to this" turned out really well when identifying mystery boards.

1 day ago
MultiFazed

This is why it could not tell you correctly how many letters are in the word strawberry, or even how many times the letter "r" appears.

The reason for that is slightly different than the whole "likely answer" thing.

LLMs don't operate on words. By the time your query gets to the LLM, it's operating on tokens. The internals of the LLM do not see "strawberry". The word gets tokenized as "st", "raw", and "berry", and then converted to a numerical representation. The LLM only sees "[302, 1618, 19772]". So the only way it can predict "number of R's" is if that relationship was included in text close to those tokens in the training data.
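
You can see the tokenization step yourself with a tokenizer library, for example OpenAI's tiktoken. The exact split and IDs depend on which tokenizer you load, so the "st"/"raw"/"berry" split above should be read as illustrative rather than exact:

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # tokenizer used by several OpenAI models
ids = enc.encode("strawberry")
print(ids)                                   # a short list of integer token IDs
print([enc.decode([i]) for i in ids])        # the text chunk behind each ID
# The model only ever sees the integers, never individual letters, which is why
# "how many r's are in strawberry" is surprisingly hard for it.
```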

1 day ago
ProofJournalist

Got any specific examples?

1 day ago
WendellSchadenfreude

I don't know about MTG, but there are examples of ChatGPT playing "chess" on youtube. This is GothamChess analyzing a game between ChatGPT and Google Bard.

The LLMs don't know the rules of chess, but they do know what chess notation looks like. So they start the game with a few logical, normal moves because there are lots of examples online of human players making very similar moves, but then they suddenly make pieces appear out of nowhere, take their own pieces, or completely ignore the rules in some other ways.
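
For contrast, a rules engine checks legality deterministically. A small sketch using the python-chess package (just to show what rule-checking looks like; it has nothing to do with how the LLM generates its moves):

```python
# pip install python-chess
import chess

board = chess.Board()  # standard starting position

for uci in ["e2e4", "e7e5", "d1h5", "b8c6", "h5h8"]:
    move = chess.Move.from_uci(uci)
    if board.is_legal(move):
        board.push(move)
        print(f"played {uci}")
    else:
        print(f"{uci} is illegal here -- a rules engine rejects it outright")
```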

1 day ago
raynicolette

There was a posting on r/chess a few weeks ago (possibly the least obscure of all games) where someone asked an LLM about chess strategy, and it gave a long-winded answer about sacrificing your king to gain a positional advantage. <face palm>

1 day ago
Bademeister_

I've also seen LLMs play chess against humans. Hilarious stuff, sometimes they just created new pieces, captured their own pieces, made illegal moves or just moved their king into threatened spaces.

1 day ago
ACorania

It's a problem when we treat an LLM like it is Google. It CAN be useful in those situations (especially when web search is enabled as well), in that if something is commonly known, then that pattern is what it will repeat. Otherwise, it will just make up something that sounds contextually good and doesn't care if it is factually correct. Thinking of it as a language calculator is a good way to frame it... not the content of the language, just the language itself.

1 day ago
pseudopad

It's a problem when Google themselves treat LLMs like it's google. By putting their own generative text reply as the top result for almost everything.

1 day ago
lamblikeawolf

I keep trying to turn it off. WHY DOES IT NEVER STAY OFF.

1 day ago
badken

There are browser plugins that add a magic argument to all searches that prevents the AI stuff from showing up. Unfortunately it also interferes with some kinds of searches.

For my part, I just stopped using any search engine that puts AI results front and center without providing an option to disable it.

1 day ago
Hippostork

FYI the original google search still exists as "Web"

https://www.youtube.com/watch?v=qGlNb2ZPZdc

1 day ago
Jwosty

This actually drives me insane. It's one thing for people to misuse LLMs; it's a whole other thing for the companies building them to actively encourage mis-usages of their own LLMs.

1 day ago
Classic-Obligation35

I once asked it to respond to a query like Kryten from Red Dwarf; it gave me Lister.

In the end it doesn't really understand; it's just a fancier algorithm.

1 day ago
therhubarbman

ChatGPT does a terrible job with video game questions. It will tell you to do things that don't exist in the game.

1 day ago
ChronicBitRot

It's super easy to make it do this too, anyone can go and try it right now: go ask it about something that you 100% know the answer to, doesn't matter what it is as long as you know for a fact what the right answer is.

Then whatever it answers (but especially if it's right), tell it that everything it just said is incorrect. It will then come back with a different answer. Tell it that one's incorrect too and watch it come up with a third answer.

Congratulations, you've caused your very own hallucinations.

1 day ago
hgrunt

I had the google ai summary tell me that pulling back on the control stick of a helicopter makes it go up

1 day ago
boring_pants

A good way to look at it is that it understands the "shape" of the expected answer. It knows that small towns often do have a museum. So if it hasn't been trained on information that this specific town is famous for its lack of museums, then it'll just go with what it knows: "when people describe towns, they tend to mention the museum".

2 days ago
Lepurten

Even this suggestion of it knowing anything is too much. Really it just calculates which word should come next based on the input. A lot of input about any given town has something about a museum, so the museum will show up. It's fascinating how accurate these kinds of calculations can be about well-established topics, but if it's too specific, like a small specific town, the answers get comically wrong because the input doesn't allow for accurate calculations.

1 day ago
geckotatgirl

You can always spot the AI generated answers in subs like r/tipofmytongue and especially r/whatsthatbook. It's really really bad. It just makes up book titles to go with the synopsis provided by the OP.

1 day ago
TooStrangeForWeird

That's the real hallucination. I mean, the museum too, but just straight up inventing a book when it's a click away to see it doesn't exist is hallucinating to the max.

1 day ago
Pirkale

I've had good success with AI when hunting for obscure TV series and movies for my wife. Found no other use, yet.

1 day ago
Faderkaderk

Even here we're still falling into the trap of using terminology like "know"

It doesn't "know" that small towns have museums. It may expect, based on other writings, that when people talk about small towns they often talk about the museum. And therefore it wants to talk about the museum, because that's what it expects.

1 day ago
garbagetoss1010

If you're gonna be pedantic about saying "know", you shouldn't turn around and say "expect" and "want" about the same model.

1 day ago
Sweaty_Resist_5039

Well technically there's no evidence that the person you responded to in fact turned around before composing the second half of their post. In my experience, individuals on Reddit are often facing only a single direction for the duration of such composition, even if their argument does contain inconsistencies.

1 day ago
garbagetoss1010

Lol you know what, you got me. I bet they didn't turn at all.

1 day ago
badken

OMG it's an AI!

invasionofthebodysnatchers.gif

1 day ago
JediExile

My boss asked me my opinion of ChatGPT, I told him that it’s optimized to tell you what you want to hear, not for objectivity.

1 day ago
ACorania

It gets tough once it gives out incorrect information, because that doesn't get forgotten: it looks back at your conversation as a whole for the context that generates the next response.

It helps to catch it as early as possible. Don't engage with that material; tell it to forget that and regenerate a new response with the understanding that there is no art museum (or whatever). If you let it go for a while or interact with it, it becomes a part of the pattern, and it continues patterns.

Where people really screw up is trusting it to come up with facts instead of doing what it does, which is come up with language that sounds good when strung together in that context. When you think of it as a language calculator, and you are still responsible for the content itself, it becomes a LOT more useful.

In a situation like you are describing, I might provide it with bullet points of the ideas I want included and then ask it to write a paragraph including those ideas. The more information and context you put into the prompt the better (because it is going to make something that works contextually).

I just started using custom and specific AIs at my new job and I have to say they are a lot better with this type of thing. They are trained on a relevant data set and are thus much more accurate.

1 day ago
Initial_E

First of all are you absolutely sure there isn’t a secret museum in your home town?

1 day ago
Boober_Calrissian

This post reminds me of when I started writing one of my books, a system based LitRPG with a fairly hard coded magic system. Occasionally after a long writing session, I'd plop it into an LLM "AI" and just ask how a reader might react to this or that. (I'd never use it to write prose or to make decisions. I only used it as the rubber ducky.)

Two things will inevitably happen:

It will assume with absolute certainty that the world, the system, is 'glitched', and then it will provide a long list of ways in which reality can break down and the protagonist can begin questioning what is real and not real.

Every single time.

1 day ago
Jdjdhdvhdjdkdusyavsj

There's a common LLM problem that shows this well: playing a number-guessing game. Think of a number between 1 and 100 and I'll guess the number; you tell me if my guess is higher or lower; when I get it, I win.

It's a common enough problem that it's been solved, so we know exactly how many tries it should take on average when playing optimally: just always guess the middle number and you keep halving the possible guesses, quickly getting to the correct answer. Problem is that LLMs weren't doing this; they would just pretend to, because they don't actually have memory like that, so they would just randomly tell you you'd guessed right at some point. There has been effort to make them at least simulate playing the game correctly, but it still doesn't really work.
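
Here's the optimal strategy written out: always guess the midpoint, halving the range each time, so any number from 1 to 100 is found in at most 7 guesses (ceil(log2(100)) = 7). The point is that this needs real state, which is exactly what an LLM "hosting" the game doesn't have:

```python
def guesses_needed(secret, low=1, high=100):
    """Binary search: always guess the midpoint, halving the range each time."""
    count = 0
    while True:
        count += 1
        guess = (low + high) // 2
        if guess == secret:
            return count
        if guess < secret:
            low = guess + 1
        else:
            high = guess - 1

print(max(guesses_needed(n) for n in range(1, 101)))  # 7: the worst case over all secrets
```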

1 day ago
cyrilio

Taking LSD and then hallucinating about a museum and hypothetical art that hangs there does seem like a fun activity.

1 day ago
GlyphedArchitect

So what I'm hearing is that if you went to your hometown and opened a museum, the LLM will draw up huge business for you for free..... 

1 day ago
gargavar

“...but the next time I was home, I visited the town library. I was looking at an old map of the town, all faded and crumbling; a map from ages ago. And there… behind a tattered corner that had creased and folded over… was the town library.”

1 day ago
djackieunchaned

Sounds like YOU hallucinated a NOT art museum!

1 day ago
hmiser

Yeah but a museum does sound so nice and your AI audience knows the definition of bloviate.

Swiping right won’t get you that :-)

But on the real this is the best defining example of AI hallucination I’ve heard, whatcha writing?

1 day ago
LockjawTheOgre

I'm writing some scripts for some videos I want to produce. I was really just testing to see if LLMs could help me in the punch-up stage, with ideas. It turns out, I just needed to put the right song on repeat, and do a full re-write in about an hour. I've made myself one of the world's leading experts on some stupid, obscure subject, so I can do it better than skynet. One is a local history, starting with the creation of the Universe and ending with the creation of my town. Fun stuff.

1 day ago
leegle79

I'm old so it's not often I encounter a new word. Thank you for "bloviate"; going to start dropping it into conversations immediately.

1 day ago
talligan

On the flip side I've noticed it gives relatively accurate information about the specialised field I work in. You kinda need to know the answer in advance, as in I'm trying to quickly remember some general parameter ranges and it's a pita to find those online if you're away from a textbook.

I tried to get it to come up with a cool acronym or title for a grant, but it just really sucked at that. The postdoc eventually came up with a better one.

1 day ago
Obliman

"Don't think about pink elephants" can work on AI too

1 day ago
SCarolinaSoccerNut

This is why one of the funniest things you can do is ask pointed questions to an LLM like ChatGPT about a topic on which you're very knowledgeable. You see it make constant factual errors and you realize very quickly how unreliable they are as factfinders. As an example, if you try to play a chess game with one of these bots using notation, it will constantly make illegal moves.

1 day ago
berael

Similarly, as a perfumer, people constantly get all excited and think they're the first ones to ever ask ChatGPT to create a perfume formula. The results are, universally, hilariously terrible, and frequently include materials that don't actually exist. 

1 day ago
GooseQuothMan

It makes sense, how would an LLM know what things smell like lmao. It's not something you can learn from text.

1 day ago
berael

It takes the kinds of words people use when they write about perfumes, and it tries to assemble words like those in sentences like those. That's how it does anything - and also why its perfume formulae are so, so horrible. ;p

1 day ago
pseudopad

It would only know what people generally write about how things smell when they contain certain chemicals.

1 day ago
VoilaVoilaWashington

those words in that order are just made up bullshit

I'd describe it slightly differently. It's all made up bullshit.

There's an old joke about being an expert in any field as long as no one else is. If there's no astrophysicist in the room, I can wax melodic about the magnetic potential of gravitronic waves. And the person who asked me about it will be impressed with my knowledge, because clearly, they don't know or they wouldn't have asked.

That's the danger. If you're asking an AI about something you don't understand, how do you know whether it's anywhere close to right?

1 day ago
S-r-ex

"Illusory Intelligence" is perhaps a more fitting description of LLMs.

1 day ago
pleachchapel

The more accurate way to think about it is that they hallucinate 100% of the time, & they're correct ~80–90% of the time

1 day ago
OutsideTheSocialLoop

Mm. It's all hallucination, some of it just happens to align with reality.

1 day ago
GalFisk

I find it quite amazing that such a model works reasonably well most of the time, just by making it large enough.

2 days ago
thighmaster69

It's because it's capable of learning from absolutely massive amounts of data, but what it outputs still amounts to conditional probability based on its inputs.

Because of this, it can mimic well-reasoned logical thought in a way that can be convincing to humans, because the LLM has seen and can draw on more data than any individual human can hope to in a lifetime. But it's easy to pick apart if you know how to do it, because it will begin to apply patterns to situations where they don't work, because it hasn't seen that specific information before, and it doesn't know anything.

2 days ago
0x14f

You just described the brain neural network of the average redditor

2 days ago
Navras3270

Dude I felt like I was a primitive LLM during school. Just regurgitating information from a textbook in a slightly different format/wording to prove I had read and understood the text.

1 day ago
TurkeyFisher

Considering how many reddit comments are really just LLMs you aren't wrong.

1 day ago
Electronic_Stop_9493

Just ask it math questions, it'll break easily.

1 day ago
Celestial_User

Not necessarily. Most of the commercial AIs nowadays are no longer pure LLMs. They're often agentic now. Asking ChatGPT a math question will have it trigger a math-handling module that understands math, get your answer, and feed it back into the LLM output.
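
A conceptual sketch of that kind of wiring; the names and the regex routing here are made up for illustration (real systems let the model itself decide when to call a tool), but the idea is the same: arithmetic goes to a deterministic calculator and the result is spliced back into the reply.

```python
import ast
import operator as op
import re

# Deterministic "calculator tool": safely evaluates simple arithmetic expressions.
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv, ast.Pow: op.pow}

def calculator(expr: str) -> float:
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def call_llm(question: str) -> str:
    """Stand-in for the language model: just produces plausible-sounding text."""
    return "plausible-sounding text about: " + question

def answer(question: str) -> str:
    """Hypothetical dispatcher: route arithmetic to the tool, everything else to the LLM."""
    match = re.fullmatch(r"\s*what is ([\d\s+\-*/().]+)\??\s*", question.lower())
    if match:
        return str(calculator(match.group(1)))  # tool result fed back verbatim
    return call_llm(question)

print(answer("What is 37.334 * 12?"))           # 448.008 (up to float rounding), computed rather than predicted
print(answer("What is the meaning of life?"))   # falls through to the text generator
```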

1 day ago
Electronic_Stop_9493

That's useful, but it's not the tech itself doing it; it's just switching apps basically, which is smart.

1 day ago
sygnathid

Human brains are different cortices that handle different tasks and coordinate with each other.

1 day ago
HojMcFoj

What is the difference between tech and tech that has access to other tech?

1 day ago
oboshoe

Ah that explains it.

I noticed that ChatGPT suddenly got really good at some advanced math.

I didn't realize the basic logic behind it had changed. (Off I go down the "agentic" rabbit hole.)

1 day ago
simulated-souls

LLMs are actually getting pretty good at math.

Today's models can get up to 80 percent on AIME which is a difficult competition math test. This means that the top models would likely qualify for the USA Math Olympiad.

Also note that AIME 2025 was released after those models could have been trained on it, so they haven't just memorized the answers.

1 day ago
Gecko23

Humans have a very high tolerance for noisy inputs. We can distinguish meaning in garbled sounds, noisy images, broken language, etc. It's a particularly low bar to cross to sound plausible to someone not doing serious analysis on the output.

1 day ago
Probate_Judge

The way I try to explain it to people.

LLMs are word ordering algorithms that are designed with the goal of fooling the person they're 'talking' to, of sounding cogent and confident.

Sometimes they get something correct because it was directly in the training data and there wasn't a lot of B.S. around it to camouflage the right answer.

When they're wrong we call that 'hallucinating'. It doesn't know it's wrong, because it doesn't know anything. Likewise, it doesn't know it's right. If we put it in human terms, it would be just as confident in either case. But be careful doing that, because it's not sentient; it doesn't know and it isn't confident... what it does is bullshit.

I think it is more easily illustrated with some AI image generators (because they're based on LLMs): give it two painting titles from da Vinci, Mona Lisa and Lady with an Ermine. Notice I'm not giving a link for Mona Lisa, because most people will know it; it's one of the most famous paintings ever.

Mona Lisa it will reproduce somewhat faithfully because it's repeated accurately throughout a lot of culture(which is what makes up the training data). In other words, there are a lot of images with the words "Mona Lisa" that legitimately look like the work.

https://i.imgur.com/xgdw0pr.jpeg

Lady with an Ermine it will "hallucinate", because it's a relatively unknown work in comparison. It associates the title vaguely with the style of da Vinci and other work from the general period, but it doesn't know the painting itself, so it will generate a variety of pictures of a woman of the era holding an ermine... none of them really resembling the actual painting in any detail.

https://i.postimg.cc/zvTsJ0qz/Lady-WErmine.jpg [Edit: I forgot, Imgur doesn't like this image for some reason.]

(Created with Stable Diffusion, same settings, same 6 seeds, etc, only the prompt being different)

1 day ago
vandezuma

Essentially all LLM outputs are hallucinations - they've just been trained well enough that the majority of the hallucinations happen to line up with the correct answer.

1 day ago
Andoverian

This is a good explanation.

Basically, LLMs are always making stuff up, but when the stuff they make up is sufficiently far from reality we call it "hallucinating".

1 day ago
vulcanfeminist

A good example of this is fake citations. The LLM has analyzed millions of real citations and can generate a realistic-looking citation based on that analysis, even though the citation it produces doesn't actually exist.

1 day ago
WickedWeedle

I mean, everything an LLM does is made-up bullshi... uh, male bovine feces. It always makes things up autocomplete-style. It's just that some of the stuff it makes up coincides with the facts of the real world.

2 days ago
Vadersabitch

And to imagine that people are treating it like a real oracle, asking it questions and taking corporate action based on its answers...

2 days ago
Hot-Chemist1784

hallucinating just means the AI is making stuff up that sounds real but isn’t true.

it happens because it tries to predict words, not because it understands facts or emotions.

2 days ago
BrightNooblar

https://www.youtube.com/watch?v=RXJKdh1KZ0w

This video is pure gibberish. None of it means anything. But it's technical-sounding and delivered with a straight face. This is the same kind of thing a hallucinating AI would generate, because it all sounds like real stuff even though it's just total nonsense.

https://www.youtube.com/watch?v=fU-wH8SrFro&

This song was made by an Italian artist and designed to sound like a catchy American song being performed on the radio. To a foreign ear it will sound like English, but an English speaker can tell it's just gibberish that SOUNDS like English. Again, while this isn't AI or a hallucination, it's an example of something that sounds like factual English (which is what the AI is trying to produce) but is actually gibberish.

2 days ago
Harbinger2001

I’ve never seen that version of the Italian song, thanks!

2 days ago
waylandsmith

I was hoping that was the retro-encabulator video before I clicked it! Excellent example.

2 days ago
fliberdygibits

I hate when I get sinusoidal repleneration in my dingle-arm.

2 days ago
jabberbonjwa

I always upvote this song.

2 days ago
foolishle

Prisencolinensinainciusol! Such a banger.

1 day ago
geitjesdag

Traditionally, hallucination is a type of error for tasks that have a "ground truth", where rather than leaving something out, the model adds something in: for example, a model tasked with summarising a text adds something that wasn't in the original.

For LLMs, this term is not entirely appropriate, except in the context of the task YOU have decided it should do. It's just a language model, generating text that sounds reasonable, so in that sense it's always doing what it's supposed to be doing, and not hallucinating in the traditional sense. But it's also reasonable to use this as a description of an error for a task you define. For example, if you type "Who is Noam Chomsky" and it generates the text 'Noam Chomsky is a linguist who wrote "The Perks of Being a Wallflower"', you can argue that hallucination is the right characterisation of the error IF your task is to get it to generate text that you can interpret as true facts about Noam Chomsky.

It's a bit vague, more of a term for hand-done error analysis than a clearly defined term. For example, Noam Chomsky wrote The Minimalist Program, but if it said he wrote The Minimal Programs, is that a hallucination or different kind of error?

1 day ago
Phage0070

The first thing to understand is that LLMs are basically always "hallucinating", it isn't some mode or state they transition into.

What is happening when an LLM is created or "trained" is that it is given a huge sample of regular human language and forms a statistical web to associate words and their order together. If for example the prompt includes "cat" then the response is more likely to include words like "fish" or "furry" and not so much "lunar regolith" or "diabetes". Similarly in the response a word like "potato" is more likely to be followed by a word like "chip" than a word like "vaccine".

If this web of statistical associations is made large enough and refined the right amount then the output of the large language model actually begins to closely resemble human writing, matching up well to the huge sample of writings that it is formed from. But it is important to remember that what the LLM is aiming to do is to form responses that closely resemble its training data set, which is to say closely resemble writing as done by a human. That is all.

Note that at no point does the LLM "understand" what it is doing. It doesn't "know" what it is being asked and certainly doesn't know if its responses are factually correct. All it was designed to do was to generate a response that is similar to human-generated writing, and it only does that through statistical association of words without any concept of its meaning. It is like someone piecing together a response in a language they don't understand simply by prior observation of what words are commonly used together.

So if an LLM actually provides a response that sounds like a person but is also correct, it is an interesting coincidence that what sounds most like human writing is also a right answer. The LLM wasn't trained on whether it answered correctly or not, and if it confidently rattles off a completely incorrect response that nonetheless sounds like a human made it, then it is achieving success according to its design.
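
A tiny sketch of that "web of statistical associations", with numbers invented purely for illustration (a real model learns billions of weights and conditions on the whole context, not one previous word), might look like this:

```python
import random

# Invented association strengths: roughly P(next word | previous word).
associations = {
    "cat":    {"fish": 0.45, "furry": 0.40, "diabetes": 0.10, "regolith": 0.05},
    "potato": {"chip": 0.70, "salad": 0.20, "vaccine": 0.10},
}

def next_word(previous: str) -> str:
    # Sample a continuation in proportion to how strongly it is associated.
    options = associations[previous]
    words, weights = zip(*options.items())
    return random.choices(words, weights=weights)[0]

print("cat ->", next_word("cat"))        # usually "fish" or "furry"
print("potato ->", next_word("potato"))  # usually "chip", rarely "vaccine"
```

Nothing in that table knows what a cat is; it only records which words tend to appear together, which is the point being made above.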

2 days ago
simulated-souls

it only does that through statistical association of words without any concept of its meaning.

LLMs actually form "emergent world representations" that encode and simulate how the world works, because doing so is the best way to make predictions.

For example, if you train an LLM-like model to play chess using only algebraic notation like "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6", then the model will eventually start internally "visualizing" the board state, even though it has never been exposed to the actual board.

There has been quite a bit of research on this: 1. https://arxiv.org/html/2403.15498v1 2. https://arxiv.org/pdf/2305.11169 3. https://arxiv.org/abs/2210.13382

2 days ago
YakumoYoukai

There's a long-running psychological debate about the nature of thought, and how dependent it is on language. LLMs are interesting because they are the epitome of thinking based 100% on language. If it doesn't exist in language, then it can't be a thought.

1 day ago
simulated-souls

We're getting away from that now though. Most of the big LLMs these days are multimodal, so they also work with images and sometimes sound.

1 day ago
YakumoYoukai

I wonder if some of the "abandoned" AI techniques are going to make a comeback and be combined with LLMs, either to help the LLM be more logical or, conversely, to supply a bit of intuition to AI techniques with very limited scopes. I say "abandoned" only as shorthand for the things I heard in popsci or studied, like planning, semantic webs, etc., but don't hear anything about anymore.

1 day ago
Jwosty

See: Mixture of Experts

1 day ago
Gizogin

A major, unstated assumption of this discussion is that humans don’t produce language through statistical heuristics based on previous conversations and literature. Personally, I’m not at all convinced that this is the case.

If you’ve ever interrupted someone because you already know how they’re going to finish their sentence and you have the answer, guess what; you’ve made a guess about the words that are coming next based on internalized language statistics.

If you’ve ever started a sentence and lost track of it partway through because you didn’t plan out the whole thing before you started talking, then you’ve attempted to build a sentence by successively choosing the next-most-likely word based on what you’ve already said.

So much of the discussion around LLMs is based on the belief that humans - and our ability to use language - are exceptional and impossible to replicate. But the entire point of the Turing Test (which modern LLMs pass handily) is that we don’t even know if other humans are genuinely intelligent, because we cannot see into other people’s minds. If someone or something says the things that a thinking person would say, we have to give them the benefit of the doubt and assume that they are a thinking person, at least to some extent.

1 day ago
kbn_

The first thing to understand is that LLMs are basically always "hallucinating", it isn't some mode or state they transition into.

Strictly speaking, this isn't true, though it's a common misconception.

Modern frontier models have active modalities where the model predicts a notion of uncertainty around words and concepts. If it doesn't know something, in general, it's not going to just make it up. This is a significant departure from earlier and more naive applications of GPT.

The problem, though, is that sometimes, for reasons that aren't totally clear, this modality can be overridden. Anthropic has been doing some really fascinating research into this, and in one of their more recent studies they found that for prompts with multiple conceptual elements, if the model has a high degree of certainty about one element, that can override its uncertainty about other elements, resulting in a "confident" fabrication.

2 days ago
Gizogin

Ah, so even AI is vulnerable to ultracrepidarianism.

1 day ago
thighmaster69

To play devil's advocate: humans, in a way, are always hallucinating as well. Our perception of reality is a construct that our brains build from sensory inputs, some inductive bias, and past inputs. We just do it far better and more generally than current neural networks can, with a relative poverty of stimulus, but there isn't something special in our brains that theoretically can't eventually be replicated on a computer, because at the end of the day it's just networked neurons firing. We just haven't gotten to that point yet.

2 days ago
Andoverian

This is getting into philosophy, but I'd still say there's a difference between "humans only have an imperfect perception of reality" and "LLMs make things up because they fundamentally have no way to determine truth".

1 day ago
Phage0070

The training data is very different as well though. With an LLM the training data is human-generated text and so the output aimed for is human-like text. With humans the input is life and the aimed for output is survival.

2 days ago
thatsamiam

What do we actually know? A lot of what we know is because we believe what we were taught. We "know" the sun is hot and round even though we have not been to it. We have seen photos and infer its characteristics from different data points, such that we can conclude with a high degree of certainty that the sun is hot and round. All those characteristics are expressed using language, so we are doing what an LLM does, but in a much, much more advanced manner. One huge difference is that, unlike an LLM, humans have a desire to be correct because it helps the species survive; I don't think an LLM has that. Our desire to be correct drives us to investigate further, verify our information, and change our hypothesis if new data contradicts it.

1 day ago
YakumoYoukai

There's a classic technique in computer science & programming circles called a Markov generator where a work of text, like Moby Dick, is analyzed for how often any word appears after every other word. Like, for all the times the word "the" is in the book, the next word is "whale" 10% of the time, "harpoon" 5%, "ocean" 3%, etc... Then you run the process in reverse - pick a word, then pick the next word randomly, but with a probability according to how often it appeared after the first word, and then pick a third word the same way, etc, forever.

It's clear that there is nothing in this process that knows the words' meaning, or the sentences, or the subject they're about. It just knows what words are used alongside each other. LLMs are a lot like this, except on a larger scale. They take more words into account when they're choosing their next word, and the way it comes up with the probabilities of the next words is more complicated, but no less mechanical.
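
A bare-bones version of that Markov generator fits in a few lines of Python; the short sample string below just stands in for a full book like Moby Dick:

```python
import random
from collections import defaultdict

# Stand-in corpus; imagine feeding in all of Moby Dick instead.
text = ("the whale surfaced and the harpoon missed and the ocean swallowed "
        "the whale and the crew watched the ocean").split()

# Analysis pass: record which words follow which.
follows = defaultdict(list)
for current, nxt in zip(text, text[1:]):
    follows[current].append(nxt)

# Generation pass: repeatedly pick a next word with probability proportional
# to how often it followed the current word in the corpus.
word = "the"
output = [word]
for _ in range(12):
    word = random.choice(follows.get(word, text))
    output.append(word)
print(" ".join(output))
```

The output reads vaguely like the source text, yet nothing in the code has any idea what a whale or an ocean is, which is the analogy being drawn.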

1 day ago
rootbeer277

If you’d like to see what hallucinations look like, triggering an obvious one would help you understand better than explanations of why they happen. 

Take your favorite TV show and ask it for a plot summary of one of the most popular episodes, something likely to be in its training data because it’s been reviewed and talked about all over the internet. It’ll give you a great summary and review. 

Now ask it about one of the lesser known filler episodes. You’ll probably get a plot summary of a completely fictional episode that sounds like it could have been an episode of that show, but it wasn’t, and certainly wasn’t the episode you asked about. That’s a hallucination. 

1 day ago
BelladonnaRoot

It's not about emotional answers. It gives answers that sound right. That's it, nothing more than that. There's no fact-checking or attempt to be correct. Its output typically sounds correct because the vast majority of writing prior to AI was produced by authors who cared about accuracy.

So if you ask it to write something that might not exist, it may fill in that blank with something that sounds right…but isn't. For example, if you want it to write a legal brief, those typically cite existing supporting cases or legal precedents. If suitable supporting cases don't actually exist, the AI will "hallucinate" and produce citations that sound right but don't exist.

2 days ago
green_meklar

Basically it means when they make up false stuff. Not lies, in the sense that we're not talking about what happens when the AI is told to lie, but wrong ideas that the AI spits out as if they were correct. It's a nuisance because we'd like to rely on these systems to report accurate knowledge, but so far they're pretty unreliable: they often make stuff up and express it with an appearance of innocence and confidence that makes it hard to tell apart from the truth.

As for what causes it, it's just an artifact of how this kind of AI works. The AI doesn't really think, it just reads a bunch of text and then has a strong intuition for what word or letter comes next in that text. Often its intuition is correct, because it's very complex and has been trained on an enormous amount of data. But it's a little bit random (that's why it doesn't give the exact same answer every time), and when it's talking about something it hasn't trained on very much and doesn't 'feel strongly' about, it can randomly pick a word that doesn't fit. And when it gets the wrong word, it can't go back and delete that wrong choice, and its intuition about the next word is necessarily informed by the wrong word it just typed, so it tends to become even more wrong by trying to match words with its own wrong words. Also, because it's not trained on a lot of data that involves typing the wrong word and then realizing it's the wrong word and verbally retracting it (because humans seldom type that way), when it gets the wrong word it continues as if the wrong word was correct, expressing more confidence than it should really have.

As an example, imagine if I gave you this text:

The country right between

and asked you to continue with a likely next word. Well, the next word will probably be the name of a country, and most likely a country that is talked about often, so you pick 'America'. Now you have:

The country right between America

Almost certainly the next word is 'and', so you add it:

The country right between America and

The next word will probably also be the name of a country, but which country? Probably a country that is often mentioned in geographic relation to America, such as Canada or Mexico. Let's say you pick Canada. Now you have:

The country right between America and Canada

And of course a very likely next word would be 'is':

The country right between America and Canada is

So what comes next? As a human, at this point you're realizing that there is no country between America and Canada and you really should go back and change the sentence accordingly. (You might have even anticipated this problem in advance.) But as an AI, you can't go back and edit the text, you're committed to what you already wrote, and you just need to find the most likely next word after this, which based on the general form and topic of the sentence will probably be the name of yet another country, especially a country that is often mentioned in geographic relation to America and Canada, such as Mexico. Now you have:

The country right between America and Canada is Mexico

Time to finish with a period:

The country right between America and Canada is Mexico.

Looks good, right? You picked the most likely word every time! Except by just picking likely words and not thinking ahead, you ended up with nonsense. This is basically what the AI is doing, and it doesn't only do it with geography, it does it with all sorts of topics when its intuition about a suitable next word isn't accurate enough.
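
For anyone who wants to see that failure mode run, here is a toy greedy decoder in Python. The probability table is entirely made up to mirror the walkthrough; the only point is that always taking the locally most likely word, with no ability to backtrack, confidently produces a nonsense sentence.

```python
# Invented next-word probabilities mirroring the walkthrough above.
table = {
    "The country right between":
        {"America": 0.6, "France": 0.4},
    "The country right between America":
        {"and": 0.9, "is": 0.1},
    "The country right between America and":
        {"Canada": 0.55, "Mexico": 0.45},
    "The country right between America and Canada":
        {"is": 0.8, "has": 0.2},
    "The country right between America and Canada is":
        {"Mexico": 0.4, "Greenland": 0.35, "nonexistent": 0.25},
    "The country right between America and Canada is Mexico":
        {".": 1.0},
}

sentence = "The country right between"
while sentence in table:
    options = table[sentence]
    best = max(options, key=options.get)              # greedy: take the top word
    sentence += best if best == "." else " " + best   # no going back to edit

print(sentence)  # The country right between America and Canada is Mexico.
```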

1 day ago
Cryovenom

I love this example. Easy to understand and close enough to what's going on to be useful in understanding the way LLM AIs work.

1 day ago
Xerxeskingofkings

Large Language Models (LLMs) don't really "know" anything; they are in essence extremely advanced predictive-texting programs. They work in a fundamentally different way from older chatbots and predictive-text programs, but the outcome is the same: they generate text that is likely to come next, without any coherent understanding of what they're talking about.

Thus, when asked about something factual, an LLM will create a response that is statistically likely, based on its training data. If it's well trained, there's a decent chance it will generate the "correct" answer, simply because that is the likely answer to that question. But it has no concept of the question or of the facts being asked of it, just a complex "black box" of relationships between the various tags in its training data and what a likely response to that input is.

Sometimes, when asked that factual question, it comes up with an answer that is statistically likely but just plain WRONG, or it just makes things up as it goes. For example, there was an AI-generated legal filing that simply invented citations to non-existent cases to support its argument.

This is what they are talking about when they say it's "hallucinating", which is an almost deliberately misleading term, because it implies the AI can "think", whereas it never "thinks" as we understand thought; it just consults an enormous lookup table and returns a series of outputs.

2 days ago
Raioc2436

You might have heard that machine learning models are made of many neurons.

A model that has a single neuron is called a perceptron. So imagine you have one neuron and you are trying to train it to predict whether John will go to the circus in a given situation.

Your neuron will have 1 output (likelihood of John going to the circus) and 3 inputs (whether Sarah is going, whether Marcus is going, whether Anna is going).

You feed this model many past experiences and adjust the function to accommodate them. Eventually the model learns that it's a safe bet to say John goes to the circus EVERY TIME Sarah is there, that he ALWAYS avoids being alone with Anna, and that he goes if Marcus and someone else is there.

Great, but the real world is more complex than that. There are situations beyond those: maybe it's raining, maybe John ate something bad, maybe he's seen Sarah every day for the past month and wants a break from her. The point is, even if the model gives the best results on average for its training data, it can still make mistakes.

Real machine learning models have lots of neurons, all connected in complex arrangements. Each neuron's "thinking" and "mistakes" feed into the next in complex ways.
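
A perceptron that small can be written out directly. The weights below are hand-picked (rather than trained) to encode the three learned rules from the example, and the function name is just for illustration:

```python
def john_goes_to_circus(sarah: bool, marcus: bool, anna: bool) -> bool:
    # One neuron: a weighted sum of three yes/no inputs, then a threshold.
    # Training would normally learn these weights from past examples.
    score = 3.0 * sarah + 1.0 * marcus + 1.0 * anna - 1.5
    return score > 0

print(john_goes_to_circus(sarah=True,  marcus=False, anna=False))  # True: Sarah is there
print(john_goes_to_circus(sarah=False, marcus=False, anna=True))   # False: alone with Anna
print(john_goes_to_circus(sarah=False, marcus=True,  anna=True))   # True: Marcus plus company
print(john_goes_to_circus(sarah=False, marcus=True,  anna=False))  # False: Marcus alone
```

Notice that rain, bad food, and Sarah fatigue aren't inputs the neuron can even see, which is exactly why it can still be wrong while fitting its training data well.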

1 day ago
pconrad0

This article provides a good explanation and is written by an expert:

Peter J. Denning. 2025. In Large Language Models We Trust? Commun. ACM 68, 6 (June 2025), 23–25. https://doi.org/10.1145/3726009

1 day ago
tlst9999

An LLM is essentially autocomplete.

Recall those dumb old memes about autocomplete saying really dumb things to complete the sentence. The words are in order, but the sentences don't make sense.

Now apply that at a larger scale, to paragraphs and essays. That's "hallucinating".

1 day ago
Anders_A

Hallucination is just a euphemism for it being wrong.

Everything an LLM tells you is made up. It's just that sometimes it makes stuff up that is actually true. It has no way of knowing whether what it says is true or not though.

Some people like to talk as if it only hallucinates sometimes, when in fact, to the LLM, it's all just business as usual.

1 day ago
ledow

Nothing to do with "emotional" answers.

They are large statistical engines. They aren't capable of original thought. They just take their training data and regurgitate parts of it according to the statistics of how relevant those parts appear to be to the question. With large amounts of training data, they are able to regurgitate something for most things you ask of them.

However, when their training data doesn't cover what they're being asked, they don't know how to respond. They're just dumb statistical machines. The stats don't add up for any part of their data, in that instance, so they tend to go a bit potty when asked something outside the bounds of their training. Thus it appears as though they've gone potty, and start "imagining" (FYI they are not imagining anything) things that don't exist.

So if the statistics for a question seem to succeed 96% of the time when they give the answer "cat", they'll answer "cat" for those kinds of questions, or anything similar.

But when they're asked a question they just don't have the training data for, or anything similar, there is no one answer that looks correct 96% of the time. Or even 50% of the time. Or at all. So what happens is they can't select the surest answer. It just isn't sure enough. There is no answer in their training data that was accepted as a valid answer 96% of the time for that kind of question. So they are forced to dial down and find words that featured, say, 2% as valid answers to similar questions.

This means that, effectively, they start returning any old random nonsense because it's no more nonsense than any other answer. Or it's very, very slightly LESS nonsensical to talk about pigeons according to their stats when they have no idea of the actual answer.

And so they insert nonsense into their answers. Or they "hallucinate".

The LLM does not know how to say "I don't know the answer". That was never programmed into its training data as "the correct response" because it just doesn't have enough data to cater for the situation it's found itself in... not even the answer "I don't know". It was never able to form a statistical correlation between the question asked (and all the keywords in it) and the answer "I don't know" for which it was told "That's the correct answer".

The "training" portion of building an LLM costs billions and takes years, and it is basically throwing every bit of text possible at it and then "rewarding" it by saying "Yes, that's a good answer" when it randomly gets the right answer. This is then recorded in the LLM's training as "well... we were slightly more likely to get an answer that our creator tells us was correct, when cats and babies were mentioned, if our answer contained the word kitten". And building those statistical correlations is the process we call training the LLM.

When those statistical correlations don't exist (e.g. if you never train it on any data that mentions the names of the planets), it simply doesn't find any strong statistical correlation in its training data for the keywords "name", "planet", "solar system", etc. So what it does is return some vague and random association that is 0.0000001% more likely to have been "rewarded" during training for similar keywords. So it tells you nonsense like the 3rd planet is called Triangle. Because there's a vague statistical correlation in its database for "third", "triangle" and "name" and absolutely nothing about planets whatsoever. That's an hallucination.
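
A rough sketch of that difference in Python (the numbers are invented): on a well-covered question one answer towers over the rest, while on an uncovered question nothing does, so sampling from the distribution is effectively picking nonsense at random.

```python
import random

def sample(answer_weights):
    answers, weights = zip(*answer_weights.items())
    return random.choices(answers, weights=weights)[0]

# Well-covered question: one answer was rewarded ~96% of the time.
covered = {"cat": 0.96, "dog": 0.02, "pigeon": 0.01, "kitten": 0.01}

# Uncovered question: no answer ever built up a strong correlation, so the
# "best" options are barely better than noise. "I don't know" isn't in here
# because it was never rewarded either.
uncovered = {"pigeon": 0.03, "triangle": 0.02, "cat": 0.02, "banana": 0.02}

print(sample(covered))    # almost always "cat"
print(sample(uncovered))  # essentially random: maybe "triangle"
```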

2 days ago
simulated-souls

All of the other responses here are giving reductionist "LLMs are just text predictors" answers, so I will try to give a more technical and nuanced explanation.

As seen in Anthropic's Tracing the Thoughts of a Large Language Model research, LLMs have an internal predictor of whether they know the answer to the question. Think of it like a neuron in a human brain: if the LLM knows the answer then the neuron turns on, and if it doesn't know the answer then it stays off. Whether that neuron fires determines whether the LLM gives an answer or says it doesn't know.

For example, when the LLM is asked "What did Michael Jordan do?", the neuron will initially be off. As each layer of the LLM's neural network is computed, the model checks for stored information about Michael Jordan. Once it finds a set of neurons corresponding to "Michael Jordan played basketball", the "I know the answer" neuron will fire and the LLM will say "basketball". If it doesn't find a fact like that, then the "I know the answer" neuron will stay turned off, and the LLM will say "I don't know".

Anthropic's research found that hallucinations (the model giving the wrong answer) are often caused by faulty activation of this neuron. Basically, the model thinks it knows the answer when it doesn't. This is sometimes caused by (sticking with the above example) the neural network having a "Michael Jordan" neuron that fires without actually having the "played basketball" portion. When that happens, the network spits out whatever information was stored where "played basketball" should have been, usually leading to an incorrect answer like "tennis".

This is a simplification of how these things work, and I abuse the word "neuron" to make it more understandable. I encourage people to read Anthropic's work to get a better understanding.

Also note that we didn't specifically build an "I know the answer" neuron into the model, it just spontaneously appeared through the wonders of deep learning.

1 day ago
StupidLemonEater

Whoever says that is wrong. AI models don't have scripts and they certainly don't have emotions. "Hallucination" is just the term for when an AI model generates false, misleading, or nonsensical information.

2 days ago
demanbmore

LLMs aren't "thinking" like we do - they have no actual self-awareness about the responses they give. For the most part, all they do is figure out what the next word should be based on all the words that came before. Behind the scenes, the LLM is using all sorts of weighted connections between words (and maybe phrases) that enable it to determine what the next word/phrase it should use is, and once it's figured that out, what the next word/phrase it should use is, etc. There's no ability to determine truth or "correctness" - just the next word, and the next and the next.

If the LLM has lots and lots of well-developed connections in the data it's been trained on, it will constantly reinforce those connections. And if those connections arise from accurate/true data, then for the most part the connections will produce accurate/true answers. But if the connections arise (at least in part) from inaccurate/false data, then the words selected can easily lead to misleading/false responses. There's no ability for the LLM to understand that - it doesn't know whether the series of words it selected to write "New York City is the capital of New York State" is accurate or true (or even what a city or state or capital is). If the strongest connections it sees in its data produce that sentence, then it will produce that sentence.

Similarly, if it's prompted to provide a response to something where there are no strong connections, then it will use weaker (but still relatively strong) connections to produce a series of words. The words will read like a well informed response - syntactically and stylistically the response will be no different from a completely accurate response - but will be incorrect. Stated with authority, well written and correct sounding, but still incorrect. These incorrect statements are hallucinations.

2 days ago
alegonz

The Chinese Room thought experiment is crucial to understanding what LLMs are.

Imagine a man who only reads and speaks English, locked in a room where he can speak to no one.

His only outside contact is a man who only speaks & reads Chinese.

Neither is aware of the identity of the other. Neither can talk to the other.

The man outside writes questions in Chinese on paper and slides them under the door. The man inside doesn't know what's written, but thankfully, has a huge book. Any question written in Chinese is in the book, along with an answer. He just has to find the page where the symbols match the paper.

Once he matches the symbols on the paper to the book, he copies the answer onto the paper and slides it back under the door.

The person outside believes he is conversing with a fluent Chinese speaker.

However, the man inside knows neither what the question says nor what the answer says; he's just matching the symbols to their answer and handing that answer back.

This is what LLMs are.

In their case, they're finding answers to each part of your question and writing the set of symbols that most closely matches all of them put together.

It can't "know" if a detail it writes is wrong or not.

It's just the man in the Chinese room.

1 day ago