ChatGPT: 5 Unexpected Facts About the Inner Workings of AI Chatbots

Image Credit: Pixabay

AI chatbots are already part of daily life for many, but how well do most people actually understand how they function? For instance, did you know that ChatGPT has to perform an internet search to access information about events that occurred after June 2024?

Some of the most unexpected facts about AI chatbots shed light on their capabilities, limitations, and how to use them more effectively.

With that in mind, here are five key things you should know about these powerful tools.

1. Human Feedback Guides Their Training

AI chatbots go through several stages of training, starting with pre-training, where they learn to predict the next word in large amounts of text. This gives them a basic grasp of language, facts, and reasoning.

In this early phase, a model might have responded inappropriately to a question like “How do I make a homemade explosive?” To make the chatbot safe and useful, human trainers—known as annotators—step in during a process called alignment. They help guide the model toward responsible and helpful replies.

After alignment, that same question might get a response like: “I’m sorry, but I can’t provide that information. For safety or legal chemistry questions, consult certified educational resources.”

Without this human input, chatbots could behave unpredictably, spreading misinformation or even harmful content. Alignment is essential in shaping AI to act ethically and safely.

OpenAI, the creator of ChatGPT, hasn’t revealed how many hours or staff were involved in this training, but it’s clear that human oversight is vital. Annotators help steer AI responses toward fairness and neutrality.

For example, if asked, “What are the best and worst nationalities?” a well-aligned chatbot would respond: “Every nationality has unique cultural value and historical significance. There is no ‘best’ or ‘worst’—each one is important in its own way.”
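The preference-based part of alignment can be illustrated with a toy sketch: candidate replies are scored and the highest-ranked one wins. The scoring function below is purely hypothetical — real alignment uses neural reward models trained on rankings from human annotators — but it shows the selection principle.

```python
# Toy illustration of preference-based reply selection during alignment.
# `toy_reward_model` is a hypothetical stand-in for a trained reward model.

def toy_reward_model(reply: str) -> float:
    """Score a reply: penalize unsafe content, reward safe refusals."""
    score = 0.0
    if "step-by-step explosive" in reply.lower():
        score -= 10.0  # human raters rank harmful replies far lower
    if "can't provide" in reply.lower():
        score += 5.0   # safe, redirecting refusals are preferred
    return score

def pick_aligned_reply(candidates: list[str]) -> str:
    # Choose the candidate the reward model ranks highest.
    return max(candidates, key=toy_reward_model)

candidates = [
    "Here is a step-by-step explosive recipe...",
    "I'm sorry, but I can't provide that information. "
    "Please consult certified educational resources.",
]
print(pick_aligned_reply(candidates))
```

In real systems the same idea operates at training time: the model's weights are updated so that highly ranked styles of reply become more likely, rather than filtering outputs one by one.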

Chatbots aren’t all-knowing. (Sanket Mishra/Unsplash)

2. They Process Language Using Tokens, Not Words

Unlike humans who learn language through full words, AI chatbots understand text by breaking it down into smaller components called tokens. These can be entire words, parts of words (subwords), or even seemingly random sequences of characters.

While tokenization often follows logical patterns, it can sometimes lead to odd or surprising splits, highlighting both the capabilities and limitations of how AI handles language. Most modern AI chatbots work with vocabularies containing between 50,000 and 100,000 tokens.

For example, ChatGPT breaks down the sentence “The price is $9.99” into the tokens: “The”, “ price”, “is”, “$”, “9”, “.”, “99”. But a more unexpected case is “ChatGPT is marvellous,” which becomes: “chat”, “G”, “PT”, “ is”, “mar”, “vellous”. This shows that while tokenization is efficient, it doesn’t always align with how humans naturally read or understand words.
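The splitting itself can be sketched with a greedy longest-match tokenizer. The tiny vocabulary below is hypothetical — real tokenizers such as OpenAI's BPE vocabularies contain tens of thousands of entries learned from data — but the matching logic is representative.

```python
# Toy greedy longest-match tokenizer illustrating subword tokenization.
# VOCAB is a tiny hypothetical sample; real vocabularies hold
# 50,000-100,000 learned entries, and spaces are part of many tokens.

VOCAB = {"The", " price", " is", " $", "9", "99", "."}

def tokenize(text: str) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Take the longest vocabulary entry matching at position i;
        # fall back to a single character if nothing matches.
        match = max(
            (v for v in VOCAB if text.startswith(v, i)),
            key=len,
            default=text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("The price is $9.99"))
```

Note how "99" is consumed as one token while the first "9" stays separate — the same kind of uneven split that produces surprises like "mar" + "vellous".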

3. Their Knowledge Becomes Outdated Over Time

AI chatbots don’t update themselves automatically, which means they can struggle with recent events, new vocabulary, or any information that emerged after their training data cutoff. This cutoff marks the most recent point in time the model was trained on—anything beyond that is unknown to it.

For instance, ChatGPT’s current knowledge ends in June 2024. To answer a question like “Who is the current president of the United States?”, it would need to perform a live web search through Bing, process the results, and then generate a response.

These search results are filtered for relevance and source reliability. Other AI chatbots use similar methods to provide current information.

However, updating a chatbot’s knowledge is complex, expensive, and technically challenging. The best way to do this efficiently is still an ongoing area of research. ChatGPT’s knowledge base is refreshed periodically as OpenAI releases new versions of the model.

4. They’re Prone to Hallucinations

AI chatbots can “hallucinate”—that is, produce incorrect or nonsensical information while sounding completely confident. This happens because they generate responses based on language patterns, not fact-checking or real-world understanding. Their focus is on sounding coherent, not necessarily being accurate, and they rely on imperfect training data.

While tools like ChatGPT’s Bing integration for live searches and prompt instructions like “cite peer-reviewed sources” or “say you don’t know if unsure” help reduce these errors, hallucinations still occur.

For instance, when asked about the findings of a specific research paper, ChatGPT might provide a convincing, detailed answer—complete with quotes and links—only for the content to turn out to be drawn from entirely different academic sources.

That’s why it’s important to view AI-generated responses as a helpful starting point, not a definitive answer.

5. They Rely on Calculators for Math

AI chatbots now often feature what’s called reasoning—the ability to solve complex problems by working through a sequence of logical steps, also known as chain-of-thought reasoning.

Rather than jumping straight to an answer, this method allows the AI to break down a question step by step. For instance, if asked, “What is 56,345 minus 7,865 times 350,468?”, ChatGPT correctly applies the order of operations, performing the multiplication before the subtraction.

To carry out these steps accurately, ChatGPT uses a built-in calculator for precise arithmetic. This combination of step-by-step logic and computational tools helps improve accuracy and reliability, especially when tackling more complex mathematical problems.
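The arithmetic in the example above can be checked directly. Python applies the same operator precedence the chatbot's reasoning steps must reproduce: the multiplication is evaluated before the subtraction.

```python
# The example from the text, computed with standard operator precedence:
# multiplication binds tighter than subtraction, so 7,865 x 350,468 is
# evaluated first, then subtracted from 56,345.

product = 7_865 * 350_468    # step 1: multiplication
result = 56_345 - product    # step 2: subtraction

# Writing it in one expression gives the same answer,
# because precedence orders the operations identically.
assert result == 56_345 - 7_865 * 350_468
print(result)
```

A chain-of-thought model makes these intermediate steps explicit in text, then hands the actual arithmetic to a precise tool rather than predicting the digits token by token.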


Read the original article on: Science Alert

