
Why AI Sucks at Trivia... Usually

AI can answer almost anything. So why is it so bad at writing trivia questions? We dug into the problem and built something that actually works.

Jake Price · 6 min read

AI can answer nearly any trivia question you throw at it. Ask ChatGPT the capital of Mongolia, the year penicillin was discovered, or the longest river in Africa, and it nails all three without breaking a sweat.

So it should be great at writing trivia questions, right?

If you've ever tried it, you already know the answer. You paste in "write me 10 trivia questions about geography" and what comes back is... fine. Technically correct. And completely lifeless. The kind of questions that make your players stare at their phones instead of arguing with their teammates.

Here's why that happens, and why we think we've figured out how to fix it.

All Facts Are Equal (To a Machine)

When you ask an AI to generate trivia, it treats every fact in its training data with roughly equal importance. The capital of France and the capital of Burkina Faso are the same kind of retrieval task. There's no internal ranking of "things most people actually know" versus "things only geography nerds know."

This matters because good trivia is built on common ground. The best questions tap into shared knowledge: the stuff people absorbed from school, movies, dinner conversations, and random Wikipedia rabbit holes at 2 AM. AI doesn't have a model for that. It has facts, billions of them, and no intuition about which ones actually stuck in people's heads.

So you end up with rounds where one question is laughably easy and the next is impossible, with no rhyme or reason connecting them.

Difficulty Is a Feeling, Not a Label

Ask ChatGPT for "medium difficulty" trivia and you'll get questions it has labeled "medium" based on... something. Maybe word count. Maybe topic obscurity. It's hard to tell, and that's the problem.

Real difficulty is experiential. A "medium" question for a room full of sports fans is completely different from a "medium" question for a group of coworkers at a corporate event. Difficulty isn't a property of the question. It's the relationship between the question and the people answering it.

Good trivia has pacing and rhythm. You want a warm-up question that gets people talking, a few that build confidence, a curveball that makes the whole room groan, and a closer that separates the contenders. AI doesn't think in arcs. It generates a flat list of ten questions that happen to be about the same topic.

The Sweet Spot AI Can't Find

The best trivia questions live on the edge of memory. That feeling of "I know this... don't tell me... it's right there" is what makes trivia fun. It's the reason people come back week after week.

Finding that sweet spot requires understanding the difference between "hard but guessable" and "impossible without a PhD." A question about which Southern city hosted the 1996 Summer Olympics is hard but guessable. You can reason your way there: major U.S. city, late '90s, maybe you remember the logo or the bombing. A question about the 1906 Intercalated Games host city is just obscure. No amount of reasoning helps if you've never heard of the event.

AI has no instinct for this distinction. It can't tell which questions will produce that satisfying "aha!" moment and which will just produce silence followed by "...what?"

Entertainment Value Is Invisible to AI

Here's something that surprises people: the answer matters as much as the question. Or more precisely, the framing matters.

Compare these two questions:

"What is the highest-grossing animated film of all time?"

"Passing Frozen II and The Lion King remake, what animated film crossed $1.7 billion to become the highest-grossing in the genre's history?"

Same answer. Completely different energy. The second version gives you multiple pathways to get there. Even if you don't know the answer outright, you can work with what the question gives you. That's what makes a question feel satisfying rather than arbitrary.

AI defaults to the first version almost every time. It optimizes for correctness and conciseness, not for the "oh, that's clever" reaction. It doesn't know that a well-constructed question should feel like a puzzle, not a quiz show flashcard.

So Why "Usually"?

Everything above is true about general-purpose AI tools. ChatGPT, Claude, Gemini... they're brilliant at answering questions and mediocre at writing them. But the problem isn't that AI can't do this well. The problem is that nobody told it how.

That's what we built Forge to solve.

Forge isn't a chatbot with a trivia prompt bolted on. It's a multi-agent pipeline where six specialized AI agents each handle a different part of question construction, with explicit quality gates between every step.

Here's what that looks like in practice:

Questions are designed before they're written. An Architect agent creates a blueprint for each question first. That blueprint defines multiple "clue pathways": independent hints woven into the question text that give solvers different angles to reason from. Take the 1996 Olympics question from earlier: a blueprint gives it three ways in, not just one. CNN's headquarters, Coca-Cola's birthplace, the '96 Summer Games.
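To make the blueprint idea concrete, here's a minimal sketch of what one might look like as a data structure. The field names and shape are illustrative assumptions, not Forge's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class CluePathway:
    hint: str    # a fact woven into the question text
    domain: str  # the knowledge area a solver can reason from

@dataclass
class QuestionBlueprint:
    answer: str
    pathways: list[CluePathway] = field(default_factory=list)

# The 1996 Olympics question: three independent routes to "Atlanta".
blueprint = QuestionBlueprint(
    answer="Atlanta",
    pathways=[
        CluePathway("home of CNN's global headquarters", "media"),
        CluePathway("birthplace of Coca-Cola", "business"),
        CluePathway("host of the 1996 Summer Games", "sports"),
    ],
)
```

A solver who whiffs on the sports clue can still reason through the media or business angle, which is the whole point of designing the question before writing it.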

Every fact gets verified. A dedicated Researcher agent takes the blueprint's claims and runs targeted searches to confirm them against real sources. If something doesn't check out, the system catches it and self-corrects before the question ever reaches you. No more "fun fact" questions where the fun fact is wrong.
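In pipeline terms, that's a verify-then-revise loop. A rough sketch, where `confirms` and `revise` are hypothetical stand-ins for the Researcher's targeted search and the pipeline's self-correction step:

```python
def verify(blueprint, confirms, revise, max_revisions=2):
    """Check every claimed fact against sources before any writing happens.

    `confirms(hint, answer)` and `revise(blueprint, failures)` are
    hypothetical stand-ins; Forge's real interfaces may differ.
    """
    for _ in range(max_revisions + 1):
        failures = [p for p in blueprint.pathways
                    if not confirms(p.hint, blueprint.answer)]
        if not failures:
            return blueprint                     # every claim checked out
        blueprint = revise(blueprint, failures)  # swap out what didn't verify
    raise RuntimeError("blueprint still failing after max revisions")
```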

Quality is scored, not vibed. An Evaluator agent scores every question across three pillars: clue richness (are there multiple pathways to the answer?), inferential tractability (can someone reason toward the answer without just knowing it?), and signal efficiency (does every word pull its weight?). Questions that don't clear the bar get sent back for revision automatically.
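Reduced to a sketch, the gate is a simple rule: every pillar must clear a bar, or the question goes back. The 0-to-10 scale and cutoff here are assumptions for illustration, not Forge's real thresholds:

```python
PILLARS = ("clue_richness", "inferential_tractability", "signal_efficiency")
PASS_BAR = 7.0  # illustrative cutoff on an assumed 0-10 scale

def clears_gate(scores: dict[str, float]) -> bool:
    """Pass only if every pillar meets the bar; otherwise revise."""
    return all(scores[p] >= PASS_BAR for p in PILLARS)

# One weak pillar is enough to send the question back for revision.
print(clears_gate({"clue_richness": 8.5,
                   "inferential_tractability": 7.9,
                   "signal_efficiency": 6.4}))  # False
```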

Difficulty means something specific. Forge maps difficulty levels to concrete knowledge thresholds. "Easy" means 80% or more of a general audience would know the answer from direct recall. "Hard" means you need specific knowledge but can still make an educated guess. These aren't vibes. They're calibration targets that shape everything from research queries to clue design.
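Sketched as a calibration table: the 80% direct-recall figure for "easy" paraphrases the definition above, while the "hard" number is a placeholder assumption, since the text defines it only qualitatively:

```python
# Difficulty as a concrete knowledge threshold, not a vibe.
DIFFICULTY_TARGETS = {
    "easy": {"direct_recall": 0.80, "educated_guess_possible": True},
    "hard": {"direct_recall": 0.15, "educated_guess_possible": True},  # placeholder rate
}

def target_for(level: str) -> dict:
    """Look up the calibration target that shapes research queries
    and clue design for a requested difficulty level."""
    return DIFFICULTY_TARGETS[level]
```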

The result is trivia that feels like a human wrote it on a good day. Questions with rhythm, personality, and that critical "I should know this" quality that keeps players engaged.

Worth a Try

If you've been burned by AI-generated trivia before, we get it. We were too. That's literally why Forge exists.

It's free to try during beta. Generate a few questions, run them at your next trivia night, and see if your players can tell the difference. We're betting they can't.

Try Forge free →