Humans vs. AI in Codenames: Word Games, Bad Guesses, and Mental Fatigue

  • Writer: Leoza Kabir Barker
  • Jun 24
  • 4 min read

Updated: Jun 26

One evening, a few of my friends and I decided to try something different. Instead of having another debate about AI, we brought AI into our game night and made it play Codenames with us.


We played a few rounds of Codenames with a twist: we let ChatGPT, Claude, and Grok take part. It started off as a fun, low-stakes idea, but quickly turned into something more chaotic and interesting than we expected.



Codenames

If you haven’t played Codenames before, here’s a quick rundown.


It’s a team word game where one person (the codemaster, called the spymaster in the official rules) gives a one-word clue and a number to help their team guess the right words on a board. But there’s a catch: some words belong to the other team, some are neutral, and one is the assassin. Guess the assassin, and it’s game over for your team.


In our case, we swapped out some of the players, and even the codemasters, with AI.




Game 1: AI vs. Humans: We Were Representing Humanity


We started off lighthearted, but quickly found ourselves saying things like,


“Wait… are we representing humanity right now?”


What was supposed to be a fun experiment suddenly felt a bit more intense.


On one team, ChatGPT was the codemaster and gave the clues, while a second, separate ChatGPT, Grok, and Claude were the guessers.


The other team was all humans, both codemaster and guessers.


We gave the AI guessers the word board and prompts like:


“Your clue is Machine – 4. Words: [list]. Guess in order of most to least obvious. Your turn ends if you guess wrong.”
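If you want to try this at your own game night, here is a minimal sketch of how a prompt in that style could be put together. The helper name `build_guesser_prompt`, its parameters, and the sample word list are hypothetical and only for illustration; they are not a record of exactly what we ran.

```python
def build_guesser_prompt(clue: str, count: int, board_words: list[str]) -> str:
    """Assemble a guesser prompt in the style quoted above (hypothetical helper)."""
    words = ", ".join(board_words)
    return (
        f"Your clue is {clue} – {count}. "
        f"Words: {words}. "
        "Guess in order of most to least obvious. "
        "Your turn ends if you guess wrong."
    )


# Illustrative board only; a real Codenames board has 25 words.
prompt = build_guesser_prompt(
    "Machine", 4, ["engine", "battery", "fan", "laser", "comic"]
)
print(prompt)
```

Passing in only the words still on the board each turn is one simple way to keep the prompt in step with the game.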


ChatGPT and Grok made solid guesses like engine, battery, fan, and laser. Claude followed similar logic. But the AI still made some odd calls, guessing comic for “Marvel,” or Bermuda for “City” (which, uh, no).


Humans were sweating.


We passed on round one to avoid risky guesses, then made a strong comeback in round two with three more words. We were catching up, four to five.


And then… we picked the assassin.


AI wins. Or, more accurately, humans lose.



Game 2: AI as Tool: But Was It Even Helpful?

We played again with the same team setup. This time, the human codemaster in round two decided to use ChatGPT for clue suggestions.


ChatGPT as Codemaster Again

The AI team went first. All three models guessed the same words, but they started off with inaccurate guesses.


Then ChatGPT guessed a word that had already been played. After that, it hit two neutral words. And finally, it guessed the assassin.


We cheered a little louder than we should have. The AI lost the game.


We were off the hook. It felt good to see that AI could make the same kind of fatal mistake we had made earlier. It was a tiny moment of human redemption.


Human Codemaster Using AI Assistance

Our human codemaster used ChatGPT to help come up with clue words. Technically, it was helpful. Practically, it was a lot.


Here’s a real example. The codemaster needed to link the words octopus, tie, tablet, and horn. He typed:


“I’m playing Codenames. I’m trying to find a word that connects with octopus, tie, tablet, and horn.”


ChatGPT responded with a breakdown of all the associations, suggesting words like tentacle, wire, wrap, and finally arm.


It made its case like this:


“Octopus has arms. You tie something with an arm. You hold a tablet with your arm. A horn can be mounted on an arm. So maybe 'arm' is your clue.”


That is a lot of thinking for one clue. And that’s how the whole round went. Every time the codemaster asked for help, ChatGPT gave five possible clues with full justification.


Another time, he tried to link tie, grass, and horn. ChatGPT walked through the logic of “farm,” “cow,” and “pasture,” and eventually suggested “cow.” The codemaster ended up using “cow,” but it took far too long.


The human reaction?


“This is using more brainpower than doing it myself.”

“I’m exhausted.”


Instead of saving time or simplifying the process, the AI created decision fatigue. The human had to do all the mental filtering and final thinking, which, ironically, was more work than if he’d just done it alone.


After the AI picked the assassin, we put down the game and headed straight to the hot tub.


No more prompting. No more stretching associations.


What We Learned (Between Laughs)


We didn’t expect this to be a science experiment, but we walked away with a few solid takeaways:


  • AI needs a LOT of guidance. We had to explain game rules, clue structure, and even obvious things like “don’t guess a word that’s already been picked.”


  • Grok was the strongest performer, with Claude and ChatGPT right behind. I can see why Claude struggled a bit more, given my impression that it’s more optimized for coding tasks than for creative language play, but I expected more from ChatGPT.


  • The major human differentiators were:

    • Humans brought in social context: “What would this friend think is fun?”

    • Humans remembered game history: past clues, wrong guesses, and eliminated words.

    • Humans overthought, but in a way that showed emotional intelligence and personal nuance.


  • Using AI as a creative partner is exhausting. Instead of saving time, it made us work harder to interpret, correct, or ignore the AI’s suggestions.


  • AI still makes basic mistakes, like guessing the assassin, repeating words, or choosing unrelated associations (looking at you, “tie” for “flight”).
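The repeated-word problem from the takeaways above is the kind of thing a little bookkeeping could catch before a guess ever reaches the table. Here is a minimal sketch, assuming the model's guesses arrive as a plain list of strings. The function `validate_guesses` and the sample board are hypothetical, not something we used on the night.

```python
def validate_guesses(
    guesses: list[str], board_words: list[str], played_words: list[str]
) -> list[str]:
    """Keep guesses that are on the board and not yet played, preserving order."""
    board = {w.lower() for w in board_words}
    played = {w.lower() for w in played_words}
    valid = []
    for guess in guesses:
        g = guess.strip().lower()
        # Drop off-board words, already-played words, and repeats.
        if g in board and g not in played and g not in valid:
            valid.append(g)
    return valid


# Illustrative data only.
board = ["engine", "battery", "fan", "laser", "tie", "horn"]
played = ["fan"]
print(validate_guesses(["Engine", "fan", "rocket", "laser", "engine"], board, played))
# → ['engine', 'laser']
```

This only filters out illegal guesses. It does nothing about bad associations or the assassin, which were the mistakes that actually ended our games.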


Final Thoughts

We started this experiment nervously, like we were playing on behalf of humanity.

But by the end, we were laughing at the bizarre associations, celebrating when AI failed, and reminding ourselves that maybe we’re not obsolete just yet.


And after the second game, we were completely mentally drained from trying to out-think, or think with, the AI.


If nothing else, we learned this: AI might be smart. But it still wouldn’t make the cut for game night.


About Me

I'm Leoza Kabir Barker, a Functional Architect at Alithya with a focus on the Power Platform. Through my expertise, I aim to streamline processes, optimize operations, and maximize productivity. 



