What stronger chess players see that ChatGPT does not.
ChatGPT's output is words without thought, especially when it comes to chess.
As I was reading The Philosopher and the Housewife, a quote entered into my consciousness and hasn’t left since:
When you look at a position, you see (recognize) what you already know. Therefore, when Tarrasch looks at that French bishop on c8, he sees something else than the beginner. The latter sees a total unknown, while to Tarrasch this bishop is as familiar as his closest relative. There is a world of difference between what Tarrasch sees and what the beginner sees when he looks at that bishop on c8, and that is only one element in a complex position. The idea that you can bridge that immense difference with a few well-chosen words – that is the trainer’s fallacy.
Hendriks, Willy. The Philosopher And The Housewife: Tarrasch, Nimzowitsch and the Evolution of Chess (p. 649). (Function). Kindle Edition.
Hendriks’s point is that it takes more than words to build chess strength: you cannot distill the essence of chess into a well-written paragraph.
Hendriks also writes, slightly earlier:
Understanding of a variation shows itself in the ease with which the connoisseur can constantly conjure up new variations. Together with these variations, he can also provide explanations in words, using positional concepts. Afterwards, when you understand ‘what it’s about’, you can try to explain the variations in words. But that explanation is not a record of the way you arrived at concrete variations starting from general principles, from the spirit of the opening. The extensive, subtle, intricate and widely distributed understanding we are talking about here is not at the beginning of the variations, but is the result of them.
Hendriks, Willy. The Philosopher And The Housewife: Tarrasch, Nimzowitsch and the Evolution of Chess (p. 640). (Function). Kindle Edition.
This struck me because it bears on the chess-playing capabilities (or lack thereof) of large language models like ChatGPT. More broadly, I’m struck by the analogy between chess and any other discipline or body of knowledge, and by how AI can trick us into thinking that it (and possibly, consequently, we) knows more than it does, simply because of its ability to output a huge volume of words.
In fact, for some people, it’s simply inevitable that LLMs like ChatGPT will eventually become extremely strong at chess, because they can plausibly explain variations in words, à la the common caricature of the Silman approach to chess. I call it a caricature because Silman himself thought in variations like all players do; yet it remains a common idea that one can get strong merely by understanding the game so deeply that one can explain it in pure words, and that this understanding will result in finding the right moves over the board. This seems to be a common mindset among beginning “adult improvers”, for example.
And in fact, we do have a lot of chess literature that arrests our attention precisely because it is full of words and not simply printouts of variation after variation. Not only this, but the writing is compelling, in many cases the examples are good, and the authors themselves are strong players, which makes their claims to have created thought systems that really work all the more credible.
But I think Hendriks is really on to something important when he talks about how the ease with which we can come up with new variations in a given position is an indication of our knowledge of that very position. As he said, (emphasis mine) “the extensive, subtle, intricate and widely distributed understanding we are talking about here is not at the beginning of the variations, but is the result of them”. Variations don’t come from understanding. Understanding comes from variations.
And so this is one reason why I think ChatGPT can write plausible things about chess and specific positions that display a sort of surface understanding (the kind that a human gets from reading a chess book), but not true knowledge (the kind that consists in the variations a player comes up with while analyzing the position). But, as in basically any other field of knowledge, detecting ChatGPT’s mistakes often requires a sort of experience that can only be gained through play, and an intimate knowledge of a given position or variation that has made its way deep down into the bones of the player. In other words, beware of using ChatGPT to learn chess, because it doesn’t know chess. It’s just really good at predicting words.
The position
Take the following position.
How would you answer the following questions? If you want, leave your answers in the comments.
What is your evaluation of the position? Include any details you feel are relevant to your evaluation.
Which side, if any, has the advantage? If so, how big is that advantage?
What would your plan be for both sides in this position?
What opening do you think this position resulted from?
What is your strength level (however you choose to measure that)?
I brought similar questions to ChatGPT and here’s how it responded. My comments are in [brackets].
Evaluating the position
Me: Estimate the position based on typical evaluation principles, please!
ChatGPT: Great! Let’s evaluate the position described by the FEN:
FEN: r2q1rk1/3bppbp/p1pp2p1/8/4P3/2N1QN1P/PPP2PP1/R4RK1 w - - 3 14
Positional Breakdown
✅ White's Positional Highlights
Center Control: White has a strong pawn on e4, supported by knights and queen. The central tension is favorable. [White’s central control isn’t anything to write home about in my opinion, but especially, there is no tension to speak of — Black’s pawns haven’t yet made contact with the White center]
Development: All of White’s minor pieces are developed, and the rooks are connected. Coordination is good. [This is true enough, but Black’s control of the fifth rank makes it difficult for these knights to find active positions]
Piece Activity: Knights on c3 and f3 are well-placed. Queen is centralized on e3 — a powerful square. [The Knights are fine but not particularly active; the queen’s position on e3 isn’t powerful]
King Safety: White's king is castled and well-protected. [Fair!]
⚠️ White's Minor Drawbacks
The light-squared bishop is not yet developed, possibly heading to d2 or e3 soon. [A hallucination. There is no bishop at all.]
The h3 pawn might be a long-term target (though it's not weak now). [It is hard for me to imagine how this pawn could become weak from this position].
✅ Black's Positional Highlights
Solid Pawn Structure: No immediate weaknesses. [Fair!]
Bishop Pair: Both bishops are active and point toward the center and kingside. That’s often valuable in open or semi-open positions. [Also not bad, though I think the bishop on d7 faces similar issues to White’s knights at the moment].
⚠️ Black's Drawbacks
Development: Queen is developed, and bishops are active, but the queenside rook on a8 and knight (not yet developed) are still dormant. [Two mistakes here. The queen hasn’t been developed yet and there isn’t any knight.]
King Safety: King is castled but surrounded by slightly weakened pawns (e.g., fianchetto structure could be targeted later). [Way too abstract. This position is really solid.]
Passive Rooks: Both rooks lack open or semi-open files for now. [What about the b-file?]
Evaluation Estimate
Material: Equal
Positional Factors:
White has slightly better piece coordination and more space. [Half correct. Piece coordination isn’t really any better than Black’s.]
Black’s development is a touch behind, especially on the queenside.
Estimated Evaluation: +0.3 to +0.6 for White [This is optimistic.]
This suggests a small but stable advantage for White, mainly due to better development, central control, and king safety. The position is still very playable for both sides, and it's not yet imbalanced. [I think this is also optimistic].
Thus evaluated ChatGPT. It’s interesting that the explanation of the position with words is almost entirely plausible, if not for the fact that it “hallucinated” a few pieces on the board. I’m not actually sure we can call this a “hallucination” as such — the model is simply trying to predict the next word in a sentence based on training data. It never really saw the position anyway.
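The missing pieces are easy to confirm mechanically, because the first field of a FEN string lists every piece on the board. Below is a minimal sketch in Python that counts the pieces for each side in the FEN I gave ChatGPT. The function names and the 1/3/3/5/9 point values are my own illustrative choices (the values are the usual rule of thumb, not part of FEN), and this is a counting exercise, not an evaluation of the position.

```python
from collections import Counter

FEN = "r2q1rk1/3bppbp/p1pp2p1/8/4P3/2N1QN1P/PPP2PP1/R4RK1 w - - 3 14"

# Rule-of-thumb point values (an assumption for illustration, not part of FEN).
VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def count_pieces(fen):
    """Count pieces per side from the placement field of a FEN string."""
    placement = fen.split()[0]
    white, black = Counter(), Counter()
    for ch in placement:
        if ch.isalpha():
            # Uppercase letters are White pieces, lowercase are Black.
            side = white if ch.isupper() else black
            side[ch.lower()] += 1
    return white, black


def material(counts):
    """Sum the rule-of-thumb point values for one side."""
    return sum(VALUES[piece] * n for piece, n in counts.items())


white, black = count_pieces(FEN)
print("White:", dict(white))
print("Black:", dict(black))
print("White bishops:", white["b"], "| Black knights:", black["n"])
print("Material:", material(white), "vs", material(black))
```

By this count White has two knights and no bishops, and Black has two bishops and no knights, so the "undeveloped" bishop and knight in ChatGPT's answer cannot be on the board. The point totals come out level, which agrees with the one claim ChatGPT got plainly right: material is equal.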
My evaluation of the position and conclusion of who has the advantage is as follows:
Black is slightly better, because:
The bishop pair is a long-term asset and the position is likely to remain open.
Black has potential for a huge center. e7-e6, followed by d6-d5 to break down the e4-pawn and get a strong center, looks rather powerful.
The b-file is better for Black than the d-file is for White, in part because the b-pawn is busy defending the c3-knight, which makes moves like b2-b3 risky.
White’s knights don’t have good squares closer to enemy territory. It’s hard to foresee future outposts for them.
I would guess this position is close to equal in Stockfish numbers, but I think Black’s got way more potential to make moves that build the position and introduce central tension in a meaningful way. In Informator parlance, I would call this a =/+ (⩱) position.
OK, how would ChatGPT play this position?
The plan for the position
Me: What is your suggested plan?
ChatGPT: Here’s a strategic plan for both sides, based on the position from the FEN:
♔ White’s Plan (slight advantage)
1. Complete Development
Develop the light-squared bishop to either d2, e3, or possibly g5 (pin). [Still hallucinating that bishop]
Consider Re1 to place the rook on a semi-open central file. [the e-file is not semi-open; both pawns are still on the e-file]
Example moves:
Rd1, Be3, Rfe1 [I’m just curious what square this bishop is supposedly already on?]
2. Central Pressure & Pawn Breaks
Prepare e5 to challenge the center and gain space. You can support this with Nd4 or Qd2. [the e5-plan is definitely worth considering, but Nd4 and Qd2 do not support it at all]
If ...c5 comes from Black, consider dxc5 to open lines and make use of the d6-square. [White cannot play this move, as this requires a d-pawn which White does not have.]
3. Kingside Activity
Knights on c3 and f3 are well-placed; consider bringing Ne2–g3 to reposition or jump into f5. [Now ChatGPT is mixing plans from another opening — perhaps Nc3-e2-g3 is plausible, but not if White is planning on giving away a whole piece on the f5 square (g6xf5!)]
If Black delays play, h4-h5 could be used to gain space on the kingside or weaken Black’s dark squares. [Plausible.]
4. Watch Black’s Bishop Pair
While you’re better coordinated, Black’s dark-squared bishop on g7 could become powerful if the center opens. Keep it under control. [The center is already open.]
♚ Black’s Plan (slightly worse, but solid)
1. Finish Development
Get the a8-rook and c8-queen’s knight into the game: Nd7, Qc7, Rac8. [Again, hallucinating the knight; additionally, I’m not sure what the move Rac8 is supposed to do.]
You might fianchetto the queen’s knight via Nb6–Nd7–Nf6 depending on how White plays. [More hallucination, also, fianchettoing the queen’s knight?].
2. Challenge the Center
Look for ...c5 to hit d4 and create counterplay. [not a bad idea, though d4 is empty and nothing would be “hit” by it per se, it just increases Black’s control of the center].
Alternatively, prepare ...e5 to break White’s grip on the center, possibly with f5 support. [preparing …d5 is the better plan, increasing pressure on White’s center.]
3. Maintain the Bishop Pair
Avoid early trades unless forced. Bishops could dominate if the center opens. [The center is open already, but not bad advice.]
Be patient: once developed, look for diagonals like a1–h8 or b2–g7 to open up. [Good]
4. Avoid Passive Rook Positions
Try to put at least one rook on a central file: Rfd8 or Rac8. [These are “centralizing” but not particularly active.]
Don’t let White own the d-file without contest. [this file isn’t as important as the b-file]
Summary [I had to re-arrange these from a table ChatGPT made]
For White
Strategy: Consolidate and press center.
Key Moves/Ideas: Be3, Rd1, Rfe1, e5, Ne2–g3–f5
For Black
Strategy: Solidify and counterpunch
Key Moves/Ideas: Nd7, Qc7, Rac8, ...c5 or ...e5, keep bishops
A rather big mouthful from ChatGPT! Unfortunately, things went off the rails here because a lot of its strategy was based on the non-existent bishops for White and missing knights for Black; additionally, it appeared to overestimate the importance of the d-file for Black and ignored the potential problems on the b-file for White. Lastly, in its plan for White, it wanted to sacrifice a knight on f5 (or at the very least, move a knight there; it doesn’t understand that this is actually a sacrifice).
As far as plans go for White, Rf(a)d1 seems like a good start, to put a little pressure on d6 and dissuade any …e5 ideas for now. Rfe1 is also an idea, preparing a later e5 push. Lastly, White could simply open the position further with the move e5! right away. This is counter-intuitive, because the position opens for Black, who already has the bishop pair. However, Black would be compelled to move the d6 pawn somewhere, and this opens up access to the c5 and e5 squares for White, allowing their knights to get more active. White should go for central operations to get some counterplay.
For Black, the position is already so solid that moves like Rab8 (striking the weak b2-pawn and increasing pressure on the c3 knight) make a lot of sense. At a later point, the move c6-c5, opening up a new diagonal for the d7-bishop, could be useful (it also grants the bishop access to the fianchetto with the move Bc6). Black’s position doesn't have space, but is rather nicely compact, which I think is a strength in this position, not being over-extended.
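The claims about files can also be checked straight from the FEN. The sketch below classifies each file by which side still has a pawn on it. I am assuming the usual definitions: a file is open when neither side has a pawn on it, and semi-open for a side when that side has no pawn there but the opponent does. The function names and the label "closed" for files with pawns of both colors are my own.

```python
FEN = "r2q1rk1/3bppbp/p1pp2p1/8/4P3/2N1QN1P/PPP2PP1/R4RK1 w - - 3 14"
FILES = "abcdefgh"


def pawn_files(fen):
    """Return two sets of file letters: files with White pawns, files with Black pawns."""
    white, black = set(), set()
    for rank in fen.split()[0].split("/"):
        col = 0
        for ch in rank:
            if ch.isdigit():
                col += int(ch)  # a digit means that many empty squares
                continue
            if ch == "P":
                white.add(FILES[col])
            elif ch == "p":
                black.add(FILES[col])
            col += 1
    return white, black


def classify_files(fen):
    """Label every file as open, semi-open for one side, or closed."""
    white, black = pawn_files(fen)
    labels = {}
    for f in FILES:
        if f not in white and f not in black:
            labels[f] = "open"
        elif f not in white:
            labels[f] = "semi-open for White"
        elif f not in black:
            labels[f] = "semi-open for Black"
        else:
            labels[f] = "closed"
    return labels


for f, label in classify_files(FEN).items():
    print(f, label)
```

Under those definitions the b-file is semi-open for Black, the d-file is semi-open for White, and the e-file still has a pawn of each color on it. That is the concrete footing for my comments above: Black's rooks do have a file, and Re1 does not put a rook on a semi-open file.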
I asked ChatGPT one more question, and I’ll summarize it a bit, since it’s full of wrong information anyway.
What opening got us to this position?
Me: Based on the position I gave, what opening do you think was played?
ChatGPT: Great question. Let's take a look at the position again and reverse-engineer the likely opening:
FEN after 13...O-O: r2q1rk1/3bppbp/p1pp2p1/8/4P3/2N1QN1P/PPP2PP1/R4RK1 w - - 3 14
Key Clues From the Position:
White’s Pawn Structure:
Pawns on c2, d2, e4, f2, g2, h2 (standard setup). [the pawn is on h3]
No d4 pawn, but an e4 pawn suggests either 1.e4 or a transposition from a queen's pawn game. [This might be possible]
White’s Pieces:
Nc3, Nf3, Be3 (likely just developed or exchanged), Qe3, Rf1, Rad1 in some lines. [No Be3]
Strong center, good development — common in many e4-based openings.
Black’s Setup:
fianchettoed kingside bishop on g7, pawn on g6
Pawns on c6, d6, e7 suggest a flexible setup.
Bishop on d7, queenside undeveloped — common in King's Indian structures. [the queenside is basically developed].
Most Likely Opening:
This strongly resembles a variation of the King’s Indian Defense, particularly the Fianchetto Variation by White.
[Well, this is interesting. For some reason, ChatGPT gave me not one, but four opening lines. The first opening line was:]
1. d4 Nf6 2. Nf3 g6 3. c4 Bg7 4. g3 O-O 5. Bg2 d6 6. O-O Nbd7 7. Nc3 e5 8. e4 exd4 9. Nxd4 Re8 10. Re1 a6 11. Be3 Ng4[??] 12. Qxg4 [the fatal mistake?]
12… Bxg4[!!] 13. hxg4[!?]
Stop the presses.
First, 12…Bxg4!! is an incredible novelty, never before seen. Second, unfortunately, this position is not only impossible but also definitely not the same position as represented in the FEN I gave ChatGPT.
ChatGPT was pretty happy with its work, so it asked me if I wanted it to try to reconstruct the move-by-move opening that led to this exact position. I thought that’s what it was doing before, but I digress.
Round 2! [I took out the explanation of the moves]
Me: Sure, please try to reconstruct the move-by-move opening that led to this exact position.
ChatGPT:
🎯 Goals for Reconstruction
We'll try to match:
White’s development: Nc3, Nf3, Qe3, h3
Black’s setup: fianchettoed Bg7, pawn structure: e7, d6, c6, castled kingside
All pieces developed or exchanged logically, and pawn structure consistent
🔄 Reconstructed Move Order
1. d4 Nf6 2. Nf3 g6 3. Nc3 Bg7 4. e4 d6 5. Be3 O-O 6. Qd2 Nbd7 7. O-O-O e5 8. dxe5 dxe5 9. Bc4 Qe7 10. a4 b5 11. axb5
11… axb5 [Imaginary, and here ChatGPT goes way off the walls] 12. Bb3 Nc5 13. Qe3 Be6 14. Nd5 Qxd5
[Another valiant attempt, but White missed a few chances to call out Black for making illegal moves, and angrily took their own bishop on e3 instead. Black was kind to return the favor with Qe7xd5!? and White now has a great position after the difficult exd5.
Helpfully, ChatGPT continued:]
This line doesn't match the position exactly, so let’s pivot [wait, didn’t you just…].
✅ A More Accurate and Clean Move Order
This version better matches the final position:
1. e4 g6 2. d4 Bg7 3. Nc3 d6 4. Nf3 Nf6 5. Be2 O-O 6. O-O Nc6 7. Be3 e5 8. Qd2 exd4 9. Nxd4 Re8 10. f3 d5 11. Nxc6 bxc6 12. Bd3[??] d4 13. Bxd4 Qxd4+ 14. Qf2 Qxf2 (not exact...) [ChatGPT’s comment, not mine!]
[ChatGPT has managed to produce a legal variation, but very obviously not the correct one! Now ChatGPT gets serious with its final answer:]
Still a bit off. Let's consider the most precise line that matches all features:
✅ Final Reconstructed Line (Best Fit for FEN)
1. e4 g6 2. d4 Bg7 3. Nc3 d6 4. Nf3 Nf6 5. Be3 O-O 6. Qd2 Nbd7 7. Bh6 e5 8. Bxg7 Kxg7 9. O-O-O c6 10. dxe5 dxe5 11. Nxe5 Qa5 12. Nc4 Qc7 13. Qe3 b5 14. Nd6 Nb6
🧩 Now we reach a very similar pawn structure and piece placement! [lol, not really]
With some variations, the current position seems most likely to arise from a Pirc Defense or King’s Indian Defense: Fianchetto Variation, with a classical setup by White and early queen development. [maybe this position, but not the one I fed it.]
You know what they say, strike four, you’re out. This position actually resulted from a Sicilian Defense in which White traded on the c6 square early in the game.
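Several of the "key clues" ChatGPT listed are claims about what stands on a particular square, and those can be looked up directly. Here is a small sketch that expands the FEN into a square-to-piece map and tests a handful of claims taken from ChatGPT's list. The helper names and the choice of claims are mine, and it only checks square occupancy, nothing about how the position arose.

```python
FEN = "r2q1rk1/3bppbp/p1pp2p1/8/4P3/2N1QN1P/PPP2PP1/R4RK1 w - - 3 14"


def board_from_fen(fen):
    """Map square names like 'e4' to piece letters (uppercase = White)."""
    board = {}
    ranks = fen.split()[0].split("/")  # FEN lists rank 8 first, rank 1 last
    for i, rank in enumerate(ranks):
        rank_number = 8 - i
        col = 0
        for ch in rank:
            if ch.isdigit():
                col += int(ch)  # skip empty squares
            else:
                board["abcdefgh"[col] + str(rank_number)] = ch
                col += 1
    return board


# (piece, square) pairs drawn from ChatGPT's "Key Clues" list.
CLAIMS = [("P", "h2"), ("P", "d2"), ("B", "e3"), ("Q", "e3"), ("N", "c3"), ("b", "g7")]


def check_claims(fen, claims):
    """Return (piece, square, holds) for each claimed piece placement."""
    board = board_from_fen(fen)
    return [(piece, sq, board.get(sq) == piece) for piece, sq in claims]


for piece, sq, holds in check_claims(FEN, CLAIMS):
    print(f"{piece} on {sq}: {'yes' if holds else 'no'}")
```

The queen on e3, the knight on c3 and the bishop on g7 check out. The pawn on h2 and the bishop on e3 do not, and neither does the d2-pawn from the same list.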
How strong ChatGPT thinks it is
For the record, I also asked ChatGPT how strong it “thinks” it is. Its estimate?
ChatGPT:
When playing on my own (no engine), I can consistently play at a 2000–2200 Elo level — strong club or expert level.
That means I understand:
Positional ideas
Long-term plans
Tactical motifs (forks, pins, skewers, etc.)
Transpositions and move-order nuances
I won't miss a simple tactic, but I can make subtle inaccuracies or miss deep tactical resources that a strong master or engine might find.
I think we’ve seen enough to know that it doesn’t play at a 2000-2200 Elo level regardless of whose rating system we might refer to!
How ChatGPT can fool us
ChatGPT is a remarkable tool; it’s remarkable both for its ability to spit out incredibly plausible complete sentences on any given subject, and for its ability to expose the limitations of applying general principles to specific positions. There comes a point where the player sees something completely different from the talking computer, when the variations that one generates tell one that something is wrong with the response. I don’t think it actually takes too much time to get this strong, but it starts with always calculating the specifics first before defaulting to the general.
It’s an extremely powerful habit that helps a player not to default to abstract principles in concrete positions (National Master Dan Heisman calls this “handwaving”), but thanks to ChatGPT we can see how even in quiescent positions general principles can lead to misevaluations. The risk and penalty in situations like this are rather small in nature and can lead to poor positions over a long series of moves, but only against a player who makes stronger moves.
Now it turns out that human players have an advantage over ChatGPT, in that whether we admit it or not, we do think in variations. Short ones that are easy to see and don’t take much time to explain to ourselves or others, but we still see them. And that is because variations are what our knowledge of positions actually consists in. I have some thoughts about ChatGPT and why it comes off so plausibly to us sometimes:
It’s really good at predicting what comes next.
What we tend to value is words-based content. ChatGPT is trained on words and essentially uses a form of computational pattern-recognition to predict what it should say next. Ergo, having been trained on a lot of chess-related content, it is able to output words at a shockingly high accuracy when describing things related to chess.
We don't know what we don’t know, and so if ChatGPT gains our trust from its broad base of knowledge, we are not able to distinguish between something that is true, and something that is (more or less) subtly wrong. This is true of essentially any discipline, and chess is not excluded. It takes very careful use of AI in order to not lead oneself or another astray. Deep knowledge of any subject concerns nearly undetectable subtleties that only experience and training can truly teach. It’s the difference between knowing exactly when to play Bg4 in the Ruy Lopez with Black to the greatest effect, and playing Bg4 simply because you think it “pins the knight to the queen”.
Despite our knowledge consisting in variations, words still matter to us, even in chess. Our familiarity with positions which result from an opening is how we are so quickly able to find “the right move” in positions where unfamiliar players need to take more time and may even misevaluate features of the position; the player who played the position more and knows it better but has read nothing has an advantage over the person who has read all about it but barely practiced. But knowing how to talk about something (the jargon) gives the perception of actual knowledge, even when this is not the case.
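To make "predicting what comes next" concrete, here is a deliberately tiny toy: a bigram model that always picks the word that most often followed the previous one in its training text. This is an illustration only and my own assumption about how to show the idea; ChatGPT is a far larger neural network working on tokens, not a lookup table like this. The corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict

# A made-up scrap of "chess writing" to train on.
CORPUS = (
    "the bishop is not yet developed . "
    "the knight is not yet developed . "
    "the bishop is active ."
)


def train_bigrams(text):
    """For each word, count which words follow it in the text."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def predict_next(table, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = table.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def complete(table, start, length):
    """Extend `start` by up to `length` words, one prediction at a time."""
    words = [start]
    for _ in range(length):
        nxt = predict_next(table, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


table = train_bigrams(CORPUS)
print(complete(table, "the", 5))
```

Starting from "the", this toy writes "the bishop is not yet developed", and it would write that about any position at all, because it never looks at a board. It only knows which words tend to follow which.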
Why we are different from ChatGPT
ChatGPT’s output and our output are often similar, but the input process is dramatically and drastically different. ChatGPT predicts the next word — in other words, it only “thinks” and explains things in a “forward” direction based off of training data. It doesn’t think reciprocally; it doesn’t reason, and therefore cannot fix a flaw in its analysis mid-thought and restart the process. Perhaps this is why when I asked for it to tell me how we got to the position at the beginning of this newsletter, it tried (and failed) four times and could never recover from its mistakes.
But this is the opposite of how humans think.
Consider that ChatGPT has possibly “read” every single chess opening book, but also has played virtually no chess. It therefore might have answers that are surface deep — even 14 or 15 moves into the opening in some cases. But it doesn’t know how to play those positions. It might get you there with words; but you’ll have to find your way out with variations.
When we go to explain our ideas, we backsplain. We give a narrative to our moves that might not always be true to how they were generated in our (sub)consciousness. When something would contradict our narrative, we realize it’s wrong, and change the explanation. We think reciprocally. We fix our variations and start over before we write it out. That narrative is perfectly suited to explain what happened and even maybe some of our concluding thoughts, but it doesn’t reflect the body of knowledge that allows us to simply reject some moves and favor others. Human play tends towards finding the best move as skill level increases, and it often turns out that what a stronger player sees in a given position is entirely different from what the weaker player sees because they reject more bad moves automatically.
The knowledge is deeper than words. The words arrive later. First come the variations.
Great post Nick. I wish I had something more insightful of my own words to say, but I’ll give you a link instead. A lot of your ideas ring true, and this article is incredibly well written: https://comment.org/on-the-origin-of-specious/