The German Owns the Fish
“Pure logical thinking cannot yield us any knowledge of the empirical world; all knowledge of reality starts from experience and ends in it.” — Albert Einstein
If you’re an AI aficionado, then you are aware of recent developments in the world of LLMs, including but not limited to the disruptive repercussions of DeepSeek. With the announcements of their V3 and R1 families of models, the global AI community was suitably stunned for a few days. Imagine if a relatively unknown entity announced that it could provide air travel at twice the speed for a tenth of the cost of the best providers in the entire industry. Given the increasing importance of cheap, on-demand intelligent processing at this particular juncture in the century, the label of “Sputnik Moment” became the clarion call that sank stock markets for a few days and naturally dialed up the pressure on AI capital interests worldwide.
My own curiosity was piqued by R1 in particular. Reasoning can be a tough nut to crack for LLMs without particular attention (no pun intended) paid to algorithmic enhancers such as reinforcement learning and sparse activation. And without reasoning, LLMs are limited in the use cases we can confidently address.
Why is reasoning important?
Reasoning is crucial in Large Language Models for several key reasons:
1. Enhanced Problem-Solving: Reasoning allows LLMs to go beyond simple pattern matching and text generation, enabling them to solve complex problems that require logical thinking and inference. This capability is essential for tasks like mathematical reasoning, logical reasoning, and causal reasoning.
2. Decision Making and Critical Thinking: By incorporating reasoning, LLMs can assist humans in complex decision-making processes. They can analyze information, draw inferences, and make sound decisions based on available data, which is vital in fields like policy, finance, and healthcare.
3. Transparency and Trust: Reasoning capabilities help reduce the “black box” effect associated with AI models by providing a transparent and traceable path to conclusions. This transparency is crucial for high-stakes applications where understanding how a model arrived at its answer is essential.
4. Advancements in AI Applications: Reasoning is fundamental for unlocking complex applications with LLMs, such as robotics and autonomous agents. It enables these systems to interact with the world in a more intelligent and adaptive manner.
5. Complementing Human Intelligence: Ultimately, reasoning in LLMs complements human intelligence by amplifying productivity and enabling breakthroughs in complex problem-solving. This synergy can lead to significant advancements in various fields.
Challenges and Debates
Despite the importance of reasoning, there is ongoing debate about whether LLMs truly reason or merely mimic reasoning through pattern recognition and retrieval. While LLMs have shown impressive performance on certain reasoning tasks, especially with techniques like chain-of-thought prompting, their ability to generalize reasoning across novel scenarios remains limited.
Reasoning is vital in LLMs because it enhances their problem-solving capabilities, supports decision-making, and offers transparency, all of which are crucial for advancing AI applications and complementing human intelligence. However, the extent to which LLMs truly reason remains a subject of research and debate. Which is one of the reasons why R1 in particular became so interesting.
Riddle me this
OpenAI released o3-mini, touted to be a superior reasoning model, within days of the DeepSeek explosion. Although their previous foray into reasoning models (o1) was touted as one of the best, it was also priced significantly higher than previous models. At $200/month, it was not for the novice user. Evidently OpenAI realized they had to respond quickly, and o3-mini arrived at a much lower cost, albeit still higher than the open-source rivals from DeepSeek. But o3-mini opened the door to some head-to-head testing. So I thought I’d give it a try.
After a bit of searching for decent reasoning puzzles, I found the Einstein Riddle, aka the Zebra Puzzle.
Einstein’s Riddle is a famous logic puzzle that involves five houses, each painted a different color (blue, green, red, white, and yellow), and inhabited by people of different nationalities (e.g., British, German, Norwegian, Swedish, and Danish). Each person has a unique pet, drinks a specific beverage, and smokes a particular brand of cigar. The puzzle requires the solver to determine the color of each house, the nationality of its occupant, the pet they own, the drink they prefer, and the cigar brand they smoke, based on a series of clues.
Clues for a Common Version of the Riddle:
- The Brit lives in the red house.
- The Swede keeps dogs as pets.
- The Dane drinks tea.
- The green house is on the left of the white house.
- The green house’s owner drinks coffee.
- The person who smokes Pall Mall rears birds.
- The owner of the yellow house smokes Dunhill.
- The man living in the center house drinks milk.
- The Norwegian lives in the first house.
- The man who smokes blends lives next to the one who keeps cats.
- The man who keeps horses lives next to the man who smokes Dunhill.
- The owner who smokes BlueMaster drinks beer.
- The German smokes Prince.
- The Norwegian lives next to the blue house.
- The man who smokes blends has a neighbor who drinks water.
Given these clues, answer this: Who owns the fish?
To solve the riddle, you can use a systematic approach by creating a grid or table to organize the information from the clues. Start with the most direct clues and gradually fill in the details, using logical deductions to eliminate impossible combinations.
Taking the systematic approach to solving the riddle, we can confidently answer the question: The German Owns the Fish.
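For the programmatically inclined, the systematic approach can be mechanized. Below is a minimal brute-force sketch in Python (my own construction; the variable names and loop ordering are illustrative, not canonical): it enumerates permutations of each attribute across the five houses and prunes with the clues as it goes.

```python
from itertools import permutations

def solve():
    """Brute-force search over the 15 clues; houses are positions 0-4, left to right."""
    for colors in permutations(["blue", "green", "red", "white", "yellow"]):
        if colors.index("green") + 1 != colors.index("white"):            # green left of white
            continue
        for nats in permutations(["Brit", "Swede", "Dane", "Norwegian", "German"]):
            if nats[0] != "Norwegian":                                    # Norwegian in first house
                continue
            if nats.index("Brit") != colors.index("red"):                 # Brit in red house
                continue
            if abs(nats.index("Norwegian") - colors.index("blue")) != 1:  # Norwegian next to blue
                continue
            for drinks in permutations(["tea", "coffee", "milk", "beer", "water"]):
                if drinks[2] != "milk":                                   # center house drinks milk
                    continue
                if drinks[nats.index("Dane")] != "tea":                   # Dane drinks tea
                    continue
                if drinks[colors.index("green")] != "coffee":             # green house drinks coffee
                    continue
                for smokes in permutations(["PallMall", "Dunhill", "Blends", "BlueMaster", "Prince"]):
                    if smokes[colors.index("yellow")] != "Dunhill":       # yellow house smokes Dunhill
                        continue
                    if drinks[smokes.index("BlueMaster")] != "beer":      # BlueMaster smoker drinks beer
                        continue
                    if smokes[nats.index("German")] != "Prince":          # German smokes Prince
                        continue
                    if abs(smokes.index("Blends") - drinks.index("water")) != 1:  # blends next to water
                        continue
                    for pets in permutations(["dogs", "birds", "cats", "horses", "fish"]):
                        if pets[nats.index("Swede")] != "dogs":           # Swede keeps dogs
                            continue
                        if pets[smokes.index("PallMall")] != "birds":     # Pall Mall smoker rears birds
                            continue
                        if abs(smokes.index("Blends") - pets.index("cats")) != 1:    # blends next to cats
                            continue
                        if abs(pets.index("horses") - smokes.index("Dunhill")) != 1: # horses next to Dunhill
                            continue
                        return nats[pets.index("fish")]

print(solve())  # → German
```

The early `continue` statements prune the search so aggressively that the full run takes a fraction of a second, despite the naive nested loops.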
Cool. Now can an LLM do it?
My experiment was limited to six models:
- Claude Sonnet 3.5 running on Anthropic
- Claude Haiku 3.5 running on Anthropic
- o3-mini running on OpenAI
- gpt-4o running on OpenAI
- DeepSeekR1:14b running on a home server
- DeepSeekR1:70b running on a home server
After a few trial runs, it dawned on me that any or all of the models might have encountered this very puzzle, commonly found via web search, in their training data. So I googled it: “The German Owns the Fish.” And sure enough, the Einstein Riddle turned up in several sources. Using the clues as listed above might not be as effective a test of reasoning as I had hoped. So I changed the matrix and the clues, but kept a similar structure, as follows:
1) The Italian lives in the Blue house
2) The Russian keeps Bears as pets
3) The Chinese drinks Green Tea
4) The Yellow house is immediately left of the Green house
5) The owner of the Yellow house drinks Wine
6) The person who smokes Viceroy raises Raccoons
7) The owner of the Red house smokes Winston
8) The person living in the center house drinks Espresso
9) The Canadian lives in the first house
10) The person who smokes Marlboro lives next to the one who keeps a Wolf
11) The person who keeps a Monkey lives next to the person who smokes Winston
12) The person who smokes Ernte 23 drinks Vodka
13) The Greek smokes Lucky Strike
14) The Canadian lives next to the White house
15) The person who smokes Marlboro has a neighbor who drinks Fruit Juice
Who owns the pet Dragon?
For each model, I started with a new conversation, entered the clues above and tried once and only once with each model.
Here are the results:
From this simple test, R1 is hardly a problem for U.S.-based AI capital. Both versions I tested completely hallucinated, losing track of the specifics of the clues. Even the 70b version could not help hallucinating, and it did not provide the answer to the question, opting instead to add Rothmans and Kent cigarettes to a solution that made no mention of pets. And that was after line after line of what might pass for logical reasoning in some quarters. Both attempts with R1 utterly failed.
OpenAI’s gpt-4o and Anthropic’s Sonnet 3.5 both guessed wrong. And yes, they were guessing. But o3-mini and Haiku 3.5 both got it right! And Haiku was blazingly fast.
Who owns the pet Dragon? The Greek, of course. There are ample instances of dragons in Chinese culture, but Ancient Greece had such creatures as well. And with the substitutions I made to create a non-searchable version of the riddle, the Greek with a Dragon happened to be the solution.
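For anyone who wants to verify the ground truth of the modified clue set independently, the same brute-force approach works here too. This is my own sketch (the function name and loop ordering are illustrative): it enumerates permutations of each attribute, prunes with the fifteen modified clues, and returns the nationality of whoever is left holding the Dragon.

```python
from itertools import permutations

def solve_modified():
    """Brute-force search over the modified clues; houses are positions 0-4, left to right."""
    for colors in permutations(["Blue", "Yellow", "Green", "Red", "White"]):
        if colors.index("Yellow") + 1 != colors.index("Green"):           # Yellow immediately left of Green
            continue
        for nats in permutations(["Italian", "Russian", "Chinese", "Canadian", "Greek"]):
            if nats[0] != "Canadian":                                     # Canadian in first house
                continue
            if nats.index("Italian") != colors.index("Blue"):             # Italian in Blue house
                continue
            if abs(nats.index("Canadian") - colors.index("White")) != 1:  # Canadian next to White
                continue
            for drinks in permutations(["GreenTea", "Wine", "Espresso", "Vodka", "FruitJuice"]):
                if drinks[2] != "Espresso":                               # center house drinks Espresso
                    continue
                if drinks[nats.index("Chinese")] != "GreenTea":           # Chinese drinks Green Tea
                    continue
                if drinks[colors.index("Yellow")] != "Wine":              # Yellow house drinks Wine
                    continue
                for smokes in permutations(["Viceroy", "Winston", "Marlboro", "Ernte23", "LuckyStrike"]):
                    if smokes[colors.index("Red")] != "Winston":          # Red house smokes Winston
                        continue
                    if drinks[smokes.index("Ernte23")] != "Vodka":        # Ernte 23 smoker drinks Vodka
                        continue
                    if smokes[nats.index("Greek")] != "LuckyStrike":      # Greek smokes Lucky Strike
                        continue
                    if abs(smokes.index("Marlboro") - drinks.index("FruitJuice")) != 1:  # Marlboro next to Fruit Juice
                        continue
                    for pets in permutations(["Bears", "Raccoons", "Wolf", "Monkey", "Dragon"]):
                        if pets[nats.index("Russian")] != "Bears":        # Russian keeps Bears
                            continue
                        if pets[smokes.index("Viceroy")] != "Raccoons":   # Viceroy smoker raises Raccoons
                            continue
                        if abs(smokes.index("Marlboro") - pets.index("Wolf")) != 1:     # Marlboro next to Wolf
                            continue
                        if abs(pets.index("Monkey") - smokes.index("Winston")) != 1:    # Monkey next to Winston
                            continue
                        return nats[pets.index("Dragon")]

print(solve_modified())  # → Greek
```

Because the modified clues are a one-for-one relabeling of the original riddle, the solution is unique for the same reason the original’s is.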