Unlocking the Secret Garden of AI Brains: Analyzing Claude 3.5 with Anthropic to See How AI Thinks

Preface: When AI becomes too smart, black boxes are no longer reassuring

Since 2024, AI tools have worked their way into every corner of our lives: from bots that automatically reply to messages on LINE, to smart assistants that companies use to generate reports and write code. AI seems to have become part of how we work and live. As someone who uses at least five different AI tools every day, I am often amazed by their fluency and intelligence. At some moments I even feel they understand me better than I understand myself!

But precisely because of this, a sense of unease starts to creep in: do we really understand how these AIs reach their conclusions? Whenever I watch an AI produce an almost flawless report, a question inevitably surfaces in my mind: does it truly understand what it has produced, or did it just happen to get it right?

If I had to pick one image to describe today's AI, it would be this: a strange plant that grows on its own. We see it bloom with beautiful flowers and bear attractive fruit, yet when we pick up a magnifying glass, we realize we have no idea how its roots, stems, and leaves actually work together.

A recent study published by Anthropic tries to pry open this black box. The team took an almost biologist-like approach to analyzing the internal mechanisms of large language models such as Claude 3.5: instead of only looking at inputs and outputs, they observe the "cells", trace the "neurons", and try to answer the question: "What is each cell of this strange plant actually doing?"

If AI is really going to enter sensitive fields such as medicine, law, and finance, we cannot judge it by its output alone; we must genuinely understand whether its reasoning process is reliable, safe, and controllable. So today, let's use Anthropic's research to explore how the AI brain works!

The “biological structure” of AI models: Why use biology as a metaphor?

When it comes to understanding the internals of a large language model (LLM) like Claude 3.5, the Anthropic team chose a refreshing metaphor: think of the model as a living organism.

This idea may sound a bit strange at first. After all, an LLM is a neural network architecture designed by humans, not a real living organism. But when we take a closer look at how the model works internally, we find structures strikingly similar to those that have evolved in living things.

Organisms reproduce through DNA replication and mutation, yet within each individual, cells self-organize into complex systems such as the heart, lungs, and brain, each responsible for a different function. An LLM, likewise, adjusts itself against huge amounts of data to form thousands of features, and those features interconnect into higher-level structures that we can regard as circuits.

In other words, features are like cells and circuits are like organ systems. Once the model grows large enough, the complexity of this internal self-organization exceeds what simple human design can fully control.

Attribution Graphs: Technology that puts a microscope on AI

To truly see the internal structure of an LLM, Anthropic developed a new technique called Attribution Graphs. It is like fitting a microscope to the AI brain, letting us track how each feature participates in forming the final output.

Traditional ways of understanding a model mostly observe the relationship between its inputs and outputs. An Attribution Graph goes further: it marks the contribution of each feature at each step of the computation, much as biologists use fluorescent labeling to tag living cells and track how they differentiate and move.

Going a step further, Anthropic combines this with a method called Circuit Tracing. It is like drawing a connectome of the brain: an attempt to chart the complete set of paths through which each feature influences the others.
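
To make the idea a little more concrete, here is a minimal toy sketch in Python of the kind of "activation times weight" credit assignment that an attribution graph is built on. The feature names, weights, and the simple linear rule are all invented for illustration; this is not Anthropic's actual implementation, only a way to picture how edges between features can be scored and chained into a graph.

```python
# A minimal toy sketch of "activation times weight" credit assignment,
# the rough idea behind an attribution graph. Every feature name and
# weight below is invented; this is not Anthropic's actual method.

# Activations of input-layer features for one imaginary prompt.
input_features = {
    "mentions_Dallas": 1.0,
    "asks_for_capital": 0.8,
    "mentions_weather": 0.0,
}

# Hand-wired connection strengths between features.
w_input_to_mid = {
    ("mentions_Dallas", "state_is_Texas"): 0.9,
    ("asks_for_capital", "lookup_capital"): 0.7,
    ("mentions_weather", "state_is_Texas"): 0.1,
}
w_mid_to_out = {
    ("state_is_Texas", "say_Austin"): 0.8,
    ("lookup_capital", "say_Austin"): 0.6,
}

def attribute(activations, weights):
    """Credit each downstream feature to its upstream sources as
    activation * weight, keeping the nonzero edges as the graph."""
    edges, downstream = {}, {}
    for (src, dst), w in weights.items():
        contribution = activations.get(src, 0.0) * w
        if contribution:
            edges[(src, dst)] = contribution
            downstream[dst] = downstream.get(dst, 0.0) + contribution
    return edges, downstream

edges1, mid_features = attribute(input_features, w_input_to_mid)
edges2, output = attribute(mid_features, w_mid_to_out)

for (src, dst), score in {**edges1, **edges2}.items():
    print(f"{src} --({score:.2f})--> {dst}")
print("output strength:", output)
```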

As a heavy user of AI, I have often felt sudden flashes of inspiration while working with various LLMs, without being able to explain where those leaps of reasoning came from. Attribution Graphs give us the first real chance to understand, from the inside, how these flashes of thinking are formed.

Case Analysis: The truth behind Claude 3.5’s “little drama in the brain”

Two-step reasoning: from "the capital of the state containing Dallas" to "Texas → Austin"

When the model was asked "What is the capital of the state containing Dallas?", it did not simply recall a memorized answer. It went through at least two reasoning steps: first it recognized that Dallas is in Texas, and then it inferred that the capital of Texas is Austin.

The existence of this reasoning chain is clearly visible in the Attribution Graph: each intermediate inference step has its own feature activations and interactions.
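
If we wrote the chain out as ordinary code, it might look like the toy two-hop lookup below. The dictionaries are of course hypothetical; a real model stores these facts in distributed features rather than explicit tables, but the point is the same: the intermediate fact "Texas" really exists as a step, rather than the answer being retrieved in one jump.

```python
# A toy two-hop lookup illustrating the "Dallas -> Texas -> Austin" chain.
# The tables and names are hypothetical stand-ins for what a real model
# represents as distributed features.

city_to_state = {"Dallas": "Texas", "Seattle": "Washington"}
state_to_capital = {"Texas": "Austin", "Washington": "Olympia"}

def capital_of_state_containing(city: str) -> str:
    # Step 1: resolve the intermediate fact (which state the city is in).
    state = city_to_state[city]
    # Step 2: use that intermediate result to answer the actual question.
    return state_to_capital[state]

print(capital_of_state_containing("Dallas"))  # Austin
```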

It is just like a high school student answering a multiple-choice question: first quickly retrieving the geographical fact "Dallas is in Texas", then associating Texas with Austin, and finally arriving at the answer.

This is comparable to the implicit reasoning chains behind the answering techniques that students in Taiwan's education system are trained in from childhood. If AI can develop similar chain-inference abilities, it has great potential for education and exam assistance; but we also have to stay alert to whether its reasoning process is sound, or we risk the danger of "right answer, wrong reasoning".

Poetry Writing: The Secret to Planning Your Rhyme Ahead of Time

When composing poetry, Claude 3.5 does not improvise line by line. Before it actually starts writing, its internal system has already drawn up a list of possible rhyming words.

The Attribution Graph makes this phenomenon easy to see. Much like a poet who first scans their mind for words that could rhyme and then picks the one that best fits the scene, this kind of structural planning ahead lets the AI strike a better balance between fluency and beauty, rather than just stringing together fancy words at random.
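
A rough way to picture this "plan the rhyme first" behavior is the toy generator below. The word lists and scoring function are made up for illustration; the real model does its planning with internal features, not an explicit candidate list, but the order of operations (pick the rhyme word, then write toward it) is the point.

```python
# A toy "plan the rhyme first" generator. Word lists and the fit score
# are invented for illustration only.

rhyme_candidates = {
    "light": ["night", "bright", "flight", "sight"],
}

def score_fit(word: str, theme: str) -> int:
    # Hypothetical fit score: prefer words that echo the theme.
    theme_words = {"sky": {"flight", "night", "bright"}}
    return 2 if word in theme_words.get(theme, set()) else 1

def compose_couplet(first_line_end: str, theme: str) -> str:
    # Plan ahead: choose the rhyme word before writing the second line.
    candidates = rhyme_candidates[first_line_end]
    best = max(candidates, key=lambda w: score_fit(w, theme))
    return (f"The lantern gave a gentle {first_line_end},\n"
            f"and carried dreams into the {best}.")

print(compose_couplet("light", theme="sky"))
```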

This also speaks to a worry many content creators have about AI writing: the AI is not merely stacking pretty sentences, it is starting to "design in advance", which means greater potential for copywriting, brand narratives, and even pop-culture content generation in the future.

Multilingual patterns: language-specific circuits versus cross-language general circuits

Anthropic also discovered that Claude 3.5's brain contains dedicated circuits optimized for different languages (such as English, French, and Spanish), as well as a high-level logic system that is universal across languages.

This mirrors the way humans learn: when we learn Chinese as children, our brains specialize in practicing Chinese syllables and grammar; but as we grow up, we also learn to use abstract logic to solve problems across different languages.
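
One way to picture the split is the sketch below: language-specific lookups at the entrance and exit, with a shared, language-independent step in the middle (here, finding an antonym). All of the tables are hypothetical stand-ins for what the model actually learns in its features.

```python
# Toy split between language-specific surface forms and a shared,
# language-agnostic "concept" step. All mappings are hypothetical.

surface_to_concept = {
    ("small", "en"): "SMALL",
    ("petit", "fr"): "SMALL",
    ("pequeño", "es"): "SMALL",
}
antonym = {"SMALL": "LARGE"}          # shared, language-independent logic
concept_to_surface = {
    ("LARGE", "en"): "large",
    ("LARGE", "fr"): "grand",
    ("LARGE", "es"): "grande",
}

def opposite(word: str, lang: str) -> str:
    concept = surface_to_concept[(word, lang)]          # language-specific entry
    result = antonym[concept]                            # shared abstract circuit
    return concept_to_surface[(result, lang)]            # language-specific exit

print(opposite("small", "en"), opposite("petit", "fr"), opposite("pequeño", "es"))
```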

As a Chinese speaker, this drove home one fact for me: if Chinese-language LLMs are to reach world-class level, they cannot rely on translation alone; they must also develop "native-language feature" circuits unique to the Chinese context. Otherwise they will never match native speakers in nuanced expression and the understanding of implicit meaning.

Diagnostic reasoning: How AI “pre-sets” possible diseases

When faced with medical questions, Claude 3.5 shows traits of a clinician's thinking pattern. It does not force an answer straight from the symptoms; instead, it builds a "candidate diagnosis list" internally.

For example, when it encounters the description "sore throat + fever", it simultaneously activates multiple possibilities such as "cold", "flu", and "streptococcal infection", and then filters them against the details. Making this thinking process visible also matters for the AI medical market: if AI-assisted diagnosis is to be localized in the future, we need to be sure the AI is not just reciting textbooks but genuinely able to "form and screen candidate hypotheses".
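
A crude sketch of this "form candidates, then filter" pattern might look like the code below. The symptom profiles are invented purely for illustration and are not medical advice; the real model forms and narrows its hypotheses in learned features, not lookup tables.

```python
# Toy "candidate diagnosis" ranking. Symptom profiles are invented
# for illustration only and are not medical advice.

candidate_profiles = {
    "common cold": {"sore throat", "runny nose", "cough"},
    "flu": {"sore throat", "fever", "body aches"},
    "strep throat": {"sore throat", "fever", "swollen tonsils"},
}

def rank_candidates(observed: set) -> list:
    # Keep every hypothesis that explains at least one symptom,
    # ranked by how many observed symptoms it accounts for.
    scored = [(name, len(observed & profile))
              for name, profile in candidate_profiles.items()]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: -s[1])

print(rank_candidates({"sore throat", "fever"}))
print(rank_candidates({"sore throat", "fever", "swollen tonsils"}))
```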

Rejection and misjudgment: How the model decides what to answer and what to reject

Finally, Anthropic also revealed how Claude 3.5 builds its "harmful request detection" features: when it encounters a sensitive question, the refusal logic is activated automatically and the model responds in a safe tone.

However, this system is not perfect. Sometimes it is overly cautious and rejects harmless questions; sometimes it slips up and lets harmful ones through!
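
To see why such a detector can fail in both directions, consider the deliberately crude toy gate below: a single "harmfulness score" compared against a threshold. The keyword scores are made up, and a real model relies on learned features rather than keywords, but the same trade-off between over-refusing and under-refusing applies.

```python
# A toy refusal gate driven by a single "harmfulness score". The keyword
# scores and threshold are invented; a real model uses learned features,
# but the same threshold trade-off applies.

RISKY_TERMS = {"explosive": 0.9, "poison": 0.8, "weapon": 0.7}
THRESHOLD = 0.5

def harm_score(request: str) -> float:
    # Crude stand-in for a learned "harmful request" feature.
    words = request.lower().split()
    return max((RISKY_TERMS.get(w, 0.0) for w in words), default=0.0)

def respond(request: str) -> str:
    if harm_score(request) >= THRESHOLD:
        return "I can't help with that."
    return "Sure, here is an answer..."

# Over-cautious: a harmless safety question trips the "poison" score.
print(respond("how do I dispose of rat poison safely"))
# Under-cautious: a risky request phrased without risky words sails through.
print(respond("steps to make something that goes boom"))
```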

Limitations and unsolved mysteries: What other “black box blind spots” does AI have?

Even though Attribution Graphs give us our first glimpse into the inner workings of an LLM, they are still only the tip of the iceberg. Anthropic itself admits in the paper that current tools cannot fully reconstruct the detailed interactions among all the features. Some implicit inferences and context-integration mechanisms remain hidden from our sight, like deep-sea creatures.

Truly understanding AI, much like modern neuroscientists' effort to decipher the human connectome, will require finer-grained descriptions, more data, and more sustained investment.

For a society like Taiwan that actively embraces technology, I think now is a good time for us to rethink:
In the future, will we be consumers who only use AI tools, or will we become experts who can dissect, understand, and even actively design AI systems?

This choice will also determine our role in the next wave of technology.

Conclusion: Understanding AI is like understanding your own brain

This Anthropic study has undoubtedly given us a glimpse into the inner world of AI. We are beginning to realize that AI is no longer a pure black box; it has its own "cells", "organ systems", "reasoning networks", and even a primitive "little theater". At the same time, it reminds us that real understanding has only just begun: many corners have not yet been illuminated, and many mechanisms are not yet fully grasped.

As someone who lives with AI every day and relies on it to speed up my work, my feelings about its development are mixed: wonder and hope on one hand, caution and introspection on the other.

Perhaps understanding AI will turn out to be like understanding our own brains: a long journey that takes time and energy.
I also hope that on this road, Taiwan will not only be a user, but a creator and a guide!
