Since 2024, AI tools have found their way into every corner of our lives. From small bots that automatically reply to messages on LINE to smart assistants that companies use to generate reports and write code, AI seems to have become part of both our work and our daily lives. As someone who uses at least five different AI tools every day, I am often amazed at their fluency and intelligence. In some moments, I even feel that they understand me better than I understand myself!
But this is exactly why a sense of unease creeps in: do we really understand how these AIs reach their conclusions? Whenever I watch an AI produce an almost flawless report, the same question surfaces in my mind: does it truly understand what it has written, or did it just happen to land on the right answer?
If I had to pick one image for today's AI, it would be a strange plant that grows on its own. We see it blooming with beautiful flowers and bearing attractive fruit, but the moment we pick up a magnifying glass, we discover that we have no idea how its roots, stems, and leaves actually interact.
A study recently published by Anthropic is an attempt to pry open this black box. The researchers took an almost biologist-like approach to the internal workings of large language models such as Claude 3.5: instead of looking only at inputs and outputs, they observe the model's "cells" and trace its "neurons," trying to answer the question: what is each cell of this strange plant actually doing?
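To make the idea of "observing the cells, not just the input and output" concrete, here is a minimal sketch in PyTorch. This is not Anthropic's actual tooling, and the toy model and layer names are invented for illustration; it only demonstrates the basic move of recording a network's intermediate activations with forward hooks, rather than treating the whole model as a black box.

```python
import torch
import torch.nn as nn

# A tiny two-layer network standing in for a language model.
# (Illustrative only -- real models are vastly larger.)
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
)

activations = {}

def make_hook(name):
    # Record each layer's output as data flows through the network.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, layer in model.named_children():
    layer.register_forward_hook(make_hook(name))

x = torch.randn(1, 8)
y = model(x)  # the "black box" answer we normally see

# Now we can inspect the intermediate "cells", not just input and output.
for name, act in activations.items():
    print(f"layer {name}: activation shape {tuple(act.shape)}")
```

The point of the sketch is the shift in perspective: the final output `y` is what users see, while the `activations` dictionary is the kind of internal trace that interpretability research starts from.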
If AI is really going to enter sensitive fields such as medicine, law, and finance, we cannot judge it by its output alone; we must genuinely understand whether its reasoning process is reliable, safe, and controllable. So today, let's use Anthropic's research to explore how an AI "brain" actually works!