Preface: Is AI the “model intern” that everyone imagines?
When a small or medium-sized enterprise begins to expand, the first problem it faces is usually not the market or the product, but a shortage of staff. Imagine that you are the person running such a company today: you may find yourself doing three jobs at once, replying to customers, writing copy, and handling customer reviews.
Amid these tedious but important tasks, you start hearing about a new helper that could change the way you work: artificial intelligence, or more specifically, Large Language Models (LLMs).
These AI tools are marketed as almost dreamlike: they claim to write copy, translate languages, summarize customer feedback, and even respond to customer service issues in real time. It sounds like a virtual intern who never needs to rest, has an amazing memory, speaks multiple languages, and is always online and on call. Such a role must sound very attractive to bosses, but it also makes people curious: is this “AI intern” really ready to take part in a company’s actual operations?
To answer this question, three researchers from the University of Hull and the University of Bradford in the UK - Julius Sechang Mboli, John G. O. Marko and Rose Anazin Yemson - decided to run an experiment. They gave Google’s conversational AI “Gemini” (formerly Bard) a seemingly simple but actually critical task: simplify customer reviews of Disneyland.
These reviews come from different regions around the world, with diverse language styles and strong emotions, which makes them a good test of whether AI can truly “understand” the meaning, pick out the key points, and turn them into clearer, more useful content. This is basically what an ordinary intern does.
On the surface, the task looks like asking an intern to help organize customer service records and highlight the key points, but it is really a test of the strength of natural language processing (NLP) technology. From semantic understanding and sentence restructuring to avoiding misunderstandings and mistranslations, this experiment put the AI through a practical acceptance test to verify whether it can handle the heavy lifting of corporate communication.
Today’s article walks through this research to see how the AI intern actually performed. From a business perspective, we will re-examine whether AI can truly become a good assistant for text work and explore its advantages and limitations in depth. Ready? Let’s take a look at what happened on Gemini’s first day at work.
Research referenced in this article: Are Large Language Models Ready for Business Integration? A Study on Generative AI Adoption
What are Large Language Models? Like a library assistant and an impromptu writer
The name Large Language Model (LLM) sounds a bit distant, but think of it as an assistant who is always on call in the library, remembers every sentence you have ever said, and can help you compose new content at any moment.
To be more specific, this assistant does not really “understand” what you are saying; it “predicts” what you will say next, and what you want to hear, by statistically estimating which words are most likely to follow, based on enormous amounts of text. When you say “Please simplify this comment,” it searches its memory for all the simplified sentences it has seen before and, combining them with the context, pieces together a seemingly reasonable answer.
To give a more intuitive example: if human writing is “internalize first, then output,” then LLM writing is more like a “word-chain game,” where it pulls plausible pieces out of a haystack and rearranges them into a paragraph of text. This ability comes from absorbing a huge amount of online material during the training phase - reading all of Wikipedia, news articles, Reddit threads, and product reviews - but it has no real common-sense judgment.
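To make the “word-chain game” idea concrete, here is a toy sketch of next-word prediction built from simple word counts. Real LLMs learn these probabilities with neural networks over billions of parameters rather than a lookup table, and the tiny corpus below is made up purely for illustration.

```python
from collections import Counter, defaultdict

# A tiny made-up "training corpus"; real models read billions of sentences.
corpus = [
    "the ride was fun",
    "the ride was long",
    "the food was fun",
]

# Count how often each word is followed by each other word.
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the toy corpus."""
    counts = next_word_counts[word]
    if not counts:
        return None
    return counts.most_common(1)[0][0]

print(predict_next("was"))  # -> "fun" (seen twice, versus "long" once)
```

The real thing is vastly more sophisticated, but the spirit is the same: the model keeps asking “given everything so far, what is most likely to come next?” rather than reasoning about what is true.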
Experiment begins: asking AI to help simplify 42,000 Disneyland customer reviews
The research was like the AI intern’s first assignment: process and simplify more than 42,000 reviews left by Disneyland visitors. These reviews come from customers in different regions around the world, and the language varies greatly: some are excited, some are emotional, and some are rambling. If a company can turn these comments into concise, useful insights, it is a great help to marketing, customer service, and product design.
The experimental method is very practical. The researchers designed a Robotic Process Automation (RPA) workflow:
First, a Python program reads each original review, sends it with the fixed prompt “Simplify: <review text>” through the API provided by Google Gemini, and then receives the simplified version the AI sends back. They also set a 60-second delay between requests, because firing off too many at once could be mistaken for abuse. A minimal sketch of what such a loop might look like is shown below.
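The paper does not publish its code, so the following is only a rough sketch of the kind of loop it describes; the file name, column name, model name, and the google-generativeai client calls are assumptions about one common way to do this, not the researchers’ actual script.

```python
import csv
import time

import google.generativeai as genai  # Google's Gemini client library

# Placeholder key and model name; the study does not specify these details.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")

# Hypothetical file and column names for the Disneyland review dataset.
with open("disneyland_reviews.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        review = row["Review_Text"]
        # The fixed prompt format described in the study: "Simplify: <review text>"
        response = model.generate_content(f"Simplify: {review}")
        print(response.text)
        # 60-second pause so the traffic is not mistaken for abuse.
        time.sleep(60)
```

At 60 seconds per request, a dataset of 42,000 reviews would take weeks of wall-clock time, which is one practical reason only a fraction of the reviews ended up processed.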
At first glance, this process seems simple: send out a command and the AI will give you a simplified version of the message.
But in reality, every API request is like pulling a “new intern who knows nothing about the company” into the conference room, handing them a customer message, and asking them to immediately produce a simpler but to-the-point version, with no background knowledge at all.
The AI is not simply “translating” the language; it has to run a more complex three-step process:
First, it must be able to understand the original meaning (semantic comprehension); second, it must determine which information to keep and which to omit (information reorganization); finally, it must rewrite it in a natural and fluent sentence (sentence generation).
In other words, this is not asking the AI to do a word-for-word conversion. It is asking the AI to act like a communication expert who understands both words and emotions: to “digest and absorb” a chunk of customer talk and turn it into a clearer, easier-to-understand version. This is actually not easy for AI at all.
How did the AI intern perform? Right about 70% of the time, wrong about 30%, and sometimes it “plays dumb”
The final result: out of the 42,000 records, the AI returned simplified versions for approximately 3,324 reviews, less than 8% of the total. Of these, about three-quarters appeared reasonable, while the rest were wrong or “refused to answer.”
Picture the AI intern handling a review the way a person would listen to a customer complaint and then report a summary to the boss. Ideally it says, “OK, let me tidy this up briefly: this customer thought the venue was beautiful, but it was too crowded.” Most of the time it succeeds in exactly this way: the tone is steady, the meaning is clear, and sometimes it even adds an “I hope this helps you.”
But when it goes wrong, it is like an intern who suddenly spaces out during work hours, pretends to know something, or flatly says “I don’t know how to do this.” Some error responses have confusing formats, while others simply reply, “I’m just a language model, I can’t help you.” What is more interesting is that comments with the same structure get different results on different runs: one time it agrees to help, the next time it says it can’t. The inconsistency makes you wonder if it is just in a bad mood. XD
These situations show that an LLM is not always a stable computational tool. It is not like Excel, which always follows its formulas; it is more like a “robot that can write poetry”: sometimes full of inspiration, sometimes making mistakes, and it is hard to predict which one you will get next time.
How does a supervisor know if the AI has done a good job? Semantic similarity is key
So do these simplified results actually hold up? The researchers evaluated them with a technique called “semantic similarity.” The idea is to compare whether the “direction of meaning” of two statements lines up, rather than just comparing whether the words are the same.
They used a model called Sentence-BERT (SBERT), which converts a piece of text into a “vector,” essentially a point in a mathematical coordinate space. Then “cosine similarity” is used to measure the angle between the vectors of the two texts: if the angle is small, the meanings are close; if the angle is wide, the meanings have drifted apart.
It’s like if you say, “I was very moved by this movie,” and the AI replies, “The movie was good, I shed a few tears.” The meaning is close; but if it says, “I don’t like popcorn, it’s too sweet,” that’s completely off topic.
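In code, that comparison looks roughly like the sketch below. It uses the open-source sentence-transformers package; the specific model checkpoint is chosen here for illustration, since the paper only says it used SBERT.

```python
from sentence_transformers import SentenceTransformer, util

# Any SBERT-style model works; this particular checkpoint is an illustrative choice.
model = SentenceTransformer("all-MiniLM-L6-v2")

original = "I was very moved by this movie."
on_topic = "The movie was good, I shed a few tears."
off_topic = "I don't like popcorn, it's too sweet."

# Each sentence becomes a vector, i.e. a point in a high-dimensional space.
embeddings = model.encode([original, on_topic, off_topic])

# Cosine similarity measures the angle between vectors: closer to 1 means closer meaning.
print(util.cos_sim(embeddings[0], embeddings[1]))  # relatively high: same topic
print(util.cos_sim(embeddings[0], embeddings[2]))  # noticeably lower: off topic
```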
Through such comparisons, the study found that many of the AI’s responses did retain the core meaning, but some simplified versions were “oversimplified”: the original emotions and details were stripped out, leaving something empty and of little use.
So can companies hand over work to AI? It depends on how you use it.
This experiment shows that AI is like a new intern whose performance is still unstable. On a good day, it really can save you a lot of time and quickly turn customer feedback into concrete insights; on a bad day, it may misread the customer’s tone, miss the point, or even talk nonsense.
If business owners really want to integrate AI into their processes, it is worth setting up a “human-machine co-review” mechanism: let the AI handle the preliminary edit and let a human do the final review. Only this kind of cooperation combines the efficiency of AI with human judgment to get the best results.
Companies cannot expect AI to get it 100% right, just as they would not let an intern sign a contract alone. The really smart approach is to let AI handle the repetitive 80% of the work first, so you can focus your energy on the most critical 20%. One way a team might wire up such a co-review check is sketched below.
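As a concrete illustration of the co-review idea, a simple script could compare each AI simplification against the original review and flag anything that drifts too far for a human editor. The threshold, function name, and model choice below are assumptions for illustration, not part of the study.

```python
from sentence_transformers import SentenceTransformer, util

# Same kind of SBERT-style model used for the semantic similarity check above.
model = SentenceTransformer("all-MiniLM-L6-v2")

SIMILARITY_THRESHOLD = 0.6  # arbitrary cut-off; each team would tune this themselves


def needs_human_review(original: str, simplified: str) -> bool:
    """Flag a simplification for human review if its meaning drifts too far from the original."""
    embeddings = model.encode([original, simplified])
    similarity = float(util.cos_sim(embeddings[0], embeddings[1]))
    return similarity < SIMILARITY_THRESHOLD


# Example: an "oversimplified" reply that dropped the customer's main complaint.
original = "The castle was beautiful but the queues were unbearable all afternoon."
simplified = "The park was nice."
if needs_human_review(original, simplified):
    print("Send this one back to a human editor.")
```

The AI still does the bulk of the drafting, but a person only has to look at the cases the check flags, which is exactly the 80/20 split described above.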
Conclusion: AI is a student, not a teacher. You need to know how to guide it.
The lesson from this study is that although AI is powerful, it is not yet omnipotent. Because of technical limitations and the nature of LLMs, it is more like a student who is still growing, not a teacher or supervisor who can run things on its own. It can help you speed up the process and provide inspiration, but it cannot replace human judgment and the subtlety of human communication.
When we discuss "Is AI ready to enter the business world?", what we should actually think about is: "Are we ready to use AI correctly?" This is the key to promoting the successful implementation of AI.
AI is a tool, a partner, and even a member of the team. If used correctly, it can make small and medium-sized enterprises as efficient as large companies; but if used incorrectly, it may also cause you to fall into an information fog, misunderstand customers, and miss opportunities.
Instead of asking “Can AI replace me?”, we should ask “Can I use AI to become stronger than before?”