Decoding Scale AI: The "Worker" Wisdom Behind Artificial Intelligence? Data Labeling: The Secret Behind a Unicorn Valued at US$7.3 Billion in 8 Years


Preface

OpenAI set off a generative AI revolution with ChatGPT, Waymo has brought self-driving cars to the streets of North America, and NVIDIA, on the strength of its GPU technology, once became the second-largest company by market capitalization...
Besides making heavy use of AI, these companies have one thing in common: they all work with the startup Scale AI to train their AI.

In fact, in every industry, behind every successful AI company there is a group of people doing a tedious but indispensable training task for it: data labeling.

 

Scale AI, a rising American unicorn, is one of the leaders in this field.


Its founder, Alexandr Wang, dropped out of college to start Scale AI at the age of 19. Alexandr has said that Scale AI's data labeling services are like selling shovels in the generative AI gold rush (does that description sound familiar? 😆).

While many AI startups have yet to make a dime, Scale AI's revenue last year reached US$250 million and its valuation reached US$7.3 billion. Its customers range from OpenAI and Tesla to the US Air Force, the Army, the CIA, and beyond.

So what exactly is data labeling? Why is it so profitable? How did Scale AI find its niche in the highly competitive AI field and grow into a unicorn?

Today I’m going to share the story of Scale AI with you!

3 Takeaways if you only have 1 minute

  1. Growth momentum from data in the AI era:

Beyond good models and computing power, ever-improving AI depends on data and on its accuracy, a factor that is often overlooked but extremely important. Scale AI helps other companies label and process large amounts of data so that the data fed into AI models is of higher quality.

  2. Scale AI’s products and markets:

Scale AI's products serve three levels of AI: the data level, which provides the training data AI models need; the model level, which uses data to train and optimize AI models; and the application level, which applies trained models to specific business scenarios to solve practical problems. Its customers range from tech giants such as OpenAI, NVIDIA, and Waymo to the US government.

  3. Challenges and risks facing Scale AI:

Although Scale AI has firmly seized the data labeling trend, it relies heavily on low-wage labor, which has led to sweatshop labor disputes that remain unresolved. Meanwhile, as more tech companies build in-house data labeling operations and AI itself advances, the need for manual labeling may shrink, and both trends threaten Scale AI's growth.

Founding background

The origin of Scale AI can be traced back to the story of "Who stole the yogurt?"


Catching the yogurt thief

In 2016, founder Alexandr Wang suspected that one of his college roommates at MIT was stealing his yogurt. He didn't want to accuse anyone wrongly, so he decided to build a "smart fridge camera" to catch the thief.

He followed Google's TensorFlow tutorials (TensorFlow is an open-source machine learning platform) to learn how to build such a camera.

At first, he copied almost all of the image-recognition training code straight from the TensorFlow tutorials, but he soon hit the biggest problem:
Computers may be powerful learners, but training one to recognize food still takes a large number of labeled food photos.

Without these photos, no matter how smart the computer is, it doesn't know what food looks like, and it can't help Alexandr catch the yogurt thief.

At that point, his only option was to label tens of thousands of food photos by hand:
Use an annotation tool to draw a box around the food in each photo and add a tag such as "apple" or "yogurt", then repeat until every photo is tagged.
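What does one of those labels look like as data? Below is a minimal Python sketch assuming a simple hypothetical schema (the class and field names are illustrative, not the format of TensorFlow or any real labeling tool): each box records a tag plus the pixel rectangle drawn around the food, and a basic check catches boxes drawn outside the photo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxAnnotation:
    """One labeled object in a photo: a tag plus a rectangle (hypothetical schema)."""
    label: str   # e.g. "apple" or "yogurt"
    x: int       # left edge of the box, in pixels
    y: int       # top edge of the box, in pixels
    width: int
    height: int

    def is_inside(self, image_width: int, image_height: int) -> bool:
        """True if the box has positive size and fits entirely within the image."""
        return (
            self.width > 0
            and self.height > 0
            and self.x >= 0
            and self.y >= 0
            and self.x + self.width <= image_width
            and self.y + self.height <= image_height
        )


@dataclass
class LabeledPhoto:
    """A photo plus every box a labeler drew on it."""
    filename: str
    width: int
    height: int
    boxes: list[BoxAnnotation]

    def labels(self) -> set[str]:
        """All distinct tags used in this photo."""
        return {box.label for box in self.boxes}

    def invalid_boxes(self) -> list[BoxAnnotation]:
        """Boxes that spill outside the image, a common labeling mistake."""
        return [b for b in self.boxes if not b.is_inside(self.width, self.height)]


photo = LabeledPhoto(
    filename="fridge_0001.jpg",  # hypothetical file name
    width=640,
    height=480,
    boxes=[
        BoxAnnotation("yogurt", x=100, y=200, width=80, height=120),
        BoxAnnotation("apple", x=300, y=250, width=60, height=60),
    ],
)
print(sorted(photo.labels()))  # ['apple', 'yogurt']
print(photo.invalid_boxes())   # []
```

Richer formats may add fields such as polygon outlines or annotator IDs, but the core idea stays the same: a tag tied to a region of the image.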

After painstakingly labeling tens of thousands of images, Alexandr finally trained an accurate recognition model.

During this laborious process, Alexandr realized that making an AI model smart takes not only good code but also a large amount of high-quality labeled data.

After the yogurt thief: the vast data labeling market beyond the refrigerator

After the yogurt thief episode, this hands-on experience further convinced Alexandr Wang and co-founder Lucy Guo of the market opportunity in data labeling.

The founders, Alexandr Wang and Lucy Guo, are two computer prodigies who had worked at Quora and Snapchat, respectively, by the age of 20. They saw that both social platforms had to review and tag huge numbers of images and posts every day. The tagging was repetitive and tedious, and eventually it had to be handed to outsourced teams who did it by hand.

They realized that the tedious but critical task of data labeling could be automated, modularized, and even commoditized.

This aha moment led them to found Scale AI in 2016, a company specializing in data labeling for other businesses that lets its customers launch a labeling task with just one line of code.



What pain points does Scale AI solve?

Before explaining how Scale AI speeds up data labeling, let's briefly introduce what data labeling is.

Data labeling: tedious but vital work on the AI journey

Data labeling in one sentence: attaching labels to data so that machine learning models can understand it.

Consider an analogy:
To train a student to do well on exams, the student needs not only a sharp mind but also textbooks and practice books with correct content.
With the right materials and a good brain, the student can learn quickly and answer every exam question correctly.

The same goes for training AI. Beyond a powerful machine learning model, it also needs to be trained on accurately labeled data.
With a good model and accurately labeled data, AI can learn correctly from the information it is fed and perform better in real-world applications.

What kinds of companies need data labeling?

Data labeling may sound unfamiliar, but it is actually everywhere, used every day by nearly every company you can think of!

Simply put, any company that relies on data to improve its products or services may need data labeling!
Here are 3 common application scenarios:

  1. Tech companies such as Google, Apple, and Amazon:
    Use data labeling to improve image recognition in Google Photos, speech recognition accuracy in Apple's Siri, and personalized product recommendations on Amazon.

  2. Medical companies such as Zebra Medical Vision and GE Healthcare:
    Use labeled medical images to train AI models that help doctors read X-rays and MRI scans faster and more accurately and identify possible diseases.

  3. Financial companies such as JPMorgan Chase, Lemonade, and Stripe:
    Use labeled data to detect possible fraud in credit card transactions, assess the risk of insurance applications, improve the security of electronic payment systems, and more.

Why outsource to a labeling company?

A practical example makes this clear!
If GE Healthcare wanted to train a model that recognizes medical images, it would need the following steps:

  1. Data collection: GE Healthcare collects large amounts of medical imaging data, such as X-rays and MRI scans.
  2. Data labeling: Professional doctors label these images as "negative" or "positive" to indicate which ones show disease.
  3. Model training: Large amounts of labeled data are used to train the AI model so it learns to tell negative images from positive ones.
  4. Model application: The model is used in medical diagnosis to help doctors identify diseases more quickly.
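To make the four steps concrete, here is a toy Python sketch. It assumes, purely for illustration, that each image has already been reduced to a single hypothetical numeric score; real medical imaging models learn from the pixels themselves and are far more complex. The "training" step simply picks the cut-off that best separates the doctor-labeled examples.

```python
# Step 1 (collection): toy "images", each reduced to one hypothetical feature score
collected = [0.10, 0.25, 0.30, 0.62, 0.70, 0.85]

# Step 2 (labeling): a doctor marks each image negative (0) or positive (1)
labels = [0, 0, 0, 1, 1, 1]


def train_threshold(scores, labels):
    """Step 3 (training): choose the cut-off that classifies the labeled data best."""
    best_threshold, best_correct = None, -1
    for candidate in sorted(scores):
        # Count how many labeled images this cut-off would classify correctly.
        correct = sum(
            (score >= candidate) == bool(label)
            for score, label in zip(scores, labels)
        )
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def predict(threshold, score):
    """Step 4 (application): label a new, unseen image."""
    return "positive" if score >= threshold else "negative"


threshold = train_threshold(collected, labels)
print(threshold)                # 0.62
print(predict(threshold, 0.9))  # positive
print(predict(threshold, 0.2))  # negative
```

The sketch also shows why step 2 matters: if the labels were wrong, the learned cut-off would be wrong too, no matter how good the training code is.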

Scale AI mainly works at the labeling stage, helping GE Healthcare label large amounts of data quickly and accurately.
After all, it is not economical to have doctors give up consultation time to label tens of thousands of images as negative or positive.

With the help of Scale AI, GE Healthcare can use labeled data to train the model so that it can correctly identify new images in the future.

At this point you might ask:

Scale AI sounds like just a large human-labor outsourcing company, and it must have plenty of competitors. So where does Scale AI win? How does it earn US$250 million in annual revenue and work with so many large companies, even the US government?

Scale AI’s core competencies 

Scale AI does rely on outsourced labor, but after farming out labeling tasks to regions with lower labor costs, such as Africa and Southeast Asia, it uses a range of in-house software to streamline the workforce and speed up labeling. By vertically integrating this part of the value chain, it offers customers a total solution.

Scale AI’s core technology

Scale AI's ability to gain a foothold in the highly competitive AI field can be attributed to its effective human-machine collaboration model.
Here are 4 key points:

  1. Combining an automation platform with human work:
    Machine learning and AI are used to assist the labeling process, allocating and managing labeling work efficiently and reducing reliance on manpower.

  2. Subsidiary Remotasks manages a diverse workforce:
    Through crowdsourcing, labelers from all over the world take part in labeling, completing large volumes of tasks in a short time while flexibly handling the needs of different languages and cultures.

  3. Labeling quality management system:
    Scale AI runs a strict quality control system to ensure the quality and accuracy of human labeling. Examples include multi-labeling (several labelers label the same data) and algorithmic checks (machines check the labeling results), so that every piece of data is carefully verified.

  4. Continuously iterated labeling tools that make human-machine collaboration smoother:
    Through technological innovation and continuous improvement of its labeling tools, Scale AI stays at the forefront of data labeling technology and keeps up with the market's changing needs.
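The article does not describe how Scale AI's system works internally, so the following is only a generic Python sketch of two ideas from the list above: a model pre-labels data and only low-confidence items go to humans (point 1), and several humans label the same item with a majority vote deciding the result (point 3). The thresholds, labels, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical tuning knobs; a real platform would choose these per project.
CONFIDENCE_THRESHOLD = 0.9  # trust the model's label at or above this confidence
MIN_AGREEMENT = 2 / 3       # share of human labelers who must agree


def consensus(human_labels):
    """Multi-labeling: majority vote across several labelers.

    Returns (label, agreement); label is None when agreement is too low.
    """
    label, votes = Counter(human_labels).most_common(1)[0]
    agreement = votes / len(human_labels)
    return (label if agreement >= MIN_AGREEMENT else None), agreement


def label_items(model_outputs, human_votes):
    """Combine machine pre-labels with human review.

    model_outputs: {item_id: (model_label, confidence)}
    human_votes:   {item_id: [label from each human labeler]}
    Returns {item_id: (route, final_label)}.
    """
    results = {}
    for item_id, (model_label, confidence) in model_outputs.items():
        if confidence >= CONFIDENCE_THRESHOLD:
            # Confident machine label: no human time spent on this item.
            results[item_id] = ("auto", model_label)
            continue
        label, _ = consensus(human_votes[item_id])
        if label is None:
            # Labelers disagree too much: escalate instead of guessing.
            results[item_id] = ("expert review", None)
        else:
            results[item_id] = ("human consensus", label)
    return results


model_outputs = {
    "img-1": ("car", 0.97),
    "img-2": ("pedestrian", 0.55),
    "img-3": ("cyclist", 0.60),
}
human_votes = {
    "img-2": ["pedestrian", "pedestrian", "cyclist"],
    "img-3": ["cyclist", "pedestrian", "car"],
}
for item_id, outcome in label_items(model_outputs, human_votes).items():
    print(item_id, outcome)
```

The design choice here is to spend human effort only where the machine is unsure, and to spend more than one human on those items so that a single careless labeler cannot decide the final label alone.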

Through these methods, Scale AI can efficiently handle a large number of data labeling needs and provide high-quality data labeling services to customers including OpenAI, NVIDIA, Waymo and other well-known companies.
This combination of automation technology and global human resources has allowed Scale AI to find its own niche in the highly competitive AI field and quickly grow into a unicorn company.

Yet Scale AI's valuation keeps rising. How has its product line evolved beyond data labeling services?
That brings us to the story of its three product pivots, all within a short history of just 8 years.

Company timeline

Phase 1: Data Processing Engine (2016-2019)

In its early days, Scale AI focused on building simple data processing APIs and quickly became the preferred data provider for self-driving car companies such as Lyft, Uber, and Waymo.
For example, a self-driving car company could upload its road image data through Scale AI's API, use Scale AI's tools to label the data quickly, and then use it to train its self-driving models.

After establishing a firm foothold in self-driving, Scale AI began to broaden its scope, moving into applications such as natural language processing, e-commerce, and AR/VR.


Phase 2: Artificial Intelligence Engine (2020-2022)

After establishing itself as a training data provider, Scale AI turned to the broader AI field and extended its reach across the entire life cycle of its customers' AI development.

Scale AI began rolling out fully managed models as a service, working with customers to make sure they have the infrastructure needed to deliver high-performance models such as large language models, self-driving models, and generative AI models.
This expansion meant Scale AI no longer only provided labeled data but also managed models, widening its market opportunities.

Phase 3: Generative AI and Application Engine (2022-present)

Scale AI worked closely with OpenAI from the early stages of ChatGPT's development, which let it catch the generative AI wave from a developer's vantage point.
Scale AI then launched new products tailored to generative AI,
such as Spellbook, a tool for tuning prompts, and Donovan, an application that helps defense and intelligence professionals make decisions.

If you've enjoyed the article so far,
you're welcome to subscribe to my newsletter [Roxanne's Tech Talk],
where I share more interesting science and technology stories! 🥳
Join 500 readers learning the latest technology together 👉 Subscribe to Roxanne's Tech Talk

Scale AI products

 

Scale's products can be segmented by AI level (application/model/data) and type (service/software).

Plain-language mini lesson:

What do the AI levels (application/model/data) represent?

  • Data level: Provides the training data AI models need.
  • Model level: Uses data to train and optimize AI models.
  • Application level: Applies trained AI models to specific business scenarios to solve practical problems.

| AI level | Type | Product | Description | Example customer |
|---|---|---|---|---|
| Data | Service | Rapid | A self-service data annotation platform that helps users quickly upload and label data. | OpenAI, for labeling training data |
| Data | Service | Pro | Enables labeling via API, with professional managers handling large, complex data volumes. | Waymo, for autonomous-driving data labeling |
| Data | Software | Studio | A comprehensive labeling platform that makes in-house labeling teams more efficient, with management, monitoring, and tracking tools. | Tesla, for in-house data labeling |
| Data | Software | Nucleus | A machine learning data management tool that helps visualize data, improve model performance, and perform active learning and edge-case identification. | NVIDIA, for managing model training data |
| Model | Service | Custom model products | Help build, manage, and deploy large language models, focusing on fine-tuning models to improve performance for specific uses. | Google, for language model fine-tuning |
| Model | Software | Spellbook | Helps teams quickly deploy large language model applications, create and compare prompts, and run evaluations. | OpenAI, for prompt creation and comparison |
| Model | Software | Generative AI Platform | An end-to-end solution that lets enterprises customize, build, test, and deploy generative AI applications. | Anthropic, for generative AI application development |
| Application | Software | Forge | Helps marketers and brands create AI-generated product images for advertising and social media. | Coca-Cola, for advertising image generation |
| Application | Software | Donovan | Supports decision-making in the defense and intelligence community by analyzing data, quickly spotting trends and anomalies, and providing summarization and translation. | U.S. Department of Defense, for intelligence analysis |
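The Nucleus entry mentions active learning. The article does not say how Nucleus implements it; a common textbook strategy is uncertainty sampling, where the model's least confident predictions are sent to human labelers first. A minimal Python sketch for a two-class model, with hypothetical item IDs and probabilities:

```python
def uncertainty(p_positive):
    """Uncertainty score for a two-class prediction: 1.0 when the model is torn
    (p = 0.5), 0.0 when it is certain (p = 0 or p = 1). This ranks items the
    same way as the common "least confidence" measure."""
    return 1.0 - abs(p_positive - 0.5) * 2


def pick_for_labeling(predictions, budget):
    """Uncertainty sampling: return the `budget` unlabeled items the model is
    least sure about, so human labeling effort goes where it helps most.

    predictions: {item_id: model's probability that the item is positive}
    """
    ranked = sorted(
        predictions,
        key=lambda item: uncertainty(predictions[item]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


predictions = {"a": 0.98, "b": 0.51, "c": 0.10, "d": 0.35, "e": 0.70}
print(pick_for_labeling(predictions, budget=2))  # ['b', 'd']
```

Items like "a" (98% confident) would add little if labeled, while "b" sits right on the decision boundary, which is why it is picked first.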



Scale started as a company specializing in data labeling and now offers services and software spanning data labeling and management, model training and evaluation, and AI application development and deployment. By offering this full-process solution and covering more of the tools needed across the AI training workflow, it can keep a firm footing and stand out from competitors.

So what markets does the company target with such a diverse product line?

Markets faced by Scale AI

Scale AI's market opportunity falls into two parts:
the core AI-as-a-Service (AIaaS) market
and the emerging generative AI market.


1. AI-as-a-Service (AIaaS) market

Scale AI initially focused on data labeling, but as its product line expanded, it gradually became a comprehensive AI IT service provider that helps companies build models.
(As mentioned earlier, it extended from data to models and then to end applications.)

According to research, the AI-as-a-Service (AIaaS) market reached $27 billion in 2023, growing at more than 20%.

According to the investment research platform Tegus, one investor said:

“You know why I love Scale AI? Because it lets me work with one company instead of 15. Scale AI integrates many functions, including data labeling, data management, and synthetic data. Other companies focus on a single function, while Scale AI covers them all, which makes outsourcing more convenient and efficient.”

 

2. Generative AI Market

With the rise of generative AI, the market opportunities for Scale AI have also increased significantly.
Scale AI has been the data labeling partner of choice for tech giants training their own AI.
For example, when OpenAI developed GPT-4 and DALL-E, Google DeepMind developed Gemini, and Anthropic developed Claude, Scale AI helped these companies build customized generative AI models. The generative AI market is expected to double every year through 2027, reaching $55 billion.

Given the growth potential of Scale AI's addressable markets, its recent impressive fundraising should come as no surprise!

Scale AI operational status

According to recent news, Alexandr announced on May 21 that Scale AI had raised US$1 billion in a Series F round at a valuation of US$13.8 billion, nearly double the valuation of its previous round.

The round was led by top VC firm Accel, with participation from new investors including tech giants such as Cisco Investments, Intel Capital, AMD Ventures, WCM, Amazon, and Meta, as well as existing investors such as Y Combinator (YC), Index Ventures, and Nvidia.

Scale AI was also named to CNBC's 2024 Disruptor 50 list, ranking 12th among the 50 most disruptive innovators. Alexandr Wang, founder and CEO of Scale AI, said:

“Our mission is to build a data foundry for artificial intelligence, and this funding will accelerate us to achieve this goal and pave the way to AGI (artificial general intelligence).”

Having seen Scale AI's glamorous media image, let's balance the picture with reports on the company's controversies and potential risks.

Scale AI controversies and potential risks

Sweatshop labor disputes

Scale AI's success relies heavily on some 240,000 workers in Kenya, the Philippines, Venezuela, and elsewhere who work through its subsidiary Remotasks yet earn less than $1 an hour.
These workers label AI training data, but without legally binding contracts they have faced sudden dismissals and abruptly frozen work accounts, and some workers in the Philippines have reported delayed or withheld payments (source: https://www.gvm.com.tw/article/104424).

Potential operational risks

In 2023, macroeconomic headwinds led Scale AI to lay off 20% of its staff, and competition in data labeling grew increasingly fierce. Tech companies such as Google and Amazon began building their own in-house data labeling operations to reduce their reliance on outsourcing.

At the same time, using AI to label data is a growing trend, and models like GPT-4 already outperform humans on many tasks.

A recent study from the University of Zurich found that ChatGPT's zero-shot annotation was more accurate than that of crowd-workers on most tasks, and its intercoder agreement was higher than even that of trained annotators.
Although human labeling is still considered the gold standard for data labeling, future models such as a multimodal GPT-5 may well replace much of the human labeling work.
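How do studies like this compare labelers? Below is a minimal Python sketch, using invented toy labels (not data from the Zurich study), of two measures such comparisons commonly rely on: accuracy against a gold-standard label set, and intercoder agreement between two labeling runs. Simple percent agreement is used here for brevity; studies often also report chance-corrected measures such as Cohen's kappa.

```python
def accuracy(predicted, gold):
    """Share of items where a labeler matches the gold-standard label."""
    if len(predicted) != len(gold):
        raise ValueError("label lists must be the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def percent_agreement(labels_a, labels_b):
    """Intercoder agreement: share of items that two labelers (or two runs of
    the same model) label identically. Same arithmetic as accuracy, but neither
    side is treated as the truth."""
    return accuracy(labels_a, labels_b)


# Invented toy labels for six items.
gold         = ["pos", "neg", "pos", "neg", "pos", "neg"]  # trained annotators
crowd_worker = ["pos", "pos", "pos", "neg", "neg", "neg"]
model_run_1  = ["pos", "neg", "pos", "neg", "neg", "neg"]
model_run_2  = ["pos", "neg", "pos", "neg", "neg", "neg"]

print(f"{accuracy(crowd_worker, gold):.2f}")                 # 0.67
print(f"{accuracy(model_run_1, gold):.2f}")                  # 0.83
print(f"{percent_agreement(model_run_1, model_run_2):.2f}")  # 1.00
```

Note that high agreement only means a labeler is consistent; it can still be consistently wrong, which is why accuracy against a gold standard is measured separately.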

Conclusion

Its outstanding fundraising, founder Alexandr's image as a prodigy, and the growing importance of data in this wave of AI have all put Scale AI in the spotlight. How can data empower AI? How can it be handled more efficiently and more humanely? I believe Scale AI's future will offer answers to these questions.

3 Takeaways

 

  1. Growth momentum from data in the AI era:

Beyond good models and computing power, ever-improving AI depends on data and on its accuracy, a factor that is often overlooked but extremely important. Scale AI helps other companies label and process large amounts of data so that the data fed into AI models is of higher quality.

  2. Scale AI’s products and markets:

Scale AI's products serve three levels of AI: the data level, which provides the training data AI models need; the model level, which uses data to train and optimize AI models; and the application level, which applies trained models to specific business scenarios to solve practical problems. Its customers range from tech giants such as OpenAI, NVIDIA, and Waymo to the US government.

  3. Challenges and risks facing Scale AI:

Although Scale AI has firmly seized the data labeling trend, it relies heavily on low-wage labor, which has led to sweatshop labor disputes that remain unresolved. Meanwhile, as more tech companies build in-house data labeling operations and AI itself advances, the need for manual labeling may shrink, and both trends threaten Scale AI's growth.

Thank you for reading this article!
If you'd like to keep learning, please subscribe to my newsletter [Roxanne's Tech Talk],
where I share more interesting science and technology stories! 🥳

Join 500 readers learning the latest technology together 👉 Subscribe to Roxanne's Tech Talk
You're also welcome to connect with me on LinkedIn 👩🏻‍💻 Roxanne Chen
