Preface
Back-of-the-envelope calculation is a method of approximating the values of complex problems using simple estimates. Let's also review here: a distributed system consists of computing nodes connected through a network. These nodes can be various types of servers, such as web servers, application servers, and storage servers.
When designing a distributed system, it is important to understand how many requests each node can handle. From that we can also determine the required number of nodes and the expected traffic, so we use back-of-the-envelope calculations to make rough estimates and, from those, design the system we need.
Back-of-the-envelope
In reality, distributed systems are composed of computing nodes connected through a network. Software systems in the market contain many kinds of computing nodes, connected in many different ways. Back-of-the-envelope calculations help us ignore the details of a system and focus on the more important aspects, much like the abstractions discussed in the previous article.
Here are some examples of back-of-the-envelope estimates:
- The number of simultaneous TCP connections that the server can accept.
- The number of requests per second (RPS) that a web page, database, or cache server can handle.
- Storage needs of the service.
If we work from unreasonable numbers in these cases, the resulting design may be flawed. Therefore, when designing a system, we should use back-of-the-envelope calculations to make rough estimates first, and then optimize and scale the system from there.
Data center server types
Data centers do not contain just one type of server. Enterprise solutions use commodity hardware to keep costs down and to build scalable systems. The following server types are commonly used in data centers to handle different workloads:
Web Server
For scalability, web servers are separated from application servers. The web server is the first node behind the load balancer, and it handles API calls from the client side. Memory and storage resources vary according to need; generally, the more memory and storage capacity a server has, the more processing it can take on. For example, Meta uses web servers with 32 GB of RAM and 500 GB of storage to handle a large volume of requests.
Application Server
Application servers handle the application and business logic. It is often difficult to distinguish web servers from application servers; the main difference is that application servers serve dynamic content, while web servers primarily serve static content to client browsers.
Storage Server
As the Internet continues to develop, the amount of data any network service needs to store grows explosively with traffic and scale. We therefore need storage servers (which can be understood as dedicated database servers) to handle huge amounts of data, and we need to select appropriate databases for different data types. For example, YouTube uses the following databases:
- Blob storage for encoded video data.
- Bigtable for storing large numbers of video thumbnails.
- An RDBMS for user and video metadata, such as comments and likes.
- Various SQL and NoSQL databases for data analytics.
Common benchmark numbers
The design, planning, and implementation of a system service require a large investment of money, time, and manpower. If we do not know what kinds of workloads a machine can handle, it is difficult to design any further. Latency is one of the most important numbers for judging which machines suit which workloads. The following table was compiled from resources found on GitHub for readers' reference.
Latency

Operation | Time |
---|---|
Execute a typical instruction | 1 ns (1/1,000,000,000 seconds) |
Fetch from L1 cache | 0.5 ns |
Branch misprediction | 5 ns |
Fetch from L2 cache | 7 ns |
Mutex lock/unlock | 25 ns |
Fetch from main memory | 100 ns |
Send 2 KB over a 1 Gbps network | 20,000 ns |
Read 1 MB sequentially from memory | 250,000 ns |
Disk seek (read from a new disk location) | 8,000,000 ns |
Read 1 MB sequentially from disk | 20,000,000 ns |
Send a packet to the United States and back | 150,000,000 ns (150 ms) |
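To get a feel for these figures, we can plug them into a short Python sketch. The two constants below come straight from the table; the 1 GB workload is an illustrative assumption:

```python
# Rough comparison using the latency figures from the table above.
MEM_READ_1MB_NS = 250_000      # read 1 MB sequentially from memory
DISK_READ_1MB_NS = 20_000_000  # read 1 MB sequentially from disk

def sequential_read_seconds(size_mb: int, ns_per_mb: int) -> float:
    """Time in seconds to read size_mb megabytes sequentially."""
    return size_mb * ns_per_mb / 1_000_000_000

# Reading 1 GB (1,024 MB) from memory vs. from disk:
print(sequential_read_seconds(1024, MEM_READ_1MB_NS))   # ≈ 0.256 s
print(sequential_read_seconds(1024, DISK_READ_1MB_NS))  # ≈ 20.48 s
```

The two orders of magnitude between memory and disk are exactly why caching dominates so many system designs.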
QPS
In addition to the latencies listed above, there is also Queries Per Second (QPS), which measures the volume of database queries.
Server type | Queries per second (QPS) |
---|---|
MySQL | 1,000 |
Key-value database | 10,000 |
Cache server | 100,000 – 1,000,000 |
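With these rough per-server capacities, we can estimate how many servers a given query load requires. The capacity figures below are the table's values; the 50,000 QPS workload is a hypothetical example:

```python
import math

# Approximate per-server capacity, taken from the table above.
QPS_CAPACITY = {
    "mysql": 1_000,
    "key_value": 10_000,
    "cache": 100_000,
}

def servers_needed(target_qps: int, server_type: str) -> int:
    """Minimum number of servers of the given type to sustain target_qps."""
    return math.ceil(target_qps / QPS_CAPACITY[server_type])

print(servers_needed(50_000, "mysql"))  # 50
print(servers_needed(50_000, "cache"))  # 1
```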
Unit quantities

Power of 2 | Approximate value | Full name | Abbreviation |
---|---|---|---|
10 | Thousand | Kilobyte | KB |
20 | Million | Megabyte | MB |
30 | Billion | Gigabyte | GB |
40 | Trillion | Terabyte | TB |
Calculate request volume
Next, let's look at how many requests a server can handle per second, that is, its Requests Per Second (RPS).
A server's resources are limited, and depending on the type of client request, different resources can become the system bottleneck.
We can mainly divide it into two types of requests:
- CPU-bound requests: The limiting factor for such requests is the CPU.
- Memory-bound requests: Such requests are subject to memory limitations.
CPU-bound requests
A common formula for calculating RPS for CPU-intensive requests is:
RPS-CPU = Num-CPU x 1 / Task-Time
where:
- RPS-CPU: the CPU-bound RPS
- Num-CPU: the number of CPU threads
- Task-Time: the time required to complete each task
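As a sketch, the formula translates directly into code. The 36 threads and 100 ms task time below are assumed example values, not figures from the article:

```python
def rps_cpu(num_cpu: int, task_time_s: float) -> float:
    """CPU-bound RPS: num_cpu threads, each finishing one task per task_time_s."""
    return num_cpu * (1 / task_time_s)

# e.g. 36 CPU threads, 100 ms of CPU time per task (assumed values)
print(rps_cpu(36, 0.1))  # 360.0 requests/second
```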
Memory-bound requests
For memory-intensive requests, we use the following formula:
RPS-Memory = RAM-Size / Worker-Memory x 1 / Task-Time
where:
- RPS-Memory: the memory-bound RPS
- RAM-Size: the total amount of RAM
- Worker-Memory: the memory each worker consumes while handling a request (so RAM-Size / Worker-Memory is the number of workers that fit in memory)
The service receives both CPU-intensive and memory-intensive requests. Assuming half of the requests are CPU-intensive and the other half are memory-intensive, the total RPS we can handle is:
RPS = (RPS-CPU + RPS-Memory) / 2
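A sketch of the memory-bound and combined formulas; note that the capacity term is total RAM divided by per-worker memory. The 16 GB of RAM, 300 MB per worker, and 100 ms task time are assumed example values:

```python
def rps_memory(ram_size_mb: float, worker_memory_mb: float, task_time_s: float) -> float:
    """Memory-bound RPS: RAM fits ram/worker workers, each finishing one task per task_time_s."""
    return (ram_size_mb / worker_memory_mb) * (1 / task_time_s)

def total_rps(cpu_rps: float, memory_rps: float) -> float:
    """Total RPS assuming half the requests are CPU-bound and half memory-bound."""
    return (cpu_rps + memory_rps) / 2

mem = rps_memory(16_000, 300, 0.1)  # ≈ 533 requests/second
print(total_rps(360, mem))          # ≈ 447 requests/second
```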
The calculations above serve only to illustrate how RPS can be approximated. In reality, many other factors affect RPS. For example, if the data is not in RAM, or a request must go to a database server, a disk seek is required, adding latency. Other unavoidable factors include failures, bugs in program code, node crashes, power outages, and network outages.
Types of Computing in System Design Interviews
In a system design interview, we may need to perform the following types of estimates:
- Load estimation: predict the number of requests per second, data volume, or user traffic the system is expected to handle.
- Storage estimation: estimate the storage space required for the data the system generates.
- Bandwidth estimation: estimate the expected traffic and the network bandwidth required for data transfer.
- Latency estimation: predict response times and latencies based on the system architecture and its components.
- Resource estimation: estimate the number of servers, CPUs, or amount of memory required to handle the load.
Practical examples of back-of-the-envelope calculation
Load estimation
Suppose you are designing a social media platform with 100 million daily active users (DAU), each publishing an average of 10 posts per day. To estimate the load, first calculate the total number of posts generated per day:
100 million DAU * 10 posts/user = 1 billion posts/day
Then estimate the number of requests per second:
1 billion posts/day ÷ 86,400 seconds/day ≈ 11,574 requests/second
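The same arithmetic in Python, using the numbers from the example:

```python
DAU = 100_000_000        # daily active users
POSTS_PER_USER = 10      # average posts per user per day
SECONDS_PER_DAY = 86_400

posts_per_day = DAU * POSTS_PER_USER   # 1,000,000,000 posts/day
rps = posts_per_day / SECONDS_PER_DAY  # ≈ 11,574 requests/second
print(round(rps))  # 11574
```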
Storage estimation
Consider a photo-sharing app with 500 million users, each uploading an average of 2 photos per day. The average size of each photo is 2 MB. To estimate the storage space required for a day’s photos, calculate as follows:
500 million users × 2 photos/user × 2 MB/photo = 2,000,000,000 MB/day (about 2 PB/day)
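The same calculation in Python, including the conversion from megabytes to petabytes:

```python
USERS = 500_000_000
PHOTOS_PER_USER = 2  # photos uploaded per user per day
PHOTO_SIZE_MB = 2    # average photo size in MB

storage_mb_per_day = USERS * PHOTOS_PER_USER * PHOTO_SIZE_MB
storage_pb_per_day = storage_mb_per_day / 1_000_000_000  # MB -> PB (decimal units)
print(storage_mb_per_day, storage_pb_per_day)  # 2000000000 2.0
```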
Bandwidth estimation
For a video streaming service with 10 million users streaming 1080p video at 4 Mbps, the required bandwidth can be calculated:
10 million users × 4 Mbps = 40,000,000 Mbps (40 Tbps)
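In code, with the conversion from megabits to terabits per second:

```python
USERS = 10_000_000
BITRATE_MBPS = 4  # 1080p stream bitrate in Mbps

total_mbps = USERS * BITRATE_MBPS
total_tbps = total_mbps / 1_000_000  # Mbps -> Tbps
print(total_mbps, total_tbps)  # 40000000 40.0
```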
Latency estimation
Suppose you are designing an API that fetches data from multiple sources, and the average latencies of the three sources are 50 milliseconds, 100 milliseconds, and 200 milliseconds. If the sources are called sequentially, the total latency is the sum:
50 milliseconds + 100 milliseconds + 200 milliseconds = 350 milliseconds
If the calls are made in parallel (Parallel), the total latency is instead the maximum of the individual latencies:
max(50ms, 100ms, 200ms) = 200ms
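Both cases in Python, using the example's per-source latencies:

```python
latencies_ms = [50, 100, 200]  # per-source latencies from the example

sequential_ms = sum(latencies_ms)  # calls made one after another
parallel_ms = max(latencies_ms)    # calls made concurrently
print(sequential_ms, parallel_ms)  # 350 200
```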
Resource estimation
Suppose you are designing a system that receives 10,000 requests per second, where each request requires 10 milliseconds of CPU time. The total CPU time required per second is:
10,000 requests/second * 10 milliseconds/request = 100,000 milliseconds/second
Each CPU core provides 1,000 milliseconds of processing time per second, so the number of cores required is:
100,000 ms/sec / 1,000 ms/core = 100 cores
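The two steps above, sketched in Python:

```python
RPS = 10_000                    # requests per second
CPU_MS_PER_REQUEST = 10         # CPU time per request, in milliseconds
MS_PER_CORE_PER_SECOND = 1_000  # processing time one core provides each second

total_cpu_ms = RPS * CPU_MS_PER_REQUEST        # 100,000 ms of CPU time per second
cores = total_cpu_ms / MS_PER_CORE_PER_SECOND  # 100 cores
print(cores)  # 100.0
```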
Conclusion
Back-of-the-envelope is a method for quickly estimating system requirements and can be used in the early stages of system design. This approach enables effective design decisions to be made and avoids problems at subsequent stages.
Here are some considerations for system design using Back-of-the-envelope:
- Back-of-the-envelope calculations only provide rough estimates; when actually designing a system, detailed analysis is still required.
- When performing Back-of-the-envelope, all major factors of the system need to be considered, including hardware, software, and networking.
Reference
Teach Yourself Programming in Ten Years
Related articles
Non-functional features of software design – System Design 03
Application of abstraction in system design – System Design 02