Non-functional features of software design – System Design 03



Non-functional characteristics of modern software systems include availability, reliability, scalability, maintainability, and fault tolerance. These characteristics affect not only the performance and efficiency of a software system but also, directly, the user experience. This article explains each of the five in depth.

Non-functional features

In system design, requirements can be roughly divided into two types: functional requirements and non-functional requirements.

  • Functional requirements, such as designing a video viewing platform that includes functions such as user login, uploading videos, watching videos, etc.
  • Non-functional requirements, such as designing a video viewing platform that can handle the traffic of millions of users.

Both kinds of requirements are important. This article focuses on the non-functional ones.


Availability

Availability refers to the percentage of time that a system's services or equipment are accessible to users and operating normally. For example, if the availability of a service is 100%, the service can operate and respond normally at any time. In plain terms, the system never goes down.

How to measure availability

Availability can be expressed as a mathematical ratio, symbolized by A, with higher values being better. We express it with the following mathematical formula:

A (percentage) = (total time – service downtime) / total time * 100%
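The formula above can be sketched in a few lines of Python. The function name `availability` and the sample figures (a 30-day month with 43.2 minutes of downtime) are assumptions chosen for illustration, not values from a real system.

```python
def availability(total_time: float, downtime: float) -> float:
    """Return availability as a percentage: (total - downtime) / total * 100."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    if not 0 <= downtime <= total_time:
        raise ValueError("downtime must be between 0 and total_time")
    return (total_time - downtime) / total_time * 100


# Hypothetical example: 43.2 minutes of downtime in a 30-day month.
minutes_in_month = 30 * 24 * 60
print(f"{availability(minutes_in_month, 43.2):.3f}%")  # prints 99.900%
```

Both arguments only need to share the same unit (seconds, minutes, or hours), since the result is a ratio.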

Availability | Downtime per year | Downtime per month | Downtime per week
90% (one 9) | 36.5 days | 72 hours | 16.8 hours
99% (two 9s) | 3.65 days | 7.20 hours | 1.68 hours
99.9% (three 9s) | 8.76 hours | 43.2 minutes | 10.1 minutes
99.99% (four 9s) | 52.56 minutes | 4.32 minutes | 1.01 minutes
99.999% (five 9s) | 5.26 minutes | 25.9 seconds | 6.05 seconds
99.9999% (six 9s) | 31.5 seconds | 2.59 seconds | 0.605 seconds
99.99999% (seven 9s) | 3.15 seconds | 0.259 seconds | 0.0605 seconds

As you can see from the above table, we can use the availability percentage to judge the stability of a system. Generally, systems with greater traffic will want the availability ratio to be close to 100%, so that users will not be affected.
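The downtime figures for a given availability can be derived rather than memorized. The sketch below assumes a 365-day year, a 30-day month, and a 7-day week; the name `allowed_downtime` is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed period lengths: a 365-day year, a 30-day month, and a 7-day week.
PERIODS = {
    "year": 365 * SECONDS_PER_DAY,
    "month": 30 * SECONDS_PER_DAY,
    "week": 7 * SECONDS_PER_DAY,
}


def allowed_downtime(availability_percent: float) -> dict[str, float]:
    """Return the maximum downtime, in seconds, per period for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return {name: seconds * unavailable_fraction for name, seconds in PERIODS.items()}


# Four 9s: about 52.56 minutes per year, 4.32 per month, 1.01 per week.
budget = allowed_downtime(99.99)
for period, seconds in budget.items():
    print(f"per {period}: {seconds / 60:.2f} minutes")
```

A different month length (for example 365/12 days) shifts the monthly figures slightly, so it is worth stating which convention a target uses.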

Factors that affect availability

The following are some factors that may affect availability:

  • Hardware failures
  • Software errors
  • Network problems
  • Human error
  • Natural disasters

Service providers typically take a variety of measures to improve availability, such as:

  • Perform regular maintenance and testing
  • Implement a disaster recovery plan


Reliability

Reliability is the probability that a service performs its function correctly over a specified period of time. Reliability metrics measure how system services behave under different operating conditions.

We often use "Mean Time Between Failures (MTBF)" and "Mean Time to Repair (MTTR)" to measure reliability.

MTBF = (total elapsed time – total downtime) / total number of failures

MTTR = total maintenance time / total number of repairs

(We pursue higher MTBF values and lower MTTR values.)
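As a minimal sketch, both metrics can be computed from a list of outage durations. MTBF is taken here as operating time divided by the number of failures, and MTTR as total repair time divided by the number of repairs. The function names and the sample outage log are hypothetical.

```python
def mtbf(total_hours: float, outage_hours: list[float]) -> float:
    """Mean time between failures: operating (up) time divided by the number of failures."""
    if not outage_hours:
        raise ValueError("MTBF is undefined without at least one failure")
    uptime = total_hours - sum(outage_hours)
    return uptime / len(outage_hours)


def mttr(outage_hours: list[float]) -> float:
    """Mean time to repair: total repair time divided by the number of repairs."""
    if not outage_hours:
        raise ValueError("MTTR is undefined without at least one repair")
    return sum(outage_hours) / len(outage_hours)


# Hypothetical log: a 30-day window (720 hours) with three outages.
outages = [1.0, 2.0, 3.0]
print(mtbf(720, outages))  # (720 - 6) / 3 = 238.0 hours
print(mttr(outages))       # 6 / 3 = 2.0 hours
```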

Reliability and availability

Reliability and availability look very similar, but they are two different concepts. Both are important indicators of whether a system's services meet the agreed service level objectives (SLOs). Availability focuses on how much time is lost to downtime, while reliability focuses on how often failures occur. Neither can be neglected, and together they let us judge the stability of a system.
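One way to see the difference is the commonly used steady-state relationship A = MTBF / (MTBF + MTTR), which is not stated in this article and is brought in here as an assumption. In the sketch below, two hypothetical systems have identical availability but very different failure frequencies.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as a fraction, assuming A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system A fails often but recovers quickly.
system_a = steady_state_availability(mtbf_hours=99.0, mttr_hours=1.0)
# Hypothetical system B fails ten times less often but takes ten times longer to repair.
system_b = steady_state_availability(mtbf_hours=990.0, mttr_hours=10.0)

print(system_a, system_b)  # both 0.99
```

Both systems are available 99% of the time, yet system A interrupts users ten times as often, which is why the two metrics are tracked separately.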


Scalability

Scalability refers to a system's ability to handle an increasing workload without a loss of performance. For example, a video platform must be able to handle a growing number of users, video downloads, and plays.

Workloads can be of different types, including the following:

  • Request workload: This is the number of requests the system handles.
  • Data/storage workload: This is the amount of data stored by the system.


Scalability has different dimensions:

  • Size scalability: We can add more users and data to the system.
  • Administrative scalability: A single distributed system can be easily shared by an increasing number of users or organizations.
  • Geographical scalability: How easily the system can serve other regions despite performance constraints. In plain terms, the system can be used across different countries and time zones without performance suffering too much.

Different approaches to scaling

Approaches to scaling are usually divided into two types:

Vertical Scaling / Scaling Up

Vertical scaling means upgrading the existing server, for example by giving it more CPU or RAM. This increases the server's memory capacity and performance, but a single machine can only be upgraded so far, and the cost of vertical scaling is usually very high.

Horizontal Scaling / Scaling Out

Horizontal scaling means increasing the number of servers, which are connected over a network. In plain terms, when the system needs more capacity, we keep adding servers and let them communicate over the network. The advantage is that the cost is relatively low.
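Scaling out only helps if requests are actually spread across the servers. The sketch below shows one simple way to do that, round-robin distribution. The class `RoundRobinBalancer` and the server names are hypothetical, and real load balancers also handle health checks and uneven load, which are left out here.

```python
class RoundRobinBalancer:
    """Hands out servers in turn; adding a server adds capacity (scaling out)."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._next = 0  # running count of requests handed out so far

    def add_server(self, server: str) -> None:
        """Scale out by registering one more server."""
        self._servers.append(server)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        server = self._servers[self._next % len(self._servers)]
        self._next += 1
        return server


balancer = RoundRobinBalancer(["app-1", "app-2"])
first_four = [balancer.pick() for _ in range(4)]
print(first_four)  # ['app-1', 'app-2', 'app-1', 'app-2']

balancer.add_server("app-3")  # scale out from two servers to three
next_three = [balancer.pick() for _ in range(3)]
print(next_three)  # ['app-2', 'app-3', 'app-1']
```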


Maintainability

Besides building a software system, we also need to maintain it. Maintainability describes how easily we can fix errors, add new features, keep the platform up to date, and keep the system running smoothly.

The concept of maintainability can be further broken down into three basic aspects:

  1. Operability: How easy it is to keep the system running smoothly under normal conditions and to restore it to a normal state after a failure.
  2. Lucidity: How simple the code is. The more concise and clear the code, the easier it is to understand and maintain, and vice versa.
  3. Modifiability: How easily new and modified features can be integrated into the system.

How to measure maintainability

Maintainability is the probability that a service will return to functionality within a specified period of time after a failure. We can also use maintainability metrics, which measure how easily and quickly a service can be restored to normal operating conditions.

For example, suppose a system component has a maintainability of 95% for half an hour. This means the probability that the component returns to a fully working state within half an hour is 0.95. We use mean time to repair (MTTR) as the metric for maintainability (M).

MTTR = total maintenance time / total number of repairs

In other words, MTTR is the average time it takes to repair and restore a failed component. Our goal is to keep MTTR as low as possible.
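The article does not say how the probability M and MTTR are linked. One common modelling assumption, used here purely for illustration, is that repair times are exponentially distributed, which gives M(t) = 1 - exp(-t / MTTR). Under that assumption, the 95%-in-half-an-hour example translates into a required MTTR as follows (function names are hypothetical):

```python
import math


def maintainability(t_hours: float, mttr_hours: float) -> float:
    """Probability that a repair completes within t_hours.

    Assumes exponentially distributed repair times: M(t) = 1 - exp(-t / MTTR).
    """
    return 1 - math.exp(-t_hours / mttr_hours)


def mttr_for_target(t_hours: float, target_probability: float) -> float:
    """MTTR needed so that repairs finish within t_hours with the target probability."""
    if not 0 < target_probability < 1:
        raise ValueError("target_probability must be strictly between 0 and 1")
    return -t_hours / math.log(1 - target_probability)


# The example from the text: 95% of repairs done within half an hour.
required_mttr = mttr_for_target(0.5, 0.95)
print(f"{required_mttr * 60:.1f} minutes")  # about 10.0 minutes
```

Under this model, hitting the 95% target within 30 minutes needs an average repair time of roughly 10 minutes, not 30. Other repair-time distributions would give different numbers.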

Fault Tolerance

What is fault tolerance?

Large-scale real-world applications typically run on hundreds of servers and databases, accept requests from a very large number of users, and store large amounts of data. These applications need a mechanism that keeps data safe and avoids having to re-run expensive computations, which means eliminating single points of failure.

Fault tolerance refers to the ability of a system to keep running even if one or more of its components fail. These components can be software or hardware. In practice, it is very difficult to design a fully fault-tolerant software system.
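As a minimal sketch of "keep running when a component fails", the function below tries a list of interchangeable backends in order and returns the first successful response. The names `call_with_failover`, `broken_primary`, and `healthy_replica` are hypothetical, and `ConnectionError` stands in for whatever failure a real client would see.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when no backend could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each backend in order and return the first successful response."""
    failures = 0
    for replica in replicas:
        try:
            return replica()
        except ConnectionError:
            failures += 1  # this backend is down; move on to the next one
    raise AllReplicasFailed(f"all {failures} backends failed")


def broken_primary() -> str:
    # Simulates a failed component.
    raise ConnectionError("primary is down")


def healthy_replica() -> str:
    return "response from replica"


result = call_with_failover([broken_primary, healthy_replica])
print(result)  # response from replica
```

The caller still gets an answer even though the first component failed. Only when every backend is down does the failure become visible.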

Fault tolerance techniques

Whether a failure occurs in hardware or in software, it eventually affects the data. Fault tolerance can be achieved through a variety of techniques, and the right choice depends on the structure of the system.

  • Replication: This is one of the most widely used fault tolerance techniques, especially in databases. We can replicate both servers and data. When a server in the system fails, requests are automatically switched over to one of its replicas, so the failure does not affect users.
  • Checkpointing: Checkpointing means saving the system's state to stable storage, such as a database, so that it can be recovered quickly when an error or service failure occurs. When a distributed system fails, we can retrieve the most recent data from the previous checkpoint, which lets engineers restore the system.


This article covered five important concepts: availability, reliability, scalability, maintainability, and fault tolerance.

  1. Availability: how much of the time the system is usable by its users.
  2. Reliability: how likely the system is to run without failure.
  3. Scalability: how well the system handles a growing workload.
  4. Maintainability: how easy the system is to maintain.
  5. Fault tolerance: how well the system withstands failures.

Understanding these concepts and adopting the appropriate techniques can help developers build systems that are more reliable, efficient, and easier to use.

The following are some suggestions for improving system availability, reliability, scalability, maintainability, and fault tolerance:

  • Adopt redundant design: Improve system fault tolerance by using multiple system elements or nodes.
  • Perform regular maintenance: Regularly check and update systems to prevent failures.
  • Use a scalable architecture: Design systems that can easily scale to meet growing user needs or data volumes.
  • Use proven technology: Use proven and reliable techniques to improve system stability.
  • Test thoroughly: Test your system thoroughly before deploying it to identify and fix potential problems.
