Table of contents
ToggleWhat is a database?
As the name suggests, a database is a place where data is stored. So is there any way for us to develop a software system without using a database? If it is an early static web page with only small static text, there is no problem without a database. However, as contemporary systems become more complex and applications become more abundant, we must use databases to store data. For example: Youtube requires a large and complex database to store different types and sizes of data.
Of course, databases also have more precise definitions. The following is the definition excerpted from Oracle:
A database is an organized collection of structured information or data, usually stored electronically in a computer system. Databases are usually controlled by a database management system (DBMS). The data and DBMS together with its associated applications are called a database system, often referred to simply as a database.
Data in the most common types of databases running today are typically modeled as rows and columns in a series of tables to increase the efficiency of processing and querying of the data. Data can then be easily accessed, managed, modified, updated, controlled and organized. Most databases use Structured Query Language (SQL) to write and query data.
There are basically two types of databases, namely: SQL (relational database) and NoSQL (non-relational database). I believe many readers have heard the names of these databases. Among them, relational databases have Structured data, and the database is built on a relational model, displaying data on data tables in a direct and intuitive way. Non-relational databases are unstructured data that are relatively discrete and built for a specific data model, and store the data in a flexible structure that can be easily expanded by modern applications. NoSQL databases are recognized for their ease of development, functionality, and performance at scale.
Database advantages
Databases are very important for every company and organization, because these databases store all information, such as a company's employee information, customer information, transaction records, etc.
Basically the database has the following advantages:
Manage Big Data: Using repositories makes it easy to handle large amounts of data that is not possible with other tools.
Data consistency: We can retrieve the correct data at any time.
Data security: The database ensures data security and only allows authorized users to access the data.
Data integrity: The database ensures data integrity by using different restrictions on the data.
System availability: The database can be replicated on different servers (Replication) and can be updated simultaneously.
Scalability: The database is partitioned (Partition) to manage the load on a single node, improving system scalability.
ACID
ACID is composed of four words, namely: Atomicity, Consistency, Isolation, and Durability. These four characteristics are also written by the database management system (RDBMS). Features that are necessary to ensure that the process is correct when importing or updating data.
In the database system, a transaction (Transaction) refers to: a complete logic composed of database operations, for example: bank transfer, debiting money from the original account, and adding money to the other party's account. These two The operation of the database constitutes a complete logic and is inseparable. Such a process also contains the characteristics of ACID. Next, let's briefly explain ACID!
Atomicity
Atomic guaranteed transactions are treated as indivisible units, meaning that all operations in the transaction must succeed, or none of them can succeed. This is important to prevent data inconsistencies from occurring.
Consistency
Consistency ensures that all updates to the database are consistent, which means the database must remain in a valid state, even if multiple transactions access the data at the same time.
Isolation
Isolation ensures that transactions are executed independently of each other, which means that the operation of a transaction cannot interfere with the operation of another transaction. This feature is very important to prevent the occurrence of data corruption.
Durability
Durability ensures that once a transaction is submitted, its effects are permanent, meaning that the changes made by the transaction to the database will not be lost, even if there is a power outage or other system failure.
Relational Database
After talking about the characteristics of ACID, let us go back to explain the database. The popularity and dominance of relational databases are due to many reasons, including simplicity, robustness, flexibility, performance, scalability and relative ease of managing common data. Capacity.
A relational database essentially provides the properties of atomicity, consistency, isolation, and durability (ACID), so it is essentially a good database property. still remember us Application of abstraction in system design – System Design 02 The concept of abstraction is mentioned, and ACID is also an abstract concept, which simplifies the complexity of data interaction operations. Of course, the reasons why relational databases will become mainstream also include: simplicity, stability, flexibility, performance, scalability, and compatibility.
Some popular DBMS are as follows:
MySQL, Oracle Database, Microsoft SQL Server, IBM DB2, Postgres, SQLite.
Relational database example
First, a relational database (RDB) constructs information in tables, columns, and columns. RDB can establish associations or relationships between information by aggregating data tables, making it easy to understand the relationship between data points and obtain in-depth analysis.
Related databases, originally developed by EF Codd of IBM in the 1970s, allow any data table to be related to other data tables using common attributes. You can think of a related database as an Excel collection that can help companies establish data associations. Like Excel, each spreadsheet is a data table that stores information and is represented by data columns (properties) and data columns (records).
Google Cloud Example
The "Customer" data table contains customer-related information:
Customer ID (primary key), customer name, billing address, shipping address
In the "Customer" data table, the customer ID is a set of primary keys specifically used to identify customers in the related database. The customer ID is different for all customers.
The "Order" data table contains transaction information related to the order:
Order ID (primary key), customer ID (primary key), order date, shipping date, order status
Suppose a small company uses two data tables to process product orders. The first data represents the customer information table, which records customer number, customer name, address, shipping address, billing address, phone number, and other contact information. The primary key at this time can be set to the customer number, because this data is unique and will not be repeated.
The second table represents the customer order data table, which records product order information, including: customer number, order number, ordered products, quantity, size, and color. The primary key at this time can be set to the order number and will not be repeated. Then use the customer number, which is the common index key, to associate the two data tables. Finally, when the company's order processing application submits the order to the database, the database can access the customer order table, extract the correct product order information, and use the customer number from the table to query the customer's billing and shipment information in the customer information table. cargo information.
Why Choose Relational Database (SQL)?
Usually if the database form is not specified, we will default to a relational database during development. Among its advantages are its abstraction of ACID transactions and related programming. Such an abstract concept will also make it very convenient for software engineers to use the database. Let’s talk about its advantages separately!
Advantages of relational databases
Flexibility: In the context of SQL, the full name of Data Definition Language (DDL) is (Data Definition Language). DDL provides us with the flexibility to modify the database, including tables, columns, renaming tables and other changes.
Reduce Redundancy: Data redundancy refers to the same data or information appearing multiple times in the database. In a general file system, because there is no connection between files, sometimes there will be A piece of information appears in multiple files. One of the biggest advantages of relational databases is the reduction of redundancy. The purpose of eliminating data redundancy is to avoid possible problems during updates in order to maintain data consistency.
Concurrency: Concurrency is a very important feature. In high concurrency situations, data will be read and written to the database by many users at the same time. Therefore, we need to coordinate these operations to avoid data inconsistency. For example, if a flight ticket is booked by multiple people at the same time, simultaneous writing must be avoided. Concurrency in relational databases is handled through transactional access to data.
Backup and Recovery: The relational database ensures that the data state is consistent at all times. Most cloud-based relational databases perform continuous mirroring to avoid data loss and make the recovery process easier and faster.
Disadvantages of relational databases
Impedance Mismatch
Here we need to explain carefully first, what is impedance mismatch? Database Impedance Mismatch refers to a common problem in software development when there is a lack of consistency between the data structures and models used by different software components or systems.
In databases, impedance mismatch refers to the difference between the object-oriented programming (OOP) model and the relational model used by relational database management systems (DBMS).
I also found a picture here to briefly explain the meaning of the so-called mismatch. The upper part of the picture is the memory data structure, which may be a tree, array, linked list, hash... etc., and the lower right part is the associative database. legend. When the two methods of storing data are inconsistent, problems will occur when operating the database. Impedance mismatch (Impedance Mismatch) is also an important shortcoming of associative databases.
Non-relational Database
What is NoSQL?
NoSQL means Not Only SQL, is a non-relational database, which means that it does not use the traditional relational database model to store data. NoSQL databases typically use more flexible and extensible data models such as key-value pairs, files, graphs, or wide tables.
Types of NoSQL databases
Key-value database
Key-value databases use key-value methods such as hash tables to store data in the form of key-value pairs. The key acts as the primary key and the value can be anything from a simple scalar value to a complex object. Common key-value databases include: Amazon DynamoDB, Redis, and Memcached DB.
Usage scenarios:
- Caching: Key-value databases are often used as caches to provide quick access to frequently used data. For example, a web application can use a key-value database to cache page content.
- Session data: A key-value database can be used to store session data, such as user login information and shopping cart contents.
- Configuration data: Key-value databases can be used to store configuration data, such as application settings and system parameters.
- Counters: Key-value databases can be used to store counters, such as website traffic statistics.
Document database
Documents in XML, JSON, BSON and other formats are all storage forms in file databases. These file files are composed of special data structures, which can include maps and collections. Common file databases include: MongoDB and Firestore.
Usage scenarios:
- Content Management System (CMS): A CMS uses a document database to store content such as articles, pages, and media files.
- E-commerce websites: E-commerce websites use file databases to store product information, customer information, and order information.
- Social media applications: Social media applications use document databases to store user data, posts, and comments.
- Application logs: Application logs use a file database to store application log data.
- Internet of Things (IoT) devices: Internet of Things (IoT) devices use file databases to store sensor data and device status.
Graph database
Graph databases use graphs to store data. Graphs are composed of nodes and edges. Nodes represent entities and edges represent relationships between entities. Common graph databases include: OrientDB and InfiniteGraph.
Usage scenarios:
- Social network analysis: Social network analysis uses graph libraries to analyze relationships in social networks, such as those between friends, family, and colleagues.
- Recommendation systems: Recommendation systems use graph databases to recommend products or services to users based on their interests and relationships with other users.
- Fraud detection: Fraud detection uses graph databases to detect fraudulent behavior, such as identifying transactions associated with known fraudsters.
- Knowledge Graph: Knowledge graph uses a graph database to store and query knowledge about the real world, such as entities, concepts and the relationships between them.
- Supply Chain Management: Supply chain management uses graph libraries to manage relationships in a supply chain, such as those between suppliers, manufacturers, and retailers.
Columnar database
Columnar database Is a database that stores data in columns rather than rows. Unlike traditional relational databases, which store data in rows, columnar databases can effectively compress data because they can store large numbers of identical values. Common columnar databases include: HBase, Hypertable, and Amazon Redshift.
Usage scenarios:
- Scientific Computing: Scientific computing uses columnar databases to analyze scientific data.
- Financial Analysis: Financial analysis uses columnar databases to analyze financial data.
- Advertising Analytics: Advertising Analytics uses a columnar database to analyze the performance of advertising campaigns.
NoSQL database advantages
Simple design: Unlike relational databases, NoSQL does not need to deal with impedance mismatches, such as storing all employee data in a single file rather than multiple tables that require join operations. This strategy makes writing less code, debugging, and maintaining simpler and easier.
Horizontal scaling: First, NoSQL is preferred because of its ability to run databases on large clusters. NoSQL databases typically distribute data across multiple nodes and automatically balance data and queries across nodes.
Availability: To enhance data availability, node replacement can be performed without any downtime to the application. Most variants of non-relational databases support data replication to ensure high availability and disaster recovery.
Low Cost: Many RDBMS are very expensive to license, while many NoSQL databases are open source and available for free. Likewise, some RDBMS rely on expensive proprietary hardware and storage systems, while NoSQL databases often use clusters of inexpensive commodity servers.
NoSQL database disadvantages
Lack of standardization: NoSQL does not adhere to any specific standards. For example, it is difficult to convert applications from one type of NoSQL database to another.
Consistency: When a NoSQL database fails, consistency and availability need to be considered, and non-relational databases will not have good data integrity, and the data will not be strongly consistent.
Complexity:The model and interface of NoSQL databases may be more complex than that of relational databases. There are several types of non-relational databases, but the interfaces of relational databases are not much different.
How to choose between relational and non-relational databases?
Many factors will influence the choice, and the following table compares relational databases and non-relational databases:
relational database | non-relational database |
---|---|
Structured: Data is stored in tables, with each table consisting of rows and columns. Rows represent data records, and columns represent data attributes. Data must conform to a predefined format. | Flexible: Can store data in various formats, including JSON, BSON and graphics. The data structure can be structured, semi-structured or unstructured. |
Suitable for OLTP (online transaction processing): good at handling frequent read and write operations, such as orders, transactions and logins. | Suitable for OLAP (Online Analytical Processing): Good at handling complex queries and analysis, such as sales trends, customer behavior and market analysis. |
Harder to scale out: As data volume increases, performance may degrade. More servers need to be added to handle more data. | Easy to scale out: More data can be processed by adding more nodes without performance bottlenecks. |
ACID guarantees: atomicity, consistency, isolation, and durability. Data is always consistent, even when multiple users are accessing it simultaneously. | There may be no ACID guarantees: the data may be inconsistent in certain situations, such as when data is replicated between multiple nodes. |
Models and interfaces can be more complex: knowledge of query languages such as SQL is required. | Models and interfaces are generally simpler: easy to use and understand. |
Mature technology: Has been around for decades and has extensive support and tools. | Newer technology: still under development and may lack mature tools and support. |
related articles
Load Balancer Explained – System Design 07
What is DNS? Introduction to Domain Name System – System Design 06
Introduction to System Design Components Building Block – System Design 05
Back-of-the-envelope Back-of-the-envelope Calculation – System Design 04
Non-functional features of software design – System Design 03
Application of abstraction in system design – System Design 02
Introduction to Modern System Design – System Design 01