System Design - Introduction to Scalable System Design

Subscribe Send me a message home page

1. Introduction

In this post, we will present the general architecture of scalable application and the common techniques to design a scalable system. It is structured as follows:

Related Readings

2. Common Strategy and Principles of Good Software Design

Common Strategies

Principles of Good Software Design

Always remember that a stateless component is easier to manage and scale.

3. General Architecture


4. Web Server

The entry point of the system is a load balancer. Using load balancers is standard now because it have many benefits. For example, it's very easy to scale up the system by adding more web servers because web server instances are not visible to client and we will not run into problems such as DNS caching.

The load balance will distribute request to web servers. In a general sense, the code on a web server is basically a requestHandler and it usually can be expressed as a workflow. The actual business logic can be implemented in different services. For example, suppose we are developing a vaccination appointment booking application, service A might be UserRegistrationService and service B might be AppointmentScheduleService. Services work independently therefore they can different technologies and have their own development schedule.

One of the best practice is to make web server stateless. In this ways, we can scale the web server pool by simply adding more server instances. State is managed by each service on their own because different services care about different state.

Here we present a list of common state that services need to manage:

5. Business Logic - Application Services

Business logic must be executed somewhere. In order to create a truly scalable system, it's usually a good practice to apply functional partitioning: we create a service for each business function. There are multiple benefits of this approach


6. Data Layer

There are two common techniques to scale at data layer.

For read intensive application, we could use replication techniques. This basically means we have a leader node and a bunch of follower nodes. The leader node is the source of truth. All writes go to the leader node and a read return by leader node should return the most up-to-date data. Most of read requests are sent to follower nodes, which may return stale data because it takes time for follower nodes to replicate the data from the leader node. It's obvious that the leader node is prone to become a bottle neck or single point of failure.

For write intensive applications or applications that need to store large data set, we need to use partitioning technique. The general idea is to split the data into subsets and each subset of data is handled independently. There are a couple of challenges here. First, we need to be careful with how we partition data. The goal here is to split data evenly. Second, we also need to get prepared for future partitioning changes. For example, as the user base grows, an application may need to handle more data to the point where it needs to use more partitions. Suppose we have an application with 3 partitions in the data set and later we need 8. Depending on how partitioning is implementing, adding 5 more partitions may not be as easy as it appears. There are two common strategies to address this issue

The second approach is similar to consistent hashing.

Some thoughts on data layer:

As we can see from the above discussion, one of the challenges when working with a database is that the application code is aware of implementation details about database. This is obvious is we use partitioning techniques as the partitioning keys (shard keys) are managed at application level. In the case of replication, for it takes time to replicate data from leader node to follower node, a read from followers may return stale data. If application needs to handle the stale data, it means it is aware of the lag thus the implementation details of the database. It's even worse in the sense that this type of knowledge is implicit.

With NonSQL database, the data partitioning and replication may be handled automatically by the database library. However, most NonSQL database support eventual consistency by default. Of course, we could request a strongly consistent operation but we need to specify this in the application code. So we are in the same situation again where the application code is aware of details at the data layer. It seems that this is unavoidable. This might another reason why the persistence storage is the most difficult one in the system to scale.

7. Cache

Caching Based On HTTP

Caching Application Objects

One of challenges with cache is cache invalidation. It can be very hard to track dependencies and find all cached items that need to be updated or invalidated. To get around of this issue, we could try to set a short Time-to-live. This should mitigate the problem a little bit.

Please find more details at System Design - Introduction to Cache

8. Asynchronous Processing

Asynchronous processing basically means operations are not blocking or the execution of code does not need to wait for external events. There are two common ways to execute code asynchronously:

In this section, we focus on message queues as it's a powerful building block in scalable systems.

There are three components in message queues:

Message producers publish/send message to the queues. Ideally, they don't need to know anything about message consumers. The responsibility of message producers is relatively simple. They just create and send messages, nothing else.

Message consumers are also referred as queue workers. They are the ones that process message published by message producers. This is the place where business logic lives so message consumers are more complicated than message producers. In addition to business logic, message consumers also need to handle failure cases.

Message brokers are responsible for routing the messages. Some common routing methods are

With direct worker queue method, both message producer and message consumer need to know the location of the queue. For example, they need to know the queue name. Usually, a message is only processed by one queue worker.

Publish/subscribe pattern is more flexible. The connection between message producers and message consumers is established by topic. A topic can have multiple publishers and subscribers.

Using custom routing rules is the most flexible method and it provides the highest level of decoupling. With this setup, the connection between message producers and message consumers are not established via queue name or topic; therefore, message producers and message consumers don't need to know the location of the underlying queue. The connection is established by the routing rule. A classic implementation of this method is RabbitMQ.

Message queues provide many benefits such as

Of course, it has cost as well. It makes the system design more complex and there are specific challenges when using message queues. The first challenge is the message ordering. Due to its asynchronous nature, the ordering of messages is not guaranteed. Therefore when writing code, we should assume messages can arrive in any random order. The second challenge is that there is no guarantee that a message will be only delivered once. Some message brokers will requeue message if error occurs.

Message queue is often associated with event driven architecture. The basic idea is instead of responding to requests, the code should react to event. Here, an event means a record that indicates something has happened. On the message producer side, it only announces something has happened but it doesn't expect other components to do any work. For there is no expectation of additional work to be done, there is no need for message producer to wait. On the message consumer side, it doesn't need to know anything about the message publisher because it doesn't need to provide any response. All message consumer needs to do is to react to new event.

8. Other Topics

Approximate Correctness

Design is about tradeoff. If we can afford to give up complete correctness, it may help scale the system. Complete correctness such strong consistency and exactly-once is hard to achieve. That's why we see many database support eventual consistency and message brokers support at-least-once delivery.

Multiple Tiers

Not all tasks are equally important. Critical tasks should have dedicated resources. Less import work can sometimes be delayed.

Indication of Coupling

Suppose there are two components A and B, how could we get a sense of the level of coupling between this two components? Here are some quick "test":

If the answer is yes, then it is very likely that there is a high level coupling between these two components.

----- END -----

Welcome to join reddit self-learning community.
Send me a message Subscribe to blog updates

Want some fun stuff?