Scaling up and scaling out in Enterprise Systems

By | November 24, 2016

The two most important things of an enterprise application are performance and scaling because it needs to store and retrieve as much data and as fast as possible. This note will consider scaling up and out metrics in detail.

  • Scaling up (Vertical Scaling): Add more resources to a single machine such as CPU, Memory, Disk storage. It called a centralized system. StackOverflow is the best example of a vertical scaling architecture.
  • Scaling out (Horizontal Scaling): Increase the number of available machines, so-called distributed system. The horizontal scaling architecture usually deals with a single point failure, and throughput drops to zero the system is no longer available. Facebook used horizontal scaling architecture to accommodate massive amounts of data. LinkedIn also moved from a centralized relational database to a collection of distributed systems

1. Master-Slave replication

Master-Slave replication schema is suitable for an enterprise application mostly used by SEO online marketing, where the read/write ratio is very high. Basically, the master accepts write-only transactions, whereas the opposite is true for reading-only transactions from the Slave. The architecture is divided into two separate replications: asynchronous and synchronous replications.

master-salve-replication

Figure 1: Master-Slave replication. (source: Vlad Mihalcea)

  • The asynchronous replication (warm standby) does not happen instantaneously, which leads to the Slave might lag behind the Master. If the Master node crashes, the most recent update record from the list of available Slaves will be elected to become a new Master.
  • The synchronous replication (hot standby) is taken to keep the Slaves up to date. This ensures data consistency because the Master node could be replaced by any available Slave node if the Master node failure occurs.

sequence-diagram-of-synchonous-master-slave-replication

Figure 2: Sequence diagram of synchronous Master-Slave replication

In order to prevent the client from reading stale data, the client request has to read from the latest synchronous Slave for data consistency. Other Slaves can be asynchronous. The most advantage of the architecture is an increase of available read-only connections of Slave nodes, but the drawback is Master node takes longer response time for read-write transactions.

2. Multi-Master replication

One significant difference with Master-Slave replication is all nodes are considered equally as Master including both read-write transactions. Advantages of the Multi-Master replication are an increase of availability and a decrease of read-write transaction response time. However, there is a possibility of conflicting updates. To get better from a data consistency perspective, conflict detection should resolve from both automatic and manual resolutions.

multi-master-replication

Figure 3: Multi-Master Replication. (source: Vlad Mihalcea)

Noticeable that if one node is no longer reachable, the synchronization could fail, and the transaction would roll back on all Masters. Especially, synchronization lateness becomes a big concern once nodes are deployed in a Wide Area Network.rollback-sequence-diagram-of-multi-master-replication

Figure 4: Rollback sequence diagram of Multi-Master Replication

3. Sharding

Sharding means distributing data across multiple nodes so each instance contains only a subset of the overall data. It is also known as Multi-Data Center. The figure below shows a typical sharding topology including at least two separate data centers.
sharding-architecture

Figure 5: Sharding. (source: Vlad Mihalcea)

There are some interesting things about Sharding.

  • As can be seen from the Figure 5, similar to the Master-Slave replication architecture,  each shard must be self-contained because a user transaction can only use data from within a single shard, in other words, pinning users to a geographically close data center.
  • Not all tables need to be partitioned across shards. The country table is a copy from one data center to the other, and partitioning happens on the user table only. This makes it possible for the asynchronous replication to synchronize small size.
  • Indexing database strategies (e.ge using a primary key hashing function) are employed to reduce data per node as well as get shorter transaction response time. This is because indexes will require less space.
  • Conflict handling mechanisms are automatically/manually discovered and resolved.
  • Using Database Cluster which basically takes care of distrubuting data to enable database scale horizontally like MySQL NDB Cluster.
    mysql_cluster_scalability_v1

    Figure 6: Auto-Sharding in MySQL Cluster. (source: MySQL.com)

  • Some resort strategies to increase system capacity. For example, optimizing the data layer to deliver lower transaction response times,  scaling each replicated node to a cost-effective configuration,…
    service-oriented-architecture-linkedin
    Figure 7: An example multi-tier service-oriented architecture (source: engineering.linkedin.com)

References:

  1. Vlad Mihalcea, “Scaling up and scaling out”. In High-Performance Java Persistence, pp.18-23, 2015.
  2. Performance And Scaling In Enterprise Systems. Retrieved from http://highscalability.com/blog/2016/5/11/performance-and-scaling-in-enterprise-systems.html on Nov 25, 2016
  3. Oracle, Scalability and High Availability. Retrieved from http://docs.oracle.com/cd/B14099_19/core.1012/b13994/avscalperf.htm on Nov 25, 2016.
  4. LinkedIn, A Brief History of Scaling LinkedIn. Retrieved from https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin on Nov 25, 2016.
  5. IBM, The sequence diagram. Retrieved from
    http://www.ibm.com/developerworks/rational/library/3101.html on Nov 25, 2016.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.