How can Smarts scale so easily?

Kristo Truu 9 May 2019

What our clients see is only about 5% of our system. Everything else comes down to how the system is built architecturally. Let’s dig into this topic, starting with the foundation.

Database

There are several different varieties of databases. The most common database types are relational databases and NoSQL databases.

For our purposes, the main difference is that NoSQL databases are often faster than relational ones for simple reads and writes at scale, because they relax some relational features such as joins and rigid schemas. They also tend to cope better with large data volumes. That’s why we chose a NoSQL database engine for Smarts.

When we started the Smarts project, we considered three different database engines: Couchbase, MongoDB and Cassandra. We found that MongoDB was the best fit for us. According to the research we looked at, it can be up to 10 times faster than a relational database (MySQL) as the data volume grows. It also lets us replicate data (replica sets) to increase availability and reliability, and scale horizontally (sharding) as the data grows.
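
To make the idea of sharding more concrete, here is a minimal Python sketch of routing records to shards by hashing a key. It is an illustration only: the shard count, the `shard_for` helper and the `customer-N` keys are assumptions, and MongoDB's own hashed sharding assigns ranges of hashed values to shards rather than using a simple modulo.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # assumption for illustration, not Smarts' real topology


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical customer ids spread across the shards.
keys = [f"customer-{i}" for i in range(9000)]
distribution = Counter(shard_for(k) for k in keys)
print(distribution)
```

Because the hash is stable, the same key always lands on the same shard, and a large set of keys spreads roughly evenly, which is what lets each shard hold only part of the data.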

Architecture

The basic questions when choosing an architecture are:

  • How many users could potentially start using your system?
  • How do you keep the system reliable?
  • How can the system scale?

Smarts had many other questions, but these were the basics.

We concluded that microservices were the most suitable solution for us. This kind of architecture only works if certain requirements are followed strictly. The main rule is “System services must not depend on each other”. An error in one service must not affect the operation of any other service. There are many examples of this architecture being used incorrectly, and the most common mistake is not duplicating data: services that share one database, or that have to call each other for every piece of data, are still coupled. For example, institution data in the Smarts system is currently duplicated across 6 different databases.
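
The data duplication described above can be sketched as a service that keeps its own local copy of the data it needs. Everything here is hypothetical (the `InstitutionUpdated` event, the `BillingService` class and its fields); it only illustrates the principle, not how Smarts implements it.

```python
from dataclasses import dataclass, field


@dataclass
class InstitutionUpdated:
    """Hypothetical event published when institution data changes."""
    institution_id: str
    name: str


@dataclass
class BillingService:
    """Hypothetical service that keeps its own copy of institution data."""
    institutions: dict = field(default_factory=dict)

    def handle(self, event: InstitutionUpdated) -> None:
        # Store a local copy so this service never has to call the owner.
        self.institutions[event.institution_id] = event.name

    def invoice_header(self, institution_id: str) -> str:
        # Reads only local data, so it works even if other services are down.
        return f"Invoice for {self.institutions[institution_id]}"


billing = BillingService()
billing.handle(InstitutionUpdated("inst-1", "Example School"))
print(billing.invoice_header("inst-1"))
```

The cost of this design is that copies can briefly be out of date; the benefit is that a failure in the service that owns the data does not stop the others.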

Communication between services

However, these services still have to interact with one another. For this, Smarts uses RabbitMQ. Put simply: if one service goes down and another service tries to send data to it, the data remains in RabbitMQ until the service recovers. This way, data is not lost when an error occurs. But what happens when RabbitMQ itself goes down instead of a service? To avoid this situation we run a RabbitMQ cluster with duplicated nodes.
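
The buffering behavior can be illustrated with a small in-memory stand-in for a broker queue. This is a sketch, not RabbitMQ code: the `Broker` and `Consumer` classes are hypothetical, and in a real RabbitMQ setup the same guarantee depends on durable queues, persistent messages and consumer acknowledgements.

```python
from collections import deque


class Consumer:
    """Hypothetical receiving service."""

    def __init__(self) -> None:
        self.is_up = True
        self.received = []


class Broker:
    """In-memory stand-in for a message broker queue."""

    def __init__(self) -> None:
        self.queue = deque()

    def publish(self, message: str) -> None:
        self.queue.append(message)

    def deliver(self, consumer: Consumer) -> int:
        """Hand over queued messages while the consumer is up; return count."""
        delivered = 0
        while self.queue and consumer.is_up:
            consumer.received.append(self.queue.popleft())
            delivered += 1
        return delivered


broker = Broker()
consumer = Consumer()

consumer.is_up = False            # the receiving service crashes
broker.publish("order-1")
broker.publish("order-2")
broker.deliver(consumer)          # nothing is delivered, nothing is lost
pending_while_down = len(broker.queue)

consumer.is_up = True             # the service recovers
broker.deliver(consumer)
print(pending_while_down, consumer.received)
```

The sender never needs to know whether the receiver is alive; it only talks to the broker, which is what keeps the services independent.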

Caching

A NoSQL database engine is fast, but there are even faster places to store data. One of them is RAM. Reading from and writing to RAM is much faster than reading from or writing to a hard disk. Unfortunately, RAM capacity is much smaller than hard disk capacity, so it can hold far less data.

Smarts uses a cache to temporarily store data that is likely to be needed right now. For example, purchase check records are stored in the cache. Smarts uses Redis as its cache engine. More specifically, we use a Redis cluster of 7 Redis servers to increase availability. (A Redis Cluster needs at least three master nodes, and six nodes, three masters each with one replica, is the commonly recommended minimum, so our 7 servers are above that minimum.)
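
The caching pattern described above (look in the cache first, fall back to the database, then store the result for a limited time) can be sketched as follows. The `TTLCache` class is an in-memory stand-in for Redis, and `load_purchase_record_from_db`, the record fields and the 60-second lifetime are all assumptions for illustration.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for Redis)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]   # entry is too old, drop it
            return None
        return value

    def set(self, key, value) -> None:
        self._data[key] = (value, time.monotonic() + self.ttl)


stats = {"db_reads": 0}


def load_purchase_record_from_db(record_id: str) -> dict:
    """Hypothetical slow database lookup; counts how often it is called."""
    stats["db_reads"] += 1
    return {"id": record_id, "total": 12.5}


cache = TTLCache(ttl_seconds=60)


def get_purchase_record(record_id: str) -> dict:
    cached = cache.get(record_id)
    if cached is not None:
        return cached                      # served from memory
    record = load_purchase_record_from_db(record_id)
    cache.set(record_id, record)           # keep it for the next request
    return record


get_purchase_record("r-1")
get_purchase_record("r-1")
print(stats["db_reads"])  # the second call did not touch the database
```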

Security

User data must not fall into the wrong hands, and system services must communicate securely with each other. How do we achieve this with such an architecture? Smarts runs its own authorization server, which is based on the OAuth 2.0 and OpenID protocols. When a user enters their credentials, a unique identifier (a session token) is generated for the session. With this solution, the original credentials do not have to be sent from the frontend to the backend with every request. Instead, clients are validated by the session token, and if the customer stops using the Smarts service for a while, the token expires.
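
The session token idea can be sketched in a few lines of Python. This is not an OAuth 2.0 implementation: the `SessionStore` class, the 900-second idle timeout and the explicit `now` parameter (used to make the example deterministic) are assumptions for illustration.

```python
import secrets


class SessionStore:
    """Sketch of opaque session tokens that expire after inactivity."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self._sessions = {}  # token -> (user_id, last_seen)

    def login(self, user_id: str, now: float) -> str:
        # Credentials would be checked here; only the token is sent back.
        token = secrets.token_urlsafe(32)
        self._sessions[token] = (user_id, now)
        return token

    def validate(self, token: str, now: float):
        session = self._sessions.get(token)
        if session is None:
            return None
        user_id, last_seen = session
        if now - last_seen > self.idle_timeout:
            del self._sessions[token]            # expired through inactivity
            return None
        self._sessions[token] = (user_id, now)   # activity extends the session
        return user_id


store = SessionStore(idle_timeout=900)
token = store.login("user-42", now=0)
print(store.validate(token, now=600))    # recently active: still valid
print(store.validate(token, now=1400))   # 800 s since last use: still valid
print(store.validate(token, now=2400))   # 1000 s since last use: expired
```

The token is random and carries no credentials, so a leaked token is only useful until it expires.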

Hosting and fault tolerance

Server

When talking about hosting at a lower level, the first thing to mention is RAID. We will not go deeper into RAID here, but the important point is that data must be duplicated across multiple drives to prevent data loss when a hard drive fails. Of course, the servers must also be equipped with a UPS and with multiple network connections, so that they remain available at all times.

Docker

Docker is a bit like a virtual machine. But unlike a virtual machine, which creates a whole virtual operating system, Docker lets applications share the Linux kernel of the system they are running on, so an application only needs to be shipped with the things that are not already on the host computer. This gives a significant performance boost and reduces the size of the application.

Smarts uses Docker to manage its microservices more easily and to create the required environment for each service.

Kubernetes

Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. K8s was originally created by Google, which shared its best practices through it.

But why exactly do we need Kubernetes? Netflix, for example, has hundreds of microservices. Managing that many services manually is impossible. With Kubernetes you are like a crane operator in a port, moving containers.

All our services are containerized and managed by Kubernetes. With K8s we can scale our systems up or down according to market demand. For example, at the moment Smarts is available only in Estonia, but one day we might need to cover the whole world with our service. We can do it with one command! Yay!

The second benefit is fault tolerance: if one of the services goes down, K8s can restore it by itself, faster than the average ops engineer. For example, the Smarts Communication service went down in the test environment, and we only discovered it two weeks later because K8s had automatically recovered the service.
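
Both scaling and self-healing come from the same idea: you declare the desired state, and the system keeps correcting the actual state to match it. The sketch below is a simplified illustration of that idea, not Kubernetes code; the `reconcile` function, the service names and the replica counts are hypothetical.

```python
def reconcile(desired: dict, running: dict) -> list:
    """Return the actions needed to make `running` match `desired`.

    Both arguments map a service name to a replica count.
    """
    actions = []
    for service, want in desired.items():
        have = running.get(service, 0)
        if have < want:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    return actions


desired = {"communication": 2, "billing": 3}
running = {"communication": 0, "billing": 3}   # communication has crashed

for action, service, count in reconcile(desired, running):
    print(action, service, count)
```

Scaling up is then just a change to the desired numbers, and recovering a crashed service needs no human at all, because the gap between desired and running is detected on the next check.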

Release management

The overall process comes down to managing the entire picture as easily as possible. Smarts uses Jenkins as its continuous integration tool, which automates our delivery process. When a developer commits code to Git, Jenkins notices this and triggers a pipeline. In this pipeline, a Docker image is automatically built and pushed to our private Docker registry. Next, Kubernetes takes this image and automatically deploys it to our servers.

But how do we make a zero-downtime release? K8s offers rolling updates. This allows us to deliver a new release gradually: the new release is first exposed to a small number of people and then offered to everyone.
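
A rolling update can be pictured as replacing running instances a few at a time, so that some instances are always serving traffic. The simulation below is a sketch under simplifying assumptions: the `rolling_update` function and the version labels are hypothetical, and a real Kubernetes rolling update also starts new pods and waits for them to become ready before removing old ones, within limits set by `maxSurge` and `maxUnavailable`.

```python
def rolling_update(pods: list, new_version: str, batch_size: int = 1):
    """Yield the pod list after each batch is replaced with `new_version`."""
    pods = list(pods)  # work on a copy, leave the caller's list untouched
    for start in range(0, len(pods), batch_size):
        for i in range(start, min(start + batch_size, len(pods))):
            pods[i] = new_version
        yield list(pods)


steps = list(rolling_update(["v1", "v1", "v1", "v1"], "v2", batch_size=2))
for step in steps:
    print(step)
```

After the first step, half of the instances run the new version and half still run the old one, which is the window in which problems can be spotted before everyone is moved over.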

There is much more to say, but the article would become very long. So those were the basics of the Smarts system.

Thanks for reading.