High-Speed Microservices

January 30, 2017


Microservices Architecture | High-Speed Microservices

This article endeavors to explain high-speed microservices architecture. If you are unfamiliar with the term microservices, you may want to first read this blog post on microservices by Michael Brunton and if have more time on your hands this one by James Lewis and Martin Fowler.

High-speed microservices is a philosophy and set of patterns for building services that can readily back mobile and web applications at scale. It uses a scale up and out versus just a scale-out model to do more with less hardware. A scale-up and out model uses in-memory operational data, efficient queue hand-off, and async calls to handle more calls on a single node.

In general, the cloud scale-out model, employs a sense of reckless abandon. If your app has performance issues, no problem spin up 100 servers. Still not fast enough. Try 1000 servers. This has a cost. This does not replace a cloud scale-out model per se. It just makes a cloud scale-out model more cost effective. Do more with less.

This is not to say that there are not advantages to the cloud scale-out model. This is to say that an ability to do more with less hardware has cost savings, and you can still scale out as needed in the cloud. This is nothing new per se. Actor systems like Akka, have been around for a while. Systems like Akka, Vertx and QBit are key ingredient to many high-speed microservices.

The beauty of high-speed microservices is it gets back to OOP roots where data and logic live together in a cohesive understandable representation of the problem domain, and away from separation of data and logic. Since data lives with the service logic that operates on it. Also, less time is spent dealing with cache coherency issues as the services own or lease the data (own for a period of time). The code tends to be smaller to do the same things.

You can expect to write less code. You can expect the code you write to run faster. To a true developer and software engineer, this is a boon. Algorithm speed matters again. It is not dust on the scale while you wait for a response from the database. This movement frees you to do more in less time and to have code that runs orders of magnitudes faster than typical IO bound cloud lemming services.

There are many Java frameworks and libraries for microservices that you can use to build a high-speed microservice system. Vertx, Akka, Kafka, Redis, Netty, Node.js, Go Channels, Twisted, QBit Java Microservice Lib, etc. are all great tools and technology stacks to build these types of services. This article is not about any particular technology stack or programming language but a more abstract coverage of what it means to build these type of high-speed services devoid of language or technology stack preference.

The model described in this article is the inverse of how many applications are built. It is not uncommon to need 3 to 20 servers for a service where in a traditional system you might need 100s or even 1,000s. Your AWS EC2 bill could be cut into 1/10th (or 1/100th) the cost for example. This is not just supposition but an actual observation.

In this model, you typically add extra services to enable failover support not to scale out per se. You will reduce the number of servers needed and your code base will be more coherent if you adopt these strategies.

You may have heard, keep your services stateless. We are recommending the opposite. Make your services own their operational data.

Attributes of High-speed services

High-speed services have the following attributes:

  • High-speed services are in-memory services
  • High-speed services do not block
  • High-speed services own their data
  • Scale out involves sharding services
  • Reliability is achieved by replicating service stores

An in-memory microservice is a service that runs in-memory. An in-memory service is non-blocking. An in-memory service can load its data from a central data store. In-memory services can load data it owns asynchronously and does not block. It continues to service other requests while the data is loading. It streams in data from its service store whenever leased data is not found, i.e., it faults the data into the services in streams.

At first blush, it appears that an in-memory service can achieve its in-memory from using a cache. This is not the case. An in-memory service can use caches but and in-memory service owns its core data. Cached data is only from other services that own their data.

Single writer rule: Only one service at any point in time can edit service particular set of service data

In-memory services either own their data or own their data for a period of time. Owning the data for a period of time is a lease model.

Think of it this way. Data can only be written to by one service at any given point in time. Cache inconsistencies and cache control logic is the root of all evil. The best way to keep data in sync with caches is to never use caches or use them sparingly. It is better to use a service store that can keep up with your application vending the data as needed in a lease model. Or to create longer leases on service data to improve speed. More on leasing and service stores is described later. A key ingredient is some sort of persistent message queue or better yet a distributed transaction log to sit in front of calls of an in-memory service like Kafka (Kafka Microservices).

Avoid the following: * Caching (use sparingly) * Blocking * Transactions * Databases for operational data

Embrace the following:

In-memory service data and data faulting * Sharding * Async callbacks * Replication / Batching / Remediations * Service Stores for operational data

Data faulting and data leasing seem a lot like caching. The key difference is ownership of data and the single writer principle.

Imagine a mobile app with a set of services that contains user data. The first call to any service checks to see if that users data is already loaded in the service. If the user data is not loaded into the service than the call from the mobile app is put into a queue and the call waits for the user data to get loaded asynchronously. The service continues to handle other calls and the service gets notified when the user data loads, and executes the call on the user data load event then. Since we can get many calls to load user data in a given second, we do not load each user one at a time but we load 100 user at a time or 1000 users at a time or we batch load all requests every 50ms (or both) of all user requests in that time. Loading the user data when it needed is called data faulting. Loading 1000 users at a time or all users in the last 50ms or all users since the last user load is called batching request. Batching requests is combining many requests into a single message to optimize IO throughput. Data faulting is the same way your OS loads disk segments into memory pages for virtual memory.

High speed services employ the following:

  • Timed/Size Batching (Kafka does this as well as QBit)
  • Callbacks
  • Call interception to enable data faulting from the service store
  • Data faulting for elasticity

Data ownership

The more data you can have in-memory the faster your services can run. Not all use cases and data fit this model. Some exceptions can be made. The more important principle is data ownership. This principle comes from the canonical definition of microservices.

In-memory is a means to an end. Mostly to facilitate non-blocking. The more important point is to have the service own the data instead of just being a view into shared data. The more important principle is the single writer principle and the avoidance of cache.

Let’s say that some data is historical data, and historical data rarely gets edited, but it does get edited. Then in this scenario it might make no sense at all to not load the historical data from a database and then update the database directly and skip the service store altogether since the usage is rare and unlikely to hamper the overall performance of the system.

If size of the data is an issue remember that you can shard the services and you can also fault data into a service server in batches or streams. These two vectors should allow most if not all of the operational data to be loaded into memory and enable the single writer principle. Think of this as more of the Pareto principle. You don’t need all. You just need the set of data in-memory that is going to give you the SLA that you need. All would be nice. But you can only have 20% of the data that is faulted in and still have a really fast system.

A lease can be 12 hour, 8 hours, or some other period of time. Once the lease has expired which could be based on the last time that service data was used, then the data just waits in the service store.

Why Lease? Why not just own?

Why not just own data out right. Well you can if the service data is small enough. Leasing data provides a level of elasticity. This allows you to spin up more nodes. If you optimize and tune the data load from the service store to the service then loading users data becomes trivial and very performant. The faster and more trivial the data fault loading, the shorter you can lease the data and the more elastic your services are. In like manner services can save data periodically to the service store or even keep data in a fault tolerant local store (store and forward) and update the service store in batches to accommodate speed and throughput. Leasing data is not the same as getting data from the cache. In the lease model only one node can edit the data at any given time.

Service Sharding / Service Routing

Elasticity is achieved through leasing and sharding. A service server node owns service data for a period of time. All calls for that users data is made to that server. In front of a series of service servers is a service router. A service router could be an F5 (network load balancer) that maintains server/user affinity through an HTTP header. A service router could be a more complex entity that knows more about the problem domain and knows how to route calls to other back end services.

Fault tolerance

The more important the data and the more replication and synchronization that needs to be done. The more important the data the more resources that are needed to ensure data safety.

If a service node goes down, a service router can select another service node to do that work from the service discovery. The service data will be loaded in an async/data faulting/batch. If the service was sending updates to changes then no state is lost except the state that was not sent to the service store since the last update. The more important the state/data, the more synchronization that should be done when the data is modified. For example, the service store can send an async confirmation of a save, the service could enqueue a response to the client. The client or service tier could opt to add retry logic if it does not get a response from the server. You can also replicate calls to services. You can also create a local store and forward for important calls.

Fault tolerance, service router and service discovery are essential to building a reactive Java microservices architecture.

Service Store

The primary store for a high speed service system is a service store. A service store can treat the service data as opaque. A service store is not a database. A service store is not a cache either. A service store may also keep the data in-memory. The primary function of a service store is vending data quickly to services that are faulting data in.

A services store also takes care of data replication for data safety and safe storage. A service store should be able to bulk save data (stream saves) and bulk load data (stream loads) to/from a service and to/from replicas. A service store like the service itself should never block. Responses are sent asynchronously. WebSocket or sockets are a great mechanisms to send responses from a service store to a given number of services. JSON or some form of binary JSON is a good transport and storage mechanism for a service store.

Service stores are elastic and typically sharded but not as elastic as service servers. Service stores employ replication and synchronization to limit data loss. Service stores are special servers so the rest of your application can be elastic and more fault tolerant. It is typical to over provision service stores to allow for a particular span of growth. Adding new nodes and setting up replication is more deliberate than it is with services. The service store and the leasing model is what enables the services to be elastic.

By special servers, we mean service store servers might use special hardware like disk level replication and servers which might employ additional monitoring. All service data saved to a service store should be saved in at least two servers. There is a certain level of replication that is expected. Service stores may also keep a transaction log so that others processes can follow the log and update databases for querying and reporting.

In high-speed services, databases are only for reporting, long term storage, backup, etc. All operational data is kept and vended out of the service stores which maintain their own replication and backups for recovery. All modifications to data is done by services. Service stores typically use JSON or some other standard data format for long term storage for both the transaction logs for storage into secondary databases.

A service store is the polar opposite of Big Data. A service store is just operational data. One could tail the transaction logs to create Big Data.

It is common to use high-speed messaging platform like Kafka as the communication pipe to your service store.

Active Objects / Threading model:

To minimize complex synchronization code that can become a bottle neck one should employ some form of Active Objects pattern for stateful, high-speed services. One could use an event bus system like Vertx or Node.js or an Actor system like Akka or GO channels or Python Twisted and build their own Active Object service system. Queues and messaging are essential if you want to handle back pressure and create reactive applications.

The active object pattern separates method execution from method invocation for objects that each reside in their own thread of control (or in the case of QBit for a group of objects that are in the same thread of control).

The goal of Active Objects is to introduce concurrency, by using asynchronous method invocation and a scheduler for handling requests. The scheduler could be a resumable thread draining a queue of method calls. This scheduler could also check to see if data needed for this call was already loaded into the system and fault data in from the service store into the running service before the call is made.

The Active Object pattern consists of six elements:

  • A client proxy to provide an interface for clients. The client proxy can be local (local client proxy) or remote (remote client proxy).
  • An interface which defines the method request on an active object.
  • A queue of pending method requests from clients.
  • A scheduler, which decides which request to execute next which could for example delay invocation until service data if faulted in or which could reorder method calls based on priority or which could work with several related services from one scheduler allowing said services to make local non-enqueued calls to each other.
  • The implementation of the active object methods. Contains your code.
  • A service callback for the client to receive the result.

High-speed microservices are always reactive microservices.

Reactive Microservices Architecture also know as just Microservices!

By the original definition of microservices, all microservices are reactive. A microservices that is not reactive is akin a bird without wings or a fish who can’t swim.

The Reactive Manifesto outlines qualities of Reactive Systems based on four principles: Responsive, Resilient, Elastic and Message Driven.

Responsiveness means the service should respond in a timely manner, and never let clients or upstream services hang. A system failure should not cause a chain reaction of failures. A failure of a downstream system may cause a degraded response, but a response none-the-less. A key ingredient to be responsive is the ability to handle back pressure and involves systems that are async and handle requests in streams of requests/messaging. If you framework is blocking, then any sort of responsive is usually bolt-on, which is less the ideal when dealing with microservices.

Resilience goes in line with responsiveness, the system should respond even in the face of failure and errors in a timely fashion. It can respond because it can detect an async response is not coming back in time and serve up a degraded response (circuit breaker). It may be able to respond in spite of failure because it can use a replicated version of a failed downstream node. Failure and recovery is built into the system. Monitoring and spinning up new instances to aid in recovery may be delegated to another highly available resource. A key component of resilience is the ability to monitor known good nodes and to perform Service Discovery to find alternative upstream and downstream services. The key to resilience is to avoid cascading failures. Failures should be isolated. Failure isolation is called bulkheading. Resilience is the ability or self-heal and/or the containment of failure. This is why traditional blocking frameworks do not do well with resilience as the strong coupling of synchronous communication does not fit at all with microservices.

Elasticity works with resilience. The ability to spin up new services and for downstream and upstream services and clients to find the new instances is vital to both the resilience of the system as well as the elasticity of the system. Reactive Systems can react to changes in load by spinning up more services to share the load. Imagine a set of services for a professional soccer game that delivers real time stats. During games, you may need to spin up many services. On non-game times, you may need just a few of these services. A reactive system is a system that can increase and decrease resources based on demand. Just like with resilience, Service Discovery aids with elasticity as it provides a mechanism for upstream and downstream services and clients to discover new nodes so the load can be spread across the services.

Message Driven: Reactive Systems rely on asynchronous message passing. This established boundaries between services (in-proc and out of proc) which allows for loose coupling (publish/subscribe or async streams or async calls), isolation (one failure does not ripple through to upstream services and clients), and improved responsive error handling. Having messaging allows one to control throughput (re-route, spin up more services) by applying back-pressure and using back pressure events to trigger changes to shape traffic through the queues. Messaging allows for non-blocking handling of responses. A common messaging platform to use with microservices is Kafka. Akka is an actor system that works well with distributed messaging and often gets used with microservices (Akka Microservices), and Reakt and QBit are two Java centric reactive libs and microservices libs respectively.

A well-written microservice should always apply the principles of the reactive manifesto. One could argue that a microservices architecture is just an extension of the reactive manifesto that is geared towards web services.

There are related subjects of reactive programming and functional reactive programming which are related to the reactive manifesto. A system can be a reactive system and not use a reactive programming model. Reactive programming is often used to coordinate asynchronous calls to multiple services as well as events and streams from clients and other systems.

An example: Client calls Service Z. Service Z calls Service A and Service B, but sends back only the combined results of Service A and Service C. The results of Service B are used to call Service C. Thus Z must call A, B, take the results of B and calls C, then return A/C combined back to the client. And, all of these calls must be asynchronous, non-blocking calls, but we should be able to handle errors for A, B or C, and handle timeouts such that the Client does not hang when Z calls downstream services. The orchestration of calling many services requires some sort of reactive programming coordination. Frameworks like RxJava, Reakt, Akka, RxJS, etc. were conceived to provide an Object Reactive programming model to better work in an environment where there are events, streams and asynchronous calls.

Service Discovery - Design for Failure

In addition to automated deployment, virtualization, and cloud orchestration/automation, microservices use microservices service discovery and microservices monitoring to recover from failure and to spread load across more nodes. The ability to discover service nodes, and adding them into the mix is called elasticity. This includes monitoring of services, detecting failures and removing unhealthy nodes out of the mix (and replacing them). Adding additional services into a running system.

A key component of microservices architecture is reactive programming, which is an async programming mode, and the ability to use back pressure to fail gracefully if the load surpasses the capacity of a node rather than having a cascading failure.

If your service under unexpected load becomes unresponsive then you did not write a microservice. If your service under load, throws error messages and tells clients it is under too much load, and you can spin up new nodes, and the nodes can discover each other (service discovery) and live another day, then you wrote a microservice. Microservices are resilient and elastic.

About Cloudurable

We hope you enjoyed this article. Please provide feedback. Cloudurable provides Kafka training, Kafka consulting, Kafka support and helps setting up Kafka clusters in AWS.

Check out our new GoLang course. We provide onsite Go Lang training which is instructor led.


Apache Spark Training
Kafka Tutorial
Akka Consulting
Cassandra Training
AWS Cassandra Database Support
Kafka Support Pricing
Cassandra Database Support Pricing
Non-stop Cassandra
Advantages of using Cloudurable™
Cassandra Consulting
Cloudurable™| Guide to AWS Cassandra Deploy
Cloudurable™| AWS Cassandra Guidelines and Notes
Free guide to deploying Cassandra on AWS
Kafka Training
Kafka Consulting
DynamoDB Training
DynamoDB Consulting
Kinesis Training
Kinesis Consulting
Kafka Tutorial PDF
Kubernetes Security Training
Redis Consulting
Redis Training
ElasticSearch / ELK Consulting
ElasticSearch Training
InfluxDB/TICK Training TICK Consulting