Andrew Harmel-Law interviewed by Maurice Driessen
Earlier this month, while working on a proposal which turned out to be successful, I had the opportunity to talk with Andrew Harmel-Law about his personal experience of constructing a system based on a microservices architecture using an evolutionary approach. It made sense to us to share this experience with all of you software engineers. The focal point of the discussion was how the non-functional characteristics of a microservices architecture contribute to the business continuity and agility of his client, and how they help software engineers sustain a high-quality, complex solution while improving collaboration with the business.
Could you tell me something about your client
and how your collaboration with the client evolved?
My client is a large mail and parcel fulfillment provider in the UK. We build and run an array of their services for them. We initially started with eBusiness: the public-facing websites and microsites. We used to interface with legacy backend systems provided by third parties, neither the client nor us. We recently won contracts to take over a good proportion of those legacy backend systems as well. The client has an integration gateway sitting in front of many backend systems, ranging from old mainframes to smaller .NET-based solutions: the typical estate of large and small business-critical systems one can expect in any large organization that has grown over time, with all kinds of integration patterns, from point-to-point to ESB.
Within the program, individual projects driven by the client are run on individual business cases. Because there has not been an overall integration program, and because of how projects are funded and scoped, we again and again need an integration between microsite A and backend systems B and C. We are currently using a microservices-based architecture to address this challenge. We discovered the reuse of microservices between front office and back ends as we went forward project by project, which is typical of the agile approach used to execute the projects. Evolving the microservices is our current vision of how to address the integration challenge, as opposed to almost a decade ago, when we would have done a large-scale SOA analysis in which we tried to identify services and agree on how we would produce and consume them. With the current microservices approach we see the benefit of each microservice evolving, and of individual microservices being added to the mix later on. Applying a microservices architecture enabled us to develop a very large and complex system in a more modular and flexible way.
Although the client is driven by projects, which are not very agile, even within the scope of a fixed-price project we run with the client we are required to accommodate last-minute changes arising from changing business needs. The cost of those changes, whether they come early or late in the project, is constant. In a traditionally architected project, where you incur technical debt, the cost of such changes tends to increase towards the end of the project or after it.
From a business perspective, the evolution of the microservices has created a transparency which helped us make the business understand the complexity of any request for change to the evolving microservices ecosystem. This is because the business could grasp how the microservices map to their business (cf. Conway’s Law), and as a result could also understand whether a change to the business would be a more or less complex change to the microservices. This is the complete opposite of the old days with the traditional monolith approach, where we as software engineers could only tell them about this “blob” of integration code and the challenge of changing it, which we could not make them understand, so the business had to trust us. As a result the microservices architecture contributed to improving the business agility of our client, improving the collaboration with the business to accommodate change, and involving the business in the evolution of their system architecture.
What can you share with regard to the
maintainability of your microservices solution?
The microservices architecture created a very maintainable code base because each microservice has its own relatively small code base, kept in a separate repository; the name of the microservice makes sense to both developers and the business; and the code is deployed and maintained as a physically separate executable component. The typical kind of technical debt we see in a microservice is a piece of business logic which somehow got into one service but which, given the responsibilities of each service, belongs in another. Because the code base is small, these kinds of issues stand out and are easily identified.
In the old monolithic approach we would obviously also create the same modular code, but we would wrap it in a single executable with a relatively simple architecture but, in the end, a very complex code base due to the size of the monolith. If someone by honest mistake created a class in the wrong location within the architecture, and this went unnoticed and others built upon it, the quality of the code would drop and it would eventually become unmaintainable.
A microservices architecture also tends to have an evolutionary lifecycle. Because each microservice makes sense to a product owner and to software engineers, they can have decent and meaningful conversations about them. If they agree a microservice needs to be split up, the development team just goes ahead and does it and doesn’t over-analyze the decision. After all, the product owner and software engineers are ultimately responsible for them. This is a result of the fact that the microservices architecture is very bare, open and explicit: a microservice is an individual component which exposes a real and explicit interface, which is documented and which may be consumed by a component developed by the guy sitting next to you. All of this, however, requires software engineering discipline and craftsmanship, because if you don’t apply them, it won’t work. The design and engineering have to happen deliberately; they can’t happen by accident.
Suppose we have created two versions of a microservice; we would have done this explicitly and for good reasons. Then, if we decide to retire the old version because it is deprecated, we can do so explicitly and get rid of the old version of the code as a whole. Compare this with removing code from a monolithic application: removal from a monolith comes at a cost and tends to be neglected. As a result the code base just keeps growing, because nobody removes unused code, and over time the solution’s maintainability decreases.
Because the deployment units in a microservices architecture are a lot smaller, you are likely to fix things faster when, for example, there is a security issue with a library used by the microservices in the solution. You can pull down all the affected microservices, make the changes required to fix the issue, and build and test all of them. If, for example, 7 out of 8 microservices pass the tests, you can deploy the 7 patched microservices back into production. If the remaining issue with the 8th microservice requires more time to fix, then that is regrettable, but the key message is that 7/8 of your system is already up and running. Compared to a traditional monolith that is a great improvement, because with the monolith you would have to fix and test everything before you could make your solution available again.
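The "7 out of 8" scenario can be illustrated with a small sketch. It is only an illustration under stated assumptions: the service names and the test callables are hypothetical stand-ins, not the project's actual build pipeline.

```python
from typing import Callable, Dict, List, Tuple


def select_deployable(
    services: Dict[str, Callable[[], bool]]
) -> Tuple[List[str], List[str]]:
    """Run each service's tests and split the names into (deployable, held_back)."""
    deployable: List[str] = []
    held_back: List[str] = []
    for name, run_tests in services.items():
        # Each service is built and tested on its own, so one failure
        # does not block the others from going back into production.
        (deployable if run_tests() else held_back).append(name)
    return deployable, held_back


# Hypothetical example: eight patched services, one of which still fails its tests.
services: Dict[str, Callable[[], bool]] = {
    f"service-{i}": (lambda: True) for i in range(1, 8)
}
services["service-8"] = lambda: False

deployable, held_back = select_deployable(services)
print(f"deploy now: {len(deployable)} of {len(services)}; held back: {held_back}")
```

With a monolith the equivalent function could only return "everything" or "nothing".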
In a microservices-based solution, any discussion about removing an old version of a microservice from the solution environment requires explicit conversations and decisions. Migrating from an old version of a microservice to a new version will impact dependent service consumers, so it needs to be planned and organized explicitly. But you are removing a complete moving part (the old version of the microservice) from your solution environment as a whole and replacing it with another moving part, the new version. You don’t need to remove unused pieces of code as you would from a monolith; you just stop deploying the old version and it is gone.
How does this contribute to the business
continuity of your client?
Well, we engineered our microservices to be stateless, and our persistence tier (where our cross-request data resides) is a combination of MongoDB, MySQL and Redis. This enables us to scale up or down elastically, deploy new microservices and retire old microservices without an outage. We do have service outages, because our client has gating processes and a set way of releasing software into production, but we have built our solution in the same way as Netflix and other companies do. We could deploy new versions of microservices and then use load balancers to redirect a fraction of the network traffic to the new ones to see if they work. If the results look good, we could direct some more traffic to them; if it goes badly and they blow up, we could redirect traffic back to the old versions. We have used this technique in our test environments, giving us zero-downtime upgrades.
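The idea of sending a fraction of traffic to a new version and pulling it back if things go wrong can be sketched as follows. This is a minimal illustration, not the load balancer configuration used on the project; the `CanaryRouter` class and its method names are assumptions made for the example.

```python
import random
from typing import Optional


class CanaryRouter:
    """Routes each request to the "old" or "new" version by a configurable fraction."""

    def __init__(self, canary_fraction: float = 0.0,
                 rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()
        self.canary_fraction = 0.0
        self.set_fraction(canary_fraction)

    def set_fraction(self, fraction: float) -> None:
        """Set the share of traffic (0.0 to 1.0) sent to the new version."""
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be between 0.0 and 1.0")
        self.canary_fraction = fraction

    def rollback(self) -> None:
        """Send all traffic back to the old version."""
        self.canary_fraction = 0.0

    def route(self) -> str:
        # random() is in [0.0, 1.0), so 0.0 never picks "new" and 1.0 always does.
        return "new" if self._rng.random() < self.canary_fraction else "old"


router = CanaryRouter(canary_fraction=0.1, rng=random.Random(42))
sample = [router.route() for _ in range(10_000)]
print("share sent to new version:", sample.count("new") / len(sample))

router.rollback()  # the new version misbehaved: everything goes back to "old"
print("after rollback:", {router.route() for _ in range(100)})
```

Because the services are stateless, a request can land on either version without any session having to move with it, which is what makes this kind of switch cheap.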
How do you actually deploy?
We are looking at becoming more mature. Currently we package the microservices as executable jar files. They contain Netty, which is our HTTP listener, and Camel to do the pipeline processing. An HTTP client and Hystrix are used to call downstream services, MongoDB or Redis. These jar files are promoted through our testing environments, leveraging Puppet and Capistrano to automate our deploys and Jenkins and JMeter to run our smoke tests.
We are considering moving to Docker, because Docker is good in development as well as in production. Docker provides process isolation (“bulkheading”), and it would also give us the opportunity to deploy on a cloud-based or PaaS platform. This can be done without code changes and with limited changes to our provisioning scripts, and would enable us to move to any private, hybrid or public cloud environment. This capability is inherent in the architecture.
What is your experience with regard to
performance?
We need to spec everything for the Christmas season, because around Christmas people tend to send lots of things, and that is what our client takes care of. Obviously our systems get their highest load around that time. Take for example the service which provides stamps. A user can send in a request for up to 200 stamps, and each stamp turns into 10 calls to a set of sub-microservices, which makes 2,000 transactions per request. We can handle around 10 to 15 stamp requests a second, resulting in 20 to 30 thousand transactions a second overall within this solution. This is not massively high scale, but what we have proven is that we can scale horizontally almost linearly, because we do not require any coordination between the microservice instances. Our scaling bottleneck is actually our 5-node MongoDB cluster. Because providing stamps is basically like printing money, we obviously must keep track of what we sell to the client’s customers. The synchronized write across the 5-node MongoDB cluster, guaranteed to have been written to disk on at least 3 nodes, is the limiting step for our transactional capacity.
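The throughput figures quoted here can be checked with a few lines of arithmetic. The sketch assumes the reading that each stamp produces 10 downstream calls, which is the interpretation that makes 200 stamps come out at 2,000 transactions; the constant and function names are made up for the example.

```python
STAMPS_PER_REQUEST = 200  # maximum number of stamps in one user request
CALLS_PER_STAMP = 10      # assumed: downstream calls triggered by each stamp


def transactions_per_request() -> int:
    """Downstream transactions generated by one maximum-size stamp request."""
    return STAMPS_PER_REQUEST * CALLS_PER_STAMP


def transactions_per_second(requests_per_second: int) -> int:
    """Overall transaction rate for a given rate of maximum-size requests."""
    return requests_per_second * transactions_per_request()


print("transactions per request:", transactions_per_request())
for rate in (10, 15):
    print(f"{rate} requests/s ->", transactions_per_second(rate), "transactions/s")
```

The lower and upper request rates reproduce the 20 to 30 thousand transactions a second mentioned in the answer.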
Everything we have constructed in our microservices landscape is very thoroughly non-functionally tested, specifically for throughput and scalability. We spend a lot of time tuning timeouts, thread pool sizes and connection pool sizes. We spend considerable effort putting logging and monitoring in place, so we can see what is actually going on in the system. Sometimes you want things to fail, and fail fast. So the configuration is set according to how we perceive the demand for our microservices, making sure we fail at the right point of actual demand; in our case that is because some of the backend systems our solution depends on are really slow and have limited capacity. So to handle unforeseen issues we need to set timeouts lower and make sure there are no knock-on effects. Again, the good engineering practices that Capgemini is good at are brought right to the front. You can get very high throughput with a microservices architecture, but you need to check; you won’t get it for free. You need to make sure all your configuration settings are set up correctly. As long as you stay in control of your configuration you can basically scale linearly; however, knowing the bottlenecks in your system environment is key.
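"Fail fast" in front of a slow backend is commonly implemented with a circuit breaker, which is the pattern Hystrix provides on the JVM. The sketch below is not the Hystrix API and not the project's configuration; it is a minimal Python illustration, and the class name, threshold and cool-down values are assumptions.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the backend."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3,
                 reset_after_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_seconds:
                # Circuit is open: reject immediately instead of queueing
                # more work behind a backend that is already struggling.
                raise CircuitOpenError("failing fast: backend marked unhealthy")
            # Cool-down elapsed: let one trial call through; a single
            # further failure re-opens the circuit straight away.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # a success closes the circuit fully
        return result
```

A caller would wrap each downstream call, for example `breaker.call(fetch_tracking_number)` with a hypothetical `fetch_tracking_number` function, and treat `CircuitOpenError` as a signal to return an error quickly. The threshold and cool-down are exactly the kind of settings that need tuning against the real capacity of the backend.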
In lots of
ways with microservices architecture we are doing SOA, an architecture
Capgemini talked a lot about in the early 2000s, but which was at that time
SOAP based. Nowadays with microservices architecture we are evolving a REST
based SOA architecture at a very high level.
What about the choreography of all these individual microservices? What does it take to make these individual microservices act as a fully fledged enterprise-level computing system?
In our solution we ended up with choreography services. Take the example of making a stamp. To make a stamp you need a tracking number, and these are pre-allocated. You need to reserve a tracking number, get it, and use it to make the stamp. When you have created the stamp with the tracking number, you need to mark the tracking number as used. Finally you need to make a barcode image of the tracking number and make that into the label. That is our business process, which business people do understand. Our solution has services which sit at that level and marshal the top-level request. Between the top-level request and the choreography services sit what we call adapters, which are basically microservice session facades.
We expose our microservices to various consumers, each of which gets a distinct functional and non-functional flavor of our services: consumers such as Amazon and eBay, and the client’s own website. These consumers do their own decoration and adaptation of the request, but in the end, when they need stamps, they call our choreography class with the actual request for stamps. The choreography class then calls down to the various resource classes: the resource microservices which make a tracking number, produce a barcode, etc. These are all responsible for timing out effectively and tidying things up when things go wrong, making sure no mess is left behind.
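The stamp flow described above (reserve a tracking number, make the stamp, mark the number as used, produce the barcode label) and the "tidy up when things go wrong" rule can be sketched like this. Everything here is a hypothetical in-memory stand-in: the real resource microservices are remote services, and the class and function names are invented for the illustration.

```python
from typing import Callable, Dict, Iterable, List, Set


class TrackingNumbers:
    """In-memory stand-in for the tracking number resource service."""

    def __init__(self, numbers: Iterable[str]) -> None:
        self.free: List[str] = list(numbers)  # pre-allocated numbers
        self.reserved: Set[str] = set()
        self.used: Set[str] = set()

    def reserve(self) -> str:
        number = self.free.pop(0)
        self.reserved.add(number)
        return number

    def release(self, number: str) -> None:
        self.reserved.discard(number)
        self.free.insert(0, number)

    def mark_used(self, number: str) -> None:
        self.reserved.discard(number)
        self.used.add(number)


def make_stamp_label(tracking: TrackingNumbers,
                     make_stamp: Callable[[str], str],
                     make_barcode: Callable[[str], str]) -> Dict[str, str]:
    """Choreograph one stamp: reserve, make the stamp, mark used, make the label."""
    number = tracking.reserve()
    try:
        stamp = make_stamp(number)
    except Exception:
        # Tidy up: no stamp was produced, so hand the reservation back.
        tracking.release(number)
        raise
    tracking.mark_used(number)
    barcode = make_barcode(number)
    return {"tracking_number": number, "stamp": stamp, "barcode": barcode}


tracking = TrackingNumbers(["TN-001", "TN-002"])
label = make_stamp_label(tracking,
                         make_stamp=lambda n: f"stamp[{n}]",
                         make_barcode=lambda n: f"barcode[{n}]")
print(label)
```

The choreography function holds the business process in one readable place, while each resource stays responsible for its own state.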
Have you seen any patterns evolve in your
solution?
In our solution we see the following recurring pattern. We have a single microservice sitting in front of each data store, which for example passes out tracking numbers. We also typically have a management microservice for the data store to set up items in it, for example setting up new tracking numbers. It is deployed separately, because there is no need to massively scale out this kind of service; a few instances will do just fine. Typically there is also a reporting microservice, if the resource needs reporting on. Again we have that as a separate service. So most logical resources in a system, such as tracking numbers, have these 3 types of microservices allocated to them. But you only need to scale the microservices which handle publicly available requests, in our example the “pass out a tracking number” microservice. The other two types will not be hit with a heavy load; they just need to be available when there is demand for their service. This approach enables us to scale our system at a more granular level, the level of individual microservices, compared to traditional monolithic systems, which in turn enables us to use the available computing resources more effectively and efficiently and to scale the services elastically for a given load profile.
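The three-services-per-resource pattern and its effect on scaling can be shown with a small model. The role names follow the description above, but the instance counts and the scaling factor are purely illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceRole:
    name: str
    instances: int
    scales_with_load: bool  # only the public-facing role needs to grow


def scale_for_load(roles: List[ServiceRole], factor: int) -> List[ServiceRole]:
    """Multiply instance counts only for roles that take public traffic."""
    return [
        ServiceRole(r.name,
                    r.instances * factor if r.scales_with_load else r.instances,
                    r.scales_with_load)
        for r in roles
    ]


# Hypothetical deployment for the "tracking numbers" resource.
tracking_number_services = [
    ServiceRole("pass-out-tracking-number", instances=4, scales_with_load=True),
    ServiceRole("tracking-number-management", instances=2, scales_with_load=False),
    ServiceRole("tracking-number-reporting", instances=2, scales_with_load=False),
]

peak = scale_for_load(tracking_number_services, factor=5)
for role in peak:
    print(role.name, role.instances)
```

In a monolith the management and reporting code would be replicated along with the public-facing code every time another instance was added.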
Any last advice for our software engineering community members?
The biggest thing to remember about the microservices architecture is that people have to remember they are software engineers, and they don’t get a set of ready-made practices handed to them on a plate. You need to be aware of good engineering practices when constructing a microservices-based solution, which makes a good case for Capgemini, because Capgemini is known in the market for the expertise of its engineers. But at its core, working with microservices is fun; our engineers like creating these kinds of architectures.
_________________________________
If you want more information on our experiences with microservices, check out the engineering blog at http://capgemini.github.io/categories/index.html#architecture. A reading list Andrew suggested on this topic is shared at http://bit.ly/MicroservicesArchitecture.