Micro-services gone wrong, my take
05 Apr 2019

Let’s get right to it. Each of these services needs to be self-manageable and not dependent on the other services, meaning you could deploy it on its own, scale it on its own, etc. If I make a change in one API and that results in changes in all the rest of the APIs, and the deployment procedure involves every API getting a new version, then we are not really building services; we are building APIs that are connected and very much dependent on each other. It can work, but it will be hard to manage, both team-wise and development-requirements-wise. When this concept was introduced, the idea was to let a developer solve a problem by divide and conquer. We would break things into smaller pieces so we could solve them better and reuse each of those small pieces anywhere, so that a change in one would automatically be reflected everywhere. That’s the basic narrative and we all know it, but making things simple is very hard.
First, when we started developing we would have modules, in MVC terms, so I would have a folder with files: model, controller and view, and everything for that model resided in that folder. That folder was the analogue of a product. Then we moved the other way and started creating a folder per functionality. And that is the key word for how services should be built as well: functionality. In that folder I could have as many controllers as I want, more models, plain service classes making things work. It is really difficult to decide how small services should be; I once read they should be small but not that small. Should they implement one and only one functionality? We could go the way AWS Lambda does it, where you upload a single function and get an endpoint for it in return. In that sense we would be talking about real micro-services, but in reality we are actually building services. I used to think that architecture and development are two separate things and a good architect can be a bad developer, but I don’t support that anymore. The reason is that I see a lot of correlation between the micro and macro concepts: the way a developer structures their code and builds a component is very similar to extracting that functionality as a service living on a separate server.
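A rough picture of that folder-per-functionality layout, with made-up names just for illustration:

```
src/
  Checkout/                <- one folder per functionality
    CheckoutController.php
    Cart.php
    Order.php
    PaymentService.php
  Invoicing/
    InvoiceController.php
    Invoice.php
    PdfRenderer.php
```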
Having one monolith and then deciding what you can take out and make into a separate service makes sense: something you can scale and give a proper amount of memory and CPU power. If I keep most of the things in one monolith application, I won’t have many problems updating it or implementing new requirements. With services, on the contrary, I have to make sure the data exchanged between two APIs is the right version: changing one API must not break the flow in the second one, and the latter should keep working on version 1 until the change is properly implemented, with fall-backs. Writing the fall-back before implementing the service is also very important: what happens if the other service I rely on goes down? It also means that one service should remain deployable even while there is an ongoing change in some other service.
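To make the fall-back idea concrete, here is a minimal sketch in plain PHP with cURL and APCu; the pricing endpoint and cache keys are made up for illustration:

```php
<?php
// Hypothetical helper: fetch a price from the pricing service,
// and fall back to the last known value if that service is down.
function getPrice(string $productId): ?float
{
    $ch = curl_init("https://pricing.internal/v1/products/{$productId}"); // assumed URL
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 2, // fail fast instead of hanging the whole request
    ]);

    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($body !== false && $code === 200) {
        $data = json_decode($body, true);
        apcu_store("price:{$productId}", $data['price'], 300); // remember last good value
        return $data['price'];
    }

    // Fall-back: the other service is down or answered badly,
    // so serve the last cached value (or null, and let the caller decide).
    $cached = apcu_fetch("price:{$productId}");
    return $cached === false ? null : $cached;
}
```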
One of my favorite question I would like to ask on interviews is: What is your favorite reason of choosing micro-services? I would like to get into more detailed discussions as there is no right answer to this question, but I would like to share mine. Each of the application we build using a framework and having them is cool and they offer us out of the box goodies, but bootstrapping one actually fires up lots of classes, like hundreds. So in a language like php with using these big frameworks like Laravel or Symofony, the php engine will first check or validate all the classes that are required for that specific page load/request. Checking if child classes that extend parent classes have the right overriding, classes extending an interface having the right public method implemented, classes extending abstract classes having the abstract method implemented and so on, all of this needs to be checking since php is not a compiled language but an interpreter and plus a dynamic type not static. Things might change in this regards, starting php7.4 with the pre-loader. Why am I saying all of this? Well, if you have a monolith application and it can grow very large, I’ve seen classes with thousands of lines of code and lots of lots of classes, just this initial step takes couple of seconds, which is ridiculous. So, if I break this huge monolith into many application so they can operate on their own I wouldn’t be having this problem and application will be fast.
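For reference, a sketch of what that PHP 7.4 preloading looks like; the paths are hypothetical:

```php
<?php
// Hypothetical preload script, wired up in php.ini with:
//   opcache.preload=/var/www/app/preload.php
//   opcache.preload_user=www-data
// It compiles the framework/vendor classes into opcache once at server
// start-up, instead of re-checking them on every request.
// (Files whose parent classes are not preloaded may only emit warnings.)
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('/var/www/app/vendor')
);
foreach ($files as $file) {
    if ($file->isFile() && $file->getExtension() === 'php') {
        opcache_compile_file($file->getPathname());
    }
}
```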
It’s all about speed: things need to load fast, the application needs to be able to scale on demand, serve more and more users and never lose performance. So far so good, we’ve broken the monolith down into more applications; is there a price to pay? Of course there is. The question that appears is: how many hops will you have? One API calls another, that one calls some other, and so on, and you end up trapped in dependencies again. Calling another API also starts the bootstrapping of that application, so again you face the interpreter problem, even though the biggest time consumer in this case is the TCP connection, the HTTP request itself to the other service, not the code. Having a cache in front of each service helps: each API maintains its own caching layer that the other APIs can use, without needing to drill down into another database.
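A minimal cache-aside sketch of that idea, assuming the phpredis extension and a hypothetical getUserFromDb() as the expensive path:

```php
<?php
// Serve from this service's own cache first; only drill down
// to the database (or another API) on a cache miss.
function getUser(int $id): array
{
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    $key = "user:{$id}";
    if (($cached = $redis->get($key)) !== false) {
        return json_decode($cached, true);
    }

    $user = getUserFromDb($id);                   // hypothetical, the slow path
    $redis->setex($key, 600, json_encode($user)); // keep it warm for 10 minutes
    return $user;
}
```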
The idea would be to have an endpoint that points directly to this last API and a queuing system that syncs all the data in the background, so the user is served an already cooked meal, so to speak. That’s how we maintain the speed of that API, but what about scalability and a huge number of user requests? We could put a load balancer in front, and of course in most cases that is what we do, as long as the number of hops is as low as possible and the services are mostly independent. You should be able to add services, say 40 of them, to your platform and not lose performance, firing async requests and working in parallel.
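Even in plain PHP you can fan out to several services in parallel with curl_multi, so the total latency is roughly the slowest call rather than the sum of all calls; a rough sketch with made-up internal URLs:

```php
<?php
// Fire requests to several services at once and wait for all of them.
$urls = [
    'users'  => 'https://users.internal/v1/me',      // assumed endpoints
    'orders' => 'https://orders.internal/v1/recent',
    'promo'  => 'https://promo.internal/v1/banner',
];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $name => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 2);
    curl_multi_add_handle($mh, $ch);
    $handles[$name] = $ch;
}

// Run all transfers until every one of them has finished
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

$responses = [];
foreach ($handles as $name => $ch) {
    $responses[$name] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```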
This syncing of data can be very complicated and can be done the wrong way, so the right choice of technology, tooling and proper memory management should be in place. In most cases I would use Redis or Kafka as storage for the jobs, so consumers can read from it and your applications can produce into it. The choice of technology is closely related to proper management of your variables, as everything is stored in memory and your consumers are just while(true) infinite loops that can make a big mess if not done the right way. If I have a job that does an update in the database, say across a million records in that table, and then does some other sync-up to caching, Elasticsearch and so on, this can be very memory consuming and the process might go up to 40MB, which is not ideal: then you need to unset all the variables, but even so you hit the threshold very fast and your queue worker gets restarted very often. To avoid that, maybe a better way would be to use C or Rust; the latter is the one I would prefer, as it introduces something called ownership, and once your variables go out of scope they are removed from memory as well.
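Here is roughly what such a consumer looks like, assuming phpredis’ blocking pop, a hypothetical handleJob(), and the ~40MB restart threshold mentioned above:

```php
<?php
// while(true) consumer: read jobs from a Redis list, release what you can,
// and exit cleanly when memory grows so the supervisor restarts the worker.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

const MEMORY_LIMIT_BYTES = 40 * 1024 * 1024; // the ~40MB threshold from above

while (true) {
    // Block for up to 5 seconds waiting for a job on the "jobs" list
    $item = $redis->brPop(['jobs'], 5);
    if (empty($item)) {
        continue; // nothing queued, keep waiting
    }

    $job = json_decode($item[1], true);
    handleJob($job);    // hypothetical: the DB update, cache/Elasticsearch sync, etc.
    unset($job, $item); // drop references before the next iteration

    if (memory_get_usage(true) > MEMORY_LIMIT_BYTES) {
        // In PHP the safest cleanup is to exit and let supervisord/systemd start
        // a fresh process; in Rust, ownership would free this memory for us.
        exit(0);
    }
}
```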