On Queues
18 Jan 2018

Following are some challenges I’ve faced with queues. I have multiple queues, each with multiple processes, as I curl data from a third party and store it in the database. The queues run in parallel, but a queue might dispatch another job synchronously even though multiple processes are running on it. The idea is to make the most of the time and get the work done quicker: if there is no need to wait for something to finish, a job can go into another queue, but sometimes the business logic requires waiting for an update from a previous queue, while still running in parallel for unrelated entities. I am using supervisor to run the queues.
- Wrap everything in try-catch statements, as these long-running processes need to be kept alive, so any error that appears has to be handled properly. Most probably you will have a try-catch that throws another exception from its catch block to be caught by the outer try-catch, or throws an exception that will be handled in the next nested catch, as in the sketch below.
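A minimal sketch of that rethrow pattern; callThirdParty(), storePayload(), logError() and DataFetchException are hypothetical names, not part of my actual code:

// The outer catch keeps the process alive; the inner catch translates
// low-level errors into a domain exception the outer block understands.
foreach ($entities as $entity) {
    try {
        try {
            $payload = callThirdParty($entity);
        } catch (\RuntimeException $e) {
            // Rethrow so the outer block decides what to do with it.
            throw new DataFetchException('fetch failed', 0, $e);
        }
        storePayload($payload);
    } catch (DataFetchException $e) {
        // Handle here so one bad entity doesn't kill the whole worker.
        logError($e);
        continue;
    }
}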
- Each process consumes RAM, so setting up a high number of processes per queue is not recommended if you have a lot of queues, as it will eat up all the memory. Each queue worker has a memory limit of 128MB by default, so once that limit is reached the worker is restarted.
- General error: 2006 MySQL server has gone away. First check whether you have the posix PHP extension installed; if not, install it (e.g. sudo apt-get install php-posix), since the queue worker uses it to kill the process. You can also refresh the connection manually. I use Redis for storing the jobs, but the jobs themselves write to MySQL, so these long-running processes will eventually hit this error. Worse, the job is taken out of Redis and afterwards deleted from it, so if this error is not caught somehow the job might get lost. My solution is that before each job a dummy SQL statement (SELECT 1) is executed to check if the connection is alive; if not, the connection is closed and the next statement will re-create it (a sketch follows below). Alternatively, you can use the --once option for each queue worker so the worker exits after every job, or directly close the database connection when the job finishes. But I would like to keep every object I can reuse in memory for as long as possible and only restart the queue worker when I hit the memory limit. There are a couple of articles about this error suggesting it happens when you use the database driver for your queue, but I am using Redis and only storing data in MySQL, and I am still facing it.
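A minimal sketch of that pre-job check with Doctrine DBAL 2; the function name is illustrative:

use Doctrine\ORM\EntityManagerInterface;

// Run before each job: ping MySQL with a dummy query and force a
// reconnect if the connection has gone away in the meantime.
function ensureConnectionIsAlive(EntityManagerInterface $em)
{
    $connection = $em->getConnection();

    try {
        $connection->executeQuery('SELECT 1');
    } catch (\Exception $e) {
        // "MySQL server has gone away" ends up here; close the handle
        // so the next statement opens a fresh connection.
        $connection->close();
        $connection->connect();
    }
}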
- Error: "The EntityManager is closed". As a queue worker is a long-running process, the Entity Manager (in my case I am working with Symfony & Doctrine) might close due to an error, so first check whether it is open; if not, reset it or recreate it, get a fresh EM, and check the logs for code errors. This can also happen because of the "server has gone away" error, so as of PHP 7 you can catch the Throwable interface, check if $e->getMessage() contains this error, and if so release the job back to the queue. With Doctrine, once the Entity Manager gets closed it most probably won't be recovered; I have tried getting a new Entity Manager and closing the connection to the database, but it seems the only way out is to kill the process. That is how Laravel queues work, so I've decided in my handleEntityException to catch this error and in the catch block throw new Exception('no connection to the server'), which will be caught by the Laravel queue system and kill off the process. Laravel would release the job again in that case, but I've decided to delete the job and dispatch it again with the same payload data, because by default the release method does the same but doesn't reset the job attempts to zero. Since a job that hits the "server has gone away" error wasn't even touched, I prefer to keep its attempts at zero.
/**
 * Handle a job with proper error handling.
 *
 * @param $job   The queue job instance
 * @param $data  The job payload
 * @param $func  Callable doing the actual work, receives ($em, $data)
 * @return void
 * @throws Exception
 */
public function handle($job, $data, $func)
{
    $em = null;

    try {
        $em = $this->getFreshEM();
        $func($em, $data);
    } catch (CustomException $e) {
        $this->tryJobAgain($e, $job);
        $this->handleCustomException($e, $em);
    } catch (Exception $e) {
        // Exception must be caught before Throwable, otherwise this
        // branch would be unreachable on PHP 7.
        $this->handleEntityException($e, $job, $data);
    } catch (Throwable $e) {
        // PHP 7 errors (e.g. TypeError) end up here.
        $this->handleEntityException($e, $job, $data);
    } finally {
        $job->delete();

        if ($em !== null) {
            // Clear so the next job starts with a clean identity map.
            $em->clear();
            unset($em);
        }
    }
}
It is also important to call $em->clear() so the next job gets a fresh EM without anything left in memory; otherwise the next job may work with stale entities and the new data won't be picked up. A sketch of handleEntityException follows.
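This is what handleEntityException can look like under those rules; SomeJob is a placeholder for the real job class and the error-message checks are illustrative:

protected function handleEntityException($e, $job, $data)
{
    $message = $e->getMessage();

    $connectionIsDead = strpos($message, 'server has gone away') !== false
        || strpos($message, 'The EntityManager is closed') !== false;

    if ($connectionIsDead) {
        // Re-dispatch with the same payload so the attempt counter starts
        // from zero; handle() deletes the original job in its finally block.
        dispatch(new SomeJob($data));

        // Rethrow so the Laravel worker exits and supervisor starts a
        // fresh process with a fresh database connection.
        throw new \Exception('no connection to the server');
    }

    // Anything else is a genuine code error: let it bubble up.
    throw $e;
}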
- In some cases two queues might want to insert the same entity into the database. If the primary key was not found, the process will try to insert it, but if another process has inserted it in the meantime the INSERT will fail! Wrap it in a try-catch and re-query, but make sure the Entity Manager wasn't destroyed (see the sketch below).
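A sketch of that with Doctrine; Account and externalId are hypothetical, and getFreshEM() is the same helper used in handle() above. Note that a failed flush closes the EntityManager, which is why a fresh one is needed for the re-query:

use Doctrine\DBAL\Exception\UniqueConstraintViolationException;

$account = $em->getRepository(Account::class)
    ->findOneBy(['externalId' => $externalId]);

if ($account === null) {
    try {
        $account = new Account($externalId);
        $em->persist($account);
        $em->flush();
    } catch (UniqueConstraintViolationException $e) {
        // Another worker inserted the row between our SELECT and INSERT.
        // The failed flush closed the EntityManager, so grab a fresh one
        // before re-querying the row the other worker created.
        $em = $this->getFreshEM();
        $account = $em->getRepository(Account::class)
            ->findOneBy(['externalId' => $externalId]);
    }
}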
- As I am dealing with a third party, some endpoints might occasionally be unavailable, so I store the failed jobs and have a cron job that runs once a day to retry them, even though the maximum number of retries has already been reached. This is because the data provider might make changes and updates as well (see the sketch below).
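Scheduling that retry can be as simple as the following sketch in app/Console/Kernel.php, assuming the failed-job driver is configured so failed jobs end up in the failed_jobs table:

protected function schedule(Schedule $schedule)
{
    // Push every job from the failed_jobs table back onto its queue once
    // a day, in case the third-party endpoint is reachable again.
    $schedule->command('queue:retry all')->daily();
}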
You get these effects with queues:
- Instead of running one piece of logic in a loop, synchronously, one item after another, you get to set up multiple processes running independent jobs in parallel.
- You get a near real-time scenario: once you dispatch a job it is automatically picked up by a queue worker. If you run a cron job instead, you typically take all the entities you want something done for, plus all the related users, and inside that two-level nested loop you have a long process with hard error handling. With queues each job is independent and can relate to a single entity, and you can handle that job however you like: retry it, delay it, or run other jobs only for the specific job that just finished. Apart from the flexibility you get with queues, the cron solution might take too long when you want to deliver results very fast (if that's what your application needs, and aren't they all nowadays), and the fastest a cron can go is a sync per minute, whereas a queue worker picks up a job almost instantaneously.
- Most importantly, the user doesn't have to wait on their side for the page to load or finish saving. An example is when I sign up on a web site and press submit on a form that also needs to send me an email. On the backend, after the database work is done, a request to the mail server is sent, and that can take at least five more seconds. That might be fine, but if you have a lot of users doing this, and keep in mind other functionality that sends emails even more frequently, it becomes a bottleneck. So this needs to be regulated; it needs a queue. Instead of connecting to the mail server, a job for the email is dispatched to a queue. We might have a high-priority queue for the likes of registrations and a low-priority one for some messaging; that's why we sometimes see delays for certain messages on some web sites: their low-priority queue reflects a trade-off between what needs to be delivered to the customer faster and what doesn't (a sketch follows below).
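A sketch of that registration flow in Laravel; SendWelcomeEmail is a hypothetical queued job and the queue names are illustrative:

// In the registration controller: respond immediately and let a worker
// talk to the mail server (validation and password hashing omitted).
public function register(Request $request)
{
    $user = User::create($request->only(['name', 'email']));

    // High-priority queue for registrations; less urgent messaging jobs
    // would go on a low-priority queue.
    dispatch(new SendWelcomeEmail($user))->onQueue('high');

    return response()->json(['status' => 'ok']);
}

The workers would then be started with --queue=high,low so anything on the high queue is always processed before the low one.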
Another side effect of queues is that you have to make sure you don't exhaust memory and CPU. Memory will be freed once each job is done; to help that, unset your variables, but not a db connector or the like if you plan to reuse it, as the next job might break. There is also the memory option, 128MB by default, and once that is reached the process is killed and the memory freed. The CPU is trickier: you can set supervisor to run 20 processes, and if each of them lasts an hour or longer the CPU will always be at 100% with no idle time. So design the queues around how long they run and how much memory they need, and set the right number of processes. If you have very long processes you're better off with a smaller number, so the CPU still has some room to breathe (a supervisor config sketch follows).
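For reference, a supervisor program sketch along those lines; the path, queue names and numbers are illustrative and should be tuned per queue:

; /etc/supervisor/conf.d/worker-high.conf
[program:worker-high]
command=php /var/www/app/artisan queue:work redis --queue=high,low --memory=128 --tries=3
process_name=%(program_name)s_%(process_num)02d
; fewer processes for long-running jobs, so the CPU keeps some idle time
numprocs=4
autostart=true
; restart the worker after it exits on hitting the memory limit
autorestart=true
user=www-data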