Tuesday, February 2, 2016

SetCronJob is now highly available!

After we added 3 servers, several issues made SetCronJob unstable. We worked really hard to fix, improve, and upgrade our servers and application.

The service stabilized after a few days, and we're still monitoring and working on it. We can now proudly announce that SetCronJob is highly available. We removed every single point of failure, including:
- Web server: we're using round-robin DNS, so if one web server is down, your browser will retry the request on another server. There will be some delay (e.g. 30 seconds, tested in my browser), but as long as you can access our website and the cronjobs are executed properly, it's not a big problem.
- Cronjob processor: all cronjobs are distributed across all servers for processing. If one server is down, the other cronjob processors can still handle all the cronjobs properly. In fact, a single server is powerful enough to handle all cronjobs. Network failures are detected automatically, so if any server has a network problem, it's temporarily removed until everything is fine and it can process cronjobs properly again.
- Database server: we have synchronous database replication, automatic failure detection and failover, and a load balancer. No more database downtime!
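
To make the cronjob processor item in the list above more concrete, here is a minimal Python sketch of the general idea: jobs are spread over the servers that are currently marked healthy, and a server with a problem is skipped until it recovers. This only illustrates the technique and is not SetCronJob's actual implementation; the names `Server` and `assign_jobs`, and the simple round-robin assignment, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical cronjob processor with a health flag."""
    name: str
    healthy: bool = True


def assign_jobs(job_ids, servers):
    """Spread jobs round-robin over the servers currently marked healthy.

    Unhealthy servers are skipped, so the remaining ones pick up the
    whole workload until the failed server is marked healthy again.
    """
    healthy = [s for s in servers if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy cronjob processor available")

    assignment = {s.name: [] for s in healthy}
    for i, job_id in enumerate(job_ids):
        target = healthy[i % len(healthy)]
        assignment[target.name].append(job_id)
    return assignment


servers = [Server("s1"), Server("s2"), Server("s3")]
jobs = list(range(6))

all_up = assign_jobs(jobs, servers)

# Simulate a detected network failure on one server (assumed mechanism).
servers[1].healthy = False
one_down = assign_jobs(jobs, servers)
```

With three healthy servers and six jobs, each server gets two jobs. Once `s2` is marked unhealthy, `s1` and `s3` get three jobs each and `s2` receives nothing until its flag is set back to `True`.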

With this setup, SetCronJob is now the most reliable online cronjob service. We're processing 3.8 million cronjobs a day, or about 2,600 cronjobs per minute, or 44 cronjobs per second. And the service is already scaled up and out so that it can handle up to 3 times those numbers :-)
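
For anyone who wants to check the arithmetic, the short Python sketch below converts the daily figure into per-minute and per-second rates and applies the 3x headroom. The variable names are made up for this example, and the capacity figure is simply the stated multiple applied to the daily count, not a measured value.

```python
JOBS_PER_DAY = 3_800_000   # daily figure quoted in the post
HEADROOM = 3               # "up to 3 times" capacity quoted in the post

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60

per_minute = JOBS_PER_DAY / MINUTES_PER_DAY   # roughly 2,600
per_second = JOBS_PER_DAY / SECONDS_PER_DAY   # roughly 44

# Implied capacity if the 3x headroom applies to the daily count.
capacity_per_day = JOBS_PER_DAY * HEADROOM

print(f"{per_minute:,.0f} cronjobs per minute")
print(f"{per_second:.0f} cronjobs per second")
print(f"{capacity_per_day:,} cronjobs per day at {HEADROOM}x")
```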

Happy setting up cronjobs!