We use a wrapper script with a lockfile check before every job. Works but it’s boilerplate we copy-paste everywhere and half the team forgets to add it.
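For anyone who hasn't seen the pattern, here is a minimal sketch of that kind of lockfile wrapper. It is not the commenter's actual script: the function name `run_exclusive` and the lock path are made up for illustration, and it assumes a Unix host where `fcntl.flock` is available.

```python
import fcntl
import os
import subprocess
import sys


def run_exclusive(lock_path, command):
    """Run `command` only if nobody else holds the lock at `lock_path`.

    Returns the command's exit code, or None if the lock was already held.
    (Hypothetical helper, not part of any tool mentioned in this thread.)
    """
    # Open (or create) the lockfile; the lock is tied to this descriptor.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another run still holds the lock, so skip this one.
            return None
        return subprocess.run(command).returncode
    finally:
        # Closing the descriptor releases the lock if we held it.
        os.close(fd)
```

A cron job would call something like `run_exclusive("/tmp/backup.lock", ["/usr/local/bin/backup.sh"])` (both paths are placeholders) and exit quietly when it gets `None`. Because the kernel drops the lock when the process dies, a crashed job should not leave a stale lock behind, which is the usual weakness of "check if the file exists" wrappers.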
Happy to answer questions about how the parser handles edge cases, or why I chose client-side only. Also curious — do you manually check for cron overlaps today, or just find out when something breaks?
That is why I moved to systemd timers. I can manage relationships and prevent duplicate runs.
Fair point — systemd timers solve this properly. This is for the massive installed base that isn't migrating anytime soon.
Should overlap prevention live at the scheduler level or inside the script with lockfiles?
How does it handle jobs that run on different servers? Like if I have 5 machines all running their own crontabs, the overlap problem is actually across hosts not just within one crontab.
Single host for now. Multi-host is a real problem but wanted to nail the common case first — most people don't know they have single-host overlaps until something breaks.
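To make the single-host case concrete, here is a rough sketch of how an overlap check could work. It is not the tool's parser. It assumes only the minute and hour fields matter, supports just `*`, `*/n`, `a-b` and comma lists, and relies on a guessed run duration per job, since cron itself knows nothing about how long a job takes. All function names are hypothetical.

```python
def expand_field(field, low, high):
    """Expand one cron field (e.g. '*/15', '1-5', '0,30') into a set of ints."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return values


def busy_minutes(minute_field, hour_field, duration_min):
    """Minutes of the day (0-1439) during which the job is assumed to run."""
    busy = set()
    for hour in expand_field(hour_field, 0, 23):
        for minute in expand_field(minute_field, 0, 59):
            start = hour * 60 + minute
            # Wrap past midnight so late-night jobs are still counted.
            busy.update((start + d) % 1440 for d in range(duration_min))
    return busy


def overlaps(job_a, job_b):
    """Each job is (minute_field, hour_field, assumed_duration_in_minutes).

    Returns the sorted minutes of the day when both jobs would be running.
    """
    return sorted(busy_minutes(*job_a) & busy_minutes(*job_b))
```

With made-up schedules, a backup at `0 2` assumed to take 45 minutes and a vacuum at `30 2` assumed to take 20 minutes, `overlaps(("0", "2", 45), ("30", "2", 20))` returns the 15 minutes from 02:30 to 02:44. A real checker would also need the day-of-month, month and day-of-week fields.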
At what point does cron complexity justify moving to a proper job scheduler like Airflow or Temporal?
Honestly? Find out when something breaks. Had a backup job and a DB vacuum running at the same time last year, took down prod for 20 minutes before I figured out what happened. Wish I’d had something like this.
20 minutes of downtime is exactly the kind of thing this was built to prevent. Hope it helps next time.