Move expensive runtime checks (service/application status) after cron
validation to avoid running them for tasks that aren't due. Critical
checks (orphans, infrastructure) remain in first phase.
Also fix database heading parameters to be built from the model.
- Refactor shouldRunNow() to only fire on first run (empty cache) if actually due by cron schedule, preventing spurious executions after cache loss or service restart
- Add enrichSkipLogsWithLinks() method to fetch and populate resource names and links for tasks, backups, and docker cleanup jobs in skip logs
- Update skip logs UI to display resource column with links to related resources, improving navigation and context
- Add fallback display when linked resources are deleted
- Expand tests to cover both restart scenarios: non-due jobs (should not fire) and due jobs (should fire)
Implement dedup key-based cron tracking to make scheduled jobs resilient to queue
delays. Even if a job is delayed by minutes, it will catch the missed cron window
by tracking previousRunDate in cache instead of relying on isDue() alone.
- Add dedupKey parameter to shouldRunNow() in ScheduledJobManager
- When provided, uses getPreviousRunDate() + cache tracking for resilience
- Falls back to isDue() for docker cleanups without dedup key
- Prevents double-dispatch within same cron window
- Optimize ServerConnectionCheckJob dispatch
- Skip SSH checks if Sentinel is healthy (enabled and live)
- Reduces redundant checks when Sentinel heartbeat proves connectivity
- Remove hourly Sentinel update checks
- Consolidate to daily CheckAndStartSentinelJob dispatch
- Crash recovery handled by sentinelOutOfSync → ServerCheckJob flow
- Add logging for skipped database backups with context (backup_id, database_id, status)
- Refactor skip reason methods to accept server parameter, avoiding redundant queries
- Add comprehensive test suite for scheduling with various delay scenarios and timezones
- Add retry configuration to CoolifyTask (3 tries, 600s timeout)
- Add retry configuration to ScheduledTaskJob (3 tries, configurable timeout)
- Add retry configuration to DatabaseBackupJob (2 tries)
- Implement exponential backoff for all jobs (30s, 60s, 120s intervals)
- Add failed() handlers with comprehensive error logging to scheduled-errors channel
- Add execution tracking: started_at, retry_count, duration (decimal), error_details
- Add configurable timeout field to scheduled tasks (60-3600s, default 300s)
- Update UI to include timeout configuration in task creation/editing forms
- Increase ScheduledJobManager lock expiration from 60s to 90s for high-load environments
- Implement safe queue cleanup with restart vs runtime modes
- Restart mode: aggressive cleanup (marks all processing jobs as failed)
- Runtime mode: conservative cleanup (only marks jobs >12h as failed, skips deployments)
- Add cleanup:redis --restart flag for system startup
- Integrate cleanup into Dev.php init() for development environment
- Increase scheduled-errors log retention from 7 to 14 days
- Create comprehensive test suite (unit and feature tests)
- Add TESTING_GUIDE.md with manual testing instructions
Fixes issues with jobs failing after single attempt and "attempted too many times" errors
## Problem
Scheduled tasks, backups, and auto-updates stopped working after 1-2 months
with error: MaxAttemptsExceededException: App\Jobs\ScheduledJobManager has
been attempted too many times.
Root cause: ScheduledJobManager used WithoutOverlapping with only
releaseAfter(60), causing locks without expiration (TTL=-1) that persisted
indefinitely when jobs hung or processes crashed.
## Solution
### Part 1: Prevention (Future Locks)
- Added expireAfter(60) to ScheduledJobManager middleware
- Lock now auto-expires after 60 seconds (matches everyMinute schedule)
- Changed from releaseAfter(60) to expireAfter(60)->dontRelease()
- Follows Laravel best practices and matches other Coolify jobs
### Part 2: Recovery (Existing Locks)
- Enhanced cleanup:redis command with --clear-locks flag
- Scans Redis for stale locks (TTL=-1) and removes them
- Called automatically during app:init on startup/upgrade
- Provides immediate recovery for affected instances
## Changes
- app/Jobs/ScheduledJobManager.php: Added expireAfter(60)->dontRelease()
- app/Console/Commands/CleanupRedis.php: Added cleanupCacheLocks() method
- app/Console/Commands/Init.php: Auto-clear locks on startup
- tests/Unit/ScheduledJobManagerLockTest.php: Test to prevent regression
- STALE_LOCK_FIX.md: Complete documentation
## Testing
- Unit tests pass (2 tests, 8 assertions)
- Code formatted with Pint
- Matches pattern used by CleanupInstanceStuffsJob
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>