Commit graph

11 commits

Author SHA1 Message Date
Andras Bacsai
9a4b4280be refactor(jobs): split task skip checks into critical and runtime phases
Move expensive runtime checks (service/application status) after cron
validation to avoid running them for tasks that aren't due. Critical
checks (orphans, infrastructure) remain in first phase.

Also fix database heading parameters to be built from the model.
2026-02-28 18:37:51 +01:00
Andras Bacsai
31555f9e8a fix(jobs): prevent non-due jobs firing on restart and enrich skip logs with resource links
- Refactor shouldRunNow() to only fire on first run (empty cache) if actually due by cron schedule, preventing spurious executions after cache loss or service restart
- Add enrichSkipLogsWithLinks() method to fetch and populate resource names and links for tasks, backups, and docker cleanup jobs in skip logs
- Update skip logs UI to display resource column with links to related resources, improving navigation and context
- Add fallback display when linked resources are deleted
- Expand tests to cover both restart scenarios: non-due jobs (should not fire) and due jobs (should fire)
2026-02-28 18:03:29 +01:00
Andras Bacsai
a0c177f6f2 feat(jobs): add queue delay resilience to scheduled job execution
Implement dedup key-based cron tracking to make scheduled jobs resilient to queue
delays. Even if a job is delayed by minutes, it will catch the missed cron window
by tracking previousRunDate in cache instead of relying on isDue() alone.

- Add dedupKey parameter to shouldRunNow() in ScheduledJobManager
  - When provided, uses getPreviousRunDate() + cache tracking for resilience
  - Falls back to isDue() for docker cleanups without dedup key
  - Prevents double-dispatch within same cron window

- Optimize ServerConnectionCheckJob dispatch
  - Skip SSH checks if Sentinel is healthy (enabled and live)
  - Reduces redundant checks when Sentinel heartbeat proves connectivity

- Remove hourly Sentinel update checks
  - Consolidate to daily CheckAndStartSentinelJob dispatch
  - Crash recovery handled by sentinelOutOfSync → ServerCheckJob flow

- Add logging for skipped database backups with context (backup_id, database_id, status)

- Refactor skip reason methods to accept server parameter, avoiding redundant queries

- Add comprehensive test suite for scheduling with various delay scenarios and timezones
2026-02-28 15:06:25 +01:00
Andras Bacsai
b88f9fca67 chore: prepare for PR 2026-02-25 12:07:29 +01:00
Andras Bacsai
cb0f5cc812 chore: prepare for PR 2026-02-23 12:19:57 +01:00
Andras Bacsai
664b31212f chore: prepare for PR 2026-02-18 15:42:42 +01:00
Andras Bacsai
b22e79caec feat(jobs): improve scheduled tasks with retry logic and queue cleanup
- Add retry configuration to CoolifyTask (3 tries, 600s timeout)
- Add retry configuration to ScheduledTaskJob (3 tries, configurable timeout)
- Add retry configuration to DatabaseBackupJob (2 tries)
- Implement exponential backoff for all jobs (30s, 60s, 120s intervals)
- Add failed() handlers with comprehensive error logging to scheduled-errors channel
- Add execution tracking: started_at, retry_count, duration (decimal), error_details
- Add configurable timeout field to scheduled tasks (60-3600s, default 300s)
- Update UI to include timeout configuration in task creation/editing forms
- Increase ScheduledJobManager lock expiration from 60s to 90s for high-load environments
- Implement safe queue cleanup with restart vs runtime modes
  - Restart mode: aggressive cleanup (marks all processing jobs as failed)
  - Runtime mode: conservative cleanup (only marks jobs >12h as failed, skips deployments)
- Add cleanup:redis --restart flag for system startup
- Integrate cleanup into Dev.php init() for development environment
- Increase scheduled-errors log retention from 7 to 14 days
- Create comprehensive test suite (unit and feature tests)
- Add TESTING_GUIDE.md with manual testing instructions

Fixes issues with jobs failing after single attempt and "attempted too many times" errors
2025-11-10 11:11:18 +01:00
Andras Bacsai
c6a2d1fe0a Fix stale lock issue causing scheduled tasks to stop (#4539)
## Problem
Scheduled tasks, backups, and auto-updates stopped working after 1-2 months
with error: MaxAttemptsExceededException: App\Jobs\ScheduledJobManager has
been attempted too many times.

Root cause: ScheduledJobManager used WithoutOverlapping with only
releaseAfter(60), causing locks without expiration (TTL=-1) that persisted
indefinitely when jobs hung or processes crashed.

## Solution

### Part 1: Prevention (Future Locks)
- Added expireAfter(60) to ScheduledJobManager middleware
- Lock now auto-expires after 60 seconds (matches everyMinute schedule)
- Changed from releaseAfter(60) to expireAfter(60)->dontRelease()
- Follows Laravel best practices and matches other Coolify jobs

### Part 2: Recovery (Existing Locks)
- Enhanced cleanup:redis command with --clear-locks flag
- Scans Redis for stale locks (TTL=-1) and removes them
- Called automatically during app:init on startup/upgrade
- Provides immediate recovery for affected instances

## Changes
- app/Jobs/ScheduledJobManager.php: Added expireAfter(60)->dontRelease()
- app/Console/Commands/CleanupRedis.php: Added cleanupCacheLocks() method
- app/Console/Commands/Init.php: Auto-clear locks on startup
- tests/Unit/ScheduledJobManagerLockTest.php: Test to prevent regression
- STALE_LOCK_FIX.md: Complete documentation

## Testing
- Unit tests pass (2 tests, 8 assertions)
- Code formatted with Pint
- Matches pattern used by CleanupInstanceStuffsJob

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 10:07:33 +02:00
Andras Bacsai
ed93031a39 feat(docker): implement Docker cleanup processing in ScheduledJobManager; refactor server task scheduling to streamline cleanup job dispatching 2025-08-26 14:43:57 +02:00
Andras Bacsai
11341d7c2c refactor(jobs): remove logging for ScheduledJobManager and ServerResourceManager start and completion 2025-07-18 23:15:51 +02:00
Andras Bacsai
80fae306e6 feat(scheduling): introduce ScheduledJobManager and ServerResourceManager for enhanced job scheduling and resource management 2025-07-12 14:44:44 +02:00