Remove temporary documentation file

Andras Bacsai 2025-10-23 10:18:31 +02:00
parent c6a2d1fe0a
commit 2b3892beee

@@ -1,154 +0,0 @@
# Fix for Stale Lock Issue in ScheduledJobManager
## Issue
GitHub Issue: #4539 - Scheduled tasks not executing on schedule
### Symptoms
- Scheduled tasks stop executing after working for weeks/months
- Backups don't run
- Auto-updates don't work
- Error in Horizon: `Illuminate\Queue\MaxAttemptsExceededException: App\Jobs\ScheduledJobManager has been attempted too many times`
- Running `horizon:clear`, `cleanup:redis`, `schedule:clear-cache` doesn't fix the problem
## Root Cause
The `ScheduledJobManager` was using `WithoutOverlapping` middleware with only `releaseAfter(60)`:
```php
(new WithoutOverlapping('scheduled-job-manager'))
->releaseAfter(60)
```
**Problems with this approach:**
1. **No automatic lock expiration**: Without `expireAfter()`, a lock persists indefinitely if:
   - The process hangs or becomes unresponsive
   - The job takes longer than expected
   - The process terminates unexpectedly
2. **Retry loop with `releaseAfter()`**:
   - Job acquires the lock
   - Job hangs while still holding it
   - The next attempt can't acquire the lock (still held by the hung process), so it is released back to the queue with a 60s delay
   - Each release counts as an attempt, so this repeats until `MaxAttemptsExceededException`
3. **Against Laravel guidance**: The Laravel docs describe `expireAfter()` as the way to keep a failed or timed-out job from leaving a stale lock behind
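The failure sequence in item 2 can be reproduced with a small simulation. This is a minimal sketch in plain PHP, not Laravel's lock implementation: `FakeLockStore` and `firstRecoveryMinute()` are hypothetical names, and a TTL of `0` is assumed to mean "never expires", mirroring a lock created without `expireAfter()`.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for a cache lock store (illustration only).
final class FakeLockStore
{
    /** @var array<string, int|null> key => expiry timestamp, null = never expires */
    private array $locks = [];

    // Try to take the lock at time $now. $ttl = 0 means the lock never expires.
    public function acquire(string $key, int $now, int $ttl): bool
    {
        if (array_key_exists($key, $this->locks)) {
            $expiresAt = $this->locks[$key];
            if ($expiresAt === null || $expiresAt > $now) {
                return false; // lock is still held
            }
        }
        $this->locks[$key] = $ttl > 0 ? $now + $ttl : null;
        return true;
    }
}

// One job takes the lock at t=0 and hangs (never releases it).
// A new attempt then runs every 60 seconds, up to $maxAttempts times.
// Returns the minute at which an attempt first gets the lock, or null if none does.
function firstRecoveryMinute(int $ttl, int $maxAttempts): ?int
{
    $store = new FakeLockStore();
    $store->acquire('scheduled-job-manager', 0, $ttl);

    for ($minute = 1; $minute <= $maxAttempts; $minute++) {
        if ($store->acquire('scheduled-job-manager', $minute * 60, $ttl)) {
            return $minute;
        }
    }
    return null; // every attempt was blocked
}

var_dump(firstRecoveryMinute(0, 5));  // lock without expiry
var_dump(firstRecoveryMinute(60, 5)); // lock with a 60-second expiry
```

With no expiry, every attempt is blocked and the function returns `null`. With a 60-second expiry, the first follow-up attempt already succeeds.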
## Solution
This fix has two parts:
### Part 1: Prevention (Fix Future Locks)
Changed the middleware to match the pattern used by other Coolify jobs:
```php
// File: app/Jobs/ScheduledJobManager.php
(new WithoutOverlapping('scheduled-job-manager'))
->expireAfter(60) // Lock expires after 1 minute (matches job frequency)
->dontRelease() // Don't re-queue on lock conflict
```
### Part 2: Recovery (Clear Existing Stale Locks)
Enhanced `cleanup:redis` command to clear existing stale locks:
```bash
# File: app/Console/Commands/CleanupRedis.php
# Added --clear-locks flag
php artisan cleanup:redis --clear-locks
```
**What it does:**
- Scans Redis for `laravel-queue-overlap` keys (WithoutOverlapping locks)
- Checks TTL of each lock
- Deletes locks with TTL = -1 (no expiration = stale!)
- Skips active locks that have proper expiration
- Called automatically during `app:init` (on Coolify startup/update)
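The selection rule (delete only overlap locks whose TTL is `-1`) can be sketched as a pure function. This is an illustration under assumptions, not the actual `cleanupCacheLocks()` code: `findStaleOverlapLocks()` is a hypothetical name, the TTLs are passed in as an array instead of being read from Redis, and `str_contains()` requires PHP 8.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pick the keys that should be deleted.
 *
 * @param array<string, int> $ttlsByKey Redis key => value reported by the TTL command
 * @return list<string> overlap-lock keys that have no expiration
 */
function findStaleOverlapLocks(array $ttlsByKey): array
{
    $stale = [];
    foreach ($ttlsByKey as $key => $ttl) {
        // Only WithoutOverlapping locks are candidates.
        if (!str_contains((string) $key, 'laravel-queue-overlap')) {
            continue;
        }
        // TTL -1 means the key exists but will never expire: treat as stale.
        if ($ttl === -1) {
            $stale[] = (string) $key;
        }
    }
    return $stale;
}

// Made-up sample data for illustration.
$sample = [
    'cache_laravel-queue-overlap:App\Jobs\ScheduledJobManager:scheduled-job-manager' => -1,
    'cache_laravel-queue-overlap:App\Jobs\ServerCheckJob:server-check' => 42,
    'cache_some-other-key' => -1,
];

print_r(findStaleOverlapLocks($sample));
```

Only the first sample key is selected: the second is an active lock with a positive TTL, and the third is not an overlap lock.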
### Why This Works
**Auto-expiring locks**: Lock automatically expires after 60 seconds, even if:
- Process crashes
- Job hangs
- Network issues occur
**No retry storms**: `dontRelease()` deletes a job that cannot acquire the lock instead of releasing it back to the queue, so blocked attempts are not re-queued repeatedly
**Consistent pattern**: Matches other Coolify jobs like:
- `DockerCleanupJob`: `expireAfter(600)->dontRelease()`
- `ServerCheckJob`: `expireAfter(60)->dontRelease()`
- `RestartProxyJob`: `expireAfter(60)->dontRelease()`
**Laravel recommended**: Follows official Laravel documentation for preventing stale locks
### Why 60 Seconds?
- Job runs **every minute** (`everyMinute()` schedule)
- Matches the job frequency (1:1 ratio)
- Matches `CleanupInstanceStuffsJob` pattern (also runs frequently with 60s expiry)
- Allows the next cycle to run even if the current job hangs
- Short enough that a hung run cannot hold the lock for long
## Testing
### Manual Lock Key Inspection
To check for locks in Redis:
```bash
docker exec -it coolify-redis redis-cli
SELECT 0
KEYS *laravel-queue-overlap*ScheduledJobManager*
```
Full key format:
```
coolify_development_database_coolify_development_cache_laravel-queue-overlap:App\Jobs\ScheduledJobManager:scheduled-job-manager
```
Check TTL:
```bash
TTL "<full-key-from-above>"
```
- `-1` = No expiration (STALE LOCK - the bug!)
- `-2` = Key doesn't exist
- Positive number = Seconds until expiration (GOOD!)
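The three cases can also be expressed as a small lookup, for example when scripting the check. A minimal sketch: `describeLockTtl()` is a hypothetical helper, and the integer is assumed to be the raw reply of the Redis `TTL` command.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: maps a Redis TTL reply to the cases listed above.
function describeLockTtl(int $ttl): string
{
    return match (true) {
        $ttl === -1 => 'stale: key exists but has no expiration',
        $ttl === -2 => 'absent: key does not exist',
        $ttl >= 0   => "healthy: expires in {$ttl}s",
        default     => 'unexpected TTL value',
    };
}

foreach ([-1, -2, 45] as $ttl) {
    echo $ttl, ' => ', describeLockTtl($ttl), PHP_EOL;
}
```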
### Testing the Fix
Created test jobs to demonstrate the fix:
- `TestStaleLockJob.php` - Uses broken pattern (`releaseAfter` only)
- `TestFixedLockJob.php` - Uses fixed pattern (`expireAfter` + `dontRelease`)
## Impact
This fix will:
- ✅ **Immediate recovery**: Existing stale locks cleared on upgrade/restart
- ✅ **Future prevention**: New locks auto-expire, preventing issue recurrence
- ✅ **Self-recovery**: System can recover from transient issues automatically
- ✅ **Zero manual intervention**: No need for users to manually clear locks
- ✅ **Reliable operations**: Backups, tasks, and auto-updates run consistently
## Files Modified
1. **app/Jobs/ScheduledJobManager.php**
   - Changed middleware to use `expireAfter(60)->dontRelease()`
2. **app/Console/Commands/CleanupRedis.php**
   - Added `--clear-locks` flag
   - Added `cleanupCacheLocks()` method
3. **app/Console/Commands/Init.php**
   - Updated to call `cleanup:redis --clear-locks` on startup
4. **tests/Unit/ScheduledJobManagerLockTest.php**
   - New unit test to prevent regression
## References
- Laravel Docs: https://laravel.com/docs/12.x/queues#preventing-job-overlaps
- GitHub Issue: https://github.com/coollabsio/coolify/issues/4539
- Related Pattern: All other Coolify jobs use `expireAfter()->dontRelease()`