Remove temporary documentation file

Parent: c6a2d1fe0a
Commit: 2b3892beee

1 changed file with 0 additions and 154 deletions
# Fix for Stale Lock Issue in ScheduledJobManager

## Issue
GitHub Issue: #4539 - Scheduled tasks not executing on schedule

### Symptoms
- Scheduled tasks stop executing after working normally for weeks or months
- Backups don't run
- Auto-updates don't work
- Error in Horizon: `Illuminate\Queue\MaxAttemptsExceededException: App\Jobs\ScheduledJobManager has been attempted too many times`
- Running `horizon:clear`, `cleanup:redis`, or `schedule:clear-cache` doesn't fix the problem

## Root Cause

The `ScheduledJobManager` was using the `WithoutOverlapping` middleware with only `releaseAfter(60)`:

```php
(new WithoutOverlapping('scheduled-job-manager'))
    ->releaseAfter(60)
```

**Problems with this approach:**

1. **No automatic lock expiration**: Without `expireAfter()`, the lock has no TTL and is freed only when the job releases it after finishing. Nothing else clears it if:
   - The process hangs or becomes unresponsive
   - The job takes much longer than expected
   - The process is terminated unexpectedly before releasing the lock

2. **Retry storm with `releaseAfter()`**:
   - A job acquires the lock
   - The job gets stuck (hangs) while holding it
   - Each later attempt cannot acquire the lock (it is still held by the hung process), so it is released back to the queue to retry 60 seconds later
   - Every retry counts as an attempt, so this repeats until `MaxAttemptsExceededException` is thrown

3. **Not following Laravel's guidance**: The Laravel documentation warns that a job may fail or time out in a way that leaves its lock unreleased, and suggests setting an explicit lock expiration time with `expireAfter()`
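Items 1 and 2 describe a failure mode that a small in-memory model can reproduce. This is a hypothetical plain-PHP sketch, not Laravel's implementation. `NoExpiryLock` stands in for a lock created without `expireAfter()`, and `simulateReleaseAfter()` plays the queue re-running a released job. The limit of three attempts is an arbitrary assumption.

```php
<?php

// Hypothetical stand-in for a WithoutOverlapping lock created without
// expireAfter(): once taken, only its holder could free it, and a hung
// holder never does. This is a model, not Laravel code.
final class NoExpiryLock
{
    private ?string $owner = null;

    public function acquire(string $owner): bool
    {
        if ($this->owner !== null) {
            return false; // still held; there is no TTL to clear it
        }
        $this->owner = $owner;

        return true;
    }
}

// Models releaseAfter($delay): each attempt that misses the lock is put back
// on the queue, and every retry counts toward $maxTries.
function simulateReleaseAfter(NoExpiryLock $lock, int $maxTries, int $delay): array
{
    $log = [];
    $t = 0;
    for ($attempt = 1; $attempt <= $maxTries; $attempt++) {
        if ($lock->acquire("attempt-{$attempt}")) {
            $log[] = "t={$t}s attempt {$attempt}: acquired lock";

            return $log;
        }
        $log[] = "t={$t}s attempt {$attempt}: lock held, released back to queue";
        $t += $delay;
    }
    $log[] = 'MaxAttemptsExceededException';

    return $log;
}

$lock = new NoExpiryLock();
$lock->acquire('hung-process'); // the original run hangs while holding the lock

foreach (simulateReleaseAfter($lock, maxTries: 3, delay: 60) as $line) {
    echo $line, PHP_EOL;
}
```

Run as a script, this prints three "lock held" lines followed by `MaxAttemptsExceededException`. Nothing ever clears the lock, so raising the retry limit only postpones the failure.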
## Solution

This fix has two parts:

### Part 1: Prevention (Fix Future Locks)

Changed the middleware to match the pattern used by other Coolify jobs:

```php
// File: app/Jobs/ScheduledJobManager.php
(new WithoutOverlapping('scheduled-job-manager'))
    ->expireAfter(60) // Lock expires after 1 minute (matches job frequency)
    ->dontRelease()   // Don't re-queue on lock conflict
```

### Part 2: Recovery (Clear Existing Stale Locks)

Enhanced the `cleanup:redis` command to clear existing stale locks:

```bash
# New --clear-locks flag, implemented in app/Console/Commands/CleanupRedis.php
php artisan cleanup:redis --clear-locks
```

**What it does:**
- Scans Redis for `laravel-queue-overlap` keys (`WithoutOverlapping` locks)
- Checks the TTL of each lock
- Deletes locks with a TTL of `-1` (no expiration, so the lock is stale)
- Skips active locks that have a proper expiration
- Runs automatically during `app:init` (on Coolify startup and update)
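The selection rule in the list above can be sketched as a pure function. This is a hypothetical illustration, not the actual `cleanupCacheLocks()` method. Instead of scanning a live Redis server, it takes an array that maps key names to the value Redis `TTL` returned for them. The sample keys are made up.

```php
<?php

// Hypothetical sketch of the stale-lock rule: given key => TTL pairs, return
// the WithoutOverlapping lock keys that should be deleted.
function findStaleOverlapLocks(array $ttlByKey): array
{
    $stale = [];
    foreach ($ttlByKey as $key => $ttl) {
        if (!str_contains((string) $key, 'laravel-queue-overlap')) {
            continue; // not a WithoutOverlapping lock; leave it alone
        }
        if ($ttl === -1) {
            $stale[] = $key; // exists but never expires: stale
        }
        // $ttl >= 0: active lock that will expire on its own, so skip it.
        // $ttl === -2: key disappeared after the scan, nothing to delete.
    }

    return $stale;
}

// Made-up sample data in the key => TTL shape described above.
$ttls = [
    'prefix_laravel-queue-overlap:App\Jobs\ScheduledJobManager:scheduled-job-manager' => -1,
    'prefix_laravel-queue-overlap:App\Jobs\ServerCheckJob:example' => 42,
    'prefix_unrelated-key' => -1,
];

foreach (findStaleOverlapLocks($ttls) as $key) {
    echo "would delete: {$key}", PHP_EOL;
}
```

Only the `ScheduledJobManager` key is reported. The `ServerCheckJob` lock still has a TTL, and `prefix_unrelated-key` is not an overlap lock. The sketch uses `str_contains()`, so it needs PHP 8.0 or later.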
### Why This Works

✅ **Auto-expiring locks**: The lock automatically expires after 60 seconds, even if:
- The process crashes
- The job hangs
- Network issues occur

✅ **No retry storms**: With `dontRelease()`, a job that cannot acquire the lock is dropped instead of being released back onto the queue. Contended attempts therefore no longer pile up toward `MaxAttemptsExceededException`, and the next scheduled dispatch tries again a minute later

✅ **Consistent pattern**: Matches other Coolify jobs such as:
- `DockerCleanupJob`: `expireAfter(600)->dontRelease()`
- `ServerCheckJob`: `expireAfter(60)->dontRelease()`
- `RestartProxyJob`: `expireAfter(60)->dontRelease()`

✅ **Laravel recommended**: Follows the Laravel documentation's guidance on setting a lock expiration time to avoid stale locks

### Why 60 Seconds?

- The job runs **every minute** (`everyMinute()` schedule), so the lock lifetime matches the job frequency 1:1
- Matches the `CleanupInstanceStuffsJob` pattern (which also runs frequently with a 60s expiry)
- If the current run hangs, its lock has expired by the time the next cycle is dispatched, so that cycle can still run
- Still short enough to prevent long-held locks
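A small timeline model can check the 1:1 ratio between lock lifetime and schedule. This hypothetical plain-PHP sketch is not Laravel code. It assumes the worst case, where every run hangs and never releases its lock, and it dispatches one run per minute. A run that misses the lock is dropped, mirroring `dontRelease()`.

```php
<?php

// Hypothetical lock with a TTL, modelling expireAfter($ttl). Time is passed
// in explicitly (in seconds) so the timeline is deterministic.
final class ExpiringLock
{
    private bool $held = false;
    private int $expiresAt = 0;

    public function acquire(int $now, int $ttl): bool
    {
        if ($this->held && $now < $this->expiresAt) {
            return false; // held and not yet expired
        }
        $this->held = true;
        $this->expiresAt = $now + $ttl;

        return true;
    }
}

// Dispatch one run per minute (everyMinute()). Every run is assumed to hang,
// so the only way the lock is freed is by expiring.
function runSchedule(int $minutes, int $expireAfter): array
{
    $lock = new ExpiringLock();
    $log = [];
    for ($minute = 0; $minute < $minutes; $minute++) {
        $now = $minute * 60;
        $ran = $lock->acquire($now, $expireAfter);
        // dontRelease(): a run that misses the lock is skipped, not re-queued.
        $log[] = $ran ? "t={$now}s: ran" : "t={$now}s: skipped (lock held)";
    }

    return $log;
}

echo implode(PHP_EOL, runSchedule(3, 60)), PHP_EOL;
echo implode(PHP_EOL, runSchedule(5, 180)), PHP_EOL;
```

With a 60-second expiry, all three cycles run even though every earlier run is stuck. With a hypothetical 180-second expiry, the cycles dispatched while a stuck run's lock is still alive are skipped.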
## Testing

### Manual Lock Key Inspection

To check for locks in Redis:

```bash
# Open a redis-cli session inside the Redis container
docker exec -it coolify-redis redis-cli

# Then, at the redis-cli prompt:
SELECT 0
KEYS *laravel-queue-overlap*ScheduledJobManager*
```

Full key format (example from a development instance):
```
coolify_development_database_coolify_development_cache_laravel-queue-overlap:App\Jobs\ScheduledJobManager:scheduled-job-manager
```
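The full key is a concatenation of several prefixes. The sketch below is inferred only from the example key above. It assumes the key is built as Redis connection prefix + cache prefix + `laravel-queue-overlap:` + job class + `:` + the key passed to `WithoutOverlapping`. The function name and the split between the two prefixes are assumptions, not Coolify code.

```php
<?php

// Hypothetical helper that assembles the Redis key for a WithoutOverlapping
// lock. The prefix split (Redis connection prefix, then cache prefix) is an
// assumption inferred from the example key, not taken from Coolify's config.
function overlapLockKey(string $redisPrefix, string $cachePrefix, string $jobClass, string $lockKey): string
{
    return $redisPrefix . $cachePrefix . 'laravel-queue-overlap:' . $jobClass . ':' . $lockKey;
}

echo overlapLockKey(
    'coolify_development_database_',
    'coolify_development_cache_',
    'App\Jobs\ScheduledJobManager',
    'scheduled-job-manager',
), PHP_EOL;
```

The output matches the development key shown above. The prefixes on another instance may differ.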
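Check the TTL (at the same redis-cli prompt):
```bash
# Single quotes keep the backslashes in the job class name; inside double
# quotes redis-cli treats a backslash as an escape character and drops it
TTL '<full-key-from-above>'
```

- `-1`: the key exists but has no expiration (a stale lock, i.e. the bug)
- `-2`: the key doesn't exist
- Positive number: seconds until the lock expires (healthy)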
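A script that checks many keys can map the `TTL` reply to these three cases. This is a minimal hypothetical helper in plain PHP. It takes the integer reply as input and does not talk to Redis. It treats any non-negative reply, including `0`, as a key that still has an expiry set.

```php
<?php

// Hypothetical helper: classify the integer reply of the Redis TTL command
// for a WithoutOverlapping lock key.
function describeLockTtl(int $ttl): string
{
    return match (true) {
        $ttl === -1 => 'stale: key has no expiration',
        $ttl === -2 => 'missing: key does not exist',
        $ttl >= 0 => "ok: expires in {$ttl}s",
        default => 'unexpected TTL reply',
    };
}

foreach ([-1, -2, 45] as $reply) {
    echo $reply, ' => ', describeLockTtl($reply), PHP_EOL;
}
```

The `match` expression requires PHP 8.0 or later.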
### Testing the Fix

Two test jobs were created to demonstrate the fix:
- `TestStaleLockJob.php`: uses the broken pattern (`releaseAfter` only)
- `TestFixedLockJob.php`: uses the fixed pattern (`expireAfter` + `dontRelease`)

## Impact

This fix will:
- ✅ **Immediate recovery**: Existing stale locks are cleared on upgrade or restart
- ✅ **Future prevention**: New locks auto-expire, preventing the issue from recurring
- ✅ **Self-recovery**: The system can recover from transient issues automatically
- ✅ **Zero manual intervention**: Users no longer need to clear locks manually
- ✅ **Reliable operations**: Backups, scheduled tasks, and auto-updates run on schedule
## Files Modified

1. **app/Jobs/ScheduledJobManager.php**
   - Changed middleware to use `expireAfter(60)->dontRelease()`
2. **app/Console/Commands/CleanupRedis.php**
   - Added `--clear-locks` flag
   - Added `cleanupCacheLocks()` method

3. **app/Console/Commands/Init.php**
   - Updated to call `cleanup:redis --clear-locks` on startup

4. **tests/Unit/ScheduledJobManagerLockTest.php**
   - New unit test to prevent regressions

## References

- Laravel Docs: https://laravel.com/docs/12.x/queues#preventing-job-overlaps
- GitHub Issue: https://github.com/coollabsio/coolify/issues/4539
- Related Pattern: Other Coolify jobs (e.g. `DockerCleanupJob`, `ServerCheckJob`, `RestartProxyJob`) use `expireAfter()->dontRelease()`