33 commits

Author SHA1 Message Date
Andras Bacsai
d2d9c1b2bc debug: add comprehensive status change logging
Added detailed debug logging to all status update paths to help
diagnose why "unhealthy" status appears in the UI.

## Logging Added

### 1. PushServerUpdateJob (Sentinel updates)
**Location**: Lines 303-315
**Logs**: Status changes from Sentinel push updates
**Data tracked**:
- Old vs new status
- Container statuses that led to aggregation
- Status flags (hasRunning, hasUnhealthy, hasUnknown)

### 2. GetContainersStatus (SSH updates)
**Location**: Lines 441-449, 346-354, 358-365
**Logs**: Status changes from SSH-based checks
**Scenarios**:
- Normal status aggregation
- Recently restarted containers (kept as degraded)
- Applications not running (set to exited)
**Data tracked**:
- Old vs new status
- Container statuses
- Restart count and timing
- Whether containers exist

### 3. Application Model Status Accessor
**Location**: Lines 706-712, 726-732
**Logs**: When status is set without explicit health information
**Issue**: Highlights cases where health defaults to "unhealthy"
**Data tracked**:
- Raw value passed to setter
- Final result after default applied
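
The defaulting behaviour this accessor logging is meant to expose can be sketched as follows. This is only an illustration: `applyDefaultHealth` is a hypothetical helper, not the actual Application model code, and it assumes statuses are stored as `state:health` (as in the sample log entry in this message).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT the actual Application model accessor.
 * Assumption: a status is stored as "state:health" (for example
 * "running:healthy"), matching the sample log entry in this message.
 */
function applyDefaultHealth(string $raw, string $defaultHealth = 'unhealthy'): array
{
    $value = trim($raw);
    $hasHealth = str_contains($value, ':');

    return [
        // Raw value passed to the setter.
        'raw' => $raw,
        // Final result after the default is applied (if it was needed).
        'final' => $hasHealth ? $value : "{$value}:{$defaultHealth}",
        // True when no explicit health information was supplied.
        'defaulted' => ! $hasHealth,
    ];
}

$result = applyDefaultHealth('running');
// $result['final'] === 'running:unhealthy', $result['defaulted'] === true
```

Logging both `raw` and `final` is what makes a silently applied default visible in the log.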

## How to Use

### Enable Debug Logging
Set the log level to debug in `.env` (or adjust the channel level in `config/logging.php`):
```
LOG_LEVEL=debug
```

### Monitor Logs
```bash
tail -f storage/logs/laravel.log | grep STATUS-DEBUG
```

### Log Format
All logs use `[STATUS-DEBUG]` prefix for easy filtering:
```
[2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change
{
  "source": "PushServerUpdateJob",
  "app_id": 123,
  "app_name": "my-app",
  "old_status": "running:unknown",
  "new_status": "running:healthy",
  "container_statuses": [...],
  "flags": {...}
}
```
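
As a rough sketch of how such an entry could be assembled, the standalone helper below builds a single-line version of the message. `formatStatusDebug` is a hypothetical name. In a Laravel application the message and context array would normally be passed to `Log::debug()`, which handles the timestamp and channel prefix.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper that renders one [STATUS-DEBUG] entry as a single line.
 * In a Laravel app the message and context would normally go to
 * Log::debug($message, $context); this standalone version only builds the text.
 */
function formatStatusDebug(string $message, array $context): string
{
    // JSON_THROW_ON_ERROR surfaces unencodable context instead of returning false.
    $json = json_encode($context, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);

    return "[STATUS-DEBUG] {$message} {$json}";
}

$line = formatStatusDebug('Sentinel status change', [
    'source' => 'PushServerUpdateJob',
    'app_id' => 123,
    'app_name' => 'my-app',
    'old_status' => 'running:unknown',
    'new_status' => 'running:healthy',
]);

echo $line, PHP_EOL;
```

Keeping the prefix at the start of the message is what lets the `grep STATUS-DEBUG` filter above work.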

## What to Look For

1. **Default to unhealthy**: Check Application model accessor logs
2. **Status flipping**: Compare timestamps between Sentinel and SSH updates
3. **Incorrect aggregation**: Check flags and container_statuses
4. **Stale database values**: Check if old_status persists across multiple logs
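
The "status flipping" check in particular can be automated once the entries are parsed. The sketch below is one possible approach, not part of the commit. `findStatusFlips` is a hypothetical helper that assumes the entries are already decoded into arrays, are in chronological order, and carry the fields shown in the sample log entry.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical analysis helper. Assumes $entries is in chronological order and
 * that each entry carries the fields shown in the sample log entry
 * (source, app_id, old_status, new_status).
 *
 * Reports a flip when one source changes A -> B and the next entry for the
 * same app, coming from a different source, changes B -> A.
 */
function findStatusFlips(array $entries): array
{
    $lastByApp = [];
    $flips = [];

    foreach ($entries as $entry) {
        $appId = $entry['app_id'];
        $prev = $lastByApp[$appId] ?? null;

        if ($prev !== null
            && $prev['source'] !== $entry['source']
            && $prev['old_status'] === $entry['new_status']
            && $prev['new_status'] === $entry['old_status']
        ) {
            $flips[] = [
                'app_id' => $appId,
                'sources' => [$prev['source'], $entry['source']],
                'statuses' => [$entry['old_status'], $entry['new_status']],
            ];
        }

        // Remember the latest entry per application for the next comparison.
        $lastByApp[$appId] = $entry;
    }

    return $flips;
}

$flips = findStatusFlips([
    ['source' => 'PushServerUpdateJob', 'app_id' => 123, 'old_status' => 'running:unknown', 'new_status' => 'running:healthy'],
    ['source' => 'GetContainersStatus', 'app_id' => 123, 'old_status' => 'running:healthy', 'new_status' => 'running:unknown'],
]);
// $flips holds one entry for app 123, naming both sources.
```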

## Next Steps

After gathering logs, we can:
1. Identify the exact source of "unhealthy" status
2. Determine if it's a default issue, aggregation bug, or timing problem
3. Apply targeted fix based on evidence

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:52:08 +01:00
Andras Bacsai
6b62847a11 fix: preserve unknown health status in Sentinel updates (PushServerUpdateJob)
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.

## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".

When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"

## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:

1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy

This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic
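
A minimal sketch of the 3-way priority described above, for illustration only. `aggregateRunningHealth` is a hypothetical function, not the body of `aggregateMultiContainerStatuses()`. It assumes container statuses arrive as strings such as "running (unknown)" and ignores every non-running state.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the unhealthy > unknown > healthy priority.
 * Assumption: each container status is a string like "running (healthy)".
 * Non-running states are out of scope here and yield null.
 */
function aggregateRunningHealth(array $containerStatuses): ?string
{
    $hasRunning = false;
    $hasUnhealthy = false;
    $hasUnknown = false;

    foreach ($containerStatuses as $status) {
        if (! str_contains($status, 'running')) {
            continue;
        }
        $hasRunning = true;

        // Test "unhealthy" first: the word "healthy" is a substring of it.
        if (str_contains($status, 'unhealthy')) {
            $hasUnhealthy = true;
        } elseif (str_contains($status, 'unknown')) {
            $hasUnknown = true;
        }
    }

    if (! $hasRunning) {
        return null;
    }
    if ($hasUnhealthy) {
        return 'running (unhealthy)';
    }
    if ($hasUnknown) {
        return 'running (unknown)';
    }

    return 'running (healthy)';
}

$status = aggregateRunningHealth(['running (healthy)', 'running (unknown)']);
// $status === 'running (unknown)'
```

Without the `$hasUnknown` branch, the last example would fall through to "running (healthy)", which is the behaviour described under Root Cause.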

## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation

## Testing
All 31 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:40:58 +01:00
Andras Bacsai
08d257535a fix(docker): enhance container status aggregation for multi-container applications, including exclusion handling based on docker-compose configuration 2025-09-13 20:32:15 +02:00
Andras Bacsai
0f5c988658 fix(horizon): add silenced jobs 2025-07-12 14:44:32 +02:00
Andras Bacsai
24688b2ad8 fix(jobs): update middleware to use expireAfter for WithoutOverlapping in multiple job classes 2025-07-01 10:50:27 +02:00
Andras Bacsai
f9a0ca2ca6 refactor(proxy): update StartProxy calls to use named parameter for async option 2025-06-16 13:13:01 +02:00
Andras Bacsai
ddcb14500d refactor(proxy-status): refactored how the proxy status is handled on the UI and on the backend
feat(cloudflare): improved cloudflare tunnel automated installation
2025-06-06 14:47:54 +02:00
Andras Bacsai
97ec579910 refactor(push-server-update): enhance application preview handling by incorporating pull request IDs and adding status update protections 2025-06-04 10:03:36 +02:00
Andras Bacsai
9883cef26d refactor(jobs): update middleware to include job-specific identifiers for WithoutOverlapping 2025-05-29 17:31:43 +02:00
Andras Bacsai
0369909408 fix(PushServerUpdateJob): add null checks before updating application and database statuses 2025-05-29 10:47:26 +02:00
Andras Bacsai
c6278a06ba refactor(jobs): unify middleware configuration to prevent job release after expiration for DockerCleanupJob and PushServerUpdateJob 2025-05-07 14:42:42 +02:00
Andras Bacsai
b78f2cccff refactor(jobs): update WithoutOverlapping middleware to use expireAfter for better queue management 2025-04-18 09:52:32 +02:00
Andras Bacsai
b09f0043d1 fix: restrict jobs on cloud
fix: restrict sentinel endpoint
2025-01-10 11:54:45 +01:00
Andras Bacsai
7dc65dfd79 fix: make sure important jobs/actions are running on high prio queue 2024-11-22 11:16:01 +01:00
Andras Bacsai
275edb6c1f put a few things on high queue 2024-11-06 12:33:56 +01:00
Lucas Michot
8e1444eaa7 Get rid of many useless blank lines 2024-10-31 17:44:01 +01:00
Andras Bacsai
96ca72fcdb refactor server view (phuuu) 2024-10-30 20:03:30 +01:00
Lucas Michot
5b6e466e0c Remove some useless catch blocks 2024-10-28 14:37:00 +01:00
Lucas Michot
d557a22b91 Remove all ray() calls 2024-10-28 13:51:23 +01:00
Andras Bacsai
8c96ab52d7 feat: notification rate limiter
fix: limit server up / down notification limits
2024-10-25 15:13:23 +02:00
Andras Bacsai
621e063bf1 Refactor PushServerUpdateJob to implement ShouldBeEncrypted interface 2024-10-24 15:16:00 +02:00
Andras Bacsai
ac768e5313 feat: limit storage check emails
feat: sentinel should send storage usage
2024-10-22 14:01:36 +02:00
Andras Bacsai
537630acc6 Refactor PushServerUpdateJob to handle container restart notifications 2024-10-22 11:42:24 +02:00
Andras Bacsai
d7efe8a6d1 fix: no sentinel for swarm yet 2024-10-22 11:29:43 +02:00
Andras Bacsai
4c95647b96 feat: cleanup sentinel on server deletion
fix: Sentinel should not be enabled on build servers
2024-10-17 11:21:43 +02:00
Andras Bacsai
2702fbc284 Refactor logging in PushServerUpdateJob, Application, and SentinelSeeder 2024-10-15 17:03:50 +02:00
Andras Bacsai
d446cd4f31 sentinel updates 2024-10-15 13:39:19 +02:00
Andras Bacsai
81db57002b Refactor PushServerUpdateJob to handle multiple servers, previews, and emails 2024-10-14 22:53:16 +02:00
Andras Bacsai
fdeb9353be chore: Update project service configuration view 2024-10-14 19:45:03 +02:00
Andras Bacsai
1f72321681 fix: sentinel 2024-10-14 18:04:36 +02:00
Andras Bacsai
8a2c9f3d44 updates sentinel 2024-10-14 17:54:29 +02:00
Andras Bacsai
b2e515f770 sentinel 2024-10-14 13:32:36 +02:00
Andras Bacsai
1f193d465d sentinel updates 2024-10-14 12:07:37 +02:00