Commit graph

36 commits

Author SHA1 Message Date
Andras Bacsai
ae6eef3cdb feat(tests): add comprehensive tests for ContainerStatusAggregator and serverStatus accessor
- Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states.
- Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status.
- Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components.
- Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere.
- Updated version number for sentinel to 0.0.17 in versions.json.
2025-11-20 17:31:07 +01:00
Andras Bacsai
14bba8ba86 fix: correct Sentinel default health status and remove debug logging
This commit addresses container status reporting issues and removes debug logging:

**Primary Fix:**
- Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data
- This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy"
- Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown'

**Service Multi-Container Aggregation:**
- Implemented service container status aggregation (same pattern as applications)
- Added serviceContainerStatuses collection to both Sentinel and SSH paths
- Services now aggregate status using priority: unhealthy > unknown > healthy
- Prevents race conditions where last-processed container would win

**Debug Logging Cleanup:**
- Removed all [STATUS-DEBUG] logging statements (25 total)
- Removed all ray() debugging calls (3 total)
- Removed proof_unknown_preserved and health_status_was_null debug fields
- Code is now production-ready

**Test Coverage:**
- Added 2 new tests for Sentinel default health status behavior
- Added 5 new tests for service aggregation in SSH path
- All 16 tests pass (66 assertions)

**Note:** The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 11:10:34 +01:00
Andras Bacsai
747a48b933 debug: add detailed Sentinel container processing logging
Added comprehensive logging to track why applicationContainerStatuses
collection is empty in PushServerUpdateJob.

## Logging Added

### 1. Raw Sentinel Data (line 113-118)
**Logs**: Complete container data received from Sentinel
**Purpose**: See exactly what Sentinel is sending
**Data**: Container count and full container array with all labels

### 2. Container Processing Loop (line 157-163)
**Logs**: Every container as it's being processed
**Purpose**: Track which containers enter the processing loop
**Data**: Container name, status, all labels, coolify.managed flag

### 3. Skipped Containers - Not Managed (line 165-171)
**Logs**: Containers without coolify.managed label
**Purpose**: Identify containers being filtered out early
**Data**: Container name

### 4. Successful Container Addition (line 193-198)
**Logs**: When container is successfully added to applicationContainerStatuses
**Purpose**: Confirm containers ARE being processed
**Data**: Application ID, container name, container status

### 5. Missing com.docker.compose.service Label (line 200-206)
**Logs**: Containers skipped due to missing com.docker.compose.service
**Purpose**: Identify the most likely root cause
**Data**: Container name, application ID, all labels

## Why This Matters

User reported applicationContainerStatuses is empty (`[]`) even though
Sentinel is pushing updates. This logging will reveal:

1. Is Sentinel sending containers at all?
2. Are containers filtered by coolify.managed check?
3. Is com.docker.compose.service label missing? (most likely)
4. What labels IS Sentinel actually sending?

## Expected Findings

Based on investigation, the issue is likely:
- Sentinel is NOT sending com.docker.compose.service in labels
- Or Sentinel uses a different label format/name
- Containers pass all other checks but fail on line 190-206

## Next Steps

After logs appear, we'll see exactly which filter is blocking containers
and can fix the root cause (likely need to extract com.docker.compose.service
from container name or use a different label source).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 08:34:42 +01:00
Andras Bacsai
d2d9c1b2bc debug: add comprehensive status change logging
Added detailed debug logging to all status update paths to help
diagnose why "unhealthy" status appears in the UI.

## Logging Added

### 1. PushServerUpdateJob (Sentinel updates)
**Location**: Lines 303-315
**Logs**: Status changes from Sentinel push updates
**Data tracked**:
- Old vs new status
- Container statuses that led to aggregation
- Status flags (hasRunning, hasUnhealthy, hasUnknown)

### 2. GetContainersStatus (SSH updates)
**Location**: Lines 441-449, 346-354, 358-365
**Logs**: Status changes from SSH-based checks
**Scenarios**:
- Normal status aggregation
- Recently restarted containers (kept as degraded)
- Applications not running (set to exited)
**Data tracked**:
- Old vs new status
- Container statuses
- Restart count and timing
- Whether containers exist

### 3. Application Model Status Accessor
**Location**: Lines 706-712, 726-732
**Logs**: When status is set without explicit health information
**Issue**: Highlights cases where health defaults to "unhealthy"
**Data tracked**:
- Raw value passed to setter
- Final result after default applied

## How to Use

### Enable Debug Logging
Edit `.env` or `config/logging.php` to set log level to debug:
```
LOG_LEVEL=debug
```

### Monitor Logs
```bash
tail -f storage/logs/laravel.log | grep STATUS-DEBUG
```

### Log Format
All logs use `[STATUS-DEBUG]` prefix for easy filtering:
```
[2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change
{
  "source": "PushServerUpdateJob",
  "app_id": 123,
  "app_name": "my-app",
  "old_status": "running:unknown",
  "new_status": "running:healthy",
  "container_statuses": [...],
  "flags": {...}
}
```

## What to Look For

1. **Default to unhealthy**: Check Application model accessor logs
2. **Status flipping**: Compare timestamps between Sentinel and SSH updates
3. **Incorrect aggregation**: Check flags and container_statuses
4. **Stale database values**: Check if old_status persists across multiple logs

## Next Steps

After gathering logs, we can:
1. Identify the exact source of "unhealthy" status
2. Determine if it's a default issue, aggregation bug, or timing problem
3. Apply targeted fix based on evidence

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:52:08 +01:00
Andras Bacsai
6b62847a11 fix: preserve unknown health status in Sentinel updates (PushServerUpdateJob)
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.

## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".

When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"

## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:

1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy

This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic

## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation

## Testing
All 31 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:40:58 +01:00
Andras Bacsai
08d257535a fix(docker): enhance container status aggregation for multi-container applications, including exclusion handling based on docker-compose configuration 2025-09-13 20:32:15 +02:00
Andras Bacsai
0f5c988658 fix(horizon): add silenced jobs 2025-07-12 14:44:32 +02:00
Andras Bacsai
24688b2ad8 fix(jobs): update middleware to use expireAfter for WithoutOverlapping in multiple job classes 2025-07-01 10:50:27 +02:00
Andras Bacsai
f9a0ca2ca6 refactor(proxy): update StartProxy calls to use named parameter for async option 2025-06-16 13:13:01 +02:00
Andras Bacsai
ddcb14500d refactor(proxy-status): refactored how the proxy status is handled on the UI and on the backend
feat(cloudflare): improved cloudflare tunnel automated installation
2025-06-06 14:47:54 +02:00
Andras Bacsai
97ec579910 refactor(push-server-update): enhance application preview handling by incorporating pull request IDs and adding status update protections 2025-06-04 10:03:36 +02:00
Andras Bacsai
9883cef26d refactor(jobs): update middleware to include job-specific identifiers for WithoutOverlapping 2025-05-29 17:31:43 +02:00
Andras Bacsai
0369909408 fix(PushServerUpdateJob): add null checks before updating application and database statuses 2025-05-29 10:47:26 +02:00
Andras Bacsai
c6278a06ba refactor(jobs): unify middleware configuration to prevent job release after expiration for DockerCleanupJob and PushServerUpdateJob 2025-05-07 14:42:42 +02:00
Andras Bacsai
b78f2cccff refactor(jobs): update WithoutOverlapping middleware to use expireAfter for better queue management 2025-04-18 09:52:32 +02:00
Andras Bacsai
b09f0043d1 fix: restrict jobs on cloud
fix: restrict sentinel endpoint
2025-01-10 11:54:45 +01:00
Andras Bacsai
7dc65dfd79 fix: make sure important jobs/actions are running on high prio queue 2024-11-22 11:16:01 +01:00
Andras Bacsai
275edb6c1f put a few things on high queue 2024-11-06 12:33:56 +01:00
Lucas Michot
8e1444eaa7 Get rid of many useless blank lines 2024-10-31 17:44:01 +01:00
Andras Bacsai
96ca72fcdb refactor server view (phuuu) 2024-10-30 20:03:30 +01:00
Lucas Michot
5b6e466e0c Remove some useless catch blocks 2024-10-28 14:37:00 +01:00
Lucas Michot
d557a22b91 Remove all ray() calls 2024-10-28 13:51:23 +01:00
Andras Bacsai
8c96ab52d7 feat: notification rate limiter
fix: limit server up / down notification limits
2024-10-25 15:13:23 +02:00
Andras Bacsai
621e063bf1 Refactor PushServerUpdateJob to implement ShouldBeEncrypted interface 2024-10-24 15:16:00 +02:00
Andras Bacsai
ac768e5313 feat: limit storage check emails
feat: sentinel should send storage usage
2024-10-22 14:01:36 +02:00
Andras Bacsai
537630acc6 Refactor PushServerUpdateJob to handle container restart notifications 2024-10-22 11:42:24 +02:00
Andras Bacsai
d7efe8a6d1 fix: no sentinel for swarm yet 2024-10-22 11:29:43 +02:00
Andras Bacsai
4c95647b96 feat: cleanup sentinel on server deletion
fix: Sentinel should not be enabled on build servers
2024-10-17 11:21:43 +02:00
Andras Bacsai
2702fbc284 Refactor logging in PushServerUpdateJob, Application, and SentinelSeeder 2024-10-15 17:03:50 +02:00
Andras Bacsai
d446cd4f31 sentinel updates 2024-10-15 13:39:19 +02:00
Andras Bacsai
81db57002b Refactor PushServerUpdateJob to handle multiple servers, previews, and emails 2024-10-14 22:53:16 +02:00
Andras Bacsai
fdeb9353be chore: Update project service configuration view 2024-10-14 19:45:03 +02:00
Andras Bacsai
1f72321681 fix: sentinel 2024-10-14 18:04:36 +02:00
Andras Bacsai
8a2c9f3d44 updates sentinel 2024-10-14 17:54:29 +02:00
Andras Bacsai
b2e515f770 sentinel 2024-10-14 13:32:36 +02:00
Andras Bacsai
1f193d465d sentinel updates 2024-10-14 12:07:37 +02:00