Exited containers don't run health checks, so showing "(unhealthy)" is
misleading. This fix ensures the exited status is displayed without a health
suffix across all monitoring paths (SSH, Sentinel, services, etc.) and at the
UI layer, for backward compatibility with existing data.
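A minimal sketch of the suffix-stripping idea, assuming statuses are stored as `state:health` strings (the function name is hypothetical):

```php
<?php

// Hypothetical helper illustrating the fix: exited containers never run
// health checks, so any health suffix on an "exited" status is dropped.
function formatContainerStatus(string $status): string
{
    // Statuses are assumed to look like "state" or "state:health",
    // e.g. "running:healthy", "exited:unhealthy".
    [$state, $health] = array_pad(explode(':', $status, 2), 2, null);

    if (str_starts_with($state, 'exited')) {
        return 'exited'; // never show "(unhealthy)" for a stopped container
    }

    return $health === null ? $state : "{$state}:{$health}";
}

// Example: "exited:unhealthy" is displayed as plain "exited".
echo formatContainerStatus('exited:unhealthy'); // exited
echo formatContainerStatus('running:healthy');  // running:healthy
```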
- Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states.
- Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status.
- Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components.
- Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere.
- Updated version number for sentinel to 0.0.17 in versions.json.
Prevents removal and re-download of database images on every restart. Docker cleanup was removing Docker Hub images (postgres, mysql, redis, etc.) that lack the coolify.managed=true label, causing them to be immediately re-pulled. Restart now preserves images while stopping/starting containers.
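A hedged sketch of the before/after, assuming the cleanup used a `docker image prune` label filter (the exact command in Coolify may differ; the function name is hypothetical):

```php
<?php

// Illustration only: the old cleanup pruned every image lacking the
// coolify.managed=true label, which swept up Docker Hub database images.
$oldCleanup = 'docker image prune -af --filter "label!=coolify.managed=true"';

// Sketch of the fix: a restart only stops and starts containers and skips
// image cleanup entirely, so database images survive the restart.
function buildRestartCommands(bool $isRestart): array
{
    $commands = [
        'docker compose down',   // stop containers, keep images
        'docker compose up -d',  // recreate containers from cached images
    ];

    if (! $isRestart) {
        // Full cleanup runs only outside of restarts.
        $commands[] = 'docker image prune -af --filter "label!=coolify.managed=true"';
    }

    return $commands;
}
```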
This commit addresses container status reporting issues and removes debug logging:
**Primary Fix:**
- Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when the health_status field is missing from Sentinel data
- This ensures containers without a healthcheck defined are correctly reported as "unknown", not "unhealthy"
- Matches the SSH path behavior (GetContainersStatus), which already defaulted to 'unknown'
**Service Multi-Container Aggregation:**
- Implemented service container status aggregation (same pattern as applications)
- Added serviceContainerStatuses collection to both Sentinel and SSH paths
- Services now aggregate status using priority: unhealthy > unknown > healthy
- Prevents race conditions where the last-processed container's status would win (see the sketch below)
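A minimal sketch of the priority logic, assuming `state:health` status strings (the function name is hypothetical):

```php
<?php

// Sketch of the unhealthy > unknown > healthy aggregation: the result no
// longer depends on the order in which containers are processed.
function aggregateServiceStatus(array $containerStatuses): string
{
    $hasRunning = $hasUnhealthy = $hasUnknown = false;

    foreach ($containerStatuses as $status) {
        // Each entry is assumed to look like "running:healthy",
        // "running:unhealthy" or "running:unknown".
        [$state, $health] = array_pad(explode(':', $status, 2), 2, 'unknown');

        $hasRunning   = $hasRunning || str_starts_with($state, 'running');
        $hasUnhealthy = $hasUnhealthy || $health === 'unhealthy';
        $hasUnknown   = $hasUnknown || $health === 'unknown';
    }

    if (! $hasRunning) {
        return 'exited';
    }

    // Any unhealthy container wins, then unknown, then healthy.
    if ($hasUnhealthy) {
        return 'running:unhealthy';
    }

    return $hasUnknown ? 'running:unknown' : 'running:healthy';
}

// Example: container order no longer matters.
echo aggregateServiceStatus(['running:healthy', 'running:unknown']); // running:unknown
```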
**Debug Logging Cleanup:**
- Removed all [STATUS-DEBUG] logging statements (25 total)
- Removed all ray() debugging calls (3 total)
- Removed proof_unknown_preserved and health_status_was_null debug fields
- Code is now production-ready
**Test Coverage:**
- Added 2 new tests for Sentinel default health status behavior
- Added 5 new tests for service aggregation in SSH path
- All 16 tests pass (66 assertions)
**Note:** The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase.
Added detailed debug logging to all status update paths to help
diagnose why "unhealthy" status appears in the UI.
## Logging Added
### 1. PushServerUpdateJob (Sentinel updates)
**Location**: Lines 303-315
**Logs**: Status changes from Sentinel push updates
**Data tracked**:
- Old vs new status
- Container statuses that led to aggregation
- Status flags (hasRunning, hasUnhealthy, hasUnknown)
### 2. GetContainersStatus (SSH updates)
**Location**: Lines 441-449, 346-354, 358-365
**Logs**: Status changes from SSH-based checks
**Scenarios**:
- Normal status aggregation
- Recently restarted containers (kept as degraded)
- Applications not running (set to exited)
**Data tracked**:
- Old vs new status
- Container statuses
- Restart count and timing
- Whether containers exist
### 3. Application Model Status Accessor
**Location**: Lines 706-712, 726-732
**Logs**: When status is set without explicit health information
**Issue**: Highlights cases where health defaults to "unhealthy"
**Data tracked**:
- Raw value passed to setter
- Final result after the default is applied (see the sketch below)
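A rough sketch of what this accessor-level logging surfaces, assuming a Laravel-style mutator over a `state:health` status string (everything here is illustrative, including the 'unhealthy' default under investigation):

```php
<?php

use Illuminate\Support\Facades\Log;

// Hypothetical mutator sketch: logs whenever a status arrives without an
// explicit health component, making the applied default visible in logs.
function setStatusAttribute(string $value): string
{
    if (! str_contains($value, ':')) {
        $result = "{$value}:unhealthy"; // the suspect default being diagnosed

        Log::debug('[STATUS-DEBUG] Status set without explicit health', [
            'raw_value' => $value,   // what the caller passed in
            'result'    => $result,  // what gets stored after the default
        ]);

        return $result;
    }

    return $value;
}
```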
## How to Use
### Enable Debug Logging
Edit `.env` or `config/logging.php` to set log level to debug:
```
LOG_LEVEL=debug
```
### Monitor Logs
```bash
tail -f storage/logs/laravel.log | grep STATUS-DEBUG
```
### Log Format
All logs use `[STATUS-DEBUG]` prefix for easy filtering:
```
[2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change
{
"source": "PushServerUpdateJob",
"app_id": 123,
"app_name": "my-app",
"old_status": "running:unknown",
"new_status": "running:healthy",
"container_statuses": [...],
"flags": {...}
}
```
## What to Look For
1. **Default to unhealthy**: Check Application model accessor logs
2. **Status flipping**: Compare timestamps between Sentinel and SSH updates
3. **Incorrect aggregation**: Check flags and container_statuses
4. **Stale database values**: Check if old_status persists across multiple logs
## Next Steps
After gathering logs, we can:
1. Identify the exact source of "unhealthy" status
2. Determine if it's a default issue, aggregation bug, or timing problem
3. Apply targeted fix based on evidence
This commit fixes container health status aggregation to correctly handle
unknown health states and edge case container states across all resource types.
Changes:
1. **Preserve Unknown Health State**
- Add three-way priority: unhealthy > unknown > healthy
- Detect containers without healthchecks (null health) as unknown
- Apply across GetContainersStatus, ComplexStatusCheck, and Service models
2. **Handle Edge Case Container States**
- Add support for: created, starting, paused, dead, removing
- Map to appropriate statuses: starting (unknown), paused (unknown), degraded (unhealthy)
- Prevent containers in transitional states from showing incorrect status
3. **Add :excluded Suffix for Excluded Containers**
- Parse exclude_from_hc flag from docker-compose YAML
- Append :excluded suffix to individual container statuses
- Skip :excluded containers in non-excluded aggregation sections
- Strip :excluded suffix in excluded aggregation sections
- Makes it clear in the UI which containers are excluded from monitoring (see the sketch below)
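A sketch of the :excluded flow, assuming Symfony YAML is available (as in a standard Laravel install); the helper names and data shapes are hypothetical:

```php
<?php

use Symfony\Component\Yaml\Yaml;

// 1) Read the exclude_from_hc flag from the compose definition.
function isExcludedFromHealthChecks(string $composeYaml, string $service): bool
{
    $compose = Yaml::parse($composeYaml);

    return (bool) ($compose['services'][$service]['exclude_from_hc'] ?? false);
}

// 2) Tag excluded containers with the suffix and keep them out of the
//    normal aggregation input, which sees monitored containers only.
function partitionStatuses(array $containers): array
{
    $monitored = $excluded = [];

    foreach ($containers as $name => $c) {
        if ($c['excluded']) {
            // The container keeps its real state, plus the :excluded marker.
            $excluded[$name] = "{$c['status']}:excluded";
        } else {
            $monitored[$name] = $c['status'];
        }
    }

    return [$monitored, $excluded];
}

// Example: only "db" feeds the health aggregation; "worker" shows as
// "running:excluded" in the UI.
[$monitored, $excluded] = partitionStatuses([
    'db'     => ['status' => 'running:healthy', 'excluded' => false],
    'worker' => ['status' => 'running', 'excluded' => true],
]);
```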
Files Modified:
- app/Actions/Docker/GetContainersStatus.php
- app/Actions/Shared/ComplexStatusCheck.php
- app/Models/Service.php
- tests/Unit/ContainerHealthStatusTest.php
Tests: 18 passed (82 assertions)
When all containers are excluded from health checks, display their actual status
with the :excluded suffix instead of misleading hardcoded statuses. This prevents
a broken UI state with incorrect action buttons and makes it clear that monitoring
is disabled.
When all services in a Docker Compose file have `exclude_from_hc: true`,
the status aggregation logic was returning invalid states, breaking the UI.
**Problems fixed:**
- ComplexStatusCheck returned 'running:healthy' for apps with no monitored containers
- Service model returned ':' (null status) when all services excluded
- UI showed active start/stop buttons for non-running services
**Changes:**
- ComplexStatusCheck: Return 'exited:healthy' when relevantContainerCount is 0
- Service model: Return 'exited:healthy' when both status and health are null (see the sketch below)
- Added comprehensive unit tests to verify the fixes
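A minimal sketch of the fallback, using the condition names from this commit (the function itself is hypothetical):

```php
<?php

// With zero monitored containers there is nothing to aggregate, so a fixed,
// valid status is returned instead of 'running:healthy' or ':'.
function aggregateOrFallback(int $relevantContainerCount, ?string $status, ?string $health): string
{
    if ($relevantContainerCount === 0 || ($status === null && $health === null)) {
        return 'exited:healthy'; // valid state, renders the correct UI buttons
    }

    return sprintf('%s:%s', $status ?? 'exited', $health ?? 'unknown');
}

// Example: everything excluded => stable 'exited:healthy' instead of ':'.
echo aggregateOrFallback(0, null, null); // exited:healthy
```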
Merged latest changes from the next branch to keep the feature branch
up to date. No conflicts were encountered during the merge.
Changes from next branch:
- Updated application deployment job error logging
- Updated server manager job and instance settings
- Removed PullHelperImageJob in favor of updated approach
- Database migration refinements
- Updated versions.json with latest component versions
Add wait loops to ensure containers are fully removed before restarting.
This fixes race conditions where docker compose would fail because an
existing container was still being cleaned up.
Changes:
- StartProxy: Add explicit stop, wait loop before docker compose up
- StopProxy: Add wait loop after container removal
- Both actions now poll for up to 10 seconds for complete removal (see the sketch below)
- Add error suppression to handle non-existent containers gracefully
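A sketch of the wait loop; `instant_remote_process` is Coolify's remote-exec helper, but the wrapper function and loop shape here are hypothetical:

```php
<?php

// Remove the container (suppressing "no such container" errors), then poll
// until Docker reports it gone or the deadline passes.
function waitForContainerRemoval($server, string $container, int $timeoutSeconds = 10): void
{
    // "|| true" suppression: a missing container is not an error here.
    instant_remote_process(["docker rm -f {$container} 2>/dev/null || true"], $server);

    $deadline = time() + $timeoutSeconds;

    while (time() < $deadline) {
        $ids = instant_remote_process(
            ["docker ps -aq --filter name=^{$container}$"],
            $server
        );

        if (blank($ids)) {
            return; // fully removed, safe to run docker compose up
        }

        sleep(1); // poll once per second until the deadline
    }
}
```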
Tests:
- Add StartProxyTest.php with 3 tests for cleanup logic
- Add StopProxyTest.php with 4 tests for stop behavior
- Add automated Traefik version checking job running weekly on Sundays
- Implement version detection from running containers and comparison with versions.json
- Add notifications across all channels (Email, Discord, Slack, Telegram, Pushover, Webhook) for outdated versions
- Create dismissible callout component with localStorage persistence
- Display cross-branch upgrade warnings (e.g., v3.5 -> v3.6) with changelog links (see the sketch after this list)
- Show patch update notifications within same branch
- Add warning icon that appears when callouts are dismissed
- Prevent duplicate notifications during proxy restart by adding restarting parameter
- Fix notification spam with transition-based logic for status changes
- Enable system email settings by default in development mode
- Track last saved/applied proxy settings to detect configuration drift
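A sketch of the cross-branch check, assuming semver-like Traefik tags such as "v3.5.2" in both the running container and versions.json (the function name is hypothetical):

```php
<?php

// Classify an available update as a cross-branch upgrade, a patch update,
// or no update at all.
function classifyTraefikUpdate(string $running, string $latest): ?string
{
    $parse = fn (string $v): array => array_map(
        'intval',
        explode('.', ltrim($v, 'v') . '.0.0') // pad short versions like "v3.5"
    );

    [$runMajor, $runMinor] = $parse($running);
    [$newMajor, $newMinor] = $parse($latest);

    if ([$newMajor, $newMinor] !== [$runMajor, $runMinor]) {
        // e.g. v3.5 -> v3.6: warn with a changelog link before upgrading.
        return 'cross-branch';
    }

    return version_compare(ltrim($latest, 'v'), ltrim($running, 'v'), '>')
        ? 'patch'
        : null; // already up to date
}

// Example: v3.5.2 running, v3.6.0 available => "cross-branch" warning.
echo classifyTraefikUpdate('v3.5.2', 'v3.6.0');
```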
Stop dispatching PullHelperImageJob to thousands of servers when the helper image version changes. Instead, rely on Docker's automatic image pulling during actual deployments and backups. Inline the helper image pull in UpdateCoolify for the single use case.
This eliminates queue flooding on cloud instances while maintaining all functionality through Docker's built-in image management.
Track container restart counts from Docker and detect crash loops to provide better visibility into application health issues.
- Add restart_count, last_restart_at, and last_restart_type columns to applications table
- Detect restart count increases from Docker inspect data and send notifications
- Show restart count badge in UI with warning icon on Logs navigation
- Distinguish between crash restarts and manual restarts
- Implement 30-second grace period to prevent false "exited" status during crash loops
- Reset restart count on manual stop, restart, and redeploy actions
- Add unit tests for restart count tracking logic
This helps users quickly identify when containers are in crash loops and need attention, even when the container status flickers between states during Docker's restart backoff period.
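A sketch of the two pieces, assuming `RestartCount` from `docker inspect` and a stored count on the application record (the function names are hypothetical):

```php
<?php

// Grace period: a container that restarted within the last 30 seconds is
// reported as degraded rather than exited, since Docker's restart backoff
// briefly shows it as stopped.
function resolveStatus(bool $isRunning, ?int $lastRestartAt, int $now): string
{
    if ($isRunning) {
        return 'running';
    }

    $withinGracePeriod = $lastRestartAt !== null && ($now - $lastRestartAt) < 30;

    return $withinGracePeriod
        ? 'degraded:unhealthy' // likely mid crash loop, not a clean exit
        : 'exited';
}

// A crash restart is detected when Docker's RestartCount exceeds the count
// stored on the application record.
function restartCountIncreased(array $inspect, int $storedCount): bool
{
    return (int) ($inspect['RestartCount'] ?? 0) > $storedCount;
}
```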
This commit adds official Docker repository installation methods as fallbacks
when Rancher and get.docker.com convenience scripts fail, providing more
reliable Docker installation across all supported operating systems.
Changes:
- Add apt repository fallback for Debian-based systems (Ubuntu, Debian, Raspbian)
- Fixes installation on Debian 13 (Trixie) where get.docker.com fails
- Uses VERSION_CODENAME for automatic OS version detection
- Add dnf repository fallback for RHEL-based systems (CentOS, Fedora, Rocky, AlmaLinux)
- Add zypper repository fallback for SUSE-based systems (SLES, OpenSUSE)
- Refactor installation methods into dedicated private methods for better maintainability
Installation fallback chain:
1. Rancher install-docker script (preserves version pinning)
2. Docker get.docker.com convenience script
3. Official repository method (new, most reliable; sketched below)
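A minimal sketch of the chain, with closures standing in for the real installation methods (all names hypothetical):

```php
<?php

// Each installer is tried in order; the chain stops at the first success.
function installDocker(callable ...$installers): bool
{
    foreach ($installers as $install) {
        try {
            if ($install()) {
                return true;
            }
        } catch (\Throwable) {
            // Swallow and fall through to the next installation method.
        }
    }

    return false;
}

// Order mirrors the chain above.
$installed = installDocker(
    fn (): bool => false, // 1. Rancher install-docker script (version-pinned)
    fn (): bool => false, // 2. get.docker.com convenience script
    fn (): bool => true,  // 3. official repository method (apt/dnf/zypper)
);
```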
Benefits:
- Future-proof: Works with new OS releases automatically
- Production-ready: Uses Docker's recommended installation method
- Comprehensive: Covers the vast majority of Linux distributions used in production
- Maintainable: Clean code structure with single-responsibility methods
Fixes an issue where Debian 13 (Trixie) servers fail validation because the
get.docker.com script incorrectly uses the numeric version "13" instead of the
codename "trixie" in repository URLs.
When port mappings are changed in the UI and the database is restarted,
the system now gracefully stops and removes the existing container before
recreating it with the new configuration.
This prevents the "container name already in use" error that occurred when
Docker Compose tried to create a container with the same name but different
port configuration.
Changes:
- Add graceful container stop (10s timeout) before docker compose up (see the sketch below)
- Remove old container to avoid name conflicts
- Use --timeout flag (modern Docker CLI) instead of deprecated --time
- Apply fix to all database types: MariaDB, MySQL, PostgreSQL, MongoDB,
Redis, KeyDB, Dragonfly, and ClickHouse
- Update StopDatabase.php for consistency
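A minimal sketch of the stop-then-recreate sequence, assuming the container name and compose workdir are known (`instant_remote_process` is Coolify's remote-exec helper; the wrapper function is hypothetical):

```php
<?php

function recreateDatabaseContainer($server, string $container, string $workdir): void
{
    instant_remote_process([
        // Graceful stop with the modern --timeout flag (--time is deprecated);
        // error suppression covers the container-does-not-exist case.
        "docker stop --timeout 10 {$container} 2>/dev/null || true",
        // Remove the old container so the name is free for recreation.
        "docker rm {$container} 2>/dev/null || true",
        // Recreate with the new port mapping from the regenerated compose file.
        "cd {$workdir} && docker compose up -d",
    ], $server);
}
```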
- Added a new function to extract custom proxy commands from existing Traefik configurations before regenerating the proxy configuration.
- Updated the proxy configuration generation logic to include these custom commands, ensuring they are preserved during regeneration (see the sketch below).
- Introduced unit tests to validate the extraction of custom commands and handle various scenarios, including invalid YAML and different proxy types.
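A sketch of the extraction, assuming Symfony YAML (as in a standard Laravel install); the function name and config shape are hypothetical:

```php
<?php

use Symfony\Component\Yaml\Yaml;

// Pull user-added entries out of the traefik service's command list before
// regenerating, so they can be merged back into the fresh configuration.
function extractCustomProxyCommands(string $existingYaml, array $defaultCommands): array
{
    try {
        $config = Yaml::parse($existingYaml);
    } catch (\Throwable) {
        return []; // invalid YAML: nothing recoverable, so preserve nothing
    }

    $commands = $config['services']['traefik']['command'] ?? [];

    if (! is_array($commands)) {
        return [];
    }

    // Anything not produced by the generator is treated as a custom command.
    return array_values(array_diff($commands, $defaultCommands));
}
```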