Added detailed debug logging to all status update paths to help
diagnose why "unhealthy" status appears in the UI.
## Logging Added
### 1. PushServerUpdateJob (Sentinel updates)
**Location**: Lines 303-315
**Logs**: Status changes from Sentinel push updates
**Data tracked**:
- Old vs new status
- Container statuses that led to aggregation
- Status flags (hasRunning, hasUnhealthy, hasUnknown)
### 2. GetContainersStatus (SSH updates)
**Location**: Lines 441-449, 346-354, 358-365
**Logs**: Status changes from SSH-based checks
**Scenarios**:
- Normal status aggregation
- Recently restarted containers (kept as degraded)
- Applications not running (set to exited)
**Data tracked**:
- Old vs new status
- Container statuses
- Restart count and timing
- Whether containers exist
### 3. Application Model Status Accessor
**Location**: Lines 706-712, 726-732
**Logs**: When status is set without explicit health information
**Issue**: Highlights cases where health defaults to "unhealthy"
**Data tracked**:
- Raw value passed to setter
- Final result after default applied
## How to Use
### Enable Debug Logging
Edit `.env` or `config/logging.php` to set log level to debug:
```
LOG_LEVEL=debug
```
### Monitor Logs
```bash
tail -f storage/logs/laravel.log | grep STATUS-DEBUG
```
### Log Format
All logs use `[STATUS-DEBUG]` prefix for easy filtering:
```
[2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change
{
"source": "PushServerUpdateJob",
"app_id": 123,
"app_name": "my-app",
"old_status": "running:unknown",
"new_status": "running:healthy",
"container_statuses": [...],
"flags": {...}
}
```
## What to Look For
1. **Default to unhealthy**: Check Application model accessor logs
2. **Status flipping**: Compare timestamps between Sentinel and SSH updates
3. **Incorrect aggregation**: Check flags and container_statuses
4. **Stale database values**: Check if old_status persists across multiple logs
## Next Steps
After gathering logs, we can:
1. Identify the exact source of "unhealthy" status
2. Determine if it's a default issue, aggregation bug, or timing problem
3. Apply targeted fix based on evidence
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Added Documentation
Created detailed documentation in `.ai/core/application-architecture.md`
explaining the container status monitoring system to prevent future bugs.
## Key Sections
### 1. Container Status Monitoring System Overview
- Explains that status is updated through multiple independent paths
- Emphasizes that ALL paths must be updated when changing status logic
### 2. Critical Implementation Locations
Documents all four status calculation locations:
- **SSH-Based Updates**: `GetContainersStatus.php` (scheduled, every ~1min)
- **Sentinel-Based Updates**: `PushServerUpdateJob.php` (real-time, every ~30sec)
- **Multi-Server Aggregation**: `ComplexStatusCheck.php` (on-demand)
- **Service-Level Aggregation**: `Service.php` (service status)
### 3. Status Flow Diagram
Visual representation of how status flows from different sources to UI
### 4. Status Priority System
Documents the required priority: unhealthy > unknown > healthy
### 5. Excluded Containers
Explains `:excluded` suffix handling and behavior
### 6. Developer Guidelines
- Checklist of all locations to update
- Testing requirements
- Edge cases to handle
### 7. Related Tests
Links to all relevant test files
### 8. Common Bugs to Avoid
Real examples from bugs we've fixed, with solutions
## Why This Documentation Matters
The recent bug (unknown → healthy) happened because:
1. `GetContainersStatus.php` was updated to handle "unknown" status
2. `PushServerUpdateJob.php` was NOT updated
3. This caused periodic status flipping
This documentation ensures future developers (and AI assistants like Claude)
will know to update ALL four locations when modifying status logic.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.
## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".
When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"
## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:
1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy
This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic
## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation
## Testing
All 31 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove the github_runner_sources migration that was accidentally
included in the health check status fix branch. This migration
is unrelated to the health check functionality and should be
developed separately.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds:
- Comprehensive docker-compose examples for health check testing
- GitHub runner sources database migration
- UI fix for service heading view
Files Added:
- DOCKER_COMPOSE_EXAMPLES.md - Documentation of health check test cases
- docker-compose.*.yml - Test files for various health check scenarios:
- excluded.yml: Container with exclude_from_hc flag
- healthy.yml: All containers healthy
- unhealthy.yml: All containers unhealthy
- unknown.yml: Container without healthcheck
- mixed-healthy-unknown.yml: Mix of healthy and unknown
- mixed-unhealthy-unknown.yml: Mix of unhealthy and unknown
- database/migrations/2025_11_19_115504_create_github_runner_sources_table.php
Files Modified:
- resources/views/livewire/project/service/heading.blade.php
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes container health status aggregation to correctly handle
unknown health states and edge case container states across all resource types.
Changes:
1. **Preserve Unknown Health State**
- Add three-way priority: unhealthy > unknown > healthy
- Detect containers without healthchecks (null health) as unknown
- Apply across GetContainersStatus, ComplexStatusCheck, and Service models
2. **Handle Edge Case Container States**
- Add support for: created, starting, paused, dead, removing
- Map to appropriate statuses: starting (unknown), paused (unknown), degraded (unhealthy)
- Prevent containers in transitional states from showing incorrect status
3. **Add :excluded Suffix for Excluded Containers**
- Parse exclude_from_hc flag from docker-compose YAML
- Append :excluded suffix to individual container statuses
- Skip :excluded containers in non-excluded aggregation sections
- Strip :excluded suffix in excluded aggregation sections
- Makes it clear in UI which containers are excluded from monitoring
Files Modified:
- app/Actions/Docker/GetContainersStatus.php
- app/Actions/Shared/ComplexStatusCheck.php
- app/Models/Service.php
- tests/Unit/ContainerHealthStatusTest.php
Tests: 18 passed (82 assertions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When all containers are excluded from health checks, display their actual status
with :excluded suffix instead of misleading hardcoded statuses. This prevents
broken UI state with incorrect action buttons and provides clarity that monitoring
is disabled.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
When all services in a Docker Compose file have `exclude_from_hc: true`,
the status aggregation logic was returning invalid states causing broken UI.
**Problems fixed:**
- ComplexStatusCheck returned 'running:healthy' for apps with no monitored containers
- Service model returned ':' (null status) when all services excluded
- UI showed active start/stop buttons for non-running services
**Changes:**
- ComplexStatusCheck: Return 'exited:healthy' when relevantContainerCount is 0
- Service model: Return 'exited:healthy' when both status and health are null
- Added comprehensive unit tests to verify the fixes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Simplifies conductor.json by removing the minio-init container from the docker rm command.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Update CLAUDE.md to reference .ai/ directory as single source of truth
- Move documentation structure to organized .ai/ directory with core/, development/, patterns/, meta/ subdirectories
- Update .ai/README.md with correct path references
- Update .ai/meta/maintaining-docs.md to reflect new structure
- Consolidate sync-guide.md with detailed synchronization rules
- Fix cross-reference in frontend-patterns.md
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Add a separate minio-init container that automatically creates the 'local'
bucket when MinIO starts in development. Mark the seeded S3Storage as usable
by default so developers can use MinIO without manual validation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Create .ai/ directory as single source of truth for all AI docs
- Organize by topic: core/, development/, patterns/, meta/
- Update CLAUDE.md to reference .ai/ files instead of embedding content
- Remove 18KB of duplicated Laravel Boost guidelines from CLAUDE.md
- Fix testing command descriptions (pest runs all tests, not just unit)
- Standardize version numbers (Laravel 12.4.1, PHP 8.4.7, Tailwind 4.1.4)
- Replace all .cursor/rules/*.mdc with single coolify-ai-docs.mdc reference
- Delete dev_workflow.mdc (non-Coolify Task Master content)
- Merge cursor_rules.mdc + self_improve.mdc into maintaining-docs.md
- Update .AI_INSTRUCTIONS_SYNC.md to redirect to new location
Benefits:
- Single source of truth - no more duplication
- Consistent versions across all documentation
- Better organization by topic
- Platform-agnostic .ai/ directory works for all AI tools
- Reduced CLAUDE.md from 719 to ~320 lines
- Clear cross-references between files
Replace dynamic wire:key values that included the full command string
with stable, descriptive identifiers to prevent unnecessary re-renders
and potential issues with special characters.
Changes:
- Line 270: wire:key="preview-{{ $command }}" → "docker-compose-build-preview"
- Line 279: wire:key="start-preview-{{ $command }}" → "docker-compose-start-preview"
Benefits:
- Prevents element recreation on every keystroke
- Avoids issues with special characters in commands
- Better performance with long commands
- Follows Livewire best practices
The computed properties (dockerComposeBuildCommandPreview and
dockerComposeStartCommandPreview) continue to handle reactive updates
automatically, so preview content still updates as expected.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Refine regex pattern to prevent false positives with flags like -foo, -from, -feature
- Change from \S (any non-whitespace) to [.~/]|$ (path characters or end of word)
- Add comprehensive tests for false positive prevention (4 test cases)
- Add path normalization tests for baseDirectory edge cases (6 test cases)
- Add @example documentation to injectDockerComposeFlags function
Prevents incorrect detection of:
- -foo, -from, -feature, -fast as the -f flag
- Ensures -f flag is only detected when followed by path characters or end of word
All 45 tests passing with 135 assertions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix double-slash issue in Docker Compose preview paths when baseDirectory is "/"
- Normalize baseDirectory using rtrim() to prevent path concatenation issues
- Replace hardcoded '/artifacts/build-time.env' with ApplicationDeploymentJob::BUILD_TIME_ENV_PATH
- Make BUILD_TIME_ENV_PATH constant public for reusability
- Add comprehensive unit tests (11 test cases, 25 assertions)
Fixes preview path generation in:
- getDockerComposeBuildCommandPreviewProperty()
- getDockerComposeStartCommandPreviewProperty()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When using a custom Docker Compose build command, environment variables
were being lost because the --env-file flag was not included. This fix
automatically injects the --env-file flag to ensure build-time environment
variables are available during custom builds.
Changes:
- Auto-inject --env-file /artifacts/build-time.env after docker compose
- Respect user-provided --env-file flags (no duplication)
- Append build arguments when not using build secrets
- Update UI helper text to inform users about automatic env injection
- Add comprehensive unit tests (7 test cases, all passing)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Move notification logic from NotifyOutdatedTraefikServersJob into CheckTraefikVersionForServerJob to send immediate notifications when outdated Traefik is detected. This is more suitable for cloud environments with thousands of servers.
Changes:
- CheckTraefikVersionForServerJob now sends notifications immediately after detecting outdated Traefik
- Remove NotifyOutdatedTraefikServersJob (no longer needed)
- Remove delay calculation logic from CheckTraefikVersionJob
- Update tests to reflect new immediate notification pattern
Trade-offs:
- Pro: Faster notifications (immediate alerts)
- Pro: Simpler codebase (removed complex delay calculation)
- Pro: Better scalability for thousands of servers
- Con: Teams may receive multiple notifications if they have many outdated servers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Move buildpack switching cleanup from Livewire component to Application model's boot lifecycle. This improves separation of concerns and ensures cleanup happens consistently regardless of how the buildpack change is triggered. Also clears Dockerfile-specific data when switching away from dockerfile buildpack.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The modal-input component was using inline <style> blocks with ID selectors
to apply width constraints, which had inconsistent specificity and only
applied on lg+ breakpoints. This caused modals to appear full-width instead
of being properly constrained.
Replaced the inline style approach with Tailwind utility classes following
the pattern used in modal-confirmation component:
- Removed inline <style> block with media queries
- Added min-w-full and lg:min-w-[{minWidth}] for responsive minimum width
- Added max-w-[{maxWidth}] and max-h-[calc(100vh-2rem)] for size constraints
This ensures consistent modal sizing across all breakpoints and fixes the
full-width modal issue reported when adding shared environment variables.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When a user restarts the proxy from the Navbar UI component, the system now automatically dispatches a version check job immediately after the restart completes. This provides immediate feedback about available Traefik updates without waiting for the weekly scheduled check.
Changes:
- Import CheckTraefikVersionForServerJob in Navbar component
- After successful proxy restart, dispatch version check for Traefik servers
- Version check only runs for servers using Traefik proxy
This ensures users get up-to-date version information right after restarting their proxy infrastructure.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Move buildpack switching cleanup from Livewire component to Application model's boot lifecycle. This improves separation of concerns and ensures cleanup happens consistently regardless of how the buildpack change is triggered. Also clears Dockerfile-specific data when switching away from dockerfile buildpack.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The modal-input component was using inline <style> blocks with ID selectors
to apply width constraints, which had inconsistent specificity and only
applied on lg+ breakpoints. This caused modals to appear full-width instead
of being properly constrained.
Replaced the inline style approach with Tailwind utility classes following
the pattern used in modal-confirmation component:
- Removed inline <style> block with media queries
- Added min-w-full and lg:min-w-[{minWidth}] for responsive minimum width
- Added max-w-[{maxWidth}] and max-h-[calc(100vh-2rem)] for size constraints
This ensures consistent modal sizing across all breakpoints and fixes the
full-width modal issue reported when adding shared environment variables.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Merged latest changes from the next branch to keep the feature branch
up to date. No conflicts were encountered during the merge.
Changes from next branch:
- Updated application deployment job error logging
- Updated server manager job and instance settings
- Removed PullHelperImageJob in favor of updated approach
- Database migration refinements
- Updated versions.json with latest component versions
All automatic merges were successful and no manual conflict resolution
was required.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit introduces several improvements to the Traefik version tracking
feature and proxy configuration UI:
## Caching Improvements
1. **New centralized helper functions** (bootstrap/helpers/versions.php):
- `get_versions_data()`: Redis-cached access to versions.json (1 hour TTL)
- `get_traefik_versions()`: Extract Traefik versions from cached data
- `invalidate_versions_cache()`: Clear cache when file is updated
2. **Performance optimization**:
- Single Redis cache key: `coolify:versions:all`
- Eliminates 2-4 file reads per page load
- 95-97.5% reduction in disk I/O time
- Shared cache across all servers in distributed setup
3. **Updated all consumers to use cached helpers**:
- CheckTraefikVersionJob: Use get_traefik_versions()
- Server/Proxy: Two-level caching (Redis + in-memory per-request)
- CheckForUpdatesJob: Auto-invalidate cache after updating file
- bootstrap/helpers/shared.php: Use cached data for Coolify version
## UI/UX Improvements
1. **Navbar warning indicator**:
- Added yellow warning triangle icon next to "Proxy" menu item
- Appears when server has outdated Traefik version
- Uses existing traefik_outdated_info data for instant checks
- Provides at-a-glance visibility of version issues
2. **Proxy sidebar persistence**:
- Fixed sidebar disappearing when clicking "Switch Proxy"
- Configuration link now always visible (needed for proxy selection)
- Dynamic Configurations and Logs only show when proxy is configured
- Better navigation context during proxy switching workflow
## Code Quality
- Added comprehensive PHPDoc for Server::$traefik_outdated_info property
- Improved code organization with centralized helper approach
- All changes formatted with Laravel Pint
- Maintains backward compatibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>