- Add missing traefik_outdated_webhook_notifications field to migration schema and population logic
- Remove incorrect docker_cleanup_webhook_notifications from model (split into success/failure variants)
- Consolidate webhook notification migrations under the 2025_11_25 date (previously 2025_10_10) so they execute in the correct order
- Ensure all 15 notification fields are properly defined and consistent across migration, model, and Livewire component
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add path attribute mutator to S3Storage model ensuring paths start with / (see the sketch after this list)
- Add updatedS3Path hook to normalize path and reset validation state on blur
- Add updatedS3StorageId hook to reset validation state when storage changes
- Add Enter key support to trigger file check from path input
- Use wire:model.live for S3 storage select, wire:model.blur for path input
- Improve shell escaping in restore job cleanup commands
- Fix isSafeTmpPath helper logic for directory validation
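A minimal sketch of the path mutator from the first bullet, assuming Laravel's modern `Attribute` cast (the codebase may use a classic `setPathAttribute()` mutator instead):
```php
use Illuminate\Database\Eloquent\Casts\Attribute;

// In app/Models/S3Storage.php: normalize the path so it always starts with "/".
protected function path(): Attribute
{
    return Attribute::make(
        set: fn (?string $value) => blank($value)
            ? $value
            : '/'.ltrim(trim($value), '/'),
    );
}
```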
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace hardcoded URL paths in getScopeUrl() with Laravel's route() helper
- Add scopeUrls property to EnvVarInput component with named routes
- Pass projectUuid and environmentUuid to enable context-specific environment links
- Environment scope link now navigates to the specific project/environment shared variables page
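A hedged sketch of the route()-based mapping; the route names and parameter keys below are illustrative assumptions, not the actual names in the codebase:
```php
// In the EnvVarInput component (route names are hypothetical).
public function getScopeUrl(string $scope): string
{
    return match ($scope) {
        'team' => route('team.shared-variables.index'),
        'project' => route('project.shared-variables.index', [
            'project_uuid' => $this->projectUuid,
        ]),
        'environment' => route('project.environment.shared-variables.index', [
            'project_uuid' => $this->projectUuid,
            'environment_uuid' => $this->environmentUuid,
        ]),
        default => '#',
    };
}
```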
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Exited containers don't run health checks, so showing "(unhealthy)" is
misleading. This fix ensures exited status displays without health
suffixes across all monitoring systems (SSH, Sentinel, services, etc.)
and at the UI layer for backward compatibility with existing data.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit enhances the boarding flow to handle prerequisite installation asynchronously with proper retry logic and user feedback:
- Add retry mechanism with max 3 attempts for prerequisite installation
- Display live installation logs via ActivityMonitor during boarding
- Reset ActivityMonitor state when starting a new activity to prevent stale event dispatching
- Support dynamic header updates in ActivityMonitor
- Add prerequisitesInstalled event handler to revalidate after installation completes
- Extract validation logic into continueValidation() method for cleaner flow
- Add unit tests for prerequisite installation logic
This improves UX by showing users real-time progress during prerequisite installation and handles installation failures gracefully with automatic retries.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Parse template variables directly instead of generating from container names. Always create both SERVICE_URL and SERVICE_FQDN pairs together. Properly separate scheme handling (URL has scheme, FQDN doesn't). Add comprehensive test coverage.
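A rough sketch of the pairing rule, assuming variables of the form SERVICE_URL_<NAME> / SERVICE_FQDN_<NAME>; the helper name and values are illustrative:
```php
// Always emit both variables together: the URL keeps the scheme,
// the FQDN never carries one.
function makeServicePair(string $name, string $fqdn, string $scheme = 'https'): array
{
    $fqdn = preg_replace('#^https?://#', '', $fqdn); // strip any scheme from the FQDN

    return [
        "SERVICE_URL_{$name}" => "{$scheme}://{$fqdn}",
        "SERVICE_FQDN_{$name}" => $fqdn,
    ];
}
```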
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds comprehensive validation improvements and applies DRY principles to the handling of Coolify's custom Docker Compose extensions.
## Changes
### 1. Created Reusable stripCoolifyCustomFields() Function
- Added shared helper in bootstrap/helpers/docker.php
- Removes all Coolify custom fields (exclude_from_hc, content, isDirectory, is_directory)
- Handles both long syntax (arrays) and short syntax (strings) for volumes
- Well-documented with comprehensive docblock
- Follows DRY principle for consistent field stripping
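A minimal sketch of the helper's shape, assuming it operates on one parsed service definition array; the real implementation may differ:
```php
// bootstrap/helpers/docker.php (sketch)
function stripCoolifyCustomFields(array $service): array
{
    $customFields = ['exclude_from_hc', 'content', 'isDirectory', 'is_directory'];

    foreach ($customFields as $field) {
        unset($service[$field]);
    }

    // Long-syntax volumes are arrays and may carry custom fields;
    // short-syntax volumes are plain strings and are left untouched.
    foreach ($service['volumes'] ?? [] as $index => $volume) {
        if (is_array($volume)) {
            foreach ($customFields as $field) {
                unset($service['volumes'][$index][$field]);
            }
        }
    }

    return $service;
}
```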
### 2. Fixed Docker Compose Modal Validation
- Updated validateComposeFile() to use stripCoolifyCustomFields()
- Now removes ALL custom fields before Docker validation (previously only removed content)
- Fixes validation errors when using templates with custom fields (e.g., traccar.yaml)
- Users can now validate compose files with Coolify extensions in UI
### 3. Enhanced YAML Validation in CalculatesExcludedStatus
- Added proper exception handling with ParseException vs generic Exception
- Added structure validation (checks if parsed result and services are arrays)
- Comprehensive logging with context (error message, line number, snippet)
- Maintains safe fallback behavior (returns empty collection on error)
### 4. Added Integer Validation to ContainerStatusAggregator
- Validates maxRestartCount parameter in both aggregateFromStrings() and aggregateFromContainers()
- Corrects negative values to 0 with warning log
- Logs warnings for suspiciously high values (> 1000)
- Prevents logic errors in crash loop detection
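A sketch of the guard described above, assuming Laravel's Log facade and the thresholds listed; the method name is illustrative:
```php
use Illuminate\Support\Facades\Log;

// Called at the top of aggregateFromStrings() / aggregateFromContainers() (sketch).
private function normalizeMaxRestartCount(int $maxRestartCount): int
{
    if ($maxRestartCount < 0) {
        Log::warning('maxRestartCount is negative, correcting to 0', [
            'value' => $maxRestartCount,
        ]);

        return 0;
    }

    if ($maxRestartCount > 1000) {
        Log::warning('maxRestartCount is suspiciously high', [
            'value' => $maxRestartCount,
        ]);
    }

    return $maxRestartCount;
}
```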
### 5. Comprehensive Unit Tests
- tests/Unit/StripCoolifyCustomFieldsTest.php (NEW) - 9 tests, 43 assertions
- tests/Unit/ContainerStatusAggregatorTest.php - Added 6 tests for integer validation
- tests/Unit/ExcludeFromHealthCheckTest.php - Added 4 tests for YAML validation
- All tests passing with proper Log facade mocking
### 6. Documentation
- Added comprehensive Docker Compose extensions documentation to .ai/core/deployment-architecture.md
- Documents all custom fields: exclude_from_hc, content, isDirectory/is_directory
- Includes examples, use cases, implementation details, and test references
- Updated .ai/README.md with navigation links to new documentation
## Benefits
- Better UX: Users can validate compose files with custom fields
- Better Debugging: Comprehensive logging for errors
- Better Code Quality: DRY principle with reusable validation
- Better Reliability: Prevents logic errors from invalid parameters
- Better Maintainability: Easy to add new custom fields in future
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states.
- Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status.
- Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components.
- Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere.
- Updated the Sentinel version to 0.0.17 in versions.json.
Prevents removal and re-download of database images on every restart. Docker cleanup was removing Docker Hub images (postgres, mysql, redis, etc.) that lack the coolify.managed=true label, causing them to be immediately re-pulled. Restart now preserves images while stopping/starting containers.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes inconsistency where Service model used manual state machine logic while
all other components (Application, ComplexStatusCheck, GetContainersStatus)
use the centralized ContainerStatusAggregator service.
Changes:
- Refactored Service::aggregateResourceStatuses() to use ContainerStatusAggregator
- Removed ~60 lines of duplicated state machine logic
- Added comprehensive ServiceExcludedStatusTest with 24 test cases
- Fixed bugs in old logic where paused/starting containers were incorrectly
marked as unhealthy (should be unknown)
Benefits:
- Single source of truth for status aggregation across all models
- Leverages 42 existing ContainerStatusAggregator tests
- Consistent behavior between Service and Application/Database models
- Easier maintenance (state machine changes only in one place)
All tests pass (37 total):
- ServiceExcludedStatusTest: 24/24 passed
- AllExcludedContainersConsistencyTest: 13/13 passed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit addresses container status reporting issues and removes debug logging:
**Primary Fix:**
- Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data
- This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy"
- Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown'
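The change itself is small; a sketch of the new default, assuming the Sentinel payload is read with Laravel's data_get() (the field path is an assumption):
```php
// Before: a missing health_status fell back to 'unhealthy'.
// After: it falls back to 'unknown', matching GetContainersStatus.
$healthStatus = data_get($container, 'health_status') ?? 'unknown';
```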
**Service Multi-Container Aggregation:**
- Implemented service container status aggregation (same pattern as applications)
- Added serviceContainerStatuses collection to both Sentinel and SSH paths
- Services now aggregate status using priority: unhealthy > unknown > healthy
- Prevents race conditions where last-processed container would win
**Debug Logging Cleanup:**
- Removed all [STATUS-DEBUG] logging statements (25 total)
- Removed all ray() debugging calls (3 total)
- Removed proof_unknown_preserved and health_status_was_null debug fields
- Code is now production-ready
**Test Coverage:**
- Added 2 new tests for Sentinel default health status behavior
- Added 5 new tests for service aggregation in SSH path
- All 16 tests pass (66 assertions)
**Note:** The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added comprehensive logging to track why the applicationContainerStatuses
collection is empty in PushServerUpdateJob.
## Logging Added
### 1. Raw Sentinel Data (lines 113-118)
**Logs**: Complete container data received from Sentinel
**Purpose**: See exactly what Sentinel is sending
**Data**: Container count and full container array with all labels
### 2. Container Processing Loop (lines 157-163)
**Logs**: Every container as it's being processed
**Purpose**: Track which containers enter the processing loop
**Data**: Container name, status, all labels, coolify.managed flag
### 3. Skipped Containers - Not Managed (lines 165-171)
**Logs**: Containers without coolify.managed label
**Purpose**: Identify containers being filtered out early
**Data**: Container name
### 4. Successful Container Addition (lines 193-198)
**Logs**: When container is successfully added to applicationContainerStatuses
**Purpose**: Confirm containers ARE being processed
**Data**: Application ID, container name, container status
### 5. Missing com.docker.compose.service Label (lines 200-206)
**Logs**: Containers skipped due to missing com.docker.compose.service
**Purpose**: Identify the most likely root cause
**Data**: Container name, application ID, all labels
## Why This Matters
User reported applicationContainerStatuses is empty (`[]`) even though
Sentinel is pushing updates. This logging will reveal:
1. Is Sentinel sending containers at all?
2. Are containers filtered by coolify.managed check?
3. Is com.docker.compose.service label missing? (most likely)
4. What labels IS Sentinel actually sending?
## Expected Findings
Based on investigation, the issue is likely:
- Sentinel is NOT sending com.docker.compose.service in labels
- Or Sentinel uses a different label format/name
- Containers pass all other checks but fail on lines 190-206
## Next Steps
After logs appear, we'll see exactly which filter is blocking containers
and can fix the root cause (likely need to extract com.docker.compose.service
from container name or use a different label source).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added detailed debug logging to all status update paths to help
diagnose why "unhealthy" status appears in the UI.
## Logging Added
### 1. PushServerUpdateJob (Sentinel updates)
**Location**: Lines 303-315
**Logs**: Status changes from Sentinel push updates
**Data tracked**:
- Old vs new status
- Container statuses that led to aggregation
- Status flags (hasRunning, hasUnhealthy, hasUnknown)
### 2. GetContainersStatus (SSH updates)
**Location**: Lines 441-449, 346-354, 358-365
**Logs**: Status changes from SSH-based checks
**Scenarios**:
- Normal status aggregation
- Recently restarted containers (kept as degraded)
- Applications not running (set to exited)
**Data tracked**:
- Old vs new status
- Container statuses
- Restart count and timing
- Whether containers exist
### 3. Application Model Status Accessor
**Location**: Lines 706-712, 726-732
**Logs**: When status is set without explicit health information
**Issue**: Highlights cases where health defaults to "unhealthy"
**Data tracked**:
- Raw value passed to setter
- Final result after default applied
## How to Use
### Enable Debug Logging
Edit `.env` or `config/logging.php` to set log level to debug:
```
LOG_LEVEL=debug
```
### Monitor Logs
```bash
tail -f storage/logs/laravel.log | grep STATUS-DEBUG
```
### Log Format
All logs use `[STATUS-DEBUG]` prefix for easy filtering:
```
[2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change
{
"source": "PushServerUpdateJob",
"app_id": 123,
"app_name": "my-app",
"old_status": "running:unknown",
"new_status": "running:healthy",
"container_statuses": [...],
"flags": {...}
}
```
## What to Look For
1. **Default to unhealthy**: Check Application model accessor logs
2. **Status flipping**: Compare timestamps between Sentinel and SSH updates
3. **Incorrect aggregation**: Check flags and container_statuses
4. **Stale database values**: Check if old_status persists across multiple logs
## Next Steps
After gathering logs, we can:
1. Identify the exact source of "unhealthy" status
2. Determine if it's a default issue, aggregation bug, or timing problem
3. Apply targeted fix based on evidence
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.
## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".
When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"
## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:
1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy
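A minimal sketch of the three-way priority, assuming container statuses are strings such as "running (unhealthy)":
```php
$hasUnhealthy = false;
$hasUnknown = false;

foreach ($containerStatuses as $status) {
    $hasUnhealthy = $hasUnhealthy || str_contains($status, 'unhealthy');
    $hasUnknown = $hasUnknown || str_contains($status, 'unknown');
}

// Priority: unhealthy > unknown > healthy.
$health = $hasUnhealthy ? 'unhealthy' : ($hasUnknown ? 'unknown' : 'healthy');
```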
This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic
## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation
## Testing
All 32 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes container health status aggregation to correctly handle
unknown health states and edge case container states across all resource types.
Changes:
1. **Preserve Unknown Health State**
- Add three-way priority: unhealthy > unknown > healthy
- Detect containers without healthchecks (null health) as unknown
- Apply across GetContainersStatus, ComplexStatusCheck, and Service models
2. **Handle Edge Case Container States**
- Add support for: created, starting, paused, dead, removing
- Map to appropriate statuses: starting (unknown), paused (unknown), degraded (unhealthy)
- Prevent containers in transitional states from showing incorrect status
3. **Add :excluded Suffix for Excluded Containers**
- Parse exclude_from_hc flag from docker-compose YAML
- Append :excluded suffix to individual container statuses
- Skip :excluded containers in non-excluded aggregation sections
- Strip :excluded suffix in excluded aggregation sections
- Makes it clear in UI which containers are excluded from monitoring
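A sketch of the suffix convention, assuming string statuses in a Laravel collection; the variable names are illustrative:
```php
// Tag an excluded container's individual status.
$status = $excludedFromHealthCheck ? $status.':excluded' : $status;

// Non-excluded aggregation: skip tagged containers entirely.
$monitored = $containerStatuses->reject(
    fn (string $s) => str_ends_with($s, ':excluded')
);

// Excluded aggregation: strip the suffix before aggregating.
$excluded = $containerStatuses
    ->filter(fn (string $s) => str_ends_with($s, ':excluded'))
    ->map(fn (string $s) => substr($s, 0, -strlen(':excluded')));
```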
Files Modified:
- app/Actions/Docker/GetContainersStatus.php
- app/Actions/Shared/ComplexStatusCheck.php
- app/Models/Service.php
- tests/Unit/ContainerHealthStatusTest.php
Tests: 18 passed (82 assertions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When all containers are excluded from health checks, display their actual status
with :excluded suffix instead of misleading hardcoded statuses. This prevents
broken UI state with incorrect action buttons and provides clarity that monitoring
is disabled.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Adds a new EnvVarInput component that provides autocomplete suggestions for shared environment variables from team, project, and environment scopes. Users can reference variables using the `{{` syntax.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When all services in a Docker Compose file have `exclude_from_hc: true`,
the status aggregation logic was returning invalid states causing broken UI.
**Problems fixed:**
- ComplexStatusCheck returned 'running:healthy' for apps with no monitored containers
- Service model returned ':' (null status) when all services excluded
- UI showed active start/stop buttons for non-running services
**Changes:**
- ComplexStatusCheck: Return 'exited:healthy' when relevantContainerCount is 0
- Service model: Return 'exited:healthy' when both status and health are null
- Added comprehensive unit tests to verify the fixes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix double-slash issue in Docker Compose preview paths when baseDirectory is "/"
- Normalize baseDirectory using rtrim() to prevent path concatenation issues (see the sketch below)
- Replace hardcoded '/artifacts/build-time.env' with ApplicationDeploymentJob::BUILD_TIME_ENV_PATH
- Make BUILD_TIME_ENV_PATH constant public for reusability
- Add comprehensive unit tests (11 test cases, 25 assertions)
Fixes preview path generation in:
- getDockerComposeBuildCommandPreviewProperty()
- getDockerComposeStartCommandPreviewProperty()
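The normalization itself is a one-liner; a sketch assuming paths are joined with a literal slash:
```php
$base = rtrim($baseDirectory, '/');   // "/" becomes "", "/app/" becomes "/app"
$path = $base.'/docker-compose.yml';  // "/docker-compose.yml", never "//docker-compose.yml"
```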
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When using a custom Docker Compose build command, environment variables
were being lost because the --env-file flag was not included. This fix
automatically injects the --env-file flag to ensure build-time environment
variables are available during custom builds.
Changes:
- Auto-inject --env-file /artifacts/build-time.env after docker compose
- Respect user-provided --env-file flags (no duplication)
- Append build arguments when not using build secrets
- Update UI helper text to inform users about automatic env injection
- Add comprehensive unit tests (7 test cases, all passing)
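A hedged sketch of the injection logic; the property name and matching details are assumptions:
```php
$command = $application->custom_docker_compose_build_command; // hypothetical property

// Respect a user-provided --env-file; otherwise inject ours after "docker compose".
if (! str_contains($command, '--env-file')) {
    $command = preg_replace(
        '/\bdocker compose\b/',
        'docker compose --env-file /artifacts/build-time.env',
        $command,
        1
    );
}
```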
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Move notification logic from NotifyOutdatedTraefikServersJob into CheckTraefikVersionForServerJob to send immediate notifications when outdated Traefik is detected. This is more suitable for cloud environments with thousands of servers.
Changes:
- CheckTraefikVersionForServerJob now sends notifications immediately after detecting outdated Traefik
- Remove NotifyOutdatedTraefikServersJob (no longer needed)
- Remove delay calculation logic from CheckTraefikVersionJob
- Update tests to reflect new immediate notification pattern
Trade-offs:
- Pro: Faster notifications (immediate alerts)
- Pro: Simpler codebase (removed complex delay calculation)
- Pro: Better scalability for thousands of servers
- Con: Teams may receive multiple notifications if they have many outdated servers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When a user restarts the proxy from the Navbar UI component, the system now automatically dispatches a version check job immediately after the restart completes. This provides immediate feedback about available Traefik updates without waiting for the weekly scheduled check.
Changes:
- Import CheckTraefikVersionForServerJob in Navbar component
- After successful proxy restart, dispatch version check for Traefik servers
- Version check only runs for servers using Traefik proxy
This ensures users get up-to-date version information right after restarting their proxy infrastructure.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit introduces several improvements to the Traefik version tracking
feature and proxy configuration UI:
## Caching Improvements
1. **New centralized helper functions** (bootstrap/helpers/versions.php):
- `get_versions_data()`: Redis-cached access to versions.json (1 hour TTL; see the sketch after this list)
- `get_traefik_versions()`: Extract Traefik versions from cached data
- `invalidate_versions_cache()`: Clear cache when file is updated
2. **Performance optimization**:
- Single Redis cache key: `coolify:versions:all`
- Eliminates 2-4 file reads per page load
- 95-97.5% reduction in disk I/O time
- Shared cache across all servers in distributed setup
3. **Updated all consumers to use cached helpers**:
- CheckTraefikVersionJob: Use get_traefik_versions()
- Server/Proxy: Two-level caching (Redis + in-memory per-request)
- CheckForUpdatesJob: Auto-invalidate cache after updating file
- bootstrap/helpers/shared.php: Use cached data for Coolify version
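A minimal sketch of the cached accessor referenced in the list above (file location and return shape are assumptions):
```php
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\File;

// bootstrap/helpers/versions.php (sketch)
function get_versions_data(): array
{
    return Cache::remember('coolify:versions:all', 3600, function () {
        return json_decode(File::get(base_path('versions.json')), true) ?? [];
    });
}

function invalidate_versions_cache(): void
{
    Cache::forget('coolify:versions:all');
}
```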
## UI/UX Improvements
1. **Navbar warning indicator**:
- Added yellow warning triangle icon next to "Proxy" menu item
- Appears when server has outdated Traefik version
- Uses existing traefik_outdated_info data for instant checks
- Provides at-a-glance visibility of version issues
2. **Proxy sidebar persistence**:
- Fixed sidebar disappearing when clicking "Switch Proxy"
- Configuration link now always visible (needed for proxy selection)
- Dynamic Configurations and Logs only show when proxy is configured
- Better navigation context during proxy switching workflow
## Code Quality
- Added comprehensive PHPDoc for Server::$traefik_outdated_info property
- Improved code organization with centralized helper approach
- All changes formatted with Laravel Pint
- Maintains backward compatibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Store both patch update and newer minor version information simultaneously
- Display patch update availability alongside minor version upgrades in notifications
- Add newer_branch_target and newer_branch_latest fields to traefik_outdated_info
- Update all notification channels (Discord, Telegram, Slack, Pushover, Email, Webhook)
- Show the minor version (e.g., v3.6) as the upgrade target instead of a specific patch version
- Enhance UI callouts with clearer messaging about available upgrades
- Remove verbose logging in favor of cleaner code structure
- Handle edge case where SSH command returns empty response
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Addresses critical performance issues identified in code review by refactoring the monolithic CheckTraefikVersionJob into a distributed architecture with parallel processing.
Changes:
- Split version checking into CheckTraefikVersionForServerJob for parallel execution
- Extract notification logic into NotifyOutdatedTraefikServersJob
- Dispatch individual server checks concurrently to handle thousands of servers
- Add comprehensive unit tests for the new job architecture
- Update feature tests to cover the refactored workflow
Performance improvements:
- Sequential SSH calls replaced with parallel queue jobs
- Scales efficiently for large installations with thousands of servers
- Reduces job execution time from hours to minutes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add wait loops to ensure containers are fully removed before restarting.
This fixes race conditions where docker compose would fail because an
existing container was still being cleaned up.
Changes:
- StartProxy: Add explicit stop, wait loop before docker compose up
- StopProxy: Add wait loop after container removal
- Both actions now poll up to 10 seconds for complete removal
- Add error suppression to handle non-existent containers gracefully
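A sketch of the kind of shell snippet the actions can run over SSH; the container name and exact commands are assumptions:
```php
// Remove the container (suppressing errors if it does not exist),
// then poll up to 10 seconds until it is fully gone.
$commands = [
    'docker rm -f coolify-proxy 2>/dev/null || true',
    'for i in $(seq 1 10); do '
        .'docker inspect coolify-proxy >/dev/null 2>&1 || break; '
        .'sleep 1; '
    .'done',
];
```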
Tests:
- Add StartProxyTest.php with 3 tests for cleanup logic
- Add StopProxyTest.php with 4 tests for stop behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes a critical query inefficiency in CheckTraefikVersionJob
that was loading ALL proxy servers into memory and then filtering in PHP,
causing potential OOM errors with thousands of servers.
Changes:
- Added scopeWhereProxyType() query scope to Server model for
database-level filtering using JSON column arrow notation (see the sketch below)
- Updated CheckTraefikVersionJob to use new scope instead of
collection filter, moving proxy type filtering into the SQL query
- Added comprehensive unit tests for the new query scope
Performance impact:
- Before: SELECT * FROM servers WHERE proxy IS NOT NULL (all servers)
- After: SELECT * FROM servers WHERE proxy->>'type' = 'TRAEFIK' (filtered)
- Eliminates memory overhead of loading non-Traefik servers
- Critical for cloud instances with thousands of connected servers
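A minimal sketch of the scope; Laravel's arrow syntax compiles to the `proxy->>'type'` comparison shown above on PostgreSQL:
```php
use Illuminate\Database\Eloquent\Builder;

// In app/Models/Server.php (sketch)
public function scopeWhereProxyType(Builder $query, string $type): Builder
{
    return $query->where('proxy->type', $type);
}
```
The job can then filter in SQL, e.g. `Server::whereProxyType('TRAEFIK')->get()`, so non-Traefik servers never leave the database.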
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add automated Traefik version checking job running weekly on Sundays (see the sketch after this list)
- Implement version detection from running containers and comparison with versions.json
- Add notifications across all channels (Email, Discord, Slack, Telegram, Pushover, Webhook) for outdated versions
- Create dismissible callout component with localStorage persistence
- Display cross-branch upgrade warnings (e.g., v3.5 -> v3.6) with changelog links
- Show patch update notifications within same branch
- Add warning icon that appears when callouts are dismissed
- Prevent duplicate notifications during proxy restart by adding a restarting parameter
- Fix notification spam with transition-based logic for status changes
- Enable system email settings by default in development mode
- Track last saved/applied proxy settings to detect configuration drift
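A sketch of the scheduler entry implied by the first bullet; the time and exact scheduler syntax in the codebase may differ:
```php
// In the console schedule definition (sketch): run weekly on Sundays.
$schedule->job(new CheckTraefikVersionJob)->weeklyOn(0, '00:00');
```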