Extended the maximum allowed timeout for scheduled tasks from 3600 to 36000 seconds (10 hours). Also passes the configured timeout to instant_remote_process() so the SSH command respects the timeout setting.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Allows user-configured backup timeouts > 3600 to be respected. Previously, the SSH process used a hardcoded 3600 second timeout regardless of the job timeout setting. Now the timeout is passed through to instant_remote_process() for all backup operations.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When Sentinel is enabled and in sync, ServerStorageCheckJob was being
dispatched from two locations causing unnecessary duplication:
1. PushServerUpdateJob (every ~30s with real-time filesystem data)
2. ServerManagerJob (scheduled cron check via SSH)
This commit modifies ServerManagerJob to only dispatch ServerStorageCheckJob
when Sentinel is out of sync or disabled. When Sentinel is active and in sync,
PushServerUpdateJob provides real-time storage data, making the scheduled SSH
check redundant.
Benefits:
- Eliminates duplicate storage checks when Sentinel is working
- Reduces unnecessary SSH overhead
- Storage checks still run as fallback when Sentinel fails
- Maintains scheduled checks for servers without Sentinel
Updated tests to reflect new behavior:
- Storage check NOT dispatched when Sentinel is in sync
- Storage check dispatched when Sentinel is out of sync or disabled
- All timezone and frequency tests updated accordingly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Dispatch ProxyStatusChangedUI event after version check completes so the UI updates in real-time without requiring page refresh.
Changes:
- Add ProxyStatusChangedUI::dispatch() at all exit points in CheckTraefikVersionForServerJob
- Ensures UI refreshes automatically via WebSocket when version check completes
- Works for all scenarios: version detected, using latest tag, outdated version, up-to-date
User experience:
- User restarts proxy
- Warning clears automatically in real-time (no refresh needed)
- Leverages existing WebSocket infrastructure
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add preserveRestarting parameter to ContainerStatusAggregator to allow applications
and service sub-resources to display "Restarting" status instead of being marked as
"Degraded". This gives better visibility into container restart behavior.
- Update ContainerStatusAggregator to accept preserveRestarting parameter (defaults to false)
- Update GetContainersStatus to use preserveRestarting: true for applications and service sub-resources
- Update PushServerUpdateJob to use preserveRestarting: true for applications and service sub-resources
- Add comprehensive documentation explaining the parameter behavior and when to use it
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes#7439 where successful deployments were being marked as FAILED due to exceptions during old container cleanup.
Root cause: Commit 97550f406 wrapped stop_running_container() in try-catch that re-throws ALL exceptions as DeploymentException. When old containers are already removed (a common scenario), the "No such container" error propagates and marks successful deployments as failed.
Solution: Check if deployment has already succeeded (newVersionIsHealthy || force) before re-throwing exceptions from cleanup operations. Cleanup failures are logged but don't fail the deployment.
- Add conditional handling in stop_running_container() catch block
- Log cleanup warnings with hidden: true to avoid UI clutter
- Only re-throw exceptions if deployment hasn't succeeded yet
- Preserves backward compatibility and expected behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Pass the server timezone parameter to shouldRunNow() call at line 127,
ensuring ServerCheckJob dispatch respects the server's local timezone
instead of falling back to the instance default.
This aligns the behavior with other scheduled tasks in the same method:
- ServerStorageCheckJob (line 137)
- ServerPatchCheckJob (line 144)
- Sentinel restart (line 152)
All scheduled tasks in processServerTasks() now consistently use the
server's configured timezone for cron evaluation.
Added unit test to verify timezone-aware cron schedule evaluation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed a critical bug where $this->executionTime was being mutated during
the server processing loop, causing incorrect scheduling calculations for
subsequent servers.
The issue occurred at line 123 where subSeconds() was called directly on
the shared executionTime instance. This caused the baseline time to shift
by waitTime seconds with each server iteration, resulting in compounding
scheduling errors (e.g., 1680 seconds drift over 5 servers).
Changed:
- app/Jobs/ServerManagerJob.php:123
Added .copy() before .subSeconds() to prevent mutation
Added comprehensive unit tests that verify:
- Immutability when using .copy()
- Demonstration of the bug without .copy()
- Correct behavior across multiple iterations
This follows the existing pattern in shouldRunNow() (line 167) and aligns
with other jobs in the codebase.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Server disk usage checks now run on their configured schedule regardless of Sentinel status, eliminating monitoring blind spots when Sentinel is offline, out of sync, or disabled. Storage checks now respect server timezone settings, consistent with patch checks.
Changes:
- Moved server timezone calculation to top of processServerTasks()
- Extracted ServerStorageCheckJob dispatch from Sentinel conditional
- Fixed default frequency to '0 23 * * *' (11 PM daily)
- Added timezone parameter to storage check scheduling
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The regex pattern in injectDockerComposeBuildArgs() was too restrictive
and failed to match `docker compose build servicename` commands. Changed
the lookahead from `(?=\s+(?:--|-)|\s+(?:&&|\|\||;|\|)|$)` to the
simpler `(?=\s|$)` to allow any content after the build command,
including service names with hyphens/underscores and flags.
Also improved the ApplicationDeploymentJob to use the new helper function
and added comprehensive test coverage for service-specific builds.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Changes
- **CheckForUpdatesJob**: Add triple version comparison (CDN vs cache vs running)
- Never allows version downgrade from currently running version
- Uses data_set() for safer nested array mutation
- Prevents incorrect new_version_available flag setting
- **UpdateCoolify**: Add cache validation before fallback
- Validates cache against running version on CDN failure
- Throws exception if cache is corrupted/older than running
- Applies to both manual and automated updates
- **Tests**: Add comprehensive test coverage
- tests/Unit/CheckForUpdatesJobTest.php (5 tests)
- tests/Unit/UpdateCoolifyTest.php (3 tests)
## Impact
- Prevents all downgrade scenarios (CDN rollback, corrupted cache, etc.)
- Maintains backward compatibility
- Provides clear logging for debugging
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Refactors generate_buildtime_environment_variables() to use an associative
array (dictionary) approach instead of sequential push() calls. This prevents
duplicate variable declarations in the buildtime.env file.
**Problem:**
After adding nixpacks plan variables to buildtime.env, the same variable
could appear twice in the file:
- Once from nixpacks plan (e.g., NIXPACKS_NODE_VERSION='22')
- Once from user-defined variables (e.g., NIXPACKS_NODE_VERSION="22")
This caused shell errors and undefined behavior during Docker builds.
**Root Cause:**
The push() method adds items sequentially without checking for duplicate
keys. When a variable existed in both nixpacks plan AND user-defined vars,
both would be written to the file.
**Solution:**
- Use associative array ($envs_dict) for automatic deduplication
- Establish clear override precedence:
1. Nixpacks plan variables (lowest priority)
2. COOLIFY_* variables (medium priority)
3. SERVICE_* variables (medium priority)
4. User-defined variables (highest priority - can override everything)
- Convert to collection format at the end
- Add debug logging when user variables override plan variables
**Benefits:**
- Automatic deduplication (array keys are unique by nature)
- User variables properly override nixpacks plan values
- Clear, explicit precedence order
- No breaking changes to existing functionality
Fixes#7114🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix sudo prefix bug: Use word boundary matching to prevent 'do' keyword from matching 'docker' commands
- Add ensureProxyNetworksExist() helper to create networks before docker compose up
- Ensure networks exist synchronously before dispatching async proxy startup to prevent race conditions
- Update comprehensive unit tests for sudo parsing (50 tests passing)
This resolves issues where Docker commands failed to execute with sudo on non-root servers and where proxy networks were not created before the proxy container started.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add two new application settings to control Docker build cache invalidation:
- inject_build_args_to_dockerfile (default: true) - Skip Dockerfile ARG injection
- include_source_commit_in_build (default: false) - Exclude SOURCE_COMMIT from build context
These toggles let users preserve Docker cache when SOURCE_COMMIT or custom ARGs change frequently. Development-only logging shows which ARGs are being injected for debugging.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove COOLIFY_CONTAINER_NAME from build-time ARGs (timestamp-based, breaks cache)
- Use APP_KEY instead of random_bytes for COOLIFY_BUILD_SECRETS_HASH (deterministic)
- Add forBuildTime parameter to generate_coolify_env_variables() to control injection
- Keep COOLIFY_CONTAINER_NAME available at runtime for container identification
- Fix misleading log message about .env file purpose
Fixes#7040🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Exited containers don't run health checks, so showing "(unhealthy)" is
misleading. This fix ensures exited status displays without health
suffixes across all monitoring systems (SSH, Sentinel, services, etc.)
and at the UI layer for backward compatibility with existing data.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states.
- Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status.
- Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components.
- Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere.
- Updated version number for sentinel to 0.0.17 in versions.json.
This commit addresses container status reporting issues and removes debug logging:
**Primary Fix:**
- Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data
- This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy"
- Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown'
**Service Multi-Container Aggregation:**
- Implemented service container status aggregation (same pattern as applications)
- Added serviceContainerStatuses collection to both Sentinel and SSH paths
- Services now aggregate status using priority: unhealthy > unknown > healthy
- Prevents race conditions where last-processed container would win
**Debug Logging Cleanup:**
- Removed all [STATUS-DEBUG] logging statements (25 total)
- Removed all ray() debugging calls (3 total)
- Removed proof_unknown_preserved and health_status_was_null debug fields
- Code is now production-ready
**Test Coverage:**
- Added 2 new tests for Sentinel default health status behavior
- Added 5 new tests for service aggregation in SSH path
- All 16 tests pass (66 assertions)
**Note:** The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added comprehensive logging to track why applicationContainerStatuses
collection is empty in PushServerUpdateJob.
## Logging Added
### 1. Raw Sentinel Data (line 113-118)
**Logs**: Complete container data received from Sentinel
**Purpose**: See exactly what Sentinel is sending
**Data**: Container count and full container array with all labels
### 2. Container Processing Loop (line 157-163)
**Logs**: Every container as it's being processed
**Purpose**: Track which containers enter the processing loop
**Data**: Container name, status, all labels, coolify.managed flag
### 3. Skipped Containers - Not Managed (line 165-171)
**Logs**: Containers without coolify.managed label
**Purpose**: Identify containers being filtered out early
**Data**: Container name
### 4. Successful Container Addition (line 193-198)
**Logs**: When container is successfully added to applicationContainerStatuses
**Purpose**: Confirm containers ARE being processed
**Data**: Application ID, container name, container status
### 5. Missing com.docker.compose.service Label (line 200-206)
**Logs**: Containers skipped due to missing com.docker.compose.service
**Purpose**: Identify the most likely root cause
**Data**: Container name, application ID, all labels
## Why This Matters
User reported applicationContainerStatuses is empty (`[]`) even though
Sentinel is pushing updates. This logging will reveal:
1. Is Sentinel sending containers at all?
2. Are containers filtered by coolify.managed check?
3. Is com.docker.compose.service label missing? (most likely)
4. What labels IS Sentinel actually sending?
## Expected Findings
Based on investigation, the issue is likely:
- Sentinel is NOT sending com.docker.compose.service in labels
- Or Sentinel uses a different label format/name
- Containers pass all other checks but fail on line 190-206
## Next Steps
After logs appear, we'll see exactly which filter is blocking containers
and can fix the root cause (likely need to extract com.docker.compose.service
from container name or use a different label source).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added detailed debug logging to all status update paths to help
diagnose why "unhealthy" status appears in the UI.
## Logging Added
### 1. PushServerUpdateJob (Sentinel updates)
**Location**: Lines 303-315
**Logs**: Status changes from Sentinel push updates
**Data tracked**:
- Old vs new status
- Container statuses that led to aggregation
- Status flags (hasRunning, hasUnhealthy, hasUnknown)
### 2. GetContainersStatus (SSH updates)
**Location**: Lines 441-449, 346-354, 358-365
**Logs**: Status changes from SSH-based checks
**Scenarios**:
- Normal status aggregation
- Recently restarted containers (kept as degraded)
- Applications not running (set to exited)
**Data tracked**:
- Old vs new status
- Container statuses
- Restart count and timing
- Whether containers exist
### 3. Application Model Status Accessor
**Location**: Lines 706-712, 726-732
**Logs**: When status is set without explicit health information
**Issue**: Highlights cases where health defaults to "unhealthy"
**Data tracked**:
- Raw value passed to setter
- Final result after default applied
## How to Use
### Enable Debug Logging
Edit `.env` or `config/logging.php` to set log level to debug:
```
LOG_LEVEL=debug
```
### Monitor Logs
```bash
tail -f storage/logs/laravel.log | grep STATUS-DEBUG
```
### Log Format
All logs use `[STATUS-DEBUG]` prefix for easy filtering:
```
[2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change
{
"source": "PushServerUpdateJob",
"app_id": 123,
"app_name": "my-app",
"old_status": "running:unknown",
"new_status": "running:healthy",
"container_statuses": [...],
"flags": {...}
}
```
## What to Look For
1. **Default to unhealthy**: Check Application model accessor logs
2. **Status flipping**: Compare timestamps between Sentinel and SSH updates
3. **Incorrect aggregation**: Check flags and container_statuses
4. **Stale database values**: Check if old_status persists across multiple logs
## Next Steps
After gathering logs, we can:
1. Identify the exact source of "unhealthy" status
2. Determine if it's a default issue, aggregation bug, or timing problem
3. Apply targeted fix based on evidence
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.
## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".
When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"
## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:
1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy
This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic
## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation
## Testing
All 31 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix double-slash issue in Docker Compose preview paths when baseDirectory is "/"
- Normalize baseDirectory using rtrim() to prevent path concatenation issues
- Replace hardcoded '/artifacts/build-time.env' with ApplicationDeploymentJob::BUILD_TIME_ENV_PATH
- Make BUILD_TIME_ENV_PATH constant public for reusability
- Add comprehensive unit tests (11 test cases, 25 assertions)
Fixes preview path generation in:
- getDockerComposeBuildCommandPreviewProperty()
- getDockerComposeStartCommandPreviewProperty()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When using a custom Docker Compose build command, environment variables
were being lost because the --env-file flag was not included. This fix
automatically injects the --env-file flag to ensure build-time environment
variables are available during custom builds.
Changes:
- Auto-inject --env-file /artifacts/build-time.env after docker compose
- Respect user-provided --env-file flags (no duplication)
- Append build arguments when not using build secrets
- Update UI helper text to inform users about automatic env injection
- Add comprehensive unit tests (7 test cases, all passing)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Move notification logic from NotifyOutdatedTraefikServersJob into CheckTraefikVersionForServerJob to send immediate notifications when outdated Traefik is detected. This is more suitable for cloud environments with thousands of servers.
Changes:
- CheckTraefikVersionForServerJob now sends notifications immediately after detecting outdated Traefik
- Remove NotifyOutdatedTraefikServersJob (no longer needed)
- Remove delay calculation logic from CheckTraefikVersionJob
- Update tests to reflect new immediate notification pattern
Trade-offs:
- Pro: Faster notifications (immediate alerts)
- Pro: Simpler codebase (removed complex delay calculation)
- Pro: Better scalability for thousands of servers
- Con: Teams may receive multiple notifications if they have many outdated servers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit introduces several improvements to the Traefik version tracking
feature and proxy configuration UI:
## Caching Improvements
1. **New centralized helper functions** (bootstrap/helpers/versions.php):
- `get_versions_data()`: Redis-cached access to versions.json (1 hour TTL)
- `get_traefik_versions()`: Extract Traefik versions from cached data
- `invalidate_versions_cache()`: Clear cache when file is updated
2. **Performance optimization**:
- Single Redis cache key: `coolify:versions:all`
- Eliminates 2-4 file reads per page load
- 95-97.5% reduction in disk I/O time
- Shared cache across all servers in distributed setup
3. **Updated all consumers to use cached helpers**:
- CheckTraefikVersionJob: Use get_traefik_versions()
- Server/Proxy: Two-level caching (Redis + in-memory per-request)
- CheckForUpdatesJob: Auto-invalidate cache after updating file
- bootstrap/helpers/shared.php: Use cached data for Coolify version
## UI/UX Improvements
1. **Navbar warning indicator**:
- Added yellow warning triangle icon next to "Proxy" menu item
- Appears when server has outdated Traefik version
- Uses existing traefik_outdated_info data for instant checks
- Provides at-a-glance visibility of version issues
2. **Proxy sidebar persistence**:
- Fixed sidebar disappearing when clicking "Switch Proxy"
- Configuration link now always visible (needed for proxy selection)
- Dynamic Configurations and Logs only show when proxy is configured
- Better navigation context during proxy switching workflow
## Code Quality
- Added comprehensive PHPDoc for Server::$traefik_outdated_info property
- Improved code organization with centralized helper approach
- All changes formatted with Laravel Pint
- Maintains backward compatibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Store both patch update and newer minor version information simultaneously
- Display patch update availability alongside minor version upgrades in notifications
- Add newer_branch_target and newer_branch_latest fields to traefik_outdated_info
- Update all notification channels (Discord, Telegram, Slack, Pushover, Email, Webhook)
- Show minor version in format (e.g., v3.6) for upgrade targets instead of patch version
- Enhance UI callouts with clearer messaging about available upgrades
- Remove verbose logging in favor of cleaner code structure
- Handle edge case where SSH command returns empty response
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Addresses critical performance issues identified in code review by refactoring the monolithic CheckTraefikVersionJob into a distributed architecture with parallel processing.
Changes:
- Split version checking into CheckTraefikVersionForServerJob for parallel execution
- Extract notification logic into NotifyOutdatedTraefikServersJob
- Dispatch individual server checks concurrently to handle thousands of servers
- Add comprehensive unit tests for the new job architecture
- Update feature tests to cover the refactored workflow
Performance improvements:
- Sequential SSH calls replaced with parallel queue jobs
- Scales efficiently for large installations with thousands of servers
- Reduces job execution time from hours to minutes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes a critical N+1 query issue in CheckTraefikVersionJob
that was loading ALL proxy servers into memory then filtering in PHP,
causing potential OOM errors with thousands of servers.
Changes:
- Added scopeWhereProxyType() query scope to Server model for
database-level filtering using JSON column arrow notation
- Updated CheckTraefikVersionJob to use new scope instead of
collection filter, moving proxy type filtering into the SQL query
- Added comprehensive unit tests for the new query scope
Performance impact:
- Before: SELECT * FROM servers WHERE proxy IS NOT NULL (all servers)
- After: SELECT * FROM servers WHERE proxy->>'type' = 'TRAEFIK' (filtered)
- Eliminates memory overhead of loading non-Traefik servers
- Critical for cloud instances with thousands of connected servers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>