Prevent double-escaping of COMPOSER_AUTH and other JSON environment variables
by detecting valid JSON objects/arrays in realValue() and skipping quote
escaping entirely. This fixes broken JSON values passed to runtime services
while maintaining proper escaping for non-JSON values.
- Add JSON detection before escaping logic in EnvironmentVariable::realValue()
- JSON objects/arrays pass through unmodified, avoiding quote corruption
- Add comprehensive test coverage for JSON vs non-JSON escaping behavior
Wraps inline chart initialization scripts in IIFEs to create local scope for variables. This prevents "Identifier has already been declared" errors when Livewire's SPA navigation re-executes scripts, allowing smooth navigation between metrics pages without page refresh.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Wrap return values in collect() to maintain Collection compatibility
- Add comment explaining threshold <= 2 prevents division by zero
- Refactor tests to use actual Server model method via reflection
- Use seeded mt_rand() for reproducible test results
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement the Largest-Triangle-Three-Buckets (LTTB) algorithm to downsample
metrics data for large time intervals (30 days generates 260K-500K+ points).
Reduces rendered points to ~1000 while preserving visual accuracy of peaks
and valleys. Fixes unresponsive page when selecting 30-day metrics interval.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit introduces advanced environment variable handling capabilities including:
- Nested environment variable resolution with circular dependency detection
- Extraction of hardcoded environment variables from docker-compose.yml
- New ShowHardcoded Livewire component for displaying detected variables
- Enhanced UI for better environment variable management
The changes improve the user experience by automatically detecting and displaying
environment variables that are hardcoded in docker-compose files, allowing users
to override them if needed. The nested variable resolution ensures complex variable
dependencies are properly handled.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add comment field to EnvironmentVariable model and database
- Update parseEnvFormatToArray to extract inline comments from env files
- Update Livewire components to handle comment field
- Add UI for displaying and editing comments
- Add tests for comment parsing functionality
Add missing edge case test for root path (/) and expand test coverage
for preview FQDN generation. Tests verify that ports and paths are
correctly preserved in preview URLs while excluding root paths.
Fixes#2184
Consolidate all PII/secret sanitization into remove_iip() to protect real-time logs in addition to exported logs. Add detection for GitHub tokens (ghp_, gho_, ghu_, ghs_, ghr_), GitLab tokens (glpat-, glcbt-, glrt-), AWS credentials (AKIA/ABIA/ACCA/ASIA access keys and secret keys), and generic URL passwords for FTP, SSH, AMQP, LDAP, and S3 protocols.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add a copy button to individual container logs that strips sensitive
data before copying to clipboard. Includes sanitization for emails,
database URLs with passwords, JWT tokens, API keys, private key blocks,
and git access tokens.
Allow manually-added servers to be linked to Hetzner Cloud instances by
matching IP address. Once linked, servers gain power controls and status
monitoring.
Changes:
- Add getServers() and findServerByIp() methods to HetznerService
- Add Hetzner linking UI section to Server General page
- Add unit tests for new HetznerService methods
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fixed isReadOnlyVolume() to detect both short-form (:ro) and long-form (read_only: true) Docker Compose volume syntax
- Fixed path matching to use mount_path only (fs_path is transformed during parsing from ./file to absolute path)
- Added "Load from server" button for read-only volumes to allow users to refresh content
- Changed loadStorageOnServer() authorization from 'update' to 'view' since loading is a read operation
- Added helper text to Content field warning users that content may be outdated
- Applied fixes to both LocalFileVolume and LocalPersistentVolume models
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Current version of infrastructure images (coolify-helper, coolify-realtime) are now protected from deletion during docker cleanup, regardless of which registry they're pulled from (ghcr.io, docker.io, or Docker Hub implicit). Old versions continue to be cleaned up as intended.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Adds a new Laravel validation rule to prevent path traversal, hidden files, and invalid filenames in the dynamic proxy configuration feature. Validates filenames to ensure they contain only safe characters, don't exceed filesystem limits, and don't use reserved names.
- New Rule: ValidProxyConfigFilename with comprehensive validation
- Updated: NewDynamicConfiguration to use the new rule
- Added: 13 unit tests covering all validation scenarios
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The {{port}} template variable was undocumented and caused a double port bug
when used in preview URL templates. Since ports are always appended to the final
URL anyway, we remove {{port}} substitution entirely and ensure consistent port
handling across ApplicationPreview, PreviewsCompose, and the applicationParser helper.
Also fix PreviewsCompose.php which wasn't preserving ports at all, and improve
the Blade template formatting in previews-compose.blade.php.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Allows API consumers to control domain auto-generation behavior. When autogenerate_domain is true (default) and no custom domains are provided, the system auto-generates a domain using the server's wildcard domain or sslip.io fallback.
- Add autogenerate_domain parameter to all 5 application creation endpoints
- Add validation and allowlist rules
- Implement domain auto-generation logic across all application types
- Add comprehensive unit tests for the feature
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
For Docker Compose applications with build directives, inject commit-based
image tags (uuid_servicename:commit) to enable rollback functionality.
Previously these services always used 'latest' tags, making rollback impossible.
- Only injects tags for services with build: but no explicit image:
- Uses pr-{id} tags for pull request deployments
- Respects user-defined image: fields (preserves user intent)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Support for Docker Compose applications with build: directives that create
images with uuid_servicename naming pattern (e.g., app-uuid_web:commit).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement a per-application setting (`docker_images_to_keep`) in `application_settings` table to control how many Docker images are preserved during cleanup. The cleanup process now:
- Respects per-application retention settings (default: 2 images)
- Preserves the N most recent images per application for easy rollback
- Always deletes PR images and keeps the currently running image
- Dynamically excludes application images from general Docker image prune
- Cleans up non-Coolify unused images to prevent disk bloat
Fixes issues where cleanup would delete all images needed for rollback.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When multiple scheduled tasks or database backups run concurrently on
the same server, they compete for the same SSH multiplexed connection
socket, causing race conditions and SSH exit code 255 errors.
This fix adds a `disableMultiplexing` parameter to bypass SSH
multiplexing for jobs that may run concurrently:
- Add `disableMultiplexing` param to `generateSshCommand()`
- Add `disableMultiplexing` param to `instant_remote_process()`
- Update `ScheduledTaskJob` to use `disableMultiplexing: true`
- Update `DatabaseBackupJob` to use `disableMultiplexing: true`
- Add debug logging to track execution without multiplexing
- Add unit tests for the new parameter
Each backup and scheduled task now gets an isolated SSH connection,
preventing contention on the shared multiplexed socket.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Filter out null and empty environment variables when generating Nixpacks build
configuration to prevent JSON parsing errors. Environment variables with null or
empty values were being passed as `--env KEY=` which created invalid JSON with
null values, causing deployment failures.
This fix ensures only valid non-empty environment variables are included in both
user-defined and auto-generated COOLIFY_* environment variables.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Moved .log-highlight styles from Livewire component views to resources/css/app.css for better separation of concerns and reusability. This follows Laravel and Livewire best practices by keeping styles in the appropriate location rather than inline in component views.
Changes:
- Added .log-highlight styles to resources/css/app.css
- Removed inline <style> tags from deployment/show.blade.php
- Removed inline <style> tags from get-logs.blade.php
- Added XSS security test for log viewer
- Applied code formatting with Laravel Pint
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Instead of calling StopProxy::run() (synchronous) then StartProxy::run()
(async), now we build a single command sequence that includes both stop
and start phases. This creates one Activity immediately via remote_process(),
so the UI receives the activity ID right away and can show logs in real-time
from the very beginning of the restart operation.
Key changes:
- Removed dependency on StopProxy and StartProxy actions
- Build combined command sequence inline in buildRestartCommands()
- Use remote_process() directly which returns Activity immediately
- Increased timeout from 60s to 120s to accommodate full restart
- Activity ID dispatched to UI within milliseconds of job starting
Flow is now:
1. Job starts → sets "restarting" status
2. Commands built synchronously (fast, no SSH)
3. remote_process() creates Activity and dispatches CoolifyTask job
4. Activity ID sent to UI immediately via WebSocket
5. UI opens activity monitor with real-time streaming logs
6. Logs show "Stopping proxy..." then "Starting proxy..." as they happen
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Set proxy status to 'restarting' and dispatch ProxyStatusChangedUI event
at the very beginning of handle() method, before StopProxy runs. This
notifies the UI immediately so users know a restart is in progress,
rather than waiting until after the stop operation completes.
Also simplified unit tests to focus on testable job configuration
(middleware, tries, timeout) without complex SchemalessAttributes mocking.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When restarting the proxy on localhost (where Coolify is running), the UI becomes inaccessible because the connection is lost. This change makes all proxy restarts run as background jobs with WebSocket notifications, allowing the operation to complete even after connection loss.
Changes:
- Enhanced ProxyStatusChangedUI event to carry activityId for log monitoring
- Updated RestartProxyJob to dispatch status events and track activity
- Simplified Navbar restart() to always dispatch job for all servers
- Enhanced showNotification() to handle activity monitoring and new statuses
- Added comprehensive unit and feature tests
Benefits:
- Prevents localhost lockout during proxy restarts
- Consistent behavior across all server types
- Non-blocking UI with real-time progress updates
- Automatic activity log monitoring
- Proper error handling and recovery
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add support for degraded status from sub-resources as highest priority
- Handle mixed running+starting state to show service as not fully ready
- Update state priority hierarchy from 8 to 10 levels
- Add comprehensive test coverage for new status scenarios
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Pass the server timezone parameter to shouldRunNow() call at line 127,
ensuring ServerCheckJob dispatch respects the server's local timezone
instead of falling back to the instance default.
This aligns the behavior with other scheduled tasks in the same method:
- ServerStorageCheckJob (line 137)
- ServerPatchCheckJob (line 144)
- Sentinel restart (line 152)
All scheduled tasks in processServerTasks() now consistently use the
server's configured timezone for cron evaluation.
Added unit test to verify timezone-aware cron schedule evaluation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed a critical bug where $this->executionTime was being mutated during
the server processing loop, causing incorrect scheduling calculations for
subsequent servers.
The issue occurred at line 123 where subSeconds() was called directly on
the shared executionTime instance. This caused the baseline time to shift
by waitTime seconds with each server iteration, resulting in compounding
scheduling errors (e.g., 1680 seconds drift over 5 servers).
Changed:
- app/Jobs/ServerManagerJob.php:123
Added .copy() before .subSeconds() to prevent mutation
Added comprehensive unit tests that verify:
- Immutability when using .copy()
- Demonstration of the bug without .copy()
- Correct behavior across multiple iterations
This follows the existing pattern in shouldRunNow() (line 167) and aligns
with other jobs in the codebase.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace substring matching with exact base image name comparison in isDatabaseImage() to prevent false positives (postgres no longer matches postgrest)
- Add 'timescaledb' and 'timescaledb-ha' to DATABASE_DOCKER_IMAGES constants for proper namespace handling
- Add empty state messaging when no applications are defined in Docker Compose configuration
- Maintain backward compatibility with all existing database patterns
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The regex pattern in injectDockerComposeBuildArgs() was too restrictive
and failed to match `docker compose build servicename` commands. Changed
the lookahead from `(?=\s+(?:--|-)|\s+(?:&&|\|\||;|\|)|$)` to the
simpler `(?=\s|$)` to allow any content after the build command,
including service names with hyphens/underscores and flags.
Also improved the ApplicationDeploymentJob to use the new helper function
and added comprehensive test coverage for service-specific builds.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes two critical issues preventing Traefik proxy startup:
1. TypeError when restarting proxy: Handle null return from get_traefik_versions()
- Add null check before dispatching CheckTraefikVersionForServerJob
- Log warning when version data is unavailable
- Prevents: "Argument #2 must be of type array, null given"
2. Docker network error: Filter out predefined Docker networks
- Add isDockerPredefinedNetwork() helper to centralize network filtering
- Apply filtering in collectDockerNetworksByServer() before operations
- Apply filtering in generateDefaultProxyConfiguration()
- Prevents: "operation is not permitted on predefined default network"
Also: Move $cachedVersionsFile assignment after null check in Proxy.php
Tests: Added 7 new unit tests for network filtering function
All existing tests pass with no regressions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changed from `->before('-')` to `->beforeLast('-')` to correctly parse service
names with hyphens. This fixes prerequisite application for ~230+ services
containing hyphens in their template names (e.g., docker-registry,
elasticsearch-with-kibana).
Added comprehensive test coverage for hyphenated service names and fixed
existing tests to use realistic CUID2 UUID format. All unit tests pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Refactors the Appwrite and Beszel service-specific application settings
to use a centralized constant-based approach, following the same pattern
as NEEDS_TO_CONNECT_TO_PREDEFINED_NETWORK.
Changes:
- Added NEEDS_TO_DISABLE_GZIP constant for services requiring gzip disabled
- Added NEEDS_TO_DISABLE_STRIPPREFIX constant for services requiring stripprefix disabled
- Created applyServiceApplicationPrerequisites() helper function in bootstrap/helpers/services.php
- Updated all service creation flows to use the centralized helper:
* app/Livewire/Project/Resource/Create.php (web handler)
* app/Http/Controllers/Api/ServicesController.php (API handler - BUG FIX)
* app/Livewire/Project/New/DockerCompose.php (custom compose handler)
* app/Http/Controllers/Api/ApplicationsController.php (API custom compose handler)
- Added comprehensive unit tests for the new helper function
Benefits:
- Single source of truth for service prerequisites
- DRY - eliminates code duplication between web and API handlers
- Fixes bug where API-created services didn't get prerequisites applied
- Easy to extend for future services (just edit the constant)
- More maintainable and testable
Related commits: 3a94f1ea1 (Beszel), 02b18c86e (Appwrite)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Changes
- **CheckForUpdatesJob**: Add triple version comparison (CDN vs cache vs running)
- Never allows version downgrade from currently running version
- Uses data_set() for safer nested array mutation
- Prevents incorrect new_version_available flag setting
- **UpdateCoolify**: Add cache validation before fallback
- Validates cache against running version on CDN failure
- Throws exception if cache is corrupted/older than running
- Applies to both manual and automated updates
- **Tests**: Add comprehensive test coverage
- tests/Unit/CheckForUpdatesJobTest.php (5 tests)
- tests/Unit/UpdateCoolifyTest.php (3 tests)
## Impact
- Prevents all downgrade scenarios (CDN rollback, corrupted cache, etc.)
- Maintains backward compatibility
- Provides clear logging for debugging
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Updated the Clickhouse service template to use the official `clickhouse/clickhouse-server` image.
- Removed the usage of the deprecated `bitnamilegacy/clickhouse` image.
- fixes#7110
Migrates 8 database start action files from deprecated --time=10 to compatible -t 10 flag for Docker v28+ compatibility. Also updates test expectations in StopProxyTest.php.
Docker deprecated the --time flag in v28.0. The -t shorthand works on all Docker versions (pre-28 and 28+), ensuring backward and forward compatibility.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Migrates 8 database start action files from deprecated --time=10 to compatible -t 10 flag for Docker v28+ compatibility. Also updates test expectations in StopProxyTest.php.
Docker deprecated the --time flag in v28.0. The -t shorthand works on all Docker versions (pre-28 and 28+), ensuring backward and forward compatibility.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Include 'Inject Build Args to Dockerfile' and 'Include Source Commit in Build' settings in the configuration hash calculation. These settings affect Docker build behavior, so changes to them should trigger the restart required notification. Add unit tests to verify hash changes when these settings are modified.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add endsWith() checks before appending template paths in serviceParser() to
prevent duplicate paths when parse() is called after FQDN updates. This fixes
the bug where services like Appwrite realtime would get `/v1/realtime/v1/realtime`.
Fixes#7363🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix sudo prefix bug: Use word boundary matching to prevent 'do' keyword from matching 'docker' commands
- Add ensureProxyNetworksExist() helper to create networks before docker compose up
- Ensure networks exist synchronously before dispatching async proxy startup to prevent race conditions
- Update comprehensive unit tests for sudo parsing (50 tests passing)
This resolves issues where Docker commands failed to execute with sudo on non-root servers and where proxy networks were not created before the proxy container started.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes issue #7346 where proxy startup failed on non-root servers due to
bash syntax errors when control structure keywords like 'for', 'do', 'done',
'break', and 'continue' were being prefixed with 'sudo'.
Added comprehensive exclusion list including for/while/until/case/select
loops, conditionals (if/then/else/elif/fi), and loop control keywords
(break/continue). Also excludes comment lines starting with '#'.
All 37 unit tests pass, including new tests for each bash control structure.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add path attribute mutator to S3Storage model ensuring paths start with /
- Add updatedS3Path hook to normalize path and reset validation state on blur
- Add updatedS3StorageId hook to reset validation state when storage changes
- Add Enter key support to trigger file check from path input
- Use wire:model.live for S3 storage select, wire:model.blur for path input
- Improve shell escaping in restore job cleanup commands
- Fix isSafeTmpPath helper logic for directory validation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This fixes critical bugs where Stringable objects were used in strict comparisons and collection key lookups, causing service existence checks and domain lookups to fail.
**Changes:**
- Line 539: Added ->value() to $originalServiceName conversion
- Line 541: Added ->value() to $serviceName normalization
- Line 621: Removed redundant (string) cast now that $serviceName is a plain string
**Impact:**
- Service existence check now works correctly (line 606: $transformedServiceName === $serviceName)
- Domain lookup finds existing domains (line 615: $domains->get($serviceName))
- Prevents duplicate domain entries in docker_compose_domains collection
**Tests:**
- Added comprehensive unit test suite in ApplicationParserStringableTest.php
- 9 test cases covering type verification, strict comparisons, collection operations, and edge cases
- All tests pass (24 tests, 153 assertions across related parser tests)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Exited containers don't run health checks, so showing "(unhealthy)" is
misleading. This fix ensures exited status displays without health
suffixes across all monitoring systems (SSH, Sentinel, services, etc.)
and at the UI layer for backward compatibility with existing data.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit enhances the boarding flow to handle prerequisite installation asynchronously with proper retry logic and user feedback:
- Add retry mechanism with max 3 attempts for prerequisite installation
- Display live installation logs via ActivityMonitor during boarding
- Reset ActivityMonitor state when starting new activity to prevent stale event dispatching
- Support dynamic header updates in ActivityMonitor
- Add prerequisitesInstalled event handler to revalidate after installation completes
- Extract validation logic into continueValidation() method for cleaner flow
- Add unit tests for prerequisite installation logic
This improves UX by showing users real-time progress during prerequisite installation and handles installation failures gracefully with automatic retries.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The updateCompose() function now correctly detects SERVICE_URL_* and
SERVICE_FQDN_* variables regardless of whether they are defined in
YAML list-style or map-style format.
Previously, the code only worked with list-style environment definitions:
```yaml
environment:
- SERVICE_URL_APP_3000
```
Now it also handles map-style definitions:
```yaml
environment:
SERVICE_URL_TRIGGER_3000: ""
SERVICE_FQDN_DB: localhost
```
The fix distinguishes between the two formats by checking if the array
key is numeric (list-style) or a string (map-style), then extracts the
variable name from the appropriate location.
Added 5 comprehensive unit tests covering:
- Map-style environment format detection
- Multiple map-style variables
- References vs declarations in map-style
- Abbreviated service names with map-style
- Verification of dual-format handling
This fixes variable detection for service templates like trigger.yaml,
langfuse.yaml, and paymenter.yaml that use map-style format.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Parse template variables directly instead of generating from container names. Always create both SERVICE_URL and SERVICE_FQDN pairs together. Properly separate scheme handling (URL has scheme, FQDN doesn't). Add comprehensive test coverage.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds comprehensive validation improvements and DRY principles for handling Coolify's custom Docker Compose extensions.
## Changes
### 1. Created Reusable stripCoolifyCustomFields() Function
- Added shared helper in bootstrap/helpers/docker.php
- Removes all Coolify custom fields (exclude_from_hc, content, isDirectory, is_directory)
- Handles both long syntax (arrays) and short syntax (strings) for volumes
- Well-documented with comprehensive docblock
- Follows DRY principle for consistent field stripping
### 2. Fixed Docker Compose Modal Validation
- Updated validateComposeFile() to use stripCoolifyCustomFields()
- Now removes ALL custom fields before Docker validation (previously only removed content)
- Fixes validation errors when using templates with custom fields (e.g., traccar.yaml)
- Users can now validate compose files with Coolify extensions in UI
### 3. Enhanced YAML Validation in CalculatesExcludedStatus
- Added proper exception handling with ParseException vs generic Exception
- Added structure validation (checks if parsed result and services are arrays)
- Comprehensive logging with context (error message, line number, snippet)
- Maintains safe fallback behavior (returns empty collection on error)
### 4. Added Integer Validation to ContainerStatusAggregator
- Validates maxRestartCount parameter in both aggregateFromStrings() and aggregateFromContainers()
- Corrects negative values to 0 with warning log
- Logs warnings for suspiciously high values (> 1000)
- Prevents logic errors in crash loop detection
### 5. Comprehensive Unit Tests
- tests/Unit/StripCoolifyCustomFieldsTest.php (NEW) - 9 tests, 43 assertions
- tests/Unit/ContainerStatusAggregatorTest.php - Added 6 tests for integer validation
- tests/Unit/ExcludeFromHealthCheckTest.php - Added 4 tests for YAML validation
- All tests passing with proper Log facade mocking
### 6. Documentation
- Added comprehensive Docker Compose extensions documentation to .ai/core/deployment-architecture.md
- Documents all custom fields: exclude_from_hc, content, isDirectory/is_directory
- Includes examples, use cases, implementation details, and test references
- Updated .ai/README.md with navigation links to new documentation
## Benefits
- Better UX: Users can validate compose files with custom fields
- Better Debugging: Comprehensive logging for errors
- Better Code Quality: DRY principle with reusable validation
- Better Reliability: Prevents logic errors from invalid parameters
- Better Maintainability: Easy to add new custom fields in future
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states.
- Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status.
- Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components.
- Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere.
- Updated version number for sentinel to 0.0.17 in versions.json.
Fixes inconsistency where Service model used manual state machine logic while
all other components (Application, ComplexStatusCheck, GetContainersStatus)
use the centralized ContainerStatusAggregator service.
Changes:
- Refactored Service::aggregateResourceStatuses() to use ContainerStatusAggregator
- Removed ~60 lines of duplicated state machine logic
- Added comprehensive ServiceExcludedStatusTest with 24 test cases
- Fixed bugs in old logic where paused/starting containers were incorrectly
marked as unhealthy (should be unknown)
Benefits:
- Single source of truth for status aggregation across all models
- Leverages 42 existing ContainerStatusAggregator tests
- Consistent behavior between Service and Application/Database models
- Easier maintenance (state machine changes only in one place)
All tests pass (37 total):
- ServiceExcludedStatusTest: 24/24 passed
- AllExcludedContainersConsistencyTest: 13/13 passed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit addresses container status reporting issues and removes debug logging:
**Primary Fix:**
- Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data
- This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy"
- Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown'
**Service Multi-Container Aggregation:**
- Implemented service container status aggregation (same pattern as applications)
- Added serviceContainerStatuses collection to both Sentinel and SSH paths
- Services now aggregate status using priority: unhealthy > unknown > healthy
- Prevents race conditions where last-processed container would win
**Debug Logging Cleanup:**
- Removed all [STATUS-DEBUG] logging statements (25 total)
- Removed all ray() debugging calls (3 total)
- Removed proof_unknown_preserved and health_status_was_null debug fields
- Code is now production-ready
**Test Coverage:**
- Added 2 new tests for Sentinel default health status behavior
- Added 5 new tests for service aggregation in SSH path
- All 16 tests pass (66 assertions)
**Note:** The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.
## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".
When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"
## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:
1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy
This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic
## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation
## Testing
All 31 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes container health status aggregation to correctly handle
unknown health states and edge case container states across all resource types.
Changes:
1. **Preserve Unknown Health State**
- Add three-way priority: unhealthy > unknown > healthy
- Detect containers without healthchecks (null health) as unknown
- Apply across GetContainersStatus, ComplexStatusCheck, and Service models
2. **Handle Edge Case Container States**
- Add support for: created, starting, paused, dead, removing
- Map to appropriate statuses: starting (unknown), paused (unknown), degraded (unhealthy)
- Prevent containers in transitional states from showing incorrect status
3. **Add :excluded Suffix for Excluded Containers**
- Parse exclude_from_hc flag from docker-compose YAML
- Append :excluded suffix to individual container statuses
- Skip :excluded containers in non-excluded aggregation sections
- Strip :excluded suffix in excluded aggregation sections
- Makes it clear in UI which containers are excluded from monitoring
Files Modified:
- app/Actions/Docker/GetContainersStatus.php
- app/Actions/Shared/ComplexStatusCheck.php
- app/Models/Service.php
- tests/Unit/ContainerHealthStatusTest.php
Tests: 18 passed (82 assertions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When all containers are excluded from health checks, display their actual status
with :excluded suffix instead of misleading hardcoded statuses. This prevents
broken UI state with incorrect action buttons and provides clarity that monitoring
is disabled.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Adds a new EnvVarInput component that provides autocomplete suggestions for shared environment variables from team, project, and environment scopes. Users can reference variables using {{ syntax.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
When all services in a Docker Compose file have `exclude_from_hc: true`,
the status aggregation logic was returning invalid states causing broken UI.
**Problems fixed:**
- ComplexStatusCheck returned 'running:healthy' for apps with no monitored containers
- Service model returned ':' (null status) when all services excluded
- UI showed active start/stop buttons for non-running services
**Changes:**
- ComplexStatusCheck: Return 'exited:healthy' when relevantContainerCount is 0
- Service model: Return 'exited:healthy' when both status and health are null
- Added comprehensive unit tests to verify the fixes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Refine regex pattern to prevent false positives with flags like -foo, -from, -feature
- Change from \S (any non-whitespace) to [.~/]|$ (path characters or end of word)
- Add comprehensive tests for false positive prevention (4 test cases)
- Add path normalization tests for baseDirectory edge cases (6 test cases)
- Add @example documentation to injectDockerComposeFlags function
Prevents incorrect detection of:
- -foo, -from, -feature, -fast as the -f flag
- Ensures -f flag is only detected when followed by path characters or end of word
All 45 tests passing with 135 assertions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix double-slash issue in Docker Compose preview paths when baseDirectory is "/"
- Normalize baseDirectory using rtrim() to prevent path concatenation issues
- Replace hardcoded '/artifacts/build-time.env' with ApplicationDeploymentJob::BUILD_TIME_ENV_PATH
- Make BUILD_TIME_ENV_PATH constant public for reusability
- Add comprehensive unit tests (11 test cases, 25 assertions)
Fixes preview path generation in:
- getDockerComposeBuildCommandPreviewProperty()
- getDockerComposeStartCommandPreviewProperty()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When using a custom Docker Compose build command, environment variables
were being lost because the --env-file flag was not included. This fix
automatically injects the --env-file flag to ensure build-time environment
variables are available during custom builds.
Changes:
- Auto-inject --env-file /artifacts/build-time.env after docker compose
- Respect user-provided --env-file flags (no duplication)
- Append build arguments when not using build secrets
- Update UI helper text to inform users about automatic env injection
- Add comprehensive unit tests (7 test cases, all passing)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Move notification logic from NotifyOutdatedTraefikServersJob into CheckTraefikVersionForServerJob to send immediate notifications when outdated Traefik is detected. This is more suitable for cloud environments with thousands of servers.
Changes:
- CheckTraefikVersionForServerJob now sends notifications immediately after detecting outdated Traefik
- Remove NotifyOutdatedTraefikServersJob (no longer needed)
- Remove delay calculation logic from CheckTraefikVersionJob
- Update tests to reflect new immediate notification pattern
Trade-offs:
- Pro: Faster notifications (immediate alerts)
- Pro: Simpler codebase (removed complex delay calculation)
- Pro: Better scalability for thousands of servers
- Con: Teams may receive multiple notifications if they have many outdated servers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Store both patch update and newer minor version information simultaneously
- Display patch update availability alongside minor version upgrades in notifications
- Add newer_branch_target and newer_branch_latest fields to traefik_outdated_info
- Update all notification channels (Discord, Telegram, Slack, Pushover, Email, Webhook)
- Show minor version in format (e.g., v3.6) for upgrade targets instead of patch version
- Enhance UI callouts with clearer messaging about available upgrades
- Remove verbose logging in favor of cleaner code structure
- Handle edge case where SSH command returns empty response
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Addresses critical performance issues identified in code review by refactoring the monolithic CheckTraefikVersionJob into a distributed architecture with parallel processing.
Changes:
- Split version checking into CheckTraefikVersionForServerJob for parallel execution
- Extract notification logic into NotifyOutdatedTraefikServersJob
- Dispatch individual server checks concurrently to handle thousands of servers
- Add comprehensive unit tests for the new job architecture
- Update feature tests to cover the refactored workflow
Performance improvements:
- Sequential SSH calls replaced with parallel queue jobs
- Scales efficiently for large installations with thousands of servers
- Reduces job execution time from hours to minutes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add wait loops to ensure containers are fully removed before restarting.
This fixes race conditions where docker compose would fail because an
existing container was still being cleaned up.
Changes:
- StartProxy: Add explicit stop, wait loop before docker compose up
- StopProxy: Add wait loop after container removal
- Both actions now poll up to 10 seconds for complete removal
- Add error suppression to handle non-existent containers gracefully
Tests:
- Add StartProxyTest.php with 3 tests for cleanup logic
- Add StopProxyTest.php with 4 tests for stop behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes a critical N+1 query issue in CheckTraefikVersionJob
that was loading ALL proxy servers into memory then filtering in PHP,
causing potential OOM errors with thousands of servers.
Changes:
- Added scopeWhereProxyType() query scope to Server model for
database-level filtering using JSON column arrow notation
- Updated CheckTraefikVersionJob to use new scope instead of
collection filter, moving proxy type filtering into the SQL query
- Added comprehensive unit tests for the new query scope
Performance impact:
- Before: SELECT * FROM servers WHERE proxy IS NOT NULL (all servers)
- After: SELECT * FROM servers WHERE proxy->>'type' = 'TRAEFIK' (filtered)
- Eliminates memory overhead of loading non-Traefik servers
- Critical for cloud instances with thousands of connected servers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add automated Traefik version checking job running weekly on Sundays
- Implement version detection from running containers and comparison with versions.json
- Add notifications across all channels (Email, Discord, Slack, Telegram, Pushover, Webhook) for outdated versions
- Create dismissible callout component with localStorage persistence
- Display cross-branch upgrade warnings (e.g., v3.5 -> v3.6) with changelog links
- Show patch update notifications within same branch
- Add warning icon that appears when callouts are dismissed
- Prevent duplicate notifications during proxy restart by adding restarting parameter
- Fix notification spam with transition-based logic for status changes
- Enable system email settings by default in development mode
- Track last saved/applied proxy settings to detect configuration drift
- Refine regex pattern to prevent false positives with flags like -foo, -from, -feature
- Change from \S (any non-whitespace) to [.~/]|$ (path characters or end of word)
- Add comprehensive tests for false positive prevention (4 test cases)
- Add path normalization tests for baseDirectory edge cases (6 test cases)
- Add @example documentation to injectDockerComposeFlags function
Prevents incorrect detection of:
- -foo, -from, -feature, -fast as the -f flag
- Ensures -f flag is only detected when followed by path characters or end of word
All 45 tests passing with 135 assertions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix double-slash issue in Docker Compose preview paths when baseDirectory is "/"
- Normalize baseDirectory using rtrim() to prevent path concatenation issues
- Replace hardcoded '/artifacts/build-time.env' with ApplicationDeploymentJob::BUILD_TIME_ENV_PATH
- Make BUILD_TIME_ENV_PATH constant public for reusability
- Add comprehensive unit tests (11 test cases, 25 assertions)
Fixes preview path generation in:
- getDockerComposeBuildCommandPreviewProperty()
- getDockerComposeStartCommandPreviewProperty()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When using a custom Docker Compose build command, environment variables
were being lost because the --env-file flag was not included. This fix
automatically injects the --env-file flag to ensure build-time environment
variables are available during custom builds.
Changes:
- Auto-inject --env-file /artifacts/build-time.env after docker compose
- Respect user-provided --env-file flags (no duplication)
- Append build arguments when not using build secrets
- Update UI helper text to inform users about automatic env injection
- Add comprehensive unit tests (7 test cases, all passing)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>