coolify

Author	SHA1	Message	Date
Andras Bacsai	70ff73e954	Merge branch 'next' into macau-v1 Resolved conflicts in ServerManagerJob.php by: - Keeping sentinel update check code from macau-v1 - Preserving sentinel restart code from next branch - Ensuring no duplicate code blocks	2025-12-04 15:07:36 +01:00
Andras Bacsai	9e0fa03434	Run proxy restart as background job with real-time logs (#7475 )	2025-12-04 14:59:50 +01:00
Andras Bacsai	4002044877	Refactor: Move sentinel update checks to ServerManagerJob and add tests for hourly dispatch	2025-12-04 14:58:18 +01:00
Andras Bacsai	05eed974cb	Add log search, download, and collapsible sections (#7484 )	2025-12-04 13:55:40 +01:00
Andras Bacsai	bf8dcac88c	Move inline styles to global CSS file Moved .log-highlight styles from Livewire component views to resources/css/app.css for better separation of concerns and reusability. This follows Laravel and Livewire best practices by keeping styles in the appropriate location rather than inline in component views. Changes: - Added .log-highlight styles to resources/css/app.css - Removed inline <style> tags from deployment/show.blade.php - Removed inline <style> tags from get-logs.blade.php - Added XSS security test for log viewer - Applied code formatting with Laravel Pint 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-04 13:15:01 +01:00
Andras Bacsai	d3eaae1aea	Increase scheduled task timeout limit to 36000 seconds Extended the maximum allowed timeout for scheduled tasks from 3600 to 36000 seconds (10 hours). Also passes the configured timeout to instant_remote_process() so the SSH command respects the timeout setting. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 20:04:55 +01:00
Andras Bacsai	05fc5d70c5	Fix: Pass backup timeout to remote SSH process Allows user-configured backup timeouts > 3600 to be respected. Previously, the SSH process used a hardcoded 3600 second timeout regardless of the job timeout setting. Now the timeout is passed through to instant_remote_process() for all backup operations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 16:37:38 +01:00
Andras Bacsai	387a093f04	Fix container name conflict during proxy restart The error "container name already in use" occurred because the container wasn't fully removed before docker compose up tried to create a new one. Changes: - Removed redundant stop/remove logic from START PHASE (was duplicating STOP PHASE) - Made STOP PHASE more robust: - Increased wait iterations from 10 to 15 - Added force remove on each iteration in case container got stuck - Added final verification and force cleanup after the loop - Added better logging to show removal progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 16:30:27 +01:00
Andras Bacsai	36da7174d5	Combine stop+start into single activity for real-time logs Instead of calling StopProxy::run() (synchronous) then StartProxy::run() (async), now we build a single command sequence that includes both stop and start phases. This creates one Activity immediately via remote_process(), so the UI receives the activity ID right away and can show logs in real-time from the very beginning of the restart operation. Key changes: - Removed dependency on StopProxy and StartProxy actions - Build combined command sequence inline in buildRestartCommands() - Use remote_process() directly which returns Activity immediately - Increased timeout from 60s to 120s to accommodate full restart - Activity ID dispatched to UI within milliseconds of job starting Flow is now: 1. Job starts → sets "restarting" status 2. Commands built synchronously (fast, no SSH) 3. remote_process() creates Activity and dispatches CoolifyTask job 4. Activity ID sent to UI immediately via WebSocket 5. UI opens activity monitor with real-time streaming logs 6. Logs show "Stopping proxy..." then "Starting proxy..." as they happen 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 16:21:26 +01:00
Andras Bacsai	340e42aefd	Dispatch restarting status immediately when job starts Set proxy status to 'restarting' and dispatch ProxyStatusChangedUI event at the very beginning of handle() method, before StopProxy runs. This notifies the UI immediately so users know a restart is in progress, rather than waiting until after the stop operation completes. Also simplified unit tests to focus on testable job configuration (middleware, tries, timeout) without complex SchemalessAttributes mocking. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 16:18:13 +01:00
Andras Bacsai	c42fb81347	Fix restart initiated duplicate and restore activity logs - Add restartInitiated flag to prevent duplicate "Proxy restart initiated" messages - Restore ProxyStatusChangedUI dispatch with activityId in RestartProxyJob - This allows the UI to open the activity monitor and show logs during restart - Simplified restart message (removed redundant "Monitor progress" text) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 16:11:56 +01:00
Andras Bacsai	b00d8902f4	Fix duplicate proxy restart notifications - Remove redundant ProxyStatusChangedUI dispatch from RestartProxyJob (ProxyStatusChanged event already triggers the listener that dispatches it) - Remove redundant Traefik version check from RestartProxyJob (already handled by ProxyStatusChangedNotification listener) - Add lastNotifiedStatus tracking to prevent duplicate toasts - Remove notifications for unknown/default statuses (too noisy) - Simplify RestartProxyJob to only handle stop/start logic 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 16:09:47 +01:00
Andras Bacsai	e4810a28d2	Make proxy restart run as background job to prevent localhost lockout When restarting the proxy on localhost (where Coolify is running), the UI becomes inaccessible because the connection is lost. This change makes all proxy restarts run as background jobs with WebSocket notifications, allowing the operation to complete even after connection loss. Changes: - Enhanced ProxyStatusChangedUI event to carry activityId for log monitoring - Updated RestartProxyJob to dispatch status events and track activity - Simplified Navbar restart() to always dispatch job for all servers - Enhanced showNotification() to handle activity monitoring and new statuses - Added comprehensive unit and feature tests Benefits: - Prevents localhost lockout during proxy restarts - Consistent behavior across all server types - Non-blocking UI with real-time progress updates - Automatic activity log monitoring - Proper error handling and recovery 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 10:30:12 +01:00
Andras Bacsai	b55aaf34d3	Decouple ServerStorageCheckJob from Sentinel sync (#7454 )	2025-12-03 10:29:10 +01:00
Andras Bacsai	56a0143a25	Fix: Prevent ServerStorageCheckJob duplication when Sentinel is active When Sentinel is enabled and in sync, ServerStorageCheckJob was being dispatched from two locations causing unnecessary duplication: 1. PushServerUpdateJob (every ~30s with real-time filesystem data) 2. ServerManagerJob (scheduled cron check via SSH) This commit modifies ServerManagerJob to only dispatch ServerStorageCheckJob when Sentinel is out of sync or disabled. When Sentinel is active and in sync, PushServerUpdateJob provides real-time storage data, making the scheduled SSH check redundant. Benefits: - Eliminates duplicate storage checks when Sentinel is working - Reduces unnecessary SSH overhead - Storage checks still run as fallback when Sentinel fails - Maintains scheduled checks for servers without Sentinel Updated tests to reflect new behavior: - Storage check NOT dispatched when Sentinel is in sync - Storage check dispatched when Sentinel is out of sync or disabled - All timezone and frequency tests updated accordingly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 10:05:10 +01:00
Andras Bacsai	fb8eb3fa37	Fix Traefik warning persistence after proxy configuration update (#7466 )	2025-12-03 09:57:14 +01:00
Andras Bacsai	13b7c3dbfc	Add real-time UI updates after Traefik version check Dispatch ProxyStatusChangedUI event after version check completes so the UI updates in real-time without requiring page refresh. Changes: - Add ProxyStatusChangedUI::dispatch() at all exit points in CheckTraefikVersionForServerJob - Ensures UI refreshes automatically via WebSocket when version check completes - Works for all scenarios: version detected, using latest tag, outdated version, up-to-date User experience: - User restarts proxy - Warning clears automatically in real-time (no refresh needed) - Leverages existing WebSocket infrastructure 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 09:56:04 +01:00
Andras Bacsai	c982d58eee	Refactor: Move Sentinel restart logic into processServerTasks method	2025-12-03 09:22:00 +01:00
Andras Bacsai	f75bc85bc1	Merge branch 'next' into decouple-storage-from-sentinel	2025-12-03 09:19:09 +01:00
Andras Bacsai	9c80e15dd9	fix: prevent cleanup exceptions from marking successful deployments as failed (#7460 )	2025-12-03 09:18:52 +01:00
Andras Bacsai	a767ca30e6	fix: log unhealthy container status during health check	2025-12-03 09:18:32 +01:00
Andras Bacsai	a18e920e4c	fix: remove logging of cleanup failures to prevent false deployment errors	2025-12-03 09:16:28 +01:00
Andras Bacsai	66e81d6d96	Fix container status display: preserve "Restarting" for applications and sub-resources Add preserveRestarting parameter to ContainerStatusAggregator to allow applications and service sub-resources to display "Restarting" status instead of being marked as "Degraded". This gives better visibility into container restart behavior. - Update ContainerStatusAggregator to accept preserveRestarting parameter (defaults to false) - Update GetContainersStatus to use preserveRestarting: true for applications and service sub-resources - Update PushServerUpdateJob to use preserveRestarting: true for applications and service sub-resources - Add comprehensive documentation explaining the parameter behavior and when to use it 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 08:23:35 +01:00
Andras Bacsai	cfea11f189	fix: prevent cleanup exceptions from marking successful deployments as failed Fixes #7439 where successful deployments were being marked as FAILED due to exceptions during old container cleanup. Root cause: Commit `97550f406` wrapped stop_running_container() in try-catch that re-throws ALL exceptions as DeploymentException. When old containers are already removed (a common scenario), the "No such container" error propagates and marks successful deployments as failed. Solution: Check if deployment has already succeeded (newVersionIsHealthy \|\| force) before re-throwing exceptions from cleanup operations. Cleanup failures are logged but don't fail the deployment. - Add conditional handling in stop_running_container() catch block - Log cleanup warnings with hidden: true to avoid UI clutter - Only re-throw exceptions if deployment hasn't succeeded yet - Preserves backward compatibility and expected behavior 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 17:28:48 +01:00
Andras Bacsai	8ff83cc3d6	Fix: Pass $serverTimezone to shouldRunNow() in ServerCheckJob dispatch Pass the server timezone parameter to shouldRunNow() call at line 127, ensuring ServerCheckJob dispatch respects the server's local timezone instead of falling back to the instance default. This aligns the behavior with other scheduled tasks in the same method: - ServerStorageCheckJob (line 137) - ServerPatchCheckJob (line 144) - Sentinel restart (line 152) All scheduled tasks in processServerTasks() now consistently use the server's configured timezone for cron evaluation. Added unit test to verify timezone-aware cron schedule evaluation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 16:58:43 +01:00
Andras Bacsai	ed5796739f	Fix: Prevent ServerManagerJob executionTime mutation across server loop Fixed a critical bug where $this->executionTime was being mutated during the server processing loop, causing incorrect scheduling calculations for subsequent servers. The issue occurred at line 123 where subSeconds() was called directly on the shared executionTime instance. This caused the baseline time to shift by waitTime seconds with each server iteration, resulting in compounding scheduling errors (e.g., 1680 seconds drift over 5 servers). Changed: - app/Jobs/ServerManagerJob.php:123 Added .copy() before .subSeconds() to prevent mutation Added comprehensive unit tests that verify: - Immutability when using .copy() - Demonstration of the bug without .copy() - Correct behavior across multiple iterations This follows the existing pattern in shouldRunNow() (line 167) and aligns with other jobs in the codebase. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 15:27:17 +01:00
Andras Bacsai	b47181c790	Decouple ServerStorageCheckJob from Sentinel sync status Server disk usage checks now run on their configured schedule regardless of Sentinel status, eliminating monitoring blind spots when Sentinel is offline, out of sync, or disabled. Storage checks now respect server timezone settings, consistent with patch checks. Changes: - Moved server timezone calculation to top of processServerTasks() - Extracted ServerStorageCheckJob dispatch from Sentinel conditional - Fixed default frequency to '0 23 * * *' (11 PM daily) - Added timezone parameter to storage check scheduling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 13:36:25 +01:00
Andras Bacsai	d59c75c2b2	Fix: Docker build args injection regex to support service names The regex pattern in injectDockerComposeBuildArgs() was too restrictive and failed to match `docker compose build servicename` commands. Changed the lookahead from `(?=\s+(?:--\|-)\|\s+(?:&&\|\\|\\|\|;\|\\|)\|$)` to the simpler `(?=\s\|$)` to allow any content after the build command, including service names with hyphens/underscores and flags. Also improved the ApplicationDeploymentJob to use the new helper function and added comprehensive test coverage for service-specific builds. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-01 13:16:05 +01:00
Andras Bacsai	cd10796612	Fix: Version downgrade prevention - validate cache and add running version checks ## Changes - CheckForUpdatesJob: Add triple version comparison (CDN vs cache vs running) - Never allows version downgrade from currently running version - Uses data_set() for safer nested array mutation - Prevents incorrect new_version_available flag setting - UpdateCoolify: Add cache validation before fallback - Validates cache against running version on CDN failure - Throws exception if cache is corrupted/older than running - Applies to both manual and automated updates - Tests: Add comprehensive test coverage - tests/Unit/CheckForUpdatesJobTest.php (5 tests) - tests/Unit/UpdateCoolifyTest.php (3 tests) ## Impact - Prevents all downgrade scenarios (CDN rollback, corrupted cache, etc.) - Maintains backward compatibility - Provides clear logging for debugging 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 16:05:41 +01:00
Andras Bacsai	d9774d2968	Fix: Prevent version downgrades and centralize CDN configuration (#7383 ) ## Root Cause Between Nov 25-26, a CDN redirect was added without curl's `-L` flag, causing version cache corruption and automatic downgrades. ## Three Critical Bugs Fixed ### Bug #1: CheckForUpdatesJob could overwrite newer cached version - Problem: CDN serving older version would overwrite local cache - Solution: Smart version merge - keep max Coolify version, update other components - Location: app/Jobs/CheckForUpdatesJob.php:33-52 ### Bug #2: Manual updates bypassed downgrade protection - Problem: Downgrade guard only applied to auto-updates - Solution: Always block downgrades for both manual and auto-updates - Location: app/Actions/Server/UpdateCoolify.php:65-75 ### Bug #3: Updates used stale local cache - Problem: Never validated cache against CDN at update time - Solution: Fetch fresh CDN data before executing updates - Location: app/Actions/Server/UpdateCoolify.php:34-49 ## Additional Improvement: Centralized CDN Configuration Added three new config keys for easy CDN management: - `cdn_url` - Base CDN URL (default: https://cdn.coollabs.io) - `versions_url` - Full versions.json URL - `upgrade_script_url` - Full upgrade.sh URL All configurable via environment variables: ```bash CDN_URL=https://cdn.coolify.io VERSIONS_URL=https://custom-cdn.example.com/versions.json UPGRADE_SCRIPT_URL=https://custom-cdn.example.com/upgrade.sh ``` ## Files Modified - config/constants.php - CDN configuration - app/Jobs/CheckForUpdatesJob.php - Smart version merge + centralized URL - app/Actions/Server/UpdateCoolify.php - Downgrade protection + fresh fetch + centralized URLs - app/Jobs/CheckHelperImageJob.php - Centralized URL - bootstrap/helpers/shared.php - Centralized URL ## Testing - ✅ All modified files pass Pint formatting - ✅ 78 unit tests pass (2 pre-existing failures unrelated to changes) ## Impact - No breaking changes - defaults to current CDN - Easy CDN migration via environment variables - Prevents all downgrade scenarios - Maintains independent Sentinel/Helper/Traefik updates 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 15:20:33 +01:00
Andras Bacsai	25a96ad662	fix(docker): replace deprecated --time flag with -t for full compatibility across Docker versions (#6807 )	2025-11-28 10:47:19 +01:00
Andras Bacsai	1d054b23b8	Merge branch 'next' into shadow/fix-docker-time-command	2025-11-28 10:25:42 +01:00
Andras Bacsai	be2b01786a	fix: prevent duplicate environment variables in buildtime.env Refactors generate_buildtime_environment_variables() to use an associative array (dictionary) approach instead of sequential push() calls. This prevents duplicate variable declarations in the buildtime.env file. Problem: After adding nixpacks plan variables to buildtime.env, the same variable could appear twice in the file: - Once from nixpacks plan (e.g., NIXPACKS_NODE_VERSION='22') - Once from user-defined variables (e.g., NIXPACKS_NODE_VERSION="22") This caused shell errors and undefined behavior during Docker builds. Root Cause: The push() method adds items sequentially without checking for duplicate keys. When a variable existed in both nixpacks plan AND user-defined vars, both would be written to the file. Solution: - Use associative array ($envs_dict) for automatic deduplication - Establish clear override precedence: 1. Nixpacks plan variables (lowest priority) 2. COOLIFY_* variables (medium priority) 3. SERVICE_* variables (medium priority) 4. User-defined variables (highest priority - can override everything) - Convert to collection format at the end - Add debug logging when user variables override plan variables Benefits: - Automatic deduplication (array keys are unique by nature) - User variables properly override nixpacks plan values - Clear, explicit precedence order - No breaking changes to existing functionality Fixes #7114 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 10:10:19 +01:00
Andras Bacsai	ef332b9af4	fix: add support for nixpacks plan variables in buildtime environment	2025-11-28 10:10:19 +01:00
Andras Bacsai	0073d045fb	fix: enhance security by validating and escaping database names, file paths, and proxy configuration filenames to prevent command injection	2025-11-27 14:36:31 +01:00
Andras Bacsai	246e3cd8a2	fix: resolve Docker validation race conditions and sudo prefix bug - Fix sudo prefix bug: Use word boundary matching to prevent 'do' keyword from matching 'docker' commands - Add ensureProxyNetworksExist() helper to create networks before docker compose up - Ensure networks exist synchronously before dispatching async proxy startup to prevent race conditions - Update comprehensive unit tests for sudo parsing (50 tests passing) This resolves issues where Docker commands failed to execute with sudo on non-root servers and where proxy networks were not created before the proxy container started. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 09:04:42 +01:00
Andras Bacsai	837391c31b	feat: add Docker build cache preservation toggles and development logging Add two new application settings to control Docker build cache invalidation: - inject_build_args_to_dockerfile (default: true) - Skip Dockerfile ARG injection - include_source_commit_in_build (default: false) - Exclude SOURCE_COMMIT from build context These toggles let users preserve Docker cache when SOURCE_COMMIT or custom ARGs change frequently. Development-only logging shows which ARGs are being injected for debugging. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-26 13:42:02 +01:00
Andras Bacsai	4e896cca05	fix: preserve Docker build cache by excluding dynamic variables from build-time contexts - Remove COOLIFY_CONTAINER_NAME from build-time ARGs (timestamp-based, breaks cache) - Use APP_KEY instead of random_bytes for COOLIFY_BUILD_SECRETS_HASH (deterministic) - Add forBuildTime parameter to generate_coolify_env_variables() to control injection - Keep COOLIFY_CONTAINER_NAME available at runtime for container identification - Fix misleading log message about .env file purpose Fixes #7040 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-26 09:16:32 +01:00
Andras Bacsai	6d8144c18c	Merge remote-tracking branch 'origin/next' into s3-restore Resolve merge conflicts in: - bootstrap/helpers/shared.php (kept both formatBytes, isSafeTmpPath, and formatContainerStatus functions) - database/migrations/2025_10_10_120002_create_cloud_init_scripts_table.php (added Schema::hasTable check) - database/migrations/2025_10_10_120002_create_webhook_notification_settings_table.php (added Schema::hasTable check) - resources/views/livewire/project/application/general.blade.php (formatting/whitespace) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-25 09:35:37 +01:00
Andras Bacsai	e0dc12678b	fix: comprehensive SERVICE_URL/SERVICE_FQDN handling improvements and queue reliability fixes (#7275 )	2025-11-24 11:47:11 +01:00
Andras Bacsai	bf428a0e1c	fix: don't show health status for exited containers (#7317 )	2025-11-24 10:29:57 +01:00
Andras Bacsai	1149d0f746	feat: implement prerequisite validation and installation for server setup (#7297 )	2025-11-24 10:28:10 +01:00
Andras Bacsai	ac9eca3c05	fix: don't show health status for exited containers Exited containers don't run health checks, so showing "(unhealthy)" is misleading. This fix ensures exited status displays without health suffixes across all monitoring systems (SSH, Sentinel, services, etc.) and at the UI layer for backward compatibility with existing data. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-24 09:09:37 +01:00
Andras Bacsai	29135e00ba	feat: enhance prerequisite validation to return detailed results	2025-11-21 13:14:48 +01:00
Andras Bacsai	85b73a8c00	fix: initialize Collection properties to handle queue deserialization edge cases	2025-11-21 12:25:25 +01:00
Andras Bacsai	01957f2752	feat: implement prerequisite validation and installation for server setup	2025-11-21 09:49:33 +01:00
Andras Bacsai	ae6eef3cdb	feat(tests): add comprehensive tests for ContainerStatusAggregator and serverStatus accessor - Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states. - Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status. - Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components. - Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere. - Updated version number for sentinel to 0.0.17 in versions.json.	2025-11-20 17:31:07 +01:00
Andras Bacsai	14bba8ba86	fix: correct Sentinel default health status and remove debug logging This commit addresses container status reporting issues and removes debug logging: Primary Fix: - Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data - This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy" - Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown' Service Multi-Container Aggregation: - Implemented service container status aggregation (same pattern as applications) - Added serviceContainerStatuses collection to both Sentinel and SSH paths - Services now aggregate status using priority: unhealthy > unknown > healthy - Prevents race conditions where last-processed container would win Debug Logging Cleanup: - Removed all [STATUS-DEBUG] logging statements (25 total) - Removed all ray() debugging calls (3 total) - Removed proof_unknown_preserved and health_status_was_null debug fields - Code is now production-ready Test Coverage: - Added 2 new tests for Sentinel default health status behavior - Added 5 new tests for service aggregation in SSH path - All 16 tests pass (66 assertions) Note: The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:10:34 +01:00
Andras Bacsai	747a48b933	debug: add detailed Sentinel container processing logging Added comprehensive logging to track why applicationContainerStatuses collection is empty in PushServerUpdateJob. ## Logging Added ### 1. Raw Sentinel Data (line 113-118) Logs: Complete container data received from Sentinel Purpose: See exactly what Sentinel is sending Data: Container count and full container array with all labels ### 2. Container Processing Loop (line 157-163) Logs: Every container as it's being processed Purpose: Track which containers enter the processing loop Data: Container name, status, all labels, coolify.managed flag ### 3. Skipped Containers - Not Managed (line 165-171) Logs: Containers without coolify.managed label Purpose: Identify containers being filtered out early Data: Container name ### 4. Successful Container Addition (line 193-198) Logs: When container is successfully added to applicationContainerStatuses Purpose: Confirm containers ARE being processed Data: Application ID, container name, container status ### 5. Missing com.docker.compose.service Label (line 200-206) Logs: Containers skipped due to missing com.docker.compose.service Purpose: Identify the most likely root cause Data: Container name, application ID, all labels ## Why This Matters User reported applicationContainerStatuses is empty (`[]`) even though Sentinel is pushing updates. This logging will reveal: 1. Is Sentinel sending containers at all? 2. Are containers filtered by coolify.managed check? 3. Is com.docker.compose.service label missing? (most likely) 4. What labels IS Sentinel actually sending? ## Expected Findings Based on investigation, the issue is likely: - Sentinel is NOT sending com.docker.compose.service in labels - Or Sentinel uses a different label format/name - Containers pass all other checks but fail on line 190-206 ## Next Steps After logs appear, we'll see exactly which filter is blocking containers and can fix the root cause (likely need to extract com.docker.compose.service from container name or use a different label source). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 08:34:42 +01:00
Andras Bacsai	d2d9c1b2bc	debug: add comprehensive status change logging Added detailed debug logging to all status update paths to help diagnose why "unhealthy" status appears in the UI. ## Logging Added ### 1. PushServerUpdateJob (Sentinel updates) Location: Lines 303-315 Logs: Status changes from Sentinel push updates Data tracked: - Old vs new status - Container statuses that led to aggregation - Status flags (hasRunning, hasUnhealthy, hasUnknown) ### 2. GetContainersStatus (SSH updates) Location: Lines 441-449, 346-354, 358-365 Logs: Status changes from SSH-based checks Scenarios: - Normal status aggregation - Recently restarted containers (kept as degraded) - Applications not running (set to exited) Data tracked: - Old vs new status - Container statuses - Restart count and timing - Whether containers exist ### 3. Application Model Status Accessor Location: Lines 706-712, 726-732 Logs: When status is set without explicit health information Issue: Highlights cases where health defaults to "unhealthy" Data tracked: - Raw value passed to setter - Final result after default applied ## How to Use ### Enable Debug Logging Edit `.env` or `config/logging.php` to set log level to debug: ``` LOG_LEVEL=debug ``` ### Monitor Logs ```bash tail -f storage/logs/laravel.log \| grep STATUS-DEBUG ``` ### Log Format All logs use `[STATUS-DEBUG]` prefix for easy filtering: ``` [2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change { "source": "PushServerUpdateJob", "app_id": 123, "app_name": "my-app", "old_status": "running:unknown", "new_status": "running:healthy", "container_statuses": [...], "flags": {...} } ``` ## What to Look For 1. Default to unhealthy: Check Application model accessor logs 2. Status flipping: Compare timestamps between Sentinel and SSH updates 3. Incorrect aggregation: Check flags and container_statuses 4. Stale database values: Check if old_status persists across multiple logs ## Next Steps After gathering logs, we can: 1. Identify the exact source of "unhealthy" status 2. Determine if it's a default issue, aggregation bug, or timing problem 3. Apply targeted fix based on evidence 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 13:52:08 +01:00

1 2 3 4 5 ...

1384 commits