coolify

Author	SHA1	Message	Date
Andras Bacsai	6d47d24169	Fix standalone database "restarting" status flickering and add restart tracking - Fix status flickering: Track databases in active/transient states (restarting, starting, created, paused) not just running - Add isActiveOrTransient() helper to distinguish between active states and terminal states (exited, dead) - Add safeguard: Protect updateNotFoundDatabaseStatus() from marking as exited when containers collection is empty - Add restart_count tracking: New migration adds restart_count, last_restart_at, last_restart_type to all standalone database tables - Update 8 database models with $casts for new restart tracking fields - Update GetContainersStatus to extract RestartCount from Docker and update database models - Reset restart tracking when database exits completely 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-17 16:25:41 +01:00
Andras Bacsai	66e81d6d96	Fix container status display: preserve "Restarting" for applications and sub-resources Add preserveRestarting parameter to ContainerStatusAggregator to allow applications and service sub-resources to display "Restarting" status instead of being marked as "Degraded". This gives better visibility into container restart behavior. - Update ContainerStatusAggregator to accept preserveRestarting parameter (defaults to false) - Update GetContainersStatus to use preserveRestarting: true for applications and service sub-resources - Update PushServerUpdateJob to use preserveRestarting: true for applications and service sub-resources - Add comprehensive documentation explaining the parameter behavior and when to use it 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 08:23:35 +01:00
Andras Bacsai	ac9eca3c05	fix: don't show health status for exited containers Exited containers don't run health checks, so showing "(unhealthy)" is misleading. This fix ensures exited status displays without health suffixes across all monitoring systems (SSH, Sentinel, services, etc.) and at the UI layer for backward compatibility with existing data. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-24 09:09:37 +01:00
Andras Bacsai	ae6eef3cdb	feat(tests): add comprehensive tests for ContainerStatusAggregator and serverStatus accessor - Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states. - Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status. - Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components. - Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere. - Updated version number for sentinel to 0.0.17 in versions.json.	2025-11-20 17:31:07 +01:00
Andras Bacsai	14bba8ba86	fix: correct Sentinel default health status and remove debug logging This commit addresses container status reporting issues and removes debug logging: Primary Fix: - Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data - This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy" - Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown' Service Multi-Container Aggregation: - Implemented service container status aggregation (same pattern as applications) - Added serviceContainerStatuses collection to both Sentinel and SSH paths - Services now aggregate status using priority: unhealthy > unknown > healthy - Prevents race conditions where last-processed container would win Debug Logging Cleanup: - Removed all [STATUS-DEBUG] logging statements (25 total) - Removed all ray() debugging calls (3 total) - Removed proof_unknown_preserved and health_status_was_null debug fields - Code is now production-ready Test Coverage: - Added 2 new tests for Sentinel default health status behavior - Added 5 new tests for service aggregation in SSH path - All 16 tests pass (66 assertions) Note: The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:10:34 +01:00
Andras Bacsai	d2d9c1b2bc	debug: add comprehensive status change logging Added detailed debug logging to all status update paths to help diagnose why "unhealthy" status appears in the UI. ## Logging Added ### 1. PushServerUpdateJob (Sentinel updates) Location: Lines 303-315 Logs: Status changes from Sentinel push updates Data tracked: - Old vs new status - Container statuses that led to aggregation - Status flags (hasRunning, hasUnhealthy, hasUnknown) ### 2. GetContainersStatus (SSH updates) Location: Lines 441-449, 346-354, 358-365 Logs: Status changes from SSH-based checks Scenarios: - Normal status aggregation - Recently restarted containers (kept as degraded) - Applications not running (set to exited) Data tracked: - Old vs new status - Container statuses - Restart count and timing - Whether containers exist ### 3. Application Model Status Accessor Location: Lines 706-712, 726-732 Logs: When status is set without explicit health information Issue: Highlights cases where health defaults to "unhealthy" Data tracked: - Raw value passed to setter - Final result after default applied ## How to Use ### Enable Debug Logging Edit `.env` or `config/logging.php` to set log level to debug: ``` LOG_LEVEL=debug ``` ### Monitor Logs ```bash tail -f storage/logs/laravel.log | grep STATUS-DEBUG ``` ### Log Format All logs use `[STATUS-DEBUG]` prefix for easy filtering: ``` [2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change { "source": "PushServerUpdateJob", "app_id": 123, "app_name": "my-app", "old_status": "running:unknown", "new_status": "running:healthy", "container_statuses": [...], "flags": {...} } ``` ## What to Look For 1. Default to unhealthy: Check Application model accessor logs 2. Status flipping: Compare timestamps between Sentinel and SSH updates 3. Incorrect aggregation: Check flags and container_statuses 4. Stale database values: Check if old_status persists across multiple logs ## Next Steps After gathering logs, we can: 1. Identify the exact source of "unhealthy" status 2. Determine if it's a default issue, aggregation bug, or timing problem 3. Apply targeted fix based on evidence 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 13:52:08 +01:00
Andras Bacsai	e3746a4b88	fix: preserve unknown health state and handle edge case container states This commit fixes container health status aggregation to correctly handle unknown health states and edge case container states across all resource types. Changes: 1. Preserve Unknown Health State - Add three-way priority: unhealthy > unknown > healthy - Detect containers without healthchecks (null health) as unknown - Apply across GetContainersStatus, ComplexStatusCheck, and Service models 2. Handle Edge Case Container States - Add support for: created, starting, paused, dead, removing - Map to appropriate statuses: starting (unknown), paused (unknown), degraded (unhealthy) - Prevent containers in transitional states from showing incorrect status 3. Add :excluded Suffix for Excluded Containers - Parse exclude_from_hc flag from docker-compose YAML - Append :excluded suffix to individual container statuses - Skip :excluded containers in non-excluded aggregation sections - Strip :excluded suffix in excluded aggregation sections - Makes it clear in UI which containers are excluded from monitoring Files Modified: - app/Actions/Docker/GetContainersStatus.php - app/Actions/Shared/ComplexStatusCheck.php - app/Models/Service.php - tests/Unit/ContainerHealthStatusTest.php Tests: 18 passed (82 assertions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 13:19:25 +01:00
Andras Bacsai	498b189286	fix: correct status for services with all containers excluded from health checks When all containers are excluded from health checks, display their actual status with :excluded suffix instead of misleading hardcoded statuses. This prevents broken UI state with incorrect action buttons and provides clarity that monitoring is disabled. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 10:54:51 +01:00
Andras Bacsai	23c165d4d1	fix: wrap database updates in a transaction for consistency in GetContainersStatus	2025-11-10 15:07:44 +01:00
Andras Bacsai	68a9f2ca77	feat: add container restart tracking and crash loop detection Track container restart counts from Docker and detect crash loops to provide better visibility into application health issues. - Add restart_count, last_restart_at, and last_restart_type columns to applications table - Detect restart count increases from Docker inspect data and send notifications - Show restart count badge in UI with warning icon on Logs navigation - Distinguish between crash restarts and manual restarts - Implement 30-second grace period to prevent false "exited" status during crash loops - Reset restart count on manual stop, restart, and redeploy actions - Add unit tests for restart count tracking logic This helps users quickly identify when containers are in crash loops and need attention, even when the container status flickers between states during Docker's restart backoff period.	2025-11-10 13:04:31 +01:00
Andras Bacsai	f515870f36	fix(docker): enhance container status aggregation to include restarting and exited states	2025-09-18 18:12:52 +02:00
Andras Bacsai	08d257535a	fix(docker): enhance container status aggregation for multi-container applications, including exclusion handling based on docker-compose configuration	2025-09-13 20:32:15 +02:00
Andras Bacsai	684bd823c6	fix(docker): add protection against empty container queries in GetContainersStatus to prevent unnecessary updates	2025-06-04 10:03:07 +02:00
Andras Bacsai	786bfa960f	improvement(core): simplify events for app/db/service status changes	2025-05-19 21:50:32 +02:00
Andras Bacsai	27e4882d57	feat(core): You can validate compose files with docker compose config fix(core): labels are now accepted with both compose styles refactor: remove lots of ray's	2025-02-27 11:29:04 +01:00
Andras Bacsai	1fe4dd722b	Revert "rector: arrrrr" This reverts commit `16c0cd10d8`.	2025-01-07 15:31:43 +01:00
Andras Bacsai	16c0cd10d8	rector: arrrrr	2025-01-07 14:52:08 +01:00
Andras Bacsai	319c3023dc	fix	2024-12-02 22:50:03 +01:00
Andras Bacsai	58988d3686	fix: a few inputs	2024-12-02 22:50:03 +01:00
Andras Bacsai	7dc65dfd79	fix: make sure important jobs/actions are running on high prio queue	2024-11-22 11:16:01 +01:00
Andras Bacsai	a2b6a61c4a	fix: update last online with old function	2024-11-08 09:43:46 +01:00
Andras Bacsai	e69b0ca1a9	disable tcp proxy notification	2024-11-08 09:18:43 +01:00
Andras Bacsai	ca7c214775	fix: new way to update container statuses	2024-11-03 09:02:14 +01:00
Lucas Michot	0c133b113c	Delete some useless imports	2024-10-31 16:33:49 +01:00
Andras Bacsai	ac768e5313	feat: limit storage check emails feat: sentinel should send storage usage	2024-10-22 14:01:36 +02:00
Andras Bacsai	5c93780304	remove unnecessary code	2024-10-22 12:01:46 +02:00
peaklabs-dev	8635f92ed4	Remove duplicated proxy check	2024-10-14 21:35:38 +02:00
Andras Bacsai	dedf2cf87b	fix: proxy fixes	2024-09-27 15:36:51 +02:00
Andras Bacsai	d6b4e33db3	fix: exited services statuses	2024-09-24 20:40:41 +02:00
Andras Bacsai	e4b92bb660	feat: new server checking job feat: show if the server has problems on ui	2024-08-05 15:48:15 +02:00
Thijmen	d86274cc37	Fix styling	2024-06-10 20:43:34 +00:00
Andras Bacsai	086138fbd9	fix: disable containerStopped job for now	2024-05-23 11:31:52 +02:00
Andras Bacsai	6c3b4070ba	chore: Refactor container name logic in GetContainersStatus.php and ForcePasswordReset.php	2024-05-22 14:44:11 +02:00
Andras Bacsai	a3d73634e7	feat: scheduled task failed notification	2024-05-21 15:36:26 +02:00
Andras Bacsai	98b6aec203	feat: admin view for deleting users	2024-05-21 14:29:06 +02:00
Andras Bacsai	1fb7e97700	Fix error handling in GetContainersStatus.php and increase length of stripe_comment field in migrations	2024-05-10 12:10:47 +02:00
Andras Bacsai	e91a64b1cc	Refactor GetContainersStatus.php for improved readability and maintainability	2024-05-09 12:10:06 +02:00
Andras Bacsai	ba40f93386	do not use sentinel for container details for now	2024-05-08 20:59:58 +02:00
Andras Bacsai	829e17ef2b	Refactor GetContainersStatus.php for improved readability and maintainability	2024-05-08 15:19:12 +02:00
Andras Bacsai	bc5d3bea14	Refactor GetContainersStatus.php for improved readability and maintainability	2024-05-08 15:04:13 +02:00
Andras Bacsai	c618e58a11	feat: start Sentinel on servers.	2024-05-08 14:22:35 +02:00
Andras Bacsai	3eb4aed867	chore: Refactor GetContainersStatus.php for improved readability and maintainability	2024-05-08 09:23:32 +02:00
Andras Bacsai	f6f959a897	feat: experimental sentinel	2024-05-07 15:41:50 +02:00
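
Several commits in this history (e3746a4b88, 14bba8ba86, ac9eca3c05) describe one aggregation rule for multi-container resources: health is combined with the priority unhealthy > unknown > healthy, a missing health value is treated as "unknown", containers marked `:excluded` are skipped, and exited containers carry no health suffix. The following is a minimal Python sketch of that rule only, not the project's PHP implementation. The function names `split_status` and `aggregate` and the `state:health[:excluded]` string layout are assumptions made for illustration. Mixed running/exited handling and the all-excluded case from 498b189286 are deliberately left out.

```python
from typing import Iterable, List, Tuple

# Priority order quoted in the commit messages: unhealthy > unknown > healthy.
HEALTH_PRIORITY = ("unhealthy", "unknown", "healthy")


def split_status(raw: str) -> Tuple[str, str, bool]:
    """Split an assumed 'state:health[:excluded]' string into its parts."""
    parts = raw.split(":")
    excluded = parts[-1] == "excluded"
    if excluded:
        parts = parts[:-1]
    state = parts[0]
    # No health part means no healthcheck is defined, so default to 'unknown'
    # (the behaviour commit 14bba8ba86 describes for both update paths).
    health = parts[1] if len(parts) > 1 else "unknown"
    return state, health, excluded


def aggregate(statuses: Iterable[str]) -> str:
    """Combine per-container statuses into one resource status (hypothetical helper)."""
    parsed = [split_status(s) for s in statuses]
    # Containers flagged as excluded do not take part in the aggregation.
    running: List[str] = [h for state, h, ex in parsed if not ex and state == "running"]
    if not running:
        # Exited containers do not run health checks, so no health suffix.
        return "exited"
    for level in HEALTH_PRIORITY:
        if level in running:
            return f"running:{level}"
    # Any unrecognised health value falls back to 'unknown' in this sketch.
    return "running:unknown"


if __name__ == "__main__":
    print(aggregate(["running:healthy", "running", "running:unhealthy:excluded"]))
```

Scanning the priority tuple in order makes the result independent of the order in which containers are processed, which is the race ("last-processed container would win") that commit 14bba8ba86 says the aggregation was meant to prevent.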

43 commits
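
Commits 68a9f2ca77 and 6d47d24169 describe restart tracking: the restart count reported by Docker is compared with the stored `restart_count`, an increase is recorded along with `last_restart_at` and `last_restart_type`, a 30-second grace period avoids a false "exited" status during crash loops, and the counters are reset on a manual stop or a complete exit. A rough Python sketch of that flow follows. The field names come from the commit messages. The class `RestartState`, the three function names, the `"crash"` type value, and the choice of "degraded" as the in-grace status are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Grace period length taken from commit 68a9f2ca77.
GRACE_PERIOD = timedelta(seconds=30)


@dataclass
class RestartState:
    """Hypothetical stand-in for the restart columns on a resource row."""
    restart_count: int = 0
    last_restart_at: Optional[datetime] = None
    last_restart_type: Optional[str] = None


def record_restart_count(state: RestartState, docker_count: int, now: datetime) -> bool:
    """Store an increased restart count; return True when an increase was seen."""
    if docker_count > state.restart_count:
        state.restart_count = docker_count
        state.last_restart_at = now
        state.last_restart_type = "crash"  # assumed label for non-manual restarts
        return True
    return False


def status_when_container_missing(state: RestartState, now: datetime) -> str:
    """Pick a status when no running container is found for the resource."""
    recently_restarted = (
        state.restart_count > 0
        and state.last_restart_at is not None
        and now - state.last_restart_at <= GRACE_PERIOD
    )
    # Inside the grace period the resource is not yet reported as exited.
    return "degraded" if recently_restarted else "exited"


def reset_restart_tracking(state: RestartState) -> None:
    """Clear the counters, as described for manual stop or complete exit."""
    state.restart_count = 0
    state.last_restart_at = None
    state.last_restart_type = None


if __name__ == "__main__":
    s = RestartState()
    t0 = datetime(2025, 11, 10, 13, 0, 0)
    record_restart_count(s, 3, t0)
    print(status_when_container_missing(s, t0 + timedelta(seconds=10)))
```

Passing `now` in explicitly rather than reading the clock inside the functions keeps the grace-period check easy to test.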