Commit graph

635 commits

Author SHA1 Message Date
Andras Bacsai
bf428a0e1c fix: don't show health status for exited containers (#7317) 2025-11-24 10:29:57 +01:00
Andras Bacsai
1149d0f746 feat: implement prerequisite validation and installation for server setup (#7297) 2025-11-24 10:28:10 +01:00
Andras Bacsai
ac9eca3c05 fix: don't show health status for exited containers
Exited containers don't run health checks, so showing "(unhealthy)" is
misleading. This fix ensures exited status displays without health
suffixes across all monitoring systems (SSH, Sentinel, services, etc.)
and at the UI layer for backward compatibility with existing data.
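
A minimal sketch of the display rule described above, written in Python for illustration only (the project itself is PHP). The function name `display_status`, the `state:health` input format, and the parenthesized output are assumptions based on the status strings quoted in this log, not the actual implementation.

```python
def display_status(raw: str) -> str:
    """Hypothetical helper: format a 'state:health' string for display.

    Exited containers do not run health checks, so their health part is
    dropped instead of being rendered as a suffix such as "(unhealthy)".
    """
    state, _, health = raw.partition(":")
    state = state.strip()
    health = health.strip()
    # No health suffix for exited containers or when no health is recorded.
    if state == "exited" or not health:
        return state
    return f"{state} ({health})"
```

Applying the same rule at the display layer is what keeps previously stored values such as `exited:unhealthy` from leaking a health suffix into the UI.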

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 09:09:37 +01:00
Andras Bacsai
29135e00ba feat: enhance prerequisite validation to return detailed results 2025-11-21 13:14:48 +01:00
Andras Bacsai
01957f2752 feat: implement prerequisite validation and installation for server setup 2025-11-21 09:49:33 +01:00
Andras Bacsai
355dcc186c fix: correct status for excluded health check containers (#7283) 2025-11-21 09:17:26 +01:00
Andras Bacsai
ae6eef3cdb feat(tests): add comprehensive tests for ContainerStatusAggregator and serverStatus accessor
- Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states.
- Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status.
- Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components.
- Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere.
- Updated version number for sentinel to 0.0.17 in versions.json.
2025-11-20 17:31:07 +01:00
Andras Bacsai
2f3052a283 Fix database restart to skip unnecessary Docker cleanup
Prevents removal and re-download of database images on every restart. Docker cleanup was removing Docker Hub images (postgres, mysql, redis, etc.) that lack the coolify.managed=true label, causing them to be immediately re-pulled. Restart now preserves images while stopping/starting containers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 17:15:45 +01:00
Andras Bacsai
14bba8ba86 fix: correct Sentinel default health status and remove debug logging
This commit addresses container status reporting issues and removes debug logging:

**Primary Fix:**
- Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data
- This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy"
- Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown'
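
As a rough illustration of the defaulting rule (in Python rather than the project's PHP), the sketch below treats a missing or empty `health_status` field as "unknown". The function name and the dictionary shape of the Sentinel payload are assumptions made for the example.

```python
from typing import Any, Mapping


def health_from_sentinel(container: Mapping[str, Any]) -> str:
    """Hypothetical helper: read the health of one container entry.

    A container without a healthcheck reports no health_status, which
    should be read as "unknown" rather than "unhealthy".
    """
    health = container.get("health_status")
    # Missing, None, or empty values all fall back to "unknown".
    return health if health else "unknown"
```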

**Service Multi-Container Aggregation:**
- Implemented service container status aggregation (same pattern as applications)
- Added serviceContainerStatuses collection to both Sentinel and SSH paths
- Services now aggregate status using priority: unhealthy > unknown > healthy
- Prevents race conditions where last-processed container would win
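
The priority rule can be sketched as follows, in Python for illustration only. The names `HEALTH_PRIORITY` and `aggregate_health` are hypothetical, and treating unrecognized values and an empty list as "unknown" is an assumption, not something this commit states.

```python
from typing import Iterable

# Higher number wins: unhealthy > unknown > healthy.
HEALTH_PRIORITY = {"healthy": 0, "unknown": 1, "unhealthy": 2}


def aggregate_health(healths: Iterable[str]) -> str:
    """Hypothetical helper: combine per-container health into one value."""
    # Assumption: values outside the three known states count as "unknown".
    normalized = [h if h in HEALTH_PRIORITY else "unknown" for h in healths]
    if not normalized:
        # Assumption: no containers means the health cannot be determined.
        return "unknown"
    # Picking the maximum by priority makes the result order-independent,
    # so the last-processed container can no longer overwrite the others.
    return max(normalized, key=HEALTH_PRIORITY.__getitem__)
```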

**Debug Logging Cleanup:**
- Removed all [STATUS-DEBUG] logging statements (25 total)
- Removed all ray() debugging calls (3 total)
- Removed proof_unknown_preserved and health_status_was_null debug fields
- Code is now production-ready

**Test Coverage:**
- Added 2 new tests for Sentinel default health status behavior
- Added 5 new tests for service aggregation in SSH path
- All 16 tests pass (66 assertions)

**Note:** The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 11:10:34 +01:00
Andras Bacsai
d2d9c1b2bc debug: add comprehensive status change logging
Added detailed debug logging to all status update paths to help
diagnose why "unhealthy" status appears in the UI.

## Logging Added

### 1. PushServerUpdateJob (Sentinel updates)
**Location**: Lines 303-315
**Logs**: Status changes from Sentinel push updates
**Data tracked**:
- Old vs new status
- Container statuses that led to aggregation
- Status flags (hasRunning, hasUnhealthy, hasUnknown)

### 2. GetContainersStatus (SSH updates)
**Location**: Lines 441-449, 346-354, 358-365
**Logs**: Status changes from SSH-based checks
**Scenarios**:
- Normal status aggregation
- Recently restarted containers (kept as degraded)
- Applications not running (set to exited)
**Data tracked**:
- Old vs new status
- Container statuses
- Restart count and timing
- Whether containers exist

### 3. Application Model Status Accessor
**Location**: Lines 706-712, 726-732
**Logs**: When status is set without explicit health information
**Issue**: Highlights cases where health defaults to "unhealthy"
**Data tracked**:
- Raw value passed to setter
- Final result after default applied

## How to Use

### Enable Debug Logging
Edit `.env` or `config/logging.php` to set log level to debug:
```
LOG_LEVEL=debug
```

### Monitor Logs
```bash
tail -f storage/logs/laravel.log | grep STATUS-DEBUG
```

### Log Format
All logs use `[STATUS-DEBUG]` prefix for easy filtering:
```
[2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change
{
  "source": "PushServerUpdateJob",
  "app_id": 123,
  "app_name": "my-app",
  "old_status": "running:unknown",
  "new_status": "running:healthy",
  "container_statuses": [...],
  "flags": {...}
}
```

## What to Look For

1. **Default to unhealthy**: Check Application model accessor logs
2. **Status flipping**: Compare timestamps between Sentinel and SSH updates
3. **Incorrect aggregation**: Check flags and container_statuses
4. **Stale database values**: Check if old_status persists across multiple logs

## Next Steps

After gathering logs, we can:
1. Identify the exact source of "unhealthy" status
2. Determine if it's a default issue, aggregation bug, or timing problem
3. Apply targeted fix based on evidence

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:52:08 +01:00
Andras Bacsai
e3746a4b88 fix: preserve unknown health state and handle edge case container states
This commit fixes container health status aggregation to correctly handle
unknown health states and edge case container states across all resource types.

Changes:

1. **Preserve Unknown Health State**
   - Add three-way priority: unhealthy > unknown > healthy
   - Detect containers without healthchecks (null health) as unknown
   - Apply across GetContainersStatus, ComplexStatusCheck, and Service models

2. **Handle Edge Case Container States**
   - Add support for: created, starting, paused, dead, removing
   - Map to appropriate statuses: starting (unknown), paused (unknown), degraded (unhealthy)
   - Prevent containers in transitional states from showing incorrect status

3. **Add :excluded Suffix for Excluded Containers**
   - Parse exclude_from_hc flag from docker-compose YAML
   - Append :excluded suffix to individual container statuses
   - Skip :excluded containers in non-excluded aggregation sections
   - Strip :excluded suffix in excluded aggregation sections
   - Makes it clear in UI which containers are excluded from monitoring
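
A small sketch of the suffix handling, in Python for illustration only. The helper names and the exact string layout (`state:health:excluded`) are assumptions inferred from the description above.

```python
from typing import Iterable, List, Tuple

EXCLUDED_SUFFIX = ":excluded"


def tag_status(status: str, excluded: bool) -> str:
    """Hypothetical helper: append the suffix for excluded containers."""
    return status + EXCLUDED_SUFFIX if excluded else status


def split_by_exclusion(statuses: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: separate monitored from excluded statuses.

    Monitored statuses are returned unchanged; excluded ones are returned
    with the suffix stripped so they can be aggregated on their own.
    """
    monitored: List[str] = []
    excluded: List[str] = []
    for status in statuses:
        if status.endswith(EXCLUDED_SUFFIX):
            excluded.append(status[: -len(EXCLUDED_SUFFIX)])
        else:
            monitored.append(status)
    return monitored, excluded
```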

Files Modified:
- app/Actions/Docker/GetContainersStatus.php
- app/Actions/Shared/ComplexStatusCheck.php
- app/Models/Service.php
- tests/Unit/ContainerHealthStatusTest.php

Tests: 18 passed (82 assertions)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:19:25 +01:00
Andras Bacsai
498b189286 fix: correct status for services with all containers excluded from health checks
When all containers are excluded from health checks, display their actual status
with :excluded suffix instead of misleading hardcoded statuses. This prevents
broken UI state with incorrect action buttons and provides clarity that monitoring
is disabled.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 10:54:51 +01:00
Andras Bacsai
f81640e316 fix: correct status for services with all containers excluded from health checks
When all services in a Docker Compose file have `exclude_from_hc: true`,
the status aggregation logic was returning invalid states causing broken UI.

**Problems fixed:**
- ComplexStatusCheck returned 'running:healthy' for apps with no monitored containers
- Service model returned ':' (null status) when all services excluded
- UI showed active start/stop buttons for non-running services

**Changes:**
- ComplexStatusCheck: Return 'exited:healthy' when relevantContainerCount is 0
- Service model: Return 'exited:healthy' when both status and health are null
- Added comprehensive unit tests to verify the fixes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 23:24:22 +01:00
Andras Bacsai
1270136da9 merge: merge next branch into feat-traefik-version-checker
Merged latest changes from the next branch to keep the feature branch
up to date. No conflicts were encountered during the merge.

Changes from next branch:
- Updated application deployment job error logging
- Updated server manager job and instance settings
- Removed PullHelperImageJob in favor of updated approach
- Database migration refinements
- Updated versions.json with latest component versions

All automatic merges were successful and no manual conflict resolution
was required.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:56:24 +01:00
Andras Bacsai
7a16938f0c fix(proxy): prevent "container name already in use" error during proxy restart
Add wait loops to ensure containers are fully removed before restarting.
This fixes race conditions where docker compose would fail because an
existing container was still being cleaned up.

Changes:
- StartProxy: Add explicit stop, wait loop before docker compose up
- StopProxy: Add wait loop after container removal
- Both actions now poll up to 10 seconds for complete removal
- Add error suppression to handle non-existent containers gracefully
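
The wait loop can be pictured roughly like this, in Python for illustration only. The function name and its parameters are hypothetical; the existence check is passed in as a callable so the sketch does not assume any particular Docker command, and the one-second polling interval is an assumption.

```python
import time
from typing import Callable


def wait_until_removed(
    container_exists: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Hypothetical helper: poll until the container is gone.

    Returns True once container_exists() reports False, or False if the
    container is still present when the timeout expires.
    """
    deadline = clock() + timeout
    while container_exists():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```

Only after this returns True would the caller go on to bring the proxy back up, which is what avoids the name conflict.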

Tests:
- Add StartProxyTest.php with 3 tests for cleanup logic
- Add StopProxyTest.php with 4 tests for stop behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 11:35:22 +01:00
Andras Bacsai
8c77c63043 feat(proxy): add Traefik version tracking with notifications and dismissible UI warnings
- Add automated Traefik version checking job running weekly on Sundays
- Implement version detection from running containers and comparison with versions.json
- Add notifications across all channels (Email, Discord, Slack, Telegram, Pushover, Webhook) for outdated versions
- Create dismissible callout component with localStorage persistence
- Display cross-branch upgrade warnings (e.g., v3.5 -> v3.6) with changelog links
- Show patch update notifications within same branch
- Add warning icon that appears when callouts are dismissed
- Prevent duplicate notifications during proxy restart by adding restarting parameter
- Fix notification spam with transition-based logic for status changes
- Enable system email settings by default in development mode
- Track last saved/applied proxy settings to detect configuration drift
2025-11-14 11:35:22 +01:00
Andras Bacsai
318cd18dde fix: remove PullHelperImageJob and mass server scheduling
Stop dispatching PullHelperImageJob to thousands of servers when the helper image version changes. Instead, rely on Docker's automatic image pulling during actual deployments and backups. Inline the helper image pull in UpdateCoolify for the single use case.

This eliminates queue flooding on cloud instances while maintaining all functionality through Docker's built-in image management.
2025-11-14 11:31:08 +01:00
Andras Bacsai
23c165d4d1 fix: wrap database updates in a transaction for consistency in GetContainersStatus 2025-11-10 15:07:44 +01:00
Andras Bacsai
f5fa09790e refactor: improve command handling and ensure correct working directory for Docker operations 2025-11-10 14:40:03 +01:00
Andras Bacsai
e63a270fea
Enhance container status tracking and improve user notifications (#7182) 2025-11-10 13:58:22 +01:00
Andras Bacsai
68a9f2ca77 feat: add container restart tracking and crash loop detection
Track container restart counts from Docker and detect crash loops to provide better visibility into application health issues.

- Add restart_count, last_restart_at, and last_restart_type columns to applications table
- Detect restart count increases from Docker inspect data and send notifications
- Show restart count badge in UI with warning icon on Logs navigation
- Distinguish between crash restarts and manual restarts
- Implement 30-second grace period to prevent false "exited" status during crash loops
- Reset restart count on manual stop, restart, and redeploy actions
- Add unit tests for restart count tracking logic
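
A minimal sketch of the restart detection and grace period, in Python for illustration only. The function names are hypothetical, and reporting "degraded" inside the grace window is an assumption based on the "kept as degraded" wording elsewhere in this log.

```python
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(seconds=30)


def restart_count_increased(previous: int, current: int) -> bool:
    """Hypothetical helper: True when Docker reports more restarts than stored."""
    return current > previous


def effective_status(
    container_found: bool,
    last_restart_at: Optional[datetime],
    now: datetime,
) -> str:
    """Hypothetical helper: avoid a false "exited" during a crash loop."""
    if container_found:
        return "running"
    # Within the grace period the container is probably between restart
    # attempts, so it is not reported as exited yet.
    if last_restart_at is not None and now - last_restart_at <= GRACE_PERIOD:
        return "degraded"
    return "exited"
```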

This helps users quickly identify when containers are in crash loops and need attention, even when the container status flickers between states during Docker's restart backoff period.
2025-11-10 13:04:31 +01:00
Andras Bacsai
712d60c75b feat: ensure .env file exists for docker compose and auto-inject in payloads 2025-11-07 15:20:10 +01:00
Andras Bacsai
2db122c851 fix: remove debugging output from StartPostgresql command handling 2025-11-05 09:10:15 +01:00
Andras Bacsai
5b79844a3a fix: update docker stop command to use --time instead of --timeout 2025-11-05 08:48:10 +01:00
Andras Bacsai
f315e4bd9c feat: add dev_helper_version to instance settings and update related functionality 2025-11-03 08:38:43 +01:00
Andras Bacsai
84b0ec1e94 Update app/Actions/Server/InstallDocker.php
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-10-28 09:31:32 +01:00
Andras Bacsai
fc49b9284a Add repository-based Docker installation fallbacks for all major Linux distros
This commit adds official Docker repository installation methods as fallbacks
when Rancher and get.docker.com convenience scripts fail, providing more
reliable Docker installation across all supported operating systems.

Changes:
- Add apt repository fallback for Debian-based systems (Ubuntu, Debian, Raspbian)
  - Fixes installation on Debian 13 (Trixie) where get.docker.com fails
  - Uses VERSION_CODENAME for automatic OS version detection
- Add dnf repository fallback for RHEL-based systems (CentOS, Fedora, Rocky, AlmaLinux)
- Add zypper repository fallback for SUSE-based systems (SLES, openSUSE)
- Refactor installation methods into dedicated private methods for better maintainability

Installation fallback chain:
1. Rancher install-docker script (preserves version pinning)
2. Docker get.docker.com convenience script
3. Official repository method (new, most reliable)
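
The chain above amounts to trying each method in order and stopping at the first success. A rough Python sketch (illustrative only, not the project's PHP; the function name and the `(name, callable)` pair structure are assumptions):

```python
from typing import Callable, Iterable, List, Tuple


def install_with_fallbacks(installers: Iterable[Tuple[str, Callable[[], None]]]) -> str:
    """Hypothetical helper: run installers in order until one succeeds.

    Each installer is a (name, callable) pair; a callable signals failure
    by raising. Returns the name of the method that succeeded.
    """
    errors: List[str] = []
    for name, install in installers:
        try:
            install()
            return name
        except Exception as exc:
            # Record the failure and fall through to the next method.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all Docker installation methods failed: " + "; ".join(errors))
```

Collecting the individual errors keeps the final failure message useful when every method fails.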

Benefits:
- Future-proof: Works with new OS releases automatically
- Production-ready: Uses Docker's recommended installation method
- Comprehensive: Covers 95%+ of Linux servers in production
- Maintainable: Clean code structure with single-responsibility methods

Fixes an issue where Debian 13 (Trixie) servers fail validation because
the get.docker.com script incorrectly uses the numeric version "13" instead
of the codename "trixie" in repository URLs.
2025-10-26 12:41:50 +01:00
Andras Bacsai
975d1b8a6b Changes auto-committed by Conductor 2025-10-16 17:13:47 +02:00
Andras Bacsai
8d280b4aac fix: prevent container name conflict when updating database port mappings
When port mappings are changed in the UI and the database is restarted,
the system now gracefully stops and removes the existing container before
recreating it with the new configuration.

This prevents the "container name already in use" error that occurred when
Docker Compose tried to create a container with the same name but different
port configuration.

Changes:
- Add graceful container stop (10s timeout) before docker compose up
- Remove old container to avoid name conflicts
- Use --timeout flag (modern Docker CLI) instead of deprecated --time
- Apply fix to all database types: MariaDB, MySQL, PostgreSQL, MongoDB,
  Redis, KeyDB, Dragonfly, and ClickHouse
- Update StopDatabase.php for consistency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 10:01:54 +02:00
Andras Bacsai
bd88bbca5b fix: streamline proxy status handling in StartProxy and Navbar components 2025-10-10 10:41:58 +02:00
Andras Bacsai
513f6b54f7 feat: implement Hetzner deletion failure notification system with email and messaging support 2025-10-10 09:35:58 +02:00
Andras Bacsai
f4e5c195fe refactor: replace direct SslCertificate queries with server relationship methods for consistency 2025-10-09 17:00:05 +02:00
Andras Bacsai
bf5c08d071 work on hetzner integration 2025-10-09 16:54:13 +02:00
Andras Bacsai
704ddf2968 improved hetzner features 2025-10-09 12:53:57 +02:00
Andras Bacsai
215301fa8f basics of adding / removing hetzner servers 2025-10-09 10:41:29 +02:00
Andras Bacsai
cef3d3af5d feat(proxy): enhance proxy configuration regeneration by extracting custom commands
- Added a new function to extract custom proxy commands from existing Traefik configurations before regenerating the proxy configuration.
- Updated the proxy configuration generation logic to include these custom commands, ensuring they are preserved during regeneration.
- Introduced unit tests to validate the extraction of custom commands and handle various scenarios, including invalid YAML and different proxy types.
2025-10-07 11:11:13 +02:00
Andras Bacsai
a8bdc3bbfe fix(application): increase docker stop timeout from 10 to 30 seconds for better application shutdown handling 2025-09-29 12:16:13 +02:00
Andras Bacsai
cd2d4070d3 fix(application): reduce docker stop timeout from 30 to 10 seconds for improved application shutdown efficiency 2025-09-28 23:11:58 +02:00
Andras Bacsai
f515870f36 fix(docker): enhance container status aggregation to include restarting and exited states 2025-09-18 18:12:52 +02:00
Andras Bacsai
393745b68c Revert "refactor(file-transfer): replace base64 encoding with direct file transfer method across multiple database actions for improved clarity and efficiency"
This reverts commit 18068857b1.
2025-09-15 17:55:08 +02:00
Andras Bacsai
4027c1426c feat(sentinel): add support for custom Docker images in StartSentinel and related methods 2025-09-14 19:21:55 +02:00
Andras Bacsai
08d257535a fix(docker): enhance container status aggregation for multi-container applications, including exclusion handling based on docker-compose configuration 2025-09-13 20:32:15 +02:00
Andras Bacsai
a6a4fd39bb chore(cleanup): remove deprecated ServerCheck and related job classes to streamline codebase 2025-09-13 18:35:39 +02:00
Andras Bacsai
a2a2bfa6c9 feat(user-management): implement user deletion command with phased resource and subscription cancellation, including dry run option 2025-09-13 15:08:30 +02:00
Andras Bacsai
1c08d32b85 refactor(database): remove volume_configuration_dir and streamline configuration directory usage in MongoDB and PostgreSQL handlers 2025-09-10 16:12:53 +02:00
Andras Bacsai
1ca94b90da fix(proxy): replace CheckConfiguration with GetProxyConfiguration and SaveConfiguration with SaveProxyConfiguration for improved clarity and consistency in proxy management 2025-09-09 12:52:19 +02:00
Andras Bacsai
18068857b1 refactor(file-transfer): replace base64 encoding with direct file transfer method across multiple database actions for improved clarity and efficiency 2025-09-08 14:04:24 +02:00
Andras Bacsai
852b2688d9 refactor(error-handling): remove ray debugging statements from CheckUpdates and shared helper functions to clean up error reporting 2025-09-08 14:03:27 +02:00
Andras Bacsai
9c3345318a fix(user): ensure email attributes are stored in lowercase for consistency and prevent case-related issues 2025-09-05 17:44:34 +02:00
Andras Bacsai
83f2e856ec feat(sentinel): implement SentinelRestarted event and update Livewire components to handle server restart notifications 2025-08-26 10:27:38 +02:00