Commit graph

1343 commits

Author SHA1 Message Date
Andras Bacsai
6d8144c18c Merge remote-tracking branch 'origin/next' into s3-restore
Resolve merge conflicts in:
- bootstrap/helpers/shared.php (kept both formatBytes, isSafeTmpPath, and formatContainerStatus functions)
- database/migrations/2025_10_10_120002_create_cloud_init_scripts_table.php (added Schema::hasTable check)
- database/migrations/2025_10_10_120002_create_webhook_notification_settings_table.php (added Schema::hasTable check)
- resources/views/livewire/project/application/general.blade.php (formatting/whitespace)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 09:35:37 +01:00
Andras Bacsai
e0dc12678b
fix: comprehensive SERVICE_URL/SERVICE_FQDN handling improvements and queue reliability fixes (#7275) 2025-11-24 11:47:11 +01:00
Andras Bacsai
bf428a0e1c
fix: don't show health status for exited containers (#7317) 2025-11-24 10:29:57 +01:00
Andras Bacsai
1149d0f746
feat: implement prerequisite validation and installation for server setup (#7297) 2025-11-24 10:28:10 +01:00
Andras Bacsai
ac9eca3c05 fix: don't show health status for exited containers
Exited containers don't run health checks, so showing "(unhealthy)" is
misleading. This fix ensures exited status displays without health
suffixes across all monitoring systems (SSH, Sentinel, services, etc.)
and at the UI layer for backward compatibility with existing data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 09:09:37 +01:00
Andras Bacsai
29135e00ba feat: enhance prerequisite validation to return detailed results 2025-11-21 13:14:48 +01:00
Andras Bacsai
85b73a8c00 fix: initialize Collection properties to handle queue deserialization edge cases 2025-11-21 12:25:25 +01:00
Andras Bacsai
01957f2752 feat: implement prerequisite validation and installation for server setup 2025-11-21 09:49:33 +01:00
Andras Bacsai
ae6eef3cdb feat(tests): add comprehensive tests for ContainerStatusAggregator and serverStatus accessor
- Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states.
- Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status.
- Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components.
- Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere.
- Updated version number for sentinel to 0.0.17 in versions.json.
2025-11-20 17:31:07 +01:00
Andras Bacsai
14bba8ba86 fix: correct Sentinel default health status and remove debug logging
This commit addresses container status reporting issues and removes debug logging:

**Primary Fix:**
- Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data
- This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy"
- Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown'

**Service Multi-Container Aggregation:**
- Implemented service container status aggregation (same pattern as applications)
- Added serviceContainerStatuses collection to both Sentinel and SSH paths
- Services now aggregate status using priority: unhealthy > unknown > healthy
- Prevents race conditions where last-processed container would win

**Debug Logging Cleanup:**
- Removed all [STATUS-DEBUG] logging statements (25 total)
- Removed all ray() debugging calls (3 total)
- Removed proof_unknown_preserved and health_status_was_null debug fields
- Code is now production-ready

**Test Coverage:**
- Added 2 new tests for Sentinel default health status behavior
- Added 5 new tests for service aggregation in SSH path
- All 16 tests pass (66 assertions)

**Note:** The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 11:10:34 +01:00
Andras Bacsai
747a48b933 debug: add detailed Sentinel container processing logging
Added comprehensive logging to track why applicationContainerStatuses
collection is empty in PushServerUpdateJob.

## Logging Added

### 1. Raw Sentinel Data (line 113-118)
**Logs**: Complete container data received from Sentinel
**Purpose**: See exactly what Sentinel is sending
**Data**: Container count and full container array with all labels

### 2. Container Processing Loop (line 157-163)
**Logs**: Every container as it's being processed
**Purpose**: Track which containers enter the processing loop
**Data**: Container name, status, all labels, coolify.managed flag

### 3. Skipped Containers - Not Managed (line 165-171)
**Logs**: Containers without coolify.managed label
**Purpose**: Identify containers being filtered out early
**Data**: Container name

### 4. Successful Container Addition (line 193-198)
**Logs**: When container is successfully added to applicationContainerStatuses
**Purpose**: Confirm containers ARE being processed
**Data**: Application ID, container name, container status

### 5. Missing com.docker.compose.service Label (line 200-206)
**Logs**: Containers skipped due to missing com.docker.compose.service
**Purpose**: Identify the most likely root cause
**Data**: Container name, application ID, all labels

## Why This Matters

User reported applicationContainerStatuses is empty (`[]`) even though
Sentinel is pushing updates. This logging will reveal:

1. Is Sentinel sending containers at all?
2. Are containers filtered by coolify.managed check?
3. Is com.docker.compose.service label missing? (most likely)
4. What labels IS Sentinel actually sending?

## Expected Findings

Based on investigation, the issue is likely:
- Sentinel is NOT sending com.docker.compose.service in labels
- Or Sentinel uses a different label format/name
- Containers pass all other checks but fail on line 190-206

## Next Steps

After logs appear, we'll see exactly which filter is blocking containers
and can fix the root cause (likely need to extract com.docker.compose.service
from container name or use a different label source).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 08:34:42 +01:00
Andras Bacsai
d2d9c1b2bc debug: add comprehensive status change logging
Added detailed debug logging to all status update paths to help
diagnose why "unhealthy" status appears in the UI.

## Logging Added

### 1. PushServerUpdateJob (Sentinel updates)
**Location**: Lines 303-315
**Logs**: Status changes from Sentinel push updates
**Data tracked**:
- Old vs new status
- Container statuses that led to aggregation
- Status flags (hasRunning, hasUnhealthy, hasUnknown)

### 2. GetContainersStatus (SSH updates)
**Location**: Lines 441-449, 346-354, 358-365
**Logs**: Status changes from SSH-based checks
**Scenarios**:
- Normal status aggregation
- Recently restarted containers (kept as degraded)
- Applications not running (set to exited)
**Data tracked**:
- Old vs new status
- Container statuses
- Restart count and timing
- Whether containers exist

### 3. Application Model Status Accessor
**Location**: Lines 706-712, 726-732
**Logs**: When status is set without explicit health information
**Issue**: Highlights cases where health defaults to "unhealthy"
**Data tracked**:
- Raw value passed to setter
- Final result after default applied

## How to Use

### Enable Debug Logging
Edit `.env` or `config/logging.php` to set log level to debug:
```
LOG_LEVEL=debug
```

### Monitor Logs
```bash
tail -f storage/logs/laravel.log | grep STATUS-DEBUG
```

### Log Format
All logs use `[STATUS-DEBUG]` prefix for easy filtering:
```
[2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change
{
  "source": "PushServerUpdateJob",
  "app_id": 123,
  "app_name": "my-app",
  "old_status": "running:unknown",
  "new_status": "running:healthy",
  "container_statuses": [...],
  "flags": {...}
}
```

## What to Look For

1. **Default to unhealthy**: Check Application model accessor logs
2. **Status flipping**: Compare timestamps between Sentinel and SSH updates
3. **Incorrect aggregation**: Check flags and container_statuses
4. **Stale database values**: Check if old_status persists across multiple logs

## Next Steps

After gathering logs, we can:
1. Identify the exact source of "unhealthy" status
2. Determine if it's a default issue, aggregation bug, or timing problem
3. Apply targeted fix based on evidence

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:52:08 +01:00
Andras Bacsai
6b62847a11 fix: preserve unknown health status in Sentinel updates (PushServerUpdateJob)
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.

## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".

When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"

## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:

1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy

This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic

## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation

## Testing
All 31 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:40:58 +01:00
Andras Bacsai
cfe3f5d8b9 fix: normalize preview paths and use BUILD_TIME_ENV_PATH constant
- Fix double-slash issue in Docker Compose preview paths when baseDirectory is "/"
- Normalize baseDirectory using rtrim() to prevent path concatenation issues
- Replace hardcoded '/artifacts/build-time.env' with ApplicationDeploymentJob::BUILD_TIME_ENV_PATH
- Make BUILD_TIME_ENV_PATH constant public for reusability
- Add comprehensive unit tests (11 test cases, 25 assertions)

Fixes preview path generation in:
- getDockerComposeBuildCommandPreviewProperty()
- getDockerComposeStartCommandPreviewProperty()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:54:21 +01:00
Andras Bacsai
37c3cd9f4e fix: auto-inject environment variables into custom Docker Compose commands 2025-11-18 14:54:21 +01:00
Andras Bacsai
1094ab7a46 fix: inject environment variables into custom Docker Compose build commands
When using a custom Docker Compose build command, environment variables
were being lost because the --env-file flag was not included. This fix
automatically injects the --env-file flag to ensure build-time environment
variables are available during custom builds.

Changes:
- Auto-inject --env-file /artifacts/build-time.env after docker compose
- Respect user-provided --env-file flags (no duplication)
- Append build arguments when not using build secrets
- Update UI helper text to inform users about automatic env injection
- Add comprehensive unit tests (7 test cases, all passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:54:21 +01:00
Andras Bacsai
50d55a9509 refactor: send immediate Traefik version notifications instead of delayed aggregation
Move notification logic from NotifyOutdatedTraefikServersJob into CheckTraefikVersionForServerJob to send immediate notifications when outdated Traefik is detected. This is more suitable for cloud environments with thousands of servers.

Changes:
- CheckTraefikVersionForServerJob now sends notifications immediately after detecting outdated Traefik
- Remove NotifyOutdatedTraefikServersJob (no longer needed)
- Remove delay calculation logic from CheckTraefikVersionJob
- Update tests to reflect new immediate notification pattern

Trade-offs:
- Pro: Faster notifications (immediate alerts)
- Pro: Simpler codebase (removed complex delay calculation)
- Pro: Better scalability for thousands of servers
- Con: Teams may receive multiple notifications if they have many outdated servers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:54:21 +01:00
Andras Bacsai
f75bb61d21 refactor(CheckTraefikVersionForServerJob): remove unnecessary onQueue assignment in constructor 2025-11-18 14:54:21 +01:00
Andras Bacsai
d2d56ac6b4 refactor(proxy): simplify getNewerBranchInfo method parameters and streamline version checks 2025-11-18 14:53:49 +01:00
Andras Bacsai
7dfe33d1c9 refactor(proxy): implement centralized caching for versions.json and improve UX
This commit introduces several improvements to the Traefik version tracking
feature and proxy configuration UI:

## Caching Improvements

1. **New centralized helper functions** (bootstrap/helpers/versions.php):
   - `get_versions_data()`: Redis-cached access to versions.json (1 hour TTL)
   - `get_traefik_versions()`: Extract Traefik versions from cached data
   - `invalidate_versions_cache()`: Clear cache when file is updated

2. **Performance optimization**:
   - Single Redis cache key: `coolify:versions:all`
   - Eliminates 2-4 file reads per page load
   - 95-97.5% reduction in disk I/O time
   - Shared cache across all servers in distributed setup

3. **Updated all consumers to use cached helpers**:
   - CheckTraefikVersionJob: Use get_traefik_versions()
   - Server/Proxy: Two-level caching (Redis + in-memory per-request)
   - CheckForUpdatesJob: Auto-invalidate cache after updating file
   - bootstrap/helpers/shared.php: Use cached data for Coolify version

## UI/UX Improvements

1. **Navbar warning indicator**:
   - Added yellow warning triangle icon next to "Proxy" menu item
   - Appears when server has outdated Traefik version
   - Uses existing traefik_outdated_info data for instant checks
   - Provides at-a-glance visibility of version issues

2. **Proxy sidebar persistence**:
   - Fixed sidebar disappearing when clicking "Switch Proxy"
   - Configuration link now always visible (needed for proxy selection)
   - Dynamic Configurations and Logs only show when proxy is configured
   - Better navigation context during proxy switching workflow

## Code Quality

- Added comprehensive PHPDoc for Server::$traefik_outdated_info property
- Improved code organization with centralized helper approach
- All changes formatted with Laravel Pint
- Maintains backward compatibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:53:49 +01:00
Andras Bacsai
6dbe58f22b feat(proxy): enhance Traefik version notifications to show patch and minor upgrades
- Store both patch update and newer minor version information simultaneously
- Display patch update availability alongside minor version upgrades in notifications
- Add newer_branch_target and newer_branch_latest fields to traefik_outdated_info
- Update all notification channels (Discord, Telegram, Slack, Pushover, Email, Webhook)
- Show minor version in format (e.g., v3.6) for upgrade targets instead of patch version
- Enhance UI callouts with clearer messaging about available upgrades
- Remove verbose logging in favor of cleaner code structure
- Handle edge case where SSH command returns empty response

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:53:49 +01:00
Andras Bacsai
c77eaddede refactor(proxy): implement parallel processing for Traefik version checks
Addresses critical performance issues identified in code review by refactoring the monolithic CheckTraefikVersionJob into a distributed architecture with parallel processing.

Changes:
- Split version checking into CheckTraefikVersionForServerJob for parallel execution
- Extract notification logic into NotifyOutdatedTraefikServersJob
- Dispatch individual server checks concurrently to handle thousands of servers
- Add comprehensive unit tests for the new job architecture
- Update feature tests to cover the refactored workflow

Performance improvements:
- Sequential SSH calls replaced with parallel queue jobs
- Scales efficiently for large installations with thousands of servers
- Reduces job execution time from hours to minutes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:53:49 +01:00
Andras Bacsai
1dacb94860 fix(performance): eliminate N+1 query in CheckTraefikVersionJob
This commit fixes a critical N+1 query issue in CheckTraefikVersionJob
that was loading ALL proxy servers into memory then filtering in PHP,
causing potential OOM errors with thousands of servers.

Changes:
- Added scopeWhereProxyType() query scope to Server model for
  database-level filtering using JSON column arrow notation
- Updated CheckTraefikVersionJob to use new scope instead of
  collection filter, moving proxy type filtering into the SQL query
- Added comprehensive unit tests for the new query scope

Performance impact:
- Before: SELECT * FROM servers WHERE proxy IS NOT NULL (all servers)
- After: SELECT * FROM servers WHERE proxy->>'type' = 'TRAEFIK' (filtered)
- Eliminates memory overhead of loading non-Traefik servers
- Critical for cloud instances with thousands of connected servers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:53:49 +01:00
Andras Bacsai
0bd4ffb2d7 feat(proxy): add Traefik version tracking with notifications and dismissible UI warnings
- Add automated Traefik version checking job running weekly on Sundays
- Implement version detection from running containers and comparison with versions.json
- Add notifications across all channels (Email, Discord, Slack, Telegram, Pushover, Webhook) for outdated versions
- Create dismissible callout component with localStorage persistence
- Display cross-branch upgrade warnings (e.g., v3.5 -> v3.6) with changelog links
- Show patch update notifications within same branch
- Add warning icon that appears when callouts are dismissed
- Prevent duplicate notifications during proxy restart by adding restarting parameter
- Fix notification spam with transition-based logic for status changes
- Enable system email settings by default in development mode
- Track last saved/applied proxy settings to detect configuration drift
2025-11-18 14:53:49 +01:00
Andras Bacsai
0e66adc376 fix: normalize preview paths and use BUILD_TIME_ENV_PATH constant
- Fix double-slash issue in Docker Compose preview paths when baseDirectory is "/"
- Normalize baseDirectory using rtrim() to prevent path concatenation issues
- Replace hardcoded '/artifacts/build-time.env' with ApplicationDeploymentJob::BUILD_TIME_ENV_PATH
- Make BUILD_TIME_ENV_PATH constant public for reusability
- Add comprehensive unit tests (11 test cases, 25 assertions)

Fixes preview path generation in:
- getDockerComposeBuildCommandPreviewProperty()
- getDockerComposeStartCommandPreviewProperty()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 13:48:06 +01:00
Andras Bacsai
274c37e333 fix: auto-inject environment variables into custom Docker Compose commands 2025-11-18 13:07:54 +01:00
Andras Bacsai
f8e3bb54a3 fix: inject environment variables into custom Docker Compose build commands
When using a custom Docker Compose build command, environment variables
were being lost because the --env-file flag was not included. This fix
automatically injects the --env-file flag to ensure build-time environment
variables are available during custom builds.

Changes:
- Auto-inject --env-file /artifacts/build-time.env after docker compose
- Respect user-provided --env-file flags (no duplication)
- Append build arguments when not using build secrets
- Update UI helper text to inform users about automatic env injection
- Add comprehensive unit tests (7 test cases, all passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 13:07:54 +01:00
Andras Bacsai
4f2d39af03 refactor: send immediate Traefik version notifications instead of delayed aggregation
Move notification logic from NotifyOutdatedTraefikServersJob into CheckTraefikVersionForServerJob to send immediate notifications when outdated Traefik is detected. This is more suitable for cloud environments with thousands of servers.

Changes:
- CheckTraefikVersionForServerJob now sends notifications immediately after detecting outdated Traefik
- Remove NotifyOutdatedTraefikServersJob (no longer needed)
- Remove delay calculation logic from CheckTraefikVersionJob
- Update tests to reflect new immediate notification pattern

Trade-offs:
- Pro: Faster notifications (immediate alerts)
- Pro: Simpler codebase (removed complex delay calculation)
- Pro: Better scalability for thousands of servers
- Con: Teams may receive multiple notifications if they have many outdated servers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 12:30:50 +01:00
Andras Bacsai
e97222aef2 refactor(CheckTraefikVersionForServerJob): remove unnecessary onQueue assignment in constructor 2025-11-18 10:44:01 +01:00
Andras Bacsai
680b9a2c10
Merge branch 'next' into s3-restore 2025-11-17 15:39:22 +01:00
Andras Bacsai
f8dd44410a refactor(proxy): simplify getNewerBranchInfo method parameters and streamline version checks 2025-11-17 15:03:30 +01:00
Andras Bacsai
1270136da9 merge: merge next branch into feat-traefik-version-checker
Merged latest changes from the next branch to keep the feature branch
up to date. No conflicts were encountered during the merge.

Changes from next branch:
- Updated application deployment job error logging
- Updated server manager job and instance settings
- Removed PullHelperImageJob in favor of updated approach
- Database migration refinements
- Updated versions.json with latest component versions

All automatic merges were successful and no manual conflict resolution
was required.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:56:24 +01:00
Andras Bacsai
5d73b76a44 refactor(proxy): implement centralized caching for versions.json and improve UX
This commit introduces several improvements to the Traefik version tracking
feature and proxy configuration UI:

## Caching Improvements

1. **New centralized helper functions** (bootstrap/helpers/versions.php):
   - `get_versions_data()`: Redis-cached access to versions.json (1 hour TTL)
   - `get_traefik_versions()`: Extract Traefik versions from cached data
   - `invalidate_versions_cache()`: Clear cache when file is updated

2. **Performance optimization**:
   - Single Redis cache key: `coolify:versions:all`
   - Eliminates 2-4 file reads per page load
   - 95-97.5% reduction in disk I/O time
   - Shared cache across all servers in distributed setup

3. **Updated all consumers to use cached helpers**:
   - CheckTraefikVersionJob: Use get_traefik_versions()
   - Server/Proxy: Two-level caching (Redis + in-memory per-request)
   - CheckForUpdatesJob: Auto-invalidate cache after updating file
   - bootstrap/helpers/shared.php: Use cached data for Coolify version

## UI/UX Improvements

1. **Navbar warning indicator**:
   - Added yellow warning triangle icon next to "Proxy" menu item
   - Appears when server has outdated Traefik version
   - Uses existing traefik_outdated_info data for instant checks
   - Provides at-a-glance visibility of version issues

2. **Proxy sidebar persistence**:
   - Fixed sidebar disappearing when clicking "Switch Proxy"
   - Configuration link now always visible (needed for proxy selection)
   - Dynamic Configurations and Logs only show when proxy is configured
   - Better navigation context during proxy switching workflow

## Code Quality

- Added comprehensive PHPDoc for Server::$traefik_outdated_info property
- Improved code organization with centralized helper approach
- All changes formatted with Laravel Pint
- Maintains backward compatibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:53:28 +01:00
Andras Bacsai
b602fef4db fix(deployment): improve error logging with exception types and hidden technical details
- Add exception class names to error messages for better debugging
- Mark technical details (error type, code, location, stack trace) as hidden in logs
- Preserve original exception types when wrapping in DeploymentException
- Update ServerManagerJob to include exception class in log messages
- Enhance unit tests to verify hidden log entry behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:44:39 +01:00
Andras Bacsai
648e111f10 Merge remote-tracking branch 'origin/next' into s3-restore
# Conflicts:
#	app/Models/InstanceSettings.php
2025-11-17 14:30:00 +01:00
Andras Bacsai
fbdd8e5f03 fix: improve robustness and security in database restore flows
- Add null checks for server instances in restore events to prevent errors
- Escape S3 credentials to prevent command injection vulnerabilities
- Fix file upload clearing custom location to prevent UI confusion
- Optimize isSafeTmpPath helper by avoiding redundant dirname calls
- Remove unnecessary --rm flag from long-running S3 restore container
- Prioritize uploaded files over custom location in import logic
- Add comprehensive unit tests for restore event null server handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:13:10 +01:00
Andras Bacsai
97550f4066 fix(deployment): eliminate duplicate error logging in deployment methods
Wraps rolling_update(), health_check(), stop_running_container(), and
start_by_compose_file() with try-catch to ensure comprehensive error logging
happens in one place. Removes duplicate logging from intermediate catch blocks
since the failed() method already provides full error details including stack trace
and chained exception information.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 10:52:09 +01:00
Andras Bacsai
94560ea6c7 feat: streamline S3 restore with single-step flow and improved UI consistency
Major architectural improvements:
- Merged download and restore into single atomic operation
- Eliminated separate S3DownloadFinished event (redundant)
- Files now transfer directly: S3 → helper container → server → database container
- Removed download progress tracking in favor of unified restore progress

UI/UX improvements:
- Unified restore method selection with visual cards
- Consistent "File Information" display between local and S3 restore
- Single slide-over for all restore operations (removed separate S3 download monitor)
- Better visual feedback with loading states

Security enhancements:
- Added isSafeTmpPath() helper for path traversal protection
- URL decode validation to catch encoded attacks
- Canonical path resolution to prevent symlink attacks
- Comprehensive path validation in all cleanup events

Cleanup improvements:
- S3RestoreJobFinished now handles all cleanup (helper container + all temp files)
- RestoreJobFinished uses new isSafeTmpPath() validation
- CoolifyTask dispatches cleanup events even on job failure
- All cleanup uses non-throwing commands (2>/dev/null || true)

Other improvements:
- S3 storage policy authorization on Show component
- Storage Form properly syncs is_usable state after test
- Removed debug code and improved error handling
- Better command organization and documentation

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 10:05:18 +01:00
Andras Bacsai
6593b2a553 feat(proxy): enhance Traefik version notifications to show patch and minor upgrades
- Store both patch update and newer minor version information simultaneously
- Display patch update availability alongside minor version upgrades in notifications
- Add newer_branch_target and newer_branch_latest fields to traefik_outdated_info
- Update all notification channels (Discord, Telegram, Slack, Pushover, Email, Webhook)
- Show minor version in format (e.g., v3.6) for upgrade targets instead of patch version
- Enhance UI callouts with clearer messaging about available upgrades
- Remove verbose logging in favor of cleaner code structure
- Handle edge case where SSH command returns empty response

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 09:59:17 +01:00
Andras Bacsai
cc6a538fca refactor(proxy): implement parallel processing for Traefik version checks
Addresses critical performance issues identified in code review by refactoring the monolithic CheckTraefikVersionJob into a distributed architecture with parallel processing.

Changes:
- Split version checking into CheckTraefikVersionForServerJob for parallel execution
- Extract notification logic into NotifyOutdatedTraefikServersJob
- Dispatch individual server checks concurrently to handle thousands of servers
- Add comprehensive unit tests for the new job architecture
- Update feature tests to cover the refactored workflow

Performance improvements:
- Sequential SSH calls replaced with parallel queue jobs
- Scales efficiently for large installations with thousands of servers
- Reduces job execution time from hours to minutes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 11:42:58 +01:00
Andras Bacsai
11a7f4c8a7 fix(performance): eliminate N+1 query in CheckTraefikVersionJob
This commit fixes a critical N+1 query issue in CheckTraefikVersionJob
that was loading ALL proxy servers into memory then filtering in PHP,
causing potential OOM errors with thousands of servers.

Changes:
- Added scopeWhereProxyType() query scope to Server model for
  database-level filtering using JSON column arrow notation
- Updated CheckTraefikVersionJob to use new scope instead of
  collection filter, moving proxy type filtering into the SQL query
- Added comprehensive unit tests for the new query scope

Performance impact:
- Before: SELECT * FROM servers WHERE proxy IS NOT NULL (all servers)
- After: SELECT * FROM servers WHERE proxy->>'type' = 'TRAEFIK' (filtered)
- Eliminates memory overhead of loading non-Traefik servers
- Critical for cloud instances with thousands of connected servers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 11:35:22 +01:00
Andras Bacsai
8c77c63043 feat(proxy): add Traefik version tracking with notifications and dismissible UI warnings
- Add automated Traefik version checking job running weekly on Sundays
- Implement version detection from running containers and comparison with versions.json
- Add notifications across all channels (Email, Discord, Slack, Telegram, Pushover, Webhook) for outdated versions
- Create dismissible callout component with localStorage persistence
- Display cross-branch upgrade warnings (e.g., v3.5 -> v3.6) with changelog links
- Show patch update notifications within same branch
- Add warning icon that appears when callouts are dismissed
- Prevent duplicate notifications during proxy restart by adding restarting parameter
- Fix notification spam with transition-based logic for status changes
- Enable system email settings by default in development mode
- Track last saved/applied proxy settings to detect configuration drift
2025-11-14 11:35:22 +01:00
Andras Bacsai
318cd18dde fix: remove PullHelperImageJob and mass server scheduling
Stop dispatching PullHelperImageJob to thousands of servers when the helper image version changes. Instead, rely on Docker's automatic image pulling during actual deployments and backups. Inline the helper image pull in UpdateCoolify for the single use case.

This eliminates queue flooding on cloud instances while maintaining all functionality through Docker's built-in image management.
2025-11-14 11:31:08 +01:00
Andras Bacsai
49a3bb0daf refactor(DatabaseBackupJob): remove retry attempts and backoff logic for job execution 2025-11-11 15:39:01 +01:00
Andras Bacsai
644df223dc fix(ScheduledTaskJob): make server property nullable and update logging to handle null values 2025-11-11 15:38:55 +01:00
Andras Bacsai
334892d1ff feat(BackupNotification): include database name in BackupFailed notification for better context 2025-11-11 15:27:57 +01:00
Andras Bacsai
133d6a0349 feat(DeploymentException): add custom exception for deployment errors and update handler to exclude from reporting 2025-11-11 15:08:26 +01:00
Andras Bacsai
64c7d301ce feat(DatabaseBackupJob, ScheduledTaskJob): enforce minimum timeout and add execution ID for timeout handling 2025-11-11 12:07:39 +01:00
Andras Bacsai
104e68a9ac
Merge branch 'next' into improve-scheduled-tasks 2025-11-11 11:38:04 +01:00
Andras Bacsai
e79316c8b5
Update app/Jobs/DeleteResourceJob.php
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-11 11:35:30 +01:00