Commit graph

1317 commits

Author SHA1 Message Date
Andras Bacsai
37c3cd9f4e fix: auto-inject environment variables into custom Docker Compose commands 2025-11-18 14:54:21 +01:00
Andras Bacsai
1094ab7a46 fix: inject environment variables into custom Docker Compose build commands
When using a custom Docker Compose build command, environment variables
were being lost because the --env-file flag was not included. This fix
automatically injects the --env-file flag to ensure build-time environment
variables are available during custom builds.

Changes:
- Auto-inject --env-file /artifacts/build-time.env after docker compose
- Respect user-provided --env-file flags (no duplication)
- Append build arguments when not using build secrets
- Update UI helper text to inform users about automatic env injection
- Add comprehensive unit tests (7 test cases, all passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:54:21 +01:00
Andras Bacsai
50d55a9509 refactor: send immediate Traefik version notifications instead of delayed aggregation
Move notification logic from NotifyOutdatedTraefikServersJob into CheckTraefikVersionForServerJob to send immediate notifications when outdated Traefik is detected. This is more suitable for cloud environments with thousands of servers.

Changes:
- CheckTraefikVersionForServerJob now sends notifications immediately after detecting outdated Traefik
- Remove NotifyOutdatedTraefikServersJob (no longer needed)
- Remove delay calculation logic from CheckTraefikVersionJob
- Update tests to reflect new immediate notification pattern

Trade-offs:
- Pro: Faster notifications (immediate alerts)
- Pro: Simpler codebase (removed complex delay calculation)
- Pro: Better scalability for thousands of servers
- Con: Teams may receive multiple notifications if they have many outdated servers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:54:21 +01:00
Andras Bacsai
f75bb61d21 refactor(CheckTraefikVersionForServerJob): remove unnecessary onQueue assignment in constructor 2025-11-18 14:54:21 +01:00
Andras Bacsai
d2d56ac6b4 refactor(proxy): simplify getNewerBranchInfo method parameters and streamline version checks 2025-11-18 14:53:49 +01:00
Andras Bacsai
7dfe33d1c9 refactor(proxy): implement centralized caching for versions.json and improve UX
This commit introduces several improvements to the Traefik version tracking
feature and proxy configuration UI:

## Caching Improvements

1. **New centralized helper functions** (bootstrap/helpers/versions.php):
   - `get_versions_data()`: Redis-cached access to versions.json (1 hour TTL)
   - `get_traefik_versions()`: Extract Traefik versions from cached data
   - `invalidate_versions_cache()`: Clear cache when file is updated

2. **Performance optimization**:
   - Single Redis cache key: `coolify:versions:all`
   - Eliminates 2-4 file reads per page load
   - 95-97.5% reduction in disk I/O time
   - Shared cache across all servers in distributed setup

3. **Updated all consumers to use cached helpers**:
   - CheckTraefikVersionJob: Use get_traefik_versions()
   - Server/Proxy: Two-level caching (Redis + in-memory per-request)
   - CheckForUpdatesJob: Auto-invalidate cache after updating file
   - bootstrap/helpers/shared.php: Use cached data for Coolify version

## UI/UX Improvements

1. **Navbar warning indicator**:
   - Added yellow warning triangle icon next to "Proxy" menu item
   - Appears when server has outdated Traefik version
   - Uses existing traefik_outdated_info data for instant checks
   - Provides at-a-glance visibility of version issues

2. **Proxy sidebar persistence**:
   - Fixed sidebar disappearing when clicking "Switch Proxy"
   - Configuration link now always visible (needed for proxy selection)
   - Dynamic Configurations and Logs only show when proxy is configured
   - Better navigation context during proxy switching workflow

## Code Quality

- Added comprehensive PHPDoc for Server::$traefik_outdated_info property
- Improved code organization with centralized helper approach
- All changes formatted with Laravel Pint
- Maintains backward compatibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:53:49 +01:00
Andras Bacsai
6dbe58f22b feat(proxy): enhance Traefik version notifications to show patch and minor upgrades
- Store both patch update and newer minor version information simultaneously
- Display patch update availability alongside minor version upgrades in notifications
- Add newer_branch_target and newer_branch_latest fields to traefik_outdated_info
- Update all notification channels (Discord, Telegram, Slack, Pushover, Email, Webhook)
- Show minor version in format (e.g., v3.6) for upgrade targets instead of patch version
- Enhance UI callouts with clearer messaging about available upgrades
- Remove verbose logging in favor of cleaner code structure
- Handle edge case where SSH command returns empty response

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:53:49 +01:00
Andras Bacsai
c77eaddede refactor(proxy): implement parallel processing for Traefik version checks
Addresses critical performance issues identified in code review by refactoring the monolithic CheckTraefikVersionJob into a distributed architecture with parallel processing.

Changes:
- Split version checking into CheckTraefikVersionForServerJob for parallel execution
- Extract notification logic into NotifyOutdatedTraefikServersJob
- Dispatch individual server checks concurrently to handle thousands of servers
- Add comprehensive unit tests for the new job architecture
- Update feature tests to cover the refactored workflow

Performance improvements:
- Sequential SSH calls replaced with parallel queue jobs
- Scales efficiently for large installations with thousands of servers
- Reduces job execution time from hours to minutes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:53:49 +01:00
Andras Bacsai
1dacb94860 fix(performance): eliminate N+1 query in CheckTraefikVersionJob
This commit fixes a critical N+1 query issue in CheckTraefikVersionJob
that was loading ALL proxy servers into memory then filtering in PHP,
causing potential OOM errors with thousands of servers.

Changes:
- Added scopeWhereProxyType() query scope to Server model for
  database-level filtering using JSON column arrow notation
- Updated CheckTraefikVersionJob to use new scope instead of
  collection filter, moving proxy type filtering into the SQL query
- Added comprehensive unit tests for the new query scope

Performance impact:
- Before: SELECT * FROM servers WHERE proxy IS NOT NULL (all servers)
- After: SELECT * FROM servers WHERE proxy->>'type' = 'TRAEFIK' (filtered)
- Eliminates memory overhead of loading non-Traefik servers
- Critical for cloud instances with thousands of connected servers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 14:53:49 +01:00
Andras Bacsai
0bd4ffb2d7 feat(proxy): add Traefik version tracking with notifications and dismissible UI warnings
- Add automated Traefik version checking job running weekly on Sundays
- Implement version detection from running containers and comparison with versions.json
- Add notifications across all channels (Email, Discord, Slack, Telegram, Pushover, Webhook) for outdated versions
- Create dismissible callout component with localStorage persistence
- Display cross-branch upgrade warnings (e.g., v3.5 -> v3.6) with changelog links
- Show patch update notifications within same branch
- Add warning icon that appears when callouts are dismissed
- Prevent duplicate notifications during proxy restart by adding restarting parameter
- Fix notification spam with transition-based logic for status changes
- Enable system email settings by default in development mode
- Track last saved/applied proxy settings to detect configuration drift
2025-11-18 14:53:49 +01:00
Andras Bacsai
680b9a2c10
Merge branch 'next' into s3-restore 2025-11-17 15:39:22 +01:00
Andras Bacsai
b602fef4db fix(deployment): improve error logging with exception types and hidden technical details
- Add exception class names to error messages for better debugging
- Mark technical details (error type, code, location, stack trace) as hidden in logs
- Preserve original exception types when wrapping in DeploymentException
- Update ServerManagerJob to include exception class in log messages
- Enhance unit tests to verify hidden log entry behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:44:39 +01:00
Andras Bacsai
648e111f10 Merge remote-tracking branch 'origin/next' into s3-restore
# Conflicts:
#	app/Models/InstanceSettings.php
2025-11-17 14:30:00 +01:00
Andras Bacsai
fbdd8e5f03 fix: improve robustness and security in database restore flows
- Add null checks for server instances in restore events to prevent errors
- Escape S3 credentials to prevent command injection vulnerabilities
- Fix file upload clearing custom location to prevent UI confusion
- Optimize isSafeTmpPath helper by avoiding redundant dirname calls
- Remove unnecessary --rm flag from long-running S3 restore container
- Prioritize uploaded files over custom location in import logic
- Add comprehensive unit tests for restore event null server handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:13:10 +01:00
Andras Bacsai
97550f4066 fix(deployment): eliminate duplicate error logging in deployment methods
Wraps rolling_update(), health_check(), stop_running_container(), and
start_by_compose_file() with try-catch to ensure comprehensive error logging
happens in one place. Removes duplicate logging from intermediate catch blocks
since the failed() method already provides full error details including stack trace
and chained exception information.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 10:52:09 +01:00
Andras Bacsai
94560ea6c7 feat: streamline S3 restore with single-step flow and improved UI consistency
Major architectural improvements:
- Merged download and restore into single atomic operation
- Eliminated separate S3DownloadFinished event (redundant)
- Files now transfer directly: S3 → helper container → server → database container
- Removed download progress tracking in favor of unified restore progress

UI/UX improvements:
- Unified restore method selection with visual cards
- Consistent "File Information" display between local and S3 restore
- Single slide-over for all restore operations (removed separate S3 download monitor)
- Better visual feedback with loading states

Security enhancements:
- Added isSafeTmpPath() helper for path traversal protection
- URL decode validation to catch encoded attacks
- Canonical path resolution to prevent symlink attacks
- Comprehensive path validation in all cleanup events

Cleanup improvements:
- S3RestoreJobFinished now handles all cleanup (helper container + all temp files)
- RestoreJobFinished uses new isSafeTmpPath() validation
- CoolifyTask dispatches cleanup events even on job failure
- All cleanup uses non-throwing commands (2>/dev/null || true)

Other improvements:
- S3 storage policy authorization on Show component
- Storage Form properly syncs is_usable state after test
- Removed debug code and improved error handling
- Better command organization and documentation

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 10:05:18 +01:00
Andras Bacsai
318cd18dde fix: remove PullHelperImageJob and mass server scheduling
Stop dispatching PullHelperImageJob to thousands of servers when the helper image version changes. Instead, rely on Docker's automatic image pulling during actual deployments and backups. Inline the helper image pull in UpdateCoolify for the single use case.

This eliminates queue flooding on cloud instances while maintaining all functionality through Docker's built-in image management.
2025-11-14 11:31:08 +01:00
Andras Bacsai
49a3bb0daf refactor(DatabaseBackupJob): remove retry attempts and backoff logic for job execution 2025-11-11 15:39:01 +01:00
Andras Bacsai
644df223dc fix(ScheduledTaskJob): make server property nullable and update logging to handle null values 2025-11-11 15:38:55 +01:00
Andras Bacsai
334892d1ff feat(BackupNotification): include database name in BackupFailed notification for better context 2025-11-11 15:27:57 +01:00
Andras Bacsai
133d6a0349 feat(DeploymentException): add custom exception for deployment errors and update handler to exclude from reporting 2025-11-11 15:08:26 +01:00
Andras Bacsai
64c7d301ce feat(DatabaseBackupJob, ScheduledTaskJob): enforce minimum timeout and add execution ID for timeout handling 2025-11-11 12:07:39 +01:00
Andras Bacsai
104e68a9ac
Merge branch 'next' into improve-scheduled-tasks 2025-11-11 11:38:04 +01:00
Andras Bacsai
e79316c8b5
Update app/Jobs/DeleteResourceJob.php
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-11 11:35:30 +01:00
Andras Bacsai
f9ab2a7ca8
Merge branch 'next' into improve-scheduled-tasks 2025-11-11 11:32:15 +01:00
Andras Bacsai
0039be49b2 fix(DeleteResourceJob): escape deployment UUID and stack name in Docker commands 2025-11-11 11:30:17 +01:00
Andras Bacsai
18a14037c7 fix: improve logging for PORT environment variable mismatch and ensure .env file is created in the correct directory 2025-11-10 14:56:27 +01:00
Andras Bacsai
0b8d3d395e fix: remove redundant process termination logic from deployment methods 2025-11-10 14:46:02 +01:00
Andras Bacsai
71c89d9ba8
Merge branch 'next' into improve-scheduled-tasks 2025-11-10 14:21:03 +01:00
Andras Bacsai
194d023f70
Enhance port detection and improve user notifications (#7184) 2025-11-10 13:56:09 +01:00
Andras Bacsai
99e97900a5 feat: add automated PORT environment variable detection and UI warnings
Add detection system for PORT environment variable to help users configure applications correctly:

- Add detectPortFromEnvironment() method to Application model to detect PORT env var
- Add getDetectedPortInfoProperty() computed property in General Livewire component
- Display contextual info banners in UI when PORT is detected:
  - Warning when PORT exists but ports_exposes is empty
  - Warning when PORT doesn't match ports_exposes configuration
  - Info message when PORT matches ports_exposes
- Add deployment logging to warn about PORT/ports_exposes mismatches
- Include comprehensive unit tests for port detection logic

The ports_exposes field remains authoritative for proxy configuration, while
PORT detection provides helpful suggestions to users.
2025-11-10 13:43:27 +01:00
Andras Bacsai
1580c0d3ad
Update app/Jobs/ScheduledTaskJob.php
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-10 11:41:50 +01:00
Andras Bacsai
0ea27ce37a
Cancel active deployments when a pull request is closed (#7164) 2025-11-10 11:16:54 +01:00
Andras Bacsai
b22e79caec feat(jobs): improve scheduled tasks with retry logic and queue cleanup
- Add retry configuration to CoolifyTask (3 tries, 600s timeout)
- Add retry configuration to ScheduledTaskJob (3 tries, configurable timeout)
- Add retry configuration to DatabaseBackupJob (2 tries)
- Implement exponential backoff for all jobs (30s, 60s, 120s intervals)
- Add failed() handlers with comprehensive error logging to scheduled-errors channel
- Add execution tracking: started_at, retry_count, duration (decimal), error_details
- Add configurable timeout field to scheduled tasks (60-3600s, default 300s)
- Update UI to include timeout configuration in task creation/editing forms
- Increase ScheduledJobManager lock expiration from 60s to 90s for high-load environments
- Implement safe queue cleanup with restart vs runtime modes
  - Restart mode: aggressive cleanup (marks all processing jobs as failed)
  - Runtime mode: conservative cleanup (only marks jobs >12h as failed, skips deployments)
- Add cleanup:redis --restart flag for system startup
- Integrate cleanup into Dev.php init() for development environment
- Increase scheduled-errors log retention from 7 to 14 days
- Create comprehensive test suite (unit and feature tests)
- Add TESTING_GUIDE.md with manual testing instructions

Fixes issues with jobs failing after single attempt and "attempted too many times" errors
2025-11-10 11:11:18 +01:00
Andras Bacsai
67605d50fc fix(deployment): prevent base deployments from being killed when PRs close (#7113)
- Fix container filtering to properly distinguish base deployments (pullRequestId=0) from PR deployments
- Add deployment cancellation when PR closes via webhook to prevent race conditions
- Prevent CleanupHelperContainersJob from killing active deployment containers
- Enhance error messages with exit codes and actual errors instead of vague "Oops" messages
- Protect status transitions in finally blocks to ensure proper job failure handling
2025-11-09 14:41:35 +01:00
Andras Bacsai
712d60c75b feat: ensure .env file exists for docker compose and auto-inject in payloads 2025-11-07 15:20:10 +01:00
Andras Bacsai
d0ee7d0412
Merge branch 'next' into feat-add-dockerfile-from-instruction-par 2025-11-06 09:24:54 +01:00
Andras Bacsai
88aa24057b fix: update environment variable mapping in deployment job 2025-11-06 09:21:41 +01:00
Andras Bacsai
dbf7957795 fix: inserting ARG statements in Dockerfile after FROM instructions 2025-11-06 08:54:35 +01:00
Andras Bacsai
f315e4bd9c feat: add dev_helper_version to instance settings and update related functionality 2025-11-03 08:38:43 +01:00
Andras Bacsai
1f158b9b35 fix: Improve custom_network_aliases handling and testing
The `is_array` check for `custom_network_aliases_array` was too strict and could lead to issues when the value was an empty string or null. This commit changes the check to `!empty()` for more robust handling.

Additionally, the unit tests for `custom_network_aliases` have been refactored to directly use the `Application::isConfigurationChanged()` method. This provides a more accurate and integrated test of the configuration change detection logic, rather than relying on a manual hash calculatio
2025-11-01 13:24:05 +01:00
Andras Bacsai
9a664865ee refactor: Improve handling of custom network aliases
The custom_network_aliases attribute in the Application model was being cast to an array directly. This commit refactors the attribute to provide both a string representation (for compatibility with older configurations and hashing) and an array representation for internal use. This ensures that network aliases are correctly parsed and utilized, preventing potential issues during deployment and configuration updates.
2025-11-01 13:13:14 +01:00
Andras Bacsai
aeba914bda refactor: remove deprecated next() method
The backward-compatible next() method is no longer needed since all
call sites have been updated to use the clearer method names:
- completeDeployment()
- failDeployment()
- transitionToStatus()

This completes the refactoring to make status transitions more explicit
and maintainable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 09:20:30 +01:00
Andras Bacsai
42f916dce2 fix: ensure deployment failure notifications are sent reliably
**Problem:**
Deployment failure notifications were not being sent due to two bugs:

1. **Timing Issue in next() function:**
   - When failed() called next(FAILED), the database still had status "in_progress"
   - The notification check looked for ALREADY failed status (not found yet)
   - Status was updated AFTER the check, losing the notification

2. **Direct Status Update:**
   - Healthcheck failures directly updated status to FAILED
   - Bypassed next() entirely, no notification sent

**Solution:**
Refactored status transition logic with clear separation of concerns:

- Moved notification logic AFTER status update (not before)
- Created transitionToStatus() as single source of truth
- Added completeDeployment() and failDeployment() for clarity
- Extracted status-specific side effects into dedicated methods
- Updated healthcheck failure to use failDeployment()

**Benefits:**
-  Notifications sent for ALL failure scenarios
-  Clear, self-documenting method names
-  Single responsibility per method
-  Type-safe using enum instead of strings
-  Harder to bypass notification logic accidentally
-  Backward compatible (old next() preserved)

**Changed:**
- app/Jobs/ApplicationDeploymentJob.php (+101/-21 lines)

Fixes #6911

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 09:06:59 +01:00
Andras Bacsai
c6a2d1fe0a Fix stale lock issue causing scheduled tasks to stop (#4539)
## Problem
Scheduled tasks, backups, and auto-updates stopped working after 1-2 months
with error: MaxAttemptsExceededException: App\Jobs\ScheduledJobManager has
been attempted too many times.

Root cause: ScheduledJobManager used WithoutOverlapping with only
releaseAfter(60), causing locks without expiration (TTL=-1) that persisted
indefinitely when jobs hung or processes crashed.

## Solution

### Part 1: Prevention (Future Locks)
- Added expireAfter(60) to ScheduledJobManager middleware
- Lock now auto-expires after 60 seconds (matches everyMinute schedule)
- Changed from releaseAfter(60) to expireAfter(60)->dontRelease()
- Follows Laravel best practices and matches other Coolify jobs

### Part 2: Recovery (Existing Locks)
- Enhanced cleanup:redis command with --clear-locks flag
- Scans Redis for stale locks (TTL=-1) and removes them
- Called automatically during app:init on startup/upgrade
- Provides immediate recovery for affected instances

## Changes
- app/Jobs/ScheduledJobManager.php: Added expireAfter(60)->dontRelease()
- app/Console/Commands/CleanupRedis.php: Added cleanupCacheLocks() method
- app/Console/Commands/Init.php: Auto-clear locks on startup
- tests/Unit/ScheduledJobManagerLockTest.php: Test to prevent regression
- STALE_LOCK_FIX.md: Complete documentation

## Testing
- Unit tests pass (2 tests, 8 assertions)
- Code formatted with Pint
- Matches pattern used by CleanupInstanceStuffsJob

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 10:07:33 +02:00
Andras Bacsai
587517394b Changes auto-committed by Conductor 2025-10-22 13:03:17 +02:00
Andras Bacsai
d8c89a1abf Changes auto-committed by Conductor 2025-10-21 20:39:39 +02:00
Andras Bacsai
aada45d856
Merge pull request #6876 from thereis/feat/update-applicationpullrequestupdatejob-documentation
feat: include service name in preview deployment updates
2025-10-16 10:10:03 +02:00
Andras Bacsai
41afa9568d fix: handle null environment variable values in bash escaping
Previously, the bash escaping functions (`escapeBashEnvValue()` and `escapeBashDoubleQuoted()`) had strict string type hints that rejected null values, causing deployment failures when environment variables had null values.

Changes:
- Updated both functions to accept nullable strings (`?string $value`)
- Handle null/empty values by returning empty quoted strings (`''` for single quotes, `""` for double quotes)
- Added 3 new tests to cover null and empty value handling
- All 29 tests pass

This fix ensures deployments work correctly even when environment variables have null values, while maintaining the existing behavior for all other cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-15 13:35:58 +02:00
Lucas Reis
23250d53c4
Merge branch 'next' into feat/update-applicationpullrequestupdatejob-documentation 2025-10-15 00:59:02 +02:00