Commit graph

195 commits

Author SHA1 Message Date
Andras Bacsai
a0c177f6f2 feat(jobs): add queue delay resilience to scheduled job execution
Implement dedup key-based cron tracking to make scheduled jobs resilient to queue
delays. Even if a job is delayed by minutes, it will catch the missed cron window
by tracking previousRunDate in cache instead of relying on isDue() alone.

- Add dedupKey parameter to shouldRunNow() in ScheduledJobManager
  - When provided, uses getPreviousRunDate() + cache tracking for resilience
  - Falls back to isDue() for docker cleanups without dedup key
  - Prevents double-dispatch within same cron window

- Optimize ServerConnectionCheckJob dispatch
  - Skip SSH checks if Sentinel is healthy (enabled and live)
  - Reduces redundant checks when Sentinel heartbeat proves connectivity

- Remove hourly Sentinel update checks
  - Consolidate to daily CheckAndStartSentinelJob dispatch
  - Crash recovery handled by sentinelOutOfSync → ServerCheckJob flow

- Add logging for skipped database backups with context (backup_id, database_id, status)

- Refactor skip reason methods to accept server parameter, avoiding redundant queries

- Add comprehensive test suite for scheduling with various delay scenarios and timezones
2026-02-28 15:06:25 +01:00
Andras Bacsai
2b7e2ebafb chore: prepare for PR 2026-02-26 16:27:02 +01:00
Andras Bacsai
c93296e9a6
feat(healthcheck): add command-based health check support (#8612) 2026-02-25 12:09:59 +01:00
Andras Bacsai
b88f9fca67 chore: prepare for PR 2026-02-25 12:07:29 +01:00
Andras Bacsai
521d995ea1 Merge remote-tracking branch 'origin/next' into 7765-healthcheck-investigation 2026-02-25 11:57:58 +01:00
Andras Bacsai
57848c25e9
fix(docker): centralize command escaping in executeInDocker helper (#8615) 2026-02-25 11:51:23 +01:00
Andras Bacsai
992b922df3 chore: prepare for PR 2026-02-25 11:50:57 +01:00
Andras Bacsai
0580af0d34 feat(healthchecks): add command health checks with input validation
Add support for command-based health checks in addition to HTTP-based checks:
- New health_check_type field supporting 'http' and 'cmd' values
- New health_check_command field with strict regex validation
- Updated allowedFields in create_application and update_by_uuid endpoints
- Validation rules include max 1000 characters and safe character whitelist
- Added feature tests for health check API endpoints
- Added unit tests for GithubAppPolicy and SharedEnvironmentVariablePolicy
2026-02-25 11:38:09 +01:00
Andras Bacsai
609cb4190e fix(health-checks): sanitize and validate CMD healthcheck commands
- Add regex validation to restrict allowed characters (alphanumeric, spaces, and specific safe symbols)
- Enforce maximum 1000 character limit on healthcheck commands
- Strip newlines and carriage returns to prevent command injection
- Change input field from textarea to text input in UI
- Add warning callout about prohibited shell operators
- Add comprehensive validation tests for both valid and malicious command patterns
2026-02-25 11:28:33 +01:00
Andras Bacsai
30c0b37689 chore: prepare for PR 2026-02-25 10:58:29 +01:00
Andras Bacsai
620da191b1 chore: prepare for PR 2026-02-23 14:15:13 +01:00
Andras Bacsai
76a6960f44 chore: prepare for PR 2026-02-23 13:26:01 +01:00
Andras Bacsai
c0dadc003d fix(env): skip escaping for valid JSON in environment variables (#6160)
Prevent double-escaping of COMPOSER_AUTH and other JSON environment variables
by detecting valid JSON objects/arrays in realValue() and skipping quote
escaping entirely. This fixes broken JSON values passed to runtime services
while maintaining proper escaping for non-JSON values.

- Add JSON detection before escaping logic in EnvironmentVariable::realValue()
- JSON objects/arrays pass through unmodified, avoiding quote corruption
- Add comprehensive test coverage for JSON vs non-JSON escaping behavior
2026-01-28 10:59:00 +01:00
Andras Bacsai
a8aa452475 fix: prevent metric charts from freezing when navigating with wire:navigate
Wraps inline chart initialization scripts in IIFEs to create local scope for variables. This prevents "Identifier has already been declared" errors when Livewire's SPA navigation re-executes scripts, allowing smooth navigation between metrics pages without page refresh.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-02 12:36:17 +01:00
Andras Bacsai
0e9dbc3625 fix(metrics): address code review feedback for LTTB downsampling
- Wrap return values in collect() to maintain Collection compatibility
- Add comment explaining threshold <= 2 prevents division by zero
- Refactor tests to use actual Server model method via reflection
- Use seeded mt_rand() for reproducible test results

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 13:39:43 +01:00
Andras Bacsai
f199b6bfc4 fix(metrics): prevent page freeze with 30-day server metrics interval using LTTB downsampling
Implement the Largest-Triangle-Three-Buckets (LTTB) algorithm to downsample
metrics data for large time intervals (30 days generates 260K-500K+ points).
Reduces rendered points to ~1000 while preserving visual accuracy of peaks
and valleys. Fixes unresponsive page when selecting 30-day metrics interval.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-27 19:10:02 +01:00
Andras Bacsai
582bdc3739 test: Add comprehensive preview deployment port and path tests
Add missing edge case test for root path (/) and expand test coverage
for preview FQDN generation. Tests verify that ports and paths are
correctly preserved in preview URLs while excluding root paths.

Fixes #2184
2025-12-17 21:35:54 +01:00
Andras Bacsai
5e3593e8bf Enhance log sanitization with GitHub, GitLab, AWS, and generic URL passwords
Consolidate all PII/secret sanitization into remove_iip() to protect real-time logs in addition to exported logs. Add detection for GitHub tokens (ghp_, gho_, ghu_, ghs_, ghr_), GitLab tokens (glpat-, glcbt-, glrt-), AWS credentials (AKIA/ABIA/ACCA/ASIA access keys and secret keys), and generic URL passwords for FTP, SSH, AMQP, LDAP, and S3 protocols.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-17 17:59:10 +01:00
Andras Bacsai
96f2e81191
feat: copy resource logs with PII/secret sanitization (#7648) 2025-12-17 16:05:13 +01:00
Andras Bacsai
d4a403278d
Added ClickHouse Migration Support (#7392) 2025-12-17 11:37:53 +01:00
Duane Adam
b0b3098abe
Merge branch 'next' into feat/copy-resource-logs-with-sanitization 2025-12-16 12:03:53 +08:00
Duane Adam
327e8181af
Add copy logs button with PII/secret sanitization
Add a copy button to individual container logs that strips sensitive
data before copying to clipboard. Includes sanitization for emails,
database URLs with passwords, JWT tokens, API keys, private key blocks,
and git access tokens.
2025-12-16 11:49:40 +08:00
Andras Bacsai
67b1db9254 feat: add Hetzner Cloud server linking for manually-added servers
Allow manually-added servers to be linked to Hetzner Cloud instances by
matching IP address. Once linked, servers gain power controls and status
monitoring.

Changes:
- Add getServers() and findServerByIp() methods to HetznerService
- Add Hetzner linking UI section to Server General page
- Add unit tests for new HetznerService methods

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 22:14:41 +01:00
Andras Bacsai
98b99cbb09
Fix read-only volume detection and add refresh capability (#7588) 2025-12-11 21:41:36 +01:00
Andras Bacsai
f152ec00ad fix: Detect read-only Docker volumes with long-form syntax and enable refresh
- Fixed isReadOnlyVolume() to detect both short-form (:ro) and long-form (read_only: true) Docker Compose volume syntax
- Fixed path matching to use mount_path only (fs_path is transformed during parsing from ./file to absolute path)
- Added "Load from server" button for read-only volumes to allow users to refresh content
- Changed loadStorageOnServer() authorization from 'update' to 'view' since loading is a read operation
- Added helper text to Content field warning users that content may be outdated
- Applied fixes to both LocalFileVolume and LocalPersistentVolume models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-11 14:18:58 +01:00
Andras Bacsai
6ea563c6ac Fix: Prevent coolify-helper and coolify-realtime images from being pruned
Current version of infrastructure images (coolify-helper, coolify-realtime) are now protected from deletion during docker cleanup, regardless of which registry they're pulled from (ghcr.io, docker.io, or Docker Hub implicit). Old versions continue to be cleaned up as intended.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-11 13:38:52 +01:00
Andras Bacsai
6e15d8e5f8
Add ValidProxyConfigFilename rule for dynamic proxy config validation (#7544) 2025-12-09 16:32:41 +01:00
Andras Bacsai
5ec3f39b9b
Add autogenerate_domain API parameter for applications (#7515) 2025-12-09 16:19:49 +01:00
Andras Bacsai
028fb5c22c Add ValidProxyConfigFilename rule for dynamic proxy config validation
Adds a new Laravel validation rule to prevent path traversal, hidden files, and invalid filenames in the dynamic proxy configuration feature. Validates filenames to ensure they contain only safe characters, don't exceed filesystem limits, and don't use reserved names.

- New Rule: ValidProxyConfigFilename with comprehensive validation
- Updated: NewDynamicConfiguration to use the new rule
- Added: 13 unit tests covering all validation scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-09 16:12:45 +01:00
Andras Bacsai
7c1f230bd3 fix: remove {{port}} template variable and ensure ports are always appended to preview URLs
The {{port}} template variable was undocumented and caused a double port bug
when used in preview URL templates. Since ports are always appended to the final
URL anyway, we remove {{port}} substitution entirely and ensure consistent port
handling across ApplicationPreview, PreviewsCompose, and the applicationParser helper.

Also fix PreviewsCompose.php which wasn't preserving ports at all, and improve
the Blade template formatting in previews-compose.blade.php.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-07 21:53:47 +01:00
Andras Bacsai
eb743cf690 Add autogenerate_domain API parameter for applications
Allows API consumers to control domain auto-generation behavior. When autogenerate_domain is true (default) and no custom domains are provided, the system auto-generates a domain using the server's wildcard domain or sslip.io fallback.

- Add autogenerate_domain parameter to all 5 application creation endpoints
- Add validation and allowlist rules
- Implement domain auto-generation logic across all application types
- Add comprehensive unit tests for the feature

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 21:16:04 +01:00
Andras Bacsai
21429a26b1
Add per-application Docker image retention for rollback (#7504) 2025-12-05 13:00:18 +01:00
Andras Bacsai
439afca642 Inject commit-based image tags for Docker Compose build services
For Docker Compose applications with build directives, inject commit-based
image tags (uuid_servicename:commit) to enable rollback functionality.
Previously these services always used 'latest' tags, making rollback impossible.

- Only injects tags for services with build: but no explicit image:
- Uses pr-{id} tags for pull request deployments
- Respects user-defined image: fields (preserves user intent)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 11:41:47 +01:00
Andras Bacsai
710dc3ca4b Add Docker Compose support for image retention during cleanup
Support for Docker Compose applications with build: directives that create
images with uuid_servicename naming pattern (e.g., app-uuid_web:commit).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 11:17:23 +01:00
Andras Bacsai
4ed7a4238a Add per-application Docker image retention for rollback capability
Implement a per-application setting (`docker_images_to_keep`) in `application_settings` table to control how many Docker images are preserved during cleanup. The cleanup process now:

- Respects per-application retention settings (default: 2 images)
- Preserves the N most recent images per application for easy rollback
- Always deletes PR images and keeps the currently running image
- Dynamically excludes application images from general Docker image prune
- Cleans up non-Coolify unused images to prevent disk bloat

Fixes issues where cleanup would delete all images needed for rollback.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 11:02:07 +01:00
Andras Bacsai
ed979f42ef Fix SSH multiplexing contention for concurrent scheduled tasks (#6736)
When multiple scheduled tasks or database backups run concurrently on
the same server, they compete for the same SSH multiplexed connection
socket, causing race conditions and SSH exit code 255 errors.

This fix adds a `disableMultiplexing` parameter to bypass SSH
multiplexing for jobs that may run concurrently:

- Add `disableMultiplexing` param to `generateSshCommand()`
- Add `disableMultiplexing` param to `instant_remote_process()`
- Update `ScheduledTaskJob` to use `disableMultiplexing: true`
- Update `DatabaseBackupJob` to use `disableMultiplexing: true`
- Add debug logging to track execution without multiplexing
- Add unit tests for the new parameter

Each backup and scheduled task now gets an isolated SSH connection,
preventing contention on the shared multiplexed socket.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 09:54:30 +01:00
Andras Bacsai
558a885fdc
Fix Nixpacks null environment variable parsing error (#7493) 2025-12-04 16:29:56 +01:00
Andras Bacsai
42f08a99fb Fix Nixpacks null environment variable parsing error
Filter out null and empty environment variables when generating Nixpacks build
configuration to prevent JSON parsing errors. Environment variables with null or
empty values were being passed as `--env KEY=` which created invalid JSON with
null values, causing deployment failures.

This fix ensures only valid non-empty environment variables are included in both
user-defined and auto-generated COOLIFY_* environment variables.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 15:10:39 +01:00
Andras Bacsai
70ff73e954 Merge branch 'next' into macau-v1
Resolved conflicts in ServerManagerJob.php by:
- Keeping sentinel update check code from macau-v1
- Preserving sentinel restart code from next branch
- Ensuring no duplicate code blocks
2025-12-04 15:07:36 +01:00
Andras Bacsai
9e0fa03434
Run proxy restart as background job with real-time logs (#7475) 2025-12-04 14:59:50 +01:00
Andras Bacsai
4002044877 Refactor: Move sentinel update checks to ServerManagerJob and add tests for hourly dispatch 2025-12-04 14:58:18 +01:00
Andras Bacsai
21f3ef6f9f
Fix PostgREST misclassification and empty Domains section (#7442) 2025-12-04 14:53:36 +01:00
Andras Bacsai
bf8dcac88c Move inline styles to global CSS file
Moved .log-highlight styles from Livewire component views to resources/css/app.css for better separation of concerns and reusability. This follows Laravel and Livewire best practices by keeping styles in the appropriate location rather than inline in component views.

Changes:
- Added .log-highlight styles to resources/css/app.css
- Removed inline <style> tags from deployment/show.blade.php
- Removed inline <style> tags from get-logs.blade.php
- Added XSS security test for log viewer
- Applied code formatting with Laravel Pint

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 13:15:01 +01:00
Andras Bacsai
36da7174d5 Combine stop+start into single activity for real-time logs
Instead of calling StopProxy::run() (synchronous) then StartProxy::run()
(async), now we build a single command sequence that includes both stop
and start phases. This creates one Activity immediately via remote_process(),
so the UI receives the activity ID right away and can show logs in real-time
from the very beginning of the restart operation.

Key changes:
- Removed dependency on StopProxy and StartProxy actions
- Build combined command sequence inline in buildRestartCommands()
- Use remote_process() directly which returns Activity immediately
- Increased timeout from 60s to 120s to accommodate full restart
- Activity ID dispatched to UI within milliseconds of job starting

Flow is now:
1. Job starts → sets "restarting" status
2. Commands built synchronously (fast, no SSH)
3. remote_process() creates Activity and dispatches CoolifyTask job
4. Activity ID sent to UI immediately via WebSocket
5. UI opens activity monitor with real-time streaming logs
6. Logs show "Stopping proxy..." then "Starting proxy..." as they happen

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 16:21:26 +01:00
Andras Bacsai
340e42aefd Dispatch restarting status immediately when job starts
Set proxy status to 'restarting' and dispatch ProxyStatusChangedUI event
at the very beginning of handle() method, before StopProxy runs. This
notifies the UI immediately so users know a restart is in progress,
rather than waiting until after the stop operation completes.

Also simplified unit tests to focus on testable job configuration
(middleware, tries, timeout) without complex SchemalessAttributes mocking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 16:18:13 +01:00
Andras Bacsai
b00d8902f4 Fix duplicate proxy restart notifications
- Remove redundant ProxyStatusChangedUI dispatch from RestartProxyJob
  (ProxyStatusChanged event already triggers the listener that dispatches it)
- Remove redundant Traefik version check from RestartProxyJob
  (already handled by ProxyStatusChangedNotification listener)
- Add lastNotifiedStatus tracking to prevent duplicate toasts
- Remove notifications for unknown/default statuses (too noisy)
- Simplify RestartProxyJob to only handle stop/start logic

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 16:09:47 +01:00
Andras Bacsai
e4810a28d2 Make proxy restart run as background job to prevent localhost lockout
When restarting the proxy on localhost (where Coolify is running), the UI becomes inaccessible because the connection is lost. This change makes all proxy restarts run as background jobs with WebSocket notifications, allowing the operation to complete even after connection loss.

Changes:
- Enhanced ProxyStatusChangedUI event to carry activityId for log monitoring
- Updated RestartProxyJob to dispatch status events and track activity
- Simplified Navbar restart() to always dispatch job for all servers
- Enhanced showNotification() to handle activity monitoring and new statuses
- Added comprehensive unit and feature tests

Benefits:
- Prevents localhost lockout during proxy restarts
- Consistent behavior across all server types
- Non-blocking UI with real-time progress updates
- Automatic activity log monitoring
- Proper error handling and recovery

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 10:30:12 +01:00
Andras Bacsai
f75bc85bc1
Merge branch 'next' into decouple-storage-from-sentinel 2025-12-03 09:19:09 +01:00
Andras Bacsai
c65ad2e655 Fix complex status logic: handle degraded sub-resources and mixed running+starting states
- Add support for degraded status from sub-resources as highest priority
- Handle mixed running+starting state to show service as not fully ready
- Update state priority hierarchy from 8 to 10 levels
- Add comprehensive test coverage for new status scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 21:47:15 +01:00
Andras Bacsai
8ff83cc3d6 Fix: Pass $serverTimezone to shouldRunNow() in ServerCheckJob dispatch
Pass the server timezone parameter to shouldRunNow() call at line 127,
ensuring ServerCheckJob dispatch respects the server's local timezone
instead of falling back to the instance default.

This aligns the behavior with other scheduled tasks in the same method:
- ServerStorageCheckJob (line 137)
- ServerPatchCheckJob (line 144)
- Sentinel restart (line 152)

All scheduled tasks in processServerTasks() now consistently use the
server's configured timezone for cron evaluation.

Added unit test to verify timezone-aware cron schedule evaluation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 16:58:43 +01:00