## Added Documentation
Created detailed documentation in `.ai/core/application-architecture.md`
explaining the container status monitoring system to prevent future bugs.
## Key Sections
### 1. Container Status Monitoring System Overview
- Explains that status is updated through multiple independent paths
- Emphasizes that ALL paths must be updated when changing status logic
### 2. Critical Implementation Locations
Documents all four status calculation locations:
- **SSH-Based Updates**: `GetContainersStatus.php` (scheduled, every ~1min)
- **Sentinel-Based Updates**: `PushServerUpdateJob.php` (real-time, every ~30sec)
- **Multi-Server Aggregation**: `ComplexStatusCheck.php` (on-demand)
- **Service-Level Aggregation**: `Service.php` (service status)
### 3. Status Flow Diagram
Visual representation of how status flows from different sources to UI
### 4. Status Priority System
Documents the required priority: unhealthy > unknown > healthy
### 5. Excluded Containers
Explains `:excluded` suffix handling and behavior
### 6. Developer Guidelines
- Checklist of all locations to update
- Testing requirements
- Edge cases to handle
### 7. Related Tests
Links to all relevant test files
### 8. Common Bugs to Avoid
Real examples from bugs we've fixed, with solutions
## Why This Documentation Matters
The recent bug (unknown → healthy) happened because:
1. `GetContainersStatus.php` was updated to handle "unknown" status
2. `PushServerUpdateJob.php` was NOT updated
3. This caused periodic status flipping
This documentation ensures future developers (and AI assistants like Claude)
will know to update ALL four locations when modifying status logic.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.
## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".
When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"
## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:
1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy
This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic
## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation
## Testing
All 31 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove the github_runner_sources migration that was accidentally
included in the health check status fix branch. This migration
is unrelated to the health check functionality and should be
developed separately.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds:
- Comprehensive docker-compose examples for health check testing
- GitHub runner sources database migration
- UI fix for service heading view
Files Added:
- DOCKER_COMPOSE_EXAMPLES.md - Documentation of health check test cases
- docker-compose.*.yml - Test files for various health check scenarios:
- excluded.yml: Container with exclude_from_hc flag
- healthy.yml: All containers healthy
- unhealthy.yml: All containers unhealthy
- unknown.yml: Container without healthcheck
- mixed-healthy-unknown.yml: Mix of healthy and unknown
- mixed-unhealthy-unknown.yml: Mix of unhealthy and unknown
- database/migrations/2025_11_19_115504_create_github_runner_sources_table.php
Files Modified:
- resources/views/livewire/project/service/heading.blade.php
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes container health status aggregation to correctly handle
unknown health states and edge case container states across all resource types.
Changes:
1. **Preserve Unknown Health State**
- Add three-way priority: unhealthy > unknown > healthy
- Detect containers without healthchecks (null health) as unknown
- Apply across GetContainersStatus, ComplexStatusCheck, and Service models
2. **Handle Edge Case Container States**
- Add support for: created, starting, paused, dead, removing
- Map to appropriate statuses: starting (unknown), paused (unknown), degraded (unhealthy)
- Prevent containers in transitional states from showing incorrect status
3. **Add :excluded Suffix for Excluded Containers**
- Parse exclude_from_hc flag from docker-compose YAML
- Append :excluded suffix to individual container statuses
- Skip :excluded containers in non-excluded aggregation sections
- Strip :excluded suffix in excluded aggregation sections
- Makes it clear in UI which containers are excluded from monitoring
Files Modified:
- app/Actions/Docker/GetContainersStatus.php
- app/Actions/Shared/ComplexStatusCheck.php
- app/Models/Service.php
- tests/Unit/ContainerHealthStatusTest.php
Tests: 18 passed (82 assertions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add validation for CONDUCTOR_ROOT_PATH environment variable
- Enhance safety checks with explicit blacklist of system directories
- Improve directory detection (symlink vs regular directory)
- Replace Python dependency with cross-platform bash+perl for path calculation
- Use absolute paths consistently to prevent symlink following
- Add detailed comments explaining each safety check
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added multiple safety validations before executing rm -rf commands:
- Check WORKTREE_PATH is not empty, /, /Users, or $HOME
- Verify we're actually in a git repository (.git exists)
This prevents accidental deletion of critical directories if the script
is run in the wrong location or with unexpected environment variables.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changed rm -rf commands to use absolute paths ($WORKTREE_PATH) instead
of relative paths to prevent accidental deletion if symlinks behave
unexpectedly.
Also cleaned up duplicate WORKTREE_PATH definition.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Simplified the worktree setup to use the main repository's node_modules
and vendor directories directly instead of creating a separate .shared-deps
directory.
Changes:
- Updated conductor-setup.sh to symlink directly to main repo's directories
- Updated CLAUDE.md to reflect the simpler approach
- Symlinks now point to ../../node_modules and ../../vendor
Benefits:
- Simpler setup with no extra directories
- All worktrees share the main repo's dependencies
- No need to add .shared-deps to .gitignore
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Setup Conductor to automatically share node_modules and vendor directories
across all git worktrees to save disk space and speed up development.
Changes:
- Updated conductor-setup.sh to create symlinks to shared dependencies
- Added documentation to CLAUDE.md explaining the system
- Dependencies now stored in .shared-deps/ in main repository
- All worktrees use the same dependency versions automatically
Benefits:
- Saves hundreds of MBs to GBs of disk space
- No need to run npm install/composer install for each worktree
- Consistent dependency versions across all worktrees
Note: Add .shared-deps/ to .gitignore in the main repository
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When all containers are excluded from health checks, display their actual status
with :excluded suffix instead of misleading hardcoded statuses. This prevents
broken UI state with incorrect action buttons and provides clarity that monitoring
is disabled.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Adds a new EnvVarInput component that provides autocomplete suggestions for shared environment variables from team, project, and environment scopes. Users can reference variables using {{ syntax.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
When all services in a Docker Compose file have `exclude_from_hc: true`,
the status aggregation logic was returning invalid states causing broken UI.
**Problems fixed:**
- ComplexStatusCheck returned 'running:healthy' for apps with no monitored containers
- Service model returned ':' (null status) when all services excluded
- UI showed active start/stop buttons for non-running services
**Changes:**
- ComplexStatusCheck: Return 'exited:healthy' when relevantContainerCount is 0
- Service model: Return 'exited:healthy' when both status and health are null
- Added comprehensive unit tests to verify the fixes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Simplifies conductor.json by removing the minio-init container from the docker rm command.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Update CLAUDE.md to reference .ai/ directory as single source of truth
- Move documentation structure to organized .ai/ directory with core/, development/, patterns/, meta/ subdirectories
- Update .ai/README.md with correct path references
- Update .ai/meta/maintaining-docs.md to reflect new structure
- Consolidate sync-guide.md with detailed synchronization rules
- Fix cross-reference in frontend-patterns.md
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Add a separate minio-init container that automatically creates the 'local'
bucket when MinIO starts in development. Mark the seeded S3Storage as usable
by default so developers can use MinIO without manual validation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Create .ai/ directory as single source of truth for all AI docs
- Organize by topic: core/, development/, patterns/, meta/
- Update CLAUDE.md to reference .ai/ files instead of embedding content
- Remove 18KB of duplicated Laravel Boost guidelines from CLAUDE.md
- Fix testing command descriptions (pest runs all tests, not just unit)
- Standardize version numbers (Laravel 12.4.1, PHP 8.4.7, Tailwind 4.1.4)
- Replace all .cursor/rules/*.mdc with single coolify-ai-docs.mdc reference
- Delete dev_workflow.mdc (non-Coolify Task Master content)
- Merge cursor_rules.mdc + self_improve.mdc into maintaining-docs.md
- Update .AI_INSTRUCTIONS_SYNC.md to redirect to new location
Benefits:
- Single source of truth - no more duplication
- Consistent versions across all documentation
- Better organization by topic
- Platform-agnostic .ai/ directory works for all AI tools
- Reduced CLAUDE.md from 719 to ~320 lines
- Clear cross-references between files
Replace dynamic wire:key values that included the full command string
with stable, descriptive identifiers to prevent unnecessary re-renders
and potential issues with special characters.
Changes:
- Line 270: wire:key="preview-{{ $command }}" → "docker-compose-build-preview"
- Line 279: wire:key="start-preview-{{ $command }}" → "docker-compose-start-preview"
Benefits:
- Prevents element recreation on every keystroke
- Avoids issues with special characters in commands
- Better performance with long commands
- Follows Livewire best practices
The computed properties (dockerComposeBuildCommandPreview and
dockerComposeStartCommandPreview) continue to handle reactive updates
automatically, so preview content still updates as expected.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Refine regex pattern to prevent false positives with flags like -foo, -from, -feature
- Change from \S (any non-whitespace) to [.~/]|$ (path characters or end of word)
- Add comprehensive tests for false positive prevention (4 test cases)
- Add path normalization tests for baseDirectory edge cases (6 test cases)
- Add @example documentation to injectDockerComposeFlags function
Prevents incorrect detection of:
- -foo, -from, -feature, -fast as the -f flag
- Ensures -f flag is only detected when followed by path characters or end of word
All 45 tests passing with 135 assertions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix double-slash issue in Docker Compose preview paths when baseDirectory is "/"
- Normalize baseDirectory using rtrim() to prevent path concatenation issues
- Replace hardcoded '/artifacts/build-time.env' with ApplicationDeploymentJob::BUILD_TIME_ENV_PATH
- Make BUILD_TIME_ENV_PATH constant public for reusability
- Add comprehensive unit tests (11 test cases, 25 assertions)
Fixes preview path generation in:
- getDockerComposeBuildCommandPreviewProperty()
- getDockerComposeStartCommandPreviewProperty()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When using a custom Docker Compose build command, environment variables
were being lost because the --env-file flag was not included. This fix
automatically injects the --env-file flag to ensure build-time environment
variables are available during custom builds.
Changes:
- Auto-inject --env-file /artifacts/build-time.env after docker compose
- Respect user-provided --env-file flags (no duplication)
- Append build arguments when not using build secrets
- Update UI helper text to inform users about automatic env injection
- Add comprehensive unit tests (7 test cases, all passing)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Move notification logic from NotifyOutdatedTraefikServersJob into CheckTraefikVersionForServerJob to send immediate notifications when outdated Traefik is detected. This is more suitable for cloud environments with thousands of servers.
Changes:
- CheckTraefikVersionForServerJob now sends notifications immediately after detecting outdated Traefik
- Remove NotifyOutdatedTraefikServersJob (no longer needed)
- Remove delay calculation logic from CheckTraefikVersionJob
- Update tests to reflect new immediate notification pattern
Trade-offs:
- Pro: Faster notifications (immediate alerts)
- Pro: Simpler codebase (removed complex delay calculation)
- Pro: Better scalability for thousands of servers
- Con: Teams may receive multiple notifications if they have many outdated servers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When a user restarts the proxy from the Navbar UI component, the system now automatically dispatches a version check job immediately after the restart completes. This provides immediate feedback about available Traefik updates without waiting for the weekly scheduled check.
Changes:
- Import CheckTraefikVersionForServerJob in Navbar component
- After successful proxy restart, dispatch version check for Traefik servers
- Version check only runs for servers using Traefik proxy
This ensures users get up-to-date version information right after restarting their proxy infrastructure.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit introduces several improvements to the Traefik version tracking
feature and proxy configuration UI:
## Caching Improvements
1. **New centralized helper functions** (bootstrap/helpers/versions.php):
- `get_versions_data()`: Redis-cached access to versions.json (1 hour TTL)
- `get_traefik_versions()`: Extract Traefik versions from cached data
- `invalidate_versions_cache()`: Clear cache when file is updated
2. **Performance optimization**:
- Single Redis cache key: `coolify:versions:all`
- Eliminates 2-4 file reads per page load
- 95-97.5% reduction in disk I/O time
- Shared cache across all servers in distributed setup
3. **Updated all consumers to use cached helpers**:
- CheckTraefikVersionJob: Use get_traefik_versions()
- Server/Proxy: Two-level caching (Redis + in-memory per-request)
- CheckForUpdatesJob: Auto-invalidate cache after updating file
- bootstrap/helpers/shared.php: Use cached data for Coolify version
## UI/UX Improvements
1. **Navbar warning indicator**:
- Added yellow warning triangle icon next to "Proxy" menu item
- Appears when server has outdated Traefik version
- Uses existing traefik_outdated_info data for instant checks
- Provides at-a-glance visibility of version issues
2. **Proxy sidebar persistence**:
- Fixed sidebar disappearing when clicking "Switch Proxy"
- Configuration link now always visible (needed for proxy selection)
- Dynamic Configurations and Logs only show when proxy is configured
- Better navigation context during proxy switching workflow
## Code Quality
- Added comprehensive PHPDoc for Server::$traefik_outdated_info property
- Improved code organization with centralized helper approach
- All changes formatted with Laravel Pint
- Maintains backward compatibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Store both patch update and newer minor version information simultaneously
- Display patch update availability alongside minor version upgrades in notifications
- Add newer_branch_target and newer_branch_latest fields to traefik_outdated_info
- Update all notification channels (Discord, Telegram, Slack, Pushover, Email, Webhook)
- Show minor version in format (e.g., v3.6) for upgrade targets instead of patch version
- Enhance UI callouts with clearer messaging about available upgrades
- Remove verbose logging in favor of cleaner code structure
- Handle edge case where SSH command returns empty response
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Addresses critical performance issues identified in code review by refactoring the monolithic CheckTraefikVersionJob into a distributed architecture with parallel processing.
Changes:
- Split version checking into CheckTraefikVersionForServerJob for parallel execution
- Extract notification logic into NotifyOutdatedTraefikServersJob
- Dispatch individual server checks concurrently to handle thousands of servers
- Add comprehensive unit tests for the new job architecture
- Update feature tests to cover the refactored workflow
Performance improvements:
- Sequential SSH calls replaced with parallel queue jobs
- Scales efficiently for large installations with thousands of servers
- Reduces job execution time from hours to minutes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add wait loops to ensure containers are fully removed before restarting.
This fixes race conditions where docker compose would fail because an
existing container was still being cleaned up.
Changes:
- StartProxy: Add explicit stop, wait loop before docker compose up
- StopProxy: Add wait loop after container removal
- Both actions now poll up to 10 seconds for complete removal
- Add error suppression to handle non-existent containers gracefully
Tests:
- Add StartProxyTest.php with 3 tests for cleanup logic
- Add StopProxyTest.php with 4 tests for stop behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes a critical N+1 query issue in CheckTraefikVersionJob
that was loading ALL proxy servers into memory then filtering in PHP,
causing potential OOM errors with thousands of servers.
Changes:
- Added scopeWhereProxyType() query scope to Server model for
database-level filtering using JSON column arrow notation
- Updated CheckTraefikVersionJob to use new scope instead of
collection filter, moving proxy type filtering into the SQL query
- Added comprehensive unit tests for the new query scope
Performance impact:
- Before: SELECT * FROM servers WHERE proxy IS NOT NULL (all servers)
- After: SELECT * FROM servers WHERE proxy->>'type' = 'TRAEFIK' (filtered)
- Eliminates memory overhead of loading non-Traefik servers
- Critical for cloud instances with thousands of connected servers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add automated Traefik version checking job running weekly on Sundays
- Implement version detection from running containers and comparison with versions.json
- Add notifications across all channels (Email, Discord, Slack, Telegram, Pushover, Webhook) for outdated versions
- Create dismissible callout component with localStorage persistence
- Display cross-branch upgrade warnings (e.g., v3.5 -> v3.6) with changelog links
- Show patch update notifications within same branch
- Add warning icon that appears when callouts are dismissed
- Prevent duplicate notifications during proxy restart by adding restarting parameter
- Fix notification spam with transition-based logic for status changes
- Enable system email settings by default in development mode
- Track last saved/applied proxy settings to detect configuration drift