Commit graph

256 commits

Author SHA1 Message Date
Andras Bacsai
bf8dcac88c Move inline styles to global CSS file
Moved .log-highlight styles from Livewire component views to resources/css/app.css for better separation of concerns and reusability. This follows Laravel and Livewire best practices by keeping styles in the appropriate location rather than inline in component views.

Changes:
- Added .log-highlight styles to resources/css/app.css
- Removed inline <style> tags from deployment/show.blade.php
- Removed inline <style> tags from get-logs.blade.php
- Added XSS security test for log viewer
- Applied code formatting with Laravel Pint

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 13:15:01 +01:00
Andras Bacsai
74bb8f49ce Fix: Correct time inconsistency in ServerStorageCheckIndependenceTest
Move Carbon::setTestNow() to the beginning of each test before creating
test data. Previously, tests created servers using now() (real current
time) and only afterwards called Carbon::setTestNow(), making
sentinel_updated_at inconsistent with the test clock.

This caused staleness calculations to use different timelines:
- sentinel_updated_at was based on real time (e.g., Dec 2024)
- Test execution time was frozen at 2025-01-15

Now all timestamps use the same frozen test time, making staleness
checks predictable and tests reliable regardless of when they run.

Affected tests (all 7 test cases in the file):
- does not dispatch storage check when sentinel is in sync
- dispatches storage check when sentinel is out of sync
- dispatches storage check when sentinel is disabled
- respects custom hourly storage check frequency when sentinel is out of sync
- handles VALID_CRON_STRINGS mapping correctly when sentinel is out of sync
- respects server timezone for storage checks when sentinel is out of sync
- does not dispatch storage check outside schedule

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 10:22:09 +01:00
Andras Bacsai
56a0143a25 Fix: Prevent ServerStorageCheckJob duplication when Sentinel is active
When Sentinel is enabled and in sync, ServerStorageCheckJob was being
dispatched from two locations causing unnecessary duplication:
1. PushServerUpdateJob (every ~30s with real-time filesystem data)
2. ServerManagerJob (scheduled cron check via SSH)

This commit modifies ServerManagerJob to only dispatch ServerStorageCheckJob
when Sentinel is out of sync or disabled. When Sentinel is active and in sync,
PushServerUpdateJob provides real-time storage data, making the scheduled SSH
check redundant.

Benefits:
- Eliminates duplicate storage checks when Sentinel is working
- Reduces unnecessary SSH overhead
- Storage checks still run as fallback when Sentinel fails
- Maintains scheduled checks for servers without Sentinel

Updated tests to reflect new behavior:
- Storage check NOT dispatched when Sentinel is in sync
- Storage check dispatched when Sentinel is out of sync or disabled
- All timezone and frequency tests updated accordingly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 10:05:10 +01:00
Andras Bacsai
f75bc85bc1
Merge branch 'next' into decouple-storage-from-sentinel 2025-12-03 09:19:09 +01:00
Andras Bacsai
c65ad2e655 Fix complex status logic: handle degraded sub-resources and mixed running+starting states
- Add support for degraded status from sub-resources as highest priority
- Handle mixed running+starting state to show service as not fully ready
- Update state priority hierarchy from 8 to 10 levels
- Add comprehensive test coverage for new status scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 21:47:15 +01:00
Andras Bacsai
8ff83cc3d6 Fix: Pass $serverTimezone to shouldRunNow() in ServerCheckJob dispatch
Pass the server timezone parameter to shouldRunNow() call at line 127,
ensuring ServerCheckJob dispatch respects the server's local timezone
instead of falling back to the instance default.

This aligns the behavior with other scheduled tasks in the same method:
- ServerStorageCheckJob (line 137)
- ServerPatchCheckJob (line 144)
- Sentinel restart (line 152)

All scheduled tasks in processServerTasks() now consistently use the
server's configured timezone for cron evaluation.

Added unit test to verify timezone-aware cron schedule evaluation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 16:58:43 +01:00
Andras Bacsai
ed5796739f Fix: Prevent ServerManagerJob executionTime mutation across server loop
Fixed a critical bug where $this->executionTime was being mutated during
the server processing loop, causing incorrect scheduling calculations for
subsequent servers.

The issue occurred at line 123 where subSeconds() was called directly on
the shared executionTime instance. This caused the baseline time to shift
by waitTime seconds with each server iteration, resulting in compounding
scheduling errors (e.g., 1680 seconds drift over 5 servers).

Changed:
- app/Jobs/ServerManagerJob.php:123
  Added .copy() before .subSeconds() to prevent mutation

Added comprehensive unit tests that verify:
- Immutability when using .copy()
- Demonstration of the bug without .copy()
- Correct behavior across multiple iterations

This follows the existing pattern in shouldRunNow() (line 167) and aligns
with other jobs in the codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 15:27:17 +01:00
Andras Bacsai
b47181c790 Decouple ServerStorageCheckJob from Sentinel sync status
Server disk usage checks now run on their configured schedule regardless of Sentinel status, eliminating monitoring blind spots when Sentinel is offline, out of sync, or disabled. Storage checks now respect server timezone settings, consistent with patch checks.

Changes:
- Moved server timezone calculation to top of processServerTasks()
- Extracted ServerStorageCheckJob dispatch from Sentinel conditional
- Fixed default frequency to '0 23 * * *' (11 PM daily)
- Added timezone parameter to storage check scheduling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 13:36:25 +01:00
Andras Bacsai
4b119726d9 Fix Traefik email notification with clickable server links
- Add URL generation to notification class using base_url() helper
- Replace config('app.url') with proper base_url() for accurate instance URL
- Make server names clickable links to proxy configuration page
- Use data_get() with fallback values for safer template data access
- Add comprehensive tests for URL generation and email rendering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 13:08:40 +01:00
Andras Bacsai
d59c75c2b2 Fix: Docker build args injection regex to support service names
The regex pattern in injectDockerComposeBuildArgs() was too restrictive
and failed to match `docker compose build servicename` commands. Changed
the lookahead from `(?=\s+(?:--|-)|\s+(?:&&|\|\||;|\|)|$)` to the
simpler `(?=\s|$)` to allow any content after the build command,
including service names with hyphens/underscores and flags.

Also improved the ApplicationDeploymentJob to use the new helper function
and added comprehensive test coverage for service-specific builds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 13:16:05 +01:00
Andras Bacsai
627cec16fa
Merge branch 'next' into fix-traefik-startup 2025-11-28 17:54:48 +01:00
Andras Bacsai
cb0f2301f5 Fix: Traefik proxy startup issues - handle null versions and filter predefined networks
Fixes two critical issues preventing Traefik proxy startup:

1. TypeError when restarting proxy: Handle null return from get_traefik_versions()
   - Add null check before dispatching CheckTraefikVersionForServerJob
   - Log warning when version data is unavailable
   - Prevents: "Argument #2 must be of type array, null given"

2. Docker network error: Filter out predefined Docker networks
   - Add isDockerPredefinedNetwork() helper to centralize network filtering
   - Apply filtering in collectDockerNetworksByServer() before operations
   - Apply filtering in generateDefaultProxyConfiguration()
   - Prevents: "operation is not permitted on predefined default network"

Also: Move $cachedVersionsFile assignment after null check in Proxy.php

Tests: Added 7 new unit tests for network filtering function
All existing tests pass with no regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 17:53:26 +01:00
Andras Bacsai
8c40cc607a Fix: Fragile service name parsing in applyServiceApplicationPrerequisites
Changed from `->before('-')` to `->beforeLast('-')` to correctly parse service
names with hyphens. This fixes prerequisite application for ~230+ services
containing hyphens in their template names (e.g., docker-registry,
elasticsearch-with-kibana).

Added comprehensive test coverage for hyphenated service names and fixed
existing tests to use realistic CUID2 UUID format. All unit tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 17:42:04 +01:00
Andras Bacsai
4706bc23aa Refactor: Centralize service application prerequisites
Refactors the Appwrite and Beszel service-specific application settings
to use a centralized constant-based approach, following the same pattern
as NEEDS_TO_CONNECT_TO_PREDEFINED_NETWORK.

Changes:
- Added NEEDS_TO_DISABLE_GZIP constant for services requiring gzip disabled
- Added NEEDS_TO_DISABLE_STRIPPREFIX constant for services requiring stripprefix disabled
- Created applyServiceApplicationPrerequisites() helper function in bootstrap/helpers/services.php
- Updated all service creation flows to use the centralized helper:
  * app/Livewire/Project/Resource/Create.php (web handler)
  * app/Http/Controllers/Api/ServicesController.php (API handler - BUG FIX)
  * app/Livewire/Project/New/DockerCompose.php (custom compose handler)
  * app/Http/Controllers/Api/ApplicationsController.php (API custom compose handler)
- Added comprehensive unit tests for the new helper function

Benefits:
- Single source of truth for service prerequisites
- DRY - eliminates code duplication between web and API handlers
- Fixes bug where API-created services didn't get prerequisites applied
- Easy to extend for future services (just edit the constant)
- More maintainable and testable

Related commits: 3a94f1ea1 (Beszel), 02b18c86e (Appwrite)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 16:47:09 +01:00
Andras Bacsai
cd10796612 Fix: Version downgrade prevention - validate cache and add running version checks
## Changes
- **CheckForUpdatesJob**: Add triple version comparison (CDN vs cache vs running)
  - Never allows version downgrade from currently running version
  - Uses data_set() for safer nested array mutation
  - Prevents incorrect new_version_available flag setting

- **UpdateCoolify**: Add cache validation before fallback
  - Validates cache against running version on CDN failure
  - Throws exception if cache is corrupted/older than running
  - Applies to both manual and automated updates

- **Tests**: Add comprehensive test coverage
  - tests/Unit/CheckForUpdatesJobTest.php (5 tests)
  - tests/Unit/UpdateCoolifyTest.php (3 tests)

## Impact
- Prevents all downgrade scenarios (CDN rollback, corrupted cache, etc.)
- Maintains backward compatibility
- Provides clear logging for debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 16:05:41 +01:00
Andras Bacsai
c136724838 fix(docker): migrate database start actions from --time to -t flag
Migrates 8 database start action files from deprecated --time=10 to compatible -t 10 flag for Docker v28+ compatibility. Also updates test expectations in StopProxyTest.php.

Docker deprecated the --time flag in v28.0. The -t shorthand works on all Docker versions (pre-28 and 28+), ensuring backward and forward compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 11:18:12 +01:00
Andras Bacsai
9503da60b4
Revert "fix(docker): migrate database start actions from --time to -t flag" 2025-11-28 11:15:55 +01:00
Andras Bacsai
5b7a6d9a76 fix(docker): migrate database start actions from --time to -t flag
Migrates 8 database start action files from deprecated --time=10 to compatible -t 10 flag for Docker v28+ compatibility. Also updates test expectations in StopProxyTest.php.

Docker deprecated the --time flag in v28.0. The -t shorthand works on all Docker versions (pre-28 and 28+), ensuring backward and forward compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 11:08:20 +01:00
Andras Bacsai
0073d045fb fix: enhance security by validating and escaping database names, file paths, and proxy configuration filenames to prevent command injection 2025-11-27 14:36:31 +01:00
Andras Bacsai
e60e74ac90
fix: trigger configuration changed detection for build settings (#7371) 2025-11-27 12:23:32 +01:00
Andras Bacsai
9452f0b468 fix: trigger configuration changed detection for build settings
Include 'Inject Build Args to Dockerfile' and 'Include Source Commit in Build' settings in the configuration hash calculation. These settings affect Docker build behavior, so changes to them should trigger the restart required notification. Add unit tests to verify hash changes when these settings are modified.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 12:22:54 +01:00
Andras Bacsai
639a56be52
fix: prevent SERVICE_FQDN/SERVICE_URL path duplication (#7370) 2025-11-27 10:59:39 +01:00
Andras Bacsai
0ecaa191db fix: prevent SERVICE_FQDN/SERVICE_URL path duplication on FQDN updates
Add endsWith() checks before appending template paths in serviceParser() to
prevent duplicate paths when parse() is called after FQDN updates. This fixes
the bug where services like Appwrite realtime would get `/v1/realtime/v1/realtime`.

Fixes #7363

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 10:57:24 +01:00
Andras Bacsai
b5666da342 test: add tests for shared environment variable spacing and resolution 2025-11-27 10:45:39 +01:00
Andras Bacsai
246e3cd8a2 fix: resolve Docker validation race conditions and sudo prefix bug
- Fix sudo prefix bug: Use word boundary matching to prevent 'do' keyword from matching 'docker' commands
- Add ensureProxyNetworksExist() helper to create networks before docker compose up
- Ensure networks exist synchronously before dispatching async proxy startup to prevent race conditions
- Update comprehensive unit tests for sudo parsing (50 tests passing)

This resolves issues where Docker commands failed to execute with sudo on non-root servers and where proxy networks were not created before the proxy container started.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 09:04:42 +01:00
Andras Bacsai
01635e8b80 fix: add bash control structure keywords to sudo command processing
Fixes issue #7346 where proxy startup failed on non-root servers due to
bash syntax errors when control structure keywords like 'for', 'do', 'done',
'break', and 'continue' were being prefixed with 'sudo'.

Added comprehensive exclusion list including for/while/until/case/select
loops, conditionals (if/then/else/elif/fi), and loop control keywords
(break/continue). Also excludes comment lines starting with '#'.

All 37 unit tests pass, including new tests for each bash control structure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:44:53 +01:00
Andras Bacsai
1d277f28dd
feat: custom docker entrypoint (#7097) 2025-11-26 09:31:02 +01:00
Andras Bacsai
9113ed714f feat: add validation methods for S3 bucket names, paths, and server paths; update import logic to prevent command injection 2025-11-25 16:40:35 +01:00
Andras Bacsai
6613f7c6b8
Merge branch 'next' into env-var-autocomplete 2025-11-25 11:21:53 +01:00
Andras Bacsai
875351188f feat: improve S3 restore path handling and validation state
- Add path attribute mutator to S3Storage model ensuring paths start with /
- Add updatedS3Path hook to normalize path and reset validation state on blur
- Add updatedS3StorageId hook to reset validation state when storage changes
- Add Enter key support to trigger file check from path input
- Use wire:model.live for S3 storage select, wire:model.blur for path input
- Improve shell escaping in restore job cleanup commands
- Fix isSafeTmpPath helper logic for directory validation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 10:18:30 +01:00
Andras Bacsai
6d8144c18c Merge remote-tracking branch 'origin/next' into s3-restore
Resolve merge conflicts in:
- bootstrap/helpers/shared.php (kept both formatBytes, isSafeTmpPath, and formatContainerStatus functions)
- database/migrations/2025_10_10_120002_create_cloud_init_scripts_table.php (added Schema::hasTable check)
- database/migrations/2025_10_10_120002_create_webhook_notification_settings_table.php (added Schema::hasTable check)
- resources/views/livewire/project/application/general.blade.php (formatting/whitespace)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 09:35:37 +01:00
Andras Bacsai
01d3f07934
Merge branch 'next' into env-var-autocomplete 2025-11-25 09:35:20 +01:00
Andras Bacsai
e0dc12678b
fix: comprehensive SERVICE_URL/SERVICE_FQDN handling improvements and queue reliability fixes (#7275) 2025-11-24 11:47:11 +01:00
Andras Bacsai
bf428a0e1c
fix: don't show health status for exited containers (#7317) 2025-11-24 10:29:57 +01:00
Andras Bacsai
1149d0f746
feat: implement prerequisite validation and installation for server setup (#7297) 2025-11-24 10:28:10 +01:00
Andras Bacsai
75381af742 fix: convert Stringable to plain strings in applicationParser for strict comparisons and collection lookups
This fixes critical bugs where Stringable objects were used in strict comparisons and collection key lookups, causing service existence checks and domain lookups to fail.

**Changes:**
- Line 539: Added ->value() to $originalServiceName conversion
- Line 541: Added ->value() to $serviceName normalization
- Line 621: Removed redundant (string) cast now that $serviceName is a plain string

**Impact:**
- Service existence check now works correctly (line 606: $transformedServiceName === $serviceName)
- Domain lookup finds existing domains (line 615: $domains->get($serviceName))
- Prevents duplicate domain entries in docker_compose_domains collection

**Tests:**
- Added comprehensive unit test suite in ApplicationParserStringableTest.php
- 9 test cases covering type verification, strict comparisons, collection operations, and edge cases
- All tests pass (24 tests, 153 assertions across related parser tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 09:22:27 +01:00
Andras Bacsai
ac9eca3c05 fix: don't show health status for exited containers
Exited containers don't run health checks, so showing "(unhealthy)" is
misleading. This fix ensures exited status displays without health
suffixes across all monitoring systems (SSH, Sentinel, services, etc.)
and at the UI layer for backward compatibility with existing data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 09:09:37 +01:00
Andras Bacsai
30d206e7b9 feat: add async prerequisite installation with retry logic and visual feedback
This commit enhances the boarding flow to handle prerequisite installation asynchronously with proper retry logic and user feedback:

- Add retry mechanism with max 3 attempts for prerequisite installation
- Display live installation logs via ActivityMonitor during boarding
- Reset ActivityMonitor state when starting new activity to prevent stale event dispatching
- Support dynamic header updates in ActivityMonitor
- Add prerequisitesInstalled event handler to revalidate after installation completes
- Extract validation logic into continueValidation() method for cleaner flow
- Add unit tests for prerequisite installation logic

This improves UX by showing users real-time progress during prerequisite installation and handles installation failures gracefully with automatic retries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 08:44:04 +01:00
Andras Bacsai
29135e00ba feat: enhance prerequisite validation to return detailed results 2025-11-21 13:14:48 +01:00
Andras Bacsai
2edf2338de fix: enhance getRequiredPort to support map-style environment variables for SERVICE_URL and SERVICE_FQDN 2025-11-21 12:41:25 +01:00
Andras Bacsai
a5ce1db871 fix: handle map-style environment variables in updateCompose
The updateCompose() function now correctly detects SERVICE_URL_* and
SERVICE_FQDN_* variables regardless of whether they are defined in
YAML list-style or map-style format.

Previously, the code only worked with list-style environment definitions:
```yaml
environment:
  - SERVICE_URL_APP_3000
```

Now it also handles map-style definitions:
```yaml
environment:
  SERVICE_URL_TRIGGER_3000: ""
  SERVICE_FQDN_DB: localhost
```

The fix distinguishes between the two formats by checking if the array
key is numeric (list-style) or a string (map-style), then extracts the
variable name from the appropriate location.

Added 5 comprehensive unit tests covering:
- Map-style environment format detection
- Multiple map-style variables
- References vs declarations in map-style
- Abbreviated service names with map-style
- Verification of dual-format handling

This fixes variable detection for service templates like trigger.yaml,
langfuse.yaml, and paymenter.yaml that use map-style format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 11:21:49 +01:00
Andras Bacsai
56f32d0f87 fix: properly handle SERVICE_URL and SERVICE_FQDN for abbreviated service names (#7243)
Parse template variables directly instead of generating from container names. Always create both SERVICE_URL and SERVICE_FQDN pairs together. Properly separate scheme handling (URL has scheme, FQDN doesn't). Add comprehensive test coverage.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 11:21:49 +01:00
Andras Bacsai
01609e7f8b feat: implement formatContainerStatus helper for human-readable status formatting and add unit tests 2025-11-21 09:12:56 +01:00
Andras Bacsai
7ceb124e9b feat: add validation for YAML parsing, integer parameters, and Docker Compose custom fields
This commit adds comprehensive validation improvements and DRY principles for handling Coolify's custom Docker Compose extensions.

## Changes

### 1. Created Reusable stripCoolifyCustomFields() Function
- Added shared helper in bootstrap/helpers/docker.php
- Removes all Coolify custom fields (exclude_from_hc, content, isDirectory, is_directory)
- Handles both long syntax (arrays) and short syntax (strings) for volumes
- Well-documented with comprehensive docblock
- Follows DRY principle for consistent field stripping

### 2. Fixed Docker Compose Modal Validation
- Updated validateComposeFile() to use stripCoolifyCustomFields()
- Now removes ALL custom fields before Docker validation (previously only removed content)
- Fixes validation errors when using templates with custom fields (e.g., traccar.yaml)
- Users can now validate compose files with Coolify extensions in UI

### 3. Enhanced YAML Validation in CalculatesExcludedStatus
- Added proper exception handling with ParseException vs generic Exception
- Added structure validation (checks if parsed result and services are arrays)
- Comprehensive logging with context (error message, line number, snippet)
- Maintains safe fallback behavior (returns empty collection on error)

### 4. Added Integer Validation to ContainerStatusAggregator
- Validates maxRestartCount parameter in both aggregateFromStrings() and aggregateFromContainers()
- Corrects negative values to 0 with warning log
- Logs warnings for suspiciously high values (> 1000)
- Prevents logic errors in crash loop detection

### 5. Comprehensive Unit Tests
- tests/Unit/StripCoolifyCustomFieldsTest.php (NEW) - 9 tests, 43 assertions
- tests/Unit/ContainerStatusAggregatorTest.php - Added 6 tests for integer validation
- tests/Unit/ExcludeFromHealthCheckTest.php - Added 4 tests for YAML validation
- All tests passing with proper Log facade mocking

### 6. Documentation
- Added comprehensive Docker Compose extensions documentation to .ai/core/deployment-architecture.md
- Documents all custom fields: exclude_from_hc, content, isDirectory/is_directory
- Includes examples, use cases, implementation details, and test references
- Updated .ai/README.md with navigation links to new documentation

## Benefits
- Better UX: Users can validate compose files with custom fields
- Better Debugging: Comprehensive logging for errors
- Better Code Quality: DRY principle with reusable validation
- Better Reliability: Prevents logic errors from invalid parameters
- Better Maintainability: Easy to add new custom fields in future

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 18:34:49 +01:00
Andras Bacsai
ae6eef3cdb feat(tests): add comprehensive tests for ContainerStatusAggregator and serverStatus accessor
- Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states.
- Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status.
- Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components.
- Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere.
- Updated version number for sentinel to 0.0.17 in versions.json.
2025-11-20 17:31:07 +01:00
Andras Bacsai
70fb4c6869 refactor: standardize Service model status aggregation to use ContainerStatusAggregator
Fixes inconsistency where Service model used manual state machine logic while
all other components (Application, ComplexStatusCheck, GetContainersStatus)
use the centralized ContainerStatusAggregator service.

Changes:
- Refactored Service::aggregateResourceStatuses() to use ContainerStatusAggregator
- Removed ~60 lines of duplicated state machine logic
- Added comprehensive ServiceExcludedStatusTest with 24 test cases
- Fixed bugs in old logic where paused/starting containers were incorrectly
  marked as unhealthy (should be unknown)

Benefits:
- Single source of truth for status aggregation across all models
- Leverages 42 existing ContainerStatusAggregator tests
- Consistent behavior between Service and Application/Database models
- Easier maintenance (state machine changes only in one place)

All tests pass (37 total):
- ServiceExcludedStatusTest: 24/24 passed
- AllExcludedContainersConsistencyTest: 13/13 passed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 15:03:18 +01:00
Andras Bacsai
14bba8ba86 fix: correct Sentinel default health status and remove debug logging
This commit addresses container status reporting issues and removes debug logging:

**Primary Fix:**
- Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data
- This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy"
- Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown'

**Service Multi-Container Aggregation:**
- Implemented service container status aggregation (same pattern as applications)
- Added serviceContainerStatuses collection to both Sentinel and SSH paths
- Services now aggregate status using priority: unhealthy > unknown > healthy
- Prevents race conditions where last-processed container would win

**Debug Logging Cleanup:**
- Removed all [STATUS-DEBUG] logging statements (25 total)
- Removed all ray() debugging calls (3 total)
- Removed proof_unknown_preserved and health_status_was_null debug fields
- Code is now production-ready

**Test Coverage:**
- Added 2 new tests for Sentinel default health status behavior
- Added 5 new tests for service aggregation in SSH path
- All 16 tests pass (66 assertions)

**Note:** The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 11:10:34 +01:00
Andras Bacsai
6b62847a11 fix: preserve unknown health status in Sentinel updates (PushServerUpdateJob)
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.

## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".

When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"

## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:

1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy

This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic

## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation

## Testing
All 31 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:40:58 +01:00
Andras Bacsai
e3746a4b88 fix: preserve unknown health state and handle edge case container states
This commit fixes container health status aggregation to correctly handle
unknown health states and edge case container states across all resource types.

Changes:

1. **Preserve Unknown Health State**
   - Add three-way priority: unhealthy > unknown > healthy
   - Detect containers without healthchecks (null health) as unknown
   - Apply across GetContainersStatus, ComplexStatusCheck, and Service models

2. **Handle Edge Case Container States**
   - Add support for: created, starting, paused, dead, removing
   - Map to appropriate statuses: starting (unknown), paused (unknown), degraded (unhealthy)
   - Prevent containers in transitional states from showing incorrect status

3. **Add :excluded Suffix for Excluded Containers**
   - Parse exclude_from_hc flag from docker-compose YAML
   - Append :excluded suffix to individual container statuses
   - Skip :excluded containers in non-excluded aggregation sections
   - Strip :excluded suffix in excluded aggregation sections
   - Makes it clear in UI which containers are excluded from monitoring

Files Modified:
- app/Actions/Docker/GetContainersStatus.php
- app/Actions/Shared/ComplexStatusCheck.php
- app/Models/Service.php
- tests/Unit/ContainerHealthStatusTest.php

Tests: 18 passed (82 assertions)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:19:25 +01:00
Andras Bacsai
498b189286 fix: correct status for services with all containers excluded from health checks
When all containers are excluded from health checks, display their actual status
with :excluded suffix instead of misleading hardcoded statuses. This prevents
broken UI state with incorrect action buttons and provides clarity that monitoring
is disabled.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 10:54:51 +01:00