coolify/.ai/core/application-architecture.md
Andras Bacsai 128c0b00ec docs: add comprehensive container status monitoring system documentation
## Added Documentation

Created detailed documentation in `.ai/core/application-architecture.md`
explaining the container status monitoring system to prevent future bugs.

## Key Sections

### 1. Container Status Monitoring System Overview
- Explains that status is updated through multiple independent paths
- Emphasizes that ALL paths must be updated when changing status logic

### 2. Critical Implementation Locations
Documents all four status calculation locations:
- **SSH-Based Updates**: `GetContainersStatus.php` (scheduled, every ~1min)
- **Sentinel-Based Updates**: `PushServerUpdateJob.php` (real-time, every ~30sec)
- **Multi-Server Aggregation**: `ComplexStatusCheck.php` (on-demand)
- **Service-Level Aggregation**: `Service.php` (service status)

### 3. Status Flow Diagram
Visual representation of how status flows from different sources to UI

### 4. Status Priority System
Documents the required priority: unhealthy > unknown > healthy

### 5. Excluded Containers
Explains `:excluded` suffix handling and behavior

### 6. Developer Guidelines
- Checklist of all locations to update
- Testing requirements
- Edge cases to handle

### 7. Related Tests
Links to all relevant test files

### 8. Common Bugs to Avoid
Real examples from bugs we've fixed, with solutions

## Why This Documentation Matters

The recent bug (unknown → healthy) happened because:
1. `GetContainersStatus.php` was updated to handle "unknown" status
2. `PushServerUpdateJob.php` was NOT updated
3. This caused periodic status flipping

This documentation ensures future developers (and AI assistants like Claude)
will know to update ALL four locations when modifying status logic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:42:45 +01:00


Coolify Application Architecture

Laravel Project Structure

Core Application Directory (app/)

app/
├── Actions/           # Business logic actions (Action pattern)
├── Console/           # Artisan commands
├── Contracts/         # Interface definitions
├── Data/              # Data Transfer Objects (Spatie Laravel Data)
├── Enums/             # Enumeration classes
├── Events/            # Event classes
├── Exceptions/        # Custom exception classes
├── Helpers/           # Utility helper classes
├── Http/              # HTTP layer (Controllers, Middleware, Requests)
├── Jobs/              # Background job classes
├── Listeners/         # Event listeners
├── Livewire/          # Livewire components (Frontend)
├── Models/            # Eloquent models (Domain entities)
├── Notifications/     # Notification classes
├── Policies/          # Authorization policies
├── Providers/         # Service providers
├── Repositories/      # Repository pattern implementations
├── Services/          # Service layer classes
├── Traits/            # Reusable trait classes
└── View/              # View composers and creators

Core Domain Models

Infrastructure Management

Server.php (46KB, 1343 lines)

  • Purpose: Physical/virtual server management
  • Key Relationships:
    • hasMany(Application::class) - Deployed applications
    • hasMany(StandalonePostgresql::class) - Database instances
    • belongsTo(Team::class) - Team ownership
  • Key Features:
    • SSH connection management
    • Resource monitoring
    • Proxy configuration (Traefik/Caddy)
    • Docker daemon interaction

Application.php (74KB, 1734 lines)

  • Purpose: Application deployment and management
  • Key Relationships:
    • belongsTo(Server::class) - Deployment target
    • belongsTo(Environment::class) - Environment context
    • hasMany(ApplicationDeploymentQueue::class) - Deployment history
  • Key Features:
    • Git repository integration
    • Docker build and deployment
    • Environment variable management
    • SSL certificate handling

Service.php (58KB, 1325 lines)

  • Purpose: Multi-container service orchestration
  • Key Relationships:
    • hasMany(ServiceApplication::class) - Service components
    • hasMany(ServiceDatabase::class) - Service databases
    • belongsTo(Environment::class) - Environment context
  • Key Features:
    • Docker Compose generation
    • Service dependency management
    • Health check configuration

Team & Project Organization

Team.php (8.9KB, 308 lines)

  • Purpose: Multi-tenant team management
  • Key Relationships:
    • hasMany(User::class) - Team members
    • hasMany(Project::class) - Team projects
    • hasMany(Server::class) - Team servers
  • Key Features:
    • Resource limits and quotas
    • Team-based access control
    • Subscription management

Project.php (4.3KB, 156 lines)

  • Purpose: Project organization and grouping
  • Key Relationships:
    • hasMany(Environment::class) - Project environments
    • belongsTo(Team::class) - Team ownership
  • Key Features:
    • Environment isolation
    • Resource organization

Environment.php

  • Purpose: Environment-specific configuration
  • Key Relationships:
    • hasMany(Application::class) - Environment applications
    • hasMany(Service::class) - Environment services
    • belongsTo(Project::class) - Project context

Database Management Models

Standalone Database Models

Common Features:

  • Database configuration management
  • Backup scheduling and execution
  • Connection string generation
  • Health monitoring

Configuration & Settings

EnvironmentVariable.php (7.6KB, 219 lines)

  • Purpose: Application environment variable management
  • Key Features:
    • Encrypted value storage
    • Build-time vs runtime variables
    • Shared variable inheritance

InstanceSettings.php (3.2KB, 124 lines)

  • Purpose: Global Coolify instance configuration
  • Key Features:
    • FQDN and port configuration
    • Auto-update settings
    • Security configurations

Architectural Patterns

Action Pattern (app/Actions/)

Using lorisleiva/laravel-actions for business logic encapsulation:

// Example Action structure
use Lorisleiva\Actions\Concerns\AsAction;

class DeployApplication
{
    use AsAction;

    public function handle(Application $application): void
    {
        // Business logic for deployment
    }

    public function asJob(Application $application): void
    {
        // Queue job implementation: runs when the action is dispatched as a job
        $this->handle($application);
    }
}

Key Action Categories:

  • Application/: Deployment and management actions
  • Database/: Database operations
  • Server/: Server management actions
  • Service/: Service orchestration actions

Repository Pattern (app/Repositories/)

Data access abstraction layer:

  • Encapsulates database queries
  • Provides testable data layer
  • Abstracts complex query logic
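
To make the "testable data layer" point concrete, here is a minimal sketch of a repository interface with an in-memory implementation that can stand in for the database in tests. The names `ServerRepository` and `InMemoryServerRepository`, and the array shape, are hypothetical and are not taken from Coolify's codebase.

```php
<?php

// Hypothetical names throughout: ServerRepository, InMemoryServerRepository,
// and the array shape are illustrative, not Coolify's actual classes.
interface ServerRepository
{
    /** @return array<int, array{id: int, team_id: int, name: string}> */
    public function forTeam(int $teamId): array;
}

// In-memory implementation, usable as a test double: no database needed.
final class InMemoryServerRepository implements ServerRepository
{
    /** @param array<int, array{id: int, team_id: int, name: string}> $servers */
    public function __construct(private array $servers)
    {
    }

    public function forTeam(int $teamId): array
    {
        // array_values() reindexes after filtering so callers get a clean list.
        return array_values(array_filter(
            $this->servers,
            fn (array $server): bool => $server['team_id'] === $teamId
        ));
    }
}

$repository = new InMemoryServerRepository([
    ['id' => 1, 'team_id' => 10, 'name' => 'alpha'],
    ['id' => 2, 'team_id' => 20, 'name' => 'beta'],
    ['id' => 3, 'team_id' => 10, 'name' => 'gamma'],
]);

$names = array_column($repository->forTeam(10), 'name');
```

Code that depends only on the interface can be exercised with the in-memory version, while an Eloquent-backed implementation would supply the same method in production.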

Service Layer (app/Services/)

Business logic services:

  • External API integrations
  • Complex business operations
  • Cross-cutting concerns

Data Flow Architecture

Request Lifecycle

  1. HTTP Request → routes/web.php
  2. Middleware → Authentication, authorization
  3. Livewire Component → app/Livewire/
  4. Action/Service → Business logic execution
  5. Model/Repository → Data persistence
  6. Response → Livewire reactive update

Background Processing

  1. Job Dispatch → Queue system (Redis)
  2. Job Processing → app/Jobs/
  3. Action Execution → Business logic
  4. Event Broadcasting → Real-time updates
  5. Notification → User feedback

Security Architecture

Multi-Tenant Isolation

// Team-based query scoping
class Application extends Model
{
    public function scopeOwnedByCurrentTeam($query)
    {
        return $query->whereHas('environment.project.team', function ($q) {
            $q->where('id', currentTeam()->id);
        });
    }
}

Authorization Layers

  1. Team Membership → User belongs to team
  2. Resource Ownership → Resource belongs to team
  3. Policy Authorization → app/Policies/
  4. Environment Isolation → Project/environment boundaries

Data Protection

  • Environment Variables: Encrypted at rest
  • SSH Keys: Secure storage and transmission
  • API Tokens: Sanctum-based authentication
  • Audit Logging: spatie/laravel-activitylog

Configuration Hierarchy

Global Configuration

Team Configuration

Project Configuration

Application Configuration

Event-Driven Architecture

Event Broadcasting (app/Events/)

Real-time updates using Laravel Echo and WebSockets:

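// Example event structure
use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcast;

class ApplicationDeploymentStarted implements ShouldBroadcast
{
    // The application is injected so broadcastOn() can resolve its team channel
    public function __construct(public Application $application)
    {
    }

    public function broadcastOn(): array
    {
        return [
            new PrivateChannel("team.{$this->application->team->id}"),
        ];
    }
}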

Event Listeners (app/Listeners/)

  • Deployment status updates
  • Resource monitoring alerts
  • Notification dispatching
  • Audit log creation

Database Design Patterns

Polymorphic Relationships

// Environment variables can belong to multiple resource types
class EnvironmentVariable extends Model
{
    public function resource(): MorphTo
    {
        return $this->morphTo();
    }
}

Team-Based Soft Scoping

All major resources include team-based query scoping:

// Automatic team filtering
$applications = Application::ownedByCurrentTeam()->get();
$servers = Server::ownedByCurrentTeam()->get();

Configuration Inheritance

Environment variables cascade from:

  1. Shared Variables → Team-wide defaults
  2. Project Variables → Project-specific overrides
  3. Application Variables → Application-specific values
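
The cascade above can be sketched as a plain array merge in which later layers override earlier ones. The helper name `resolveVariables()` and the sample keys are hypothetical; this illustrates the override order only, not how Coolify resolves variables internally.

```php
<?php

// Hypothetical helper: resolveVariables() and the sample keys are illustrative.
// Later arguments win, so the argument order encodes the cascade.
function resolveVariables(array $shared, array $project, array $application): array
{
    // With string keys, array_merge() lets later arrays override earlier ones.
    return array_merge($shared, $project, $application);
}

$resolved = resolveVariables(
    ['LOG_LEVEL' => 'info', 'REGION' => 'eu'],   // shared (team-wide defaults)
    ['LOG_LEVEL' => 'debug'],                    // project override
    ['APP_PORT' => '3000']                       // application-specific value
);
```

Here `LOG_LEVEL` resolves to the project value, `REGION` falls through from the shared defaults, and `APP_PORT` comes from the application layer.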

Integration Patterns

Git Provider Integration

Abstracted git operations supporting:

Docker Integration

  • Container Management: Direct Docker API communication
  • Image Building: Dockerfile and Buildpack support
  • Network Management: Custom Docker networks
  • Volume Management: Persistent storage handling

SSH Communication

Testing Architecture

Test Structure (tests/)

tests/
├── Feature/           # Integration tests
├── Unit/              # Unit tests
├── Browser/           # Dusk browser tests
├── Traits/            # Test helper traits
├── Pest.php           # Pest configuration
└── TestCase.php       # Base test case

Testing Patterns

  • Feature Tests: Full request lifecycle testing
  • Unit Tests: Individual class/method testing
  • Browser Tests: End-to-end user workflows
  • Database Testing: Factories and seeders

Performance Considerations

Query Optimization

  • Eager Loading: Prevent N+1 queries
  • Query Scoping: Team-based filtering
  • Database Indexing: Optimized for common queries

Caching Strategy

  • Redis: Session and cache storage
  • Model Caching: Frequently accessed data
  • Query Caching: Expensive query results

Background Processing

  • Queue Workers: Horizon-managed job processing
  • Job Batching: Related job grouping
  • Failed Job Handling: Automatic retry logic

Container Status Monitoring System

Overview

Container health status is monitored and updated through multiple independent paths. When modifying status logic, ALL paths must be updated to ensure consistency.

Critical Implementation Locations

1. SSH-Based Status Updates (Scheduled)

  • File: app/Actions/Docker/GetContainersStatus.php
  • Method: aggregateApplicationStatus() (lines 487-540)
  • Trigger: Scheduled job or manual refresh
  • Frequency: Every minute (via ServerCheckJob)

Status Aggregation Logic:

// Tracks multiple status flags
$hasRunning = false;
$hasRestarting = false;
$hasUnhealthy = false;
$hasUnknown = false;  // ⚠️ CRITICAL: Must track unknown
$hasExited = false;
// ... more states

// Priority: restarting > degraded > running (unhealthy > unknown > healthy)
if ($hasRunning) {
    if ($hasUnhealthy) return 'running (unhealthy)';
    elseif ($hasUnknown) return 'running (unknown)';
    else return 'running (healthy)';
}

2. Sentinel-Based Status Updates (Real-time)

  • File: app/Jobs/PushServerUpdateJob.php
  • Method: aggregateMultiContainerStatuses() (lines 269-298)
  • Trigger: Sentinel push updates from remote servers
  • Frequency: Every ~30 seconds (real-time)

Status Aggregation Logic:

// ⚠️ MUST match GetContainersStatus logic
$hasRunning = false;
$hasUnhealthy = false;
$hasUnknown = false;  // ⚠️ CRITICAL: Added to fix bug

foreach ($relevantStatuses as $status) {
    if (str($status)->contains('running')) {
        $hasRunning = true;
        if (str($status)->contains('unhealthy')) $hasUnhealthy = true;
        if (str($status)->contains('unknown')) $hasUnknown = true;  // ⚠️ CRITICAL
    }
}

// Priority: unhealthy > unknown > healthy
if ($hasRunning) {
    if ($hasUnhealthy) $aggregatedStatus = 'running (unhealthy)';
    elseif ($hasUnknown) $aggregatedStatus = 'running (unknown)';
    else $aggregatedStatus = 'running (healthy)';
}

3. Multi-Server Status Aggregation

  • File: app/Actions/Shared/ComplexStatusCheck.php
  • Method: resource() (lines 48-210)
  • Purpose: Aggregates status across multiple servers for applications
  • Used by: Applications with multiple destinations

Key Features:

  • Aggregates statuses from main + additional servers
  • Handles excluded containers (:excluded suffix)
  • Calculates overall application health from all containers

Status Format with Excluded Containers:

// When all containers excluded from health checks:
return 'running:unhealthy:excluded';  // Container running but unhealthy, monitoring disabled
return 'running:unknown:excluded';     // Container running, health unknown, monitoring disabled
return 'running:healthy:excluded';     // Container running and healthy, monitoring disabled
return 'degraded:excluded';            // Some containers down, monitoring disabled
return 'exited:excluded';              // All containers stopped, monitoring disabled

4. Service-Level Status Aggregation

  • File: app/Models/Service.php
  • Method: complexStatus() (lines 176-288)
  • Purpose: Aggregates status for multi-container services
  • Used by: Docker Compose services

Status Calculation:

// Aggregates status from all service applications and databases
// Handles excluded containers separately
// Returns status with :excluded suffix when all containers excluded
if (!$hasNonExcluded && $complexStatus === null && $complexHealth === null) {
    // All services excluded - calculate from excluded containers
    return "{$excludedStatus}:excluded";
}

Status Flow Diagram

┌─────────────────────────────────────────────────────────────┐
│                    Container Status Sources                  │
└─────────────────────────────────────────────────────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
        ▼                    ▼                    ▼
┌───────────────┐   ┌─────────────────┐   ┌──────────────┐
│ SSH-Based     │   │ Sentinel-Based  │   │ Multi-Server │
│ (Scheduled)   │   │ (Real-time)     │   │ Aggregation  │
├───────────────┤   ├─────────────────┤   ├──────────────┤
│ ServerCheck   │   │ PushServerUp-   │   │ ComplexStatus│
│ Job           │   │ dateJob         │   │ Check        │
│               │   │                 │   │              │
│ Every ~1min   │   │ Every ~30sec    │   │ On demand    │
└───────┬───────┘   └────────┬────────┘   └──────┬───────┘
        │                    │                    │
        └────────────────────┼────────────────────┘
                             │
                             ▼
                 ┌───────────────────────┐
                 │ Application/Service   │
                 │ Status Property       │
                 └───────────────────────┘
                             │
                             ▼
                 ┌───────────────────────┐
                 │ UI Display (Livewire) │
                 └───────────────────────┘

Status Priority System

All status aggregation locations MUST follow the same priority:

For Running Containers:

  1. unhealthy - Container has failing health checks
  2. unknown - Container health status cannot be determined
  3. healthy - Container is healthy

For Non-Running States:

  1. restarting → degraded (unhealthy)
  2. running + exited → degraded (unhealthy)
  3. dead/removing → degraded (unhealthy)
  4. paused → paused
  5. created/starting → starting
  6. exited → exited (unhealthy)
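
The two priority lists can be combined into a single function, sketched below. `aggregateStatus()` is a hypothetical helper, not the code in any of the four locations. Two things are assumptions: where the plain `running` branch sits relative to dead/removing, and the fallback to `exited (unhealthy)` for an empty list.

```php
<?php

// Hypothetical helper, not Coolify's implementation. Input is a list of
// per-container status strings such as 'running (healthy)' or 'exited'.
function aggregateStatus(array $statuses): string
{
    $states = [];
    $hasUnhealthy = false;
    $hasUnknown = false;

    foreach ($statuses as $status) {
        // The first word is the container state, e.g. 'running' in 'running (healthy)'.
        $state = explode(' ', trim($status))[0];
        $states[$state] = true;

        if ($state === 'running') {
            // Only 'unhealthy' and 'unknown' are tracked; 'healthy' is the default.
            $hasUnhealthy = $hasUnhealthy || str_contains($status, 'unhealthy');
            $hasUnknown = $hasUnknown || str_contains($status, 'unknown');
        }
    }

    // True when at least one of the given states was seen.
    $has = fn (string ...$names): bool =>
        count(array_intersect_key($states, array_flip($names))) > 0;

    if ($has('restarting')) {
        return 'degraded (unhealthy)';
    }
    if ($has('running') && $has('exited')) {
        return 'degraded (unhealthy)';
    }
    if ($has('running')) {
        // Assumed position: the running branch is checked before dead/removing.
        if ($hasUnhealthy) {
            return 'running (unhealthy)';
        } elseif ($hasUnknown) {
            return 'running (unknown)';
        }
        return 'running (healthy)';
    }
    if ($has('dead', 'removing')) {
        return 'degraded (unhealthy)';
    }
    if ($has('paused')) {
        return 'paused';
    }
    if ($has('created', 'starting')) {
        return 'starting';
    }

    // Assumed fallback: also covers an empty container list.
    return 'exited (unhealthy)';
}
```

Parsing the state from the first word avoids substring traps such as `restarting` containing `starting` and `unhealthy` containing `healthy`.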

Excluded Containers

When a container has the exclude_from_hc: true flag set:

Behavior:

  • Status is still calculated from container state
  • :excluded suffix is appended to indicate monitoring disabled
  • UI shows "(Monitoring Disabled)" badge
  • Action buttons respect the actual container state

Format: {actual-status}:excluded
Examples: running:unknown:excluded, degraded:excluded, exited:excluded
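
Because action buttons depend on the actual container state, UI code needs to separate the `:excluded` suffix from the rest of the status. The sketch below shows one way to do that. `parseStatus()` and the keys it returns are hypothetical and are not part of Coolify's API.

```php
<?php

// Hypothetical helper: parseStatus() and the returned keys are illustrative.
// Accepts colon-separated statuses such as 'running:unknown:excluded'.
function parseStatus(string $status): array
{
    $parts = explode(':', $status);

    // The suffix, when present, is always the last segment.
    $excluded = end($parts) === 'excluded';
    if ($excluded) {
        array_pop($parts);
    }

    return [
        'state' => $parts[0],          // e.g. 'running', 'degraded', 'exited'
        'health' => $parts[1] ?? null, // e.g. 'unknown'; null when not reported
        'excluded' => $excluded,       // true when monitoring is disabled
    ];
}

$parsed = parseStatus('running:unknown:excluded');
```

The caller can then show the "(Monitoring Disabled)" badge from `excluded` while still choosing start/stop buttons from `state`.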

Important Notes for Developers

⚠️ CRITICAL: When modifying container status logic:

  1. Update ALL four locations:

    • GetContainersStatus.php (SSH-based)
    • PushServerUpdateJob.php (Sentinel-based)
    • ComplexStatusCheck.php (multi-server)
    • Service.php (service-level)
  2. Maintain consistent priority:

    • unhealthy > unknown > healthy
    • Apply same logic across all paths
  3. Test both update paths:

    • Run unit tests: ./vendor/bin/pest tests/Unit/
    • Test SSH updates (manual refresh)
    • Test Sentinel updates (wait 30 seconds)
  4. Handle edge cases:

    • All containers excluded (exclude_from_hc: true)
    • Mixed excluded/non-excluded containers
    • Unknown health states
    • Container crash loops (restart count)
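
One way to catch paths that have drifted apart is a parity check that feeds the same inputs to every aggregator and reports any disagreement. The sketch below uses two hypothetical stand-in closures, one of which deliberately ignores "unknown", to show the kind of mismatch it would flag. A real check would call the four actual implementations instead.

```php
<?php

// Hypothetical parity check: runs every aggregator on every case and
// collects the cases where the results are not all identical.
function findDisagreements(array $aggregators, array $cases): array
{
    $disagreements = [];

    foreach ($cases as $case) {
        $results = [];
        foreach ($aggregators as $name => $aggregate) {
            $results[$name] = $aggregate($case);
        }
        if (count(array_unique($results)) > 1) {
            $disagreements[] = ['input' => $case, 'results' => $results];
        }
    }

    return $disagreements;
}

// Stand-in for a path that follows unhealthy > unknown > healthy.
$tracksUnknown = function (array $statuses): string {
    $joined = implode(' ', $statuses);
    if (str_contains($joined, 'unhealthy')) {
        return 'running (unhealthy)';
    } elseif (str_contains($joined, 'unknown')) {
        return 'running (unknown)';
    }
    return 'running (healthy)';
};

// Stand-in for a path that was never updated to track "unknown".
$ignoresUnknown = function (array $statuses): string {
    return str_contains(implode(' ', $statuses), 'unhealthy')
        ? 'running (unhealthy)'
        : 'running (healthy)';
};

$cases = [
    ['running (healthy)'],
    ['running (healthy)', 'running (unknown)'],
    ['running (unhealthy)', 'running (unknown)'],
];

$disagreements = findDisagreements(
    ['ssh' => $tracksUnknown, 'sentinel' => $ignoresUnknown],
    $cases
);
```

Only the mixed healthy/unknown case is reported, which is the input where a stale path would flip the displayed status back to healthy.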

Common Bugs to Avoid

Bug: Forgetting to track $hasUnknown flag
Fix: Initialize and check for "unknown" in all status aggregation

Bug: Using ternary operator instead of if-elseif-else
Fix: Use explicit if-elseif-else to handle 3-way priority

Bug: Updating only one path (SSH or Sentinel)
Fix: Always update all four status calculation locations

Bug: Not handling excluded containers with :excluded suffix
Fix: Check for :excluded suffix in UI logic and button visibility