# Scaling

This guide covers strategies for scaling Cyclonetix from small deployments to large enterprise environments. In the simplest setup, all components run in a single instance and share in-memory state:

```mermaid
%%{init: {'theme': 'forest', 'themeCSS': '.node rect { rx: 10; ry: 10; } '}}%%
flowchart TD
    Orchestrator["Orchestrator"]
    Agent["Agent"]
    UI["UI Server"]
    InMem["In-Memory State"]

    Orchestrator <--> InMem
    Agent <--> InMem
    UI <--> InMem
```
## Scaling Components
Cyclonetix has several components that can be scaled independently:
- Orchestrator: Manages workflow execution and task scheduling
- Agents: Execute tasks and report results
- State backend: Stores workflow and task state (Redis, PostgreSQL)
- UI server: Serves the web interface
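Each component can be launched as its own process. As a minimal sketch, the fragment below reuses the `cyclonetix:latest` image and the `--orchestrator`/`--agent`/`--ui` flags shown in the Docker Compose example at the end of this guide:

```yaml
# Sketch: one container per component, using the image and flags from the
# Docker Compose example later in this guide.
services:
  orchestrator:
    image: cyclonetix:latest
    command: --orchestrator
  agent:
    image: cyclonetix:latest
    command: --agent
  ui:
    image: cyclonetix:latest
    command: --ui
```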
## Scaling Patterns

### Small Deployment (Development/Testing)
For small deployments or development environments:
- Single instance running all components
- In-memory or single Redis instance as backend
- Suitable for up to a few hundred tasks per day
### Medium Deployment (Team/Department)
For medium-sized deployments:
```mermaid
%%{init: {'theme': 'forest', 'themeCSS': '.node rect { rx: 10; ry: 10; } '}}%%
flowchart TD
    Orchestrator["Orchestrator"]
    UI["UI Server"]
    Redis["Redis Cluster"]
    AgentCPU["Agent (CPU)"]
    AgentMem["Agent (Memory)"]
    AgentGPU["Agent (GPU)"]

    Orchestrator <--> Redis
    UI <--> Redis
    Redis --> AgentCPU
    Redis --> AgentMem
    Redis --> AgentGPU
    AgentCPU --> Orchestrator
    AgentMem --> Orchestrator
    AgentGPU --> Orchestrator
```
- Separate orchestrator and agent instances
- Redis cluster for state management
- Suitable for thousands of tasks per day
### Large Deployment (Enterprise)
For large enterprise deployments:
```mermaid
%%{init: {'theme': 'forest', 'themeCSS': '.node rect { rx: 10; ry: 10; } '}}%%
flowchart TD
    Orch1["Orchestrator 1"]
    Orch2["Orchestrator 2"]
    Orch3["Orchestrator 3"]
    Postgres["PostgreSQL"]
    Redis["Redis"]
    Agent1["Agent 1"]
    Agent2["Agent 2"]
    Agent3["Agent 3"]
    AgentN["Agent N..."]

    Orch1 <--> Postgres
    Orch2 <--> Postgres
    Orch3 <--> Postgres
    Orch1 <--> Redis
    Orch2 <--> Redis
    Orch3 <--> Redis
    Redis --> Agent1
    Redis --> Agent2
    Redis --> Agent3
    Redis --> AgentN
    Agent1 --> Orch1
    Agent2 --> Orch2
    Agent3 --> Orch3
    AgentN --> Orch1
```
- Multiple orchestrators with work distribution
- Many specialized agents
- PostgreSQL for state management
- Redis for queuing
- Load-balanced UI servers
- Suitable for millions of tasks per day
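Such a deployment pairs PostgreSQL state with Redis queuing. The split-backend keys in the sketch below are hypothetical, merely mirroring the documented `backend`/`backend_url` pattern; check your Cyclonetix version for the actual configuration surface:

```yaml
# Hypothetical sketch only: `queue_backend` and `queue_backend_url` are
# illustrative names, NOT confirmed Cyclonetix configuration keys.
backend: "postgresql"
backend_url: "postgres://user:password@pg-host/cyclonetix"
queue_backend: "redis"                            # hypothetical key
queue_backend_url: "redis://redis-cluster:6379"   # hypothetical key
```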
## Scaling Strategies

### Scaling the Orchestrator
The orchestrator can be scaled horizontally:
- Multiple Orchestrators: Start multiple orchestrator instances
- Work Distribution: DAGs are automatically assigned to orchestrators using consistent hashing
- Automatic Failover: If an orchestrator fails, its DAGs are reassigned
Configuration:

```yaml
orchestrator:
  id: "auto"                                  # Auto-generate ID or specify
  cluster_mode: true                          # Enable orchestrator clustering
  distribution_algorithm: "consistent_hash"   # Work distribution method
```
### Scaling Agents
Agents can be scaled horizontally and specialized:
- Task Types: Dedicate agents to specific task types
- Resource Requirements: Create agent pools for different resource needs
- Locality: Deploy agents close to data or resources they need
Configuration:

```yaml
agent:
  queues: ["cpu_tasks", "default"]                          # Queues this agent subscribes to
  tags: ["region:us-east", "cpu:high", "memory:standard"]   # Agent capabilities
  concurrency: 8                                            # Number of concurrent tasks
```
### Queue-Based Distribution
Use specialized queues for workload distribution:
```yaml
# In task definition
queue: "gpu_tasks"   # Assign task to GPU queue
```

```yaml
# In agent configuration
agent:
  queues: ["gpu_tasks"]   # This agent only processes GPU tasks
```
## Auto-Scaling

### Kubernetes-Based Auto-Scaling

On Kubernetes, use a Horizontal Pod Autoscaler. Note that scaling on an external metric such as Redis queue length requires an external metrics adapter (for example, KEDA or the Prometheus Adapter) to be installed in the cluster:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cyclonetix-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cyclonetix-agent
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: redis_queue_length
          selector:
            matchLabels:
              queue: default
        target:
          type: AverageValue
          averageValue: 5
```
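Alternatively, if KEDA is installed, a `ScaledObject` can watch the queue directly instead of going through a generic metrics adapter. This is a sketch under an assumption: it presumes Cyclonetix queues are Redis lists keyed by queue name, which you should verify for your deployment:

```yaml
# Sketch: KEDA-driven scaling on Redis queue length. Assumes queues are
# Redis lists named after the queue; verify against your deployment.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cyclonetix-agent-scaler
spec:
  scaleTargetRef:
    name: cyclonetix-agent    # the agent Deployment from the HPA example
  minReplicaCount: 3
  maxReplicaCount: 20
  triggers:
    - type: redis
      metadata:
        address: redis-service.cyclonetix.svc.cluster.local:6379
        listName: default     # queue name assumed to be the Redis key
        listLength: "5"       # target tasks per replica
```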
### Cloud-Based Auto-Scaling

For cloud deployments, use your cloud provider's auto-scaling primitives:
- AWS: Auto Scaling Groups
- GCP: Managed Instance Groups
- Azure: Virtual Machine Scale Sets
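As a concrete illustration for AWS, a minimal CloudFormation sketch of an agent Auto Scaling Group might look like the following. The launch template ID, subnet IDs, and target value are placeholders, and the launch template is assumed to install and start a Cyclonetix agent on boot:

```yaml
# Hypothetical CloudFormation sketch: a CPU-tracked Auto Scaling Group of
# Cyclonetix agents. All identifiers below are placeholders.
Resources:
  AgentAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "3"
      MaxSize: "20"
      VPCZoneIdentifier:
        - subnet-aaaa1111                          # placeholder subnet IDs
        - subnet-bbbb2222
      LaunchTemplate:
        LaunchTemplateId: lt-0123456789abcdef0     # placeholder; assumed to start an agent
        Version: "1"
  AgentScalingPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref AgentAutoScalingGroup
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization
        TargetValue: 60
```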
## Scaling the State Backend

### Redis Scaling

For Redis-based state management:
- Redis Cluster: Set up a Redis cluster for horizontal scaling
- Redis Sentinel: Use Redis Sentinel for high availability
- Redis Enterprise: Consider Redis Enterprise for large deployments
Configuration:

```yaml
backend: "redis"
backend_url: "redis://redis-cluster:6379"
redis:
  cluster_mode: true
  read_from_replicas: true
  connection_pool_size: 20
```
### PostgreSQL Scaling

For PostgreSQL-based state management:
- Connection Pooling: Use PgBouncer for connection pooling
- Read Replicas: Distribute read operations to replicas
- Partitioning: Partition data for large-scale deployments
Configuration:

```yaml
backend: "postgresql"
backend_url: "postgres://user:password@pg-host/cyclonetix"
postgresql:
  max_connections: 20
  statement_timeout_seconds: 30
  use_prepared_statements: true
```
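To apply the PgBouncer suggestion above, point `backend_url` at the pooler rather than at PostgreSQL directly; `pgbouncer-host` is a placeholder, and 6432 is PgBouncer's default listen port:

```yaml
backend: "postgresql"
# Route connections through PgBouncer (default port 6432) instead of the
# database host; "pgbouncer-host" is a placeholder.
backend_url: "postgres://user:password@pgbouncer-host:6432/cyclonetix"
```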
## UI Scaling
For the UI server:
- Load Balancing: Deploy multiple UI servers behind a load balancer
- Caching: Implement caching for frequently accessed data
- WebSocket Optimization: Tune WebSocket connections for large numbers of clients
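On Kubernetes, load balancing the UI can be as simple as running several replicas behind a Service. This sketch assumes the `cyclonetix:latest` image, the `--ui` flag, and port 3000 from the Docker Compose example later in this guide:

```yaml
# Sketch: three UI replicas behind a single load-balanced Service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyclonetix-ui
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cyclonetix-ui
  template:
    metadata:
      labels:
        app: cyclonetix-ui
    spec:
      containers:
        - name: ui
          image: cyclonetix:latest   # image/flag taken from the Compose example below
          args: ["--ui"]
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: cyclonetix-ui
spec:
  type: LoadBalancer
  selector:
    app: cyclonetix-ui
  ports:
    - port: 80
      targetPort: 3000
```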
## Performance Tuning

### Task Batching

Group small tasks into batches:

```yaml
batch:
  enabled: true
  max_tasks: 10
  max_delay_seconds: 5
```
### Optimized Serialization

Use binary serialization for better performance:

```yaml
serialization_format: "binary"   # Instead of default JSON
```
### Resource Allocation

Tune resource allocation based on workload:

```yaml
agent:
  concurrency: 8              # Number of concurrent tasks
  resource_allocation:
    memory_per_task_mb: 256   # Memory allocation per task
    cpu_weight_per_task: 1    # CPU weight per task
```
## Monitoring for Scale
As you scale, monitoring becomes crucial:
- Prometheus Integration: Expose metrics for Prometheus
- Grafana Dashboards: Create Grafana dashboards for monitoring
- Alerts: Set up alerts for queue depth, agent health, etc.
```yaml
monitoring:
  prometheus:
    enabled: true
    endpoint: "/metrics"
  metrics:
    include_task_metrics: true
    include_queue_metrics: true
    include_agent_metrics: true
```
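On the Prometheus side, a matching scrape job might look like this; the target addresses are placeholders for wherever your orchestrators and agents listen:

```yaml
# Hypothetical scrape job for the /metrics endpoint enabled above.
scrape_configs:
  - job_name: "cyclonetix"
    metrics_path: /metrics
    static_configs:
      - targets: ["orchestrator-1:3000", "agent-1:3000"]   # placeholder host:port
```

A queue-depth alert could then be sketched as a rule file; the metric name `cyclonetix_queue_depth` is illustrative, not a guaranteed metric — use whatever your deployment actually exposes:

```yaml
# Hypothetical alerting rule: fire when the default queue stays deep for 10 minutes.
groups:
  - name: cyclonetix
    rules:
      - alert: QueueBacklog
        expr: cyclonetix_queue_depth{queue="default"} > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Task queue backlog on {{ $labels.queue }}"
```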
## Scaling Limitations and Considerations
- Coordination Overhead: More orchestrators increase coordination overhead
- Database Performance: State backend can become a bottleneck
- Network Latency: Distributed systems introduce latency
- Consistency vs. Availability: Trade-offs in distributed systems
## Benchmarks and Sizing Guidelines

| Deployment Size | Tasks/Day | Orchestrators | Agents | Backend       | VM/Pod Size         |
|-----------------|-----------|---------------|--------|---------------|---------------------|
| Small           | <1,000    | 1             | 1-3    | Redis Single  | 2 CPU, 4 GB RAM     |
| Medium          | <10,000   | 1-3           | 5-10   | Redis Cluster | 4 CPU, 8 GB RAM     |
| Large           | <100,000  | 3-5           | 10-30  | PostgreSQL    | 8 CPU, 16 GB RAM    |
| Enterprise      | >100,000  | 5+            | 30+    | PostgreSQL HA | 16+ CPU, 32+ GB RAM |
## Example Scaling Configurations

### Kubernetes Multi-Node Deployment

```yaml
# config.yaml
backend: "redis"
backend_url: "redis://redis-service.cyclonetix.svc.cluster.local:6379"
```

```yaml
# kubernetes manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyclonetix-orchestrator
spec:
  replicas: 3
  # ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyclonetix-agent-cpu
spec:
  replicas: 5
  # ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyclonetix-agent-memory
spec:
  replicas: 3
  # ...
```
### Docker Compose Multi-Container

```yaml
# docker-compose.yml
version: '3'
services:
  redis:
    image: redis:alpine
    # ...
  orchestrator:
    image: cyclonetix:latest
    command: --orchestrator
    # ...
  ui:
    image: cyclonetix:latest
    command: --ui
    ports:
      - "3000:3000"
    # ...
  agent-1:
    image: cyclonetix:latest
    command: --agent
    environment:
      - CYCLO_AGENT_QUEUES=default,cpu_tasks
    # ...
  agent-2:
    image: cyclonetix:latest
    command: --agent
    environment:
      - CYCLO_AGENT_QUEUES=memory_tasks
    # ...
```
## Next Steps
- Review Configuration Options for fine-tuning
- Set up Security for your scaled deployment
- Check the Developer Guide for extending Cyclonetix