# Scaling

This guide covers strategies for scaling Cyclonetix from small deployments to large enterprise environments. In the simplest setup, all components run in a single instance and share in-memory state:

```mermaid
%%{init: {'theme': 'forest', 'themeCSS': '.node rect { rx: 10; ry: 10; } '}}%%
flowchart TD
    Orchestrator["Orchestrator"]
    Agent["Agent"]
    UI["UI Server"]
    InMem["In-Memory State"]

    Orchestrator <--> InMem
    Agent <--> InMem
    UI <--> InMem
```
## Scaling Components
Cyclonetix has several components that can be scaled independently:
- Orchestrator: Manages workflow execution and task scheduling
- Agents: Execute tasks and report results
- State backend: Stores workflow and task state (Redis, PostgreSQL)
- UI server: Serves the web interface
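Each component can be launched as its own process. As a minimal sketch, the fragment below reuses the `cyclonetix:latest` image and the `--orchestrator`/`--agent`/`--ui` flags shown in the Docker Compose example at the end of this guide:

```yaml
# Sketch: one container per component, using the image and flags from the
# Docker Compose example later in this guide.
services:
  orchestrator:
    image: cyclonetix:latest
    command: --orchestrator
  agent:
    image: cyclonetix:latest
    command: --agent
  ui:
    image: cyclonetix:latest
    command: --ui
```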
## Scaling Patterns

### Small Deployment (Development/Testing)
For small deployments or development environments:
- Single instance running all components
- In-memory or single Redis instance as backend
- Suitable for up to a few hundred tasks per day
### Medium Deployment (Team/Department)
For medium-sized deployments:
```mermaid
%%{init: {'theme': 'forest', 'themeCSS': '.node rect { rx: 10; ry: 10; } '}}%%
flowchart TD
    Orchestrator["Orchestrator"]
    UI["UI Server"]
    Redis["Redis Cluster"]
    AgentCPU["Agent (CPU)"]
    AgentMem["Agent (Memory)"]
    AgentGPU["Agent (GPU)"]

    Orchestrator <--> Redis
    UI <--> Redis
    Redis --> AgentCPU
    Redis --> AgentMem
    Redis --> AgentGPU
    AgentCPU --> Orchestrator
    AgentMem --> Orchestrator
    AgentGPU --> Orchestrator
```
- Separate orchestrator and agent instances
- Redis cluster for state management
- Suitable for thousands of tasks per day
### Large Deployment (Enterprise)
For large enterprise deployments:
```mermaid
%%{init: {'theme': 'forest', 'themeCSS': '.node rect { rx: 10; ry: 10; } '}}%%
flowchart TD
    Orch1["Orchestrator 1"]
    Orch2["Orchestrator 2"]
    Orch3["Orchestrator 3"]
    Postgres["PostgreSQL"]
    Redis["Redis"]
    Agent1["Agent 1"]
    Agent2["Agent 2"]
    Agent3["Agent 3"]
    AgentN["Agent N..."]

    Orch1 <--> Postgres
    Orch2 <--> Postgres
    Orch3 <--> Postgres
    Orch1 <--> Redis
    Orch2 <--> Redis
    Orch3 <--> Redis
    Redis --> Agent1
    Redis --> Agent2
    Redis --> Agent3
    Redis --> AgentN
    Agent1 --> Orch1
    Agent2 --> Orch2
    Agent3 --> Orch3
    AgentN --> Orch1
```
- Multiple orchestrators with work distribution
- Many specialized agents
- PostgreSQL for state management
- Redis for queuing
- Load-balanced UI servers
- Suitable for millions of tasks per day
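Such a deployment pairs PostgreSQL state with Redis queuing. The split-backend keys in the sketch below are hypothetical, merely mirroring the documented `backend`/`backend_url` pattern; check your Cyclonetix version for the actual configuration surface:

```yaml
# Hypothetical sketch only: `queue_backend` and `queue_backend_url` are
# illustrative names, NOT confirmed Cyclonetix configuration keys.
backend: "postgresql"
backend_url: "postgres://user:password@pg-host/cyclonetix"
queue_backend: "redis"                            # hypothetical key
queue_backend_url: "redis://redis-cluster:6379"   # hypothetical key
```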
## Scaling Strategies

### Scaling the Orchestrator
The orchestrator can be scaled horizontally:
- Multiple Orchestrators: Start multiple orchestrator instances
- Work Distribution: DAGs are automatically assigned to orchestrators using consistent hashing
- Automatic Failover: If an orchestrator fails, its DAGs are reassigned
Configuration:

```yaml
orchestrator:
  id: "auto"                                  # Auto-generate ID or specify
  cluster_mode: true                          # Enable orchestrator clustering
  distribution_algorithm: "consistent_hash"   # Work distribution method
```
### Scaling Agents
Agents can be scaled horizontally and specialized:
- Task Types: Dedicate agents to specific task types
- Resource Requirements: Create agent pools for different resource needs
- Locality: Deploy agents close to data or resources they need
Configuration:

```yaml
agent:
  queues: ["cpu_tasks", "default"]                          # Queues this agent subscribes to
  tags: ["region:us-east", "cpu:high", "memory:standard"]   # Agent capabilities
  concurrency: 8                                            # Number of concurrent tasks
```
### Queue-Based Distribution
Use specialized queues for workload distribution:
```yaml
# In task definition
queue: "gpu_tasks"   # Assign task to GPU queue
```

```yaml
# In agent configuration
agent:
  queues: ["gpu_tasks"]   # This agent only processes GPU tasks
```
## Auto-Scaling

### Kubernetes-Based Auto-Scaling

On Kubernetes, use a Horizontal Pod Autoscaler. Note that scaling on an external metric such as Redis queue length requires an external metrics adapter (for example, KEDA or the Prometheus Adapter) to be installed in the cluster:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cyclonetix-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cyclonetix-agent
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: redis_queue_length
          selector:
            matchLabels:
              queue: default
        target:
          type: AverageValue
          averageValue: 5
```
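Alternatively, if KEDA is installed, a `ScaledObject` can watch the queue directly instead of going through a generic metrics adapter. This is a sketch under an assumption: it presumes Cyclonetix queues are Redis lists keyed by queue name, which you should verify for your deployment:

```yaml
# Sketch: KEDA-driven scaling on Redis queue length. Assumes queues are
# Redis lists named after the queue; verify against your deployment.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cyclonetix-agent-scaler
spec:
  scaleTargetRef:
    name: cyclonetix-agent    # the agent Deployment from the HPA example
  minReplicaCount: 3
  maxReplicaCount: 20
  triggers:
    - type: redis
      metadata:
        address: redis-service.cyclonetix.svc.cluster.local:6379
        listName: default     # queue name assumed to be the Redis key
        listLength: "5"       # target tasks per replica
```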
### Cloud-Based Auto-Scaling

For cloud deployments, use your cloud provider's auto-scaling primitives:
- AWS: Auto Scaling Groups
- GCP: Managed Instance Groups
- Azure: Virtual Machine Scale Sets
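As a concrete illustration for AWS, a minimal CloudFormation sketch of an agent Auto Scaling Group might look like the following. The launch template ID, subnet IDs, and target value are placeholders, and the launch template is assumed to install and start a Cyclonetix agent on boot:

```yaml
# Hypothetical CloudFormation sketch: a CPU-tracked Auto Scaling Group of
# Cyclonetix agents. All identifiers below are placeholders.
Resources:
  AgentAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "3"
      MaxSize: "20"
      VPCZoneIdentifier:
        - subnet-aaaa1111                          # placeholder subnet IDs
        - subnet-bbbb2222
      LaunchTemplate:
        LaunchTemplateId: lt-0123456789abcdef0     # placeholder; assumed to start an agent
        Version: "1"
  AgentScalingPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref AgentAutoScalingGroup
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization
        TargetValue: 60
```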
## Scaling the State Backend

### Redis Scaling

For Redis-based state management:
- Redis Cluster: Set up a Redis cluster for horizontal scaling
- Redis Sentinel: Use Redis Sentinel for high availability
- Redis Enterprise: Consider Redis Enterprise for large deployments
Configuration:

```yaml
backend: "redis"
backend_url: "redis://redis-cluster:6379"
redis:
  cluster_mode: true
  read_from_replicas: true
  connection_pool_size: 20
```
### PostgreSQL Scaling

For PostgreSQL-based state management:
- Connection Pooling: Use PgBouncer for connection pooling
- Read Replicas: Distribute read operations to replicas
- Partitioning: Partition data for large-scale deployments
Configuration:

```yaml
backend: "postgresql"
backend_url: "postgres://user:password@pg-host/cyclonetix"
postgresql:
  max_connections: 20
  statement_timeout_seconds: 30
  use_prepared_statements: true
```
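To apply the PgBouncer suggestion above, point `backend_url` at the pooler rather than at PostgreSQL directly; `pgbouncer-host` is a placeholder, and 6432 is PgBouncer's default listen port:

```yaml
backend: "postgresql"
# Route connections through PgBouncer (default port 6432) instead of the
# database host; "pgbouncer-host" is a placeholder.
backend_url: "postgres://user:password@pgbouncer-host:6432/cyclonetix"
```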
## UI Scaling
For the UI server:
- Load Balancing: Deploy multiple UI servers behind a load balancer
- Caching: Implement caching for frequently accessed data
- WebSocket Optimization: Tune WebSocket connections for large numbers of clients
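On Kubernetes, load balancing the UI can be as simple as running several replicas behind a Service. This sketch assumes the `cyclonetix:latest` image, the `--ui` flag, and port 3000 from the Docker Compose example later in this guide:

```yaml
# Sketch: three UI replicas behind a single load-balanced Service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyclonetix-ui
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cyclonetix-ui
  template:
    metadata:
      labels:
        app: cyclonetix-ui
    spec:
      containers:
        - name: ui
          image: cyclonetix:latest   # image/flag taken from the Compose example below
          args: ["--ui"]
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: cyclonetix-ui
spec:
  type: LoadBalancer
  selector:
    app: cyclonetix-ui
  ports:
    - port: 80
      targetPort: 3000
```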
## Performance Tuning

### Task Batching

Group small tasks into batches:

```yaml
batch:
  enabled: true
  max_tasks: 10
  max_delay_seconds: 5
```
### Optimized Serialization

Use binary serialization for better performance:

```yaml
serialization_format: "binary"   # Instead of default JSON
```
### Resource Allocation

Tune resource allocation based on workload:

```yaml
agent:
  concurrency: 8              # Number of concurrent tasks
  resource_allocation:
    memory_per_task_mb: 256   # Memory allocation per task
    cpu_weight_per_task: 1    # CPU weight per task
```
## Monitoring for Scale
As you scale, monitoring becomes crucial:
- Prometheus Integration: Expose metrics for Prometheus
- Grafana Dashboards: Create Grafana dashboards for monitoring
- Alerts: Set up alerts for queue depth, agent health, etc.
```yaml
monitoring:
  prometheus:
    enabled: true
    endpoint: "/metrics"
  metrics:
    include_task_metrics: true
    include_queue_metrics: true
    include_agent_metrics: true
```
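On the Prometheus side, a matching scrape job might look like this; the target addresses are placeholders for wherever your orchestrators and agents listen:

```yaml
# Hypothetical scrape job for the /metrics endpoint enabled above.
scrape_configs:
  - job_name: "cyclonetix"
    metrics_path: /metrics
    static_configs:
      - targets: ["orchestrator-1:3000", "agent-1:3000"]   # placeholder host:port
```

A queue-depth alert could then be sketched as a rule file; the metric name `cyclonetix_queue_depth` is illustrative, not a guaranteed metric — use whatever your deployment actually exposes:

```yaml
# Hypothetical alerting rule: fire when the default queue stays deep for 10 minutes.
groups:
  - name: cyclonetix
    rules:
      - alert: QueueBacklog
        expr: cyclonetix_queue_depth{queue="default"} > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Task queue backlog on {{ $labels.queue }}"
```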
## Scaling Limitations and Considerations
- Coordination Overhead: More orchestrators increase coordination overhead
- Database Performance: State backend can become a bottleneck
- Network Latency: Distributed systems introduce latency
- Consistency vs. Availability: Trade-offs in distributed systems
## Benchmarks and Sizing Guidelines

| Deployment Size | Tasks/Day | Orchestrators | Agents | Backend       | VM/Pod Size         |
|-----------------|-----------|---------------|--------|---------------|---------------------|
| Small           | <1,000    | 1             | 1-3    | Redis Single  | 2 CPU, 4 GB RAM     |
| Medium          | <10,000   | 1-3           | 5-10   | Redis Cluster | 4 CPU, 8 GB RAM     |
| Large           | <100,000  | 3-5           | 10-30  | PostgreSQL    | 8 CPU, 16 GB RAM    |
| Enterprise      | >100,000  | 5+            | 30+    | PostgreSQL HA | 16+ CPU, 32+ GB RAM |
## Example Scaling Configurations

### Kubernetes Multi-Node Deployment

```yaml
# config.yaml
backend: "redis"
backend_url: "redis://redis-service.cyclonetix.svc.cluster.local:6379"
```

```yaml
# kubernetes manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyclonetix-orchestrator
spec:
  replicas: 3
  # ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyclonetix-agent-cpu
spec:
  replicas: 5
  # ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyclonetix-agent-memory
spec:
  replicas: 3
  # ...
```
### Docker Compose Multi-Container

```yaml
# docker-compose.yml
version: '3'
services:
  redis:
    image: redis:alpine
    # ...
  orchestrator:
    image: cyclonetix:latest
    command: --orchestrator
    # ...
  ui:
    image: cyclonetix:latest
    command: --ui
    ports:
      - "3000:3000"
    # ...
  agent-1:
    image: cyclonetix:latest
    command: --agent
    environment:
      - CYCLO_AGENT_QUEUES=default,cpu_tasks
    # ...
  agent-2:
    image: cyclonetix:latest
    command: --agent
    environment:
      - CYCLO_AGENT_QUEUES=memory_tasks
    # ...
```
## Next Steps
- Review Configuration Options for fine-tuning
- Set up Security for your scaled deployment
- Check the Developer Guide for extending Cyclonetix