Task Definition
This guide will show you how to define tasks in Cyclonetix, from basic task definitions to more complex configurations with parameters, dependencies, and advanced features.
Basic Task Structure
Tasks in Cyclonetix are defined using YAML files. Here’s the structure of a basic task:
id: "task_id" # Unique identifier for the task
name: "Human-readable name" # Display name
description: "Task description" # Optional description
command: "echo 'Hello World'" # Command to execute
dependencies: [] # List of prerequisite tasks
parameters: {} # Task-specific parameters
queue: "default" # Optional queue name for execution
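Putting those fields together, a minimal complete task might look like this (the id, file name, and command are illustrative):

```yaml
# data/tasks/hello.yaml -- hypothetical file name
id: "hello"
name: "Hello World"
description: "Prints a greeting"
command: "echo 'Hello from Cyclonetix'"
dependencies: []
queue: "default"
```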
Task File Organization
By default, Cyclonetix looks for task definitions in the data/tasks directory. You can organize tasks in subdirectories for better management:
data/
└── tasks/
├── data_processing/
│ ├── extract.yaml
│ ├── transform.yaml
│ └── load.yaml
├── ml/
│ ├── train.yaml
│ └── evaluate.yaml
└── deployment/
├── build.yaml
└── deploy.yaml
Command Specification
The command field specifies what will be executed when the task runs. This can be:
- A simple shell command
- A complex script with pipes and redirections
- A reference to an executable file
Examples:
# Simple command
command: "echo 'Task completed'"
# Multi-line script
command: |
  echo "Starting task"
  python /path/to/script.py --arg1 value1 --arg2 value2
  echo "Task finished with status $?"
# Using environment variables
command: "python train.py --data-path ${DATA_PATH} --epochs ${EPOCHS}"
Defining Dependencies
Dependencies are specified as a list of task IDs that must complete successfully before this task can start:
dependencies:
  - "data_preparation"
  - "feature_engineering"
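For example, the transform task from the directory layout above could declare the extract task as a prerequisite (ids and command are illustrative):

```yaml
# data/tasks/data_processing/transform.yaml -- hypothetical
id: "transform"
name: "Transform Data"
command: "python transform.py"
dependencies:
  - "extract"  # must complete successfully before transform starts
```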
Conditional Dependencies
You can specify parameter-specific dependencies for more granular control:
dependencies:
  - "data_preparation:daily" # Depends on data_preparation with parameter set "daily"
  - "feature_engineering"
Parameter Configuration
Parameters allow you to make tasks configurable and reusable:
parameters:
  inputPath: "/data/input"
  outputPath: "/data/output"
  mode: "incremental"
  maxThreads: 4
Parameters can be:
- Referenced in the command using environment variables
- Overridden at scheduling time
- Used to create task variants
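For instance, assuming parameters are exposed to the command as environment variables (as the examples in this guide do), a configurable task might be sketched as:

```yaml
# Hypothetical task; names and paths are illustrative
id: "process_data"
name: "Process Data"
command: "python process.py --input ${INPUT_PATH} --mode ${MODE}"
parameters:
  INPUT_PATH: "/data/input"
  MODE: "incremental"
```

Scheduling the same task with a different MODE value would then produce a distinct variant without a new definition file.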
Queue Assignment
Assigning tasks to specific queues allows for resource allocation and specialization:
queue: "gpu_tasks" # Assign to a GPU-specific queue
If not specified, tasks will use the default queue.
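For example, a task intended to run only on workers consuming a GPU queue might look like this (queue name and command are illustrative):

```yaml
id: "train_gpu"
name: "Train on GPU"
command: "python train.py --device cuda"
queue: "gpu_tasks"
```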
Evaluation Points
Tasks can be designated as evaluation points, which allow for dynamic decision-making during execution:
id: "evaluate_model"
name: "Evaluate Model Performance"
command: "python evaluate.py --model ${MODEL_PATH} --threshold ${THRESHOLD}"
dependencies:
  - "train_model"
evaluation_point: true
parameters:
  threshold: 0.85
Evaluation points can:
- Determine which downstream tasks to execute
- Modify the execution graph at runtime
- Implement conditional branching
- Serve as approval gates
Complete Task Example
Here’s a comprehensive example of a task definition:
id: "train_model"
name: "Train Machine Learning Model"
description: "Trains a machine learning model using prepared data"
command: |
  python train.py \
    --data-path ${DATA_PATH} \
    --model-type ${MODEL_TYPE} \
    --epochs ${EPOCHS} \
    --batch-size ${BATCH_SIZE} \
    --output-path ${OUTPUT_PATH} \
    --log-level ${LOG_LEVEL}
dependencies:
  - "prepare_data"
  - "feature_engineering"
parameters:
  DATA_PATH: "/data/processed"
  MODEL_TYPE: "random_forest"
  EPOCHS: "100"
  BATCH_SIZE: "32"
  OUTPUT_PATH: "/models/latest"
  LOG_LEVEL: "INFO"