Retry Policies

Retries are opt-in per command task and help absorb transient failures (network blips, flaky registries, temporary service startup races).

Basic Retry

tasks:
  publish:
    desc: Push release artifact
    cmd: ./scripts/publish.sh
    retry: 3

retry: 3 allows up to 3 retries after the first failed run, so the command runs at most 4 times.

Retry Delay

Set retry_delay to wait a fixed interval between attempts:

tasks:
  publish:
    desc: Push release artifact
    cmd: ./scripts/publish.sh
    retry: 3
    retry_delay: 2s

Backoff Strategy

retry_backoff supports:

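fixed (default): wait retry_delay before every retry
exponential: double the wait after each failed retry, starting from retry_delay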

tasks:
  publish:
    desc: Push release artifact
    cmd: ./scripts/publish.sh
    retry: 4
    retry_delay: 1s
    retry_backoff: exponential

With exponential backoff and a 1s base delay, the waits before the four retries are 1s, 2s, 4s, and 8s.
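The schedule above follows from doubling the base delay after each failed retry. The following is a minimal Python sketch of that arithmetic, assuming the wait before retry n (counting from 0) is retry_delay * 2**n. The function name `retry_delays` is hypothetical, and this is an illustration rather than qp's actual implementation.

```python
def retry_delays(base_seconds, retries, backoff="fixed"):
    """Return the wait (in seconds) before each retry.

    Illustrative only: assumes exponential backoff doubles the base
    delay after every failed retry, matching the 1s, 2s, 4s, 8s
    sequence described above.
    """
    if backoff == "fixed":
        return [base_seconds] * retries
    if backoff == "exponential":
        return [base_seconds * 2 ** n for n in range(retries)]
    raise ValueError(f"unsupported backoff: {backoff!r}")


if __name__ == "__main__":
    waits = retry_delays(1, 4, "exponential")
    print(waits)       # [1, 2, 4, 8]
    print(sum(waits))  # 15
```

Summing the list gives the worst-case time spent waiting (15 seconds here, on top of the run time of the attempts themselves), which is a quick way to check a retry budget before committing it.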

Conditional Retry With `retry_on`

By default, retry_on is any, so any failed attempt is eligible for a retry. To retry only on specific failures, list the conditions:

tasks:
  publish:
    desc: Push release artifact
    cmd: ./scripts/publish.sh
    retry: 3
    retry_on:
      - exit_code:1
      - stderr_contains:connection reset

Supported condition forms:

any
exit_code:<n>
stderr_contains:<text>
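To make the condition forms concrete, here is a rough Python sketch of how a list of retry_on entries could be evaluated against a failed attempt. It assumes that conditions are OR-ed (a single match is enough to retry) and that stderr_contains is a case-sensitive substring match. Neither detail is confirmed above, and `should_retry` is a hypothetical helper, not part of qp.

```python
def should_retry(conditions, exit_code, stderr):
    """Return True if any retry_on condition matches a failed attempt.

    Assumption: conditions are OR-ed and substring matching is
    case-sensitive. This models the documented forms only.
    """
    for cond in conditions:
        if cond == "any":
            return True
        # Split on the first colon only, so the text may contain colons.
        kind, _, value = cond.partition(":")
        if kind == "exit_code":
            if value.strip().isdigit() and int(value) == exit_code:
                return True
        elif kind == "stderr_contains":
            if value and value in stderr:
                return True
    return False


if __name__ == "__main__":
    conds = ["exit_code:1", "stderr_contains:connection reset"]
    print(should_retry(conds, 1, ""))                                    # True
    print(should_retry(conds, 2, "read tcp: connection reset by peer"))  # True
    print(should_retry(conds, 2, "permission denied"))                   # False
```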

Worked Example: Flaky Integration Test

tasks:
  test-integration:
    desc: Run integration tests against ephemeral db
    cmd: go test -tags=integration ./...
    retry: 2
    retry_delay: 3s
    retry_backoff: fixed
    retry_on:
      - stderr_contains:connection refused
      - stderr_contains:context deadline exceeded

This retries only when stderr shows transient DB startup symptoms; other failures, such as a genuine test assertion failure, are not retried.

Retry Events

When --events is enabled, retry attempts emit structured retry events (attempt number, max attempts, reason, delay).

qp test-integration --events 2>events.jsonl

This makes retry behavior observable in CI logs and downstream tooling.
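Downstream tooling can filter the captured file for retry events. The sketch below is illustrative Python that assumes each event is one JSON object per line and that retry events carry a `type` field equal to "retry". That field name and value are hypothetical placeholders, since the event schema is not specified here, so adjust them to match the real output.

```python
import json
import os


def retry_events(lines, type_field="type", retry_value="retry"):
    """Yield parsed events whose (assumed) type field marks a retry.

    type_field and retry_value are placeholders, not a documented schema.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # the redirected stream may also contain non-JSON text
        if isinstance(event, dict) and event.get(type_field) == retry_value:
            yield event


if __name__ == "__main__":
    # Reads the file produced by the redirect shown above, if present.
    if os.path.exists("events.jsonl"):
        with open("events.jsonl", encoding="utf-8") as fh:
            for event in retry_events(fh):
                print(event)
```

Skipping lines that fail to parse keeps the filter usable even if the command's own stderr output ends up in the same file.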

Retry And Task Outcome

If any attempt passes, the task status is pass.
If all attempts fail, the final status is fail, timeout, or cancelled, matching the outcome of the last attempt.
Retry logic only applies to command tasks.
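These outcome rules can be summarized as a small loop. The Python sketch below is a simplified model, not qp's code: `run_once` is a hypothetical callable that returns the status string of a single attempt, every non-pass outcome is assumed to be retried, and delays and retry_on filtering are left out.

```python
def run_with_retries(run_once, retries):
    """Run up to 1 + retries attempts; return (final_status, attempts_used).

    Simplified model: stops at the first "pass", otherwise reports the
    status of the last attempt.
    """
    total = 1 + max(retries, 0)
    status = None
    used = 0
    for _ in range(total):
        status = run_once()
        used += 1
        if status == "pass":
            break
    return status, used


if __name__ == "__main__":
    outcomes = iter(["fail", "timeout", "pass"])
    print(run_with_retries(lambda: next(outcomes), 3))  # ('pass', 3)

    outcomes = iter(["fail", "fail", "timeout"])
    print(run_with_retries(lambda: next(outcomes), 2))  # ('timeout', 3)
```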

Practical Guidance

Use retries for external/transient dependencies, not for deterministic lint/test failures.
Pair retries with targeted retry_on filters.
Keep retry counts modest to avoid long hidden loops.
Prefer fixing root causes over large retry budgets.

Next Step

For full machine-readable observability of task and pipeline execution, continue to Events and JSON Output.