Getting Started

This guide takes you from zero to a running Fiber instance with your first workflow in about 10 minutes.

Prerequisites

  • Python 3.12+
  • A running Temporal server (or temporal server start-dev for local dev)
  • PostgreSQL (optional — SQLite works for dev)
  • Redis (optional — needed for rate limiting, locks, and the background worker)

Install

pip install fiber-orchestrator
# Or from source:
pip install -e ".[dev]"

Configure

cp .env.example .env

Set these at minimum:

TEMPORAL_SERVER_URL=localhost:7233
INTEGRATION_SECRET_KEY=$(python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')

For production, also set DATABASE_URL (PostgreSQL) and REDIS_URL.

Seed your integrations

Integration config is encrypted at rest in the database. Seed from a JSON file or via API:

# From file
fiber seed-integrations integrations.json

# Or via API (after the gateway is running)
curl -X PUT localhost:8000/integrations/github \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"token": "ghp_...", "org": "myorg", "webhook_secret": "..."}'

Your first workflow

Create a rules file:

# rules.d/my-first-rule.yaml
rules:
  - name: pr-notify
    on: github.pr.opened
    do:
      - plane.comment: "PR opened: [{pr.title}]({pr.url}) by @{sender}"
      - plane.state: "In Review"

Validate

fiber validate rules.d/my-first-rule.yaml

Test with dry-run

fiber serve  # start the gateway

curl -X POST localhost:8000/dry-run -d '{
  "event": "github.pr.opened",
  "vars": {
    "pr.title": "Fix login bug",
    "pr.url": "https://github.com/myorg/myapp/pull/42",
    "sender": "alice"
  },
  "explain": true
}'

The response shows which workflows matched and why, without executing anything.

Go live

Point your GitHub webhook at https://your-host/webhook/github with the same secret you configured. Open a PR — Fiber handles the rest.


YAML DSL Reference

Complete reference for Fiber's workflow definition language. Files are loaded from rules.yaml and rules.d/*.yaml.

Top-level structure

defaults:          # key-value pairs available as {key} in all templates
  matrix.room: "!abc:example.com"
  llm.model: "claude-sonnet-4-6"

rules:             # list of workflow definitions
  - name: ...

tools:             # custom shell tools
  fetch-alerts:
    command: "curl -s $ALERTMANAGER_URL/api/v2/alerts"
    timeout: 15
    env: [ALERTMANAGER_URL]

Workflow definition

Every workflow has four layers, plus optional modifiers:

- name: deploy-pipeline           # REQUIRED: unique identifier

  # ── TRIGGER LAYER ──
  on: plane.label_added            # event type(s) — string or list
  if:                              # ALL conditions must match (AND)
    label: deploy
    state: { not: done }
  any_of:                          # at least one group must match (OR of ANDs)
    - { severity: critical }
    - { severity: warning, alert.name: { contains: OOM } }

  # ── PREPARATION LAYER ──
  vars:                            # computed variables, resolved once before steps
    branch: "{pr.branch}"
    tag: "v{version}-{pr.number}"

  # ── EXECUTION LAYER ──
  bail: true                       # stop on first step failure (default: false)
  do:                              # sequential step list
    - shell:lint: null
    - agent: deploy
      wait: true
  finally:                         # always runs, even after bail
    - matrix.send: "Pipeline {_status}"

  # ── VERIFICATION LAYER ──
  verify:
    on: github.check.success       # event to wait for
    if: { check.name: CI }         # conditions on that event
    within: 30m                    # deadline
    else:                          # steps if deadline passes
      - plane.state: "Failed"

  # ── MODIFIERS ──
  cooldown: 300                    # seconds between fires
  schedule: 1h                     # periodic trigger
  strict: true                     # fail on missing template vars

Event types

Source        Events
Plane         plane.issue.created, plane.issue.updated, plane.label_added, plane.label_removed, plane.state_changed
GitHub        github.pr.opened, github.pr.closed, github.pr.merged, github.pr.reopened, github.review.approved, github.review.changes_requested, github.check.success, github.check.failure
Alertmanager  alertmanager.firing, alertmanager.resolved
Matrix        matrix.message
MQTT          Events emitted per-topic with mqtt.* vars
Engine        outcome.met, outcome.unmet, action.<type>, invoke, schedule

Multiple triggers: on: [plane.label_added, plane.issue.created]

Condition operators

state: "In Review"                          # equality (case-insensitive)
severity: { in: [critical, warning] }       # membership
state: { not: Done }                        # negation
issue.text: { contains: deploy }            # substring
issue.text: { matches: "(?i)deploy" }       # regex
alert.name: { startswith: Database }        # prefix
repo: { endswith: "-api" }                  # suffix
pr.additions: { gt: 500 }                   # numeric: gt, gte, lt, lte, eq
pr.url: { exists: true }                    # non-empty check
description: { empty: false }               # emptiness check

Combination logic: keys under if: are ANDed. any_of: is a list of condition groups; keys within each group are ANDed, and the groups are ORed. When both are present, a rule matches only if the if: block and at least one any_of: group both match.
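
To make the matching semantics concrete, here is a Python sketch of the operator table and the if/any_of combination (illustrative only; the real engine's implementation may differ):

```python
import re

def check(cond, actual):
    """Evaluate one condition value against an event field."""
    if isinstance(cond, dict):
        op, arg = next(iter(cond.items()))
        if op == "in":         return str(actual).lower() in [str(a).lower() for a in arg]
        if op == "not":        return str(actual).lower() != str(arg).lower()
        if op == "contains":   return str(arg) in str(actual)
        if op == "matches":    return re.search(str(arg), str(actual)) is not None
        if op == "startswith": return str(actual).startswith(str(arg))
        if op == "endswith":   return str(actual).endswith(str(arg))
        if op in ("gt", "gte", "lt", "lte", "eq"):
            a, b = float(actual), float(arg)
            return {"gt": a > b, "gte": a >= b, "lt": a < b, "lte": a <= b, "eq": a == b}[op]
        if op == "exists":     return bool(actual) == arg
        if op == "empty":      return (not actual) == arg
        raise ValueError(f"unknown operator: {op}")
    # Bare value = case-insensitive equality.
    return str(cond).lower() == str(actual).lower()

def matches(rule_if, any_of, event):
    """if: keys are ANDed; any_of groups are ORed; both must pass when present."""
    if_ok = all(check(v, event.get(k)) for k, v in (rule_if or {}).items())
    any_ok = any(all(check(v, event.get(k)) for k, v in group.items())
                 for group in any_of) if any_of else True
    return if_ok and any_ok
```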

Actions

Action              Args                                                Description
plane.state         "In Review"                                         Change issue state
plane.comment       "text with {vars}"                                  Comment on issue
plane.label.add     "incident"                                          Add label
plane.label.remove  "in-progress"                                       Remove label
plane.issue         {title, description?, labels?, state?, priority?}   Create issue
github.comment      "body" or {body, repo?, number?}                    Comment on PR/issue
github.status       {state, context?, description?}                     Set commit status
github.label.add    "label" or {labels, repo?}                          Add labels
github.merge        "squash" or {method?, title?}                       Merge PR
github.issue        {title, body?, labels?}                             Create issue
matrix.send         "message" or {room?, body}                          Send chat message
llm                 "prompt" or {prompt, model?, system?}               LLM completion
agent               "type" or {type, presets?, wait?}                   Claude agent dispatch
alert               "summary" or {name?, summary, severity?}            Push alert
http                "url" or {url, method?, headers?}                   HTTP request
shell               "command"                                           Shell command
shell:<tool>        {arg: value}                                        Named tool from tools:
run                 "workflow-name"                                     Invoke another workflow

Step modifiers

- plane.comment: "PR opened: {pr.title}"
  as: comment_result                    # name the output for later steps
  wait: true                            # block until complete (for agents)
  if: { state: { not: done } }          # per-step condition
  any_of:                               # per-step OR conditions
    - { label: deploy }
    - { label: staging }
  retry: 2                              # retry count (exponential backoff)

Template variables

Source        Variables
Plane         {event}, {issue.id}, {issue.title}, {issue.text}, {labels}, {label}, {state}, {old_state}
GitHub        {pr.url}, {pr.title}, {pr.number}, {pr.branch}, {repo}, {sender}, {check.name}, {check.conclusion}
Alertmanager  {severity}, {alert.name}, {alert.namespace}, {summary}, {description}, {fingerprint}
Matrix        {room_id}, {sender}, {body}
MQTT          {mqtt.topic}, {mqtt.<field>} (auto-flattened JSON)
Engine        {_prev}, {_status}, {_failed_step}, {_failure_reason}
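
A sketch of how {dotted.placeholder} substitution could work over a flat variable map, including the strict: behavior that fails on missing variables (illustrative; render is a hypothetical name, not Fiber's code):

```python
import re

def render(template: str, vars: dict, strict: bool = False) -> str:
    """Fill {dotted.placeholders} from a flat variable map.
    With strict=True a missing variable raises; otherwise the
    unresolved placeholder is left in place."""
    def sub(match):
        key = match.group(1)
        if key in vars:
            return str(vars[key])
        if strict:
            raise KeyError(f"missing template variable: {key}")
        return match.group(0)
    return re.sub(r"\{([A-Za-z_][\w.]*)\}", sub, template)
```

For example, rendering the Getting Started rule's comment template with the dry-run vars yields `PR opened: [Fix login bug](https://github.com/myorg/myapp/pull/42) by @alice`.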

Named step output

do:
  - llm: "Classify: {alert.name}"
    as: classify
  - plane.comment: "Auto-triage: {classify}"
  - plane.issue:
      title: "[ALERT] {alert.name}"
    as: new_issue
  - matrix.send: "Created issue {new_issue.issue_id}"

DAG execution

dag: is an alternative to do: for workflows with parallel branches; the two are mutually exclusive.

dag:
  start:
    - plane.comment: "Starting..."
      then: [lint, test]        # fan-out to parallel nodes
  lint:
    - shell:lint: null
      then: [gate]
  test:
    - shell:test: null
      then: [gate]
  gate:
    - join: all                 # "all" = wait for all parents, "any" = first wins
      then: [deploy]
  deploy:
    - agent: deploy
      wait: true
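
For intuition, here is a minimal sequential interpreter for this dag: shape (a sketch with hypothetical names; the real engine runs branches in parallel via Temporal):

```python
def run_dag(dag: dict, execute) -> list:
    """Run nodes once their parents are done: all of them for join: all
    (the default), any one for join: any. Steps carrying `join:` are
    control nodes and execute nothing themselves; `then:` lists fan out
    to child nodes. `execute(node, step)` is supplied by the caller."""
    parents = {node: set() for node in dag}
    for node, steps in dag.items():
        for step in steps:
            for child in step.get("then", []):
                parents[child].add(node)

    done, order, progressed = set(), [], True
    while progressed and len(done) < len(dag):
        progressed = False
        for node, steps in dag.items():
            if node in done:
                continue
            need, join = parents[node], steps[0].get("join", "all")
            ready = bool(need & done) if (need and join == "any") else not (need - done)
            if not ready:
                continue
            for step in steps:
                if "join" not in step:
                    execute(node, step)
            done.add(node)
            order.append(node)
            progressed = True
    return order
```

Running it over the example above executes start, then lint and test, and only then deploy, with gate acting purely as a synchronization point.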

Verification

Async verification — wait for a confirming event or escalate:

verify:
  on: github.check.success       # event type to wait for
  if: { check.name: deploy }     # conditions on that event
  within: 30m                    # deadline
  else:                          # steps if deadline passes
    - plane.state: "Escalated"
    - alert: { severity: critical, summary: "Not verified in 30m" }

Workflow composition

Workflows invoke other workflows by name. Every step emits a synthetic event that other workflows can match. Max chain depth: 3.

# Parent workflow
- name: critical-alert
  on: alertmanager.firing
  if: { severity: critical }
  do:
    - run: notify-ops       # invoke child workflow

# Child workflow
- name: notify-ops
  on: invoke
  do:
    - matrix.send: "Critical: {alert.name}"

Duration strings

Accepted for cooldown:, schedule:, and verify.within:

7d        = 604800s
2h30m     = 9000s
30m       = 1800s
45s       = 45s
1800      = 1800s  (bare int = seconds)
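
A small parser matching these formats might look like this (a sketch; parse_duration is a hypothetical name):

```python
import re

def parse_duration(value) -> int:
    """Convert "7d", "2h30m", "45s", etc. to seconds.
    Bare integers (or digit-only strings) pass through as seconds."""
    if isinstance(value, int) or (isinstance(value, str) and value.isdigit()):
        return int(value)
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([dhms])", value)
    # Reject strings with leftover characters, e.g. "2x30m".
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"bad duration: {value!r}")
    return sum(int(n) * units[u] for n, u in parts)
```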

API Reference

Management endpoints require a Keycloak JWT with cikut_admin role. Webhook endpoints use connector-specific HMAC/bearer verification.

Webhook receivers

Endpoint                    Auth          Source
POST /webhook               HMAC-SHA256   Plane
POST /webhook/github        HMAC-SHA256   GitHub
POST /webhook/alertmanager  Bearer token  Alertmanager
POST /webhook/matrix        Bearer token  Matrix
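
HMAC-SHA256 verification follows the familiar GitHub webhook scheme: the receiver recomputes the digest over the raw request body with the shared secret and compares it in constant time. A sketch:

```python
import hashlib
import hmac

def verify_github_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check a GitHub-style X-Hub-Signature-256 header ("sha256=<hexdigest>").
    hmac.compare_digest avoids leaking the match position via timing."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```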

Direct invocation

Endpoint           Purpose
POST /run          Dispatch agent with freeform input or issue ID
POST /code         Run Claude Code on a repository
POST /llm          LLM completion via Bifrost
POST /matrix/send  Send Matrix message

Workflow management

Endpoint              Purpose
GET /workflows        List all named workflows
POST /dry-run         Simulate event with match explanation
POST /fire            Fire a synthetic event through the engine
POST /triggers/sync   Force-sync config to Plane
GET /triggers/health  Validate all references
GET /describe         Full system introspection

Dry-run example

curl -X POST /dry-run -d '{
  "event": "alertmanager.firing",
  "vars": {"severity": "info", "alert.name": "HighCPU"},
  "explain": true
}'

# Response:
{
  "matched_workflows": 0,
  "total_evaluated": 3,
  "results": [{
    "workflow": "alert-response",
    "matched": false,
    "conditions": [
      {"key": "severity", "expected": "critical", "actual": "info", "matched": false}
    ],
    "reason": "no any_of branch matched"
  }]
}

Integrations API

Connector config is encrypted at rest. Secrets are never returned by the API.

Endpoint                          Purpose
GET /integrations                 List all (metadata only)
GET /integrations/{connector}     Single integration metadata
PUT /integrations/{connector}     Create or update (encrypted at rest)
DELETE /integrations/{connector}  Remove integration

PUT and DELETE hot-reload the connector bus — no restart required.

Connector config format

{
  "plane":        {"base_url": "", "api_key": "", "webhook_secret": "",
                   "workspace_slug": "", "project_id": ""},
  "github":       {"token": "", "org": "", "webhook_secret": ""},
  "alertmanager": {"bearer_secret": "", "url": ""},
  "matrix":       {"homeserver": "", "access_token": "", "bot_user": "",
                   "webhook_secret": ""},
  "llm":          {"bifrost_url": "", "bifrost_api_key": "",
                   "default_model": ""},
  "grafana":      {"base_url": "", "api_key": ""},
  "mqtt":         {"broker": "", "username": "", "password": "",
                   "subscriptions": []}
}

Observability

Endpoint         Purpose
GET /healthz     Liveness (no auth)
GET /metrics     Prometheus metrics
GET /executions  Execution history (filter by issue, type, workflow, status)
GET /outcomes    Outcome success rates

Prometheus metrics

fiber_webhooks_received_total{source, event, action}
fiber_rules_matched_total{event, action_type}
fiber_actions_executed_total{action_type, status}
fiber_agents_dispatched_total{agent}
fiber_agents_completed_total{agent, status}
fiber_agent_duration_seconds{agent}
fiber_agents_running{agent}
fiber_workflows_started_total{workflow}
fiber_workflows_completed_total{workflow, status}
fiber_rate_limited_total{tenant}
fiber_dead_letters_retried_total{status}

Distributed tracing

Every request gets a trace ID (from X-Request-ID or auto-generated). Propagates through gateway, engine, worker, and agent subprocesses. Stored on execution rows for correlation.

Tenants API

Endpoint                               Purpose
GET /tenants                           List all tenants
POST /tenants                          Create a new tenant
PUT /tenants/{slug}/workflows          Replace tenant workflows
POST /tenants/{slug}/workflows/append  Append a workflow rule

Agents

Fiber includes an agent fabric for managed AI agent runtimes with workspace isolation and timeout budgets. Agents are dispatched as workflow steps and operate on Plane issues.

Agent types

Agent       Trigger          What it does
deploy      label:deploy     Scaffold repo, Helm chart, DNS, CI/CD
code        label:code       Implement features, write tests, create PR
review      state:In Review  Review quality, security, test coverage
kaizen      label:kaizen     Analyze tech debt, suggest improvements
triage      issue created    Classify and route issues
sre         alert firing     Investigate infrastructure incidents
security    PR opened        Scan for vulnerabilities
docs        label:docs       Generate or update documentation
product     label:product    Product requirements analysis
onboarding  pulse            User onboarding flows
finance     pulse            Financial data analysis
growth      pulse            Growth metrics and experiments

Dispatching agents

# Simple dispatch
do:
  - agent: deploy

# With options
do:
  - agent:
      type: deploy
      presets: [testing, security]
      wait: true
      context: "Additional context for the agent"

# Dynamic dispatch from event variable
do:
  - agent: "{label}"

# Dynamic dispatch from LLM classification
do:
  - llm: "Classify this issue. Return one of: code, deploy, sre"
    as: triage_result
  - agent: "{triage_result.agent}"

Chaining

Each agent sees the output of all prior runs on the same issue. Agents chain through sequential do: steps with wait: true:

- name: implement-pipeline
  on: plane.label_added
  if: { label: code }
  bail: true
  do:
    - agent: code
      wait: true
    - agent: security
      wait: true
    - agent: review
      wait: true
  verify:
    on: github.check.success
    if: { check.name: CI }
    within: 30m
  finally:
    - matrix.send: "Pipeline {_status} for {issue.title}"

GitHub context

When agents run on issues linked to GitHub PRs, they automatically receive PR diffs, review comments, and CI logs. This context is injected into the agent's prompt.

Presets

Reusable instruction fragments:

Preset             Purpose
testing            Test-writing conventions
security           Security review checklist
docs               Documentation standards
commit-convention  Commit message format
style-python       Python code style
style-typescript   TypeScript code style

Agent fabric internals

The fabric manages the full lifecycle:

  1. Spec — Load agent configuration (model, workspace, budget, context)
  2. Workspace — Create isolated environment (tmpdir or git-worktree)
  3. Enrich — Inject issue context, prior run outputs, GitHub data
  4. Execute — Run agent subprocess with timeout budget
  5. Parse — Extract results from agent output
  6. Comment — Post results back to the Plane issue

Timeout budget is controlled by AGENT_TIMEOUT (default: 600 seconds).


MCP Server

Fiber exposes its capabilities as an MCP (Model Context Protocol) server, making all operations available as Claude-native tools. Any MCP client — Claude Code, OpenClaw, or custom agents — gets full orchestrator access.

Setup

Add to Claude Code settings:

// ~/.claude/settings.json
{
  "mcpServers": {
    "fiber": {
      "command": "fiber-mcp"
    }
  }
}

Available tools

Agent dispatch

  • run_agent — Dispatch an agent with freeform input
  • run_agent_for_issue — Dispatch an agent for a specific Plane issue
  • invoke_code — Run Claude Code on a repository
  • llm — LLM completion via Bifrost

Communication

  • send_matrix_message — Send to a Matrix room
  • list_matrix_rooms — List joined rooms
  • push_alert — Push to Alertmanager

Workflow management

  • list_workflows — List all named workflows
  • sync_workflows — Force-sync to Plane
  • workflows_health — Validate references
  • update_workflows — Replace tenant workflows
  • append_workflow — Add a single rule

Observability

  • list_executions — Query execution history
  • get_execution — Execution details
  • get_outcomes — Outcome success rates
  • describe — Full system introspection
  • fire_event — Fire a synthetic event
  • dry_run — Simulate event processing

Tenants

  • list_tenants — List all tenants
  • create_tenant — Create a new tenant

Operations Manual

How to run Fiber in production — the components, the state machine, triggering work, monitoring, and troubleshooting.

Running Fiber

Three components are required:

  1. Temporal server — durable workflow execution
  2. Fiber gateway — receives webhooks, matches rules
  3. Temporal worker — executes activities (agents, connectors)

# Verify Temporal is running
temporal operator cluster health

# Start the gateway + worker
fiber serve

In production, these are separate Kubernetes deployments managed by a Helm chart.

Triggering work

Via Plane (recommended): Create an issue, add the auto label. The system triages and dispatches automatically.

Via specific label: code, deploy, review, kaizen — skip triage, run the agent directly.

Via Matrix: Send @bot fix {description} — an issue is created and the pipeline runs.

State machine

Every issue follows the same state progression:

Backlog -> Triage -> In Progress -> In Review -> Done
                      |              |              |
                   classifies     code/deploy    review +
                   + labels       agent works    CI runs
                                  creates PR
                      ^              ^              |
                      |              +-- changes ---+
                      |                  requested
                      +--- fail / timeout ----------+
                                                    |
                                               Cancelled
                                             (PR closed unmerged)

Every failure returns to Triage for human visibility. No issue silently stalls.

The standard path

Issue created in Plane
  -> Auto-triage: LLM classifies type, priority, labels
  -> Label "auto" added
  -> LLM classifies: code / deploy / skip
  -> Code agent implements -> PR created
  -> Review agent reviews -> PR approved or changes requested
  -> CI passes + merge -> Done
  -> CI fails -> auto-fix -> verify within 30m -> Triage if still failing

Alert-triggered path

Alertmanager fires (critical/warning)
  -> Plane issue created: [ALERT] {name}
  -> Code agent investigates -> PR with fix
  -> Review agent validates
  -> Alert resolves within 1h -> Done
  -> Still firing -> labeled "escalate"

Troubleshooting

Agent fails

  1. Issue moves back to "Triage" automatically
  2. Agent output posted as comment on the issue
  3. Matrix notification sent
  4. Check the Plane issue comment for error output
  5. Fix the problem, re-add the auto label

CI doesn't pass within 30 minutes

  1. Issue moves to "Triage" with a comment
  2. Check the PR — the code agent may have introduced a bug
  3. Fix manually or re-trigger with the auto label

Agent stuck (running too long)

  • Default timeout: 10 minutes per agent
  • Temporal kills the activity on timeout
  • Check Temporal UI for workflow state
  • Workspace kept on failure for debugging, cleaned on success

Security model

  • Agents run as sandboxed subprocesses
  • Agents never receive database credentials or cluster admin tokens
  • Secrets passed via environment variables, never inlined
  • Agents create PRs — they never push to main directly
  • Deploy agent never runs kubectl apply — ArgoCD manages deployments
  • Distributed locks prevent concurrent agents on the same issue

Configuration

Only infrastructure settings are environment variables. All connector-specific config lives in the encrypted integrations DB table.

Environment variables

Variable                Default                  Purpose
INTEGRATION_SECRET_KEY  (required)               Fernet key for encrypting integration config
DATABASE_URL            sqlite:///data/fiber.db  Database (PostgreSQL recommended)
REDIS_URL               redis://localhost:6379   Locks, dedup, rate limiting, job queue
TEMPORAL_SERVER_URL     (required)               Temporal server for durable execution
ANTHROPIC_API_KEY                                API key for Claude agents
AGENT_TIMEOUT           600                      Max seconds per agent run
RETENTION_DAYS          90                       Data retention period

Encryption

Integration config uses envelope encryption: a per-row DEK encrypts the JSON config blob, and the DEK is wrapped under the KEK (INTEGRATION_SECRET_KEY). Key rotation is online — fiber rotate-key re-wraps DEKs without touching ciphertext.

# Generate a new encryption key
python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())'
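
The envelope scheme can be sketched with two Fernet layers (illustrative only; wrap, unwrap, and rotate_kek are hypothetical names, not Fiber's API):

```python
from cryptography.fernet import Fernet

def wrap(kek: bytes, plaintext: bytes) -> tuple[bytes, bytes]:
    """A fresh per-row DEK encrypts the config blob; the DEK itself
    is then encrypted ("wrapped") under the KEK."""
    dek = Fernet.generate_key()
    ciphertext = Fernet(dek).encrypt(plaintext)
    wrapped_dek = Fernet(kek).encrypt(dek)
    return wrapped_dek, ciphertext

def unwrap(kek: bytes, wrapped_dek: bytes, ciphertext: bytes) -> bytes:
    dek = Fernet(kek).decrypt(wrapped_dek)
    return Fernet(dek).decrypt(ciphertext)

def rotate_kek(old_kek: bytes, new_kek: bytes, wrapped_dek: bytes) -> bytes:
    """Rotation only re-wraps the DEK; the config ciphertext is untouched."""
    dek = Fernet(old_kek).decrypt(wrapped_dek)
    return Fernet(new_kek).encrypt(dek)
```

This is why rotation is cheap: each row stores (wrapped_dek, ciphertext), and only the small wrapped key is rewritten.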

Database migrations

Schema is managed by Alembic. In production (K8s), a PreSync Helm Job runs fiber migrate before gateway/worker pods start.

# Run migrations manually
DATABASE_URL=postgresql://... fiber migrate

# Create a new migration
PYTHONPATH=src alembic revision -m "description"

Architecture

Fiber is an event-driven system built on FastAPI, Temporal, and a connector bus pattern.

Data flow

webhook -> connector (verify + normalize)  \
                                            -> engine (match) -> Temporal workflow -> connector (action) -> agents
stream  -> connector (subscribe + emit)    /

Execution layer

Durable workflow execution via Temporal. The YAML DSL compiles to Temporal workflow inputs:

  • Compiler — Converts parsed YAML workflows + events into WorkflowInput
  • Workflow — Generic Temporal workflow that interprets compiled YAML. Handles linear, DAG, bail, finally, and verify
  • Activities — Thin wrappers: step execution (connectors), agent runs (fabric), synthetic event matching

Connector bus

Registry and dispatcher built from DB at startup. Each connector consolidates inbound (webhook or stream), outbound (action), and schema into a single class.

Adding a new connector: Create a class subclassing Connector, register it, then PUT /integrations/{name} with its config. For webhooks, implement webhook_path() + handle_webhook(). For streams, override subscribe(emit). A connector can implement both.

Multi-tenancy

TenantContext carries pre-resolved runtime state: settings, connector bus, session factory, workflows, defaults, and Temporal client. TenantStore caches contexts. Default tenant auto-reloads workflows on file change; DB tenants build their connector bus from the integrations table.

Background worker

ARQ-based async worker handles cron jobs:

  • run_scheduled_workflows_job (every 60s) — fire interval-based workflows via Temporal
  • retry_dead_letters_job (every 2m) — auto-retry failed webhooks with exponential backoff (max 5 attempts)
  • cleanup_old_data_job (daily 03:00) — delete records older than RETENTION_DAYS
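
The dead-letter backoff can be pictured as a doubling delay series. The 120-second base below is an assumption for illustration; only the exponential doubling and the 5-attempt cap come from the description above:

```python
def backoff_schedule(base: int = 120, max_attempts: int = 5) -> list[int]:
    """Delays in seconds before each retry attempt, doubling every time.
    base=120 is an assumed starting interval, not a documented value."""
    return [base * 2 ** attempt for attempt in range(max_attempts)]
```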

Need help? Check the troubleshooting guide or open an issue.