Getting Started
This guide takes you from zero to a running Fiber instance with your first workflow in about 10 minutes.
Prerequisites
- Python 3.12+
- A running Temporal server (or `temporal server start-dev` for local dev)
- PostgreSQL (optional — SQLite works for dev)
- Redis (optional — needed for rate limiting, locks, and the background worker)
Install
pip install fiber-orchestrator
# Or from source:
pip install -e ".[dev]"
Configure
cp .env.example .env
Set these at minimum:
TEMPORAL_SERVER_URL=localhost:7233
INTEGRATION_SECRET_KEY=$(python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')
For production, also set DATABASE_URL (PostgreSQL) and REDIS_URL.
Seed your integrations
Integration config is encrypted at rest in the database. Seed from a JSON file or via API:
# From file
fiber seed-integrations integrations.json
# Or via API (after the gateway is running)
curl -X PUT localhost:8000/integrations/github \
-H "Authorization: Bearer $TOKEN" \
-d '{"token": "ghp_...", "org": "myorg", "webhook_secret": "..."}'
Your first workflow
Create a rules file:
# rules.d/my-first-rule.yaml
rules:
- name: pr-notify
on: github.pr.opened
do:
- plane.comment: "PR opened: [{pr.title}]({pr.url}) by @{sender}"
- plane.state: "In Review"
Validate
fiber validate rules.d/my-first-rule.yaml
Test with dry-run
fiber serve # start the gateway
curl -X POST localhost:8000/dry-run -d '{
"event": "github.pr.opened",
"vars": {
"pr.title": "Fix login bug",
"pr.url": "https://github.com/myorg/myapp/pull/42",
"sender": "alice"
},
"explain": true
}'
The response shows which workflows matched and why, without executing anything.
Go live
Point your GitHub webhook at https://your-host/webhook/github with the same secret you configured. Open a PR — Fiber handles the rest.
YAML DSL Reference
Complete reference for Fiber's workflow definition language. Files are loaded from routes.yaml and routes.d/*.yaml.
Top-level structure
defaults: # key-value pairs available as {key} in all templates
matrix.room: "!abc:example.com"
llm.model: "claude-sonnet-4-6"
rules: # list of workflow definitions
- name: ...
tools: # custom shell tools
fetch-alerts:
command: "curl -s $ALERTMANAGER_URL/api/v2/alerts"
timeout: 15
env: [ALERTMANAGER_URL]
Workflow definition
Every workflow has four layers:
- name: deploy-pipeline # REQUIRED: unique identifier
# ── TRIGGER LAYER ──
on: plane.label_added # event type(s) — string or list
if: # ALL conditions must match (AND)
label: deploy
state: { not: done }
any_of: # at least one group must match (OR of ANDs)
- { severity: critical }
- { severity: warning, alert.name: { contains: OOM } }
# ── PREPARATION LAYER ──
vars: # computed variables, resolved once before steps
branch: "{pr.branch}"
tag: "v{version}-{pr.number}"
# ── EXECUTION LAYER ──
bail: true # stop on first step failure (default: false)
do: # sequential step list
- shell:lint: null
- agent: deploy
wait: true
finally: # always runs, even after bail
- matrix.send: "Pipeline {_status}"
# ── VERIFICATION LAYER ──
verify:
on: github.check.success # event to wait for
if: { check.name: CI } # conditions on that event
within: 30m # deadline
else: # steps if deadline passes
- plane.state: "Failed"
# ── MODIFIERS ──
cooldown: 300 # seconds between fires
schedule: 1h # periodic trigger
strict: true # fail on missing template vars
Event types
| Source | Events |
|---|---|
| Plane | plane.issue.created, plane.issue.updated, plane.label_added, plane.label_removed, plane.state_changed |
| GitHub | github.pr.opened, github.pr.closed, github.pr.merged, github.pr.reopened, github.review.approved, github.review.changes_requested, github.check.success, github.check.failure |
| Alertmanager | alertmanager.firing, alertmanager.resolved |
| Matrix | matrix.message |
| MQTT | Events emitted per-topic with mqtt.* vars |
| Engine | outcome.met, outcome.unmet, action.<type>, invoke, schedule |
Multiple triggers: on: [plane.label_added, plane.issue.created]
Condition operators
state: "In Review" # equality (case-insensitive)
severity: { in: [critical, warning] } # membership
state: { not: Done } # negation
issue.text: { contains: deploy } # substring
issue.text: { matches: "(?i)deploy" } # regex
alert.name: { startswith: Database } # prefix
repo: { endswith: "-api" } # suffix
pr.additions: { gt: 500 } # numeric: gt, gte, lt, lte, eq
pr.url: { exists: true } # non-empty check
description: { empty: false } # emptiness check
Combination logic: keys under if: are ANDed. any_of: is a list of dicts; the keys within each dict are ANDed, and the groups are ORed together. When both are present, the rule fires only if the if: block AND at least one any_of: group match.
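As an illustration, the matching semantics can be sketched in Python. This is a simplified model for explanation, not Fiber's actual engine, and it covers only a subset of the operators above:

```python
def match_value(expected, actual):
    """Evaluate one condition value against an event field (illustrative subset)."""
    if isinstance(expected, dict):
        op, arg = next(iter(expected.items()))
        if op == "not":
            return not match_value(arg, actual)
        if op == "in":
            return actual in arg
        if op == "contains":
            return arg in (actual or "")
        if op == "gt":
            return float(actual) > float(arg)
        if op == "exists":
            return bool(actual) is arg
        raise ValueError(f"unknown operator: {op}")
    # Plain value: case-insensitive equality.
    return str(expected).lower() == str(actual).lower()

def matches(rule, event):
    """if: keys are ANDed; any_of: groups are ORed; both blocks must hold."""
    if_ok = all(match_value(v, event.get(k)) for k, v in rule.get("if", {}).items())
    groups = rule.get("any_of")
    any_ok = True if groups is None else any(
        all(match_value(v, event.get(k)) for k, v in group.items())
        for group in groups
    )
    return if_ok and any_ok
```

For example, a rule with `if: {label: deploy}` and two `any_of:` severity groups matches only when the label equals "deploy" (case-insensitively) and at least one severity group holds.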
Actions
| Action | Args | Description |
|---|---|---|
| `plane.state` | "In Review" | Change issue state |
| `plane.comment` | "text with {vars}" | Comment on issue |
| `plane.label.add` | "incident" | Add label |
| `plane.label.remove` | "in-progress" | Remove label |
| `plane.issue` | {title, description?, labels?, state?, priority?} | Create issue |
| `github.comment` | "body" or {body, repo?, number?} | Comment on PR/issue |
| `github.status` | {state, context?, description?} | Set commit status |
| `github.label.add` | "label" or {labels, repo?} | Add labels |
| `github.merge` | "squash" or {method?, title?} | Merge PR |
| `github.issue` | {title, body?, labels?} | Create issue |
| `matrix.send` | "message" or {room?, body} | Send chat message |
| `llm` | "prompt" or {prompt, model?, system?} | LLM completion |
| `agent` | "type" or {type, presets?, wait?} | Claude agent dispatch |
| `alert` | "summary" or {name?, summary, severity?} | Push alert |
| `http` | "url" or {url, method?, headers?} | HTTP request |
| `shell` | "command" | Shell command |
| `shell:<tool>` | {arg: value} | Named tool from tools: |
| `run` | "workflow-name" | Invoke another workflow |
Step modifiers
- plane.comment: "PR opened: {pr.title}"
as: comment_result # name the output for later steps
wait: true # block until complete (for agents)
if: { state: { not: done } } # per-step condition
any_of: # per-step OR conditions
- { label: deploy }
- { label: staging }
retry: 2 # retry count (exponential backoff)
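The retry: modifier's exponential backoff can be sketched as follows. This is an illustrative model only — Fiber's real retries run inside Temporal activities, and the base delay shown here is an assumption:

```python
import time

def run_with_retry(step, retries, base_delay=1.0, sleep=time.sleep):
    """Run a step, retrying up to `retries` extra times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return step()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: propagate the failure
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

So `retry: 2` means up to three attempts total, with the delay doubling between them.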
Template variables
| Source | Variables |
|---|---|
| Plane | {event}, {issue.id}, {issue.title}, {issue.text}, {labels}, {label}, {state}, {old_state} |
| GitHub | {pr.url}, {pr.title}, {pr.number}, {pr.branch}, {repo}, {sender}, {check.name}, {check.conclusion} |
| Alertmanager | {severity}, {alert.name}, {alert.namespace}, {summary}, {description}, {fingerprint} |
| Matrix | {room_id}, {sender}, {body} |
| MQTT | {mqtt.topic}, {mqtt.<field>} (auto-flattened JSON) |
| Engine | {_prev}, {_status}, {_failed_step}, {_failure_reason} |
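The MQTT "auto-flattened JSON" behavior can be illustrated with a small sketch — an assumption about the flattening described above, not Fiber's actual code:

```python
def flatten(payload, prefix="mqtt"):
    """Flatten nested JSON into dotted template vars, e.g. {mqtt.sensor.temp}."""
    out = {}
    for key, value in payload.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name))  # recurse into nested objects
        else:
            out[name] = value
    return out
```

A payload like `{"sensor": {"temp": 21.5}, "id": "dev1"}` would then be addressable in templates as `{mqtt.sensor.temp}` and `{mqtt.id}`.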
Named step output
do:
- llm: "Classify: {alert.name}"
as: classify
- plane.comment: "Auto-triage: {classify}"
- plane.issue:
title: "[ALERT] {alert.name}"
as: new_issue
- matrix.send: "Created issue {new_issue.issue_id}"
DAG execution
Alternative to do: for parallel workflows. do: and dag: are mutually exclusive.
dag:
start:
- plane.comment: "Starting..."
then: [lint, test] # fan-out to parallel nodes
lint:
- shell:lint: null
then: [gate]
test:
- shell:test: null
then: [gate]
gate:
- join: all # "all" = wait for all parents, "any" = first wins
then: [deploy]
deploy:
- agent: deploy
wait: true
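The fan-out/join scheduling above can be modeled as waves of parallel nodes. Here is a minimal sketch of the `join: all` semantics (illustrative, not the engine's implementation):

```python
from collections import defaultdict

def execution_order(dag):
    """Order DAG nodes into waves; `dag` maps node -> list of children (then:)."""
    parents = defaultdict(set)
    for node, children in dag.items():
        for child in children:
            parents[child].add(node)
    order, done = [], set()
    while len(done) < len(dag):
        # A node is ready once ALL of its parents have completed (join: all).
        ready = [n for n in dag if n not in done and parents[n] <= done]
        if not ready:
            raise ValueError("cycle detected")
        order.append(sorted(ready))  # nodes in one wave can run in parallel
        done.update(ready)
    return order
```

For the pipeline above this yields `[["start"], ["lint", "test"], ["gate"], ["deploy"]]`. With `join: any`, `gate` would instead be released as soon as the first of its parents completes.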
Verification
Async verification — wait for a confirming event or escalate:
verify:
on: github.check.success # event type to wait for
if: { check.name: deploy } # conditions on that event
within: 30m # deadline
else: # steps if deadline passes
- plane.state: "Escalated"
- alert: { severity: critical, summary: "Not verified in 30m" }
Workflow composition
Workflows invoke other workflows by name. Every step emits a synthetic event that other workflows can match. Max chain depth: 3.
# Parent workflow
- name: critical-alert
on: alertmanager.firing
if: { severity: critical }
do:
- run: notify-ops # invoke child workflow
# Child workflow
- name: notify-ops
on: invoke
do:
- matrix.send: "Critical: {alert.name}"
Duration strings
Accepted for cooldown:, schedule:, verify.within::
7d = 604800s
2h30m = 9000s
30m = 1800s
45s = 45s
1800 = 1800s (bare int = seconds)
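A parser matching these rules can be sketched as (illustrative, not Fiber's actual code):

```python
import re

def parse_duration(value):
    """Parse '7d', '2h30m', '45s', or a bare int into seconds."""
    if isinstance(value, int):
        return value  # bare int = seconds
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([dhms])", value)
    # Reject strings with leftover characters, e.g. "2x" or "h30m".
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"bad duration: {value!r}")
    return sum(int(n) * units[u] for n, u in parts)
```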
API Reference
Management endpoints require a Keycloak JWT with cikut_admin role. Webhook endpoints use connector-specific HMAC/bearer verification.
Webhook receivers
| Endpoint | Auth | Source |
|---|---|---|
| POST /webhook | HMAC-SHA256 | Plane |
| POST /webhook/github | HMAC-SHA256 | GitHub |
| POST /webhook/alertmanager | Bearer token | Alertmanager |
| POST /webhook/matrix | Bearer token | Matrix |
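For the HMAC-SHA256 receivers, verification follows the standard GitHub-style scheme: the sender signs the raw request body and sends `sha256=<hexdigest>` in a header. A minimal sketch of the check (the header name and wire format are each connector's concern; this shows only the comparison):

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Verify a 'sha256=<hex>' HMAC signature over the raw body, in constant time."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

Using `hmac.compare_digest` rather than `==` avoids leaking the signature through timing differences.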
Direct invocation
| Endpoint | Purpose |
|---|---|
| POST /run | Dispatch agent with freeform input or issue ID |
| POST /code | Run Claude Code on a repository |
| POST /llm | LLM completion via Bifrost |
| POST /matrix/send | Send Matrix message |
Workflow management
| Endpoint | Purpose |
|---|---|
| GET /workflows | List all named workflows |
| POST /dry-run | Simulate event with match explanation |
| POST /fire | Fire a synthetic event through the engine |
| POST /triggers/sync | Force-sync config to Plane |
| GET /triggers/health | Validate all references |
| GET /describe | Full system introspection |
Dry-run example
curl -X POST /dry-run -d '{
"event": "alertmanager.firing",
"vars": {"severity": "info", "alert.name": "HighCPU"},
"explain": true
}'
# Response:
{
"matched_workflows": 0,
"total_evaluated": 3,
"results": [{
"workflow": "alert-response",
"matched": false,
"conditions": [
{"key": "severity", "expected": "critical", "actual": "info", "matched": false}
],
"reason": "no any_of branch matched"
}]
}
Integrations API
Connector config is encrypted at rest. Secrets are never returned by the API.
| Endpoint | Purpose |
|---|---|
| GET /integrations | List all (metadata only) |
| GET /integrations/{connector} | Single integration metadata |
| PUT /integrations/{connector} | Create or update (encrypted at rest) |
| DELETE /integrations/{connector} | Remove integration |
PUT and DELETE hot-reload the connector bus — no restart required.
Connector config format
{
"plane": {"base_url": "", "api_key": "", "webhook_secret": "",
"workspace_slug": "", "project_id": ""},
"github": {"token": "", "org": "", "webhook_secret": ""},
"alertmanager": {"bearer_secret": "", "url": ""},
"matrix": {"homeserver": "", "access_token": "", "bot_user": "",
"webhook_secret": ""},
"llm": {"bifrost_url": "", "bifrost_api_key": "",
"default_model": ""},
"grafana": {"base_url": "", "api_key": ""},
"mqtt": {"broker": "", "username": "", "password": "",
"subscriptions": []}
}
Observability
| Endpoint | Purpose |
|---|---|
| GET /healthz | Liveness (no auth) |
| GET /metrics | Prometheus metrics |
| GET /executions | Execution history (filter by issue, type, workflow, status) |
| GET /outcomes | Outcome success rates |
Prometheus metrics
fiber_webhooks_received_total{source, event, action}
fiber_rules_matched_total{event, action_type}
fiber_actions_executed_total{action_type, status}
fiber_agents_dispatched_total{agent}
fiber_agents_completed_total{agent, status}
fiber_agent_duration_seconds{agent}
fiber_agents_running{agent}
fiber_workflows_started_total{workflow}
fiber_workflows_completed_total{workflow, status}
fiber_rate_limited_total{tenant}
fiber_dead_letters_retried_total{status}
Distributed tracing
Every request gets a trace ID (from X-Request-ID or auto-generated). Propagates through gateway, engine, worker, and agent subprocesses. Stored on execution rows for correlation.
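The trace-ID resolution step can be sketched as (illustrative; the actual middleware and propagation mechanics live in the gateway):

```python
import uuid

def resolve_trace_id(headers: dict) -> str:
    """Reuse the caller's X-Request-ID if present, else generate a fresh trace ID."""
    return headers.get("X-Request-ID") or uuid.uuid4().hex
```

The resolved ID would then be attached to downstream calls and stored on the execution row, so one grep across gateway, worker, and agent logs finds the whole request.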
Tenants API
| Endpoint | Purpose |
|---|---|
| GET /tenants | List all tenants |
| POST /tenants | Create a new tenant |
| PUT /tenants/{slug}/workflows | Replace tenant workflows |
| POST /tenants/{slug}/workflows/append | Append a workflow rule |
Agents
Fiber includes an agent fabric for managed AI agent runtimes with workspace isolation and timeout budgets. Agents are dispatched as workflow steps and operate on Plane issues.
Agent types
| Agent | Trigger | What it does |
|---|---|---|
| deploy | label:deploy | Scaffold repo, Helm chart, DNS, CI/CD |
| code | label:code | Implement features, write tests, create PR |
| review | state:In Review | Review quality, security, test coverage |
| kaizen | label:kaizen | Analyze tech debt, suggest improvements |
| triage | issue created | Classify and route issues |
| sre | alert firing | Investigate infrastructure incidents |
| security | PR opened | Scan for vulnerabilities |
| docs | label:docs | Generate or update documentation |
| product | label:product | Product requirements analysis |
| onboarding | pulse | User onboarding flows |
| finance | pulse | Financial data analysis |
| growth | pulse | Growth metrics and experiments |
Dispatching agents
# Simple dispatch
do:
- agent: deploy
# With options
do:
- agent:
type: deploy
presets: [testing, security]
wait: true
context: "Additional context for the agent"
# Dynamic dispatch from event variable
do:
- agent: "{label}"
# Dynamic dispatch from LLM classification
do:
- llm: "Classify this issue. Return one of: code, deploy, sre"
as: triage_result
- agent: "{triage_result.agent}"
Chaining
Each agent sees the output of all prior runs on the same issue. Agents chain through sequential do: steps with wait: true:
- name: implement-pipeline
on: plane.label_added
if: { label: code }
bail: true
do:
- agent: code
wait: true
- agent: security
wait: true
- agent: review
wait: true
verify:
on: github.check.success
if: { check.name: CI }
within: 30m
finally:
- matrix.send: "Pipeline {_status} for {issue.title}"
GitHub context
When agents run on issues linked to GitHub PRs, they automatically receive PR diffs, review comments, and CI logs. This context is injected into the agent's prompt.
Presets
Reusable instruction fragments:
| Preset | Purpose |
|---|---|
| testing | Test-writing conventions |
| security | Security review checklist |
| docs | Documentation standards |
| commit-convention | Commit message format |
| style-python | Python code style |
| style-typescript | TypeScript code style |
Agent fabric internals
The fabric manages the full lifecycle:
- Spec — Load agent configuration (model, workspace, budget, context)
- Workspace — Create isolated environment (tmpdir or git-worktree)
- Enrich — Inject issue context, prior run outputs, GitHub data
- Execute — Run agent subprocess with timeout budget
- Parse — Extract results from agent output
- Comment — Post results back to the Plane issue
Timeout budget is controlled by AGENT_TIMEOUT (default: 600 seconds).
MCP Server
Fiber exposes its capabilities as an MCP (Model Context Protocol) server, making all operations available as Claude-native tools. Any MCP client — Claude Code, OpenClaw, or custom agents — gets full orchestrator access.
Setup
Add to Claude Code settings:
// ~/.claude/settings.json
{
"mcpServers": {
"fiber": {
"command": "fiber-mcp"
}
}
}
Available tools
Agent dispatch
- `run_agent` — Dispatch an agent with freeform input
- `run_agent_for_issue` — Dispatch an agent for a specific Plane issue
- `invoke_code` — Run Claude Code on a repository
- `llm` — LLM completion via Bifrost
Communication
- `send_matrix_message` — Send to a Matrix room
- `list_matrix_rooms` — List joined rooms
- `push_alert` — Push to Alertmanager
Workflow management
- `list_workflows` — List all named workflows
- `sync_workflows` — Force-sync to Plane
- `workflows_health` — Validate references
- `update_workflows` — Replace tenant workflows
- `append_workflow` — Add a single rule
Observability
- `list_executions` — Query execution history
- `get_execution` — Execution details
- `get_outcomes` — Outcome success rates
- `describe` — Full system introspection
- `fire_event` — Fire a synthetic event
- `dry_run` — Simulate event processing
Tenants
- `list_tenants` — List all tenants
- `create_tenant` — Create a new tenant
Operations Manual
How to run Fiber in production — the components, the state machine, triggering work, monitoring, and troubleshooting.
Running Fiber
Three components are required:
- Temporal server — durable workflow execution
- Fiber gateway — receives webhooks, matches rules
- Temporal worker — executes activities (agents, connectors)
# Verify Temporal is running
temporal operator cluster health
# Start the gateway + worker
fiber serve
In production, these are separate Kubernetes deployments managed by a Helm chart.
Triggering work
Via Plane (recommended): Create an issue, add the auto label. The system triages and dispatches automatically.
Via specific label: code, deploy, review, kaizen — skip triage, run the agent directly.
Via Matrix: Send @bot fix {description} — an issue is created and the pipeline runs.
State machine
Every issue follows the same state progression:
Backlog -> Triage -> In Progress -> In Review -> Done
| | |
classifies code/deploy review +
+ labels agent works CI runs
creates PR
^ ^ |
| +-- changes ---+
| requested
+--- fail / timeout ----------+
|
Cancelled
(PR closed unmerged)
Every failure returns to Triage for human visibility. No issue silently stalls.
The standard path
Issue created in Plane
-> Auto-triage: LLM classifies type, priority, labels
-> Label "auto" added
-> LLM classifies: code / deploy / skip
-> Code agent implements -> PR created
-> Review agent reviews -> PR approved or changes requested
-> CI passes + merge -> Done
-> CI fails -> auto-fix -> verify within 30m -> Triage if still failing
Alert-triggered path
Alertmanager fires (critical/warning)
-> Plane issue created: [ALERT] {name}
-> Code agent investigates -> PR with fix
-> Review agent validates
-> Alert resolves within 1h -> Done
-> Still firing -> labeled "escalate"
Troubleshooting
Agent fails
- Issue moves back to "Triage" automatically
- Agent output posted as comment on the issue
- Matrix notification sent
- Check the Plane issue comment for error output
- Fix the problem, re-add the `auto` label
CI doesn't pass within 30 minutes
- Issue moves to "Triage" with a comment
- Check the PR — the code agent may have introduced a bug
- Fix manually or re-trigger with the `auto` label
Agent stuck (running too long)
- Default timeout: 10 minutes per agent
- Temporal kills the activity on timeout
- Check Temporal UI for workflow state
- Workspace kept on failure for debugging, cleaned on success
Security model
- Agents run as sandboxed subprocesses
- Agents never receive database credentials or cluster admin tokens
- Secrets passed via environment variables, never inlined
- Agents create PRs — they never push to `main` directly
- Deploy agent never runs `kubectl apply` — ArgoCD manages deployments
- Distributed locks prevent concurrent agents on the same issue
Configuration
Only infrastructure settings are environment variables. All connector-specific config lives in the encrypted integrations DB table.
Environment variables
| Variable | Default | Purpose |
|---|---|---|
| INTEGRATION_SECRET_KEY | (required) | Fernet key for encrypting integration config |
| DATABASE_URL | sqlite:///data/fiber.db | Database (PostgreSQL recommended) |
| REDIS_URL | redis://localhost:6379 | Locks, dedup, rate limiting, job queue |
| TEMPORAL_SERVER_URL | (required) | Temporal server for durable execution |
| ANTHROPIC_API_KEY | — | API key for Claude agents |
| AGENT_TIMEOUT | 600 | Max seconds per agent run |
| RETENTION_DAYS | 90 | Data retention period |
Encryption
Integration config uses envelope encryption: a per-row DEK encrypts the JSON config blob, and the DEK is wrapped under the KEK (INTEGRATION_SECRET_KEY). Key rotation is online — fiber rotate-key re-wraps DEKs without touching ciphertext.
# Generate a new encryption key
python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())'
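The envelope scheme can be sketched with Fernet (illustrative; the real storage layout and the internals of `fiber rotate-key` may differ):

```python
from cryptography.fernet import Fernet

def encrypt_row(kek: Fernet, plaintext: bytes):
    """Encrypt a config blob with a fresh per-row DEK; wrap the DEK under the KEK."""
    dek = Fernet.generate_key()
    ciphertext = Fernet(dek).encrypt(plaintext)
    wrapped_dek = kek.encrypt(dek)
    return wrapped_dek, ciphertext

def rotate_key(old_kek: Fernet, new_kek: Fernet, wrapped_dek: bytes) -> bytes:
    """Re-wrap the DEK under a new KEK; the ciphertext itself is untouched."""
    return new_kek.encrypt(old_kek.decrypt(wrapped_dek))
```

Because rotation only re-wraps the small DEK, it is O(rows) regardless of how large each encrypted blob is.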
Database migrations
Schema is managed by Alembic. In production (K8s), a PreSync Helm Job runs fiber migrate before gateway/worker pods start.
# Run migrations manually
DATABASE_URL=postgresql://... fiber migrate
# Create a new migration
PYTHONPATH=src alembic revision -m "description"
Architecture
Fiber is an event-driven system built on FastAPI, Temporal, and a connector bus pattern.
Data flow
webhook -> connector (verify + normalize) \
-> engine (match) -> Temporal workflow -> connector (action) -> agents
stream -> connector (subscribe + emit) /
Execution layer
Durable workflow execution via Temporal. The YAML DSL compiles to Temporal workflow inputs:
- Compiler — Converts parsed YAML workflows + events into `WorkflowInput`
- Workflow — Generic Temporal workflow that interprets compiled YAML. Handles linear, DAG, bail, finally, and verify
- Activities — Thin wrappers: step execution (connectors), agent runs (fabric), synthetic event matching
Connector bus
Registry and dispatcher built from DB at startup. Each connector consolidates inbound (webhook or stream), outbound (action), and schema into a single class.
Adding a new connector: Create a class subclassing Connector, register it, then PUT /integrations/{name} with its config. For webhooks, implement webhook_path() + handle_webhook(). For streams, override subscribe(emit). A connector can implement both.
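A hypothetical webhook connector might look like the sketch below. Only the hook names (`webhook_path()`, `handle_webhook()`) come from this section — the `Connector` base class shape and the `statuspage` connector itself are illustrative assumptions:

```python
class Connector:
    """Stub base class for illustration; the real one lives in Fiber."""
    def webhook_path(self):
        raise NotImplementedError
    def handle_webhook(self, headers, body):
        raise NotImplementedError

class StatusPageConnector(Connector):
    """Hypothetical inbound-only connector: normalizes webhooks into engine events."""
    name = "statuspage"

    def webhook_path(self):
        return "/webhook/statuspage"

    def handle_webhook(self, headers, body):
        # Normalize the raw payload into an event type plus template vars
        # that rules can match with on: / if:.
        return {
            "event": "statuspage.incident",
            "vars": {"summary": body.get("summary", "")},
        }
```

After registering the class, `PUT /integrations/statuspage` with its config would bring it onto the bus without a restart.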
Multi-tenancy
TenantContext carries pre-resolved runtime state: settings, connector bus, session factory, workflows, defaults, and Temporal client. TenantStore caches contexts. Default tenant auto-reloads workflows on file change; DB tenants build their connector bus from the integrations table.
Background worker
ARQ-based async worker handles cron jobs:
- `run_scheduled_workflows_job` (every 60s) — fire interval-based workflows via Temporal
- `retry_dead_letters_job` (every 2m) — auto-retry failed webhooks with exponential backoff (max 5 attempts)
- `cleanup_old_data_job` (daily 03:00) — delete records older than `RETENTION_DAYS`
Need help? Check the troubleshooting guide or open an issue.