Getting Started
This guide takes you from zero to a running Fiber instance with your first workflow in about 10 minutes.
Prerequisites
- Python 3.12+
- A running Temporal server (or `temporal server start-dev` for local dev)
- PostgreSQL (optional — SQLite works for dev)
- Redis (optional — needed for rate limiting, locks, and the background worker)
Install
pip install fiber-orchestrator
# Or from source:
pip install -e ".[dev]"
Configure
cp .env.example .env
Set these at minimum:
TEMPORAL_SERVER_URL=localhost:7233
INTEGRATION_SECRET_KEY=$(python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')
For production, also set DATABASE_URL (PostgreSQL) and REDIS_URL.
Seed your integrations
Integration config is encrypted at rest in the database. Seed from a JSON file or via API:
# From file
fiber seed-integrations integrations.json
# Or via API (after the gateway is running)
curl -X PUT localhost:8000/integrations/github \
-H "Authorization: Bearer $TOKEN" \
-d '{"token": "ghp_...", "org": "myorg", "webhook_secret": "..."}'
Your first workflow
Create a rules file:
# rules.d/my-first-rule.yaml
rules:
- name: pr-notify
on: github.pr.opened
do:
- plane.comment: "PR opened: [{pr.title}]({pr.url}) by @{sender}"
- plane.state: "In Review"
Validate
fiber validate rules.d/my-first-rule.yaml
Test with dry-run
fiber serve # start the gateway
curl -X POST localhost:8000/dry-run -d '{
"event": "github.pr.opened",
"vars": {
"pr.title": "Fix login bug",
"pr.url": "https://github.com/myorg/myapp/pull/42",
"sender": "alice"
},
"explain": true
}'
The response shows which workflows matched and why, without executing anything.
Go live
Point your GitHub webhook at https://your-host/webhook/github with the same secret you configured. Open a PR — Fiber handles the rest.
YAML DSL Reference
Complete reference for Fiber's workflow definition language. Files are loaded from routes.yaml and routes.d/*.yaml.
Top-level structure
defaults: # key-value pairs available as {key} in all templates
matrix.room: "!abc:example.com"
llm.model: "claude-sonnet-4-6"
rules: # list of workflow definitions
- name: ...
tools: # custom shell tools
fetch-alerts:
command: "curl -s $ALERTMANAGER_URL/api/v2/alerts"
timeout: 15
env: [ALERTMANAGER_URL]
Workflow definition
Every workflow has four layers:
- name: deploy-pipeline # REQUIRED: unique identifier
# ── TRIGGER LAYER ──
on: plane.label_added # event type(s) — string or list
if: # ALL conditions must match (AND)
label: deploy
state: { not: done }
any_of: # at least one group must match (OR of ANDs)
- { severity: critical }
- { severity: warning, alert.name: { contains: OOM } }
# ── PREPARATION LAYER ──
vars: # computed variables, resolved once before steps
branch: "{pr.branch}"
tag: "v{version}-{pr.number}"
# ── EXECUTION LAYER ──
bail: true # stop on first step failure (default: false)
do: # sequential step list
- shell:lint: null
- agent: deploy
wait: true
finally: # always runs, even after bail
- matrix.send: "Pipeline {_status}"
# ── VERIFICATION LAYER ──
verify:
on: github.check.success # event to wait for
if: { check.name: CI } # conditions on that event
within: 30m # deadline
else: # steps if deadline passes
- plane.state: "Failed"
# ── MODIFIERS ──
cooldown: 300 # seconds between fires
schedule: 1h # periodic trigger
strict: true # fail on missing template vars
Event types
| Source | Events |
|---|---|
| Plane | plane.issue.created, plane.issue.updated, plane.label_added, plane.label_removed, plane.state_changed |
| GitHub | github.pr.opened, github.pr.closed, github.pr.merged, github.pr.reopened, github.review.approved, github.review.changes_requested, github.check.success, github.check.failure |
| Alertmanager | alertmanager.firing, alertmanager.resolved |
| Matrix | matrix.message |
| MQTT | Events emitted per-topic with mqtt.* vars |
| Engine | outcome.met, outcome.unmet, action.<type>, invoke, schedule |
Multiple triggers: on: [plane.label_added, plane.issue.created]
Condition operators
state: "In Review" # equality (case-insensitive)
severity: { in: [critical, warning] } # membership
state: { not: Done } # negation
issue.text: { contains: deploy } # substring
issue.text: { matches: "(?i)deploy" } # regex
alert.name: { startswith: Database } # prefix
repo: { endswith: "-api" } # suffix
pr.additions: { gt: 500 } # numeric: gt, gte, lt, lte, eq
pr.url: { exists: true } # non-empty check
description: { empty: false } # emptiness check
Combination logic: keys under if: are ANDed. any_of: is a list of dicts; the keys within each dict are ANDed, and the groups are ORed together. When both are present, the rule fires only if the if: block AND at least one any_of: group match.
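As an illustration, the matching semantics can be sketched in Python. This is a simplified model for explanation, not Fiber's actual engine, and it covers only a subset of the operators above:

```python
def match_value(expected, actual):
    """Evaluate one condition value against an event field (illustrative subset)."""
    if isinstance(expected, dict):
        op, arg = next(iter(expected.items()))
        if op == "not":
            return not match_value(arg, actual)
        if op == "in":
            return actual in arg
        if op == "contains":
            return arg in (actual or "")
        if op == "gt":
            return float(actual) > float(arg)
        if op == "exists":
            return bool(actual) is arg
        raise ValueError(f"unknown operator: {op}")
    # Plain value: case-insensitive equality.
    return str(expected).lower() == str(actual).lower()

def matches(rule, event):
    """if: keys are ANDed; any_of: groups are ORed; both blocks must hold."""
    if_ok = all(match_value(v, event.get(k)) for k, v in rule.get("if", {}).items())
    groups = rule.get("any_of")
    any_ok = True if groups is None else any(
        all(match_value(v, event.get(k)) for k, v in group.items())
        for group in groups
    )
    return if_ok and any_ok
```

For example, a rule with `if: {label: deploy}` and two `any_of:` severity groups matches only when the label equals "deploy" (case-insensitively) and at least one severity group holds.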
Actions
| Action | Args | Description |
|---|---|---|
| `plane.state` | "In Review" | Change issue state |
| `plane.comment` | "text with {vars}" | Comment on issue |
| `plane.label.add` | "incident" | Add label |
| `plane.label.remove` | "in-progress" | Remove label |
| `plane.issue` | {title, description?, labels?, state?, priority?} | Create issue |
| `github.comment` | "body" or {body, repo?, number?} | Comment on PR/issue |
| `github.status` | {state, context?, description?} | Set commit status |
| `github.label.add` | "label" or {labels, repo?} | Add labels |
| `github.merge` | "squash" or {method?, title?} | Merge PR |
| `github.issue` | {title, body?, labels?} | Create issue |
| `matrix.send` | "message" or {room?, body} | Send chat message |
| `llm` | "prompt" or {prompt, model?, system?} | LLM completion |
| `agent` | "type" or {type, presets?, wait?} | Claude agent dispatch |
| `alert` | "summary" or {name?, summary, severity?} | Push alert |
| `http` | "url" or {url, method?, headers?} | HTTP request |
| `shell` | "command" | Shell command |
| `shell:<tool>` | {arg: value} | Named tool from tools: |
| `run` | "workflow-name" | Invoke another workflow |
Step modifiers
- plane.comment: "PR opened: {pr.title}"
as: comment_result # name the output for later steps
wait: true # block until complete (for agents)
if: { state: { not: done } } # per-step condition
any_of: # per-step OR conditions
- { label: deploy }
- { label: staging }
retry: 2 # retry count (exponential backoff)
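The retry: modifier's exponential backoff can be sketched as follows. This is an illustrative model only — Fiber's real retries run inside Temporal activities, and the base delay shown here is an assumption:

```python
import time

def run_with_retry(step, retries, base_delay=1.0, sleep=time.sleep):
    """Run a step, retrying up to `retries` extra times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return step()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: propagate the failure
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

So `retry: 2` means up to three attempts total, with the delay doubling between them.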
Template variables
| Source | Variables |
|---|---|
| Plane | {event}, {issue.id}, {issue.title}, {issue.text}, {labels}, {label}, {state}, {old_state} |
| GitHub | {pr.url}, {pr.title}, {pr.number}, {pr.branch}, {repo}, {sender}, {check.name}, {check.conclusion} |
| Alertmanager | {severity}, {alert.name}, {alert.namespace}, {summary}, {description}, {fingerprint} |
| Matrix | {room_id}, {sender}, {body} |
| MQTT | {mqtt.topic}, {mqtt.<field>} (auto-flattened JSON) |
| Engine | {_prev}, {_status}, {_failed_step}, {_failure_reason} |
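The MQTT "auto-flattened JSON" behavior can be illustrated with a small sketch — an assumption about the flattening described above, not Fiber's actual code:

```python
def flatten(payload, prefix="mqtt"):
    """Flatten nested JSON into dotted template vars, e.g. {mqtt.sensor.temp}."""
    out = {}
    for key, value in payload.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name))  # recurse into nested objects
        else:
            out[name] = value
    return out
```

A payload like `{"sensor": {"temp": 21.5}, "id": "dev1"}` would then be addressable in templates as `{mqtt.sensor.temp}` and `{mqtt.id}`.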
Named step output
do:
- llm: "Classify: {alert.name}"
as: classify
- plane.comment: "Auto-triage: {classify}"
- plane.issue:
title: "[ALERT] {alert.name}"
as: new_issue
- matrix.send: "Created issue {new_issue.issue_id}"
DAG execution
Alternative to do: for parallel workflows. do: and dag: are mutually exclusive.
dag:
start:
- plane.comment: "Starting..."
then: [lint, test] # fan-out to parallel nodes
lint:
- shell:lint: null
then: [gate]
test:
- shell:test: null
then: [gate]
gate:
- join: all # "all" = wait for all parents, "any" = first wins
then: [deploy]
deploy:
- agent: deploy
wait: true
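The fan-out/join scheduling above can be modeled as waves of parallel nodes. Here is a minimal sketch of the `join: all` semantics (illustrative, not the engine's implementation):

```python
from collections import defaultdict

def execution_order(dag):
    """Order DAG nodes into waves; `dag` maps node -> list of children (then:)."""
    parents = defaultdict(set)
    for node, children in dag.items():
        for child in children:
            parents[child].add(node)
    order, done = [], set()
    while len(done) < len(dag):
        # A node is ready once ALL of its parents have completed (join: all).
        ready = [n for n in dag if n not in done and parents[n] <= done]
        if not ready:
            raise ValueError("cycle detected")
        order.append(sorted(ready))  # nodes in one wave can run in parallel
        done.update(ready)
    return order
```

For the pipeline above this yields `[["start"], ["lint", "test"], ["gate"], ["deploy"]]`. With `join: any`, `gate` would instead be released as soon as the first of its parents completes.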
Verification
Async verification — wait for a confirming event or escalate:
verify:
on: github.check.success # event type to wait for
if: { check.name: deploy } # conditions on that event
within: 30m # deadline
else: # steps if deadline passes
- plane.state: "Escalated"
- alert: { severity: critical, summary: "Not verified in 30m" }
Workflow composition
Workflows invoke other workflows by name. Every step emits a synthetic event that other workflows can match. Max chain depth: 3.
# Parent workflow
- name: critical-alert
on: alertmanager.firing
if: { severity: critical }
do:
- run: notify-ops # invoke child workflow
# Child workflow
- name: notify-ops
on: invoke
do:
- matrix.send: "Critical: {alert.name}"
Duration strings
Accepted for cooldown:, schedule:, verify.within::
7d = 604800s
2h30m = 9000s
30m = 1800s
45s = 45s
1800 = 1800s (bare int = seconds)
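A parser matching these rules can be sketched as (illustrative, not Fiber's actual code):

```python
import re

def parse_duration(value):
    """Parse '7d', '2h30m', '45s', or a bare int into seconds."""
    if isinstance(value, int):
        return value  # bare int = seconds
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([dhms])", value)
    # Reject strings with leftover characters, e.g. "2x" or "h30m".
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"bad duration: {value!r}")
    return sum(int(n) * units[u] for n, u in parts)
```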
API Reference
Management endpoints require a Keycloak JWT with cikut_admin role. Webhook endpoints use connector-specific HMAC/bearer verification.
Webhook receivers
| Endpoint | Auth | Source |
|---|---|---|
| POST /webhook | HMAC-SHA256 | Plane |
| POST /webhook/github | HMAC-SHA256 | GitHub |
| POST /webhook/alertmanager | Bearer token | Alertmanager |
| POST /webhook/matrix | Bearer token | Matrix |
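For the HMAC-SHA256 receivers, verification follows the standard GitHub-style scheme: the sender signs the raw request body and sends `sha256=<hexdigest>` in a header. A minimal sketch of the check (the header name and wire format are each connector's concern; this shows only the comparison):

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Verify a 'sha256=<hex>' HMAC signature over the raw body, in constant time."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

Using `hmac.compare_digest` rather than `==` avoids leaking the signature through timing differences.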
Direct invocation
| Endpoint | Purpose |
|---|---|
| POST /run | Dispatch agent with freeform input or issue ID |
| POST /code | Run Claude Code on a repository |
| POST /llm | LLM completion via Bifrost |
| POST /matrix/send | Send Matrix message |
Workflow management
| Endpoint | Purpose |
|---|---|
| GET /workflows | List all named workflows |
| POST /dry-run | Simulate event with match explanation |
| POST /fire | Fire a synthetic event through the engine |
| POST /triggers/sync | Force-sync config to Plane |
| GET /triggers/health | Validate all references |
| GET /describe | Full system introspection |
Dry-run example
curl -X POST /dry-run -d '{
"event": "alertmanager.firing",
"vars": {"severity": "info", "alert.name": "HighCPU"},
"explain": true
}'
# Response:
{
"matched_workflows": 0,
"total_evaluated": 3,
"results": [{
"workflow": "alert-response",
"matched": false,
"conditions": [
{"key": "severity", "expected": "critical", "actual": "info", "matched": false}
],
"reason": "no any_of branch matched"
}]
}
Integrations API
Connector config is encrypted at rest. Secrets are never returned by the API.
| Endpoint | Purpose |
|---|---|
| GET /integrations | List all (metadata only) |
| GET /integrations/{connector} | Single integration metadata |
| PUT /integrations/{connector} | Create or update (encrypted at rest) |
| DELETE /integrations/{connector} | Remove integration |
PUT and DELETE hot-reload the connector bus — no restart required.
Connector config format
{
"plane": {"base_url": "", "api_key": "", "webhook_secret": "",
"workspace_slug": "", "project_id": ""},
"github": {"token": "", "org": "", "webhook_secret": ""},
"alertmanager": {"bearer_secret": "", "url": ""},
"matrix": {"homeserver": "", "access_token": "", "bot_user": "",
"webhook_secret": ""},
"llm": {"bifrost_url": "", "bifrost_api_key": "",
"default_model": ""},
"grafana": {"base_url": "", "api_key": ""},
"mqtt": {"broker": "", "username": "", "password": "",
"subscriptions": []}
}
Observability
| Endpoint | Purpose |
|---|---|
| GET /healthz | Liveness (no auth) |
| GET /metrics | Prometheus metrics |
| GET /executions | Execution history (filter by issue, type, workflow, status) |
| GET /outcomes | Outcome success rates |
Prometheus metrics
fiber_webhooks_received_total{source, event, action}
fiber_rules_matched_total{event, action_type}
fiber_actions_executed_total{action_type, status}
fiber_agents_dispatched_total{agent}
fiber_agents_completed_total{agent, status}
fiber_agent_duration_seconds{agent}
fiber_agents_running{agent}
fiber_workflows_started_total{workflow}
fiber_workflows_completed_total{workflow, status}
fiber_rate_limited_total{tenant}
fiber_dead_letters_retried_total{status}
Distributed tracing
Every request gets a trace ID (from X-Request-ID or auto-generated). Propagates through gateway, engine, worker, and agent subprocesses. Stored on execution rows for correlation.
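The trace-ID resolution step can be sketched as (illustrative; the actual middleware and propagation mechanics live in the gateway):

```python
import uuid

def resolve_trace_id(headers: dict) -> str:
    """Reuse the caller's X-Request-ID if present, else generate a fresh trace ID."""
    return headers.get("X-Request-ID") or uuid.uuid4().hex
```

The resolved ID would then be attached to downstream calls and stored on the execution row, so one grep across gateway, worker, and agent logs finds the whole request.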
Tenants API
| Endpoint | Purpose |
|---|---|
| GET /tenants | List all tenants |
| POST /tenants | Create a new tenant |
| PUT /tenants/{slug}/workflows | Replace tenant workflows |
| POST /tenants/{slug}/workflows/append | Append a workflow rule |
Agents
Fiber includes an agent fabric for managed AI agent runtimes with workspace isolation and timeout budgets. Agents are dispatched as workflow steps and operate on Plane issues.
Agent types
| Agent | Trigger | What it does |
|---|---|---|
| deploy | label:deploy | Scaffold repo, Helm chart, DNS, CI/CD |
| code | label:code | Implement features, write tests, create PR |
| review | state:In Review | Review quality, security, test coverage |
| kaizen | label:kaizen | Analyze tech debt, suggest improvements |
| triage | issue created | Classify and route issues |
| sre | alert firing | Investigate infrastructure incidents |
| security | PR opened | Scan for vulnerabilities |
| docs | label:docs | Generate or update documentation |
| product | label:product | Product requirements analysis |
| onboarding | pulse | User onboarding flows |
| finance | pulse | Financial data analysis |
| growth | pulse | Growth metrics and experiments |
Dispatching agents
# Simple dispatch
do:
- agent: deploy
# With options
do:
- agent:
type: deploy
presets: [testing, security]
wait: true
context: "Additional context for the agent"
# Dynamic dispatch from event variable
do:
- agent: "{label}"
# Dynamic dispatch from LLM classification
do:
- llm: "Classify this issue. Return one of: code, deploy, sre"
as: triage_result
- agent: "{triage_result.agent}"
Chaining
Each agent sees the output of all prior runs on the same issue. Agents chain through sequential do: steps with wait: true:
- name: implement-pipeline
on: plane.label_added
if: { label: code }
bail: true
do:
- agent: code
wait: true
- agent: security
wait: true
- agent: review
wait: true
verify:
on: github.check.success
if: { check.name: CI }
within: 30m
finally:
- matrix.send: "Pipeline {_status} for {issue.title}"
GitHub context
When agents run on issues linked to GitHub PRs, they automatically receive PR diffs, review comments, and CI logs. This context is injected into the agent's prompt.
Presets
Reusable instruction fragments:
| Preset | Purpose |
|---|---|
| testing | Test-writing conventions |
| security | Security review checklist |
| docs | Documentation standards |
| commit-convention | Commit message format |
| style-python | Python code style |
| style-typescript | TypeScript code style |
Agent fabric internals
The fabric manages the full lifecycle:
- Spec — Load agent configuration (model, workspace, budget, context)
- Workspace — Create isolated environment (tmpdir or git-worktree)
- Enrich — Inject issue context, prior run outputs, GitHub data
- Execute — Run agent subprocess with timeout budget
- Parse — Extract results from agent output
- Comment — Post results back to the Plane issue
Timeout budget is controlled by AGENT_TIMEOUT (default: 600 seconds).
MCP Server
Fiber exposes its capabilities as an MCP (Model Context Protocol) server, making all operations available as Claude-native tools. Any MCP client — Claude Code, OpenClaw, or custom agents — gets full orchestrator access.
Setup
Add to Claude Code settings:
// ~/.claude/settings.json
{
"mcpServers": {
"fiber": {
"command": "fiber-mcp"
}
}
}
Available tools
Agent dispatch
- `run_agent` — Dispatch an agent with freeform input
- `run_agent_for_issue` — Dispatch an agent for a specific Plane issue
- `invoke_code` — Run Claude Code on a repository
- `llm` — LLM completion via Bifrost
Communication
- `send_matrix_message` — Send to a Matrix room
- `list_matrix_rooms` — List joined rooms
- `push_alert` — Push to Alertmanager
Workflow management
- `list_workflows` — List all named workflows
- `sync_workflows` — Force-sync to Plane
- `workflows_health` — Validate references
- `update_workflows` — Replace tenant workflows
- `append_workflow` — Add a single rule
Observability
- `list_executions` — Query execution history
- `get_execution` — Execution details
- `get_outcomes` — Outcome success rates
- `describe` — Full system introspection
- `fire_event` — Fire a synthetic event
- `dry_run` — Simulate event processing
Tenants
- `list_tenants` — List all tenants
- `create_tenant` — Create a new tenant
Operations Manual
How to run Fiber in production — the components, the state machine, triggering work, monitoring, and troubleshooting.
Running Fiber
Three components are required:
- Temporal server — durable workflow execution
- Fiber gateway — receives webhooks, matches rules
- Temporal worker — executes activities (agents, connectors)
# Verify Temporal is running
temporal operator cluster health
# Start the gateway + worker
fiber serve
In production, these are separate Kubernetes deployments managed by a Helm chart.
Triggering work
Via Plane (recommended): Create an issue, add the auto label. The system triages and dispatches automatically.
Via specific label: code, deploy, review, kaizen — skip triage, run the agent directly.
Via Matrix: Send @bot fix {description} — an issue is created and the pipeline runs.
State machine
Every issue follows the same state progression:
Backlog -> Triage -> In Progress -> In Review -> Done
| | |
classifies code/deploy review +
+ labels agent works CI runs
creates PR
^ ^ |
| +-- changes ---+
| requested
+--- fail / timeout ----------+
|
Cancelled
(PR closed unmerged)
Every failure returns to Triage for human visibility. No issue silently stalls.
The standard path
Issue created in Plane
-> Auto-triage: LLM classifies type, priority, labels
-> Label "auto" added
-> LLM classifies: code / deploy / skip
-> Code agent implements -> PR created
-> Review agent reviews -> PR approved or changes requested
-> CI passes + merge -> Done
-> CI fails -> auto-fix -> verify within 30m -> Triage if still failing
Alert-triggered path
Alertmanager fires (critical/warning)
-> Plane issue created: [ALERT] {name}
-> Code agent investigates -> PR with fix
-> Review agent validates
-> Alert resolves within 1h -> Done
-> Still firing -> labeled "escalate"
Troubleshooting
Agent fails
- Issue moves back to "Triage" automatically
- Agent output posted as comment on the issue
- Matrix notification sent
- Check the Plane issue comment for error output
- Fix the problem, re-add the `auto` label
CI doesn't pass within 30 minutes
- Issue moves to "Triage" with a comment
- Check the PR — the code agent may have introduced a bug
- Fix manually or re-trigger with the `auto` label
Agent stuck (running too long)
- Default timeout: 10 minutes per agent
- Temporal kills the activity on timeout
- Check Temporal UI for workflow state
- Workspace kept on failure for debugging, cleaned on success
Security model
- Agents run as sandboxed subprocesses
- Agents never receive database credentials or cluster admin tokens
- Secrets passed via environment variables, never inlined
- Agents create PRs — they never push to `main` directly
- Deploy agent never runs `kubectl apply` — ArgoCD manages deployments
- Distributed locks prevent concurrent agents on the same issue
Configuration
Only infrastructure settings are environment variables. All connector-specific config lives in the encrypted integrations DB table.
Environment variables
| Variable | Default | Purpose |
|---|---|---|
| INTEGRATION_SECRET_KEY | (required) | Fernet key for encrypting integration config |
| DATABASE_URL | sqlite:///data/fiber.db | Database (PostgreSQL recommended) |
| REDIS_URL | redis://localhost:6379 | Locks, dedup, rate limiting, job queue |
| TEMPORAL_SERVER_URL | (required) | Temporal server for durable execution |
| ANTHROPIC_API_KEY | — | API key for Claude agents |
| AGENT_TIMEOUT | 600 | Max seconds per agent run |
| RETENTION_DAYS | 90 | Data retention period |
Encryption
Integration config uses envelope encryption: a per-row DEK encrypts the JSON config blob, and the DEK is wrapped under the KEK (INTEGRATION_SECRET_KEY). Key rotation is online — fiber rotate-key re-wraps DEKs without touching ciphertext.
# Generate a new encryption key
python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())'
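The envelope scheme can be sketched with Fernet (illustrative; the real storage layout and the internals of `fiber rotate-key` may differ):

```python
from cryptography.fernet import Fernet

def encrypt_row(kek: Fernet, plaintext: bytes):
    """Encrypt a config blob with a fresh per-row DEK; wrap the DEK under the KEK."""
    dek = Fernet.generate_key()
    ciphertext = Fernet(dek).encrypt(plaintext)
    wrapped_dek = kek.encrypt(dek)
    return wrapped_dek, ciphertext

def rotate_key(old_kek: Fernet, new_kek: Fernet, wrapped_dek: bytes) -> bytes:
    """Re-wrap the DEK under a new KEK; the ciphertext itself is untouched."""
    return new_kek.encrypt(old_kek.decrypt(wrapped_dek))
```

Because rotation only re-wraps the small DEK, it is O(rows) regardless of how large each encrypted blob is.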
Database migrations
Schema is managed by Alembic. In production (K8s), a PreSync Helm Job runs fiber migrate before gateway/worker pods start.
# Run migrations manually
DATABASE_URL=postgresql://... fiber migrate
# Create a new migration
PYTHONPATH=src alembic revision -m "description"
Architecture
Fiber is an event-driven system built on FastAPI, Temporal, and a connector bus pattern.
Data flow
webhook -> connector (verify + normalize) \
-> engine (match) -> Temporal workflow -> connector (action) -> agents
stream -> connector (subscribe + emit) /
Execution layer
Durable workflow execution via Temporal. The YAML DSL compiles to Temporal workflow inputs:
- Compiler — Converts parsed YAML workflows + events into `WorkflowInput`
- Workflow — Generic Temporal workflow that interprets compiled YAML. Handles linear, DAG, bail, finally, and verify
- Activities — Thin wrappers: step execution (connectors), agent runs (fabric), synthetic event matching
Connector bus
Registry and dispatcher built from DB at startup. Each connector consolidates inbound (webhook or stream), outbound (action), and schema into a single class.
Adding a new connector: Create a class subclassing Connector, register it, then PUT /integrations/{name} with its config. For webhooks, implement webhook_path() + handle_webhook(). For streams, override subscribe(emit). A connector can implement both.
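A hypothetical webhook connector might look like the sketch below. Only the hook names (`webhook_path()`, `handle_webhook()`) come from this section — the `Connector` base class shape and the `statuspage` connector itself are illustrative assumptions:

```python
class Connector:
    """Stub base class for illustration; the real one lives in Fiber."""
    def webhook_path(self):
        raise NotImplementedError
    def handle_webhook(self, headers, body):
        raise NotImplementedError

class StatusPageConnector(Connector):
    """Hypothetical inbound-only connector: normalizes webhooks into engine events."""
    name = "statuspage"

    def webhook_path(self):
        return "/webhook/statuspage"

    def handle_webhook(self, headers, body):
        # Normalize the raw payload into an event type plus template vars
        # that rules can match with on: / if:.
        return {
            "event": "statuspage.incident",
            "vars": {"summary": body.get("summary", "")},
        }
```

After registering the class, `PUT /integrations/statuspage` with its config would bring it onto the bus without a restart.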
Multi-tenancy
TenantContext carries pre-resolved runtime state: settings, connector bus, session factory, workflows, defaults, and Temporal client. TenantStore caches contexts. Default tenant auto-reloads workflows on file change; DB tenants build their connector bus from the integrations table.
Background worker
ARQ-based async worker handles cron jobs:
- `run_scheduled_workflows_job` (every 60s) — fire interval-based workflows via Temporal
- `retry_dead_letters_job` (every 2m) — auto-retry failed webhooks with exponential backoff (max 5 attempts)
- `cleanup_old_data_job` (daily 03:00) — delete records older than `RETENTION_DAYS`
Need help? Check the troubleshooting guide or open an issue.