Troubleshooting - AgentRuntime

This guide covers the most common operator issues in AgentRuntime. Start from Command Center for run-level triage, then use the sections below by symptom.

Failed runs

Find the error

Open Command Center → Failed recently (last 24 hours)
Click Open run to view the event log in Workflow Studio
Find the first step with Failed status and read the error message
Check other steps on the canvas — Cancelled and Skipped (Blocked) are expected on fail-fast runs (not bugs)

Step status on failed runs

When one step fails, the run is terminal failed. Other steps are labeled explicitly:

Studio badge	Meaning	Action
Failed	Root cause step	Fix model, credential, tool args, or graph
Cancelled	Was running in parallel when the run failed	No fix needed — audit only
Skipped (Blocked)	Never ran — blocked by failed dependency	No fix needed — will run on a new run after upstream succeeds
Success	Finished before the failure	Outputs remain in run context for inspection

If the Events panel still shows “Processing…” on a Cancelled step after a refresh, reload the run — the event log should include a step_cancelled event. See Runs and Command Center.

Failure codes

Code	Meaning	What to do
`MCP_TOOL_FAILED`	MCP tool returned an error	Check tool args, connection binding, external API status
`LLM_REQUEST_FAILED`	LLM call failed	Verify provider key in Providers; confirm model ID
`INSUFFICIENT_CREDITS`	Out of credits	Top up PAYG or upgrade plan — see Billing
`RUN_STALLED_OR_TIMED_OUT`	No progress within timeout	Increase `timeout_s`; check for hung external API
`DRAIN_TIMEOUT`	Graceful stop took too long	Use immediate stop; investigate slow in-flight step
`RUN_STOPPED`	Operator stopped the run	Expected — start a new run if needed
`HUMAN_TASK_FAILED`	Human task step error	Check task payload; re-run with fixed graph
`UPSTREAM_ERROR`	Internal service unavailable	Retry later; contact support if persistent
`DEPENDENCY_UNAVAILABLE`	Could not reach workflow service	Check platform status; retry

Retry after fixing

Failed runs are terminal. After fixing the root cause:

Publish a new workflow version if the graph changed
Start a new run — do not expect the old run to resume

Cancelled and Skipped (Blocked) steps on the old run are historical — they record what happened on that attempt. A new run executes the graph from the beginning (or from a future checkpoint API when shipped). For transient API errors, add retry_count on flaky steps. See Workflow patterns. Planned continue-from-failure (checkpoint retry, same-run retry) is documented in Run recovery (roadmap) — not available yet.

Run appears stuck

Symptom	Likely cause	Action
Status `paused`	Operator paused or awaiting resume	Click Resume in Command Center
Pending approval	`human_task` waiting	Approve or reject in Command Center
Status `draining`	Graceful stop in progress	Wait, or escalate to immediate stop
Step `running` for a long time	Slow MCP/LLM call	Check external API; increase timeout; stop if needed
Status `queued` / `starting`	Startup delay	Wait 30s; check credits and validation errors

A run waiting on human approval is working as designed, not stuck.

MCP binding errors

Symptoms: MCP_TOOL_FAILED on the first tool step, validation errors mentioning bindings, or “instance not resolved” in dry-run.

Validate the instance

Go to MCP, open Instance config for the instance, confirm the connection is wired and the profile is active. Optionally run MCP validation at /mcp/{server_id}/validate.

Check the connection

Confirm the bound connection exists and credentials are current. OAuth connections showing Reconnect need re-authorization.

Check workflow overrides

If the workflow uses connection overrides, confirm the override points to a valid connection for this environment.

Dry-run validate

In Workflow Studio, click Validate to surface binding errors before running.

Per-connector guides

Google Workspace — OAuth reconnect and scopes
WhatsApp — token and WABA IDs
Gmail — send permissions
Postgres — DSN and network access

Credit exhaustion

Symptoms: INSUFFICIENT_CREDITS failure code, runs fail to start, billing warnings in Console.

Check balance

Go to Settings → Billing → Usage or GET /v1/billing/usage.

Identify spend

Review Analytics credits consumed and usage history for heavy workflows.

Add credits

PAYG top-up from Billing, or upgrade plan if included credits are insufficient.

Credit spend order: trial → included → PAYG. See Billing and credits.

OAuth reconnect

Symptoms: Reconnect banner on a connection, 401/403 from Google or LinkedIn MCP tools, token refresh failures.

Reconnect

On Connections, open the Google account card (or the relevant provider) and Reconnect / re-authorize.

Re-validate MCP instances

Validate all instances bound to that connection.

Check admin policy

For Google Workspace, confirm your admin allows third-party app access and required scopes.

Rotating credentials does not retroactively fix failed runs — start new runs after reconnect.

API 401 and 403 errors

401 Unauthorized

Cause	Fix
Missing `Authorization` header	Add `Bearer pat_...` or session cookie
Expired session	Re-login to Console; refresh token
Revoked PAT	Create a new PAT

403 Forbidden

Cause	Fix
Insufficient project role	Need project_contributor for writes; project_viewer for reads only
Missing PAT scope	Add `workflow:run` or `mcp:execute` to the token
Wrong tenant/project context	Send `X-Tenant-Id` and `X-Project-Id` headers
Tenant admin action as contributor	Billing mutations need tenant_admin

See API authentication and Roles and permissions.

Validation errors (422)

Dry-run validation (POST /v1/workflows/{id}/validate) catches issues before execution:

Error	Fix
Dependency cycle	Remove circular `depends_on`
Missing `tool_name` / `server_url`	Complete MCP step configuration
Lua syntax error	Fix `script` in `lua_script` steps
Unknown MCP instance	Install instance, fix reference, or use Import to remap `server_url` bindings
Invalid template	Check `{{steps.id.result}}` references exist

Template fields look wrong after editing

Symptom	What to do
Nested or duplicated `{{` / `}}` in a prompt or start-input field	Clear the field; re-insert variables from the variable picker or Template intellisense
Variable left as literal text in run output	Confirm the path matches a real step Step ID and field; use Validate (dry-run) before running
Undo (Ctrl+Z) left the field in a odd state	Clear and re-pick from suggestions, or use Variables tab to copy a clean path

See Feature availability for preview vs GA features.

Memory errors (preview)

Error	Fix
`memory kernel not configured`	Memory not enabled in this environment
`queue unavailable`	Extract infrastructure down — contact support
Empty search results	Confirm index job completed — see Memory

Still blocked?

Gather: workflow ID, run ID, failure code, step ID, timestamp
Check Command Center and the run event log
Email support@agentruntime.io with the details

Command Center — daily operator inbox
Runs and Command Center — Cancelled / Blocked badges on failure
Run recovery (roadmap) — planned checkpoint retry
Workflow patterns — retries and error handling
Connections — credential management

​Failed runs

​Find the error

​Step status on failed runs

​Failure codes

​Retry after fixing

​Run appears stuck

​MCP binding errors

​Per-connector guides

​Credit exhaustion

​OAuth reconnect

​API 401 and 403 errors

​401 Unauthorized

​403 Forbidden

​Validation errors (422)

​Template fields look wrong after editing

​Memory errors (preview)

​Still blocked?

​Related docs