Local-first observability for macOS

Stop debugging AI agents in the dark.

Point your agent at localhost. Every LLM call becomes a node in a live trace tree - with caching, mocking, and zero data leaving your machine.

Download for macOS | See it in action

Currently alpha — limited seats. Get access now

Free alpha. No signup. 1-minute setup. Everything stays on your Mac.

Sits transparently in front of any provider

OpenAI, Anthropic, Ollama, LM Studio, LangChain, LangGraph, LlamaIndex

The 3-pane blueprint

Every agent run, drawn as a tree you can read.

Messy terminal logs become a hierarchical node graph in real time. Each LLM request is a node - color-coded by what actually happened.

Visual Tree Canvas - live render

Intent Classification | gpt-4o | success
Vector DB Retrieval | cached - 0ms | cached
Context Synthesis | claude-3.5-sonnet | success
Response Generation | timeout - 4.10s | error

The Visual Tree Canvas

Every LLM request becomes a node in a graph. Nested tool-calls, retries, and sub-agents nest automatically - so the shape of your agent's reasoning is finally something you can see, not scroll past.

success | cached | error
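
To make the nesting concrete, here is a minimal Python sketch of how a flat list of calls could be folded into a tree. It assumes each call carries a parent id. TraceNode, build_tree, and render are hypothetical names used for illustration, not Tether's internals, and how Tether itself infers nesting is not described on this page.

```python
from dataclasses import dataclass, field


@dataclass
class TraceNode:
    name: str
    status: str  # "success", "cached", or "error"
    children: list = field(default_factory=list)


def build_tree(calls):
    """Fold (call_id, parent_id, name, status) tuples into a list of root nodes.

    Assumption: a parent always appears before its children in `calls`.
    """
    nodes, roots = {}, []
    for call_id, parent_id, name, status in calls:
        node = TraceNode(name, status)
        nodes[call_id] = node
        if parent_id is None:
            roots.append(node)
        else:
            nodes[parent_id].children.append(node)
    return roots


def render(nodes, depth=0):
    """Return one indented text line per node, depth-first."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + f"{node.name} [{node.status}]")
        lines.extend(render(node.children, depth + 1))
    return lines
```

Calling render(build_tree(calls)) on a run with a retry nested under a failed call yields an indented outline, which is the same shape the canvas draws as a graph.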

Local Response Cache

Identical prompts return instantly from local cache. Iterate on downstream logic without re-running - or re-paying for - upstream calls. $0.0000 per cached hit.

<1ms | $0.0000
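
As a rough illustration of the idea, the sketch below caches responses in SQLite under a hash of the request. The ResponseCache class, its table layout, and the choice of SHA-256 over canonical JSON are assumptions made for this example; Tether's actual cache key and schema are not documented here.

```python
import hashlib
import json
import sqlite3


class ResponseCache:
    """Hypothetical prompt cache: identical requests are served from SQLite."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, response TEXT)"
        )

    @staticmethod
    def key_for(request):
        # Canonical JSON, so that dict key order does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_call(self, request, call_provider):
        """Return (response, was_cached); call the provider only on a miss."""
        key = self.key_for(request)
        row = self.db.execute(
            "SELECT response FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0]), True
        response = call_provider(request)
        self.db.execute(
            "INSERT INTO cache (key, response) VALUES (?, ?)",
            (key, json.dumps(response)),
        )
        self.db.commit()
        return response, False
```

The second call with an identical request never reaches call_provider, which is why a cached hit costs nothing.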

Time-Travel Mocking

Your agent failed at step 4. Rewrite that node's output and replay from there - no re-running the full chain, no wasted tokens. Fix the exact break, not the whole pipeline.

replay from any node
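
The replay idea can be sketched in a few lines of Python. This is an illustration under simple assumptions (a linear chain where each step consumes the previous step's output); run and replay_from are hypothetical helpers, not Tether's API.

```python
def run(steps, first_input):
    """Run every (name, fn) step in order and record each output."""
    trace, value = [], first_input
    for name, fn in steps:
        value = fn(value)
        trace.append({"node": name, "output": value})
    return trace


def replay_from(steps, trace, index, mocked_output):
    """Replace the output of node `index` and re-run only the later steps.

    Nodes before `index` are copied from the recorded trace, so their
    functions (and their token costs) are not invoked again.
    """
    new_trace = [dict(node) for node in trace[:index]]
    new_trace.append(
        {"node": steps[index][0], "output": mocked_output, "mocked": True}
    )
    value = mocked_output
    for name, fn in steps[index + 1:]:
        value = fn(value)
        new_trace.append({"node": name, "output": value})
    return new_trace
```

Only the steps after the edited node execute again; everything upstream is reused from the recorded trace.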

Air-Gapped Privacy

API keys live encrypted in the macOS Keychain. Prompts, responses, and traces stay in a local SQLite database that never leaves the machine. Nothing is phoned home - verify it yourself with Little Snitch.

Keychain | SQLite - local | 0 outbound

Synced inspector

Click a capability. Watch the inspector react.

The right pane is the real app's inspector. Pick a feature on the left - it switches state exactly like clicking a node in Tether.

Customer Support Agent | 5 nodes

1. Intent Classification | SUCCESS | 842ms | $0.0118 | 412 in / 38 out
2. Vector DB Retrieval | CACHED | 0ms (cached) | $0.0000 | 24 in / 0 out
3. Context Synthesis | SUCCESS | 1.21s | $0.0241 | 1840 in / 256 out
4. Tool · lookup_order | SUCCESS | 318ms | $0.0000 | 96 in / 142 out
5. Response Generation | ERROR | 4.10s (timeout) | $0.0000 | 2210 in / 0 out

response.meta | CACHE HIT | 200 OK

request_id: req_3f88ab
is_cached: true
latency: 0ms
cost: $0.0000
tokens_saved: 1,840 in - 256 out
embedding_hash: e3b0c44298fc1c14
retrieved_from: local_cache
store: ~/.Tether/cache.sqlite
hit_rate (session): 62%

editing response.json | UNSAVED

{
  "intent": "order_status",
  "confidence": 0.97,
  "entities": {
    "sentiment": "calm",  <- mocked
    "order_id": "4471"
  }
}

secrets & storage | encrypted

OPENAI_API_KEY | sk-********************7f2a - macOS Keychain | SECURE
ANTHROPIC_API_KEY | sk-ant-************91be - macOS Keychain | SECURE
Trace database | ~/.Tether/traces.sqlite - 0 bytes sent | LOCAL
Outbound connections | only to providers you configured - telemetry off | 0 / hr

Three lines to first trace

No SDK. Just change one base URL.

Tether is a transparent proxy. Point your client at localhost and every call shows up in the canvas - no code instrumentation, no decorators.

01 - Point the base_url

Swap your client's endpoint for the local proxy. Works with any OpenAI-compatible SDK.

# your existing code
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1"  # route calls through the local proxy
)
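
If you want the same script to run with or without the proxy, one option is to pick the base URL from the environment. The sketch below is only a suggestion: USE_TETHER and TETHER_BASE_URL are hypothetical variable names invented for this example, not settings Tether reads.

```python
import os

# Address used in the example above.
DEFAULT_PROXY_URL = "http://localhost:8080/v1"


def client_kwargs(env=os.environ):
    """Return keyword arguments for OpenAI(...).

    Routes through the local proxy only when USE_TETHER=1 is set.
    Both variable names are hypothetical and exist only in this sketch.
    """
    if env.get("USE_TETHER") == "1":
        return {"base_url": env.get("TETHER_BASE_URL", DEFAULT_PROXY_URL)}
    return {}  # empty kwargs: the SDK falls back to its default endpoint
```

With this in place, client = OpenAI(**client_kwargs()) talks to the proxy when the switch is on and to the provider directly otherwise.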

02 - Run your agent

Run anything as usual. Every request is intercepted, cached, and streamed into the tree live.

# nothing else changes
$ python agent.py
# -> 5 calls traced

03 - Inspect & replay

Open the canvas, click the node that broke, rewrite its output, and replay forward. See exactly where your agent fails - without re-running the whole chain.

# in Tether
opt+cmd+R replay from node
cmd+K mock response

Common questions

Everything you need to know.

Is Tether free?

Yes. Tether is free during the alpha period and the core proxy is open source. No credit card or account required.

Does Tether send my prompts or API keys anywhere?

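No. Tether has no cloud backend and sends nothing to its own servers. The only outbound traffic is the requests it forwards to the providers you configured. Prompts, responses, and traces are stored only on your Mac, and API keys are stored encrypted in the macOS Keychain and are never written to disk in plain text.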

How does Tether intercept LLM calls without changing my code?

Tether runs a local HTTP proxy on your machine. You point your AI client's base_url at http://localhost:8080/v1 — that's the only change. Tether transparently forwards every request to the real provider and records the full request/response pair locally.
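
For readers curious what "forward and record" looks like in practice, here is a minimal sketch of a recording proxy built on Python's standard library. It is an illustration, not Tether's implementation: make_recording_proxy is a hypothetical helper, it handles only non-streaming POST requests, and it keeps records in a plain list rather than a database.

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


def make_recording_proxy(upstream_base, records, port=0):
    """Build a server that forwards POSTs to upstream_base and logs each exchange."""
    up = urlsplit(upstream_base)
    conn_cls = (
        http.client.HTTPSConnection
        if up.scheme == "https"
        else http.client.HTTPConnection
    )

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            # Pass through only the headers the upstream needs in this sketch.
            headers = {
                name: self.headers[name]
                for name in ("Authorization", "Content-Type")
                if self.headers.get(name)
            }
            conn = conn_cls(up.hostname, up.port)
            conn.request("POST", up.path + self.path, body=body, headers=headers)
            resp = conn.getresponse()
            status, payload = resp.status, resp.read()
            conn.close()
            # Record the full request/response pair before replying.
            records.append(
                {"path": self.path, "request": body,
                 "response": payload, "status": status}
            )
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            pass  # keep the demo quiet

    return ThreadingHTTPServer(("127.0.0.1", port), Handler)
```

The client sees the same status and body it would get from the provider, while the proxy keeps a copy of both sides of the exchange.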

Which LLM providers and frameworks does Tether support?

Tether supports OpenAI, Anthropic (Claude), Ollama, LM Studio, and any provider that accepts an OpenAI-compatible base_url. It works with LangChain, LangGraph, LlamaIndex, and any SDK with a configurable endpoint.

How is Tether different from LangSmith or Weights & Biases?

LangSmith and W&B send your traces to cloud servers. Tether keeps traces on your machine: there is no cloud, no account, and no trace data sent to a third-party service. It's designed for developers who can't or won't send production prompts to third-party services.

What is time-travel mocking?

Time-travel mocking lets you click any past node in the agent trace, edit its response JSON, and replay the entire chain from that point forward — without re-running earlier steps or spending tokens. You can test how your agent would behave with a different LLM output in seconds.

Why not just use print() or logging?

Logging shows you what happened. Tether shows you why. You see the exact point where your agent failed and which response broke it, then replay with a fix in seconds without re-running the whole chain.

Can I use this with production code?

Yes. It's a local proxy, so your real code doesn't change beyond the base_url. Use it locally for debugging, or keep it running. Tether stores traces only on your machine and forwards requests only to the providers you configured.

How much money does caching actually save?

It depends on your agent. If you're iterating on prompt logic and re-running the same retrieval steps, caching can save roughly 50-90% of API spend while you debug, because each cached hit costs $0.0000.
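
A quick back-of-the-envelope calculation shows where a figure in that range can come from. The sketch below uses the per-call costs from the sample Customer Support Agent trace on this page and assumes ten debug runs in which cached nodes are paid for only once; debug_spend and savings_fraction are hypothetical helpers written for this example.

```python
# Per-call costs in USD, as read from the sample trace on this page.
NODE_COSTS = {
    "Intent Classification": 0.0118,
    "Vector DB Retrieval": 0.0000,
    "Context Synthesis": 0.0241,
    "Tool lookup_order": 0.0000,
    "Response Generation": 0.0000,
}


def debug_spend(node_costs, runs, cached=()):
    """Total cost of `runs` runs when nodes in `cached` are paid only once."""
    total = 0.0
    for name, cost in node_costs.items():
        paid_runs = 1 if name in cached else runs
        total += cost * paid_runs
    return total


def savings_fraction(node_costs, runs, cached):
    """Share of the uncached spend that caching avoids."""
    baseline = debug_spend(node_costs, runs)
    return 1 - debug_spend(node_costs, runs, cached) / baseline


if __name__ == "__main__":
    frac = savings_fraction(NODE_COSTS, runs=10, cached={"Context Synthesis"})
    print(f"{frac:.0%} saved over 10 runs")
```

With only the Context Synthesis step cached, ten runs cost about $0.142 instead of $0.359, a saving of about 60%. If every paid step is cached, the saving over ten runs reaches 90%.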

Will Tether work with my stack?

If your SDK uses a configurable base_url (OpenAI SDK, LangChain, LangGraph, LlamaIndex, Anthropic SDK), it works with a one-line change. If you use a different provider or a custom setup (for example, the Claude API over REST), Tether still works because it's a transparent proxy.

Does Tether add latency to my agent?

Negligible. Tether runs locally on your Mac. The only overhead is the proxy hop, which is <1ms. Real LLM calls are the bottleneck, not Tether.

Can I share traces with my team?

Not yet. Each developer runs their own Tether instance locally. Export as JSON is coming in a later release.

Feedback loop

Tell me what breaks first.

Alpha users shape the next build. Send the bug, missing workflow, or reason you would not use this yet.

Trace your first agent in under a minute.

Free during alpha. Get the signed DMG the moment it's ready - no account, no cloud, no strings.

⚡ Limited alpha slots. Mac only for now.

Download DMG

or join the waitlist

Setup steps | See product

macOS 13+ | No account required | Open source core
