How to Use the AI Proxy (BYOK) API

The BYOK (Bring Your Own Key) Proxy routes requests through Carpathian to your own Anthropic or OpenAI account. Carpathian adds usage tracking, rate limiting, content filtering, an IP firewall, model allowlists, and per-request logging on top of your existing provider account.

Setup

In the dashboard, go to AI → Proxy → New Proxy Key. Pick a provider (anthropic, openai, or other) and paste your provider's API key. Carpathian stores the provider key encrypted at rest and gives you a Carpathian proxy key that starts with cpx_. The cpx_ key is shown once on creation. If you lose it, rotate from the proxy detail page.

In your application, replace your provider's base URL with https://api.carpathian.ai/ai/proxy and replace your provider key with the cpx_ key. Nothing else changes — same model names, same request body, same response shape.

Endpoints

The proxy exposes two upstream-compatible endpoints. The endpoint you call must match the provider configured on the key.

For Anthropic, send POST requests to /ai/proxy/v1/messages. The proxy forwards to https://api.anthropic.com/v1/messages with x-api-key set from your stored key, and adds the anthropic-version: 2023-06-01 header automatically.

For OpenAI (and other, which is treated as OpenAI-compatible), send POST requests to /ai/proxy/v1/chat/completions. /ai/proxy/chat/completions is an alias. The proxy forwards to https://api.openai.com/v1/chat/completions with the Authorization: Bearer header set from your stored key.
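The provider-to-endpoint rule above can be sketched as a small lookup. The helper name endpoint_for is hypothetical and not part of any Carpathian SDK; the base URL and paths are the ones listed in this section.

```python
# Sketch: choose the proxy endpoint that matches the provider configured on the key.
PROXY_BASE = "https://api.carpathian.ai/ai/proxy"

ENDPOINT_PATHS = {
    "anthropic": "/v1/messages",
    "openai": "/v1/chat/completions",
    "other": "/v1/chat/completions",  # "other" is treated as OpenAI-compatible
}


def endpoint_for(provider: str) -> str:
    """Return the full proxy URL for a provider (hypothetical helper)."""
    try:
        return PROXY_BASE + ENDPOINT_PATHS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
```

Calling endpoint_for("anthropic") yields the /ai/proxy/v1/messages URL; "openai" and "other" both resolve to the chat completions URL.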

Authentication

Send Authorization: Bearer cpx_your_proxy_key and Content-Type: application/json. The proxy translates this into the upstream provider's auth header using the provider key you stored. Your provider key never leaves Carpathian's backend.
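For callers that do not use a provider SDK, the two headers can be set by hand. The following is a minimal sketch using only the Python standard library; build_proxy_request is a hypothetical helper name, and the request is built but not sent.

```python
import json
import urllib.request


def build_proxy_request(url: str, proxy_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the cpx_ key."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {proxy_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_proxy_request(
    "https://api.carpathian.ai/ai/proxy/v1/chat/completions",
    "cpx_your_proxy_key",
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)
# urllib.request.urlopen(req) would send the request; it is omitted here.
```

Only the cpx_ key appears in client code. The provider key is attached by the proxy on the upstream hop.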

Request and response format

The proxy is designed to be transparent. The request body you send is forwarded to the provider after content-filter inspection, with one modification: text content in messages is HTML-escaped, as described under Content filtering. The provider's response is returned unchanged. Anything the provider supports (tool use, vision, JSON mode, structured outputs, prompt caching, multi-turn, system prompts) works without configuration.

A minimal Anthropic request:

curl https://api.carpathian.ai/ai/proxy/v1/messages \
  -H "Authorization: Bearer cpx_your_proxy_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":256,"messages":[{"role":"user","content":"Hello"}]}'

A minimal OpenAI SDK migration:

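from openai import OpenAI

# The SDK appends /chat/completions to base_url, so this call reaches the
# /ai/proxy/chat/completions alias described under Endpoints.
client = OpenAI(base_url="https://api.carpathian.ai/ai/proxy", api_key="cpx_your_proxy_key")
resp = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}])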

The Anthropic Python SDK takes the same change — anthropic.Anthropic(base_url=..., api_key="cpx_..."). The TypeScript SDKs use baseURL and apiKey.

Streaming

Set "stream": true in the request body. The proxy opens an SSE connection to the provider and forwards each chunk through without buffering. All security checks (auth, rate limit, IP allowlist, firewall, content filter, budgets) run before the upstream stream is opened.

Token counts on streamed requests are recorded as zero in usage logs, because the streamed response body is byte-passed and not parsed. The request still counts toward rate limits.
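Because chunks are forwarded as-is, a client reads the stream the same way it would read the provider directly. The sketch below parses an OpenAI-style stream, where each event is a "data:" line holding JSON and the stream ends with "data: [DONE]". The function name iter_sse_json is hypothetical, the sample lines are illustrative rather than captured output, and multi-line SSE events are not handled.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE 'data:' lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and 'event:' lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
            break
        yield json.loads(payload)


# Illustrative input in the OpenAI chat-completions streaming shape.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "") for chunk in iter_sse_json(sample)
)
```

Since the proxy records zero tokens for streamed requests, a client that needs usage figures for streams would have to track them on its own side.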

Token budgets

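Carpathian doesn't bill for proxy traffic; your provider does. The proxy records token counts (extracted from the upstream response) in usage logs so you can see what you're spending. Streamed requests are the exception: their token counts are recorded as zero, as noted under Streaming.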

You can cap proxy spend with a daily token limit and a monthly token limit, set on the proxy key detail page. Daily resets at midnight server time; monthly resets at the start of the calendar month. When a budget is hit, requests return 429 with code DAILY_BUDGET_EXCEEDED or MONTHLY_BUDGET_EXCEEDED. Both limits are optional — leave blank for no cap.
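To estimate when a budget-blocked key will work again, the two reset rules can be written out as follows. This is a sketch under one loud assumption: the datetime passed in is already expressed in the server's time zone, which this document does not specify. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def next_daily_reset(now: datetime) -> datetime:
    """Next midnight after `now` (assumes `now` is in server time)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def next_monthly_reset(now: datetime) -> datetime:
    """First instant of the next calendar month (assumes server time)."""
    first = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if first.month == 12:
        return first.replace(year=first.year + 1, month=1)
    return first.replace(month=first.month + 1)
```

A client that receives DAILY_BUDGET_EXCEEDED could pause until next_daily_reset(now) instead of retrying in a loop.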

Rate limiting

The default limit is 60 requests per minute per key, configurable from 1 to 10,000 RPM on the proxy key detail page. The limit is implemented as a Redis sliding window over the last 60 seconds. When it is exceeded, the proxy returns 429 with code RATE_LIMITED.
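To illustrate what a sliding window over the last 60 seconds means, here is an in-memory sketch. It is not Carpathian's Redis implementation; the class name is hypothetical, and it assumes rejected requests are not added to the window, which this document does not state.

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory illustration of a sliding-window request limit."""

    def __init__(self, limit: int = 60, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have slid out of the window.
        cutoff = now - self.window
        while self.hits and self.hits[0] <= cutoff:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False  # the proxy would answer 429 RATE_LIMITED here
        self.hits.append(now)
        return True
```

Unlike a fixed per-minute counter, capacity frees up gradually as each earlier request ages past 60 seconds, so there is no burst allowance at minute boundaries.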

IP allowlist

The IP allowlist restricts which IPs can use the key. It is stored as a JSON list of IPs and CIDR ranges, for example ["203.0.113.10", "198.51.100.0/24"]. When set, requests from IPs not on the list are rejected with 403 and the attempt is logged as a security event. Each rejection counts toward auto-lock.
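Before saving an allowlist, it can help to check locally which addresses it would admit. The sketch below uses Python's ipaddress module; ip_allowed is a hypothetical helper, and the proxy's own matching logic may differ in detail.

```python
import ipaddress
import json


def ip_allowed(client_ip: str, allowlist_json: str) -> bool:
    """Return True if client_ip matches any IP or CIDR range in the JSON list."""
    addr = ipaddress.ip_address(client_ip)
    for entry in json.loads(allowlist_json):
        # A bare IP parses as a single-address network (/32 for IPv4).
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False


allowlist = '["203.0.113.10", "198.51.100.0/24"]'
```

With the example list, 203.0.113.10 and any address in 198.51.100.0/24 pass, while an address such as 192.0.2.1 does not. Testing your own egress IPs this way before enabling the allowlist avoids rejections that count toward auto-lock.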

Geo-aware firewall

Independent from the IP allowlist. When enabled, the firewall tracks every IP that hits the key with country, city, and region from a GeoIP lookup. In monitor mode, new IPs are recorded but allowed. In enforce mode, new IPs are blocked with 403 FIREWALL_BLOCKED until you approve them from the Known IPs tab on the proxy key detail page.
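The difference between the two modes can be summarized as a decision function. This is only a sketch of the behavior described above; the function name, the mode strings, and the use of a set for approved IPs are assumptions made for illustration.

```python
def firewall_decision(mode: str, client_ip: str, approved_ips: set) -> str:
    """Return "allow" or "block" for a request, per the described firewall modes."""
    if client_ip in approved_ips:
        return "allow"  # already approved on the Known IPs tab
    if mode == "monitor":
        return "allow"  # new IP is recorded but still allowed
    if mode == "enforce":
        return "block"  # 403 FIREWALL_BLOCKED until approved
    raise ValueError(f"unknown firewall mode: {mode!r}")
```

Running in monitor mode first lets the Known IPs list fill with your real traffic sources before enforce mode starts blocking unapproved ones.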

Model allowlist

A comma-separated list of model identifiers the key is allowed to call. When set, requests for any model not on the list return 403 with code MODEL_NOT_ALLOWED. Leave empty to allow any model the provider supports.
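A client can mirror the allowlist check to fail fast before sending a request. The sketch assumes exact, case-sensitive matching and tolerates spaces around commas; neither detail is specified in this document, and model_allowed is a hypothetical name.

```python
def model_allowed(model: str, allowlist: str) -> bool:
    """Check a model name against a comma-separated allowlist string."""
    allowed = {name.strip() for name in allowlist.split(",") if name.strip()}
    # An empty allowlist means any model the provider supports is allowed.
    return not allowed or model in allowed
```

For example, with the allowlist "gpt-4o, gpt-4o-mini", a request for gpt-4o passes and a request for any other model would be expected to return MODEL_NOT_ALLOWED.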

Content filtering

Enabled by default. The proxy scans every incoming request for two pattern groups before forwarding to the provider.

When a request matches, the proxy returns 400 with a content_policy_violation error and logs a high-severity security event. The request is not forwarded.

In addition to pattern matching, the proxy HTML-escapes all text content in messages before forwarding. This is invisible for natural-language prompts but affects code samples that contain literal HTML, XML, or JSX.
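To preview how a code-bearing prompt might look to the model after escaping, Python's html.escape gives a rough approximation. This assumes the proxy escapes at least &, <, and >; whether it also escapes quote characters is not stated here, so the sketch leaves quotes alone.

```python
import html

prompt = "Fix this JSX: <Button onClick={go}>Save & exit</Button>"

# quote=False escapes only &, <, and >.
escaped = html.escape(prompt, quote=False)
print(escaped)
# Fix this JSX: &lt;Button onClick={go}&gt;Save &amp; exit&lt;/Button&gt;
```

If the model needs the literal markup, one option is to tell it in the prompt that angle brackets arrive as HTML entities.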

Auto-lock

If three IP-block events are logged on the same key within five minutes, the key is automatically locked. While locked, every request returns 403 and the key holds that state until manually unlocked. Org admins receive an email listing the offending IPs.

To unlock, go to the proxy key detail page, review the Security Events tab to confirm the blocks were not legitimate, fix the underlying allowlist or firewall configuration, and click Unlock. Unlocking requires 2FA. If the underlying configuration is still wrong when you unlock, the key will lock again on the next batch of failed requests.
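The locking rule (three IP-block events on one key within five minutes) can be sketched as a small state machine. This is an illustration, not Carpathian's implementation; the class name is hypothetical, and clearing the event history on unlock is an assumption.

```python
from collections import deque


class AutoLock:
    """Illustrative tracker for the auto-lock rule on a single key."""

    def __init__(self, threshold: int = 3, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()  # timestamps of IP-block events, oldest first
        self.locked = False

    def record_block(self, now: float) -> bool:
        """Record one IP-block event; return whether the key is now locked."""
        self.events.append(now)
        cutoff = now - self.window
        while self.events and self.events[0] < cutoff:
            self.events.popleft()
        if len(self.events) >= self.threshold:
            self.locked = True  # stays locked until unlock() is called
        return self.locked

    def unlock(self) -> None:
        """Manual unlock (in the dashboard this step requires 2FA)."""
        self.locked = False
        self.events.clear()  # assumption: history resets on unlock
```

The sketch also shows why a misconfigured allowlist locks the key again quickly after an unlock: three more rejected requests inside five minutes are enough.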

Two-factor authentication

Creating a proxy key, rotating it, deleting it, updating the stored provider key, and unlocking it after auto-lock all require 2FA on the user's account. Configure 2FA at Account → Security.

Status codes

A successful request returns the provider's status code (typically 200) and body unchanged.

Carpathian-originated errors:

400 content_filtered — the content filter blocked the request.
401 — missing, invalid, or revoked cpx_ key.
403 — IP not on the allowlist, or the key is locked.
403 FIREWALL_BLOCKED — geo-firewall blocked a new or unapproved IP.
403 MODEL_NOT_ALLOWED — requested model is not in the key's model allowlist.
429 RATE_LIMITED — sliding-window rate limit exceeded.
429 DAILY_BUDGET_EXCEEDED / MONTHLY_BUDGET_EXCEEDED — token cap reached.
502 — the proxy could not reach the provider, or the provider returned a non-JSON response. Most often this means the stored provider key is invalid or the provider is down.
504 — the provider did not respond within 300 seconds.
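One way a client might act on these errors is sketched below. The function takes the HTTP status and, where one applies, the Carpathian error code as plain arguments, because the JSON shape of the error body is not specified in this document. The function name and the returned labels are hypothetical, and provider errors that share a status code (for example a provider-side 401) cannot be told apart by status alone.

```python
from typing import Optional

BUDGET_CODES = {"DAILY_BUDGET_EXCEEDED", "MONTHLY_BUDGET_EXCEEDED"}


def classify(status: int, code: Optional[str] = None) -> str:
    """Map a proxy response to a coarse client action (illustrative labels)."""
    if 200 <= status < 300:
        return "ok"
    if code in BUDGET_CODES:
        return "wait_for_budget_reset"  # retrying before the reset will not help
    if code == "RATE_LIMITED" or status in (502, 504):
        return "retry_with_backoff"  # transient: window slides, upstream recovers
    if status in (400, 401, 403):
        return "do_not_retry"  # fix the request, key, IP, or model first
    return "inspect"  # likely a provider error passed through unchanged
```

Treating 403 as non-retryable matters here, since repeated requests from a blocked IP count toward auto-lock.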

Limits

Each organization may have at most 10 proxy keys. Rate limits are configurable from 1 to 10,000 RPM. The stored provider key may be up to 500 characters; the key name up to 255. Request timeout is 300 seconds. Auto-lock fires after 3 IP-block events on the same key within 5 minutes. The Anthropic API version sent upstream is 2023-06-01.

Logged data

Per-request logs record the proxy key, provider, model name, token counts, HTTP status, response time, IP address, and user agent. Request and response bodies are not stored — the proxy is a transparent governance layer, not a content archive. Security events (IP blocks, firewall events, lock and unlock events, content-filter violations) are recorded separately and visible on the Security Events tab.