How To Use Carpathian's AI API


The Carpathian AI API gives you OpenAI-compatible access to models that Carpathian hosts on its own infrastructure. Usage is billed by token from the allowance in your AI subscription, then from any token packs you've purchased.

Setup

In the dashboard, go to AI → Plans and choose a subscription if you don't have one. Each plan includes a monthly token allowance and a maximum model grade (lite, medium, or large). Then go to AI → Instances → New Instance, name the instance, and pick the model that will back it. Carpathian creates an API key starting with cai_ and shows it once, at creation time. If you lose it, rotate the key from the instance detail page.

In your application, set the base URL to https://api.carpathian.ai/ai/v1 and the API key to your cai_ key. The model name in the request body is ignored — Carpathian routes every request to the model the instance is bound to. Pass "default" if your client requires a value.

Endpoints

The API accepts POST requests at /ai/v1/chat/completions and several aliases: /ai/v1/completions, /ai/v1, /ai/chat/completions, and /ai/chat. All take the same OpenAI Chat Completions payload. Use whichever path your client expects.

Authentication

Send Authorization: Bearer cai_your_api_key and Content-Type: application/json. The cai_ key is bound to a single instance and inherits that instance's model, rate limit, IP allowlist, firewall configuration, and token budget.

Request and response format

The request body follows OpenAI Chat Completions: a messages array with role and content, plus optional temperature, max_tokens, and response_format. When the instance is backed by an Ollama model, response_format: {"type": "json_object"} is honored by setting Ollama's format=json upstream. For other backends, JSON-mode behavior depends on the model.
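
For example, a JSON-mode request body is a standard Chat Completions payload with one extra field. A sketch (the message contents are illustrative, and the model name is a placeholder since Carpathian routes to the instance's bound model regardless):

```python
import json

# Chat Completions payload requesting JSON mode.
payload = {
    "model": "default",
    "messages": [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "List three primary colors."},
    ],
    "response_format": {"type": "json_object"},
}

# Serialized body to send with Content-Type: application/json.
body = json.dumps(payload)
```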

The response always follows the standard OpenAI shape, with choices[].message.content and a usage object containing prompt_tokens, completion_tokens, and total_tokens. The model field in the response is the actual backing-model identifier, not the placeholder string you sent. Streaming is not supported — the full response is always returned at once.

A minimal request:

curl https://api.carpathian.ai/ai/v1/chat/completions \
  -H "Authorization: Bearer cai_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"Hello"}]}'

Migrating from the OpenAI Python SDK is a two-argument change to the client constructor:

from openai import OpenAI
client = OpenAI(base_url="https://api.carpathian.ai/ai/v1", api_key="cai_your_api_key")
resp = client.chat.completions.create(model="default", messages=[{"role":"user","content":"Hello"}])

The Node SDK takes the same change via its baseURL and apiKey options.

Multi-turn conversations

The API does not store conversation state. Keep the message history in your application and resend the full transcript on each request, with a leading {"role":"system","content":"..."} message when you want to set behavior. After receiving a response, append the assistant message to your local history before sending the next user turn.
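
A minimal sketch of that loop, assuming an OpenAI-compatible client like the one above (the function name and signature are illustrative, not part of the API):

```python
def ask(client, history, user_text, system_prompt=None):
    """Send the full transcript, then append both new turns to history.

    `client` is an OpenAI-compatible client; `history` is a list of
    {"role", "content"} dicts owned by the caller and resent in full
    on every request, since the API keeps no conversation state.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history)
    messages.append({"role": "user", "content": user_text})

    resp = client.chat.completions.create(model="default", messages=messages)
    answer = resp.choices[0].message.content

    # Persist both turns so the next call resends the full transcript.
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": answer})
    return answer
```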

Tokens and billing

For each successful request, Carpathian deducts ceil(total_tokens × cost_multiplier) from your token balance. The cost multiplier is set per model and is shown on the model card when you create or edit an instance. Lite-grade models are cheap; large-grade models are expensive.
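
The deduction can be sketched as a one-liner (the multiplier values below are illustrative, not Carpathian's published rates):

```python
import math

def tokens_charged(total_tokens: int, cost_multiplier: float) -> int:
    """Tokens deducted for a successful request: ceil(total × multiplier)."""
    return math.ceil(total_tokens * cost_multiplier)
```

So a 301-token response on a model with a 1.5× multiplier charges 452 tokens, because the fractional result rounds up.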

Your balance is split into two buckets. Subscription tokens come from your monthly plan allowance and reset at the start of each billing period. Pack tokens come from one-time token-pack purchases and carry over until used. Subscription tokens are spent first; pack tokens cover the overflow.
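
The spend order can be modeled like this (a client-side sketch of the stated semantics; the server is the authority):

```python
def deduct(subscription: int, pack: int, charge: int) -> tuple[int, int]:
    """Spend subscription tokens first, then pack tokens.

    Returns the new (subscription, pack) balances; raises if the
    combined balance can't cover the charge.
    """
    if charge > subscription + pack:
        raise ValueError("insufficient tokens")
    from_sub = min(subscription, charge)
    from_pack = charge - from_sub
    return subscription - from_sub, pack - from_pack
```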

If your balance is zero before a request, the API returns 402 with code INSUFFICIENT_TOKENS and the request is not forwarded to the model. Buy a token pack from AI → Plans → Token Packs to top up.

You can also set a per-instance token budget on the instance detail page. Crossing 80%, 50%, or 20% remaining sends an email alert to your org admins. At 0%, requests on that instance return 402 with code INSTANCE_BUDGET_EXCEEDED until the budget is raised.

Failed requests don't deduct tokens. Errors, rate limits, content blocks, and IP blocks are free.

Rate limiting

The default limit is 60 requests per minute per instance, configurable from 1 up to 10,000 RPM on the instance detail page. It is implemented as a Redis sliding window over the last 60 seconds; when the limit is exceeded, the API returns 429 with code RATE_LIMITED.
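
Since there are no rate-limit headers to consult (see Status codes below), clients have to back off blindly on 429. One possible retry wrapper (a sketch; the attempt count and delays are arbitrary choices):

```python
import time

def send_with_retry(send, max_attempts=5, base_delay=1.0):
    """Retry a request on 429 with exponential backoff.

    `send` is any zero-argument callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # No Retry-After header is available, so back off exponentially.
        time.sleep(base_delay * (2 ** attempt))
    return status, body
```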

IP allowlist

The allowlist restricts which IPs can use the instance's key. It is stored as a JSON list of IPs and CIDR ranges. When set, requests from IPs not on the list are rejected with 403; each rejection is logged as a security event and counts toward auto-lock.
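
The matching semantics can be mirrored client-side with Python's ipaddress module, for example to sanity-check an allowlist before saving it (a sketch; the server-side check is the authority):

```python
import ipaddress

def ip_allowed(ip: str, allowlist: list[str]) -> bool:
    """Check an IP against a list of plain IPs and CIDR ranges.

    An empty list means no restriction; a plain IP entry is treated
    as a single-host network.
    """
    if not allowlist:
        return True
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in allowlist
    )
```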

Geo-aware firewall

Independent from the IP allowlist. When enabled, the firewall tracks every IP that hits the instance with country, city, and region from a GeoIP lookup. In monitor mode, new IPs are recorded but allowed. In enforce mode, new IPs are blocked with 403 FIREWALL_BLOCKED until you approve them from the Known IPs tab on the instance detail page.

Content filtering

Enabled by default on every instance. The API scans and escapes text on every incoming request before forwarding to the model.

Auto-lock

If three IP-block events are logged on the same instance within five minutes, the instance is automatically locked. While locked, every request returns 403 and the instance holds that state until manually unlocked. Org admins receive an email listing the offending IPs.

To unlock, go to the instance detail page, review the Security Events tab to confirm the blocks were not legitimate, fix the underlying allowlist or firewall configuration, and click Unlock. Unlocking requires 2FA. If the underlying configuration is still wrong when you unlock, the instance will lock again on the next batch of failed requests.

Two-factor authentication

Creating an instance, rotating its key, deleting it, and unlocking it after auto-lock all require 2FA on the user's account. Configure 2FA at Account → Security.

Status codes

A successful request returns 200 with the standard OpenAI body shape and a usage block.

Carpathian-originated errors:

  • 400 content_filtered — the content filter blocked the request.
  • 401 — missing, invalid, or revoked cai_ key.
  • 402 INSUFFICIENT_TOKENS — token balance is zero.
  • 402 INSTANCE_BUDGET_EXCEEDED — the instance's per-instance budget cap is reached.
  • 403 — IP not on the allowlist, or the instance is locked.
  • 403 FIREWALL_BLOCKED — geo-firewall blocked a new or unapproved IP.
  • 429 RATE_LIMITED — sliding-window rate limit exceeded.
  • 502 — the model backend errored or returned a non-JSON response. The body includes upstream_status.
  • 503 — the model is disabled or unavailable.
  • 504 — the model did not respond within 120 seconds.

Most Carpathian errors return a body of the form {"error": "...", "code": "..."}. Content-filter errors return the OpenAI-compatible nested shape {"error": {"message": "...", "type": "content_policy_violation", "code": "content_filtered"}}. There are no rate-limit response headers.

Limits

Rate limits are configurable from 1 to 10,000 RPM. Instance names may be up to 255 characters. The request timeout is 120 seconds. Auto-lock fires after 3 blocked IPs within 5 minutes.

Logged data

Per-request logs record the instance, model, token counts (input, output, total, charged), HTTP status, response time, IP address, user agent, and the encrypted request and response bodies. Security events (IP blocks, firewall events, lock and unlock events, content-filter violations) are recorded separately and visible on the Security Events tab.