
How to use LiteLLM for inference

Running inference

Send a chat completion request to the LiteLLM proxy using the OpenAI-compatible API:

curl -s https://litellm.safe.ai/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-name>",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }' | jq

Set LITELLM_API_KEY to your API key (it starts with sk), or replace $LITELLM_API_KEY with the key directly. Replace <model-name> with any model available on the proxy. To list the available models, run:

curl -s https://litellm.safe.ai/v1/models \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  | jq '.data[].id'
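If you call the proxy often, it can help to wrap the request in a shell function. The sketch below is one way to do that. It is not an official CAIS or LiteLLM tool. The function names build_payload, extract_reply and chat are hypothetical. It assumes curl and jq are installed and LITELLM_API_KEY is exported. It also assumes the proxy returns the standard OpenAI chat completion shape, with the reply text at .choices[0].message.content. Building the request body with jq -n --arg escapes quotes and newlines in the prompt, which a hand-written -d '...' string does not.

```shell
# Hypothetical helpers (not part of LiteLLM); paste into your shell or source from a file.
# Assumes curl and jq are installed and LITELLM_API_KEY is exported.

LITELLM_URL="https://litellm.safe.ai"

# Build a chat completion request body; jq escapes any quotes or newlines in the prompt.
build_payload() {
  local model="$1" prompt="$2"
  jq -n --arg model "$model" --arg prompt "$prompt" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}

# Read an OpenAI-style chat completion response on stdin and print only the reply text
# (assumes the reply is at .choices[0].message.content).
extract_reply() {
  jq -r '.choices[0].message.content'
}

# Send one prompt to the proxy and print the model's reply.
# Usage: chat <model-name> "your prompt"
chat() {
  build_payload "$1" "$2" \
    | curl -fsS "$LITELLM_URL/v1/chat/completions" \
        -H "Authorization: Bearer $LITELLM_API_KEY" \
        -H "Content-Type: application/json" \
        --data-binary @- \
    | extract_reply
}
```

For example, chat <model-name> "Say hello in one sentence." should print just the model's answer instead of the full JSON response.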

Checking your API budget

The shared CAIS LLM proxy at litellm.safe.ai also reports your current budget and usage, which you can check at any time with your API key.

curl -fsSL https://litellm.safe.ai/usage/self/budget \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  | jq

As with inference requests, set LITELLM_API_KEY to your API key (it starts with sk), or replace $LITELLM_API_KEY with the key directly.

The response shows your current spend, your budget limit, how often the budget resets, and when the next reset happens. For example:

{
  "spend": 1.23,
  "max_budget": 50.0,
  "budget_duration": "monthly",
  "budget_reset_at": "2026-06-01T00:00:00Z"
}
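To see how much budget remains without doing the subtraction yourself, you can compute max_budget minus spend with jq. This is a minimal sketch assuming the response has the fields shown above. It also assumes max_budget may be null when no limit is set. That is an assumption, since the example above always shows a number. The function names remaining_budget and check_budget are hypothetical. check_budget assumes LITELLM_API_KEY is set as in the inference example.

```shell
# Hypothetical helpers; assumes curl and jq are installed.

# Read the /usage/self/budget JSON on stdin and print the remaining budget.
# Prints "unlimited" if max_budget is null (assumed to mean no limit is set).
remaining_budget() {
  jq -r 'if .max_budget == null then "unlimited" else .max_budget - .spend end'
}

# Fetch your budget from the proxy and print what is left.
check_budget() {
  curl -fsSL https://litellm.safe.ai/usage/self/budget \
    -H "Authorization: Bearer $LITELLM_API_KEY" \
    | remaining_budget
}
```

With the example response above, remaining_budget prints roughly 48.77. jq may show extra decimal places because the subtraction is done in floating point.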

Tip

If you don't have jq installed, leave off the final | jq part of each command. curl still prints the raw, unformatted JSON response to your terminal.