New · Self-hosted AI

Run your own open-source LLM

Deploy a private AI server in minutes. It runs a leading open-source model on a dedicated NVIDIA GPU and exposes an OpenAI-compatible API secured by your own key — so every prompt, byte and reply stays on a server that's yours alone. One flat monthly price. No per-token bills.

🔒

Your data stays yours

Prompts, documents and answers never leave your server — nothing is sent to a third-party AI provider. Ideal for private, sensitive or regulated work.

♾️

No per-token bills

One flat monthly price for the whole box. Run as many tokens as the hardware allows — no metering, no usage surprises.

🔌

OpenAI-compatible

A drop-in /v1/chat/completions endpoint. Point the OpenAI SDK (or any compatible client) at your server's URL and key; nothing else in your code has to change.

🧩

Pick any model

Choose from leading open-source models and swap anytime. It's your hardware — no quotas, no rate limits, no vendor lock-in.

Ready in minutes

We install the NVIDIA driver and the inference runtime, then pull your chosen model automatically. Deploy and it's serving.

💬

Chat & control anywhere

Add a built-in browser chat UI (Open WebUI), or connect your model to Caimunicate over AiCB — chat with it and send config commands (switch or pull models, set a system prompt) from anywhere.

The servers

GPU servers, sized for the model

The bigger the model, the more GPU memory (VRAM) it needs. Pick the tier that fits the model you want to run — resize anytime.

Server | GPU | VRAM | Runs models up to | RAM · Storage | From
GPU AI | NVIDIA T4 | 16 GB | ~14B (e.g. Qwen2.5 14B) | 16 GB · 225 GB | $1,198/mo
GPU A10 | NVIDIA A10G | 24 GB | ~32B (e.g. Qwen2.5 32B) | 16 GB · 300 GB | $2,298/mo
GPU L40S | NVIDIA L40S | 48 GB | 70B (e.g. Llama 3.3 70B) | 32 GB · 400 GB | $3,998/mo

Prices in AUD, flat monthly, billed only while the server runs. See full pricing →. Any standard (CPU) plan can also run a small model for testing — slower, and 7–8B models only.
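As a rough illustration of matching a model to a tier, the sketch below picks the smallest server whose VRAM covers a model's Min VRAM figure. The tier names and VRAM sizes are copied from the table above; the function name `smallest_tier_for` is hypothetical and is not part of any dashboard or API.

```python
from typing import Optional

# Tier names and VRAM sizes (GB) copied from the server table above,
# ordered from smallest to largest.
TIERS = [
    ("GPU AI", 16),
    ("GPU A10", 24),
    ("GPU L40S", 48),
]


def smallest_tier_for(min_vram_gb: float) -> Optional[str]:
    """Return the smallest tier whose VRAM covers the requirement, or None."""
    for name, vram_gb in TIERS:
        if vram_gb >= min_vram_gb:
            return name
    return None  # no listed tier has enough VRAM


if __name__ == "__main__":
    # Min VRAM values (GB) taken from the models table on this page.
    for model, need_gb in [("Qwen2.5 14B", 12), ("Qwen2.5 32B", 24), ("Llama 3.3 70B", 44)]:
        print(f"{model}: {smallest_tier_for(need_gb)}")
```

This only compares VRAM. It ignores system RAM, storage and price, which may also matter for your workload.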

The models

Leading open-source LLMs, one click

Choose a model when you deploy and we pull it for you. Min VRAM is the GPU memory needed to run the model comfortably with 4-bit quantization; match it to a server tier above.

Model | Size | Min VRAM | Licence | Best for
Qwen2.5 | 7B | ~8 GB | Apache-2.0 | Great default: fast, multilingual, strong at following instructions
Qwen2.5 | 14B | ~12 GB | Apache-2.0 | More reasoning headroom, still mid-GPU friendly
Qwen2.5 | 32B | ~24 GB | Apache-2.0 | Near-flagship quality (needs a 24 GB GPU)
Qwen2.5 Coder | 7B | ~8 GB | Apache-2.0 | Code generation, completion and review
Llama 3.1 | 8B | ~8 GB | Llama 3.1 Community | Meta's popular general model with a huge ecosystem
Llama 3.3 | 70B | ~44 GB | Llama 3.3 Community | Flagship-class open model (needs a 48 GB GPU)
Gemma 2 | 9B | ~10 GB | Gemma | Google's efficient, high-quality small model
Mistral | 7B | ~8 GB | Apache-2.0 | Fast and lean, a proven workhorse
DeepSeek-R1 | 14B | ~12 GB | MIT | Reasoning model with visible chain-of-thought

Models are served with Ollama and refreshed to current releases over time. Need a specific model? Ask us →
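A rough way to sanity-check the Min VRAM column: at 4 bits per weight, the weights alone take about half a byte per parameter. The sketch below assumes exactly 4.0 bits per weight (real 4-bit formats often use slightly more) and treats the remainder as headroom for the context cache and runtime overhead. The function names are illustrative, and the Min VRAM figures come from the table, not from this formula.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the weights alone, in decimal gigabytes.

    (params_billion * 1e9 weights) * bits / 8 bits per byte / 1e9 bytes per GB
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_weight / 8


def headroom_gb(params_billion: float, min_vram_gb: float) -> float:
    """VRAM left over after the weights, under the 4-bit assumption above."""
    return min_vram_gb - weights_gb(params_billion)


# (model, parameters in billions, Min VRAM in GB) copied from the table above.
MODELS = [
    ("Qwen2.5 7B", 7, 8),
    ("Qwen2.5 14B", 14, 12),
    ("Qwen2.5 32B", 32, 24),
    ("Llama 3.1 8B", 8, 8),
    ("Llama 3.3 70B", 70, 44),
    ("Gemma 2 9B", 9, 10),
]

if __name__ == "__main__":
    for name, params, min_vram in MODELS:
        print(f"{name}: ~{weights_gb(params):.1f} GB weights, "
              f"{headroom_gb(params, min_vram):.1f} GB headroom")
```

Every listed Min VRAM figure sits above the weights-only estimate, which is consistent with the column describing memory to run the model comfortably rather than the bare minimum to load it.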

Use it anywhere

Talk to it like OpenAI

Your server exposes a standard OpenAI-style API on its own HTTPS address, protected by a bearer key we generate for you. Point any OpenAI client at it — just change the base URL and key.

curl https://your-server/v1/chat/completions \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5:14b","messages":[{"role":"user","content":"Hello!"}]}'

The endpoint URL and key appear on your server's 🧠 LLM panel in the dashboard. Connect a domain for a trusted TLS certificate, or use the built-in web chat UI.
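The same call can be sketched in Python with only the standard library. This mirrors the curl example above: the URL `https://your-server`, the key `your-key` and the model tag `qwen2.5:14b` are the page's own placeholders, and the helper names (`build_chat_request`, `reply_text`, `chat`) are made up for illustration. The response parsing assumes the standard OpenAI chat completion shape, with the reply at `choices[0].message.content`.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def reply_text(response_body: dict) -> str:
    """Extract the assistant's reply from an OpenAI-style response body."""
    return response_body["choices"][0]["message"]["content"]


def chat(base_url: str, api_key: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the reply text (performs a network call)."""
    request = build_chat_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return reply_text(json.load(response))


# Example call, using the placeholders from the curl snippet:
#   print(chat("https://your-server", "your-key", "qwen2.5:14b", "Hello!"))
```

If you already use the OpenAI Python SDK, the equivalent is to pass your server's URL and key as the client's `base_url` and `api_key` arguments. Check the dashboard for the exact base URL to use.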

How it works

From signup to your own AI in four steps

Pick a GPU plan

Choose the tier that fits the model you want (GPU AI, GPU A10 or GPU L40S).

Choose the LLM image

Select the LLM Server image and a model — optionally add the web chat UI.

Deploy

We install the GPU driver & runtime and pull your model. It's serving in minutes.

Use it

Grab the URL & key from the dashboard and call the OpenAI-compatible API — or chat in the browser.

Your private AI, on your own server

Deploy an open-source LLM on a dedicated GPU in minutes — your data never leaves the box, and there are no per-token bills.

Deploy an LLM server →