Run your own open-source LLM
Deploy a private AI server in minutes. It runs a leading open-source model on a dedicated NVIDIA GPU and exposes an OpenAI-compatible API secured by your own key — so every prompt, byte and reply stays on a server that's yours alone. One flat monthly price. No per-token bills.
Your data stays yours
Prompts, documents and answers never leave your server — nothing is sent to a third-party AI provider. Ideal for private, sensitive or regulated work.
No per-token bills
One flat monthly price for the whole box. Generate as many tokens as the hardware allows, with no metering and no usage surprises.
OpenAI-compatible
A drop-in /v1/chat/completions endpoint. Point the OpenAI SDK (or any compatible client) at your server's URL and key; no other code changes are needed.
Pick any model
Choose from leading open-source models and swap anytime. It's your hardware — no quotas, no rate limits, no vendor lock-in.
Ready in minutes
We install the NVIDIA driver and the inference runtime, then pull your chosen model automatically. Deploy and it's serving.
Chat & control anywhere
Add a built-in browser chat UI (Open WebUI), or connect your model to Caimunicate over AiCB — chat with it and send config commands (switch or pull models, set a system prompt) from anywhere.
GPU servers, sized for the model
The bigger the model, the more GPU memory (VRAM) it needs. Pick the tier that fits the model you want to run — resize anytime.
| Server | GPU | VRAM | Runs models up to | RAM · Storage | From |
|---|---|---|---|---|---|
| GPU AI | NVIDIA T4 | 16 GB | ~14B (e.g. Qwen2.5 14B) | 16 GB · 225 GB | $1,198/mo |
| GPU A10 | NVIDIA A10G | 24 GB | ~32B (e.g. Qwen2.5 32B) | 16 GB · 300 GB | $2,298/mo |
| GPU L40S | NVIDIA L40S | 48 GB | ~70B (e.g. Llama 3.3 70B) | 32 GB · 400 GB | $3,998/mo |
Prices are in AUD at a flat monthly rate, billed only while the server runs. See full pricing →. Any standard (CPU) plan can also run a small model for testing, but it is slower and limited to 7–8B models.
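Why does a bigger model need more VRAM? A rough sketch of the arithmetic, assuming 4-bit quantized weights (half a byte per parameter). The helper name `weight_gb` is hypothetical, and real quantized formats usually use slightly more than 4 bits per weight, so treat the result as a lower bound rather than a sizing guarantee.

```python
def weight_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at `bits_per_weight` bits.
    The KV cache and runtime overhead are NOT included.
    """
    return params_billion * bits_per_weight / 8


# Model sizes taken from the "Runs models up to" column above.
for label, size in [("14B", 14), ("32B", 32), ("70B", 70)]:
    print(f"{label}: about {weight_gb(size):.0f} GB of weights at 4-bit")
```

Actual VRAM use is higher than these figures because the context window's KV cache and the runtime itself also live on the GPU, which is why each tier leaves headroom above the weights alone.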
Leading open-source LLMs, one click
Choose a model when you deploy and we pull it for you. Min VRAM is the GPU memory needed to run the model comfortably with 4-bit quantization; match it to a server tier above.
| Model | Size | Min VRAM | Licence | Best for |
|---|---|---|---|---|
| Qwen2.5 | 7B | ~8 GB | Apache-2.0 | Great default — fast, multilingual, strong at following instructions |
| Qwen2.5 | 14B | ~12 GB | Apache-2.0 | More reasoning headroom, still mid-GPU friendly |
| Qwen2.5 | 32B | ~24 GB | Apache-2.0 | Near-flagship quality (needs a 24 GB GPU) |
| Qwen2.5 Coder | 7B | ~8 GB | Apache-2.0 | Code generation, completion and review |
| Llama 3.1 | 8B | ~8 GB | Llama 3.1 Community | Meta's popular general model with a huge ecosystem |
| Llama 3.3 | 70B | ~44 GB | Llama 3.3 Community | Flagship-class open model (needs a 48 GB GPU) |
| Gemma 2 | 9B | ~10 GB | Gemma | Google's efficient, high-quality small model |
| Mistral | 7B | ~8 GB | Apache-2.0 | Fast and lean — a proven workhorse |
| DeepSeek-R1 | 14B | ~12 GB | MIT | Reasoning model with visible chain-of-thought |
Models are served with Ollama and refreshed to current releases over time. Need a specific model? Ask us →
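Matching a model to a tier can be sketched in a few lines. The figures below are copied from the two tables on this page; the names `TIERS`, `MIN_VRAM_GB` and `smallest_tier` are hypothetical and exist only for illustration, not as part of any dashboard or API.

```python
from typing import Optional

# (tier name, VRAM in GB) from the server table, smallest first.
TIERS = [("GPU AI", 16), ("GPU A10", 24), ("GPU L40S", 48)]

# Approximate minimum VRAM in GB from the model table (4-bit).
MIN_VRAM_GB = {
    "Qwen2.5 7B": 8,
    "Qwen2.5 14B": 12,
    "Qwen2.5 32B": 24,
    "Qwen2.5 Coder 7B": 8,
    "Llama 3.1 8B": 8,
    "Llama 3.3 70B": 44,
    "Gemma 2 9B": 10,
    "Mistral 7B": 8,
    "DeepSeek-R1 14B": 12,
}


def smallest_tier(model: str) -> Optional[str]:
    """Return the smallest tier whose VRAM covers the model's Min VRAM."""
    need = MIN_VRAM_GB[model]
    for name, vram in TIERS:
        if vram >= need:
            return name
    return None  # no listed tier is large enough


for model in MIN_VRAM_GB:
    print(f"{model}: {smallest_tier(model)}")
```

This picks the cheapest tier that fits. A larger tier may still be worth it for long contexts or several concurrent users, since both increase memory use beyond the table's minimum.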
Talk to it like OpenAI
Your server exposes a standard OpenAI-style API on its own HTTPS address, protected by a bearer key we generate for you. Point any OpenAI client at it — just change the base URL and key.
curl https://your-server/v1/chat/completions \
-H "Authorization: Bearer your-key" \
-H "Content-Type: application/json" \
-d '{"model":"qwen2.5:14b","messages":[{"role":"user","content":"Hello!"}]}'
The endpoint URL and key appear on your server's 🧠 LLM panel in the dashboard. Connect a domain for a trusted TLS certificate, or use the built-in web chat UI.
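The same request can be made from Python. This is a minimal sketch using only the standard library, assuming the server returns the usual OpenAI chat-completions response shape (`choices[0].message.content`). The function names are hypothetical, and `https://your-server` and `your-key` are the same placeholders as in the curl example.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(base_url: str, api_key: str, model: str, prompt: str,
         timeout: float = 60.0) -> str:
    """Send one user message and return the assistant's reply text.

    Assumption: the response follows the OpenAI chat-completions format.
    """
    request = build_chat_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the URL and key from the dashboard, a call would look like `chat("https://your-server", "your-key", "qwen2.5:14b", "Hello!")`. If you prefer the official `openai` Python package, pass the server address (ending in `/v1`) as `base_url` and your key as `api_key` when constructing the client.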
From signup to your own AI in four steps
Pick a GPU plan
Choose the tier that fits the model you want (GPU AI, GPU A10 or GPU L40S).
Choose the LLM image
Select the LLM Server image and a model — optionally add the web chat UI.
Deploy
We install the GPU driver & runtime and pull your model. It's serving in minutes.
Use it
Grab the URL & key from the dashboard and call the OpenAI-compatible API — or chat in the browser.
Your private AI, on your own server
Deploy an open-source LLM on a dedicated GPU in minutes — your data never leaves the box, and there are no per-token bills.
Deploy an LLM server →