New · Self-hosted AI

Run your own open-source LLM

Deploy a private AI server in minutes. It runs a leading open-source model on a dedicated NVIDIA GPU and exposes an OpenAI-compatible API secured by your own key — so every prompt, byte and reply stays on a server that's yours alone. One flat monthly price. No per-token bills.

Deploy an LLM server →

See the models

🔒

Your data stays yours

Prompts, documents and answers never leave your server — nothing is sent to a third-party AI provider. Ideal for private, sensitive or regulated work.

♾️

No per-token bills

One flat monthly price for the whole box. Run as many tokens as the hardware allows — no metering, no usage surprises.

🔌

OpenAI-compatible

A drop-in /v1/chat/completions endpoint. Point the OpenAI SDK (or any compatible client) at your server's URL and key; no other code changes are needed.

🧩

Pick any model

Choose from leading open-source models and swap anytime. It's your hardware — no quotas, no rate limits, no vendor lock-in.

⚡

Ready in minutes

We install the NVIDIA driver and the inference runtime, then pull your chosen model automatically. Deploy and it's serving.

💬

Chat & control anywhere

Add a built-in browser chat UI (Open WebUI), or connect your model to Caimunicate over AiCB — chat with it and send config commands (switch or pull models, set a system prompt) from anywhere.

The servers

GPU servers, sized for the model

The bigger the model, the more GPU memory (VRAM) it needs. Pick the tier that fits the model you want to run — resize anytime.

Server	GPU	VRAM	Runs models up to	RAM · Storage	From
GPU AI	NVIDIA T4	16 GB	~14B (e.g. Qwen2.5 14B)	16 GB · 225 GB	$1,198/mo
GPU A10	NVIDIA A10G	24 GB	~32B (e.g. Qwen2.5 32B)	16 GB · 300 GB	$2,298/mo
GPU L40S	NVIDIA L40S	48 GB	~70B (e.g. Llama 3.3 70B)	32 GB · 400 GB	$3,998/mo

Prices are in AUD at a flat monthly rate, billed only while the server runs. Any standard (CPU) plan can also run a small model for testing, though more slowly and limited to 7–8B models. See full pricing →
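To put the flat price in per-token terms, the arithmetic can be sketched as below. This is only an illustration: the monthly prices are the "From" figures in the table, the token volume is a hypothetical input rather than a measured throughput, and the sketch assumes the server runs for the whole month.

```python
# Effective cost per million tokens on a flat-priced server.
# Prices (AUD) are the "From" figures in the server table.
# The token volume is a hypothetical workload, not a benchmark result.

MONTHLY_PRICE_AUD = {
    "GPU AI": 1198,
    "GPU A10": 2298,
    "GPU L40S": 3998,
}


def cost_per_million_tokens(tier: str, tokens_per_month: int) -> float:
    """Return the effective AUD cost per one million tokens for a tier."""
    if tokens_per_month <= 0:
        raise ValueError("tokens_per_month must be positive")
    return MONTHLY_PRICE_AUD[tier] / (tokens_per_month / 1_000_000)


if __name__ == "__main__":
    # Hypothetical workload: 500 million tokens in one month on GPU AI.
    print(round(cost_per_million_tokens("GPU AI", 500_000_000), 2))
```

Because the price is fixed, the effective rate falls as usage rises; the function makes that relationship explicit for whatever volume you expect.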

The models

Leading open-source LLMs, one click

Choose a model when you deploy and we pull it for you. Min VRAM is the approximate GPU memory needed to run the model comfortably with 4-bit quantization; match it to a server tier above.

Model	Size	Min VRAM	Licence	Best for
Qwen2.5	7B	~8 GB	Apache-2.0	Great default — fast, multilingual, strong at following instructions
Qwen2.5	14B	~12 GB	Apache-2.0	More reasoning headroom, still mid-GPU friendly
Qwen2.5	32B	~24 GB	Apache-2.0	Near-flagship quality (needs a 24 GB GPU)
Qwen2.5 Coder	7B	~8 GB	Apache-2.0	Code generation, completion and review
Llama 3.1	8B	~8 GB	Llama 3.1 Community	Meta's popular general model with a huge ecosystem
Llama 3.3	70B	~44 GB	Llama 3.3 Community	Flagship-class open model (needs a 48 GB GPU)
Gemma 2	9B	~10 GB	Gemma	Google's efficient, high-quality small model
Mistral	7B	~8 GB	Apache-2.0	Fast and lean — a proven workhorse
DeepSeek-R1	14B	~12 GB	MIT	Reasoning model with visible chain-of-thought

Models are served with Ollama and refreshed to current releases over time. Need a specific model? Ask us →
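Matching a model to a tier is a simple lookup: take the model's Min VRAM and pick the smallest server whose VRAM covers it. A minimal sketch, using only the figures from the two tables on this page (the function and variable names are illustrative, not part of any product API):

```python
from typing import Optional

# (tier name, VRAM in GB), copied from the server table, smallest first.
TIERS = [
    ("GPU AI", 16),
    ("GPU A10", 24),
    ("GPU L40S", 48),
]

# Approximate Min VRAM in GB (4-bit), copied from the model table.
MIN_VRAM_GB = {
    "Qwen2.5 7B": 8,
    "Qwen2.5 14B": 12,
    "Qwen2.5 32B": 24,
    "Qwen2.5 Coder 7B": 8,
    "Llama 3.1 8B": 8,
    "Llama 3.3 70B": 44,
    "Gemma 2 9B": 10,
    "Mistral 7B": 8,
    "DeepSeek-R1 14B": 12,
}


def smallest_tier_for(model: str) -> Optional[str]:
    """Return the smallest tier whose VRAM covers the model, or None."""
    needed = MIN_VRAM_GB[model]
    for name, vram_gb in TIERS:
        if vram_gb >= needed:
            return name
    return None


if __name__ == "__main__":
    for model in MIN_VRAM_GB:
        print(model, "->", smallest_tier_for(model))
```

The Min VRAM figures are approximate, so a model that lands exactly on a tier's limit (Qwen2.5 32B on 24 GB) leaves little headroom for long contexts.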

Use it anywhere

Talk to it like OpenAI

Your server exposes a standard OpenAI-style API on its own HTTPS address, protected by a bearer key we generate for you. Point any OpenAI client at it — just change the base URL and key.

curl https://your-server/v1/chat/completions \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5:14b","messages":[{"role":"user","content":"Hello!"}]}'

The endpoint URL and key appear on your server's 🧠 LLM panel in the dashboard. Connect a domain for a trusted TLS certificate, or use the built-in web chat UI.
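The same request can be made from Python with nothing but the standard library. This is a sketch under the same placeholders as the curl example (`https://your-server` and `your-key`); the helper names `build_chat_request` and `send` are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://your-server"  # placeholder, as in the curl example
API_KEY = "your-key"              # placeholder bearer key


def build_chat_request(base_url, api_key, model, user_message):
    """Build (but do not send) a POST request to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_chat_request(BASE_URL, API_KEY, "qwen2.5:14b", "Hello!")
    print(req.full_url)
    # Call send(req) once BASE_URL and API_KEY hold your real values.
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers and body before any traffic leaves your machine.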

How it works

From signup to your own AI in four steps

Pick a GPU plan

Choose the tier that fits the model you want (GPU AI, GPU A10 or GPU L40S).

Choose the LLM image

Select the LLM Server image and a model — optionally add the web chat UI.

Deploy

We install the GPU driver & runtime and pull your model. It's serving in minutes.

Use it

Grab the URL & key from the dashboard and call the OpenAI-compatible API — or chat in the browser.
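Once the API answers, the reply text sits inside the response JSON. A small sketch of pulling it out, assuming the server returns the standard OpenAI chat-completions shape; the sample body below is invented for illustration and is not real model output.

```python
import json

# Illustrative response in the OpenAI chat-completions shape.
# The assistant text here is made up for the example.
SAMPLE_RESPONSE = """
{
  "object": "chat.completion",
  "model": "qwen2.5:14b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ]
}
"""


def extract_reply(response_body: str) -> str:
    """Return the assistant's text from a chat-completions response."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


if __name__ == "__main__":
    print(extract_reply(SAMPLE_RESPONSE))
```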

Your private AI, on your own server

Deploy an open-source LLM on a dedicated GPU in minutes — your data never leaves the box, and there are no per-token bills.

Deploy an LLM server →