Free LLM APIs for Coeus

Looking for a free model or a generous free tier? This page tracks providers from mnfst/awesome-free-llm-apis plus a small number of manually reviewed additions that are good candidates for Coeus's OpenAI-compatible API provider.

Last reviewed in Coeus docs: 2026-04-20

Upstream source last updated: 2026-04-20

Free tiers, model availability, rate limits, and compatibility can change at any time. Always double-check the provider's current docs before relying on a free tier in production.

This page uses a committed local snapshot, not a live fetch at build time.

Source repository: mnfst/awesome-free-llm-apis
Source data file: data.json
Snapshot size in this docs page: 17 providers

How To Use These In Coeus

Open Settings → AI Providers.
Configure OpenAI-compatible API.
Paste the API URL for Coeus shown in the provider section below.
Add the provider's API key and a supported model name from that provider's list.

The important detail: the upstream project's base URL is not always the exact URL that Coeus should use. For a few providers, you need their OpenAI compatibility layer instead of their native API route.

For voice transcription, Groq is also available in Coeus under Settings → Integrations → Speech & Transcription. It is useful when local Whisper is too heavy for your machine. Groq has free API limits for getting started, and paid accounts can use higher limits. Check Groq's current rate limit page before using it for long recordings or frequent transcription.

Provider Index

Provider APIs

APIs run by the companies that train or fine-tune the models themselves.

Cohere 🇨🇦

Use the Coeus compatibility URL shown below

Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.

Free Tier Snapshot

Trial key with 1,000 API calls/month.

API URL For Coeus

https://api.cohere.ai/compatibility/v1

Upstream Base URL

https://api.cohere.com/v2

Notes

Use Cohere's compatibility URL for Coeus, not the native v2 endpoint.

Model Name	Context	Max Output	Modality	Rate Limit
Command A (111B)	256K	4K	Text	20 RPM
Command R+	128K	4K	Text	20 RPM
Command R	128K	4K	Text	20 RPM
Command R7B	128K	4K	Text	20 RPM
Embed 4	—	—	Embeddings (Text + Image)	2,000 inputs/min
Rerank 3.5	—	—	Reranking	10 RPM

Google Gemini 🇺🇸

Use the Coeus compatibility URL shown below

Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products.

Free Tier Snapshot

Free tier with rate limits and region restrictions.

API URL For Coeus

https://generativelanguage.googleapis.com/v1beta/openai

Upstream Base URL

https://generativelanguage.googleapis.com/v1beta

Notes

Use Gemini's OpenAI compatibility layer, not the raw Gemini REST base URL.

Model Name	Context	Max Output	Modality	Rate Limit
Gemini 2.5 Flash	1M	65K	Text + Image + Audio + Video	10 RPM, 250 RPD
Gemini 2.5 Flash-Lite	1M	65K	Text + Image + Audio + Video	15 RPM, 1,000 RPD

Mistral AI 🇫🇷

Likely works with the listed URL

Free "Experiment" plan, no credit card. ~1B tokens/month.

Free Tier Snapshot

Experiment plan with roughly 1B tokens/month.

API URL For Coeus

https://api.mistral.ai/v1

Upstream Base URL

https://api.mistral.ai/v1

Notes

Likely works with the listed URL. Coeus already calls out Mistral as a common OpenAI-compatible option.

Model Name	Context	Max Output	Modality	Rate Limit
Mistral Small 4	256K	256K	Text + Image + Code	~1 RPS, 500K TPM
Mistral Medium 3	128K	128K	Text	~1 RPS, 500K TPM
Mistral Large 3	256K	256K	Text	~1 RPS, 500K TPM
Mistral Nemo (12B)	128K	128K	Text	~1 RPS, 500K TPM
Codestral	256K	256K	Code	~1 RPS, 500K TPM
Pixtral Large	128K	128K	Text + Image	~1 RPS, 500K TPM

Z AI (Zhipu AI) 🇨🇳

Likely works with the listed URL

Permanent free models, no credit card required.

Free Tier Snapshot

Permanent free models with strict concurrency limits.

API URL For Coeus

https://open.bigmodel.cn/api/paas/v4

Upstream Base URL

https://open.bigmodel.cn/api/paas/v4

Notes

Likely works with the listed URL.

Model Name	Context	Max Output	Modality	Rate Limit
GLM-4.7-Flash	200K	128K	Text	1 concurrent request
GLM-4.5-Flash	128K	~8K	Text	1 concurrent request
GLM-4.6V-Flash	128K	~4K	Text + Image	1 concurrent request

Inference Providers

Platforms that host or route models from multiple sources.

AIHubMix 🇺🇸

Likely works with the listed URL

OpenAI-compatible router with a large live catalog. Public docs and the live models API expose many explicitly free LLM variants.

Free Tier Snapshot

The live models API currently exposes 20+ explicit free LLM entries, but the exact free subset changes.

API URL For Coeus

https://aihubmix.com/v1

Upstream Base URL

https://aihubmix.com/v1

Notes

Use the listed URL in Coeus. AIHubMix's public /api/v1/models endpoint returned 22 explicit free LLM IDs on April 20, 2026, so treat the count and free subset as live data rather than a fixed promise.

Model Name	Context	Max Output	Modality	Rate Limit
Gemini 3 Flash Preview (free)	1,048,576	65,536	Text + Image + Audio + Video	5 RPM, 250 RPD, 500K daily tokens
GPT 4.1 (free)	1,047,576	32,768	Text + Image	Not shown in public models API
GPT 4.1 Mini (free)	1,047,576	32,768	Text + Image	Not shown in public models API
GPT 4.1 Nano (free)	1,047,576	32,768	Text + Image	Not shown in public models API
GPT 4o (free)	1,047,576	32,768	Text + Image	Not shown in public models API
K2.6 Code Preview (free)	256,000	256,000	Text	5 RPM, 500 RPD, 1M daily tokens
Kimi For Coding (free)	256,000	256,000	Text	5 RPM, 500 RPD, 1M daily tokens
MiMo V2 Flash (free)	256,000	256,000	Text	Not shown in public models API
Step 3.5 Flash (free)	256,000	Not shown	Text + Image	5 RPM, 250 RPD, 500K daily tokens
Coding MiniMax M2.7 (free)	204,800	13,100	Text	5 RPM, 500 RPD, 1M daily tokens
Coding GLM 4.6 (free)	200,000	128,000	Text	5 RPM, 500 RPD, 1M daily tokens
+ more explicit free coding/router models	Varies	Varies	Mostly text	Varies

Cerebras 🇺🇸

Likely works with the listed URL

Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap.

Free Tier Snapshot

Advertised as 1M tokens/day, but free-tier context and RPM can be temporarily reduced per model.

API URL For Coeus

https://api.cerebras.ai/v1

Upstream Base URL

https://api.cerebras.ai/v1

Notes

Use the listed URL. Check Cerebras -> Limits in your account for exact quotas; on April 20, 2026, a personal free-tier dashboard showed llama3.1-8b at 8,192 context and qwen-3-235b-a22b-instruct-2507 at 65,536 context / 5 RPM, and Cerebras noted temporary free-tier reductions.

Model Name	Context	Max Output	Modality	Rate Limit
llama3.1-8b	8,192	8K	Text	30 RPM, 60K TPM, 14,400 RPD, 1M TPD
gpt-oss-120b	128K (8K on free)	8K	Text	30 RPM, 14,400 RPD, 1M TPD
qwen-3-235b-a22b-instruct-2507	65,536	8K	Text	5 RPM, 30K TPM, 14,400 RPD, 1M TPD
zai-glm-4.7	128K (8K on free)	8K	Text	10 RPM, 100 RPD, 1M TPD

Cloudflare Workers AI 🇺🇸

Use the Coeus compatibility URL shown below

10,000 Neurons/day free. 50+ models available on free tier.

Free Tier Snapshot

10,000 neurons/day on the free tier.

API URL For Coeus

https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1

Upstream Base URL

https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run

Notes

Use Cloudflare's OpenAI-compatible ai/v1 path, not the native ai/run path.

Model Name	Context	Max Output	Modality	Rate Limit
@cf/meta/llama-3.3-70b-instruct-fp8-fast	131K	Shared w/ context	Text	10K neurons/day (shared)
@cf/meta/llama-3.1-8b-instruct-fp8-fast	131K	Shared w/ context	Text	10K neurons/day (shared)
@cf/meta/llama-3.2-11b-vision-instruct	131K	Shared w/ context	Text + Vision	10K neurons/day (shared)
@cf/meta/llama-4-scout-17b-16e-instruct	Up to 10M	Shared w/ context	Multimodal	10K neurons/day (shared)
@cf/mistralai/mistral-small-3.1-24b-instruct	128K	Shared w/ context	Text	10K neurons/day (shared)
@cf/google/gemma-4-26b-a4b-it	256K	Shared w/ context	Text	10K neurons/day (shared)
@cf/qwen/qwq-32b	32K	Shared w/ context	Text	10K neurons/day (shared)
@cf/deepseek-ai/deepseek-r1-distill-qwen-32b	32K	Shared w/ context	Text	10K neurons/day (shared)
+ 42 more models	Varies	Varies	Text, Image, Audio, Embeddings	10K neurons/day (shared)

GitHub Models 🇺🇸

Use the Coeus compatibility URL shown below

Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).

Free Tier Snapshot

Free prototyping with per-request limits for GitHub users.

API URL For Coeus

https://models.github.ai/inference

Upstream Base URL

https://models.inference.ai.azure.com

Notes

Use a GitHub token with models:read. The GitHub Models chat completions endpoint differs from the upstream snapshot URL.

Model Name	Context	Max Output	Modality	Rate Limit
gpt-4.1	1M	32K	Text	10 RPM, 50 RPD
gpt-4.1-mini	1M	32K	Text	15 RPM, 150 RPD
gpt-4o	128K	16K	Text + Vision	10 RPM, 50 RPD
o3-mini	200K	100K	Text (reasoning)	10 RPM, 50 RPD
o4-mini	200K	100K	Text (reasoning)	10 RPM, 50 RPD
Llama-4-Scout-17B-16E	512K	~4K	Text + Vision	15 RPM, 150 RPD
Llama-4-Maverick-17B-128E	256K	~4K	Text + Vision	10 RPM, 50 RPD
Meta-Llama-3.3-70B	131K	~4K	Text	15 RPM, 150 RPD
DeepSeek-R1	64K	8K	Text (reasoning)	15 RPM, 150 RPD
Mistral-Small-3.1	128K	~4K	Text + Vision	15 RPM, 150 RPD
+ 35 more models	Varies	Varies	Text / Image	Varies by tier

Groq 🇺🇸

Likely works with the listed URL

Free tier, no credit card. Ultra-fast LPU inference.

Free Tier Snapshot

Free tier, no credit card, very fast inference.

API URL For Coeus

https://api.groq.com/openai/v1

Upstream Base URL

https://api.groq.com/openai/v1

Notes

Likely works with the listed URL. Good if you want speed more than model breadth.

Model Name	Context	Max Output	Modality	Rate Limit
llama-3.3-70b-versatile	131K	32K	Text	30 RPM, 14,400 RPD
llama-3.1-8b-instant	131K	131K	Text	30 RPM, 14,400 RPD
llama-4-scout-17b-16e-instruct	131K	8K	Text + Vision	30 RPM, 14,400 RPD
llama-4-maverick-17b-128e-instruct	131K	8K	Text + Vision	15 RPM, 500 RPD
qwen3-32b	131K	131K	Text	30 RPM, 14,400 RPD
gpt-oss-120b	131K	32K	Text	30 RPM, 14,400 RPD
kimi-k2-instruct	262K	262K	Text	30 RPM, 14,400 RPD
deepseek-r1-distill-70b	131K	8K	Text	30 RPM, 14,400 RPD
whisper-large-v3	—	—	Audio → Text	20 RPM, 2,000 RPD
whisper-large-v3-turbo	—	—	Audio → Text	20 RPM, 2,000 RPD

Hugging Face 🇺🇸

Use the Coeus compatibility URL shown below

Free Serverless Inference API + ~$0.10/month free credits. Thousands of models.

Free Tier Snapshot

Free serverless inference plus small monthly credits.

API URL For Coeus

https://router.huggingface.co/v1

Upstream Base URL

https://api-inference.huggingface.co/models

Notes

Use the OpenAI-compatible router endpoint for Coeus, not the raw Inference API models path.

Model Name	Context	Max Output	Modality	Rate Limit
Meta-Llama-3.1-8B-Instruct	128K	~4K	Text	~1,000 RPD
Mistral-7B-Instruct-v0.3	32K	~4K	Text	~1,000 RPD
Mixtral-8x7B-Instruct-v0.1	32K	~4K	Text	~1,000 RPD
Phi-3.5-mini-instruct	128K	~4K	Text	~1,000 RPD
Qwen2.5-7B-Instruct	131K	~4K	Text	~1,000 RPD
+ thousands of community models	Varies	Varies	Text, Image, Audio, Embeddings	~$0.10/month free credits

Kilo Code 🇺🇸

Likely works with the listed URL

Free models with no credit card required. `kilo-auto/free` auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%).

Free Tier Snapshot

Free models with a Kilo-hosted router.

API URL For Coeus

https://api.kilo.ai/api/gateway

Upstream Base URL

https://api.kilo.ai/api/gateway

Notes

Likely works with the listed URL, but check Kilo's current docs for model naming and headers.

Model Name	Context	Max Output	Modality	Rate Limit
bytedance-seed/dola-seed-2.0-pro:free	—	—	Text	~200 req/hr
x-ai/grok-code-fast-1:optimized:free	—	—	Text (code)	~200 req/hr
nvidia/nemotron-3-super-120b-a12b:free	262K	32K	Text	~200 req/hr
arcee-ai/trinity-large-thinking:free	—	—	Text (reasoning)	~200 req/hr
openrouter/free	Varies	Varies	Text	~200 req/hr

LLM7.io 🇬🇧

Likely works with the listed URL

Zero-friction API gateway. No registration needed for basic access. 30+ models.

Free Tier Snapshot

Basic free access without registration plus higher tokened limits.

API URL For Coeus

https://api.llm7.io/v1

Upstream Base URL

https://api.llm7.io/v1

Notes

Likely works with the listed URL.

Model Name	Context	Max Output	Modality	Rate Limit
deepseek-r1-0528	—	—	Text (reasoning)	30 RPM (120 with token)
deepseek-v3-0324	—	—	Text	30 RPM (120 with token)
gemini-2.5-flash-lite	—	—	Text + Vision	30 RPM (120 with token)
gpt-4o-mini	—	—	Text + Vision	30 RPM (120 with token)
mistral-small-3.1-24b	32K	—	Text	30 RPM (120 with token)
qwen2.5-coder-32b	—	—	Text (code)	30 RPM (120 with token)
+ ~24 more models	Varies	Varies	Text	30 RPM (120 with token)

ModelScope 🇨🇳

Likely works with the listed URL

Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification.

Free Tier Snapshot

Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification.

API URL For Coeus

https://api-inference.modelscope.cn/v1

Upstream Base URL

https://api-inference.modelscope.cn/v1

Notes

Likely works with the listed URL, but check the provider docs for current auth and model naming.

Model Name	Context	Max Output	Modality	Rate Limit
Qwen/Qwen3.5-35B-A3B	—	—	Text + Vision	2,000 RPD total; <=500 RPD/model (dynamic)
Qwen/Qwen3.5-27B	—	—	Text	2,000 RPD total; <=500 RPD/model (dynamic)
Qwen/Qwen-Image	—	—	Image Generation	2,000 RPD total; model/AIGC-specific caps
+ API-Inference-enabled models	Varies	Varies	LLM, MLLM, AIGC	Dynamic quotas + dynamic concurrency

NVIDIA NIM 🇺🇸

Likely works with the listed URL

Free with NVIDIA Developer Program membership. 100+ models. No daily token cap.

Free Tier Snapshot

Free with NVIDIA Developer Program membership.

API URL For Coeus

https://integrate.api.nvidia.com/v1

Upstream Base URL

https://integrate.api.nvidia.com/v1

Notes

Likely works with the listed URL.

Model Name	Context	Max Output	Modality	Rate Limit
deepseek-ai/deepseek-r1	128K	~163K	Text (reasoning)	~40 RPM
nvidia/llama-3.1-nemotron-ultra-253b-v1	128K	4K	Text	~40 RPM
nvidia/nemotron-3-super-120b-a12b	262K	262K	Text	~40 RPM
nvidia/nemotron-3-nano-30b-a3b	128K	32K	Text	~40 RPM
meta/llama-3.1-405b-instruct	128K	4K	Text	~40 RPM
qwen/qwen2.5-72b-instruct	128K	8K	Text	~40 RPM
google/gemma-4-31b	128K	8K	Text	~40 RPM
mistralai/mistral-large-2-instruct	128K	4K	Text	~40 RPM
nvidia/nemotron-nano-2-vl	128K	8K	Vision + Text + Video	~40 RPM
minimax/minimax-m2.7	128K	8K	Text	~40 RPM
+ 90 more models	Varies	Varies	Text, Image, Video, Speech, Embeddings	~40 RPM

Ollama Cloud 🇺🇸

Use local Ollama v1 for Coeus

Free tier with qualitative usage limits. 400+ models from Ollama library. Not OpenAI SDK-compatible; uses [Ollama API](https://docs.ollama.com/cloud).

Free Tier Snapshot

Cloud-backed models with qualitative usage limits.

API URL For Coeus

http://localhost:11434/v1

Upstream Base URL

https://api.ollama.com

Notes

Best fit for Coeus is local Ollama's OpenAI-compatible v1 endpoint. After ollama signin, cloud-backed models can flow through your local Ollama install.

Model Name	Context	Max Output	Modality	Rate Limit
llama3.1:cloud	128K	Model-dependent	Text	Session/weekly limits (unpublished)
deepseek-r1:cloud	128K	Model-dependent	Text (reasoning)	Session/weekly limits (unpublished)
qwen2.5:cloud	128K	Model-dependent	Text	Session/weekly limits (unpublished)
gemma2:cloud	8K	Model-dependent	Text	Session/weekly limits (unpublished)
mistral:cloud	32K	Model-dependent	Text	Session/weekly limits (unpublished)
+ 400 more models	Varies	Varies	Text	Session/weekly limits (unpublished)

OpenRouter 🇺🇸

Likely works with the listed URL

35+ free models (marked with `:free` suffix). OpenAI SDK-compatible.

Free Tier Snapshot

35+ free models with a :free suffix.

API URL For Coeus

https://openrouter.ai/api/v1

Upstream Base URL

https://openrouter.ai/api/v1

Notes

Strong fit for Coeus. Good starting point if you want many models behind one key.

Model Name	Context	Max Output	Modality	Rate Limit
deepseek/deepseek-r1-0528:free	163K	~163K	Text (reasoning)	20 RPM, 200 RPD
deepseek/deepseek-chat-v3-0324:free	163K	163K	Text	20 RPM, 200 RPD
qwen/qwen3.6-plus:free	1M	65K	Text	20 RPM, 200 RPD
qwen/qwen3-coder-480b-a35b:free	262K	~32K	Text	20 RPM, 200 RPD
meta-llama/llama-4-scout:free	10M	16K	Multimodal	20 RPM, 200 RPD
meta-llama/llama-4-maverick:free	1M	16K	Multimodal	20 RPM, 200 RPD
meta-llama/llama-3.3-70b-instruct:free	65K	~16K	Text	20 RPM, 200 RPD
google/gemma-4-31b-it:free	256K	~8K	Multimodal	20 RPM, 200 RPD
nvidia/nemotron-3-super-120b-a12b:free	1M	~32K	Text	20 RPM, 200 RPD
openai/gpt-oss-120b:free	131K	131K	Text	20 RPM, 200 RPD
minimax/minimax-m2.5:free	196K	8K	Text	20 RPM, 200 RPD
mistralai/devstral-2512:free	256K	~32K	Text	20 RPM, 200 RPD
+ ~23 more free models	Varies	Varies	Text / Image	20 RPM, 200 RPD

SiliconFlow 🇨🇳

Likely works with the listed URL

Free tier with 14 CNY signup credits. Permanently free models available.

Free Tier Snapshot

Signup credits plus some permanently free models.

API URL For Coeus

https://api.siliconflow.cn/v1

Upstream Base URL

https://api.siliconflow.cn/v1

Notes

Likely works with the listed URL.

Model Name	Context	Max Output	Modality	Rate Limit
Qwen/Qwen3-8B	131K	131K	Text	1,000 RPM, 50K TPM
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B	~33K	16K	Text (reasoning)	1,000 RPM, 50K TPM
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	131K	Configurable	Text (reasoning)	1,000 RPM, 50K TPM
THUDM/glm-4-9b-chat	32K	32K	Text	1,000 RPM, 50K TPM
THUDM/GLM-4.1V-9B-Thinking	66K	66K	Vision + Text	1,000 RPM, 50K TPM
deepseek-ai/DeepSeek-OCR	—	8K	Vision (OCR)	1,000 RPM, 50K TPM
+ embedding/speech models	Varies	Varies	Embeddings, Speech	1,000 RPM, 50K TPM

Coeus Compatibility Shortcuts

Cohere: use https://api.cohere.ai/compatibility/v1, not the native v2 endpoint.
Cerebras: use https://api.cerebras.ai/v1, but treat the Limits page in your Cerebras account as the source of truth for current free-tier context and rate limits.
Google Gemini: use https://generativelanguage.googleapis.com/v1beta/openai, not the raw Gemini REST base URL.
Cloudflare Workers AI: use the OpenAI-compatible .../ai/v1 path, not .../ai/run.
GitHub Models: use https://models.github.ai/inference with a token that has models:read.
Hugging Face: use https://router.huggingface.co/v1, not the raw Inference API models path.
Ollama Cloud: the safest Coeus setup is still local Ollama at http://localhost:11434/v1. After ollama signin, cloud-backed Ollama models can be used through that local OpenAI-compatible endpoint.

What "Likely Works" Means

This page is meant to help you find promising free providers quickly. It does not mean every provider here has been tested end-to-end by Coeus docs.

In practice, a provider is a good Coeus candidate when:

it exposes an OpenAI-compatible chat completions API
it accepts a base URL and API key in the usual OpenAI client shape
it supports a text chat model Coeus can call through /chat/completions

If you want the lowest-friction options first, start with:

OpenRouter
Groq
Mistral AI
Cerebras
Local Ollama

Rate Limit Glossary

Abbreviation	Meaning
RPM	Requests per minute
RPD	Requests per day
TPM	Tokens per minute
TPD	Tokens per day
RPS	Requests per second

How To Use These In Coeus​

Provider Index​

Provider APIs

Cohere 🇨🇦

Google Gemini 🇺🇸

Mistral AI 🇫🇷

Z AI (Zhipu AI) 🇨🇳

Inference Providers

AIHubMix 🇺🇸

Cerebras 🇺🇸

Cloudflare Workers AI 🇺🇸

GitHub Models 🇺🇸

Groq 🇺🇸

Hugging Face 🇺🇸

Kilo Code 🇺🇸

LLM7.io 🇬🇧

ModelScope 🇨🇳

NVIDIA NIM 🇺🇸

Ollama Cloud 🇺🇸

OpenRouter 🇺🇸

SiliconFlow 🇨🇳

Coeus Compatibility Shortcuts​

What "Likely Works" Means​

Rate Limit Glossary​

How To Use These In Coeus

Provider Index

Coeus Compatibility Shortcuts

What "Likely Works" Means

Rate Limit Glossary