Free LLM APIs for Coeus
Looking for a free model or a generous free tier? This page tracks providers from mnfst/awesome-free-llm-apis plus a small number of manually reviewed additions that are good candidates for Coeus's OpenAI-compatible API provider.
Last reviewed in Coeus docs: 2026-04-20
Upstream source last updated: 2026-04-20
Free tiers, model availability, rate limits, and compatibility can change at any time. Always double-check the provider's current docs before relying on a free tier in production.
This page uses a committed local snapshot, not a live fetch at build time.
- Source repository: mnfst/awesome-free-llm-apis
- Source data file:
data.json - Snapshot size in this docs page: 17 providers
How To Use These In Coeus
- Open Settings → AI Providers.
- Configure OpenAI-compatible API.
- Paste the API URL for Coeus shown in the provider section below.
- Add the provider's API key and a supported model name from that provider's list.
The important detail: the upstream project's base URL is not always the exact URL that Coeus should use. For a few providers, you need their OpenAI compatibility layer instead of their native API route.
For voice transcription, Groq is also available in Coeus under Settings → Integrations → Speech & Transcription. It is useful when local Whisper is too heavy for your machine. Groq has free API limits for getting started, and paid accounts can use higher limits. Check Groq's current rate limit page before using it for long recordings or frequent transcription.
Provider Index
Provider APIs
APIs run by the companies that train or fine-tune the models themselves. Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only. Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products. Free "Experiment" plan, no credit card. ~1B tokens/month. Permanent free models, no credit card required.Cohere 🇨🇦
Use the Coeus compatibility URL shown belowModel Name Context Max Output Modality Rate Limit Command A (111B) 256K 4K Text 20 RPM Command R+ 128K 4K Text 20 RPM Command R 128K 4K Text 20 RPM Command R7B 128K 4K Text 20 RPM Embed 4 — — Embeddings (Text + Image) 2,000 inputs/min Rerank 3.5 — — Reranking 10 RPM Google Gemini 🇺🇸
Use the Coeus compatibility URL shown belowModel Name Context Max Output Modality Rate Limit Gemini 2.5 Flash 1M 65K Text + Image + Audio + Video 10 RPM, 250 RPD Gemini 2.5 Flash-Lite 1M 65K Text + Image + Audio + Video 15 RPM, 1,000 RPD Mistral AI 🇫🇷
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit Mistral Small 4 256K 256K Text + Image + Code ~1 RPS, 500K TPM Mistral Medium 3 128K 128K Text ~1 RPS, 500K TPM Mistral Large 3 256K 256K Text ~1 RPS, 500K TPM Mistral Nemo (12B) 128K 128K Text ~1 RPS, 500K TPM Codestral 256K 256K Code ~1 RPS, 500K TPM Pixtral Large 128K 128K Text + Image ~1 RPS, 500K TPM Z AI (Zhipu AI) 🇨🇳
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit GLM-4.7-Flash 200K 128K Text 1 concurrent request GLM-4.5-Flash 128K ~8K Text 1 concurrent request GLM-4.6V-Flash 128K ~4K Text + Image 1 concurrent request
Inference Providers
Platforms that host or route models from multiple sources. OpenAI-compatible router with a large live catalog. Public docs and the live models API expose many explicitly free LLM variants. Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap. 10,000 Neurons/day free. 50+ models available on free tier. Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out). Free tier, no credit card. Ultra-fast LPU inference. Free Serverless Inference API + ~$0.10/month free credits. Thousands of models. Free models with no credit card required. `kilo-auto/free` auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%). Zero-friction API gateway. No registration needed for basic access. 30+ models. Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification. Free with NVIDIA Developer Program membership. 100+ models. No daily token cap. Free tier with qualitative usage limits. 400+ models from Ollama library. Not OpenAI SDK-compatible; uses [Ollama API](https://docs.ollama.com/cloud). 35+ free models (marked with `:free` suffix). OpenAI SDK-compatible. Free tier with 14 CNY signup credits. Permanently free models available.AIHubMix 🇺🇸
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit Gemini 3 Flash Preview (free) 1,048,576 65,536 Text + Image + Audio + Video 5 RPM, 250 RPD, 500K daily tokens GPT 4.1 (free) 1,047,576 32,768 Text + Image Not shown in public models API GPT 4.1 Mini (free) 1,047,576 32,768 Text + Image Not shown in public models API GPT 4.1 Nano (free) 1,047,576 32,768 Text + Image Not shown in public models API GPT 4o (free) 1,047,576 32,768 Text + Image Not shown in public models API K2.6 Code Preview (free) 256,000 256,000 Text 5 RPM, 500 RPD, 1M daily tokens Kimi For Coding (free) 256,000 256,000 Text 5 RPM, 500 RPD, 1M daily tokens MiMo V2 Flash (free) 256,000 256,000 Text Not shown in public models API Step 3.5 Flash (free) 256,000 Not shown Text + Image 5 RPM, 250 RPD, 500K daily tokens Coding MiniMax M2.7 (free) 204,800 13,100 Text 5 RPM, 500 RPD, 1M daily tokens Coding GLM 4.6 (free) 200,000 128,000 Text 5 RPM, 500 RPD, 1M daily tokens + more explicit free coding/router models Varies Varies Mostly text Varies Cerebras 🇺🇸
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit llama3.1-8b 8,192 8K Text 30 RPM, 60K TPM, 14,400 RPD, 1M TPD gpt-oss-120b 128K (8K on free) 8K Text 30 RPM, 14,400 RPD, 1M TPD qwen-3-235b-a22b-instruct-2507 65,536 8K Text 5 RPM, 30K TPM, 14,400 RPD, 1M TPD zai-glm-4.7 128K (8K on free) 8K Text 10 RPM, 100 RPD, 1M TPD Cloudflare Workers AI 🇺🇸
Use the Coeus compatibility URL shown belowModel Name Context Max Output Modality Rate Limit @cf/meta/llama-3.3-70b-instruct-fp8-fast 131K Shared w/ context Text 10K neurons/day (shared) @cf/meta/llama-3.1-8b-instruct-fp8-fast 131K Shared w/ context Text 10K neurons/day (shared) @cf/meta/llama-3.2-11b-vision-instruct 131K Shared w/ context Text + Vision 10K neurons/day (shared) @cf/meta/llama-4-scout-17b-16e-instruct Up to 10M Shared w/ context Multimodal 10K neurons/day (shared) @cf/mistralai/mistral-small-3.1-24b-instruct 128K Shared w/ context Text 10K neurons/day (shared) @cf/google/gemma-4-26b-a4b-it 256K Shared w/ context Text 10K neurons/day (shared) @cf/qwen/qwq-32b 32K Shared w/ context Text 10K neurons/day (shared) @cf/deepseek-ai/deepseek-r1-distill-qwen-32b 32K Shared w/ context Text 10K neurons/day (shared) + 42 more models Varies Varies Text, Image, Audio, Embeddings 10K neurons/day (shared) GitHub Models 🇺🇸
Use the Coeus compatibility URL shown belowModel Name Context Max Output Modality Rate Limit gpt-4.1 1M 32K Text 10 RPM, 50 RPD gpt-4.1-mini 1M 32K Text 15 RPM, 150 RPD gpt-4o 128K 16K Text + Vision 10 RPM, 50 RPD o3-mini 200K 100K Text (reasoning) 10 RPM, 50 RPD o4-mini 200K 100K Text (reasoning) 10 RPM, 50 RPD Llama-4-Scout-17B-16E 512K ~4K Text + Vision 15 RPM, 150 RPD Llama-4-Maverick-17B-128E 256K ~4K Text + Vision 10 RPM, 50 RPD Meta-Llama-3.3-70B 131K ~4K Text 15 RPM, 150 RPD DeepSeek-R1 64K 8K Text (reasoning) 15 RPM, 150 RPD Mistral-Small-3.1 128K ~4K Text + Vision 15 RPM, 150 RPD + 35 more models Varies Varies Text / Image Varies by tier Groq 🇺🇸
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit llama-3.3-70b-versatile 131K 32K Text 30 RPM, 14,400 RPD llama-3.1-8b-instant 131K 131K Text 30 RPM, 14,400 RPD llama-4-scout-17b-16e-instruct 131K 8K Text + Vision 30 RPM, 14,400 RPD llama-4-maverick-17b-128e-instruct 131K 8K Text + Vision 15 RPM, 500 RPD qwen3-32b 131K 131K Text 30 RPM, 14,400 RPD gpt-oss-120b 131K 32K Text 30 RPM, 14,400 RPD kimi-k2-instruct 262K 262K Text 30 RPM, 14,400 RPD deepseek-r1-distill-70b 131K 8K Text 30 RPM, 14,400 RPD whisper-large-v3 — — Audio → Text 20 RPM, 2,000 RPD whisper-large-v3-turbo — — Audio → Text 20 RPM, 2,000 RPD Hugging Face 🇺🇸
Use the Coeus compatibility URL shown belowModel Name Context Max Output Modality Rate Limit Meta-Llama-3.1-8B-Instruct 128K ~4K Text ~1,000 RPD Mistral-7B-Instruct-v0.3 32K ~4K Text ~1,000 RPD Mixtral-8x7B-Instruct-v0.1 32K ~4K Text ~1,000 RPD Phi-3.5-mini-instruct 128K ~4K Text ~1,000 RPD Qwen2.5-7B-Instruct 131K ~4K Text ~1,000 RPD + thousands of community models Varies Varies Text, Image, Audio, Embeddings ~$0.10/month free credits Kilo Code 🇺🇸
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit bytedance-seed/dola-seed-2.0-pro:free — — Text ~200 req/hr x-ai/grok-code-fast-1:optimized:free — — Text (code) ~200 req/hr nvidia/nemotron-3-super-120b-a12b:free 262K 32K Text ~200 req/hr arcee-ai/trinity-large-thinking:free — — Text (reasoning) ~200 req/hr openrouter/free Varies Varies Text ~200 req/hr LLM7.io 🇬🇧
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit deepseek-r1-0528 — — Text (reasoning) 30 RPM (120 with token) deepseek-v3-0324 — — Text 30 RPM (120 with token) gemini-2.5-flash-lite — — Text + Vision 30 RPM (120 with token) gpt-4o-mini — — Text + Vision 30 RPM (120 with token) mistral-small-3.1-24b 32K — Text 30 RPM (120 with token) qwen2.5-coder-32b — — Text (code) 30 RPM (120 with token) + ~24 more models Varies Varies Text 30 RPM (120 with token) ModelScope 🇨🇳
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit Qwen/Qwen3.5-35B-A3B — — Text + Vision 2,000 RPD total; <=500 RPD/model (dynamic) Qwen/Qwen3.5-27B — — Text 2,000 RPD total; <=500 RPD/model (dynamic) Qwen/Qwen-Image — — Image Generation 2,000 RPD total; model/AIGC-specific caps + API-Inference-enabled models Varies Varies LLM, MLLM, AIGC Dynamic quotas + dynamic concurrency NVIDIA NIM 🇺🇸
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit deepseek-ai/deepseek-r1 128K ~163K Text (reasoning) ~40 RPM nvidia/llama-3.1-nemotron-ultra-253b-v1 128K 4K Text ~40 RPM nvidia/nemotron-3-super-120b-a12b 262K 262K Text ~40 RPM nvidia/nemotron-3-nano-30b-a3b 128K 32K Text ~40 RPM meta/llama-3.1-405b-instruct 128K 4K Text ~40 RPM qwen/qwen2.5-72b-instruct 128K 8K Text ~40 RPM google/gemma-4-31b 128K 8K Text ~40 RPM mistralai/mistral-large-2-instruct 128K 4K Text ~40 RPM nvidia/nemotron-nano-2-vl 128K 8K Vision + Text + Video ~40 RPM minimax/minimax-m2.7 128K 8K Text ~40 RPM + 90 more models Varies Varies Text, Image, Video, Speech, Embeddings ~40 RPM Ollama Cloud 🇺🇸
Use local Ollama v1 for CoeusModel Name Context Max Output Modality Rate Limit llama3.1:cloud 128K Model-dependent Text Session/weekly limits (unpublished) deepseek-r1:cloud 128K Model-dependent Text (reasoning) Session/weekly limits (unpublished) qwen2.5:cloud 128K Model-dependent Text Session/weekly limits (unpublished) gemma2:cloud 8K Model-dependent Text Session/weekly limits (unpublished) mistral:cloud 32K Model-dependent Text Session/weekly limits (unpublished) + 400 more models Varies Varies Text Session/weekly limits (unpublished) OpenRouter 🇺🇸
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit deepseek/deepseek-r1-0528:free 163K ~163K Text (reasoning) 20 RPM, 200 RPD deepseek/deepseek-chat-v3-0324:free 163K 163K Text 20 RPM, 200 RPD qwen/qwen3.6-plus:free 1M 65K Text 20 RPM, 200 RPD qwen/qwen3-coder-480b-a35b:free 262K ~32K Text 20 RPM, 200 RPD meta-llama/llama-4-scout:free 10M 16K Multimodal 20 RPM, 200 RPD meta-llama/llama-4-maverick:free 1M 16K Multimodal 20 RPM, 200 RPD meta-llama/llama-3.3-70b-instruct:free 65K ~16K Text 20 RPM, 200 RPD google/gemma-4-31b-it:free 256K ~8K Multimodal 20 RPM, 200 RPD nvidia/nemotron-3-super-120b-a12b:free 1M ~32K Text 20 RPM, 200 RPD openai/gpt-oss-120b:free 131K 131K Text 20 RPM, 200 RPD minimax/minimax-m2.5:free 196K 8K Text 20 RPM, 200 RPD mistralai/devstral-2512:free 256K ~32K Text 20 RPM, 200 RPD + ~23 more free models Varies Varies Text / Image 20 RPM, 200 RPD SiliconFlow 🇨🇳
Likely works with the listed URLModel Name Context Max Output Modality Rate Limit Qwen/Qwen3-8B 131K 131K Text 1,000 RPM, 50K TPM deepseek-ai/DeepSeek-R1-0528-Qwen3-8B ~33K 16K Text (reasoning) 1,000 RPM, 50K TPM deepseek-ai/DeepSeek-R1-Distill-Qwen-7B 131K Configurable Text (reasoning) 1,000 RPM, 50K TPM THUDM/glm-4-9b-chat 32K 32K Text 1,000 RPM, 50K TPM THUDM/GLM-4.1V-9B-Thinking 66K 66K Vision + Text 1,000 RPM, 50K TPM deepseek-ai/DeepSeek-OCR — 8K Vision (OCR) 1,000 RPM, 50K TPM + embedding/speech models Varies Varies Embeddings, Speech 1,000 RPM, 50K TPM
Coeus Compatibility Shortcuts
- Cohere: use
https://api.cohere.ai/compatibility/v1, not the nativev2endpoint. - Cerebras: use
https://api.cerebras.ai/v1, but treat the Limits page in your Cerebras account as the source of truth for current free-tier context and rate limits. - Google Gemini: use
https://generativelanguage.googleapis.com/v1beta/openai, not the raw Gemini REST base URL. - Cloudflare Workers AI: use the OpenAI-compatible
.../ai/v1path, not.../ai/run. - GitHub Models: use
https://models.github.ai/inferencewith a token that hasmodels:read. - Hugging Face: use
https://router.huggingface.co/v1, not the raw Inference API models path. - Ollama Cloud: the safest Coeus setup is still local Ollama at
http://localhost:11434/v1. Afterollama signin, cloud-backed Ollama models can be used through that local OpenAI-compatible endpoint.
What "Likely Works" Means
This page is meant to help you find promising free providers quickly. It does not mean every provider here has been tested end-to-end by Coeus docs.
In practice, a provider is a good Coeus candidate when:
- it exposes an OpenAI-compatible chat completions API
- it accepts a base URL and API key in the usual OpenAI client shape
- it supports a text chat model Coeus can call through
/chat/completions
If you want the lowest-friction options first, start with:
- OpenRouter
- Groq
- Mistral AI
- Cerebras
- Local Ollama
Rate Limit Glossary
| Abbreviation | Meaning |
|---|---|
| RPM | Requests per minute |
| RPD | Requests per day |
| TPM | Tokens per minute |
| TPD | Tokens per day |
| RPS | Requests per second |