Skip to main content

Free LLM APIs for Coeus

Looking for a free model or a generous free tier? This page tracks providers from mnfst/awesome-free-llm-apis plus a small number of manually reviewed additions that are good candidates for Coeus's OpenAI-compatible API provider.

Last reviewed in Coeus docs: 2026-04-20

Upstream source last updated: 2026-04-20

Free tiers, model availability, rate limits, and compatibility can change at any time. Always double-check the provider's current docs before relying on a free tier in production.

This page uses a committed local snapshot, not a live fetch at build time.

How To Use These In Coeus

  1. Open Settings → AI Providers.
  2. Configure OpenAI-compatible API.
  3. Paste the API URL for Coeus shown in the provider section below.
  4. Add the provider's API key and a supported model name from that provider's list.

The important detail: the upstream project's base URL is not always the exact URL that Coeus should use. For a few providers, you need their OpenAI compatibility layer instead of their native API route.

For voice transcription, Groq is also available in Coeus under Settings → Integrations → Speech & Transcription. It is useful when local Whisper is too heavy for your machine. Groq has free API limits for getting started, and paid accounts can use higher limits. Check Groq's current rate limit page before using it for long recordings or frequent transcription.

Provider Index

Provider APIs

APIs run by the companies that train or fine-tune the models themselves.

Cohere 🇨🇦

Use the Coeus compatibility URL shown below

Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.

Free Tier Snapshot
Trial key with 1,000 API calls/month.
API URL For Coeus
https://api.cohere.ai/compatibility/v1
Upstream Base URL
https://api.cohere.com/v2
Notes
Use Cohere's compatibility URL for Coeus, not the native v2 endpoint.
Model NameContextMax OutputModalityRate Limit
Command A (111B)256K4KText20 RPM
Command R+128K4KText20 RPM
Command R128K4KText20 RPM
Command R7B128K4KText20 RPM
Embed 4Embeddings (Text + Image)2,000 inputs/min
Rerank 3.5Reranking10 RPM

Google Gemini 🇺🇸

Use the Coeus compatibility URL shown below

Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products.

Free Tier Snapshot
Free tier with rate limits and region restrictions.
API URL For Coeus
https://generativelanguage.googleapis.com/v1beta/openai
Upstream Base URL
https://generativelanguage.googleapis.com/v1beta
Notes
Use Gemini's OpenAI compatibility layer, not the raw Gemini REST base URL.
Model NameContextMax OutputModalityRate Limit
Gemini 2.5 Flash1M65KText + Image + Audio + Video10 RPM, 250 RPD
Gemini 2.5 Flash-Lite1M65KText + Image + Audio + Video15 RPM, 1,000 RPD

Mistral AI 🇫🇷

Likely works with the listed URL

Free "Experiment" plan, no credit card. ~1B tokens/month.

Free Tier Snapshot
Experiment plan with roughly 1B tokens/month.
API URL For Coeus
https://api.mistral.ai/v1
Upstream Base URL
https://api.mistral.ai/v1
Notes
Likely works with the listed URL. Coeus already calls out Mistral as a common OpenAI-compatible option.
Model NameContextMax OutputModalityRate Limit
Mistral Small 4256K256KText + Image + Code~1 RPS, 500K TPM
Mistral Medium 3128K128KText~1 RPS, 500K TPM
Mistral Large 3256K256KText~1 RPS, 500K TPM
Mistral Nemo (12B)128K128KText~1 RPS, 500K TPM
Codestral256K256KCode~1 RPS, 500K TPM
Pixtral Large128K128KText + Image~1 RPS, 500K TPM

Z AI (Zhipu AI) 🇨🇳

Likely works with the listed URL

Permanent free models, no credit card required.

Free Tier Snapshot
Permanent free models with strict concurrency limits.
API URL For Coeus
https://open.bigmodel.cn/api/paas/v4
Upstream Base URL
https://open.bigmodel.cn/api/paas/v4
Notes
Likely works with the listed URL.
Model NameContextMax OutputModalityRate Limit
GLM-4.7-Flash200K128KText1 concurrent request
GLM-4.5-Flash128K~8KText1 concurrent request
GLM-4.6V-Flash128K~4KText + Image1 concurrent request

Inference Providers

Platforms that host or route models from multiple sources.

AIHubMix 🇺🇸

Likely works with the listed URL

OpenAI-compatible router with a large live catalog. Public docs and the live models API expose many explicitly free LLM variants.

Free Tier Snapshot
The live models API currently exposes 20+ explicit free LLM entries, but the exact free subset changes.
API URL For Coeus
https://aihubmix.com/v1
Upstream Base URL
https://aihubmix.com/v1
Notes
Use the listed URL in Coeus. AIHubMix's public /api/v1/models endpoint returned 22 explicit free LLM IDs on April 20, 2026, so treat the count and free subset as live data rather than a fixed promise.
Model NameContextMax OutputModalityRate Limit
Gemini 3 Flash Preview (free)1,048,57665,536Text + Image + Audio + Video5 RPM, 250 RPD, 500K daily tokens
GPT 4.1 (free)1,047,57632,768Text + ImageNot shown in public models API
GPT 4.1 Mini (free)1,047,57632,768Text + ImageNot shown in public models API
GPT 4.1 Nano (free)1,047,57632,768Text + ImageNot shown in public models API
GPT 4o (free)1,047,57632,768Text + ImageNot shown in public models API
K2.6 Code Preview (free)256,000256,000Text5 RPM, 500 RPD, 1M daily tokens
Kimi For Coding (free)256,000256,000Text5 RPM, 500 RPD, 1M daily tokens
MiMo V2 Flash (free)256,000256,000TextNot shown in public models API
Step 3.5 Flash (free)256,000Not shownText + Image5 RPM, 250 RPD, 500K daily tokens
Coding MiniMax M2.7 (free)204,80013,100Text5 RPM, 500 RPD, 1M daily tokens
Coding GLM 4.6 (free)200,000128,000Text5 RPM, 500 RPD, 1M daily tokens
+ more explicit free coding/router modelsVariesVariesMostly textVaries

Cerebras 🇺🇸

Likely works with the listed URL

Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap.

Free Tier Snapshot
Advertised as 1M tokens/day, but free-tier context and RPM can be temporarily reduced per model.
API URL For Coeus
https://api.cerebras.ai/v1
Upstream Base URL
https://api.cerebras.ai/v1
Notes
Use the listed URL. Check Cerebras -> Limits in your account for exact quotas; on April 20, 2026, a personal free-tier dashboard showed llama3.1-8b at 8,192 context and qwen-3-235b-a22b-instruct-2507 at 65,536 context / 5 RPM, and Cerebras noted temporary free-tier reductions.
Model NameContextMax OutputModalityRate Limit
llama3.1-8b8,1928KText30 RPM, 60K TPM, 14,400 RPD, 1M TPD
gpt-oss-120b128K (8K on free)8KText30 RPM, 14,400 RPD, 1M TPD
qwen-3-235b-a22b-instruct-250765,5368KText5 RPM, 30K TPM, 14,400 RPD, 1M TPD
zai-glm-4.7128K (8K on free)8KText10 RPM, 100 RPD, 1M TPD

Cloudflare Workers AI 🇺🇸

Use the Coeus compatibility URL shown below

10,000 Neurons/day free. 50+ models available on free tier.

Free Tier Snapshot
10,000 neurons/day on the free tier.
API URL For Coeus
https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1
Upstream Base URL
https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run
Notes
Use Cloudflare's OpenAI-compatible ai/v1 path, not the native ai/run path.
Model NameContextMax OutputModalityRate Limit
@cf/meta/llama-3.3-70b-instruct-fp8-fast131KShared w/ contextText10K neurons/day (shared)
@cf/meta/llama-3.1-8b-instruct-fp8-fast131KShared w/ contextText10K neurons/day (shared)
@cf/meta/llama-3.2-11b-vision-instruct131KShared w/ contextText + Vision10K neurons/day (shared)
@cf/meta/llama-4-scout-17b-16e-instructUp to 10MShared w/ contextMultimodal10K neurons/day (shared)
@cf/mistralai/mistral-small-3.1-24b-instruct128KShared w/ contextText10K neurons/day (shared)
@cf/google/gemma-4-26b-a4b-it256KShared w/ contextText10K neurons/day (shared)
@cf/qwen/qwq-32b32KShared w/ contextText10K neurons/day (shared)
@cf/deepseek-ai/deepseek-r1-distill-qwen-32b32KShared w/ contextText10K neurons/day (shared)
+ 42 more modelsVariesVariesText, Image, Audio, Embeddings10K neurons/day (shared)

GitHub Models 🇺🇸

Use the Coeus compatibility URL shown below

Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).

Free Tier Snapshot
Free prototyping with per-request limits for GitHub users.
API URL For Coeus
https://models.github.ai/inference
Upstream Base URL
https://models.inference.ai.azure.com
Notes
Use a GitHub token with models:read. The GitHub Models chat completions endpoint differs from the upstream snapshot URL.
Model NameContextMax OutputModalityRate Limit
gpt-4.11M32KText10 RPM, 50 RPD
gpt-4.1-mini1M32KText15 RPM, 150 RPD
gpt-4o128K16KText + Vision10 RPM, 50 RPD
o3-mini200K100KText (reasoning)10 RPM, 50 RPD
o4-mini200K100KText (reasoning)10 RPM, 50 RPD
Llama-4-Scout-17B-16E512K~4KText + Vision15 RPM, 150 RPD
Llama-4-Maverick-17B-128E256K~4KText + Vision10 RPM, 50 RPD
Meta-Llama-3.3-70B131K~4KText15 RPM, 150 RPD
DeepSeek-R164K8KText (reasoning)15 RPM, 150 RPD
Mistral-Small-3.1128K~4KText + Vision15 RPM, 150 RPD
+ 35 more modelsVariesVariesText / ImageVaries by tier

Groq 🇺🇸

Likely works with the listed URL

Free tier, no credit card. Ultra-fast LPU inference.

Free Tier Snapshot
Free tier, no credit card, very fast inference.
API URL For Coeus
https://api.groq.com/openai/v1
Upstream Base URL
https://api.groq.com/openai/v1
Notes
Likely works with the listed URL. Good if you want speed more than model breadth.
Model NameContextMax OutputModalityRate Limit
llama-3.3-70b-versatile131K32KText30 RPM, 14,400 RPD
llama-3.1-8b-instant131K131KText30 RPM, 14,400 RPD
llama-4-scout-17b-16e-instruct131K8KText + Vision30 RPM, 14,400 RPD
llama-4-maverick-17b-128e-instruct131K8KText + Vision15 RPM, 500 RPD
qwen3-32b131K131KText30 RPM, 14,400 RPD
gpt-oss-120b131K32KText30 RPM, 14,400 RPD
kimi-k2-instruct262K262KText30 RPM, 14,400 RPD
deepseek-r1-distill-70b131K8KText30 RPM, 14,400 RPD
whisper-large-v3Audio → Text20 RPM, 2,000 RPD
whisper-large-v3-turboAudio → Text20 RPM, 2,000 RPD

Hugging Face 🇺🇸

Use the Coeus compatibility URL shown below

Free Serverless Inference API + ~$0.10/month free credits. Thousands of models.

Free Tier Snapshot
Free serverless inference plus small monthly credits.
API URL For Coeus
https://router.huggingface.co/v1
Upstream Base URL
https://api-inference.huggingface.co/models
Notes
Use the OpenAI-compatible router endpoint for Coeus, not the raw Inference API models path.
Model NameContextMax OutputModalityRate Limit
Meta-Llama-3.1-8B-Instruct128K~4KText~1,000 RPD
Mistral-7B-Instruct-v0.332K~4KText~1,000 RPD
Mixtral-8x7B-Instruct-v0.132K~4KText~1,000 RPD
Phi-3.5-mini-instruct128K~4KText~1,000 RPD
Qwen2.5-7B-Instruct131K~4KText~1,000 RPD
+ thousands of community modelsVariesVariesText, Image, Audio, Embeddings~$0.10/month free credits

Kilo Code 🇺🇸

Likely works with the listed URL

Free models with no credit card required. `kilo-auto/free` auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%).

Free Tier Snapshot
Free models with a Kilo-hosted router.
API URL For Coeus
https://api.kilo.ai/api/gateway
Upstream Base URL
https://api.kilo.ai/api/gateway
Notes
Likely works with the listed URL, but check Kilo's current docs for model naming and headers.
Model NameContextMax OutputModalityRate Limit
bytedance-seed/dola-seed-2.0-pro:freeText~200 req/hr
x-ai/grok-code-fast-1:optimized:freeText (code)~200 req/hr
nvidia/nemotron-3-super-120b-a12b:free262K32KText~200 req/hr
arcee-ai/trinity-large-thinking:freeText (reasoning)~200 req/hr
openrouter/freeVariesVariesText~200 req/hr

LLM7.io 🇬🇧

Likely works with the listed URL

Zero-friction API gateway. No registration needed for basic access. 30+ models.

Free Tier Snapshot
Basic free access without registration plus higher tokened limits.
API URL For Coeus
https://api.llm7.io/v1
Upstream Base URL
https://api.llm7.io/v1
Notes
Likely works with the listed URL.
Model NameContextMax OutputModalityRate Limit
deepseek-r1-0528Text (reasoning)30 RPM (120 with token)
deepseek-v3-0324Text30 RPM (120 with token)
gemini-2.5-flash-liteText + Vision30 RPM (120 with token)
gpt-4o-miniText + Vision30 RPM (120 with token)
mistral-small-3.1-24b32KText30 RPM (120 with token)
qwen2.5-coder-32bText (code)30 RPM (120 with token)
+ ~24 more modelsVariesVariesText30 RPM (120 with token)

ModelScope 🇨🇳

Likely works with the listed URL

Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification.

Free Tier Snapshot
Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification.
API URL For Coeus
https://api-inference.modelscope.cn/v1
Upstream Base URL
https://api-inference.modelscope.cn/v1
Notes
Likely works with the listed URL, but check the provider docs for current auth and model naming.
Model NameContextMax OutputModalityRate Limit
Qwen/Qwen3.5-35B-A3BText + Vision2,000 RPD total; <=500 RPD/model (dynamic)
Qwen/Qwen3.5-27BText2,000 RPD total; <=500 RPD/model (dynamic)
Qwen/Qwen-ImageImage Generation2,000 RPD total; model/AIGC-specific caps
+ API-Inference-enabled modelsVariesVariesLLM, MLLM, AIGCDynamic quotas + dynamic concurrency

NVIDIA NIM 🇺🇸

Likely works with the listed URL

Free with NVIDIA Developer Program membership. 100+ models. No daily token cap.

Free Tier Snapshot
Free with NVIDIA Developer Program membership.
API URL For Coeus
https://integrate.api.nvidia.com/v1
Upstream Base URL
https://integrate.api.nvidia.com/v1
Notes
Likely works with the listed URL.
Model NameContextMax OutputModalityRate Limit
deepseek-ai/deepseek-r1128K~163KText (reasoning)~40 RPM
nvidia/llama-3.1-nemotron-ultra-253b-v1128K4KText~40 RPM
nvidia/nemotron-3-super-120b-a12b262K262KText~40 RPM
nvidia/nemotron-3-nano-30b-a3b128K32KText~40 RPM
meta/llama-3.1-405b-instruct128K4KText~40 RPM
qwen/qwen2.5-72b-instruct128K8KText~40 RPM
google/gemma-4-31b128K8KText~40 RPM
mistralai/mistral-large-2-instruct128K4KText~40 RPM
nvidia/nemotron-nano-2-vl128K8KVision + Text + Video~40 RPM
minimax/minimax-m2.7128K8KText~40 RPM
+ 90 more modelsVariesVariesText, Image, Video, Speech, Embeddings~40 RPM

Ollama Cloud 🇺🇸

Use local Ollama v1 for Coeus

Free tier with qualitative usage limits. 400+ models from Ollama library. Not OpenAI SDK-compatible; uses [Ollama API](https://docs.ollama.com/cloud).

Free Tier Snapshot
Cloud-backed models with qualitative usage limits.
API URL For Coeus
http://localhost:11434/v1
Upstream Base URL
https://api.ollama.com
Notes
Best fit for Coeus is local Ollama's OpenAI-compatible v1 endpoint. After ollama signin, cloud-backed models can flow through your local Ollama install.
Model NameContextMax OutputModalityRate Limit
llama3.1:cloud128KModel-dependentTextSession/weekly limits (unpublished)
deepseek-r1:cloud128KModel-dependentText (reasoning)Session/weekly limits (unpublished)
qwen2.5:cloud128KModel-dependentTextSession/weekly limits (unpublished)
gemma2:cloud8KModel-dependentTextSession/weekly limits (unpublished)
mistral:cloud32KModel-dependentTextSession/weekly limits (unpublished)
+ 400 more modelsVariesVariesTextSession/weekly limits (unpublished)

OpenRouter 🇺🇸

Likely works with the listed URL

35+ free models (marked with `:free` suffix). OpenAI SDK-compatible.

Free Tier Snapshot
35+ free models with a :free suffix.
API URL For Coeus
https://openrouter.ai/api/v1
Upstream Base URL
https://openrouter.ai/api/v1
Notes
Strong fit for Coeus. Good starting point if you want many models behind one key.
Model NameContextMax OutputModalityRate Limit
deepseek/deepseek-r1-0528:free163K~163KText (reasoning)20 RPM, 200 RPD
deepseek/deepseek-chat-v3-0324:free163K163KText20 RPM, 200 RPD
qwen/qwen3.6-plus:free1M65KText20 RPM, 200 RPD
qwen/qwen3-coder-480b-a35b:free262K~32KText20 RPM, 200 RPD
meta-llama/llama-4-scout:free10M16KMultimodal20 RPM, 200 RPD
meta-llama/llama-4-maverick:free1M16KMultimodal20 RPM, 200 RPD
meta-llama/llama-3.3-70b-instruct:free65K~16KText20 RPM, 200 RPD
google/gemma-4-31b-it:free256K~8KMultimodal20 RPM, 200 RPD
nvidia/nemotron-3-super-120b-a12b:free1M~32KText20 RPM, 200 RPD
openai/gpt-oss-120b:free131K131KText20 RPM, 200 RPD
minimax/minimax-m2.5:free196K8KText20 RPM, 200 RPD
mistralai/devstral-2512:free256K~32KText20 RPM, 200 RPD
+ ~23 more free modelsVariesVariesText / Image20 RPM, 200 RPD

SiliconFlow 🇨🇳

Likely works with the listed URL

Free tier with 14 CNY signup credits. Permanently free models available.

Free Tier Snapshot
Signup credits plus some permanently free models.
API URL For Coeus
https://api.siliconflow.cn/v1
Upstream Base URL
https://api.siliconflow.cn/v1
Notes
Likely works with the listed URL.
Model NameContextMax OutputModalityRate Limit
Qwen/Qwen3-8B131K131KText1,000 RPM, 50K TPM
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B~33K16KText (reasoning)1,000 RPM, 50K TPM
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B131KConfigurableText (reasoning)1,000 RPM, 50K TPM
THUDM/glm-4-9b-chat32K32KText1,000 RPM, 50K TPM
THUDM/GLM-4.1V-9B-Thinking66K66KVision + Text1,000 RPM, 50K TPM
deepseek-ai/DeepSeek-OCR8KVision (OCR)1,000 RPM, 50K TPM
+ embedding/speech modelsVariesVariesEmbeddings, Speech1,000 RPM, 50K TPM

Coeus Compatibility Shortcuts

  • Cohere: use https://api.cohere.ai/compatibility/v1, not the native v2 endpoint.
  • Cerebras: use https://api.cerebras.ai/v1, but treat the Limits page in your Cerebras account as the source of truth for current free-tier context and rate limits.
  • Google Gemini: use https://generativelanguage.googleapis.com/v1beta/openai, not the raw Gemini REST base URL.
  • Cloudflare Workers AI: use the OpenAI-compatible .../ai/v1 path, not .../ai/run.
  • GitHub Models: use https://models.github.ai/inference with a token that has models:read.
  • Hugging Face: use https://router.huggingface.co/v1, not the raw Inference API models path.
  • Ollama Cloud: the safest Coeus setup is still local Ollama at http://localhost:11434/v1. After ollama signin, cloud-backed Ollama models can be used through that local OpenAI-compatible endpoint.

What "Likely Works" Means

This page is meant to help you find promising free providers quickly. It does not mean every provider here has been tested end-to-end by Coeus docs.

In practice, a provider is a good Coeus candidate when:

  • it exposes an OpenAI-compatible chat completions API
  • it accepts a base URL and API key in the usual OpenAI client shape
  • it supports a text chat model Coeus can call through /chat/completions

If you want the lowest-friction options first, start with:

  • OpenRouter
  • Groq
  • Mistral AI
  • Cerebras
  • Local Ollama

Rate Limit Glossary

AbbreviationMeaning
RPMRequests per minute
RPDRequests per day
TPMTokens per minute
TPDTokens per day
RPSRequests per second