
Awesome Free LLM APIs
Pick a permanent free-tier LLM endpoint with known rate limits and OpenAI-compatible wiring before you prototype or ship agent features.
Overview
Awesome Free LLM APIs is an agent skill that maps permanent free LLM endpoints, their rate limits, and OpenAI-compatible integration options. It is most often used in the Build phase and also fits Validate and Grow.
Install
npx skills add https://github.com/aradotso/trending-skills --skill awesome-free-llm-apis
What is this skill?
- Curated permanent free tiers (no trial-credit expiry) for text inference
- Provider tables with notable models, rate limits, and region notes
- Splits company-trained APIs vs inference hosts for open-weight models
- OpenAI SDK-compatible endpoints called out unless an exception applies
- Trigger phrases cover requests for free GPT-style API keys and searches for no-cost inference
- Documented sample limits include Cohere 20 RPM / 1K req/mo and Mistral 1 req/s / 1B tok/mo
Adoption & trust: 833 installs on skills.sh; 31 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need LLM inference in your app but cannot tell which providers still offer a real free tier, what the caps are, or how to plug them in cheaply.
Who is it for?
Indie hackers wiring first LLM features, eval scripts, or low-traffic agents before committing to a paid provider.
Skip if: you run production workloads that need guaranteed uptime or enterprise data residency, or that cannot tolerate undisclosed quotas you have no way to monitor.
When should I use this skill?
Use it when a user asks for a free LLM API, a free AI API key, a free GPT API, a no-cost inference endpoint, or which LLM has a free API.
What do I get? / Deliverables
You shortlist vetted free APIs with documented limits and compatibility notes so you can prototype or ship agent calls with the right key and region choice.
- Shortlisted provider with model and rate-limit fit
- Integration notes for OpenAI-compatible clients
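As a rough illustration of the rate-limit fit step, the sketch below filters a few providers by the request rates a workload needs. The limits are copied from this skill's provider tables and may be out of date; the `FreeTier` record and the `shortlist` function are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeTier:
    """Hypothetical record for one provider's free-tier caps."""
    provider: str
    rpm: int  # requests per minute
    rpd: int  # requests per day


# Figures copied from the skill's tables; OpenRouter's 50 RPD is the cap
# listed for accounts without the $10+ top-up. Re-check before relying on them.
FREE_TIERS = [
    FreeTier("Cerebras", rpm=30, rpd=14_400),
    FreeTier("Groq", rpm=30, rpd=1_000),
    FreeTier("OpenRouter", rpm=20, rpd=50),
]


def shortlist(min_rpm, min_rpd, tiers=FREE_TIERS):
    """Return provider names meeting both caps, highest daily quota first."""
    fits = [t for t in tiers if t.rpm >= min_rpm and t.rpd >= min_rpd]
    return [t.provider for t in sorted(fits, key=lambda t: t.rpd, reverse=True)]


if __name__ == "__main__":
    # Example need: a nightly eval sending about 800 requests at up to 25 per minute.
    print(shortlist(min_rpm=25, min_rpd=800))
```

Providers whose limits are undocumented or expressed in other units (monthly requests, neurons, credits) are left out of the list on purpose, since they cannot be compared on RPM and RPD alone.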
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Build integrations is the primary shelf because the skill is a provider comparison for wiring inference into apps and agents. Integrations matches choosing external API keys, RPM/RPD limits, and SDK-compatible base URLs—not writing model training code.
Where it fits
Compare Gemini Flash vs Mistral Small quotas before you demo a chat MVP on a shoestring budget.
Select an OpenAI-compatible base URL and document RPM limits in your agent’s config.
Swap evaluation prompts to a higher free RPD provider when running nightly quality checks.
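One way to record a base URL and its RPM cap in an agent's config, and to respect that cap on the client side, is sketched below. The Groq base URL and the 30 RPM figure come from this skill's table and example code; `ProviderConfig` and `MinIntervalThrottle` are hypothetical helpers, and spacing requests evenly is an assumed, deliberately conservative policy rather than a description of how any provider enforces its limit.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical config record; field names are illustrative."""
    name: str
    base_url: str
    api_key_env: str  # name of the environment variable holding the key
    rpm: int          # documented free-tier requests per minute


# Values copied from the skill's Groq entry; verify them before use.
GROQ = ProviderConfig(
    name="groq",
    base_url="https://api.groq.com/openai/v1",
    api_key_env="GROQ_API_KEY",
    rpm=30,
)


class MinIntervalThrottle:
    """Spaces calls at least 60/rpm seconds apart."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(self._next_allowed - now, 0.0)
        if delay > 0:
            self._sleep(delay)
        # The next slot opens one interval after this call's slot.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


throttle = MinIntervalThrottle(GROQ.rpm)
```

Calling `throttle.wait()` before each request keeps a single-process agent under the documented cap. It does not coordinate across processes, and it does not track daily (RPD) quotas.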
How it compares
Use as a curated free-tier directory instead of random blog lists that omit RPM, RPD, and regional restrictions.
Common Questions / FAQ
Who is awesome-free-llm-apis for?
Solo builders and small teams choosing no-cost inference for prototypes, side projects, and agent tooling without subscribing to a paid model plan first.
When should I use awesome-free-llm-apis?
Use it during Validate prototyping when you need a key fast, during Build integrations when picking a default endpoint, and during Grow experiments when testing new models under free caps.
Is awesome-free-llm-apis safe to install?
The skill is reference documentation; API keys and data still flow to third-party providers—review the Security Audits panel on this page and each vendor’s terms before sending production user data.
SKILL.md
# Awesome Free LLM APIs

> Skill by [ara.so](https://ara.so) (Daily 2026 Skills collection).

A curated list of LLM providers offering **permanent free tiers** for text inference: no trial credits, no expiry. All endpoints listed are OpenAI SDK-compatible unless noted.

---

## Provider Overview

### Provider APIs (trained/fine-tuned by the company)

| Provider | Notable Models | Rate Limits | Region |
|---|---|---|---|
| [Cohere](https://dashboard.cohere.com/api-keys) | Command A, Command R+, Aya Expanse 32B | 20 RPM, 1K req/mo | 🇺🇸 |
| [Google Gemini](https://aistudio.google.com/app/apikey) | Gemini 2.5 Pro, Flash, Flash-Lite | 5–15 RPM, 100–1K RPD | 🇺🇸 (not EU/UK/CH) |
| [Mistral AI](https://console.mistral.ai/api-keys) | Mistral Large 3, Small 3.1, Ministral 8B | 1 req/s, 1B tok/mo | 🇪🇺 |
| [Zhipu AI](https://open.bigmodel.cn/usercenter/apikeys) | GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash | Undocumented | 🇨🇳 |

### Inference Providers (host open-weight models)

| Provider | Notable Models | Rate Limits | Region |
|---|---|---|---|
| [Cerebras](https://cloud.cerebras.ai/) | Llama 3.3 70B, Qwen3 235B, GPT-OSS-120B | 30 RPM, 14,400 RPD | 🇺🇸 |
| [Cloudflare Workers AI](https://dash.cloudflare.com/profile/api-tokens) | Llama 3.3 70B, Qwen QwQ 32B | 10K neurons/day | 🇺🇸 |
| [GitHub Models](https://github.com/marketplace/models) | GPT-4o, Llama 3.3 70B, DeepSeek-R1 | 10–15 RPM, 50–150 RPD | 🇺🇸 |
| [Groq](https://console.groq.com/keys) | Llama 3.3 70B, Llama 4 Scout, Kimi K2 | 30 RPM, 1K RPD | 🇺🇸 |
| [Hugging Face](https://huggingface.co/settings/tokens) | Llama 3.3 70B, Qwen2.5 72B, Mistral 7B | $0.10/mo free credits | 🇺🇸 |
| [Kluster AI](https://platform.kluster.ai/apikeys) | DeepSeek-R1, Llama 4 Maverick, Qwen3-235B | Undocumented | 🇺🇸 |
| [LLM7.io](https://token.llm7.io) | DeepSeek R1, Flash-Lite, Qwen2.5 Coder | 30 RPM (120 with token) | 🇬🇧 |
| [NVIDIA NIM](https://build.nvidia.com/explore/discover) | Llama 3.3 70B, Mistral Large, Qwen3 235B | 40 RPM | 🇺🇸 |
| [Ollama Cloud](https://ollama.com/settings/keys) | DeepSeek-V3.2, Qwen3.5, Kimi-K2.5 | 1 concurrent, light usage | 🇺🇸 |
| [OpenRouter](https://openrouter.ai/keys) | DeepSeek R1, Llama 3.3 70B, GPT-OSS-120B | 20 RPM, 50 RPD (1K with $10+) | 🇺🇸 |

---

## Getting API Keys

Each provider has its own key management page:

```bash
# Store keys as environment variables; never hardcode them
export GROQ_API_KEY="your_groq_key"
export GEMINI_API_KEY="your_gemini_key"
export OPENROUTER_API_KEY="your_openrouter_key"
export MISTRAL_API_KEY="your_mistral_key"
export COHERE_API_KEY="your_cohere_key"
export CEREBRAS_API_KEY="your_cerebras_key"
export GITHUB_TOKEN="your_github_pat"
export HF_TOKEN="your_huggingface_token"
export NVIDIA_API_KEY="your_nvidia_key"
export CLOUDFLARE_API_TOKEN="your_cf_token"
export CLOUDFLARE_ACCOUNT_ID="your_cf_account_id"
```

---

## OpenAI SDK Integration

All providers (except Ollama Cloud) are OpenAI SDK-compatible; just swap the `base_url` and `api_key`.

### Python

```python
import os  # needed for os.environ below

from openai import OpenAI

# ── Groq ──────────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# ── Google Gemini ─────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.e