Factory CLI supports custom model configurations through BYOK (Bring Your Own Key). Use your own OpenAI or Anthropic keys, connect to any open-source model provider, or run models locally on your own hardware. Once configured, switch between models with the /model command.
Your API keys remain local and are not uploaded to Factory servers. Custom models are only available in the CLI and won’t appear in Factory’s web or mobile platforms.
[Image: model selector showing custom models]

Install the CLI with the 5-minute quickstart →

Configuration Reference

Add custom models in ~/.factory/config.json under the custom_models array.

Supported Fields

| Field | Required | Description |
|---|---|---|
| model_display_name | Yes | Human-friendly name shown in the model selector |
| model | Yes | Model identifier sent via the API (e.g., claude-sonnet-4-5-20250929, gpt-5-codex, qwen3:4b) |
| base_url | Yes | API endpoint base URL |
| api_key | Yes | Your API key for the provider; can't be empty |
| provider | Yes | One of: anthropic, openai, or generic-chat-completion-api |
| max_tokens | Yes | Maximum output tokens for model responses |
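
Putting the fields together, here is a hedged sketch of a single-model ~/.factory/config.json. The display name, key, and max_tokens value are illustrative placeholders; the base URL shown is Anthropic's official API endpoint.

```json
{
  "custom_models": [
    {
      "model_display_name": "Claude Sonnet 4.5 (BYOK)",
      "model": "claude-sonnet-4-5-20250929",
      "base_url": "https://api.anthropic.com",
      "api_key": "YOUR_ANTHROPIC_KEY",
      "provider": "anthropic",
      "max_tokens": 8192
    }
  ]
}
```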

Understanding Providers

Factory supports three provider types that determine API compatibility:
| Provider | API Format | Use For | Documentation |
|---|---|---|---|
| anthropic | Anthropic Messages API (v1/messages) | Anthropic models on their official API or compatible proxies | Anthropic Messages API |
| openai | OpenAI Responses API | OpenAI models on their official API or compatible proxies; required for the newest models such as GPT-5 and GPT-5-Codex | OpenAI Responses API |
| generic-chat-completion-api | OpenAI Chat Completions API | OpenRouter, Fireworks, Together AI, Ollama, vLLM, and most open-source providers | OpenAI Chat Completions API |
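
To make the provider distinction concrete, the sketch below shows one entry for each of the other two provider types. The base URLs are the providers' commonly documented endpoints and the model identifiers are examples only; confirm both against your provider's current documentation.

```json
{
  "custom_models": [
    {
      "model_display_name": "GPT-5-Codex (BYOK)",
      "model": "gpt-5-codex",
      "base_url": "https://api.openai.com/v1",
      "api_key": "YOUR_OPENAI_KEY",
      "provider": "openai",
      "max_tokens": 16384
    },
    {
      "model_display_name": "Kimi K2 (OpenRouter)",
      "model": "moonshotai/kimi-k2",
      "base_url": "https://openrouter.ai/api/v1",
      "api_key": "YOUR_OPENROUTER_KEY",
      "provider": "generic-chat-completion-api",
      "max_tokens": 8192
    }
  ]
}
```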
Factory is actively verifying Droid’s performance on popular models, but we cannot guarantee that all custom models will work out of the box. Only Anthropic and OpenAI models accessed via their official APIs are fully tested and benchmarked.
Model Size Consideration: Models below 30 billion parameters have shown significantly lower performance on agentic coding tasks. While these smaller models can be useful for experimentation and learning, they are generally not recommended for production coding work or complex software engineering tasks.

Prompt Caching

Factory CLI automatically uses prompt caching when available to reduce API costs:
  • Official providers (anthropic, openai): Factory attempts to use prompt caching via the official APIs. Caching behavior follows each provider’s implementation and requirements.
  • Generic providers (generic-chat-completion-api): Prompt caching support varies by provider and cannot be guaranteed; check whether your provider implements it.

Verifying Prompt Caching

To check if prompt caching is working correctly with your custom model:
  1. Run a conversation with your custom model
  2. Use the /cost command in Droid CLI to view cost breakdowns
  3. Look for cache hit rates and savings in the output
If you’re not seeing expected caching savings, consult your provider’s documentation about their prompt caching support and requirements.

Quick Start

Choose a provider from the left navigation to see specific configuration examples:
  • Fireworks AI - High-performance inference for open-source models
  • Baseten - Deploy and serve custom models
  • DeepInfra - Cost-effective inference for open-source models
  • Hugging Face - Connect to models on HF Inference API
  • Ollama - Run models locally or in the cloud
  • OpenRouter - Access multiple providers through a single interface
  • OpenAI & Anthropic - Use your own API keys for official models
  • Google Gemini - Access Google’s Gemini models

Using Custom Models

Once configured, access your custom models in the CLI:
  1. Use the /model command
  2. Your custom models appear in a separate “Custom models” section below Factory-provided models
  3. Select any model to start using it
Custom models display with the name you set in model_display_name, making it easy to identify different providers and configurations.

Troubleshooting

Model not appearing in selector

  • Check JSON syntax in ~/.factory/config.json (a valid skeleton is shown after this list)
  • Restart the CLI after making configuration changes
  • Verify all required fields are present
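Note that JSON allows neither comments nor trailing commas, and custom_models must be an array even when it holds a single entry. A minimal well-formed skeleton for comparison (all values are placeholders):

```json
{
  "custom_models": [
    {
      "model_display_name": "My Model",
      "model": "MODEL_ID",
      "base_url": "https://example.com/v1",
      "api_key": "YOUR_KEY",
      "provider": "generic-chat-completion-api",
      "max_tokens": 8192
    }
  ]
}
```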

“Invalid provider” error

  • Provider must be exactly anthropic, openai, or generic-chat-completion-api
  • Check for typos and ensure proper capitalization

Authentication errors

  • Verify your API key is valid and has available credits
  • Check that the API key has proper permissions
  • Confirm the base URL matches your provider’s documentation

Local model won’t connect

  • Ensure your local server is running (e.g., ollama serve)
  • Verify the base URL is correct and includes the /v1 suffix if your provider requires it (see the example after this list)
  • Check that the model is pulled/available locally
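For example, Ollama serves an OpenAI-compatible API at http://localhost:11434/v1 by default, so a local entry might look like the sketch below. Ollama ignores the key, but because the api_key field can't be empty, any placeholder string works; the model tag assumes you have already run ollama pull qwen3:4b.

```json
{
  "custom_models": [
    {
      "model_display_name": "Qwen3 4B (local Ollama)",
      "model": "qwen3:4b",
      "base_url": "http://localhost:11434/v1",
      "api_key": "ollama",
      "provider": "generic-chat-completion-api",
      "max_tokens": 8192
    }
  ]
}
```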

Rate limiting or quota errors

  • Check your provider’s rate limits and usage quotas
  • Monitor your usage through your provider’s dashboard

Billing

  • You pay your provider directly with no Factory markup or usage fees
  • Track costs and usage in your provider’s dashboard