Run models locally on your hardware with Ollama, or use Ollama Cloud for hosted inference.
Performance Notice: Models below 30 billion parameters have shown significantly lower performance on agentic coding tasks. While smaller models (7B, 13B) can be useful for experimentation and learning, they are generally not recommended for production coding work or complex software engineering tasks.

Local Ollama

Run models entirely on your machine with no internet required.

Configuration

Configuration examples for ~/.factory/config.json. The api_key field can be any non-empty placeholder, since the local API does not require authentication:
{
  "custom_models": [
    {
      "model_display_name": "Qwen 2.5 Coder 32B [Local]",
      "model": "qwen2.5-coder:32b",
      "base_url": "http://localhost:11434/v1",
      "api_key": "not-needed",  # add any non-empty value
      "provider": "generic-chat-completion-api",
      "max_tokens": 16000
    },
    {
      "model_display_name": "Qwen 2.5 Coder 7B [Local]",
      "model": "qwen2.5-coder:7b",
      "base_url": "http://localhost:11434/v1",
      "api_key": "not-needed",  # add any non-empty value
      "provider": "generic-chat-completion-api",
      "max_tokens": 4000
    }
  ]
}
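
Once the server is running (see Setup below), you can sanity-check the OpenAI-compatible endpoint these entries point at. A minimal sketch, assuming the qwen2.5-coder:32b model has already been pulled and the server is listening on the default port:

# Ask the local Ollama server for a short completion via its OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:32b",
    "messages": [{"role": "user", "content": "Reply with OK"}]
  }'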

Setup

Context Window Configuration: For optimal performance with Factory, ensure you set the context window to at least 32,000 tokens. You can either:
  • Use the context window slider in the Ollama app (set to 32k minimum)
  • Set the environment variable before starting the server: OLLAMA_CONTEXT_LENGTH=32000 ollama serve
Without adequate context, the experience will be significantly degraded.
  1. Install Ollama from ollama.com/download
  2. Pull desired models:
    # Recommended models
    ollama pull qwen2.5-coder:32b
    ollama pull qwen2.5-coder:7b
    
  3. Start the Ollama server with extra context:
    OLLAMA_CONTEXT_LENGTH=32000 ollama serve
    
  4. Add the configurations above to your Factory config (a quick verification sketch follows below)
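
Before pointing Factory at the server, a quick sanity check that steps 1-3 took effect; this sketch uses only standard Ollama CLI commands:

# Confirm the models pulled in step 2 are available locally
ollama list

# Confirm the server from step 3 is responding and show which models are currently loaded
ollama ps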

Approximate Hardware Requirements

Model Size    RAM Required    VRAM (GPU)
3B params     4GB             3GB
7B params     8GB             6GB
13B params    16GB            10GB
30B params    32GB            20GB
70B params    64GB            40GB
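
To compare these figures with your own machine, the following checks may help; this is a sketch that assumes Linux or macOS and, for the GPU line, an NVIDIA card with nvidia-smi installed:

# Total system RAM
free -h                # Linux
sysctl hw.memsize      # macOS (reports bytes)

# Total GPU memory (NVIDIA GPUs only)
nvidia-smi --query-gpu=memory.total --format=csv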

Ollama Cloud

Use Ollama’s cloud service for hosted model inference without local hardware requirements. The best performance for agentic coding has been observed with qwen3-coder:480b. For a full list of available cloud models, visit ollama.com/search?c=cloud.

Configuration

{
  "custom_models": [
    {
      "model_display_name": "qwen3-coder [Online]",
      "model": "qwen3-coder:480b-cloud",
      "base_url": "http://localhost:11434/v1/",
      "api_key": "not-needed",  # add any non-empty value
      "provider": "generic-chat-completion-api",
      "max_tokens": 128000
    }
  ]
}

Getting Started with Cloud Models

  1. Ensure Ollama is installed and running locally
  2. Cloud models are accessed through your local Ollama instance - no API key needed
  3. Add the configuration above to your Factory config
  4. The model will automatically use cloud compute when requested (see the example below)
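
For example, you can exercise the cloud model from the Ollama CLI before wiring it into Factory. This is a sketch assuming the qwen3-coder:480b-cloud model from the configuration above; depending on your Ollama version, cloud models may also require signing in to an ollama.com account first:

# Register the cloud model with your local instance
ollama pull qwen3-coder:480b-cloud

# Send a one-off prompt; inference runs on Ollama's cloud, not your hardware
ollama run qwen3-coder:480b-cloud "Reply with OK"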

Troubleshooting

Local server not connecting

  • Ensure Ollama is running: ollama serve
  • Check if port 11434 is available
  • Try curl http://localhost:11434/api/tags to test
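
A slightly fuller version of that check; a healthy server returns HTTP 200 and a JSON list of installed models:

# Print only the HTTP status code; 200 means the server is reachable
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:11434/api/tags

# Show the full response (the locally installed models)
curl -s http://localhost:11434/api/tags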

Model not found

  • Pull the model first: ollama pull model-name
  • Check exact model name with ollama list

Notes

  • Local API doesn’t require authentication (use any placeholder for api_key)
  • Models are stored in ~/.ollama/models/
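
For example, to check how much disk space pulled models are taking up (a sketch assuming the default storage location):

# Approximate disk usage of all pulled models
du -sh ~/.ollama/models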