Ollama Auth Proxy

The Ollama Security Gap

Ollama has become a popular choice for running large language models locally, offering impressive performance and ease of use. However, Ollama's default configuration lacks authentication: anyone with network access to the Ollama port can make unlimited requests. For personal use on a single machine, this isn't problematic. But when exposing Ollama to a network, sharing access across a team, or integrating with multiple applications, the absence of access control becomes a significant limitation.

Additionally, many existing AI tools and libraries are built around OpenAI's API format. Adapting these tools to work with Ollama's different API structure requires modifying code, maintaining compatibility layers, or writing custom integrations. This friction discourages seamless adoption of locally-hosted models in existing workflows.

A Simple Security Layer

Ollama Auth Proxy provides a lightweight solution to both problems. The proxy sits between clients and Ollama, adding API key authentication and translating between OpenAI-compatible and Ollama API formats. This enables existing OpenAI-based code to work directly with Ollama while controlling access through API keys.

The implementation is deliberately minimal: a FastAPI server that validates API keys from a JSON file, forwards authenticated requests to Ollama, and transforms request/response formats as needed. The proxy does exactly what's required and nothing more, which makes it easy to understand, deploy, and modify.

Core Features

API Key Authentication: The proxy validates bearer tokens against a configurable list of API keys stored in keys.json. Invalid or missing keys receive 401 responses. This straightforward approach provides basic access control without complex user management or database dependencies.

OpenAI API Compatibility: Requests to /v1/chat/completions are automatically transformed from OpenAI format to Ollama's /api/chat format, and responses are transformed back so that OpenAI client libraries can parse them. Existing code using the OpenAI Python client or similar tools can therefore point to the proxy by changing only its base URL and API key.

HTTPS Support: The proxy supports HTTPS through self-signed certificates, encrypting traffic between clients and the proxy. While self-signed certificates require client-side configuration to trust them, they prevent credential exposure over unencrypted connections. A certificate generation script simplifies setup.

Transparent Proxying: For endpoints that don't require transformation, the proxy forwards requests directly to Ollama, preserving headers, query parameters, and request bodies. This ensures full Ollama functionality remains accessible through the proxy.

Format Transformation

The proxy's format transformation demonstrates practical API bridging. When receiving an OpenAI-formatted request:

{
  "model": "mistral",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 100
}

The proxy transforms it to Ollama's format:

{
  "model": "mistral",
  "messages": [...],
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_predict": 100
  }
}

Responses undergo the reverse transformation: the proxy maps Ollama's response structure to the format OpenAI clients expect, filling fields such as token counts with placeholder values rather than real usage figures. This allows OpenAI clients to parse responses successfully even though the underlying model is local.
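
To make the two mappings concrete, here is a minimal sketch of both directions. The function names match those the project uses, but the bodies are an assumption written from the JSON shapes shown above, not the project's actual code. The response fields (id, object, created, choices, usage) follow OpenAI's documented chat completion shape, and the zeroed usage block stands in for the placeholder values.

```python
import time
import uuid
from typing import Any


def transform_openai_to_ollama(body: dict[str, Any]) -> dict[str, Any]:
    """Map an OpenAI chat-completions request onto Ollama's /api/chat shape."""
    options: dict[str, Any] = {}
    if "temperature" in body:
        options["temperature"] = body["temperature"]
    if "max_tokens" in body:
        # Ollama names the generation limit num_predict.
        options["num_predict"] = body["max_tokens"]

    ollama_body: dict[str, Any] = {
        "model": body["model"],
        "messages": body.get("messages", []),
        "stream": False,  # responses are buffered, not streamed
    }
    if options:
        ollama_body["options"] = options
    return ollama_body


def transform_ollama_to_openai(reply: dict[str, Any]) -> dict[str, Any]:
    """Wrap a non-streaming Ollama chat reply in an OpenAI-style envelope."""
    message = reply.get("message", {})
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",  # assumed ID scheme
        "object": "chat.completion",
        "created": int(time.time()),
        "model": reply.get("model", ""),
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": message.get("role", "assistant"),
                    "content": message.get("content", ""),
                },
                "finish_reason": "stop",
            }
        ],
        # Placeholder token counts so OpenAI clients can parse the response.
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```

Parameters without an obvious Ollama equivalent are simply dropped in this sketch; a fuller mapping would need to decide how to handle each of them.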

Practical Integration

Using the proxy with existing OpenAI client code is straightforward. The Python OpenAI client needs only a different base URL and API key (plus, when the proxy uses a self-signed certificate, an HTTP client configured to trust it):

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key-from-keys-json",
    base_url="https://localhost:8080/v1"  # Proxy's /v1 routes; the client appends /chat/completions
)

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Hello!"}]
)

This compatibility enables using Ollama with existing AI applications, frameworks, and tools built around OpenAI's API. Projects using LangChain, AutoGen, or custom applications can switch to local models by changing configuration rather than rewriting code.

Deployment Considerations

The proxy is designed for straightforward deployment scenarios:

Local Development: Running on localhost alongside Ollama, the proxy enables multiple local applications to share Ollama access with individual API keys for tracking usage or access patterns.

Team Sharing: Deployed on a server running Ollama, the proxy allows team members to access shared models through the network while requiring authentication. Different keys can be issued to team members or applications.

Service Integration: Applications requiring OpenAI-compatible interfaces can integrate with locally-hosted models through the proxy, reducing API costs while maintaining code compatibility.

The project acknowledges its limitations clearly. Streaming responses aren't yet supported; all responses are buffered and returned complete. Self-signed certificates require clients either to trust the certificate explicitly or to disable certificate verification. The proxy must run on the same machine as Ollama unless the code is modified to point to a remote Ollama instance.

Security Trade-offs

The proxy improves security over unauthenticated Ollama access but isn't enterprise-grade authentication. API keys in a JSON file are simple, but the proxy offers no key rotation, audit logging, or rate limiting. HTTPS with self-signed certificates encrypts traffic, but clients that disable certificate verification cannot confirm they are talking to the real proxy, which leaves them open to man-in-the-middle attacks.
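
One way to narrow that man-in-the-middle exposure is to have clients trust the proxy's certificate explicitly instead of turning verification off. The sketch below uses only Python's standard ssl module. The file name cert.pem is a hypothetical stand-in for whatever the certificate generation script produces, and the approach assumes the certificate is valid for the hostname the client connects to.

```python
import ssl
from typing import Optional


def build_client_ssl_context(cert_path: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying TLS context for talking to the proxy.

    With cert_path set (e.g. a hypothetical "cert.pem"), only that
    certificate is trusted; with None, the system trust store is used.
    Verification and hostname checking stay enabled in both cases.
    """
    return ssl.create_default_context(cafile=cert_path)
```

A context built this way can be handed to an HTTP client that accepts one. For example, httpx.Client takes it through its verify argument, and the OpenAI Python client accepts a custom httpx client through its http_client argument.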

For production deployments, the README recommends obtaining CA-signed certificates and implementing additional security measures. The proxy provides a foundation that can be extended with proper certificate management, key rotation, request logging, rate limiting, and more sophisticated authentication mechanisms as needs grow.

Technical Implementation

The implementation uses FastAPI for the HTTP server, providing async request handling and dependency injection for authentication. The validate_api_key dependency runs on every request, extracting bearer tokens and validating against the keys file. Failed authentication short-circuits the request before forwarding to Ollama.
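
The core of that check can be sketched without any web framework. The code below is an illustration, not the project's validate_api_key. The helper names are hypothetical, and it assumes keys.json holds a flat JSON array of key strings, which may differ from the real file layout.

```python
import hmac
import json
from pathlib import Path
from typing import Optional


def load_api_keys(path: str = "keys.json") -> list[str]:
    """Load keys, assuming the file is a flat JSON array of strings."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(k, str) for k in data):
        raise ValueError("expected a JSON array of key strings")
    return data


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Pull the token out of an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(authorization: Optional[str], valid_keys: list[str]) -> bool:
    """Return True only if the header carries one of the configured keys."""
    token = extract_bearer_token(authorization)
    if token is None:
        return False
    # compare_digest makes each comparison constant-time for equal-length input.
    return any(
        hmac.compare_digest(token.encode(), key.encode()) for key in valid_keys
    )
```

In a FastAPI application, a dependency would typically call a check like is_authorized with the request's Authorization header and raise HTTPException(status_code=401) when it returns False, which stops the request before anything is forwarded to Ollama.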

The transformation functions (transform_openai_to_ollama and transform_ollama_to_openai) handle format conversion, mapping common parameters like temperature and max tokens between the different naming conventions. The proxy preserves all headers except those that could cause forwarding issues (content-length, transfer-encoding, etc.).
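
Header filtering of this kind usually amounts to a case-insensitive deny list. The sketch below shows the idea. The article names content-length and transfer-encoding; the remaining entries in the set, and the function name itself, are assumptions added for illustration.

```python
from typing import Mapping

# Headers that should be recomputed for the upstream request rather than
# copied. Only the first two are named by the project; the others are
# assumed here.
EXCLUDED_HEADERS = frozenset(
    {"content-length", "transfer-encoding", "host", "connection"}
)


def filter_forward_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Copy headers for forwarding, dropping any on the deny list."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in EXCLUDED_HEADERS
    }
```

Dropping content-length matters in particular on the transformed route: the rewritten JSON body rarely has the same length as the original, so the forwarding HTTP client needs to compute a fresh value.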

HTTPS support uses Python's ssl module with uvicorn, loading the generated certificates on startup. The certificate generation script creates a basic self-signed certificate valid for localhost, sufficient for development and internal deployments.

Use Cases

The proxy shines in specific scenarios:

Cost Reduction: Projects using OpenAI APIs can switch to local Ollama models for development or specific workloads while maintaining code compatibility, eliminating per-token costs.

Privacy-Sensitive Applications: Organizations requiring data privacy can run models locally while using familiar OpenAI-compatible tooling.

Hybrid Deployments: Systems can use OpenAI for production and Ollama for development/testing by changing only the API endpoint and key, simplifying environment configuration.

Multi-Application Access: Multiple applications can share a single Ollama instance with individual API keys for access tracking or future rate limiting.
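
As an illustration of the hybrid-deployment case, the endpoint and key can be resolved from the environment so that application code stays identical across environments. The variable names LLM_BACKEND, PROXY_BASE_URL, and PROXY_API_KEY are hypothetical, and the default proxy address assumes the proxy listens on port 8080 and serves its OpenAI-compatible routes under /v1.

```python
import os
from typing import Mapping, NamedTuple


class LLMSettings(NamedTuple):
    base_url: str
    api_key: str


def settings_from_env(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Pick the OpenAI API or the local proxy based on LLM_BACKEND."""
    if env.get("LLM_BACKEND", "openai") == "ollama":
        return LLMSettings(
            base_url=env.get("PROXY_BASE_URL", "https://localhost:8080/v1"),
            api_key=env["PROXY_API_KEY"],
        )
    # Default: the hosted OpenAI API.
    return LLMSettings(
        base_url="https://api.openai.com/v1",
        api_key=env["OPENAI_API_KEY"],
    )
```

The resulting values can be passed straight to the OpenAI client constructor as base_url and api_key.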

Open Source Simplicity

Ollama Auth Proxy is open source and available on GitHub at github.com/andrewcampi/ollama-auth-proxy. The entire implementation is roughly 200 lines of well-commented Python, making it easy to understand, audit, and modify.

The project exemplifies the Unix philosophy: do one thing well. Rather than building a comprehensive authentication system or feature-rich proxy, it solves the specific problem of adding basic access control and OpenAI compatibility to Ollama. For users needing these capabilities, the simplicity is an advantage: less code means fewer bugs, easier customization, and clearer behavior.

The proxy demonstrates that not every solution needs to be complex. Sometimes the most useful tools are those that solve a focused problem with minimal overhead, providing just enough functionality to bridge a gap in existing systems. For anyone running Ollama and needing basic authentication or OpenAI compatibility, Ollama Auth Proxy offers a practical, understandable solution.