Build local AI into your app

Use native SDKs to download, cache, load, and call optimized local models on-device.

Start with the SDK

Install one package, load a model, then run inference in-process.

Install package
pip install foundry-local-sdk

The fastest path from SDK install to shipped local AI

Start with native SDKs, keep inference in your app process, and use CLI or REST tools only when they help your development workflow.

One SDK lifecycle

Initialize the manager, choose a model alias, download and cache it, load it, then call chat or audio clients from your app.

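// SDK lifecycle: initialize, download, load, call
const mgr = FoundryLocalManager.create({ appName: 'my-app' })
const model = await mgr.catalog.getModel('qwen2.5-0.5b')
await model.download() // download and cache the model
await model.load()     // load it before creating clients
// msgs: the chat messages to send (the { role, content } shape is an assumption)
const msgs = [{ role: 'user', content: 'Hello' }]
const res = await model.createChatClient().completeChat(msgs)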
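
The lifecycle above can be wrapped in one helper so the download, load, and call order stays in a single place. This is a minimal sketch, not the SDK's own API: `runLocalChat` and `createStubManager` are hypothetical names, the `{ role, content }` message shape is an assumption, and the stub only stands in for the real manager so the sketch runs without the SDK installed.

```javascript
// Sketch only. `mgr` is any object shaped like the manager in the snippet above.
// Assumption: messages use an OpenAI-style { role, content } shape.
async function runLocalChat(mgr, alias, messages) {
  const model = await mgr.catalog.getModel(alias); // resolve the alias in the catalog
  await model.download();                           // download and cache the model
  await model.load();                               // load it before creating clients
  const client = model.createChatClient();
  return client.completeChat(messages);             // result is returned unmodified
}

// Hypothetical stub, NOT the real SDK: records call order and echoes the last message.
function createStubManager(log) {
  const client = {
    completeChat: async (messages) => {
      log.push('completeChat');
      return { echoed: messages[messages.length - 1].content };
    },
  };
  const model = {
    download: async () => { log.push('download'); },
    load: async () => { log.push('load'); },
    createChatClient: () => { log.push('createChatClient'); return client; },
  };
  return {
    catalog: {
      getModel: async (alias) => { log.push('getModel:' + alias); return model; },
    },
  };
}
```

In an app, the real manager would be passed in place of the stub, for example `runLocalChat(mgr, 'qwen2.5-0.5b', msgs)`. Taking the manager as a parameter also makes the call order easy to check in tests.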

Hardware handled for you

The SDK picks and registers execution providers so apps can target NPU, GPU, or CPU without custom device plumbing.

NPU: Neural Engine
GPU: Graphics Card
CPU: Processor

Offline app runtime

The runtime and cached models stay local so user workflows keep running without a network.

Native SDKs

Start in Python or JavaScript; ship production apps in C# and Rust too.

Python, JavaScript, C#, Rust

Native first, REST when needed

Use SDK clients in-process, or start the optional OpenAI-compatible server for frameworks like LangChain.

base_url="api.openai.com"
base_url="localhost"
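
The base URL swap shown above can be sketched as a small request builder that targets either endpoint. This is a sketch under stated assumptions: `buildChatRequest` is a hypothetical helper, `LOCAL_BASE_URL` and its port are placeholders, and the local server is assumed to expose the same `/v1/chat/completions` path as OpenAI and to need no API key. Nothing is sent over the network here.

```javascript
// Assumption: the local server mirrors OpenAI's /v1/chat/completions path.
// LOCAL_BASE_URL is a placeholder; the real host and port come from your local setup.
const OPENAI_BASE_URL = 'https://api.openai.com/v1';
const LOCAL_BASE_URL = 'http://localhost:8080/v1'; // hypothetical port

// Build the URL and fetch options for a chat completion without sending anything.
function buildChatRequest(baseUrl, model, messages, apiKey) {
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) headers.Authorization = 'Bearer ' + apiKey; // only added when a key is given
  return {
    url: baseUrl.replace(/\/+$/, '') + '/chat/completions', // tolerate a trailing slash
    init: { method: 'POST', headers, body: JSON.stringify({ model, messages }) },
  };
}

const messages = [{ role: 'user', content: 'Hello' }];
const remote = buildChatRequest(OPENAI_BASE_URL, 'some-hosted-model', messages, 'sk-placeholder');
const local = buildChatRequest(LOCAL_BASE_URL, 'qwen2.5-0.5b', messages);
console.log(remote.url);
console.log(local.url);
```

Either result can be passed to `fetch(request.url, request.init)`. Only the base URL and the optional key differ between the hosted and local targets, which is the point of an OpenAI-compatible server.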

Data Privacy

Prompts, audio, and responses stay on the user's device.