Build local AI into your app

Use native SDKs to download, cache, load, and call optimized local models on-device.

Install one package, load a model, then run inference in-process.

Install package

pip install foundry-local-sdk

The fastest path from SDK install to shipped local AI

Start with native SDKs, keep inference in your app process, and use CLI or REST tools only when they help your development workflow.

Initialize the manager, choose a model alias, download and cache it, load it, then call chat or audio clients from your app.

// SDK lifecycle: initialize, download, load, call

const mgr = FoundryLocalManager.create({ appName: 'my-app' })

const model = await mgr.catalog.getModel('qwen2.5-0.5b')

await model.download(); await model.load()

const res = await model.createChatClient().completeChat(msgs)
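The lifecycle above can be wrapped in one helper so the app has a single entry point for "get a model ready and ask it something". This is a minimal sketch, not the SDK's prescribed pattern: `runChat` is a hypothetical name, the manager is assumed to have been created as in the snippet above, and the OpenAI-style message shape (`role` and `content`) is an assumption, since this page does not define `msgs`. Only the calls shown on this page are used.

```javascript
// Sketch only: `mgr` is assumed to be a manager created as shown above.
// `runChat` is a hypothetical helper name, not part of the SDK.
async function runChat(mgr, alias, msgs) {
  // Look up the model in the catalog by its alias.
  const model = await mgr.catalog.getModel(alias);

  // Download the model, then load it before calling it.
  await model.download();
  await model.load();

  // Create a chat client and send the messages in-process.
  const client = model.createChatClient();
  return client.completeChat(msgs);
}

// Usage (assumed message shape; needs the real SDK and a manager instance):
// const res = await runChat(mgr, 'qwen2.5-0.5b', [{ role: 'user', content: 'Hello' }]);
```

Passing the manager in as a parameter keeps the helper independent of how the manager is created, which also makes it easy to exercise with a stub.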

The SDK picks and registers execution providers so apps can target NPU, GPU, or CPU without custom device plumbing.

NPU: Neural Engine

GPU: Graphics Card

CPU: Processor

The runtime and cached models stay local so user workflows keep running without a network.

Start in Python or JavaScript; ship production apps in C# and Rust too.

Use SDK clients in-process, or start the optional OpenAI-compatible server for frameworks like LangChain.

base_url="api.openai.com"

base_url="localhost"
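The two `base_url` values above suggest that an OpenAI-style client can be pointed at the local server by changing only its base URL. The sketch below shows that idea with a small request builder. It rests on assumptions: the port `8080` and the `/v1` prefix are hypothetical placeholders (this page does not state the server's address), `buildChatRequest` is an illustrative helper, and the `/chat/completions` path follows the OpenAI chat API convention.

```javascript
// Hypothetical local address: the real host, port, and path prefix depend on
// how the optional server is started and are not stated on this page.
const LOCAL_BASE_URL = 'http://localhost:8080/v1';

// Build an OpenAI-style chat completions request against any base URL.
function buildChatRequest(baseUrl, model, messages) {
  // Strip trailing slashes so the joined path has exactly one separator.
  const url = baseUrl.replace(/\/+$/, '') + '/chat/completions';
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage (needs a running server, so it is left commented out):
// const { url, init } = buildChatRequest(LOCAL_BASE_URL, 'qwen2.5-0.5b',
//   [{ role: 'user', content: 'Hello' }]);
// const reply = await (await fetch(url, init)).json();
```

Because the base URL is the only argument that differs between the hosted and local cases, the rest of the calling code can stay the same.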

Prompts, audio, and responses stay on the user's device.