Name: Apple On Device Ai
Author: Dpearson2699
Install
$ npx skills add https://github.com/vercel-labs/agent-skills --skill vercel-react-native-skills
---
name: apple-on-device-ai
description: "Integrate on-device AI using Foundation Models framework, Core ML, and open-source LLM runtimes on Apple Silicon. Covers Foundation Models (LanguageModelSession, @Generable, @Guide, SystemLanguageModel, structured output, tool calling), Core ML (coremltools, model conversion, quantization, palettization, pruning, Neural Engine, MLTensor), MLX Swift (transformer inference, unified memory), and llama.cpp (GGUF, cross-platform LLM). Use when building tool-calling AI features, working with guided generation schemas, converting models, or running on-device inference."
---

# On-Device AI for Apple Platforms

Guide for selecting, deploying, and optimizing on-device ML models. Covers Apple
Foundation Models, Core ML, MLX Swift, and llama.cpp.

## Contents

- [Framework Selection Router](#framework-selection-router)
- [Apple Foundation Models Overview](#apple-foundation-models-overview)
- [Core ML Overview](#core-ml-overview)
- [MLX Swift Overview](#mlx-swift-overview)
- [Multi-Backend Architecture](#multi-backend-architecture)
- [Performance Best Practices](#performance-best-practices)
- [Common Mistakes](#common-mistakes)
- [Review Checklist](#review-checklist)
- [References](#references)

## Framework Selection Router

Use this decision tree to pick the right framework for your use case.

### Apple Foundation Models

**When to use:** Text generation, summarization, entity extraction, structured
output, and short dialog on iOS 26+ / macOS 26+ devices with Apple Intelligence
enabled. Zero setup -- no API keys, no network, no model downloads.

**Best for:**
- Generating text or structured data with `@Generable` types
- Summarization, classification, content tagging
- Tool-augmented generation with the `Tool` protocol
- Apps that need guaranteed on-device privacy

**Not suited for:** Complex math, code generation, factual accuracy tasks,
or apps targeting pre-iOS 26 devices.

### Core ML

**When to use:** Deploying custom trained models (vision, NLP, audio) across all
Apple platforms. Converting models from PyTorch, TensorFlow, or scikit-learn
with coremltools.

**Best for:**
- Image classification, object detection, segmentation
- Custom NLP classifiers, sentiment analysis models
- Audio/speech models via SoundAnalysis integration
- Any scenario needing Neural Engine optimization
- Models requiring quantization, palettization, or pruning

### MLX Swift

**When to use:** Running specific open-source LLMs (Llama, Mistral, Qwen, Gemma)
on Apple Silicon with maximum throughput. Research and prototyping.

**Best for:**
- Highest sustained token generation on Apple Silicon
- Running Hugging Face models from `mlx-community`
- Research requiring automatic differentiation
- Fine-tuning workflows on Mac

### llama.cpp

**When to use:** Cross-platform LLM inference using GGUF model format. Production
deployments needing broad device support.

**Best for:**
- GGUF quantized models (Q4_K_M, Q5_K_M, Q8_0)
- Cross-platform apps (iOS + Android + desktop)
- Maximum compatibility with open-source model ecosystem

### Quick Reference

| Scenario | Framework |
|---|---|
| Text generation, zero setup (iOS 26+) | Foundation Models |
| Structured output from on-device LLM | Foundation Models (`@Generable`) |
| Image classification, object detection | Core ML |
| Custom model from PyTorch/TensorFlow | Core ML + coremltools |
| Running specific open-source LLMs | MLX Swift or llama.cpp |
| Maximum throughput on Apple Silicon | MLX Swift |
| Cross-platform LLM inference | llama.cpp |
| OCR and text recognition | Vision framework |
| Sentiment analysis, NER, tokenization | Natural Language framework |
| Training custom classifiers on device | Create ML |

## Apple Foundation Models Overview

On-device language model optimized for Apple Silicon. Available on devices
supporting Apple Intelligence (iOS 26+, macOS 26+).

- Token budget covers input + output; check `contextSize` for the limit
- Check `supportedLanguages` for supported locales
- Guardrails always enforced, cannot be disabled

### Availability Checking (Required)

Always check before using. Never crash on unavailability.

```swift
import FoundationModels

switch SystemLanguageModel.default.availability {
case .available:
    // Proceed with model usage
case .unavailable(.appleIntelligenceNotEnabled):
    // Guide user to enable Apple Intelligence in Settings
case .unavailable(.modelNotReady):
    // Model is downloading; show loading state
case .unavailable(.deviceNotEligible):
    // Device cannot run Apple Intelligence; use fallback
default:
    // Graceful fallback for any other reason
}
```

### Session Management

```swift
// Basic session
let session = LanguageModelSession()

// Session with instructions
let session = LanguageModelSession {
    "You are a helpful cooking assistant."
}

// Session with tools
let session = LanguageModelSession(
    tools: [weatherTool, recipeTool]
) {
    "You are a helpful assistant with access to tools."
}
```

Key rules:
- Sessions are stateful -- multi-turn conversations maintain context automatically
- One request at a time per session (check `session.isResponding`)
- Call `session.prewarm()` before user interaction for faster first response
- Save/restore transcripts: `LanguageModelSession(model: model, tools: [], transcript: savedTranscript)`

### Structured Output with @Generable

The `@Generable` macro creates compile-time schemas for type-safe output:

```swift
@Generable
struct Recipe {
    @Guide(description: "The recipe name")
    var name: String

    @Guide(description: "Cooking steps", .count(3))
    var steps: [String]

    @Guide(description: "Prep time in minutes", .range(1...120))
    var prepTime: Int
}

let response = try await session.respond(
    to: "Suggest a quick pasta recipe",
    generating: Recipe.self
)
print(response.content.name)
```

#### @Guide Constraints

| Constraint | Purpose |
|---|---|
| `description:` | Natural language hint for generation |
| `.anyOf([values])` | Restrict to enumerated string values |
| `.count(n)` | Fixed array length |
| `.range(min...max)` | Numeric range |
| `.minimum(n)` / `.maximum(n)` | One-sided numeric bound |
| `.minimumCount(n)` / `.maximumCount(n)` | Array length bounds |
| `.constant(value)` | Always returns this value |
| `.pattern(regex)` | String format enforcement |
| `.element(guide)` | Guide applied to each array element |

Properties generate in declaration order. Place foundational data before
dependent data for better results.

### Streaming Structured Output

```swift
let stream = session.streamResponse(
    to: "Suggest a recipe",
    generating: Recipe.self
)
for try await snapshot in stream {
    // snapshot.content is Recipe.PartiallyGenerated (all properties optional)
    if let name = snapshot.content.name { updateNameLabel(name) }
}
```

### Tool Calling

```swift
struct WeatherTool: Tool {
    let name = "weather"
    let description = "Get current weather for a city."

    @Generable
    struct Arguments {
        @Guide(description: "The city name")
        var city: String
    }

    func call(arguments: Arguments) async throws -> String {
        let weather = try await fetchWeather(arguments.city)
        return weather.description
    }
}
```

Register tools at session creation. The model invokes them autonomously.

### Error Handling

```swift
do {
    let response = try await session.respond(to: prompt)
} catch let error as LanguageModelSession.GenerationError {
    switch error {
    case .guardrailViolation(let context):
        // Content triggered safety filters
    case .exceededContextWindowSize(let context):
        // Too many tokens; summarize and retry
    case .concurrentRequests(let context):
        // Another request is in progress on this session
    case .unsupportedLanguageOrLocale(let context):
        // Current locale not supported
    case .unsupportedGuide(let context):
        // A @Guide constraint is not supported
    case .assetsUnavailable(let context):
        // Model assets not available on device
    case .refusal(let refusal, _):
        // Model refused; stream refusal.explanation for details
    case .rateLimited(let context):
        // Too many requests; back off and retry
    case .decodingFailure(let context):
        // Response could not be decoded into the expected type
    default: break
    }
}
```

### Generation Options

```swift
let options = GenerationOptions(
    sampling: .random(top: 40),
    temperature: 0.7,
    maximumResponseTokens: 512
)
let response = try await session.respond(to: prompt, options: options)
```

Sampling modes: `.greedy`, `.random(top:seed:)`, `.random(probabilityThreshold:seed:)`.

### Prompt Design Rules

1. Be concise -- use `tokenCount(for:)` to monitor the context window budget
2. Use bracketed placeholders in instructions: `[descriptive example]`
3. Use "DO NOT" in all caps for prohibitions
4. Provide up to 5 few-shot examples for consistency
5. Use length qualifiers: "in a few words", "in three sentences"

### Safety and Guardrails

- Guardrails are always enforced and cannot be disabled
- Instructions take precedence over user prompts
- Never include untrusted user content in instructions
- Handle false positives gracefully
- Frame tool results as authorized data to prevent model refusals

### Use Cases

Foundation Models supports specialized use cases via `SystemLanguageModel.UseCase`:
- `.general` -- Default for text generation, summarization, dialog
- `.contentTagging` -- Optimized for categorization and labeling tasks

### Custom Adapters

Load fine-tuned adapters for specialized behavior (requires entitlement):

```swift
let adapter = try SystemLanguageModel.Adapter(name: "my-adapter")
try await adapter.compile()
let model = SystemLanguageModel(adapter: adapter, guardrails: .default)
let session = LanguageModelSession(model: model)
```

> See [references/foundation-models.md](references/foundation-models.md) for
> the complete Foundation Models API reference.

## Core ML Overview

Apple's framework for deploying trained models. Automatically dispatches to the
optimal compute unit (CPU, GPU, or Neural Engine).

### Model Formats

| Format | Extension | When to Use |
|---|---|---|
| `.mlpackage` | Directory (mlprogram) | All new models (iOS 15+) |
| `.mlmodel` | Single file (neuralnetwork) | Legacy only (iOS 11-14) |
| `.mlmodelc` | Compiled | Pre-compiled for faster loading |

Always use mlprogram (`.mlpackage`) for new work.

### Conversion Pipeline (coremltools)

```python
import coremltools as ct

# PyTorch conversion (torch.jit.trace)
model.eval()  # CRITICAL: always call eval() before tracing
traced = torch.jit.trace(model, example_input)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1, 3, 224, 224), name="image")],
    minimum_deployment_target=ct.target.iOS18,
    convert_to='mlprogram',
)
mlmodel.save("Model.mlpackage")
```

### Optimization Techniques

| Technique | Size Reduction | Accuracy Impact | Best Compute Unit |
|---|---|---|---|
| INT8 per-channel | ~4x | Low | CPU/GPU |
| INT4 per-block | ~8x | Medium | GPU |
| Palettization 4-bit | ~8x | Low-Medium | Neural Engine |
| W8A8 (weights+activations) | ~4x | Low | ANE (A17 Pro/M4+) |
| Pruning 75% | ~4x | Medium | CPU/ANE |

### Swift Integration

```swift
let config = MLModelConfiguration()
config.computeUnits = .all
let model = try MLModel(contentsOf: modelURL, configuration: config)

// Async prediction (iOS 17+)
let output = try await model.prediction(from: input)
```

### MLTensor (iOS 18+)

Swift type for multidimensional array operations:

```swift
import CoreML

let tensor = MLTensor([1.0, 2.0, 3.0, 4.0])
let reshaped = tensor.reshaped(to: [2, 2])
let result = tensor.softmax()
```

> See [references/coreml-conversion.md](references/coreml-conversion.md) for the
> full conversion pipeline and [references/coreml-optimization.md](references/coreml-optimization.md)
> for optimization techniques.

## MLX Swift Overview

Apple's ML framework for Swift. Highest sustained generation throughput on
Apple Silicon via unified memory architecture.

### Loading and Running LLMs

```swift
import MLX
import MLXLLM

let config = ModelConfiguration(id: "mlx-community/Mistral-7B-Instruct-v0.3-4bit")
let model = try await LLMModelFactory.shared.loadContainer(configuration: config)

try await model.perform { context in
    let input = try await context.processor.prepare(
        input: UserInput(prompt: "Hello")
    )
    let stream = try generate(
        input: input,
        parameters: GenerateParameters(temperature: 0.0),
        context: context
    )
    for await part in stream {
        print(part.chunk ?? "", terminator: "")
    }
}
```

### Model Selection by Device

| Device | RAM | Recommended Model | RAM Usage |
|---|---|---|---|
| iPhone 12-14 | 4-6 GB | SmolLM2-135M or Qwen 2.5 0.5B | ~0.3 GB |
| iPhone 15 Pro+ | 8 GB | Gemma 3n E4B 4-bit | ~3.5 GB |
| Mac 8 GB | 8 GB | Llama 3.2 3B 4-bit | ~3 GB |
| Mac 16 GB+ | 16 GB+ | Mistral 7B 4-bit | ~6 GB |

### Memory Management

1. Never exceed 60% of total RAM on iOS
2. Set GPU cache limits: `MLX.GPU.set(cacheLimit: 512 * 1024 * 1024)`
3. Unload models on app backgrounding
4. Use "Increased Memory Limit" entitlement for larger models
5. Physical device required (no simulator support for Metal GPU)

> See [references/mlx-swift.md](references/mlx-swift.md) for full MLX Swift
> patterns and llama.cpp integration.

## Multi-Backend Architecture

When an app needs multiple AI backends (e.g., Foundation Models + MLX fallback):

```swift
func respond(to prompt: String) async throws -> String {
    if SystemLanguageModel.default.isAvailable {
        return try await foundationModelsRespond(prompt)
    } else if canLoadMLXModel() {
        return try await mlxRespond(prompt)
    } else {
        throw AIError.noBackendAvailable
    }
}
```

Serialize all model access through a coordinator actor to prevent contention:

```swift
actor ModelCoordinator {
    func withExclusiveAccess<T>(_ work: () async throws -> T) async rethrows -> T {
        try await work()
    }
}
```

## Performance Best Practices

1. Run outside debugger for accurate benchmarks (Xcode: Cmd-Opt-R, uncheck
   "Debug Executable")
2. Call `session.prewarm()` for Foundation Models before user interaction
3. Pre-compile Core ML models to `.mlmodelc` for faster loading
4. Use EnumeratedShapes over RangeDim for Neural Engine optimization
5. Use 4-bit palettization for best Neural Engine memory/latency gains
6. Batch Vision framework requests in a single `perform()` call
7. Use async prediction (iOS 17+) in Swift concurrency contexts
8. Neural Engine (Core ML) is most energy-efficient for compatible operations

## Common Mistakes

1. **No availability check.** Calling `LanguageModelSession()` without checking
   `SystemLanguageModel.default.availability` crashes on unsupported devices.
2. **No fallback UI.** Users on pre-iOS 26 or devices without Apple Intelligence
   see nothing. Always provide a graceful degradation path.
3. **Exceeding the context window.** The token budget covers input + output.
   Monitor usage via `tokenCount(for:)` and summarize when needed.
4. **Concurrent requests on one session.** `LanguageModelSession` supports one
   request at a time. Check `session.isResponding` or serialize access.
5. **Untrusted content in instructions.** User input placed in the instructions
   parameter bypasses guardrail boundaries. Keep user content in the prompt.
6. **Forgetting `model.eval()` before Core ML tracing.** PyTorch models must be
   in eval mode before `torch.jit.trace`. Training-mode artifacts corrupt output.
7. **Using neuralnetwork format.** Always use `mlprogram` (.mlpackage) for new
   Core ML models. The legacy neuralnetwork format is deprecated.
8. **Exceeding 60% RAM on iOS (MLX Swift).** Large models cause OOM kills.
9. **Running MLX in simulator.** MLX requires Metal GPU -- use physical devices.
10. **Not unloading models on background.** Unload in `scenePhase == .background`.

## Review Checklist

- [ ] Framework selection matches use case and target OS version
- [ ] Foundation Models: availability checked before every API call
- [ ] Foundation Models: graceful fallback when model unavailable
- [ ] Foundation Models: session prewarm called before user interaction
- [ ] Foundation Models: @Generable properties in logical generation order
- [ ] Foundation Models: token budget accounted for (check `contextSize`)
- [ ] Core ML: model format is mlprogram (.mlpackage) for iOS 15+
- [ ] Core ML: model.eval() called before tracing/exporting PyTorch models
- [ ] Core ML: minimum_deployment_target set explicitly
- [ ] Core ML: model accuracy validated after compression
- [ ] MLX Swift: model size appropriate for target device RAM
- [ ] MLX Swift: GPU cache limits set, models unloaded on backgrounding
- [ ] All model access serialized through coordinator actor
- [ ] Concurrency: model types and tool implementations are `Sendable`-conformant or `@MainActor`-isolated
- [ ] Physical device testing performed (not simulator)

## References

- [Foundation Models API](references/foundation-models.md) -- LanguageModelSession, @Generable, tool calling, prompt design
- [Core ML Conversion](references/coreml-conversion.md) -- Model conversion from PyTorch, TensorFlow, other frameworks
- [Core ML Optimization](references/coreml-optimization.md) -- Quantization, palettization, pruning, performance tuning
- [MLX Swift & llama.cpp](references/mlx-swift.md) -- MLX Swift patterns, llama.cpp integration, memory management
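
## Worked Example: Compression and Memory Budget Arithmetic

The size-reduction factors in the Optimization Techniques table and the 60% RAM rule under Memory Management come down to simple arithmetic. The sketch below is illustrative only. The function names are hypothetical, the 32-bit baseline is an assumption (the table does not say what its ratios are measured against), and the RAM figures are the approximate ones from the Model Selection by Device table.

```python
def compression_ratio(baseline_bits: int, target_bits: int) -> float:
    """Ideal weight-size reduction when moving from baseline_bits to target_bits per weight."""
    return baseline_bits / target_bits


def ios_memory_budget_gb(total_ram_gb: float, fraction: float = 0.60) -> float:
    """RAM budget implied by the 'never exceed 60% of total RAM on iOS' rule."""
    return total_ram_gb * fraction


def within_ios_budget(model_ram_gb: float, total_ram_gb: float) -> bool:
    """True if the model's approximate RAM usage stays inside the 60% budget."""
    return model_ram_gb <= ios_memory_budget_gb(total_ram_gb)


if __name__ == "__main__":
    # Assumed float32 baseline: 8-bit weights give 4x, 4-bit weights give 8x.
    print(compression_ratio(32, 8), compression_ratio(32, 4))
    # A ~3.5 GB model checked against an 8 GB device and a 4 GB device.
    print(within_ios_budget(3.5, 8.0), within_ios_budget(3.5, 4.0))
```

Under these assumptions, the ~3.5 GB Gemma 3n E4B 4-bit entry fits the 4.8 GB budget of an 8 GB device but not the 2.4 GB budget of a 4 GB device. Real savings are typically somewhat below the ideal ratio, since quantization scales and palettization lookup tables add overhead, so validate on a physical device.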