onde_inference 1.0.0

On-device LLM inference for Flutter & Dart. Run Qwen 2.5 models locally with Metal on iOS and macOS, CPU on Android and desktop. No cloud, no API key.

1.0.0

This is the first stable release. Onde has been running in production across multiple Splitfire AB apps on the App Store for months now — the 0.x versioning stopped making sense a while ago.

New: assigned model loading

loadAssignedModel(appId:appSecret:) fetches whatever model you've assigned to your app in the ondeinference.com dashboard. If you haven't assigned anything yet, it falls back to the platform default. This is the recommended path for production apps — loadDefaultModel is still there for prototyping.

The example app picks up credentials via --dart-define=ONDE_APP_ID=... and --dart-define=ONDE_APP_SECRET=.... Without them, it falls back to the default model like before.

New models

  • Qwen 3 1.7B, 8B, and 14B (GGUF Q4_K_M)
  • Qwen 2.5 Coder 7B (GGUF Q4_K_M)
  • DeepSeek Coder 6.7B (GGUF Q4_K_M) with bundled chat template

Type changes

  • GgufModelConfig has a new optional chatTemplate field for models that need a custom chat template (DeepSeek Coder uses this).
  • InferenceResult now carries a toolCalls list (List<ToolCallInfo>). Most responses return an empty list — but if the model decides to request a tool call, the structured data is there instead of raw markup in text. The InferenceResultToolsX extension adds a hasToolCalls convenience getter.

Engine

  • Old model weights are now dropped outside the lock when loading a new model. Previously the drop happened while holding the write lock, which blocked status queries during the memory release.
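
The pattern described above can be sketched in Rust roughly as follows. This is a minimal illustration, not the engine's actual code: Weights, Engine, load, loaded_name, and loaded_bytes are hypothetical names, and a plain std::sync::RwLock stands in for whatever lock the engine really uses.

```rust
use std::sync::RwLock;

// Hypothetical stand-in for loaded model weights; not the Onde engine's real type.
struct Weights {
    name: String,
    data: Vec<u8>,
}

// Hypothetical engine holding the currently loaded model behind a read-write lock.
struct Engine {
    model: RwLock<Option<Weights>>,
}

impl Engine {
    fn new() -> Self {
        Engine { model: RwLock::new(None) }
    }

    // Swap the new weights in under the write lock, then free the old ones
    // only after the write guard has gone out of scope.
    fn load(&self, new: Weights) {
        let old = {
            let mut guard = self.model.write().unwrap();
            guard.replace(new)
        }; // write lock released here
        drop(old); // the potentially slow deallocation no longer blocks readers
    }

    // Status query: only needs the read lock.
    fn loaded_name(&self) -> Option<String> {
        self.model.read().unwrap().as_ref().map(|w| w.name.clone())
    }

    // Status query: size of the currently loaded weights, 0 if none.
    fn loaded_bytes(&self) -> usize {
        self.model.read().unwrap().as_ref().map_or(0, |w| w.data.len())
    }
}
```

Because the old value is moved out of the lock before it is dropped, a status query such as loaded_name only waits for the pointer swap, not for the memory release.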

Dependencies

  • Switched from git-based mistralrs to the published onde-mistralrs 0.8.2 crates on crates.io. Builds are faster and cargo publish works without [patch.crates-io] gymnastics.

Cross-platform

  • Linux and Windows builds with CPU inference now work out of the box (TokenSource fix for non-Darwin platforms).

0.1.7

  • Fix: Removed example/android/app/src/main/java/io/flutter/plugins/GeneratedPluginRegistrant.java from git tracking and added it to .gitignore. This file is regenerated by flutter pub get on every CI run, causing a dirty working tree that blocked pub publish. Added a CI restore step as an additional safety net.

0.1.6

  • Fix: Replaced the composite LICENSE file with the canonical MIT license text so pub.dev's pana tool correctly recognises the OSI-approved license and awards the full license score.

0.1.5

  • Engine: Added load_assigned_model() — fetches the operator-assigned model config from the Onde SDK backend using app credentials (no user JWT required); falls back gracefully to the platform default when no model is assigned yet.
  • Telemetry: Added GresIQ pulse telemetry client. The engine now reports usage events to the GresIQ dashboard. Configure via GRESIQ_ENVIRONMENT and ONDE_EDGE_ID env vars before the engine initialises.
  • Build: GresIQ API credentials (GRESIQ_API_KEY, GRESIQ_API_SECRET, GRESIQ_APP_ID) are now embedded at build time via dotenvy. CI can inject secrets via env vars without modifying source.
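
The fallback behaviour of load_assigned_model() can be pictured with a small Rust sketch. Everything here is an assumption made for illustration: ModelConfig, FetchError, and resolve_model are hypothetical names, and the changelog does not say how request failures are handled, so this sketch simply propagates them.

```rust
// Hypothetical model descriptor; the real config type likely has more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ModelConfig {
    model_id: String,
}

// Hypothetical error for a failed backend request.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FetchError {
    Network(String),
}

// `fetched` is the outcome of asking the backend for the assigned model:
// Ok(Some(_)) means a model is assigned, Ok(None) means nothing is assigned yet.
fn resolve_model(
    fetched: Result<Option<ModelConfig>, FetchError>,
    platform_default: ModelConfig,
) -> Result<ModelConfig, FetchError> {
    // Assumption: request failures are propagated rather than masked by the default.
    fetched.map(|assigned| assigned.unwrap_or(platform_default))
}
```

Keeping "nothing assigned" (Ok(None)) separate from "request failed" (Err) lets the caller fall back quietly in the first case while still surfacing real errors.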

0.1.4

  • Added Qwen 3 4B GGUF model (bartowski/Qwen_Qwen3-4B-GGUF) with full OpenAI-compatible tool calling support.
  • Added GgufModelConfig.qwen3_4b() constructor and registered it in the supported model list.

0.1.3

  • Platform: Added support for watchOS and visionOS.

0.1.2

  • Engine: Switched all platform-specific mistralrs and mistralrs-core dependencies to the setoelkahfi/mistral.rs fork (branch fix/all-platform-fixes), picking up cross-platform stability fixes ahead of the upstream merge.
  • License: Dual-licensed under MIT OR Apache-2.0; added LICENSE-APACHE alongside the existing LICENSE-MIT for pub.dev compliance.
  • Dependencies: Upgraded freezed_annotation to ^3.1.0 and freezed to ^3.2.5.
  • Removed a stale ignore_for_file directive from the generated Flutter Rust Bridge glue code.

0.1.1

  • CI/CD: release-sdk-dart.yml publishes onde_inference to pub.dev on tag push.
  • Added copyright headers to all hand-written source files (engine.dart, types.dart, dart_test.dart, iOS and macOS Swift plugin classes).
  • Rewrote the example app README: branding, platform notes, and an SDK quick reference.
  • Added android/local.properties to .gitignore so local SDK paths no longer pollute diffs.

0.1.0

  • Initial MVP release.
  • Multi-turn chat inference with Qwen 2.5 1.5B and 3B GGUF Q4_K_M models.
  • Streaming token delivery via Dart Stream<StreamChunk> — display tokens as they are generated.
  • Metal acceleration on iOS and macOS (Apple silicon and Intel).
  • CPU inference on Android, Linux, and Windows.
  • Platform-aware default model selection (1.5B on iOS / Android, 3B on macOS / Linux / Windows).
  • Conversation history management: history(), clearHistory(), pushHistory().
  • One-shot generate() API that does not affect the conversation history.
  • Configurable sampling: temperature, top-p, top-k, min-p, max tokens, frequency and presence penalties.
  • Built-in sampling presets: SamplingConfig.defaultConfig(), SamplingConfig.deterministic(), SamplingConfig.mobile().
  • EngineInfo snapshot: status, loaded model name, approximate memory, and history length.
  • OndeInference static helper namespace for library initialisation and model / sampling config factories.
  • Compilation stub (frb_generated_stub.dart) so the package compiles before the native Rust bridge is built.
  • Powered by flutter_rust_bridge v2 and the Onde Rust engine.
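
The platform-aware default model selection listed above could look something like this in Rust. This is a sketch under stated assumptions, not the engine's implementation: DefaultModel, default_model_for, and platform_default are hypothetical names, and platforms the changelog does not mention return None rather than a guess.

```rust
// Hypothetical identifiers for the two default model sizes named in the changelog.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DefaultModel {
    Qwen25Small,  // Qwen 2.5 1.5B
    Qwen25Medium, // Qwen 2.5 3B
}

// Map an OS name (in the style of std::env::consts::OS) to the default model.
// Platforms not listed in the changelog yield None instead of a guess.
fn default_model_for(os: &str) -> Option<DefaultModel> {
    match os {
        "ios" | "android" => Some(DefaultModel::Qwen25Small),
        "macos" | "linux" | "windows" => Some(DefaultModel::Qwen25Medium),
        _ => None,
    }
}

// Default for the platform this code was compiled for.
fn platform_default() -> Option<DefaultModel> {
    default_model_for(std::env::consts::OS)
}
```

Taking the OS name as a parameter keeps the mapping testable on any host, while platform_default applies it to the current build target.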

Publisher

ondeinference.com (verified publisher)

Dependencies

flutter, flutter_rust_bridge, freezed_annotation