onde_inference 1.0.0
On-device LLM inference for Flutter & Dart. Run Qwen 2.5 models locally with Metal on iOS and macOS, CPU on Android and desktop. No cloud, no API key.
1.0.0 #
This is the first stable release. Onde has been running in production across multiple Splitfire AB apps on the App Store for months now — the 0.x versioning stopped making sense a while ago.
New: assigned model loading #
loadAssignedModel(appId:appSecret:) fetches whatever model you've assigned to your app in the ondeinference.com dashboard. If you haven't assigned anything yet, it falls back to the platform default. This is the recommended path for production apps — loadDefaultModel is still there for prototyping.
The example app picks up credentials via --dart-define=ONDE_APP_ID=... and --dart-define=ONDE_APP_SECRET=.... Without them, it falls back to the default model like before.
New models #
- Qwen 3 8B, 14B, and 1.7B (GGUF Q4_K_M)
- Qwen 2.5 Coder 7B (GGUF Q4_K_M)
- DeepSeek Coder 6.7B (GGUF Q4_K_M) with bundled chat template
Type changes #
- GgufModelConfig has a new optional chatTemplate field for models that need a custom chat template (DeepSeek Coder uses this).
- InferenceResult now carries a toolCalls list (List<ToolCallInfo>). Most responses return an empty list, but if the model decides to request a tool call, the structured data is there instead of raw markup in text. The InferenceResultToolsX extension adds a hasToolCalls convenience getter.
Engine #
- Old model weights are now dropped outside the lock when loading a new model. Previously the drop happened while holding the write lock, which blocked status queries during the memory release.
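The locking change above can be illustrated with a minimal Rust sketch. The `Engine` and `Weights` types and their methods are hypothetical stand-ins, not the Onde engine's actual code. The sketch only shows the general pattern: move the old value out of the guarded slot while holding the write lock, release the lock, then free the old value.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for a loaded model's weights.
struct Weights {
    name: String,
    data: Vec<u8>,
}

/// Hypothetical engine holding the currently loaded model behind a lock.
struct Engine {
    current: RwLock<Option<Weights>>,
}

impl Engine {
    fn new() -> Self {
        Engine { current: RwLock::new(None) }
    }

    /// Swap in new weights and free the old ones after the lock is released.
    fn load_model(&self, new: Weights) {
        let mut slot = self.current.write().unwrap();
        // Move the old weights out of the slot; nothing is freed yet.
        let old = slot.replace(new);
        // Release the write lock so status queries can proceed.
        drop(slot);
        // The potentially slow memory release happens without the lock held.
        drop(old);
    }

    /// Status query that only needs a read lock.
    fn loaded_model_name(&self) -> Option<String> {
        self.current.read().unwrap().as_ref().map(|w| w.name.clone())
    }

    /// Approximate memory held by the loaded weights, in bytes.
    fn approx_memory(&self) -> usize {
        self.current.read().unwrap().as_ref().map_or(0, |w| w.data.len())
    }
}
```

If `old` were instead left to go out of scope while `slot` was still alive, the deallocation would run under the write lock, which is the behaviour the entry above describes as fixed.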
Dependencies #
- Switched from git-based mistralrs to the published onde-mistralrs 0.8.2 crates on crates.io. Builds are faster and cargo publish works without [patch.crates-io] gymnastics.
Cross-platform #
- Linux and Windows builds with CPU inference now work out of the box (TokenSource fix for non-Darwin platforms).
0.1.7 #
- Fix: Removed example/android/app/src/main/java/io/flutter/plugins/GeneratedPluginRegistrant.java from git tracking and added it to .gitignore. This file is regenerated by flutter pub get on every CI run, causing a dirty working tree that blocked pub publish. Added a CI restore step as an additional safety net.
0.1.6 #
- Fix: Replaced the composite LICENSE file with the canonical MIT license text so pub.dev's pana tool correctly recognises the OSI-approved license and awards the full license score.
0.1.5 #
- Engine: Added load_assigned_model(), which fetches the operator-assigned model config from the Onde SDK backend using app credentials (no user JWT required) and falls back gracefully to the platform default when no model is assigned yet.
- Telemetry: Added GresIQ pulse telemetry client. The engine now reports usage events to the GresIQ dashboard. Configure via GRESIQ_ENVIRONMENT and ONDE_EDGE_ID env vars before the engine initialises.
- Build: GresIQ API credentials (GRESIQ_API_KEY, GRESIQ_API_SECRET, GRESIQ_APP_ID) are now embedded at build time via dotenvy. CI can inject secrets via env vars without modifying source.
0.1.4 #
- Added Qwen 3 4B GGUF model (bartowski/Qwen_Qwen3-4B-GGUF) with full OpenAI-compatible tool calling support.
- Added GgufModelConfig.qwen3_4b() constructor and registered it in the supported model list.
0.1.3 #
- Platform: Added support for watchOS and visionOS.
0.1.2 #
- Engine: Switched all platform-specific mistralrs and mistralrs-core dependencies to the setoelkahfi/mistral.rs fork (branch fix/all-platform-fixes), picking up cross-platform stability fixes ahead of the upstream merge.
- License: Dual-licensed under MIT OR Apache-2.0; added LICENSE-APACHE alongside the existing LICENSE-MIT for pub.dev compliance.
- Dependencies: Upgraded freezed_annotation to ^3.1.0 and freezed to ^3.2.5.
- Removed a stale ignore_for_file directive from the generated Flutter Rust Bridge glue code.
0.1.1 #
- CI/CD: release-sdk-dart.yml publishes onde_inference to pub.dev on tag push.
- Copyright headers on all hand-written source files (engine.dart, types.dart, dart_test.dart, iOS and macOS Swift plugin classes).
- Example app README rewritten: branding, platform notes, and an SDK quick reference.
- android/local.properties gitignored; local SDK paths no longer pollute diffs.
0.1.0 #
- Initial MVP release.
- Multi-turn chat inference with Qwen 2.5 1.5B and 3B GGUF Q4_K_M models.
- Streaming token delivery via Dart Stream<StreamChunk>: display tokens as they are generated.
- Metal acceleration on iOS and macOS (Apple silicon and Intel).
- CPU inference on Android, Linux, and Windows.
- Platform-aware default model selection (1.5B on iOS / Android, 3B on macOS / Linux / Windows).
- Conversation history management: history(), clearHistory(), pushHistory().
- One-shot generate() API that does not affect the conversation history.
- Configurable sampling: temperature, top-p, top-k, min-p, max tokens, frequency and presence penalties.
- Built-in sampling presets: SamplingConfig.defaultConfig(), SamplingConfig.deterministic(), SamplingConfig.mobile().
- EngineInfo snapshot: status, loaded model name, approximate memory, and history length.
- OndeInference static helper namespace for library initialisation and model / sampling config factories.
- Compilation stub (frb_generated_stub.dart) so the package compiles before the native Rust bridge is built.
- Powered by flutter_rust_bridge v2 and the Onde Rust engine.
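
The platform-aware default listed above (1.5B on iOS and Android, 3B on macOS, Linux, and Windows) amounts to a lookup on the target platform. A minimal Rust sketch of that mapping follows; the `Platform` and `ModelSize` enums and the `default_model_for` function are hypothetical names chosen for illustration and are not part of the package's API.

```rust
/// Hypothetical enumeration of the platforms named in the 0.1.0 notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Platform {
    Ios,
    Android,
    MacOs,
    Linux,
    Windows,
}

/// Hypothetical label for the two Qwen 2.5 sizes shipped in 0.1.0.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelSize {
    OnePointFiveB,
    ThreeB,
}

/// Pick the smaller model on phones and the larger one on desktop platforms.
fn default_model_for(platform: Platform) -> ModelSize {
    match platform {
        Platform::Ios | Platform::Android => ModelSize::OnePointFiveB,
        Platform::MacOs | Platform::Linux | Platform::Windows => ModelSize::ThreeB,
    }
}
```

Because the `match` has no wildcard arm, adding a new `Platform` variant fails to compile until its default is chosen explicitly.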