llamadart 0.3.0
A Dart/Flutter plugin for llama.cpp - run LLM inference on any platform using GGUF models
0.3.0
- [BREAKING] Removal of `LlamaService`: The legacy `LlamaService` facade has been removed. Use `LlamaEngine` with `LlamaBackend()` instead on all platforms.
- LoRA Support: Added full support for Low-Rank Adaptation (LoRA) on all native platforms (iOS, Android, macOS, Linux, Windows).
- Web Improvements: Significantly enhanced the web implementation using `wllama` v2 features, including native chat templating and threading info.
- Logging Refactor: Implemented a unified logging architecture.
  - Native Platforms: Simplified to an on/off toggle to ensure stability. `LlamaLogLevel.none` suppresses all output; other levels enable default stderr logging.
  - Web: Supports full granular filtering (Debug, Info, Warn, Error).
- Stability Fixes: Resolved frequent "Cannot invoke native callback from a leaf call" crashes during Flutter hot restarts by refactoring the native resource lifecycle.
- Improved Lifecycle: Removed the `NativeFinalizer` dependency to avoid race conditions. Call `dispose()` explicitly to release native resources.
- Robust Loading: Improved model loading on all platforms with better instance cleanup, script injection, and URL-based loading support.
- Dynamic Adapters: Implemented APIs to dynamically add, update scale, or remove LoRA adapters at runtime.
- LoRA Training Pipeline: Added a comprehensive Jupyter Notebook for fine-tuning models and converting adapters to GGUF format.
- API Enhancements: Updated `ModelParams` to include initial LoRA configurations and introduced `supportsUrlLoading` for better platform abstraction.
- CLI Tooling: Updated the `basic_app` example to support testing LoRA adapters via the `--lora` flag.
0.2.0+b7883
- Project Rebrand: Renamed the package from `llama_dart` to `llamadart`.
- Pure Native Assets: Migrated to the modern Dart Native Assets mechanism (`hook/build.dart`).
- Zero Setup: Native binaries are now automatically downloaded and bundled at runtime based on the target platform and architecture.
- Version Alignment: Aligned package versioning and binary distribution with `llama.cpp` release tags (starting with `b7883`).
- Logging Control: Implemented comprehensive logging interception for both the `llama` and `ggml` backends with configurable log levels.
- Performance Optimization: Added token caching to message processing, significantly reducing latency in long conversations.
- Architecture Overhaul:
  - Refactored the Flutter Chat Example into a clean, layered architecture (Models, Services, Providers, Widgets).
  - Rebuilt the CLI Basic Example into a robust conversation tool with interactive and single-response modes.
- Cross-Platform GPU: Verified and improved hardware acceleration on macOS/iOS (Metal) and Android/Linux/Windows (Vulkan).
- New Build System: Consolidated all native source and build infrastructure into a unified `third_party/` directory.
- Windows Support: Added a robust MinGW + Vulkan cross-compilation pipeline.
- UI Enhancements: Added fine-grained rebuilds using Selectors and isolated painting with RepaintBoundaries.
0.1.0
- WASM Support: Full support for running the Flutter app and LLM inference in WASM on the web.
- Performance Improvements: Optimized memory usage and loading times for web models.
- Enhanced Web Interop: Improved `wllama` integration with better error handling and progress reporting.
- Bug Fixes: Resolved minor UI issues on mobile and web layouts.
0.0.1
- Initial release.
- Supported platforms: iOS, macOS, Android, Linux, Windows, Web.
- Features:
  - Text generation with the `llama.cpp` backend.
  - GGUF model support.
  - Hardware acceleration (Metal, Vulkan).
  - Flutter Chat Example.
  - CLI Basic Example.