We are thrilled to announce the official release of QuantLLM v2.0! This major release brings a completely redesigned API, enhanced performance, and a beautiful, professional user experience.
✨ Highlights
🚀 TurboModel: The Unified API
We've unified model loading, quantization, finetuning, and export into a single class: TurboModel.
```python
from quantllm import turbo

# 1. Load: auto-detects memory & capabilities
#    (automatically enables Flash Attention 2 & 4-bit loading)
model = turbo("meta-llama/Llama-3-8B")

# 2. Chat: simple completion interface
print(model.generate("What is the future of AI?"))

# 3. Finetune: one-line training with LoRA
#    (automatically handles data collators & gradient checkpointing)
model.finetune(my_dataset, epochs=3)

# 4. Export: convert directly to GGUF
model.export("gguf", "llama3-finetuned.gguf")
```
📦 Pure Python GGUF Export (No Binaries!)
Forget compiling llama.cpp or dealing with complex C++ toolchains. QuantLLM v2.0 includes a native Python GGUF writer.
- Works natively on Windows, Linux, and macOS.
- Supports all major quantization types (`Q4_K_M`, `Q8_0`, `Q5_K_M`).
- Zero external dependencies.
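As a sketch of how this fits together with TurboModel (the `quant_type` keyword below is an illustrative assumption, not a confirmed parameter name; check the docs for the exact export signature):

```python
from quantllm import turbo

# Load a model, then export it as a quantized GGUF file using the
# pure-Python writer. NOTE: `quant_type` is a hypothetical keyword
# shown for illustration only.
model = turbo("meta-llama/Llama-3-8B")
model.export("gguf", "llama3-q4_k_m.gguf", quant_type="Q4_K_M")
```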
🎨 Beautiful UI
We've overhauled the logging system to provide clear, actionable feedback:
- SmartConfig Panel: Displays the exact parameter count (e.g., "7.24B") and memory compression stats ("14GB ➔ 4.5GB (Saved 68%)") before loading.
- Themed Logging: A cohesive orange theme (`orange1`) for all spinners, progress bars, and success messages.
- Clean Output: Suppressed noise from the Hugging Face Transformers and Datasets libraries.
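For reference, silencing those libraries looks roughly like this in plain Python; QuantLLM applies equivalent settings for you, so this is a sketch of the idea rather than the library's internals:

```python
import datasets
import transformers

# Only surface errors from the Transformers and Datasets libraries,
# hiding the informational noise they print by default.
transformers.logging.set_verbosity_error()
datasets.logging.set_verbosity_error()
```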
⚡ Performance Optimizations
- `torch.compile` Enabled: Automatically compiles training graphs for up to 2x faster training on modern GPUs.
- Dynamic Padding: Batches are padded to the longest sequence in each batch rather than a fixed maximum length, significantly reducing VRAM usage compared to static padding.
- OOM Prevention: Automatically sets `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to prevent fragmentation crashes.
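In plain PyTorch, the first and last of these look roughly like the sketch below; QuantLLM sets them up automatically, so this only illustrates the underlying mechanisms:

```python
import os

# Must be set before the first CUDA allocation to take effect;
# expandable segments reduce fragmentation-induced OOM crashes.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch
import torch.nn as nn

model = nn.Linear(128, 128)

# torch.compile (PyTorch 2.0+) traces and fuses the model's graph;
# the first call compiles, subsequent calls run the compiled graph.
compiled_model = torch.compile(model)
out = compiled_model(torch.randn(4, 128))
```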
🛠️ Technical Improvements & Bug Fixes
- FIXED: Resolved `TypeError: object of type 'generator' has no len()` during GGUF tensor processing.
- FIXED: Solved `ValueError: model did not return a loss` during finetuning by integrating `DataCollatorForLanguageModeling` (see the sketch after this list).
- FIXED: Resolved an `AttributeError` when passing `SmartConfig` objects as overrides (preserved `torch.dtype` objects via `asdict`).
- CHANGED: Disabled `WandB` logging by default to keep the console clean (enable via `WANDB_DISABLED="false"`).
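For context on the loss fix: causal-LM models only return a loss when the batch contains a `labels` field, and `DataCollatorForLanguageModeling` with `mlm=False` adds that field automatically. A minimal sketch in plain Transformers (model and tokenizer chosen only for illustration):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# mlm=False configures the collator for causal LM: it copies input_ids
# into a "labels" key (masking padding as -100), so model(**batch)
# returns a loss during training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

batch = collator([tokenizer("Hello"), tokenizer("QuantLLM v2.0")])
print(batch.keys())  # input_ids, attention_mask, labels
```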
Installation:

```bash
pip install git+https://github.com/codewithdark-git/QuantLLM.git
```

Made with ❤️ by Dark Coder