QuantLLM v2.0 - Ultra-fast, Pure Python Quantization & Training
We are thrilled to announce the official release of QuantLLM v2.0! This major release brings a completely redesigned API, enhanced performance, and a beautiful, professional user experience.
✨ Highlights
🚀 TurboModel: The Unified API
We've unified model loading, quantization, finetuning, and export into a single class: `TurboModel`.
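For illustration, a unified workflow might look like the sketch below. The method names (`from_pretrained`, `finetune`, `save_gguf`) and their arguments are hypothetical placeholders, not QuantLLM's documented API; see the project README for the real signatures.

```python
# Hypothetical usage sketch. The method names are placeholders,
# not QuantLLM's documented API.
from quantllm import TurboModel

# Load a model (Hugging Face repo id shown for illustration).
model = TurboModel.from_pretrained("meta-llama/Llama-3.2-1B")

# Finetune and export through the same object.
model.finetune(dataset="my_dataset")                        # hypothetical signature
model.save_gguf("model.Q4_K_M.gguf", quant_type="Q4_K_M")   # hypothetical signature
```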
📦 Pure Python GGUF Export (No Binaries!)
Forget compiling `llama.cpp` or dealing with complex C++ toolchains. QuantLLM v2.0 includes a native Python GGUF writer.
- Supports the standard quantization types (Q4_K_M, Q8_0, Q5_K_M).
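To make the "no binaries" claim concrete, here is a minimal sketch of how a GGUF container header can be written with nothing but Python's standard library. It follows the public GGUF v3 layout (magic, version, tensor count, metadata key/value pairs) and illustrates the idea only; it is not QuantLLM's actual writer.

```python
# Minimal GGUF v3 header writer using only the standard library.
# Illustrates the file layout; this is not QuantLLM's implementation.
import struct

GGUF_MAGIC = 0x46554747   # b"GGUF" read as a little-endian uint32
GGUF_VERSION = 3
GGUF_TYPE_STRING = 8      # metadata value-type id for UTF-8 strings

def write_string(f, s: str) -> None:
    data = s.encode("utf-8")
    f.write(struct.pack("<Q", len(data)))   # uint64 length prefix
    f.write(data)

def write_header(f, metadata: dict, tensor_count: int = 0) -> None:
    # magic, version, tensor count, metadata kv count
    f.write(struct.pack("<IIQQ", GGUF_MAGIC, GGUF_VERSION,
                        tensor_count, len(metadata)))
    for key, value in metadata.items():
        write_string(f, key)
        f.write(struct.pack("<I", GGUF_TYPE_STRING))
        write_string(f, value)

with open("header_demo.gguf", "wb") as f:
    write_header(f, {"general.architecture": "llama",
                     "general.name": "demo"})
```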
🎨 Beautiful UI
We've overhauled the logging system to provide clear, actionable feedback:
- A consistent theme color (orange1) for all spinners, progress bars, and success messages.
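As an aside, orange1 is a color name from the rich library's palette, which suggests (though the release notes do not confirm) that the console output is built on rich. A standalone rich progress bar themed this way looks like:

```python
# Standalone rich example themed with "orange1"; not QuantLLM code.
import time
from rich.progress import Progress, SpinnerColumn, BarColumn, TextColumn

with Progress(
    SpinnerColumn(style="orange1"),
    TextColumn("[orange1]{task.description}"),
    BarColumn(complete_style="orange1"),
) as progress:
    task = progress.add_task("Quantizing", total=100)
    for _ in range(100):
        progress.advance(task)
        time.sleep(0.01)
```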
⚡ Performance Optimizations
- `torch.compile` enabled: automatically compiles training graphs for up to 2x faster training on modern GPUs.
- Sets `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to prevent fragmentation crashes.
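A minimal sketch of applying both settings by hand, which QuantLLM handles for you. Note that the allocator option must be in the environment before PyTorch initializes CUDA, so it is set before the import:

```python
# Sketch of wiring up both optimizations manually.
# The allocator option must be set before torch initializes CUDA.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch

model = torch.nn.Linear(1024, 1024)   # stand-in for a real model
model = torch.compile(model)          # compiles the forward/backward graph
loss = model(torch.randn(8, 1024)).sum()
loss.backward()
```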
🛠️ Technical Improvements & Bug Fixes
- Fixed `TypeError: object of type 'generator' has no len()` during GGUF tensor processing.
- Fixed `ValueError: model did not return a loss` during finetuning by integrating `DataCollatorForLanguageModeling`.
- Fixed an `AttributeError` when passing `SmartConfig` objects as overrides (`torch.dtype` objects are now preserved via `asdict`).
- Disabled WandB logging by default to keep the console clean (re-enable via `WANDB_DISABLED="false"`).

Installation:
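The package name below is an assumption based on the project name; check the repository README if it differs:

```bash
pip install quantllm
```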
Made with ❤️ by Dark Coder
⭐ Star us on GitHub • 💖 Sponsor