Finetuning
- PEFT + TRL — Examples of using peft with trl to finetune 8-bit models with Low-Rank Adaptation (LoRA)
- Unsloth — Fine-tuning LLMs Guide | Unsloth Documentation
- Axolotl — Axolotl AI - Open Source Fine Tuning
- unsloth is great if you are low on resources and want faster training, but it has some rough edges here and there, and some things break in new releases; you have to dig into the source code to know what changed. It has been a lot more stable and better over the past few months, though. I used unsloth for fine-tuning Qwen 2.5 VL at work for a pretty complicated task and it worked great for us.
- trl + peft: this is more stable, and unsloth uses most of the trainers from trl with its own patching, so the APIs are similar in both cases (a minimal sketch follows this list).
- axolotl: this is a great option if you want ready-to-go configs and don’t want to hand-roll code, and it provides easy distributed GPU support. — via
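For the trl + peft path mentioned above, here is a minimal sketch: an 8-bit base model with a LoRA adapter trained through SFTTrainer. The model name, dataset, and hyperparameters are placeholders, and the exact SFTTrainer/SFTConfig arguments shift between trl releases, so treat this as a starting point rather than canonical usage.

```python
# Minimal LoRA SFT sketch with peft + trl; exact argument names vary by trl version.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder base model

# Load the base model in 8-bit (requires bitsandbytes) so it fits in less VRAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# LoRA: train small low-rank adapter matrices instead of the full weights.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Placeholder dataset with a plain "text" column.
dataset = load_dataset("imdb", split="train[:1%]")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="lora-out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        dataset_text_field="text",  # older trl versions take this on SFTTrainer instead
    ),
)
trainer.train()
trainer.save_model("lora-out")  # saves just the LoRA adapter weights
```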
LLMs that you can run on the desktop or a “regular(ish) PC”.
A look at Apple’s new Transformer-powered predictive text model
the model being used by AppleSpell, an internal macOS application that checks for spelling and grammar mistakes as you type.
found the predictive text model in /System/Library/LinguisticData/RequiredAssets_en.bundle/AssetData/en.lm/unilm.bundle. The bundle contains multiple Espresso model files that are used while typing (Espresso appears to be the internal name for the part of CoreML that runs inference on models).
a set of 15,000 tokens in unilm.bundle/sp.dat that pretty clearly look like they form the vocabulary set for a large language model.
Read the rest of the blog post above to see how the tokenizer works and what the model architecture looks like (GPT-2-style?): about 34M parameters with a hidden size of 512 units, which makes it smaller than the smallest GPT-2 model.
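As a rough sanity check on those numbers, a back-of-the-envelope sketch assuming a standard GPT-2-style decoder; it ignores positional embeddings and biases, and the implied depth is solved for here, not taken from the post:

```python
# Back-of-envelope: how many GPT-2-style decoder blocks would it take to reach
# ~34M parameters with d_model = 512 and a ~15k-token vocabulary?
vocab_size = 15_000
d_model = 512
target_params = 34_000_000

embedding_params = vocab_size * d_model   # token embeddings (~7.7M, tied output head)
per_block_params = 12 * d_model ** 2      # ~4*d^2 attention + ~8*d^2 MLP per block (~3.1M)
implied_blocks = (target_params - embedding_params) / per_block_params

print(f"embeddings: {embedding_params / 1e6:.1f}M")
print(f"per block:  {per_block_params / 1e6:.1f}M")
print(f"implied depth: {implied_blocks:.1f} blocks")  # roughly 8, give or take
```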
Orca 2: Teaching Small Language Models How to Reason - Microsoft Research; see
M2 Max with 64GB RAM. It does ~50 tok/s on our q4 quantized 7b mistral fine-tune, with comparable speeds to GPT-4 via
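For comparison at home, a minimal way to measure that kind of local throughput with llama-cpp-python; the GGUF path below is a placeholder and the numbers depend entirely on hardware and quantization:

```python
# Rough tokens/sec measurement for a local q4-quantized GGUF model with llama-cpp-python.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-finetune.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers to Metal/GPU where available
)

start = time.time()
out = llm("Explain the difference between LoRA and full fine-tuning.", max_tokens=256)
elapsed = time.time() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```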
moondream is a computer-vision model that can answer real-world questions about images. It’s tiny by today’s standards, with only 1.6B parameters. That enables it to run on a variety of devices, including mobile phones and edge devices. (A usage sketch follows the applications list below.)
Apache 2.0. You can use moondream for commercial purposes.
Applications:
- Security
- Drone and Robotics
- Retail and shopping —
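A minimal moondream usage sketch via transformers. The remote-code API has changed across moondream revisions, so the method names below (encode_image / answer_question) are tied to earlier moondream2 checkpoints and should be treated as assumptions:

```python
# Visual question answering with moondream; encode_image/answer_question follow the
# earlier moondream2 remote code (newer revisions expose query()/caption() instead).
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("shelf.jpg")  # placeholder image
enc = model.encode_image(image)
print(model.answer_question(enc, "How many items are on the shelf?", tokenizer))
```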
- Apache 2.0 license
- “Our goal is to create models that excel at RAG. Since RAG works by processing information at runtime, the main constraint is LLM size. For RAG, models don’t need to be huge; they just need strong text comprehension to give accurate answers when provided with the right context.”
- blog post: SLM Journey Unveiled — “In recent months, the landscape of language models has been enriched by the emergence of several small language models (e.g. TinyLlama, Phi2, Gemma, and StableLM2)”
Florence - a Microsoft Collection; SOTA 200M & 800M parameter vision foundation models. MIT licensed! The 200M checkpoint beats Flamingo 80B (a 400x bigger model) by a huge margin. Performs captioning, object detection and segmentation, OCR, phrase grounding, and more. Leverages the FLD-5B dataset: 5.4 billion annotations across 126 million images. Multi-task learning. Fine-tuned model checkpoints beat the likes of PaLI and PaLI-X.
“Florence2 200M, Qwen2 500M, MSFT InstructLM 500M With little fine-tuning they unlock so many creative and on-device use cases” via
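For reference, a sketch of running one Florence-2 task prompt (object detection) through transformers, following the pattern in the model card's sample usage; the image path is a placeholder:

```python
# Florence-2 object detection; the "<OD>" task tag and post_process_generation call
# follow the model card's sample usage.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("street.jpg")  # placeholder image
task = "<OD>"  # other tasks include "<CAPTION>", "<OCR>", "<DENSE_REGION_CAPTION>"

inputs = processor(text=task, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=512,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed = processor.post_process_generation(raw, task=task, image_size=(image.width, image.height))
print(parsed)  # e.g. {'<OD>': {'bboxes': [...], 'labels': [...]}}
```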
Fine-tune Llama-3-8B with Llama-3-405B synthetic data
A simple notebook for fine-tuning a small model (Llama-3-8B) to be an expert in a specific domain by letting a larger, more capable model (Llama-3-405B) teach it, i.e. generate a synthetic dataset for that domain.
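The core recipe, sketched below: prompt the large teacher model to emit question/answer pairs for the target domain, dump them to JSONL, and fine-tune the small model on that file (e.g. with the LoRA sketch in the Finetuning section). The endpoint, model name, and prompt here are placeholder assumptions, not the notebook's actual code.

```python
# Sketch: generate a synthetic Q&A dataset with a large "teacher" model through an
# OpenAI-compatible API, then fine-tune the small model on the resulting JSONL.
# base_url, api_key, and the model name are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="https://your-inference-host/v1", api_key="...")
domain = "tax rules for freelancers"  # placeholder target domain

records = []
for _ in range(100):  # toy run; real distillation uses far more samples
    resp = client.chat.completions.create(
        model="llama-3.1-405b-instruct",  # placeholder teacher model name
        messages=[{
            "role": "user",
            "content": f"Write one question about {domain} and a detailed answer. "
                       "Return JSON with keys 'question' and 'answer'.",
        }],
        response_format={"type": "json_object"},  # not supported by every backend
    )
    records.append(json.loads(resp.choices[0].message.content))

with open("synthetic_domain_data.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")  # feed this file to an SFT run on the small model
```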
—
nisten/Biggie-SmoLlm-0.15B-Base · Hugging Face via
—
AMD Unveils Its First Small Language Model AMD-135… - AMD Community
Papers
we lay out the position that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI. Our argumentation is grounded in the current level of capabilities exhibited by SLMs, the common architectures of agentic systems, and the economy of LM deployment. We further argue that in situations where general-purpose conversational abilities are essential, heterogeneous agentic systems (i.e., agents invoking multiple different models) are the natural choice. We discuss the potential barriers for the adoption of SLMs in agentic systems and outline a general LLM-to-SLM agent conversion algorithm.
Just an 8B model trained on calling tools and other LLMs to answer queries. It’s a great demo of what frontier SLMs will be about in 2026.
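A toy illustration of the heterogeneous-agents idea from the paper: let a small model handle calls by default and escalate to a larger model only when it is not confident. The endpoint, model names, and the "ESCALATE" convention are placeholder assumptions; real routers use stronger signals than a magic string.

```python
# Toy SLM-first routing: ask the small model, escalate to the large one only if it
# declines. Endpoint and model names are placeholders; "ESCALATE" is a crude signal.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")  # placeholder server
SMALL, LARGE = "small-8b-instruct", "large-70b-instruct"              # placeholder names

def ask(model: str, question: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "If you are not confident, reply exactly: ESCALATE"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()

def answer(question: str) -> str:
    reply = ask(SMALL, question)
    if reply == "ESCALATE":            # real systems use routers, verifiers,
        reply = ask(LARGE, question)   # or logprob thresholds instead
    return reply

print(answer("Summarize our warranty policy in one sentence."))
```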
macOS desktop
- Quantized Gemma 2B running at 157 toks/sec in MLX on M1 Max laptop (see the sketch after this list)
- simonmysun/ell: A command-line interface for LLMs written in Bash.
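A minimal mlx-lm sketch for that kind of local Gemma run on Apple silicon; the quantized repo name is an assumption (mlx-community hosts many pre-converted 4-bit checkpoints):

```python
# Run a 4-bit quantized Gemma 2B locally on Apple silicon with mlx-lm.
from mlx_lm import load, generate

# Repo name is an assumption; mlx-community publishes pre-quantized MLX checkpoints.
model, tokenizer = load("mlx-community/gemma-2b-it-4bit")
text = generate(model, tokenizer, prompt="Write a haiku about laptops.", verbose=True)
print(text)
```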
Phone
- “phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.” (Abdin et al., 2024). (No code or model was announced with the paper.)
aiOS™ by Hyperspace “Organizing the World’s AI Agents. Join the world’s largest peer-to-peer AI network and start earning points”