Introducing llamafile: “llamafile lets you turn large language model (LLM) weights into executables.”
The HN discussion of llamafile has some cool applications of this, including simonw’s writeup: “llamafile is the new best way to run a LLM on your own computer”.
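A running llamafile serves an OpenAI-compatible API locally, so you can script against it from Python; a minimal sketch, assuming the default port 8080 and the `openai` client package:

```python
# Chat with a running llamafile over its OpenAI-compatible local endpoint.
# Assumes the executable is already launched (e.g. ./mistral-7b.llamafile)
# and serving on the default http://localhost:8080 -- check its startup log.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",  # placeholder; the local server needs no key
)

resp = client.chat.completions.create(
    model="local",  # the local server largely ignores the model name
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```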
- FreeChat “Local, secure, open source AI chat for macOS”
- LM Studio “Discover, download, and run local LLMs”
Jan: open source ChatGPT alternative that runs 100% offline; have to see how it compares to Ollama; `brew install --cask jan`
- has a local API server (see the sketch after this list)
- can connect to cloud API servers like OpenAI
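Since Jan’s local server is also OpenAI-compatible, connecting is the same client with a different base URL; a sketch assuming the default port (1337 at last check, so verify it in Jan’s Local API Server settings) and a hypothetical model id:

```python
# Talk to Jan's local API server via the OpenAI client.
# Assumptions: the server is enabled in Jan's settings, listening on the
# default http://localhost:1337 (verify), and a model is downloaded/loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

# "llama3.2-3b-instruct" is a hypothetical id; list real ones via client.models.list()
resp = client.chat.completions.create(
    model="llama3.2-3b-instruct",
    messages=[{"role": "user", "content": "Summarize yourself in one line."}],
)
print(resp.choices[0].message.content)
```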
Experimenting with local LLMs on macOS | Fatih’s Personal Blog; Sep 2025. Covers some of the above, but also enumerates the different decision points: model size, runtime, quantization, vision models, reasoning, tool use, agents …
Setting up local LLMs for R and Python by Posit; Aug 2025. An introductory tutorial on setting up Ollama and then connecting to it with the ellmer (R) and chatlas (Python) packages.
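The Python half of that tutorial is tiny; a sketch assuming Ollama is already running and a model has been pulled (the model name is illustrative):

```python
# chatlas talking to a local Ollama server, per the Posit tutorial.
# Assumes `ollama serve` is running and the model was pulled first
# (`ollama pull llama3.2`); the model name here is illustrative.
from chatlas import ChatOllama

chat = ChatOllama(model="llama3.2")
chat.chat("What is the capital of France?")  # streams the reply to stdout
```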
Note on Deepseek v2; Aug 8, 2024
- GLM-4.5-Air: best agentic/coding model that runs on consumer hardware at very decent speeds; rivals Claude 4 Sonnet.
- NousResearch/Hermes-70B: the only model that will do whatever you ask and tell you whatever you want to know. Literally critical to have.
- GPT-OSS-120B: very intelligent, like having 4o at home; great context window, great agent.
- Qwen3-Coder-30B-A3B-Instruct: very good coding agent, excellent workhorse, incredibly fast.
- Magistral Small (Mistral): very fast, excellent agent, great coder, multimodal; punches way, way above its size.
Sep 26, 2025, from the Cline blog: “Cline + LM Studio: the local coding stack with Qwen3 Coder 30B”, pitching Qwen3 Coder + Cline + LM Studio for high-quality local coding.
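Under the hood this stack is just an OpenAI-style client pointed at LM Studio’s local server; a sketch assuming the default port 1234 and an illustrative model id:

```python
# LM Studio's local server speaks the OpenAI API (default http://localhost:1234/v1).
# Assumes the server is started in LM Studio with Qwen3 Coder 30B loaded;
# the model id below is illustrative -- query client.models.list() for the real one.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="qwen3-coder-30b",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(resp.choices[0].message.content)
```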
Also mentioned: facebook/cwm · Hugging Face — llama3 + long-context modifications (alternating attention + large RoPE scaling). MLX should be supported out of the box (verify).
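If the out-of-the-box MLX support checks out, the generic mlx-lm path would look like this sketch; whether `load()` accepts the unconverted repo is exactly the thing to verify:

```python
# Generic mlx-lm load/generate path on Apple silicon.
# Unverified assumption: "facebook/cwm" loads as-is; it may instead need
# a converted/quantized mlx-community variant.
from mlx_lm import load, generate

model, tokenizer = load("facebook/cwm")
print(generate(model, tokenizer, prompt="def fib(n):", max_tokens=64))
```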