Introducing llamafile: “llamafile lets you turn large language model (LLM) weights into executables.”

HN discussion of llamafile has some cool applications of this; see also simonw’s “llamafile is the new best way to run a LLM on your own computer”.

LM Studio + FreeChat

  • FreeChat “Local, secure, open source AI chat for macOS”
  • LM Studio “Discover, download, and run local LLMs”

Jan: Open source ChatGPT-alternative that runs 100% offline; have to see how it compares to Ollama; brew install --cask jan

  • has a local API server
  • connects to cloud API servers like OpenAI
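Jan’s local server speaks the OpenAI-compatible chat-completions protocol, so any OpenAI-style client can point at it. A minimal stdlib-only Python sketch, assuming the server is running at its default address (localhost:1337 here; check Jan’s server settings) and using a placeholder model name:

```python
# Sketch: talk to Jan's OpenAI-compatible local server.
# ASSUMPTIONS: server at localhost:1337 (verify in Jan's settings) and a
# placeholder model name matching whatever model is loaded in Jan.
import json
import urllib.request

JAN_URL = "http://localhost:1337/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload and pull the assistant's reply out of the response."""
    req = urllib.request.Request(
        JAN_URL,
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (with Jan's local server running):
#   print(chat("llama3.2-3b-instruct", "Hello!"))
```

Because the endpoint is OpenAI-compatible, the official openai client also works by overriding its base_url.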

Experimenting with local LLMs on macOS | Fatih’s Personal Blog; Sep 2025 - covers some of the above stuff, but also enumerates different decision points like: model size, runtime, quantization, vision models, reasoning, tool use, agents …

Setting up local LLMs for R and Python by Posit; Aug 2025. An introductory tutorial on how to set up Ollama and then connect to it using the ellmer (R) and chatlas (Python) packages.
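Under the hood, packages like ellmer and chatlas hit Ollama’s HTTP API (default port 11434). A stdlib-only Python sketch of that underlying request, assuming a local Ollama server with the model already pulled (the model name below is a placeholder):

```python
# Sketch: what ellmer/chatlas do under the hood, i.e. call Ollama's native
# chat endpoint (default: http://localhost:11434/api/chat).
# The model name used in the usage example is a placeholder; pull one first
# with `ollama pull`.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_ollama_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/chat endpoint (stream=False for one-shot)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model: str, prompt: str) -> str:
    """Send one chat turn and return the assistant's reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_ollama_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Usage (with a running Ollama server):
#   print(chat("llama3.2", "Why is the sky blue?"))
```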


Note on Deepseek v2; Aug 8, 2024


Sep 26, 2025

  1. GLM-4.5-Air: best agentic/coding model that runs on consumer hardware at very decent speeds. Rivals Claude Sonnet 4.
  2. NousResearch/Hermes-70B: the only model that will do whatever you ask, and tell you whatever you want to know. Literally critical to have.
  3. GPT-OSS-120B: very intelligent, it’s like having 4o at home; great context window, great agent.
  4. Qwen3-Coder-30B-A3B-Instruct: very good coding agent, excellent workhorse, incredibly fast.
  5. Magistral Small (Mistral): very fast, excellent agent, great coder, multimodal, punches way, way above its size.

Sep 26, 2025, about “Cline + LM Studio: the local coding stack with Qwen3 Coder 30B” (Cline Blog): “Qwen3 Coder + Cline + LM Studio for high-quality local coding”.

Also mentioned: facebook/cwm · Hugging Face — llama3 + long-context modifications (alternating attention + large RoPE scaling). MLX should be supported out of the box (verify).