The reason is simple: because you can. M1 or later Macs with 16GB of RAM or more are powerful enough to run useful models on device without sending any data to the cloud. And it’s surprisingly easy: download Ollama from ollama.com, run the app, pick a model that fits your hardware, and you’re done. If you’ve invested in a powerful Mac, why not put it to work? Most of us don’t push our computers to their limits with gaming, but running local LLMs is an excellent way to use the hardware you already own. (If you can afford two full-spec Mac Studio M3 Ultras with 512GB of RAM each, you can even run the full DeepSeek R1 671B model locally with EXO at reasonable speed.)
Beyond the practical appeal, there’s a more important business reason: data privacy. When you send queries to OpenAI, Anthropic, or Google, you’re giving those companies access to your data and usage patterns. Depending on the service and the plan you’re on, their terms allow that information to be retained and used to train future models and inform their competitive strategies. While you may not be able to avoid these services entirely, you can be selective. Local LLMs allow you to keep sensitive or personal information on your machine, where it belongs. The rule is simple: don’t send anything to these providers that you wouldn’t want them to have a permanent copy of.
Running Ollama locally puts you back in control of your data while letting your hardware do the heavy lifting. Here are the steps to get Ollama running on an Apple Silicon Mac:
- Visit ollama.com and click the Download for macOS button
- Open the downloaded file and drag Ollama.app to your Applications folder
- Launch Ollama from your Applications folder or Spotlight
That’s it—Ollama will now run in the background.
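To confirm everything is in place, you can check the version and the (initially empty) model list from Terminal; the exact version string will differ on your machine:

```bash
# Verify that the Ollama CLI and its background service are reachable
ollama --version   # prints the installed Ollama version
ollama list        # lists downloaded models (empty right after install)
```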
Running Your First Model
Open Terminal and run:
ollama run mistral
The first time you run this, Ollama will download the model, which may take a few minutes. After that, you’ll see a prompt where you can chat with the model. You can use other models; we use mistral here because it is small and runs on most Macs. Also try ollama run gemma3:4b, my preferred small model for quick rewording tasks.
To stop, press Ctrl+D or type /bye.
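You also don’t have to chat interactively. You can pass a one-off prompt as an argument or pipe text into the model, which is handy for quick rewording jobs; the prompts below are just illustrations:

```bash
# One-off prompt: prints the reply and exits
ollama run gemma3:4b "Reword this more formally: the meeting got pushed to next week."

# Pipe text in as the prompt
echo "Summarize in one sentence: Ollama runs LLMs locally on a Mac." | ollama run mistral
```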
RAM Recommendation by Configuration
| RAM | Recommendation | Best Models |
|---|---|---|
| 8GB | Not recommended for serious use | Only Phi-4, Mistral 7B, Gemma 2 (very slow, system will strain) |
| 16GB | Good baseline | Mistral 7B, Llama 3 8B, Phi-3 (fast and smooth) |
| 24GB+ | Excellent | Llama 3 13B, Mistral Large, larger quantized models |
For most Apple Silicon Mac users, 16GB is the sweet spot. It provides smooth performance with fast models like Mistral 7B and Llama 3 8B without system strain. If you’re considering a purchase, prioritize 16GB over base 8GB configurations.
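Recent Ollama versions can also report how much memory a loaded model is actually using, which is a quick way to sanity-check whether your configuration is comfortable; run it while a model is loaded, and expect the exact columns to vary by version:

```bash
# Show models currently loaded in memory, their size, and where they run (CPU/GPU)
ollama ps
```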
Common Commands
ollama pull mistral # Download a model
ollama list # See installed models
ollama run mistral # Run a model interactively
ollama rm mistral # Remove a model to free space
ollama show mistral # Show the model details
ollama run mistral --verbose # Show timings for the response
Other Common Reasons to Use Local LLMs Instead of Paid APIs
• Privacy and data control—data never leaves your computer
• Cost efficiency—no recurring subscription or API fees after initial hardware investment
• Reduced latency and faster response times—no network round-trips to external servers
• Customization and fine-tuning—full control over model behavior and optimization
• Offline availability—works without internet connectivity
• Data sovereignty and regulatory compliance—keeps processing within local boundaries
• Technical control and experimentation—freedom to modify, debug, and iterate without restrictions (see the local API sketch after this list)
• No dependency on vendor changes—immunity to API changes, pricing increases, or service discontinuation
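Several of these points come down to one detail: Ollama also serves a local HTTP API (on port 11434 by default), so your own scripts and apps can use the model without touching any external service. A minimal sketch, assuming you’ve already pulled mistral:

```bash
# Ask the local Ollama server for a single, non-streamed completion
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain in one sentence why local LLMs help with data privacy.",
  "stream": false
}'
```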
Other Tips
• Get Enchanted on your Mac, iPhone, and iPad. It’s free, and it lets you access your Ollama instance remotely; you just need to configure your home router for external access (a network-exposure sketch follows this list), or follow this guide.
• Learn basic bash scripting in your terminal so you can process multiple documents in one go (a starter script is sketched after this list).
• Or follow this guide to build your own RAG setup.
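On the remote-access tip: by default Ollama only listens on localhost, so apps like Enchanted on another device can’t reach it. On the macOS app, the documented way to change this is to set the OLLAMA_HOST environment variable and restart Ollama; treat this as a sketch, and only open it up beyond your home network if you understand the security implications:

```bash
# Make the Ollama server listen on all network interfaces, not just localhost
launchctl setenv OLLAMA_HOST "0.0.0.0"
# Quit and reopen the Ollama app for the change to take effect
```

And on the batch-processing tip, here is a minimal bash sketch that rewrites every .txt file in a folder; the folder name, prompt, and output suffix are placeholders to adapt:

```bash
#!/usr/bin/env bash
# Run a rewording prompt over every .txt file in ./docs and save the results
mkdir -p output
for f in docs/*.txt; do
  echo "Processing $f"
  ollama run gemma3:4b "Reword the following text more clearly: $(cat "$f")" \
    > "output/$(basename "$f" .txt)-reworded.txt"
done
```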
