Just starting this page to share what AI tools I have been playing with. Kind of a journal, but might write up separate articles if I end up using any of these in my regular workflow.
—————
20250119
To process a batch of PDFs (OCR plus text extraction), install these:
brew install poppler
brew install ocrmypdf
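A minimal sketch of a batch run, assuming the PDFs sit in the current directory (file naming is my own):
for f in *.pdf; do
  ocrmypdf --skip-text "$f" "ocr-$f"     # add an OCR text layer only to pages that lack one
  pdftotext "ocr-$f" "${f%.pdf}.txt"     # pdftotext ships with poppler
done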
—————
20250117
Play around with the context size for different models. The default Ollama context size is 2048 tokens. Don’t go wild, though, as you will get odd results.
https://github.com/ollama/ollama/issues/8356#issuecomment-2579221678
$ ollama run llama3.2
>>> /set parameter num_ctx 4096
Set parameter 'num_ctx' to '4096'
>>> /save wizard
Created new model 'wizard'
>>> /bye
$ ollama show wizard
Model
  architecture        llama
  parameters          3.2B
  context length      131072
  embedding length    3072
  quantization        Q4_K_M
Parameters
  num_ctx             4096
  stop                "<|start_header_id|>"
  stop                "<|end_header_id|>"
  stop                "<|eot_id|>"
License
  LLAMA 3.2 COMMUNITY LICENSE AGREEMENT
  Llama 3.2 Version Release Date: September 25, 2024
$ ollama run wizard
>>> hello
Hello! How can I assist you today?
>>> /bye
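You can also set num_ctx per request through Ollama’s HTTP API instead of saving a new model. A minimal sketch (11434 is the default port):
$ curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "hello",
  "options": { "num_ctx": 4096 }
}'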
—————
20250108
There are many cool models to play around with on Ollama. Try these: Qwen, Granite, Deepseek, Llama, Gemma.
—————
20250104
Been playing around with open source AI models quite a bit in 2024. They are getting much better. One of the key developments has been the context length of these models. The “original” ChatGPT that we started with in 2022 (yes, only 2 years ago) had a window of only 4096 tokens, loosely meaning it could only pay attention to about 4096 tokens (roughly 3,000 words) of input and output combined. The newer models can now do 128k tokens. A typical non-fiction book has between 50,000 and 75,000 words; at a rough 1.3 tokens per word, that is around 100k tokens, so the current models can read the whole book in one shot. That’s a big deal, because as humans, by the end of a book we have forgotten details from a few chapters back, but the new models can process it all at once.
And it’s not just OpenAI that can do this. Many new models are already way better than the ChatGPT that amazed us 2 years ago. At least three of them are by Chinese teams: Qwen, Yi and DeepSeek. Even with the trade war, etc, etc. Interesting times ahead.
—————
20240807
Using open source LLMs can be confusing: so many choices, and even for the same model there are many variants. Here are a few articles which help explain what these mean (and see the tag example after the links):
* Base, Instruct, Chat and Code models
* Parameters or Model size (8B, 14B, 70B and more)
* Quantisation
* Context size (8k, 32k, 128k)
https://thoughtbot.com/blog/understanding-open-source-llms
https://www.reddit.com/r/LocalLLaMA/comments/142q5k5/updated_relative_comparison_of_ggml_quantization/
https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
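All four of these show up directly in an Ollama tag name. For example (a tag from the Ollama library; double-check it still exists before pulling):
ollama run llama3.1:8b-instruct-q4_K_M
Here 8b is the parameter count, instruct is the variant type, and q4_K_M is the quantisation; the context size comes from the model card itself.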
—————
20240807
Leaderboard of which LLM is best for code editing and refactoring.
https://aider.chat/docs/leaderboards/
https://aider.chat/2024/07/25/new-models.html
I am using deepseek-coder for now: https://ollama.com/library/deepseek-coder:latest
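A minimal sketch of pointing aider at a local Ollama model (flag and env var per aider’s docs; adjust the model name to whatever you have pulled):
$ export OLLAMA_API_BASE=http://127.0.0.1:11434
$ aider --model ollama/deepseek-coder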
—————
20240807
Another app to play around with on the Mac: Alter (https://alterhq.com/) adds LLM integration to your system. With a quick shortcut key you can get direct answers from top LLMs, online or local on your computer.
—————
20240807
Even Apple Intelligence will be making good use of prompt engineering with ChatGPT. There is a wider discussion here about stochastic vs deterministic computing: Apple is not explicitly programming instructions for the LLMs, but using natural language to instruct them and accepting a certain level of unpredictability in the results. The programming world will take time to adjust, as it has mostly been doing imperative programming all along. Stochastic thinking is not just useful for AI, but for quantum computing too.
To play around with it, you will need to install macOS 15.1 beta, not 15 beta. You will find the files inside /System/Library/AssetsV2/com_apple_MobileAsset_UAF_FM_GenerativeModels
Source: https://www.reddit.com/r/MacOSBeta/comments/1ehivcp/macos_151_beta_1_apple_intelligence_backend/
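A quick way to poke around (path from the Reddit post; the exact layout may change between betas):
find /System/Library/AssetsV2/com_apple_MobileAsset_UAF_FM_GenerativeModels -name "*.json" | head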
—————
20240807
Another useful prompt with GPT-4o to get clean JSON output.
Please generate a JSON response that adheres strictly to the following schema. Do not add or remove any fields, and make sure to follow the exact structure.\n\n${json_scheme}\n\nFill in the placeholders with appropriate values based on the image the user sends. Don't return with backticks, simply return with the json
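A minimal sketch of sending this prompt to the chat completions endpoint with curl. This version is text-only (the original was for image inputs) and the inline schema is a stand-in for ${json_scheme}:
$ curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<'EOF'
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Please generate a JSON response that adheres strictly to the following schema. Do not add or remove any fields, and make sure to follow the exact structure.\n\n{\"title\": \"string\", \"tags\": [\"string\"]}\n\nDon't return with backticks, simply return with the json"
    }
  ]
}
EOF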
—————
20240727
Use the “--verbose” option with ollama to display the performance of the model.
e.g. ollama run llama3.1:70b-instruct-q4_0 --verbose
total duration: 1m29.230942s
load duration: 40.905541ms
prompt eval count: 26 token(s)
prompt eval duration: 2.024605s
prompt eval rate: 12.84 tokens/s
eval count: 1013 token(s)
eval duration: 1m27.162336s
eval rate: 11.62 tokens/s
Can’t run the 405B model on my Mac Studio M1 Ultra with 64GB RAM.
Will try this soon, to distribute the load across 2 (or more) Macs.
Exo Labs + MLX.
https://x.com/ac_crypto/status/1815969489990869369
https://x.com/awnihannun/status/1815972700097261953
Hard to say that I need the largest model. But the idea of being able to run that at home seems cool. The 70B model is the one I use on my laptop. Waiting for codellama to be updated to use 3.1.
—————
20240724
Llama 3.1 is out, available in 8B, 70B, and 405B versions. It’s big. I need to clean up my Mac Studio to play with the 405B; currently just testing with the 70B version.
https://www.marktechpost.com/2024/07/27/llama-3-1-vs-gpt-4o-vs-claude-3-5-a-comprehensive-comparison-of-leading-ai-models/
To clean up Ollama a little, use “ollama list” to show the currently installed models and “ollama rm <model name>” to remove a model. The new 405B will take up 231GB.
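For example (tag names assumed from the Ollama library):
ollama list                # show installed models and their sizes
ollama rm llama3.1:70b     # remove a model you no longer need
ollama pull llama3.1:405b  # the 231GB download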
—————
20240723
Getting Copilot set up again. Want to play around with coding with AI again. (Kind of annoying that I don’t get the 30-day free trial as I participated in the year-long Technical Preview.)
Try Codeium too.
I am setting it up for Xcode and VSCode. You will need https://github.com/intitni/CopilotForXcode and https://github.com/intitni/CodeiumForXcode
Will need to install nvm and node; use “which node” to check the path of your node installation.
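A minimal sketch of that install (the nvm version pinned here is an assumption; check the nvm README for the current one):
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
nvm install --lts     # install a Node.js LTS release
which node            # note this path for the extension settings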
—————
20240517
Been playing with Llama 3 on my Mac. Easy to install with Ollama: https://ollama.com. Might try out Gemma and Mistral eventually too. Llama 3 isn’t great for Chinese output; there is a tweaked model for that at https://huggingface.co/shenzhi-wang/Llama3-70B-Chinese-Chat and I will figure out how to use it soon (see the sketch below).
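A sketch of how I expect to load it, using Ollama’s support for local GGUF files in a Modelfile (the file name is a placeholder for one of the GGUF builds of that model):
# Modelfile
FROM ./Llama3-70B-Chinese-Chat-q4_k_m.gguf

$ ollama create llama3-zh -f Modelfile
$ ollama run llama3-zh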
—————
20240516
Using Audio Hijack from Rogue Amoeba to transcribe some old voice recordings. It’s not bad. Powered by Whisper: https://github.com/openai/whisper
Also found a free app, Aiko, that uses Whisper to do the same on the Mac. https://apps.apple.com/us/app/aiko/id1672085276
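The Whisper CLI can do the same from the terminal; a minimal sketch (the file name is a placeholder):
pip install -U openai-whisper
whisper old-recording.m4a --model medium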