Wednesday, December 24, 2025

Local LLMs do not cut it

After running local LLMs with ollama, I found they were not very useful, and I was constantly going back to the online versions of the same models or to proprietary models. My local setup cost under $1K, and the results from my queries over the last 6 months were no better than a web search. There are two problems with local inferencing (I am going to skip the most obvious one, i.e. the hardware is not as powerful).
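
For context, here is a minimal sketch of what a local query looks like through ollama's HTTP endpoint. This assumes the default server on localhost:11434 and that the model has already been pulled; the model name and prompt are placeholders, not necessarily the exact loop I used.

    # Minimal sketch: query a locally served model through ollama's HTTP API.
    # Assumes ollama is running on the default port and the model has been
    # pulled (e.g. "ollama pull llama3"); model name and prompt are placeholders.
    import requests

    def ask_local(prompt, model="llama3"):
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    if __name__ == "__main__":
        print(ask_local("Summarize the tradeoffs of local vs hosted LLM inference."))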

The first is distribution shift, where the data the LLM was trained on is not the same as, and sometimes differs significantly from, the prompt query. This results in incomplete or superficial answers to queries where online models provide significantly more detail. It is kind of like talking to a person who is not in the know but knows enough to sound dangerous.
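
To make "distribution shift" concrete, here is a toy illustration of my own (nothing the model itself computes): compare the word distribution of your prompts against a sample of training-style text using KL divergence. A large value suggests the prompts are off-distribution for the model. The corpora below are made-up placeholders.

    # Toy illustration of distribution shift: KL divergence between the word
    # distribution of training-style text and the distribution of my prompts.
    # Both corpora are placeholders; a large divergence hints the prompts are
    # "off-distribution" for the model.
    import math
    from collections import Counter

    def word_dist(texts):
        counts = Counter(w.lower() for t in texts for w in t.split())
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def kl_divergence(p, q, eps=1e-9):
        # KL(p || q), with smoothing for words q has never seen.
        return sum(pw * math.log(pw / q.get(w, eps)) for w, pw in p.items())

    train_like = ["the cat sat on the mat", "dogs chase cats in the park"]
    my_prompts = ["explain NUMA-aware tensor sharding on a 2-socket EPYC box"]

    shift = kl_divergence(word_dist(my_prompts), word_dist(train_like))
    print(f"estimated shift (KL divergence): {shift:.3f}")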

The second is overfitting. This is where the weights have converged on one answer for any set of similar prompts. The model does not find an accurate answer that fits the specific prompt, so it serves up the generalized version. This is kind of like thinking in stereotypes.
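
Overfitting is easier to see on a toy problem than inside an LLM. Here is a small numpy sketch (my own illustration, with a high-degree polynomial standing in for over-converged weights): it nails the training points but typically does worse than a straight line on held-out points from the same underlying trend.

    # Toy illustration of overfitting: a degree-8 polynomial nearly memorizes
    # 10 noisy training points (tiny train error) but typically generalizes
    # worse than a simple line on held-out points from the same trend.
    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 1, 10)
    y_train = 2 * x_train + rng.normal(0, 0.1, size=x_train.shape)
    x_test = np.linspace(0, 1, 50)
    y_test = 2 * x_test + rng.normal(0, 0.1, size=x_test.shape)

    for degree in (1, 8):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")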

The key point is that the real Slim Shady (yes, Eminem) here is the data itself. If you train on prior exams and then take an exam set by some external body, you will find yourself less prepared than you thought you were. The content and theories have not changed; only the way of testing is different. Failing to detect and correct these shifts can be disastrous, because sometimes it means the model has to be retrained from scratch. Yep, that $150M used to train the model was flushed down the toilet. Hopefully, in 2026, we focus more on the data and on how to partition and reassemble weights, and not as much on GPU/VRAM. The latter is really a commodity and in abundant supply.

BTW, the B200 only supports certain versions of PyTorch. If this continues, we will get fragmentation in jobs, because with every new GPU release all the software would have to be upgraded. In the end, GenAI is not much different from Natural AI.
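
A quick way to catch that kind of mismatch early (a sketch, assuming a CUDA-enabled PyTorch install): check whether the installed wheel was actually compiled for the GPU's architecture before launching a long job.

    # Sketch: check whether the installed PyTorch build was compiled for this
    # GPU's architecture (e.g. a Blackwell-class card wants an sm_100-era build).
    # Assumes a CUDA-enabled PyTorch install; prints what it finds either way.
    import torch

    if not torch.cuda.is_available():
        print("No CUDA device visible to this PyTorch build.")
    else:
        major, minor = torch.cuda.get_device_capability(0)
        gpu_arch = f"sm_{major}{minor}"
        built_for = torch.cuda.get_arch_list()  # e.g. ['sm_80', 'sm_90', ...]
        print(f"GPU: {torch.cuda.get_device_name(0)} ({gpu_arch})")
        print(f"torch {torch.__version__} compiled for: {built_for}")
        if gpu_arch not in built_for:
            print("Warning: this wheel has no kernels for your GPU architecture.")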

Happy Holidays. 


