Tuesday, May 26, 2026

Token Pricing: Reasoning Tax, GPU Utilization & GPU Recency

According to the newly published TPI (Token Price Index), the average cost of 1M tokens is just over $2, up more than 75% in one year. So does a model-serving provider make money at this pricing? Yes and no. The marginal cost of power + infra + facilities ranges from 6 cents to 1.60 cents per million tokens. The key factors driving costs higher are (a) low GPU utilization (currently around 15%), (b) the reasoning tax (tokens generated internally to support the customer's tokens), and (c) GPU recency.

On the latest GPUs, the cost of generating a token is lower, which creates an incentive for providers to move up to the latest generation. But this also means they pay for a supply constraint that will likely stay forever, because GPUs that are one generation behind are not used. So the amortization on those GPUs, which assumes 100% consumption for the full 6 years, is just too low, i.e. margins are artificially high.
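To make the amortization point concrete, here is a minimal sketch in Python. The 6-year schedule and the 100% consumption assumption come from the paragraph above; the purchase price and the 50% consumed fraction are purely hypothetical placeholders, not figures from the TPI.

```python
HOURS_PER_YEAR = 24 * 365


def amortized_cost_per_used_hour(purchase_price, schedule_years, consumed_fraction):
    """Capital cost charged to each hour in which the GPU is actually consumed.

    purchase_price: what the provider paid for the GPU (hypothetical input).
    schedule_years: length of the amortization schedule (6 in the post).
    consumed_fraction: share of those hours that find a paying workload (0..1].
    """
    if purchase_price < 0 or schedule_years <= 0:
        raise ValueError("price must be non-negative and schedule positive")
    if not 0 < consumed_fraction <= 1:
        raise ValueError("consumed_fraction must be in (0, 1]")
    used_hours = schedule_years * HOURS_PER_YEAR * consumed_fraction
    return purchase_price / used_hours


if __name__ == "__main__":
    price = 30_000.0  # hypothetical GPU price in dollars, for illustration only
    on_paper = amortized_cost_per_used_hour(price, 6, 1.0)  # the 100% assumption
    in_practice = amortized_cost_per_used_hour(price, 6, 0.5)  # hypothetical 50%
    print(f"on paper:    ${on_paper:.2f} per used hour")
    print(f"in practice: ${in_practice:.2f} per used hour")
```

If the hardware is consumed for only half of the scheduled hours, each used hour has to carry twice the capital cost the schedule assumes, which is the sense in which the reported margin is overstated.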

GPU utilization relies on many techniques at deployment time, but batching is the one that makes a meaningful impact. Assuming 100% of the input is batched is euphoric. We average 15-25% on recent GPUs and barely 1-2% on older GPUs. No one wants to run on legacy GPUs. Net net, this factor is also artificially inflating margins.
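A rough way to see how much utilization matters is to divide the hourly cost of a GPU by the tokens it actually produces. The sketch below does that in Python. The utilization levels are the ones quoted above; the hourly cost and peak throughput are hypothetical values chosen only to make the arithmetic visible, and the model assumes throughput scales linearly with utilization.

```python
def cost_per_million_tokens(gpu_cost_per_hour, peak_tokens_per_second, utilization):
    """Serving cost per 1M tokens when the GPU runs at a given utilization.

    Assumes (as a simplification) that delivered throughput is
    peak_tokens_per_second * utilization.
    """
    if gpu_cost_per_hour < 0 or peak_tokens_per_second <= 0:
        raise ValueError("cost must be non-negative and throughput positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    hourly_cost = 2.0    # hypothetical dollars per GPU-hour
    peak_rate = 5_000.0  # hypothetical peak tokens per second
    # 100% is the euphoric assumption; the rest are the ranges quoted in the post.
    for utilization in (1.0, 0.25, 0.15, 0.02, 0.01):
        cost = cost_per_million_tokens(hourly_cost, peak_rate, utilization)
        print(f"utilization {utilization:>5.0%}: ${cost:.2f} per 1M tokens")
```

Whatever the real inputs are, the cost per token scales with the inverse of utilization, so a model that assumes 100% understates cost by a factor of four at 25% and by a factor of fifty at 2%.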

The reasoning tax comes from ordinary queries but especially from agentic AI. Agents generate output that is sent back to the model, which then starts generating reasoning tokens that are currently not charged to the customer. The cost of these reasoning tokens is eaten by the provider and is referred to as the "Reasoning Tax".
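The effect of the reasoning tax on unit economics can be sketched in a few lines of Python. This assumes, as described above, that reasoning tokens cost the provider the same to generate as billed tokens but bring in no revenue. The function names and the token counts in the example are hypothetical.

```python
def reasoning_tax_share(billed_tokens, reasoning_tokens):
    """Fraction of all generated tokens that the provider cannot bill."""
    if billed_tokens <= 0 or reasoning_tokens < 0:
        raise ValueError("billed_tokens must be positive, reasoning_tokens >= 0")
    return reasoning_tokens / (billed_tokens + reasoning_tokens)


def cost_per_million_billed(cost_per_million_generated, billed_tokens, reasoning_tokens):
    """Provider cost per 1M billed tokens once unbilled reasoning is included."""
    if billed_tokens <= 0 or reasoning_tokens < 0:
        raise ValueError("billed_tokens must be positive, reasoning_tokens >= 0")
    total_generated = billed_tokens + reasoning_tokens
    return cost_per_million_generated * total_generated / billed_tokens


if __name__ == "__main__":
    # Hypothetical agentic request: 1,000 billed tokens, 3,000 hidden reasoning tokens.
    billed, reasoning = 1_000, 3_000
    share = reasoning_tax_share(billed, reasoning)
    cost = cost_per_million_billed(0.50, billed, reasoning)  # $0.50 is a placeholder
    print(f"unbilled share of generation: {share:.0%}")
    print(f"cost per 1M billed tokens:    ${cost:.2f}")
```

The useful quantity is the multiplier (billed + reasoning) / billed: every unbilled reasoning token raises the true cost of each token the customer actually pays for.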

Saturday, April 11, 2026

OpenClaw is MicroSaaS

OpenClaw has an annoyingly long setup, but after all the effort, it is worth it. I set it up to routinely check my Gmail. The process is well documented on openclaw, but it does require an account with Google Cloud and installation of the gcloud SDK, Go, and GoG. And of course a lot of knowledge of how Linux works. I did it on WSL on Windows.

Finally, it gave me all my emails, which is not the point. You can ask it questions that you can't ask in an email client. Check this out.

Question: Who keeps sending me email? In the last week who has sent me the most emails? 



Question: Can you summarize all the emails from Federal Reserve Board?



Question: Can you summarize the April 8th FOMC Minutes? 


OK, so all of this is pretty straightforward. But note that I haven't used any tokens, or have I? What about monetization? Who is making money with all this?

It turns out LLM providers like Anthropic and Gemini charge fees for token usage. Cloud hosting services charge for any managed instances (I am using my laptop). ClawHub skills for specific verticals (read: custom integrations) get paid. Skills vendors are making some money (not enough to quit your job yet, but good for Happy Meals from McD).


