Sunday, October 05, 2008

Application Streaming

Here is a problem... You want to increase the bandwidth per pin on a chip, but the power budget dictates that the core cannot be run faster. So you increase the number of cores and add buffering to keep the pins fed at their rated per-pin bandwidth. Why is this important to application streaming?
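A back-of-envelope sketch of that arithmetic, in Python. Every number here (per-pin rate, pin count, per-core output) is a made-up assumption for illustration, not a figure from any particular chip:

# Illustrative numbers only -- all three constants are assumptions.
PIN_RATE_GBPS = 5.0        # assumed per-pin signaling rate, Gb/s
NUM_DATA_PINS = 64         # assumed data pins driving the off-chip link
CORE_OUTPUT_GBPS = 20.0    # assumed data one power-limited core can produce, Gb/s

# Aggregate off-chip bandwidth is pins times per-pin rate.
link_bandwidth = PIN_RATE_GBPS * NUM_DATA_PINS    # 320 Gb/s

# The power budget caps per-core output, so the only way to keep the
# pins busy is more cores, each feeding a buffer the pins drain.
cores_needed = link_bandwidth / CORE_OUTPUT_GBPS  # 16 cores

print(f"link bandwidth: {link_bandwidth:.0f} Gb/s")
print(f"cores needed to saturate the pins: {cores_needed:.0f}")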

Systems that deal with real-time information do not have the luxury of disk storage. Most of the real-time information is computed inside the chip complex, and semiconductor technology places the current bottleneck at the pins. To remain real-time, we need to get the information off the chip and onto the network, on its way to its consumer, as fast as possible. If I am a remote consumer of information, the bottleneck is not the core that is provisioned to me but the pins it drives to get data to my handheld.

So I have now resigned myself to the fate that my desktop will be hosted in the cloud and my service provider will charge me for the resolution at which I interact with it. The higher the resolution, the higher the charge. I am sure they are not thinking that people will share a remote session on a server like a 1970s mainframe. What will make people give up their local desktop is a desktop that works like TV: the consumer buys the screen and the service provider delivers the information. Service providers can also differentiate from one another through the range of supported peripherals, much as today's MMOGs (massively multiplayer online games) do.
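A rough sketch of why resolution maps directly to price: the bandwidth a streamed desktop consumes grows with pixel count. The resolutions, frame rate, bit depth, and compression ratio below are all assumptions for illustration:

# Illustrative assumptions: two screen sizes, 24-bit color, 30 fps,
# and a 50:1 codec compression ratio.
RESOLUTIONS = {"1024x768": (1024, 768), "1600x1200": (1600, 1200)}
BITS_PER_PIXEL = 24
FRAMES_PER_SEC = 30
COMPRESSION = 50

for name, (w, h) in RESOLUTIONS.items():
    raw_mbps = w * h * BITS_PER_PIXEL * FRAMES_PER_SEC / 1e6
    print(f"{name}: {raw_mbps:,.0f} Mb/s raw, "
          f"~{raw_mbps / COMPRESSION:,.0f} Mb/s compressed")

With these numbers, the bigger screen costs the provider roughly two and a half times the bandwidth (about 28 Mb/s versus 11 Mb/s compressed), so tiered pricing by resolution follows naturally.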

All of this puts the emphasis on application streaming. Moving code to the client for execution is not going to fly: to make it usable, I need a thick client, and that kills the cloud economics. A remote session is too slow. Streaming looks like the only approach right now, and for it to be useful, the chips need to increase the bandwidth per pin.
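A minimal sketch of what the client side of application streaming could look like: demand-fetch only the blocks of the application that are actually touched, rather than installing the whole binary (thick client) or round-tripping every interaction (remote session). The server URL, block size, and caching scheme are all hypothetical:

import urllib.request

BLOCK_SIZE = 64 * 1024
BASE_URL = "http://example.com/app.image"  # hypothetical stream source

cache = {}  # blocks already fetched, keyed by block index

def read_block(index: int) -> bytes:
    """Fetch one block of the streamed application on demand."""
    if index not in cache:
        start = index * BLOCK_SIZE
        req = urllib.request.Request(
            BASE_URL,
            headers={"Range": f"bytes={start}-{start + BLOCK_SIZE - 1}"},
        )
        with urllib.request.urlopen(req) as resp:  # HTTP range request
            cache[index] = resp.read()
    return cache[index]

# Usage (requires a server that supports HTTP range requests):
#   header = read_block(0)  # first touch fetches; later touches hit cache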
