Friday, May 10, 2019

Data is the new Narcotic

About six months ago, I tweeted that data is not the new oil; it is the new narcotic. A key property of any narcotic is that you need it periodically and, with time, more of it to achieve the same effect. Data has the same effect on any cloud-based digital platform. Let me explain.

Let's say you are like me - frugal - and shop in big lots. A data point would be the receipt issued to you, which carries each item's SKU, your loyalty number (for identification) and, most importantly, the location and time of the purchase. If the store shared this data with a digital platform, that receipt would be the platform's first dose. The platform would then need this data periodically, i.e. your receipt at each subsequent purchase. Why? Because the analytics is done on a time series. It yields first-order insights like the time between purchases, and second-order insights like: cough syrup in June suggests some respiratory infection is spreading in that zip code. If that fails, the platform can cross the data with doctor's office visits to arrive at a probability of chronic upper respiratory disease, or just smoking.

All of these common-sense analytics can be done by a computer in seconds, but it needs data to start and it needs to be fed periodically to increase its accuracy. And to arrive at second-order insights, it needs more of that data about you. In fact, the more of your activities, from body functions to daily habits, are digitized and fed to this platform, the smarter it will get at predicting your intentions. To the point that it will predict a need (and serve you an advert) before you have felt it. Like when you wonder: how did it know that I need cough syrup?
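To make the "time between purchases" idea concrete, here is a minimal Python sketch. The receipt fields, the loyalty number, the SKU and the dates are all made up for illustration, and the prediction is just the average gap added to the last purchase date; a real platform would likely use something far more elaborate.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical receipts: (loyalty_number, sku, zip_code, purchase_date).
# All values are invented for illustration.
receipts = [
    ("L-1001", "COUGH-SYRUP", "94016", date(2019, 1, 5)),
    ("L-1001", "COUGH-SYRUP", "94016", date(2019, 2, 4)),
    ("L-1001", "COUGH-SYRUP", "94016", date(2019, 3, 6)),
]


def days_between_purchases(receipts, loyalty_number, sku):
    """Return the gaps, in days, between consecutive purchases of one SKU."""
    dates = sorted(
        d for lid, s, _zip, d in receipts if lid == loyalty_number and s == sku
    )
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


def predict_next_purchase(receipts, loyalty_number, sku):
    """Naive guess: last purchase date plus the average gap so far."""
    gaps = days_between_purchases(receipts, loyalty_number, sku)
    if not gaps:
        return None  # a single receipt is not yet a time series
    last = max(
        d for lid, s, _zip, d in receipts if lid == loyalty_number and s == sku
    )
    return last + timedelta(days=round(mean(gaps)))


gaps = days_between_purchases(receipts, "L-1001", "COUGH-SYRUP")
next_date = predict_next_purchase(receipts, "L-1001", "COUGH-SYRUP")
print(gaps, next_date)
```

Note that with only one receipt the function has nothing to predict from, which is the point: the platform needs the next receipt, and the one after that.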

They are calling this type of learning "AI": repeated encounters with a data point increase the weight (probability) of that event happening again on an underlying learning network. And data is the narcotic that this system needs regularly and in increasing quantities.
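A rough way to picture "repeated encounters increase the weight" is a simple frequency counter. This is only a stand-in for the intuition, not how an actual learning network is trained; the class name and the events are hypothetical.

```python
from collections import Counter


class FrequencyPredictor:
    """Toy model: an event's weight is how often it has been seen so far."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, event):
        # Each new encounter raises the weight of that event.
        self.counts[event] += 1
        self.total += 1

    def probability(self, event):
        # With no data at all, the model can say nothing.
        if self.total == 0:
            return 0.0
        return self.counts[event] / self.total


model = FrequencyPredictor()
for event in ["cough syrup", "cough syrup", "bread", "cough syrup"]:
    model.observe(event)

print(model.probability("cough syrup"))
```

Every estimate here depends on a steady supply of observations, which is the narcotic analogy in miniature.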

Costs in Training LLMs

I went through the Llama-2 white paper that Meta released with the model. I was hoping to learn some special technique they may be ...