Thursday, September 20, 2012

Biggest Bottleneck in Bigdata is the human

Today's WSJ carries an article on Bigdata being the brain behind hiring in companies. Bigdata articles are everywhere, and each one points to a new bottleneck for the industry to overcome. But there is one bottleneck that no one discusses: the ultimate consumer of Bigdata, the human. If we have trouble getting computers to deal with Bigdata, imagine presenting the resulting analysis to a human. We are simply not wired to consume analysis at that volume. That is where visualization steps in.

So what is visualization of Bigdata? It is the rendering of the insights from a data analysis using images, animations, selective or progressive disclosure, and charts, figures, clouds, bubbles and the like. The challenge is not in rendering these visual elements but in mapping them to data that is sourced from across the internet, parsed by multiple parsers, collated and curated, correlated with multiple streams, and finally displayed on a canvas that is hosted in yet another place. On top of that, there is the HTML5 vs. native debate.

Visualization of Bigdata is an active research area, and there is a strong business case for it as well: it addresses the biggest bottleneck in the Bigdata ecosystem, i.e., the human.
