Thursday, June 21, 2012

MapR at UJUG - Real-time Hadoop (Ted Dunning)

Real-time and Long-time with Storm and MapR - Slides may take some time to come online.

Hadoop is great for processing vats of data, but sucks for real-time (by design).
Real-time and Long-time working together. Hadoop for the Long-time part, Storm for the Real-time part. Presentation can blend the two parts into a cohesive whole.

Simply dumping into a NoSQL engine doesn't quite work.
Insert rate is limited

I have 15 versions of my landing page
Each visitor is assigned to a version
 - Which version?
A conversion or sale or whatever can happen
 - how long to wait?

Probability as expressed by humans is subjective and depends on information and experience.

Bayesian Bandit
 - Compute the distribution
 - Sample p1 and p2 from these distributions
 - Put a coin in bandit 1 if p1 > p2
 - Else, put the coin in bandit 2
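
Here's a rough sketch of those steps as I understood them. This is not Ted's code. I'm assuming each bandit pays out either 0 or 1, and that we start from a uniform Beta(1, 1) prior, so the distribution for each bandit is Beta(wins + 1, losses + 1). The names choose_bandit and simulate are mine.

```python
import random


def choose_bandit(wins, losses, rng=random):
    """Draw one sample per bandit from its Beta posterior and pick the largest.

    Assumption: uniform Beta(1, 1) prior, so the posterior for a bandit
    with w wins and l losses is Beta(w + 1, l + 1).
    """
    samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates, rounds, seed=0):
    """Play `rounds` coins against bandits with hidden payout rates."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(rounds):
        i = choose_bandit(wins, losses, rng)
        # Put the coin in bandit i and record whether it paid out.
        if rng.random() < true_rates[i]:
            wins[i] += 1
        else:
            losses[i] += 1
    return wins, losses


if __name__ == "__main__":
    wins, losses = simulate([0.1, 0.5], rounds=2000, seed=1)
    print("pulls per bandit:", [w + l for w, l in zip(wins, losses)])
```

With two bandits this is the "p1 > p2" rule above; with more (say, 15 landing pages) the same idea is to pick whichever sample is largest.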

Very interesting, wish I could capture more of the slides and demos. Basically, it's about trying to maximize the value of what you're looking for with little knowledge. Regret is the difference between a perfect outcome and the actual outcome.
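
To make regret concrete, here's a tiny sketch of my own (not from the slides). It assumes we somehow know the true payout rate of each option, which you never do in practice, and that the "perfect outcome" means always picking the best one. The function name expected_regret is made up.

```python
def expected_regret(true_rates, pulls):
    """Expected payout lost by not always choosing the best option.

    true_rates: hypothetical true payout rate of each option
    pulls: how many times each option was actually chosen
    """
    best = max(true_rates)
    return sum(n * (best - rate) for rate, n in zip(true_rates, pulls))


if __name__ == "__main__":
    # 100 visitors sent to a 10% page instead of the 50% page.
    print(expected_regret([0.1, 0.5], [100, 900]))
```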

Ted showed a very cool video that shows the algorithm learning in real time. Video link may be in the slides.

We can encode a distribution by sampling.
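
My reading of this, sketched below: instead of carrying around a formula for the distribution, keep a bag of samples drawn from it and answer questions by counting. The Beta(wins + 1, losses + 1) posterior and the function names are my assumptions, not something from the talk.

```python
import random


def posterior_samples(wins, losses, n=10000, seed=0):
    """Represent a conversion-rate distribution as a list of n draws.

    Assumption: uniform prior, so the distribution is Beta(wins + 1, losses + 1).
    """
    rng = random.Random(seed)
    return [rng.betavariate(wins + 1, losses + 1) for _ in range(n)]


def prob_first_beats_second(a, b):
    """Estimate P(p1 > p2) by pairing up independent samples and counting."""
    return sum(x > y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Use different seeds so the two sample sets are independent.
    page_a = posterior_samples(wins=12, losses=88, seed=1)
    page_b = posterior_samples(wins=9, losses=91, seed=2)
    print(prob_first_beats_second(page_a, page_b))
```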

Original Bayesian Bandit only requires real-time
Generalized Bandit may require access to long history for learning
Bandit variables can include content

  - Craig