Thursday, June 21, 2012
MapR at UJUG - Real-time Hadoop (Ted Dunning)
Real-time and Long-time with Storm and MapR - http://info.mapr.com/ted-utahjug. Slides may take some time to come online.
Hadoop is great for processing vats of data, but sucks for real-time (by design).
Real-time and Long-time working together. Hadoop for the Long-time part, Storm for the Real-time part. Presentation can blend the two parts into a cohesive whole.
Simply dumping into a NoSQL engine doesn't quite work.
Insert rate is limited
I have 15 versions of my landing page
Each visitor is assigned to a version
- Which version?
A conversion or sale or whatever can happen
- How long to wait?
Probability as expressed by humans is subjective and depends on information and experience.
- Compute the distribution
- Sample p1 and p2 from these distributions
- Put a coin in bandit 1 if p1 > p2
- Else, put the coin in bandit 2
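The steps above can be sketched in a few lines of Python. This is my own minimal sketch, not code from the talk: it assumes each bandit pays out win/lose, models each payout rate with a Beta distribution starting from a uniform prior, and uses made-up names (choose_bandit, run, true_rates). The payout rates are simulated.

```python
import random

def choose_bandit(wins, losses, rng):
    """Sample a payout rate for each bandit and return the index of the largest sample."""
    # Assumption: Beta(wins + 1, losses + 1) is the posterior under a uniform prior.
    samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(samples)), key=samples.__getitem__)

def run(true_rates, pulls, seed=0):
    """Simulate `pulls` coins against bandits with hidden payout rates `true_rates`."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(pulls):
        k = choose_bandit(wins, losses, rng)   # put the coin in bandit k
        if rng.random() < true_rates[k]:       # simulated payout
            wins[k] += 1
        else:
            losses[k] += 1
    return wins, losses

if __name__ == "__main__":
    wins, losses = run([0.2, 0.8], pulls=2000)
    print("plays per bandit:", [w + l for w, l in zip(wins, losses)])
```

With two bandits this is exactly the "p1 > p2" comparison; with more (say, 15 landing page versions) the same function just picks the largest sample.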
Very interesting, wish I could capture more of the slides and demos. Basically, it's about trying to maximize the value of what you're looking for with little knowledge. Regret is the difference between a perfect outcome and the actual outcome.
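To make regret concrete, here is a small sketch of one common way to compute it, in expectation: compare always playing the best option against the plays actually made. The function name and the numbers are mine, not from the talk.

```python
def expected_regret(true_rates, plays):
    """Expected payout lost by not always playing the best option.

    true_rates: hidden payout rate of each option (known only in a simulation)
    plays: how many times each option was actually played
    """
    best = max(true_rates)
    return sum((best - rate) * n for rate, n in zip(true_rates, plays))

if __name__ == "__main__":
    # Hypothetical numbers: 50 plays wasted on a 10% option when a 30% option existed.
    print(expected_regret([0.1, 0.3], [50, 150]))
```

A learner that figures out the best option quickly spends few plays on the others, so its regret grows slowly.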
Ted showed a very cool video that shows the algorithm learning in real time. Video link may be in the slides.
We can encode a distribution by sampling.
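My reading of this point, as a rough sketch: instead of carrying around a formula for the distribution, keep a bag of samples drawn from it and compute whatever you need from the samples. The Beta distribution, the uniform prior, and the function names here are my assumptions for illustration.

```python
import random

def sample_posterior(wins, losses, n=10000, seed=0):
    """Represent a Beta(wins + 1, losses + 1) distribution as a sorted list of n samples."""
    rng = random.Random(seed)
    return sorted(rng.betavariate(wins + 1, losses + 1) for _ in range(n))

def summarize(samples):
    """Read the mean and a rough 90% interval straight off the sorted samples."""
    n = len(samples)
    return {
        "mean": sum(samples) / n,
        "p05": samples[int(0.05 * n)],
        "p95": samples[int(0.95 * n)],
    }

if __name__ == "__main__":
    # Hypothetical data: 30 conversions out of 100 visitors.
    print(summarize(sample_posterior(30, 70)))
```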
Original Bayesian Bandit only requires real-time
Generalized Bandit may require access to long history for learning
Bandit variables can include content