Monday, June 18, 2012

Introducing Distributed Execution and MapReduce Framework

Read through Introducing Distributed Execution and MapReduce Framework today. I had read about some of the plans for Infinispan a while ago and was quite interested to see how it was coming along.

If you're not familiar with Infinispan, it's a Java-based, in-memory data grid sponsored by JBoss/Red Hat. It's a very cool project. What I think is most impressive about Infinispan is the ability to run Map/Reduce jobs across the data grid using the Callable interface. It looks like they have an extended DistributedCallable and a DistributedExecutorService to help with running tasks on the data grid.

They are modeling their Map/Reduce framework after Hadoop while providing what could be a more familiar Callable/ExecutorService model. It may lower the barrier to entry for developers more familiar with standard Java threading models. I guess the only real drawback is that since this is an in-memory data grid, you're limited in how much data you can traverse. However, it should be very fast since you're memory-centric instead of file-centric as with Hadoop.
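To make the Callable/ExecutorService idea concrete, here is a minimal sketch of a map/reduce-style word count using only the standard java.util.concurrent classes. It is not Infinispan's API: the in-memory lists stand in for data held on different grid nodes, a local thread pool stands in for the cluster, and the class and method names (LocalMapReduceSketch, countWords, wordCount) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LocalMapReduceSketch {

    // "Map" step: a Callable that counts the words in one partition of the data.
    static Callable<Map<String, Integer>> countWords(List<String> partition) {
        return () -> {
            Map<String, Integer> counts = new HashMap<>();
            for (String line : partition) {
                for (String word : line.toLowerCase().split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
            return counts;
        };
    }

    // Submits one task per partition, then merges the partial counts ("reduce").
    static Map<String, Integer> wordCount(List<List<String>> partitions) throws Exception {
        // Local thread pool; an assumption standing in for the nodes of a grid.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (List<String> partition : partitions) {
                futures.add(pool.submit(countWords(partition)));
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> future : futures) {
                // get() blocks until that partition's task has finished.
                future.get().forEach((word, count) -> total.merge(word, count, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("hello grid", "hello world"),
                Arrays.asList("world of data grids"));
        System.out.println(wordCount(partitions));
    }
}
```

Note that the merge here happens in the calling thread, so the reduce step is not distributed at all. The sketch only shows the programming model, not how a grid would schedule the work.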

  - Craig

2 comments:

  1. Craig,

    Infinispan is in fact not only an in-memory grid, since it supports persisting state to configurable cache stores. The real limitation of our Map/Reduce implementation was executing the reduce phase on a single node rather than distributing execution across the cluster. This has been addressed recently: http://infinispan.blogspot.ca/2012/07/mapreduce-improvements-in-infinispan.html

    Thanks for the review and the feedback!

    Regards,
    Vladimir

  2. Introducing Distributed Execution and MapReduce Framework
    Thanks for sharing the information
    informatica training
