Paper: A Study of Practical Deduplication
This is another highscalability.com article. Deduplication has huge implications for big data systems: the more data these systems hold, the more important it becomes. My favorite implementation is ZFS. I run Solaris 10 on my own systems, but I haven't tried the dedup feature yet. I've read several articles on ZFS dedup, including one on combining dedup with the compression feature, which is a bit tricky.