Sat 13 Jun 2009
Reading an interesting paper on d-left hashing (pdf link) by Bonomi, Mitzenmacher, et al. The construction improves on Bloom filters in both space and efficiency. Wondering how it could be incorporated into a Hadoop MapFile to avoid scanning compressed blocks for keys that aren’t present. Maybe the work in HBase on o.a.h.hbase.io.BloomFilterMapFile would provide good clues. Need to understand the dynamic bit reassignment stuff first, though.
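To get a feel for the basic idea before tackling the paper's details, here is a rough Python sketch of a d-left fingerprint filter. It is a simplification and not the paper's construction: each key is hashed to one candidate bucket in each of d subtables, a short fingerprint goes into the least loaded candidate (ties broken to the left), and a lookup checks all d candidates. The class name `DLeftFilter`, its parameters, and the use of SHA-256 are my own assumptions for illustration. Counters, deletion, and dynamic bit reassignment are all left out.

```python
import hashlib


class DLeftFilter:
    """Insert-only approximate membership filter using d-left hashing.

    Hypothetical sketch: names and parameters are illustrative only.
    """

    def __init__(self, d=4, buckets_per_table=256, bucket_size=8, fp_bits=16):
        self.d = d
        self.buckets_per_table = buckets_per_table
        self.bucket_size = bucket_size
        self.fp_bits = fp_bits
        # tables[i][b] holds the fingerprints stored in bucket b of subtable i.
        self.tables = [[[] for _ in range(buckets_per_table)] for _ in range(d)]

    def _fingerprint(self, key: bytes) -> int:
        # Short fingerprint stored in place of the full key.
        digest = hashlib.sha256(b"fp:" + key).digest()
        return int.from_bytes(digest[:8], "big") & ((1 << self.fp_bits) - 1)

    def _bucket(self, table: int, key: bytes) -> int:
        # One salted hash per subtable picks that subtable's candidate bucket.
        digest = hashlib.sha256(bytes([table]) + b":" + key).digest()
        return int.from_bytes(digest[:8], "big") % self.buckets_per_table

    def add(self, key: bytes) -> bool:
        """Store the key's fingerprint; return False if every candidate is full."""
        fp = self._fingerprint(key)
        candidates = [self.tables[i][self._bucket(i, key)] for i in range(self.d)]
        if any(fp in bucket for bucket in candidates):
            return True  # already present (or an indistinguishable collision)
        # min() returns the first minimal element, i.e. the leftmost
        # least-loaded bucket, which is the d-left tie-breaking rule.
        target = min(candidates, key=len)
        if len(target) >= self.bucket_size:
            return False  # overflow; a real implementation needs a fallback
        target.append(fp)
        return True

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent; True means probably present."""
        fp = self._fingerprint(key)
        return any(
            fp in self.tables[i][self._bucket(i, key)] for i in range(self.d)
        )


if __name__ == "__main__":
    f = DLeftFilter()
    f.add(b"row-0001")
    print(f.might_contain(b"row-0001"), f.might_contain(b"row-9999"))
```

For the MapFile idea, the filter would presumably be consulted before seeking into the data file, so a negative answer skips decompressing any block. How the filter would be stored alongside the index is something I haven't worked out.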