Longitudinal Analytics
of Web Archive Data


Skewed Key Spaces in Map Reduce

Lev Faerman has published a technical report titled "Skewed Key Spaces in Map Reduce".

This paper discusses the effects of non-uniform key spaces (such as those produced by processing English text) on load balancing in Hadoop. It first shows that a potential problem exists by examining the characteristics of the English language and their effect on reducer load, and then discusses a simple improvement to Hadoop partitioners for better load balancing.
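The load-balancing problem can be illustrated with a small simulation. The sketch below is not taken from the report. It assumes a toy corpus in which the word of rank r occurs roughly top_count / r times (a rough Zipf-like stand-in for English text), and it assigns keys to reducers in the same way as Hadoop's default HashPartitioner, (hash & Integer.MAX_VALUE) % numReduceTasks, with zlib.crc32 standing in for Java's String.hashCode(). The names partition, reducer_loads and zipf_corpus are hypothetical helpers introduced only for this example.

```python
from collections import Counter
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Hash-partition a key in the style of Hadoop's default HashPartitioner.

    zlib.crc32 is an assumed stand-in for Java's String.hashCode(); it is
    used here because it is deterministic across Python processes.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def reducer_loads(words, num_reducers):
    """Return the number of (word, 1) records each reducer would receive."""
    loads = [0] * num_reducers
    for word, count in Counter(words).items():
        # Every record with the same key goes to the same reducer.
        loads[partition(word, num_reducers)] += count
    return loads


def zipf_corpus(vocab_size, top_count):
    """Build a toy corpus where the word of rank r appears about
    top_count / r times (at least once). This is an illustrative
    assumption, not data from the report."""
    words = []
    for rank in range(1, vocab_size + 1):
        words.extend([f"w{rank}"] * max(1, top_count // rank))
    return words


if __name__ == "__main__":
    corpus = zipf_corpus(vocab_size=1000, top_count=10000)
    loads = reducer_loads(corpus, num_reducers=8)
    mean = sum(loads) / len(loads)
    print("records per reducer:", loads)
    print("max / mean load: %.2f" % (max(loads) / mean))
```

Because all records for a key land on one reducer, the reducer that receives the most frequent word carries at least that word's full count no matter how evenly the remaining keys are spread. In this toy setup that alone puts it above the mean load, which is the kind of imbalance a skew-aware partitioner would aim to reduce.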

Technical Report