Longitudinal Analytics
of Web Archive Data

 

Distributed Access to Large Scale Data Sets

This area focuses on making large data set accessible to researchers around the world.. (More…)
 
 
 
 
 

More about Distributed Access to Large Scale Data Sets: 

This area focuses on making large data set accessible to researchers around the world. Disk capacities have so far outstripped wide area networks that current systems must co-locate the data and processing capacity in the same physical location. Our primary efforts will be to extend the Hadoop system so that data can be read and filtered at its home location and then distributed out to processing nodes anywhere in the Internet. This approach will require extensions to Hadoop for wide area operation as extensive use of distributed indices to capture the location and contents of the remote heterogeneous data stores.