The Challenges of Working with Big Data

This research is the closest that TASCHA has come to a big data project. Until now, we’ve mostly been able to share data via email or Google Drive. That is much harder with a dataset that contains over 500 million rows and takes up 300 GB when uncompressed. And this is only the static dataset on which we are focusing our efforts: Worldreader’s live datasets accumulate another million rows of data every day! This might not be the same magnitude of data that employees at Google or Amazon work with every day, but it is certainly large for us!

Open Data and Research Ethics

The last week of October marked the 10th annual celebration of Open Access Week, an event designed to “help inspire wider participation in helping to make Open Access a new norm in scholarship and research.” The Open Access and Open Data movements are becoming increasingly visible, both within research communities and among the general public.

Data Variables

This short entry describes the dataset we are working with. Each time a user interacts with the Worldreader application, the app collects information about that interaction. The following data may be collected for each interaction, although not every interaction will include data for every variable.