In the article “Creating a Data Analysis Plan: What to Consider When Choosing Statistics for a Study,” Scot Simpson states that “the first step in a data analysis plan is to describe the data collected in the study” (2015, p. 312). In a previous blog post, Lucas outlined the variables in our dataset. With those variables in mind, I will present some descriptive statistics for the overall dataset and for two smaller segments of it.
This research is the closest that TASCHA has come to a big data project. Until now, we've mostly been able to share data via email or Google Drive. That is much more challenging with a dataset that contains over 500 million rows and takes up 300 GB when uncompressed. And this is only the static dataset on which we are focusing our efforts: Worldreader's live datasets accumulate another million rows of data every day! This might not be the magnitude of data that employees at Google or Amazon work with every day, but it is certainly large for us!
The last week of October marked the 10th annual celebration of Open Access Week, an event designed to “help inspire wider participation in helping to make Open Access a new norm in scholarship and research.” The Open Access and Open Data movements are becoming increasingly visible, both within research communities and among the general public.