Posted November 28, 2017
By Jason Young
On September 27-28, the research teams from Worldreader and TASCHA met to kick off the Mobile Reading Data Exchange. During these two days of meetings we began exploring the dataset, generated some of the research questions that will guide the project, and developed our plan for analysis. This post discusses the research questions that we generated throughout the meetings.
We cast a very wide net, as you can see by the long list below. This is largely due to the nature of big data research: because of the size and changing nature of big datasets, it is often quite difficult to know what types of questions the data will support until some initial research has been completed. This is why many big data research projects begin with an exploratory data analysis stage. This stage, which uses highly inductive and descriptive methods to get a better sense of the data, is important for identifying potential explanatory variables and generating research questions and hypotheses. As a result, there will be questions in our list that we will be able to answer, some that we will be able to partially answer with serious qualifications, and others that we won’t be able to answer at all. It is also likely that we will come across answers to questions that we didn’t ask in the first place!
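To make this stage a little more concrete, here is a minimal sketch (in Python, using pandas) of the kind of descriptive first pass we mean. The schema it assumes - a session log with columns like user_id, session_start, pages_read, and minutes_spent - is purely illustrative, not the actual structure of the Worldreader backend data:

```python
import pandas as pd

# Hypothetical schema: one row per reading session. The column
# names are illustrative, not the actual Worldreader backend fields.
sessions = pd.read_csv("sessions.csv", parse_dates=["session_start"])

# Simple descriptive statistics for the numeric fields.
print(sessions[["pages_read", "minutes_spent"]].describe())

# How many sessions and distinct users does the dataset cover?
print("sessions:", len(sessions))
print("users:", sessions["user_id"].nunique())

# Distributions like sessions-per-user help surface candidate
# explanatory variables and possible user groupings.
print(sessions.groupby("user_id").size().describe())
```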
Nevertheless, this list will help guide some of our initial explorations of the data, and also offers a good starting point for understanding the intellectual goals of our project. We came up with several different clusters of questions, with topical areas including user demographics; device use; user engagement; linguistic, temporal, and geographic patterns; reading speed; education and content; and research methods. Below, we describe the overall research goals related to each cluster, and then list some of the questions that we came up with during brainstorming. We will post updates as our exploratory analysis leads to shifts and revisions in these initial questions!
These questions all explore who is engaging with Worldreader, and how those users are interacting with the site. We are particularly interested in whether there are different profiles of users - for example, are there ways to identify power users, short-term users, long-term users, etc.? From Worldreader’s perspective, this type of information might be used to direct improved services to particular types of users - for example, to try to find ways to increase the retention of short-term users. From TASCHA’s perspective, we would love to be able to generalize answers to these questions to get insight, more broadly, into the identities and behaviors of digital readers in the Global South. While these are some of the most interesting questions that this project will ask, they are also some of the most difficult to answer. This is because we do not have a lot of demographic information - only registered users are able to contribute any demographic information at all, and not all of them choose to do so.¹
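To illustrate what such profiles might eventually look like, here is a rough sketch that sorts users into buckets based on per-user activity summaries. The field names and cutoffs are placeholders of our own invention; any real segmentation would have to emerge from the exploratory analysis itself:

```python
import pandas as pd

# Hypothetical per-user activity summary; the numbers, field names,
# and thresholds below are all placeholders for illustration.
users = pd.DataFrame({
    "user_id":        [101, 102, 103],
    "active_days":    [240, 4, 45],
    "total_sessions": [900, 5, 60],
})

def profile(row):
    # Crude rule-of-thumb cutoffs; the real segmentation would be
    # derived from the exploratory analysis itself.
    if row["active_days"] >= 180 and row["total_sessions"] >= 500:
        return "power user"
    if row["active_days"] >= 30:
        return "long-term user"
    return "short-term user"

users["profile"] = users.apply(profile, axis=1)
print(users)
```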
In answering these questions about demographics, the following variables were flagged as being of interest:
These questions focus on the types of devices that Worldreader users rely on to access the application.
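As a sketch of the kind of classification this involves, suppose the backend records a user-agent string for each session (an assumption on our part). Device families could then be roughly bucketed along these lines:

```python
import re

def device_family(user_agent: str) -> str:
    # Very rough bucketing from a user-agent string. The patterns are
    # illustrative; real work would use a maintained parser such as
    # the ua-parser library.
    ua = user_agent.lower()
    if "opera mini" in ua:
        return "proxy browser / feature phone"
    if re.search(r"android|iphone|ipad", ua):
        return "smartphone/tablet"
    if re.search(r"nokia|series40|midp", ua):
        return "feature phone"
    return "other/unknown"

print(device_family("Opera/9.80 (J2ME/MIDP; Opera Mini/7.1.32052/29.3417)"))
```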
This is another high-priority area of research, since it explores user engagement with the Worldreader app. This is particularly valuable to Worldreader, since they are always looking for ways to increase user engagement, in terms of both quantity and quality. Some of these questions are more theoretical in nature, such as questions about how ‘engagement’ should best be measured given the data we have. Other questions are much more practical: what drives user engagement, what behaviors predict long- or short-term engagement by an individual user, etc.
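To make the measurement question concrete, here is one candidate operationalization, sketched under the assumption that we have per-session timestamps and page counts. Both the metrics and the field names are assumptions, not settled definitions:

```python
import pandas as pd

# Hypothetical session log; columns are illustrative.
sessions = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "session_start": pd.to_datetime(
        ["2017-09-01", "2017-09-03", "2017-09-20", "2017-09-05"]),
    "pages_read": [12, 30, 8, 4],
})

grouped = sessions.groupby("user_id")

# One candidate 'quantity' measure: sessions per active week.
weeks = grouped["session_start"].agg(
    lambda s: max((s.max() - s.min()).days / 7.0, 1.0))
sessions_per_week = grouped.size() / weeks

# One candidate 'quality' measure: average pages read per session.
pages_per_session = grouped["pages_read"].mean()

print(pd.DataFrame({
    "sessions_per_week": sessions_per_week,
    "pages_per_session": pages_per_session,
}))
```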
These questions all focus on reading patterns across different language groups, different times of the day or year, and different countries.
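As a sketch of how such patterns might be pulled out of the data, assuming each session record carries a timestamp, a country, and a book language (hypothetical field names again):

```python
import pandas as pd

# Hypothetical session log with temporal, geographic, and language
# fields; the column names are assumptions on our part.
sessions = pd.read_csv("sessions.csv", parse_dates=["session_start"])

# Temporal pattern: reading activity by hour of the day.
by_hour = sessions.groupby(sessions["session_start"].dt.hour).size()

# Geographic pattern: activity by country of the reader.
by_country = sessions.groupby("country").size().sort_values(ascending=False)

# Linguistic pattern: activity by the language of the book.
by_language = sessions.groupby("book_language").size()

print(by_hour, by_country.head(10), by_language, sep="\n\n")
```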
Questions about reading speed can be quite useful, for example, in exploring whether digital reading is correlated with reading fluency over the long run. If we can track the long-term speed of a reader, then we can also analyze whether that speed increases over time. Unfortunately, it is also quite difficult to operationalize reading speed as a research variable. Given current data constraints, it is difficult to determine whether a reader is just very slow on a particular page, or whether they have become distracted from a book. A number of other conditions might also disrupt a correlation between speed and reading fluency. For example, we don’t have data on the reading difficulty of the reading material, which can have an impact on reading speed. In other cases, where users are just learning to read (e.g., kids or people starting to read in a second language), reading speed may have little to no correlation with reading fluency. These issues make this category of questions very interesting, but perhaps not something that we can currently answer through backend data alone.
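One hedged way to approach the operationalization problem is to estimate per-page dwell times from page-turn events and then discard implausibly long gaps as likely distraction rather than slow reading. In the sketch below, both the event schema and the five-minute cutoff are assumptions we made up for illustration:

```python
import pandas as pd

# Hypothetical page-turn log: one row per page view. The schema and
# the five-minute cutoff below are both assumptions for this sketch.
events = pd.read_csv("page_turns.csv", parse_dates=["timestamp"])
events = events.sort_values(["user_id", "book_id", "timestamp"])

# Dwell time on a page = gap until the next page turn in the same
# (user, book) pair; the last page of each book gets no estimate.
events["dwell"] = (
    events.groupby(["user_id", "book_id"])["timestamp"]
          .diff()
          .shift(-1)
)

# Arbitrary cutoff: treat gaps over five minutes as distraction
# rather than slow reading; we cannot truly tell the two apart.
plausible = events[events["dwell"] <= pd.Timedelta(minutes=5)]

# Rough per-user pages-per-minute estimate.
mean_minutes = (
    plausible.groupby("user_id")["dwell"].mean().dt.total_seconds() / 60
)
print(1 / mean_minutes)
```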
Here we ask questions about the content that is (1) accessible from Worldreader and (2) being used by readers.
These are meta-level questions focused on the methods that we use to perform the research itself.
Finally, to answer some of these questions we will have to define a number of tricky measures. These include the following:
¹ Worldreader has chosen to make registration for their app optional in order to reduce the barriers to people reading on the app.