User Preferences: Part 1, Genre

Posted August 8, 2018
By Jason Young

In a previous post, we examined genre progression among registered users from Nigeria. In that analysis, we found that love was one of the most popular categories, and that it tended to retain users relative to other categories. However, the analysis left us with many additional, and important, questions about the genre preferences of Worldreader users.

A better understanding of these preferences can lead to some important insights. For Worldreader, this understanding can help them to build their e-reading collection or even to build a recommendation system for users. From a broader applied perspective, it can help us understand what books might attract individuals to literacy or education programs, so that those programs are more successful. And, finally, from a social scientific perspective, reading preferences might help us understand more general social dynamics, since they give us a view into the types of knowledge and leisure being sought out across a population. With these goals in mind, this post describes some broad trends in the reading preferences of Worldreader users.

Before digging into the analysis, a quick review of the data available to us might be useful.1 Every time a user interacts with a book, Worldreader logs information including the book title, author, publisher, and the category or categories in which the book falls. These categories roughly correspond to the genre of the book, and they range anywhere from love to respiratory infections. Two attributes of the category classification system are notable, since they make our analysis more difficult. First, a single book may be associated with two or more categories. In some cases these categories are related hierarchically: for example, a book may be labeled as both health and malaria, where malaria is a child node of health. In this case, we can simplify analysis by only looking at parent nodes. In other cases, though, a book will be attached to categories with no relationship to one another, with no indication of which category better describes the book.2 This makes analysis difficult.3 Second, Worldreader usually does not classify the books in-house. They instead ask publishers to classify their own content, and step in to classify a book only if the publisher chooses not to do so. This creates a methodological problem in that we cannot be certain that each publisher uses the same framework for classification. As a result, a book classified as love by one publisher could hypothetically fit the criteria for a totally different category under another publisher's framework. It also makes the overall category schema for Worldreader’s collection much messier, since that schema is the union of many different classification systems. Figure 1 shows all of the categories of books in Worldreader’s collection.
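The parent-node simplification described above can be sketched in a few lines of Python. This is only an illustration, not the code used for our analysis: the `PARENT_OF` map and the function names are hypothetical, and only the malaria/health relationship comes from the example in the text.

```python
# Hypothetical child -> parent map. Only "malaria" -> "health" is taken from
# the post; the second entry is an assumed example, and the real schema is
# far larger.
PARENT_OF = {
    "malaria": "health",
    "respiratory infections": "health",  # assumed for illustration
}


def root_category(category, parent_of=PARENT_OF):
    """Walk up the hierarchy until a category with no parent is reached."""
    seen = set()  # guards against accidental cycles in a messy schema
    while category in parent_of and category not in seen:
        seen.add(category)
        category = parent_of[category]
    return category


def parent_categories(categories, parent_of=PARENT_OF):
    """Collapse a book's category labels to its distinct top-level categories."""
    return sorted({root_category(c, parent_of) for c in categories})


# A book tagged with a parent and its child collapses to the parent alone.
hierarchical = parent_categories(["health", "malaria"])

# A book tagged with unrelated categories keeps both, which is the harder case.
unrelated = parent_categories(["love", "science fiction"])
```

Note that collapsing to parent nodes only resolves the hierarchical case. A book tagged with two unrelated categories still ends up with two labels, and nothing in the data says which one fits better.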

Network Graph of Genre Schema Figure 1.

As you can see, it is a very complex system! One of our recommendations to Worldreader was to try to develop, or select, a more streamlined schema system, and then to reclassify their books into it. Fortunately, Worldreader has been working on just that, which should aid future projects. They are currently implementing a new metadata schema in coordination with the development of a new, central database, to be completed in early 2019. It will be based on a mix of ONIX and LRMI. All books will be assigned a single, primary Thema subject code, which will greatly improve the analytical capabilities of their data. This process is rife with challenges, given that Worldreader is working with an international collection of over 30,000 titles that includes many non-Western subjects and different standards. Not only do standards have a long way to go to catch up to these complex collections, but it can be quite expensive to have books professionally catalogued, especially in the 40+ languages within the Worldreader collections. Nevertheless, these efforts should enable even more powerful analytic possibilities for Worldreader in the future, with long-term payoffs in terms of building their collection and directing users to recommended content based on their reading preferences, using a simple collaborative recommender system.

Despite the complexity of the current category system, analysis has still provided some interesting, initial results. Our priority was to simply carry out some descriptive analysis, to determine what types of books are popular within Worldreader. This type of information can help Worldreader as they expand their collection, and can potentially inform content choices for other literacy programs across similar geographies. Worldreader has long known that, globally, their most popular books fall within the love category. This makes sense given global book sales – romance has long dominated the fiction market, with sales of over $1 billion per year in the United States alone (Economist 2016). This amounts to one-third of all fiction sold in the US (Charles 2017). The romance publishing industry has also proven itself to be quite innovative, and was an early adopter of the e-book format (Tapper 2014). It therefore makes sense that love books would be popular in an e-reading format like Worldreader.

However, the data allows us to ask questions beyond what Worldreader already knows about their love category. By looking at data on what registered users are viewing in the top ten countries, we can dive deeper into the reading patterns of specific demographic groups and geographies. We aggregated the number of books within each parent category that were viewed by registered users – these data were aggregated by country and gender. For the purposes of this analysis, a book is recorded whether the user read it in its entirety, or simply viewed the front cover. We did not determine whether a user has successfully completed a book. This is a constraint of this analysis, and future research might focus on refining these methods. For now, though, we believe that this approach still provides interesting results.
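To make the aggregation concrete, here is a minimal sketch using only the Python standard library. The records, field names, and function names are hypothetical and do not reflect Worldreader's actual log format. The sketch also assumes that a book with several unrelated parent categories is counted once in each of them, which is one possible convention rather than a description of our pipeline.

```python
from collections import Counter

# Hypothetical view log: one record per book viewed by a registered user.
# A view counts whether the user finished the book or only saw the cover.
views = [
    {"country": "Nigeria", "gender": "female", "categories": ["love"]},
    {"country": "Nigeria", "gender": "male", "categories": ["learn"]},
    {"country": "Kenya", "gender": "female", "categories": ["love", "young adult"]},
    {"country": "Kenya", "gender": "male", "categories": ["love"]},
]


def count_views(records):
    """Count viewed books per (country, gender, parent category)."""
    counts = Counter()
    for record in records:
        # Assumption: a multi-category book counts once per category.
        for category in set(record["categories"]):
            counts[(record["country"], record["gender"], category)] += 1
    return counts


def shares_by_group(counts, key_fn):
    """Return {group: {category: share}}; shares within a group sum to 1."""
    totals = Counter()
    grouped = {}
    for (country, gender, category), n in counts.items():
        group = key_fn(country, gender)
        totals[group] += n
        grouped.setdefault(group, Counter())[category] += n
    return {
        group: {category: n / totals[group] for category, n in cats.items()}
        for group, cats in grouped.items()
    }


counts = count_views(views)
by_gender = shares_by_group(counts, lambda country, gender: gender)
by_country = shares_by_group(counts, lambda country, gender: country)
```

Passing a different `key_fn`, such as `lambda country, gender: (country, gender)`, would give the combined country-and-gender breakdown from the same counts.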

Across the top ten countries, love makes up 21% of all books viewed by registered users (Figure 2) – even though it only makes up 2% of the titles available on Worldreader (Figure 3). [Footnote: Note that not all titles are available in every country, meaning that this percentage could vary across the top ten countries. We were not able to access data on what books are available in each of the top ten countries. These data would be useful in shedding further light on reader preferences.] Love is followed by the categories of children’s books (14%), young adult (13%), and learn4 (13%). This is initial confirmation that Worldreader should be focused on expanding its romance collection, and an indication that reading programs may do well to choose these types of books to garner public interest.
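One rough way to compare these figures is to divide a category's share of views by its share of titles. The sketch below does this for love, and for learn using the 58% collection share given in footnote 4. The function name is hypothetical, and the comparison is only approximate: the view shares cover the top ten countries, while the title shares describe the collection as a whole.

```python
def representation_ratio(view_share, title_share):
    """Share of views divided by share of titles (both as fractions of 1).

    Values above 1 mean a category is viewed more than its size in the
    collection would suggest; values below 1 mean the opposite.
    """
    if title_share <= 0:
        raise ValueError("title_share must be positive")
    return view_share / title_share


# Figures from the post: love is 21% of views but 2% of titles;
# learn is 13% of views but 58% of titles.
love_ratio = round(representation_ratio(0.21, 0.02), 1)
learn_ratio = round(representation_ratio(0.13, 0.58), 2)
```

By this measure love is viewed roughly ten times more than its share of the collection would predict, while learn is viewed at roughly a fifth of its share.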

Number of Books Viewed by Genre Barchart Figure 2.

Number of Titles by Genre Barchart Figure 3.

By breaking this analysis down by gender and country5, we are able to get a better sense of the different geographies of reading. Figure 4 shows what percentage of viewed books fall within each category, broken down by gender. Female users view more love books than any other category and, as a percentage, view love twice as much as their male counterparts. Love is followed by children’s and then young adult. In contrast, male users view the most in the learning category – 16% of viewed books fall into this category for males, compared to 6% for female users. For males this category is closely followed by children’s, love, and young adult. These breakdowns offer potential lessons, although further analysis is necessary to confirm many of these interpretations of the data. First, if one wants to design content for reading programs that specifically appeals to women then romance novels appear to be an appealing choice. Furthermore, the inclusion of romance into reading programs appears unlikely to negatively impact male participation – male interest in the love category is nearly equal to the other top male interests. Second, women appear to access children’s books more often than men, but men still view them at a fairly high rate. This may reflect the traditional role of women in caregiving, but indicates that men are involved as well. Third, men access books in the learn category far more often than women. This may reflect the higher access that men have to jobs and education, but more exploration of this is necessary to make any strong conclusions.

Percentage of Viewed Books by Category, Broken Down by Gender Barchart Figure 4.

Figure 5 breaks category patterns down by country. The most surprising finding may be just how consistent reading preferences are across these ten countries. With the exception of three countries, love is always the most viewed category of book. In these countries, love is always followed by learn or young adult. The three countries that form exceptions to this rule are Côte d’Ivoire, Ethiopia, and India. India offers the most dramatic difference from the other countries, but one that is easy to explain. Worldreader, in collaboration with Results for Development (R4D) and Pearson’s Project Literacy, implemented the Read to Kids program in India from 2015 to 2017. Through this program, Worldreader launched a Kids app that included 550 children’s books in Hindi and English.6 This program reached over 203,000 households, and explains the high level of interest in children’s books in India in Figure 5. In Côte d’Ivoire there is a slight preference for learn books over children’s books, with health coming in third place. These results may be impacted by the availability of French content within the Worldreader app, given that Côte d’Ivoire is the only one of the top ten countries that doesn’t have English as an official language. Approximately 30% of Worldreader’s French holdings are categorized as learn and about 20% as children’s, so this tracks well with our results. Only about 1% of their French collection is labeled health, so this finding is more surprising. In Ethiopia the users demonstrate a very strong interest in learn books. More research, likely involving survey work, would be necessary to fully understand why Côte d’Ivoire and Ethiopia differ from the others.

Percentage of Total Books Viewed Broken Down by Country Barchart Figure 5.

Figure 6 goes on to break reading preferences down by both gender and country, to give us the most granular look at the categories. In order to make the graph manageable, we restricted our analysis to the top four categories across these geographies: children’s books, learn, love, and young adult. It is interesting to see that there is a little more variation across the countries when breaking things down by gender. For example, women in Côte d’Ivoire exhibit a clear preference for children’s books, rather than the preference for love seen in most of the other countries. Men view more learn books in most of the countries, but have a preference for love books in Kenya and Uganda and a preference for young adult books in Nigeria. Both men and women prefer children’s books in India. This detailed analysis can help Worldreader continue to expand their programs in these countries, from adding to their collection to designing gender-sensitive reading and literacy programs.7

Percentage of Total Books Viewed Broken Down by Country and Gender Barchart Figure 6.

These analyses provide valuable insights into the current Worldreader material that users are accessing. However, this is only half of the story, since it does not tell us what types of material users want to access but cannot currently find within the application. This type of information can be even more valuable for Worldreader as they build their collections. Next week we examine user queries to better understand user demand for new materials.


Charles R. 2017. Books Perspective: Stop dissing romance novels already. The Washington Post. 7 August.

Tapper O. 2014. Romance and Innovation in Twenty-First Century Publishing. Publishing Research Quarterly. 30: 249-59.

The Economist. 2016. Book-publishing’s naughty secret. The Economist. 26 May. Last accessed 8 August 2018.

  1. As with all of the analysis on this blog, the data analyzed within this post came from the Worldreader application, which is aimed at adult and youth readers aged 16+ engaged in Worldreader’s Lifelong Reading Program. This program has different content than Worldreader’s Pre-Reading, School Reading, and Library Reading Programs. 

  2. For example, there is no indication as to whether the book is more love or more science fiction.

  3. For example, it makes analysis of user progression much more difficult. Instead of just looking to see, for example, whether users often transition from young adult books to love books, we need to understand whether the classification of a book as both young adult and science fiction has any impact on the user’s progression through genres. This drastically expands the complexity of the computations. 

  4. Learn has a distinct advantage over the other categories, given that it makes up 58% of Worldreader’s total collection and encompasses a wide range of non-fiction topics. 

  5. See our past post on gender for a thorough discussion of the limitations of analyzing gender within this project. 

  6. More information on the Read to Kids program can be found here

  7. A UNESCO report found that, while fewer women were reading on their mobile app, those women who were reading consumed 6x more content per month than men. As a result of these findings, Worldreader began exploring this gender divide in 2016 through the Anasoma Project in Kenya. The Anasoma Project aims to increase female participation in mobile reading and positively influence gender social norms and stereotypes, by identifying the barriers to and drivers of female mobile readership as well as by testing new empowering and engaging content through the Worldreader mobile app. For more information see the Anasoma Midterm Report.