User Preferences: Part 2, Queries

Posted August 15, 2018
By Jason Young

Our last post examined user genre preferences within Worldreader’s Lifelong Reading Program. As we described in that post, those analyses provided valuable insights into the current Worldreader material that users are accessing, but a second source of information can provide even finer-grained details on what users want to be accessing. This comes in the form of user queries, which will be the focus of this post. The Worldreader application allows users to search for books using their own query terms. The search box prompts them to search by author or title, although it allows for open-ended input. This is then used to query the title, author, and description fields of the books in Worldreader’s collection. The query terms used by Worldreader users are then captured within the log data. We are able to extract this information by pulling out all rows in which the controller variable is ‘Search’ and the action variable is ‘Results’. For each row, we will then find a value in the query field, which corresponds to the text that the user inputs during their query. We have been able to create some very basic word clouds to start exploring what types of queries are popular amongst Worldreader users.
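The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: the `controller`, `action`, and `query` field names come from the description above, while the `client_id` field, the dictionary-per-row layout, and the sample rows are assumptions invented for the example. Skipping rows with an empty query is also an assumption.

```python
def extract_search_rows(rows):
    """Keep only log rows that record search results and carry query text."""
    return [
        row for row in rows
        if row.get("controller") == "Search"
        and row.get("action") == "Results"
        and (row.get("query") or "").strip()  # assumption: ignore empty queries
    ]

# Toy rows for illustration only; the real log schema may differ.
sample_log = [
    {"controller": "Search", "action": "Results", "query": "Things Fall Apart", "client_id": "c1"},
    {"controller": "Book", "action": "Read", "query": "", "client_id": "c1"},
    {"controller": "Search", "action": "Results", "query": "bible", "client_id": "c2"},
    {"controller": "Search", "action": "Index", "query": "", "client_id": "c3"},
]

searches = extract_search_rows(sample_log)
print([row["query"] for row in searches])  # ['Things Fall Apart', 'bible']
```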

Before digging into the types of queries being performed, it’s important to understand who is performing these queries. Is the function used often enough to merit consideration? And, if so, does it reveal more information about any particular group of users? Figure 7 shows that the search function is being used across the top 10 countries, with a raw count of searches varying from over 1.8 million in Nigeria down to 88,344 in India. These searches are being made by a fairly large number of users. Figure 8 shows the number of unique IDs (either user ID or client ID) associated with at least one query across the top ten countries. This time the numbers vary from 440,100 unique IDs in South Africa to 30,110 unique IDs in Zimbabwe. These results need to be qualified slightly, since a client ID may not be equivalent to a single user – one unregistered user could potentially be using multiple devices, which would inflate these numbers. Nevertheless, the number of queries, and IDs associated with queries, seems high enough to justify further analysis.

Raw Count of Searches by Country
Figure 7.


IDs with at least one query
Figure 8.


These raw counts give us some idea of the overall quantity of queries, but do not shed much light on the relative popularity of queries amongst different populations. It is not surprising, for example, that Nigeria has a huge number of queries, given that there are many more Worldreader users in Nigeria. It is therefore useful to also look at the number of queries performed within each country, relative to the user base of that country. We can also look at differences between the rate of querying by male and female users, to get a better idea of which demographics are using the capability. Figure 9 shows the number of unique IDs within a country that have made a query, divided by the total number of unique IDs within that country. Generally, it seems that about 5% of unique IDs across these countries have made at least one query. The query function appears to be most widely used in India (6.3%) and South Africa (6.03%), and least used in Ethiopia (2.18%), Cote d’Ivoire (2.42%) and Uganda (2.5%). Figure 9 also shows that querying is much more widespread amongst registered users, as compared to unregistered users – a much higher percentage of female and male IDs are associated with queries, as compared to the broader population of users (see note 1). The percentage of male and female users performing queries is relatively equal across many of the countries. Males perform queries at a higher rate in Cote d’Ivoire, India, and Zimbabwe; females perform queries at a higher rate in Ethiopia, Ghana, Kenya, Nigeria, Uganda, and Zambia; and the rate is exactly equal in South Africa. In most of these countries, 15–20% of male and female registered users have performed at least 1 query.
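The normalization behind Figure 9 (unique IDs with at least one query, divided by all unique IDs in the country) can be sketched as follows. The tuple layout, the function name, and the toy events are assumptions made for illustration; they are not taken from the Worldreader logs.

```python
from collections import defaultdict

def query_share_by_country(events):
    """events: iterable of (country, id, is_search) tuples.

    Returns, per country, the fraction of unique IDs that made at least one query.
    """
    all_ids = defaultdict(set)
    query_ids = defaultdict(set)
    for country, uid, is_search in events:
        all_ids[country].add(uid)
        if is_search:
            query_ids[country].add(uid)
    return {c: len(query_ids[c]) / len(ids) for c, ids in all_ids.items()}

# Toy events for illustration only (hypothetical countries "A" and "B").
toy_events = [
    ("A", "u1", True), ("A", "u1", True), ("A", "u2", False),
    ("A", "u3", False), ("A", "u4", False),
    ("B", "u5", True), ("B", "u6", False),
]

shares = query_share_by_country(toy_events)
print(shares)  # {'A': 0.25, 'B': 0.5}
```

Counting IDs in sets rather than rows means that a user who searches many times is still counted once, which is what separates this measure from the raw counts in Figure 7.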

Number of users performing queries by gender
Figure 9.


Now that we know who is making at least 1 query, it’s also interesting to see how many queries, on average, each of these populations makes. This is indicated in Figure 10, which shows the mean number of queries performed by users who have made at least 1 query. The most interesting result here is that female users tend to make many more queries than their male counterparts – in some countries, they are using the function, on average, twice as much! This supports some of our earlier findings, that female users tend to engage with the Worldreader application more than male users. Overall rates vary from 2.8 queries per ID in Uganda, up to 4.9 queries per ID in Nigeria. Female rates vary from 4.3 queries per user in India up to 10.7 queries per user in Nigeria, and male rates from 3.9 in Zambia to 5.3 in Nigeria. Taken together, these results demonstrate that queries are quite popular amongst registered users, that both males and females engage with the search function, that females tend to perform more queries once they’ve engaged with the function, and that the popularity of searching varies by country. Given these conclusions, it does seem as though an analysis of queries could give good insight into user preferences, particularly for registered users. The high percentage of registered users involved in searching is particularly exciting, since it also allows us to examine gender differences in search terms.
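The measure in Figure 10 is a conditional mean: total queries divided by the number of IDs that made at least one query, computed per group. A rough sketch, assuming each search row has already been reduced to a hypothetical `(group, id)` pair, where the group could be a gender or a country:

```python
from collections import Counter, defaultdict

def mean_queries_per_searcher(search_rows):
    """search_rows: iterable of (group, id) pairs, one pair per query made.

    Returns, per group, total queries divided by the number of distinct
    IDs that made at least one query.
    """
    per_group = defaultdict(Counter)
    for group, uid in search_rows:
        per_group[group][uid] += 1
    return {
        group: sum(counts.values()) / len(counts)
        for group, counts in per_group.items()
    }

# Toy rows for illustration only.
toy_rows = [
    ("female", "u1"), ("female", "u1"), ("female", "u1"), ("female", "u2"),
    ("male", "u3"), ("male", "u4"), ("male", "u4"),
]

means = mean_queries_per_searcher(toy_rows)
print(means)  # {'female': 2.0, 'male': 1.5}
```

IDs that never searched do not appear in the input at all, so they cannot pull the mean down.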

Mean Queries Per User by Gender
Figure 10.


So, what is it that users are searching for? Figure 11 is a word cloud of the most popular search terms across the top ten countries. It was created by making all queries lowercase (so that capitalization didn’t affect the aggregation of terms), aggregating the number of unique IDs associated with each query term, filtering for terms that are associated with at least 1,000 unique IDs, and then visualizing the terms so that larger font and darker text correspond with larger numbers of unique IDs. I would note, at the outset, that this method could be dramatically improved upon. For example, this method does not account for language differences, meaning that an English search term will not be aggregated with its exact translation in, for example, French. Given that English is spoken in most, but not all, of the top ten countries, it is likely that non-English searches are underrepresented in the results. Additionally, we would get much more robust and interpretable results if we designed and implemented a more rigorous content analysis methodology. Currently, very similar search terms (such as ‘romance’ and ‘romantic’) are counted separately from one another. It would be very useful to develop a framework whereby these similar terms could be aggregated together. This could form a future project for Worldreader, to better leverage these data. For now, though, we’ll take a look at the basic word cloud, to see what it offers us in terms of preliminary exploration.
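The aggregation that feeds the word cloud (lowercase, count unique IDs per term, apply a threshold) can be sketched as below. The `(id, query)` pair layout, the function name, and the toy data are assumptions for illustration. Trimming surrounding whitespace is an extra assumption not mentioned above, and the rendering step itself is left out.

```python
from collections import defaultdict

def term_id_counts(id_query_pairs, min_ids=1000):
    """Count the distinct IDs behind each lowercased query term.

    Terms backed by fewer than min_ids distinct IDs are dropped. The result
    is ordered from the most to the least widely searched term.
    """
    ids_per_term = defaultdict(set)
    for uid, query in id_query_pairs:
        term = query.strip().lower()  # assumption: also trim whitespace
        if term:
            ids_per_term[term].add(uid)
    counts = {term: len(ids) for term, ids in ids_per_term.items()}
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {term: n for term, n in ranked if n >= min_ids}

# Toy pairs for illustration only; a low threshold keeps the example small.
toy_pairs = [
    ("u1", "Bible"), ("u2", "bible"), ("u1", "BIBLE"),
    ("u3", "Animal Farm"),
]

popular = term_id_counts(toy_pairs, min_ids=2)
print(popular)  # {'bible': 2}
```

Because each term maps to a set of IDs, the repeated search by `u1` counts once. As noted above, near-duplicates such as ‘romance’ and ‘romantic’ would still land in separate buckets under this approach.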

All Queries Word Cloud
Figure 11.


Across these ten countries, the following terms are associated with the largest numbers of unique IDs:

  • Sex: 27,843
  • Bible: 14,623
  • Love: 12,884
  • Things Fall Apart: 10,295
  • Animal Farm: 9,195
  • Romeo and Juliet: 8,977
  • Nothing but the Truth: 8,186
  • Biology: 7,047
  • Morgan Rice: 6,876
  • Fifty Shades of Grey: 6,006

It is not surprising that four of these queries – sex, love, Romeo and Juliet, and Fifty Shades of Grey – can be directly associated with the love category of books within the Worldreader application. The vast popularity of sex as a search term indicates that the popularity of the love category is directly related to an interest in sexual topics. However, the ranking of the term love at number three certainly indicates that the popularity of love as a category is more multi-faceted than just an interest in sex. The appearance of the Bible as a search term is consistent with the heavily Christian demographics of many of these countries, and Nothing but the Truth may also be religious in nature. Nothing but the Truth: Precious memories is a book authored by Pastor Deborah C. Dallas. However, more research is necessary to determine whether this is actually what these users are searching for – they could just as easily be looking for the 2008 movie Nothing but the Truth. Other search terms make it seem as though some users do search for videos within the Worldreader app, despite it being an e-reading platform. Many of the other top search terms appear to be popular books and authors, including Things Fall Apart (an account of colonial Nigeria by Chinua Achebe), Animal Farm, and Morgan Rice (a self-published young adult/science fiction/fantasy author). Finally, it does appear that a large number of users are accessing the application for educational/reference purposes, given the high number of searches for the term biology.

These results can easily be divided by gender, to explore whether there are any clear gender differences in reading interest. Figure 12 shows popular male search terms within the top ten countries (filtered so that only terms associated with at least 100 unique user IDs appear), and Figure 13 shows popular female search terms. Popular search terms for male users are quite similar to those of the overall population. Sex (664 users) is easily the most popular search term, with love (390) coming in third. Things Fall Apart (395) and its author, Chinua Achebe (334), fall very close to the top, and the terms Bible, biology, Morgan Rice, Animal Farm, and Romeo and Juliet all remain as top ten terms. The only new term that emerges is The River and the Source, which is interesting given the content of this book. Written by Kenyan author Margaret Ogola, this award-winning book has been described as a feminist work that explores the role of women in Kenyan society (Kamau 2017). It is interesting that a book that follows three generations of women would show up as popular amongst male users, but not in the female results. In contrast, the female word cloud shows a very clear shift to romance novels. Romance author Heather Graham (565 users) overtakes both sex (329) and love (317) as the most popular search term, and all of the authors that show up are both women and, with the exception of Morgan Rice, known for their work in romance. These include Nora Roberts, Danielle Steel, Maya Blake, Maisey Yates, Sharon Kendrick, Dani Collins, and Lucy Monroe. Harlequin and Fifty Shades of Grey round out the romance focus. The other new terms that appear within the female results include Twilight, Harry Potter, and vampire, which may indicate some leanings toward fantasy and young adult novels. Even at this basic level, these results provide valuable information about what types of books are popular amongst Worldreader users across these geographies.
They may also present insights into differences between male and female preferences, which could be used to design specific reading programs targeted at different gender groups.

Male Queries Wordcloud
Figure 12.


Female Queries Wordcloud
Figure 13.


Beyond these applied results, though, content analysis of user queries could also lead to broader sociological insights. For example, the high interest in the love category – and issues of sex and romance – may provide an opportunity to better understand the impact of romance novels within the Global South. Romance has long been ignored or unfairly criticized by the literary world, with many critics dismissing the genre as meaningless fluff (Faircloth 2017). Some scholars, however, have pushed back against these stereotypes by pointing out that romance not only makes up one of the world’s largest fiction markets, but also one of the most diverse and innovative (Charles 2017; Tapper 2014). Romance novels are one of the first places that some youth learn about issues of love and sex, particularly if they lack access to sexual education resources, lack family support, or live within a sexually repressive culture (Charles 2017; Tapper 2014). These books can provide very practical information, from insights into building healthy family relationships to how to engage in sex in a safe and healthy manner (Iqbal 2014). [Footnote: Of course, romance is a very large genre, and not all romance novels are equally beneficial for readers. Iqbal (2014) finds that they can also perpetuate sex-related problems, from normalizing heterosexual relationships to ignoring the importance of contraception.] They may be even more important for individuals whose gender or sexual orientation is marginalized or stigmatized within a particular society. As Faircloth (2017) points out, one of the largest conversations in romance right now is that of inclusion, such that the genre represents a broader range of identities and romantic experiences. This is even the case in countries that are generally repressive around issues of sex. In Nigeria, for example, authors are increasingly writing books that explore taboo issues like female sexuality, sexual violence, and even polygamy (Alter 2017). 
These books may help readers to explore socially stigmatized issues, from interracial relationships to non-heterosexual sex, in ways that are not otherwise available to them. They may also be a particular source of empowerment for women, since many (although, certainly not all!) romance novels give female characters agency, are written by female authors, and positively describe female sexuality (English 2017; Iqbal 2014).

The questions of importance to us, then, center around why Worldreader users seek out the love category. What do they want to get out of these love books? What information are they really seeking when they search for sex-related content? Are there gendered differences related to these searches? Fully answering these questions would require a focused research project involving rigorous content analysis and interviews of Worldreader users. For the time being, though, we’ll have to settle for some initial evidence that these are interesting questions to ask within the context of Worldreader data. We filtered our dataset so that it only included queries that contained the text ‘sex’ in some way. [Footnote: This was only performed with the English word ‘sex’ – another constraint to the research.] Surveying the list, it was immediately clear that users had diverse interests related to sex, from the purely pornographic (e.g., sexy porn; sex picture; sex movie) to interests in instruction (e.g., 14 things you should know about sex; how to perform sex; sexual positions; sex for the first time), relationships (e.g., sex and marriage; sex and romance), health (e.g., sexual health; are you ready for sex; 6 things do after sex for a healthy vagina), and gendered violence (e.g., sexual abuse; female victim of sexual violence; sex and violence in the media; sexual and domestic violence). Many of these topics could fuel their own research study. Beyond analyzing these terms generally, it is also possible to filter them by country and by gender. For example, Figure 14 shows the most popular search terms that contain the text ‘sex’ for male users in the top ten countries, and Figure 15 shows the most popular search terms for female users. In each of these word clouds we filtered out the query for ‘sex’ alone, since it far eclipsed all other queries. We also required that the query was performed by at least 2 distinct user IDs.
While we haven’t had much time to analyze these results, two initial thoughts jump to mind. First, even across ten countries, the number of registered users making these specific queries about sex is relatively small – only about 30 users, maximum, for either gender. Developing a more robust content analysis method may help to increase these numbers, since it would allow us to conceptually aggregate different types of queries (e.g., queries related to sexual health, queries related to sexual violence, etc.). However, it may also be the case that users do not often query for sex-related concepts when they are logged into their account. This could constrain the amount of gender-based analysis we can perform on sex-related queries. Second, it is interesting to see that the top search terms are quite similar between male and female users. Both sets of users appear to have the highest interest in sex stories/books, sexual intercourse, love and sex, and sex positions. Some of the smaller queries might point to interesting differences in focus – for example, sex worker and safe sex in the female results – but the number of queries is just too small to reach much of a conclusion at this point.
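The filtering behind Figures 14 and 15 (keep queries containing ‘sex’, drop the bare query ‘sex’, require at least 2 distinct user IDs) could look roughly like this. The function name, the `(id, query)` pair layout, and the toy data are assumptions for illustration; splitting the input by gender beforehand is left to the caller.

```python
from collections import defaultdict

def substring_terms(id_query_pairs, needle="sex", min_ids=2):
    """Count distinct IDs per lowercased query that contains `needle`.

    The bare query equal to `needle` is excluded, and queries made by
    fewer than min_ids distinct IDs are dropped.
    """
    ids_per_term = defaultdict(set)
    for uid, query in id_query_pairs:
        term = query.strip().lower()
        if needle in term and term != needle:
            ids_per_term[term].add(uid)
    return {
        term: len(ids)
        for term, ids in ids_per_term.items()
        if len(ids) >= min_ids
    }

# Toy pairs for illustration only.
toy_queries = [
    ("u1", "sex"), ("u2", "Sex Education"), ("u3", "sex education"),
    ("u4", "safe sex"), ("u5", "biology"),
]

filtered = substring_terms(toy_queries)
print(filtered)  # {'sex education': 2}
```

A plain substring match is deliberately crude: it also keeps terms such as ‘sexual health’, which matches the description above of queries containing the text ‘sex’ in some way.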

Male Queries of Sex Wordcloud
Figure 14.


Female Queries of Sex Wordcloud
Figure 15.


In summary, the past two posts have examined the reading preferences of Worldreader users, both in terms of the category of books being read and the searches users are performing within the application. This provides important insights into the types of books, and even the specific titles and authors, that users are interested in, and these insights can be broken down to focus on specific geographies or user groups. Already, this preliminary analysis could be used to drive Worldreader’s acquisition of new books, or to encourage inclusion of popular content (such as romance novels) within literacy programs in these countries. We also found that these data have a lot of potential for broader sociological research. However, it would be ideal if the data were improved, either through the re-categorization of books using a new schema or through the use of a more robust content analysis methodology. Beyond this, Worldreader may be able to perform further analysis on the query data, to see how effective the query function is at directing users to books that they want to read. At any rate, these data are quite rich, and this work points to a lot of other valuable research that Worldreader might consider in the future!

References

Alter, A. 2017. A Wave of New Fiction from Nigeria, as Young Writers Experiment with New Genres. The New York Times. 23 Nov. https://nyti.ms/2hYRFzJ

Charles, R. 2017. Stop dissing romance novels already. The Washington Post. 7 Aug. https://www.washingtonpost.com/entertainment/books/stop-dissing-romance-novels-already/2017/08/07/960e8bda-7abe-11e7-83c7-5bd5460f0d7e_story.html?utm_term=.c7238947aba3

English, Jessica. 2017. Reading the Romance: Through the Eyes of a Millennial Feminist. EWU Masters Thesis Collection. 456. Masters Thesis. Dept. of Communication. Eastern Washington University. http://dc.ewu.edu/theses/456

Faircloth, Kelly. 2017. Here’s How Not to Critique Romance Novels. Jezebel. 6 Oct. https://jezebel.com/heres-how-not-to-critique-romance-novels-1819188174

Iqbal, Kundan. 2014. The impact of romance novels on women’s sexual and reproductive health. Journal of Family Planning and Reproductive Health Care. 40: 300-02.

Kamau, Lucy. 2017. Place of women in society and other themes in ‘The River and the Source’. Daily Nation. Last acc. 15 Aug 2018. https://www.nation.co.ke/lifestyle/weekend/themes-in-The-River-and-the-Source/1220-4026226-i47hen/index.html

Tapper, Olivia. 2014. Romance and Innovation in Twenty-First Century Publishing. Publishing Research Quarterly. 30: 249-59.

  1. Remember that users who have recorded their gender are necessarily registered users.