University Post
University of Copenhagen
Independent of management


Sexy women and brave men: Gender stereotypes in literature are affecting artificial intelligence

Research — In literature, men are courageous, rational, and commit acts of violence. Women are typically sexy, pretty and fertile. The gender norms in the books influence how the machines of today learn human language.

In collaboration with an international research group, computer scientist at the University of Copenhagen Isabelle Augenstein has studied language use in fiction and academic literature from 1900 to 2008. Using machine learning, researchers have identified the adjectives and verbs associated with nouns in 3.5 million books.

We do not just present the research results in itself, but also a general method on how to trace gender-biased language in larger collections of text

Isabelle Augenstein, assistant professor at the Department of Computer Science

The group’s research shows that there are significant differences in the literature on how female nouns (e.g. daughter, mother, princess, queen) and male nouns (father, son, brother, husband) are described with adjectives.

The female gendered nouns are described with adjectives associated with their appearance, while men are described with words associated with their behaviour. While literature’s women are beautiful or sexy, men are primarily righteous and courageous.

In sociolinguistics, researchers have for a long time been interested in the extent to which our language is gender specific. However, the computer scientists’ latest research results have been achieved with new quantitative methods.

»We do not just present the research results by themselves, but also as a general method to trace gender-biased language in larger collections of text. A method that can be used on other data sets,« Isabella Augenstein writes in an email to the University Post.

Via machine learning, the computer scientists have used a huge data set, and they have been able to determine the extent of gendered language and whether the adjectives and verbs are positive or negative.

Affects today’s society

Although the researchers have not used literature written after 2008, the gendered language of older texts is relevant for today: The algorithm used in programmes and machines that recognise people’s voices largely use texts available on the internet – including literary works and articles from many years ago. This goes for when smartphones recognise voices, or when Google prompts keywords.

»A commonly used data set is, for example, the New York Times’ data set, which contains articles from the period 1987-2007,« writes Isabelle Augenstein.

The machines build up their ‘truth’ based on older texts that make use of gender stereotypes.

Read more

The research article on the project, which was recently presented at the conference ACL 2019, can be found here.


»In addition, I would argue that people still engage with older literature«. Older, written, language influences our society in the present: »Think of the classic children’s books like the Brothers Grimm’s fairy tales or recent film adaptations of historical dramas,« writes Isabelle Augenstein.

To follow up on the results, the research group which Isabelle Augenstein is a part of has just started a new study. It will be able to say more about how gender stereotypes have unfolded historically. They do this by taking the newly presented research results and dividing them into decades.

»Based on what other researchers have found, we expect to be able to see that there are still gender stereotypes in modern books, even though they may be couched in a slightly different language,« she writes.