Researcher Uses Machine Learning to Understand Cognitive Processes of Language 

Key Findings 

  • Computational models can learn from past experiences, including social media posts, to predict future language use.
  • Considering contexts such as content domain improves behavior prediction.
  • Computational models of language can improve services such as cognitive assessment and second-language learning.

Language is a fundamental component of the human cognitive system and enables complex social dynamics. However, much about the cognitive processes underlying language use, including how knowledge and grammar are acquired and how individual experiences shape these processes, remains a mystery. Dr. Brendan Johns is pioneering new ways to study this domain by leveraging machine learning, computational modeling, and big data. The Federation of Associations in Behavioral and Brain Sciences (FABBS) is pleased to name Dr. Johns an Early Career Impact Award winner, following his nomination by the Society for Computation in Psychology (SCiP). Dr. Johns received his Ph.D. in Cognitive Science from Indiana University and is currently an assistant professor in the Department of Psychology at McGill University.

Researchers have long taken a “top-down” approach to studying language: first making assumptions about how the mind works, then seeking evidence that either supports or contradicts them. For example, the famed linguist Noam Chomsky posited that all humans have an innate capacity for language comprehension and structure.

However, Dr. Johns points out that this approach may fail to consider how context and past experiences influence an individual’s cognitive processes. His “bottom-up” approach focuses on how an individual’s embeddedness in various social, linguistic, and semantic environments affects language. In contrast to Chomsky’s view of an innate linguistic capacity, Dr. Johns explores usage-based theories, in which children learn to use language by attending to their environment from a very early age. As they grow older, they continue to learn from others and eventually develop a deeper understanding of language processes such as grammatical structure. Importantly, because children’s experiences differ, so do their cognitive language processes, resulting in variation in grammatical structures, in the meanings of words, and in how various contexts shape their language expectations.

Under this theory, learning from past behavior is key to understanding cognitive processes in language, explaining current behavior, and predicting future behavior. As Dr. Johns puts it, “It is a fundamental question. How much of our behavior is innate, and how much of our behavior is based upon past experience?”

Dr. Johns uses machine-learning algorithms to derive patterns from large bodies of text, such as Wikipedia articles, books, and social media. He then formulates testable hypotheses from these patterns and evaluates them with predictive studies. He is particularly interested in texts that reflect everyday language. “I have all of Reddit downloaded onto a hard drive!” he says.

Dr. Johns’s research focuses on uncovering language patterns to understand the role of context in two areas: lexical semantics, the study of word meaning, and lexical organization, the study of how words are organized in the mind.

In one study, Dr. Johns parsed the posts of the top 350,000 commenters on Reddit, a forum-based social media site. Because Reddit spans a wide range of content areas, Dr. Johns could identify words that are common across multiple domains, such as science, politics, and pop culture. He found that people have stronger expectations for a word to occur if it is common across diverse domains. For example, “occasion” is more strongly expected because it can be used in almost any situation, whereas “molecule” typically occurs only in science-based discussions. Taking context into account, in this case the communicative discourse in which words appear, improved the ability of machine learning models to predict human language use beyond traditional methods such as simply counting word frequencies.
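The contrast between raw frequency and spread across domains can be illustrated with a toy calculation. The Python sketch below is not Dr. Johns’s model: the four “posts,” their domain labels, and the `word_stats` helper are hypothetical, and counting the distinct domains a word appears in is only a crude stand-in for the context-based measures described above. It shows how two words can have identical frequencies yet very different spreads across discussion domains.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (domain, text) pairs standing in for Reddit posts.
posts = [
    ("science", "the molecule binds to the receptor on this occasion"),
    ("science", "each molecule has a molecule of water attached"),
    ("politics", "the vote was a historic occasion"),
    ("pop_culture", "she dressed up for the occasion"),
]


def word_stats(posts):
    """Return each word's raw frequency and the number of distinct domains it occurs in."""
    frequency = Counter()
    domains = defaultdict(set)
    for domain, text in posts:
        # Naive whitespace tokenization; real text would need proper tokenizing.
        for word in text.lower().split():
            frequency[word] += 1
            domains[word].add(domain)
    domain_count = {word: len(ds) for word, ds in domains.items()}
    return frequency, domain_count


frequency, domain_count = word_stats(posts)
for word in ("molecule", "occasion"):
    print(word, frequency[word], domain_count[word])
# → molecule 3 1
# → occasion 3 3
```

Here “molecule” and “occasion” each occur three times, but “molecule” appears in only one domain while “occasion” appears in all three, which is the kind of difference a frequency count alone cannot capture.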

Dr. Johns’s work has several important societal implications. Specifically, he believes computational models that focus on context and past experience can improve clinical care and language education.

In one application, Dr. Johns examined how cognitive impairment is tested in older adults at risk for Alzheimer’s disease. Cognitive impairment is often assessed with verbal fluency tasks, in which participants are asked to “name as many animals as you can in 60 seconds.” The outcome is the raw count of animals listed, with fewer animals indicating greater impairment. Dr. Johns proposed a more nuanced examination of the semantic path that an individual takes through their responses, such as switching between house pets, zoo animals, and farm animals. He found that individuals with cognitive impairment had greater difficulty shifting between categories, and that this switching information predicted cognitive deficits better than raw counts alone. This finding highlights how computational models can improve clinical assessment using data that are already being collected.
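The difference between a raw count and a switching-based score can be sketched in a few lines of Python. This is an illustrative assumption, not the procedure used in Dr. Johns’s study: the `CATEGORY` table and the `fluency_scores` function are hypothetical, and a computational approach would more plausibly estimate semantic relatedness from text corpora than rely on a fixed hand-labeled lookup.

```python
# Hypothetical category labels; a real analysis would need a far larger
# vocabulary, or a corpus-derived similarity measure instead of fixed labels.
CATEGORY = {
    "dog": "pet", "cat": "pet", "hamster": "pet",
    "lion": "zoo", "tiger": "zoo", "giraffe": "zoo",
    "cow": "farm", "pig": "farm", "horse": "farm",
}


def fluency_scores(responses):
    """Return the raw response count and the number of category switches."""
    # Words missing from the table are lumped into a single "unknown" category.
    labels = [CATEGORY.get(word, "unknown") for word in responses]
    # A switch is any consecutive pair of responses with different labels.
    switches = sum(1 for prev, cur in zip(labels, labels[1:]) if cur != prev)
    return {"count": len(responses), "switches": switches}


participant_a = ["dog", "cat", "lion", "tiger", "cow", "pig"]
participant_b = ["dog", "cat", "hamster", "lion", "tiger", "giraffe"]
print(fluency_scores(participant_a))  # → {'count': 6, 'switches': 2}
print(fluency_scores(participant_b))  # → {'count': 6, 'switches': 1}
```

Both hypothetical participants name six animals, so their raw counts are identical, but participant A switches categories twice while participant B switches only once.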

Dr. Johns also hopes to extend his work to second-language acquisition and language rehabilitation. Because individuals differ in their background experiences, such as exposure to regional language differences, and in general cognitive abilities, they likely begin learning a second language from highly variable starting points. Given these individual differences, standardized language education programs may not benefit everyone equally. Dr. Johns therefore hopes to use his machine learning strategies to study individuals’ past language use, such as social media posts or books read, in order to build educational programs tailored to each individual’s experiences.

This goal is representative of Dr. Johns’s overarching research program: using large-scale computational modeling of past experience to understand cognitive processes in language and predict future behavior.

Potential for Future Impact

  • Better understand cognitive risk markers for dementia.
  • Tailor second-language programs to individuals based on past experiences.
  • Improve language development in individuals with cognitive deficits.

Dr. Brendan T. Johns is a recipient of the 2022 Federation of Associations in Behavioral & Brain Sciences (FABBS) Early Career Impact Award and was nominated by the Society for Computation in Psychology (SCiP).

The 52nd annual SCiP Conference took place in Boston on November 17, 2022, in tandem with the Psychonomic Society Meeting.
