Obtaining a solid measurement of creativity can be hugely time-consuming. Well-established tests — such as the Alternative Uses Task (AUT), which asks participants to generate unusual ways to use common objects — require substantial time and effort to score participant responses properly. Not only that, but assessment of the creativity of responses varies wildly as a result of both the scorers’ judgements and the qualities of answers relative to the rest of the data. For example, one especially creative response amongst a sea of generic responses may garner extra points; place that same answer amongst other highly creative responses, however, and it is likely to score lower.
But take heart, overstretched researchers — a new paper in PNAS suggests there may be an easier, more reliable way to measure creativity.
In an effort to combat these issues, researchers led by Jay A. Olson from Harvard University have attempted to streamline the process by devising a new task which can be easily analysed by a computer algorithm.
Their research suggests that the newly created measure — the Divergent Association Task (DAT) — may be at least as effective at measuring verbal creativity as other, more widely known creativity measures, with the added bonuses of being both shorter and more enjoyable to participants.
The DAT relies on the concept of semantic distance, which is a measure of how related two words are to each other. Traditional approaches to measuring creativity require participants to give long, prose-like answers, and algorithms struggle to analyse semantic distance in these tasks. This is because the exact meaning of words often hinges on nuance, sentence structure, and context. For instance, the authors write, in the Alternative Uses Task someone might say that they use a pen to “record a break” or to “break a record”. The latter is clearly more creative, but a program looking at semantic distance wouldn’t be able to tell the difference.
However, by using single words as opposed to prose, the DAT effectively sidesteps these issues by breaking responses down into components that can be easily analysed by a computer. During the task, participants are asked to produce 10 nouns as unrelated to each other as possible over the span of four minutes, generating a small pool of words for analysis. The semantic distance between the first seven valid words provided is then computed using an algorithm called GloVe, which has been trained on text from billions of web pages.
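The core of that computation can be sketched in a few lines of Python. The tiny three-dimensional vectors below are illustrative stand-ins, not real GloVe embeddings (which are high-dimensional and trained on web-scale text), and the helper names are assumptions for this sketch; the point is simply that a DAT-style score averages the pairwise cosine distances between the words’ vectors, so sets of unrelated words score higher.

```python
from itertools import combinations
from math import sqrt

# Toy "embeddings" for illustration only -- the real DAT uses GloVe
# vectors trained on text from billions of web pages.
EMBEDDINGS = {
    "cat":     [0.9, 0.1, 0.0],
    "dog":     [0.8, 0.2, 0.1],
    "piano":   [0.1, 0.9, 0.2],
    "glacier": [0.0, 0.2, 0.9],
}

def cosine_distance(u, v):
    """1 minus cosine similarity: near 0 for related words, larger for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dat_style_score(words):
    """Average pairwise semantic distance across all word pairs, scaled by 100."""
    distances = [
        cosine_distance(EMBEDDINGS[a], EMBEDDINGS[b])
        for a, b in combinations(words, 2)
    ]
    return 100 * sum(distances) / len(distances)

related = dat_style_score(["cat", "dog", "piano"])
diverse = dat_style_score(["cat", "piano", "glacier"])
print(related < diverse)  # prints True: the more unrelated set scores higher
```

With real embeddings the same logic runs over the participant’s first seven valid nouns, which is what lets a computer replace panels of human scorers.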
The team conducted two studies comparing the DAT to established measures of creativity. The first study, conducted on a total of 476 predominantly female undergraduate students from Australia and Canada, demonstrated that the new measure correlated well with two components of the AUT’s creativity measure — flexibility and originality — though not the third component, fluency. There was also a strong positive correlation between DAT scores and scores on the Bridge-The-Associative-Gap task, in which participants are presented with two words (e.g. giraffe, scarf) and are asked to provide the word that links them (e.g. neck).
In the pre-registered second study, a more generalisable selection of 8,572 participants from 98 countries performed a shortened version of the AUT and the DAT. While the shortened AUT meant fluency and flexibility couldn’t be measured, the AUT and DAT scores did correlate positively on the dimension of originality. Participants also completed four problems probing insight, creativity, and analytical ability; higher scores in these tasks also positively correlated with DAT scores.
Not only does this new approach cut out much of the human effort and variability involved in judging scores, it also provides distinct advantages in certain research contexts. In the case of fMRI studies, for example, providing a handful of one-word answers is preferable to whole sentences, as it reduces the amount of participant movement. Researchers interested in simple creativity tasks for participants to complete under the influence of drugs such as psychedelics may also find the method particularly appealing.
As with most measurements, this new test does have some limitations. The authors highlight that several strategies can be employed to boost scores. For instance, participants might use items in the room to help them come up with new ideas, or intentionally reach for rarely used words which the algorithm may read as more semantically remote. It’s possible that this could lead the algorithm to inadvertently rate participants with higher education levels or more diverse vocabularies as more “creative”. It’s also unclear what influence multilingualism may have on DAT scores, though all of these factors are the subject of ongoing investigations by the team. In future, the analysis algorithm could be trained to process responses from more languages, time periods, and cultures to further improve the method’s precision.
Emma L. Barratt (@E_Barratt) is a staff writer at BPS Research Digest