ChatGPT Tests Into Top 1% for Original Creative Thinking

Recent findings from the University of Montana and partners indicate that artificial intelligence can rival the creative abilities of the top 1% of human participants based on a standard test for creativity.

Led by Dr. Erik Guzik, an assistant clinical professor at UM’s College of Business, the team employed the Torrance Tests of Creative Thinking – a well-known tool used for decades to assess human creativity.

The researchers submitted eight responses generated by ChatGPT, the application powered by the GPT-4 artificial intelligence engine. They also submitted answers from a control group of 24 UM students taking Guzik’s entrepreneurship and personal finance classes. These scores were compared with 2,700 college students nationally who took the TTCT in 2016. All submissions were scored by Scholastic Testing Service, which didn’t know AI was involved.

The results placed ChatGPT in elite company for creativity. The AI application was in the top percentile for fluency – the ability to generate a large volume of ideas – and for originality – the ability to come up with new ideas. The AI slipped a bit – to the 97th percentile – for flexibility, the ability to generate different types and categories of ideas.

“For ChatGPT and GPT-4, we showed for the first time that it performs in the top 1% for originality,” Guzik said. “That was new.”

He was gratified to note that some of his UM students also performed in the top 1%. However, ChatGPT outperformed the vast majority of college students nationally.

Guzik tested the AI and his students during the spring semester. He was assisted in the work by Christian Gilde of UM Western and Christian Byrge of Vilnius University. The researchers presented their work in May at the Southern Oregon University Creativity Conference.

“We were very careful at the conference to not interpret the data very much,” Guzik said. “We just presented the results. But we shared strong evidence that AI seems to be developing creative ability on par with or even exceeding human ability.”

Guzik said he asked ChatGPT what it would indicate if it performed well on the TTCT. The AI gave a strong answer, which they shared at the conference:

“ChatGPT told us we may not fully understand human creativity, which I believe is correct,” he said. “It also suggested we may need more sophisticated assessment tools that can differentiate between human and AI-generated ideas.”

He said the TTCT is protected proprietary material, so ChatGPT couldn’t “cheat” by accessing information about the test on the internet or in a public database.

Guzik has long been interested in creativity. As a seventh grader growing up in the small town of Palmer, Massachusetts, he was in a program for talented-and-gifted students. That experience introduced him to the Future Problem Solving process developed by Ellis Paul Torrance, the pioneering psychologist who also created the TTCT. Guzik said he fell in love with brainstorming at that time and how it taps into human imagination, and he remains active with the Future Problem Solving organization – even meeting his wife at one of its conferences.

Guzik and his team decided to test the creativity of ChatGPT after playing around with it during the past year.

“We had all been exploring with ChatGPT, and we noticed it had been doing some interesting things that we didn’t expect,” he said. “Some of the responses were novel and surprising. That’s when we decided to put it to the test to see how creative it really is.”

Guzik said the TTCT uses prompts that mimic real-life creative tasks. For instance, can you think of new uses for a product, or ways to improve it?

“Let’s say it’s a basketball,” he said. “Think of as many uses of a basketball as you can. You can shoot it in a hoop and use it in a display. If you force yourself to think of new uses, maybe you cut it up and use it as a planter. Or with a brick, you can build things, or it can be used as a paperweight. But maybe you grind it up and reform it into something completely new.”

Guzik expected ChatGPT to be good at generating a large number of ideas (fluency), because that is what generative AI does. And it excelled at responding to the prompt with many ideas that were relevant, useful, and valuable in the eyes of the evaluators.

He was more surprised by how well it did at generating original ideas, a hallmark of human imagination. The test evaluators are given lists of common responses for each prompt – ones that are almost expected to be submitted. Even so, the AI landed in the top percentile for coming up with fresh responses.

“At the conference, we learned of previous research on GPT-3 that was done a year ago,” Guzik said. “At that time, ChatGPT did not score as well as humans on tasks that involved original thinking. Now with the more advanced GPT-4, it’s in the top 1% of all human responses.”

With AI advances speeding up, he expects it to become a key tool for the world of business going forward and a significant new driver of regional and national innovation.
