Humans As Baseline Performance
The common approach to AI development is to compare AI performance to human performance. The question is, why?
At a recent OSU AI Summit, a panelist posed an interesting question:
“Why are humans used as the standard for AI performance when humans are imperfect?”
They followed it with another statement:
“AI is more efficient and reliable, so why are we trying to achieve imperfect human performance?”
There’s a lot to unpack in those ideas (and a number of misconceptions about how AI actually functions within systems; revisit Bainbridge’s Ironies of Automation when you have a chance). But I want to focus on the first question: why do we compare AI performance to humans at all?
One of the most influential misconceptions about implementing AI is the myth that we can simply swap the human in the system for an AI agent (now you should revisit The seven deadly myths of autonomous systems). If you believe in this substitution myth, it becomes easy to think of humans as the baseline condition, the control group. And, generously, I'd like to think that was the point the panelist was trying to make: if we take this approach, we set the target performance lower than we otherwise could. If the sole goal is improvement, any statistically significant improvement suddenly looks like a big win.
But humans have never been the entire picture. The baseline, what performance is actually dictated by, is humans doing work with designed technologies. This is true even in the most technologically limited situations: a human jotting down a math equation or a drug interaction on a piece of paper to remember later. It also applies to larger systems, like a human driving a vehicle. People created a system, and the human is just one part of it.
Real understanding of performance is not comparing humans to AI; it's comparing the joint cognitive system (JCS) with current tools versus the JCS with this new sparkly AI tool.
Right now, these studies tend to reduce the original JCS to just the human, and then we get into distracting conversations about human imperfection. When we ignore that we designed the larger systems people operate in, the human becomes the only thing left to scrutinize. But we can change the larger systems, too.
And it is true: humans are imperfect. But as a rebuttal to the panelist's follow-up comment, systems have continued to function because humans adapt. They improvise and recover when things go wrong. What's often ignored in the conversation is that AI is not immune to imperfection. It is built by imperfect humans, trained on imperfect data, and deployed into an imperfect world. The difference is that humans have decades of evidence demonstrating their ability to adapt in complex situations to keep systems moving. AI has not proven it can do the same (and likely will not; see the law of requisite variety in Patterns of Joint Cognitive Systems). Recent challenges with autonomous vehicles navigating unexpected disruptions are a good reminder of that.
The panelist's comments reveal an important lesson about how AI testing is often structured today.
We frequently evaluate:
humans performing a task alone, vs
the AI performing that same task alone
But introducing AI into a system changes the system itself. Roles shift. Information flows change. Decision-making dynamics evolve.
The real question ISN’T “Is AI better than humans?” The real question is (and should always have been): How does the human–technology system perform together? Adding AI into the JCS is no different than adding in any technology. It should improve the performance of the JCS.
Decades of human factors research show that technology supporting humans typically outperforms either humans or automation operating alone. Yet many evaluation approaches still treat them as separate competitors rather than components of a joint system.
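To make the contrast concrete, here is a minimal sketch of what a joint-system evaluation looks like next to the usual human-vs-AI comparison. All condition names and scores are hypothetical, chosen only to illustrate the evaluation design: the unit of analysis is the condition's task outcome, and the joint human-plus-AI configuration is scored as its own condition rather than inferred from the isolated ones.

```python
def evaluate(results):
    """Average task-level outcome score for one evaluation condition."""
    return sum(results) / len(results)

# Hypothetical outcome scores per trial (e.g., fraction of cases resolved
# correctly). The joint condition is measured directly, not derived from
# the human-alone and AI-alone numbers.
conditions = {
    "human_alone": [0.78, 0.81, 0.74],
    "ai_alone": [0.83, 0.60, 0.88],          # brittle on off-nominal trials
    "human_plus_ai_jcs": [0.90, 0.87, 0.91], # joint system, redesigned roles
}

scores = {name: evaluate(results) for name, results in conditions.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

The point of the third row is the design choice, not the numbers: once AI is added, roles and information flows change, so the joint condition has to be tested as its own system rather than reconstructed from the two isolated baselines.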
If we want to understand the real impact of AI, our mindset needs to change: toward how we integrate AI into the joint system, and toward how people and AI work together in the environments where the technology will actually be used. Ultimately this will change how we build AI, and it will prompt a more comprehensive evaluation and testing process that emphasizes joint performance and overarching goals rather than isolated efficiency metrics.
Because in the end, AI doesn’t replace the system. It becomes part of it.
