The ethics of an interface which simulates 100% confidence on a product which is anything less than 100% accurate.
[Image: Scene from Christine (1983)]
In this week’s OpenAI “Spring Update” video they showed their new product called GPT-4o, pronounced “For? Oh…”
The presentation included a series of smartphone demos in which GPT-4o was spoken to and shown things via the camera. There were vague mentions of the model being more capable, but no detail about what that actually meant. The presentation was about the new and improved interfaces between person and product: voice and vision input, and voice output.
For the past couple of years we’ve been very focused on improving the intelligence of these models, and they’ve gotten pretty good. But this is the first time that we are really making a huge step forward when it comes to the ease of use.
– Mira Murati, OpenAI CTO — OpenAI Spring Update, May 13, 2024
In one demo, an OpenAI research lead, Barret Zoph, shows GPT-4o a simple equation written on paper with a marker: 3x + 1 = 4. The product correctly identifies the equation and walks him through the steps to solve it.
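The working itself is trivial (this recap is mine, not a transcript of the demo):

```latex
\begin{align*}
3x + 1 &= 4 \\
3x &= 3 \quad \text{(subtract 1 from both sides)} \\
x &= 1 \quad \text{(divide both sides by 3)}
\end{align*}
```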
One of the key improvements of this model, which they repeat often throughout the presentation, is how much faster it is to respond. They tell us how they’ve worked to move from a staccato, command-and-response style of interaction to a seamless conversational style. And this demo certainly shows how well they have succeeded.
What this demo also shows is that an immediate response — without hesitation — is an attribute of confidence. When someone responds immediately they are telling you they are sure of what they are saying, or at least, they want you to think they are sure.
Consider the real-world scenario of the algebra demo. Barret asks GPT-4o for help with a very simple equation. Simple enough for everyone in the audience to work out the answer in their heads in advance. This is by design so we’re free to marvel as the product walks him through the problem to reach the correct answer.
The implied use-case here is a child doing their homework without access to a person who can help them. If the child did have access to someone who could help them, GPT-4o would be redundant.
A separate demo video featuring Sal Khan of Khan Academy and his son inadvertently highlights this redundancy well. Sal supervises as GPT-4o guides his son through some trigonometry, nodding in approval at each step the product takes in solving the problem.
The voice synthesis is strong, confident, emotive, and most of all, fast in its responses. A human-like voice explaining — without hesitation. It’s impressive, but it’s important to remember that the usefulness would only exist if Sal weren’t there to help, supervise, or correct any mistakes the bot might make.
In both of these videos the audience and the demonstrators are acting from a position where they have the ability to solve the problem themselves. The implied usefulness relies on trusting that the model will be truthful and accurate when the knowledgeable participants aren’t present.
I can’t help but wonder what Sal Khan would do if he reviewed his son’s homework, done alone with the GPT-4o tutor, and found that any of it was incorrect.
The elephant-sized problem in the generative AI room is the unpredictable veracity of the responses. As a problem, it should be considered more important than ease-of-use and the creation of an interface that leans full-tilt into people’s tendency to anthropomorphise the product.
It’s not just more important, it’s essential.
What OpenAI have presented with GPT-4o is a fresh paint job on a car with a dangerously corroded chassis. A false confidence machine.
Improvements to the voice synthesis, response time, and the ability to “butt in” without throwing the bot off are nothing but very impressive cosmetics — user experience improvements. There is no new evidence that the product is more trustworthy, only that it is better at simulating trustworthiness.
The more useful something is, the more complexity people will endure to use it. GPT-4o is a reaction to the inverse. The less useful something is, the less complexity people will endure to use it.
This is one of the fundamental motivations for corporate UX endeavours. A fear of potential, or paying, customers being caught in friction long enough to realise they don’t really need the thing.
Another fundamental motivator is a need to hide how the thing works. To know how something works is to know how it doesn’t work as well, or as universally, as it may seem.
A need to remove complexity that is obscuring a purpose may be a motivator as well. But that’s only valid if the party behind the product knows what the purpose is, can clearly explain it, and can demonstrate it. If that’s not the case, we’re in the realm of potential over purpose. I’ve written about this in my article, Complicated Sticks.
It is unethical to slap an interface which convincingly simulates 100% confidence onto a product which is anything less than 100% accurate, let alone one that its own CTO, Mira Murati, calls “pretty good”.
No exceptions; no “it will get better”. If the house doesn’t have a roof, don’t paint the walls.
This does not mean that reduction or removal of complexity is inherently deceitful. But it does mean that the complexity which tells a person not just how, but why, something works the way it does can be an important factor in their decision to use it.
Nothing makes this more evident than the crypto/web3 community’s obsession with “mass adoption”, which they generally frame as a UX problem. They know that the complexity of crypto is intimidating to non-technical people (crimes and scams aside), so they relentlessly try to remove as much of that complexity as possible.
The unfortunate thing about removing complexity is that you never actually remove it; you move it somewhere else. That somewhere else is always what crypto people like to call a “trusted third party”: the very thing Bitcoin was created to eliminate.
Commerce on the Internet has come to rely almost exclusively on financial institutions serving as trusted third parties to process electronic payments. While the system works well enough for most transactions, it still suffers from the inherent weaknesses of the trust based model.
– Satoshi Nakamoto — Bitcoin white paper, October 31, 2008
Knowing how crypto works is key to it being useful. Trusting that crypto works has created, and will continue to create, fraud, crime, and financial hardship.
Coinbase and Binance are successful because the burden of complexity is on them. Every customer is trusting them as a third party. If cryptocurrencies were used according to the sacred word of Satoshi Nakamoto, they would be more like stashing cash in a safe or under the mattress than a high-tech, frictionless, secure system of value transfer. Every efficiency a cryptocurrency product creates in the interface between person and blockchain is a denial of the core value proposition of cryptocurrencies.
What this translates to is a lack of usefulness, or at least a lack of evidence that it is useful enough to overcome the technical barriers that make it hard to use.
Comparisons to the “early internet” fail at this very point, because the early internet and the early web both flourished despite agonising — and expensive — connection methods and complicated software designed and built by software engineers. People were still clawing at their keyboards to get online.
Generative AI products being used for medical diagnosis, self-driving cars, or tutoring a child in mathematics suffer from the same burden on the spectrum of knowing and trusting. If the power of AI is to mitigate or completely remove human error, then either we are 100% certain of its reliability or we are led to believe that it is 100% reliable. The former is impossible; the latter is a design challenge.
That design challenge is also known as marketing. Because GPT-4o and the like are not technologies, they are products that are being marketed.
Knowing what these things can’t do helps us understand the problems that will arise when they are used anyway. The goal of the massively funded startups behind these products is to market awareness of those problems away.
This article was originally published at https://fasterandworse.com/known-purpose-and-trusted-potential/
You can follow me on Twitter at https://twitter.com/fasterandworse and Mastodon at https://hci.social/@fasterandworse