From human-like interactions to voice customization and accessibility, learn how to create smarter, more user-centered chatbots.
The rise of AI has transformed how we think about product design and development. Platforms like GPT and Gemini have made it possible to create chatbots with unprecedented sophistication, bringing cutting-edge technology closer to everyday applications. But this isn’t just about tools or capabilities — it’s about a shift in how we approach design itself.
For designers, the introduction of AI marks the beginning of a new chapter that requires us to rethink traditional processes and embrace entirely new methods. Building AI-powered products is far from a plug-and-play process; it demands careful attention to user experience, deeper insights into user behavior, and a commitment to crafting solutions beyond functionality. With AI, we have an extraordinary opportunity to connect with users more personally, creating tailored experiences that address their unique needs, preferences, and limitations.
Over the past year, I’ve been deeply immersed in designing an AI-driven chatbot, gathering valuable insights and experience along the way. In this article, I’ll share some thoughts on how to make chatbot experiences feel more real, natural, and user-friendly — qualities that people genuinely seek in conversational AI.
Designing the look of your chatbot
There are a few schools of thought when it comes to visualizing chatbots. Faceless chatbots, like those of GPT, Gemini, or Google Assistant, are often represented by simple illustrations or icons — especially in text mode, where their small avatar size requires a clear, recognizable design. In voice mode, these chatbots sometimes adopt abstract compositions, such as the visual styles seen with GPT, Gemini, or the recently refreshed Siri. This approach is common for AI models designed to be integrated into a variety of specific products. (For the record, I’m a fan of Siri’s new look!)
ChatGPT & Gemini in voice chat mode
As we delve deeper into building more specialized products, the avatar strategy tends to shift. In these cases, it’s not uncommon to see chatbots represented by character avatars. While some might find this approach too literal, it can be highly effective, particularly in contexts like customer service. However, this strategy comes with a potential pitfall: if the avatar appears very human-like but doesn’t fully reach the level of realism needed to feel truly human, it risks crossing into the “uncanny valley.” This is that strange moment when the avatar feels almost human but not quite enough, creating an awkward or unsettling experience for users. I’ll delve further into this issue in future posts.
Praktika.ai: Automated 1–1 tutorship powered by gen-AI avatars
Choosing the right design
If you’re unsure which approach to take, consider allowing users to customize the look of the chatbot in the settings. Provide a few different options, including abstract and literal representations, and let users choose their preferences. This approach not only personalizes the experience but also provides valuable insights — by analyzing the resulting data, you can identify trends and make more informed design decisions.
Tailoring the voice: tone and style
With the advancement of products like ElevenLabs, we now have powerful tools to fine-tune the tone and style of a chatbot’s voice responses. Designers can decide whether they want the chatbot to respond in a neutral, generic tone, adopt a softer, whispering style, or even adapt its tone and intonation dynamically based on specific contexts.
ElevenLabs.io: AI agent; test mode
Why is this level of customization so crucial? For two reasons. First, in real life, the way we speak is rarely flat or uniform. Humans are emotional beings, and context almost always shapes our communication. For example, the tone we use when apologizing is very different from the tone we use when celebrating. To make the experience feel more authentic (and potentially increase user engagement, though there’s a caveat that I’ll elaborate on at the end of this entry), it’s vital to align the speaking style of the chatbot with the weight of the words and the context of the conversation.
Second, good communication is about more than just the words themselves. The often-cited 55/38/7 formula, drawn from Albert Mehrabian’s research, attributes only 7% of a message’s impact to words, 38% to vocal tone, and 55% to nonverbal cues. Those figures come from narrow experiments on how people judge feelings and attitudes when words and tone conflict, so they shouldn’t be read as a general law of communication, but the underlying point holds: how something is said carries much of its meaning. This makes it essential for chatbots to respond in a manner that feels human and emotional. That doesn’t just mean matching the tone to the context; it also requires the chatbot to interpret the user’s input on a deeper, more emotional level to ensure a truly natural interaction.
The role of accents
Another important aspect of a chatbot’s speaking style is its accent. For users outside English-speaking countries, there’s often a perception of a “standard” British accent, sometimes — though less and less often — associated with Received Pronunciation (RP). However, within the UK, there are nearly 40 distinct regional accents, each with its unique character and identity, showcasing the true diversity of English speech.
One of the most surprising and entertaining updates to ChatGPT’s voice mode has been its ability to adopt accents. But it doesn’t stop at simply choosing an accent for your assistant, which is already a common feature. You can now ask the assistant to speak in a mixed accent, such as that of a Polish person who has lived in Ireland for years. GPT handles this surprisingly well, combining strong Eastern European pronunciation with the unique rhythm and intonation typical of Irish English, resulting in an authentic and highly entertaining interaction.
ChatGPT: voice chat; choose a voice section
Now, imagine you’re designing a customer service chatbot for different regions of the UK. Instead of offering a one-size-fits-all voice, your chatbot could adopt the local accent of each region, creating a more relatable and tailored experience for users. For example, a chatbot in Newcastle could use a Geordie accent, while one in Birmingham might adopt the Brummie style. This level of customization would not only enhance user engagement but also add cultural familiarity, making the interaction feel more personal and genuine.
Currently, none of the available models offer a wide range of regional accents (which is unfortunate), but GPT does include a limited selection of English accents. With ongoing experiments in this area, the future of regional accent customization looks promising.
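To make the regional idea a little more concrete, here is a minimal sketch of how a product might route users to a local voice with a neutral fallback. Everything in it is an assumption for illustration: the `VoiceProfile` type, the `pick_voice` function, and the voice identifiers are hypothetical placeholders, not real voices from any provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceProfile:
    voice_id: str  # hypothetical identifier; replace with your TTS provider's IDs
    accent: str


# Hypothetical catalogue: these IDs are placeholders, not real provider voices.
DEFAULT_VOICE = VoiceProfile("voice-neutral-en-gb", "Neutral British English")
REGIONAL_VOICES = {
    "newcastle": VoiceProfile("voice-geordie", "Geordie"),
    "birmingham": VoiceProfile("voice-brummie", "Brummie"),
}


def pick_voice(region: Optional[str]) -> VoiceProfile:
    """Return the regional voice if one is configured, otherwise the default."""
    if not region:
        return DEFAULT_VOICE
    return REGIONAL_VOICES.get(region.strip().lower(), DEFAULT_VOICE)


print(pick_voice("Newcastle").accent)
print(pick_voice("Leeds").accent)
```

The fallback matters as much as the mapping: a region without a configured voice should quietly get the neutral option rather than a mismatched accent.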
Text reveal: balancing message length and user experience
When it comes to chatbot message length, platforms like GPT and Gemini generally aim to balance conciseness and depth. By default, these models prioritize concise responses while ensuring they fully address the user’s query. For instance, simple questions typically result in answers averaging around 20–50 words.
However, not all chatbots need to follow this formula. For example, a storytelling chatbot, whose goal extends beyond providing information, might need longer and more engaging narratives to entertain users.
ChatGPT: text chat mode
Why does this matter?
Aligning the message style with the product’s purpose and the conversation’s context is essential. At the same time, overly lengthy paragraphs can feel overwhelming, especially if the UI isn’t designed to handle them effectively. Thoughtful text-reveal strategies and interactions play a vital role in ensuring a smooth user experience that aligns both UI and UX.
Looking at popular AI models like GPT, Claude, Gemini, and Grok, we can observe notable differences in how information is revealed to users:
- GPT and Claude present the text in a typewriter-like fashion, where the words appear as if they’re being typed out in real time. While this adds an element of dynamism, it can feel stressful for users who are more sensitive to visual stimulation or time pressure.
Claude: text chat mode
- Gemini takes a different approach by displaying a shimmering preloader while the response is being generated, which can feel more anticipatory and less jarring.
Gemini: text chat mode
- Grok and Pi.ai (from Inflection AI) stand out with a more subtle and polished reveal. Their text appears smoothly and pleasingly, making the experience particularly comfortable, especially when the generated content is lengthy.
Pi.ai: text chat mode
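One way to experiment with a calmer reveal than character-by-character typing is to release the text a few words at a time and let the UI animate each step. The sketch below is only an illustration of that idea, not how any of the products above work; `reveal_in_chunks` and its `words_per_chunk` parameter are hypothetical names.

```python
from typing import Iterator


def reveal_in_chunks(text: str, words_per_chunk: int = 4) -> Iterator[str]:
    """Yield the text revealed so far, growing by a few words per step.

    Whitespace is normalized to single spaces, so this suits plain
    paragraphs rather than pre-formatted content.
    """
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")
    words = text.split()
    # Step past the end once so the final, possibly shorter, chunk is included.
    for end in range(words_per_chunk, len(words) + words_per_chunk, words_per_chunk):
        yield " ".join(words[:end])


for step in reveal_in_chunks("one two three four five", words_per_chunk=2):
    print(step)
```

A larger `words_per_chunk` produces fewer, bigger visual updates, which is one lever for reducing the sense of time pressure on long responses.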
Managing cognitive load
Another critical aspect of chatbot design is managing cognitive load by reducing visual clutter and maintaining focus. Platforms like Pi.ai, for instance, shift older responses out of view as new ones are generated. This approach keeps the interface clean and allows users to focus on the most relevant and recent information without being overwhelmed by chat history clutter.
Adjusting the pace of the responses
One of the lesser-explored patterns in voice chatbots is providing settings to adjust the pace of the responses. While similar tools are commonly used by screen reader users, they remain a novelty in the context of voice chatbots.
Now, imagine two simple sliders: one controlling the overall response rate (how fast the chatbot speaks), and another adjusting the pauses between sentences or paragraphs.
This solution is both simple and incredibly powerful, yet it’s an area that hasn’t been fully explored in AI chatbots. (Let me know in the comments if you’ve come across a chatbot that offers something similar!)
VoiceOver settings: speaking rate slider
This kind of customization could be particularly helpful for:
- Users with hearing difficulties, who need slower and clearer responses.
- Non-native speakers, who often benefit from slower speech and longer pauses for comprehension.
- Users with cognitive challenges, for whom more deliberate pacing aids understanding.
- High-stress situations, where slower and calmer responses help reduce anxiety (e.g., mental health or crisis support chatbots).
Integrating this feature would not only improve accessibility but also create a more personalized and user-friendly experience. It’s a small addition with the potential for a big impact.
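If your text-to-speech engine accepts SSML, the two sliders could map onto standard markup: `<prosody rate>` for the overall speaking rate and `<break time>` for the pause between sentences. The sketch below assumes exactly that; the function name, the slider ranges, and the naive sentence splitting are my own assumptions, and the rate values an engine actually accepts vary by provider.

```python
import re
from xml.sax.saxutils import escape


def build_paced_ssml(text: str, rate_percent: int = 100, pause_ms: int = 300) -> str:
    """Wrap text in SSML using a speaking-rate slider and a pause slider.

    rate_percent: 100 means the voice's default speed (assumed range 50-200).
    pause_ms: silence inserted between sentences (assumed range 0-3000).
    """
    if not 50 <= rate_percent <= 200:
        raise ValueError("rate_percent must be between 50 and 200")
    if not 0 <= pause_ms <= 3000:
        raise ValueError("pause_ms must be between 0 and 3000")
    # Naive split after ., ! or ? followed by whitespace; enough for a sketch.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so user-facing text cannot break the markup.
    body = pause.join(escape(sentence) for sentence in sentences)
    return f'<speak><prosody rate="{rate_percent}%">{body}</prosody></speak>'


print(build_paced_ssml("Hello there. How are you?", rate_percent=80, pause_ms=500))
```

Keeping the two controls separate is deliberate: some users want normal-speed speech with longer gaps to process each sentence, which a single speed slider cannot express.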
Other conversation dynamics vs. UI patterns
When it comes to human-chatbot interaction, there are currently three primary UI patterns:
- Voice-to-Voice Mode: This is the most natural and hands-free option, where users don’t need to physically interact with the device to communicate.
- Hold-to-Talk Mode: The user presses and holds a microphone button to speak to the chatbot.
- Record Mode: A familiar pattern found in most messaging apps, where the user records a message and sends it to the chatbot (or to a person) for processing.
1: Voice-to-voice; 2: Hold-to-Talk; 3: Record
From a communication standpoint, hands-free voice-to-voice interaction feels the most natural. However, it presents significant UX challenges, even with advanced models like ChatGPT. One notable issue is that chatbots still struggle to accurately detect when a user has finished speaking.
Enhancing voice interactions
In the latest version of GPT’s voice chatbot, there are still occasional scenarios where the assistant might step in prematurely if a user pauses mid-sentence to gather their thoughts. While this can interrupt the flow of the conversation, GPT offers some features that significantly improve the experience:
- Interruptibility: Users can interrupt the assistant mid-response. It immediately stops speaking and resumes listening, allowing the user to continue seamlessly.
- Adjustable Listening Time: Users can request the assistant to allow more time for their responses. This feature helps ensure that pauses for thinking don’t lead to interruptions, resulting in a smoother conversational flow.
These features make the latest GPT version one of the most advanced voice chat assistants available, demonstrating noticeable progress in addressing common challenges in voice-to-voice interactions.
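To see why mid-sentence pauses are so tricky, it helps to look at the simplest possible end-of-turn rule: wait for a fixed stretch of silence after speech. The sketch below is a toy illustration under that assumption and not a description of how GPT works; `find_end_of_turn`, the frame size, and the timeout values are all hypothetical, and real systems rely on much richer signals.

```python
from typing import Iterable, Optional


def find_end_of_turn(
    frames: Iterable[bool],
    frame_ms: int = 20,
    silence_timeout_ms: int = 800,
) -> Optional[int]:
    """Return the index of the frame where the turn is judged finished, or None.

    frames: per-frame voice-activity flags (True means speech was detected).
    The turn ends once silence after speech lasts at least silence_timeout_ms.
    """
    if frame_ms < 1 or silence_timeout_ms < 1:
        raise ValueError("frame_ms and silence_timeout_ms must be positive")
    frames_needed = -(-silence_timeout_ms // frame_ms)  # ceiling division
    heard_speech = False
    silent_run = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:  # silence before the first word is ignored
            silent_run += 1
            if silent_run >= frames_needed:
                return index
    return None


# A user speaks, pauses for one second to think, then continues (100 ms frames).
turn = [True] * 5 + [False] * 10 + [True] * 5 + [False] * 50
print(find_end_of_turn(turn, frame_ms=100, silence_timeout_ms=800))
print(find_end_of_turn(turn, frame_ms=100, silence_timeout_ms=1500))
```

With the shorter timeout the rule fires during the thinking pause, which is exactly the premature interruption described above; the longer timeout waits for the real end of the turn, at the cost of a slower reply. An adjustable listening time is effectively a user-facing control over this trade-off.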
Reliable voice input methods
If you’re designing a chatbot interface, especially for voice interactions, it’s important to acknowledge these challenges. At the current stage of technology, the most reliable input methods remain:
- Hold-to-Talk Buttons: A simple and familiar method that minimizes errors in detecting when the user is finished speaking.
- Record Mode: A practical and widely accepted solution for asynchronous voice input.
While the hands-free voice-to-voice experience is improving rapidly, it’s not yet flawless. For now, designing with more controlled interaction patterns like hold-to-talk or record mode will provide a safer, more consistent user experience. Eventually, as technology advances, voice-to-voice interaction will likely become seamless — but we’re not there quite yet.
Summary
None of the points above should be taken as definitive advice for the design process. Since we are still in the early stages of the robotics era (and chatbots are, in essence, a form of robotics), we cannot fully predict how users will adapt to them. Some chatbots may excel with a more natural, human-like tone, while others might perform better with a rigid, robotic approach.
As we navigate this new chapter in UX/UI design, it’s clear there is no universal formula or one-size-fits-all solution. The key to creating a high-performing chatbot lies in following an iterative process: designing, testing, learning, and repeating. Only through this cycle can we refine and adapt to meet the evolving needs and preferences of users.
References I recommend reading:
- AI: First New UI Paradigm in 60 Years by Jakob Nielsen
- What is chatbot design? by IBM.com
- The Art of Building Customer-Facing AI Chatbots by Phaneendra Kumar Namala
- The best links to get started with Conversational UI and chatbots by Caio Braga
- The Power of Voice: How Sound Shapes Our Emotions and Interactions by MillianSpeaks | The Psychology of Sound
- Designing for AI: beyond the chatbot by Ridhima Gupta
- The chatbot that mimics your accent — and uses street slang by Mark Sellman for The Sunday Times
- Cognitive Load and UI Design: Simplifying Interfaces for Enhanced User Experience by Jakub Wojciechowski
- Digital Accessibility: Understanding Screen Reader Interaction by Customer Experience Prudential
- Iterative Design: How to Optimize the Product Design Process by Vladimir Pavlov
- Web Accessibility Tips: Give People Enough Time by Bureau of Internet Accessibility
Beyond the bot: redefining chatbot design in the age of AI was originally published in UX Collective on Medium.