Can I use LLMs to translate or localize conversational AI experiences?

*How failing to engage a translator will negatively affect your users’ experience, adversely impacting your ROI and CSAT.*

Many companies want to expand their conversational AI experiences into multiple languages, and they want to do it fast. In this early era of LLMs, many of them are looking to LLMs to accelerate the translation process. It is a natural assumption, given the sheer language-generation capabilities of LLMs.

In the last 3 years I’ve advised 50+ brands on designing and optimizing their conversational experiences. Given that I’m multilingual and also have a PhD in linguistics, translation strategy is something I’m frequently asked about. I have seen various approaches to localization and translation.

Keep reading to learn why you shouldn’t rely only on LLMs for translations, and to discover a best practice that lets you still use LLMs without compromising on quality.

What does translating a bot entail?

Translating a bot may seem like a relatively simple task. Look at the image below. At first glance, it seems that all you need to translate are the words you see on screen.

But take a closer look and you’ll see a lot more that needs translation/localization:

1. the bot’s name

The bot’s name “Buddy” is a casual term for “friend”. A good translation will capture the essence of the name in the other language.

2. local/regional expressions

In the example above, the bot greets with a “Howdy” as opposed to a “Hello”, and it uses a specific idiom, “be fixin’ to do something”. You will need to capture the persona and tone expressed by these word choices in the other language.

3. buttons

Those options or buttons you see (“Buy me some boots”, “Get help with my order”, etc.) often have a character limit that depends on the interface you use. For example, Facebook buttons are limited to 20 characters, and Viber buttons to 25. This means you’ll need to translate them within the character limit so they render well. In general, different user interfaces have different constraints. Look for resources online to learn more about those constraints, and test them yourself.
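A check like this can run automatically whenever translated labels come back. The following is a minimal sketch, assuming the two limits quoted above and a set of hypothetical Spanish labels; the function name and the label text are illustrative, and the limits should be verified against each platform’s current documentation.

```python
# Limits as quoted in the article; confirm them against each platform's docs.
BUTTON_LIMITS = {"facebook": 20, "viber": 25}


def find_overlong_labels(labels, platform):
    """Return (label, length) pairs that exceed the platform's button limit.

    len() counts Unicode code points, which may not match how a given
    platform counts characters, so treat the result as a first warning.
    """
    limit = BUTTON_LIMITS[platform]
    return [(label, len(label)) for label in labels if len(label) > limit]


# Hypothetical translations, for illustration only.
spanish_labels = [
    "Cómprame unas botas",
    "Ayuda con mi pedido",
    "Hablar con un agente humano",
]

for label, length in find_overlong_labels(spanish_labels, "facebook"):
    print(f"{label!r} has {length} characters; the limit is {BUTTON_LIMITS['facebook']}")
```

A flagged label is a prompt for the translator to find a shorter phrasing that keeps the bot’s tone, not a cue to truncate the text.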

4. regex patterns and NLU set up

Taking it one step further, what happens when a user types something into the bot? To capture the different things a user might say, you’ll need to add Regex patterns and set up all your NLU (intents, entities/slots, and training phrases) to make sure you understand the user correctly.

Regex is very helpful in cases where the bot is looking for defined patterns such as email addresses, telephone numbers, flight numbers, etc. If a company’s order numbers follow a regular pattern of 2 letters plus 6 digits (AB123456, XY987654, etc.), you can add the rule \b[a-zA-Z]{2}[0-9]{6}\b to capture an order number in a user’s utterance such as “Can you check on RF345678?”. This will help the bot understand the user better. While numbers are universal, how they are written differs in non-Latin scripts and in right-to-left writing systems. Regex can also be really helpful for setting up affirmations or negations: in addition to accepting “yes” in response to a question, you can add “sure”, “yep”, and more affirmation options to make sure the bot understands the user. You will need to set up regex patterns for each language the bot speaks, and a translator can help you do that.
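As a rough illustration of both uses, here is a minimal Python sketch. The order-number rule is the one described above; the per-language affirmation lists and the function names are hypothetical, and a translator should supply the real word lists.

```python
import re

# Order numbers: 2 letters followed by 6 digits, with word boundaries on both sides.
ORDER_NUMBER = re.compile(r"\b[a-zA-Z]{2}[0-9]{6}\b")

# Hypothetical affirmation lists; the real ones should come from a translator.
AFFIRMATIONS = {
    "en": ["yes", "sure", "yep"],
    "es": ["sí", "si", "claro", "vale"],
}


def extract_order_number(utterance):
    """Return the first order number found in the utterance, or None."""
    match = ORDER_NUMBER.search(utterance)
    return match.group(0) if match else None


def is_affirmation(utterance, language):
    """True if the utterance starts with one of the language's affirmations."""
    words = "|".join(re.escape(word) for word in AFFIRMATIONS[language])
    pattern = re.compile(rf"^\s*(?:{words})\b", re.IGNORECASE)
    return pattern.search(utterance) is not None


print(extract_order_number("Can you check on RF345678?"))
print(is_affirmation("Sure, go ahead", "en"))
print(is_affirmation("sí, por favor", "es"))
```

One detail relevant to non-Latin scripts: in Python, `[0-9]` matches only ASCII digits, whereas `\d` in a text pattern also matches decimal digits from other scripts, so the choice between them should be deliberate for each language.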

Setting up a strong NLU with intents, entities/slots, and training phrases helps the bot understand the user correctly. For the intent “track my order” you will need to think of all the ways a user might ask to track an order: “Where’s my sandwich?”, “When will my order arrive?”, “Help track my food”. All of these utterances can be used as training phrases to create the intent. In the utterance “Where’s my sandwich?” specifically, the word “sandwich” could be replaced by “pizza” or “soup”, depending on what the item is. You can create a category called “menu items” and add each of those as an entity so the NLU will capture “Where’s my [menu item]?” instead of only “Where’s my sandwich?”.
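The relationship between intents, training phrases, and entities can be sketched as a small data structure. This is an illustrative sketch only: the `Intent` class and its fields are hypothetical and do not correspond to any particular NLU platform’s format.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Hypothetical container for one intent in one language."""
    name: str
    templates: list  # training phrases; "{menu_item}" marks the entity slot
    entities: dict = field(default_factory=dict)

    def training_phrases(self):
        """Expand each template with every value of the menu_item entity."""
        phrases = []
        for template in self.templates:
            if "{menu_item}" in template:
                for item in self.entities.get("menu_item", []):
                    phrases.append(template.format(menu_item=item))
            else:
                phrases.append(template)
        return phrases


track_order_en = Intent(
    name="track_my_order",
    templates=[
        "Where's my {menu_item}?",
        "When will my order arrive?",
        "Help track my food",
    ],
    entities={"menu_item": ["sandwich", "pizza", "soup"]},
)

for phrase in track_order_en.training_phrases():
    print(phrase)
```

Each target language needs its own templates and entity values rather than a word-for-word swap. In French, for instance, the possessive changes with the noun’s gender (“mon sandwich” but “ma pizza”), so a single template with a slot would not produce correct phrases.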

Beyond using Regex and setting up a strong NLU, you’ll also need to account for standard commands like “back” and “repeat”, and all of this needs to be handled in the language you are translating the bot into.

A quick note on voice bots before we go any further. If you’re working on translating a voice bot, you’ll need to do all of the above, plus make informed decisions about two things: the voice that the bot speaks with, and the speech recognition system that understands what the user is saying.

5. linguistic and cultural differences

It goes without saying that there are numerous linguistic and cultural differences you need to be not just aware of but knowledgeable about to get the translations and localizations just right. Let me share two examples. The first example is illustrated below. In Tamil, there is an inclusive “we” and an exclusive “we”. In English, if I said “We are going to the movies”, the listener would not know from the words I used whether “we” means “you and I” or “someone else and I”. They would need to figure that out from context.

In Tamil, if I said “We-inclusive (நம்ப, namba) are going to the movies”, you would know that I mean “you and I”, and if I said “We-exclusive (நாங்க, naanga) are going to the movies”, it would be clear that I mean “someone else and I”. The second example: in American English, if you want to tell someone that you’re on your way, you might say “I’m coming”. In Spanish you would say “I’m going” (Voy).

And it doesn’t end here. Once the translation is complete, you will need to carry out testing before launch, and after launch the bot needs to be maintained and iterated on at a regular cadence. It is impossible to build and maintain a bot, or to analyze and assess its quality in the target language, without a native or fluent speaker reviewing the conversation transcripts.

Need more reasons not to rely on LLMs or machine translations?

What does the research say?

Several studies have shown that machine translations are not reliable (I’m linking one, two, and three here). While machines may be fast and could be more easily trained to be consistent in their lexical choices, they struggle to handle context and ambiguity. LLMs are also not very accurate in their translations; one study found that accuracy varies greatly by language (Spanish 94%, Tagalog 90%, Korean 82.5%, Chinese 81.7%, Farsi 67.5%, Armenian 55%).

Look at the following examples using machine-only translations, taken from these studies:

(1) English to Armenian

English: You can take over the counter ibuprofen as needed for pain.

Translated: You may take anti-tank missile as much as you need for pain.

(2) English to Chinese

English: Your Coumadin level was too high today. Do not take any more Coumadin until your doctor reviews the results.

Translated: Your soybean level was too high today. Do not take anymore soybean until your doctor reviews the results.

(3) Arabic to English

Arabic: ما هو المعدل المناسب للنجاح

Translated: what is the appropriate rate of success

Example (1) is so bad that the user will probably realize it’s not translated right, but example (2) could pass for a correct translation and might endanger the user’s life, since they wouldn’t stop taking Coumadin. In example (3), the issue is that the English translation is not a natural way of phrasing the question in English, so it wouldn’t be clear to the user what was actually meant.

Look at the following image. This bot, powered by GPT-3, is built in India, a multilingual country with 22 officially recognized languages. When asked in multiple ways to speak Hindi, the bot doesn’t switch to Hindi and even gaslights the user by saying “I’ll continue to respond in English as requested”.

At the end, when it is asked a question in Hindi (“Do you serve food on the flight?”), it responds in English (“you can indulge in some munchies while you soar through the skies with us!”), making it clear that it does understand Hindi but refuses to speak it. There was obviously a business decision made about whether to allow this bot to speak Hindi, and perhaps it doesn’t speak it for legal reasons. Personally, I think it would be more helpful if the bot let the user know where to get help in Hindi.

My point is that if you decide to use an LLM, you will need to tell it how to handle situations like these, in which a user asks the bot to speak another language. There’s no getting out of it: you’ll need a well-thought-out strategy. If you let your LLM speak multiple languages, you’ll need to make sure it speaks those languages right.

Given all the above, don’t compromise on a great customer experience by using just LLMs for translation. Maybe in time there will be more complex prompts or a reenvisioned UI that lets us better translate all these nuances, but even that work needs to be done by a human team and cannot be handed over to AI. There are no shortcuts to a great customer experience.

Best practice: How to get it right

The ideal practice for translating an existing bot into another language is to hire an entire team (conversation designer, bot tuner, devs, testers, conversation analysts) in the target language, that is, the language you are translating the bot into.

And if the ideal practice is beyond reach (let’s be honest, most companies can’t afford the luxury of a full team in the target language), here’s the best workaround. Hire an expert translator; educate them on who the target users are, on conversation design, and on bot tuning. Also involve them in internal testing and user testing, and have them help you with post-launch analysis, iteration, and maintenance.

If you’re still unsure about when to use an LLM, that’s ok. Ask the translator. Even if they have no experience with LLMs, once they try one they will know how to assess the resulting translation. Most likely they will use it to create a first-draft translation and then polish that draft, using knowledge only they possess, into a final version. If you’re building a Gen AI bot, the conversation designer and translator will need to work closely on the prompt design. Together they can determine whether to write the prompt in the target language, or keep it in the original language and add extra details about the target language so that the bot has rules to follow in the target language. In short, with the current technology, you cannot get good-quality translations from an LLM without human involvement.

But if this also seems like too much investment and you cannot do it right, then don’t do it. If you don’t do a thorough job, you aren’t respecting your customers, nor are you being inclusive, which will most likely result in a net negative CX. Instead, revisit the reason you wanted to make the bot available in another language in the first place. Assess: Is there really a need to provide this experience in multiple languages? Will enough people use the experience to make it worth it? Will making the bot available in multiple languages improve the user experience significantly? Do you truly have the resources to make it worthwhile for your company and your customers? Translation should not be treated as a box-checking activity.

Alternative solutions to handling other languages

Don’t have the bandwidth or time to do the above, but have people coming to your bot expecting it to work in another language? An alternative to translating the entire bot is to let customers know where and how they can get help. So if you have many customers who speak only Spanish, it’s ok to add an “español” button to the bot so these customers can select it and connect directly to a Spanish-speaking agent, as shown in the image below.

Alternatively, if you do not have agents on that platform who can help, let the customer know, in the language the customer uses, where they can get the help they need. An example of how an LLM-powered bot could handle this elegantly is illustrated in the image below; all you need to do is make sure you design your prompt to handle multiple languages.
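One way to make that prompt design explicit is to generate the language rules from a small configuration that the conversation designer and translator maintain together. The sketch below is only an assumption about how such a policy might be assembled: the function name, the wording of the rules, and the help channels are hypothetical placeholders, and the resulting text would still need to be tested with native speakers against the actual model.

```python
def build_language_policy(supported, help_channels):
    """Assemble language-handling rules for an LLM system prompt.

    supported:     language code -> language name the bot may reply in
    help_channels: language name -> where speakers of it can get help
    """
    lines = [
        "Reply only in these languages: "
        + ", ".join(sorted(supported.values())) + ".",
        "If the user writes in or asks for another language, never claim "
        "that they requested the current language. Instead tell them, in "
        "their own language, where to get help:",
    ]
    for name, channel in sorted(help_channels.items()):
        lines.append(f"- {name}: {channel}")
    lines.append(
        "If the language is not listed, apologize briefly and offer a human agent."
    )
    return "\n".join(lines)


# Hypothetical configuration, for illustration only.
policy = build_language_policy(
    supported={"en": "English"},
    help_channels={
        "Spanish": "the 'español' button, which connects to a Spanish-speaking agent",
        "Hindi": "the Hindi support page (placeholder)",
    },
)
print(policy)
```

Keeping the rules in data rather than in free-form prompt text makes it easier for a translator to review exactly which languages are promised and where each fallback points.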

If you’re building conversational experiences in multiple languages, go ahead and use LLMs, but don’t skip engaging an expert translator to get it right. Remember, a mediocre customer experience can seriously damage a brand’s reputation. Doing translations right is worth the investment, and it is ethical, inclusive, and humane to respect your customers.

Many thanks to Meredith Schulz and Cathy Pearl for their valuable advice on drafts of this article. I originally presented a version of this article as a talk at the Unparsed 2024 conference.

Can I use LLMs to translate or localize conversational AI experiences? was originally published in UX Collective on Medium, where people are continuing the conversation by highlighting and responding to this story.
