Learn how to write scripts that help the avatar speak clearly and sound natural when using the Text to avatar feature.
When using the Text to avatar feature, follow the best practices below when adding scripted dialogue in the "Content" field, which is where you enter the lines the avatar will speak. A well-structured, clearly written script helps the avatar deliver the dialogue accurately and naturally, and results in a more realistic avatar performance.
Use correct spelling and grammar in your scripted dialogue so that the avatar delivers it as intended. Well-written text enables the AI avatar to speak more naturally and fluently, and it reduces the risk of mispronunciations or awkward phrasing in the final generated video.
Examples of Do's
- This is a good example.
- What is an example like?
- This could be because of a delay in the service.
Examples of Don'ts
- This is a gud example.
- Wut is an example like?
- This could be coz of a delay in the service.
Correct punctuation acts as a guide for how the avatar should speak, affecting both the meaning of the words and the way they are delivered. Commas, periods, and other marks help control the rhythm, tone, and emotion of the speech. With careful punctuation, the avatar sounds more natural, expressive, and easy to understand.
Examples of Do's
- Let’s eat, grandma!
- Go to the next step.
- Let’s all begin now!
Examples of Don'ts
- Let’s eat grandma!
- Go to the next step
- We all will start... now.
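If you prepare scripts in bulk, a small pre-check can catch lines that are missing end punctuation before you paste them into the "Content" field. The following is a minimal sketch in Python; the function name `lines_missing_end_punctuation` is hypothetical and is not part of the Text to avatar feature. It only checks the last character of each line, so it cannot detect a missing comma such as the one in "Let's eat grandma!".

```python
def lines_missing_end_punctuation(script):
    """Return the non-empty lines that do not end with '.', '!' or '?'."""
    flagged = []
    for line in script.splitlines():
        # Ignore surrounding whitespace and any closing quotation marks.
        text = line.strip().rstrip("\"'\u201d\u2019")
        if text and text[-1] not in ".!?":
            flagged.append(line.strip())
    return flagged


script = "Go to the next step\nGo to the next step.\nLet's all begin now!"
print(lines_missing_end_punctuation(script))
```

Any line the check returns is a candidate for review; whether it needs a period, a question mark, or an exclamation mark is still a writing decision.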
If the dialogue text doesn't clearly convey a certain emotion, you can enhance it by introducing the line with a character-acting cue and placing the spoken text in quotes. This helps ensure that the avatar in the final generated video expresses the intended emotion and meaning of the text.
Examples of Do's
- He said with joy: "That is amazing!"
- She said with a smile: "Welcome to your new topic."
- She warned sternly: "Please avoid this approach."
Words written in all capital letters are treated as acronyms or initialisms, meaning each letter is pronounced separately. However, the same word in lowercase is read as a regular word. This difference plays a key role in avatar dialogue, as it directly affects how the speech sounds and how accurately the intended message is delivered.
For example, "POC" will be spoken as "Pee-Oh-See", but "poc" will be spoken as "pock".
Examples of Do's
- "AI" will be spoken as "Aye-Eye"
- "CEO" will be spoken as "See-Ee-Oh"
- "USA" will be spoken as "You-Ess-Ay"
Examples of Don'ts
- "ai" might be pronounced as "eye"
- "ceo" could be misread as a single word.
- "usa" might be pronounced as "oo-sa".
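To apply this rule consistently across a long script, you could normalize known acronyms to uppercase before submitting the text. The sketch below assumes you maintain your own list of acronyms; `KNOWN_ACRONYMS` and `uppercase_acronyms` are hypothetical names chosen for illustration, not part of the feature.

```python
import re

# Hypothetical list of acronyms that should be read letter by letter.
KNOWN_ACRONYMS = {"ai", "ceo", "usa", "poc"}


def uppercase_acronyms(script, acronyms=KNOWN_ACRONYMS):
    """Uppercase every whole word that appears in the acronym list."""
    def fix(match):
        word = match.group(0)
        # Only whole words are compared, so "said" is not affected by "ai".
        return word.upper() if word.lower() in acronyms else word

    return re.sub(r"[A-Za-z]+", fix, script)


print(uppercase_acronyms("Our ceo said the ai poc will launch in the usa."))
```

Because the match is made on whole words, this approach is only safe for acronyms that are not also ordinary words in your script.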
When writing abbreviations, especially those that end with the letter "s", insert a dash between each letter to ensure the avatar pronounces each letter clearly and individually. This prevents the abbreviation from being misread or spoken as a single word.
For example, in "PDFs", the final "s" will be spoken as an "s" sound rather than "es". As another example, "POCi" will be spoken as "posi" rather than "Pee-Oh-Si". Add a dash between the letters, for example "P-O-Ci", to get the intended speech.
Examples of Do's
- "A-P-I-S" spoken as "Aye-Pee-Eye-Ess"
- "S-D-K-S" spoken as "Ess-Dee-Kay-Ess"
- "U-I-S" spoken as "You-Eye-Ess"
Examples of Don'ts
- "APIs" spoken as "apees"
- "SDKs" spoken as "esdkes"
- "UIs" spoken as "yous" or "oois"
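The dashed form shown in the Do's above can also be produced automatically. The following is a rough sketch, assuming that plural abbreviations in your script are written as capital letters followed by a lowercase "s" (for example, "APIs"); the function name `dash_plural_abbreviations` is hypothetical.

```python
import re

# An all-caps abbreviation followed by a lowercase plural "s", e.g. "APIs".
PLURAL_ABBREVIATION = re.compile(r"\b([A-Z]{2,})s\b")


def dash_plural_abbreviations(script):
    """Rewrite plural abbreviations such as "APIs" as "A-P-I-S"."""
    def spell_out(match):
        letters = match.group(1) + "S"
        # Joining a string with "-" places a dash between every letter.
        return "-".join(letters)

    return PLURAL_ABBREVIATION.sub(spell_out, script)


print(dash_plural_abbreviations("Our APIs and SDKs power the UIs."))
```

Singular abbreviations such as "API" are left unchanged, and mixed-case forms such as "POCi" are not covered by this pattern and still need to be dashed by hand.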
When writing dialogue for the Text to avatar feature, write numbers as words instead of digits, especially for dates, times, prices, measurements, and amounts. This helps the avatar say them more naturally and ensures the meaning comes across clearly.
Examples of Do's
- nineteen point eight four
- three hundred dollars
- forty-two degrees Celsius
Examples of Don'ts
- 19.84
- $300
- 42°C
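If your source material contains many figures, you might convert them to words with a small helper before writing the script. The sketch below is illustrative only: `integer_to_words` and `number_to_words` are hypothetical names, whole numbers are limited to 0 through 999, and units such as "dollars" or "degrees Celsius" still have to be written out by hand.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def integer_to_words(n):
    """Spell out a whole number from 0 to 999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0 to 999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} {integer_to_words(rest)}"


def number_to_words(text):
    """Spell out a number string; decimal digits are read one by one."""
    whole, _, fraction = text.partition(".")
    words = integer_to_words(int(whole))
    if fraction:
        words += " point " + " ".join(ONES[int(digit)] for digit in fraction)
    return words


print(number_to_words("19.84"))
print(number_to_words("300") + " dollars")
print(number_to_words("42") + " degrees Celsius")
```

The three calls reproduce the Do's listed above from the digits in the Don'ts.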
Keep in mind that AI-generated voices are nondeterministic, meaning they can produce slightly different results each time, even with an identical script and voice. Think of a voice actor giving multiple takes of the same line: each take has subtle differences.