TTS1 Spanish text normalization

The LumenVox TTS1 Text-To-Speech synthesizer works internally by synthesizing words. However, input text documents contain not only words, such as leche and azúcar, but also various other written elements, such as numbers (15), dates (3-4-2003), acronyms (Renfe, Unesco), abbreviations (A.D.) and symbols ($). All such elements must first be converted to actual words, and only then synthesized. This conversion, called text normalization, takes place internally within the synthesizer.

The European and American Spanish TTS1 Text-To-Speech voices correctly normalize and synthesize the majority of Spanish texts. This document describes how LumenVox accomplishes the task of text normalization.

Users may extend LumenVox's text normalization with PLS lexicons, as defined in the W3C Pronunciation Lexicon Specification (PLS) Recommendation.

Please note that this article does not apply to our TTS2 voices.

Text Structure

This section describes how unannotated input text is split into paragraphs, sentences and words.

Paragraph

Paragraphs are separated by empty lines.

Paragraphs may also be marked explicitly with the SSML <p> element.

Sentence

By default, a sentence contains fewer than 1000 characters; longer sentences are broken into multiple smaller sentences.

Sentences may also be marked explicitly with the SSML <s> element.

Word

By default, a word contains fewer than 100 characters; longer words are broken into multiple smaller words.

Words without any vowels will be spelled out.
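The structural limits above can be illustrated with a minimal Python sketch. The function names are hypothetical, the chunking simply cuts at fixed offsets (where LumenVox actually places its breaks is not documented here), and the vowel check is a literal reading of the rule; a real system would presumably exempt short words such as the conjunction y.

```python
import re

MAX_SENTENCE = 1000  # default: a sentence holds fewer than 1000 characters
MAX_WORD = 100       # default: a word holds fewer than 100 characters
VOWELS = set("aeiouáéíóúü")


def split_paragraphs(text: str) -> list:
    """Paragraphs are separated by empty (or whitespace-only) lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def chunk(s: str, limit: int) -> list:
    """Break s into pieces shorter than `limit` characters.

    Hypothetical: cuts at fixed offsets rather than at linguistic boundaries.
    """
    step = limit - 1
    return [s[i:i + step] for i in range(0, len(s), step)] or [s]


def needs_spelling(word: str) -> bool:
    """Literal reading of the rule: a word with no vowels is spelled out."""
    return not any(ch in VOWELS for ch in word.lower())


print(split_paragraphs("Primer párrafo.\n\nSegundo párrafo."))
print(len(chunk("a" * 250, MAX_WORD)))
print(needs_spelling("pwq"))
```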

Supported characters

LumenVox accepts all Unicode characters and handles most characters found in Latin-script texts.

Punctuation

Punctuation plays a key role in how the TTS system interprets text. LumenVox supports the majority of punctuation marks found in Spanish texts. Ultimately, however, every punctuation mark that affects pauses or intonation is mapped to one of the following marks:

Punctuation marks | Pause  | Intonation
,                 | small  | slightly rising
; :               | medium | falling
. !               | long   | falling
¿ ?               | long   | rising or falling

Default normalization rules

This section describes in general how LumenVox normalizes input text, excluding text fragments marked with the SSML say-as element.

This section is not exhaustive. LumenVox normalizes many kinds of text elements, but only the most common ones are described here.

Number

Thousands and fraction separators

By default, European Spanish voices treat a dot as a thousands separator and a comma as a fraction separator, while American Spanish voices treat a comma as a thousands separator and a dot as a fraction separator. As a result, some numbers are read differently by European and American Spanish voices.

Examples

  • 10.000 will be pronounced diez mil (European Spanish) or diez punto cero cero cero (American Spanish).
  • 10,000 will be pronounced diez coma cero cero cero (European Spanish) or diez mil (American Spanish).

However, LumenVox correctly detects thousands and fraction separators in numbers where they are unambiguous.

Examples

  • 10,3 will be pronounced diez coma tres.
  • 10.000,34 will be pronounced diez mil coma treinta y cuatro.
  • 10,000.34 will be pronounced diez mil punto treinta y cuatro.
  • 10.000.000 will be pronounced diez millones.
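The separator rules above can be approximated with a small heuristic, sketched here in Python under stated assumptions: the variant labels "es-ES" and "es-US" are hypothetical stand-ins for European and American Spanish voices, and a single separator followed by exactly three digits is treated as the ambiguous case that falls back to the voice default. The function returns the integer digits and the fraction digits (or None); it is not LumenVox's actual algorithm.

```python
import re
from typing import Optional, Tuple


def split_number(token: str, variant: str) -> Tuple[str, Optional[str]]:
    """Split a number into (integer digits, fraction digits or None).

    variant is a hypothetical label: "es-ES" (European) or "es-US" (American).
    """
    if not re.fullmatch(r"\d+(?:[.,]\d+)*", token):
        raise ValueError(f"not a plain number: {token!r}")
    dots, commas = token.count("."), token.count(",")
    if dots and commas:
        # Both separators present: the last one must be the fraction separator.
        dec = "." if token.rfind(".") > token.rfind(",") else ","
    elif not dots and not commas:
        dec = None
    else:
        sep = "." if dots else ","
        if token.count(sep) > 1:
            dec = None  # a repeated separator can only group thousands
        elif len(token.split(sep)[1]) != 3:
            dec = sep   # e.g. 10,3: not a valid thousands group
        else:
            # Ambiguous (10.000 or 10,000): fall back to the voice default.
            default_dec = "," if variant == "es-ES" else "."
            dec = sep if sep == default_dec else None
    if dec is None:
        return token.replace(".", "").replace(",", ""), None
    whole, frac = token.rsplit(dec, 1)
    return whole.replace(".", "").replace(",", ""), frac


for tok in ("10.000", "10,000", "10,3", "10.000,34"):
    print(tok, split_number(tok, "es-ES"), split_number(tok, "es-US"))
```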

Cardinal number

A cardinal number is either a single digit (0, 1, …, 9) or a sequence of digits that does not start with 0.

Longer cardinal numbers may make use of a space, a dot (European Spanish) or a comma (American Spanish) as a thousands separator.

Examples

  • 256 will be pronounced doscientos cincuenta y seis.
  • 4358 will be pronounced cuatro mil trescientos cincuenta y ocho.
  • 10.000 will be pronounced diez mil (European Spanish) or diez punto cero cero cero (American Spanish).
  • 10,000 will be pronounced diez coma cero cero cero (European Spanish) or diez mil (American Spanish).
  • 20 000 000 will be pronounced veinte millones.
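To check expected readings, the following minimal Python sketch spells non-negative integers below 10^12 as Spanish cardinals, including apocope (uno becomes un, veintiuno becomes veintiún, before mil and millones). It illustrates standard Spanish number words and is not LumenVox's implementation; the function names are hypothetical, and thousands separators (space, dot or comma) are assumed to have been stripped beforehand.

```python
UNITS = [
    "cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho",
    "nueve", "diez", "once", "doce", "trece", "catorce", "quince", "dieciséis",
    "diecisiete", "dieciocho", "diecinueve", "veinte", "veintiuno", "veintidós",
    "veintitrés", "veinticuatro", "veinticinco", "veintiséis", "veintisiete",
    "veintiocho", "veintinueve",
]
TENS = {3: "treinta", 4: "cuarenta", 5: "cincuenta", 6: "sesenta",
        7: "setenta", 8: "ochenta", 9: "noventa"}
HUNDREDS = ["", "ciento", "doscientos", "trescientos", "cuatrocientos",
            "quinientos", "seiscientos", "setecientos", "ochocientos",
            "novecientos"]


def _below_1000(n: int) -> str:
    """Spell 1 <= n <= 999."""
    if n == 100:
        return "cien"
    hundreds, rest = divmod(n, 100)
    parts = [HUNDREDS[hundreds]] if hundreds else []
    if rest:
        if rest < 30:
            parts.append(UNITS[rest])
        else:
            tens, unit = divmod(rest, 10)
            parts.append(TENS[tens] + (" y " + UNITS[unit] if unit else ""))
    return " ".join(parts)


def _apocope(words: str) -> str:
    """Shorten a final 'uno' before mil/millones: veintiuno -> veintiún, uno -> un."""
    if words.endswith("veintiuno"):
        return words[: -len("veintiuno")] + "veintiún"
    if words.endswith("uno"):
        return words[:-3] + "un"
    return words


def _below_million(n: int, apocope: bool = False) -> str:
    """Spell 1 <= n <= 999999."""
    thousands, units = divmod(n, 1000)
    parts = []
    if thousands == 1:
        parts.append("mil")
    elif thousands:
        parts.append(_apocope(_below_1000(thousands)) + " mil")
    if units:
        words = _below_1000(units)
        parts.append(_apocope(words) if apocope else words)
    return " ".join(parts)


def cardinal(n: int) -> str:
    """Spell 0 <= n < 10**12 as a Spanish cardinal number."""
    if not 0 <= n < 10**12:
        raise ValueError("this sketch handles 0 <= n < 10**12")
    if n == 0:
        return "cero"
    millions, rest = divmod(n, 10**6)
    parts = []
    if millions == 1:
        parts.append("un millón")
    elif millions:
        parts.append(_below_million(millions, apocope=True) + " millones")
    if rest:
        parts.append(_below_million(rest))
    return " ".join(parts)


for value in (256, 4358, 10000, int("20 000 000".replace(" ", ""))):
    print(value, cardinal(value))
```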

Signed integer

A signed integer consists of a sign character immediately followed by a cardinal number. Valid sign characters are the plus sign (+) and the minus sign (−, U+2212). The common hyphen-minus character (-, U+002D) and other dash-like characters are also accepted as sign characters, but they are ambiguous and are best avoided.

Examples

  • +5 will be pronounced más cinco.
  • -300 will be pronounced menos trescientos.
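The signed-integer rule can be expressed as a regular expression. In this hypothetical Python sketch the en dash (U+2013) stands in for the unspecified set of other dash-like characters, and the cardinal part follows the definition given earlier (a single digit, or digits not starting with 0).

```python
import re
from typing import Optional, Tuple

# Sign characters: plus, minus (U+2212), hyphen-minus (U+002D) and, as an
# assumed example of a dash-like character, the en dash (U+2013).
SIGN_WORDS = {"+": "más", "\u2212": "menos", "-": "menos", "\u2013": "menos"}

# A cardinal number is a single digit or a digit sequence not starting with 0.
SIGNED = re.compile(r"([+\u2212\u2013-])(0|[1-9]\d*)")


def split_signed(token: str) -> Optional[Tuple[str, str]]:
    """Return (sign word, cardinal digits) for a signed integer, else None."""
    m = SIGNED.fullmatch(token)
    if m is None:
        return None
    return SIGN_WORDS[m.group(1)], m.group(2)


print(split_signed("+5"))
print(split_signed("\u2212300"))
```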

Real number

A cardinal number or signed integer immediately followed by a comma (European Spanish) or a dot (American Spanish) and a sequence of digits is recognized as a real number.

Examples

  • 4,5 will be pronounced cuatro coma cinco.
  • -3,1 will be pronounced menos tres coma uno.
  • 1.000,12 will be pronounced mil coma doce.

Ordinal number

The forms 1er and 3er, as well as a number followed by one of the suffixes º, ª, o or a, are interpreted as ordinal numbers.

Examples

  • 1er will be pronounced primer.
  • 3er will be pronounced tercer.
  • 21a will be pronounced vigésima primera.
  • 42o will be pronounced cuadragésimo segundo.
  • 6a will be pronounced sexta.
  • 1.000.000a will be pronounced primera millonésima.
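Recognizing these ordinal forms can be sketched with a single regular expression. The Python sketch below is hypothetical: it only detects the token and reports its value and grammatical form (masculine, feminine, or the apocopated primer/tercer); producing the ordinal words themselves is left out.

```python
import re
from typing import Optional, Tuple

ORDINAL = re.compile(r"(?P<apo>[13])er|(?P<n>\d+(?:\.\d{3})*)(?P<suf>[ºªoa])")


def parse_ordinal(token: str) -> Optional[Tuple[int, str]]:
    """Return (value, form) for an ordinal token, or None.

    form is "apocopated" (1er, 3er), "masculine" (º, o) or "feminine" (ª, a).
    """
    m = ORDINAL.fullmatch(token)
    if m is None:
        return None
    if m.group("apo"):
        return int(m.group("apo")), "apocopated"
    value = int(m.group("n").replace(".", ""))  # drop thousands dots
    return value, ("feminine" if m.group("suf") in "ªa" else "masculine")


for tok in ("1er", "21a", "42o", "1.000.000a"):
    print(tok, parse_ordinal(tok))
```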

Sequence of digits

Sequences of more than one digit starting with 0 are always read as a sequence of digits.

Examples

  • 0123 will be pronounced cero uno dos tres.
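The leading-zero rule is simple enough to state directly in code. This short Python sketch (function names hypothetical) shows the test that separates a digit sequence from a cardinal number and the digit-by-digit reading.

```python
import re

DIGIT_WORDS = ["cero", "uno", "dos", "tres", "cuatro",
               "cinco", "seis", "siete", "ocho", "nueve"]


def is_digit_sequence(token: str) -> bool:
    """More than one digit and a leading 0: read digit by digit."""
    return re.fullmatch(r"0[0-9]+", token) is not None


def read_digit_sequence(token: str) -> str:
    return " ".join(DIGIT_WORDS[int(d)] for d in token)


print(read_digit_sequence("0123"))
```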

Currency

LumenVox supports a wide range of currencies in multiple formats. Valid currency symbols include commonly used ones such as £, $, € and ¥.

The number may be followed by the word mil, millón, millones or millardos, in which case the currency name is pronounced at the end.

Examples

  • $10 will be pronounced diez dólares.
  • $5,27 will be pronounced cinco dólares y veintisiete centavos.
  • £5.27 will be pronounced cinco libras y veintisiete peniques.
  • €5,27 will be pronounced cinco euros y veintisiete céntimos.
  • €2 mil millones will be pronounced dos mil millones de euros.
  • ¥1000,2 will be pronounced mil yenes y veinte senes.
  • ¥1 millón will be pronounced un millón de yenes.
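The word-order rules in these examples can be sketched in Python. This is a hypothetical illustration, not LumenVox's code: it covers only the four symbols shown, keeps digits as digits (spelling them out is a separate step), and assumes that a final separator followed by one or two digits marks the subunit amount, so ¥1000,2 yields 20 senes.

```python
import re
from typing import Optional

# (singular, plural, subunit singular, subunit plural), from the examples above.
NAMES = {
    "$": ("dólar", "dólares", "centavo", "centavos"),
    "£": ("libra", "libras", "penique", "peniques"),
    "€": ("euro", "euros", "céntimo", "céntimos"),
    "¥": ("yen", "yenes", "sen", "senes"),
}
CURRENCY = re.compile(r"([£$€¥])(\d[\d.,]*)((?:\s+(?:mil|millón|millones|millardos))*)")
# Assumption: a final separator followed by one or two digits marks the fraction.
AMOUNT = re.compile(r"(\d+(?:[.,]\d{3})*)(?:[.,](\d{1,2}))?")


def currency_order(token: str) -> Optional[str]:
    """Reorder a currency token into spoken word order (digits kept as digits)."""
    m = CURRENCY.fullmatch(token)
    if m is None:
        return None
    symbol, amount, scale = m.group(1), m.group(2), m.group(3).strip()
    one, many, sub_one, sub_many = NAMES[symbol]
    if scale:
        # With mil/millón/millones/millardos the currency name moves to the end.
        return f"{amount} {scale} de {many}"
    a = AMOUNT.fullmatch(amount)
    if a is None:
        return None
    whole = a.group(1).replace(".", "").replace(",", "")
    text = f"{whole} {one if whole == '1' else many}"
    if a.group(2):
        cents = int(a.group(2).ljust(2, "0"))  # ",2" means 20 subunits
        text += f" y {cents} {sub_one if cents == 1 else sub_many}"
    return text


for tok in ("$5,27", "€2 mil millones", "¥1000,2"):
    print(tok, "->", currency_order(tok))
```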

Time

LumenVox supports time specified in both the 12-hour and the 24-hour clock.

  • 1:59 will be pronounced una cincuenta y nueve.
  • 2:00 will be pronounced dos.
  • 01:59am will be pronounced una cincuenta y nueve a eme.
  • 2 AM will be pronounced dos a eme.
  • 13:00 will be pronounced trece.
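The accepted time shapes can be sketched with one regular expression. In this hypothetical Python sketch a bare number such as 7 is not treated as a time, hours above 12 are rejected when an AM/PM marker is present, and "pe eme" for PM is assumed by analogy with the documented "a eme"; dropping zero minutes (2:00 read as dos) is left to the reading step.

```python
import re
from typing import Optional, Tuple

TIME = re.compile(r"(?P<h>\d{1,2})(?::(?P<m>\d{2}))?\s*(?:(?P<ap>[AaPp])\.?[Mm]\.?)?")


def parse_time(token: str) -> Optional[Tuple[int, int, Optional[str]]]:
    """Return (hour, minute, spoken suffix or None), or None if not a time."""
    m = TIME.fullmatch(token)
    if m is None or (m.group("m") is None and m.group("ap") is None):
        return None  # a bare number such as "7" is not treated as a time
    hour, minute = int(m.group("h")), int(m.group("m") or 0)
    ap = m.group("ap")
    if hour > (12 if ap else 23) or minute > 59:
        return None
    # "a eme" comes from the examples; "pe eme" is an assumed counterpart.
    suffix = None if ap is None else ("a eme" if ap in "Aa" else "pe eme")
    return hour, minute, suffix


for tok in ("1:59", "01:59am", "2 AM", "13:00"):
    print(tok, parse_time(tok))
```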

Date

One-digit numbers for the day and for the month may have an optional leading zero.

Months may be written as numbers (4, 04), full names (abril) or abbreviations (abr).

The year may have either 2 or 4 digits.

European format (D/M/Y, D-M-Y, D.M.Y), default for European Spanish voices:

  • 12/mayo/1995 will be pronounced doce de mayo de mil novecientos noventa y cinco.
  • 12-abr-2007 will be pronounced doce de abril de dos mil siete.
  • 20.3.2011 will be pronounced veinte de marzo de dos mil once.

Standard US format (M/D/Y, M-D-Y, M.D.Y), default for American Spanish voices:

  • 12/31/1999 will be pronounced treinta y uno de diciembre de mil novecientos noventa y nueve.
  • 10-25-99 will be pronounced veinticinco de octubre de mil novecientos noventa y nueve.
  • dic/31/1999 will be pronounced treinta y uno de diciembre de mil novecientos noventa y nueve.
  • abril-25-1999 will be pronounced veinticinco de abril de mil novecientos noventa y nueve.

ISO 8601 format (Y-M-D, Y/M/D, Y.M.D), which supports only 4-digit years:

  • 2007/01/01 will be pronounced uno de enero de dos mil siete.
  • 2007-Jan-01 will be pronounced uno de enero de dos mil siete.
  • 2007-January-01 will be pronounced uno de enero de dos mil siete.
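The field-order rules can be sketched as follows. This Python sketch makes several assumptions that the article does not state: the variant labels "es-ES" and "es-US" are hypothetical, a month name in the first or second field overrides the voice default, two-digit years use an assumed pivot (00 to 49 map to the 2000s), only Spanish month names are recognized (the English forms Jan and January from the ISO examples are not), and days are only checked against the range 1 to 31.

```python
import re
from typing import Optional, Tuple

MONTHS = ["enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
          "agosto", "septiembre", "octubre", "noviembre", "diciembre"]


def is_num(text: str) -> bool:
    return re.fullmatch(r"[0-9]+", text) is not None


def month_number(text: str) -> Optional[int]:
    """Accept 4, 04, abril or a prefix of at least three letters such as abr."""
    if is_num(text):
        n = int(text)
        return n if 1 <= n <= 12 else None
    t = text.lower()
    for i, name in enumerate(MONTHS, start=1):
        if len(t) >= 3 and name.startswith(t):
            return i
    return None


def parse_date(token: str, variant: str) -> Optional[Tuple[int, int, int]]:
    """Return (day, month, year) or None. variant: "es-ES" or "es-US" (hypothetical)."""
    parts = re.split(r"[/.\-]", token)
    if len(parts) != 3:
        return None
    a, b, c = parts
    if len(a) == 4 and is_num(a):
        y, mo, d = a, b, c          # ISO 8601: Y-M-D
    elif not is_num(a):
        mo, d, y = a, b, c          # month name first, e.g. dic/31/1999
    elif not is_num(b) or variant == "es-ES":
        d, mo, y = a, b, c          # D/M/Y (or a month name in the middle)
    else:
        mo, d, y = a, b, c          # M/D/Y
    month = month_number(mo)
    if month is None or not is_num(d) or not is_num(y) or len(y) not in (2, 4):
        return None
    day = int(d)
    if not 1 <= day <= 31:
        return None
    year = int(y)
    if len(y) == 2:
        year += 2000 if year < 50 else 1900  # assumed pivot, not documented
    return day, month, year


print(parse_date("12-abr-2007", "es-ES"))
print(parse_date("10-25-99", "es-US"))
```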

Abbreviations

Most abbreviations are expanded to full words. A dot (full stop) following a supported abbreviation does not cause a sentence break. To force a sentence break, use two dots: one to mark the abbreviation and one to mark the end of the sentence.

Examples

  • 10 km will be interpreted as diez kilómetros.
  • Núm. tel. de Sr. Rodriguez will be interpreted as Número teléfono de Señor Rodriguez.
  • EE. UU. will be interpreted as Estados Unidos.
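The sentence-break behavior can be sketched in Python. The abbreviation set below is a tiny hypothetical subset taken from the examples, splitting on whitespace is a simplification, and expansion to full words is not shown; the sketch only demonstrates that a known abbreviation's dot does not end a sentence while a doubled dot does.

```python
from typing import List

# Tiny hypothetical subset of supported abbreviations, taken from the examples.
ABBREVIATIONS = {"núm.", "tel.", "sr.", "ee.", "uu."}


def split_sentences(text: str) -> List[str]:
    """Split on sentence-final marks, but not on the dot of a known abbreviation.

    A doubled dot after an abbreviation ("UU..") forces a sentence break.
    """
    sentences: List[str] = []
    current: List[str] = []
    for token in text.split():
        forced = token.endswith("..")
        if forced:
            token = token[:-1]  # keep only the abbreviation's own dot
        current.append(token)
        ends = token.endswith((".", "!", "?"))
        if forced or (ends and token.lower() not in ABBREVIATIONS):
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Vive en EE. UU. Trabaja aquí."))
print(split_sentences("Vive en EE. UU.. Trabaja aquí."))
```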

Initialisms

Initialisms with a period (dot) following each letter (e.g. D.E.A.) will be pronounced by spelling out each letter.

The most common initialisms written without dots (e.g. UE, ONG) are also recognized and pronounced correctly.

All vowelless words are recognized as initialisms.

Examples

  • S.M.S. will be pronounced ese eme ese.
  • URL will be pronounced u erre ele.
  • R.T.V.E. will be pronounced erre te uve e.
  • pwq will be pronounced pe uve doble cu.
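The initialism rules and the letter-by-letter reading can be sketched in Python. The names below are hypothetical; KNOWN_INITIALISMS stands in for whatever lexicon LumenVox uses for undotted initialisms, and y is given its traditional name i griega.

```python
import re

LETTER_NAMES = {
    "a": "a", "b": "be", "c": "ce", "d": "de", "e": "e", "f": "efe",
    "g": "ge", "h": "hache", "i": "i", "j": "jota", "k": "ka", "l": "ele",
    "m": "eme", "n": "ene", "ñ": "eñe", "o": "o", "p": "pe", "q": "cu",
    "r": "erre", "s": "ese", "t": "te", "u": "u", "v": "uve",
    "w": "uve doble", "x": "equis", "y": "i griega", "z": "zeta",
}
KNOWN_INITIALISMS = {"UE", "ONG", "URL"}  # hypothetical tiny lexicon
VOWELS = set("aeiouáéíóúü")


def is_initialism(token: str) -> bool:
    if re.fullmatch(r"(?:[A-Za-zÑñ]\.){2,}", token):
        return True  # dotted form such as D.E.A.
    if token in KNOWN_INITIALISMS:
        return True
    # Every vowelless word is treated as an initialism.
    return token.isalpha() and not any(ch in VOWELS for ch in token.lower())


def spell_out(token: str) -> str:
    return " ".join(LETTER_NAMES[ch] for ch in token.lower() if ch != ".")


for tok in ("S.M.S.", "URL", "pwq"):
    print(tok, is_initialism(tok), spell_out(tok))
```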

Telephone numbers

LumenVox recognizes Spanish and American telephone number formats and reads them as a series of two- or three-digit numbers.

Examples

  • (+34) 32 456 2344 will be pronounced as más treinta y cuatro, treinta y dos cuatrocientos cincuenta y seis veintitrés cuarenta y cuatro.
  • 22 345 22 12 will be pronounced as veintidós trescientos cuarenta y cinco veintidós doce.
  • 596-334-3443 will be pronounced as quinientos noventa y seis trescientos treinta y cuatro treinta y cuatro cuarenta y tres.
  • (55) 4323 3345 will be pronounced as cincuenta y cinco cuarenta y tres veintitrés treinta y tres cuarenta y cinco.
  • (334) 966-8223 will be pronounced as trescientos treinta y cuatro novecientos sesenta y seis ochenta y dos veintitrés.
  • 443/298-9280 will be pronounced as cuatrocientos cuarenta y tres doscientos noventa y ocho noventa y dos ochenta.
  • +1-433-853-2892 will be pronounced as más uno cuatrocientos treinta y tres ochocientos cincuenta y tres veintiocho noventa y dos.
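The grouping in these examples can be reproduced with a short Python sketch. The rule is inferred from the examples rather than documented: written groups of up to three digits are kept, four-digit groups are read as two pairs, and the handling of longer groups (a leading three-digit group when the length is odd, then pairs) is an assumption.

```python
import re
from typing import List


def phone_groups(number: str) -> List[str]:
    """Split a phone number into the 2- and 3-digit groups that are read aloud.

    "+" is kept as its own item (read as más). Written groups of up to three
    digits are kept; longer groups are split into pairs, with a leading
    three-digit group when the length is odd (an assumption beyond the examples).
    """
    out: List[str] = []
    for group in re.findall(r"\+?\d+", number):
        if group.startswith("+"):
            out.append("+")
            group = group[1:]
        if len(group) <= 3:
            out.append(group)
            continue
        start = 3 if len(group) % 2 else 0
        if start:
            out.append(group[:3])
        out.extend(group[i:i + 2] for i in range(start, len(group), 2))
    return out


print(phone_groups("(+34) 32 456 2344"))
print(phone_groups("(334) 966-8223"))
```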

Identifier

Non-words not described elsewhere are treated as identifiers. This group includes mixes of letters and digits, such as r121, as well as URLs, e-mail addresses and unusual proper names unknown to the synthesizer.

Punctuation characters within identifiers will be pronounced.

Examples

  • er125lp will be pronounced er ciento veinticinco ele pe.
  • http://www.lumenvox.com will be pronounced hache te te pe dos puntos barra barra uve doble uve doble uve doble punto lumenvox punto com.
  • B!0 will be pronounced be signo de exclamación cero.
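The examples suggest that an identifier is first split into runs of letters, runs of digits and individual symbols. The following Python sketch shows that tokenization step under that assumption; the punctuation names come from the examples, while the decision whether a letter run is pronounced (er) or spelled (ele pe), and the reading of digits, are left out.

```python
import re
from typing import List

# Spoken names for the punctuation characters that appear in the examples.
PUNCT_NAMES = {":": "dos puntos", "/": "barra", ".": "punto",
               "!": "signo de exclamación"}


def identifier_tokens(token: str) -> List[str]:
    """Split an identifier into runs of letters, runs of digits and single symbols."""
    return re.findall(r"[^\W\d_]+|\d+|[^\w\s]|_", token)


def name_symbols(tokens: List[str]) -> List[str]:
    """Replace known punctuation characters with their spoken names."""
    return [PUNCT_NAMES.get(t, t) for t in tokens]


print(identifier_tokens("er125lp"))
print(name_symbols(identifier_tokens("http://www.lumenvox.com")))
```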


SSML say-as attribute values

The SSML say-as element lets users annotate fragments of text to force a particular interpretation.

Marking a fragment with say-as disables most default normalization rules that would otherwise have been applied. Therefore, use say-as sparingly, only when the default normalization rules fail and render speech different from what is expected.

The W3C has published a Working Group Note describing SSML 1.0 say-as attribute values, which LumenVox mostly follows.

Date

LumenVox interprets the content of a say-as element with interpret-as="date" as a date, as defined in the W3C Note. The format attribute may be set to any of the following: mdy, dmy, ymd, md, dm, ym, my, y, d, m.

Examples

  • <say-as interpret-as="date" format="mdy">05/02/03</say-as> will be pronounced dos de mayo de dos mil tres.
  • <say-as interpret-as="date" format="dmy">05/02/03</say-as> will be pronounced cinco de febrero de dos mil tres.
  • <say-as interpret-as="date" format="ymd">05/02/03</say-as> will be pronounced tres de febrero de dos mil cinco.
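The examples show that the format attribute simply assigns each field, in order, to day, month or year. A minimal Python sketch of that mapping (the function name is hypothetical; two-digit year expansion is not shown):

```python
import re


def date_fields(value: str, fmt: str) -> dict:
    """Map each field of a say-as date to the letter in its format attribute."""
    parts = re.split(r"[/.\-]", value)
    if len(parts) != len(fmt) or not set(fmt) <= set("dmy"):
        raise ValueError(f"{value!r} does not fit format {fmt!r}")
    return {letter: int(part) for letter, part in zip(fmt, parts)}


for fmt in ("mdy", "dmy", "ymd"):
    print(fmt, date_fields("05/02/03", fmt))
```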

Duration

Tokens such as 7'10" are recognized as a duration in minutes and seconds when enclosed in a say-as element with interpret-as="time".

Example

  • <say-as interpret-as="time">1'23"</say-as> will be pronounced un minuto y veintitrés segundos.
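A hypothetical Python sketch of the minutes-and-seconds pattern: it accepts only the ASCII apostrophe and double quote, rejects seconds of 60 or more, and keeps the numbers as digits, choosing singular or plural unit names.

```python
import re
from typing import Optional, Tuple

DURATION = re.compile(r"(\d+)'(\d{1,2})\"")


def parse_duration(token: str) -> Optional[Tuple[int, int]]:
    """Return (minutes, seconds) for a token such as 1'23", or None."""
    m = DURATION.fullmatch(token)
    if m is None:
        return None
    minutes, seconds = int(m.group(1)), int(m.group(2))
    return (minutes, seconds) if seconds < 60 else None


def duration_phrase(minutes: int, seconds: int) -> str:
    """Digits are kept as digits here; a real reading would spell them out."""
    m_unit = "minuto" if minutes == 1 else "minutos"
    s_unit = "segundo" if seconds == 1 else "segundos"
    return f"{minutes} {m_unit} y {seconds} {s_unit}"


print(duration_phrase(*parse_duration("1'23\"")))
```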

Character string

LumenVox will read individual characters for text within the say-as element having interpret-as="characters". The format attribute is ignored. The detail attribute may be used to force pauses, as described in the W3C Note.

Examples

  • <say-as interpret-as="characters">velocidad</say-as> will be pronounced uve e ele o ce i de a de.
  • <say-as interpret-as="characters">1a3BZ7</say-as> will be pronounced uno a tres be zeta siete.
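When say-as fragments are generated programmatically, building them with an XML library avoids escaping mistakes. This sketch uses Python's standard xml.etree.ElementTree; the helper name say_as is hypothetical, the enclosing <speak> document is omitted, and attribute order in the output relies on Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def say_as(text: str, interpret_as: str, **attrs: str) -> str:
    """Serialize one <say-as> element; text and attribute values are escaped."""
    elem = ET.Element("say-as", {"interpret-as": interpret_as, **attrs})
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


print(say_as("1a3BZ7", "characters"))
print(say_as("05/02/03", "date", format="dmy"))
```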

Cardinal number

LumenVox will attempt to read values within say-as having interpret-as="cardinal" as cardinal numbers. The format and detail attributes are ignored. Roman numerals are supported.

Examples

  • <say-as interpret-as="cardinal">13</say-as> will be pronounced trece.
  • <say-as interpret-as="cardinal">C</say-as> will be pronounced cien.
  • <say-as interpret-as="cardinal">MCMXCIX</say-as> will be pronounced mil novecientos noventa y nueve.

Ordinal number

LumenVox will attempt to read values within say-as having interpret-as="ordinal" as ordinal numbers. The format and detail attributes are ignored. Roman numerals are supported.

Examples

  • <say-as interpret-as="ordinal">986</say-as> will be pronounced noningentésimo octogésimo sexto.
  • <say-as interpret-as="ordinal">C</say-as> will be pronounced centésimo.
  • <say-as interpret-as="ordinal">MCMXCIX</say-as> will be pronounced milésimo noningentésimo nonagésimo noveno.

Digits

LumenVox will attempt to read values within say-as having interpret-as="digits" as digits. Roman numerals are supported.

Examples

  • <say-as interpret-as="digits">123</say-as> will be pronounced uno dos tres.
  • <say-as interpret-as="digits">C</say-as> will be pronounced uno cero cero.
  • <say-as interpret-as="digits">MCMXCIX</say-as> will be pronounced uno nueve nueve nueve.
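The cardinal, ordinal and digits interpretations above all accept Roman numerals. The following Python sketch shows a standard conversion (subtract a value that precedes a larger one) restricted to well-formed numerals from I to MMMCMXCIX, followed by the digit-by-digit reading used for interpret-as="digits". The function names are hypothetical.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Well-formed numerals from I to MMMCMXCIX (1 to 3999).
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
DIGIT_WORDS = ["cero", "uno", "dos", "tres", "cuatro",
               "cinco", "seis", "siete", "ocho", "nueve"]


def roman_to_int(numeral: str) -> int:
    if not numeral or not ROMAN_RE.fullmatch(numeral):
        raise ValueError(f"not a Roman numeral: {numeral!r}")
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller value before a larger one is subtracted (IV, XC, CM, ...).
        if i + 1 < len(numeral) and ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def read_digits(value: str) -> str:
    """Convert a Roman numeral first if needed, then read digit by digit."""
    number = value if re.fullmatch(r"[0-9]+", value) else str(roman_to_int(value))
    return " ".join(DIGIT_WORDS[int(d)] for d in number)


print(read_digits("MCMXCIX"))
```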

Fraction

LumenVox will interpret values within say-as having interpret-as="fraction" as common fractions. The syntax for fractions is as follows:

Fraction:
    ["+" | "-"] cardinal "/" cardinal

where cardinal is a number as defined in the Cardinal number section above.

Examples

  • <say-as interpret-as="fraction">15/2</say-as> will be pronounced quince entre dos.
  • <say-as interpret-as="fraction">-1/2</say-as> will be pronounced menos uno entre dos.
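The fraction grammar translates directly into a regular expression. In this Python sketch (function name hypothetical) cardinal follows the definition given earlier, a single digit or digits not starting with 0; thousands separators inside the numerator or denominator are not handled.

```python
import re
from typing import Optional, Tuple

CARDINAL = r"(?:0|[1-9][0-9]*)"  # single digit, or digits not starting with 0
FRACTION = re.compile(rf"([+-])?({CARDINAL})/({CARDINAL})")


def parse_fraction(token: str) -> Optional[Tuple[Optional[str], int, int]]:
    """Return (sign or None, numerator, denominator), or None if the grammar fails."""
    m = FRACTION.fullmatch(token)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


print(parse_fraction("15/2"))
print(parse_fraction("-1/2"))
```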

LumenVox special characters

It is sometimes necessary to modify input text to work within the system's constraints and achieve the expected output. LumenVox provides a set of special characters that work only in certain contexts and change how text is synthesized, in terms of either pronunciation or intonation. These characters are language-specific and do not apply to other languages unless the documentation for that language says otherwise.


Replacement of ń

The letter ń (n with an acute accent) is pronounced the same way as ñ (n with a tilde).


Force rising intonation

A question mark followed by a caret, also known as a circumflex (?^), forces rising intonation on a question. Wh-questions (questions starting with an interrogative word) have falling intonation by default; appending a caret to the question mark changes this.

Example

  • ¿Cómo estás?^ will result in a rising intonation.


Force falling intonation

A question mark followed by an underscore (?_) can be used to force the intonation of a question to be falling. Yes/No questions by default have a rising intonation. This can be changed by appending the underscore character to the question mark.

Example

  • ¿Estás bien?_ will result in a falling intonation.
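The default intonation rules and the two markers can be modeled as follows. This Python sketch is a hypothetical illustration of the behavior described above, not LumenVox's internal logic: WH_WORDS is an incomplete list of interrogative words, and the function strips the marker and reports the resulting intonation. It also applies the ń replacement.

```python
from typing import Optional, Tuple

# Hypothetical, incomplete list of Spanish interrogative words that open wh-questions.
WH_WORDS = {"qué", "quién", "quiénes", "cuál", "cuáles", "cómo",
            "dónde", "cuándo", "cuánto", "cuánta", "cuántos", "cuántas"}


def question_intonation(sentence: str) -> Tuple[str, Optional[str]]:
    """Return (cleaned text, intonation) following the defaults and markers above."""
    s = sentence.strip().replace("\u0144", "ñ")  # ń (n-acute) is read as ñ
    if s.endswith("?^"):
        return s[:-1], "rising"
    if s.endswith("?_"):
        return s[:-1], "falling"
    if s.endswith("?"):
        first = s.lstrip("¿").split()[0].lower()
        return s, ("falling" if first in WH_WORDS else "rising")
    return s, None


for q in ("¿Cómo estás?", "¿Cómo estás?^", "¿Estás bien?", "¿Estás bien?_"):
    print(q, question_intonation(q))
```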

Copyright (C) 2001-2024, Ai Software, LLC d/b/a LumenVox