TTS1 German text normalization

The LumenVox TTS1 Text-To-Speech synthesizer works internally by synthesizing words. However, input text contains not only words, such as milch and zucker, but also other written elements, such as numbers (15), dates (3/4/2003), acronyms (ADAC, AG), abbreviations (z.B.), symbols ($), etc. All such elements must first be converted to actual words before they can be synthesized. This conversion, which takes place inside the synthesizer, is called text normalization.

The German TTS1 Text-To-Speech voices correctly normalize and synthesize the majority of German texts. This document describes how LumenVox accomplishes the task of text normalization.

The user may extend LumenVox's text normalization by using PLS lexicons (as defined in the W3C Pronunciation Lexicon Specification Recommendation).
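As an illustration of such an extension, the following Python sketch builds a minimal PLS 1.0 lexicon that expands one abbreviation through an alias element. The entry (Kfz → Kraftfahrzeug), the helper name build_lexicon and the choice of generating the file from Python are assumptions made for this example; how the lexicon is loaded by the synthesizer is not covered here.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, lang="de-DE"):
    """Return a PLS 1.0 document (as text) mapping graphemes to aliases.

    `entries` is a dict such as {"Kfz": "Kraftfahrzeug"} (hypothetical data).
    """
    # Serialize the PLS namespace as the default namespace (no prefix).
    ET.register_namespace("", PLS_NS)
    root = ET.Element(f"{{{PLS_NS}}}lexicon", {
        "version": "1.0",
        "alphabet": "ipa",
        f"{{{XML_NS}}}lang": lang,
    })
    for grapheme, alias in entries.items():
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = alias
    return ET.tostring(root, encoding="unicode")


print(build_lexicon({"Kfz": "Kraftfahrzeug"}))
```

An alias is used here rather than a phoneme element so that the example does not depend on any particular phonetic alphabet.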

Please note that this article does not apply to our TTS2 voices.

Text Structure

This section describes how unannotated input text is split into paragraphs, sentences and words.

Paragraph

Paragraphs are separated by empty lines.

Paragraphs may be explicitly marked with SSML elements <p>.

Sentence

By default, a sentence contains fewer than 1000 characters. Longer sentences will be broken into multiple smaller sentences.

Sentences may be explicitly marked with SSML elements <s>.

Word

By default, a word contains fewer than 100 characters. Longer words will be broken into multiple smaller words.

Words without any vowels will be spelled out.
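The structural rules above can be approximated in a few lines of Python. This is only a sketch: the function names are hypothetical, the exact points at which LumenVox breaks over-long words are not documented (fixed-size chunks are an assumption), and treating y and the umlauts as vowels is likewise an assumption.

```python
import re

MAX_WORD = 100  # default word length limit quoted above
VOWELS = set("aeiouyäöü")  # assumed vowel inventory for this sketch


def split_paragraphs(text):
    """Split text into paragraphs on empty (or whitespace-only) lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_long_word(word, limit=MAX_WORD):
    """Break an over-long word into chunks of at most `limit` characters."""
    return [word[i:i + limit] for i in range(0, len(word), limit)]


def is_spelled_out(word):
    """A word without any vowel letter is spelled out letter by letter."""
    return not any(ch in VOWELS for ch in word.lower())


print(split_paragraphs("Erster Absatz.\n\nZweiter Absatz."))
print(is_spelled_out("pwq"), is_spelled_out("Milch"))
```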

Supported characters

LumenVox accepts all Unicode characters. LumenVox handles most characters found in texts based on the Latin script.

Punctuation

Punctuation plays a key role in the way texts are interpreted by the TTS system. LumenVox supports the majority of punctuation marks found in German texts. In the end, however, all punctuation marks that affect pauses or intonation are mapped to the following marks.

Punctuation marks    Pause     Intonation
,                    small     slightly rising
; :                  medium    falling
. !                  long      falling
?                    long      rising or falling

Default normalization rules

This section describes in general how LumenVox normalizes input text, excluding text fragments marked with the SSML say-as element.

This section is not exhaustive. LumenVox normalizes many kinds of text elements, but only the most common ones are described here.

Number

Cardinal number

A cardinal number is either any single digit (0, 1, …, 9) or a sequence of digits not starting with 0.

Longer cardinal numbers may use a dot as a thousands separator.

Examples

  • 10.000 will be pronounced zehntausend.
  • 256 will be pronounced zweihundertsechsundfünfzig.
  • 4358 will be pronounced viertausenddreihundertachtundfünfzig.
  • 1.000 will be pronounced eintausend.
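To make the cardinal rule concrete, the sketch below spells out German cardinals below one million, accepting the dot as thousands separator as in the examples. The names spell_cardinal and CARDINAL_RE are hypothetical, and the code is an independent illustration of standard German number formation, not LumenVox's implementation.

```python
import re

SMALL = ["null", "ein", "zwei", "drei", "vier", "fünf", "sechs", "sieben",
         "acht", "neun", "zehn", "elf", "zwölf", "dreizehn", "vierzehn",
         "fünfzehn", "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
TENS = {2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
        6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}
# A single digit, dot-grouped thousands, or plain digits not starting with 0.
CARDINAL_RE = re.compile(r"0|[1-9]\d{0,2}(?:\.\d{3})+|[1-9]\d*")


def _below_1000(n):
    """Spell 1..999, using the combining form "ein" for the digit 1."""
    hundreds, rest = divmod(n, 100)
    words = SMALL[hundreds] + "hundert" if hundreds else ""
    if rest == 0:
        return words
    if rest < 20:
        return words + SMALL[rest]
    tens, ones = divmod(rest, 10)
    return words + (SMALL[ones] + "und" if ones else "") + TENS[tens]


def spell_cardinal(text):
    """Spell a cardinal such as "4358" or "10.000" as one German word."""
    if not CARDINAL_RE.fullmatch(text):
        raise ValueError(f"not a cardinal number: {text!r}")
    n = int(text.replace(".", ""))
    if n >= 1_000_000:
        raise ValueError("this sketch only covers values below one million")
    if n == 0:
        return "null"
    thousands, rest = divmod(n, 1000)
    words = _below_1000(thousands) + "tausend" if thousands else ""
    if rest:
        words += _below_1000(rest)
    # A final "ein" becomes "eins" (1 -> eins, 101 -> einhunderteins).
    return words + "s" if words.endswith("ein") else words


for sample in ("10.000", "256", "4358", "1.000"):
    print(sample, "->", spell_cardinal(sample))
```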

Signed integer

A signed integer consists of a sign character followed immediately by a cardinal number. Valid sign characters are the plus sign (+), the minus sign (−, U+2212) and the plus-minus sign (±). The common hyphen-minus character (-), as well as other dash-like characters, is also accepted as a sign character, but these are ambiguous and best avoided.

Examples

  • +5 will be pronounced plus fünf.
  • -3.000 will be pronounced minus dreitausend.

Real number

A cardinal or signed integer followed immediately by a comma and a sequence of digits will be recognized as a real number.

Examples

  • 4,5 will be pronounced vier komma fünf.
  • -3,1 will be pronounced minus drei komma eins.
  • 1.000,12 will be pronounced eintausend komma zwölf.

Ordinal number

A cardinal followed by a dot is interpreted as an ordinal number. The exception is a sequence placed at the end of a sentence: there the dot only marks the end of the sentence and does not influence the interpretation of the numeral. To have a number at the end of a sentence interpreted as an ordinal, add a second dot.

  • 21. will be pronounced einundzwanzigste.
  • 42. will be pronounced zweiundvierzigste.
  • 6. will be pronounced sechste.
  • 1.000.000. will be pronounced millionste.
  • Ich bin der 5.. will be pronounced ich bin der fünfte.

Cardinal followed by ter, ten and other inflections

Suffixes such as ter, ten and other inflections may be applied to any cardinal number.

  • 60es will be pronounced sechziges.
  • 200en will be pronounced zweihunderten.
  • 7er will be pronounced siebener.
  • 1000em will be pronounced eintausendem.
  • 5ter will be pronounced fünfter.
  • 25ten will be pronounced fünfundzwanzigsten.

Sequence of digits

Sequences starting with 0 are always read as a sequence of digits.

Example

  • 0123 will be pronounced null eins zwei drei.
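The number categories described in this section can be told apart with a handful of regular expressions. The sketch below follows the notation used in the examples (dot as thousands separator, comma as decimal separator); the rule names and the function classify_number are hypothetical, inflected cardinals are left out, and the end-of-sentence ambiguity of ordinals is not handled.

```python
import re

# Cardinal: a single digit, dot-grouped thousands, or digits not starting with 0.
CARD = r"(?:0|[1-9]\d{0,2}(?:\.\d{3})+|[1-9]\d*)"
# Plus, hyphen-minus, minus sign (U+2212), plus-minus sign (U+00B1).
SIGN = r"[+\-\u2212\u00b1]"

RULES = [
    ("digit sequence", re.compile(r"0\d+")),
    ("real", re.compile(rf"{SIGN}?{CARD},\d+")),
    ("ordinal", re.compile(rf"{CARD}\.")),
    ("signed integer", re.compile(rf"{SIGN}{CARD}")),
    ("cardinal", re.compile(CARD)),
]


def classify_number(token):
    """Return the name of the first rule matching the whole token, or None."""
    for name, pattern in RULES:
        if pattern.fullmatch(token):
            return name
    return None


for token in ("0123", "10.000", "-3.000", "1.000,12", "21.", "+5"):
    print(token, "->", classify_number(token))
```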

Currency

LumenVox supports a wide list of currencies in multiple formats. Valid currency symbols include commonly used symbols such as £,  $,  €,  ¥.

The number may be followed by the word Million or Milliarde, or by one of their various abbreviations. In this case the currency name is pronounced after that word.

  • $10 will be pronounced zehn dollar.
  • 5,27$ will be pronounced fünf dollar und siebenundzwanzig cent.
  • £5.27 will be pronounced fünf pfund und siebenundzwanzig pence.
  • 1000,2¥ will be pronounced eintausend yen und zwanzig sen.
  • ¥1 Million will be pronounced eine million yen.
  • 5,27€ will be pronounced fünf euro und siebenundzwanzig cent.
  • €5,27 will be pronounced fünf euro und siebenundzwanzig cent.
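The way an amount is split into major and minor units in the examples above can be sketched as follows. The unit table is taken from those examples; the function name parse_amount, the restriction to four symbols, and the rule that a dot or comma followed by one or two digits is a decimal part are assumptions of this sketch (thousands separators and the Million/Milliarde forms are not handled).

```python
import re

# Major and minor unit names as they appear in the examples above.
UNITS = {"$": ("dollar", "cent"), "€": ("euro", "cent"),
         "£": ("pfund", "pence"), "¥": ("yen", "sen")}
SYMBOLS = "".join(re.escape(symbol) for symbol in UNITS)
AMOUNT = r"(?P<major>\d+)(?:[.,](?P<minor>\d{1,2}))?"

# The currency symbol may precede or follow the number.
PREFIX_RE = re.compile(rf"(?P<sym>[{SYMBOLS}]){AMOUNT}")
SUFFIX_RE = re.compile(rf"{AMOUNT}(?P<sym>[{SYMBOLS}])")


def parse_amount(token):
    """Return (major, major_name, minor, minor_name), or None if no match."""
    match = PREFIX_RE.fullmatch(token) or SUFFIX_RE.fullmatch(token)
    if match is None:
        return None
    major_name, minor_name = UNITS[match["sym"]]
    minor = match["minor"]
    # A single fractional digit counts tenths: "1000,2" means 20 minor units.
    minor_value = int(minor.ljust(2, "0")) if minor else 0
    return int(match["major"]), major_name, minor_value, minor_name


print(parse_amount("5,27$"))
print(parse_amount("1000,2¥"))
```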

Time

LumenVox supports time specified in both the 12-hour and the 24-hour clock.

  • 1:59 will be pronounced ein uhr neunundfünfzig.
  • 2:00 will be pronounced zwei uhr.
  • 01:59am will be pronounced ein uhr neunundfünfzig a m.
  • 2 AM will be pronounced zwei a m.
  • 13:00 will be pronounced dreizehn uhr.

Date

One-digit numbers for the day and for the month may have an optional leading zero.

Supported formats for month expressions: numbers (4, 04), name (April), abbreviation (Apr).

The year may have either 2 or 4 digits.

European format (D/M/Y,  D-M-Y,  D.M.Y), default for all German voices:

  • 12/mai/1995 will be pronounced zwölfte mai neunzehn hundert fünfundneunzig.
  • 12-Apr-2007 will be pronounced zwölfte april zweitausendsieben.
  • 20.3.2011 will be pronounced zwanzigste märz zweitausendelf.

Standard US format (M/D/Y,  M-D-Y,  M.D.Y):

  • 12/31/1999 will be pronounced einunddreißigste dezember neunzehn hundert neunundneunzig.
  • 10-25-99 will be pronounced fünfundzwanzigste oktober neunzehn hundert neunundneunzig.
  • Dez/31/1999 will be pronounced einunddreißigste dezember neunzehn hundert neunundneunzig.
  • April-25-1999 will be pronounced fünfundzwanzigste april neunzehn hundert neunundneunzig.

ISO 8601 standard (Y-M-D,  Y/M/D,  Y.M.D), handles only 4-digit year:

  • 2007/01/01 will be pronounced erste januar zweitausendsieben.
  • 2007-Jan-01 will be pronounced erste januar zweitausendsieben.
  • 2007-Januar-01 will be pronounced erste januar zweitausendsieben.
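The three date layouts can be distinguished mechanically. The sketch below returns (day, month, year) for the examples in this section. The disambiguation order (ISO when the first field has four digits, US when the first field is a month name or the second field exceeds 12, European otherwise) is an assumption inferred from the examples, not a documented LumenVox rule, and two-digit years are returned unexpanded because the expansion rule is not described here.

```python
import re

MONTHS = ["januar", "februar", "märz", "april", "mai", "juni", "juli",
          "august", "september", "oktober", "november", "dezember"]


def month_number(token):
    """Map "4", "04", "April" or "Apr" to the month number 4."""
    if token.isdigit():
        return int(token)
    key = token.lower()
    for number, name in enumerate(MONTHS, start=1):
        if key == name or key == name[:3]:
            return number
    raise ValueError(f"unknown month: {token!r}")


def parse_date(text):
    """Return (day, month, year) for D/M/Y, M/D/Y or Y/M/D style dates."""
    parts = re.split(r"[/.\-]", text)
    if len(parts) != 3:
        raise ValueError(f"not a date: {text!r}")
    first, second, third = parts
    if len(first) == 4 and first.isdigit():       # ISO 8601 order: Y-M-D
        return int(third), month_number(second), int(first)
    year = int(third)                             # 2-digit years stay as-is
    if not first.isdigit():                       # month name first: M-D-Y
        return int(second), month_number(first), year
    if second.isdigit() and int(second) > 12:     # cannot be a month: M-D-Y
        return int(second), int(first), year
    return int(first), month_number(second), year  # European default: D-M-Y


for sample in ("12/mai/1995", "12/31/1999", "2007-Jan-01"):
    print(sample, "->", parse_date(sample))
```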

Abbreviations

Most abbreviations will be expanded to full words. There will be no sentence break on the dot sign (full stop) following a recognized abbreviation. In order to force a sentence break please use two periods: one for the abbreviation and one for the sentence ending.

Example

  • z.B. Hr Fischer von Friedrichstr.. will be interpreted as zum beispiel herr fischer von friedrichstraße.

Initialisms

Initialisms with a period (dot) following each letter (e.g. U.S.A.) will be pronounced by spelling out each letter.

Most common initialisms without periods (e.g. USA, ARD) will also be recognized and properly pronounced.

All vowelless words are recognized as initialisms.

Examples

  • F.A.Z. will be pronounced f a z.
  • In den USA will be pronounced in den u s a.
  • ARD will be pronounced a r d.
  • pwq will be pronounced p w q.

Telephone number

LumenVox currently supports various standard (DIN 5008 and E.123) and non-standard telephone number formats. It groups the digits into 2-digit or 3-digit numbers and adds a pause after each group. It also recognizes extension numbers in certain formats and adds the word Durchwahl before such sequences.

Examples

  • 0180-1234050 will be pronounced as null eins, achtzig, einhundertdreiundzwanzig, vierzig, fünfzig.
  • 0201 12-46542 will be pronounced as null zwei, null eins, zwölf, vierhundertfünfundsechzig, zweiundvierzig.
  • +49 (0) 6251 / 1 75 29 - 0 will be pronounced as plus neunundvierzig, null, zweiundsechzig, einundfünfzig, eins, fünfundsiebzig, neunundzwanzig Durchwahl null.
  • +49 30 588459-258 will be pronounced as plus neunundvierzig, dreißig, achtundfünfzig, vierundachtzig, neunundfünfzig Durchwahl zweihundertachtundfünfzig.
  • 0043 5226 2789-20 will be pronounced as null null vier drei, zweiundfünfzig, sechsundzwanzig, siebenundzwanzig, neunundachtzig Durchwahl zwanzig.
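One grouping heuristic that is consistent with most of the examples above is: split a block of digits into pairs, and if the block has an odd number of digits, let the first group take three. The sketch below implements only that heuristic. It is inferred from the examples rather than taken from LumenVox, the name group_digits is hypothetical, and it does not cover the international prefix 0043, which the last example reads digit by digit.

```python
def group_digits(block):
    """Split a digit block into 2-digit groups, with a leading 3-digit
    group when the block has an odd number of digits."""
    if not block.isdigit():
        raise ValueError(f"digits only, got {block!r}")
    head = 3 if len(block) % 2 else 2
    groups = [block[:head]]
    groups += [block[i:i + 2] for i in range(head, len(block), 2)]
    return groups


for block in ("0180", "1234050", "46542", "588459"):
    print(block, "->", group_digits(block))
```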

Identifier

Non-words not described elsewhere will be treated as identifiers. This group includes mixes of letters and digits, such as r121, as well as URLs, e-mail addresses, and unusual proper names unknown to the synthesizer.

Punctuation characters within identifiers will be pronounced.

Examples

  • er125lp will be pronounced er einhundertfünfundzwanzig l p.
  • http://www.lumenvox.com will be pronounced h t t p doppelpunkt schrägstrich schrägstrich w w w punkt lumenvox punkt com.
  • B!0 will be pronounced b ausrufezeichen null.


SSML say-as attribute values

The SSML element say-as gives users the possibility to annotate fragments of text in order to force particular interpretation.

Marking a fragment with say-as disables most of the default normalization rules that would otherwise have been applied. It is therefore advisable to use say-as sparingly, only when the default normalization rules fail and produce speech other than what the user expects.

The W3C has issued a Working Group Note describing SSML 1.0 say-as attribute values, which LumenVox mostly follows.

Date

LumenVox will interpret a value as a date, when used within say-as with interpret-as="date". This works just as defined in the W3C note. The format attribute may be set to any of the following: mdy, dmy, ymd, md, dm, ym, my, y, d, m.

Examples

  • <say-as interpret-as="date" format="mdy">05/02/03</say-as> will be pronounced zweite mai zweitausenddrei.
  • <say-as interpret-as="date" format="dmy">05/02/03</say-as> will be pronounced fünfte februar zweitausenddrei.
  • <say-as interpret-as="date" format="ymd">05/02/03</say-as> will be pronounced dritte februar zweitausendfünf.
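When say-as markup is generated by a program, the text and attribute values should be XML-escaped. The following Python sketch shows one way to produce such elements with the standard library; the helper name say_as and its parameters are hypothetical and not part of any LumenVox API.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text, interpret_as, fmt=None):
    """Wrap `text` in an SSML say-as element with escaped content."""
    attrs = f"interpret-as={quoteattr(interpret_as)}"
    if fmt is not None:
        attrs += f" format={quoteattr(fmt)}"
    return f"<say-as {attrs}>{escape(text)}</say-as>"


print(say_as("05/02/03", "date", "dmy"))
```

The same helper can be used for the other interpret-as values described below by omitting the format argument.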

Duration

Tokens like 7'10" are recognized as durations in minutes and seconds when enclosed in a say-as element with interpret-as="time".

Example

  • <say-as interpret-as="time">1'23"</say-as> will be pronounced eine minute und dreiundzwanzig sekunden.

Character string

LumenVox will read individual characters for text within the say-as element having interpret-as="characters". The format attribute is ignored. The detail attribute may be used to force pauses, as described in the W3C Note.

Examples

  • <say-as interpret-as="characters">achtzig</say-as> will be pronounced a c h t z i g.
  • <say-as interpret-as="characters">1a3BZ7</say-as> will be pronounced eins a drei b z sieben.

Cardinal number

LumenVox will attempt to read values within say-as having interpret-as="cardinal" as cardinal numbers. The format and detail attributes are ignored. Roman numerals are supported.

Examples

  • <say-as interpret-as="cardinal">13</say-as> will be pronounced dreizehn.
  • <say-as interpret-as="cardinal">C</say-as> will be pronounced einhundert.

Ordinal number

LumenVox will attempt to read values within say-as having interpret-as="ordinal" as ordinal numbers. The format and detail attributes are ignored. Roman numerals are supported.

Examples

  • <say-as interpret-as="ordinal">986</say-as> will be pronounced neunhundertsechsundachtzigste.
  • <say-as interpret-as="ordinal">C</say-as> will be pronounced hundertste.

Digit

LumenVox will attempt to read values within say-as having interpret-as="digit" as digits.

Examples

  • <say-as interpret-as="digit">123</say-as> will be pronounced eins zwei drei.
  • <say-as interpret-as="digit">C</say-as> will be pronounced eins null null.

Fraction

LumenVox will interpret values within say-as having interpret-as="fraction" as common fractions. The syntax for fractions is:

["+" | "-"] cardinal "/" cardinal

where cardinal is a number as defined in Cardinal number above.

Examples

  • <say-as interpret-as="fraction">15/2</say-as> will be pronounced fünfzehn zweitel.
  • <say-as interpret-as="fraction">-1/2</say-as> will be pronounced minus ein zweitel.
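The fraction syntax translates directly into a regular expression. The sketch below validates a fraction string and returns its parts; the name parse_fraction is hypothetical, and for brevity the cardinal pattern here omits thousands separators.

```python
import re

# Simplified cardinal: a single 0 or digits not starting with 0.
CARD = r"(?:0|[1-9]\d*)"
FRACTION_RE = re.compile(rf"(?P<sign>[+-])?(?P<num>{CARD})/(?P<den>{CARD})")


def parse_fraction(text):
    """Return (sign, numerator, denominator), or None if the syntax is wrong."""
    match = FRACTION_RE.fullmatch(text)
    if match is None:
        return None
    return match["sign"] or "", int(match["num"]), int(match["den"])


print(parse_fraction("15/2"))
print(parse_fraction("-1/2"))
```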


LumenVox special characters

It is sometimes necessary to modify the text to be synthesized in order to make it compatible with system constraints and achieve the expected output. LumenVox provides a set of special characters that work only in certain contexts and change the way text is synthesized in terms of pronunciation or intonation. The characters are language-specific and do not apply to other languages unless stated otherwise in the documentation for that language.


Replacements of German letters

The non-ASCII German letters ä, ö, ü and ß can be replaced by the two-letter combinations ae, oe, ue and ss respectively, which are commonly used when typing German special characters is difficult or impossible. In most cases both versions should be pronounced identically.

Example

  • Bürger will be pronounced the same way as Buerger.
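If the input pipeline cannot carry non-ASCII characters, the replacement can be applied before the text is sent to the synthesizer. The following is a minimal sketch; the function name to_ascii_german is hypothetical, and the uppercase mappings (Ä to Ae, and so on) are an assumption added for completeness.

```python
# Two-letter replacements for the non-ASCII German letters.
REPLACEMENTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def to_ascii_german(text):
    """Replace umlauts and ß with their two-letter equivalents."""
    return text.translate(REPLACEMENTS)


print(to_ascii_german("Bürger"))
```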


Force rising intonation

A question mark followed by a caret, also known as a circumflex (?^), can be used to force the intonation of a question to be rising. Wh-questions (questions starting with an interrogative pronoun) have falling intonation by default. This can be changed by appending a caret to the question mark.

Example

  • Wo bist du?^ will result in a rising intonation.


Force falling intonation

A question mark followed by an underscore (?_) can be used to force the intonation of a question to be falling. Yes/No questions by default have a rising intonation. This can be changed by appending the underscore character to the question mark.

Example

  • Alles klar?_ will result in a falling intonation.
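Applications that decide the intonation themselves can append the marker programmatically. The helper below is a small sketch with a hypothetical name (force_intonation); it only adds the caret or underscore described above to a question that already ends in a question mark.

```python
def force_intonation(question, rising):
    """Append ^ (rising) or _ (falling) to a question ending in '?'."""
    if not question.endswith("?"):
        raise ValueError("the text must end with a question mark")
    return question + ("^" if rising else "_")


print(force_intonation("Wo bist du?", rising=True))
print(force_intonation("Alles klar?", rising=False))
```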



Copyright (C) 2001-2025, Ai Software, LLC d/b/a LumenVox