"lexicon" Element
<lexicon>
Description
Sets the location of the user-defined pronunciation dictionary.
A lexicon file is an XML file following the PLS specification (http://www.w3.org/TR/pronunciation-lexicon/).
Syntax
<lexicon uri="URI" type="application/pls+xml" xml:lang="locale" xml:id="reference_id" />
Attributes
Attribute | Description |
uri | URI of the dictionary. Required. |
type | Media type of the dictionary. Optional. |
xml:lang | Locale. Optional. |
xml:id | Optional identifier, unique within the document, for this lexicon reference. A <lookup> element can then select the lexicon through its ref attribute, as shown in the examples below. This is an SSML 1.1 extension. |
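As a quick illustration of the attribute table, the fragment below sets all four attributes on one lexicon reference. It is only a sketch: the file path and the identifier "general" are borrowed from the examples later on this page, and the locale value "en-US" is an assumed placeholder.

```xml
<!-- uri is required; type, xml:lang and xml:id are optional -->
<lexicon uri="file:///C:/lexicons/general.pls"
         type="application/pls+xml"
         xml:lang="en-US"
         xml:id="general" />
```

In an SSML 1.0 style document, the xml:id attribute can simply be left out, since there is no <lookup> element to refer to it.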
Parent
<speak>
Children
None
Limitations/Restrictions
Note that the Lexicon element is not supported when using TTS2 voices.
Only lexicons of type application/pls+xml are supported. The <lexicon> element must have a uri attribute that identifies the location of the pronunciation lexicon document. A relative URI is resolved against the optional xml:base attribute of the <speak> element, if one is specified (see examples below).
PLS Lexicon files are loaded separately for each synthesized SSML document. The only lexicons loaded are the ones declared by a <lexicon> element.
LumenVox Text-To-Speech supports PLS 1.0 lexicons referenced from SSML documents, as defined by Pronunciation Lexicon Specification (PLS) Version 1.0, W3C Recommendation 14 October 2008, http://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/.
LumenVox supports SSML 1.0 style lexicon references as well as an extension based on the SSML 1.1 standard, in which a lexicon reference carries an xml:id. The corresponding SSML 1.1 <lookup> element, with its ref attribute, is also supported. Both styles are shown in the examples below.
Example : SSML 1.0 style, local file reference to lexicon...
<?xml version="1.0"?>
<speak version="1.0">
  <!-- references the file "C:\lexicons\general.pls" -->
  <lexicon uri="file:///C:/lexicons/general.pls" />
  <voice name="Amanda" xml:lang="en-US">
    <s>LumenVox has been recognized as a leading innovator of speech technology since <say-as interpret-as="date" format="year">2001</say-as>.</s>
  </voice>
</speak>
Example : SSML 1.1 style, local file reference to lexicon, using lookup element...
<?xml version="1.0"?>
<speak version="1.1">
  <!-- references the file "C:\lexicons\general.pls" -->
  <lexicon uri="file:///C:/lexicons/general.pls" xml:id="general"/>
  <lookup ref="general">
    <voice name="Amanda" xml:lang="en-US">
      <s>LumenVox has been recognized as a leading innovator of speech technology since <say-as interpret-as="date" format="year">2001</say-as>.</s>
    </voice>
  </lookup>
</speak>
Example : SSML 1.1 style, xml:base specifying lexicon folder...
<?xml version="1.0"?>
<speak version="1.1" xml:base="file:///C:/lexicons/">
  <!-- references the file "general.pls" in folder "C:\lexicons\" -->
  <lexicon uri="general.pls" xml:id="general"/>
  <lookup ref="general">
    <voice name="Amanda" xml:lang="en-US">
      <s>LumenVox has been recognized as a leading innovator of speech technology since <say-as interpret-as="date" format="year">2001</say-as>.</s>
    </voice>
  </lookup>
</speak>
Example : SSML 1.1 style, xml:base specifying lexicon folder on web server...
<?xml version="1.0"?>
<speak version="1.1" xml:base="http://192.168.12.12/lexicons/">
  <!-- references the file "general.pls" on the web server -->
  <lexicon uri="general.pls" xml:id="general"/>
  <lookup ref="general">
    <voice name="Amanda" xml:lang="en-US">
      <s>LumenVox has been recognized as a leading innovator of speech technology since <say-as interpret-as="date" format="year">2001</say-as>.</s>
    </voice>
  </lookup>
</speak>
Example : SSML 1.0 style, lexicon URI referencing web server (with xml:base)...
<?xml version="1.0"?>
<speak version="1.0" xml:base="http://192.168.12.12/lexicons/">
  <!-- references the file "general.pls" on the web server -->
  <lexicon uri="general.pls" />
  <voice name="Amanda" xml:lang="en-US">
    <s>LumenVox has been recognized as a leading innovator of speech technology since <say-as interpret-as="date" format="year">2001</say-as>.</s>
  </voice>
</speak>
Example : SSML 1.0 style, lexicon URI referencing web server (without xml:base)...
<?xml version="1.0"?>
<speak version="1.0">
  <!-- references the file "general.pls" on the web server -->
  <lexicon uri="http://192.168.12.12/lexicons/general.pls" />
  <voice name="Amanda" xml:lang="en-US">
    <s>LumenVox has been recognized as a leading innovator of speech technology since <say-as interpret-as="date" format="year">2001</say-as>.</s>
  </voice>
</speak>
Example Lexicon...
<?xml version="1.0" encoding="UTF-8" ?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="x-sampa" xml:lang="en-US">
  <lexeme>
    <grapheme>LumenVox</grapheme>
    <phoneme> lUUmEnvO:ks </phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Cool Guys</grapheme>
    <alias> LumenVox </alias>
  </lexeme>
</lexicon>
You can copy this example lexicon as a starting point and then add your own lexemes to it. See our pages on the phonetic alphabets available for building the <phoneme> elements of the lexicon.
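When a lexeme only needs to expand or substitute text, an <alias> avoids writing a phonetic transcription at all. The sketch below follows the structure of the example lexicon above; the graphemes and alias texts are illustrative placeholders, not entries from any LumenVox lexicon.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="x-sampa" xml:lang="en-US">
  <!-- the alias text is spoken in place of the grapheme -->
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme>
    <grapheme>TTS</grapheme>
    <alias>text to speech</alias>
  </lexeme>
</lexicon>
```

The alphabet attribute is kept on the root element because PLS 1.0 requires it, even though this lexicon contains no <phoneme> elements.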
Multiple Lexicons in SSML 1.0
To work with multiple lexicons in a single SSML 1.0 document, group the portions of the document according to which lexicon should be used. For example:
<?xml version="1.0"?>
<speak version="1.0">
  <!-- first lexicon used in following 'voice' section -->
  <lexicon uri="http://192.168.12.12/lexicons/first_lexicon.pls" />
  <voice name="Amanda" xml:lang="en-US">
    <s>Some phrase using words in first lexicon.</s>
  </voice>
  <!-- second lexicon used in following 'voice' section -->
  <lexicon uri="http://192.168.12.12/lexicons/second_lexicon.pls" />
  <voice name="Amanda" xml:lang="en-US">
    <s>Some phrase using words in second lexicon.</s>
  </voice>
  <!-- third lexicon used in following 'voice' section -->
  <lexicon uri="http://192.168.12.12/lexicons/third_lexicon.pls" />
  <voice name="Amanda" xml:lang="en-US">
    <s>Some phrase using words in third lexicon.</s>
  </voice>
</speak>
As this example shows, each section of text to be synthesized is contained in a <voice> element, which is preceded by its corresponding lexicon reference.
Each <voice> section uses the most recent lexicon reference that precedes it. Remember that <lexicon> elements must be children of the <speak> element (not nested within <voice> elements). <voice> elements are likewise children of <speak>, and each should follow the lexicon reference it uses.
Note:
The syntax shown in this example is specific to SSML 1.0, since SSML 1.1 was changed to make use of the <lookup> element for selectively activating lexicons as described earlier in this article.
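For comparison, an SSML 1.1 style document could declare every lexicon up front and select one per section with <lookup>. This is a sketch based on the SSML 1.1 examples earlier in this article, not a configuration verified against a LumenVox server; the identifiers "first" and "second" are assumed names, and the URIs are the placeholders used above.

```xml
<?xml version="1.0"?>
<speak version="1.1">
  <!-- each lexicon is declared once, with its own xml:id -->
  <lexicon uri="http://192.168.12.12/lexicons/first_lexicon.pls" xml:id="first"/>
  <lexicon uri="http://192.168.12.12/lexicons/second_lexicon.pls" xml:id="second"/>
  <!-- each lookup activates the lexicon named by its ref attribute -->
  <lookup ref="first">
    <voice name="Amanda" xml:lang="en-US">
      <s>Some phrase using words in first lexicon.</s>
    </voice>
  </lookup>
  <lookup ref="second">
    <voice name="Amanda" xml:lang="en-US">
      <s>Some phrase using words in second lexicon.</s>
    </voice>
  </lookup>
</speak>
```

Here the order of the <lexicon> declarations no longer determines which lexicon applies; the ref attribute of the enclosing <lookup> does.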