SRGS best practices

SRGS grammars, especially when combined with SISR, allow for the creation of very complex and flexible documents. The underlying logic of rule expansions and rule references makes writing SRGS grammars almost like developing small applications, and adding JavaScript through SISR makes this even more true.

There are a number of traps new grammar writers often fall into that can be avoided with a few best practices. Consider a grammar designed to capture a number with two digits:

root $TwoDigits;

$TwoDigits = {out=0}
([$TensDigit {out+=rules.latest()}] [$OnesDigit {out+=rules.latest()}] |
$TeensDigit {out+=rules.latest()} |
[$OnesDigit {out+=rules.latest()*10}] [$OnesDigit {out+=rules.latest()}]) {out=out.toString()};
$TensDigit =
ten {out=10} | twenty {out=20} | thirty {out=30} | forty {out=40} | fifty {out=50} | sixty {out=60} | seventy {out=70} | eighty {out=80} | ninety {out=90};
$TeensDigit =
eleven {out=11} | twelve {out=12} | thirteen {out=13} | fourteen {out=14} | fifteen {out=15} | sixteen {out=16} | seventeen {out=17} | eighteen {out=18} | nineteen {out=19};
$OnesDigit =
zero {out=0} | one {out=1} | two {out=2} | three {out=3} | four {out=4} | five {out=5} | six {out=6} | seven {out=7} | eight {out=8} | nine {out=9};


Avoid Ambiguity

Any input to a grammar should have only one valid parse. The Grammar Editor tool, included with the LumenVox Speech Tuner, can show how many parses a grammar returns for a given input.

The more parses a grammar has, the longer it takes the Engine to decode an utterance against it. It also decreases accuracy. As the number of valid parses increases, decode time can increase dramatically.

The grammar above correctly handles inputs such as "two one" or "twenty one." But if a caller says just "one", the grammar allows more than one valid parse, because the last alternative of the root rule contains two optional $OnesDigit references and the word can match either one. Each of these parses has a different interpretation: matching the first $OnesDigit multiplies the value by 10 and returns a result of 10, while matching the second returns a result of 1.

This sort of ambiguity not only increases decode time while decreasing accuracy, it also makes it harder for your application to correctly handle results. You would probably not expect a caller saying "one" to return a result of "10", but that is precisely one thing this grammar allows for.


Eliminate Unwanted Parses

You clearly would not want to allow the example above, where "one" has a valid parse that returns "10." But allowing "one" as a valid parse at all is probably a bad idea if all you want to capture are two-digit strings.

The grammar allows other unwanted parses, such as "twenty zero" (which returns an interpretation of "20") or "ten two" (which returns an interpretation of "12"). Even a null input is a valid parse ("" returns an interpretation of "0").

Unwanted parses slow down decodes and reduce accuracy. It is unlikely that a caller would ever say "twenty zero" or that a developer would want to allow that sort of input. Accounting for these unlikely cases increases the probability that a caller behaving appropriately will be misrecognized. For example, a caller who says "twenty two" might be mistaken for the unreasonable "twenty zero."
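
One way to address the ambiguity and the unwanted parses together is to make every part of each alternative mandatory unless it is genuinely optional, and to separate words that cannot combine (such as "ten" and "zero") from those that can. The following is a minimal sketch rather than a tested grammar: the rule names $Tens, $Teens, $NonZeroDigit and $Digit are illustrative, and whether a digit pair starting with "zero" should be accepted depends on the application (it is rejected here).

```abnf
root $TwoDigits;

// Each alternative below covers a disjoint set of inputs,
// so any accepted input should have exactly one parse.
$TwoDigits = {out=0}
($Tens {out+=rules.latest()} [$NonZeroDigit {out+=rules.latest()}] | // "twenty", "twenty one"
ten {out+=10} |                                                      // "ten" takes no ones digit
$Teens {out+=rules.latest()} |                                       // "eleven" through "nineteen"
$NonZeroDigit {out+=rules.latest()*10} $Digit {out+=rules.latest()}  // "two one"; both digits required
) {out=out.toString()};
$Tens =
twenty {out=20} | thirty {out=30} | forty {out=40} | fifty {out=50} | sixty {out=60} | seventy {out=70} | eighty {out=80} | ninety {out=90};
$Teens =
eleven {out=11} | twelve {out=12} | thirteen {out=13} | fourteen {out=14} | fifteen {out=15} | sixteen {out=16} | seventeen {out=17} | eighteen {out=18} | nineteen {out=19};
$NonZeroDigit =
one {out=1} | two {out=2} | three {out=3} | four {out=4} | five {out=5} | six {out=6} | seven {out=7} | eight {out=8} | nine {out=9};
$Digit =
zero {out=0} | $NonZeroDigit {out=rules.latest()};
```

With this arrangement, "one" on its own, "twenty zero", "ten two" and a null input should no longer parse, while "twenty one", "two one" and "eleven" each keep a single interpretation. It is worth confirming the parse counts in the Grammar Editor before relying on a grammar like this.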


Keep Rules Compact

The larger and more complex rules are, the longer it takes to compile a grammar or decode against it. One good way to keep rules short is to combine alternatives that share common words, where possible. For instance, the following rule:

$name = James Anderson | Jim Anderson | Jimmy Anderson | James | Jim | Jimmy;

can be combined into:

$name = (James | Jim | Jimmy) [Anderson];

While the savings for one rule are relatively small, across large grammars this sort of compactness can add up, decreasing load and decode times.


Be Careful with Recursion

SRGS allows recursive rules, that is, rules that contain references to themselves. Any time you work with recursion, you must be careful to avoid infinite loops. Because the LumenVox grammar parser parses from left to right, you should always avoid left recursion, where a rule's expansion begins with a reference to the rule itself.

For instance, the following rule will match the word "foo" one or more times:

$rule = foo ($rule | $NULL);

If the input is "foo foo," the Engine parses the rule, expanding the reference $rule each time until it matches $NULL and terminates. On the other hand, if your rule is written:

$rule = ($rule | $NULL) foo;

the parser will get caught in an infinite loop. The first thing it attempts to do is expand the $rule reference, which requires expanding it again, and again, ad infinitum.
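
When all you need is simple repetition, recursion can often be avoided entirely with the SRGS repeat operator. The sketch below assumes the standard SRGS ABNF repeat syntax, where <1-> means "one or more times"; the rule name $foos is illustrative.

```abnf
// Matches "foo" one or more times without any self-reference.
$foos = foo <1->;
```

This accepts the same inputs as the right-recursive rule shown earlier, and because the rule never refers to itself there is no expansion order to get wrong.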


Copyright (C) 2001-2024, Ai Software, LLC d/b/a LumenVox