Applying grammar weights

Ultimately, the Speech Engine is just a probability machine.

When the Engine decodes audio input, it compares the sounds in the audio to its phoneme tables to determine which phonemes are present. Using the grammars as a guide, the Engine then computes the probability that a series of sounds matches a word in the grammar.

You can modify the probabilities in an SRGS grammar by applying weights to words, phrases, and rules. By weighting parts of the grammar, you can make the Engine more or less likely to match audio to specific grammar items.

As an example, suppose we have a grammar that recognizes a person speaking a number that is four digits long:

ABNF Example

#ABNF 1.0;
language en-US;
mode voice;
root $number;

$one_digit = zero | one | two | three | four | five | six | seven | eight | nine;

$teens = ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen;
$above_twenty = (twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety)[$one_digit];
$double_digit = $teens | $above_twenty;

$single_digits = $one_digit<4>; //e.g. one two three four
$double_digits = $double_digit<2>; //e.g. twelve thirty four
$single_double = $one_digit<2> $double_digit; //e.g. one two thirty four
$double_single = $double_digit $one_digit<2>; //e.g. twelve three four

 $number = $single_digits | $double_digits | $single_double | $double_single;


GrXML Example

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="number" mode="voice">

<rule id="one_digit">
<one-of>
<item>zero</item>
<item>one</item>
<item>two</item>
<item>three</item>
<item>four</item>
<item>five</item>
<item>six</item>
<item>seven</item>
<item>eight</item>
<item>nine</item>
</one-of>
</rule>

<rule id="teens">
<one-of>
<item>ten</item>
<item>eleven</item>
<item>twelve</item>
<item>thirteen</item>
<item>fourteen</item>
<item>fifteen</item>
<item>sixteen</item>
<item>seventeen</item>
<item>eighteen</item>
<item>nineteen</item>
</one-of>
</rule>

<rule id="above_twenty">
<one-of>
<item>twenty</item>
<item>thirty</item>
<item>forty</item>
<item>fifty</item>
<item>sixty</item>
<item>seventy</item>
<item>eighty</item>
<item>ninety</item>
</one-of>
<item repeat="0-1"><ruleref uri="#one_digit"/></item>
</rule>

<rule id="double_digit">
<one-of>
<item><ruleref uri="#teens"/></item>
<item><ruleref uri="#above_twenty"/></item>
</one-of>
</rule>

<rule id="single_digits"> <!-- e.g. one two three four -->
<item repeat="4"><ruleref uri="#one_digit"/></item>
</rule>

<rule id="double_digits"> <!-- e.g. twelve thirty four -->
<item repeat="2"><ruleref uri="#double_digit"/></item>
</rule>

<rule id="single_double"> <!-- e.g. one two thirty four -->
<item repeat="2"><ruleref uri="#one_digit"/></item>
<item><ruleref uri="#double_digit"/></item>
</rule>

<rule id="double_single"> <!-- e.g. twelve three four -->
<item><ruleref uri="#double_digit"/></item>
<item repeat="2"><ruleref uri="#one_digit"/></item>
</rule>

<rule id="number">
<one-of>
<item><ruleref uri="#single_digits"/></item>
<item><ruleref uri="#double_digits"/></item>
<item><ruleref uri="#single_double"/></item>
<item><ruleref uri="#double_single"/></item>
</one-of>
</rule>

</grammar> 


This is a flexible grammar, but if you used it in practice you might be disappointed. You might notice that utterances like "four three" are too often misrecognized as "forty." In general, your callers may be speaking utterances that match single_digits the majority of the time, but the ASR too frequently returns a result that matches one of the other three rules.

You can help the ASR get the right answer more frequently by adding a weight to predispose it to choose the single_digits rule.

Weights are numeric. In an ABNF grammar they are entered between two forward slashes (the / character); in GrXML they are set with the weight attribute on <item> elements. Weights specify how much more or less likely one item is to be matched than another; in this sense weights are relative to other weights. Items are assumed to have a weight of 1 if no weight is specified.

So if one item is given a weight of 2 and a second item a weight of 1, the first item is twice as likely to be recognized as the second. Likewise, you could assign the items weights of 200 and 100 and it would have the same effect.
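The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration of how relative weights reduce to the same shares; the function name relative_shares is hypothetical and is not part of the Engine or of SRGS.

```python
from fractions import Fraction

def relative_shares(weights):
    """Divide each raw weight by the total so the shares sum to 1.

    Illustrative only: this shows why only the ratio between weights
    matters, not how the Engine computes anything internally.
    """
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}

small = relative_shares({"first": 2, "second": 1})
large = relative_shares({"first": 200, "second": 100})

print(small["first"], small["second"])  # 2/3 1/3
print(small == large)                   # True
```

Because both sets of weights reduce to the same 2:1 ratio, the choice between them is a matter of readability.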

Suppose that callers match the single_digits rule five times as often as the other rules. We could weight the grammar to reflect this:

ABNF Example

#ABNF 1.0;
language en-US;
mode voice;
root $number;

$one_digit = zero | one | two | three | four | five | six | seven | eight | nine;
$teens = ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen;
$above_twenty = (twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety)[$one_digit];
$double_digit = $teens | $above_twenty;

$single_digits = $one_digit<4>; //one two three four
$double_digits = $double_digit<2>; //twelve thirty four
$single_double = $one_digit<2> $double_digit; //one two thirty four
$double_single = $double_digit $one_digit<2>; //twelve three four

$number = /50/ $single_digits | /10/ ($double_digits | $single_double | $double_single);

/**********************************************************
* You could also write the weights as:
* $number = /5/ $single_digits | $double_digits | $single_double | $double_single;
**********************************************************/


GrXML Example

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="number" mode="voice">

<rule id="one_digit">
 <one-of>
  <item>zero</item>
  <item>one</item>
  <item>two</item>
  <item>three</item>
  <item>four</item>
  <item>five</item>
  <item>six</item>
  <item>seven</item>
  <item>eight</item>
  <item>nine</item>
 </one-of>
</rule>

<rule id="teens">
 <one-of>
  <item>ten</item>
  <item>eleven</item>
  <item>twelve</item>
  <item>thirteen</item>
  <item>fourteen</item>
  <item>fifteen</item>
  <item>sixteen</item>
  <item>seventeen</item>
  <item>eighteen</item>
  <item>nineteen</item>
 </one-of>
</rule>

<rule id="above_twenty">
 <one-of>
  <item>twenty</item>
  <item>thirty</item>
  <item>forty</item>
  <item>fifty</item>
  <item>sixty</item>
  <item>seventy</item>
  <item>eighty</item>
  <item>ninety</item>
 </one-of>
 <item repeat="0-1"><ruleref uri="#one_digit"/></item>
</rule>

<rule id="double_digit">
 <one-of>
  <item><ruleref uri="#teens"/></item>
  <item><ruleref uri="#above_twenty"/></item>
 </one-of>
</rule>

<rule id="single_digits"> <!-- e.g. one two three four -->
 <item repeat="4"><ruleref uri="#one_digit"/></item>
</rule>

<rule id="double_digits"> <!-- e.g. twelve thirty four -->
 <item repeat="2"><ruleref uri="#double_digit"/></item>
</rule>

<rule id="single_double"> <!-- e.g. one two thirty four -->
 <item repeat="2"><ruleref uri="#one_digit"/></item>
 <item><ruleref uri="#double_digit"/></item>
</rule>

<rule id="double_single"> <!-- e.g. twelve three four -->
 <item><ruleref uri="#double_digit"/></item>
 <item repeat="2"><ruleref uri="#one_digit"/></item>
</rule>

<rule id="number">
 <one-of>
  <item weight="50"><ruleref uri="#single_digits"/></item>
  <item weight="10">
   <one-of>
    <item><ruleref uri="#double_digits"/></item>
    <item><ruleref uri="#single_double"/></item>
    <item><ruleref uri="#double_single"/></item>
   </one-of>
  </item>
 </one-of>
</rule>

<!-- You could also write the weights as:

 <item weight="5"><ruleref uri="#single_digits"/></item>
 <item><ruleref uri="#double_digits"/></item> 
 <item><ruleref uri="#single_double"/></item>
 <item><ruleref uri="#double_single"/></item>

-->
</grammar>

Now, in cases where the Engine has a borderline decision to make between matching single_digits and one of the other rules, it will more frequently choose single_digits. We weighted the rules with a 5:1 ratio because actual data showed that our callers were saying one rule five times as often as the others.

Weights are most useful when two items sound similar and are thus likely to be confused. Applied properly, they affect the outcome of a recognition only when the Engine has a close choice between two items. For this reason, it is a good idea to avoid very high or very low numbers for weights, unless you are weighting all the rules accordingly. If you were to weight one rule at 10,000 and leave all the other rules with the default weight of 1, the Engine would likely match every utterance to the rule with the extremely high weight, regardless of what was said.
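A toy model makes the difference between a moderate and an extreme weight concrete. The sketch below assumes that a candidate's final score is its acoustic score multiplied by its normalized weight; that formula, the pick function, and all of the scores are assumptions made for illustration, not a description of how the Engine actually combines evidence.

```python
def pick(acoustic_scores, weights):
    """Return the item with the highest weighted score.

    Assumption: score = acoustic score * normalized weight. A real
    engine combines evidence in its own way; this is a toy model.
    """
    total = sum(weights.values())
    weighted = {
        item: acoustic_scores[item] * weights[item] / total
        for item in acoustic_scores
    }
    return max(weighted, key=weighted.get)

moderate = {"single_digits": 5, "double_single": 1}
extreme = {"single_digits": 10_000, "double_single": 1}

# Hypothetical acoustic scores for two utterances.
close_call = {"single_digits": 0.45, "double_single": 0.55}
clear_call = {"single_digits": 0.02, "double_single": 0.98}

print(pick(close_call, moderate))  # single_digits
print(pick(clear_call, moderate))  # double_single
print(pick(clear_call, extreme))   # single_digits
```

In this model the 5:1 weight tips the close call toward single_digits but leaves the clear call alone, while the 10,000:1 weight overrides even strong acoustic evidence.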

If you give rules weights below 1, it can become very difficult for the Engine to match them, as this is effectively a negative weight. In addition to trying to match sounds to the phonemes in a grammar, the Engine also tries to match audio to noise, which it discards. If you apply very strong negative weights to rules, the Engine can end up almost always favoring noise over the negatively weighted rules.
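One way to read "effectively a negative weight" is in logarithmic terms: the logarithm of any value below 1 is negative, the logarithm of 1 is zero, and the logarithm of any value above 1 is positive. The sketch below shows only that arithmetic. Whether the Engine scores in the log domain is an assumption here, and log_effect is a hypothetical helper.

```python
import math

def log_effect(weight):
    """Classify a weight by the sign of its logarithm (illustrative only)."""
    log_weight = math.log(weight)
    if log_weight < 0:
        return "penalty"
    if log_weight > 0:
        return "boost"
    return "neutral"

for weight in (0.01, 0.5, 1, 2, 100):
    print(weight, round(math.log(weight), 2), log_effect(weight))
```

Seen this way, the default weight of 1 is neutral, and the further a weight falls below 1, the larger the penalty.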

Do Not Apply Weights Without Data

Applying grammar weights should never be the first thing you do to your grammar. Initially, you don't know how often each rule will be matched, so you are better off letting all rules be treated equally. Only after you have a compelling amount of data to suggest that applying grammar weights will help the application, as we did above, should you apply them. And after you do apply them, you must test their effects on real call data. Badly applied weights are worse than no weights at all.
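As a rough sketch of turning call data into weights, the Python below counts how often each rule was matched and expresses every count relative to the least frequent rule. The tallies are made up to mirror the 5:1 example above, and weights_from_counts is a hypothetical helper; how you collect the matched rule for each call depends on your own logging.

```python
from collections import Counter

def weights_from_counts(matched_rules):
    """Express each rule's match count relative to the rarest rule."""
    counts = Counter(matched_rules)
    baseline = min(counts.values())
    return {rule: round(n / baseline, 1) for rule, n in counts.items()}

# Hypothetical tally of which rule each transcribed call matched.
observed = (
    ["single_digits"] * 500
    + ["double_digits"] * 100
    + ["single_double"] * 100
    + ["double_single"] * 100
)

print(weights_from_counts(observed))
# {'single_digits': 5.0, 'double_digits': 1.0, 'single_double': 1.0, 'double_single': 1.0}
```

Ratios like these are only a starting point; the weighted grammar still needs to be tested against real call data.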


Copyright (C) 2001-2024, Ai Software, LLC d/b/a LumenVox