Vendor specific parameters

As defined in the MRCP specifications, a set of headers allows the client to adjust vendor-specific parameters. These headers may be sent in SET-PARAMS and GET-PARAMS methods.

The following parameters are LumenVox-specific extensions to the MRCP specification. They can be controlled via the media_server.conf file, located in the config directory of the Windows LumenVox installation folder. By default, it is located at mrcp-api/docker/lumenvox/media_server.conf.

They may also be set with the appropriate header as part of a RECOGNIZE or SET-PARAMS method; see Specifying Vendor-Specific Properties via MRCP Headers below.

See Configuration Parameters for more information about changing various MRCP parameters.

wind-back-time

The length of audio wound back at the beginning of voice.

This helps when speech has a weak onset. The value is specified in milliseconds. The resolution of this parameter is 40 ms, and the value is rounded to the closest multiple of 40 ms: setting it to 139 ms is the same as setting it to 120 ms, and setting it to 141 ms is the same as setting it to 160 ms.

Range: >0

Default: 480
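
The 40 ms rounding described for wind-back-time can be illustrated with a short Python sketch. The function name is hypothetical and is not part of any LumenVox API, and the behavior for a value exactly halfway between two multiples (for example 140 ms) is an assumption, since it is not stated here.

```python
STEP_MS = 40  # resolution of wind-back-time, as stated above


def round_wind_back_time(requested_ms: int) -> int:
    """Round a requested wind-back-time to the closest multiple of 40 ms.

    Hypothetical helper for illustration only. Ties (such as 140 ms) are
    rounded up here, which is an assumption. Requests below 20 ms round
    to 0 in this sketch; how the server treats them is not documented here.
    """
    if requested_ms <= 0:
        raise ValueError("wind-back-time must be greater than 0")
    # Integer arithmetic: add half a step, then truncate to a whole step.
    return (requested_ms + STEP_MS // 2) // STEP_MS * STEP_MS


print(round_wind_back_time(139))  # 120
print(round_wind_back_time(141))  # 160
```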

snr-sensitivity-lvl

This setting controls the minimum signal-to-noise ratio (SNR) that streamed audio data must have before it is processed to determine whether it is speech. Data below this threshold is automatically assumed to be silence or noise. The noise estimate for the calculation is obtained from the initial silence period specified by STREAM_PARM_VAD_STREAM_INIT_DELAY. The higher the value, the harder it is to barge in. The default value of 50 equals 5 dB SNR. The parameter range maps to SNR values between 3.5 dB and 20 dB. If the application is expected to run in a very noisy environment and speech is not expected to be much louder than the background, this setting may need to be lowered. If speech is expected to be much louder than the surrounding noise, raising this value allows the VAD to ignore lower-volume background speech or babble noise that might otherwise cause barge-in.

Note that this parameter can be set in the range 0-100, with higher values (closer to 100) being more sensitive to barge-in in noisy situations with low SNR (where speech and background noise are at similar levels).

Range: 0-100

Note that the LumenVox setting (0 is most sensitive) is opposite to the snr-sensitivity-lvl setting (100 is most sensitive). This vendor-specific setting should also not be confused with the similar MRCP Sensitivity-Level header, which affects the STREAM_PARM_VAD_VOLUME_SENSITIVITY setting in the API.

Default: 50

vad-stream-init-delay

The length of audio (in milliseconds) that the VAD module uses to estimate the acoustic environment.

Accurate VAD depends on a good estimate of the acoustic environment. The VAD module uses the first few frames of audio to estimate characteristics of the acoustic environment, such as the noise level. The length of this period is defined by this parameter.

Range: >0

Default: 100

vad-bargein-threshold

VAD speech sensitivity setting.

A higher value requires the VAD to be more certain that the data is speech before triggering barge-in. Raising the value will reject more false positives and noise. However, it may also mean that some borderline speech is rejected. This value should not be changed from the default without significant tuning and verification.

Range: 0 - 100 (MRCP v1 and MRCP v2)

Default: 50

compatibility_mode

Enables compatibility encoding of results.

This option may need to be enabled to match the output of LumenVox decodes with those of other vendors.

Please contact LumenVox support for more specific details.

Default: 0

end-of-speech-timeout

Controls the end-of-speech timeout setting.

This value affects the underlying STREAM_STATUS_END_SPEECH_TIMEOUT of the speech port, which is used in an MRCP ASR recognition session.

After barge-in, the streaming interface will flag STREAM_STATUS_END_SPEECH_TIMEOUT if it did not detect end-of-speech within the time specified by this property. This is different from the end-of-speech delay; STREAM_PARM_END_OF_SPEECH_TIMEOUT represents the total amount of time a caller has to speak after barge-in is detected.

Default: -1 (infinite)
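
As a rough illustration, the timeout rule can be sketched in Python, reading it as: the timeout is flagged when end-of-speech has not been detected within the configured time after barge-in. The function and parameter names are hypothetical, the use of milliseconds as the unit is an assumption (the unit is not stated here), and this is not the server's actual implementation.

```python
def end_speech_timeout_reached(elapsed_since_bargein_ms: int,
                               end_of_speech_detected: bool,
                               timeout_ms: int = -1) -> bool:
    """Return True if the end-of-speech timeout would be flagged.

    Hypothetical model for illustration. A negative timeout (the default
    of -1) is treated as infinite, so the timeout never fires.
    """
    if timeout_ms < 0:
        return False  # -1 means the caller may speak indefinitely
    if end_of_speech_detected:
        return False  # speech ended normally, so there is no timeout
    # Assumption: the timeout fires once the elapsed time reaches the limit.
    return elapsed_since_bargein_ms >= timeout_ms


print(end_speech_timeout_reached(12000, False, timeout_ms=10000))  # True
print(end_speech_timeout_reached(12000, False))                    # False
```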


secure_context

Enables suppression of potentially sensitive ASR data.

When enabled, this option will prevent logging of any potentially sensitive data to either log files or callsre data files, which includes any associated audio segments. Where potentially sensitive data would have appeared, the word _SUPPRESSED will replace the potentially sensitive data to indicate that suppression occurred.

Possible Values:

  • 0 - Disabled. Normal logging will be performed
  • 1 - Secure Context mode enabled. Sensitive data will be suppressed

Default: 0

tts.secure_context

Enables suppression of potentially sensitive TTS data.

When enabled, this option will prevent logging of any potentially sensitive data to either log files or callsre data files, which includes any associated audio segments. Where potentially sensitive data would have appeared, the word _SUPPRESSED will replace the potentially sensitive data to indicate that suppression occurred.

Possible Values:

  • 0 - Disabled. Normal logging will be performed
  • 1 - Secure Context mode enabled. Sensitive data will be suppressed

Default: 0

enable-sre-logging

This parameter has been deprecated; please see Data Archiving.

callsre-prefix

Allows the addition of a custom string prefix to the beginning of the Response File filename for the current session.

When specified, this option will add the specified prefix to Response Files generated for the current session. This may be useful when identifying certain specific calls, such as those belonging to a certain application or customer controlled category.

Note that the callsre-prefix and callsre-suffix options are both independent, so can be used individually, together, or not at all, as needed.

Possible Values:

  • A string containing valid filename characters (avoid reserved characters)

Default

callsre-suffix

Allows the addition of a custom string suffix to the end of the Response File filename for the current session.

Similar to callsre-prefix, when specified, this option will add the specified suffix to Response Files generated for the current session. This may be useful when identifying certain specific calls, such as those belonging to a certain application, or some customer controlled category.

Note that the callsre-prefix and callsre-suffix options are both independent, so can be used individually, together, or not at all, as needed.

Possible Values:

  • A string containing valid filename characters (avoid reserved characters)

Default
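
Because callsre-prefix and callsre-suffix values become part of a filename, it can help to check them before sending. The sketch below is a hypothetical client-side check, not LumenVox behavior: it rejects the characters that Windows reserves in filenames, and it also rejects ";" and "=" on the assumption that they could be confused with the delimiters used in the Vendor-Specific header.

```python
# Characters that Windows does not allow in filenames.
WINDOWS_RESERVED = set('<>:"/\\|?*')
# Assumption: these could clash with the Vendor-Specific header syntax.
HEADER_DELIMITERS = set(';=')


def is_safe_filename_fragment(value: str) -> bool:
    """Return True if value looks safe to use as a callsre prefix or suffix.

    Hypothetical helper for illustration only.
    """
    if not value:
        return False
    for ch in value:
        # Control characters (code points below 32) are also rejected.
        if ch in WINDOWS_RESERVED or ch in HEADER_DELIMITERS or ord(ch) < 32:
            return False
    return True


print(is_safe_filename_fragment("billing-app_"))  # True
print(is_safe_filename_fragment("bad/name"))      # False
```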

logging-verbosity

Controls the logging verbosity within the current session.

When used, this option allows users to override the default LOGGING_VERBOSITY setting in client_property.conf, which is used to control the verbosity of log messages. This flexible option therefore allows users to independently control the amount of logging generated for individual sessions. It can even be used to control interactions and requests within a single session so that, for example, only certain recognition requests are logged with specified verbosity, if this is needed.

Possible Values (same as LOGGING_VERBOSITY setting):

  • 1 - Minimal Logging - only errors and critical issues
  • 2 - Medium Logging - all non-debug information as events occur
  • 3 - Debug Logging - all types of events, including information and debugging activity
  • 4 and higher values - Higher levels of logging verbosity, typically useful only to LumenVox

Default: (as specified by LOGGING_VERBOSITY setting)

sticky-save-waveform

Allows overriding the platform's default SAVE-WAVEFORM setting.

Possible Values:

  • True - If set to true, the save-waveform option will be set to true for the remainder of the MRCP session, regardless of the save-waveform header value
  • False - If set to false, the save-waveform option will be set to false for the remainder of the MRCP session, regardless of the save-waveform header value

Default
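
The override rule for sticky-save-waveform can be expressed as a small Python sketch. The function name is hypothetical, and the fallback to the save-waveform header value when sticky-save-waveform has not been set is an assumption, since no default is stated here.

```python
from typing import Optional


def effective_save_waveform(sticky: Optional[bool], header_value: bool) -> bool:
    """Return the save-waveform value that would apply to a request.

    Hypothetical model for illustration. `sticky` is None when
    sticky-save-waveform has not been set in the session.
    """
    if sticky is None:
        # Assumption: without a sticky value, the header value is used.
        return header_value
    # A sticky value wins regardless of the save-waveform header value.
    return sticky


print(effective_save_waveform(True, False))   # True
print(effective_save_waveform(None, False))   # False
```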


Specifying Vendor-Specific Properties via MRCP Headers

As mentioned previously, you may specify the above parameters in an MRCP header. You must use the following format. Note that a semicolon (";") is used as the delimiter:

Vendor-Specific: com.lumenvox.wind-back-time=300;com.lumenvox.vad-stream-init-delay=200
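
A client could assemble this header line from a set of parameters. The following is a minimal Python sketch; the function name is hypothetical and the code only formats the string, it does not send an MRCP request.

```python
PREFIX = "com.lumenvox."


def build_vendor_specific_header(params: dict) -> str:
    """Format parameters as a Vendor-Specific header line.

    Hypothetical helper for illustration. Keys are the parameter names
    without the com.lumenvox. prefix, and pairs are joined with ";".
    """
    pairs = [f"{PREFIX}{name}={value}" for name, value in params.items()]
    return "Vendor-Specific: " + ";".join(pairs)


header = build_vendor_specific_header(
    {"wind-back-time": 300, "vad-stream-init-delay": 200}
)
print(header)
# Vendor-Specific: com.lumenvox.wind-back-time=300;com.lumenvox.vad-stream-init-delay=200
```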

This header field may be specified in a RECOGNIZE, recognizer SET-PARAMS, or synthesizer SET-PARAMS method during an MRCP session. The following header field names may be used:

com.lumenvox.wind-back-time
com.lumenvox.snr-sensitivity-lvl
com.lumenvox.vad-stream-init-delay
com.lumenvox.vad-bargein-threshold
com.lumenvox.compatibility-mode
com.lumenvox.end-of-speech-timeout
com.lumenvox.secure_context
com.lumenvox.tts.secure_context
com.lumenvox.enable-sre-logging
com.lumenvox.callsre-prefix
com.lumenvox.callsre-suffix
com.lumenvox.logging-verbosity
com.lumenvox.sticky-save-waveform
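
To catch misspelled names before a request is sent, a client could parse a header line and check each name against the list above. This is a minimal Python sketch with hypothetical function and variable names; the names in the set are copied from the list above, and the check is a client-side convenience, not a description of how the server validates headers.

```python
KNOWN_NAMES = {
    "com.lumenvox.wind-back-time",
    "com.lumenvox.snr-sensitivity-lvl",
    "com.lumenvox.vad-stream-init-delay",
    "com.lumenvox.vad-bargein-threshold",
    "com.lumenvox.compatibility-mode",
    "com.lumenvox.end-of-speech-timeout",
    "com.lumenvox.secure_context",
    "com.lumenvox.tts.secure_context",
    "com.lumenvox.enable-sre-logging",
    "com.lumenvox.callsre-prefix",
    "com.lumenvox.callsre-suffix",
    "com.lumenvox.logging-verbosity",
    "com.lumenvox.sticky-save-waveform",
}


def parse_vendor_specific(header_line: str) -> dict:
    """Split a Vendor-Specific header line into a name-to-value dict.

    Hypothetical helper for illustration. Raises ValueError for a wrong
    header name, a pair without "=", or a name not in KNOWN_NAMES.
    """
    name, _, value = header_line.partition(":")
    if name.strip() != "Vendor-Specific":
        raise ValueError(f"not a Vendor-Specific header: {name!r}")
    result = {}
    for pair in value.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, sep, val = pair.partition("=")
        key = key.strip()
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        if key not in KNOWN_NAMES:
            raise ValueError(f"unknown parameter name: {key!r}")
        result[key] = val.strip()
    return result


parsed = parse_vendor_specific(
    "Vendor-Specific: com.lumenvox.wind-back-time=300;"
    "com.lumenvox.vad-stream-init-delay=200"
)
print(parsed)
```

Values are returned as strings; converting them to numbers or booleans is left to the caller.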



Copyright (C) 2001-2024, Ai Software, LLC d/b/a LumenVox