Release notes 5.4.0
Release date: 19th February 2025
Summary
This page describes the changes, new features, and bug fixes in the LumenVox Containers version 5.4.0 release. This release applies to Speech products. It is not yet available for Voice Biometric products; support will follow in an upcoming release. These release notes also include the following patches:
- asr: 5.3.1, 5.3.2
- deployment: 5.3.1
- grammar: 5.3.1, 5.3.2
- mrcp-api: 5.3.1
- neural-tts: 5.3.1
This release builds on the 5.3.0 release; see LumenVox Containers Release Notes 5.3.0.
Highlights
- LumenVox now supports the .gsm audio format
- TLS communication to RabbitMQ enabled (Apple request)
- Support added for X.509 Mongo authentication with a TLS-encrypted passphrase (Apple request)
- Mongo sharding has been implemented (Apple request)
- Prometheus metrics added for MRCP API
- Performance improvements for enhanced transcription
- Grammar loading speed improved, and a ripcord implemented that returns an error for grammars exceeding a predefined limit
- ASR memory growth optimization
- FinalResults message added when continuous transcription is ended with a Finalize API call
- Analysis portal now supports Continuous Transcription interactions
- New Spanish encoder model (4.1.2) released to improve recognition of audio containing single words
- Support added for the en-IE (Ireland) dialect
*Note: For TTS, we recommend that text not exceed 4mb (roughly 1300 characters with spaces, or around 250 words).
**Note: For Transcription, we recommend that users not exceed 90 minutes of transcribed audio due to gRPC size limits.
What's new on LumenVox Cloud 5.4.0
New features
- LumenVox now supports the .gsm audio format
- TLS communication to RabbitMQ enabled (Apple request)
- Support added for X.509 Mongo authentication with a TLS-encrypted passphrase (Apple request)
- Mongo sharding has been implemented (Apple request)
- Performance improvements for enhanced transcription
- Prometheus metrics added for MRCP API:
  - mrcp_total_requests
  - mrcp_active_requests
  - mrcp_average_request_process_time_dist
  - mrcp_total_responses_returned
  - mrcp_max_calls
  - mrcp_sip_calls
  - mrcp_sip_tcp_connections
  - mrcp_rtsp_calls
  - mrcp_garbage_collection_calls
- Grammar loading speed improved, and a ripcord implemented that returns an error for grammars exceeding a predefined limit
- ASR memory growth optimization
- FinalResults message added when continuous transcription is ended with a Finalize API call
- Analysis portal now supports Continuous Transcription interactions
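Once the MRCP API metrics are scraped, the mrcp_* series can be picked out of the Prometheus text exposition. The sketch below is illustrative only: the metrics endpoint URL is not given in these notes, so it starts from already-scraped text, and it assumes simple sample lines of the form "name value" or "name{labels} value" with no trailing timestamp. The function name filter_metrics and the sample values are hypothetical.

```python
from typing import Dict


def filter_metrics(exposition: str, prefix: str = "mrcp_") -> Dict[str, float]:
    """Return {series: value} for samples whose metric name starts with prefix.

    Assumes simple Prometheus text-format sample lines, either
    'name value' or 'name{labels} value', with no trailing timestamp.
    """
    samples: Dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        # The metric name ends where the label set (if any) begins.
        name = series.split("{", 1)[0]
        if name.startswith(prefix):
            samples[series] = float(value)
    return samples


# Made-up sample input standing in for a real scrape.
scraped = "mrcp_active_requests 3\nmrcp_sip_calls 12\nasr_other_metric 1\n"
print(filter_metrics(scraped))  # {'mrcp_active_requests': 3.0, 'mrcp_sip_calls': 12.0}
```

Histogram-style series such as the _bucket, _sum, and _count lines that a distribution metric may expose also start with mrcp_, so they are kept as separate entries keyed by their full series text.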
Updates
New acoustic model for Spanish created to resolve recognition issues with audio containing single words.
The audio quality tool can now run on either GPU or CPU for Apple. A new environment variable, FORCE_CPU_PROCESSING, has been added: when set to true, it forces CPU processing even if a GPU is available. If no GPU is available, the speech quality tool falls back to CPU processing automatically.
FinalResults message added when continuous transcription is ended with a Finalize API call.
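The FORCE_CPU_PROCESSING rule can be summarized as a small device-selection function. This is a minimal sketch of the documented behavior, not LumenVox's implementation; the function name is hypothetical, and treating the value case-insensitively as "true" is an assumption, since the notes only say that setting it to true forces CPU processing.

```python
import os
from typing import Mapping, Optional


def select_processing_device(gpu_available: bool,
                             env: Optional[Mapping[str, str]] = None) -> str:
    """Return "cpu" or "gpu" following the FORCE_CPU_PROCESSING rule.

    Assumption: the variable is compared case-insensitively with "true".
    """
    if env is None:
        env = os.environ
    force_cpu = env.get("FORCE_CPU_PROCESSING", "").strip().lower() == "true"
    # Forced CPU, or no GPU present: fall back to CPU processing.
    if force_cpu or not gpu_available:
        return "cpu"
    return "gpu"


print(select_processing_device(True, {"FORCE_CPU_PROCESSING": "true"}))  # cpu
print(select_processing_device(True, {}))   # gpu
print(select_processing_device(False, {}))  # cpu
```

In a container deployment the variable would be set in the audio quality tool's container environment.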
A memory leak in the ASR container was resolved.
A new Prometheus metric, asr_fine_tuned_result, was added for Apple to track fine-tuned vs. regular ASR model usage. It reports one of the following values:
FT: the transcription request was processed by the fine-tuned model
DNN: the transcription request was processed by the DNN model
Error: there was an error processing the interaction
Timeout: there was a decode timeout
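Given per-value totals for asr_fine_tuned_result, one useful figure is the share of successful decodes handled by the fine-tuned model. The sketch below assumes those totals have already been extracted from Prometheus; how the values are exposed (label name, metric type) is not stated in these notes. The function name and the sample counts are hypothetical.

```python
from typing import Mapping


def fine_tuned_share(counts: Mapping[str, float]) -> float:
    """Share of successful decodes handled by the fine-tuned model.

    counts maps asr_fine_tuned_result values ("FT", "DNN", "Error",
    "Timeout") to totals. Error and Timeout are left out of the
    denominator, which is a choice made for this sketch.
    """
    ft = counts.get("FT", 0.0)
    dnn = counts.get("DNN", 0.0)
    decoded = ft + dnn
    # Avoid dividing by zero when nothing has been decoded yet.
    return ft / decoded if decoded else 0.0


# Made-up counts for illustration.
print(fine_tuned_share({"FT": 30, "DNN": 10, "Error": 5, "Timeout": 1}))  # 0.75
```

Error and Timeout could instead be tracked as their own rates so that failures do not distort the FT/DNN split.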
ASR changed to ignore "Garbage" rules, as they dynamically create grammars that have grammar rules by default.
The grammar manager now returns a grammar load failure when a loaded grammar is larger than a specified threshold (configurable through an environment variable if required).
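The grammar size ripcord amounts to a fail-fast check before a grammar is accepted. A minimal sketch, assuming size is measured in UTF-8 bytes (the notes only refer to "a specified threshold"); the environment variable's name is not given in these notes, so the limit is passed in as a parameter. GrammarTooLargeError and check_grammar_size are hypothetical names.

```python
from typing import Optional


class GrammarTooLargeError(Exception):
    """Hypothetical error type for a grammar that exceeds the size limit."""


def check_grammar_size(grammar_text: str, max_bytes: Optional[int]) -> int:
    """Fail fast if the grammar exceeds max_bytes; None means no limit is set.

    Returns the measured size in UTF-8 bytes when the grammar is accepted.
    """
    size = len(grammar_text.encode("utf-8"))
    if max_bytes is not None and size > max_bytes:
        raise GrammarTooLargeError(
            f"grammar is {size} bytes, limit is {max_bytes} bytes")
    return size


try:
    check_grammar_size("<grammar>...</grammar>", max_bytes=10)
except GrammarTooLargeError as err:
    print("load failed:", err)  # load failed: grammar is 22 bytes, limit is 10 bytes
```

Rejecting oversized grammars before parsing avoids spending memory and CPU on a load that would be refused anyway.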
Grammar service crashes under load during MRCP testing resolved.
Renamed fr-fr voice mapping from "Lauren" to "Laurent".
Media Server memory leak resolved.
Issue with deployment importing resolved.
Issue resolved when running ASR/Transcription decodes in batch mode with audio files longer than 3 minutes.
Continuous transcription barge-out errors under load when the fine-tuned model is enabled resolved.
Issue resolved where the TTS voice specified in the MRCP header did not take effect.
Issue resolved where an invalid or incorrect RabbitMQ configuration string caused a fatal error.
Session management crashes when certain environment variables were empty resolved.
Added Spanish post-processing to Spanish ASR/Transcription.
Issue resolved where DTMF input was not processed when both voice and DTMF grammars were loaded.
Support added for the en-IE (Ireland) dialect.
Installation notes
The following helm chart can be used
Note that MRCP has no Helm chart; it is deployed with a Docker Compose file instead. MRCP runs on its own Docker virtual machine, which integrates with the Kubernetes cluster.
Run the command helm repo update to update the Helm charts.
Note: If you are using TTS, we recommend adding the legacyEnabled toggle to the values file. Set it to True to enable legacy TTS, or to False to enable the new neural TTS. The new neural TTS voices must be specified in the values file so that the models can be retrieved from S3.
Note if installing from 4.7 or below: The Helm charts have changed. If you have custom Helm charts, review all the changes before installing or upgrading. For example, licensing has moved from common to global (look for the custom license GUID).
If installing for MRCP, note that the conf file settings for the MRCP API have been replaced with environment variables (for example, to enable compatibility mode).
Key installation guide changes:
LumenVox now recommends Kubernetes version 1.30 or later.
Upgrade procedures
Upgrading or migrating from previous versions is supported; please contact LumenVox to discuss. See the installation notes above.
Updated API guide
API documentation for all speech products available in version 5.4 can be found at https://developer.lumenvox.com/
Information for Voice Biometric products relates to versions 3.4.0 to 3.4.3.
Model versions included in this release
ASR - 4.1.0 (4.1.2 acoustic model released for Spanish to improve recognition of audio containing single words)
TTS - 3.0 (Neural TTS), sample rates of 24 kHz and 16 kHz, which can be downsampled to 8 kHz (note change). Legacy TTS models still run under version 1.0. Further voice enhancements were made in version 3.0.1 for energetic call center use.
VB - 2.1.15
VB incorporates Selene 2.4.3, which was integrated into the container stack.
Model version changes
4.1.2 Acoustic model released for Spanish