Release notes 5.3.0
Release date: 21st October 2024
Summary
This page highlights the changes, new features and bug fixes in the LumenVox Containers version 5.3.0 release. This release affects Speech products only. It is not available for Voice Biometric products; support will be made available in an upcoming release. These release notes also include the changes made as part of 5.2.1 (built-in grammar changes).
This release builds upon the 5.2.0 release - see https://lumenvox.capacity.com/article/560601/release-notes-5-2-0.
Highlights
New Beta Neural TTS released including new voice models under Neural TTS Version 3.0.0 (see further information below)
Built-in grammars updated for consistency across grammars. Note that the existing number and currency grammars have changed across all languages (also updated in 5.1 and 5.2).
Digits built-in grammars across all languages updated to cater for more than 10 characters (also updated in 5.1 and 5.2).
*Note: For TTS, we recommend that text not exceed 4 MB (roughly 1300 characters including spaces, or around 250 words).
**Note: For Transcription, we recommend that users not exceed 90 minutes of transcribed audio due to gRPC size limits.
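One way to stay within the TTS text guideline above is to split long input into smaller requests before synthesis. The following is a minimal sketch only: the function name split_for_tts and the constant MAX_TTS_CHARS are hypothetical, the 1300-character limit is taken from the note above, and nothing here calls a LumenVox API.

```python
import re

# Assumption: the ~1300-character guideline from the note above; adjust as needed.
MAX_TTS_CHARS = 1300


def split_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent as its own synthesis request and the resulting audio concatenated by the caller.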
What's new on LumenVox Cloud 5.3.0
New features
New Beta Neural Text to Speech (TTS) has been released, including new voice models under Neural TTS version 3.0.0. It adds 80 new voices across 29 languages and dialects, including new bubbly voices for contact center use. The available voices can be seen here: Text to Speech | Lumenvox (toggle demos to Neural). Voices are available for 24, 16 and 8 kHz synthesis.
There are no changes to the LumenVox APIs, so no development changes are required if you currently use legacy TTS in the new containerized architecture. A voice mapping feature allows you to map existing voices to the new Neural voices. No development changes are required if you use TTS via MRCP on the legacy architecture.
The new voices support the same SSML capabilities, including prosody control. Custom phonemes are supported in IPA format. Further integration notes can be found here: LumenVox API Documentation.
If you want to use Neural TTS, please note the helm chart changes under Installation notes below. Neural TTS includes a caching mechanism that speeds up TTS synthesis.
Updates
- Built-in grammars updated so that all number and currency grammars use a decimal point instead of a comma. The currency grammars were also amended to always include cents (with the exception of Japanese currency grammars).
- The Japanese built-in grammars were also modified to include spaces in certain character sets to match the ASR engine output. This affects all existing Japanese grammars.
- Digits built-in grammar across all languages was updated to cater for more than 10 characters.
- Storage service will now reflect an unhealthy status only if all defined deployments are marked as unhealthy.
- Clients using grammar-based CPA and AMD can now disable ASR processing. To do so, set the environment variable SETTINGS__ENABLE_ASR_PROCESSING to false.
- Port-level licensing reporting issue was resolved.
- Redis key retrieval (e.g. for grammar keys) was refactored. There is no impact on customer deployments.
- Exceptions when running MRCP TTS load tests resolved.
- Proto files changed to make sample rates for ulaw/alaw implicitly 8 kHz.
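The storage service health rule in the list above (unhealthy only if all defined deployments are unhealthy) can be illustrated with a short sketch. This is not the service's actual implementation: the function name, the shape of the input mapping, and the handling of the empty case are all assumptions made for illustration.

```python
def storage_is_unhealthy(deployment_health: dict[str, bool]) -> bool:
    """Return True only when every defined deployment is unhealthy.

    `deployment_health` maps a (hypothetical) deployment ID to True if healthy.
    """
    if not deployment_health:
        # Assumption: with no deployments defined there is nothing to report as unhealthy.
        return False
    # A single healthy deployment is enough to keep the overall status healthy.
    return not any(deployment_health.values())
```

Under this rule, a single failing deployment no longer marks the whole storage service as unhealthy.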
Installation notes
The following helm chart can be used
Note that MRCP has no helm chart; it uses a Docker Compose file instead. MRCP runs on its own Docker virtual machine, which integrates into the Kubernetes cluster.
Run the following command to update the helm charts: helm repo update
Note: if using TTS, we recommend adding the legacyEnabled toggle to the values file. Set it to True to enable legacy TTS, or to False to enable the new Neural TTS. The new Neural TTS voices must be listed in the values file so that the models can be retrieved from S3.
Note if installing from 4.7 or below: the helm charts have changed. If you have custom helm charts, please take note of all the changes before installing or upgrading, e.g. licensing has moved from common to global (look for the custom license GUID).
If installing for MRCP, note that the conf file settings for the MRCP API have been replaced with environment variables, e.g. to enable compatibility mode.
Key installation guide changes:
LumenVox now recommends Kubernetes version 1.30 as a minimum.
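A pre-install check against this recommended minimum could look like the sketch below. The helper name meets_minimum is hypothetical, and the version string is assumed to be in the usual "v1.30.2" style as reported by the cluster; how you obtain that string is up to your tooling.

```python
import re

MIN_KUBERNETES = (1, 30)  # recommended minimum from the note above


def meets_minimum(version: str, minimum: tuple[int, int] = MIN_KUBERNETES) -> bool:
    """Compare the major.minor part of a Kubernetes version string with the minimum."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Kubernetes version: {version!r}")
    # Compare numerically so that, for example, 1.9 sorts below 1.30.
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

Comparing the parsed numbers as a tuple avoids the string-comparison pitfall where "1.9" would appear newer than "1.30".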
Upgrade procedures
Upgrade or migration from previous versions is supported. Please contact LumenVox to discuss. See the notes above.
If migrating from a previous version, we recommend that you permanently delete any logically deleted deployments. You can do this through the admin portal or via the Management API (see LumenVox API Documentation).
Updated API guide
APIs for all speech products available on version 5.3 can be obtained here: https://developer.lumenvox.com/
Information for voice biometric products relates to version 3.4.0-3.4.3
Model versions as part of the release
ASR - 4.1.0
TTS - 3.0 (Neural TTS), sample rates 24 and 16 kHz, which can be downsampled to 8 kHz (note change). Legacy TTS models still run under version 1.0
VB - 2.1.15
VB incorporates Selene 2.4.3 which was integrated into the Container stack
Model version changes
Neural TTS models run under version 3.0; legacy TTS models remain under version 1.0