Release notes 6.1.0
Release date: 8th June 2025
Summary
This page highlights all the changes, new features and bugs addressed within the LumenVox Containers version 6.1.0 release. This release applies to Speech products. It is not available for Voice biometric products; this will be made available in upcoming releases.
This release builds upon the 6.0.0 release; see LumenVox Containers Release Notes 6.0.0.
Highlights
Neural TTS now incorporates partial results allowing for faster TTS synthesis
New TTS Prometheus metrics added for time-to-first-byte
ASR grammar cache housekeeping introduced to clean up old grammar caches
Grammar ripcords for large grammars now configurable at a global/session/interaction level
Issue resolved for instances where VAD event offsets are longer than the audio duration in Redis resulting in empty transcription results
Various NLU issues resolved
Neural TTS memory leak issue resolved
For TTS, we recommend that text not exceed 4 MB
For Transcription, we recommend that users not exceed 120 minutes of transcribed audio due to gRPC size limits.
What’s new on LumenVox Cloud 6.1.0
New features
Partial results enabled for Neural TTS to speed up the return of TTS synthesis results; this is set in ttsSettings.enable_partial_results
New TTS Prometheus metrics added for time-to-first-byte:
tts_first_result_time_max
tts_first_result_time_min
tts_first_result_time_dist
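As an illustration of what a time-to-first-byte figure captures when partial results are streamed, here is a minimal sketch that times the arrival of the first chunk from any iterable of audio chunks. The function name, the `clock` parameter and the chunk iterable are hypothetical and are not part of the LumenVox API; how the product computes the metrics above is not described in these notes.

```python
import time


def time_to_first_chunk(chunks, clock=time.monotonic):
    """Consume an iterable of audio chunks and time the first one.

    Returns (seconds_until_first_chunk, received_chunks). The elapsed
    value is None when the iterable yields nothing.
    Hypothetical helper for illustration only.
    """
    start = clock()
    first_at = None
    received = []
    for chunk in chunks:
        if first_at is None:
            # Record the moment the first partial result arrives.
            first_at = clock()
        received.append(chunk)
    elapsed = None if first_at is None else first_at - start
    return elapsed, received
```

With partial results enabled, the first chunk is expected to arrive before the whole synthesis has finished, so this value should be lower than the total synthesis time.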
ASR grammar cache housekeeping introduced to clean up old grammar caches. By default, this housekeeping mechanism scans the cache folders every 10 minutes (600 seconds) and deletes any files last modified more than 1 month ago (43200 minutes, i.e. 30 days). The defaults can be overridden with the following environment variables if required:
| Name | Description | Default Value |
|------------------------------------------------------|-----------------------------------------------|---------------|
| `GRAMMAR_SETTINGS__FILE_CACHE_CLEANUP_PURGE_MINUTES` | Files older than 1 month deleted (0 = disable) | 43200 |
| `GRAMMAR_SETTINGS__FILE_CACHE_CLEANUP_SLEEP_SECONDS` | Sleep time between scans | 600 |
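The housekeeping behaviour described above can be sketched roughly as follows. This is not the product's implementation: only the two environment variable names and their defaults come from these notes, while the function names and the cache directory argument are assumptions made for illustration.

```python
import os
import time
from pathlib import Path

PURGE_MINUTES_VAR = "GRAMMAR_SETTINGS__FILE_CACHE_CLEANUP_PURGE_MINUTES"
SLEEP_SECONDS_VAR = "GRAMMAR_SETTINGS__FILE_CACHE_CLEANUP_SLEEP_SECONDS"


def load_settings(env=os.environ):
    """Read both variables, falling back to the documented defaults."""
    purge_minutes = int(env.get(PURGE_MINUTES_VAR, "43200"))  # 30 days
    sleep_seconds = int(env.get(SLEEP_SECONDS_VAR, "600"))    # 10 minutes
    return purge_minutes, sleep_seconds


def purge_old_files(cache_dir, purge_minutes, now=None):
    """Delete files under cache_dir last modified more than purge_minutes ago.

    Returns the list of deleted paths. A value of 0 disables purging.
    Hypothetical helper; cache_dir is an assumed location.
    """
    if purge_minutes <= 0:
        return []
    now = time.time() if now is None else now
    cutoff = now - purge_minutes * 60
    deleted = []
    # Materialise the listing first so deletion does not disturb iteration.
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

A scan loop would call `purge_old_files` and then sleep for the configured number of seconds before the next pass.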
Grammar ripcord can now be configured in the global, session or interaction settings and can be set within the deployment portal. If a grammar is above the specified size (GrammarSettings.grammar_threshold), a grammar load failure is raised.
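To make the ripcord idea concrete, the sketch below checks a grammar's size against a threshold resolved from three levels. It rests on assumptions these notes do not confirm: that the most specific level wins (interaction, then session, then global), that the threshold is measured in bytes, and that an unset threshold means no limit. `GrammarLoadError` and both function names are hypothetical.

```python
class GrammarLoadError(Exception):
    """Hypothetical error standing in for a grammar load failure."""


def effective_threshold(global_value=None, session_value=None,
                        interaction_value=None):
    """Return the most specific configured threshold (assumed precedence)."""
    for value in (interaction_value, session_value, global_value):
        if value is not None:
            return value
    return None  # no ripcord configured at any level


def check_grammar_size(grammar_bytes, threshold):
    """Raise GrammarLoadError if the grammar is larger than the threshold.

    The threshold is assumed to be in bytes; None skips the check.
    Returns the grammar size when the check passes.
    """
    size = len(grammar_bytes)
    if threshold is not None and size > threshold:
        raise GrammarLoadError(
            f"grammar size {size} exceeds threshold {threshold}"
        )
    return size
```

Failing fast at load time keeps an oversized grammar from consuming resources during compilation, which is the purpose of a ripcord-style limit.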
Updates
Issue resolved for instances where VAD event offsets are longer than the audio duration in Redis resulting in empty transcription results
Various NLU issues resolved, e.g.:
Unable to process large text requests
Language translation: input text containing "()" prevented the full text from being translated
Diarization and Language ID returning negative Prometheus request counters
Issues with translate_from_language resolved
Issue rectified where reporting-api crashed on certain requests when x-scopes was not included
Issue resolved where SSML markers were sent out of the MRCP faster than the outbound TTS audio stream; they are now sent out when the corresponding part of the TTS stream is sent
Neural TTS memory leak issue resolved
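The SSML marker fix listed above amounts to releasing each marker only once the audio it refers to has gone out. A rough sketch of that ordering follows, assuming each marker carries a byte offset into the audio stream and that synchronisation happens at chunk granularity. The function and the event tuples are hypothetical and do not reflect the MRCP implementation.

```python
def interleave_markers(chunks, markers):
    """Yield ("audio", chunk) and ("marker", name) events in send order.

    markers is a list of (byte_offset, name) pairs. A marker is yielded
    only after all audio before its offset has been yielded.
    Hypothetical helper for illustration only.
    """
    pending = sorted(markers)
    sent = 0  # bytes of audio yielded so far
    i = 0
    for chunk in chunks:
        # Release markers whose position in the stream has been reached.
        while i < len(pending) and pending[i][0] <= sent:
            yield ("marker", pending[i][1])
            i += 1
        yield ("audio", chunk)
        sent += len(chunk)
    # Flush the remaining markers once all audio has been sent.
    for _, name in pending[i:]:
        yield ("marker", name)
```

Because a marker is never released ahead of its audio, a client reacting to marker events stays aligned with what the caller actually hears.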
Installation notes
The following helm chart can be used
Note that for MRCP there is no helm chart but a docker compose file. MRCP will run on its own Docker virtual machine which will integrate into the Kubernetes cluster.
Run the following command to update the helm charts: helm repo update
Note: if using TTS, we recommend adding the legacyEnabled toggle to the values file. Set it to true to enable legacy TTS, or to false to enable the new Neural TTS. The new Neural TTS voices must be listed in the values file in order for the models to be retrieved from S3:
ttsLanguages:
  - name: "en_us"
    legacyEnabled: false
    voices:
      - name: "jeff"
        version: "4.0.0"
      - name: "megan"
        version: "4.0.0"
Note if installing from 4.7 or below: the helm charts have changed. If you have custom helm charts, please take note of all the changes before installing/upgrading, e.g. licensing has moved from common to global, which is where the custom license GUID is now looked for.
If installing for MRCP, note that the conf file settings for the MRCP API have been replaced with environment variables, e.g. to enable compatibility mode.
Key installation guide changes:
LumenVox now recommends that Kubernetes version 1.30 or later is installed.
Upgrade procedures
Upgrade or migration from previous versions is supported. Please contact LumenVox to discuss. See notes above if upgrading from 4.7.
If you are performing an upgrade, you need to ensure that your NGINX versions are updated from 1.11 to 1.12.
If upgrading and using Neural TTS utilizing MRCP, edit the .env file making the following changes:
PRODUCT_VERSION=6.1
MEDIA_SERVER__ENABLE_TTS_PARTIAL_STREAMING=1
Edit the docker-compose.yml file making the following changes:
MEDIA_SERVER__ENABLE_TTS_PARTIAL_STREAMING: "${MEDIA_SERVER__ENABLE_TTS_PARTIAL_STREAMING}"
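Before restarting the MRCP containers, it can help to confirm that the .env file carries both values listed above. The following is a minimal sketch of such a check. The parser handles only simple KEY=VALUE lines, and the function names are hypothetical helpers rather than LumenVox tooling.

```python
REQUIRED = {
    "PRODUCT_VERSION": "6.1",
    "MEDIA_SERVER__ENABLE_TTS_PARTIAL_STREAMING": "1",
}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(text, required=REQUIRED):
    """Return the required settings that are absent or have another value."""
    values = parse_env(text)
    return {k: v for k, v in required.items() if values.get(k) != v}
```

An empty result from `missing_settings` means both entries are present with the expected values.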
If upgrading Neural TTS from version 6.0.0, the TTS cache folder and the Neural TTS models folder need to be cleared. Reach out to LumenVox support should you have any questions: support@lumenvox.com
Updated API guide
APIs for all speech products available on version 6.1 can be obtained here:
LumenVox API Documentation
Information for voice biometric products relates to version 3.4.0-3.4.3
Model versions as part of the release
ASR - 4.1.0
TTS - 4.0 (Neural TTS), sample rates 24 kHz and 16 kHz; can be downsampled to 8 kHz (note change). Legacy TTS models will still run under version 1.0. Further voice enhancements are currently being released in TTS voices version 4.0.1, so LumenVox recommends that clients cater for 4.0.X.
VB - 2.1.15
VB incorporates Selene 2.4.3 which was integrated into the Container stack