Release notes 6.1.0

Release date: 8th June 2025

Summary

This page highlights all the changes, new features and bugs addressed in the LumenVox Containers version 6.1.0 release. This release affects Speech products. It is not available for Voice Biometric products; support for these will be made available in upcoming releases.

This release builds upon the 6.0.0 release - see LumenVox Containers Release Notes 6.0.0.

Highlights

Neural TTS now incorporates partial results allowing for faster TTS synthesis
New TTS Prometheus metrics added for time-to-first-byte
ASR grammar cache housekeeping introduced to clean up old grammar caches
Grammar ripcords for large grammars now configurable at a global/session/interaction level
Issue resolved where VAD event offsets longer than the audio duration in Redis resulted in empty transcription results
Various NLU issues resolved
Neural TTS memory leak issue resolved

For TTS, we recommend that text not exceed 4 MB.
For Transcription, we recommend that users not exceed 120 minutes of transcribed audio due to gRPC size limits.
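
The two recommendations above could be enforced on the client side before a request is sent. The following is a minimal sketch only: the function and constant names are hypothetical and not part of any LumenVox API, and it assumes "4 MB" means 4 * 1024 * 1024 bytes of UTF-8 encoded text.

```python
# Hypothetical pre-flight checks based on the recommendations above.
# Assumption: "4 MB" is read as 4 * 1024 * 1024 bytes of UTF-8 encoded text.
MAX_TTS_TEXT_BYTES = 4 * 1024 * 1024
MAX_TRANSCRIPTION_MINUTES = 120


def tts_text_within_limit(text: str) -> bool:
    """Return True if the UTF-8 encoded text does not exceed the recommended size."""
    return len(text.encode("utf-8")) <= MAX_TTS_TEXT_BYTES


def audio_within_limit(duration_seconds: float) -> bool:
    """Return True if the audio does not exceed the recommended duration."""
    return duration_seconds <= MAX_TRANSCRIPTION_MINUTES * 60
```

A caller might split longer text or audio into several requests when either check returns False.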

What’s new on LumenVox Cloud 6.1.0

New features

Partial results enabled for Neural TTS to speed up the return of TTS synthesis results; this is set in ttsSettings.enable_partial_results
New TTS Prometheus metrics added for time-to-first-byte
- tts_first_result_time_max
- tts_first_result_time_min
- tts_first_result_time_dist
ASR grammar cache housekeeping introduced to clean up old grammar caches. By default, this new housekeeping mechanism scans the folders every 10 minutes and deletes any files that were last modified more than 1 month ago. Overrides can be applied via environment variables if required
Grammar ripcord now added to configurations for global, session or interaction settings and can be set within the deployment portal. If the grammar is above a specified size, then a grammar load failure is raised (GrammarSettings.grammar_threshold)
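
To illustrate the grammar cache housekeeping behaviour described in the list above, here is a minimal sketch of an age-based cleanup pass. It is not the LumenVox implementation: the function name, the constants and the reading of "1 month" as 30 days are all assumptions made for illustration.

```python
import time
from pathlib import Path

SCAN_INTERVAL_SECONDS = 10 * 60       # default scan interval stated in the notes
MAX_AGE_SECONDS = 30 * 24 * 60 * 60   # "1 month" approximated as 30 days (assumption)


def remove_stale_files(cache_dir, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Delete files under cache_dir last modified more than max_age_seconds ago.

    Returns the list of deleted paths.
    """
    now = time.time() if now is None else now
    deleted = []
    # Materialise the listing first so deletion does not disturb the traversal.
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            deleted.append(path)
    return deleted
```

A scheduler would call remove_stale_files once per SCAN_INTERVAL_SECONDS; keying on the last-modified time means recently rewritten grammar caches survive each pass.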

Updates

Issue resolved where VAD event offsets longer than the audio duration in Redis resulted in empty transcription results
Various NLU issues resolved, e.g.
- Unable to process large text requests
- Language translation: input text containing "()" prevented the full text from being translated
- Diarization and Language ID returning negative Prometheus request counters
- Issues with translate_from_language
Issue resolved where reporting-api crashed on certain requests when x-scopes was not included
Issue resolved where SSML markers were sent out of MRCP faster than the outbound TTS audio stream; they are now sent when the corresponding part of the TTS stream is sent
Neural TTS memory leak issue resolved

Installation notes

The following helm chart can be used

Helm Chart

Note that for MRCP there is no Helm chart but a Docker Compose file. MRCP will run on its own Docker virtual machine, which will integrate into the Kubernetes cluster.

Run the command helm repo update to update the Helm charts.

Note: if using TTS, we recommend adding the legacyEnabled toggle to the values file. Set it to true to enable legacy TTS, or to false to enable the new Neural TTS. The new Neural TTS voices must be listed in the values file in order for the models to be retrieved from S3:

ttsLanguages:
  - name: "en_us"
    legacyEnabled: false
    voices:
      - name: "jeff"
        version: "4.0.0"
      - name: "megan"
        version: "4.0.0"

Note if upgrading from 4.7 or below: the Helm charts have changed. If you have custom Helm charts, take note of all the changes before installing or upgrading. For example, licensing has moved from common to global, so check any custom license GUID setting.

If installing for MRCP, note that the conf file settings for the MRCP API have been replaced with environment variables, e.g. to enable compatibility mode.

Key installation guide changes:

LumenVox now recommends Kubernetes version 1.30 as a minimum.

Upgrade procedures

Upgrade or migration from previous versions is supported. Please contact LumenVox to discuss. See notes above if upgrading from 4.7.

If you are performing an upgrade, you need to ensure that your NGINX versions are updated from 1.11 to 1.12.

If upgrading and using Neural TTS with MRCP, edit the .env file, making the following changes:

PRODUCT_VERSION=6.1
MEDIA_SERVER__ENABLE_TTS_PARTIAL_STREAMING=1

Edit the docker-compose.yml file, making the following changes:

MEDIA_SERVER__ENABLE_TTS_PARTIAL_STREAMING: "${MEDIA_SERVER__ENABLE_TTS_PARTIAL_STREAMING}"
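
For deployments that script their upgrades, the .env changes above could be applied programmatically. The sketch below is illustrative only: apply_env_overrides and UPGRADE_OVERRIDES are hypothetical names, and it assumes the .env file uses plain KEY=value lines.

```python
def apply_env_overrides(env_text: str, overrides: dict) -> str:
    """Return env_text with every KEY=value in overrides set, keeping other lines as-is."""
    remaining = dict(overrides)
    out = []
    for line in env_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Replace the existing value for this key.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any keys that were not already present in the file.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# The two settings named in the upgrade notes.
UPGRADE_OVERRIDES = {
    "PRODUCT_VERSION": "6.1",
    "MEDIA_SERVER__ENABLE_TTS_PARTIAL_STREAMING": "1",
}
```

Reading the existing .env content, passing it through apply_env_overrides with UPGRADE_OVERRIDES and writing the result back leaves unrelated settings and comments untouched.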

If upgrading Neural TTS from version 6.0.0, the TTS cache folder and the Neural TTS models folder need to be cleared. Reach out to LumenVox support at support@lumenvox.com should you have any questions.

Updated API guide

APIs for all speech products available on version 6.1 can be obtained here: LumenVox API Documentation

Information for Voice Biometric products relates to versions 3.4.0 to 3.4.3

Model versions as part of the release

ASR - 4.1.0

TTS - 4.0 (Neural TTS), sample rates 24 kHz and 16 kHz, which can be downsampled to 8 kHz (note change). Legacy TTS models will still run under version 1.0. Further voice enhancements are currently being released in TTS voices version 4.0.1, so LumenVox recommends that clients cater for 4.0.X.
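
One way a client might cater for 4.0.X voice versions is to accept any patch level of 4.0 instead of pinning a single version string. This is a sketch under that assumption; is_supported_voice_version is a hypothetical helper, not part of the LumenVox API.

```python
import re

# Matches "4.0." followed by a numeric patch level, e.g. "4.0.0" or "4.0.1".
_TTS_4_0_X = re.compile(r"4\.0\.\d+")


def is_supported_voice_version(version: str) -> bool:
    """Return True for any 4.0.X Neural TTS voice version string."""
    return _TTS_4_0_X.fullmatch(version) is not None
```

With this check, a later 4.0.1 voice release is accepted without a client change, while 1.0 (legacy) and other minor versions are rejected.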

VB - 2.1.15

VB incorporates Selene 2.4.3 which was integrated into the Container stack
