Release notes 5.2.0
Release date: 7th October 2024
Summary
This page highlights the changes, new features and bug fixes in the LumenVox Containers version 5.2.0 release. This release affects Speech products. It is not available for Voice biometric products; support will be made available in upcoming releases. These release notes also include changes made as part of 5.1.1 - 5.1.4.
This release builds upon the 5.1.0 release - see https://lumenvox.capacity.com/article/314233/release-notes-5-1-0.
Highlights
Continuous transcription released
Custom fine-tuned ASR model & transcription accuracy score released for Apple (GPU based)
This includes a new Prometheus metric asr_fine_tuned_results
New intent processing added to perform word spotting (requires training of custom intent model), includes on-the-fly grammar changes
Enhancements to secure context applied
Configurable barge-in timeout/eos delay introduced for Apple
Analysis portal caters for enhanced transcription interactions
Analysis portal allows for playback speed to be selected
Port-level license tracking added
Built in grammars released for Japanese
Python sample script released to perform transcription interactions on stereo audio streams and continuous transcription
*Note: For TTS, we recommend that text not exceed 4 MB (this is roughly 1300 characters with spaces or around 250 words).
**Note: For Transcription, we recommend that users not exceed 90 minutes of transcribed audio due to gRPC size limits.
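A client-side pre-check against the two recommendations above could look like the following minimal sketch. It assumes the input audio is an uncompressed WAV file and uses the 1300-character and 90-minute figures from the notes. The function and constant names are illustrative and are not part of the LumenVox API.

```python
import wave

# Limits taken from the notes above; the names are illustrative only.
MAX_TTS_CHARACTERS = 1300
MAX_TRANSCRIPTION_SECONDS = 90 * 60


def tts_text_within_limit(text: str) -> bool:
    """Return True if the text is within the recommended TTS length."""
    return len(text) <= MAX_TTS_CHARACTERS


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def audio_within_limit(path: str) -> bool:
    """Return True if the WAV file is within the recommended audio length."""
    return wav_duration_seconds(path) <= MAX_TRANSCRIPTION_SECONDS
```

Longer text or audio would need to be split into smaller requests before being submitted.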
What's new on LumenVox Cloud 5.2.0
New features
- Continuous transcription released (for more information see Continuous transcription under LumenVox API Documentation)
- Custom fine-tuned ASR model & transcription accuracy score released for Apple (GPU based); for more information see Custom fine-tuned model & transcription quality score under LumenVox API Documentation
- A new Prometheus metric was released for the fine-tuned model: asr_fine_tuned_results (for final results). There are four buckets available in the metric:
FT - if transcription request was processed by the fine-tuned model
DNN - if transcription request was processed by the DNN model
Error - if there was an error processing interaction
Timeout - if there was a decode timeout
- Analysis portal caters for enhanced transcription interactions
- Analysis portal allows for playback speed to be selected
- Port-level license tracking added
- Built in grammars released for Japanese
- Python sample script released to perform transcription interactions on stereo audio streams and continuous transcription
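To see how requests are split across the four asr_fine_tuned_results buckets listed above, the metric can be read from a Prometheus text-format scrape. The following is a minimal sketch that assumes the bucket name is exposed as a label. The label name `result` is hypothetical, so check the actual scrape output of your deployment before relying on it.

```python
import re
from collections import defaultdict

METRIC_NAME = "asr_fine_tuned_results"
BUCKET_LABEL = "result"  # hypothetical label name; verify against a real scrape

# Matches a sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# Matches one label pair such as result="FT".
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def tally_buckets(exposition_text: str) -> dict:
    """Sum the metric's sample values per bucket (FT, DNN, Error, Timeout)."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match or match.group("name") != METRIC_NAME:
            continue
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        bucket = labels.get(BUCKET_LABEL)
        if bucket is not None:
            totals[bucket] += float(match.group("value"))
    return dict(totals)
```

Summing per bucket across all other labels gives a quick view of how many final results came from the fine-tuned model compared with the DNN model, errors and timeouts.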
Updates
Configurable barge-in timeout/EOS delay implemented for Apple
Final results appearing in LumenVox API Logs removed if secure context enabled
TTS Input Text in LumenVox API logs removed if secure context enabled
LumenVox API service logs revealing grammar parse input text removed if secure context enabled
LumenVox API logs revealing normalize text input removed if secure context enabled
Incorrect interpretation grammar response being returned resolved for grammar-based CPA & AMD interactions
License expiry issue when license cache expires resolved
en-AU language option generates a decoder model error (transcription) resolved
Issues with transcription when enabling non en-US languages (MRCP) resolved
Incorrect audio offsets resolved for streaming transcription
Grammar based transcription interactions not being recorded as transcriptions in the analysis portal resolved
gRPC Grammar-Based AMD/CPA & enhanced transcription not being properly archived resolved
Issue with enhanced transcription leading to a session crash resolved
Transcription of WAV formatted files not generating results resolved in Python script
DTMF MRCP result format changed to align with legacy 19.x platform
Simple_mrcpclient text-based grammar parse not working resolved
Installation notes
The following helm chart can be used
Note that for MRCP there is no helm chart but a docker compose file. MRCP will run on its own Docker virtual machine which will integrate into the Kubernetes cluster.
Note: The helm charts have changed. If you have custom helm charts, please take note of all the changes before installing or upgrading, e.g. licensing has moved from common to global (look for the custom license GUID).
If installing for MRCP, note that the conf file settings for the MRCP API have been replaced with environment variables, e.g. to enable compatibility mode.
Upgrade procedures
Upgrade or migration from previous versions is supported. Please contact LumenVox to discuss. See the installation notes above.
Updated API guide
APIs for all speech products available on version 5.2 can be obtained here: https://developer.lumenvox.com/
Information for voice biometric products relates to version 3.4.0-3.4.3
Model versions as part of the release
ASR - 4.1.0
TTS - 1.0 sample rate 22
VB - 2.1.15
VB incorporates Selene 2.4.3 which was integrated into the Container stack
Model version changes
None