Release notes 5.2.0

Release date: 7th October 2024

Summary

This page highlights the changes, new features and bug fixes in the LumenVox Containers version 5.2.0 release. This release applies to Speech products. It is not available for Voice Biometric products; that support will be made available in upcoming releases. These release notes also include changes made as part of 5.1.1 - 5.1.4.

This release builds upon the 5.1.0 release - see https://lumenvox.capacity.com/article/314233/release-notes-5-1-0.

Highlights

Continuous transcription released
Custom fine-tuned ASR model & transcription accuracy score released for Apple (GPU based)
- This includes a new Prometheus metric, asr_fine_tuned_results
New intent processing added to perform word spotting (requires training of custom intent model), includes on-the-fly grammar changes
Enhancements to secure context applied
Configurable barge-in timeout/eos delay introduced for Apple
Analysis portal caters for enhanced transcription interactions
Analysis portal allows for playback speed to be selected
Port-level license tracking added
Built-in grammars released for Japanese
Python sample script released to perform transcription interactions with stereo audio streams and continuous transcription

*Note: For TTS, we recommend that text not exceed 4 MB (roughly 1,300 characters including spaces, or around 250 words).

**Note: For Transcription, we recommend that users not exceed 90 minutes of transcribed audio due to gRPC size limits.
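
One way to respect the 90-minute recommendation is to check the length of a recording before submitting it. The following is a minimal sketch using only the Python standard library, assuming the audio is an uncompressed PCM WAV file; the function names are illustrative and are not part of the LumenVox API or sample scripts.

```python
import wave

# Recommended ceiling taken from the transcription note above.
MAX_MINUTES = 90


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Frame count divided by sample rate gives seconds,
        # regardless of the number of channels.
        return wav.getnframes() / wav.getframerate()


def within_recommended_length(path, max_minutes=MAX_MINUTES):
    """True if the file is no longer than the recommended limit."""
    return wav_duration_seconds(path) <= max_minutes * 60
```

Recordings that exceed the limit could be split into shorter segments before being sent for transcription.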

What’s new on LumenVox Cloud 5.2.0

New features

Continuous transcription released (for more information see Continuous transcription under LumenVox API Documentation)
Custom fine-tuned ASR model & transcription accuracy score released for Apple (GPU based) (for more information see Custom fine-tuned model & transcription quality score under LumenVox API Documentation)
- A new Prometheus metric was released for the fine-tuned model: asr_fine_tuned_results (for final results). There are four buckets available in the metric:
  - FT - if transcription request was processed by the fine-tuned model
  - DNN - if transcription request was processed by the DNN model
  - Error - if there was an error processing interaction
  - Timeout - if there was a decode timeout
Analysis portal caters for enhanced transcription interactions
Analysis portal allows for playback speed to be selected
Port-level license tracking added
Built-in grammars released for Japanese
Python sample script released to perform transcription interactions with stereo audio streams and continuous transcription
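
For the asr_fine_tuned_results metric listed above, the four buckets can be totalled from a Prometheus text-format scrape. The sketch below is illustrative only: these notes do not state how the buckets are exposed, so the label name "result" is an assumption, and the exposed series name may differ (for example with a suffix) in a real deployment.

```python
import re

METRIC = "asr_fine_tuned_results"
BUCKETS = ("FT", "DNN", "Error", "Timeout")

# Matches one sample line of the Prometheus text exposition format:
#   name{label="value",...} 12
_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def bucket_totals(exposition_text, label="result"):
    """Sum sample values per bucket.

    `label` is a hypothetical label name; adjust it to whatever
    label actually carries FT / DNN / Error / Timeout.
    """
    totals = {bucket: 0.0 for bucket in BUCKETS}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _LINE.match(line)
        if not match or match.group("name") != METRIC:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        bucket = labels.get(label)
        if bucket in totals:
            totals[bucket] += float(match.group("value"))
    return totals
```

Comparing the FT and DNN totals gives a rough view of how many final results were served by the fine-tuned model versus the DNN model.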

Updates

Configurable barge-in timeout/eos delay implemented for Apple
Final results appearing in LumenVox API Logs removed if secure context enabled
TTS Input Text in LumenVox API logs removed if secure context enabled
LumenVox API service logs revealing grammar parse input text removed if secure context enabled
LumenVox API logs revealing normalize text input removed if secure context enabled
Incorrect interpretation grammar response being returned resolved for grammar-based CPA & AMD interactions
License expiry issue when license cache expires resolved
Decoder model error generated by the en-AU language option (transcription) resolved
Issues with transcription when enabling non en-US languages (MRCP) resolved
Incorrect audio offsets resolved for streaming transcription
Grammar based transcription interactions not being recorded as transcriptions in the analysis portal resolved
gRPC Grammar-Based AMD/CPA & enhanced transcription not being properly archived resolved
Issue with enhanced transcription leading to a session crash resolved
WAV-formatted transcription file not generating results resolved in the Python script
DTMF MRCP result format changed to align with legacy 19.x platform
Simple_mrcpclient text-based grammar parse not working resolved

Installation notes

The following helm chart can be used

Helm Chart

Note that for MRCP there is no helm chart but a docker compose file. MRCP will run on its own Docker virtual machine which will integrate into the Kubernetes cluster.

Note: There have been Helm chart changes. If you have custom Helm charts, please take note of all the changes before installing or upgrading, e.g. licensing has moved from common to global (look for the custom license GUID).

If installing for MRCP, note that the conf file settings for the MRCP API have been replaced with environment variables, e.g. to enable compatibility mode.

Upgrade procedures

Upgrade or migration from previous versions is supported. Please contact LumenVox to discuss. See the notes above.

Updated API guide

APIs for all speech products available on version 5.2 can be obtained here: https://developer.lumenvox.com/

Information for voice biometric products relates to version 3.4.0-3.4.3

Model versions as part of the release

ASR - 4.1.0

TTS - 1.0 sample rate 22

VB - 2.1.15

VB incorporates Selene 2.4.3 which was integrated into the Container stack

Model version changes

None
