Release notes 5.2.0

Release date: 7th October 2024



Summary

This page highlights the changes, new features and bug fixes in the LumenVox Containers version 5.2.0 release. This release affects Speech products only. It is not available for Voice biometric products; that support will be made available in an upcoming release. These release notes also include the changes made as part of versions 5.1.1 to 5.1.4.

This release builds upon the 5.1.0 release - see https://lumenvox.capacity.com/article/314233/release-notes-5-1-0.

Highlights

  1. Continuous transcription released

  2. Custom fine-tuned ASR model & transcription accuracy score released for Apple (GPU based) 

    1. This includes a new Prometheus metric asr_fine_tuned_results

  3. New intent processing added to perform word spotting (requires training of custom intent model), includes on-the-fly grammar changes

  4. Enhancements to secure context applied

  5. Configurable barge-in timeout/eos delay introduced for Apple

  6. Analysis portal caters for enhanced transcription interactions

  7. Analysis portal allows for playback speed to be selected

  8. Port-level license tracking added

  9. Built in grammars released for Japanese

  10. Python sample script released to perform transcription interactions on stereo audio streams and continuous transcription

*Note: For TTS, we recommend that text not exceed 4mb (this is roughly 1300 characters with spaces, or around 250 words).

**Note: For Transcription, we recommend that users not exceed 90 minutes of transcribed audio due to gRPC size limits.
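
As a minimal sketch of how a client might respect the 90-minute recommendation above, the snippet below measures a WAV file's duration with Python's standard `wave` module before the audio is submitted. The function names and the pre-flight check itself are illustrative assumptions and are not part of the LumenVox sample script or API.

```python
import wave

# Recommended ceiling from the note above: 90 minutes of transcribed audio.
MAX_TRANSCRIPTION_SECONDS = 90 * 60


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def within_transcription_limit(duration_seconds,
                               limit_seconds=MAX_TRANSCRIPTION_SECONDS):
    """Return True if the audio is no longer than the recommended limit."""
    return duration_seconds <= limit_seconds
```

A caller could split or reject audio for which `within_transcription_limit(wav_duration_seconds(path))` is false, rather than discovering the limit through a failed request.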

What's new on LumenVox Cloud 5.2.0

New features

  • Continuous transcription released (for more information, see Continuous transcription under LumenVox API Documentation)
  • Custom fine-tuned ASR model & transcription accuracy score released for Apple (GPU based); see Custom fine-tuned model & transcription quality score under LumenVox API Documentation
    • A new Prometheus metric was released for the fine-tuned model: asr_fine_tuned_results (for final results). The metric has four buckets:
      • FT - if transcription request was processed by the fine-tuned model

      • DNN - if transcription request was processed by the DNN model

      • Error - if there was an error processing the interaction

      • Timeout - if there was a decode timeout

  • Analysis portal caters for enhanced transcription interactions

  • Analysis portal allows for playback speed to be selected

  • Port-level license tracking added

  • Built in grammars released for Japanese

  • Python sample script released to perform transcription interactions on stereo audio streams and continuous transcription
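
To illustrate how the four buckets of the asr_fine_tuned_results metric listed above might be consumed, here is a rough sketch that tallies them from Prometheus text-format output using only the standard library. The label name `bucket` is a hypothetical placeholder, since these notes do not state how the buckets are exposed; check the actual metrics output of your deployment and adjust the label (and the metric name, if it carries a suffix) accordingly.

```python
import re
from collections import Counter

METRIC_NAME = "asr_fine_tuned_results"  # metric name from these release notes
BUCKET_LABEL = "bucket"  # hypothetical label name; verify against real output

# Matches sample lines such as: asr_fine_tuned_results{bucket="FT"} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def tally_buckets(metrics_text):
    """Sum sample values of the metric per bucket label value."""
    totals = Counter()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != METRIC_NAME:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if BUCKET_LABEL in labels:
            totals[labels[BUCKET_LABEL]] += float(match.group("value"))
    return totals


def fine_tuned_share(totals):
    """Fraction of decoded requests handled by the fine-tuned model."""
    decoded = totals["FT"] + totals["DNN"]
    return totals["FT"] / decoded if decoded else None
```

Comparing the FT and DNN totals gives a quick view of how often the fine-tuned model is actually serving requests, while the Error and Timeout totals can feed alerting.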

Updates

  • Configurable barge-in timeout/eos delay implemented for Apple

  • Final results appearing in LumenVox API Logs removed if secure context enabled 

  • TTS Input Text in LumenVox API logs removed if secure context enabled 

  • LumenVox API service logs revealing grammar parse input text removed if secure context enabled 

  • LumenVox API logs revealing normalize text input removed if secure context enabled 

  • Incorrect interpretation grammar response being returned resolved for grammar-based CPA & AMD interactions

  • License expiry issue when license cache expires resolved

  • en-AU language option generates a decoder model error (transcription) resolved

  • Issues with transcription when enabling non en-US languages (MRCP) resolved

  • Incorrect audio offsets resolved for streaming transcription

  • Grammar based transcription interactions not being recorded as transcriptions in the analysis portal resolved

  • gRPC Grammar-Based AMD/CPA & enhanced transcription not being properly archived resolved

  • Issue with enhanced transcription leading to a session crash resolved

  • Transcription of WAV-formatted files not generating results resolved in the Python script

  • DTMF MRCP result format changed to align with legacy 19.x platform

  • Simple_mrcpclient text-based grammar parse not working resolved

Installation notes

The following helm chart can be used

Helm Chart

Note that for MRCP there is no helm chart but a docker compose file. MRCP will run on its own Docker virtual machine which will integrate into the Kubernetes cluster.

Note: The helm charts have changed. If you have custom helm charts, please take note of all the changes before installing or upgrading; for example, licensing has moved from common to global, which is where the custom license GUID is now looked up.

If installing for MRCP, note that the conf file settings for the MRCP API have been replaced with environment variables, e.g. to enable compatibility mode.

Upgrade procedures

Upgrade or migration from previous versions is supported. Please contact LumenVox to discuss. See the installation notes above.

Updated API guide

APIs for all speech products available on version 5.2 can be obtained here: https://developer.lumenvox.com/    

Information for voice biometric products relates to version 3.4.0-3.4.3 

Model versions as part of the release

ASR - 4.1.0

TTS - 1.0 sample rate 22

VB - 2.1.15

VB incorporates Selene 2.4.3 which was integrated into the Container stack

Model version changes

None


Copyright (C) 2001-2024, Ai Software, LLC d/b/a LumenVox