Product architecture
The following diagram depicts the high-level architecture of the solution. It shows how the client can integrate with LumenVox voice biometrics services through APIs:
Figure 1: High-level solution architecture
Product Components
APIs
Customers can consume the voice biometrics services through the following APIs: Assure Biometrics, Speech, Management, and Reporting.
Assure Biometrics API
This collection of APIs is used to create identities, enroll them into the platform, and verify them. It is also used to manage identities, e.g., disabling an identity or removing a voiceprint.
Speech API
This collection can be used to consume LumenVox speech products such as ASR (Automatic Speech Recognition), TTS (Text to Speech), CPA (Call Progress Analysis), Transcription, and NLU gateway (Natural Language Understanding).
Management API
These APIs are used to manage configuration and deployment parameters.
Reporting API
This collection of APIs is used to extract session and transactional data from the platform. It also provides various statistics, e.g., the number of enrollment transactions processed per month or the total number of voiceprints enrolled to date. The extracts can be used to obtain further information on biometric operations or audit trails.
Open-Source Software Components
RabbitMQ
This message queuing component is implemented independently by the customer/partner and is used by LumenVox software for the queuing of all transaction, audit, and management information. Once a service processes the data requests, they are removed from the queue. A distributed cluster of RabbitMQ instances is recommended in production for high availability and scalability.
Redis
This in-memory cache component is implemented independently by the customer/partner and is used by LumenVox software for the caching of session & transaction information throughout the whole solution whilst it is being processed by the various internal services. It is this component that receives and processes the audio for biometric processing. A distributed cluster of Redis instances is recommended in production, for high availability and disaster recovery.
MongoDB
This document-oriented database is implemented independently by the customer/partner and is used by LumenVox software to store the enrollment & verification audio files along with the voiceprint models. It is important to store the audio for audit/troubleshooting purposes. Enrollment audio is also required if voiceprints need to be re-enrolled (e.g., when a new DNN model is implemented). The storing of audio files is configurable. A distributed cluster of MongoDB instances is recommended in production for high availability and scalability.
PostgreSQL
This SQL compliant relational database is implemented independently by the customer/partner and is used by LumenVox software to store all the transactional data. A distributed cluster of PostgreSQL instances is recommended in production for high availability and scalability.
Licensing Service
To consume LumenVox products, the containers must communicate with LumenVox’s cloud licensing service to submit information on product utilization. The customer needs to ensure that firewall rules are modified to allow this external connection.
LumenVox Portal
The LumenVox Portal is a web-based interface that utilizes our APIs to allow customers/partners to manage the deployments, view or edit configurations and perform health checks. A customer/partner may integrate the APIs into their own web portals and dashboards, if they don’t want to use the one provided by LumenVox. The portal can also be used to access reports and manage voiceprints.
Solution Requirements
Hardware
The solution requires an environment in which the containers can be installed - this can be a Linux or Microsoft Windows environment. A Linux-based environment is the preferred environment for optimal performance.
An example of the required hardware is provided below:
Kubernetes Environment
- 3 nodes x 8 CPUs x 8 GB memory
- 100 concurrent requests
| Service | # of Pods | CPU | Memory |
| assure-identity | 2 | 140m | 1850Mi |
| assure-api | 2 | 980m | 2050Mi |
| audit | 1 | 10m | 90Mi |
| binarystorage | 1 | 170m | 210Mi |
| configuration | 1 | 10m | 150Mi |
| deployment | 1 | 10m | 100Mi |
| engineresource | 1 | 10m | 890Mi |
| license | 1 | 10m | 20Mi |
| management-api | 1 | 10m | 100Mi |
| reporting | 1 | 10m | 70Mi |
| reporting-api | 1 | 10m | 90Mi |
| transaction | 1 | 260m | 390Mi |
| voice-verifier | 2 | 2460m | 1290Mi |

| Provisioned service | Metric | Value |
| RabbitMQ | CPU usage | 13% |
| RabbitMQ | Memory usage | 1008Mi |
| RabbitMQ | Peak messages | 4800 |
| Redis cache | CPU usage | 2.8% |
| Redis cache | Memory usage | 390Mi |
| Redis cache | Calls: GET | 174/s |
| Redis cache | Calls: SETEX | 101/s |
| Redis cache | Calls: SET | 57/s |
| Redis cache | Network | 6MB/s |
| MongoDB | CPU usage | 7% |
| MongoDB | Memory usage | 14.48Gi |
| PostgreSQL | CPU usage | 11% |
| PostgreSQL | Memory usage | 4.91Gi |
* Note that the measurements shown for the "provisioned" services above reflect a standalone (non-clustered) test environment only. Sizing for production will need to be determined based on your specific cluster requirements.
The client should consider creating both a test and a production system. It is recommended that the Redis, RabbitMQ, MongoDB and PostgreSQL components be provisioned outside of the Kubernetes cluster for performance purposes.
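As a rough planning aid, the pod figures in the example above can be totalled and compared with the capacity of the 3-node cluster. The sketch below assumes that the CPU and memory values are per-pod figures (the table does not say whether they are per pod or per service) and that each node provides 8 GiB of memory; the `per_pod` switch covers the other reading. It does not account for the provisioned services or for Kubernetes system overhead.

```python
# (pod count, CPU in millicores, memory in MiB), copied from the example table.
PODS = {
    "assure-identity": (2, 140, 1850),
    "assure-api": (2, 980, 2050),
    "audit": (1, 10, 90),
    "binarystorage": (1, 170, 210),
    "configuration": (1, 10, 150),
    "deployment": (1, 10, 100),
    "engineresource": (1, 10, 890),
    "license": (1, 10, 20),
    "management-api": (1, 10, 100),
    "reporting": (1, 10, 70),
    "reporting-api": (1, 10, 90),
    "transaction": (1, 260, 390),
    "voice-verifier": (2, 2460, 1290),
}


def total_usage(pods, per_pod=True):
    """Sum CPU (millicores) and memory (MiB) across all services.

    per_pod=True multiplies each figure by the pod count (an assumption);
    per_pod=False treats each figure as the total for that service.
    """
    cpu = sum((count if per_pod else 1) * cpu_m for count, cpu_m, _ in pods.values())
    mem = sum((count if per_pod else 1) * mem_mi for count, _, mem_mi in pods.values())
    return cpu, mem


# Cluster capacity from the example: 3 nodes, 8 CPUs and 8 GiB (assumed) each.
CLUSTER_CPU_M = 3 * 8 * 1000
CLUSTER_MEM_MI = 3 * 8 * 1024

cpu_m, mem_mi = total_usage(PODS)
print(f"CPU:    {cpu_m}m of {CLUSTER_CPU_M}m ({cpu_m / CLUSTER_CPU_M:.0%})")
print(f"Memory: {mem_mi}Mi of {CLUSTER_MEM_MI}Mi ({mem_mi / CLUSTER_MEM_MI:.0%})")
```

Under the per-pod assumption the services total 7670m CPU and 12490Mi memory, which leaves headroom on the example cluster for peaks and system components.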
Software
LumenVox software runs as containers and works well on Kubernetes infrastructure.
The following additional software components are necessary:
- RabbitMQ
- Redis
- MongoDB
- PostgreSQL
- Prometheus or a similar product, should the customer want to monitor their solution. Each container exposes a host of Prometheus metrics through its metrics endpoints, so our services plug into Prometheus (or other services) without additional integration work. Note, however, that we don’t specialize in setting up Prometheus itself.
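For customers who want to inspect the container metrics without a full Prometheus installation, the sketch below parses a small sample in the standard Prometheus text exposition format. The metric names, label names, and values in `SAMPLE` are hypothetical placeholders, not actual LumenVox metrics; in practice the text would be fetched from a container's metrics endpoint, whose address depends on the deployment.

```python
import re

# Hypothetical sample in Prometheus text exposition format (not real LumenVox metrics).
SAMPLE = """\
# HELP demo_requests_total Hypothetical request counter.
# TYPE demo_requests_total counter
demo_requests_total{service="assure-api",code="200"} 1027
demo_requests_total{service="assure-api",code="500"} 3
# HELP demo_queue_depth Hypothetical queue depth gauge.
# TYPE demo_queue_depth gauge
demo_queue_depth 12
"""

# metric_name, optional {labels}, then the sample value.
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if not match:
            continue
        name, labels, value = match.groups()
        samples.append((name, dict(LABEL.findall(labels or "")), float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

This is a simplified parser intended for ad hoc checks; a production monitoring setup would normally let Prometheus scrape the endpoints directly.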
Audio
Audio must be recorded in one of the following formats, then converted and submitted as a headerless, base64-encoded byte stream:
- Linear signed PCM - 16-bit 8kHz sample rate (PCM-16)
- A-law compressed - 8-bit 8kHz sample rate
- u-law (mu-law) compressed - 8-bit 8kHz sample rate
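The conversion for the PCM-16 case can be sketched with the Python standard library alone. The example below builds a short test tone as a WAV file in memory, strips the WAV header, and base64-encodes the raw samples. The helper names are illustrative, and the mono check is an assumption (the requirement above specifies only bit depth and sample rate); how the resulting string is submitted to the API is not shown.

```python
import base64
import io
import math
import struct
import wave


def make_test_wav(seconds=0.5, rate=8000):
    """Build a small mono 16-bit WAV file in memory (a 440 Hz tone) for the demo."""
    count = int(seconds * rate)
    samples = [int(12000 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(count)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)     # mono (assumption)
        wav.setsampwidth(2)     # 16-bit samples
        wav.setframerate(rate)  # 8 kHz
        wav.writeframes(struct.pack("<%dh" % count, *samples))
    return buf.getvalue()


def wav_to_headerless_base64(wav_bytes):
    """Drop the WAV header and return the raw PCM-16 samples as base64 text."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (1, 2, 8000):
            raise ValueError("expected mono, 16-bit, 8 kHz PCM audio, got %r" % (fmt,))
        raw = wav.readframes(wav.getnframes())
    return base64.b64encode(raw).decode("ascii")


wav_bytes = make_test_wav()
payload = wav_to_headerless_base64(wav_bytes)
print(len(wav_bytes), "WAV bytes ->", len(payload), "base64 characters")
```

Reading the frames through the `wave` module, rather than slicing off a fixed number of bytes, keeps the sketch correct for WAV files that carry extra header chunks.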
Model
The solution comes with an out-of-the-box model for voiceprint enrollment and verification. However, the best accuracy is achieved when the model is calibrated and tuned with the client’s own domain-specific data. Two options are available: the existing model can be recalibrated to adjust score thresholds, or it can be fine-tuned, in which case the model’s scoring back-end is retrained with client production audio. Calibration and fine-tuning can also be done prior to production should the client have either:
- Voice biometric recordings from an existing system or
- Collected audio samples via a data collection exercise
Other Software and Activities in a Production Environment
Some or all of the following activities will take place in a production environment and are the responsibility of the customer or partner:
- Provision of required hardware & host containerization software
- Installation of LumenVox software
- Management of Kubernetes, RabbitMQ, Redis, MongoDB, and PostgreSQL
- Monitoring of hardware, software, and services e.g., by using tools like Prometheus
- Monitoring of log files e.g., by using log analysis tools such as Datadog or Splunk
- Stress testing of the full solution in customer environment
- Set up and monitoring of network/component latency
- Database management including scheduling of database cleanups
- Backup management