Call Progress Analysis (CPA) Technical Reference
Overview: Industry-Leading Outbound Connectivity
Capacity Call Progress Analysis leverages the strength of proprietary signal processing and speech recognition to accurately determine whether a human or machine has answered the call, and in each case, whether it’s a business or residence. By utilizing statistically driven timing cues and next-generation orchestration, the system ensures that outbound messaging applications are informed of the precise next step—whether to hand off to a live agent or leave a personalized voicemail.
Strategic Impact: Efficiency and Customer Experience
In high-volume environments, CPA can relieve outbound call agents of the 65% of calls that are unproductive, while also sparing recipients the frustration of answering a call with no live agent on the line.
- Intelligent Discrimination: The system distinguishes between commercial/business responses, private residences, fax answers, and various Special Information Tones (SIT).
- Precision Timing: For appointment reminders and time-sensitive voice notices, the engine determines the precise timing to deliver a message—ensuring it is not delivered too early (cutting off the beginning) or too late (causing listener fatigue).
How Call Progress Analysis Works
Perfect timing and precision are core to the functionality, and partners can customize configurations for optimum results. The solution involves two complementary algorithms that can be used via separate API calls in parallel or in any order that suits your business.
The Two-Track Detection Engine
- The CPA Algorithm: Performs speech analysis to classify the answering party based on the greeting length and silence intervals. This classification is always performed post-connection, after the call has been answered.
- The AMD Algorithm: Deciphers tones (such as answering machine beeps) to detect the optimal timing for message delivery when interacting with a machine.
When configuring these settings, implementers must decide the acceptable balance between increasing CPA’s accuracy and the delays that responders on the line experience.
Standard Classification Outcomes
The engine analyzes the duration of the initial greeting following a "Connect" signal to categorize the call into one of four states:
| Classification | Greeting Length (statistical trigger) | Typical Example |
|---|---|---|
| Human Residence | Less than 1.8 seconds | "Hello?" |
| Human Business | 1.8 to 3.0 seconds | "Thank you for calling Capacity..." |
| Unknown Speech | Greater than 3.0 seconds | AI prompts or answering machine messages |
| Unknown Silence | No speech within timeout (typ. 5s) | Dead air or unconfigured machines |
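The thresholds in the table can be sketched as a small decision function. This is only an illustration of the published cut-offs, not the engine's actual algorithm: the function name, the enum, and the use of `None` to represent "no speech before the timeout" are assumptions made for the example, and the real engine also weighs silence intervals.

```python
from enum import Enum
from typing import Optional


class Classification(Enum):
    HUMAN_RESIDENCE = "Human Residence"
    HUMAN_BUSINESS = "Human Business"
    UNKNOWN_SPEECH = "Unknown Speech"
    UNKNOWN_SILENCE = "Unknown Silence"


def classify_greeting(greeting_seconds: Optional[float]) -> Classification:
    """Map a measured greeting length to one of the four table outcomes.

    greeting_seconds is None when no speech was heard before the
    silence timeout (a convention assumed for this sketch).
    """
    if greeting_seconds is None:
        return Classification.UNKNOWN_SILENCE
    if greeting_seconds < 1.8:
        return Classification.HUMAN_RESIDENCE
    if greeting_seconds <= 3.0:
        return Classification.HUMAN_BUSINESS
    return Classification.UNKNOWN_SPEECH


if __name__ == "__main__":
    for sample in (0.6, 2.4, 7.5, None):
        print(sample, "->", classify_greeting(sample).value)
```

Keeping the cut-offs in one function makes it easy to compare the documented defaults against whatever thresholds a deployment is actually tuned to.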
Advanced Orchestration (Capacity Private Cloud 7.0+)
Release 7.0 introduces specialized logic to navigate the "purgatory" of modern AI call screening assistants like Apple and Google Call Screening.
Prompt End Detection (prompt_end_detect)
To prevent audio "collisions" where your message overlaps with an AI assistant's prompt, v7.0 introduces an orchestration milestone:
- Workflow: Once the engine identifies Unknown Speech as an AI prompt, the application can set prompt_end_detect to true.
- Notification: The engine notifies the application of a "PROMPT END" event in the transcript value of the asrResult object, exactly when the AI assistant stops speaking.
- Impact: This allows the application to deliver its message during the critical ~20-second window for clear mobile transcription.
To utilize the orchestration features, ensure your client application manages the cpaSettings object:
{
  "cpaSettings": {
    "prompt_end_detect": true,
    "prompt_end_timeout_ms": 10000
  }
}
Note: The default prompt_end_timeout_ms is 10000ms (10 seconds) if omitted, which is the amount of time the system will wait for the prompt end to be detected before returning a "PROMPT_END_TIMEOUT" notification. Applications can configure longer timeouts if they prefer.
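A client might assemble this object with a small helper so that the documented default is applied in one place. The sketch below is an assumption about client-side code, not part of the product API: `build_cpa_settings` is a hypothetical name, and rejecting non-positive timeouts is a validation rule chosen for the example.

```python
import json
from typing import Optional

# Default documented in the note above; used when the caller omits the timeout.
DEFAULT_PROMPT_END_TIMEOUT_MS = 10000


def build_cpa_settings(prompt_end_detect: bool = True,
                       prompt_end_timeout_ms: Optional[int] = None) -> dict:
    """Return a cpaSettings payload with the documented default filled in."""
    if prompt_end_timeout_ms is None:
        prompt_end_timeout_ms = DEFAULT_PROMPT_END_TIMEOUT_MS
    if prompt_end_timeout_ms <= 0:
        # Assumed sanity check; the source does not state a valid range.
        raise ValueError("prompt_end_timeout_ms must be positive")
    return {
        "cpaSettings": {
            "prompt_end_detect": prompt_end_detect,
            "prompt_end_timeout_ms": prompt_end_timeout_ms,
        }
    }


if __name__ == "__main__":
    # A longer timeout, as the note allows.
    print(json.dumps(build_cpa_settings(prompt_end_timeout_ms=15000), indent=2))
```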
Apple Screening Tone Detection
For Apple devices, a specific tone plays after a secondary prompt, signaling that the user is currently viewing the live transcript on their screen while deciding whether to answer the call.
- The Handoff: This allows the application to re-engage standard CPA to catch the exact moment a human interrupts the screening, facilitating a rapid, professional connection to a live agent.
Core Timing Parameters
Precision in outbound dialing depends on three critical timing variables:
- Leading Silence: The pause before the recipient begins speaking.
- VAD_EOS_DELAY (End of Speech Delay): The buffer used to confirm speech has ended. The default is 1200ms. Shorter values increase speed but may misclassify machine pauses as human.
- Maximum Silence Timeout: The threshold after which a call is classified as Unknown Silence. Setting this too low can lead to misclassifying slow-to-speak humans.
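The trade-off behind VAD_EOS_DELAY can be illustrated with a toy model. The sketch assumes speech arrives as sorted, non-overlapping `(start_ms, end_ms)` segments and that end of speech is declared once silence has lasted the full delay. The function names and the sample timings are hypothetical, and the real engine's voice activity detection is certainly more involved.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms) of detected speech


def end_of_speech_ms(segments: List[Segment], vad_eos_delay_ms: int = 1200) -> int:
    """Time at which end of speech is declared for the first utterance."""
    if not segments:
        raise ValueError("at least one speech segment is required")
    for (_, end), (next_start, _) in zip(segments, segments[1:]):
        # A pause at least as long as the delay ends the utterance.
        if next_start - end >= vad_eos_delay_ms:
            return end + vad_eos_delay_ms
    return segments[-1][1] + vad_eos_delay_ms


def greeting_length_ms(segments: List[Segment], vad_eos_delay_ms: int = 1200) -> int:
    """Measured greeting length: first speech onset to last speech before the cut."""
    eos = end_of_speech_ms(segments, vad_eos_delay_ms)
    return eos - vad_eos_delay_ms - segments[0][0]


if __name__ == "__main__":
    # Hypothetical machine greeting with an 800 ms pause in the middle.
    machine = [(200, 1500), (2300, 6000)]
    print(greeting_length_ms(machine, 1200))  # pause is bridged
    print(greeting_length_ms(machine, 500))   # pause ends the greeting early
```

With the default 1200 ms delay the pause is bridged and the greeting measures 5800 ms. With a 500 ms delay the same audio is cut at the pause and measures 1300 ms, which falls in the Human Residence range of the classification table.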
Typical Integration Workflows
Call Progress Analysis has two primary use cases: delivering an automated message or payload, and connecting a call center agent to a live human while avoiding answering machines and otherwise non-responsive numbers.
Automated Message Delivery
Some applications are designed to deliver messages to recipients. In this use case, CPA determines whether a live human, an answering machine, or an AI agent answered the call. In each scenario, the recorded (or synthesized) message can be delivered efficiently using CPA.
Once the call has begun, the outbound system listens briefly for a greeting (from a human or machine) before deciding how to proceed. If a human is detected, the message can be played as desired. If a machine is detected, the call flow can wait until speech ends before delivering the message, and it can allow much longer for this than it would when performing predictive dialing for an agent.
Tone detection can be used to listen for an answering machine beep or tone while the message is being delivered; if one is detected, the call flow can simply restart the message. In this way, the answering machine records the entire intended message rather than a truncated one. This matters most when complete delivery is important, as with an appointment reminder or an emergency alert that may not be conveyed accurately if CPA is not used fully.
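The restart-on-beep behavior can be modeled with a few lines of arithmetic. This is a simplified sketch under stated assumptions: playback starts at time 0, each beep is an instantaneous event, and the message restarts at the moment the beep is detected. The function name and timings are illustrative only.

```python
from typing import List


def playback_finish_ms(message_ms: int, beep_times_ms: List[int]) -> int:
    """Return when a complete, uninterrupted copy of the message finishes.

    Playback begins at 0 ms. Any beep detected while the message is
    playing restarts it from the beginning at that moment; beeps that
    arrive after playback has completed are ignored.
    """
    start = 0
    for beep in sorted(beep_times_ms):
        if start <= beep < start + message_ms:
            start = beep  # restart so the machine records the full message
    return start + message_ms


if __name__ == "__main__":
    # An 8-second message interrupted by a beep 5 seconds in.
    print(playback_finish_ms(8000, [5000]))
```

In this model a beep at 5000 ms during an 8000 ms message pushes completion to 13000 ms, which is the cost of guaranteeing that the recording contains the whole message.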
- Start CPA: Call is answered.
- Detection: System identifies Unknown Speech (AI prompt).
- Enable prompt_end_detect: Application sets this parameter to true and re-activates CPA.
- Wait for PROMPT END: Engine signals when the AI has finished its prompt.
- Deliver Payload: Application plays message for transcription.
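The steps above can be sketched as a small event-driven function. The classification strings and the "PROMPT END" and "PROMPT_END_TIMEOUT" values come from this document; everything else is an assumption made for illustration, including the action names, the delivery of notifications as plain strings, and the choice to treat every Unknown Speech result as an AI prompt.

```python
from typing import Iterable, List

HUMAN_OUTCOMES = {"Human Residence", "Human Business"}


def message_delivery_actions(events: Iterable[str]) -> List[str]:
    """Translate a stream of engine notifications into application actions."""
    actions = ["start_cpa"]
    awaiting_prompt_end = False
    for event in events:
        if awaiting_prompt_end:
            if event == "PROMPT END":
                actions.append("deliver_payload")
                break
            if event == "PROMPT_END_TIMEOUT":
                # Fallback is application-specific; the name is hypothetical.
                actions.append("handle_timeout")
                break
        elif event in HUMAN_OUTCOMES:
            actions.append("deliver_payload")
            break
        elif event == "Unknown Speech":
            # Assumed here to be an AI screening prompt.
            actions.append("enable_prompt_end_detect")
            actions.append("restart_cpa")
            awaiting_prompt_end = True
    return actions


if __name__ == "__main__":
    print(message_delivery_actions(["Unknown Speech", "PROMPT END"]))
```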
Predictive Dialer
The "Predictive Dialer" use case is designed to maximize agent productivity by ensuring they are only connected to live, responsive human contacts. By utilizing Capacity Private Cloud's statistically-driven CPA, the system filters out the 65% of outbound calls that are typically unproductive, such as those resulting in ringing phones, busy signals, faxes, or SIT tones. This ensures that agents spend their valuable time speaking with intended recipients rather than waiting for connections.
When a call is initiated, the system analyzes the answering party's initial greeting with pinpoint accuracy to distinguish between a human residence, a business, or an automated machine. If a human is detected, the dialer bridges the call to a live agent immediately, allowing the connection to occur within the critical two-second window following the greeting. In scenarios involving modern AI gatekeepers, such as Apple Call Screening, the engine identifies the specific "Decision Milestone" tone. This notifies the system that the user is actively viewing the transcript, allowing the application to re-engage CPA to catch the exact moment the human interrupts to say "Hello" and route the call to an agent without delay.
This precision-based approach avoids the frustration of "robocalls" where no agent is present upon answering. If the system determines that a machine or AI agent has answered instead of a human, the dialer can be configured to either disconnect or transition to an automated message flow, preserving the agent's availability for the next live connection. Through this advanced orchestration, contact centers can dramatically reduce idle time and operational costs while maintaining a professional and compliant customer experience.
- Start CPA: Engine monitors for a greeting.
- Detection: System identifies Human Residence.
- Action: Application bridges the live agent immediately (within the 2-second regulatory window).
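A dialer's routing decision might look like the following sketch. It assumes that both human classifications are bridged to an agent and that every other outcome follows the configured machine policy. The function names, action strings, and the `on_machine` parameter are hypothetical; the 2000 ms default reflects the two-second window described above.

```python
HUMAN_OUTCOMES = {"Human Residence", "Human Business"}
MACHINE_POLICIES = {"disconnect", "automated_message"}


def dialer_action(classification: str, on_machine: str = "disconnect") -> str:
    """Choose the next step for a predictive dialer from a CPA result."""
    if on_machine not in MACHINE_POLICIES:
        raise ValueError(f"unknown machine policy: {on_machine}")
    if classification in HUMAN_OUTCOMES:
        return "bridge_agent"
    # Unknown Speech and Unknown Silence are treated as non-human here,
    # which is an assumption of this sketch.
    return on_machine


def bridged_in_time(greeting_end_ms: int, bridge_ms: int,
                    window_ms: int = 2000) -> bool:
    """Check that the agent was bridged within the window after the greeting."""
    return 0 <= bridge_ms - greeting_end_ms <= window_ms


if __name__ == "__main__":
    print(dialer_action("Human Residence"))
    print(dialer_action("Unknown Speech", on_machine="automated_message"))
    print(bridged_in_time(greeting_end_ms=1500, bridge_ms=2800))
```

Logging `bridged_in_time` per call gives a simple way to monitor how often connections land inside the window.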
