G10L2025/906

Machine learning based call routing system

Machine learning technology can analyze, in real time, the data from a call between a person and a customer service representative. Based on this analysis, a server can determine a sentiment score that describes a sentiment expressed by the person or the customer service representative. If the server determines that the sentiment score is less than or equal to a pre-determined value, the server can inform the customer service representative's manager so that the manager can take further action to help the person and/or the customer service representative.
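
The escalation rule in this abstract can be sketched as below. This is only an illustration: the `CallSnapshot` type, the score scale (lower meaning more negative), and the default threshold of -0.5 are assumptions, and the abstract does not say how the score itself is computed.

```python
from dataclasses import dataclass


@dataclass
class CallSnapshot:
    """Hypothetical record of one in-progress call."""
    call_id: str
    sentiment_score: float  # assumed scale: lower means more negative


def should_notify_manager(snapshot, threshold):
    """Return True when the score is less than or equal to the threshold."""
    return snapshot.sentiment_score <= threshold


def calls_to_escalate(snapshots, threshold=-0.5):
    """List the call ids whose manager should be informed."""
    return [s.call_id for s in snapshots if should_notify_manager(s, threshold)]
```

A score exactly equal to the threshold triggers a notification, matching the "less than or equal to" wording of the abstract.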

Extracting content from speech prosody

A prosodic speech recognition engine configured to identify prosodic features and patterns in a speech continuum for the extraction of linguistic content including para-syntactic content, discourse function, information structure, meaning, and speaker sentiment.

Estimating Pitch of Harmonic Signals
20170294196 · 2017-10-12

A time-varying pitch of a signal may be estimated by processing a sequence of frames of the speech signal. An estimated fractional chirp rate may be computed for each frame of the sequence of frames, and the estimated fractional chirp rates may be used to compute a pitch template for the sequence, where the pitch template indicates the time-varying pitch of the signal subject to a scale factor. A first pitch estimate for each frame of the sequence of frames may be computed by computing a scale factor and multiplying the pitch template by the scale factor. A second pitch estimate may be computed from the first pitch estimate by identifying peaks in the frequency representations using the first pitch estimates and fitting a parametric function to the peaks.
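
The first-stage estimate can be sketched as follows, assuming the fractional chirp rate is the time derivative of pitch divided by pitch, so that integrating it gives log-pitch up to an additive constant. The least-squares fit of the scale factor against rough per-frame pitch values is an assumption made for illustration; the abstract does not say how the scale factor is computed, and the second, peak-fitting stage is not shown.

```python
import math


def pitch_template(chirp_rates, frame_step):
    """Integrate fractional chirp rates (1/s) into a template that starts at 1.

    The template follows the pitch contour up to an unknown scale factor.
    """
    template = [1.0]
    log_ratio = 0.0
    for rate in chirp_rates[:-1]:
        log_ratio += rate * frame_step  # d(ln f) = chirp_rate * dt
        template.append(math.exp(log_ratio))
    return template


def fit_scale(template, rough_pitches):
    """Least-squares scale factor (assumed method, not from the abstract)."""
    num = sum(t * p for t, p in zip(template, rough_pitches))
    den = sum(t * t for t in template)
    return num / den


def first_pitch_estimate(chirp_rates, frame_step, rough_pitches):
    """Multiply the pitch template by the fitted scale factor."""
    template = pitch_template(chirp_rates, frame_step)
    scale = fit_scale(template, rough_pitches)
    return [scale * t for t in template]
```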

METHODS AND APPARATUSES FOR TRACKING WEAK SIGNAL TRACES
20220051075 · 2022-02-17

Systems, methods, apparatuses, and computer program products for tracking weak signal traces under severe noise and/or distortions. A method may include tracking at least one candidate frequency trace from a time-frequency representation of a signal. The method may also include identifying a frequency trace of the signal based on tracking results. In addition, the method may include outputting an estimated frequency vector related to the frequency trace. Further, the tracking may be performed under noisy conditions.
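
The abstract does not name a tracking algorithm. As a rough illustration of pulling one frequency trace out of a noisy time-frequency representation, the sketch below uses a generic dynamic-programming ridge tracker. This technique is swapped in for the example and is not necessarily the claimed method; `track_trace` and `jump_penalty` are hypothetical names.

```python
def track_trace(spectrogram, jump_penalty=1.0):
    """Return one frequency-bin index per frame.

    spectrogram[t][f] is the energy at frame t and bin f. The path maximizes
    total energy minus a penalty proportional to bin jumps between frames.
    """
    n_bins = len(spectrogram[0])
    score = list(spectrogram[0])
    back = []  # back[t - 1][f] = best previous bin for bin f at frame t
    for frame in spectrogram[1:]:
        new_score, pointers = [], []
        for f in range(n_bins):
            best_prev = max(
                range(n_bins),
                key=lambda p: score[p] - jump_penalty * abs(p - f),
            )
            gain = score[best_prev] - jump_penalty * abs(best_prev - f)
            new_score.append(gain + frame[f])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final bin to recover the whole trace.
    f = max(range(n_bins), key=lambda b: score[b])
    path = [f]
    for pointers in reversed(back):
        f = pointers[f]
        path.append(f)
    path.reverse()
    return path
```

With a small made-up input such as `[[0, 1, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 1.5], [0, 0, 1, 0, 0]]`, a per-frame maximum would jump to the isolated spike in bin 4 at the third frame, while the penalized path stays on the smooth trace `[1, 1, 2, 2]`.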

Dynamically adapted pitch correction based on audio input

Systems and methods for adjusting pitch of an audio signal include detecting input notes in the audio signal, mapping the input notes to corresponding output notes, each output note having an associated upper note boundary and lower note boundary, and modifying at least one of the upper note boundary and the lower note boundary of at least one output note in response to previously received input notes. Pitch of the input notes may be shifted to match an associated pitch of corresponding output notes. Delay of the pitch shifting process may be dynamically adjusted based on detected stability of the input notes.
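
One way to picture note boundaries that move in response to earlier input is simple hysteresis: once an output note has been selected, its upper and lower boundaries are widened so that small drifts do not flip to the neighbouring note. The sketch below assumes equal-tempered semitones as output notes and a hypothetical `widen` amount in semitones. The abstract does not prescribe this particular rule, and the dynamic delay adjustment is not shown.

```python
import math


def hz_to_midi(freq_hz):
    """Convert frequency to a fractional MIDI note number (A4 = 69 = 440 Hz)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def midi_to_hz(note):
    """Convert a MIDI note number back to frequency in Hz."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)


class AdaptiveCorrector:
    """Snap input pitch to semitones, with boundaries widened around the last note."""

    def __init__(self, widen=0.2):
        self.widen = widen      # extra boundary width in semitones (assumed)
        self.last_note = None   # most recently selected output note

    def correct(self, freq_hz):
        pitch = hz_to_midi(freq_hz)
        if self.last_note is not None:
            lower = self.last_note - 0.5 - self.widen
            upper = self.last_note + 0.5 + self.widen
            if lower <= pitch < upper:
                return midi_to_hz(self.last_note)  # stay on the previous note
        self.last_note = round(pitch)  # default boundaries: nearest semitone
        return midi_to_hz(self.last_note)
```

With `widen=0.2`, an input 0.6 semitones above A4 that follows an A4 is still corrected to A4, while an input 0.8 semitones above crosses the widened boundary and is corrected to the next semitone.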

SPEECH ENHANCEMENT USING CLUSTERING OF CUES
20220148611 · 2022-05-12

A method for speech enhancement may include receiving or generating sound samples that represent sound signals that were received during a given time period by an array of microphones; frequency transforming the sound samples to provide frequency-transformed samples; clustering the frequency-transformed samples to speakers to provide speaker related clusters, wherein the clustering is based on (i) spatial cues related to the received sound signals and (ii) acoustic cues related to the speakers; determining a relative transfer function for each speaker of the speakers to provide speakers related relative transfer functions; applying a multiple input multiple output (MIMO) beamforming operation on the speakers related relative transfer functions to provide beamformed signals; and inverse-frequency transforming the beamformed signals to provide speech signals.

PREDICTING GLOTTAL INSUFFICIENCY USING FREQUENCY ANALYSIS
20220028416 · 2022-01-27

A system comprising at least one hardware processor and a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to: receive a voice recording comprising a phonation by a subject, analyze said voice recording to calculate a fundamental frequency contour curve of said phonation, measure at least one of (i) a time period from a start of said phonation until said contour curve reaches a settled level, (ii) a slope of said contour curve during said time period, and (iii) an area under said contour curve during said time period, and determine a glottal closure insufficiency in said subject based, at least in part, on said measuring.
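
The three measurements named above can be sketched on a sampled fundamental frequency contour. The definition of the "settled level" (mean of the final quarter of the contour) and the 2% tolerance are assumptions for illustration only. The abstract does not define them, and no decision rule for glottal closure insufficiency is given here because the abstract does not state one.

```python
def onset_features(times, f0, tolerance=0.02):
    """Measure settle time, slope, and area of an F0 contour onset.

    times: sample times in seconds; f0: fundamental frequency in Hz.
    """
    tail = f0[-max(1, len(f0) // 4):]
    settled = sum(tail) / len(tail)  # assumed definition of the settled level
    # Walk backwards to find the earliest sample after which the contour
    # stays within the tolerance band around the settled level.
    settle_idx = len(f0) - 1
    for i in range(len(f0) - 1, -1, -1):
        if abs(f0[i] - settled) > tolerance * settled:
            break
        settle_idx = i
    duration = times[settle_idx] - times[0]
    slope = (f0[settle_idx] - f0[0]) / duration if duration > 0 else 0.0
    # Trapezoidal area under the contour from the start to the settle time.
    area = sum(
        0.5 * (f0[k] + f0[k + 1]) * (times[k + 1] - times[k])
        for k in range(settle_idx)
    )
    return {"settle_time": duration, "slope": slope, "area": area}
```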

Real-time pitch tracking by detection of glottal excitation epochs in speech signal using Hilbert envelope

A technique, suitable for real-time processing, is disclosed for pitch tracking by detection of glottal excitation epochs in a speech signal. It uses the Hilbert envelope to enhance the saliency of the glottal excitation epochs and to reduce the ripples due to the vocal tract filter. The processing comprises the steps of dynamic range compression, calculation of the Hilbert envelope, and epoch marking. The Hilbert envelope is calculated using the output of an FIR-filter-based Hilbert transformer and the delay-compensated signal. The epoch marking uses a dynamic peak detector with fast rise and slow fall and nonlinear smoothing to further enhance the saliency of the epochs, followed by a differentiator or a Teager energy operator, and amplitude-duration thresholding. The technique is meant for use in speech codecs, voice conversion, speech and speaker recognition, diagnosis of voice disorders, speech training aids, and other applications involving pitch estimation.
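
Two of the steps named above, the FIR-based Hilbert envelope and a peak detector with fast rise and slow fall, can be sketched as follows. The 31-tap Hamming-windowed design and the `fall` decay factor are assumptions chosen for the example. Dynamic range compression, nonlinear smoothing, the differentiator or Teager energy operator, and amplitude-duration thresholding are not shown.

```python
import math


def hilbert_fir(num_taps=31):
    """Odd-length (type III) FIR Hilbert transformer with a Hamming window."""
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 / (math.pi * k) if k % 2 else 0.0  # zero at even offsets
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def hilbert_envelope(signal, num_taps=31):
    """Envelope from the FIR output and the delay-compensated input."""
    taps = hilbert_fir(num_taps)
    delay = num_taps // 2  # group delay of the linear-phase FIR filter
    envelope = []
    for n in range(num_taps - 1, len(signal)):
        quadrature = sum(taps[k] * signal[n - k] for k in range(num_taps))
        in_phase = signal[n - delay]  # input delayed to align with the filter
        envelope.append(math.hypot(in_phase, quadrature))
    return envelope


def peak_track(envelope, fall=0.99):
    """Dynamic peak detector: jump up immediately, decay slowly otherwise."""
    tracked, level = [], 0.0
    for value in envelope:
        level = value if value > level else level * fall
        tracked.append(level)
    return tracked
```

For a unit-amplitude sinusoid in the middle of the band, the envelope from this filter stays close to 1, which is a quick sanity check on the delay compensation.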

Method, System and Apparatus for Understanding and Generating Human Conversational Cues
20220115001 · 2022-04-14

A voice-based digital assistant (VDA) uses a conversation intelligence (CI) manager module having a rule-based engine on conversational intelligence to process information from one or more modules to make determinations on both i) understanding the human conversational cues and ii) generating the human conversational cues, including at least understanding and generating a backchannel utterance, in a flow and exchange of human communication in order to at least one of grab or yield a conversational floor between a user and the VDA. The CI manager module uses the rule-based engine to analyze and make a determination on a conversational cue of, at least, prosody in a user's flow of speech to generate the backchannel utterance to signal any of i) an understanding, ii) a correction, iii) a confirmation, and iv) a questioning of verbal communications conveyed by the user in the flow of speech during a time frame when the user still holds the conversational floor.