G10L25/93

Voice processing method, apparatus, electronic device, and storage medium

Provided in the present disclosure are a voice processing method, an apparatus, an electronic device, and a storage medium. The method comprises: detecting the working state of a current call system and, when the working state is a two-end speaking state or a remote-end speaking state, performing compression processing on a subsequent remote-end voice signal; acquiring a near-end voice signal by means of a microphone; performing echo processing on the basis of the near-end voice signal and the compression-processed remote-end voice signal to obtain an echo-processed near-end voice signal and a residual echo signal; performing non-linear suppression processing on the near-end voice signal and the residual echo signal; and performing gain control on the suppression-processed near-end voice signal.
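
The stages the abstract names (far-end compression during two-end or remote-end speech, echo subtraction, nonlinear suppression, gain control) can be sketched in plain Python. The compression curve, the 0.3 linear echo-path coefficient, and the peak-normalizing gain stage below are all illustrative assumptions, not the patent's actual processing:

```python
def compress(signal, threshold=0.5, ratio=4.0):
    """Simple static compressor: attenuate sample magnitudes above the
    threshold by the given ratio (an assumed compression scheme)."""
    out = []
    for s in signal:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if s >= 0 else -mag)
    return out

def process_call(near_end, far_end, state="two-end"):
    """Toy version of the described pipeline, operating on sample lists."""
    if state in ("two-end", "remote-end"):           # speaking states that trigger compression
        far_end = compress(far_end)
    # Crude linear echo estimate subtracted from the near-end signal.
    residual = [n - 0.3 * f for n, f in zip(near_end, far_end)]
    # Stand-in for nonlinear residual-echo suppression.
    suppressed = compress(residual, threshold=0.8, ratio=8.0)
    # Simple gain control: peak normalization.
    peak = max((abs(s) for s in suppressed), default=0.0) or 1.0
    return [s / peak for s in suppressed]
```

A real implementation would use an adaptive filter for the echo estimate and a frequency-domain suppressor; this sketch only mirrors the order of operations.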

METHOD AND SYSTEM FOR TIME AND FEATURE MODIFICATION OF SIGNALS

The application relates to a computer-implemented method and system for modifying at least one feature of an input audio signal based on features in a guide audio signal. The method comprises: determining matchable and unmatchable sections of the guide and input audio signals; generating a time-alignment path for modifying the at least one feature of the input audio signal in the matchable sections based on corresponding features in the matchable sections of the guide audio signal; and, based on the time-alignment path, modifying the at least one feature in the matchable sections of the input audio signal.
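
A time-alignment path between two feature sequences is classically computed with dynamic time warping (DTW); the sketch below is a generic DTW on 1-D features, offered as an assumed stand-in for the patent's alignment step, not its actual algorithm:

```python
def dtw_path(guide, inp):
    """Return a monotonic alignment path [(guide_idx, input_idx), ...]
    between two 1-D feature sequences via dynamic time warping."""
    n, m = len(guide), len(inp)
    INF = float("inf")
    # Accumulated-cost matrix with a one-cell border for the base case.
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(guide[i - 1] - inp[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return path[::-1]
```

In the patent's framing, such a path would be computed only over the matchable sections, leaving unmatchable sections unmodified.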

Machine learning based call routing system

Machine learning technology can analyze, in real time, the data from a call between a person and a customer service representative. Based on this analysis, a server can determine a sentiment score that describes the sentiment expressed by the person or the customer service representative. If the server determines that the sentiment score is less than or equal to a predetermined value, the server can inform the customer service representative's manager so that the manager can take further action to help the person and/or the customer service representative.
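
The escalation rule itself is simple to state in code. The sketch below assumes a per-utterance sentiment score in [-1, 1] produced by some upstream model (not specified in the abstract) and flags every point where the score drops to or below the predetermined value:

```python
def route_alerts(sentiment_scores, threshold=0.3):
    """Flag call moments whose sentiment score is <= threshold for
    manager escalation. Scores and the 0.3 default are illustrative."""
    alerts = []
    for t, score in enumerate(sentiment_scores):
        if score <= threshold:
            alerts.append({"time_index": t,
                           "score": score,
                           "action": "notify_manager"})
    return alerts
```

In practice the notification step would call a messaging or dashboard API rather than return a list.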

APPARATUS AND METHOD FOR SEPARATING VOICE SECTIONS FROM EACH OTHER

The present disclosure relates to an apparatus and method for separating voice sections from each other. Various embodiments are directed to providing an apparatus and method for separating voice sections from each other, which can maximize speaker separation performance for a short voice section by dividing a short voice section having low speaker separation reliability and separating multiple speakers from one another.
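The two operations the abstract names, dividing a short section into sub-sections and assigning speakers to each, can be sketched as windowing plus nearest-centroid labeling. The window and hop sizes, the use of cosine similarity, and the centroid representation are all assumptions for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1e-9
    nb = math.sqrt(sum(y * y for y in b)) or 1e-9
    return dot / (na * nb)

def divide_section(frames, win=4, hop=2):
    """Split a short voice section into overlapping sub-sections so each
    can be scored for speaker identity separately (assumed windowing)."""
    return [frames[i:i + win] for i in range(0, max(len(frames) - win + 1, 1), hop)]

def separate(sub_embeddings, speaker_centroids):
    """Label each sub-section embedding with its nearest speaker centroid."""
    return [max(range(len(speaker_centroids)),
                key=lambda k: cosine(e, speaker_centroids[k]))
            for e in sub_embeddings]
```

The intuition matches the abstract: sub-sections shorter than the original section each get their own speaker decision, rather than one unreliable decision for the whole short section.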

Textual echo cancellation
11482244 · 2022-10-25

A method includes receiving an overlapped audio signal that includes audio spoken by a speaker that overlaps a segment of synthesized playback audio. The method also includes encoding a sequence of characters that correspond to the synthesized playback audio into a text embedding representation. For each character in the sequence of characters, the method also includes generating a respective cancelation probability using the text embedding representation. The cancelation probability indicates a likelihood that the corresponding character is associated with the segment of the synthesized playback audio overlapped by the audio spoken by the speaker in the overlapped audio signal.
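
The per-character scoring step can be illustrated with a toy rule: a sigmoid of the dot product between each character's text embedding and a summary vector of the overlapped audio segment. This scoring function and the embedding shapes are assumptions; the patent describes a trained model, not this formula:

```python
import math

def cancelation_probabilities(char_embeddings, overlap_summary):
    """For each character embedding, return a probability in (0, 1) that the
    character belongs to the overlapped playback segment (assumed scoring)."""
    probs = []
    for emb in char_embeddings:
        score = sum(a * b for a, b in zip(emb, overlap_summary))
        probs.append(1.0 / (1.0 + math.exp(-score)))  # sigmoid
    return probs
```

Characters with high probability would then be treated as playback audio to cancel from the overlapped signal.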

Customizing Computer Generated Dialog for Different Pathologies

A computer-generated dialog session is customized for a user having a pathology characterized at least in part by a speech pathology. The user's speech is analyzed for spans of speech whose starts and ends satisfy predetermined time thresholds. Customization occurs by altering at least one of the following configurable parameters: (a) a threshold minimum signal strength of speech (dB) to consider as the start of a span of speech; (b) an adjustment factor by which the signal strength of background noise increases between consecutive spans of speech; (c) a threshold between the signal strength during a span of speech and the signal strength during a span of non-speech; (d) a start speech time threshold; and (e) an end speech time threshold.
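
A frame-level detector driven by parameters like (a), (d), and (e) can be sketched directly: a span starts after enough consecutive frames above a start level, and ends after enough consecutive frames below an end level. The parameter names, dB defaults, and frame counts below are assumptions, not the patent's values:

```python
def detect_spans(levels_db, start_db=-40.0, end_db=-45.0,
                 start_frames=3, end_frames=5):
    """Return (start, end) frame-index pairs for detected speech spans.
    start_db ~ parameter (a); start_frames/end_frames ~ the start/end
    speech time thresholds (d) and (e), expressed in frames."""
    spans, start, above, below = [], None, 0, 0
    for i, lvl in enumerate(levels_db):
        if start is None:
            # Waiting for speech: count consecutive frames above start_db.
            above = above + 1 if lvl >= start_db else 0
            if above >= start_frames:
                start, below = i - start_frames + 1, 0
        else:
            # In speech: count consecutive frames below end_db.
            below = below + 1 if lvl < end_db else 0
            if below >= end_frames:
                spans.append((start, i - end_frames))
                start, above = None, 0
    if start is not None:
        spans.append((start, len(levels_db) - 1))
    return spans
```

Customizing for a pathology would amount to tuning these arguments per user, e.g. lengthening `end_frames` for users with long intra-utterance pauses.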

System and method for virtual assistant situation commentary

Techniques for virtual assistant situation commentary are provided. At least one image frame of a field of view (FOV) of a camera may be received, the at least one image frame intended to be sent to at least one participant of a talk group. A description associated with each element of a plurality of elements within the FOV of the camera may be generated. It may be determined that the at least one participant of the talk group is not currently visually engaged. Audio communication of a sender of the at least one image frame may be monitored to identify a reference to an element of the plurality of elements. The audio communication may be supplemented to include portions of the description of the element that were not included in the audio communication from the sender when it is determined that the at least one participant is not visually engaged.
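
The final supplementation step can be illustrated with toy string logic: when the listener is not visually engaged, append the parts of each referenced element's description that the sender did not say. The dictionary format and substring matching below are assumptions for illustration; the patent operates on audio communication, not text:

```python
def supplement(spoken_text, element_descriptions, engaged):
    """Append unspoken description parts for elements the sender referenced,
    but only if the participant is not visually engaged."""
    if engaged:
        return spoken_text  # participant can see the frame; nothing to add
    spoken_lower = spoken_text.lower()
    extras = []
    for name, desc_parts in element_descriptions.items():
        if name.lower() in spoken_lower:  # sender referenced this element
            missing = [p for p in desc_parts if p.lower() not in spoken_lower]
            if missing:
                extras.append(f"{name}: " + ", ".join(missing))
    return spoken_text + (" [" + "; ".join(extras) + "]" if extras else "")
```

A deployed system would use speech recognition on the sender's audio and text-to-speech for the supplement rather than string concatenation.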