G10L15/12

AUDIO DATA PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, MEDIUM AND PROGRAM PRODUCT
20240038251 · 2024-02-01

An audio data processing method is provided. The method includes: obtaining human voice audio data to be adjusted and reference human voice audio data; framing each of the two to obtain a first audio frame set and a second audio frame set, respectively; recognizing the pronunciation unit corresponding to each audio frame; determining, based on the timestamp of each audio frame, a timestamp of each pronunciation unit in the human voice audio data to be adjusted and in the reference human voice audio data; and adjusting the timestamp of at least one pronunciation unit so that the timestamp of that pronunciation unit in the human voice audio data to be adjusted is consistent with the timestamp of the corresponding pronunciation unit in the reference human voice audio data.
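
The step from per-frame labels to per-unit timestamps can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that a recognizer has already assigned one pronunciation-unit label to every frame, that frames are spaced at a fixed 10 ms shift, and that both recordings contain the same unit sequence. The function names `unit_timestamps` and `stretch_factors` are hypothetical.

```python
from itertools import groupby


def unit_timestamps(frame_labels, frame_shift_ms=10):
    """Collapse per-frame labels into (unit, start_ms, end_ms) segments.

    Assumes frame i covers [i * frame_shift_ms, (i + 1) * frame_shift_ms).
    """
    segments = []
    index = 0
    for unit, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((unit, index * frame_shift_ms, (index + length) * frame_shift_ms))
        index += length
    return segments


def stretch_factors(to_adjust, reference):
    """Per-unit duration ratio that maps each unit onto its reference counterpart.

    Assumes a one-to-one match between the two unit sequences (an assumption
    of this sketch, not a statement from the abstract).
    """
    if [u for u, _, _ in to_adjust] != [u for u, _, _ in reference]:
        raise ValueError("unit sequences differ")
    return [
        (unit, (r_end - r_start) / (a_end - a_start))
        for (unit, a_start, a_end), (_, r_start, r_end) in zip(to_adjust, reference)
    ]
```

A factor above 1 means the unit would have to be lengthened to match the reference, and a factor below 1 means it would have to be shortened. How the audio itself is then time-stretched is outside this sketch.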

DIALOGUE SYSTEM AND DIALOGUE PROCESSING METHOD

A dialogue system for a vehicle may include: an input processor configured to receive a user's utterance, to acquire an utterance text by recognizing the user's utterance, to recognize a dialogue subject based on the acquired utterance text, and to identify the user; and a dialogue manager including a memory storing program instructions and a processor configured to execute the stored program instructions, the dialogue manager configured to verify whether a chat room related to the dialogue subject is present, and to determine whether to add the identified user as a participant of the chat room based on a result of the verification.

Method and apparatus for searching for geographic information using interactive voice recognition

An apparatus for searching for geographic information using interactive voice recognition includes: a receiver configured to receive a voice signal; a voice recognition unit configured to recognize the voice signal; a result analysis processing unit configured to search for geographic information on the basis of the recognized voice signal, and analyze a search result of the geographic information; and a question generating unit configured to generate a question in response to the result of determination. A method for searching for geographic information using interactive voice recognition includes: receiving a voice signal, and recognizing the voice signal; searching for geographic information on the basis of the recognized voice signal; analyzing a search result of the geographic information; and generating a question in response to the result of determination.

UNSUPERVISED KEYWORD SPOTTING AND WORD DISCOVERY FOR FRAUD ANALYTICS
20240062753 · 2024-02-22

Embodiments described herein provide for a computer that detects one or more keywords of interest using acoustic features, to detect or query commonalities across multiple fraud calls. Embodiments described herein may implement unsupervised keyword spotting (UKWS) or unsupervised word discovery (UWD) in order to identify commonalities across a set of calls, where both UKWS and UWD employ Gaussian Mixture Models (GMM) and one or more dynamic time-warping algorithms. A user may indicate a training exemplar or occurrence of call-specific information, referred to herein as a named entity, such as a person's name, an account number, account balance, or order number. The computer may perform a redaction process that computationally nullifies the import of the named entity in the modeling processes described herein.
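
Dynamic time warping, one of the building blocks named above, can be sketched as follows. This is the textbook DTW recurrence with a Euclidean frame distance, not the specific variant used in the described embodiments. The function name `dtw_distance` and the representation of each frame as a tuple of floats are assumptions for illustration; in the described setting the frames would be derived from the GMM rather than raw numbers.

```python
import math


def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between two sequences.

    a, b: non-empty sequences of equal-dimension feature vectors.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A low distance between a keyword exemplar and a stretch of another call suggests the same word spoken at a different speed, which is why DTW suits comparing utterances without transcripts.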

Augmentation of Audiographic Images for Improved Machine Learning

Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.
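
The abstract does not name the augmentation operations, so the following is only an illustrative sketch of one operation that acts directly on a spectrogram instead of on raw audio: zeroing a random band of frequency rows and a random band of time columns. The function name `mask_spectrogram`, its parameters, and the choice of masking are assumptions, not details taken from the disclosure.

```python
import random


def mask_spectrogram(spec, max_freq_width=2, max_time_width=2, rng=None):
    """Return a copy of spec with one frequency band and one time band zeroed.

    spec: list of rows (frequency bins), each a list of floats (time steps).
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # leave the input untouched

    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)

    for f in range(n_freq):
        for t in range(n_time):
            in_freq_band = f_start <= f < f_start + f_width
            in_time_band = t_start <= t < t_start + t_width
            if in_freq_band or in_time_band:
                out[f][t] = 0.0
    return out
```

Because the mask is drawn at random on every call, the same spectrogram can yield many distinct training examples.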

Analog-digital converter and analog-to-digital conversion method
10419016 · 2019-09-17

An ADC and an analog-to-digital conversion method are provided. The ADC includes: a clock generator including M transmission gates, where M is an integer greater than or equal to 2 and the M transmission gates are configured to receive a periodically sent first clock signal and separately perform gating control on it so as to generate M second clock signals; M ADC channels arranged in a time-interleaved manner and configured to receive one analog signal and separately perform, under the control of the M second clock signals, sampling and analog-to-digital conversion on the analog signal so as to obtain M digital signals, where each ADC channel corresponds to one of the M second clock signals; and an adder configured to add the M digital signals together in the digital domain so as to obtain a digital output signal.
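
One possible reading of this arrangement can be simulated in a few lines. The sketch assumes that gating splits the first clock into M non-overlapping phases, that a channel outputs zero on ticks where its gated clock is inactive, and that each channel is an ideal unipolar quantizer. None of these details is stated in the abstract, and the names `quantize`, `gated_clocks`, and `interleaved_adc` are hypothetical.

```python
def quantize(x, bits=8, full_scale=1.0):
    """Ideal unipolar quantizer: map [0, full_scale] to an integer code."""
    levels = (1 << bits) - 1
    clipped = min(max(x, 0.0), full_scale)
    return round(clipped / full_scale * levels)


def gated_clocks(m, n_ticks):
    """Derive M second clocks from the first clock: channel k is active on ticks k, k+M, ..."""
    return [[1 if tick % m == k else 0 for tick in range(n_ticks)] for k in range(m)]


def interleaved_adc(samples, m=2, bits=8):
    """Convert one analog stream with M time-interleaved channels and sum the results."""
    clocks = gated_clocks(m, len(samples))
    # Each channel converts only on its own clock phase and outputs 0 otherwise.
    channels = [
        [quantize(x, bits) if active else 0 for x, active in zip(samples, clock)]
        for clock in clocks
    ]
    # The adder combines the M digital signals tick by tick.
    return [sum(column) for column in zip(*channels)]
```

Under these assumptions exactly one channel contributes on each tick, so the summed output equals what a single converter running at the full clock rate would produce, while each channel only has to run at 1/M of that rate.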
