G10L15/02

Personal Voice-Based Information Retrieval System
20180007201 · 2018-01-04 ·

The present invention relates to a system for retrieving information from a network such as the Internet. A user creates a user-defined record in a database that identifies an information source, such as a web site, containing information of interest to the user. This record identifies the location of the information source and also contains a recognition grammar based upon a speech command assigned by the user. Upon receiving the speech command from the user that is described within the recognition grammar, a network interface system accesses the information source and retrieves the information requested by the user.

Personal Voice-Based Information Retrieval System
20180007201 · 2018-01-04 ·

The present invention relates to a system for retrieving information from a network such as the Internet. A user creates a user-defined record in a database that identifies an information source, such as a web site, containing information of interest to the user. This record identifies the location of the information source and also contains a recognition grammar based upon a speech command assigned by the user. Upon receiving the speech command from the user that is described within the recognition grammar, a network interface system accesses the information source and retrieves the information requested by the user.

OBFUSCATING TRAINING DATA
20180005626 · 2018-01-04 ·

Examples disclosed herein involve obfuscating training data. An example method includes computing a sequence of acoustic features from audio data of training data, the training data comprising the audio data and a corresponding text transcript; mapping the acoustic features to acoustic model states to generate annotated feature vectors, the annotated feature vectors comprising the acoustic features and corresponding context from the text transcript; and providing a randomized sequence of the annotated feature vectors as obfuscated training data to an audio analysis system.

OBFUSCATING TRAINING DATA
20180005626 · 2018-01-04 ·

Examples disclosed herein involve obfuscating training data. An example method includes computing a sequence of acoustic features from audio data of training data, the training data comprising the audio data and a corresponding text transcript; mapping the acoustic features to acoustic model states to generate annotated feature vectors, the annotated feature vectors comprising the acoustic features and corresponding context from the text transcript; and providing a randomized sequence of the annotated feature vectors as obfuscated training data to an audio analysis system.

METHOD AND DEVICE FOR INFORMATION PROCESSING
20180005624 · 2018-01-04 ·

An information processing method and an electronic device are provided. The method includes: obtaining audio data collected by a slave device; obtaining contextual data corresponding to the slave device; and obtaining a recognition result of recognizing the audio data based on the contextual data. The contextual data characterizes a voice environment of the audio data collected by the slave device.

METHOD AND DEVICE FOR INFORMATION PROCESSING
20180005624 · 2018-01-04 ·

An information processing method and an electronic device are provided. The method includes: obtaining audio data collected by a slave device; obtaining contextual data corresponding to the slave device; and obtaining a recognition result of recognizing the audio data based on the contextual data. The contextual data characterizes a voice environment of the audio data collected by the slave device.

METHOD, APPARATUS FOR ELIMINATING POPPING SOUNDS AT THE BEGINNING OF AUDIO, AND STORAGE MEDIUM
20180012620 · 2018-01-11 ·

A method and apparatus for eliminating popping sounds at the beginning of audio includes: examining audio frames within a pre-set time period at the beginning of audio to determine a popping residing section; applying popping elimination to audio frames in the popping residing section; calculating an average value of amplitudes of M audio frames preceding the popping residing section and an average value of amplitudes of K audio frames succeeding the popping residing section; setting the amplitudes of the audio frames in the popping residing section to zero in response to a determination that the two average values are both smaller than a pre-set sound reduction threshold; weakening the amplitudes of the audio frames in the popping residing section in response to a determination that both the two average values are not smaller than a pre-set sound reduction threshold; M and K are integers larger than one.

METHOD, APPARATUS FOR ELIMINATING POPPING SOUNDS AT THE BEGINNING OF AUDIO, AND STORAGE MEDIUM
20180012620 · 2018-01-11 ·

A method and apparatus for eliminating popping sounds at the beginning of audio includes: examining audio frames within a pre-set time period at the beginning of audio to determine a popping residing section; applying popping elimination to audio frames in the popping residing section; calculating an average value of amplitudes of M audio frames preceding the popping residing section and an average value of amplitudes of K audio frames succeeding the popping residing section; setting the amplitudes of the audio frames in the popping residing section to zero in response to a determination that the two average values are both smaller than a pre-set sound reduction threshold; weakening the amplitudes of the audio frames in the popping residing section in response to a determination that both the two average values are not smaller than a pre-set sound reduction threshold; M and K are integers larger than one.

Audio data processing method, apparatus and storage medium for detecting wake-up words based on multi-path audio from microphone array

An audio data processing method is provided. The method includes: obtaining multi-path audio data in an environmental space, obtaining a speech data set based on the multi-path audio data, and separately generating, in a plurality of enhancement directions, enhanced speech information corresponding to the speech data set; matching a speech hidden feature in the enhanced speech information with a target matching word, and determining an enhancement direction corresponding to the enhanced speech information having a highest degree of matching with the target matching word as a target audio direction; obtaining speech spectrum features in the enhanced speech information, and obtaining, from the speech spectrum features, a speech spectrum feature in the target audio direction; and performing speech authentication on the speech hidden feature and the speech spectrum feature that are in the target audio direction based on the target matching word, to obtain a target authentication result.

AUTOMATIC INTERPRETATION METHOD AND APPARATUS

Provided is an automated interpretation method, apparatus, and system. The automated interpretation method includes encoding a voice signal in a first language to generate a first feature vector, decoding the first feature vector to generate a first language sentence in the first language, encoding the first language sentence to generate a second feature vector with respect to a second language, decoding the second feature vector to generate a second language sentence in the second language, controlling a generating of a candidate sentence list based on any one or any combination of the first feature vector, the first language sentence, the second feature vector, and the second language sentence, and selecting, from the candidate sentence list, a final second language sentence as a translation of the voice signal.