Patent classifications
G10L25/93
TAIL POINT DETECTION METHOD, ELECTRONIC DEVICE, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM
Provided are a tail point detection method and apparatus, a device, and a storage medium. The implementation scheme includes acquiring a target audio; identifying a sentence pattern type of the target audio; determining detection waiting duration according to the sentence pattern type; and determining a result of detecting a tail point of the target audio according to the detection waiting duration.
TAIL POINT DETECTION METHOD, ELECTRONIC DEVICE, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM
Provided are a tail point detection method and apparatus, a device, and a storage medium. The implementation scheme includes acquiring a target audio; identifying a sentence pattern type of the target audio; determining detection waiting duration according to the sentence pattern type; and determining a result of detecting a tail point of the target audio according to the detection waiting duration.
AUTOMATIC DETERMINATION OF TIMING WINDOWS FOR SPEECH CAPTIONS IN AN AUDIO STREAM
A content system accessing an audio stream. The content system inputs segments of the audio stream into a speech classifier for classification, the speech classifier generating, for the segments of the audio stream, raw scores representing likelihoods that the respective segment of the audio stream includes an occurrence of a speech sound. The content system generates binary scores for the audio stream based on the set of raw scores, each binary score generated based on an aggregation of raw scores from consecutive series of the segments of the audio stream. The content system generates one or more timing windows for the speech sounds in the audio stream based on the binary scores, each timing window indicating an estimate of a beginning and ending timestamps of one or more speech sounds in the audio stream.
AUTOMATIC DETERMINATION OF TIMING WINDOWS FOR SPEECH CAPTIONS IN AN AUDIO STREAM
A content system accessing an audio stream. The content system inputs segments of the audio stream into a speech classifier for classification, the speech classifier generating, for the segments of the audio stream, raw scores representing likelihoods that the respective segment of the audio stream includes an occurrence of a speech sound. The content system generates binary scores for the audio stream based on the set of raw scores, each binary score generated based on an aggregation of raw scores from consecutive series of the segments of the audio stream. The content system generates one or more timing windows for the speech sounds in the audio stream based on the binary scores, each timing window indicating an estimate of a beginning and ending timestamps of one or more speech sounds in the audio stream.
Audio signal encoding and decoding method, and audio signal encoding and decoding apparatus
An audio signal encoding and decoding method, an audio signal encoding and decoding apparatus, a transmitter, a receiver, and a communications system, which can improve encoding and/or decoding performance. The audio signal encoding method includes dividing a to-be-encoded time domain signal into a low band signal and a high band signal; encoding the low band signal to obtain a low frequency encoding parameter; calculating a voiced degree factor, and predicting a high band excitation signal; weighting the high band excitation signal and random noise using the voiced degree factor, so as to obtain a synthesized excitation signal; and obtaining a high frequency encoding parameter based on the synthesized excitation signal and the high band signal. Technical solutions in the embodiments of the present invention can improve an encoding or decoding effect.
METHODS AND SYSTEMS FOR COMPUTER-GENERATED VISUALIZATION OF SPEECH
Methods, systems and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech including at least one segment includes: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.
METHODS AND SYSTEMS FOR COMPUTER-GENERATED VISUALIZATION OF SPEECH
Methods, systems and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech including at least one segment includes: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.
METHODS AND SYSTEMS FOR CLASSIFYING AUDIO SEGMENTS OF AN AUDIO SIGNAL
The disclosed embodiments illustrate a method for classifying one or more audio segments of an audio signal. The method includes determining one or more first features of a first audio segment of the one or more audio segments. The method further includes determining one or more second features based on the one or more first features. The method includes determining one or more third features of the first audio segment, wherein each of the one or more third features is determined based on a second feature of the one or more second features of the first audio segment and at least one second feature associated with a second audio segment. Additionally, the method includes classifying the first audio segment either in an interrogative category or a non-interrogative category based on one or more of the one or more second features and the one or more third features.
METHODS AND SYSTEMS FOR CLASSIFYING AUDIO SEGMENTS OF AN AUDIO SIGNAL
The disclosed embodiments illustrate a method for classifying one or more audio segments of an audio signal. The method includes determining one or more first features of a first audio segment of the one or more audio segments. The method further includes determining one or more second features based on the one or more first features. The method includes determining one or more third features of the first audio segment, wherein each of the one or more third features is determined based on a second feature of the one or more second features of the first audio segment and at least one second feature associated with a second audio segment. Additionally, the method includes classifying the first audio segment either in an interrogative category or a non-interrogative category based on one or more of the one or more second features and the one or more third features.
Method and Apparatus for Obtaining Information from the Web
An intelligent conversation system augmenting a conversation between two or more individuals uses a speech to text block configured to convert voices of the conversation into text, a determination circuit configured to determine topics from the text of the conversation, search parameters determined by the determination circuit from the topics are sent to an Internet, search results corresponding to the search parameters are received from the Internet; and a memory configured to store the search results received from the Internet. The speech to test block is configured to convert the search results to speech. An earphone is configured to transmit the speech to one of the two or more individuals. The speech is used by one of the individuals to augment the conversation.