Patent classifications
G10L21/00
SOUND MODIFICATION OF SPEECH IN AUDIO SIGNALS OVER MACHINE COMMUNICATION CHANNELS
Apparatus, systems, articles of manufacture, and methods to modify sound of speech in an audio signal are disclosed. An example apparatus includes processor circuitry to execute instructions to: identify a first portion of a keyword in the speech of the audio signal during generation of the speech; determine a waveform to replace a second portion of the keyword; and transform the keyword into a different word by introducing the waveform into the audio signal.
Method and apparatus for identifying acoustic background environments based on time and speed to enhance automatic speech recognition
Disclosed are systems, methods, and computer-readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of a received audio signal from a caller, receiving meta-data information based on a previously recorded time and speed of the caller, classifying a background environment of the caller based on the analyzed acoustic features and the meta-data, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model.
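The classify-then-select flow in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the environment labels, the model identifiers, the single acoustic feature (a noise level in dB), and all thresholds are hypothetical and do not come from the patent.

```python
# Hypothetical model identifiers; the abstract does not name any.
ACOUSTIC_MODELS = {
    "vehicle": "am_vehicle",
    "street": "am_street",
    "quiet": "am_quiet",
}


def classify_environment(noise_level_db, speed_kmh, hour):
    """Classify the caller's background from one acoustic feature plus
    time/speed metadata. All thresholds are illustrative assumptions."""
    if speed_kmh >= 30:
        # Sustained high speed suggests the caller is in a vehicle.
        return "vehicle"
    daytime = 7 <= hour < 20
    if noise_level_db >= 60 or (daytime and noise_level_db >= 50 and speed_kmh >= 3):
        # Loud background, or moderate noise while walking during the day.
        return "street"
    return "quiet"


def select_acoustic_model(noise_level_db, speed_kmh, hour):
    """Pick the acoustic model matched to the classified environment."""
    return ACOUSTIC_MODELS[classify_environment(noise_level_db, speed_kmh, hour)]


if __name__ == "__main__":
    print(select_acoustic_model(noise_level_db=70, speed_kmh=80, hour=9))
```

A real system would presumably use a trained classifier over many acoustic features; the rule table only shows where the metadata enters the decision.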
System and method for speech-enabled access to media content by a ranked normalized weighted graph using speech recognition
Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information. The information can be a list of actors, directors, composers, titles, and/or locations. The graph that models how the media are interconnected can further model pieces of common information between two or more media. The method can further cause the computing device to weight the graph based on the retrieved information, wherein the weighted graph is further normalized to yield a normalized weighted graph to help with speech query searching of media content using speech recognition. The graph can further model relative popularity information in the list. The method can rank information based on a PageRank algorithm.
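As a rough illustration of ranking over a normalized weighted graph, the sketch below normalizes each node's outgoing edge weights and then runs a PageRank-style power iteration. The node names, edge weights, damping factor, and iteration count are assumptions chosen for the example; the abstract only states that a PageRank algorithm can be used.

```python
def normalize_weights(graph):
    """Scale each node's outgoing edge weights so they sum to 1."""
    normalized = {}
    for node, edges in graph.items():
        total = sum(edges.values())
        normalized[node] = (
            {dst: w / total for dst, w in edges.items()} if total > 0 else {}
        )
    return normalized


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration over a graph whose outgoing weights are normalized.

    Every edge destination must also appear as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            for dst, weight in graph[src].items():
                new_rank[dst] += damping * rank[src] * weight
        rank = new_rank
    return rank


# Hypothetical media graph: titles linked to the people they share.
media_graph = {
    "Film A": {"Actor X": 2.0, "Director Y": 1.0},
    "Film B": {"Actor X": 1.0},
    "Actor X": {"Film A": 1.0, "Film B": 1.0},
    "Director Y": {"Film A": 1.0},
}

ranks = pagerank(normalize_weights(media_graph))
ranked_terms = sorted(ranks, key=ranks.get, reverse=True)

if __name__ == "__main__":
    print(ranked_terms)
```

The ranked list could then be used to prioritize vocabulary entries when building the speech recognition model, which is one plausible reading of "generate a speech recognition model based on the ranked information".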
Coding device, decoding device, and method and program thereof
A coding method and a decoding method are provided which can use in combination a predictive coding and decoding method which is a coding and decoding method that can accurately express coefficients which are convertible into linear prediction coefficients with a small code amount and a coding and decoding method that can obtain correctly, by decoding, coefficients which are convertible into linear prediction coefficients of the present frame if a linear prediction coefficient code of the present frame is correctly input to a decoding device. A coding device includes: a predictive coding unit that obtains a first code by coding a differential vector formed of differentials between a vector of coefficients which are convertible into linear prediction coefficients of more than one order of the present frame and a prediction vector containing at least a predicted vector from a past frame, and obtains a quantization differential vector corresponding to the first code; and a non-predictive coding unit that generates a second code by coding a correction vector which is formed of differentials between the vector of the coefficients which are convertible into the linear prediction coefficients of more than one order of the present frame and the quantization differential vector or formed of some of elements of the differentials.
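The two-stage structure can be sketched as follows, reading the abstract literally. The uniform scalar quantizers, step sizes, and prediction weight `alpha` are assumptions made for illustration; the abstract does not specify the quantizers or the predictor.

```python
def quantize(vec, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(v / step) for v in vec]


def dequantize(indices, step):
    """Reconstruct values from integer indices."""
    return [i * step for i in indices]


def encode_frame(coeffs, prev_decoded, alpha=0.5, step1=0.05, step2=0.02):
    """Return (first_code, second_code) for one frame.

    first_code  : predictive code of the differential vector
    second_code : non-predictive code of the correction vector
    """
    prediction = [alpha * p for p in prev_decoded]      # from the past frame
    diff = [c - p for c, p in zip(coeffs, prediction)]  # differential vector
    first_code = quantize(diff, step1)
    quant_diff = dequantize(first_code, step1)          # quantization differential vector
    # Correction vector: coefficients minus the quantization differential vector.
    correction = [c - q for c, q in zip(coeffs, quant_diff)]
    second_code = quantize(correction, step2)
    return first_code, second_code


def decode_predictive(first_code, prev_decoded, alpha=0.5, step1=0.05):
    """Decode using the past frame; needs only the first code."""
    quant_diff = dequantize(first_code, step1)
    return [q + alpha * p for q, p in zip(quant_diff, prev_decoded)]


def decode_non_predictive(first_code, second_code, step1=0.05, step2=0.02):
    """Decode from the present frame's codes alone, without the past frame."""
    quant_diff = dequantize(first_code, step1)
    correction = dequantize(second_code, step2)
    return [q + c for q, c in zip(quant_diff, correction)]


if __name__ == "__main__":
    frame = [0.30, 0.62, 0.91]
    previous = [0.28, 0.60, 0.95]
    c1, c2 = encode_frame(frame, previous)
    print(decode_predictive(c1, previous))
    print(decode_non_predictive(c1, c2))
```

In this reading, the non-predictive path lets a decoder recover the present frame's coefficients even when the past frame was lost, at the cost of the extra second code.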
Systems and methods for adjusting dubbed speech based on context of a scene
Systems and methods are disclosed herein for detecting dubbed speech in a media asset and receiving metadata corresponding to the media asset. The systems and methods may determine a plurality of scenes in the media asset based on the metadata, retrieve a portion of the dubbed speech corresponding to the first scene, and process the retrieved portion of the dubbed speech corresponding to the first scene to identify a speech characteristic of a character featured in the first scene. Further, the systems and methods may determine whether the speech characteristic of the character featured in the first scene matches the context of the first scene, and if the match fails, perform a function to adjust the portion of the dubbed speech so that the speech characteristic of the character featured in the first scene matches the context of the first scene.
Home appliance having speech recognition function
The present disclosure relates to a home appliance capable of being operated by speech of a user. The home appliance includes a main body forming an outer appearance, a microphone unit including at least one sensing portion disposed to face the front of the main body to detect speech of a user, and a speaker unit disposed to be spaced apart from the microphone unit by a predetermined distance.
Voiceprint authentication method and apparatus
The present disclosure provides a voiceprint authentication method and a voiceprint authentication apparatus. The method includes: displaying a first character string to a user, in which the first character string includes a predilection character preset by the user, and the predilection character is displayed as a symbol corresponding to the predilection character in the first character string; obtaining a speech of the first character string read by the user; obtaining a first voiceprint identity vector of the speech of the first character string; comparing the first voiceprint identity vector with a second voiceprint identity vector registered by the user to determine a result of a voiceprint authentication.
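The comparison step can be sketched with a cosine similarity between the two identity vectors and a fixed acceptance threshold. Both choices are assumptions: the abstract does not say how the vectors are compared, and the threshold value and example vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def authenticate(probe_vector, enrolled_vector, threshold=0.8):
    """Accept when the probe is close enough to the registered voiceprint.

    `threshold` is an illustrative value, not one taken from the patent.
    """
    return cosine_similarity(probe_vector, enrolled_vector) >= threshold


if __name__ == "__main__":
    enrolled = [0.12, -0.40, 0.88, 0.05]   # hypothetical registered vector
    probe = [0.10, -0.35, 0.90, 0.00]      # hypothetical login attempt
    print(authenticate(probe, enrolled))
```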
Rate convertor
Embodiments of the invention may be used to implement a rate converter that includes 6 channels in a forward (audio) path, each channel having a 24-bit signal path, and an end-to-end SNR of 110 dB, all within the 20 Hz to 20 kHz bandwidth. Embodiments may also be used to implement a rate converter having 2 channels in a reverse path, such as for voice signals, a 16-bit signal path per channel, and an end-to-end SNR of 93 dB, all within the 20 Hz to 20 kHz bandwidth. The rate converter may support sample rates such as 8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48, and 96 kHz. Further, rate converters according to embodiments may include a gated clock in low-power mode to conserve power.
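Converting between the listed sample rates amounts to resampling by a rational factor. The sketch below only computes the reduced interpolation/decimation pair for a given source and target rate. The function name is hypothetical, and the abstract does not say that the converter is built as an interpolate-then-decimate structure, so treat that framing as an assumption.

```python
from fractions import Fraction

# Sample rates from the abstract, expressed in Hz.
RATES_HZ = [8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000, 96000]


def resampling_ratio(src_hz, dst_hz):
    """Return (up, down): interpolate by `up`, then decimate by `down`."""
    ratio = Fraction(dst_hz, src_hz)  # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator


if __name__ == "__main__":
    print(resampling_ratio(44100, 48000))  # (160, 147)
    print(resampling_ratio(48000, 8000))   # (1, 6)
```

Pairs drawn from the 44.1 kHz family and the 48 kHz family give large factors such as 160/147, which is why such conversions are usually the expensive case.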
Voice-controlled three-dimensional fabrication
An additive three-dimensional fabrication system includes voice control for user interaction. This voice-controlled interface can enable a variety of voice-controlled functions and operations, while supporting interactions specific to consumer-oriented fabrication processes.
Analyzing speech delivery
In an aspect of the present disclosure, a method for analyzing the speech delivery of a user is disclosed including presenting to the user a plurality of speech delivery analysis criteria, receiving from the user a selection of at least one of the speech delivery analysis criteria, receiving, from at least one sensing device, speech data captured by the at least one sensing device during the delivery of a speech by the user, transmitting the speech data and the selected at least one speech delivery analysis criterion to an analysis engine for analysis based on the selected at least one speech delivery analysis criterion, receiving, from the analysis engine, an analysis report for the speech data, the analysis report comprising an analysis of the speech data performed by the analysis engine based on the selected at least one criterion, and presenting to the user the analysis report.