Patent classifications
G10L21/00
SYSTEMS AND METHODS FOR GENERATING BOOKMARK VIDEO FINGERPRINTS
Systems and methods for replacing original media bookmarks of at least a portion of a digital media file with replacement bookmarks are described. A media fingerprint engine detects the location of the original fingerprints associated with the portion of the digital media file, and a region analysis algorithm characterizes regions of the media file spanning the location of the original bookmarks by data class types. The replacement bookmarks are associated with the data class types and are overwritten onto, or otherwise substituted for, the original bookmarks. The replacement bookmarks are then subjected to a fingerprint matching algorithm that incorporates media timeline and media-related metadata.
System and apparatus for real-time speech enhancement in noisy environments
A system may perform speech enhancement of audio data in real-time by suppressing noise components that are present in the audio data while preserving speech components. The system may include an in-ear module and a separate signal processing module that is wirelessly communicatively coupled to the in-ear module. The system may include non-negative matrix factorization (NMF) dictionaries capable of identifying frequency band components associated with speech and frequency band components associated with noise. The NMF dictionaries may be trained using voice samples and noise samples. The NMF dictionaries may be applied to noisy speech data to produce an NMF representation of the speech data which may then be applied using a dynamic mask to the noisy speech data in order to suppress the noise components of the noisy speech data and produce speech enhanced data.
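The NMF-plus-mask idea in this abstract can be illustrated with a minimal sketch. It assumes the speech and noise dictionaries are already trained and held fixed, solves only for the activations with standard multiplicative updates, and uses a soft ratio mask as a stand-in for the "dynamic mask". The function names, the ratio mask, and the toy dictionaries are illustrative assumptions, not details taken from the patent.

```python
from typing import List

Matrix = List[List[float]]  # rows = frequency bands, columns = time frames
EPS = 1e-9  # guards against division by zero


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def solve_activations(w: Matrix, v: Matrix, iters: int = 200) -> Matrix:
    """Find non-negative H with V ~= W H, keeping the dictionary W fixed."""
    wt = transpose(w)
    wtv = matmul(wt, v)
    wtw = matmul(wt, w)
    h = [[1.0] * len(v[0]) for _ in range(len(w[0]))]
    for _ in range(iters):
        denom = matmul(wtw, h)
        # Multiplicative update for the Euclidean NMF objective.
        h = [[h[i][j] * wtv[i][j] / (denom[i][j] + EPS)
              for j in range(len(h[0]))] for i in range(len(h))]
    return h


def enhance(v: Matrix, w_speech: Matrix, w_noise: Matrix, iters: int = 200) -> Matrix:
    """Suppress noise in magnitude spectrogram V using two fixed dictionaries."""
    k = len(w_speech[0])  # number of speech atoms
    w = [rs + rn for rs, rn in zip(w_speech, w_noise)]  # concatenate atoms
    h = solve_activations(w, v, iters)
    speech = matmul(w_speech, h[:k])
    noise = matmul(w_noise, h[k:])
    # Soft ratio mask (an assumption): fraction of each bin explained by speech.
    return [[x * s / (s + n + EPS) for x, s, n in zip(v_row, s_row, n_row)]
            for v_row, s_row, n_row in zip(v, speech, noise)]


# Toy example: 3 frequency bands, 1 frame, one atom per dictionary.
w_speech = [[1.0], [0.5], [0.0]]
w_noise = [[0.0], [0.5], [1.0]]
noisy = [[2.0], [1.5], [1.0]]
enhanced = enhance(noisy, w_speech, w_noise)
```

In the toy example the band containing only noise is zeroed, the band containing only speech passes through, and the shared band is split in proportion to the two estimates.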
METHOD AND APPARATUS FOR AUDIO DATA PROCESSING
Embodiments of the disclosure provide methods and apparatuses for processing audio data. The method can include: acquiring audio data by an audio capturing device, determining feature information of an enclosure in which the audio capturing device is located, and reverberating the feature information into the audio data.
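One common way to apply the acoustic character of an enclosure to audio is to convolve the signal with an impulse response. The sketch below assumes that the "feature information" can be represented as such an impulse response; the abstract does not say so, and the function name and sample values are hypothetical.

```python
from typing import List, Sequence


def reverberate(audio: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Convolve dry audio with an impulse response (direct-form convolution).

    Assumption: the enclosure's feature information is modelled as an
    impulse response; this is an illustrative stand-in, not the patented method.
    """
    if not audio or not impulse_response:
        return []
    out = [0.0] * (len(audio) + len(impulse_response) - 1)
    for i, x in enumerate(audio):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


# A single click followed by silence picks up one echo at half amplitude.
wet = reverberate([1.0, 0.0, 0.0], [1.0, 0.5])
```

Direct convolution is quadratic in length; for long impulse responses an FFT-based convolution would be the usual choice.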
Method and system for mixing multiple sound sources
A multiple sound source mixing method includes dividing a plurality of sound source data into segments each with a desired length; sequentially inputting sound source data of a corresponding segment for each segment through a desired number of nodes with respect to the plurality of sound source data and mixing the input sound source data into a single piece of sound source data; and concatenating the sound source data mixed for the respective segments.
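The divide, mix, and concatenate flow described above can be sketched as follows. Sample-wise averaging stands in for whatever the "nodes" in the abstract actually compute, and zero-padding of shorter sources is a further assumption; all function names are hypothetical.

```python
from typing import List, Sequence


def split_segments(samples: Sequence[float], seg_len: int) -> List[List[float]]:
    """Divide one source into consecutive segments of at most seg_len samples."""
    return [list(samples[i:i + seg_len]) for i in range(0, len(samples), seg_len)]


def mix_segment(group: List[List[float]]) -> List[float]:
    """Mix the same segment from every source by averaging (an assumed mixer)."""
    return [sum(vals) / len(group) for vals in zip(*group)]


def mix_sources(sources: Sequence[Sequence[float]], seg_len: int) -> List[float]:
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    if not sources:
        return []
    total = max(len(s) for s in sources)
    # Assumption: shorter sources are padded with silence.
    padded = [list(s) + [0.0] * (total - len(s)) for s in sources]
    per_source = [split_segments(p, seg_len) for p in padded]
    mixed: List[float] = []
    for k in range(len(per_source[0])):
        # Mix segment k across all sources, then append to the output.
        mixed.extend(mix_segment([segs[k] for segs in per_source]))
    return mixed


result = mix_sources([[1.0, 1.0, 1.0], [3.0, 3.0]], seg_len=2)
```

Processing segment by segment keeps the working set bounded by the segment length rather than the full duration of the sources.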
SIGNAL PROCESSING APPARATUS AND METHOD, AND PROGRAM
The present technology relates to a signal processing apparatus and method, and a program that make it possible to obtain high-sound-quality signals even with a small processing amount. A signal processing apparatus includes a selecting section that is supplied with a plurality of audio signals and selects an audio signal to be subjected to a sound quality enhancement process, and a sound-quality-enhancement processing section that performs the sound quality enhancement process on the audio signal selected by the selecting section. The present technology may be applied to a portable terminal.
Iterative training for text-image-layout transformer
Disclosed herein is a system and method for Natural Language Processing (NLP) of real-world documents. The system and method combine various models not previously combined and overcome the challenges of this combination. Models include an encoder-decoder model, a spatial model, and a multi-modal model. An iterative training process receives documents and generates outputs, wherein the iterative training process comprises enabling information retrieval from documents without training data.