Patent classifications
G10L21/14
APPARATUS AND METHOD FOR PROCESSING SENSING DATA
According to an embodiment of the present disclosure, an apparatus for processing sensing data comprises an amplifier amplifying analog sensing data inputted from an outside source, an analog-digital converter converting the amplified analog sensing data into digital sensing data, a micro controller unit (MCU) including a signal modulator modulating the digital sensing data to a data wave having a sound waveform, transmittable to a sound input port of a terminal, and an output unit having a sound output terminal corresponding to the sound input port and outputting the data wave to the sound input port through the sound output terminal, wherein the data wave inputted to the sound input port is converted to an information value corresponding to the analog sensing data, and the information value is displayed on the terminal.
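The modulation step can be illustrated with a minimal sketch. The abstract does not name a modulation scheme, so binary frequency-shift keying (FSK) is assumed here purely for illustration. The sample rate, tone frequencies, bit duration, and the function names `modulate` and `demodulate` are all hypothetical.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the abstract.
SAMPLE_RATE = 44_100        # audio sample rate in Hz
BIT_SAMPLES = 441           # 10 ms per bit
F0, F1 = 1_200.0, 2_200.0   # tone for a 0 bit and for a 1 bit, in Hz


def modulate(value: int, bits: int = 10) -> np.ndarray:
    """Encode one digital sensing value as a sequence of tone bursts (binary FSK)."""
    t = np.arange(BIT_SAMPLES) / SAMPLE_RATE
    tones = {0: np.sin(2 * np.pi * F0 * t), 1: np.sin(2 * np.pi * F1 * t)}
    # Most significant bit first.
    bit_list = [(value >> i) & 1 for i in reversed(range(bits))]
    return np.concatenate([tones[b] for b in bit_list])


def demodulate(wave: np.ndarray, bits: int = 10) -> int:
    """Recover the value by comparing each burst's energy at the two tone frequencies."""
    t = np.arange(BIT_SAMPLES) / SAMPLE_RATE
    ref0 = np.exp(-2j * np.pi * F0 * t)
    ref1 = np.exp(-2j * np.pi * F1 * t)
    value = 0
    for chunk in wave.reshape(bits, BIT_SAMPLES):
        e0 = abs(np.dot(chunk, ref0))
        e1 = abs(np.dot(chunk, ref1))
        value = (value << 1) | int(e1 > e0)
    return value
```

In this sketch `modulate` stands in for the signal modulator in the MCU, and `demodulate` stands in for the terminal-side conversion of the data wave back into an information value. A real audio link would also need synchronization and noise handling, which are omitted.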
Facilitating inferential sound recognition based on patterns of sound primitives
The disclosed embodiments provide a system that performs a sound-recognition operation. During operation, the system recognizes a sequence of sound primitives in an audio stream, wherein a sound primitive is associated with a semantic label comprising one or more words that describe a sound characterized by the sound primitive. Next, the system feeds the sequence of sound primitives into a finite-state automaton that recognizes events associated with sequences of sound primitives. Finally, the system feeds the recognized events into an output system that generates an output associated with the recognized events to be displayed to a user.
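A finite-state automaton over sound primitives can be sketched as follows. This is a minimal illustration, not the disclosed recognizer: the primitive labels, the state names, the transition table, and the function `recognize_events` are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (current state, sound-primitive label)

# Hypothetical transition table: (state, primitive label) -> next state.
TRANSITIONS: Dict[Key, str] = {
    ("idle", "glass_crack"): "cracked",
    ("cracked", "shatter"): "shattered",
    ("shattered", "alarm_tone"): "idle",
}

# Hypothetical events emitted when a given transition is taken.
EVENTS: Dict[Key, str] = {("shattered", "alarm_tone"): "break-in"}


def recognize_events(
    primitives: Iterable[str],
    transitions: Dict[Key, str] = TRANSITIONS,
    events: Dict[Key, str] = EVENTS,
    start: str = "idle",
) -> List[str]:
    """Feed a sequence of primitive labels through the automaton and collect events."""
    state = start
    recognized: List[str] = []
    for label in primitives:
        key = (state, label)
        if key not in transitions:
            # No matching transition: reset and retry the label from the start state.
            state = start
            key = (state, label)
            if key not in transitions:
                continue
        if key in events:
            recognized.append(events[key])
        state = transitions[key]
    return recognized
```

The list returned by `recognize_events` corresponds to the recognized events that would be passed on to the output system.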
DISPLAY SYSTEM OF MOBILE AUDIO DEVICES
The object is to provide a display system for mobile audio devices that enables a comparative display of the frequency range of an audio source read from an external source and stored in the main body of the mobile audio device and the frequency range replayed by the main body of the mobile audio device.
The display system comprises an audio source file 18 storing external audio source data, a replay unit 14 that replays the audio source data of the audio source file 18, a device data memory 16 storing data on the replay devices, a controller 15 that controls the circuits, a sampling rate output 21 that outputs the sampling rate of the audio source data of the audio source file 18, a replay sampling rate output 23 that outputs the replay sampling rate, an audio source output display part 12 that displays the sampling rate output 21, and a replay output display part 13 that displays the replay sampling rate output 23.
REAL-TIME ADAPTIVE AUDIO SOURCE SEPARATION
Methods and systems for audio source separation in real-time are described. In an embodiment, the present disclosure describes reading and decoding an audio source into Pulse Code Modulation (PCM) samples, fragmenting the PCM samples into fragments, transforming the fragments into spectrograms, performing audio source separation using a training database that includes a training dictionary and non-negative matrix factorization (NMF) to generate a set of component signals, and streaming the component signals to a playback engine. In an embodiment, a semantic equalizer graphical user interface allows for fading of individual component signals.
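The fragment, spectrogram, and NMF steps can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the disclosed system: the dictionary `W` is assumed to be fixed and pre-trained, the activations are estimated with the standard multiplicative update for the Euclidean cost, and the frame size, hop size, and all function names are hypothetical. Decoding, phase reconstruction, and streaming to a playback engine are omitted.

```python
import numpy as np


def frames(pcm: np.ndarray, size: int, hop: int) -> np.ndarray:
    """Split PCM samples into overlapping fragments of `size` samples."""
    n = 1 + (len(pcm) - size) // hop
    return np.stack([pcm[i * hop : i * hop + size] for i in range(n)])


def magnitude_spectrogram(pcm: np.ndarray, size: int = 256, hop: int = 128) -> np.ndarray:
    """Return a (frequency bins, frames) magnitude spectrogram."""
    window = np.hanning(size)
    return np.abs(np.fft.rfft(frames(pcm, size, hop) * window, axis=1)).T


def nmf_activations(V: np.ndarray, W: np.ndarray, iters: int = 200,
                    eps: float = 1e-9, seed: int = 0) -> np.ndarray:
    """Estimate non-negative activations H so that W @ H approximates V.

    W (bins, components) is treated as a fixed, pre-trained dictionary.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(iters):
        # Multiplicative update for the Euclidean cost; keeps H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


def component_spectrograms(V: np.ndarray, W: np.ndarray, H: np.ndarray,
                           eps: float = 1e-9) -> list:
    """Split V into one spectrogram per dictionary component using soft masks."""
    approx = W @ H + eps
    return [np.outer(W[:, k], H[k]) / approx * V for k in range(W.shape[1])]
```

Fading an individual component, as the semantic equalizer is described as doing, would amount to scaling one of the returned component spectrograms before resynthesis.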
System and method for signal decomposition, analysis and reconstruction
A system and method for representing quasi-periodic waveforms, for example by a plurality of limited decompositions of the quasi-periodic waveform. Each decomposition includes a first and second amplitude value and at least one time value. In some embodiments, each of the decompositions is phase adjusted such that the arithmetic sum of the plurality of limited decompositions reconstructs the quasi-periodic waveform. Data-structure attributes are created and used to reconstruct the quasi-periodic waveform. Features of the quasi-periodic wave are tracked using pattern-recognition techniques. The fundamental rate of the signal (e.g., heartbeat) can vary widely, for example by a factor of 2-3 or more from the lowest to the highest frequency. To obtain quarter-phase representations of a component (e.g., the lowest-frequency “rate” component) that varies over time (by a factor of two to three), many overlapping filters use bandpass and overlap parameters that allow tracking the component's frequency version on a changing quarter-phase basis.
THREE-DIMENSIONAL FACE ANIMATION FROM SPEECH
A method for training a three-dimensional face animation model from speech is provided. The method includes determining a first correlation value for a facial feature based on an audio waveform from a first subject, generating a first mesh for a lower portion of a human face based on the facial feature and the first correlation value, updating the first correlation value when a difference between the first mesh and a ground truth image of the first subject is greater than a pre-selected threshold, and providing a three-dimensional model of the human face animated by speech to an immersive reality application accessed by a client device based on the difference between the first mesh and the ground truth image of the first subject. A non-transitory, computer-readable medium storing instructions to cause a system to perform the above method, and the system, are also provided.
METHOD AND SYSTEM FOR LEARNING AND USING LATENT-SPACE REPRESENTATIONS OF AUDIO SIGNALS FOR AUDIO CONTENT-BASED RETRIEVAL
A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which take care of selecting salient attributes of the signals that represent psychoacoustic differences between the signals.
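How comparable latent-space representations could support content-based retrieval can be sketched as follows. The abstract does not specify a similarity measure, so cosine similarity is assumed here. The embeddings are taken as already produced by some trained model, and the function name `rank_by_similarity` is hypothetical.

```python
import numpy as np


def rank_by_similarity(query: np.ndarray, library: dict) -> list:
    """Rank library items by cosine similarity of their embeddings to the query.

    `library` maps an item name to its latent-space vector (assumed precomputed).
    Returns (name, score) pairs, most similar first.
    """
    names = list(library)
    matrix = np.stack([np.asarray(library[n], dtype=float) for n in names])
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    scores = matrix @ q
    order = np.argsort(-scores)
    return [(names[i], float(scores[i])) for i in order]
```

Such a ranking is only meaningful if the model produces consistent representations, which is the property the abstract describes the second method as learning.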