Patent classifications
G10L25/75
AUDIO TRANSLATOR
An audio translation system includes a feature extractor and a style transfer machine learning model. The feature extractor generates, for each of a plurality of source voice files, one or more source voice parameters encoded as a collection of source feature vectors, and generates, for each of a plurality of target voice files, one or more target voice parameters encoded as a collection of target feature vectors. The style transfer machine learning model is trained on the collection of source feature vectors for the plurality of source voice files and the collection of target feature vectors for the plurality of target voice files to generate a style-transformed feature vector.
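The abstract does not specify the voice parameters or the model architecture; a minimal sketch of the pipeline's shape, assuming log-magnitude spectral frames as the feature vectors and a least-squares linear map as a stand-in for the style transfer model, could look like this (all names are illustrative, not from the patent):

```python
import numpy as np

def extract_feature_vectors(signal, frame_len=256, hop=128):
    """Frame the waveform and take log-magnitude spectra as feature vectors.
    (Hypothetical stand-in for the patent's unspecified voice parameters.)"""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    window = np.hanning(frame_len)
    return np.array([np.log1p(np.abs(np.fft.rfft(f * window))) for f in frames])

def fit_style_transfer(source_feats, target_feats):
    """Fit a linear map W so source features approximate target features --
    a least-squares placeholder for the trained style transfer model."""
    W, *_ = np.linalg.lstsq(source_feats, target_feats, rcond=None)
    return W

rng = np.random.default_rng(0)
src = rng.standard_normal(4096)   # placeholder "source voice file"
tgt = rng.standard_normal(4096)   # placeholder "target voice file"
S = extract_feature_vectors(src)
T = extract_feature_vectors(tgt)
W = fit_style_transfer(S, T)
transformed = S @ W               # style-transformed feature vectors
print(S.shape, transformed.shape)
```

A real system would use learned acoustic features and a neural model, but the data flow (extract source features, extract target features, train a mapping, apply it) matches the claim.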
Method and apparatus for measuring distortion and muffling of speech by a face mask
Systems and methods are provided for measuring the distortion and muffling caused by a face mask. For example, in one embodiment a simulated voice source produces a sound. The sound is then acoustically coupled to a simulated vocal tract and a face mask. A microphone receives the sound and produces a signal, and an analyzer receives the signal from the microphone. A manikin head or other facial structure may also simulate fitting of the face mask onto a face. The analyzer may further produce a quantitative assessment of the distortion and muffling of the face mask, for example, by comparing at least one spectrum obtained with the face mask and at least one spectrum obtained without the face mask.
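The spectrum-comparison step can be sketched concretely. Assuming equal-width frequency bands and per-band attenuation in dB as the quantitative assessment (one plausible reading; the patent does not fix the metric), with a crude low-pass filter standing in for the mask:

```python
import numpy as np

def band_spectrum(signal, n_bands=8):
    """Average power spectrum in equal-width frequency bands."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(spec, n_bands)
    return np.array([b.mean() for b in bands])

def muffling_assessment_db(unmasked, masked, n_bands=8):
    """Per-band attenuation (dB) of the masked signal relative to the
    unmasked one -- a hypothetical quantitative assessment comparing a
    spectrum obtained with the mask against one obtained without it."""
    ref = band_spectrum(unmasked, n_bands)
    test = band_spectrum(masked, n_bands)
    return 10 * np.log10(ref / test)

rng = np.random.default_rng(1)
clean = rng.standard_normal(16000)  # stand-in: sound recorded without the mask
# Crude FIR low-pass as a stand-in for the mask's muffling effect:
masked = np.convolve(clean, [0.5, 0.3, 0.2], mode="same")
att = muffling_assessment_db(clean, masked)
print(np.round(att, 1))
```

Because the simulated "mask" is a low-pass filter, attenuation grows toward the higher bands, which is the signature of muffled speech.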
UTTERANCE SECTION DETECTION DEVICE, UTTERANCE SECTION DETECTION METHOD, AND STORAGE MEDIUM
Provided is an utterance section detection device including: a first lip shape estimation module configured to estimate a first lip shape of an utterer, based on sound data including a voice of the utterer; a second lip shape estimation module configured to estimate a second lip shape of the utterer, based on image data in which an image of at least a face of the utterer is photographed; and an utterance section detection module configured to detect an utterance section in which the utterer is vocalizing in the sound data, based on changes in the first lip shape and changes in the second lip shape.
Processing audio with an audio processing operation
An apparatus or method to allow a user to control an audio processing operation of an internal and/or external microphone(s). The method includes providing a configurable user interface which defines an audio processing operation. The status of the audio processing operation can be defined through interaction with the user interface. Capture of sound with the microphone(s) may be controlled based on the status of the audio processing operation.
Processing audio with a visual representation of an audio source
An apparatus or method to give a user information about, and control of, internal and/or external microphone(s) so that the user can make adjustments to audio recording in real time. The method includes choosing microphones, displaying visual representations of microphones, capturing an acoustic source using a microphone, allowing a user to interact with a visual representation of a microphone to select or deselect the microphone, and processing the audio signal from the acoustic source captured by a microphone.
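The select/deselect interaction maps naturally onto a small data model: each microphone has an on-screen representation the user can tap, and only selected microphones contribute to the mix. A minimal sketch, with hypothetical microphone names and signals (the patent does not specify the mixing rule; a simple sum is assumed):

```python
# Each microphone has a visual representation the user can tap to
# select or deselect it; only selected microphones feed the mix.
mics = {
    "internal": {"selected": True,  "signal": [0.25, 0.5]},
    "external": {"selected": False, "signal": [0.5, 0.25]},
}

def toggle_mic(name):
    """Simulate the user tapping a microphone's on-screen representation."""
    mics[name]["selected"] = not mics[name]["selected"]

def mix():
    """Sum the signals of all currently selected microphones."""
    selected = [m["signal"] for m in mics.values() if m["selected"]]
    if not selected:
        return []
    return [sum(vals) for vals in zip(*selected)]

before = mix()          # only the internal mic contributes
toggle_mic("external")  # user selects the external mic in the UI
after = mix()           # both mics are mixed
print(before, after)
```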