G10L25/93

VOICE-BASED CONTROL OF SEXUAL STIMULATION DEVICES
20230210716 · 2023-07-06 ·

A system and method for voice-based control of sexual stimulation devices. In some configurations, the system receives voice data, analyzes the voice data to detect spoken commands, and generates control signals based on the commands. In other configurations, the system analyzes the voice data for non-speech vocalizations, detects voice stress patterns, and generates control signals based on the detected patterns. In some configurations, the analyses are performed by machine learning algorithms trained on associations between a user's speech and non-speech vocalizations and controls of the sexual stimulation device, learned while the user engages in one or more voice-based training tasks. In some configurations, data from other biometric sensors is included in the associations.
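
The abstract does not specify how vocalizations map to control signals. As a rough illustration of the idea, the sketch below combines a recognized command with a crude vocal-stress proxy (frame-energy variability) to produce an intensity value. The command table, the stress heuristic, and all thresholds are hypothetical, not taken from the patent.

```python
import numpy as np

# Hypothetical mapping from recognized commands to base intensity levels.
COMMAND_LEVELS = {"stop": 0, "low": 25, "medium": 50, "high": 75, "max": 100}

def rms(frame):
    """Root-mean-square energy of one audio frame."""
    return float(np.sqrt(np.mean(np.square(frame))))

def stress_score(frames):
    """Crude voice-stress proxy: relative variability of frame energy."""
    e = np.array([rms(f) for f in frames])
    return float(np.std(e) / (np.mean(e) + 1e-9))

def control_signal(command, frames, stress_gain=10.0):
    """Combine a recognized command (from an upstream speech recognizer,
    not shown) with a stress estimate into an intensity in [0, 100]."""
    base = COMMAND_LEVELS.get(command, 0)
    level = base + stress_gain * stress_score(frames)
    return int(np.clip(level, 0, 100))
```

In a full system the stress proxy would be replaced by the trained model described in the abstract; this sketch only shows the command-plus-stress fusion shape.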

AUDIO EVENT DETECTION WITH WINDOW-BASED PREDICTION

A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
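
As an illustrative sketch of the post-processing step, the code below applies moving-average smoothing to per-segment classification scores for several candidate window sizes and derives a simple per-class window confidence. Using the maximum smoothed score as the confidence measure is an assumption; the abstract does not specify the exact computation.

```python
import numpy as np

def smooth_scores(scores: np.ndarray, window: int) -> np.ndarray:
    """Moving-average smoothing of per-segment class scores
    (scores has shape: num_segments x num_classes)."""
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(scores[:, c], kernel, mode="same")
         for c in range(scores.shape[1])],
        axis=1,
    )

def window_confidences(scores: np.ndarray, candidate_windows=(3, 5, 9)):
    """For each candidate window size, smooth the raw scores and take
    the per-class maximum of the smoothed values as a simple
    window-confidence value (illustrative choice)."""
    return {w: smooth_scores(scores, w).max(axis=0) for w in candidate_windows}
```

A single spike in the raw scores is spread over the window, so larger windows trade peak confidence for robustness to isolated misclassifications.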

SYSTEMS AND METHODS FOR VIRTUAL MEETING SPEAKER SEPARATION
20230005495 · 2023-01-05 ·

A computer-implemented machine learning method for improving speaker separation is provided. The method comprises processing audio data to generate prepared audio data and determining feature data and speaker data from the prepared audio data through a clustering iteration to generate an audio file. The method further comprises re-segmenting the audio file to generate a speaker segment and causing the speaker segment to be displayed through a client device.
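
The clustering iteration and re-segmentation steps might be sketched as follows, assuming segment-level speaker embeddings as input. The minimal k-means and the adjacent-merge rule are illustrative stand-ins, not the patented method.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means over speaker embeddings; returns a cluster
    (speaker) label per segment."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def resegment(labels, times):
    """Merge consecutive segments with the same speaker label into
    (speaker, start, end) spans."""
    spans = []
    for lab, t in zip(labels, times):
        if spans and spans[-1][0] == lab:
            spans[-1] = (lab, spans[-1][1], t[1])  # extend previous span
        else:
            spans.append((lab, t[0], t[1]))
    return spans
```

The merged spans are what a client would render as per-speaker segments.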

Recognition or synthesis of human-uttered harmonic sounds
11545143 · 2023-01-03 ·

Within each harmonic spectrum of a sequence of spectra derived from analysis of a waveform representing human speech, two or more fundamental or harmonic components are identified whose frequencies are separated by integer multiples of a fundamental acoustic frequency. The highest harmonic frequency that is also greater than 410 Hz is a primary cap frequency, which is used to select a primary phonetic note that corresponds to a subset of phonetic chords from a set of phonetic chords for which acoustic spectral data is available. The spectral data can also include frequencies for primary band, secondary band (or secondary note), basal band, or reduced basal band acoustic components, which can be used to select a phonetic chord from the subset of phonetic chords corresponding to the selected primary note.
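
A minimal sketch of the harmonic-component and primary-cap selection described above, assuming spectral peak frequencies have already been extracted. The tolerance parameter and the choice of the smallest peak as the fundamental are assumptions for illustration.

```python
def harmonic_components(peak_freqs, f0_tol=0.05):
    """Given spectral peak frequencies (Hz), take the smallest peak as
    the fundamental and keep peaks lying at near-integer multiples of
    it (within a relative tolerance)."""
    f0 = min(peak_freqs)
    harmonics = []
    for f in sorted(peak_freqs):
        n = round(f / f0)
        if n >= 1 and abs(f - n * f0) <= f0_tol * f0:
            harmonics.append(f)
    return f0, harmonics

def primary_cap_frequency(harmonics, floor_hz=410.0):
    """Highest harmonic frequency above the 410 Hz floor named in the
    abstract; None if no harmonic exceeds the floor."""
    above = [f for f in harmonics if f > floor_hz]
    return max(above) if above else None
```

A peak that is not close to an integer multiple of the fundamental (e.g. a noise peak) is excluded before the cap is chosen.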

Language agnostic missing subtitle detection
11538461 · 2022-12-27 ·

Some implementations include methods for detecting missing subtitles associated with a media presentation and may include receiving an audio component and a subtitle component associated with a media presentation, the audio component including an audio sequence, the audio sequence divided into a plurality of audio segments; evaluating the plurality of audio segments using a combination of a recurrent neural network and a convolutional neural network to identify refined speech segments associated with the audio sequence, the recurrent neural network trained based on a plurality of languages, the convolutional neural network trained based on a plurality of categories of sound; determining timestamps associated with the identified refined speech segments; and determining missing subtitles based on the timestamps associated with the identified refined speech segments and timestamps associated with subtitles included in the subtitle component.
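
The final timestamp-comparison step can be sketched as interval subtraction: portions of detected speech spans not covered by any subtitle span are flagged as missing subtitles. The min_gap parameter is an assumption, not from the patent.

```python
def uncovered_speech(speech_spans, subtitle_spans, min_gap=0.5):
    """Return (start, end) portions of speech spans that no subtitle
    span overlaps, keeping only gaps of at least min_gap seconds."""
    gaps = []
    for s_start, s_end in speech_spans:
        cursor = s_start
        for t_start, t_end in sorted(subtitle_spans):
            if t_end <= cursor or t_start >= s_end:
                continue  # subtitle does not overlap the remaining speech
            if t_start > cursor:
                gaps.append((cursor, t_start))  # uncovered stretch
            cursor = max(cursor, t_end)
        if s_end - cursor > 0:
            gaps.append((cursor, s_end))
    return [(a, b) for a, b in gaps if b - a >= min_gap]
```

The refined speech segments from the neural networks would supply speech_spans; the subtitle component supplies subtitle_spans.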

Wear detection
11533574 · 2022-12-20 ·

A method is used for detecting whether a device is being worn, when the device comprises a first transducer and a second transducer. It is determined when a signal detected by at least one of the first and second transducers represents speech. It is then determined when said speech contains speech of a first acoustic class and speech of a second acoustic class. A first correlation signal is generated, representing a correlation between signals generated by the first and second transducers during at least one period when said speech contains speech of the first acoustic class. A second correlation signal is generated, representing a correlation between signals generated by the first and second transducers during at least one period when said speech contains speech of the second acoustic class. It is then determined from the first correlation signal and the second correlation signal whether the device is being worn.
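
The per-class correlation comparison might look like the following sketch, assuming the two transducer signals are time-aligned and that voiced/unvoiced frame masks stand in for the two acoustic classes. The thresholds are illustrative, not from the patent.

```python
import numpy as np

def correlation(a, b):
    """Normalized (Pearson) correlation of two equal-length signals."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def is_worn(mic_sig, inner_sig, voiced_mask, unvoiced_mask,
            voiced_thresh=0.5, unvoiced_thresh=0.2):
    """Decide wear status from per-class correlations between the two
    transducers (e.g. an external microphone and an in-ear transducer).
    Both class correlations must exceed their thresholds."""
    c_voiced = correlation(mic_sig[voiced_mask], inner_sig[voiced_mask])
    c_unvoiced = correlation(mic_sig[unvoiced_mask], inner_sig[unvoiced_mask])
    return c_voiced > voiced_thresh and c_unvoiced > unvoiced_thresh
```

When the device is worn, both transducers pick up correlated versions of the wearer's speech in both acoustic classes; when removed, the second transducer's signal decorrelates from the microphone.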
