G10L21/003

Device to amplify and clarify voice
11508391 · 2022-11-22 ·

A voice enhancing device amplifies and clarifies the voice of a user with hypophonia or other voice issues. The device includes a collar of either a rigid or a soft material that is shaped to comfortably sit on the shoulders of the user. One or more microphone arrays are adjustably mounted to the collar to capture audio of the user talking. An electronics module enhances the captured audio signal and generates an enhanced audio signal that drives at least one speaker adjustably attached to the collar. The electronics module implements one or more of an AGC amplifier to correct amplitude variation in spoken words, adaptive filtering to actively filter out background noise, a variable attack and decay function to improve intelligibility of the spoken words, a diphthong modification function to clarify the spoken words, and an echo cancelation function to reduce echo and feedback in the enhanced audio.
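The AGC stage the abstract describes can be sketched as follows (a minimal illustration only; the target RMS level, gain cap, and per-frame processing are assumptions, not details from the patent):

```python
import math

def agc(frames, target_rms=0.1, max_gain=10.0):
    """Drive each frame of samples toward a target RMS level,
    smoothing amplitude variation across spoken words."""
    out = []
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Cap the gain so silence is not boosted into noise.
        gain = min(target_rms / rms, max_gain) if rms > 0 else 1.0
        out.append([s * gain for s in frame])
    return out
```

A quiet frame and a loud frame both come out near the target level, which is the amplitude-correction behavior described above.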

UTTERANCE EVALUATION APPARATUS, UTTERANCE EVALUATION METHOD, AND PROGRAM

A stable evaluation result is obtained from a spoken voice for any sentence. A speech evaluation device (1) outputs a score for evaluating speech of an input voice signal spoken by a speaker in a first group. A feature extraction unit (11) extracts an acoustic feature from the input voice signal. A conversion unit (12) converts the acoustic feature of the input voice signal to the acoustic feature produced when a speaker in a second group speaks the same text as that of the input voice signal. An evaluation unit (13) calculates a score indicating a higher evaluation as the distance between the acoustic feature before the conversion and the acoustic feature after the conversion becomes shorter.
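The scoring rule above — shorter feature distance maps to a higher score — can be sketched as below (the Euclidean distance and the specific monotone mapping are illustrative assumptions):

```python
import math

def evaluation_score(feat_before, feat_after):
    """Return a score in (0, 1]; 1.0 means the acoustic features
    before and after conversion coincide exactly."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_before, feat_after)))
    # Monotone decreasing in distance: smaller distance -> higher score.
    return 1.0 / (1.0 + dist)
```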

HYBRID EXPANSIVE FREQUENCY COMPRESSION FOR ENHANCING SPEECH PERCEPTION FOR INDIVIDUALS WITH HIGH-FREQUENCY HEARING LOSS
20220366921 · 2022-11-17 ·

A method of audio signal processing comprising Hybrid Expansive Frequency Compression (hEFC) via a digital signal processor, wherein the method includes: classifying an audio signal input, wherein the audio signal input includes frication high-frequency speech energy, into two or more speech sound classes followed by selecting a form of input-dependent frequency remapping function; and performing hEFC including re-coding of one or more input frequencies of the speech sound via the input-dependent frequency remapping function to generate an audio output signal, wherein the output signal is a representation of the audio signal input having a lower sound frequency.
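A toy input-dependent remapping in the spirit of the claim might look like this (the knee frequency, class labels, and compression ratios are invented for illustration; the patent does not disclose these values):

```python
def remap_frequency(freq_hz, speech_class, knee_hz=2000.0):
    """Re-code a frequency: components above the knee are compressed
    toward lower frequencies, with a ratio chosen per sound class."""
    # Assumed ratios: fricatives are compressed more aggressively.
    ratio = 0.5 if speech_class == "fricative" else 0.8
    if freq_hz <= knee_hz:
        return freq_hz          # low frequencies pass through unchanged
    return knee_hz + (freq_hz - knee_hz) * ratio
```

The output frequency is never higher than the input, matching the claim's requirement that the output represents the input at a lower sound frequency.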

AUTOMATED PIPELINE SELECTION FOR SYNTHESIS OF AUDIO ASSETS

An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.
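The selection and quality-gating steps can be sketched as follows (the feature names, pipeline names, and quality threshold are all hypothetical; the abstract does not enumerate them):

```python
def select_pipeline(features):
    """Pick an audio asset synthesizing pipeline from stream features."""
    if features.get("noisy"):
        return "denoise-then-tts"
    if features.get("duration_s", 0) < 60:
        return "few-shot-voice-clone"   # short streams: lightweight pipeline
    return "full-voice-model"

def ready_to_synthesize(quality_metric, threshold=0.9):
    """Gate synthesis on the pipeline's quality metric satisfying a
    predetermined quality condition."""
    return quality_metric >= threshold
```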

Voice modulation apparatus and methods
11495207 · 2022-11-08 ·

A modular sound modulation system that may be incorporated into a wearable harness for use beneath a costume or clothing. The system includes a sound modulation unit having an amplifier and programmable sound modulation controller. Power is provided via a wired remote power supply. Speakers are disposed at key locations to cause modulated sound to appear to come from the head of the costume. A high-fidelity microphone captures vocalizations from the wearer. The system may further include a transmitter to broadcast or store a recording of the modulated sound.

SYSTEMS AND METHODS TO ALTER VOICE INTERACTIONS
20220351740 · 2022-11-03 ·

Systems and methods are disclosed for providing voice interactions based on user context. Data is received that causes a voice interaction to be generated for output at a user device. Current user contextual data of the user device is retrieved. An audio characteristic for an utterance at a location of the user device is determined from the current user contextual data. One or more audio characteristics of the voice interaction are altered to overcome the utterance based on the determined audio characteristic. The voice interaction comprising the altered audio characteristics is outputted at the user device.
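One plausible reading of "altering an audio characteristic to overcome the utterance" is raising the interaction's playback level above the ambient utterance level measured from user context. The margin and cap below are assumptions for illustration:

```python
def alter_playback_level(ambient_db, current_db, margin_db=6.0, max_db=85.0):
    """Return a playback level at least margin_db above the ambient
    utterance level, capped at a safe maximum."""
    needed = ambient_db + margin_db
    return min(max(current_db, needed), max_db)
```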

SYSTEMS AND METHODS TO ALTER VOICE INTERACTIONS
20220351741 · 2022-11-03 ·

Systems and methods are disclosed for providing voice interactions based on user context. Data is received that causes a voice interaction to be generated for output at a user device during an output time interval. In response, current user contextual data of the user device is retrieved. The voice interaction and output time interval are altered to increase consumption likelihood of the voice interaction based on the current user contextual data. The altered voice interaction is outputted at the user device during the altered output time interval.
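Altering the output time interval might amount to deferring playback while context suggests the user cannot consume it. The contextual signal and shift policy here are hypothetical:

```python
def alter_output_interval(start_s, end_s, context):
    """Shift the output interval later while the user is busy
    (e.g., on a call), preserving its duration."""
    delay = context.get("busy_for_s", 0)
    return (start_s + delay, end_s + delay)
```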

AUDIO REACTIVE AUGMENTED REALITY

Methods, systems, and storage media for augmenting a video are disclosed. Exemplary implementations may: receive a selection of an effect; receive user-generated content comprising video data and audio data; detect a characteristic of the audio data comprising at least a volume and/or a pitch of the audio data during a period of time; determine a series of numeric values based on the characteristic of the audio data during the period of time, individual numeric values of the series of numeric values being correlated with an amplitude of the volume and/or pitch at a discrete point within the period of time; and augment at least one of the video data and/or the audio data to include the effect based on the series of numeric values at discrete points in time within the period of time.
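The "series of numeric values" correlated with volume amplitude can be sketched as per-frame RMS values (frame length is an assumption; pitch tracking is omitted for brevity):

```python
import math

def volume_series(samples, frame_len=4):
    """Return one RMS amplitude per frame: a series of numeric values,
    each correlated with volume at a discrete point in time."""
    values = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        values.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return values
```

An effect could then be scaled or triggered from these values at each discrete point within the period of time.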