G10L21/043

Audio modulation for an audio interface

A method, apparatus, system, and computer program product for generating an audio communication. An urgency for a user is determined by a computer system in response to detecting a trigger event in a verbal communication from the user. A frequency modulator is selected by the computer system from a plurality of frequency modulators based on the determined urgency. The frequency of words in an audio communication is modulated by the computer system using the selected frequency modulator to form a modulated audio communication that comprises a natural language response generated in response to the trigger event. The modulated audio communication is sent by the computer system to an audio output device.
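The urgency-to-modulator selection above can be pictured with a minimal sketch. It assumes that "frequency of words" means word rate, that urgency comes from hypothetical trigger words (`TRIGGER_URGENCY`), and that each modulator simply rescales per-word durations by an assumed `rate_factor`; none of these names or values come from the abstract.

```python
from dataclasses import dataclass

# Hypothetical trigger words and urgency levels; the abstract does not say
# how urgency is derived from a trigger event.
TRIGGER_URGENCY = {"help": "high", "hurry": "high", "emergency": "high", "whenever": "low"}


@dataclass(frozen=True)
class FrequencyModulator:
    """Hypothetical modulator that rescales the rate of words in a response."""
    rate_factor: float  # > 1.0 delivers words faster, < 1.0 slower

    def modulate(self, word_durations: list[float]) -> list[float]:
        # A faster word rate means each word occupies proportionally less time.
        return [d / self.rate_factor for d in word_durations]


# The "plurality of frequency modulators", keyed by urgency (assumed values).
MODULATORS = {
    "low": FrequencyModulator(0.9),
    "normal": FrequencyModulator(1.0),
    "high": FrequencyModulator(1.25),
}


def determine_urgency(utterance: str) -> str:
    """Return the highest urgency triggered by any word, else "normal"."""
    levels = {TRIGGER_URGENCY.get(w.strip(".,!?").lower()) for w in utterance.split()}
    if "high" in levels:
        return "high"
    if "low" in levels:
        return "low"
    return "normal"


def modulate_response(utterance: str, response_word_durations: list[float]) -> list[float]:
    """Select a modulator from the urgency and apply it to the response timing."""
    selected = MODULATORS[determine_urgency(utterance)]
    return selected.modulate(response_word_durations)
```

A real system would apply the selected rate to synthesized audio (for example with a pitch-preserving time-stretch) rather than to a list of durations; the list keeps the selection logic visible.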

Systems and methods for generating a graphical representation of audio-file playback during playback manipulation

Systems and methods for generating a graphical representation of audio-file playback during playback manipulation are provided. The system may include a processor that performs a method including displaying a waveform during audio-file playback by scrolling the waveform from the right portion to the left portion of a display. The method includes receiving a command to manipulate the audio-file playback and, until a command to resume the audio-file playback is received, displaying a first half of the waveform corresponding to the manipulated audio-file playback. The first half of the waveform is the portion of the waveform adjacent to a horizontal or vertical axis. The method further includes displaying a second half of the waveform, on the opposite side of the axis from the first half, simultaneously with the first half of the waveform corresponding to the manipulated audio-file playback.
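A minimal sketch of the two display ideas above, assuming a horizontal axis at amplitude 0 and a display that is `width` sample-columns wide. `visible_window` and `manipulation_frame` are hypothetical helpers, and the sketch does not attempt to model what the second half shows during manipulation, since the abstract leaves that open.

```python
def split_halves(samples):
    """Split a waveform at the horizontal axis (amplitude 0).

    Returns (upper, lower): the portion above the axis and the portion below it,
    each the same length as `samples` with the other side clipped to 0.
    """
    upper = [max(s, 0.0) for s in samples]
    lower = [min(s, 0.0) for s in samples]
    return upper, lower


def visible_window(samples, playhead, width):
    """Samples shown on a display `width` columns wide.

    The newest sample (index playhead - 1) is drawn at the right edge, so
    advancing the playhead makes the waveform scroll from right to left.
    Columns with no audio yet are padded with silence on the left.
    """
    start = max(0, playhead - width)
    window = list(samples[start:playhead])
    return [0.0] * (width - len(window)) + window


def manipulation_frame(samples, manipulated_playhead, width):
    """Hypothetical frame drawn while playback is being manipulated (e.g. scrubbed):
    only the first (upper) half follows the manipulated position."""
    upper, _ = split_halves(visible_window(samples, manipulated_playhead, width))
    return upper
```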

ENCODED FEATURES AND RATE-BASED AUGMENTATION BASED SPEECH AUTHENTICATION

In some examples of speech authentication based on encoded features and rate-based augmentation, a plurality of features may be extracted from a registration speech signal of a user who is to be registered. The speech rate of the registration speech signal may be modified to generate a rate-adjusted speech signal, from which a plurality of features may also be extracted. The user may be registered by training a machine learning model on the features extracted from both the registration speech signal and the rate-adjusted speech signal. The trained machine learning model may then be used to determine whether an authentication speech signal is authentic, and thereby to authenticate the registered user.
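The registration flow above can be illustrated with a toy sketch. It swaps in deliberately simple stand-ins: linear-interpolation resampling for rate adjustment (which also shifts pitch), mean frame energy and zero-crossing rate as "features", and a nearest-centroid rule in place of the machine learning model. The function names, the rates `(0.9, 1.1)`, and the `margin` parameter are all assumptions.

```python
import math


def change_rate(signal, rate):
    """Rate-adjust a signal by linear-interpolation resampling.

    rate > 1 shortens the signal (faster speech). This simple method also shifts
    pitch; a pitch-preserving time-stretch (e.g. WSOLA) would be more realistic.
    """
    n_out = max(1, int(len(signal) / rate))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        pos = i * rate
        j = min(int(pos), last)
        frac = pos - j
        a, b = signal[j], signal[min(j + 1, last)]
        out.append(a + (b - a) * frac)
    return out


def extract_features(signal, frame=160):
    """Toy feature vector: mean frame energy and mean zero-crossing rate."""
    energies, zcrs = [], []
    for k in range(0, len(signal) - frame + 1, frame):
        f = signal[k:k + frame]
        energies.append(sum(x * x for x in f) / frame)
        crossings = sum(1 for a, b in zip(f, f[1:]) if (a < 0) != (b < 0))
        zcrs.append(crossings / frame)
    if not energies:
        return (0.0, 0.0)
    return (sum(energies) / len(energies), sum(zcrs) / len(zcrs))


def register(signal, rates=(0.9, 1.1)):
    """Train a nearest-centroid model on original plus rate-augmented features."""
    vectors = [extract_features(signal)]
    vectors += [extract_features(change_rate(signal, r)) for r in rates]
    centroid = tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(2))
    spread = max(math.dist(v, centroid) for v in vectors)
    return centroid, spread


def authenticate(model, signal, margin=2.0):
    """Accept the signal if its features fall within margin * spread of the centroid."""
    centroid, spread = model
    return math.dist(extract_features(signal), centroid) <= margin * spread
```

The augmentation widens the accepted region so that the same speaker talking somewhat faster or slower still lands near the centroid.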

Systems and methods for generating a visual color display of audio-file data

Systems and methods for generating a visual color display of audio-file data are provided. The system includes a processor that performs a method including receiving audio-file data and generating filtered-audio data by processing the audio-file data through frequency-band filters, each having a different frequency band. The method includes generating one or more waveforms corresponding to the filtered-audio data and displaying the waveforms superimposed, each in a color unique relative to the others. The method also includes downsampling the waveforms, processing them through an envelope detector, and processing them through an expander and applying a gain factor. Sections of the waveforms have transparency levels that are proportional or inversely proportional to the amplitudes at those sections.
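The processing chain above (filter bank, downsampling, envelope detector, expander with gain, amplitude-dependent transparency) might look like the following sketch. The filters are simple one-pole IIR stages arranged so the three bands sum back to the input; the band edges, colors, block size, attack/release coefficients, and expander ratio are all assumptions, not values from the source.

```python
import math

COLORS = {"low": (255, 64, 64), "mid": (64, 255, 64), "high": (64, 64, 255)}  # assumed RGB


def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order IIR low-pass filter."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for s in x:
        y += a * (s - y)
        out.append(y)
    return out


def band_split(x, fs, edges=(250.0, 2000.0)):
    """Split into low/mid/high bands that sum back to the input (up to rounding)."""
    lp_low = one_pole_lowpass(x, edges[0], fs)
    lp_high = one_pole_lowpass(x, edges[1], fs)
    return {
        "low": lp_low,
        "mid": [h - l for l, h in zip(lp_low, lp_high)],
        "high": [s - h for s, h in zip(x, lp_high)],
    }


def downsample_peak(x, block):
    """Keep the peak absolute value of each block (one value per display column)."""
    return [max(abs(s) for s in x[i:i + block]) for i in range(0, len(x), block)]


def envelope(x, attack=0.5, release=0.1):
    """Envelope follower: rises quickly, decays slowly."""
    env, out = 0.0, []
    for s in x:
        env += (attack if s > env else release) * (s - env)
        out.append(env)
    return out


def expand(x, ratio=2.0, gain=1.0):
    """Downward expander on normalized values, followed by a gain factor."""
    peak = max(x, default=0.0) or 1.0
    return [gain * (s / peak) ** ratio for s in x]


def render_layers(x, fs, block=64, inverse=False):
    """One colored, semi-transparent layer per band, to be drawn superimposed."""
    layers = {}
    for name, band in band_split(x, fs).items():
        amps = expand(envelope(downsample_peak(band, block)))
        peak = max(amps, default=0.0) or 1.0
        alpha = [1.0 - a / peak if inverse else a / peak for a in amps]
        layers[name] = {"color": COLORS[name], "amplitude": amps, "alpha": alpha}
    return layers
```

Here `alpha` is opacity; setting `inverse=True` gives the "inversely proportional" variant the abstract mentions.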

Systems and methods for intelligent voice activation for auto-mixing

Embodiments allow an auto-mixer to gate microphones on and off based on speech detection without losing or discarding the speech received during the speech recognition period. An example method includes receiving and storing an input audio signal. The method also includes determining, based on a first segment of the input audio signal, that the input audio signal comprises speech, and determining a delay between the input audio signal and a corresponding output audio signal provided to a speaker. The method also includes reducing the delay by removing one or more segments of the stored input audio signal to create a time-compressed audio signal and providing the time-compressed audio signal as the corresponding output audio signal. Finally, the method includes determining that the delay is less than a threshold duration and, in response, providing the input audio signal as the corresponding output audio signal.
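A frame-level simulation can make the catch-up behavior above concrete. It assumes one output frame per input frame, a fixed hypothetical `drop_interval` for removing stored frames (a real system would more likely remove low-energy or pitch-period segments), and that once the delay falls below the threshold the small residual is skipped and the live input is passed through.

```python
from collections import deque


def gated_output(frames, detect_at, threshold, drop_interval=2):
    """Simulate one auto-mixer channel, one frame per tick, after speech is detected.

    frames:        input frames (any objects), stored as they arrive
    detect_at:     frames buffered before speech is confirmed (initial delay)
    threshold:     delay, in frames, below which the live input is used directly
    drop_interval: remove one stored frame every `drop_interval` ticks
    Returns the frames sent to the loudspeaker.
    """
    stored = deque(frames[:detect_at])  # delay == len(stored)
    output, live = [], False
    for tick, frame in enumerate(frames[detect_at:], start=1):
        if live:
            output.append(frame)  # delay is below threshold: pass input through
            continue
        stored.append(frame)
        output.append(stored.popleft())  # play the oldest stored speech
        if tick % drop_interval == 0 and stored:
            stored.popleft()  # remove a segment: time-compress to shrink the delay
        if len(stored) < threshold:
            live = True  # switch to the live input; the residual is skipped
            stored.clear()
    return output
```

For example, with 20 numbered frames, speech confirmed after 6 frames, and a 2-frame threshold, the first spoken frame (0) still reaches the loudspeaker, and the channel is back to live audio after 10 ticks.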

Localized and standalone semi-randomized character conversations

Techniques for randomized device interaction are provided. A first communication pattern is selected, with at least a degree of randomness, from a plurality of communication patterns, where each of the plurality of communication patterns specifies one or more audio profiles. A first audio profile specified in the first communication pattern is identified. A first portion of audio is extracted from a first audio file with at least a degree of randomness, and the first portion of audio is modified based on the first audio profile. Finally, the first modified portion of audio is output by a first device.
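The selection-and-modification steps above can be sketched with a seeded `random.Random`, so that a given seed reproduces the same "randomized" utterance. The pattern names, the `AudioProfile` fields (`gain`, `speed`), and the nearest-index resampling are illustrative assumptions; the abstract does not define what an audio profile contains.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioProfile:
    gain: float   # amplitude scale
    speed: float  # playback-rate factor (> 1 is faster)


# Hypothetical communication patterns, each specifying one or more audio profiles.
PATTERNS = {
    "greeting": (AudioProfile(1.0, 1.0), AudioProfile(0.8, 0.9)),
    "excited": (AudioProfile(1.2, 1.3),),
}


def extract_portion(samples, length, rng):
    """Pick a contiguous slice of `length` samples at a random offset."""
    start = rng.randrange(len(samples) - length + 1)
    return samples[start:start + length]


def apply_profile(portion, profile):
    """Scale amplitude and resample by nearest index (this also shifts pitch)."""
    n_out = max(1, int(len(portion) / profile.speed))
    last = len(portion) - 1
    return [profile.gain * portion[min(int(i * profile.speed), last)] for i in range(n_out)]


def generate_utterance(samples, rng, length=10):
    """Choose a pattern at random, take its first profile, and shape a random clip."""
    pattern = rng.choice(sorted(PATTERNS))
    profile = PATTERNS[pattern][0]
    return pattern, apply_profile(extract_portion(samples, length, rng), profile)
```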

Method and System for Providing a Speech-Based Service, in Particular for the Control of Room Control Elements in Buildings
20210043332 · 2021-02-11

Various embodiments of the teachings herein include methods and systems for providing a speech-based service for the control of room control elements in buildings. Speech instructions are received by an audio device, which is configured to analyze them, convert them into corresponding operating commands for room control elements (in particular for controlling HVAC devices, e.g., field devices, in a building), and pass these commands on to the corresponding room control elements. Before the audio device receives the speech instructions, the identity of their sender (the user) is anonymized by an anonymization service.
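One plausible way to realize the anonymization step above is keyed pseudonymization: the anonymization service replaces the sender's identity with an HMAC-SHA256 digest under a secret only it holds, so the audio device can still tell repeat senders apart without learning who they are. This technique, the key handling, and the phrase-to-command table below are assumptions for illustration; the abstract does not specify how anonymization or command conversion is done.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret known only to the anonymization service.
SERVICE_KEY = secrets.token_bytes(32)

# Hypothetical phrase-to-command table for room control elements.
COMMANDS = {
    "warmer": ("hvac", "setpoint_delta_c", 1.0),
    "cooler": ("hvac", "setpoint_delta_c", -1.0),
    "lights on": ("lighting", "switch", 1),
    "lights off": ("lighting", "switch", 0),
}


def anonymize(user_id: str, key: bytes = SERVICE_KEY) -> str:
    """Replace the sender's identity with a stable keyed pseudonym (HMAC-SHA256)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def to_operating_command(pseudonym: str, room: str, instruction: str):
    """Map a recognized speech instruction to an operating command, or None."""
    text = instruction.lower()
    for phrase, (element, operation, value) in COMMANDS.items():
        if phrase in text:
            return {"sender": pseudonym, "room": room, "element": element,
                    "operation": operation, "value": value}
    return None
```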

Intelligent exercise music synchronization

A method, computer system, and computer program product for intelligently synchronizing exercise music for an instructor-based group workout are provided. The present invention may include identifying at least one goal workout. The present invention may then include receiving a plurality of verbal cues and a plurality of nonverbal cues associated with an instructor. The present invention may also include analyzing the received verbal and nonverbal cues. The present invention may further include generating the exercise music based on the analyzed verbal and nonverbal cues.
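As a rough illustration of combining the two cue types, the sketch below blends the goal workout tempo with one assumed nonverbal cue (the instructor's measured cadence), shifts the result by a hypothetical verbal-cue vocabulary, and then picks the closest track from a library. Selecting an existing track is a simplification standing in for "generating" music; the vocabulary, the averaging rule, and the 60 to 180 BPM clamp are all assumptions.

```python
from statistics import mean

# Hypothetical verbal-cue vocabulary (tempo change in BPM); not from the source.
VERBAL_TEMPO_DELTA = {"faster": 10, "push": 8, "sprint": 15,
                      "slow": -10, "recover": -15, "breathe": -5}


def target_tempo(verbal_cues, cadence_spm, goal_bpm, lo=60, hi=180):
    """Blend the goal workout tempo with a nonverbal cue (the instructor's
    measured cadence, in steps per minute), then shift it by verbal cues."""
    delta = sum(VERBAL_TEMPO_DELTA.get(w.lower(), 0) for w in verbal_cues)
    base = mean([goal_bpm, cadence_spm])
    return max(lo, min(hi, round(base + delta)))


def pick_track(tracks, bpm):
    """Choose the (title, bpm) track whose tempo is closest to the target."""
    return min(tracks, key=lambda t: abs(t[1] - bpm))
```

For instance, the cues "Push" and "harder" with a cadence of 140 and a goal of 130 BPM yield a target of 143 BPM, which selects a 145 BPM track over 120 or 170 BPM alternatives.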