G10L21/00

Insertion of Sound Objects Into a Downmixed Audio Signal

A method for inserting a first audio signal into a bitstream which comprises a downmix signal and associated bitstream metadata is described. The downmix signal and associated bitstream metadata are indicative of an audio program comprising a plurality of spatially diverse audio signals. The downmix signal comprises at least one audio channel and the bitstream metadata comprise upmix metadata for reproducing the plurality of spatially diverse audio signals from the at least one channel. The method comprises mixing the first audio signal with the at least one audio channel to generate a modified downmix signal. The method further comprises generating an output bitstream comprising the modified downmix signal and the associated modified bitstream metadata indicative of a modified audio program comprising a plurality of modified spatially diverse audio signals.
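The mixing step can be illustrated with a minimal sketch. It assumes the downmix is a list of equal-rate sample lists and that the first audio signal is mono; the function name `insert_object` and the per-channel `gains` parameter are hypothetical and not taken from the abstract. Updating the bitstream metadata is not covered here.

```python
def insert_object(downmix, obj, gains):
    """Mix a mono signal `obj` into every downmix channel.

    downmix: list of channels, each a list of float samples
    obj:     list of float samples (the signal to insert)
    gains:   one gain per downmix channel (assumed; e.g. from a panning rule)
    """
    if len(gains) != len(downmix):
        raise ValueError("need exactly one gain per downmix channel")
    modified = []
    for channel, gain in zip(downmix, gains):
        out = list(channel)
        # Add the scaled object only where both signals have samples.
        for i in range(min(len(out), len(obj))):
            out[i] += gain * obj[i]
        modified.append(out)
    return modified


modified = insert_object([[1.0, 2.0], [0.0, 0.0]], [1.0, 1.0], [0.5, 1.0])
```

The input channels are copied rather than modified in place, so the original downmix stays available for comparison.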

Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing devices

Systems and methods are provided for analyzing voice-based audio inputs. A voice-based audio input associated with a user (e.g., wherein the voice-based audio input is a prompt or a command) is received and measures of one or more features are extracted. One or more parameters are calculated based on the measures of the one or more features. The occurrence of one or more mistriggers is identified by inputting the one or more parameters into a predictive model. Further, systems and methods are provided for identifying human mental health states using mobile device data. Mobile device data (including sensor data) associated with a mobile device corresponding to a user is received. Measurements are derived from the mobile device data and input into a predictive model. The predictive model is executed and outputs probability values of one or more symptoms associated with the user.

Dynamically adapted pitch correction based on audio input

Systems and methods for adjusting pitch of an audio signal include detecting input notes in the audio signal, mapping the input notes to corresponding output notes, each output note having an associated upper note boundary and lower note boundary, and modifying at least one of the upper note boundary and the lower note boundary of at least one output note in response to previously received input notes. Pitch of the input notes may be shifted to match an associated pitch of corresponding output notes. Delay of the pitch shifting process may be dynamically adjusted based on detected stability of the input notes.
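One way to picture boundaries that move in response to earlier input notes is a simple hysteresis rule, sketched below. The assumptions are mine, not the abstract's: pitches are fractional MIDI note numbers, output notes are whole semitones with default boundaries at plus or minus 0.5, and the most recently selected note has both boundaries widened by a hypothetical `margin`.

```python
import math


class AdaptiveNoteMapper:
    """Maps fractional MIDI pitches to semitones with widened boundaries
    around the previously selected output note (illustrative rule only)."""

    def __init__(self, margin=0.2):
        self.margin = margin      # extra width given to the last output note
        self.last_note = None     # most recently selected output note

    def boundaries(self, note):
        half_width = 0.5
        if note == self.last_note:
            half_width += self.margin
        return note - half_width, note + half_width

    def map(self, pitch):
        # Stay on the previous note while the pitch is inside its widened band.
        if self.last_note is not None:
            lower, upper = self.boundaries(self.last_note)
            if lower <= pitch < upper:
                return self.last_note
        # Otherwise pick the nearest semitone and remember it.
        self.last_note = math.floor(pitch + 0.5)
        return self.last_note


mapper = AdaptiveNoteMapper(margin=0.2)
notes = [mapper.map(p) for p in (60.1, 60.6, 60.8, 60.4)]
```

With this rule, a pitch of 60.6 stays on note 60 after 60.1 was heard, although a fresh mapper would round it to 61. The pitch shift itself and the dynamic delay adjustment are outside this sketch.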

Position directed acoustic array and beamforming methods

Methods and systems are provided for receiving desired sounds. The system includes a position sensor configured to determine an occupant position of an occupant engaging in speech within a defined space and transmit the speaking occupant position. A plurality of microphones are configured to receive sound from within the defined space and transmit audio signals corresponding to the received sound. A processor, in communication with the position sensor and the microphones, is configured to receive the speaking occupant position and the audio signals, apply a beamformer to the audio signals to direct a microphone beam toward the occupant position, and generate a beamformer output signal.
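The abstract does not say which beamformer is applied. As a minimal sketch, the following uses a delay-and-sum beamformer with integer sample delays, steered at the position reported by the position sensor. The function names, the coordinate convention (metres), and the fixed speed of sound are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed room-temperature value


def steering_delays(mic_positions, target, sample_rate):
    """Extra delay, in whole samples, of each microphone relative to the
    microphone nearest to the target position."""
    distances = [math.dist(mic, target) for mic in mic_positions]
    nearest = min(distances)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate)
            for d in distances]


def delay_and_sum(signals, delays):
    """Advance each channel by its delay so the target's sound lines up,
    then average the aligned channels."""
    length = min(len(s) - d for s, d in zip(signals, delays))
    return [sum(s[i + d] for s, d in zip(signals, delays)) / len(signals)
            for i in range(length)]


mics = [(0.0, 0.0), (0.686, 0.0)]
delays = steering_delays(mics, target=(-1.0, 0.0), sample_rate=1000)
beam = delay_and_sum([[0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]], delays)
```

Sound arriving from the target adds coherently after alignment, while sound from other directions is averaged out of phase. A practical system would likely use fractional delays or an adaptive beamformer.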

Direct selection of audio source

When an indication is received from a user input, a name of an audio source from an ordered list of audio sources to which a headset is currently connected is output. Whenever a successive indication is received within a predefined amount of time, the name of the next wireless audio source in the list is output. If the next wireless audio source in the list is the last wireless audio source in the list, and the successive indication from the user input is received before the elapsed time exceeds the predefined value, the name of the audio source to which the headset is currently connected is output as the next selected wireless audio source in the list. When an amount of time greater than the predefined value elapses without a successive indication from the user input, the last wireless audio source that had its name output is connected.
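The selection behaviour reads like a small timed state machine, sketched below under simplifying assumptions: timestamps are passed in explicitly as seconds, the first press announces the currently connected source, and stepping past the end of the list wraps around. The class and method names are hypothetical.

```python
class SourceSelector:
    """Cycles through audio sources on button presses and connects to the
    last announced source once the timeout passes without a new press."""

    def __init__(self, sources, timeout=2.0):
        self.sources = list(sources)
        self.timeout = timeout
        self.connected = 0        # index of the connected source
        self.cursor = None        # index of the last announced source
        self.last_press = None    # time of the most recent press

    def tick(self, now):
        """Commit the pending selection if the timeout elapsed; return the
        name of the connected source."""
        if self.cursor is not None and now - self.last_press > self.timeout:
            self.connected = self.cursor
            self.cursor = None
        return self.sources[self.connected]

    def press(self, now):
        """Handle a user indication; return the name to announce."""
        self.tick(now)
        if self.cursor is None:
            self.cursor = self.connected          # first press: current source
        else:
            self.cursor = (self.cursor + 1) % len(self.sources)
        self.last_press = now
        return self.sources[self.cursor]


selector = SourceSelector(["phone", "laptop", "tablet"], timeout=2.0)
announced = [selector.press(0.0), selector.press(1.0)]
connected = selector.tick(4.0)
```

Passing the time in as an argument keeps the sketch deterministic; a device would read it from a clock or a timer interrupt instead.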

Predicting individual or crowd behavior based on graphical text analysis of point recordings of audible expressions

Embodiments relate to determining a crowd behavior. A method of determining a crowd behavior is provided. The method collects, at one or more recording points in a crowd of individuals, audible expressions that the individuals of the crowd make. The method generates a graph of the audible expressions as the audible expressions are collected from the individuals. The method determines a crowd behavior by performing a graphical text analysis on the graph. The method outputs an indication of the crowd behavior to trigger a crowd control measure.

Devices and methods for reducing the processing time of the convergence of a spatial filter
09749746 · 2017-08-29

A noise-suppression device includes an input buffer, a spatial filter, a delay buffer, and a controller. The input buffer stores sound data. The spatial filter generates processed data by using an internal adaptive control according to a control signal. The delay buffer stores the processed data. The controller operates in one of a training stage, a flushing stage, or a normal stage and generates the control signal. When the controller operates in the training stage, the spatial filter receives the sound data from the input buffer and generates the processed data, which is repeatedly reprocessed by the spatial filter and stored in the delay buffer until the internal adaptive control has converged.
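The idea of replaying buffered data until the adaptation converges can be sketched with a toy stand-in. The code below is not the device's spatial filter: it assumes a single adaptive weight updated by the LMS rule, a reference channel, and a convergence test on the weight change per pass. All names and parameters are hypothetical.

```python
def train_until_converged(primary, reference, step=0.1, tol=1e-6,
                          max_passes=1000):
    """Replay the same buffered block through a one-weight LMS filter until
    the weight changes by less than `tol` over a full pass.

    Returns the final weight and the number of passes used."""
    weight = 0.0
    for passes in range(1, max_passes + 1):
        before = weight
        for p, r in zip(primary, reference):
            error = p - weight * r        # filter output for this sample
            weight += step * error * r    # LMS weight update
        if abs(weight - before) < tol:
            return weight, passes
    return weight, max_passes


reference = [1.0, -1.0, 0.5, -0.5]
primary = [0.5 * r for r in reference]    # leakage gain of 0.5 to be learned
weight, passes = train_until_converged(primary, reference)
```

Reusing one short buffer lets the weight settle without waiting for new input, which is the effect the training stage is described as exploiting.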

Network computer system to generate voice response communications

A network computer system for managing a network service (e.g., a transport service) can include a voice-assistant subsystem for generating dialogues and performing actions for service providers of the network service. The network computer system can receive, from a user device, a request for the network service. In response, the network computer system can identify a service provider and transmit an invitation to the provider device of the service provider. In response to the identification of the service provider for the request, the voice-assistant subsystem can trigger an audio voice prompt to be presented on the provider device and a listening period during which the provider device monitors for an audio input from the service provider. Based on the audio input captured by the provider device, the network computer system can determine an intent corresponding to whether the service provider accepts or declines the invitation.