Patent classifications
G10L21/00
Altering undesirable communication data for communication sessions
This disclosure describes techniques implemented partly by a communications service for identifying and altering undesirable portions of communication data, such as audio data and video data, from a communication session between computing devices. For example, the communications service may monitor the communications session to alter or remove undesirable audio data, such as a dog barking, a doorbell ringing, etc., and/or video data, such as rude gestures, inappropriate facial expressions, etc. The communications service may stream the communication data for the communication session partly through managed servers and analyze the communication data to detect undesirable portions. The communications service may alter or remove the portions of communication data received from a first user device, such as by filtering, refraining from transmitting, or modifying the undesirable portions. The communications service may send the modified communication data to a second user device engaged in the communication session after removing the undesirable portions.
Speaker identity and content de-identification
One embodiment of the invention provides a method for speaker identity and content de-identification under privacy guarantees. The method comprises receiving input indicative of privacy protection levels to enforce, extracting features from a speech recorded in a voice recording, recognizing and extracting textual content from the speech, parsing the textual content to recognize privacy-sensitive personal information about an individual, generating de-identified textual content by anonymizing the personal information to an extent that satisfies the privacy protection levels and conceals the individual's identity, and mapping the de-identified textual content to a speaker who delivered the speech. The method further comprises generating a synthetic speaker identity based on other features that are dissimilar from the features to an extent that satisfies the privacy protection levels, and synthesizing a new speech waveform based on the synthetic speaker identity to deliver the de-identified textual content. The new speech waveform conceals the speaker's identity.
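The text-anonymization step described above can be illustrated with a minimal sketch. It assumes that a privacy protection level maps to a set of information categories to redact and that regular expressions stand in for the recognizer; the `PATTERNS` table and the `deidentify` function are hypothetical names, not taken from the patent, and the speech recognition and voice synthesis steps are not covered.

```python
import re

# Hypothetical category patterns (assumption); a real system would use a
# trained recognizer for privacy-sensitive personal information.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def deidentify(text: str, categories: set[str]) -> str:
    """Replace every match of the selected categories with a placeholder tag."""
    for name in sorted(categories):
        text = PATTERNS[name].sub(f"[{name}]", text)
    return text
```

Passing a larger set of categories corresponds, in this sketch, to a stricter protection level: more of the recognized personal information is replaced before the text is handed to a synthesizer.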
A METHOD AND SYSTEM FOR MONITORING AND ANALYSING COUGH
The method and system for monitoring cough comprise receiving audio signals or audio recordings, where said audio signals or audio recordings comprise one or more of silent segments, cough sound segments, speech segments and extraneous noise. The processing of said received audio signals or audio recordings comprises one or more of removing one or more speech components from speech segments to render the speech unintelligible and clipping said silent segments, wherein the one or more speech components include vowel sounds. Further processing of said received audio signals or audio recordings comprises compressing said audio signals or audio recordings. In the alternative, processing of audio signals or audio recordings comprises compressing a resultant signal after said removal of one or more speech components and/or clipping of silent segments from said audio signals.
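The clipping of silent segments can be sketched as follows, assuming silence is detected by comparing the mean energy of fixed-length frames against a threshold. The function name `clip_silence`, the frame length, and the threshold value are illustrative assumptions; the abstract does not say how silent segments are identified, and speech-component removal and compression are not shown.

```python
def clip_silence(samples, frame_len=4, threshold=0.01):
    """Drop frames whose mean energy falls below the threshold.

    frame_len and threshold are illustrative values (assumptions),
    not parameters given in the source.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)
    return kept
```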
VOICE ASSISTANT SYSTEM WITH AUDIO EFFECTS RELATED TO VOICE COMMANDS
A voice command type entry is used as a basis for applying “audio effects” (see definition herein), “sound effects” (see definition herein) and/or audio edits (see definition herein) to a sound signal. This may be done so that the various types of instructed audio processing evoke, in typical listeners, a desired sentiment or mood. Artificial intelligence may be used to accomplish this objective.
Methods, Apparatus and Systems for Determining Reconstructed Audio Signal
According to an aspect of the present invention, a method for reconstructing an audio signal having a baseband portion and a highband portion is disclosed. The method includes obtaining a decoded baseband audio signal by decoding an encoded audio signal and obtaining a plurality of subband signals by filtering the decoded baseband audio signal. The method further includes generating a high-frequency reconstructed signal by copying a number of consecutive subband signals of the plurality of subband signals and obtaining an envelope adjusted high-frequency signal. The method further includes generating a noise component based on a noise parameter. Finally, the method includes adjusting a phase of the high-frequency reconstructed signal and obtaining a time-domain reconstructed audio signal by combining the decoded baseband audio signal and the combined high-frequency signal.
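The copy-up, envelope adjustment, and noise steps can be sketched in simplified form. This is an illustration under stated assumptions, not the claimed method: subband signals are plain lists of real samples, the envelope is one gain per copied subband, and the noise is uniform and scaled by a single `noise_level`. The function name and all of its parameters are hypothetical, and the phase adjustment and the synthesis back to the time domain are omitted.

```python
import random


def reconstruct_highband(subbands, start, count, gains, noise_level=0.0, seed=0):
    """Copy `count` consecutive subbands beginning at `start`, scale each
    copy by its envelope gain, and add scaled uniform noise.

    All parameter names are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    high = []
    for i in range(count):
        source = subbands[start + i]
        high.append([gains[i] * s + noise_level * rng.uniform(-1.0, 1.0)
                     for s in source])
    return high
```

With `noise_level` left at zero the result is simply the gain-scaled copy of the selected low-frequency subbands, which makes the copy-up step easy to inspect in isolation.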
Personal audio assistant device and method
A system includes a first microphone that captures audio, a communication module communicatively coupled to the first microphone, a logic circuit communicatively coupled to the first microphone and communication module, a speaker operatively coupled to the logic circuit, and an interaction element. The interaction element and logic circuit are configured to initiate control of audio content for output from the speaker in response to at least one voice command detected in captured audio. Other embodiments are disclosed.
Systems, methods, and storage media for performing actions based on utterance of a command
Systems and methods for recognizing and executing spoken commands using speech recognition. Exemplary implementations may: store actionable phrases; obtain audio information representing sound captured by a mobile client computing platform associated with a user; detect any spoken instances of a predetermined keyword present in the sound represented by the audio information; perform speech recognition on the sound represented by the audio information; identify an utterance of an individual actionable phrase in speech temporally adjacent to the spoken instance of the predetermined keyword that is present in the sound represented by the audio information; perform natural language processing to identify an individual command uttered temporally adjacent to the spoken instance of the predetermined keyword that is present in the sound represented by the audio information; and effectuate performance of instructions corresponding to the command.
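The keyword-then-phrase matching can be sketched on text that has already been transcribed. The sketch assumes that "temporally adjacent" means the actionable phrase immediately follows the keyword in the transcript, and that the keyword is a single lowercase word; the function `find_command` and the phrase-to-command table are hypothetical, and the speech recognition and natural language processing steps are not modeled.

```python
def find_command(transcript, keyword, actionable_phrases):
    """Return the command whose phrase directly follows the keyword, or None.

    actionable_phrases maps lowercase phrases to command identifiers
    (a hypothetical structure for this sketch).
    """
    words = transcript.lower().split()
    if keyword not in words:
        return None
    # Text spoken right after the first occurrence of the keyword.
    tail = " ".join(words[words.index(keyword) + 1:])
    for phrase, command in actionable_phrases.items():
        if tail.startswith(phrase):
            return command
    return None
```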
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND NON-TRANSITORY RECORDING MEDIUM
An information processing apparatus includes circuitry to acquire behavior information of a plurality of users having a conversation, generate sound data based on the behavior information, and cause an output device to output an ambient sound based on the sound data.