Patent classifications
G10L25/84
Adaptive energy limiting for transient noise suppression
The present disclosure describes aspects of adaptive energy limiting for transient noise suppression. In some aspects, an adaptive energy limiter sets a limiter ceiling for an audio signal to full scale and receives a portion of the audio signal. For the portion of the audio signal, the adaptive energy limiter determines a maximum amplitude and evaluates the portion with a neural network to provide a voice likelihood estimate. Based on the maximum amplitude and the voice likelihood estimate, the adaptive energy limiter determines that the portion of the audio signal includes noise. In response to determining that the portion of the audio signal includes noise, the adaptive energy limiter decreases the limiter ceiling and provides the limiter ceiling to a limiter module effective to limit an amount of energy of the audio signal. This may be effective to prevent audio signals from carrying full-energy transient noise into conference audio.
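The limiter loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the neural network is replaced by a caller-supplied `voice_likelihood` function, and the noise rule (loud peak and low voice likelihood), the thresholds, the multiplicative decrease, and the recovery toward full scale between noise events are all hypothetical choices. The limiter module is reduced to hard clipping at the ceiling.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AdaptiveEnergyLimiter:
    # Stand-in for the neural network: maps a frame to a voice likelihood in [0, 1].
    voice_likelihood: Callable[[Sequence[float]], float]
    peak_threshold: float = 0.5   # hypothetical: peaks at or above this count as loud
    voice_threshold: float = 0.5  # hypothetical: below this the frame counts as non-voice
    attenuation: float = 0.5      # hypothetical: ceiling multiplier when noise is found
    recovery: float = 1.25        # hypothetical: ceiling multiplier otherwise
    min_ceiling: float = 0.05     # never push the ceiling below this level
    ceiling: float = 1.0          # start at full scale (samples normalized to [-1, 1])

    def update(self, frame: Sequence[float]) -> bool:
        """Classify one frame and adapt the ceiling; returns True if noise was found."""
        peak = max((abs(s) for s in frame), default=0.0)
        p_voice = self.voice_likelihood(frame)
        is_noise = peak >= self.peak_threshold and p_voice < self.voice_threshold
        if is_noise:
            self.ceiling = max(self.min_ceiling, self.ceiling * self.attenuation)
        else:
            # Assumed behavior: drift back toward full scale when no noise is seen.
            self.ceiling = min(1.0, self.ceiling * self.recovery)
        return is_noise


def hard_limit(frame: Sequence[float], ceiling: float) -> List[float]:
    """Simplest possible limiter module: clip every sample to +/- ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in frame]


def process(limiter: AdaptiveEnergyLimiter, frames: Sequence[Sequence[float]]) -> List[List[float]]:
    out = []
    for frame in frames:
        limiter.update(frame)  # decide on this frame, then limit it with the new ceiling
        out.append(hard_limit(frame, limiter.ceiling))
    return out
```

For example, with `voice_likelihood=lambda f: 0.0` (every frame judged non-voice), processing `[[0.9, -0.8]]` lowers the ceiling from 1.0 to 0.5 and returns `[[0.5, -0.5]]`. A production limiter would smooth gain changes with attack and release times rather than clip, to avoid audible distortion.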
ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF
An electronic apparatus includes: a memory storing a first threshold value and a second threshold value corresponding to a receiving direction of a wake-up word, a sound receiver comprising sound receiving circuitry, and a processor configured to: identify a receiving direction of the sound based on a sound received through the sound receiver, based on a similarity between sound data obtained in response to the received sound and the wake-up word being greater than or equal to the first threshold value corresponding to the identified receiving direction, perform voice recognition for a subsequent sound received through the sound receiver, and based on the similarity being less than the first threshold value and greater than or equal to the second threshold value, change the first threshold value.
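A minimal Python sketch of the per-direction two-threshold logic follows. Assumptions: the receiving direction has already been resolved to a sector label such as `"front"`, the wake-word similarity score is computed elsewhere, and "change the first threshold" is interpreted as lowering it by a hypothetical fixed `step`, bounded below by the second threshold. The abstract does not say how the threshold is changed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DirectionalWakeWordGate:
    first: Dict[str, float]   # per-direction acceptance threshold
    second: Dict[str, float]  # per-direction lower bound for "near miss" scores
    step: float = 0.05        # hypothetical: how far a near miss lowers the first threshold

    def on_wake_word_score(self, direction: str, similarity: float) -> bool:
        """Return True if voice recognition should run on subsequent sound."""
        t1 = self.first[direction]
        t2 = self.second[direction]
        if similarity >= t1:
            return True
        if similarity >= t2:
            # Near miss from this direction: relax its first threshold,
            # but never below its second threshold.
            self.first[direction] = max(t2, t1 - self.step)
        return False
```

Keeping the thresholds per direction means a user who habitually speaks from one position (for example, from the side of a TV) only relaxes the threshold for that direction, leaving the others unchanged.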
NOISE DETECTOR FOR TARGETED APPLICATION OF NOISE REMOVAL
Techniques for performing conditional or controlled noise removal from audio that may contain background noise. The techniques involve obtaining audio from an environment that may have one or more unwanted noise sources, and converting the audio to digital audio data. The digital audio data is analyzed to detect whether there is noise in the audio. When noise is detected in the audio, noise removal is performed on the digital audio data to remove the noise from the audio. When noise is not detected in the audio, the digital audio data is further processed without performing noise removal on the digital audio data.
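The conditional structure can be sketched in Python. The detector and the remover here are hypothetical stand-ins chosen only to make the sketch runnable: noise is declared present when the average RMS of the quietest frames (a crude noise-floor estimate) exceeds a fixed level, and "removal" is a simple noise gate that silences low-level frames. Real systems would likely use a trained classifier and a spectral or learned denoiser.

```python
import math
from typing import Callable, List, Sequence

Frames = Sequence[Sequence[float]]


def rms(frame: Sequence[float]) -> float:
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def noise_floor_detector(frames: Frames, floor: float = 0.01, quantile: float = 0.2) -> bool:
    """Hypothetical detector: mean RMS of the quietest frames above `floor` means noise."""
    levels = sorted(rms(f) for f in frames)
    k = max(1, int(len(levels) * quantile))
    return sum(levels[:k]) / k > floor


def noise_gate(frames: Frames, threshold: float = 0.05) -> List[List[float]]:
    """Hypothetical remover: zero out frames whose RMS is below `threshold`."""
    return [list(f) if rms(f) >= threshold else [0.0] * len(f) for f in frames]


def conditional_denoise(
    frames: Frames,
    detect: Callable[[Frames], bool] = noise_floor_detector,
    remove: Callable[[Frames], List[List[float]]] = noise_gate,
) -> List[List[float]]:
    if detect(frames):
        return remove(frames)        # noise detected: run noise removal
    return [list(f) for f in frames]  # no noise: pass through untouched
```

Skipping removal on clean input is the point of the design: denoisers can introduce artifacts and cost compute, so they only run when the detector fires.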
Method and apparatus for audio data processing
Embodiments of the disclosure provide methods and apparatuses for processing audio data. The method can include: acquiring audio data by an audio capturing device, determining feature information of an enclosure in which the audio capturing device is located, and reverberating the feature information into the audio data.
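One way to realize this is sketched below in Python, assuming the enclosure's feature information is summarized by a single reverberation time (RT60). The sketch synthesizes an exponentially decaying noise impulse response whose amplitude falls by 60 dB over RT60, convolves it with the audio, and mixes wet and dry signals. The RT60-only room model and the `wet` mix parameter are assumptions, not details from the abstract.

```python
import random
from typing import List, Sequence


def synthetic_rir(rt60: float, sample_rate: int, seed: int = 0) -> List[float]:
    """Exponentially decaying noise impulse response, rt60 seconds long."""
    n = max(1, int(rt60 * sample_rate))
    rng = random.Random(seed)  # seeded so the result is reproducible
    # Amplitude falls by a factor of 10**3 (60 dB) after rt60 seconds.
    rir = [rng.gauss(0.0, 1.0) * 10 ** (-3.0 * (i / sample_rate) / rt60) for i in range(n)]
    rir[0] = 1.0  # direct path
    return rir


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form full linear convolution, O(len(x) * len(h))."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def reverberate(audio: Sequence[float], rt60: float, sample_rate: int, wet: float = 0.3) -> List[float]:
    """Mix the dry signal with its convolution by the synthetic room response."""
    wet_signal = convolve(audio, synthetic_rir(rt60, sample_rate))
    dry = list(audio) + [0.0] * (len(wet_signal) - len(audio))
    return [(1.0 - wet) * d + wet * w for d, w in zip(dry, wet_signal)]
```

The direct-form convolution keeps the sketch dependency-free; for real sample rates and room lengths an FFT-based convolution would be far faster.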
INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
Provided is an information processing system including: an information processing device (20) and a playback device (10), the information processing device including: a first detection unit (204) that detects, from collected sound, audio processing superimposed on the sound by the playback device; a specifying unit (206) that specifies an utterance subject of the sound on the basis of the audio processing that has been detected; and a determination unit (208) that determines whether or not to execute a command included in the sound on the basis of a result of the specification.
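A hypothetical Python sketch of the detect, specify, and decide chain follows. It assumes the audio processing the playback device superimposes is a low-level pseudo-noise watermark known to the information processing device and already time-aligned with the collected sound. The watermark scheme, seed, and threshold are illustrative assumptions; the abstract does not specify what processing is superimposed.

```python
import random
from typing import List, Sequence


def pn_sequence(length: int, seed: int = 1234) -> List[float]:
    """Hypothetical +/-1 pseudo-noise watermark shared by both devices."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def watermark_score(sound: Sequence[float], pn: Sequence[float]) -> float:
    """Normalized correlation at zero lag (assumes the two are time-aligned)."""
    n = min(len(sound), len(pn))
    if n == 0:
        return 0.0
    return sum(sound[i] * pn[i] for i in range(n)) / n


def specify_utterance_subject(sound: Sequence[float], pn: Sequence[float], threshold: float) -> str:
    """Detection and specifying steps: did the playback device produce this sound?"""
    return "playback_device" if watermark_score(sound, pn) >= threshold else "user"


def should_execute(sound: Sequence[float], pn: Sequence[float], threshold: float) -> bool:
    """Determination step: only execute commands attributed to a person."""
    return specify_utterance_subject(sound, pn, threshold) == "user"
```

This prevents, for instance, a wake word or command spoken in a TV broadcast from triggering a voice assistant in the same room.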
SPEECH DETECTION USING IMAGE CLASSIFICATION
Speech detection can be achieved by identifying a speech segment within an audio segment using image classification. An audio segment of radio communications is obtained. An audio sub-segment within the audio segment is extracted. A sampled histogram is generated of a plurality of sampled values across a sampled time window of the audio sub-segment. A two-dimensional image is generated that represents a two-dimensional mapping of the sampled histogram along a first dimension and a predefined histogram along a second dimension that is orthogonal to the first dimension. The two-dimensional image is provided to an image classifier previously trained using the predefined histogram. An output is received from the image classifier based on the two-dimensional image. The output indicates whether the audio sub-segment contains speech.
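The histogram and image construction can be sketched in Python. The abstract does not define the two-dimensional mapping, so this sketch assumes one plausible reading: pixel (i, j) is the product of sampled-histogram bin i and predefined-histogram bin j (an outer product). Bin count, value range, and the `classifier` callable are hypothetical; a trained image classifier would be plugged in there.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def histogram(values: Sequence[float], bins: int, lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Normalized histogram of sample values over [lo, hi]; out-of-range values are clamped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(values) or 1
    return [c / total for c in counts]


def histogram_image(sampled: Sequence[float], predefined: Sequence[float]) -> Image:
    """Assumed mapping: sampled histogram on rows, predefined histogram on columns."""
    return [[s * p for p in predefined] for s in sampled]


def contains_speech(
    sub_segment: Sequence[float],
    predefined: Sequence[float],
    classifier: Callable[[Image], bool],
    bins: int,
) -> bool:
    image = histogram_image(histogram(sub_segment, bins), predefined)
    return classifier(image)
```

Turning a one-dimensional amplitude distribution into an image lets an off-the-shelf image classifier do the speech/non-speech decision.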
SYSTEMS AND METHODS FOR ENABLING VOICE-BASED TRANSACTIONS AND VOICE-BASED COMMANDS
Aspects of the present disclosure involve processing audio signals to determine the presence and proximity of a user to a computing device, such as a voice-controlled computing device located within an environment. When the proximity of the user to the computing device is within an acceptable threshold, a voice command is detected that is associated with the user of a plurality of users located in the environment. In some instances, a device command is generated based on the voice command. The device command is executed, for example, at the computing device.
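A minimal Python sketch of proximity gating follows. It assumes proximity is estimated from the received speech level under a free-field inverse-square model (level falls by 20*log10(d) dB relative to a calibrated level at 1 m) and that the speaking user has already been identified. The reference level, distance threshold, and `DeviceCommand` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def estimate_distance_m(measured_db: float, reference_db_at_1m: float) -> float:
    """Free-field inverse-square law: level drops 20*log10(d) dB at d metres."""
    return 10 ** ((reference_db_at_1m - measured_db) / 20.0)


@dataclass
class DeviceCommand:
    user_id: str
    action: str


def gate_voice_command(
    measured_db: float,
    reference_db_at_1m: float,
    max_distance_m: float,
    user_id: str,
    transcript: str,
) -> Optional[DeviceCommand]:
    distance = estimate_distance_m(measured_db, reference_db_at_1m)
    if distance > max_distance_m:
        return None  # user too far from the device: ignore the utterance
    return DeviceCommand(user_id=user_id, action=transcript.strip().lower())
```

A 6 dB drop from the reference level corresponds to roughly doubling the distance, so with a 1.5 m limit that utterance would be ignored. Real rooms add reverberation that breaks the free-field assumption, so this estimate is only a rough heuristic.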