G10L25/84

Method and System for Scene-Aware Audio-Video Representation

Embodiments disclose a method and system for a scene-aware audio-video representation of a scene. The scene-aware audio-video representation corresponds to a graph of nodes connected by edges. A node in the graph is indicative of the video features of an object in the scene. An edge in the graph connecting two nodes indicates an interaction of the corresponding two objects in the scene. In the graph, one or more edges are associated with audio features of a sound generated by the interaction of the corresponding two objects. The graph of the audio-video representation of the scene may be used to perform a variety of different tasks. Examples of the tasks include one or a combination of action recognition, anomaly detection, sound localization and enhancement, noisy-background sound removal, and system control.
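
The graph structure described in this abstract can be illustrated with a minimal Python sketch. The class name `SceneGraph`, its method names, and the use of plain lists as feature vectors are assumptions made for illustration only; the abstract does not specify a data structure or a feature extractor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SceneGraph:
    """Hypothetical container for an audio-video scene graph."""

    # Object name -> video feature vector of that object.
    nodes: Dict[str, List[float]] = field(default_factory=dict)
    # Unordered object pair -> audio features of the sound produced by
    # their interaction, or None when the interaction has no sound.
    edges: Dict[Tuple[str, str], Optional[List[float]]] = field(default_factory=dict)

    def add_object(self, name: str, video_features: List[float]) -> None:
        self.nodes[name] = video_features

    def add_interaction(
        self, a: str, b: str, audio_features: Optional[List[float]] = None
    ) -> None:
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both objects must be added before they interact")
        # Sort the pair so that (a, b) and (b, a) map to the same edge.
        self.edges[tuple(sorted((a, b)))] = audio_features

    def sounding_interactions(self) -> List[Tuple[str, str]]:
        """Return the edges that carry audio features."""
        return [pair for pair, audio in self.edges.items() if audio is not None]
```

A downstream task such as sound localization could, under these assumptions, start from `sounding_interactions()` to find which object pairs are associated with a sound.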

ELECTRONIC DEVICE AND SPEAKER VERIFICATION METHOD OF ELECTRONIC DEVICE
20230016465 · 2023-01-19

An electronic device is provided. The electronic device includes a microphone configured to receive an audio signal including a voice of a user, a sensor configured to detect a vibration signal generated by the user, at least one processor, and a memory configured to store an instruction executable by the processor. The at least one processor may be configured to determine a noise level included in the audio signal, calculate a verification score based on the noise level, the audio signal, and the vibration signal, and perform speaker verification for the user based on the verification score.
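
As a rough illustration of a score that depends on the noise level, the audio signal, and the vibration signal, the sketch below fuses two per-modality similarity scores with a noise-dependent weight. The linear weighting, the function names, the normalized `noise_level` range, and the `threshold` value are all assumptions; the abstract does not state how the verification score is calculated.

```python
def fused_verification_score(
    audio_score: float, vibration_score: float, noise_level: float
) -> float:
    """Blend two similarity scores; more noise shifts weight to vibration.

    noise_level is assumed to be normalized to the range [0, 1].
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    audio_weight = 1.0 - noise_level
    return audio_weight * audio_score + noise_level * vibration_score


def verify_speaker(
    audio_score: float,
    vibration_score: float,
    noise_level: float,
    threshold: float = 0.5,  # hypothetical acceptance threshold
) -> bool:
    """Accept the speaker when the fused score reaches the threshold."""
    score = fused_verification_score(audio_score, vibration_score, noise_level)
    return score >= threshold
```

With this weighting, a quiet environment relies almost entirely on the microphone score, while a noisy one relies mostly on the vibration sensor score.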

SYSTEMS AND METHODS FOR DISTINGUISHING VALID VOICE COMMANDS FROM FALSE VOICE COMMANDS IN AN INTERACTIVE MEDIA GUIDANCE APPLICATION
20230223020 · 2023-07-13

Systems and methods are described for distinguishing valid voice commands from false voice commands in an interactive media guidance application. In some aspects, the interactive media guidance application receives, at a user device, a signature sound sequence. The interactive media guidance application determines, using control circuitry, based on the signature sound sequence, a threshold gain for the current location of the user device. The interactive media guidance application receives, at the user device, a voice command. The interactive media guidance application determines, using the control circuitry, based on the voice command, a gain for the voice command. The interactive media guidance application determines, using the control circuitry, whether the gain for the voice command is different from the threshold gain. Based on determining that the gain for the voice command is different from the threshold gain, the interactive media guidance application executes, using the control circuitry, the voice command.
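
The comparison of a command gain against a threshold gain can be sketched as follows. This is only one possible reading: measuring gain as an RMS level in decibels, deriving the threshold gain directly from the signature sound sequence, and treating "different" as "differs by more than `tolerance_db`" are all assumptions, as are the function names.

```python
import math
from typing import Sequence


def rms_gain_db(samples: Sequence[float]) -> float:
    """RMS level of a block of samples in dB relative to full scale."""
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Floor the RMS value so that digital silence does not produce log10(0).
    return 20.0 * math.log10(max(rms, 1e-12))


def should_execute(
    command_samples: Sequence[float],
    signature_samples: Sequence[float],
    tolerance_db: float = 3.0,  # hypothetical margin for "different"
) -> bool:
    """Execute the command only if its gain differs from the threshold gain."""
    threshold_gain = rms_gain_db(signature_samples)
    command_gain = rms_gain_db(command_samples)
    return abs(command_gain - threshold_gain) > tolerance_db
```

Under these assumptions, a command picked up at the same level as the signature sound sequence is treated as a false command and is not executed.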

METHODS, SYSTEMS, AND MEDIA FOR CONTEXT-AWARE ESTIMATION OF STUDENT ATTENTION IN ONLINE LEARNING

Methods, systems and media for context-aware estimation of student attention in online learning are described. An attention monitoring system filters or restricts the time periods in which student attention is monitored or assessed to those time periods in which student attention is important. These time periods of high attention importance may be determined by processing data from the teacher, such as audio data representing the teacher's voice and/or visual presentation data representing slides or other visual material being presented to the students. Various types of presenter data from the teacher and attendee data from the students may be used in assessing the importance of attention and each student's attention during each time period. The presenter may be provided with feedback in various forms showing student attention performance aggregated or segmented according to various criteria.
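
Restricting attention assessment to time periods of high attention importance can be sketched as a simple filter. The per-period score lists, the function name, and the `min_importance` cutoff are assumptions for illustration; the abstract does not define how importance or attention is scored.

```python
from typing import Optional, Sequence


def attention_in_important_periods(
    importance: Sequence[float],
    attention: Sequence[float],
    min_importance: float = 0.5,  # hypothetical cutoff for "important"
) -> Optional[float]:
    """Average a student's attention over the important periods only.

    importance[i] is assumed to come from presenter data (voice, slides)
    and attention[i] from attendee data, both for time period i.
    Returns None when no period is important enough to be assessed.
    """
    if len(importance) != len(attention):
        raise ValueError("importance and attention must cover the same periods")
    kept = [a for imp, a in zip(importance, attention) if imp >= min_importance]
    return sum(kept) / len(kept) if kept else None
```

For example, with importance scores `[0.9, 0.2, 0.7, 0.1]` and attention scores `[1.0, 0.0, 0.5, 0.0]`, only the first and third periods are assessed, so inattention during the unimportant periods does not lower the result.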

SENSITIVITY MODE FOR AN AUDIO SPOTTING SYSTEM
20230223042 · 2023-07-13

An audio spotting system configured for various operating modes, including a regular mode and a sensitivity mode, is described. An example cascade audio spotting system may include a high-power subsystem including a high-power trigger and a transfer module. The high-power trigger includes one or more detection models used to detect whether a target sound activity is included in the one or more audio streams. The one or more detection models are associated with a first set of hyperparameters when the cascade audio spotting system is in a regular mode, and the one or more detection models are associated with a second set of hyperparameters when the cascade audio spotting system is in a sensitivity mode. The transfer module provides at least one of one or more processed audio streams for further processing in response to the high-power trigger detecting the target sound activity in the one or more audio streams.
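
Switching one detection model between two sets of hyperparameters can be sketched as a lookup keyed by operating mode. The specific hyperparameters (`detection_threshold`, `min_consecutive_frames`), their values, and the assumption that the sensitivity mode uses looser settings are illustrative only; the abstract does not say which hyperparameters differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Mode(Enum):
    REGULAR = "regular"
    SENSITIVITY = "sensitivity"


@dataclass(frozen=True)
class Hyperparameters:
    detection_threshold: float    # minimum per-frame score to count as a hit
    min_consecutive_frames: int   # hits in a row needed to fire the trigger


# Hypothetical values: the sensitivity mode is assumed to be easier to trigger.
HYPERPARAMETERS = {
    Mode.REGULAR: Hyperparameters(detection_threshold=0.8, min_consecutive_frames=3),
    Mode.SENSITIVITY: Hyperparameters(detection_threshold=0.6, min_consecutive_frames=2),
}


def high_power_trigger(frame_scores: Sequence[float], mode: Mode) -> bool:
    """Report whether the target sound activity is detected in the given mode."""
    params = HYPERPARAMETERS[mode]
    run = 0
    for score in frame_scores:
        # Extend the current run of hits, or reset it on a miss.
        run = run + 1 if score >= params.detection_threshold else 0
        if run >= params.min_consecutive_frames:
            return True
    return False
```

The same frame scores can therefore be rejected in the regular mode and accepted in the sensitivity mode, without changing the detection model itself.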

CASCADE AUDIO SPOTTING SYSTEM
20230223041 · 2023-07-13

Systems and methods for identifying audio events in one or more audio streams include the use of a cascade audio spotting system (such as a cascade keyword spotting system (KWS)) to reduce power consumption while maintaining a desired performance. An example cascade audio spotting system may include a first module and a high-power subsystem. The first module is to receive an audio stream from one or more audio streams, process the audio stream to detect a first target sound activity in the audio stream, and provide a first signal in response to detecting the first target sound activity in the audio stream. The high-power subsystem is to (in response to the first signal being provided by the first module) receive the one or more audio streams and process the one or more audio streams to detect a second target sound activity in the one or more audio streams.
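
The cascade control flow can be sketched in a few lines: a cheap first module gates a more expensive high-power subsystem. The function names, the peak-amplitude gate, the mean-amplitude detector, and the call counter are hypothetical stand-ins; the abstract does not describe the detectors themselves.

```python
from typing import Callable, List, Sequence

Stream = Sequence[float]

# Records each run of the high-power stage, to make the saving visible.
high_power_calls: List[int] = []


def energy_gate(stream: Stream) -> bool:
    """Hypothetical first module: fires on any sample above a small peak."""
    return max((abs(s) for s in stream), default=0.0) > 0.1


def expensive_detector(streams: Sequence[Stream]) -> bool:
    """Hypothetical high-power stage: fires on a high mean amplitude."""
    high_power_calls.append(1)
    return any(s and sum(abs(x) for x in s) / len(s) > 0.3 for s in streams)


def run_cascade(
    streams: Sequence[Stream],
    first_module: Callable[[Stream], bool] = energy_gate,
    high_power_subsystem: Callable[[Sequence[Stream]], bool] = expensive_detector,
) -> bool:
    """Run the high-power subsystem only after the first module signals."""
    if not streams:
        raise ValueError("at least one audio stream is required")
    # The first module inspects a single stream from the set.
    if not first_module(streams[0]):
        return False  # no first signal, so the high-power stage stays idle
    # The high-power subsystem then receives all of the streams.
    return high_power_subsystem(streams)
```

In this sketch the power saving comes from the early return: for quiet input the expensive detector is never invoked, which the `high_power_calls` list makes observable.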
