G10L25/24

HIERARCHICAL GENERATED AUDIO DETECTION SYSTEM

Disclosed is a hierarchical generated audio detection system comprising an audio preprocessing module, a CQCC feature extraction module, an LFCC feature extraction module, a first-stage lightweight coarse-level detection model, and a second-stage fine-level deep identification model. The audio preprocessing module preprocesses collected audio or video data to obtain an audio clip whose length does not exceed the length limit. The audio clip is input into the CQCC feature extraction module and the LFCC feature extraction module respectively to obtain a CQCC feature and an LFCC feature. The CQCC feature or the LFCC feature is input into the first-stage lightweight coarse-level detection model for first-stage screening, which separates first-stage real audio from first-stage generated audio. The CQCC feature or LFCC feature of the first-stage generated audio is then input into the second-stage fine-level deep identification model, which identifies second-stage real audio and second-stage generated audio; the second-stage generated audio is identified as generated audio.
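The cascade logic of the two stages can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the coarse and fine models are assumed to be callables that map a feature vector (CQCC or LFCC) to a probability that the clip is generated, and the `hierarchical_detect` function, the `Verdict` type and the 0.5 thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical model interface: feature vector -> probability the clip is generated.
Model = Callable[[Sequence[float]], float]


@dataclass
class Verdict:
    label: str  # "real" or "generated"
    stage: int  # stage that produced the final decision


def hierarchical_detect(features: Sequence[float],
                        coarse_model: Model,
                        fine_model: Model,
                        coarse_threshold: float = 0.5,
                        fine_threshold: float = 0.5) -> Verdict:
    # Stage 1: the lightweight coarse model screens every clip.
    if coarse_model(features) < coarse_threshold:
        return Verdict("real", 1)       # first-stage real audio: stop here
    # Stage 2: only clips flagged by stage 1 reach the heavier fine model.
    if fine_model(features) < fine_threshold:
        return Verdict("real", 2)       # second-stage real audio
    return Verdict("generated", 2)      # second-stage generated audio
```

The point of the design is cost: most genuine clips exit after the cheap first stage, so the deep model runs only on the suspicious fraction.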

Authentication method, authentication device, electronic device and storage medium
11700127 · 2023-07-11

The present disclosure provides an authentication method, an authentication device, an electronic device and a storage medium. The authentication method includes: receiving target voice data; obtaining a first voiceprint feature parameter corresponding to the target voice data from a device voiceprint model library; performing a first encryption process on the first voiceprint feature parameter with a locally stored private key to generate to-be-verified data; transmitting the to-be-verified data to a server, so that the server uses a public key which matches the private key to decrypt the to-be-verified data to obtain the first voiceprint feature parameter, and performs authentication on the first voiceprint feature parameter to obtain an authentication result; receiving the authentication result returned by the server.
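"Encrypting" with a private key so that the server recovers the data with the matching public key is essentially RSA signing with message recovery. The toy sketch below assumes textbook RSA with the classic tiny primes p = 61 and q = 53, a hypothetical quantization of voiceprint parameters in [0, 1] to integers, and a hypothetical Euclidean-distance check against an enrolled template. It is insecure and for illustration only; a real system would use a vetted cryptography library and proper key sizes.

```python
import math

# Textbook RSA key pair from p=61, q=53 (toy values only, NOT secure).
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent


def quantize(features, scale=100):
    # Hypothetical encoding: map parameters in [0, 1] to integers below N.
    return [round(f * scale) for f in features]


def device_sign(features):
    # Device side: process each quantized parameter with the local private key
    # to produce the to-be-verified data.
    return [pow(m, D, N) for m in quantize(features)]


def server_authenticate(to_be_verified, enrolled, max_dist=5.0):
    # Server side: recover the parameters with the matching public key...
    recovered = [pow(c, E, N) for c in to_be_verified]
    # ...then compare them with the enrolled voiceprint template.
    return math.dist(recovered, quantize(enrolled)) <= max_dist
```

Only the private-key holder can produce data that decodes to meaningful parameters under the public key, which is what lets the server attribute the voiceprint to the device.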

DETECTION DEVICE

A detection device is provided that detects, in a target video, a scene related to a sponsor credit included in a commercial message. The detection device comprises a detection unit that associates, from a preliminary video, a still image related to the sponsor credit with an audio signal related to the sponsor credit, the image and audio being included in frames or audio signals other than those constituting the commercial message, so as to detect the scene related to the sponsor credit in the target video.
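One way to picture the association step: reference pairs of a still image and its audio are taken from the preliminary video, and a target segment counts as a sponsor-credit scene only when both its frame and its audio match the same pair. The sketch below is a simplified assumption, not the patented detection unit: images are flat grayscale lists in [0, 1], audio snippets are equal-length sample lists, and the similarity measures and thresholds are hypothetical choices.

```python
import math


def image_similarity(a, b):
    # 1 - mean absolute pixel difference (grayscale values in [0, 1]).
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def audio_similarity(a, b):
    # Cosine similarity of two equal-length audio snippets.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def detect_sponsor_scenes(target_segments, references, img_thr=0.9, aud_thr=0.9):
    # references: (still_image, audio) pairs associated from the preliminary video.
    # target_segments: (timestamp, frame, audio) tuples from the target video.
    hits = []
    for t, frame, audio in target_segments:
        for ref_img, ref_aud in references:
            if (image_similarity(frame, ref_img) >= img_thr
                    and audio_similarity(audio, ref_aud) >= aud_thr):
                hits.append(t)
                break  # one matching reference pair is enough
    return hits
```

Requiring both modalities to match reduces false detections compared with matching the image or the audio alone.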

METHOD AND DEVICE FOR SPEECH/MUSIC CLASSIFICATION AND CORE ENCODER SELECTION IN A SOUND CODEC
20230215448 · 2023-07-06

A two-stage speech/music classification device and method classify an input sound signal and select a core encoder for encoding it. A first stage classifies the input sound signal into one of a number of final classes. A second stage extracts high-level features of the input sound signal and selects the core encoder in response to the extracted high-level features and the final class selected in the first stage.
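The division of labor between the two stages can be sketched as below. Everything concrete here is a hypothetical stand-in: the first stage is reduced to thresholding a precomputed speech/music score, the high-level features are zero-crossing rate and sub-frame energy variability, and the encoder names and decision thresholds are invented for illustration, not taken from the codec.

```python
from enum import Enum
from statistics import mean, pstdev


class FinalClass(Enum):
    SPEECH = "speech"
    MUSIC = "music"


def first_stage(speech_music_score: float) -> FinalClass:
    # Stage 1 (hypothetical rule): positive score means music, otherwise speech.
    return FinalClass.MUSIC if speech_music_score > 0 else FinalClass.SPEECH


def high_level_features(frame, n_sub=4):
    # Illustrative stage-2 features: zero-crossing rate and the relative
    # spread of sub-frame energies (how non-stationary the frame is).
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)) / (len(frame) - 1)
    size = len(frame) // n_sub
    energies = [sum(x * x for x in frame[i * size:(i + 1) * size]) for i in range(n_sub)]
    m = mean(energies)
    energy_var = pstdev(energies) / m if m > 0 else 0.0
    return zcr, energy_var


def select_core_encoder(frame, speech_music_score):
    final_class = first_stage(speech_music_score)
    zcr, energy_var = high_level_features(frame)
    if final_class is FinalClass.SPEECH:
        # Strongly fluctuating energy suggests genuine speech.
        return "speech_encoder" if energy_var > 0.5 else "generic_encoder"
    # Music class: noise-like (high-ZCR) content goes to a generic encoder.
    return "music_encoder" if zcr < 0.3 else "generic_encoder"
```

The second stage can overrule a coarse class decision for frames where the dedicated encoder would likely perform poorly, which is the motivation for combining the first-stage class with additional features.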

Wearable respiratory monitoring system based on resonant microphone array

A method for continuous acoustic signature recognition and classification includes a step of obtaining an audio input signal from a resonant microphone array positioned proximate to a target, the audio input signal having a plurality of channels. The target produces characterizing audio signals depending on a state or condition of the target. A plurality of features is extracted from the audio input signal with a signal processor. The plurality of features is classified to determine the state of the target. An acoustic monitoring system implementing the method is also provided.
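A minimal sketch of the extract-then-classify loop on a multichannel input follows. It assumes per-channel RMS energy as the feature (with a resonant array, each channel responds mainly to its own band, so this approximates a coarse spectrum), a nearest-centroid classifier as a stand-in for whatever classifier the system uses, and hypothetical state labels.

```python
import math


def extract_features(channels):
    # One feature per channel: RMS energy over the current window.
    return [math.sqrt(sum(x * x for x in ch) / len(ch)) for ch in channels]


def classify(features, centroids):
    # Nearest-centroid classifier; centroids maps state name -> feature vector.
    return min(centroids, key=lambda state: math.dist(features, centroids[state]))


def monitor(channels, centroids, window):
    # Continuous recognition: classify consecutive non-overlapping windows.
    n = len(channels[0])
    for start in range(0, n - window + 1, window):
        frame = [ch[start:start + window] for ch in channels]
        yield classify(extract_features(frame), centroids)
```

Because `monitor` is a generator, it can run on an open-ended stream and emit one state estimate per window.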

Systems and methods for animation generation

Systems and methods for animating from audio in accordance with embodiments of the invention are illustrated. One embodiment includes a method for generating animation from audio. The method includes steps for receiving input audio data, generating an embedding for the input audio data, and generating several predictions for several tasks from the generated embedding. The several predictions include at least one of blendshape weights, event detection, and/or voice activity detection. The method further includes steps for generating a final prediction from the several predictions, where the final prediction includes a set of blendshape weights, and generating an output based on the final prediction.
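The step of turning several task predictions into one final set of blendshape weights could look like the sketch below. The fusion rules are hypothetical assumptions chosen for illustration: mouth-related shapes are closed when voice activity is low, and a detected event (here "laughter") raises an associated blendshape. The blendshape and event names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaskPredictions:
    blendshapes: Dict[str, float]  # per-blendshape weights in [0, 1]
    events: Dict[str, float]       # event name -> probability
    voice_activity: float          # probability that someone is speaking


# Hypothetical mapping from detected events to blendshape targets.
EVENT_BLENDSHAPES = {"laughter": {"mouthSmile": 0.8}}
# Hypothetical set of shapes that only move while speaking.
MOUTH_SHAPES = ("jawOpen", "mouthFunnel", "mouthPucker")


def fuse(pred, vad_threshold=0.5, event_threshold=0.5):
    final = dict(pred.blendshapes)  # copy so the input is not mutated
    if pred.voice_activity < vad_threshold:
        # No speech detected: close the mouth-related shapes.
        for name in MOUTH_SHAPES:
            if name in final:
                final[name] = 0.0
    for event, p in pred.events.items():
        if p >= event_threshold:
            for name, w in EVENT_BLENDSHAPES.get(event, {}).items():
                final[name] = max(final.get(name, 0.0), w)
    return final
```

Letting the auxiliary tasks gate or boost the raw blendshape head is one simple way to make them influence the final prediction without retraining it.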

Communication with in-game characters
11691076 · 2023-07-04

A system for coordinating the reactions of a virtual character with a script spoken by a player in a video game or presentation comprises an internet-connected server that executes software and streams video games or presentations to the player's computerized device. The system senses the start of a dialogue between the player and the virtual character, displays a script for the player on a display of the computerized platform, and prompts the player to speak the script. A timer then starts, or the system tracks an audio stream of the spoken script; the system determines where the player is in the script from the timer or the audio stream and causes specific actions and responses of the virtual character according to a pre-programmed association of those actions and responses with points in time or specific variations in the audio stream.
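The timer-based variant can be sketched as a cue sheet that maps elapsed time to character actions, with each cue fired once as the player progresses through the script. The cue times, action names and the `ScriptTracker` class are hypothetical; the audio-stream variant would replace the elapsed time with a position estimated from the audio.

```python
import bisect

# Hypothetical cue sheet: seconds after the player starts speaking -> action.
CUES = [(0.0, "turn_to_player"), (2.5, "nod"), (5.0, "frown"), (8.0, "reply")]


class ScriptTracker:
    def __init__(self, cues):
        self.times = [t for t, _ in cues]    # must be sorted ascending
        self.actions = [a for _, a in cues]
        self.fired = 0                       # number of cues already triggered

    def update(self, elapsed):
        # Return the actions whose cue time was reached since the last update.
        due = bisect.bisect_right(self.times, elapsed)
        new = self.actions[self.fired:due]
        self.fired = max(self.fired, due)
        return new
```

Calling `update` once per game tick keeps the character in step with the script even if ticks are irregular, because every cue passed since the previous tick is returned together.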