G06F16/65

Self-supervised AI-assisted sound effect generation for silent video using multimodal clustering

An automated method, system, and computer-readable medium for generating sound effect recommendations for visual input by training machine learning models that learn audio-visual correlations from a reference image or video, a positive audio signal, and a negative audio signal. A machine learning algorithm uses a reference visual input together with a positive or negative audio signal input to train a multimodal clustering neural network that outputs representations for the visual and audio inputs, as well as correlation scores between the audio and visual representations. The trained multimodal clustering neural network learns representations such that the visual representation and the positive audio representation have higher correlation scores than the visual representation and a negative or unrelated audio representation.
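The training objective described above, in which a visual representation scores higher against its positive audio than against a negative one, can be sketched as a triplet-style margin loss over cosine-similarity correlation scores. This is a minimal sketch: the embedding networks are omitted, and the margin value and vector shapes are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def correlation_score(v, a):
    """Cosine similarity between a visual embedding v and an audio embedding a."""
    return float(np.dot(v, a) / (np.linalg.norm(v) * np.linalg.norm(a)))

def triplet_loss(v, a_pos, a_neg, margin=0.2):
    """Zero only when the positive audio outscores the negative audio
    against the same visual embedding by at least `margin`."""
    return max(0.0, margin - correlation_score(v, a_pos) + correlation_score(v, a_neg))
```

Minimizing this loss over many (visual, positive, negative) triples pushes matched audio-visual pairs together and mismatched pairs apart, which is the stated property of the trained network.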

METHOD AND APPARATUS FOR DISPLAYING LYRIC EFFECTS, ELECTRONIC DEVICE, AND COMPUTER READABLE MEDIUM
20220351454 · 2022-11-03

The present disclosure provides a method and an apparatus for displaying lyric effects, an electronic device, and a computer-readable medium. The method includes: obtaining, based on a lyric effect display operation of a user, an image sequence and music data to be displayed, the music data including audio data and lyrics; determining a target time point; playing at least one target image in the image sequence corresponding to the target time point; determining target lyrics in the lyrics corresponding to the target time point; adding animation effects to the at least one target image; displaying the target lyrics on the at least one target image; and playing the part of the audio data corresponding to the target lyrics.
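Determining the target lyrics for a target time point amounts to looking up the latest lyric line whose start time does not exceed the playback position. A minimal sketch, assuming lyrics arrive as `(start_time, text)` pairs sorted by start time (the data layout is an assumption, not specified in the abstract):

```python
import bisect

def target_lyrics(lyrics, t):
    """lyrics: list of (start_time_sec, line) pairs sorted by start time.
    Return the line whose start time is the latest one <= t, or None if
    the time point precedes the first line."""
    starts = [start for start, _line in lyrics]
    i = bisect.bisect_right(starts, t) - 1
    return lyrics[i][1] if i >= 0 else None
```

In a renderer, the same index would also select the target image and the slice of audio data to play for that line.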

Guidance query for cache system

A device may be configured to determine whether an audio file is a first type of audio file, one that can be processed to recognize an associated voice query based on a characteristic of the audio file itself, or a second type of audio file, one that may require speech recognition processing in order to recognize the voice query. In making this determination, a query filter associated with the device may be configured to access one or more guidance queries. Using the one or more guidance queries, the device may classify the audio file as the first type or the second type after receiving only a portion of the audio file, thereby improving the speed at which the audio file can be processed.
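The early-classification idea above, deciding the file type from only an initial portion, can be sketched as a prefix match against cached guidance-query fingerprints. The byte-level fingerprint format and the two type labels are illustrative assumptions; the abstract does not specify how guidance queries are represented.

```python
def classify_audio(prefix: bytes, guidance_queries: dict) -> str:
    """Classify from only the first bytes of an audio file:
    'type1' if the prefix begins with a cached guidance-query fingerprint
    (recognizable from the file itself), else 'type2' (needs full speech
    recognition processing)."""
    for fingerprint in guidance_queries:
        if prefix.startswith(fingerprint):
            return "type1"
    return "type2"
```

Because only a prefix is inspected, the filter can return a decision before the rest of the file arrives, which is where the claimed speed improvement comes from.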
Music streaming, playlist creation and streaming architecture

A system and method for making categorized music tracks available to end user applications. The tracks may be categorized based on computer-derived rhythm, texture, and pitch (RTP) scores derived from high-level acoustic attributes, which are in turn based on low-level data extracted from the tracks. RTP scores are stored in a universal database common to all music publishers, so that the same track, once RTP-scored, does not need to be re-scored by other publishers. End user applications access an API server to import collections of tracks published by publishers, to create playlists, and to initiate music streaming. Each end user application is sponsored by a single music publisher, so that only tracks the sponsoring publisher is capable of streaming are available to that application.
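The universal-database behavior, where a track is RTP-scored once and reused by every publisher, is essentially a shared cache keyed by track identity. A minimal sketch; the function names and the `(r, t, p)` score tuple are assumptions for illustration:

```python
_rtp_db = {}  # universal database shared across publishers: track_id -> (r, t, p)

def get_rtp(track_id, score_fn):
    """Return the cached RTP scores for a track, invoking the expensive
    scoring function only on a cache miss, so no track is ever re-RTP-scored."""
    if track_id not in _rtp_db:
        _rtp_db[track_id] = score_fn(track_id)
    return _rtp_db[track_id]
```

Any publisher's application can then filter or sort its own streamable catalog by these shared scores when building playlists.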
DETECTION, ANALYSIS AND REPORTING OF FIREARM DISCHARGE
20230130926 · 2023-04-27

A shot-fired detector can receive an audio signal or acoustic stream and determine that a firearm has been discharged. One or more detectors can continuously capture acoustic streams and process them for anomaly detection. A detected anomaly can be classified by a machine learning model to detect that a shot has been fired. The detector can send acoustic data and metadata associated with the shot to a server for further storage and/or processing. An alert associated with the shot fired can be generated automatically.
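The first stage of the pipeline above, flagging anomalies in a continuously captured acoustic stream before any machine-learning classification, can be sketched as a simple frame-energy detector. The frame length and energy threshold are illustrative assumptions, not values from the abstract.

```python
import numpy as np

def detect_anomalies(stream, frame_len=4, threshold=0.5):
    """Split the stream into fixed-length frames and flag each frame whose
    RMS energy exceeds the threshold as a candidate shot; return a list of
    (start_index, frame) pairs to hand to the downstream classifier."""
    anomalies = []
    for i in range(0, len(stream) - frame_len + 1, frame_len):
        frame = stream[i:i + frame_len]
        rms = float(np.sqrt(np.mean(np.square(frame))))
        if rms > threshold:
            anomalies.append((i, frame))
    return anomalies
```

Only flagged frames need to reach the machine learning classifier and the server, which keeps the continuous-capture stage cheap.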
SYSTEMS AND METHODS FOR DETECTING EMOTION FROM AUDIO FILES
20230076242 · 2023-03-09

Disclosed embodiments may include a system that may receive an audio file comprising an interaction between a first user and a second user. The system may detect, using a deep neural network (DNN), moment(s) of interruption between the first and second users from the audio file. The system may extract, using the DNN, vocal feature(s) from the moment(s) of interruption. The system may determine, using a machine learning model (MLM) and based on the vocal feature(s), whether a threshold number of moments of the moment(s) of interruption corresponds to a first emotion type. When the threshold number of moments corresponds to the first emotion type, the system may transmit a first message comprising a first binary indication. When the threshold number of moments does not correspond to the first emotion type, the system may transmit a second message comprising a second binary indication.
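The final thresholding step, counting interruption moments classified as the first emotion type and emitting one of two binary indications, can be sketched as follows. The emotion label and threshold value are illustrative assumptions; the abstract does not name a specific emotion or count.

```python
def emotion_indicator(moment_emotions, first_emotion="frustration", threshold=2):
    """Return 1 (first binary indication) if at least `threshold` of the
    detected interruption moments correspond to the first emotion type,
    otherwise 0 (second binary indication)."""
    count = sum(1 for emotion in moment_emotions if emotion == first_emotion)
    return 1 if count >= threshold else 0
```

Upstream, the DNN and MLM stages would supply `moment_emotions` as one per-moment emotion label per detected interruption.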