G10H2210/041

Media content identification on mobile devices

A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content on its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted and audio signal global onset detection is used for input audio frame alignment. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints to be combined with audio fingerprints for more reliable content identification.
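As a rough illustration of two of the audio features named in this abstract, the sketch below computes a frame's frequency-domain entropy and flags frame onsets from energy jumps. It is a minimal sketch, not the patented method: the function names, the naive DFT, and the `ratio` onset threshold are all assumptions made for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at each non-negative frequency bin, via a naive O(n^2) DFT."""
    n = len(frame)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        powers.append(abs(coeff) ** 2)
    return powers


def spectral_entropy(frame):
    """Shannon entropy, in bits, of the frame's normalized power spectrum."""
    powers = power_spectrum(frame)
    total = sum(powers)
    if total == 0:
        return 0.0  # silent frame: entropy is defined as zero here
    probs = [p / total for p in powers]
    return -sum(q * math.log2(q) for q in probs if q > 0)


def energy_onsets(frames, ratio=2.0):
    """Indices of frames whose energy exceeds `ratio` times the previous frame's.

    The threshold rule is an assumed, simplified onset detector.
    """
    energies = [sum(x * x for x in frame) for frame in frames]
    return [i for i in range(1, len(energies))
            if energies[i] > ratio * energies[i - 1]]
```

A pure tone concentrates its power in one bin and so scores near zero entropy, while a single-sample impulse spreads power evenly across all bins and scores the maximum.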

SYSTEMS AND METHODS FOR CLASSIFYING MUSIC FROM HETEROGENOUS AUDIO SOURCES
20230409897 · 2023-12-21 ·

The disclosed computer-implemented method may include accessing an audio stream with heterogenous audio content; dividing the audio stream into a plurality of frames; generating a plurality of spectrogram patches, each spectrogram patch within the plurality of spectrogram patches being derived from a frame within the plurality of frames; and providing each spectrogram patch within the plurality of spectrogram patches as input to a convolutional neural network classifier and receiving, as output, a classification of music within a corresponding frame from within the plurality of frames. Various other methods, systems, and computer-readable media are also disclosed.
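The frame-to-patch pipeline described here can be sketched as follows. This is a simplified illustration under stated assumptions: the frame and window sizes are arbitrary, the naive DFT stands in for a real spectrogram routine, and `placeholder_classifier` is a hypothetical energy threshold that only occupies the slot where a trained convolutional neural network would go. It does not detect music.

```python
import cmath
import math


def split_frames(samples, frame_len, hop):
    """Cut a sample sequence into fixed-length frames, `hop` samples apart."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(window):
    """Magnitudes of the non-negative frequency bins (naive DFT)."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n // 2 + 1)]


def spectrogram_patch(frame, win_len=16, win_hop=8):
    """2-D patch for one frame: rows are time steps, columns are log-magnitude bins."""
    return [[math.log1p(m) for m in magnitude_spectrum(w)]
            for w in split_frames(frame, win_len, win_hop)]


def placeholder_classifier(patch):
    """Hypothetical stand-in for the CNN classifier: thresholds mean log-magnitude."""
    values = [v for row in patch for v in row]
    return "music" if sum(values) / len(values) > 0.1 else "other"


def classify_stream(samples, classifier, frame_len=64, hop=64):
    """Return one label per frame of the stream."""
    return [classifier(spectrogram_patch(frame))
            for frame in split_frames(samples, frame_len, hop)]
```

Each 64-sample frame yields a 7 x 9 patch with these defaults. In a real system the patch would be passed to a trained network, and the per-frame labels would follow the order of the frames.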

METHOD OF TRAINING A NEURAL NETWORK AND RELATED SYSTEM AND METHOD FOR CATEGORIZING AND RECOMMENDING ASSOCIATED CONTENT

A property vector representing extractable measurable properties, such as musical properties, of a file is mapped to semantic properties for the file. This is achieved by using artificial neural networks (ANNs) in which weights and biases are trained to align a distance dissimilarity measure in property space for pairwise comparative files back towards a corresponding semantic distance dissimilarity measure in semantic space for those same files. The result is that, once optimised, the ANNs can process any file, parsed with those properties, to identify other files sharing common traits reflective of emotional perception, thereby rendering a more reliable and true-to-life result of similarity/dissimilarity. This contrasts with simply training a neural network to consider extractable measurable properties that, in isolation, do not provide a reliable contextual relationship to the real world.
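One way to picture the pairwise training objective is as a loss that penalizes the gap between the distance of two files after mapping and their semantic distance. The sketch below is an assumption-laden toy: a single linear layer without biases replaces the ANNs, Euclidean distance is the assumed dissimilarity measure, and gradients come from finite differences rather than backpropagation.

```python
import math


def linear_map(weights, vec):
    """Apply a weight matrix (list of rows) to a property vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def alignment_loss(weights, pairs):
    """Mean squared gap between mapped property distance and semantic distance.

    `pairs` holds (property_vec_a, property_vec_b, semantic_distance) tuples.
    """
    total = 0.0
    for a, b, target in pairs:
        mapped = euclidean(linear_map(weights, a), linear_map(weights, b))
        total += (mapped - target) ** 2
    return total / len(pairs)


def train(weights, pairs, lr=0.05, steps=200, eps=1e-5):
    """Gradient descent with finite-difference gradients (toy-sized matrices only)."""
    weights = [row[:] for row in weights]  # leave the caller's matrix untouched
    for _ in range(steps):
        grads = [[0.0] * len(row) for row in weights]
        for i, row in enumerate(weights):
            for j, base in enumerate(row):
                row[j] = base + eps
                up = alignment_loss(weights, pairs)
                row[j] = base - eps
                down = alignment_loss(weights, pairs)
                row[j] = base
                grads[i][j] = (up - down) / (2 * eps)
        for i, row in enumerate(weights):
            for j in range(len(row)):
                row[j] -= lr * grads[i][j]
    return weights
```

With pairs whose semantic distance depends on only one measurable property, training shrinks the weights attached to the irrelevant property, which is the intended effect of aligning the two distance measures.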

Singing assisting system, singing assisting method, and non-transitory computer-readable medium comprising instructions for executing the same

A singing assisting system, a singing assisting method, and a non-transitory computer-readable medium including instructions for executing the method are provided. When the performed singing track does not appear in an ought-to-be-performed period, a singing-continuing procedure is executed. When the performed singing track is off pitch, a pitch adjustment procedure is executed.
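The two conditions in this abstract amount to a small decision rule, sketched below. The abstract does not say how "off pitch" is measured, so the deviation in cents, the 50-cent tolerance, the use of `None` for a missing vocal, and the function names are all assumptions.

```python
import math


def cents_off(sung_hz, target_hz):
    """Signed pitch deviation in cents (100 cents = one semitone)."""
    return 1200.0 * math.log2(sung_hz / target_hz)


def choose_procedure(sung_hz, target_hz, tolerance_cents=50.0):
    """Pick a procedure for one moment of the ought-to-be-performed period.

    Returns ("continue", None) when no singing is detected (sung_hz is None),
    ("adjust", ratio) when the pitch is outside the assumed tolerance, where
    `ratio` is the frequency scaling that would move it onto the target, and
    ("pass", None) otherwise.
    """
    if sung_hz is None:
        return ("continue", None)
    if abs(cents_off(sung_hz, target_hz)) > tolerance_cents:
        return ("adjust", target_hz / sung_hz)
    return ("pass", None)
```

For example, a note sung about a semitone sharp of A4 (466.16 Hz against 440 Hz) falls outside the assumed tolerance and yields an "adjust" result, while 441 Hz passes.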

DETERMINING MUSICAL STYLE USING A VARIATIONAL AUTOENCODER
20200372924 · 2020-11-26 ·

A computer receives a first audio content item and applies a process to generate a representation of the first audio content item. A portion is extracted from the representation of the first audio content item. A first representative vector that corresponds to the first audio content item is determined by applying a variational autoencoder (VAE) to a first segment of the extracted portion of the audio content item. The computer stores the first representative vector that corresponds to the first audio content item.
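For readers unfamiliar with VAEs, the sketch below shows two standard building blocks: sampling a latent vector with the reparameterization trick, and the KL divergence between the encoder's diagonal Gaussian and a standard normal prior. The encoder network itself is omitted, and nothing here is taken from the patent; treating the latent mean as the "representative vector" would be one plausible reading, stated only as an assumption.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * noise, with sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))
```

The KL term is zero when the encoder output already matches the prior and grows as the mean moves away from zero or the variance departs from one.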

VOICE PROCESSING METHOD, VOICE PROCESSING DEVICE, AND RECORDING MEDIUM
20200365170 · 2020-11-19 ·

A voice processing method realized by a computer includes compressing forward a first steady period of a plurality of steady periods in a voice signal representing voice, and extending forward a transition period between the first steady period and a second steady period of the plurality of steady periods in the voice signal. Each of the plurality of steady periods is a period in which acoustic characteristics are temporally stable. The second steady period is a period immediately after the first steady period and has a pitch that is different from a pitch of the first steady period.

HAPTIC FEEDBACK METHOD
20200211338 · 2020-07-02 ·

Provided is a haptic feedback method, including: step S1 of training an algorithm on an audio clip containing a known audio event type to obtain an algorithm model; and step S2 of obtaining audio, identifying the audio with the algorithm model to obtain the different audio event types in the audio, matching, according to a preset rule, the audio event types with different vibration effects as haptic feedback, and outputting the haptic feedback. Compared with the related art, the present haptic feedback method provides users with real-time haptic feedback when applied to a mobile electronic product, improving the user experience of the mobile electronic product.
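The matching part of step S2 can be pictured as a lookup from recognized event types to vibration effects. In the sketch below the event names, the effect parameters, and the choice to skip unrecognized types are all hypothetical; the abstract specifies only that a preset rule exists.

```python
# Hypothetical preset rule: audio event type -> vibration effect parameters.
PRESET_RULE = {
    "explosion": {"intensity": 1.0, "duration_ms": 400},
    "gunshot": {"intensity": 0.8, "duration_ms": 80},
    "footstep": {"intensity": 0.3, "duration_ms": 40},
}


def match_vibrations(detected_events, rule=PRESET_RULE):
    """Pair each (time_ms, event_type) with its vibration effect.

    Event types that the rule does not cover are skipped (an assumed policy).
    """
    return [(time_ms, rule[event_type])
            for time_ms, event_type in detected_events
            if event_type in rule]
```

The `detected_events` list would come from the trained model of step S1, which is outside the scope of this sketch.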

MOBILE TERMINAL AND MUSIC PLAY-BACK SYSTEM COMPRISING MOBILE TERMINAL

A mobile terminal includes a display, a communicator configured to perform communication with a plurality of music bots, and a controller configured to extract sound source characteristic information from each of a plurality of previously divided sound source tracks configuring music, generate a plurality of control commands for controlling operation of the plurality of music bots using the extracted sound source characteristic information, and transmit each of the plurality of generated control commands to each of the plurality of music bots through the communicator.

Method, apparatus and system
10657973 · 2020-05-19 ·

A method including decomposing a magnitude part of a signal spectrum of a mixture signal into spectral components, each spectral component including a frequency part and a time activation part; and clustering the spectral components to obtain one or more clusters of spectral components, wherein the clustering of the spectral components is computed in the time domain.
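The decomposition step resembles non-negative matrix factorization (NMF), which splits a magnitude spectrogram V into frequency parts W and time activations H with V ≈ WH. The abstract does not name NMF, so its use here is an assumption, as are the multiplicative update rule, the rank, and the iteration count. The clustering step is not shown.

```python
import random


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (freq x time) into W (freq x rank) and H (rank x time).

    Uses the classic multiplicative updates for the squared-error objective.
    """
    rng = random.Random(seed)
    n_freq, n_time = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + eps for _ in range(n_time)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps)
              for j in range(n_time)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(rank)] for i in range(n_freq)]
    return w, h
```

Each column of W is a frequency part and the matching row of H is its time activation, so the rows of H are the natural input to a later clustering stage.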

SYSTEMS AND METHODS FOR CAPTURING AND INTERPRETING AUDIO
20200111468 · 2020-04-09 ·

A device is provided for capturing vibrations produced by an object such as a musical instrument, for example a cymbal of a drum kit. The device comprises a detectable element, such as a ferromagnetic element (for example, a metal shim), and a sensor spaced apart from and located relative to the musical instrument. The detectable element is located between the sensor and the musical instrument. When the musical instrument vibrates, the sensor remains stationary and the detectable element is vibrated relative to the sensor by the musical instrument.