G10H2210/051

SYSTEMS AND METHODS FOR CAPTURING AND INTERPRETING AUDIO
20220199059 · 2022-06-23 ·

A device is provided for capturing vibrations produced by an object such as a musical instrument, for example a drum head of a drum kit. The device comprises a detectable element, such as a ferromagnetic element (for example, a metal shim), and a sensor spaced apart from and located relative to the musical instrument. The detectable element is located between the sensor and the musical instrument. When the musical instrument vibrates, the sensor remains stationary and the detectable element is vibrated relative to the sensor by the musical instrument.

AUTOMATIC CONVERSION OF SPEECH INTO SONG, RAP OR OTHER AUDIBLE EXPRESSION HAVING TARGET METER OR RHYTHM

Captured vocals may be automatically transformed using advanced digital signal processing techniques that enable captivating applications, and even purpose-built devices, with which even novice user-musicians can generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing track, and pitch-corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In other cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
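The temporal-alignment step can be illustrated with a toy sketch (not the patented method): each spoken segment is snapped to the nearest whole number of beats, yielding a time-stretch ratio per segment. The function name and the snapping rule are assumptions for illustration only.

```python
def stretch_ratios(segment_durations, beat_duration):
    """For each spoken segment, compute the time-stretch ratio that snaps
    its duration to the nearest whole number of beats (at least one beat)."""
    ratios = []
    for d in segment_durations:
        target = max(1, round(d / beat_duration)) * beat_duration
        ratios.append(target / d)
    return ratios
```

A segment of 0.4 s against a 0.5 s beat would be stretched by 1.25x; a 1.1 s segment would be compressed toward two beats. A real system would apply these ratios with a pitch-preserving time-stretch algorithm.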

Media content identification on mobile devices

A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content on its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted and audio signal global onset detection is used for input audio frame alignment. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints to be combined with audio fingerprints for more reliable content identification.
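One of the listed audio features, frame frequency-domain entropy, can be sketched as follows. This is an illustrative computation of spectral entropy per frame, not the patent's exact formulation:

```python
import numpy as np

def frame_spectral_entropy(frame):
    """Shannon entropy (bits) of the normalized magnitude spectrum of one
    audio frame. Tonal frames concentrate energy in few bins (low entropy);
    noisy frames spread it (high entropy)."""
    mag = np.abs(np.fft.rfft(frame))
    p = mag / (mag.sum() + 1e-12)   # normalize to a probability distribution
    p = p[p > 0]                     # avoid log(0)
    return float(-(p * np.log2(p)).sum())
```

A fingerprint would quantize such per-frame values, alongside onset and spectral-change features, into compact signatures.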

Conversion of Music Audio to Enhanced MIDI Using Two Inference Tracks and Pattern Recognition
20220139363 · 2022-05-05 ·

A method and system are provided for automatically transcribing an audio source, e.g. a WAV file or a live feed, into a computer-readable code such as enhanced MIDI. They specifically address a central problem that has not been solved elsewhere: extracting many music perceptual parameters of interest requires a large sampling window, roughly a fifth of a second to a full second for typical music, yet the transcription must also stay synchronized with the source music, with a time resolution of about a sixteenth of a second.
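The window-versus-resolution tension described above can be sketched as a long analysis window advanced by a short hop, so each (coarse) analysis result still carries a timestamp at roughly 1/16 s resolution. This is a generic sliding-window framing, assumed for illustration, not the patent's two-track design:

```python
def sliding_windows(signal, sample_rate, win_s=0.5, hop_s=1.0 / 16):
    """Yield (timestamp, window) pairs: each window is long enough for
    perceptual analysis (win_s), while the hop keeps the reported
    timing at about hop_s resolution."""
    win = int(sample_rate * win_s)
    hop = int(sample_rate * hop_s)
    starts = range(0, max(1, len(signal) - win + 1), hop)
    return [(start / sample_rate, signal[start:start + win]) for start in starts]
```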

Systems and methods for generating a graphical representation of audio signal data during time compression or expansion

Systems and methods for generating a graphical representation of audio signal data during time compression or expansion are provided. The system may include a processor that performs a method including displaying a waveform during audio-signal playback at a first speed by scrolling the waveform from a right portion of a display to a left portion of the display. The method includes receiving a command to increase or decrease the audio-signal playback speed and horizontally expanding or horizontally contracting the waveform in response to receiving the command to increase or decrease the audio-signal playback speed.
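The horizontal expansion or contraction amounts to changing how many audio samples map to one display pixel. A common way to render a scrolling waveform is a per-pixel min/max reduction; doubling the samples-per-pixel (e.g. at 2x playback speed) halves the waveform's on-screen width. The function below is a generic sketch of that reduction, not the patented rendering pipeline:

```python
import numpy as np

def waveform_columns(signal, samples_per_pixel):
    """Reduce a signal to per-pixel (min, max) pairs for waveform drawing.
    More samples per pixel -> fewer columns -> horizontally contracted view."""
    spp = int(samples_per_pixel)
    n = len(signal) // spp
    cols = np.asarray(signal[:n * spp]).reshape(n, spp)
    return cols.min(axis=1), cols.max(axis=1)
```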

EVALUATING PERCUSSIVE PERFORMANCES
20220028295 · 2022-01-27 ·

Measures (for example, methods, systems and computer programs) are provided to evaluate a percussive performance. Percussive performance data captured by one or more sensors is received. The percussive performance data represents one or more impact waveforms of one or more hits on a performance surface. The one or more impact waveforms are analysed. The analysing comprises: (i) identifying one or more characteristics of the one or more impact waveforms; (ii) classifying the one or more hits as one or more percussive hit-types based on the one or more characteristics; and (iii) evaluating the one or more percussive hit-types against performance target data. Performance evaluation data is output based on said evaluating.
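The identify-then-classify pipeline can be illustrated with a minimal sketch. The characteristics chosen here (peak level and a crude decay-length proxy) and the hit-type labels are assumptions for illustration; the patent does not commit to these specifics:

```python
import numpy as np

def hit_characteristics(waveform):
    """Extract simple characteristics of one impact waveform:
    peak level and the number of samples above 10% of that peak."""
    w = np.abs(np.asarray(waveform, dtype=float))
    peak = float(w.max())
    decay_len = int((w > 0.1 * peak).sum())
    return peak, decay_len

def classify_hit(waveform, accent_threshold=0.5):
    """Classify a hit as an 'accent' or 'ghost' hit-type by peak level."""
    peak, _ = hit_characteristics(waveform)
    return "accent" if peak >= accent_threshold else "ghost"
```

Evaluation would then compare the classified hit-type sequence against performance target data (e.g. an expected pattern of accents and ghost notes).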

Song analysis device and song analysis program

A music piece analyzer includes: a beat position acquiring unit configured to detect beat positions in music piece data; a snare drum detector configured to detect sounding positions of a snare drum in the music piece data; a bass drum detector configured to detect sounding positions of a bass drum in the music piece data; a one-beat-shift determination unit configured to determine whether the bar beginning of the music piece data is shifted by one beat based on the sounding positions of the snare drum detected by the snare drum detector; a two-beat-shift determination unit configured to determine whether the bar beginning is shifted by two beats based on the sounding positions of the bass drum detected by the bass drum detector; and a bar beginning setting unit configured to set the bar beginning of the music piece data based on the results of the one-beat-shift and two-beat-shift determination units.
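The one-beat-shift determination can be sketched by exploiting the common convention that snare hits land on backbeats (beats 2 and 4 of a 4/4 bar): score the candidate shift that best explains the detected snare positions. This scoring heuristic is an assumption for illustration, not the claimed determination logic:

```python
def one_beat_shift(snare_positions, beat_positions, tol=0.05):
    """Return True if the bar beginning appears shifted by one beat,
    judged by which shift puts more detected snare hits on backbeats."""
    def backbeat_score(shift):
        score = 0
        for i, beat in enumerate(beat_positions):
            if (i + shift) % 4 in (1, 3):  # zero-based beats 2 and 4
                if any(abs(p - beat) < tol for p in snare_positions):
                    score += 1
        return score
    return backbeat_score(1) > backbeat_score(0)
```

An analogous score over bass-drum positions (which typically favor beats 1 and 3) would drive the two-beat-shift determination.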

METHOD AND SYSTEM FOR PROCESSING AUDIO STEMS
20210350778 · 2021-11-11 ·

A method and system for processing an audio stem/loop, including dividing a stem into a plurality of stem slices, classifying each of the plurality of stem slices into at least a first group or a second group, and applying a stem effect that includes replacing at least one stem slice with an all-zero stem slice, or replacing at least one stem slice belonging to the first group or the second group with another stem slice belonging to the first group or the second group.
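The slice-and-mute variant of the stem effect is straightforward to sketch. The function name and the equal-split policy are assumptions for illustration:

```python
import numpy as np

def apply_zero_slice_effect(stem, num_slices, zero_indices):
    """Divide a stem into slices and replace the chosen slices with
    all-zero (silent) slices of the same length."""
    slices = np.array_split(np.asarray(stem, dtype=float), num_slices)
    out = [np.zeros_like(s) if i in zero_indices else s
           for i, s in enumerate(slices)]
    return np.concatenate(out)
```

The swap variant would instead substitute a slice from the same classified group, preserving the slice grid while altering the pattern.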

AUDIO SIGNAL PROCESSING METHOD, APPARATUS, AND PROGRAM
20220007124 · 2022-01-06 ·

An appropriate level index is obtained from an input audio waveform. An audio signal processing method includes detecting one or more level values in each of a plurality of attack sections of an audio signal that includes the plurality of attack sections and a plurality of non-attack sections different from the attack sections, and generating a histogram of the level values detected in the attack sections, the generated histogram excluding level values of the non-attack sections.
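The histogram step can be sketched as follows, assuming attack sections are given as sample-index ranges and one peak level is detected per section. The function name and normalization to a [0, 1] level range are illustrative assumptions:

```python
import numpy as np

def attack_level_histogram(signal, attack_sections, bins=10):
    """Histogram the peak level of each attack section; samples in
    non-attack sections never contribute a level value."""
    sig = np.abs(np.asarray(signal, dtype=float))
    levels = [sig[start:end].max() for start, end in attack_sections]
    return np.histogram(levels, bins=bins, range=(0.0, 1.0))
```

A level index (e.g. a representative loudness for normalization) could then be read off the histogram, for example from its dominant bin.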