Patent classifications
G10H2210/076
Methods and apparatus to segment audio and determine audio segment similarities
Methods, apparatus, and systems are disclosed to segment audio and determine audio segment similarities. An example apparatus includes at least one memory storing instructions and processor circuitry to execute the instructions to at least: select an anchor index beat of digital audio; identify, based on the anchor index beat, a first segment of the digital audio to analyze, the first segment having at least two beats and a respective center beat; concatenate time-frequency data of the at least two beats and the respective center beat to form a matrix of the first segment; generate a first deep feature based on the first segment, the first deep feature indicative of a descriptor of the digital audio; and train internal coefficients to classify the first deep feature as similar to a second deep feature based on the descriptor of the first deep feature and a descriptor of the second deep feature.
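The segment-construction step (concatenating the time-frequency data of the beats around an anchor beat into one matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes per-beat spectrogram slices already exist as lists of frequency-bin frames, and the names `build_segment_matrix`, `beat_frames`, and `half_width` are hypothetical.

```python
from typing import List

Frame = List[float]        # magnitudes of one time frame, one value per frequency bin
BeatSlice = List[Frame]    # the time frames that fall within one beat


def build_segment_matrix(beat_frames: List[BeatSlice], anchor: int,
                         half_width: int = 1) -> List[Frame]:
    """Stack the time-frequency frames of the beats around `anchor` into one matrix.

    Hypothetical helper: the segment covers beats anchor - half_width through
    anchor + half_width, so `anchor` is its center beat. Rows are time frames,
    columns are frequency bins.
    """
    if half_width < 1:
        raise ValueError("a segment needs the center beat plus at least one neighbor")
    start, stop = anchor - half_width, anchor + half_width + 1
    if start < 0 or stop > len(beat_frames):
        raise ValueError("segment extends past the available beats")
    matrix: List[Frame] = []
    for beat in beat_frames[start:stop]:
        matrix.extend(beat)  # append this beat's frames along the time axis
    return matrix
```

Each resulting matrix would then be the input from which a deep feature is computed; the model that produces that feature is outside the scope of this sketch.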
Virtual tutorials for musical instruments with finger tracking in augmented reality
Systems, devices, media, and methods are described for presenting a tutorial in augmented reality on the display of a smart eyewear device. The system includes a marker registration utility for setting a marker on a musical instrument, a localization utility for locating the eyewear device relative to the marker location and the instrument, a virtual object rendering utility for presenting a series of virtual tutorial objects on the display near one or more actuators on the instrument, and a hand tracking utility for tracking the performer's finger locations in real time during playback of a song file. A high-definition video camera captures sequences of frames of video data. The series of virtual tutorial objects, in one example, includes graphical elements presented on a virtual scroll that appears to move toward the instrument at a speed correlated with the song tempo. The hand tracking utility calculates a set of expected fingertip coordinates based on a detected hand shape and a library of hand poses and landmarks.
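The statement that the virtual scroll moves at a speed correlated with the song tempo can be made concrete with a simple linear model. This is an assumption for illustration only: it supposes each beat occupies a fixed distance on the scroll, so speed is proportional to tempo. `scroll_speed`, `object_distance`, and `spacing_per_beat` are hypothetical names, not part of the described system.

```python
def scroll_speed(tempo_bpm: float, spacing_per_beat: float) -> float:
    """Scroll speed in distance units per second.

    Assumes each beat occupies `spacing_per_beat` units on the virtual scroll,
    so the speed is proportional to tempo.
    """
    if tempo_bpm <= 0 or spacing_per_beat <= 0:
        raise ValueError("tempo and spacing must be positive")
    return tempo_bpm / 60.0 * spacing_per_beat


def object_distance(note_time_s: float, now_s: float,
                    tempo_bpm: float, spacing_per_beat: float) -> float:
    """Distance between the instrument and a tutorial object whose note sounds at note_time_s.

    The object reaches the instrument (distance 0) exactly when its note should be played.
    """
    return (note_time_s - now_s) * scroll_speed(tempo_bpm, spacing_per_beat)
```

Doubling the tempo doubles the scroll speed, so objects for a faster song approach the instrument proportionally faster.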
Systems and methods for generating a visual color display of audio-file data
Systems and methods for generating a visual color display of audio-file data are provided. The system includes a processor that performs a method including receiving audio-file data and generating filtered-audio data by processing the audio-file data with frequency-band filters, each having a different frequency band. The method includes generating one or more waveforms corresponding to the filtered-audio data and displaying the waveforms superimposed on one another, each in a unique color. The method also includes downsampling the waveforms, processing the waveforms through an envelope detector, and processing the waveforms through an expander and applying a gain factor. The waveforms have transparency levels at sections that are proportional or inversely proportional to the amplitudes at those sections.
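The per-band envelope and amplitude-dependent transparency can be illustrated with a small sketch. It is not the patented system: it assumes the band-filtered signals are already available as sample lists, uses a simple peak follower as a stand-in for the envelope detector, and implements only the proportional transparency mapping. `envelope`, `downsample`, `transparency_levels`, and `release` are hypothetical names.

```python
from typing import List


def envelope(samples: List[float], release: float = 0.9) -> List[float]:
    """Peak-following envelope: jumps to each new peak, otherwise decays by `release` per sample."""
    env: List[float] = []
    level = 0.0
    for x in samples:
        level = max(abs(x), level * release)
        env.append(level)
    return env


def downsample(values: List[float], factor: int) -> List[float]:
    """Reduce the number of points by keeping the maximum of each block of `factor` values."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [max(values[i:i + factor]) for i in range(0, len(values), factor)]


def transparency_levels(env: List[float]) -> List[float]:
    """Transparency in [0, 1] proportional to each section's amplitude (normalized to the peak)."""
    peak = max(env, default=0.0) or 1.0  # avoid dividing by zero for silent input
    return [a / peak for a in env]
```

In this sketch each band's filtered signal would pass through `envelope`, then `downsample`, then `transparency_levels`, and be drawn in its own color; an inversely proportional mapping would replace only the last step.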
Electronic device and music visualization method thereof
An electronic device and a method for controlling the electronic device are disclosed. The method includes: receiving an input of a command to reproduce music contents; determining audio characteristics information on the music contents and situation information on the environment where the music contents are being reproduced; and displaying a visualization effect that visualizes the music contents using the audio characteristics information and the situation information, and reproducing the music contents.
METHOD FOR IDENTIFYING A SONG
A computer-implemented method for identifying a song includes: providing audio data including musical notation information for songs; receiving a real-time audio signal of a user performing on an instrument; detecting playing activity in successive segments; detecting notes and/or chords from the audio signal; storing user play-history information, including the songs the user has played before and the number of plays; calculating, based on the play-history information, a first probability for a song; and estimating the song being performed based on the first probabilities for a number of songs, the detected playing activity, and the detected notes and/or chords. The estimation includes calculating a second probability for different songs, each second probability combining how well the audio signal corresponds to a particular song in the play history with the first probability associated with that song. The method then provides the song the user is performing, or related information.
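One way to read the two-probability scheme is as a prior from play history combined with an audio-match likelihood. The sketch below assumes that reading: the first probability is each song's share of the user's plays (with add-one smoothing, an assumption), the audio match is given as a per-song score, and the second probability is their normalized product. `first_probabilities`, `second_probabilities`, `estimate_song`, and the input dictionaries are hypothetical.

```python
from typing import Dict


def first_probabilities(play_counts: Dict[str, int], smoothing: float = 1.0) -> Dict[str, float]:
    """Prior for each song from the user's play history (add-one smoothing is an assumption)."""
    total = sum(play_counts.values()) + smoothing * len(play_counts)
    return {song: (n + smoothing) / total for song, n in play_counts.items()}


def second_probabilities(audio_match: Dict[str, float],
                         prior: Dict[str, float]) -> Dict[str, float]:
    """Combine each song's audio-match score with its prior and normalize to sum to 1."""
    joint = {song: score * prior.get(song, 0.0) for song, score in audio_match.items()}
    total = sum(joint.values())
    if total == 0.0:
        return {song: 0.0 for song in joint}
    return {song: p / total for song, p in joint.items()}


def estimate_song(audio_match: Dict[str, float], play_counts: Dict[str, int]) -> str:
    """Return the song with the highest combined probability."""
    posterior = second_probabilities(audio_match, first_probabilities(play_counts))
    return max(posterior, key=posterior.get)
```

With play counts `{"a": 3, "b": 1}` and audio scores `{"a": 0.4, "b": 0.6}`, the frequently played song "a" still wins; with scores `{"a": 0.1, "b": 0.9}`, the audio evidence dominates and "b" is chosen.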
METHOD FOR TEMPO ADAPTIVE BACKING TRACK
A computer-implemented method includes: providing backing-track audio data, wherein each backing track includes information on at least the song tempo and tonal content synchronized with the backing-track audio; selecting a song; receiving a real-time audio signal of the user's performance; estimating, based on the audio signal, parameters including at least the user's playing activity (that is, whether the user is producing any sounding notes with a musical instrument), the tempo of the user's playing, and the user's playing position within the selected song; estimating the reliability of the estimated tempo and play position, where the reliability value represents the probability that the error in the estimated tempo and play position is sufficiently small; and, when the estimated reliability is sufficiently high, starting playback of the backing track at the user's position and tempo.
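The start condition can be expressed as a gate on the estimated reliability. The sketch assumes the estimator already returns a playing-activity flag, the user's tempo, the play position in beats, and a reliability in [0, 1]; the 0.8 threshold, `Estimate`, and `backing_track_start` are hypothetical choices, not values from the source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Estimate:
    playing: bool          # is the user producing sounding notes?
    tempo_bpm: float       # estimated tempo of the user's playing
    position_beats: float  # estimated play position within the song, in beats
    reliability: float     # estimated probability that tempo/position error is small


def backing_track_start(est: Estimate, song_tempo_bpm: float,
                        threshold: float = 0.8) -> Optional[Tuple[float, float]]:
    """Return (start offset in the backing-track audio in seconds, playback rate),
    or None if the estimate is not yet reliable enough to start."""
    if song_tempo_bpm <= 0:
        raise ValueError("song tempo must be positive")
    if not est.playing or est.reliability < threshold:
        return None
    rate = est.tempo_bpm / song_tempo_bpm                        # stretch to the user's tempo
    start_seconds = est.position_beats * 60.0 / song_tempo_bpm  # position in the original audio
    return start_seconds, rate
```

For a 120 BPM backing track and a user at beat 16 playing at 132 BPM with reliability 0.9, playback would start 8 seconds into the track at 1.1× speed; at reliability 0.5 the function keeps waiting.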
AUTOMATIC AND INTERACTIVE MASHUP SYSTEM
Systems and methods directed to combining audio tracks are provided. More specifically, a first audio track and a second audio track are received. The first audio track is separated into a vocal component and one or more accompaniment components. The second audio track is separated into a vocal component and one or more accompaniment components. A structure of the first audio track and a structure of the second audio track are determined. The first audio track and the second audio track are aligned based on the determined structures of the tracks. The vocal component of the first audio track is stretched to match a tempo of the second audio track. The stretched vocal component of the first audio track is added to the one or more accompaniment components of the second audio track.
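Matching the first track's vocal to the second track's tempo amounts to choosing a time-stretch ratio. A minimal sketch, assuming both tempi are known in BPM and the stretch is expressed as a duration multiplier; `stretch_ratio` and `min_stretch_ratio` are hypothetical names, and the half/double-time option is an assumed heuristic. Applying the ratio would require a pitch-preserving time-stretching algorithm, which is not shown.

```python
import math


def stretch_ratio(source_bpm: float, target_bpm: float) -> float:
    """Duration multiplier that makes material at source_bpm line up with target_bpm.

    A ratio below 1 shortens (speeds up) the vocal; above 1 lengthens it.
    """
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("tempi must be positive")
    return source_bpm / target_bpm


def min_stretch_ratio(source_bpm: float, target_bpm: float) -> float:
    """Like stretch_ratio, but also allows half- or double-time alignment and
    picks whichever candidate tempo needs the smallest relative stretch."""
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("tempi must be positive")
    candidates = [target_bpm / 2, target_bpm, target_bpm * 2]
    best = min(candidates, key=lambda t: abs(math.log(source_bpm / t)))
    return stretch_ratio(source_bpm, best)
```

For example, a 10-second vocal phrase at 100 BPM becomes 8 seconds when aligned to 125 BPM (ratio 0.8), while a 70 BPM vocal over a 140 BPM accompaniment needs no stretch at all if aligned in half time.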
Systems and methods for generating a playback-information display during time compression or expansion of an audio signal
Systems and methods for generating a playback-information display during time compression or expansion of an audio signal are provided. The system includes a processor that performs a method including displaying a first remaining playback-time associated with an audio file; adjusting the playback speed of the audio file during playback of the audio file; and, in response to the playback speed being adjusted, automatically displaying a second remaining playback-time associated with the audio file during playback of the audio file.
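The second remaining-playback time follows from dividing the remaining media time by the new playback speed. A sketch, assuming speed is a multiplier (1.0 = normal speed) and positions are in seconds; `remaining_playback_time` and `format_mm_ss` are hypothetical helpers.

```python
def remaining_playback_time(duration_s: float, position_s: float, speed: float) -> float:
    """Wall-clock seconds left when the rest of the file plays at `speed` times normal."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    remaining_media = max(duration_s - position_s, 0.0)
    return remaining_media / speed


def format_mm_ss(seconds: float) -> str:
    """Format seconds as M:SS for display."""
    total = int(round(seconds))
    return f"{total // 60}:{total % 60:02d}"
```

For a 4-minute file at the 1-minute mark, the display would change from "3:00" at normal speed to "2:00" at 1.5× and "4:00" at 0.75×.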
METHOD AND APPARATUS FOR DISPLAYING MUSIC POINTS, AND ELECTRONIC DEVICE AND MEDIUM
Disclosed are a method and apparatus for displaying music points, and an electronic device and a medium. One specific embodiment of the method includes: acquiring audio material; analyzing initial music points in the audio material, wherein the initial music points include beat points and/or note onset points in the audio material; and, on an operation interface for video clipping, displaying identifiers of target music points on a clip timeline according to the position of the audio material on the clip timeline and the positions of the target music points in the audio material, wherein the target music points are some or all of the initial music points. This embodiment reduces the time a user spends processing audio material and creating music points while keeping the tool flexible.
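Placing the identifiers amounts to offsetting each target music point by where the audio material sits on the clip timeline. The sketch assumes the material starts at `clip_start` on the timeline, that only the source range `[trim_in, trim_out)` is used, and that the target points are a chosen subset by kind (beat points by default); `MusicPoint`, `timeline_markers`, and all parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MusicPoint:
    time_s: float  # position within the audio material, in seconds
    kind: str      # "beat" or "onset"


def timeline_markers(points: Iterable[MusicPoint], clip_start: float,
                     trim_in: float, trim_out: float,
                     kinds: Iterable[str] = ("beat",)) -> List[float]:
    """Timeline positions (seconds) at which to draw identifiers for the target music points."""
    wanted = set(kinds)
    markers = []
    for p in points:
        # keep only target kinds that fall inside the used portion of the material
        if p.kind in wanted and trim_in <= p.time_s < trim_out:
            markers.append(clip_start + (p.time_s - trim_in))
    return sorted(markers)
```

If the material is placed at 10 s on the timeline and trimmed to its 1 s to 3 s range, a beat at 1.5 s in the source is marked at 10.5 s on the timeline, while points outside the trimmed range are not shown.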
Connected accessory for a voice-controlled device
Coordinated operation of a voice-controlled device and an accessory device in an environment is described. A remote system processes audio data it receives from the voice-controlled device in the environment to identify a first intent associated with a first domain, a second intent associated with a second domain, and a named entity associated with the audio data. The remote system sends, to the voice-controlled device, first information for accessing main content associated with the named entity, and a first instruction corresponding to the first intent. The remote system also sends, to the accessory device, second information for accessing control information or supplemental content associated with the main content, and a second instruction corresponding to the second intent. The first and second instructions, when processed by the devices in the environment, cause coordinated operation of the voice-controlled device and the accessory device.