G10L25/54

Audio matching

An audio matching technique generates audio fingerprints from a captured audio signal. Coarse and fine fingerprints are generated from the captured audio. The coarse fingerprint is matched against a set of coarse fingerprints stored in a database to identify a subset of possibly matching database entries. The fine fingerprint is then used to perform a detailed comparison with the fine fingerprints associated with that subset in order to find a match for the captured audio signal.
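
The two-stage lookup described above can be illustrated with a minimal sketch. It assumes fingerprints are integer-encoded bit strings compared by Hamming distance, and that the thresholds and the function name `match_audio` are hypothetical; the abstract does not specify the fingerprint format or the distance measure.

```python
from typing import Dict, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Count the differing bits between two integer-encoded fingerprints."""
    return bin(a ^ b).count("1")


def match_audio(
    coarse_query: int,
    fine_query: int,
    database: Dict[str, Tuple[int, int]],  # entry id -> (coarse, fine); assumed layout
    coarse_threshold: int = 2,             # hypothetical tolerance for stage 1
    fine_threshold: int = 8,               # hypothetical tolerance for stage 2
) -> Optional[str]:
    # Stage 1: cheap comparison against every stored coarse fingerprint
    # to find the subset of possibly matching entries.
    candidates = [
        entry_id
        for entry_id, (coarse, _fine) in database.items()
        if hamming(coarse_query, coarse) <= coarse_threshold
    ]

    # Stage 2: detailed comparison using the fine fingerprint,
    # restricted to the candidate subset.
    best_id: Optional[str] = None
    best_dist = fine_threshold + 1
    for entry_id in candidates:
        dist = hamming(fine_query, database[entry_id][1])
        if dist < best_dist:
            best_id, best_dist = entry_id, dist
    return best_id


database = {
    "a": (0b1010, 0xFFFF0000),
    "b": (0b1011, 0x0F0F0F0F),
    "c": (0b0101, 0x0F0F0F0F),
}
result = match_audio(0b1010, 0x0F0F0F0E, database)
```

The point of the design is that the expensive fine comparison runs only on the few entries that survive the coarse pass, rather than on the whole database.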

GUIDANCE QUERY FOR CACHE SYSTEM
20230223027 · 2023-07-13

A device may be configured to determine whether an audio file is a first type of audio file that is capable of being processed to recognize the voice query based on a characteristic of the audio file itself or a second type of audio file that may require speech recognition processing in order to recognize the voice query associated with the audio file. In determining whether the audio file is a first type of audio file or a second type of audio file, a query filter associated with the device may be configured to access one or more guidance queries. Using the one or more guidance queries, the device may classify the audio file as a first type of audio file or a second type of audio file based on receiving only a portion of the audio file, thereby improving the speed at which the audio file can be processed.
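
A rough sketch of classifying on a partial file follows. Everything concrete here is an assumption made for illustration: the abstract does not say what a guidance query contains, so the sketch treats each one as a SHA-256 digest of the first few bytes of a known audio file, and the names `prefix_digest`, `classify` and `PREFIX_BYTES` are hypothetical.

```python
import hashlib
from typing import Set

PREFIX_BYTES = 4096  # hypothetical size of the portion needed to classify


def prefix_digest(audio: bytes, n: int = PREFIX_BYTES) -> str:
    """Digest of the first n bytes; stands in for a characteristic of the file."""
    return hashlib.sha256(audio[:n]).hexdigest()


def classify(portion: bytes, guidance_digests: Set[str], n: int = PREFIX_BYTES) -> str:
    """Classify an audio file from only the portion received so far."""
    if len(portion) < n:
        return "undetermined"  # not enough of the file has arrived yet
    if prefix_digest(portion, n) in guidance_digests:
        return "first_type"    # recognizable from the file itself
    return "second_type"       # would need speech recognition processing


known_audio = b"\x01\x02" * 10
guidance = {prefix_digest(known_audio, 8)}
early_result = classify(known_audio[:8], guidance, 8)
```

Because the decision depends only on the first `n` bytes, it can be made before the rest of the file is received.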

APPARATUS AND METHOD FOR AUDIO DATA ANALYSIS

A data processing apparatus includes storage circuitry to store a plurality of sound recordings, receiving circuitry to receive input data indicative of one or more sounds detected by a microphone, selection circuitry to select, from the plurality of sound recordings, one or more candidate sound recordings in dependence upon the input data, and output circuitry to output data in dependence upon one or more of the candidate sound recordings.

Electronic apparatus for dynamic note matching and operating method of the same

Disclosed are an electronic apparatus for dynamic note matching (DNM) and an operating method thereof, the method including acquiring a first section sequence by reducing a first sequence extracted from an input signal based on at least one first section in which the respective values are successively arranged; acquiring a second section sequence reduced from a pre-stored second sequence based on at least one second section in which the respective values are successively arranged; and calculating a similarity between the first section sequence and the second section sequence.
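
The reduce-then-compare idea can be sketched as follows, assuming that a "section" is a run of identical consecutive values (for example, a held note) and that each run is reduced to a single value. The similarity measure is a swapped-in stand-in, the ratio from Python's `difflib.SequenceMatcher`, not the measure used in the disclosure; the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import groupby
from typing import List, Sequence


def to_section_sequence(values: Sequence[int]) -> List[int]:
    """Collapse each run of identical consecutive values into one element."""
    return [value for value, _run in groupby(values)]


def similarity(first: Sequence[int], second: Sequence[int]) -> float:
    """Similarity in [0, 1] between the two reduced section sequences."""
    a = to_section_sequence(first)
    b = to_section_sequence(second)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# MIDI-style note numbers; the same melody held for different durations.
sung = [60, 60, 62, 64, 64, 64]
stored = [60, 62, 62, 64]
score = similarity(sung, stored)
```

Reducing both sequences first makes the comparison insensitive to how long each value is held, so two renditions of the same notes at different tempos compare as identical.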

METHODS, SYSTEMS, AND MEDIA FOR RIGHTS MANAGEMENT OF EMBEDDED SOUND RECORDINGS USING COMPOSITION CLUSTERING

Methods, systems, and media for determining and presenting information related to embedded sound recordings are provided. In some embodiments, the method comprises: receiving a content item; extracting a sound recording from the content item; generating a melody fingerprint of the extracted sound recording; determining whether the melody fingerprint of the extracted sound recording matches one of a plurality of clusters of similar sounding sound recordings in a reference database, wherein each cluster in the plurality of clusters of similar sounding sound recordings is associated with ownership information based on a plurality of ownership information associated with each of the sound recordings in the cluster; in response to determining that the melody fingerprint of the extracted sound recording matches a cluster of similar sounding sound recordings, retrieving ownership information associated with the cluster; mapping the ownership information to the sound recording extracted from the content item; and causing an action to be performed on the content item based on the mapped ownership information.
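
The cluster lookup and ownership mapping can be sketched as below. The sketch assumes a melody fingerprint is a numeric vector, that each cluster is represented by a centroid vector, and that a match means cosine similarity at or above a threshold; none of these choices, nor the names `Cluster`, `find_cluster` and `action_for`, come from the abstract.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Optional, Sequence, Set


@dataclass
class Cluster:
    centroid: Sequence[float]  # assumed representative fingerprint of the cluster
    owners: Set[str] = field(default_factory=set)  # ownership pooled from its recordings


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_cluster(
    fingerprint: Sequence[float],
    clusters: List[Cluster],
    threshold: float = 0.9,  # hypothetical match threshold
) -> Optional[Cluster]:
    """Return the most similar cluster if it clears the threshold, else None."""
    best = max(clusters, key=lambda c: cosine(fingerprint, c.centroid), default=None)
    if best is not None and cosine(fingerprint, best.centroid) >= threshold:
        return best
    return None


def action_for(fingerprint: Sequence[float], clusters: List[Cluster]) -> str:
    """Map the matched cluster's ownership to the recording and pick an action."""
    cluster = find_cluster(fingerprint, clusters)
    if cluster is None:
        return "no_action"
    return "notify:" + ",".join(sorted(cluster.owners))


clusters = [
    Cluster([1.0, 0.0, 0.0], {"Publisher A", "Publisher B"}),
    Cluster([0.0, 1.0, 0.0], {"Publisher C"}),
]
decision = action_for([0.9, 0.1, 0.0], clusters)
```

Attaching ownership to the cluster rather than to individual recordings means a new rendition of a composition inherits the pooled ownership of all similar-sounding recordings once it matches the cluster.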

VIDEO GENERATION METHOD, APPARATUS, ELECTRONIC DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT

This disclosure provides a video generation method, a video generation apparatus, an electronic device, a storage medium and a program product, and relates to the field of artificial intelligence technology, and in particular to the field of computer vision technology and deep learning technology. A specific implementation includes: obtaining document content information of a document; extracting, from the document content information, populating information for multiple scenes in a preset video template; populating the populating information for the multiple scenes into corresponding scenes in the preset video template, respectively, to obtain image information of the multiple scenes; generating audio information of the multiple scenes according to the populating information for the multiple scenes; generating a video of the document based on the image information and audio information of the multiple scenes.

Automated audio mapping using an artificial neural network

According to one implementation, an automated audio mapping system includes a computing platform having a hardware processor and a system memory storing an audio mapping software code including an artificial neural network (ANN) trained to identify multiple different audio content types. The hardware processor is configured to execute the audio mapping software code to receive content including multiple audio tracks, and to identify, without using the ANN, a first music track and a second music track of the multiple audio tracks. The hardware processor is further configured to execute the audio mapping software code to identify, using the ANN, the audio content type of each of the multiple audio tracks except the first music track and the second music track, and to output a mapped content file including the multiple audio tracks each assigned to a respective one predetermined audio channel based on its identified audio content type.

Three-dimensional room analysis with audio input
11521376 · 2022-12-06

Systems and methods are provided that generate a three-dimensional model from a physical space. While a user is scanning and/or recording the physical space with a user computing device, user speech describing the physical space is recorded. A transcript is generated from the audio captured during the scan and/or image recording of the physical space. Keywords from the transcript are used to improve computer-vision object identification, which is incorporated in the three-dimensional model.

Information processing device, information processing terminal, information processing method, and program

[Problem] To propose an information processing device, an information processing terminal, an information processing method, and a program capable of controlling the output of a voice so that it adapts to an action purpose of a user. [Solution] An information processing device including: an inference unit that infers an action purpose of a user on the basis of a result of sensing by one or more sensors; and an output control unit that controls, on the basis of a result of inference by the inference unit, output of a voice to the user performed by an audio output unit.
