G06F16/685

Audio request interaction system

A person can use a portable electronic device to electronically purchase or otherwise request a product, service or other deliverable related to audio programming to which the person is listening at the time they initiate the request. The request is fulfilled by a service that analyzes the audio content to identify the deliverable the person desires.

System and method for identifying social trends

A method and system for identifying social trends are provided. The method includes collecting multimedia content from a plurality of data sources; gathering environmental variables related to the collected multimedia content; extracting visual elements from the collected multimedia content; generating at least one signature for each extracted visual element; generating at least one cluster of visual elements by clustering at least similar signatures generated for the extracted visual elements; correlating environmental variables related to visual elements in the at least one cluster; and determining at least one social trend by associating the correlated environmental variables with the at least one cluster.
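The clustering and correlation steps can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how signatures are represented or compared, so signatures are modeled as small numeric vectors, similarity as cosine similarity, and the environmental variable as a single hypothetical `env` value per element.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cluster_signatures(items, threshold=0.9):
    """Greedy clustering: an item joins the first cluster whose first
    member has a sufficiently similar signature, else starts a new one."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if cosine(cluster[0]["signature"], item["signature"]) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters

def trend_for(cluster):
    """Most common environmental value in a cluster, with its count."""
    counts = Counter(item["env"] for item in cluster)
    return counts.most_common(1)[0]

# Hypothetical visual elements with made-up signatures and locations.
items = [
    {"signature": (1.0, 0.0), "env": "Paris"},
    {"signature": (0.99, 0.05), "env": "Paris"},
    {"signature": (0.0, 1.0), "env": "Tokyo"},
    {"signature": (0.98, 0.1), "env": "London"},
]
clusters = cluster_signatures(items)
```

With this toy data, the first, second, and fourth elements fall into one cluster, and `trend_for` associates that cluster with the location that occurs most often among its members.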

Intelligent digital assistant in a multi-tasking environment

Systems and processes for operating a digital assistant are provided. In one example, a method includes receiving a first speech input from a user. The method further includes identifying context information and determining a user intent based on the first speech input and the context information. The method further includes determining whether the user intent is to perform a task using a searching process or an object managing process. The searching process is configured to search data, and the object managing process is configured to manage objects. The method further includes, in accordance with a determination that the user intent is to perform the task using the searching process, performing the task using the searching process; and in accordance with a determination that the user intent is to perform the task using the object managing process, performing the task using the object managing process.
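The branching between the two processes can be sketched as a simple dispatcher. This is a sketch under stated assumptions, not the method of the disclosure: the keyword sets, the `selected_object` context key, and all function names are hypothetical, and a real assistant would infer intent with a natural language model rather than keyword matching.

```python
# Hypothetical keyword sets standing in for real intent inference.
SEARCH_WORDS = {"find", "search", "show"}
MANAGE_WORDS = {"copy", "move", "delete"}

def determine_intent(speech_text, context):
    """Return 'search' or 'manage' from the speech text and context."""
    words = set(speech_text.lower().split())
    if words & MANAGE_WORDS:
        return "manage"
    if words & SEARCH_WORDS:
        return "search"
    # No keyword matched: fall back on context (assumed key).
    return "manage" if context.get("selected_object") else "search"

def searching_process(speech_text, context):
    """Placeholder for a process that searches data."""
    return f"searching: {speech_text}"

def object_managing_process(speech_text, context):
    """Placeholder for a process that manages objects."""
    return f"managing: {speech_text}"

def handle(speech_text, context):
    """Dispatch the task to the process selected by the user intent."""
    if determine_intent(speech_text, context) == "search":
        return searching_process(speech_text, context)
    return object_managing_process(speech_text, context)
```

For example, `handle("find my tax documents", {})` is routed to the searching process, while `handle("open it", {"selected_object": "report.pdf"})` is routed to the object managing process because the context indicates a selected object.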

INTERACTIVE FASHION WITH MUSIC AR

Methods and systems are disclosed for performing operations comprising: receiving a monocular image that includes a depiction of a person wearing an article of clothing; generating a segmentation of the article of clothing worn by the person in the monocular image; obtaining one or more audio-track related augmented reality elements; and applying the one or more audio-track related augmented reality elements to the article of clothing worn by the person based on the segmentation of the article of clothing worn by the person.

MEDIA CONTENT PROCESSING TECHNIQUES FOR RIGHTS AND CLEARANCE MANAGEMENT
20230112625 · 2023-04-13 ·

Systems and methods in accordance with various embodiments of the present disclosure provide improved techniques to process and manage media content and its associated intellectual property rights. Intellectual property rights associated with media content can include copyright, trademarks, licenses to composition, synchronization, performance, recordings, etc. In particular, various embodiments provide media licensing management and monetization based on media licensing using a centralized registry of media content and associated asset rights.

SUGGESTED QUERIES FOR TRANSCRIPT SEARCH

Systems and methods for surfacing natural language queries from one or more transcripts. An example method may include converting received audio to text, through automated speech recognition, to form a transcript of the audio, wherein the transcript includes text of the audio and identifications of speakers associated with portions of the text corresponding to utterances from the respective speakers; generating input signals based on at least the transcript; executing at least one of one or more heuristics or a trained machine-learning (ML) model, using the generated input signals as an input, to generate at least one of a suggested natural language query for searching the transcript or a key moment within the received audio; and causing at least one of the suggested natural language query or the key moment to be surfaced on one or more remote devices.

Place search by audio signals

The present disclosure provides systems and methods that provide users with information pertaining to the audio properties at one or more points of interest. A database associating the audio properties with the points of interest is built using audio input received from devices at the points of interest. A device may determine the audio properties associated with the received audio input. The audio properties may include a type of background noise and/or a volume of the background noise. If the type of background noise is music, the audio properties may further include a music genre, a title of a song, whether the music is recorded or performed by a live band, etc. The audio properties associated with the point of interest may be updated in the database in real time.

System and method for combining phonetic and automatic speech recognition search

A text search query including one or more words may be received. An ASR index created for an audio recording may be searched using the query to produce ASR search results including words, each word associated with a confidence score. For each word in the ASR search results whose confidence score is below a threshold (and which in some cases has one or more preceding words and one or more subsequent words in the ASR index), a phonetic representation of the audio recording may be searched for that word where it occurs in the audio recording, possibly after the one or more preceding words and before the one or more subsequent words, to produce phonetic search results. Search results that include both ASR and phonetic results may be returned.
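The fallback from ASR results to phonetic search can be illustrated with a toy example. The data structures are assumptions made for illustration only: the ASR index is a list of (word, confidence, start time) tuples, the phonetic representation is a list of (phoneme string, start time) tuples, and `LEXICON` is a hypothetical word-to-phoneme table. The abstract bounds the phonetic search by neighboring words; this sketch approximates that with a time window around the ASR hit.

```python
# Toy ASR index: (word, confidence, start time in seconds).
ASR_INDEX = [
    ("please", 0.95, 0.0),
    ("cancel", 0.40, 0.5),
    ("my", 0.90, 1.0),
    ("account", 0.85, 1.3),
]
# Toy phonetic representation of the same recording.
PHONETIC_INDEX = [
    ("p l iy z", 0.0),
    ("k ae n s ah l", 0.5),
    ("m ay", 1.0),
    ("ah k aw n t", 1.3),
]
# Hypothetical pronunciation lexicon.
LEXICON = {"cancel": "k ae n s ah l", "account": "ah k aw n t"}

def search(query, threshold=0.6, window=0.25):
    """Return (word, start time, source) hits for each query word.

    High-confidence ASR hits are returned directly. Low-confidence hits
    are re-checked against the phonetic index near the same time offset.
    """
    results = []
    for word in query.lower().split():
        for asr_word, confidence, start in ASR_INDEX:
            if asr_word != word:
                continue
            if confidence >= threshold:
                results.append((word, start, "asr"))
                continue
            phones = LEXICON.get(word)
            for sequence, p_start in PHONETIC_INDEX:
                if sequence == phones and abs(p_start - start) <= window:
                    results.append((word, p_start, "phonetic"))
    return results
```

Calling `search("cancel account")` on this toy data yields one phonetic hit for "cancel" (its ASR confidence of 0.40 is below the threshold) and one ASR hit for "account".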

Project issue tracking via automated voice recognition

A processor may receive information from one or more users. The information may include identifiers associated with the one or more users and audio associated with the one or more users. The processor may transcribe the audio into a text of the audio. The processor may parse the text into one or more segments. The processor may analyze each of the one or more segments. The processor may determine, from the analyzing, a specific subject of the information.

Tagging an Image with Audio-Related Metadata
20230072899 · 2023-03-09 ·

In one aspect, an example method to be performed by a computing device includes (a) receiving a request to use a camera of the computing device; (b) in response to receiving the request, (i) using a microphone of the computing device to capture audio content and (ii) using the camera of the computing device to capture an image; (c) identifying reference audio content that has at least a threshold extent of similarity with the captured audio content; and (d) outputting an indication of the identified reference audio content while displaying the captured image.
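Step (c), identifying reference audio with at least a threshold extent of similarity, can be sketched as follows. The representation is an assumption: the abstract does not specify a fingerprinting scheme, so fingerprints are modeled here as sets of integer hashes compared with Jaccard similarity, and the reference titles are made up.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of fingerprint hashes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def identify(captured, references, threshold=0.5):
    """Return the title of the most similar reference whose similarity
    meets the threshold, or None if no reference qualifies."""
    best_title, best_score = None, 0.0
    for title, fingerprint in references.items():
        score = jaccard(captured, fingerprint)
        if score >= threshold and score > best_score:
            best_title, best_score = title, score
    return best_title

# Hypothetical reference database of fingerprint hash sets.
REFERENCES = {
    "Song A": {1, 2, 3, 4, 5},
    "Song B": {10, 11, 12},
}
```

Here `identify({2, 3, 4, 5, 6}, REFERENCES)` returns "Song A" (similarity 4/6), and a capture that shares no hashes with any reference returns `None`, in which case no indication would be shown with the captured image.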