G06F16/685

Systems and methods for disambiguating a voice search query

Systems and methods are described herein for disambiguating a voice search query that contains a command keyword by determining whether the user spoke a quotation from a content item and whether the user mimicked or approximated the way the quotation is spoken in the content item. The voice search query is transcribed into a string, and an audio signature of the voice search query is identified. Metadata of a quotation matching the string is retrieved from a database that includes audio signature information for the string as spoken within the content item. The audio signature of the voice search query is compared with the audio signature information in the metadata to determine whether the audio signature matches the audio signature information in the quotation metadata. If a match is detected, then a search result comprising an identifier of the content item from which the quotation comes is generated.
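The matching flow above (transcribe, look up quotation metadata, compare audio signatures, return the content identifier) can be sketched as follows. This is a minimal illustration under loud assumptions: the abstract does not say what an "audio signature" contains, so here it is assumed to be a short vector of relative word durations compared by mean absolute difference against a tolerance. The names `QuotationMetadata`, `normalize`, `signatures_match` and `disambiguate` are hypothetical and not taken from the source.

```python
import string
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotationMetadata:
    content_id: str          # identifier of the content item the quotation comes from
    signature: list[float]   # ASSUMPTION: relative word durations as spoken in the content item


def normalize(text: str) -> str:
    """Lower-case and strip punctuation so transcripts and stored quotations compare cleanly."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


def signatures_match(query_sig: list[float], stored_sig: list[float], tolerance: float = 0.15) -> bool:
    """ASSUMED match rule: equal length and mean absolute difference within tolerance."""
    if len(query_sig) != len(stored_sig) or not stored_sig:
        return False
    diff = sum(abs(a - b) for a, b in zip(query_sig, stored_sig)) / len(stored_sig)
    return diff <= tolerance


def disambiguate(transcript: str, query_sig: list[float],
                 quotation_db: dict[str, QuotationMetadata]) -> Optional[str]:
    """Return the content identifier if the user spoke (and mimicked) a known quotation."""
    meta = quotation_db.get(normalize(transcript))
    if meta is None:
        return None  # no quotation matches: treat the query as an ordinary command
    if signatures_match(query_sig, meta.signature):
        return meta.content_id  # the delivery matches the quotation as spoken in the content
    return None  # same words, but not delivered the way the quotation is spoken
```

For example, a query beginning with the command keyword "show" that matches a stored quotation and is delivered with a similar rhythm resolves to the content item rather than being executed as a command; the same words spoken flatly fall through to `None`.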

Centrally controlling communication at a venue

One example may include a method that includes receiving, at a presentation server, an audio data signal from a mobile device located in a presentation space; identifying a mobile device identification characteristic of the mobile device based on the received audio data signal; determining a mobile device location via a location determination procedure; and playing the audio data signal via a loudspeaker.

PREDICTIVE QUERY EXECUTION

First audio data associated with a first portion of a voice query (e.g., an incomplete voice query) may be received (e.g., by a device or a server). A speech recognition engine may determine a first transcript based on the first audio data. A plurality of predicted queries may be determined by applying a prediction process to the first transcript, and a response for each of the predicted queries may be determined by processing those queries. Second audio data associated with a second portion of the voice query (e.g., a complete voice query) may be received, and the speech recognition engine may determine a second transcript based on the second audio data. Based on a comparison of the second transcript with one of the plurality of predicted queries, a response for the voice query may be returned.
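The predict-then-confirm flow can be sketched as below. The abstract does not specify the prediction process or how the second transcript is compared, so this sketch assumes prefix completion over a hypothetical list of known queries and an exact (case-insensitive) match on the final transcript. `predict_queries` and `PredictiveExecutor` are illustrative names only.

```python
from typing import Callable


def predict_queries(partial: str, known_queries: list[str], limit: int = 3) -> list[str]:
    """ASSUMED prediction process: prefix completion over a list of known queries."""
    p = partial.lower().strip()
    return [q for q in known_queries if q.lower().startswith(p)][:limit]


class PredictiveExecutor:
    def __init__(self, process: Callable[[str], str], known_queries: list[str]):
        self.process = process            # executes a query and returns its response
        self.known_queries = known_queries
        self.cache: dict[str, str] = {}   # predicted query -> precomputed response

    def on_partial(self, first_transcript: str) -> None:
        # Pre-compute a response for every predicted query before the user finishes speaking.
        for q in predict_queries(first_transcript, self.known_queries):
            self.cache[q.lower()] = self.process(q)

    def on_final(self, second_transcript: str) -> str:
        key = second_transcript.lower().strip()
        if key in self.cache:
            return self.cache[key]              # prediction hit: response already computed
        return self.process(second_transcript)  # miss: fall back to normal execution
```

The design trades extra backend work (responses for predictions that turn out wrong) for lower latency when one of the predictions matches the completed query.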

SYSTEMS AND METHODS FOR DISAMBIGUATING A VOICE SEARCH QUERY

Systems and methods are described herein for disambiguating a voice search query that contains a command keyword by determining whether the user spoke a quotation from a content item and whether the user mimicked or approximated the way the quotation is spoken in the content item. The voice search query is transcribed into a string, and an audio signature of the voice search query is identified. Metadata of a quotation matching the string is retrieved from a database that includes audio signature information for the string as spoken within the content item. The audio signature of the voice search query is compared with the audio signature information in the metadata to determine whether the audio signature matches the audio signature information in the quotation metadata. If a match is detected, then a search result comprising an identifier of the content item from which the quotation comes is generated.

INTELLIGENT DIGITAL ASSISTANT IN A MULTI-TASKING ENVIRONMENT

Systems and processes for operating a digital assistant are provided. In one example, a method includes receiving a first speech input from a user. The method further includes identifying context information and determining a user intent based on the first speech input and the context information. The method further includes determining whether the user intent is to perform a task using a searching process or an object managing process. The searching process is configured to search data, and the object managing process is configured to manage objects. The method further includes, in accordance with a determination that the user intent is to perform the task using the searching process, performing the task using the searching process; and, in accordance with a determination that the user intent is to perform the task using the object managing process, performing the task using the object managing process.
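A toy version of the routing decision helps show how context information can settle an ambiguous utterance. This is a sketch only: the verb sets, the `selected_objects` context key, and the functions `determine_process` and `perform_task` are hypothetical, and a real assistant would use a trained intent model rather than keyword lookup.

```python
from enum import Enum, auto


class Process(Enum):
    SEARCH = auto()
    OBJECT_MANAGING = auto()


# HYPOTHETICAL verb lists standing in for an intent classifier.
SEARCH_VERBS = {"find", "search", "show", "look"}
OBJECT_VERBS = {"move", "copy", "delete", "rename", "open"}


def determine_process(speech: str, context: dict) -> Process:
    words = speech.lower().split()
    if words and words[0] in OBJECT_VERBS:
        return Process.OBJECT_MANAGING
    if words and words[0] in SEARCH_VERBS:
        return Process.SEARCH
    # Ambiguous utterance: fall back on context, e.g. whether objects are currently selected.
    return Process.OBJECT_MANAGING if context.get("selected_objects") else Process.SEARCH


def perform_task(speech: str, context: dict) -> str:
    """Dispatch to the searching process or the object managing process."""
    if determine_process(speech, context) is Process.SEARCH:
        return f"searching for: {speech}"
    return f"managing {len(context.get('selected_objects', []))} object(s): {speech}"
```

The same utterance ("the ones from yesterday") is routed differently depending on whether the context shows selected objects, which is the role the abstract assigns to context information.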

Speech Enhancement Based on Metadata Associated with Audio Content
20230195783 · 2023-06-22

Systems and methods disclosed herein include computing devices and/or computing systems configured to (i) determine portions of audio content comprising speech dialog based at least in part on metadata associated with the audio content, (ii) for individual portions of the audio content containing speech dialog, identify dialog enhancement parameters for application to those portions, and (iii) play (or cause to be played) the audio content, where playing the audio content includes applying the dialog enhancement parameters to the portions of audio content containing speech dialog.
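The three steps (locate dialog from metadata, pick parameters per portion, apply them at playback) can be sketched on a list of normalized samples. Assumptions, stated plainly: metadata is modeled as a hypothetical list of `Segment` sample ranges with a dialog flag, and the only "enhancement parameter" is a fixed gain with clipping, standing in for whatever equalization or compression a real system would apply.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int        # first sample index (inclusive)
    end: int          # last sample index (exclusive)
    has_dialog: bool  # taken from the metadata associated with the audio content


def dialog_params(segment: Segment) -> dict:
    # HYPOTHETICAL parameter choice: a fixed gain boost for every dialog portion.
    return {"gain": 1.5}


def enhance(samples: list[float], metadata: list[Segment]) -> list[float]:
    """Apply dialog enhancement only to the portions the metadata marks as dialog."""
    out = list(samples)
    for seg in metadata:
        if not seg.has_dialog:
            continue  # non-dialog portions are played unchanged
        gain = dialog_params(seg)["gain"]
        for i in range(seg.start, min(seg.end, len(out))):
            # Apply the gain and clip to the [-1.0, 1.0] range of normalized audio.
            out[i] = max(-1.0, min(1.0, out[i] * gain))
    return out
```

Because dialog portions come from metadata rather than from analysing the signal, the per-sample work at playback is just applying the chosen parameters.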

ADDING BACKGROUND SOUND TO SPEECH-CONTAINING AUDIO DATA
20170352361 · 2017-12-07

An editing method facilitates the task of adding background sound to speech-containing audio data so as to augment the listening experience. The editing method is executed by a processor in a computing device and comprises obtaining characterization data that characterizes time segments in the audio data by at least one of topic and sentiment; deriving, for a respective time segment in the audio data and based on the characterization data, a desired property of a background sound to be added to the audio data in that time segment; and providing the desired property for the respective time segment so that the audio data can be combined, within that time segment, with background sound having the desired property. The background sound may be selected and added automatically or through manual user intervention.
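The derivation step, from per-segment topic and sentiment to a desired background-sound property, can be sketched as a simple mapping. The sentiment scale (-1.0 to 1.0), the thresholds, the mood labels, and the class and function names are all hypothetical choices for illustration; the abstract leaves the actual mapping open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Characterization:
    start_s: float
    end_s: float
    topic: Optional[str]
    sentiment: Optional[float]  # ASSUMED scale: -1.0 (negative) .. 1.0 (positive)


@dataclass
class BackgroundProperty:
    start_s: float
    end_s: float
    mood: str              # desired mood of the background sound
    theme: Optional[str]   # desired theme, taken from the segment's topic


def derive_property(c: Characterization) -> BackgroundProperty:
    """HYPOTHETICAL mapping from sentiment to mood; the topic is passed through as a theme."""
    if c.sentiment is None:
        mood = "neutral"
    elif c.sentiment > 0.3:
        mood = "uplifting"
    elif c.sentiment < -0.3:
        mood = "somber"
    else:
        mood = "neutral"
    return BackgroundProperty(c.start_s, c.end_s, mood, c.topic)


def plan_background(characterizations: list[Characterization]) -> list[BackgroundProperty]:
    """Provide one desired property per time segment, ready for automatic or manual selection."""
    return [derive_property(c) for c in characterizations]
```

The output only describes what the background sound should be; choosing and mixing an actual sound file remains a separate step, matching the abstract's split between deriving a property and combining the audio.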

SYSTEMS AND METHODS FOR EMBEDDING DATA IN MEDIA CONTENT

A method is provided for modifying a first media content item by superimposing a first set of data over a first audio event having an amplitude that satisfies a first threshold. The first audio event has a first audio profile, the first set of data has a second audio profile, playback of the second audio profile is configured to be masked by the first audio profile during playback of the first media content item, and the first set of data includes playlist information. The method includes transmitting, to a second electronic device, the modified first media content item.
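The core idea, hiding a low-level data signal under a loud audio event so that the event masks it, can be sketched on raw samples. This is a crude stand-in under explicit assumptions: the "first threshold" is applied to per-sample absolute amplitude, masking is approximated by scaling the data to a small fraction of the event's peak rather than by a psychoacoustic model, and encoding the playlist information into `data_signal` and transmitting the result are out of scope. Function names are hypothetical.

```python
from typing import Optional


def find_audio_event(samples: list[float], threshold: float) -> Optional[tuple[int, int]]:
    """Return (start, end) of the first run of samples whose amplitude satisfies the threshold."""
    start = None
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i
        elif start is not None:
            return start, i
    return (start, len(samples)) if start is not None else None


def embed(samples: list[float], data_signal: list[float],
          threshold: float = 0.5, relative_level: float = 0.05) -> list[float]:
    """Superimpose data_signal over the first loud event, scaled well below the event's peak."""
    event = find_audio_event(samples, threshold)
    if event is None:
        raise ValueError("no audio event loud enough to mask the data")
    start, end = event
    if len(data_signal) > end - start:
        raise ValueError("data signal is longer than the masking audio event")
    peak = max(abs(x) for x in samples[start:end])
    out = list(samples)
    for k, d in enumerate(data_signal):
        # ASSUMPTION: a fixed relative level approximates masking by the louder event.
        out[start + k] += relative_level * peak * d
    return out
```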

Delivery of synchronised soundtracks for electronic media content

A method and system for streaming a soundtrack from a server to a remote user device for a reader of electronic media content. The soundtrack is defined by multiple audio regions. Each audio region is defined by an audio track for playback in the audio region, a start position in the electronic media content corresponding to where playback of the audio region is to begin, and a stop position in the electronic media content corresponding to where playback of the audio region is to cease. The streaming of the soundtrack is based on control data generated by the remote user device.
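The region model maps naturally onto a small data structure, and the server-side decision reduces to "which regions overlap the reader's position". The sketch assumes positions are word indices, the stop position is exclusive, and the control data from the device is simply the current reading position plus an optional look-ahead for pre-fetching; `AudioRegion` and `tracks_to_stream` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AudioRegion:
    track: str  # HYPOTHETICAL audio track identifier
    start: int  # word index where playback of the region begins
    stop: int   # word index where playback ceases (ASSUMED exclusive)


def tracks_to_stream(regions: list[AudioRegion], position: int, lookahead: int = 0) -> list[str]:
    """Tracks the server should stream given the reading position reported by the device.

    With lookahead > 0, regions that begin shortly after the current position are
    included so the server can start streaming them before the reader gets there.
    """
    return [r.track for r in regions
            if r.start <= position + lookahead and position < r.stop]
```

Overlapping regions are allowed, so a background ambience track and a music track can play together where their spans intersect.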

Displaying information related to content playing on a device

A computer-implemented method includes: detecting whether a user is watching media content; after detecting that the user is watching media content, presenting on a user device a first affordance providing a first user-selectable election to receive information on entities relevant to the media content; in response to user selection of the election: sampling, at the user device, program information from the media content including one or more of audio signals and subtitles, and sending the program information to a server, which identifies the media content, generates one or more second user-selectable elections for the identified media content, and sends to the user device one or more second affordances providing the second user-selectable elections; displaying the second affordances on the user device; and in response to user selection of one of the second affordances, displaying on the user device information on a respective entity relevant to the media content.