G06F16/685

Intelligent automated assistant for media exploration

Systems and processes for operating an intelligent automated assistant are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a first natural-language speech input indicative of a request for media, where the first natural-language speech input comprises a first search parameter; and providing, by a digital assistant, a first media item identified based on the first search parameter. The method further includes, while providing the first media item, receiving a second natural-language speech input and determining whether the second speech input corresponds to a user intent of refining the request for media. The method further includes, in accordance with a determination that the second speech input corresponds to a user intent of refining the request for media: identifying, based on the first search parameter and the second speech input, a second media item and providing the second media item.

System and method for speech recognition for occupancy detection in high occupancy toll applications
11676425 · 2023-06-13

A system and method for dividing toll charges among vehicle occupants includes, at a mobile device, receiving identifying information for each vehicle occupant that can be used to verify the identity of the occupant. The identifying information can be biometric in nature, including images or voice prints. The information is provided to a tolling service with which each of the occupants has an account. Toll charges that accrue as the vehicle travels are then divided or split among the vehicle occupants.
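
The final step, dividing an accrued toll among verified occupants, can be illustrated with a short sketch. The abstract does not say how the split is computed; the equal split, the cent-level rounding, and the name `split_toll` are all assumptions made for illustration.

```python
from decimal import Decimal, ROUND_DOWN


def split_toll(total: Decimal, occupants: list[str]) -> dict[str, Decimal]:
    """Divide a toll equally among occupants, rounded to the cent.

    Hypothetical helper: an equal split is assumed; the source only says
    that charges are "divided or split". Occupant names must be unique.
    """
    if not occupants:
        raise ValueError("at least one occupant is required")
    cent = Decimal("0.01")
    # Round each share down, then hand out the leftover cents one by one
    # so that the shares always add up to the exact total.
    base = (total / len(occupants)).quantize(cent, rounding=ROUND_DOWN)
    shares = {name: base for name in occupants}
    leftover = int((total - base * len(occupants)) / cent)
    for name in occupants[:leftover]:
        shares[name] += cent
    return shares


shares = split_toll(Decimal("10.00"), ["alice", "bob", "carol"])
```

Using `Decimal` rather than floats keeps the per-occupant charges exact, which matters when each share is billed to a separate account.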

Systems and methods for interpreting natural language search queries

Systems and methods are described herein for interpreting natural language search queries in a way that accounts for the contextual relevance of words that would ordinarily not be processed, for example by processing each word of the query. Each term or phrase is associated with a respective part of speech, and a frequency of occurrence of a combination of adjacent terms or phrases in the public domain is determined. A relevance of each term is then determined based on its respective type of term and its frequency of occurrence in the public domain. The natural language search query is then interpreted based on the importance or relevance of each term.

SYSTEM AND METHOD FOR COMBINING PHONETIC AND AUTOMATIC SPEECH RECOGNITION SEARCH

A text search query including one or more words may be received. An ASR index created for an audio recording may be searched using the query to produce ASR search results including words, each word associated with a confidence score. For each word in the ASR search results that has a confidence score below a threshold (and, in some cases, has one or more preceding words and one or more subsequent words in the ASR index), a phonetic representation of the audio recording may be searched for that word where it occurs in the audio recording, possibly after the one or more preceding words and before the one or more subsequent words, to produce phonetic search results. Search results that include both ASR and phonetic results may be returned.
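
The fallback described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: both indexes are toy dictionaries mapping a word to `(start_seconds, score)` pairs, and the names `Hit`, `combined_search`, `threshold`, and `window` are hypothetical. The time window stands in for the constraint that the phonetic match lies between the preceding and subsequent ASR words.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    word: str
    start: float   # seconds into the recording
    score: float   # confidence reported by the index
    source: str    # "asr" or "phonetic"


def combined_search(query, asr_index, phonetic_index, threshold=0.6, window=1.0):
    """Search the ASR index; re-check low-confidence words phonetically."""
    results = []
    for word in query.lower().split():
        for start, conf in asr_index.get(word, []):
            if conf >= threshold:
                results.append(Hit(word, start, conf, "asr"))
                continue
            # Low-confidence ASR hit: look for the same word in the phonetic
            # index, but only near the position where ASR placed it.
            for p_start, p_score in phonetic_index.get(word, []):
                if abs(p_start - start) <= window:
                    results.append(Hit(word, p_start, p_score, "phonetic"))
    return results


# Toy data (assumed): "policy" was recognized with low confidence.
asr_index = {"refund": [(12.4, 0.92)], "policy": [(13.1, 0.35)]}
phonetic_index = {"policy": [(13.0, 0.81), (47.5, 0.40)]}
hits = combined_search("refund policy", asr_index, phonetic_index)
```

Here the confident ASR hit for "refund" is returned directly, while "policy" is replaced by the phonetic match near the same position; the phonetic match at 47.5 seconds falls outside the window and is ignored.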

Method and system for providing audio content
11669567 · 2023-06-06

Methods and systems are disclosed in which audio broadcasts are converted into audio segments, for example, based on segment content. These audio segments are indexed as computer-searchable segments so that they can be found, for example, by network search engines and other computerized search tools.

Apparatus, method and program to facilitate retrieval of voice messages

An information processing apparatus includes a display, an input unit, and a controller. The input unit is configured to receive an input of a first keyword from a user. The controller is configured to retrieve first character information including the input first keyword from a database configured to store a plurality of character information items converted from a plurality of voice information items by voice recognition processing, extract a second keyword that is included in the first character information acquired by the retrieval and is different from the first keyword, and control the display to display a list of items including first identification information with which the acquired first character information is identified and the second keyword included in the first character information.
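
The controller's retrieval step can be sketched as below. This is only an illustration: the abstract does not say how the second keyword is chosen, so picking the most frequent non-stopword is an assumption, as are the names `search_messages` and `STOPWORDS` and the toy transcripts.

```python
from collections import Counter

# Small illustrative stopword list (assumed, not from the source).
STOPWORDS = {"the", "a", "an", "to", "and", "of", "is", "at", "on", "for",
             "please", "me", "about"}


def search_messages(transcripts, first_keyword):
    """Return (message_id, second_keyword) rows for matching transcripts.

    transcripts maps a message id to text produced by voice recognition.
    The second keyword is the most frequent remaining word in the match,
    which is one possible choice among many.
    """
    key = first_keyword.lower()
    rows = []
    for msg_id, text in transcripts.items():
        words = text.lower().split()
        if key not in words:
            continue
        counts = Counter(w for w in words if w != key and w not in STOPWORDS)
        second = counts.most_common(1)[0][0] if counts else None
        rows.append((msg_id, second))
    return rows


transcripts = {
    "vm1": "please call about the invoice payment payment is overdue",
    "vm2": "lunch on friday",
    "vm3": "the invoice meeting moved meeting at noon",
}
rows = search_messages(transcripts, "invoice")
```

Each row pairs the identifier of a matching message with a second keyword, which is the information the abstract describes showing in the displayed list.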

Methods and systems for predictive buffering of related content segments

The methods and systems described herein help users consume content thoroughly and efficiently. For example, the methods and systems buffer content segments related to a current portion of the content that the system is generating for display. The methods and systems determine a characteristic of the current portion of the content and identify related content segments based on the characteristic. Confidence scores are determined for each of the related content segments, and one or more related content segments with higher confidence scores are buffered in memory. Accordingly, the methods and systems described herein provide a thorough viewing of content through related segments that are buffered in memory for quick access.
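
The selection step can be sketched as a top-k choice over confidence scores. The abstract does not define how confidence is computed; the Jaccard overlap of tags used here is an assumption, and `select_segments_to_buffer`, `capacity`, and the sample tags are hypothetical.

```python
import heapq


def select_segments_to_buffer(current_tags, segments, capacity=2):
    """Pick the related segments with the highest confidence scores.

    current_tags: set of characteristics of the portion being displayed.
    segments: dict mapping segment id -> set of tags.
    Confidence is modeled as Jaccard similarity (an assumption).
    """
    scored = []
    for seg_id, tags in segments.items():
        union = current_tags | tags
        score = len(current_tags & tags) / len(union) if union else 0.0
        if score > 0:  # unrelated segments are never buffered
            scored.append((score, seg_id))
    # Keep only as many segments as the buffer has room for.
    return [seg_id for _, seg_id in heapq.nlargest(capacity, scored)]


segments = {
    "s1": {"goal", "team_a", "replay"},
    "s2": {"team_a", "interview"},
    "s3": {"weather"},
    "s4": {"goal"},
}
to_buffer = select_segments_to_buffer({"goal", "team_a"}, segments)
```

With a capacity of two, the segments scoring 2/3 and 1/2 are buffered, and the lower-scoring and unrelated segments are skipped.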

METHODS AND APPARATUS TO IDENTIFY MEDIA THAT HAS BEEN PITCH SHIFTED, TIME SHIFTED, AND/OR RESAMPLED

Methods, apparatus, systems and articles of manufacture are disclosed to identify media that has been pitch shifted, time shifted, and/or resampled. An example apparatus includes: memory; instructions in the apparatus; and processor circuitry to execute the instructions to: transmit a fingerprint of an audio signal and adjusting instructions to a central facility to facilitate a query, the adjusting instructions identifying at least one of a pitch shift, a time shift, or a resample ratio; obtain a response including an identifier for the audio signal and information corresponding to how the audio signal was adjusted; and change the adjusting instructions based on the information.

METHOD AND SYSTEM FOR SYNCHRONIZING PRESENTATION SLIDE CONTENT WITH SOUNDTRACK
20220351754 · 2022-11-03

A method for synchronizing a plurality of presentation slide content with a soundtrack comprises obtaining the plurality of presentation slide content and the soundtrack including a plurality of audio samples. The presentation slide content comprises a video or an animation in the presentation slide. Each presentation slide content is associated with metadata, and each audio sample is indexed with a corresponding timecode. The method comprises detecting a triggering event that identifies a current audio sample of the soundtrack as an audio sample at which to transition from a first presentation slide content to a second presentation slide content, obtaining a timecode indexed with the identified audio sample, associating the timecode with the metadata of the second presentation slide content to link the second presentation slide content with the identified audio sample, and generating a synchronized presentation multimedia file having the linked second presentation slide content with the identified audio sample.
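
The linking step can be sketched as follows. This is a simplified illustration under assumptions: slides are plain dictionaries, triggering events are given as indices into the sample list, and `synchronize` and `start_timecode` are hypothetical names. Generating the multimedia file itself is out of scope here.

```python
def synchronize(slides, samples, trigger_indices):
    """Attach the timecode of each triggering sample to the next slide.

    slides: list of dicts, each with a "metadata" dict.
    samples: list of (timecode_seconds, sample_value) pairs.
    trigger_indices: indices of samples at which a transition occurs.
    Returns copies of the slides; the input is left unmodified.
    """
    linked = [dict(s, metadata=dict(s["metadata"])) for s in slides]
    next_slide = 1  # the first slide is shown from the start
    for i in sorted(trigger_indices):
        if next_slide >= len(linked):
            break  # more triggers than slides: ignore the extras
        timecode, _ = samples[i]
        # Store the timecode in the incoming slide's metadata so that the
        # slide is linked to the identified audio sample.
        linked[next_slide]["metadata"]["start_timecode"] = timecode
        next_slide += 1
    return linked


slides = [
    {"title": "Intro", "metadata": {}},
    {"title": "Results", "metadata": {}},
    {"title": "Outro", "metadata": {}},
]
samples = [(i / 10, 0.0) for i in range(100)]  # one sample every 0.1 s
linked = synchronize(slides, samples, {25, 70})
```

After this call the second and third slides carry the timecodes of the samples at which their transitions were triggered.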

Synchronized voice application to present accurate real time content uttered by a text reader/reciter
20170316778 · 2017-11-02

The embodiments of the invention allow retrieval of information or processing of commands through a speech interface, or through a combination of a speech interface and a non-speech interface, thus facilitating verbal search of religious and non-religious texts and the publishing of the resulting finds along with exegesis and/or explanations. Beneficial uses can be found in fields including, but not limited to, religious worship and education. The embodiments of the invention ease interaction of the user(s) with the text(s) and allow for in-depth comprehension and greater access to knowledge, among other benefits. The scope and ramifications of the use(s) and benefits cannot be measured in a limited manner. As technology and imagination increase, the true scope and ramifications would increase as well.