Patent classifications
G06F16/685
Phonetic comparison for virtual assistants
In an approach for optimizing an intelligent virtual assistant by using phonetic comparison to find a response stored in a local database, a processor receives an audio input on a computing device. A processor transcribes the audio input to text. A processor compares the text to a set of user queries and commands in a local database of the computing device using a phonetic algorithm. A processor determines whether a user query or command of the set of user queries and commands meets a pre-defined threshold of similarity. Responsive to determining that the user query or command meets the pre-defined threshold of similarity, a processor identifies an intention of a set of intentions stored in the local database corresponding to the user query or command. A processor identifies a response of a set of responses in the local database corresponding to the intention. A processor outputs the response audibly.
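The abstract leaves the phonetic algorithm and threshold unspecified. A minimal sketch, assuming classic Soundex as the phonetic code and a hypothetical word-level similarity threshold (the names `soundex` and `phonetic_match` are illustrative, not from the patent):

```python
def soundex(word):
    """Classic Soundex code: first letter plus three digits."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    if not word:
        return ""
    digits, prev = [], codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":  # h and w do not reset the previous code
            prev = code
    return (word[0].upper() + "".join(digits) + "000")[:4]

def phonetic_match(query, stored, threshold=0.75):
    """Return the stored query/command whose word-by-word Soundex codes
    best match the transcription, if the match fraction meets the
    threshold; otherwise None (i.e., no local response is found)."""
    q_codes = [soundex(w) for w in query.split()]
    best, best_score = None, 0.0
    for cmd in stored:
        c_codes = [soundex(w) for w in cmd.split()]
        hits = sum(1 for a, b in zip(q_codes, c_codes) if a == b)
        score = hits / max(len(q_codes), len(c_codes))
        if score > best_score:
            best, best_score = cmd, score
    return best if best_score >= threshold else None
```

A mis-transcription such as "lites" for "lights" still clears the threshold here because the remaining words match exactly, which is the point of comparing phonetically rather than literally.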
Text-to-speech from media content item snippets
A text-to-speech engine creates audio output that includes synthesized speech and one or more media content item snippets. The input text is obtained and partitioned into text sets. A track having lyrics that match a part of one of the text sets is identified. The portion of the track's audio that contains the matching lyric is located and extracted based on forced-alignment data. The extracted audio is combined with synthesized speech corresponding to the remainder of the input text to form the audio output.
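Forced alignment yields per-word timestamps; locating the snippet then reduces to finding the phrase in the aligned word sequence. A sketch, assuming alignment data as `(word, start_sec, end_sec)` tuples (the representation is an assumption, not from the patent):

```python
def lyric_span(alignment, phrase):
    """alignment: list of (word, start_sec, end_sec) from forced alignment.
    Return the (start, end) time span of the first occurrence of the
    phrase's words, or None if the lyric is not present."""
    words = phrase.lower().split()
    tokens = [w.lower() for w, _, _ in alignment]
    for i in range(len(tokens) - len(words) + 1):
        if tokens[i:i + len(words)] == words:
            return alignment[i][1], alignment[i + len(words) - 1][2]
    return None
```

The returned span would then drive extraction of that slice of the track's audio for splicing with the synthesized speech.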
Speech endpointing
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing are described. In one aspect, a method includes the action of accessing voice query log data that includes voice queries spoken by a particular user. The actions further include determining, based on that voice query log data, a pause threshold for the particular user. The actions further include receiving, from the particular user, an utterance. The actions further include determining that the particular user has stopped speaking for at least a period of time equal to the pause threshold. The actions further include, based on that determination, processing the utterance as a voice query.
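The patent does not say how the threshold is derived from the logs; a minimal sketch, assuming the logs yield intra-query pause durations in seconds and using a hypothetical 95th-percentile rule with clamping (the constants and function names are illustrative):

```python
def pause_threshold(pauses, default=0.7, lo=0.3, hi=2.0):
    """Derive a per-user endpoint threshold from logged intra-query
    pauses: roughly the 95th percentile, clamped to [lo, hi] seconds.
    With too little history, fall back to a global default."""
    if len(pauses) < 10:
        return default
    pauses = sorted(pauses)
    p95 = pauses[int(0.95 * (len(pauses) - 1))]
    return min(max(p95, lo), hi)

def utterance_complete(last_speech_time, now, threshold):
    """Endpoint the utterance once silence has lasted the full threshold."""
    return (now - last_speech_time) >= threshold
```

A fast talker whose logged pauses are all short gets a low threshold (snappier responses); a deliberate talker gets a longer one (fewer premature cutoffs), which is the personalization the abstract describes.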
METHOD AND SYSTEM FOR ANALYZING, CLASSIFYING, AND NODE-RANKING CONTENT IN AUDIO TRACKS
In one embodiment, a computer-implemented method is disclosed. The method includes receiving a first content item, transcribing audio included in the first content item to obtain text associated with the audio, determining a plurality of keywords included in the text, classifying, based on the plurality of keywords, the text as one or more nodes in a data structure, and ranking, based on a plurality of factors, the one or more nodes relative to one or more other nodes associated with a second content item.
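A toy sketch of the classify-and-rank step, assuming a hypothetical keyword taxonomy and using keyword frequency as a single stand-in for the patent's "plurality of factors" (all names and keywords here are illustrative):

```python
from collections import Counter

# Hypothetical taxonomy: node name -> seed keywords. The patent does not
# specify how nodes or their keywords are defined.
TAXONOMY = {
    "sports": {"game", "score", "team"},
    "finance": {"stock", "market", "earnings"},
}

def classify(text):
    """Score each taxonomy node by keyword occurrences in the transcript."""
    words = Counter(text.lower().split())
    return {node: sum(words[k] for k in kws) for node, kws in TAXONOMY.items()}

def rank(items, node):
    """Order (item_id, transcript) pairs under one node by keyword score,
    highest first. A real ranker would blend several factors."""
    return sorted(items, key=lambda it: classify(it[1])[node], reverse=True)
```

A content item can score under multiple nodes at once, which matches the abstract's "one or more nodes" phrasing.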
SYSTEMS AND METHODS FOR TRANSFORMING DIGITAL AUDIO CONTENT INTO VISUAL TOPIC-BASED SEGMENTS
A system for platform-independent visualization of audio content, in particular audio tracks, utilizing a central computer system in communication with user devices via a computer network. The central system utilizes various algorithms to identify spoken content in audio tracks and selects visual assets associated with the identified content. Thereafter, a visualized audio track is available for users to listen to and view. Audio tracks, for example podcasts, may be segmented into topical audio segments based upon themes or topics, with segments from disparate podcasts combined into a single listening experience based upon certain criteria, e.g., topics, themes, keywords, and the like.
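The segmentation algorithm is not disclosed; a crude sketch of one common approach, breaking a transcript where adjacent sentences share too little vocabulary (the threshold and function name are assumptions, and real systems use far richer topic models):

```python
def segment_by_overlap(sentences, min_overlap=0.1):
    """Start a new topical segment whenever the Jaccard word overlap
    between adjacent transcript sentences falls below min_overlap."""
    segments, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        a, b = set(prev.lower().split()), set(cur.lower().split())
        overlap = len(a & b) / max(len(a | b), 1)
        if overlap < min_overlap:
            segments.append(current)  # topic shift: close the segment
            current = [cur]
        else:
            current.append(cur)
    segments.append(current)
    return segments
```

Each resulting segment could then be tagged with a theme and matched to visual assets, or recombined with segments from other podcasts as the abstract describes.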
User interfaces and associated systems and processes for sharing portions of content items
In some embodiments, a first electronic device shares a portion of a content item corresponding to a respective portion of lyrics for the content item with a second electronic device. In some embodiments, a system facilitates securely sharing a portion of a content item between two devices. In some embodiments, an electronic device generates a server address for accessing a portion of a content item. In some embodiments, a uniform resource locator includes an identification of content, an indication of a first subset of the content, and authentication data.
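The last embodiment maps naturally onto a signed URL. A sketch, assuming the content subset is indicated by millisecond offsets and the authentication data is a truncated HMAC (the secret, domain, and field names are all hypothetical):

```python
import hashlib
import hmac
import urllib.parse

SECRET = b"hypothetical-shared-secret"  # server-side signing key (assumption)

def share_url(content_id, start_ms, end_ms):
    """Build a share link carrying an identification of the content, an
    indication of the subset (start/end offsets), and authentication
    data in the form of a truncated HMAC over those fields."""
    payload = f"{content_id}:{start_ms}:{end_ms}".encode()
    token = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()[:16]
    query = {"id": content_id, "start": start_ms, "end": end_ms, "token": token}
    return "https://media.example.com/share?" + urllib.parse.urlencode(query)
```

Because the token covers the offsets, a recipient cannot widen the shared portion by editing the URL: any change invalidates the HMAC when the server recomputes it.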
TECHNOLOGIES FOR CREATING, ALTERING, AND PRESENTING MEDIA CONTENT
Different types of media experiences can be developed based on characteristics of the consumer. “Linear” experiences may require execution of a pre-built script, although the script could be dynamically modified by a media production platform. Linear experiences can include guided audio tours that are modified or updated based on the location of the consumer. “Enhanced” experiences include conventional media content that is supplemented with intelligent media content. For example, turn-by-turn directions could be supplemented with audio descriptions about the surrounding area. “Freeform” experiences, meanwhile, are those that can continually morph based on information gleaned from a consumer. For example, a radio station may modify what content is being presented based on the geographical metadata uploaded by a computing device associated with the consumer.
Proximity based audio collaboration
A method includes: defining, by a computer device, an audio collaborative environment; defining, by the computer device, an access control of the audio collaborative environment, wherein the access control includes a geofence; receiving, by the computer device, a request from at least one user device to connect to the audio collaborative environment; determining, by the computer device, the at least one user device satisfies the access control; connecting, by the computer device, the at least one user device to an audio channel of the audio collaborative environment; recording, by the computer device, audio data transmitted on the audio channel by the at least one user device; storing, by the computer device, the audio data in a record; tagging, by the computer device, respective portions of the audio data in the record; and presenting one of the respective portions of the audio data to a user based on the tagging.
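The geofence check in the access control reduces to a great-circle distance test against a circular fence. A minimal sketch, assuming a circular geofence defined by a center and radius (the patent does not limit the fence to this shape):

```python
import math

def in_geofence(lat, lon, center_lat, center_lon, radius_m):
    """Return True if (lat, lon) lies within radius_m meters of the
    fence center, using the haversine great-circle distance."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat), math.radians(center_lat)
    dp = math.radians(center_lat - lat)
    dl = math.radians(center_lon - lon)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a)) <= radius_m
```

A user device reporting coordinates inside the fence would satisfy this part of the access control and be eligible to connect to the audio channel.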
STRUCTURING AUDIO SESSION DATA WITH INDEPENDENTLY QUERYABLE SEGMENTS FOR EFFICIENT DETERMINATION OF HIGH VALUE CONTENT AND/OR GENERATION OF RECOMBINANT CONTENT
This disclosure relates generally to data processing devices and, more particularly, to a method, a device, and/or a system of structuring audio session data with independently queryable segments for efficient determination of high value content and/or generation of recombinant content. In one embodiment, a system for analyzing use of audio files to determine high value content includes a database server storing a data container referencing a segment data comprising an audio data and a segment UID that is independently addressable with a database query. A playback manager receives a playback request and streams the audio data to a device of a user. An interest marker engine receives an interest notification including a first audio time point and a second audio time point and generates an interest marker. An analytics server then generates an insight data from the interest marker and stores the insight data in association with the segment data.
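A sketch of the data model this implies: segments with independently addressable UIDs, interest markers spanning two time points, and a trivial aggregation standing in for the analytics server's insight data (all class and function names are illustrative):

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Segment:
    """Independently addressable unit of an audio session."""
    audio_ref: str
    segment_uid: str = field(default_factory=lambda: uuid.uuid4().hex)

@dataclass
class InterestMarker:
    """A listener-flagged span between a first and second audio time point."""
    segment_uid: str
    t_start: float
    t_end: float

def insight(markers):
    """Aggregate markers per segment as (marker count, total flagged
    seconds) -- a simple proxy for identifying high value content."""
    out = {}
    for m in markers:
        count, secs = out.get(m.segment_uid, (0, 0.0))
        out[m.segment_uid] = (count + 1, secs + (m.t_end - m.t_start))
    return out
```

Because each segment carries its own UID, such aggregates can be queried per segment, and highly flagged segments could be recombined into new content as the abstract suggests.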
SYSTEMS AND METHODS FOR REMOTELY INTERACTING WITH PERFORMERS AND INFLUENCING LIVE EVENTS
A computer-implemented method of remotely influencing a performer at a live event is disclosed herein. The method, for example, can be implemented via a customer device including a display, a processor, and a memory configured to store an application that, when executed by the processor, causes the processor to perform the method. The method can include displaying a graphical user interface configured to receive user inputs, receiving a first user input including a user request for the performer to take a user-specified action during the live event, displaying terms and conditions associated with the user request, receiving a second user input including a user acceptance of the terms and conditions associated with the user request, and transmitting the user request to a performer device upon receiving the user acceptance of the terms and conditions associated with the user request.