Patent classifications
G10L15/00
Hotword-based speaker recognition
Systems, methods performed by data processing apparatus and computer storage media encoded with computer programs for receiving an utterance from a user in a multi-user environment, each user having an associated set of available resources, determining that the received utterance includes at least one predetermined word, comparing speaker identification features of the uttered predetermined word with speaker identification features of each of a plurality of previous utterances of the predetermined word, the plurality of previous predetermined word utterances corresponding to different known users in the multi-user environment, attempting to identify the user associated with the uttered predetermined word as matching one of the known users in the multi-user environment, and based on a result of the attempt to identify, selectively providing the user with access to one or more resources associated with a corresponding known user.
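The matching step described above can be sketched as a nearest-neighbor comparison over speaker feature vectors. This is a hedged illustration only: the feature vectors, user names, and similarity threshold below are invented, and a real system would use learned speaker embeddings rather than hand-written lists.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def identify_speaker(hotword_features, enrolled_users, threshold=0.8):
    """Return the best-matching known user, or None when no match is confident."""
    best_user, best_score = None, threshold
    for user, stored in enrolled_users.items():
        score = cosine_similarity(hotword_features, stored)
        if score > best_score:
            best_user, best_score = user, score
    return best_user

# Invented per-user hotword feature vectors for two known users.
enrolled = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.4]}
print(identify_speaker([0.85, 0.15, 0.25], enrolled))  # a confident match grants access
```

When no enrolled user scores above the threshold, `identify_speaker` returns `None`, corresponding to the "selectively providing access" branch where access is withheld.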
Speech recognition technology to improve retail store checkout
Systems and methods for using speech recognition technologies to facilitate retail store checkout are disclosed. According to certain aspects, an electronic device may detect a user's speech and analyze the speech to identify a set of matching items that may correspond to items being purchased by a customer. The electronic device may display, via a user interface, the set of matching items as well as a code or identification associated with the set of matching items. The user may interface with a point of sale system to input a code for a desired item, and the point of sale system may add the desired item to an order and may facilitate a checkout for the order.
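The item-matching flow above can be sketched as a lookup of the recognized transcript against an item catalog, returning each candidate item with the code a cashier could enter at the point of sale. This is not the patented implementation; the catalog entries and codes are invented for the example.

```python
# Invented catalog mapping point-of-sale codes to item names.
CATALOG = {
    "1001": "organic bananas",
    "1002": "whole milk",
    "1003": "wheat bread",
}

def match_items(transcript, catalog=CATALOG):
    """Return (code, name) pairs for catalog items mentioned in the transcript."""
    spoken = transcript.lower()
    return [(code, name) for code, name in catalog.items() if name in spoken]

print(match_items("two bunches of organic bananas and a gallon of whole milk"))
```

A production system would use fuzzy matching against the speech-recognition output rather than exact substring containment.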
Cognitive analysis for speech recognition using multi-language vector representations
A method, system, and computer program product for speech recognition using multiple languages include receiving, by one or more processors, an input from a user, the input including a sentence in a first language. The one or more processors translate the sentence into a plurality of languages different from the first language and create vectors associated with the plurality of languages, each vector including a representation of the sentence in one of the plurality of languages. The one or more processors calculate eigenvectors for each vector associated with a language in the plurality of languages and, based on the calculated eigenvectors, assign a score to each of the plurality of languages according to its relevance for determining a meaning of the sentence.
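One loose way to realize eigenvector-based language scoring is to build a language-to-language similarity matrix from the vector representations of the translations and score each language by its component in that matrix's dominant eigenvector (a form of eigenvector centrality). This is a speculative sketch, not the patented method; the language codes and sentence vectors are invented.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Similarity between two sentence representations.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def dominant_eigenvector(matrix, iterations=100):
    """Power iteration on a square matrix; returns a unit-norm eigenvector."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        n = math.sqrt(dot(w, w))
        v = [x / n for x in w]
    return v

# Invented vector representations of the same sentence in three languages.
reps = {
    "es": [0.9, 0.2, 0.1],
    "fr": [0.8, 0.3, 0.2],
    "de": [0.1, 0.1, 0.9],
}
langs = list(reps)
sim = [[cosine(reps[a], reps[b]) for b in langs] for a in langs]
scores = dict(zip(langs, dominant_eigenvector(sim)))
print(max(scores, key=scores.get))  # language scored most relevant
```

Languages whose representations agree with most others get the largest eigenvector components, so an outlier translation receives a low relevance score.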
System and method for editing transcriptions with improved readability and correctness
Disclosed are a computer-implemented method, system, and platform for improving the readability and/or coherency of a conversation transcript, which include applying a speech disfluency detection model to identify speech disfluencies in a text transcript and providing a corrected and/or annotated version of the conversation transcript indicating the edits made vis-à-vis the input text transcript.
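A toy rule-based pass illustrates the correct-and-annotate idea: flag common fillers and immediate word repetitions, then return the cleaned transcript together with a list of the edits made. The patent describes a learned disfluency model; the filler list and annotation format here are invented stand-ins.

```python
import re

FILLERS = {"um", "uh", "er", "you know", "i mean"}  # illustrative filler set

def clean_transcript(text):
    """Return (cleaned_text, edits): a corrected transcript plus edit annotations."""
    edits = []
    # Remove filler words/phrases (longest first so phrases win over words).
    for filler in sorted(FILLERS, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(filler) + r"\b,?\s*", re.IGNORECASE)
        if pattern.search(text):
            edits.append(f"removed filler: {filler!r}")
            text = pattern.sub("", text)
    # Collapse immediate word repetitions ("the the" -> "the").
    repeated = re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE)
    if repeated.search(text):
        edits.append("collapsed repeated words")
        text = repeated.sub(r"\1", text)
    return text.strip(), edits

cleaned, edits = clean_transcript("um, I think the the meeting is, uh, at noon")
print(cleaned)
```

The returned `edits` list plays the role of the annotations indicating what changed relative to the input transcript.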
REAL TIME CORRECTION OF ACCENT IN SPEECH AUDIO SIGNALS
Systems and methods for real-time correction of an accent in a speech audio signal are provided. A method includes dividing the speech audio signal into a stream of input chunks, an input chunk from the stream including a pre-defined number of frames of the speech audio signal; extracting, by an acoustic features extraction module, acoustic features from the input chunk and a context associated with the input chunk, the context being a pre-determined number of the frames preceding the input chunk in the stream; extracting, by a linguistic features extraction module, linguistic features from the input chunk and the context; receiving a speaker embedding for a human speaker; providing the speaker embedding, the acoustic features, and the linguistic features to a synthesis module to generate a mel-spectrogram with a reduced accent; and providing the mel-spectrogram to a vocoder to generate an output chunk of an output audio signal.
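The chunking scheme above can be sketched as splitting a frame sequence into fixed-size input chunks, each paired with the preceding frames as context. The chunk and context sizes below are arbitrary, not taken from the patent.

```python
def chunk_with_context(frames, chunk_size, context_size):
    """Return (context, chunk) pairs; context is the frames just before each chunk."""
    pairs = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        context = frames[max(0, start - context_size):start]  # empty for the first chunk
        pairs.append((context, chunk))
    return pairs

frames = list(range(10))  # stand-in for audio frames
pairs = chunk_with_context(frames, chunk_size=4, context_size=2)
print(pairs)
```

Carrying a short context window with each chunk lets the feature extractors see across chunk boundaries while keeping latency bounded, which is what makes the real-time streaming operation possible.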
Event-based speech interactive media player
Interactive content containing audio or video may be provided in conjunction with non-interactive content containing audio or video to enhance user engagement with and interest in the content and to increase the effectiveness of the distributed information. Interactive content may be inserted directly into the existing, non-interactive content. Additionally or alternatively, interactive content may be streamed in parallel to the existing content, with minimal modification to the existing content. For example, the server may monitor content from a content provider; detect an event (e.g., a marker embedded in the content stream, or in a data source external to the content stream); and, upon detecting the event, play interactive content at a designated time while silencing the content stream of the content provider (e.g., by muting, pausing, or playing silence). The marker may be a sub-audible tone or metadata associated with the content stream. The user may respond to the interactive content by voice.
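The event flow above reduces to: scan the content stream for markers and, at each marker, silence the main stream, play the interactive clip, then resume. The stream items and marker token below are invented for the example; a real marker would be a sub-audible tone or stream metadata.

```python
def play_with_interactive(stream, interactive_clip):
    """Return a playback log: main segments with interactive inserts at markers."""
    log = []
    for item in stream:
        if item == "<MARKER>":          # invented stand-in for an embedded event marker
            log.append("mute main stream")
            log.append(f"play interactive: {interactive_clip}")
            log.append("resume main stream")
        else:
            log.append(f"play: {item}")
    return log

log = play_with_interactive(["song A", "<MARKER>", "song B"], "poll: rate this show")
print(log)
```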
Systems and methods for providing media based on a detected language being spoken
Various embodiments provide media based on a detected language being spoken. In one embodiment, the system electronically detects which language of a plurality of languages is being spoken by a user, such as during a conversation or while giving a voice command to the television. Based on which language is being spoken by the user, the system electronically presents media to the user in the detected language. For example, the media may be television channels and/or programs in the detected language, and/or a program guide, such as a pop-up menu, listing media in the detected language.
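Once the spoken language is detected, the presentation step is essentially a filter over the program guide. This sketch assumes a detector has already produced a language code; the guide entries are invented, and a real system would drive the filter from an audio language-identification model.

```python
# Invented program guide entries tagged with a language code.
GUIDE = [
    {"title": "Noticias de la noche", "language": "es"},
    {"title": "Evening News", "language": "en"},
    {"title": "Telenovela clásica", "language": "es"},
]

def media_for_language(detected_language, guide=GUIDE):
    """Return titles of guide entries in the detected language."""
    return [entry["title"] for entry in guide if entry["language"] == detected_language]

print(media_for_language("es"))  # e.g. populate a pop-up menu with these titles
```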
VOICE TRANSLATION AND VIDEO MANIPULATION SYSTEM
A communication modification system including an audio gathering unit that gathers an audio stream, a language detection unit that converts the audio stream into text, where the language detection unit correlates portions of the text with audio portions of the audio stream, and the language detection unit determines first and second deviations in a portion of the audio stream based on the corresponding text portion and audio portion gathered by the audio gathering unit.
INTERACTIVE DATA SYSTEM AND PROCESS
A computer system for remote interactive graphical display and data management includes a data storage device storing data records; a remote data acquisition computer configured to selectively trigger display actions for the data records based on at least a time-based rule and a time-independent rule; a classification engine configured to classify a response, received from a remote display interface having user-selectable options arranged to define a scale of values, into one of two categories, a first category and a second category, with responses on the scale below a first threshold value being classified in the first category and responses on the scale above a second threshold value being classified in the second category; and a display interface generator configured to selectively generate a supplemental interface or a conclusion message depending on the category.
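The two-threshold classification above can be sketched directly: responses below the first threshold fall in the first category, responses above the second threshold in the second, and anything in between is left unclassified. The threshold values and category actions are arbitrary illustrations, not from the patent.

```python
def classify_response(value, low_threshold=4, high_threshold=7):
    """Classify a scale response into one of two categories, or None in between."""
    if value < low_threshold:
        return "first"    # e.g. triggers the supplemental interface
    if value > high_threshold:
        return "second"   # e.g. triggers the conclusion message
    return None           # ambiguous middle band; no display action

print(classify_response(2), classify_response(9), classify_response(5))
```

Using two separate thresholds rather than one leaves a deliberate dead band in the middle of the scale, so the system only acts on clearly positive or clearly negative responses.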