Patent classifications
G10L15/005
METHOD AND SYSTEM FOR EVALUATING AND IMPROVING LIVE TRANSLATION CAPTIONING SYSTEMS
Methods, systems, and apparatus, including computer programs encoded on computer storage media for evaluating and improving live translation captioning systems. An exemplary method includes: displaying a word in a first language; receiving a first audio sequence, the first audio sequence comprising a verbal description of the word; generating a first translated text in a second language; displaying the first translated text; receiving a second audio sequence, the second audio sequence comprising a guessed word based on the first translated text; generating a second translated text in the first language; determining a matching score between the word and the second translated text; determining a performance score of the live translation captioning system based on the matching score.
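The abstract does not say how the matching score or the performance score is computed. As a minimal sketch, assuming a character-level similarity ratio for the matching score and a plain average over game rounds for the performance score (both are illustrative choices, and the function names are hypothetical), the final two steps could look like this:

```python
from difflib import SequenceMatcher


def matching_score(word: str, second_translated_text: str) -> float:
    """Similarity in [0, 1] between the displayed word and the guess
    after it has been translated back into the first language.
    Assumption: a case-insensitive character similarity ratio."""
    a = word.strip().lower()
    b = second_translated_text.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def performance_score(rounds) -> float:
    """Assumption: the system-level score is the mean matching score
    over (word, second_translated_text) pairs."""
    scores = [matching_score(word, guess) for word, guess in rounds]
    return sum(scores) / len(scores) if scores else 0.0
```

A stricter variant could use exact match only; the ratio is used here so that near misses caused by small recognition or translation errors still earn partial credit.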
SYSTEMS, METHODS AND INTERFACES FOR MULTILINGUAL PROCESSING
Systems are provided for multilingual speech data processing. A language identification module is configured to analyze spoken utterances in an audio stream and to detect at least one language corresponding to the spoken language utterances. The language identification module detects that a first language corresponds to the first portion of the audio stream. A first transcription of the first portion of the audio stream in the first language is generated and stored in a cache. A second transcription of a second portion of the audio stream in the first language is also generated and stored. When the second portion of the audio stream corresponds to a second language, a third transcription is generated in the second language using a second speech recognition engine configured to transcribe spoken language utterances in the second language. Then, the second transcription is replaced with the third transcription in the cache and any displayed instances.
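The replace-in-cache behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the claimed system: `TranscriptCache`, `process_segment`, the `detect_language` callable, and the `recognizers` mapping are all hypothetical names, and the speech recognition engines are stood in for by plain functions.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]


class TranscriptCache:
    """Maps a segment id to its current transcription and language."""

    def __init__(self) -> None:
        self._entries: Dict[int, dict] = {}

    def store(self, segment_id: int, language: str, text: str) -> None:
        # Storing under an existing id replaces the earlier transcription.
        self._entries[segment_id] = {"language": language, "text": text}

    def get(self, segment_id: int) -> dict:
        return self._entries[segment_id]


def process_segment(cache: TranscriptCache, segment_id: int, audio: bytes,
                    assumed_language: str,
                    detect_language: Callable[[bytes], str],
                    recognizers: Dict[str, Recognizer]) -> dict:
    # Transcribe provisionally in the language detected for earlier audio.
    cache.store(segment_id, assumed_language,
                recognizers[assumed_language](audio))
    # If the segment turns out to be in another language, re-transcribe
    # with that language's engine and replace the cached entry.
    detected = detect_language(audio)
    if detected != assumed_language and detected in recognizers:
        cache.store(segment_id, detected, recognizers[detected](audio))
    return cache.get(segment_id)
```

Any display layer would read from the cache, so replacing the cached entry is what lets displayed instances be updated as well.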
MULTI-FORMAT CONTENT REPOSITORY SEARCH
An audio file format of an audio portion of a natural language content is determined. Using a trained audio language identification model, a human language included in the audio portion is identified. Using a trained audio to text model trained on the human language, the audio portion is converted to a corresponding set of text data. The set of text data is indexed. Using the indexed set of text data responsive to a search query, a search result is generated, the search query specifying a search including a non-textual portion of the natural language content.
TECHNIQUES FOR LANGUAGE INDEPENDENT WAKE-UP WORD DETECTION
A method for a user device, including receiving a first acoustic input of a user speaking a wake-up word in the target language; providing a first acoustic feature derived from the first acoustic input to an acoustic model stored on the user device to obtain a first sequence of speech units corresponding to the wake-up word spoken by the user in the target language, the acoustic model trained on a corpus of training data in a source language different than the target language; receiving a second acoustic input including the wake-up word in the target language; providing a second acoustic feature derived from the second acoustic input to the acoustic model to obtain a second sequence of speech units corresponding to the wake-up word in the target language; and comparing the first and second sequences of speech units to recognize the wake-up word in the target language.
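The abstract leaves open how the two sequences of speech units are compared. One simple possibility, shown here purely as an assumption, is a length-normalized edit distance with a threshold. The function names and the threshold value are hypothetical, and the speech units are represented as plain strings.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of speech units."""
    prev = list(range(len(b) + 1))
    for i, unit_a in enumerate(a, start=1):
        curr = [i]
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def is_wake_word(enrolled: Sequence[str], candidate: Sequence[str],
                 max_normalized_distance: float = 0.25) -> bool:
    """Accept the candidate if it is close enough to the enrolled sequence.
    Assumption: distance is normalized by the longer sequence length."""
    longest = max(len(enrolled), len(candidate))
    if longest == 0:
        return False
    return edit_distance(enrolled, candidate) / longest <= max_normalized_distance
```

Because both sequences come from the same acoustic model, systematic errors from the source-language training data tend to appear in both, which is why a direct sequence comparison can still work for a target language the model was not trained on.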
Audio recognition method, device and server
An audio recognition method, including: acquiring an audio file to be recognized (S100); extracting audio feature information of the audio file to be recognized, the audio feature information including audio fingerprints (S200); searching, in a fingerprint index database, audio attribute information matched with the audio feature information, the fingerprint index database including an audio fingerprint set in which invalid audio fingerprint removal has been performed on audio sample data (S300). As the audio fingerprint set in the fingerprint index database has been subjected to invalid audio fingerprint removal of audio sample data, the storage space of audio fingerprints in the fingerprint index database can be reduced, and the audio recognition efficiency can be improved. Further provided are an audio recognition device and a server.
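The abstract does not define what makes an audio fingerprint invalid. The sketch below assumes, for illustration only, that a fingerprint is invalid when it occurs in too many tracks to discriminate between them. The index layout, the vote-counting lookup, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set


def build_index(samples: Dict[str, Iterable[int]],
                max_tracks_per_fingerprint: int = 2) -> Dict[int, Set[str]]:
    """Build fingerprint -> track ids, dropping fingerprints shared by
    more than max_tracks_per_fingerprint tracks (assumed 'invalid')."""
    index = defaultdict(set)
    for track_id, fingerprints in samples.items():
        for fingerprint in fingerprints:
            index[fingerprint].add(track_id)
    return {fp: tracks for fp, tracks in index.items()
            if len(tracks) <= max_tracks_per_fingerprint}


def recognize(index: Dict[int, Set[str]],
              query_fingerprints: Iterable[int]) -> Optional[str]:
    """Return the track with the most matching fingerprints, or None."""
    votes = Counter()
    for fingerprint in query_fingerprints:
        for track_id in index.get(fingerprint, ()):
            votes[track_id] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Pruning before lookup is what yields the two benefits the abstract names: fewer stored fingerprints, and fewer candidate tracks to score per query.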
Head mounted display device and method for providing visual aid using same
An external scene image captured by an external scene imaging electronic camera attached to a head mounted display (HMD) is projected and displayed onto an image display screen arranged in front of the eyes of the user as a virtual image with a suitable viewing distance corresponding to the visual acuity of the user. At this time, for each object image presented in the virtual image of the external scene image, the virtual image is processed and formatted to add a predetermined degree of binocular disparity and image blur to the virtual image projected and displayed on the right and left image display screens on the basis of a predetermined converted distance calculated from the real distance of each object. Thus, the user is given a sense of realistic perspective for the virtual image of the external scene, free of discomfort or unease.
URGENCY-BASED QUEUE MANAGEMENT SYSTEMS AND METHODS
Disclosed embodiments may include a queue management system. The system may receive one or more utterances comprising a customer intent from a user device, determine a first queue from a plurality of queues in which to place the user based on the user intent, and receive first urgency data comprising battery indication data from the user device. The system may then determine, using a machine learning model, a first dynamic priority score for the user based on the user intent and the first urgency data including battery indication data associated with the user device. Based on the first dynamic priority score for the user, the system may assign an initial user-specific position within the first queue to the user that differs from a default initial position in the first queue. Based on updated urgency data, the system may dynamically update the user’s position to a second user-specific position.
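The positioning logic can be illustrated with a small sketch. Note the substitution: the abstract computes the dynamic priority score with a machine learning model, whereas the code below uses a hand-written weighted sum as a stand-in. The `Caller` fields, the weights, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caller:
    user_id: str
    intent_urgency: float  # 0..1, assumed to be derived from the intent
    battery_level: float   # 0..1, reported by the user device


def priority_score(caller: Caller) -> float:
    # Stand-in for the machine learning model: a lower battery level
    # and a more urgent intent both raise the score.
    return 0.5 * caller.intent_urgency + 0.5 * (1.0 - caller.battery_level)


def enqueue(queue: List[Caller], caller: Caller) -> int:
    """Insert ahead of the first waiting caller with a lower score.
    The default position (end of the queue) applies if there is none."""
    score = priority_score(caller)
    position = len(queue)
    for i, waiting in enumerate(queue):
        if score > priority_score(waiting):
            position = i
            break
    queue.insert(position, caller)
    return position


def update_battery(queue: List[Caller], user_id: str,
                   battery_level: float) -> int:
    """Re-position a caller after updated urgency data arrives."""
    for i, caller in enumerate(queue):
        if caller.user_id == user_id:
            queue.pop(i)
            caller.battery_level = battery_level
            return enqueue(queue, caller)
    raise KeyError(user_id)
```

For example, a caller who joins at the back with a nearly full battery would move forward once `update_battery` reports that the device is close to shutting down.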
Method and system for implementing language neutral virtual assistant
In one aspect, a computerized method useful for implementing a language neutral virtual assistant includes the step of providing a language detector. The language detector comprises one or more trained language classifiers. The language detector identifies a language of an incoming message from a user to an artificially intelligent (AI) personal assistant. The method includes the step of receiving an incoming message to the AI personal assistant. The method includes the step of normalizing the incoming message, wherein normalizing the incoming message comprises a set of spelling corrections and a set of grammar corrections. The method includes the step of translating the incoming message to a specified language with a specified encoding process and a specified decoding process. The method includes the step of providing an AI personal assistant engine that comprises an artificial intelligence which conducts a conversation via auditory or textual methods. The AI personal assistant engine provides its output to a response translator. The method includes the step of providing a response translator that uses the AI personal assistant engine output to provide a response to the user.
Reducing digital assistant latency when a language is incorrectly determined
Systems and processes for operating an intelligent automated assistant are provided. An example process includes causing a first recognition result for a received natural language speech input to be displayed, where the first recognition result is in a first language and a second recognition result for the received natural language speech input is available for display responsive to receiving input indicative of user selection of the first recognition result, the second recognition result being in a second language. The example process further includes receiving the input indicative of user selection of the first recognition result and in response to receiving the input indicative of user selection of the first recognition result, causing the second recognition result to be displayed.
ASSISTANCE DURING AUDIO AND VIDEO CALLS
Implementations relate to providing information items for display during a communication session. In some implementations, a computer-implemented method includes receiving, during a communication session between a first computing device and a second computing device, first media content from the communication session. The method further includes determining a first information item for display in the communication session based at least in part on the first media content. The method further includes sending a first command to at least one of the first computing device and the second computing device to display the first information item.