Patent classifications
G10L15/07
DOCUMENTATION SYSTEM BASED ON DYNAMIC SEMANTIC TEMPLATES
A computer system and method transcribe a spoken dialogue, such as a dialogue between a physician and a patient, into a document, such as a clinical note. As the document is generated, if content is detected in the dialogue which corresponds to a content template, the content template is inserted into the document. Fields in the content template may also be filled using information from the dialogue and/or information external to the dialogue.
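As a rough illustration of the template step described in this abstract, the sketch below inserts a template when a trigger phrase appears in an utterance and fills its fields from supplied values. The `TEMPLATES` table, the trigger phrase, the field names, and the `apply_templates` function are all hypothetical; the abstract does not say how content is detected or how fields are populated.

```python
from string import Template

# Hypothetical trigger phrase -> content template (not taken from the patent).
TEMPLATES = {
    "blood pressure": Template("Blood pressure: $systolic/$diastolic mmHg"),
}

def apply_templates(utterance, fields):
    """Return the filled templates whose trigger phrase occurs in the utterance.

    `fields` may come from the dialogue or from an external source.
    Fields without a value are left as placeholders by safe_substitute.
    """
    text = utterance.lower()
    filled = []
    for trigger, template in TEMPLATES.items():
        if trigger in text:
            filled.append(template.safe_substitute(fields))
    return filled
```

Substring matching stands in here for whatever semantic detection a real system would use.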
SYSTEM AND METHOD FOR EXTRACTING AND DISPLAYING SPEAKER INFORMATION IN AN ATC TRANSCRIPTION
A system for extracting speaker information in an ATC transcription and displaying the speaker information on a graphical display unit is provided. The system is configured to: segment a stream of audio received from an ATC and other aircraft into a plurality of chunks; determine, for each chunk, if the speaker is enrolled in an enrolled speaker database; when the speaker is enrolled in the enrolled speaker database, decode the chunk using a speaker-dependent automatic speech recognition (ASR) model and tag the chunk with a permanent name for the speaker; when the speaker is not enrolled in the enrolled speaker database, assign a temporary name for the speaker, tag the chunk with the temporary name, and decode the chunk using a speaker independent speech recognition model; format the decoded chunk as text; and signal the graphical display unit to display the formatted text along with an identity for the speaker.
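The enrolled versus unenrolled branching in this abstract can be sketched as follows. This is a minimal illustration that assumes an upstream speaker-identification step has already produced a key for each chunk; the `SpeakerRouter` class, the `"Speaker N"` naming scheme, and the model labels are hypothetical and only show the control flow, not the decoding itself.

```python
from dataclasses import dataclass, field

@dataclass
class SpeakerRouter:
    # Hypothetical enrolled speaker database: speaker key -> permanent name.
    enrolled: dict
    # Temporary names handed out to unenrolled speakers, keyed the same way.
    temp_names: dict = field(default_factory=dict)

    def route(self, speaker_key):
        """Return (display name, model kind) for the chunk's speaker."""
        if speaker_key in self.enrolled:
            # Enrolled: permanent name and speaker-dependent ASR model.
            return self.enrolled[speaker_key], "speaker_dependent"
        if speaker_key not in self.temp_names:
            # Unenrolled: assign a temporary name on first sight.
            self.temp_names[speaker_key] = f"Speaker {len(self.temp_names) + 1}"
        # Unenrolled: temporary name and speaker-independent ASR model.
        return self.temp_names[speaker_key], "speaker_independent"
```

A caller would decode each chunk with the model kind returned by `route` and tag the resulting text with the returned name before display.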
SYSTEM AND METHOD FOR FEDERATED, CONTEXT-SENSITIVE, ACOUSTIC MODEL REFINEMENT
A system and method for federated, context-sensitive, acoustic model refinement comprising a federated language model server and a plurality of edge devices. The federated language model server may comprise one or more machine learning models trained and developed centrally on the server, and distribute these one or more machine learning models to edge devices wherein they may be operated locally on the edge devices. The edge devices may gather or generate context data that can be used by a speech recognition engine, and the local language models contained therein, to develop adaptive, context-sensitive, user-specific language models. Periodically, the federated language model server may select a subset of edge devices from which to receive uploaded local model parameters, that may be aggregated to perform central model updates wherein the updated model parameters may then be sent back to edge devices in order to update the local model parameters.
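The periodic update round described in this abstract can be sketched as below. The abstract does not state how the uploaded parameters are aggregated, so the example-count-weighted average used here (in the style of federated averaging) is an assumption, as are the function names `select_devices` and `aggregate`.

```python
import random

def select_devices(device_ids, k, seed=None):
    """Pick a subset of k edge devices to upload local parameters this round."""
    rng = random.Random(seed)
    return rng.sample(list(device_ids), k)

def aggregate(updates):
    """Combine local parameters into one central update.

    `updates` is a list of (params, num_examples) pairs, where params is a
    list of floats of equal length. Weighting by num_examples is an assumed
    rule, not one stated in the abstract.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(params[i] * n for params, n in updates) / total for i in range(dim)]
```

The aggregated parameters would then be sent back to the edge devices to replace their local parameters.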
Filtering directive invoking vocal utterances
Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: receiving, from a user, voice data defining a candidate directive invoking vocal utterance for invoking a directive to execute a first text based command to perform a first computer function of a computer system, wherein the candidate directive invoking vocal utterance includes at least one word or phrase of the text based command, wherein the computer system is configured to perform the first computer function in response to the first text based command and wherein the computer system is configured to perform a second computer function in response to a second text based command; determining, based on machine logic, whether a word or phrase of the candidate vocal utterance sounds confusingly similar to a speech rendering of a word or phrase defining the second text based command.
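The confusability check in this abstract might be approximated as in the sketch below. It compares spellings with `difflib.SequenceMatcher` as a crude stand-in for an acoustic or phonetic comparison, which the abstract leaves unspecified; the `confusable_with` function and the 0.8 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

def confusable_with(candidate, other_commands, threshold=0.8):
    """Return the commands whose text is at least `threshold` similar to the candidate.

    Text similarity is only a proxy for "sounds confusingly similar";
    a real system would likely compare phonetic or acoustic renderings.
    """
    cand = candidate.lower().strip()
    hits = []
    for command in other_commands:
        ratio = SequenceMatcher(None, cand, command.lower().strip()).ratio()
        if ratio >= threshold:
            hits.append(command)
    return hits
```

A candidate utterance that returns a non-empty list would be filtered out or flagged before being registered as a directive.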
End-To-End Speech Diarization Via Iterative Speaker Embedding
A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.
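The iterative selection loop in this abstract can be illustrated with a toy sketch. In place of the learned probability of "a single new speaker", it uses a hand-written score based on cosine similarity to the embeddings already selected, and it predicts voice activity by thresholding the same similarity. Both stand-ins, and all function names and the threshold, are assumptions; only the loop structure follows the abstract.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def new_speaker_score(h, selected):
    """Toy stand-in for the learned new-speaker probability."""
    if not any(h):
        return 0.0  # treat an all-zero embedding as silence
    if not selected:
        return 1.0
    return max(0.0, 1.0 - max(cosine(h, s) for s in selected))

def select_speaker_embeddings(embeddings, num_speakers):
    """One iteration per speaker: pick the temporal embedding with the top score."""
    selected = []
    for _ in range(num_speakers):
        scores = [new_speaker_score(h, selected) for h in embeddings]
        t = max(range(len(embeddings)), key=scores.__getitem__)
        selected.append(embeddings[t])
    return selected

def voice_activity(embeddings, selected, threshold=0.5):
    """Per time step, one boolean activity indicator per selected speaker."""
    return [[cosine(h, s) > threshold for s in selected] for h in embeddings]
```

With four time steps such as `[[1, 0], [0, 0], [0, 1], [1, 0]]` and two speakers, the loop picks the first and third embeddings, and `voice_activity` marks each non-silent step for the matching speaker.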
Assisting Users with Efficient Information Sharing among Social Connections
In one embodiment, a method includes receiving a user input from a first user at the first client system, determining that the user input is a sharing request to share content, determining multiple second users the sharing request is directed to, determining, for each second user, modalities associated with the respective second user based on the content, a user profile associated with the respective second user, and modalities supported by a second client system the respective second user is currently engaged with, the respective second user being associated with two or more second client systems, and sending, to one or more second client systems currently associated with the second users, instructions for accessing the content based on the determined modalities for each second user.
Client device, information processing system, storage medium, and information processing method
Provided is a client device that includes a detection unit that detects an act of expressing gratitude by a user, a communication unit that transmits and receives at least a first portion of virtual currency, and a control unit that performs control, with recognition of an act of expressing gratitude by a first user on the basis of data detected, such that a certain amount corresponding to the act in virtual currency held by the first user is subtracted and a first portion of the certain amount of the virtual currency is managed as gratitude currency held by the first user.