G10L15/083

Lookup-Table Recurrent Language Model

A computer-implemented method includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device. The method also includes processing the audio data to determine a candidate transcription that includes a sequence of tokens for the spoken utterance. For each token in the sequence of tokens, the method includes determining a token embedding for the corresponding token, determining an n-gram token embedding for a previous sequence of n-gram tokens, and concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token. The method also includes rescoring the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.
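The per-token concatenation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the hashed n-gram lookup table, the bucket count, the embedding sizes, the `<s>` padding token and the randomly initialized tables are all hypothetical stand-ins for trained parameters, and the recurrent rescoring network that would consume the concatenated outputs is omitted.

```python
import hashlib
import random

def ngram_bucket(history, num_buckets):
    """Map a tuple of previous tokens to a row of the (hypothetical) n-gram lookup table."""
    key = "\x1f".join(history).encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % num_buckets

class LookupTableEmbedder:
    def __init__(self, vocab, token_dim=4, ngram_dim=3, num_buckets=64, n=3, seed=0):
        if n < 2:
            raise ValueError("n must be at least 2 so there is a history to look up")
        rng = random.Random(seed)
        self.n = n
        self.num_buckets = num_buckets
        # Randomly initialized tables stand in for trained embeddings (assumption).
        self.token_table = {t: [rng.uniform(-1, 1) for _ in range(token_dim)] for t in vocab}
        self.ngram_table = [[rng.uniform(-1, 1) for _ in range(ngram_dim)]
                            for _ in range(num_buckets)]
        self.unk = [0.0] * token_dim  # embedding used for out-of-vocabulary tokens

    def concatenated_outputs(self, tokens):
        outputs = []
        padded = ["<s>"] * (self.n - 1) + list(tokens)
        for i, token in enumerate(tokens):
            token_emb = self.token_table.get(token, self.unk)
            # The n-1 tokens preceding position i, padded at the start of the utterance.
            history = tuple(padded[i:i + self.n - 1])
            ngram_emb = self.ngram_table[ngram_bucket(history, self.num_buckets)]
            outputs.append(token_emb + ngram_emb)  # list concatenation: [token | n-gram]
        return outputs
```

Each output vector has `token_dim + ngram_dim` entries; a rescoring model would read these vectors in sequence. Hashing the history into a fixed-size table keeps memory bounded regardless of how many distinct n-grams occur, at the cost of occasional collisions.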

EMERGENCY MODE FOR MOBILE DEVICES

Methods and apparatuses for emergency modes for a mobile device (e.g., a mobile phone) are described. The mobile device may receive sonic signals indicative of a voice of a person uttering words and identify the person as an authorized user. The mobile device may determine that the words correlate with an emergency situation from a set of predefined emergency situations. Further, the mobile device may determine that the emergency situation satisfies a threshold. The mobile device may perform one or more tasks (e.g., making a phone call to an emergency service), after determining that the emergency situation has satisfied the threshold. In some cases, the mobile device may monitor physiological signals of the person and/or collect additional data to increase a confidence level associated with making the determination. In some cases, the mobile device may leverage or include an artificial intelligence algorithm/engine to facilitate managing the emergency modes.
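The decision flow described above (authorized speaker, match against predefined emergency situations, threshold, optional physiological evidence) can be sketched like this. The situation table, trigger phrases, the 0.4-per-phrase confidence score, the 120 bpm heart-rate rule and the 0.6 threshold are all hypothetical values chosen for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical predefined emergency situations and phrases that signal them.
EMERGENCY_SITUATIONS = {
    "medical": ("chest pain", "can't breathe", "heart attack"),
    "fire": ("fire", "smoke"),
    "intruder": ("intruder", "break in"),
}

@dataclass
class Assessment:
    situation: Optional[str]
    confidence: float
    triggered: bool

def assess_emergency(words, speaker_id, authorized_ids,
                     heart_rate_bpm=None, threshold=0.6):
    """Decide whether an utterance should trigger emergency-mode tasks."""
    # Only an identified, authorized user can trigger emergency mode.
    if speaker_id not in authorized_ids:
        return Assessment(None, 0.0, False)
    text = " ".join(words).lower()
    # Correlate the words with the predefined situation that has the most phrase hits.
    best, best_hits = None, 0
    for situation, phrases in EMERGENCY_SITUATIONS.items():
        hits = sum(1 for phrase in phrases if phrase in text)
        if hits > best_hits:
            best, best_hits = situation, hits
    if best is None:
        return Assessment(None, 0.0, False)
    confidence = min(1.0, 0.4 * best_hits)
    # Hypothetical rule: an elevated heart rate raises confidence in the determination.
    if heart_rate_bpm is not None and heart_rate_bpm > 120:
        confidence = min(1.0, confidence + 0.3)
    return Assessment(best, confidence, confidence >= threshold)
```

A caller would place the emergency call or run other tasks only when `triggered` is true. The substring matching is deliberately naive; a real system would more likely use a trained classifier, as the abstract's mention of an artificial intelligence engine suggests.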

Systems and methods to briefly deviate from and resume back to amending a section of a note

Systems and methods to briefly deviate from and resume back to amending a section of a note are disclosed. Exemplary implementations may: obtain audio information representing sound captured by an audio section of a client computing platform, such sound including speech from a user associated with the client computing platform; effectuate presentation of a graphical user interface that includes sections of the note; analyze the audio information to determine which of the user's spoken inputs are primary spoken inputs and which are deviant spoken inputs; determine, based on the analysis, the section of the note to which a deviant spoken input relates; alternately amend, based on the determination, sections of the note by deviating from one section to another and then returning to the original section for continued population; and effectuate, via the user interface, presentation of the alternating amendments to the sections of the note.
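A minimal sketch of routing primary and deviant spoken inputs to note sections is shown below, assuming speech has already been transcribed. The section names and cue words are hypothetical, and simple keyword overlap stands in for whatever analysis the disclosed system actually performs.

```python
# Hypothetical note sections and the cue words that associate an utterance with each.
NOTE_SECTIONS = {
    "history": {"history", "previously", "years"},
    "medications": {"mg", "dose", "prescribed", "medication"},
    "plan": {"plan", "follow", "schedule"},
}

def classify_section(utterance, sections):
    """Return the section whose cue words overlap the utterance most, or None."""
    words = set(utterance.lower().split())
    scores = {name: len(words & cues) for name, cues in sections.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def amend_note(utterances, primary, sections=NOTE_SECTIONS):
    """Populate the primary section, briefly deviating for inputs about other sections."""
    note = {name: [] for name in sections}
    log = []  # which section each utterance amended, in order
    for utt in utterances:
        target = classify_section(utt, sections)
        if target is None or target == primary:
            note[primary].append(utt)  # primary spoken input
            log.append(primary)
        else:
            # Deviant spoken input: amend the related section; the primary section
            # stays current, so the next primary input resumes where it left off.
            note[target].append(utt)
            log.append(target)
    return note, log
```

The returned log records the alternation (for example history, medications, history), which is what a user interface would present as the sequence of amendments.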

Word lattice augmentation for automatic speech recognition
11238227 · 2022-02-01

Speech processing techniques are disclosed that enable determining a text representation of named entities in captured audio data. Various implementations include determining the location of a carrier phrase in a word lattice representation of the captured audio data, where the carrier phrase provides an indication of a named entity. Additional or alternative implementations include matching a candidate named entity with the portion of the word lattice, and augmenting the word lattice with the matched candidate named entity.
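The carrier-phrase and augmentation steps can be illustrated on a toy lattice. In this sketch the lattice is a plain list of word arcs between integer nodes, the carrier phrases and contact list are hypothetical, and fuzzy matching uses Python's `difflib.SequenceMatcher`, a swapped-in technique rather than the matcher described in the patent.

```python
import difflib
from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    start: int  # source node id
    end: int    # destination node id
    word: str

CARRIER_PHRASES = {"call", "text", "message"}  # hypothetical carrier phrases

def carrier_end_node(arcs):
    """Return the node right after the first carrier-phrase arc, if any."""
    for arc in arcs:
        if arc.word in CARRIER_PHRASES:
            return arc.end
    return None

def paths(arcs, node, final):
    """Enumerate word sequences on every path from node to final (small DAGs only)."""
    if node == final:
        yield []
        return
    for arc in arcs:
        if arc.start == node:
            for rest in paths(arcs, arc.end, final):
                yield [arc.word] + rest

def augment_lattice(arcs, final, entities, cutoff=0.6):
    start = carrier_end_node(arcs)
    if start is None:
        return list(arcs)
    best_entity, best_ratio = None, cutoff
    # Compare every hypothesis after the carrier phrase with every candidate entity.
    for words in paths(arcs, start, final):
        span = " ".join(words)
        for entity in entities:
            ratio = difflib.SequenceMatcher(None, span, entity.lower()).ratio()
            if ratio >= best_ratio:
                best_entity, best_ratio = entity, ratio
    if best_entity is None:
        return list(arcs)
    # Add the matched entity as a parallel arc spanning the named-entity region.
    return list(arcs) + [Arc(start, final, best_entity)]
```

For a lattice `call → {ann | and} → lee` and the contacts `["Anne Lee", "Bob Smith"]`, the function adds an arc labeled `Anne Lee` from the node after `call` to the final node, so a downstream decoder can choose the correctly spelled entity. Exhaustive path enumeration is only practical for small lattices.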

SYSTEM AND METHOD OF AUTOMATIC SPEECH RECOGNITION USING PARALLEL PROCESSING FOR WEIGHTED FINITE STATE TRANSDUCER-BASED SPEECH DECODING

A system, article, and method of automatic speech recognition are disclosed that use parallel processing for weighted finite state transducer-based speech decoding.

Sarcasm-sensitive spoken dialog system
11250853 · 2022-02-15

A dialog system and a method of using the dialog system are disclosed. The method may comprise: receiving audible human speech from a user; determining that the audible human speech comprises sarcasm information; providing an input to a neural network, wherein the input comprises speech data input associated with the audible human speech, an embedding vector associated with the sarcasm information, and a one-hot vector; and based on the input, determining an audible response to the human speech.
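The composite network input can be sketched as a simple concatenation. The abstract does not say what the one-hot vector encodes, so this example treats it as a hypothetical dialog-act index; the sarcasm embedding table and its values are likewise hypothetical placeholders for learned vectors.

```python
# Hypothetical learned embeddings for the detected sarcasm information.
SARCASM_EMBEDDINGS = {
    "none": [0.0, 0.0, 0.0],
    "sarcastic": [0.9, -0.4, 0.2],
}

def build_dialog_input(speech_features, sarcasm_label, act_index, num_acts):
    """Concatenate speech data, the sarcasm embedding and a one-hot vector."""
    if not 0 <= act_index < num_acts:
        raise ValueError("act_index out of range")
    one_hot = [0.0] * num_acts
    one_hot[act_index] = 1.0  # hypothetical dialog-act indicator
    return list(speech_features) + SARCASM_EMBEDDINGS[sarcasm_label] + one_hot
```

The resulting flat vector is what a downstream neural network would consume to select an audible response; keeping the three parts in a fixed order lets the network learn separate weights for each.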

ELECTRONIC DEVICE FOR PERFORMING VOICE RECOGNITION USING MICROPHONES SELECTED ON BASIS OF OPERATION STATE, AND OPERATION METHOD OF SAME
20220044670 · 2022-02-10

Various embodiments of the present invention relate to an electronic device that performs voice recognition using microphones selected on the basis of its operation state, and to an operation method of the same. According to an embodiment, the electronic device includes: one or more microphone arrays which include a plurality of microphones; at least one processor operatively connected to the microphone arrays; and at least one memory electrically connected to the processor, wherein the memory may store instructions that, when executed, cause the processor to: receive wake-up utterances for calling designated voice services by using a first group of microphones among the plurality of microphones when operating in a first state; operate in a second state in response to the wake-up utterances; and receive subsequent utterances using a second group of microphones among the plurality of microphones when operating in the second state. Various other embodiments are also possible.
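The state-dependent microphone selection can be sketched as a two-state machine. The wake phrase, the microphone group contents and the `end_session` transition back to the first state are assumptions for illustration; the abstract only describes switching from the first to the second state on a wake-up utterance.

```python
from enum import Enum

class State(Enum):
    FIRST = 1   # listening only for the wake-up utterance
    SECOND = 2  # voice service invoked: listening for subsequent utterances

class MicSelector:
    def __init__(self, first_group, second_group, wake_phrase="hi assistant"):
        self.first_group = frozenset(first_group)
        self.second_group = frozenset(second_group)
        self.wake_phrase = wake_phrase  # hypothetical wake-up phrase
        self.state = State.FIRST

    def active_microphones(self):
        # The microphone group used depends on the current operation state.
        return self.first_group if self.state is State.FIRST else self.second_group

    def on_utterance(self, text):
        if self.state is State.FIRST:
            if self.wake_phrase in text.lower():
                self.state = State.SECOND  # switch state in response to the wake-up utterance
                return "wake"
            return "ignored"
        return "command"

    def end_session(self):
        # Hypothetical: return to the first state once the dialog ends.
        self.state = State.FIRST
```

One plausible design choice is to keep the first group small to save power while idling, then enable a larger second group for better beamforming once the user is actually speaking to the device; the abstract itself does not state why the groups differ.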

Network access method and apparatus for speech recognition service based on artificial intelligence

The present disclosure discloses a network access method and a network access apparatus for a speech recognition service based on artificial intelligence. The network access method includes: determining, when a speech recognition request is received, whether available IP address information is present in an IP buffer module, in which the IP buffer module is configured to buffer the IP address information used for the most recent successful speech recognition; performing identity authentication on the available IP address information when it is present in the IP buffer module; and accessing the speech recognition service via the available IP address information that passes the identity authentication, in which the speech recognition service is configured to recognize speech in the speech recognition request.
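The buffered-address flow can be sketched with injected callables. The `authenticate`, `resolve` and `send` functions are hypothetical stand-ins; in particular, the fallback to a fresh lookup when the buffer is empty or authentication fails, and storing the address only after a successful request, are assumptions the abstract does not spell out.

```python
class IPBuffer:
    """Holds the IP address used for the most recent successful recognition."""
    def __init__(self):
        self._last_ok = None

    def get(self):
        return self._last_ok

    def store(self, ip):
        self._last_ok = ip

    def clear(self):
        self._last_ok = None

def access_service(request, buffer, authenticate, resolve, send):
    ip = buffer.get()
    # Reuse the buffered address only if it passes identity authentication.
    if ip is not None and not authenticate(ip):
        buffer.clear()
        ip = None
    if ip is None:
        ip = resolve()  # hypothetical fallback, e.g. a fresh DNS lookup
    result = send(ip, request)  # if send raises, the buffer is left unchanged
    buffer.store(ip)  # remember the address that just succeeded
    return result
```

The benefit of the buffer is that repeated requests skip the address lookup entirely; the authentication check guards against reusing an address that no longer belongs to the service.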

Communication system and method for providing advice to improve a speaking style
11398224 · 2022-07-26

A communication system includes a first terminal device, a second terminal device, and an advice providing device. The first terminal device is operated by an operator. The second terminal device is operated by a guest. The second terminal device communicates with the first terminal device through a network. The advice providing device includes circuitry that determines advice for the operator based on voice data including first voice data that is related to the operator and transmitted from the first terminal device and second voice data that is related to the guest and transmitted from the second terminal device. The circuitry of the advice providing device further transmits the advice to the first terminal device. The first terminal device receives the advice and displays it on a display.
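One way the advice step could combine operator and guest voice data is sketched below. The segment representation, the talk-share and words-per-minute metrics, their thresholds and the advice strings are all hypothetical; the abstract does not say which speaking-style features are analyzed.

```python
from dataclasses import dataclass

@dataclass
class VoiceSegment:
    speaker: str    # "operator" or "guest"
    seconds: float  # duration of the segment
    words: int      # recognized word count

def advise(segments, max_talk_share=0.7, max_wpm=180.0):
    """Return advice strings for the operator based on both parties' voice data."""
    operator = [s for s in segments if s.speaker == "operator"]
    op_time = sum(s.seconds for s in operator)
    total = sum(s.seconds for s in segments)
    advice = []
    # Compare the operator's speaking time with the whole conversation.
    if total > 0 and op_time / total > max_talk_share:
        advice.append("Let the guest speak more.")
    # Estimate the operator's speaking rate in words per minute.
    if op_time > 0:
        wpm = sum(s.words for s in operator) / (op_time / 60.0)
        if wpm > max_wpm:
            advice.append("Slow down your speaking pace.")
    return advice
```

Using the guest's voice data in the talk-share ratio is what makes the advice depend on both terminals, matching the abstract's description of advice based on first and second voice data.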

WORD LATTICE AUGMENTATION FOR AUTOMATIC SPEECH RECOGNITION
20220229992 · 2022-07-21

Speech processing techniques are disclosed that enable determining a text representation of named entities in captured audio data. Various implementations include determining the location of a carrier phrase in a word lattice representation of the captured audio data, where the carrier phrase provides an indication of a named entity. Additional or alternative implementations include matching a candidate named entity with the portion of the word lattice, and augmenting the word lattice with the matched candidate named entity.