G10L15/24

ACCIDENTAL VOICE TRIGGER AVOIDANCE USING THERMAL DATA
20230031145 · 2023-02-02

Methods and systems for processing voice commands are disclosed. A voice controlled device may receive audio data comprising a voice command. Location information indicative of the source of the audio data may be determined. One or more devices may be caused to determine signals based on the location information. The one or more devices may receive thermal data in response to the signals. The thermal data may be analyzed to determine if the thermal data indicates the presence of a person at the expected location. If a person is detected, then the audio data may be processed to cause the voice command to be executed.
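
The gating step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `ThermalReading` type, the `read_thermal` callback, the temperature range, and the `min_hits` rule are all hypothetical choices made only to show the control flow (locate the source, check thermal data there, execute only if a person appears to be present).

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Assumed range of surface temperatures (deg C) treated as "person-like".
HUMAN_TEMP_RANGE_C: Tuple[float, float] = (30.0, 40.0)


@dataclass
class ThermalReading:
    """Hypothetical container for thermal samples taken toward one bearing."""
    bearing_deg: float
    temps_c: Sequence[float]


def person_present(reading: ThermalReading,
                   temp_range: Tuple[float, float] = HUMAN_TEMP_RANGE_C,
                   min_hits: int = 3) -> bool:
    """Return True if enough samples fall inside the person-like range."""
    low, high = temp_range
    hits = sum(1 for t in reading.temps_c if low <= t <= high)
    return hits >= min_hits


def handle_voice_command(command: str,
                         source_bearing_deg: float,
                         read_thermal: Callable[[float], ThermalReading],
                         execute: Callable[[str], None]) -> bool:
    """Execute the command only if thermal data suggests a person at the source."""
    reading = read_thermal(source_bearing_deg)
    if not person_present(reading):
        return False  # likely an accidental trigger (e.g. a TV or speaker)
    execute(command)
    return True
```

A caller would supply `read_thermal` as whatever function queries the thermal sensor for a given direction; a command arriving from a direction with only room-temperature samples is dropped rather than executed.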

System to convert phonemes into phonetics-based words
11615786 · 2023-03-28

A system to convert phonemes into phonetics-based words that is implemented in one or more computing systems, in association with a system that provides required inputs is disclosed. Said system comprises a phoneme enhancer, a phoneme sequence buffer, a phoneme sequence to phonetics-based word converter that comprises a sliding window phoneme sequence matcher, a phoneme sequence to phonetics-based word custom data memory, a most frequent phonetics-based word predictive memory, a phoneme similarity matrix, and a phonetics-based word output unit.
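
One way to picture a sliding window phoneme sequence matcher that consults a phoneme similarity matrix is the Python sketch below. It is an illustration under stated assumptions, not the system claimed in the patent: the `LEXICON` entries, the `SIMILARITY` scores, the averaging score, and the `threshold` value are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical phoneme-sequence-to-word memory (ARPAbet-style symbols).
LEXICON: Dict[Tuple[str, ...], str] = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
    ("HH", "AY"): "hi",
}

# Hypothetical similarity scores for confusable phoneme pairs.
SIMILARITY: Dict[frozenset, float] = {
    frozenset(("AH", "EH")): 0.8,
    frozenset(("D", "T")): 0.7,
}


def phoneme_similarity(a: str, b: str) -> float:
    """1.0 for identical phonemes, a table lookup otherwise, else 0.0."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def window_score(window: Sequence[str], pattern: Sequence[str]) -> float:
    """Average per-position similarity between a window and a lexicon pattern."""
    return sum(phoneme_similarity(a, b) for a, b in zip(window, pattern)) / len(pattern)


def phonemes_to_words(phonemes: Sequence[str], threshold: float = 0.9) -> List[str]:
    """Slide over the phoneme stream, emitting the best-scoring word at each position."""
    words: List[str] = []
    patterns = sorted(LEXICON, key=len, reverse=True)  # longer patterns win ties
    i = 0
    while i < len(phonemes):
        best = None
        for pattern in patterns:
            window = phonemes[i:i + len(pattern)]
            if len(window) < len(pattern):
                continue
            score = window_score(window, pattern)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, pattern)
        if best is None:
            i += 1  # no word starts here; skip one phoneme
        else:
            words.append(LEXICON[best[1]])
            i += len(best[1])
    return words
```

With these assumed tables, a slightly misrecognized stream such as `["HH", "EH", "L", "OW", "W", "ER", "L", "T"]` still maps to `["hello", "world"]`, because the similarity scores absorb the AH/EH and D/T confusions.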

HEARING AID WITH VOICE RECOGNITION
20230080418 · 2023-03-16

A system for selectively amplifying audio signals may include a microphone configured to capture sounds from an environment of a user. The system may also include a processor programmed to: receive audio signals representative of the sounds captured by the microphone; cause selective conditioning of at least one audio signal received by the microphone from a region associated with the recognized individual; and cause transmission of the at least one conditioned audio signal to a hearing interface device configured to provide sound to an ear of the user.

Method, apparatus, and terminal for providing sign language video reflecting appearance of conversation partner

Disclosed is a method of providing a sign language video reflecting an appearance of a conversation partner. The method includes recognizing a speech language sentence from speech information, and recognizing an appearance image and a background image from video information. The method further comprises acquiring multiple pieces of word-joint information corresponding to the speech language sentence from a joint information database, sequentially inputting the word-joint information to a deep learning neural network to generate sentence-joint information, generating a motion model on the basis of the sentence-joint information, and generating a sign language video in which the background image and the appearance image are synthesized with the motion model. The method provides a natural communication environment between a sign language user and a speech language user.

Voice Command Integration into Augmented Reality Systems and Virtual Reality Systems
20230128422 · 2023-04-27

In one embodiment, a method includes receiving, by an XR display device, a gesture-based input from a first user of the XR display device, processing, using a gesture-detection model, the gesture-based input to identify a first gesture, receiving, by the XR display device, an audio input from the first user, where the audio input includes a first voice command, processing, using a natural-language model, the audio input to identify one or more intents or one or more slots associated with the first voice command, determining whether the identified first gesture matches the first voice command, and executing, responsive to the identified first gesture matching the first voice command and by the XR display device, a first task corresponding to the first voice command based on the identified first gesture and the identified one or more intents or one or more slots.
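
The match-then-execute step in this abstract can be sketched in Python as follows. The sketch assumes the gesture-detection and natural-language models have already produced a gesture label and a parsed command; the `ParsedCommand` type, the `COMPATIBLE_GESTURES` table, and the task registry are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class ParsedCommand:
    """Hypothetical output of the natural-language model: one intent plus slots."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


# Assumed table of which gestures are allowed to accompany which intents.
COMPATIBLE_GESTURES: Dict[str, Set[str]] = {
    "select_object": {"point", "pinch"},
    "dismiss_window": {"swipe_away"},
}

Task = Callable[[str, Dict[str, str]], str]


def gesture_matches(gesture: str, command: ParsedCommand) -> bool:
    """True if the detected gesture is compatible with the command's intent."""
    return gesture in COMPATIBLE_GESTURES.get(command.intent, set())


def run_if_matched(gesture: str,
                   command: ParsedCommand,
                   tasks: Dict[str, Task]) -> Optional[str]:
    """Run the task for the intent only when gesture and voice command agree."""
    if not gesture_matches(gesture, command):
        return None  # mismatch: do not execute the voice command
    task = tasks[command.intent]
    return task(gesture, command.slots)
```

For example, a "point" gesture combined with a `select_object` command carrying the slot `target="lamp"` would run the registered selection task, while a "swipe_away" gesture with the same command would be ignored.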

Identifying information and associated individuals

A hearing aid system for individual identification of a hearing aid system may include a wearable camera, a microphone, and at least one processor. The processor may be programmed to receive a plurality of images captured by the wearable camera; receive audio signals representative of sounds captured by the microphone; and identify a first audio signal, from among the received audio signals, representative of a voice of a first individual. The processor may transcribe and store, in a memory, text corresponding to speech associated with the voice of the first individual and determine whether the first individual is a recognized individual. If the first individual is a recognized individual, the processor may associate an identifier of the first recognized individual with the stored text corresponding to the speech associated with the voice of the first individual.

System and method for virtual assistant situation commentary

Techniques for virtual assistant situation commentary are provided. At least one image frame of a field of view (FOV) of a camera may be received, the at least one image frame intended to be sent to at least one participant of a talk group. A description associated with each element of a plurality of elements within the FOV of the camera may be generated. It may be determined that the at least one participant of the talk group is not currently visually engaged. Audio communication of a sender of the at least one image frame may be monitored to identify a reference to an element of the plurality of elements. The audio communication may be supplemented to include portions of the description of the element that were not included in the audio communication from the sender when it is determined that the at least one participant is not visually engaged.
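
The supplementing step can be illustrated with a small Python sketch. It assumes the element descriptions and the sender's transcribed speech are already available as text; the `supplement` function, the dictionary layout, and the plain substring matching are hypothetical simplifications, not the technique claimed in the patent.

```python
from typing import Dict, List


def supplement(utterance: str,
               descriptions: Dict[str, List[str]],
               visually_engaged: bool) -> str:
    """Append description details the sender did not say, if the listener cannot look.

    `descriptions` maps each element name to a list of descriptive phrases.
    Matching is naive substring search on lowercased text (an assumption made
    for brevity; a real system would need proper reference resolution).
    """
    if visually_engaged:
        return utterance  # the listener can see the image; nothing to add

    spoken = utterance.lower()
    extras: List[str] = []
    for element, phrases in descriptions.items():
        if element.lower() not in spoken:
            continue  # the sender did not refer to this element
        missing = [p for p in phrases if p.lower() not in spoken]
        if missing:
            extras.append(f"The {element} is {', '.join(missing)}.")
    return " ".join([utterance] + extras)
```

With `descriptions = {"car": ["red", "parked near the entrance"]}`, the utterance "Look at the red car." is extended with the detail about where the car is parked only when the participant is not visually engaged.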
