G10L15/24

Dynamic language and command recognition

Systems and methods are described for processing and interpreting audible commands spoken in one or more languages. Speech recognition systems disclosed herein may be used as a stand-alone system or may form part of a content consumption system. A requesting user may provide audio input (e.g., command data) to the speech recognition system via a computing device to request that an entertainment system perform one or more operational commands. The speech recognition system may analyze the audio input across a variety of linguistic models, and may parse the audio input to identify a plurality of phrases and corresponding action classifiers. In some embodiments, the speech recognition system may utilize the action classifiers and other information to determine which of the identified phrases appropriately match the desired intent and operational command associated with the user's spoken command.

Coordinating content-item output across multiple electronic devices

This disclosure describes, in part, techniques and systems for generating and outputting immersive, multi-device content items in user environments, such as connected homes, offices, and the like. For example, the techniques and systems may output different portions of content on different devices within a user environment based on information such as respective capabilities of the devices, a current location of the user within the environment, a time of day, which user(s) are present in the environment, and/or the like.
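
The capability-based routing described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the function `assign_portions`, the device names, and the capability labels are all hypothetical, and the sketch considers only device capabilities, leaving out user location, time of day, and presence.

```python
def assign_portions(portions, devices):
    """Map each content portion to the first device that can output it.

    portions: dict of portion name -> required capability (hypothetical labels)
    devices:  dict of device name -> set of capabilities (hypothetical labels)
    Portions that no device can handle are left out of the plan.
    """
    plan = {}
    for portion, needed in portions.items():
        for device, capabilities in devices.items():
            if needed in capabilities:
                plan[portion] = device
                break
    return plan


# Illustrative environment; none of these names come from the disclosure.
devices = {
    "tv": {"video", "audio"},
    "speaker": {"audio"},
    "lamp": {"light"},
}
portions = {
    "picture": "video",
    "ambient_glow": "light",
    "soundtrack": "audio",
}
plan = assign_portions(portions, devices)
```

A fuller version would presumably score devices on the other signals the abstract lists rather than taking the first match.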

Electronic apparatus and method for controlling electronic apparatus

An electronic apparatus and a control method thereof are provided. The electronic apparatus includes a microphone, a camera, a memory storing an instruction, and a processor that is coupled with the microphone, the camera, and the memory and is configured to control the electronic apparatus. By executing the instruction, the processor is configured to obtain a user image by photographing a user through the camera, obtain user information based on the user image, and, based on user speech being input from the user through the microphone, recognize the user speech by using a speech recognition model corresponding to the user information among a plurality of speech recognition models.
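
The final step of this abstract, choosing one speech recognition model out of several according to the user information, can be sketched as a simple lookup. Everything named here is an assumption made for illustration: the `UserInfo` fields, the model identifiers, and the fallback to a general model are not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserInfo:
    # Hypothetical attribute derived from the user image.
    age_group: str


# Hypothetical registry of model identifiers keyed by user attribute.
MODELS = {
    "child": "asr_child",
    "adult": "asr_adult",
}
DEFAULT_MODEL = "asr_general"  # assumed fallback, not part of the abstract


def select_model(info: UserInfo) -> str:
    """Return the identifier of the model matching the user information."""
    return MODELS.get(info.age_group, DEFAULT_MODEL)
```

For example, `select_model(UserInfo(age_group="child"))` picks the child-specific entry, while an unrecognized attribute falls back to the general model.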

Speech synthesis in noisy environment

Disclosed is speech synthesis in a noisy environment. According to an embodiment of the disclosure, a method of speech synthesis may generate synthesized speech to which the Lombard effect is applied, using a feature vector generated from an utterance feature. According to the disclosure, the speech synthesis method and device may be related to artificial intelligence (AI) modules, unmanned aerial vehicles (UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.

Voice interactive system
11627417 · 2023-04-11

An interactive system can utilize microtechnology (e.g., a micro-electromechanical system (MEMS)), such as a miniaturized microphone (e.g., a bone-conducting microphone), an audio output device, a microprocessor, and signal conversion and propagation means, to create a personal area network (PAN) for a user. The system can include a voice input device (e.g., worn on one or more teeth of the user) that outputs a near-field magnetic induction (NFMI) signal based on a whisper input by the user. The NFMI signal is either detected by the user's mobile device, or converted into a wireless signal (e.g., a Bluetooth RF signal) detectable by the user's mobile device, for receiving voice commands (e.g., to provide personal assistant services) via a designated application running on the mobile device.

Image capture device that reduces gaps between captures
11470260 · 2022-10-11

After a command to stop recording a video is received, an image capture device may buffer footage in a buffer memory. The buffer memory may be used as a post-capture cache. The footage buffered in the buffer memory may be appended to the end of previously captured footage, appended to the beginning of subsequently captured footage, and/or used to bridge two separately captured pieces of footage.
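
The three uses of the buffered footage can be sketched in Python as follows. This is an illustration under stated assumptions, not the device's implementation: the `PostCaptureBuffer` class and the `bridge` helper are hypothetical, frames are represented as plain integers, and the choice of a fixed-capacity buffer that discards its oldest frames is an assumption the abstract does not make.

```python
from collections import deque


class PostCaptureBuffer:
    """Holds frames captured after a stop command (hypothetical design)."""

    def __init__(self, capacity):
        # Assumed policy: once full, the oldest buffered frame is dropped.
        self.frames = deque(maxlen=capacity)

    def push(self, frame):
        self.frames.append(frame)

    def drain(self):
        """Return the buffered frames in order and empty the buffer."""
        buffered = list(self.frames)
        self.frames.clear()
        return buffered


def bridge(previous_clip, buffered, next_clip):
    """Join two clips with the buffered footage in between."""
    return previous_clip + buffered + next_clip


buffer = PostCaptureBuffer(capacity=3)
for frame in [4, 5, 6, 7]:  # frames arriving after the stop command
    buffer.push(frame)
buffered = buffer.drain()
merged = bridge([1, 2, 3], buffered, [8, 9])
```

Appending to the end of the earlier clip or to the start of the later one corresponds to calling `bridge` with an empty list for the other clip.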
