G10L15/005

Method for generating acoustic model
11551672 · 2023-01-10

A method for generating an acoustic model is disclosed. The method can generate a highly accurate acoustic model from learning data that includes various dialects by training the acoustic model using text data to which regional information is tagged and changing a parameter of the acoustic model based on the tagged regional information. The acoustic model can be associated with an artificial intelligence module, an unmanned aerial vehicle (UAV), a robot, an augmented reality (AR) device, a virtual reality (VR) device, devices related to 5G services, and the like.

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, AND INFORMATION PROCESSING METHOD
20230005478 · 2023-01-05

An information processing apparatus comprises a controller configured to: give speech guidance to a user using first speech data corresponding to a first language; determine that the user utilizes a language different from the first language; and acquire second speech data corresponding to the language to be utilized by the user on the basis of a result of the determination.
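
The controller logic in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus claimed: the class name `GuidanceController`, the `acquire` callback, and the in-memory cache are all hypothetical, and how the user's language is actually determined is left outside the sketch.

```python
from typing import Callable, Dict


class GuidanceController:
    """Hypothetical controller that keeps speech data for the active language."""

    def __init__(self, first_language: str, acquire: Callable[[str], bytes]) -> None:
        # `acquire` is an assumed callback that returns speech data for a
        # language, for example by requesting it from a server.
        self._acquire = acquire
        self.language = first_language
        self._cache: Dict[str, bytes] = {first_language: acquire(first_language)}

    def speech_data_for(self, detected_language: str) -> bytes:
        """Return guidance speech data, acquiring it if the user's language differs."""
        if detected_language != self.language:
            # Second speech data is fetched only when it is not already cached.
            if detected_language not in self._cache:
                self._cache[detected_language] = self._acquire(detected_language)
            self.language = detected_language
        return self._cache[self.language]
```

The cache is a design choice of this sketch only; it avoids acquiring the same speech data twice when a user switches back and forth between languages.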

Techniques for language independent wake-up word detection
11545146 · 2023-01-03

A user device configured to perform wake-up word detection in a target language. The user device comprises at least one microphone (430) configured to obtain acoustic information from the environment of the user device, at least one computer readable medium (435) storing an acoustic model (150) trained on a corpus of training data (105) in a source language different than the target language, and storing a first sequence of speech units obtained by providing acoustic features (110) derived from audio comprising the user speaking a wake-up word in the target language to the acoustic model (150), and at least one processor (415,425) coupled to the at least one computer readable medium (435) and programmed to perform receiving, from the at least one microphone (430), acoustic input from the user speaking in the target language while the user device is operating in a low-power mode, applying acoustic features derived from the acoustic input to the acoustic model (150) to obtain a second sequence of speech units corresponding to the acoustic input, determining if the user spoke the wake-up word at least in part by comparing the first sequence of speech units to the second sequence of speech units, and exiting the low-power mode if it is determined that the user spoke the wake-up word.
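
The comparison step (matching the enrolled first sequence of speech units against the second sequence obtained from new acoustic input) can be sketched as below. The abstract does not say how the two sequences are compared; this sketch assumes a Levenshtein edit distance normalized by the enrolled length, and the function names and the `max_normalized_distance` threshold are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of speech units."""
    prev = list(range(len(b) + 1))
    for i, unit_a in enumerate(a, start=1):
        curr = [i]
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_wake_word(enrolled: Sequence[str],
                 observed: Sequence[str],
                 max_normalized_distance: float = 0.25) -> bool:
    """Decide whether `observed` is close enough to the enrolled wake-up word.

    The threshold is an assumed tuning parameter, not a value from the source.
    """
    if not enrolled:
        return False
    return edit_distance(enrolled, observed) / len(enrolled) <= max_normalized_distance


# Example with made-up phoneme-like units: one substitution out of four units
# is tolerated at the default threshold, so the device would exit low-power mode.
enrolled_units = ["HH", "EH", "L", "OW"]
should_wake = is_wake_word(enrolled_units, ["HH", "EH", "L", "UW"])
```

Because both sequences come from the same source-language acoustic model, the comparison never needs a model of the target language, which is the point of the language-independent approach described in the abstract.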

Computer systems exhibiting improved computer speed and transcription accuracy of automatic speech transcription (AST) based on a multiple speech-to-text engines and methods of use thereof

In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines, where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and accuracy of automatic speech transcription; and generating a transcript of the audio recording from respective accepted hypotheses for the plurality of audio segments.
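
The per-segment routing and acceptance steps can be sketched as follows. The abstract does not state the acceptance criterion, so this sketch assumes each engine returns a confidence score and that a hypothesis is accepted once the score reaches a threshold. The `Engine` type, `engine_order`, and `accept_threshold` are hypothetical names, and the engines are plain callables rather than real speech-to-text services.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[str, float]          # (text, confidence in [0, 1])
Engine = Callable[[bytes], Hypothesis]  # assumed engine interface


def transcribe(segments: List[bytes],
               engines: Dict[str, Engine],
               engine_order: List[str],
               accept_threshold: float = 0.8) -> Tuple[str, int]:
    """Return the transcript and the number of engine calls that were made."""
    parts: List[str] = []
    calls = 0
    for segment in segments:
        best: Hypothesis = ("", -1.0)
        for name in engine_order:
            text, confidence = engines[name](segment)
            calls += 1
            if confidence > best[1]:
                best = (text, confidence)
            if confidence >= accept_threshold:
                # Accepting here means the segment is never sent to the
                # remaining engines, which is where the speed-up comes from.
                break
        # If no engine met the threshold, fall back to the best hypothesis seen.
        parts.append(best[0])
    return " ".join(parts), calls
```

In this sketch the number of engine calls ranges from one per segment (every first hypothesis accepted) to one per segment per engine (none accepted), so the threshold trades speed against accuracy.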

System and method for language-based service hailing

Systems and methods are provided for language-based service hailing. Such a system may comprise one or more processors and a memory storing instructions that, when executed by the one or more processors, cause the system to: obtain a plurality of speech samples, each speech sample comprising one or more words spoken in a language; train a neural network model with the speech samples to obtain a trained model for determining the language of speech; obtain a voice input; identify at least one language corresponding to the voice input based at least on applying the trained model to the voice input; and communicate a message in the identified language.
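
The last two steps (identifying at least one language from the model output and communicating a message in it) can be sketched as below. Training the neural network is out of scope here: the sketch assumes the trained model has already produced one raw score (logit) per language for the voice input. The function names, the `min_probability` cutoff, and the message table are hypothetical.

```python
import math
from typing import Dict, List

# Assumed message table; a real service would hold its own localized texts.
MESSAGES: Dict[str, str] = {
    "en": "Your driver is on the way.",
    "es": "Su conductor está en camino.",
}


def identify_languages(logits: Dict[str, float],
                       min_probability: float = 0.3) -> List[str]:
    """Turn per-language logits into a ranked list of likely languages."""
    peak = max(logits.values())
    # Subtracting the peak keeps exp() numerically stable.
    exps = {lang: math.exp(score - peak) for lang, score in logits.items()}
    total = sum(exps.values())
    probs = {lang: value / total for lang, value in exps.items()}
    ranked = sorted(probs, key=probs.get, reverse=True)
    likely = [lang for lang in ranked if probs[lang] >= min_probability]
    # Always return at least the top-ranked language.
    return likely or ranked[:1]


def message_for(logits: Dict[str, float], default_language: str = "en") -> str:
    """Pick the message for the most likely language that has a translation."""
    for lang in identify_languages(logits):
        if lang in MESSAGES:
            return MESSAGES[lang]
    return MESSAGES[default_language]
```

Returning a ranked list rather than a single label mirrors the abstract's "at least one language" wording and lets the caller fall back when no message exists for the top-ranked language.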

SYSTEM AND METHOD FOR ESTABLISHING DENTAL TREATMENT ENVIRONMENT
20220415489 · 2022-12-29

A system for establishing a dental treatment environment includes: a head-mounted device provided at a dental clinic to be mounted on a patient's head, the head-mounted device having an image display unit and an ear-mounted speaker; a microphone for converting sound, including the voice of the medical staff in charge of the patient, into an electric signal; a voice recognition module for recognizing the voice of the medical staff in charge from the electric signal input from the microphone; a content module storing multiple image contents for relaxing the patient mentally and physically; a user interface having a content selection unit configured such that the patient can select, from the multiple image contents, the content to be played on the image display unit; and an output signal generating module for generating an output signal that is output to the head-mounted device.

MULTI-MODAL INPUT ON AN ELECTRONIC DEVICE

A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application.

AUDIO INFORMATION PROCESSING METHOD, APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM
20220406311 · 2022-12-22

The present disclosure relates to an audio information processing method, an apparatus, an electronic device and a computer-readable storage medium. The audio information processing method includes: determining whether an audio recording start condition is satisfied; collecting audio information associated with an electronic device in response to determining that the audio recording start condition is satisfied; performing word segmentation on text information corresponding to the audio information to obtain word-segmented text information; and displaying the word-segmented text information on a user interface of the electronic device.
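
The word segmentation step can be illustrated with a short sketch. The abstract does not name a segmentation algorithm; this example substitutes forward maximum matching against a vocabulary, a common baseline for text without word delimiters. The function name, the vocabulary, and `max_word_len` are hypothetical, and the text is assumed to have already been produced from the collected audio.

```python
from typing import List, Set


def segment_words(text: str, vocabulary: Set[str], max_word_len: int = 6) -> List[str]:
    """Split `text` into words by greedily taking the longest vocabulary match."""
    words: List[str] = []
    i = 0
    while i < len(text):
        longest = min(max_word_len, len(text) - i)
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted so the loop makes progress
            # even when nothing in the vocabulary matches.
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


# Example: segmenting delimiter-free text before showing it in a user interface.
tokens = segment_words("speechtotext", {"speech", "to", "text"})
display_line = " | ".join(tokens)
```

Greedy matching can pick the wrong split when vocabulary entries overlap, so a statistical segmenter may be preferable in practice; the sketch only shows where segmentation sits between transcription and display.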

Integrated System and Related Methods for Learning, Collaboration, Tournament Hosting, and Business Management
20220405661 · 2022-12-22

The present disclosure provides a system for hosting an online platform with multiple functionalities, separated into a plurality of interfaces, but all hosted within an integrated system to increase the immersion of a user in the learning experience. Multimedia content streaming, educational course, history, and tracking, and business management functions are provided on the various interfaces that quickly educate a user about a given industry. The platform is industry agnostic but can also be provided with specific functionalities such as competitive tournament hosting for the e-sports industry. Also provided herein is a method of translating an educational lecture from a first language into a plurality of second languages.

ELECTRONIC APPARATUS, CONTROLLING METHOD OF ELECTRONIC APPARATUS AND SERVER
20220392431 · 2022-12-08

An electronic apparatus which registers a device to a server by using a voice, and a method therefor are provided. The electronic apparatus includes a communication circuit, a microphone, a memory for storing computer executable instructions, and at least one processor configured to execute the computer executable instructions to acquire, from a voice received through the microphone, information on an external device which a user wishes to register, based on an external device corresponding to the acquired information being searched through the communication circuit, control the communication circuit to transmit information on an access point to the external device to enable the external device to communicate with a server, and control the communication circuit to transmit a registration request with respect to the external device to the server.