G10L2015/025

SOMATIC, AUDITORY AND COCHLEAR COMMUNICATION SYSTEM AND METHOD
20220370803 · 2022-11-24

Methods and devices to deliver a tactile speech analog to a person's skin, providing a silent, invisible, hands-free, eyes-free, and ears-free way to receive and directly comprehend electronic communications. Embodiments include an alternative to hearing aids that enables people with hearing loss to better understand speech. A device, worn like a watch or bracelet, supplements a person's remaining hearing to help identify and disambiguate the sounds he or she cannot hear properly. Embodiments for hearing aids and hearing prosthetics are also described.

METHOD FOR ANIMATION SYNTHESIS, ELECTRONIC DEVICE AND STORAGE MEDIUM
20220375456 · 2022-11-24

A method for animation synthesis includes: obtaining an audio stream to be processed and a syllable sequence, wherein both the audio stream and the syllable sequence correspond to the same text and each syllable in the syllable sequence is pinyin of each character of the text; obtaining a phoneme information sequence of the audio stream by performing phoneme detection on the audio stream, wherein each piece of phoneme information in the phoneme information sequence comprises a phoneme category and a pronunciation time period; determining a pronunciation time period corresponding to each syllable in the syllable sequence based on the syllable sequence, phoneme categories and pronunciation time periods in the phoneme information sequence; and generating an animation video corresponding to the audio stream based on the pronunciation time period corresponding to each syllable in the syllable sequence and an animation frame sequence corresponding to each syllable.
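The abstract leaves the syllable-to-time-period alignment step unspecified. As a minimal sketch, one could greedily assign consecutive detected phonemes to each syllable and take the spanning time period; `align_syllables` and its input shapes are hypothetical, not the patent's actual procedure:

```python
def align_syllables(syllables, phoneme_info):
    """syllables: list of (pinyin, phoneme_count) in text order.
    phoneme_info: list of (phoneme_category, start_sec, end_sec) in
    temporal order, as produced by phoneme detection."""
    periods = {}
    i = 0
    for pinyin, count in syllables:
        chunk = phoneme_info[i:i + count]
        # The syllable's pronunciation period spans from its first
        # phoneme's start to its last phoneme's end.
        periods[pinyin] = (chunk[0][1], chunk[-1][2])
        i += count
    return periods

# e.g. "ni" = n + i, "hao" = h + a + o
phonemes = [("n", 0.0, 0.1), ("i", 0.1, 0.25),
            ("h", 0.3, 0.38), ("a", 0.38, 0.5), ("o", 0.5, 0.62)]
align_syllables([("ni", 2), ("hao", 3)], phonemes)
```

The resulting per-syllable periods would then schedule the animation frame sequence corresponding to each syllable.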

PHONEME MISPRONUNCIATION RANKING AND PHONEMIC RULES FOR IDENTIFYING READING PASSAGES FOR READING PROGRESS
20220375455 · 2022-11-24

A method of identifying reading passages for reading progress can include receiving a set of error-indicated phonemes, wherein the set of error-indicated phonemes correspond to pronunciation errors identified in a recorded audio file from an individual reading an assigned passage aloud; determining corresponding error-indicated phonetic rules for each error-indicated phoneme of the set of error-indicated phonemes using a mapping of phonemes to phonetic rules; identifying at least one content passage from a set of content passages that satisfies a condition with respect to the error-indicated phonetic rules; and providing the at least one content passage for a new assignment for the individual to read aloud.
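The "condition" the selected passage must satisfy is left open in the abstract. A sketch under one plausible reading (the passage exercises at least a minimum number of the error-indicated rules); the function and data shapes are hypothetical:

```python
def passages_for_reassignment(error_phonemes, phoneme_to_rule, passages,
                              min_rule_hits=1):
    """error_phonemes: phonemes flagged in the learner's recording.
    phoneme_to_rule: the mapping of phonemes to phonetic rules.
    passages: {passage_id: set of phonetic rules the passage exercises}."""
    error_rules = {phoneme_to_rule[p] for p in error_phonemes
                   if p in phoneme_to_rule}
    # Keep passages that exercise enough of the error-indicated rules.
    return [pid for pid, rules in passages.items()
            if len(rules & error_rules) >= min_rule_hits]

p2r = {"TH": "voiceless dental fricative", "NG": "velar nasal"}
passages = {"p1": {"velar nasal"}, "p2": {"r-controlled vowel"}}
passages_for_reassignment(["NG"], p2r, passages)  # selects "p1"
```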

Apparatus and method for voice event detection

A voice event detection apparatus is disclosed. The apparatus comprises a vibration-to-digital converter and a computing unit. The vibration-to-digital converter is configured to convert an input audio signal into vibration data. The computing unit is configured to trigger a downstream module according to a sum of vibration counts of the vibration data over a number X of frames. In an embodiment, the voice event detection apparatus is capable of correctly distinguishing a wake phoneme in the input vibration data so as to trigger a downstream module of a computing system. Thus, the power consumption of the computing system is reduced.
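The trigger decision reduces to a windowed sum against a threshold. A minimal sketch, where the window length X and the threshold are device-dependent and the values here are hypothetical:

```python
def should_trigger(vibration_counts, x, threshold):
    """vibration_counts: per-frame counts from the vibration-to-digital
    converter, most recent frame last. Trigger the downstream module when
    the sum over the last x frames reaches the threshold."""
    return sum(vibration_counts[-x:]) >= threshold

should_trigger([0, 1, 9, 8, 7], x=3, threshold=20)   # voiced burst: trigger
should_trigger([9, 9, 0, 1, 0], x=3, threshold=20)   # activity has faded: no trigger
```

Keeping only this comparison on the always-on path is what lets the rest of the computing system stay powered down.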

Extracting natural language semantics from speech without the use of speech recognition
11508355 · 2022-11-22

Systems and methods are disclosed herein for discerning aspects of user speech to determine user intent and/or other acoustic features of a sound input without the use of an ASR engine. To this end, a processor may receive a sound signal comprising raw acoustic data from a client device and divide the data into acoustic units. The processor feeds the acoustic units through a first machine learning model to obtain a first output and determines a first mapping, using the first output, of each respective acoustic unit to a plurality of candidate representations of the respective acoustic unit. The processor feeds each candidate representation of the plurality through a second machine learning model to obtain a second output, determines a second mapping, using the second output, of each candidate representation to a known condition, and determines a label for the sound signal based on the second mapping.
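The two-stage pipeline structure can be sketched independently of the models themselves. Here `first_model` and `second_model` are stand-ins for the two trained models, and taking the best-scoring condition overall is one assumed way to derive the final label:

```python
def label_sound_signal(acoustic_units, first_model, second_model):
    """first_model(unit) -> candidate representations of that unit;
    second_model(candidate) -> (condition, score).
    The signal's label is the best-scoring known condition across all
    candidates of all acoustic units."""
    best_condition, best_score = None, float("-inf")
    for unit in acoustic_units:
        for candidate in first_model(unit):
            condition, score = second_model(candidate)
            if score > best_score:
                best_condition, best_score = condition, score
    return best_condition

# Toy stand-ins for the two models:
first = lambda u: [u.upper(), u.lower()]
second = lambda c: ("intent_greet", 0.9) if c == "HI" else ("unknown", 0.1)
label_sound_signal(["hi"], first, second)  # "intent_greet"
```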

Method of Training Voice Recognition Model and Voice Recognition Device Trained by Using Same Method
20230055233 · 2023-02-23

A method of training a voice recognition model to convert voice data to text data, according to one embodiment of the present invention, comprises the steps of: receiving the voice data input; converting the voice data into one or more grapheme data items using the voice recognition model; generating one or more word candidates corresponding to the one or more grapheme data items by using the voice recognition model; determining, on the basis of context, one of the word candidates as the text data that corresponds to the voice data, by using the voice recognition model; and adding a weight to one or more rules associated with generation of the word candidate determined as the text data, by using a back propagation value generated on the basis of the text data.
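The select-then-reinforce step can be sketched as follows; `generate_candidates`, `context_score`, and `rules_of` are hypothetical stand-ins for components of the trained model, and a simple additive weight bump stands in for the back-propagation-based update:

```python
def decode_and_reinforce(graphemes, generate_candidates, context_score,
                         rule_weights, rules_of, backprop_value):
    """Pick the context-best word candidate as the text output, then add
    the back-propagated value to the weight of each rule that generated
    the chosen candidate."""
    candidates = generate_candidates(graphemes)
    chosen = max(candidates, key=context_score)
    for rule in rules_of(chosen):
        rule_weights[rule] = rule_weights.get(rule, 0.0) + backprop_value
    return chosen

weights = {}
gen = lambda g: ["night", "knight"]                  # candidates for the graphemes
ctx = lambda w: {"night": 0.8, "knight": 0.2}[w]     # context preference
rules = lambda w: ["igh"] if w == "night" else []    # rules used in generation
decode_and_reinforce("nait", gen, ctx, weights, rules, 0.1)  # "night"
```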

UTTERANCE EVALUATION APPARATUS, UTTERANCE EVALUATION METHOD, AND PROGRAM

A stable evaluation result is obtained from a spoken voice for any sentence. A speech evaluation device (1) outputs a score for evaluating the speech of an input voice signal spoken by a speaker in a first group. A feature extraction unit (11) extracts an acoustic feature from the input voice signal. A conversion unit (12) converts the acoustic feature of the input voice signal into the acoustic feature obtained when a speaker in a second group speaks the same text as that of the input voice signal. An evaluation unit (13) calculates a score indicating a higher evaluation as the distance between the acoustic feature before the conversion and the acoustic feature after the conversion becomes shorter.
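The abstract only requires the score to rise as the feature distance shrinks. A minimal sketch, assuming Euclidean distance and an exp(-d) mapping (both are assumptions of this sketch, not specified in the abstract):

```python
import math

def evaluation_score(feat_before, feat_after):
    """Score in (0, 1]: 1.0 when the acoustic feature is unchanged by the
    conversion, decreasing monotonically as the distance grows."""
    d = math.dist(feat_before, feat_after)
    return math.exp(-d)

evaluation_score([1.0, 2.0], [1.0, 2.0])  # identical features -> 1.0
```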

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

An information processing apparatus includes a processor configured to: segment, into multiple voice segments, voice data and text data converted from the voice data; impart a security level to each of the voice segments in accordance with contents of the text data and the voice data in each of the voice segments; and perform control on an output of each of the voice segments in accordance with the security level.
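The output-control step amounts to gating segments by their assigned security level. A sketch with a hypothetical keyword-based classifier standing in for the content-based level assignment:

```python
def releasable_segments(segments, assign_level, clearance):
    """segments: list of (voice_chunk, text_chunk) pairs after segmentation.
    assign_level(text) -> security level (higher = more sensitive).
    Only segments at or below the requester's clearance are output."""
    return [(voice, text) for voice, text in segments
            if assign_level(text) <= clearance]

# Toy classifier: salary talk is sensitive (level 2), the rest is public (0).
level = lambda t: 2 if "salary" in t else 0
segs = [("v1", "meeting at noon"), ("v2", "salary figures")]
releasable_segments(segs, level, clearance=1)  # only the first segment
```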

INFORMATION PROCESSING APPARATUS AND COMMAND PROCESSING METHOD
20220366909 · 2022-11-17

An acoustic feature detection unit (31) detects acoustic features of a voice that is input discretely and separately from a command instructing movement of an operation target. A movement control unit (32) controls the movement of the operation target instructed by the command on the basis of the acoustic features detected by the acoustic feature detection unit (31).

Efficient empirical determination, computation, and use of acoustic confusability measures

A computer-implemented method includes generating an empirically derived acoustic confusability measure by processing example utterances and iterating from an initial estimate of the acoustic confusability measure to improve the measure. The method can further include using the acoustic confusability measure to selectively limit the phrases made recognizable by a speech recognition application.
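As a much-simplified sketch of the empirical side, a single smoothed counting pass over (intended, recognized) phrase pairs can stand in for the patent's iterative refinement; the function name and smoothing scheme are assumptions:

```python
from collections import Counter

def empirical_confusability(pairs, smoothing=1.0):
    """pairs: (intended, recognized) phrase pairs from example utterances.
    Returns a function estimating P(recognized | intended) with additive
    smoothing over the observed recognition vocabulary."""
    joint = Counter(pairs)
    totals = Counter(intended for intended, _ in pairs)
    vocab = {recognized for _, recognized in pairs}

    def confusability(intended, recognized):
        return ((joint[(intended, recognized)] + smoothing) /
                (totals[intended] + smoothing * len(vocab)))

    return confusability

conf = empirical_confusability(
    [("yes", "yes"), ("yes", "yes"), ("yes", "s"), ("no", "no")])
conf("yes", "s")  # how often "yes" is misheard as "s"
```

A phrase set could then be limited by excluding phrases whose pairwise confusability with an already-active phrase exceeds a threshold.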