Patent classifications
G10L2025/783
ACOUSTIC ACTIVITY DETECTION
Acoustic activity detection is provided herein. Operations of a method can include receiving an acoustic signal at a micro-electromechanical system (MEMS) microphone. Based on portions of the acoustic signal being determined to exceed a threshold signal level, output pulses are generated. Further, the method can include extracting information representative of a frequency of the acoustic signal based on respective spacing between rising edges of the output pulses.
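The edge-spacing idea can be sketched in a few lines of Python. The function name, threshold value, and sample rate below are illustrative assumptions, not details from the patent:

```python
# Sketch: portions of a sampled signal exceeding a threshold produce pulses,
# and the spacing between rising edges approximates the signal period.
import math

def estimate_frequency(samples, threshold, sample_rate):
    """Estimate signal frequency from the spacing between rising edges
    of threshold-crossing pulses."""
    # A rising edge occurs where the signal crosses the threshold upward.
    edges = [i for i in range(1, len(samples))
             if samples[i - 1] <= threshold < samples[i]]
    if len(edges) < 2:
        return None
    # Average spacing between consecutive rising edges, in samples.
    spacings = [b - a for a, b in zip(edges, edges[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return sample_rate / mean_spacing

# A 100 Hz sine sampled at 8 kHz crosses a 0.5 threshold once per period.
sr = 8000
tone = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
print(round(estimate_frequency(tone, 0.5, sr)))  # 100
```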
CONTACT AND ACOUSTIC MICROPHONES FOR VOICE WAKE AND VOICE PROCESSING FOR AR/VR APPLICATIONS
A method to combine contact and acoustic microphones in a headset for voice wake and voice processing in immersive reality applications is provided. The method includes receiving, from a contact microphone, a first acoustic signal, determining a fidelity and a quality of the first acoustic signal, receiving, from an acoustic microphone, a second acoustic signal, and, when the fidelity and quality of the first acoustic signal exceed a pre-selected threshold, combining the first acoustic signal and the second acoustic signal to provide an enhanced acoustic signal to a smart glass user. A non-transitory, computer-readable medium storing instructions to cause a headset to perform the above method, and the headset, are also provided.
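A minimal Python sketch of the gating-and-mixing step, assuming the fidelity check yields a scalar score and using an illustrative equal-weight mix (the function name and `alpha` weight are hypothetical):

```python
def combine_signals(contact, acoustic, fidelity, threshold, alpha=0.5):
    """If the contact signal's fidelity score exceeds the threshold,
    mix the two signals sample-by-sample; otherwise fall back to the
    acoustic microphone alone."""
    if fidelity > threshold:
        return [alpha * c + (1 - alpha) * a for c, a in zip(contact, acoustic)]
    return list(acoustic)

# High fidelity: the signals are blended.
print(combine_signals([1.0, 0.0], [0.0, 1.0], fidelity=0.9, threshold=0.5))
# Low fidelity: only the acoustic signal is used.
print(combine_signals([1.0, 0.0], [0.0, 1.0], fidelity=0.3, threshold=0.5))
```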
USER VOICE DETECTOR DEVICE AND METHOD USING IN-EAR MICROPHONE SIGNAL OF OCCLUDED EAR
A device and a method for detecting the voice of a user of an intra-aural device. The intra-aural device has an in-ear microphone adapted to be in fluid communication with an ear canal of the user that is occluded from the environment outside the ear. A signal provided by the in-ear microphone is obtained to determine an acquired voice indicator signal, and a voice produced by the user is detected by comparing the acquired voice indicator signal with a corresponding threshold value: the voice is detected when the acquired voice indicator signal is larger than the corresponding threshold value. Although the method also reduces voice interference coming from a non-user, results are improved when the non-user voice is captured by an outer-ear microphone of the intra-aural device.
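The indicator-versus-threshold comparison can be illustrated with a toy Python sketch, using frame RMS energy as the voice indicator signal (the choice of RMS and all names are assumptions, not from the patent):

```python
def frame_rms(frame):
    """Root-mean-square energy of one frame of in-ear microphone samples."""
    return (sum(x * x for x in frame) / len(frame)) ** 0.5

def user_voice_detected(in_ear_frame, threshold):
    """Detect user voice when the indicator exceeds the threshold."""
    return frame_rms(in_ear_frame) > threshold

print(user_voice_detected([0.3, -0.4, 0.5, -0.2], 0.1))    # True
print(user_voice_detected([0.01, -0.02, 0.015, 0.0], 0.1)) # False
```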
METHOD, APPARATUS FOR ELIMINATING POPPING SOUNDS AT THE BEGINNING OF AUDIO, AND STORAGE MEDIUM
A method and apparatus for eliminating popping sounds at the beginning of audio include: examining audio frames within a pre-set time period at the beginning of the audio to determine a popping section; applying popping elimination to the audio frames in the popping section; calculating an average value of the amplitudes of the M audio frames preceding the popping section and an average value of the amplitudes of the K audio frames succeeding it; setting the amplitudes of the audio frames in the popping section to zero in response to a determination that both average values are smaller than a pre-set sound reduction threshold; and weakening the amplitudes of the audio frames in the popping section in response to a determination that the two average values are not both smaller than the threshold. M and K are integers larger than one.
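The zero-or-attenuate decision can be sketched directly in Python; the attenuation factor and all names below are illustrative assumptions:

```python
def suppress_pop(frames, start, end, M, K, reduction_threshold, atten=0.1):
    """frames: per-frame amplitudes; [start, end) is the popping section.
    Zero the section when the averages of the M preceding and K succeeding
    frames are both below the threshold; otherwise attenuate it."""
    avg_before = sum(frames[start - M:start]) / M
    avg_after = sum(frames[end:end + K]) / K
    factor = 0.0 if (avg_before < reduction_threshold
                     and avg_after < reduction_threshold) else atten
    return (frames[:start]
            + [a * factor for a in frames[start:end]]
            + frames[end:])

# Quiet context on both sides of the pop -> section is zeroed.
print(suppress_pop([0.01, 0.02, 0.9, 0.8, 0.01, 0.02],
                   start=2, end=4, M=2, K=2, reduction_threshold=0.05))
```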
Natural assistant interaction
Systems and processes for operating a virtual assistant to provide natural assistant interaction are provided. In accordance with one or more examples, a method includes, at an electronic device with one or more processors and memory: receiving a first audio stream including one or more utterances; determining whether the first audio stream includes a lexical trigger; generating one or more candidate text representations of the one or more utterances; and determining whether at least one of the candidate text representations is to be disregarded by the virtual assistant. If at least one candidate text representation is to be disregarded, one or more candidate intents are generated based on the candidate text representations other than the at least one candidate text representation to be disregarded.
ENHANCED SPEECH ENDPOINTING
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated by the context data.
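A toy Python sketch of the comparison step: an intermediate recognition result is checked against context-expected results, and a match sets the end-of-speech condition and yields the final result. The matching rule (case-insensitive exact match) and names are assumptions:

```python
def maybe_endpoint(intermediate, expected_results):
    """Compare an intermediate recognition result against context-expected
    results; on a match, signal end-of-speech and return the expected
    result as the final recognition result."""
    for expected in expected_results:
        if intermediate.strip().lower() == expected.strip().lower():
            return True, expected
    return False, None

print(maybe_endpoint("flight to boston",
                     ["Flight to Boston", "Flight to Austin"]))
print(maybe_endpoint("flight to",
                     ["Flight to Boston", "Flight to Austin"]))
```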
Privacy device for smart speakers
Systems, apparatuses, and methods are described for a privacy blocking device configured to prevent receipt, by a listening device, of video and/or audio data until a trigger occurs. A blocker may be configured to prevent receipt of video and/or audio data by one or more microphones and/or one or more cameras of a listening device. The blocker may use the one or more microphones, the one or more cameras, and/or one or more second microphones and/or one or more second cameras to monitor for a trigger. The blocker may process the data. Upon detecting the trigger, the blocker may transmit data to the listening device. For example, the blocker may transmit all or a part of a spoken phrase to the listening device.
Speech endpointing
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing are described. In one aspect, a method includes the action of accessing voice query log data that includes voice queries spoken by a particular user. The actions further include determining, based on that voice query log data, a pause threshold for the particular user. The actions further include receiving, from the particular user, an utterance. The actions further include determining that the particular user has stopped speaking for at least a period of time equal to the pause threshold. The actions further include, based on that determination, processing the utterance as a voice query.
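One plausible way to derive a per-user pause threshold from logged pause durations is a high percentile of the user's historical intra-query pauses; the percentile choice and all names below are assumptions for illustration, not the patent's method:

```python
def pause_threshold_from_logs(pause_durations_s, percentile=0.95):
    """Pick a pause threshold (seconds) covering most of this user's
    historical intra-query pauses."""
    ranked = sorted(pause_durations_s)
    idx = min(len(ranked) - 1, int(percentile * len(ranked)))
    return ranked[idx]

def endpoint_reached(silence_s, threshold_s):
    """The utterance is endpointed once silence meets the threshold."""
    return silence_s >= threshold_s

t = pause_threshold_from_logs([0.2, 0.3, 0.4, 0.5, 1.0])
print(t, endpoint_reached(1.2, t), endpoint_reached(0.6, t))
```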
SPEECH ENDPOINTING BASED ON WORD COMPARISONS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
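The two-count comparison can be sketched in Python. Treating "terms that match the transcription" as an exact word-prefix match is an interpretation for illustration; names and the tie-breaking rule are assumptions:

```python
def classify_utterance(transcription, text_samples):
    """First value: samples whose words exactly equal the transcription.
    Second value: samples that start with the transcription but continue.
    Classify as likely incomplete when continuations dominate."""
    terms = transcription.split()
    exact, continued = 0, 0
    for sample in text_samples:
        words = sample.split()
        if words[:len(terms)] == terms:
            if len(words) == len(terms):
                exact += 1
            else:
                continued += 1
    return "likely incomplete" if continued > exact else "likely complete"

corpus = ["set an alarm", "set an alarm for six", "set an alarm for seven"]
print(classify_utterance("set an alarm", corpus))      # likely incomplete
print(classify_utterance("set an alarm for six", corpus))  # likely complete
```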
METHOD AND DEVICE FOR IMPROVING DYSARTHRIA
A method of providing language training to a user by a computing device comprising a processor and a memory is provided. The method comprises: providing contents corresponding to the language training to a user terminal; receiving the user's voice data from the user terminal; detecting a pitch and a loudness of the user's voice by analyzing the voice data; and generating a training evaluation by evaluating the user's training for the contents based on the voice data. The method further comprises determining a phoneme with poor pronunciation accuracy by analyzing the user's voice data, and automatically generating and providing at least one of a vocabulary, a sentence, and a paragraph including the determined phoneme.
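Pitch and loudness detection can be illustrated with a toy Python sketch: pitch via an autocorrelation peak over plausible voice lags, loudness via RMS. This is a generic technique chosen for illustration, not the patent's analysis method:

```python
import math

def pitch_and_loudness(samples, sample_rate):
    """Return (pitch_hz, rms_loudness) for one voiced frame."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    # Search autocorrelation lags corresponding to roughly 50-400 Hz.
    best_lag, best_corr = 0, 0.0
    for lag in range(sample_rate // 400, sample_rate // 50):
        corr = sum(samples[i] * samples[i - lag]
                   for i in range(lag, len(samples)))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    pitch = sample_rate / best_lag if best_lag else 0.0
    return pitch, rms

# A pure 200 Hz tone at 8 kHz should yield a pitch near 200 Hz.
sr = 8000
sig = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
print(pitch_and_loudness(sig, sr))
```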