G10L2025/783

Terminal control method, terminal and non-transitory computer readable storage medium
11568888 · 2023-01-31

A terminal control method, a terminal and a non-transitory computer-readable storage medium are provided. The terminal control method includes: receiving, by a microphone, a detection audio signal emitted from a speaker and having a frequency within a pre-set detection frequency range; acquiring actual audio parameters of the detection audio signal when being received by the microphone, and original audio parameters of the detection audio signal when being emitted from the speaker; determining a relative state between the microphone and the speaker according to the actual audio parameters and the original audio parameters; determining a terminal control operation to be performed, according to the relative state and a pre-set correspondence between relative states and terminal control operations; and performing the determined terminal control operation on a terminal where the microphone is located.
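The mapping from audio parameters to a control operation can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the only parameter compared is amplitude, and the state names, threshold, and operation names are all hypothetical.

```python
import math

# Hypothetical pre-set correspondence between relative states and operations.
STATE_TO_OPERATION = {
    "near": "turn_screen_off",
    "far": "turn_screen_on",
}


def relative_state(original_amplitude, actual_amplitude, near_threshold_db=-6.0):
    """Classify the microphone/speaker relative state from attenuation.

    Attenuation in dB is 20*log10(actual/original). The threshold is an
    illustrative assumption, not a value taken from the abstract.
    """
    if original_amplitude <= 0 or actual_amplitude <= 0:
        raise ValueError("amplitudes must be positive")
    attenuation_db = 20.0 * math.log10(actual_amplitude / original_amplitude)
    return "near" if attenuation_db >= near_threshold_db else "far"


def select_operation(original_amplitude, actual_amplitude):
    """Look up the terminal control operation for the detected state."""
    state = relative_state(original_amplitude, actual_amplitude)
    return STATE_TO_OPERATION[state]
```

A real implementation would likely compare several parameters (for example frequency shift as well as amplitude) before deciding on a state.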

Image pickup apparatus that controls operations based on voice, control method, and storage medium
11570349 · 2023-01-31

An image pickup apparatus includes an image pickup unit that obtains video, an audio input unit that collects sound and a control unit that controls recording of the video based on a wake word and a control word included in the sound collected by the audio input unit. In a case where the control word that gives an instruction to stop recording the video is included in the sound, the control unit stops recording the video and records video data before a start time of the wake word as a video file.
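The trimming behaviour on a stop command can be illustrated with a short sketch. The frame representation (timestamp, data) and the function names are assumptions made for this example only.

```python
def frames_before_wake_word(frames, wake_word_start_s):
    """Return buffered frames captured strictly before the wake word began.

    frames: list of (timestamp_seconds, frame_data) tuples in capture order.
    """
    return [frame for frame in frames if frame[0] < wake_word_start_s]


def handle_stop_command(frames, wake_word_start_s):
    """Stop recording and keep only the video preceding the wake word."""
    kept = frames_before_wake_word(frames, wake_word_start_s)
    return {"recording": False, "video_file_frames": kept}
```

The effect is that the spoken wake word and stop command do not appear at the end of the saved file.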

Audio system and signal processing method of voice activity detection for an ear mountable playback device
11705103 · 2023-07-18

An audio system for an ear mountable playback device comprises a speaker, an error microphone predominantly sensing sound being output from the speaker and a feed-forward microphone predominantly sensing ambient sound. The audio system further comprises a voice activity detector which is configured to record a feed-forward signal from the feed-forward microphone. Furthermore, an error signal is recorded from the error microphone. A detection parameter is determined as a function of the feed-forward signal and the error signal. The detection parameter is monitored and a voice activity state is set depending on the detection parameter.
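One possible detection parameter is sketched below. The abstract only says the parameter is a function of the two signals; the energy ratio in dB, the threshold, and the assumption that the wearer's own voice raises the error-microphone level relative to the feed-forward level are illustrative choices, not the patented formula.

```python
import math


def frame_energy(samples):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in samples) / len(samples)


def detection_parameter(ff_frame, err_frame, eps=1e-12):
    """Energy ratio in dB of the error-mic frame to the feed-forward frame."""
    ratio = (frame_energy(err_frame) + eps) / (frame_energy(ff_frame) + eps)
    return 10.0 * math.log10(ratio)


def voice_activity_states(ff_frames, err_frames, threshold_db=0.0):
    """Monitor the parameter per frame and set a boolean voice-activity state.

    The threshold is a hypothetical tuning value.
    """
    return [
        detection_parameter(ff, err) > threshold_db
        for ff, err in zip(ff_frames, err_frames)
    ]
```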

Adapting Automated Speech Recognition Parameters Based on Hotword Properties
20230223014 · 2023-07-13

A method for optimizing speech recognition includes receiving a first acoustic segment characterizing a hotword detected by a hotword detector in streaming audio captured by a user device, extracting one or more hotword attributes from the first acoustic segment, and adjusting, based on the one or more hotword attributes extracted from the first acoustic segment, one or more speech recognition parameters of an automated speech recognition (ASR) model. After adjusting the speech recognition parameters of the ASR model, the method also includes processing, using the ASR model, a second acoustic segment to generate a speech recognition result. The second acoustic segment characterizes a spoken query/command that follows the first acoustic segment in the streaming audio captured by the user device.
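The extract-then-adjust flow can be sketched as below. The abstract does not name the hotword attributes or the ASR parameters, so the attributes (RMS level, duration), the parameters (input gain, endpoint silence timeout), and the target values are all hypothetical.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AsrParams:
    # Hypothetical recognition parameters, chosen only for illustration.
    input_gain: float = 1.0
    endpoint_silence_ms: int = 500


def extract_hotword_attributes(samples, sample_rate):
    """Compute assumed attributes (loudness and duration) of the hotword segment."""
    if not samples:
        raise ValueError("hotword segment is empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {"rms": rms, "duration_s": len(samples) / sample_rate}


def adjust_params(params, attrs, target_rms=0.1, typical_duration_s=0.6):
    """Scale gain toward a target level and stretch the endpoint timeout
    for slower speakers. Both rules are assumptions for this sketch."""
    gain = target_rms / attrs["rms"] if attrs["rms"] > 0 else params.input_gain
    scale = attrs["duration_s"] / typical_duration_s
    silence_ms = int(round(params.endpoint_silence_ms * scale))
    return replace(params, input_gain=gain, endpoint_silence_ms=silence_ms)
```

The adjusted parameters would then be applied before the second acoustic segment (the query or command) is processed.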

AUTOMATIC CAMERA SELECTION IN A COMMUNICATION DEVICE
20230224579 · 2023-07-13

A method, a first communication device and a computer program product are provided for selecting an active camera from a front facing camera and a rear facing camera for use during a video communication session. A request is detected, via a processor, to transition to a video communication session between the first communication device and a second communication device. The first communication device receives, from the second communication device, first context identifying data that identifies which of at least one front facing camera or at least one rear facing camera to activate. The first context identifying data is generated at the second communication device based on information within the exchanged communication. At least one front facing camera or at least one rear facing camera identified by the received first context identifying data is selected and activated.

End of query detection

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting an end of a query are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance spoken by a user. The actions further include applying, to the audio data, an end of query model. The actions further include determining a confidence score that reflects a likelihood that the utterance is a complete utterance. The actions further include comparing the confidence score to a confidence score threshold. The actions further include determining whether the utterance is likely complete or likely incomplete. The actions further include providing, for output, an instruction to (i) maintain a microphone that is receiving the utterance in an active state or (ii) deactivate the microphone that is receiving the utterance.
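The threshold decision can be sketched in a few lines. The end of query model is represented here by any callable that returns a score in [0, 1]; that interface, the threshold value, and the instruction strings are assumptions for illustration.

```python
def end_of_query_instruction(audio_data, end_of_query_model, threshold=0.8):
    """Return a microphone instruction based on the model's confidence.

    end_of_query_model: hypothetical callable mapping audio data to a
    confidence score in [0, 1] that the utterance is complete.
    """
    confidence = end_of_query_model(audio_data)
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence score must lie in [0, 1]")
    likely_complete = confidence >= threshold
    return "deactivate_microphone" if likely_complete else "maintain_microphone_active"
```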

MEETING INCLUSION AND HYBRID WORKPLACE INSIGHTS

The disclosure herein describes a system for calculating meeting inclusion metrics including insights and recommendations. Meeting data associated with one or more meetings attended by at least one participant remotely is converted into anonymized meeting data for inclusivity metric analysis. An inclusivity insights manager generates inclusivity metrics associated with inclusive behavior and language occurring during meetings to measure the level of inclusivity. The inclusivity metrics include attendee participation metrics measuring an amount of participation by each meeting attendee, participation in-person versus participation remotely, concurrent speech indicating attendees may be talking over one another or other interruptions occurring during meetings. Inclusivity metric data includes insights and actionable recommendations to improve inclusivity at future meetings provided at an individual level, group level or organizational level. The inclusivity insights can also include percentage metric values, graphs, feedback, and other metric-related information for improving participation by meeting attendees.
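Three of the metrics named above (per-attendee participation, in-person versus remote participation, and concurrent speech) can be sketched from anonymized speech segments. The segment format (speaker, start, end), the mode labels, and the assumption that one speaker's own segments never overlap are choices made for this example.

```python
from collections import defaultdict


def participation_share(segments):
    """Fraction of total speaking time per attendee.

    segments: list of (speaker_id, start_s, end_s) tuples.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {speaker: t / grand_total for speaker, t in totals.items()}


def share_by_mode(segments, attendee_mode):
    """Aggregate participation by attendance mode, e.g. 'remote'."""
    by_mode = defaultdict(float)
    for speaker, share in participation_share(segments).items():
        by_mode[attendee_mode[speaker]] += share
    return dict(by_mode)


def concurrent_speech_seconds(segments):
    """Total time during which two or more segments are active at once."""
    events = []
    for _, start, end in segments:
        events.append((start, 1))
        events.append((end, -1))
    # At equal times an end (-1) sorts before a start (+1), so segments
    # that merely touch are not counted as overlapping.
    events.sort()
    active, last_t, overlap = 0, None, 0.0
    for t, delta in events:
        if active >= 2:
            overlap += t - last_t
        active += delta
        last_t = t
    return overlap
```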

Recognition or synthesis of human-uttered harmonic sounds
11545143 · 2023-01-03

Within each harmonic spectrum of a sequence of spectra derived from analysis of a waveform representing human speech, two or more fundamental or harmonic components are identified that have frequencies separated by integer multiples of a fundamental acoustic frequency. The highest harmonic frequency that is also greater than 410 Hz is a primary cap frequency, which is used to select a primary phonetic note that corresponds to a subset of phonetic chords from a set of phonetic chords for which acoustic spectral data is available. The spectral data can also include frequencies for primary band, secondary band (or secondary note), basal band, or reduced basal band acoustic components, which can be used to select a phonetic chord from the subset of phonetic chords corresponding to the selected primary note.
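Finding the primary cap frequency can be sketched as follows, assuming spectral peaks and the fundamental have already been estimated. The tolerance value and the function names are hypothetical; the 410 Hz floor comes from the abstract.

```python
def harmonic_components(peaks_hz, f0_hz, tolerance_hz=5.0):
    """Keep peaks lying within a tolerance of an integer multiple of f0."""
    components = []
    for freq in sorted(peaks_hz):
        multiple = round(freq / f0_hz)
        if multiple >= 1 and abs(freq - multiple * f0_hz) <= tolerance_hz:
            components.append(freq)
    return components


def primary_cap_frequency(peaks_hz, f0_hz, floor_hz=410.0):
    """Return the highest harmonic frequency if it exceeds the floor.

    Returns None when fewer than two components are found or when the
    highest one does not exceed floor_hz.
    """
    harmonics = harmonic_components(peaks_hz, f0_hz)
    if len(harmonics) < 2:
        return None
    top = max(harmonics)
    return top if top > floor_hz else None
```

For example, with peaks at 120, 241, 359, 480 and 530 Hz and a 120 Hz fundamental, the 530 Hz peak is rejected as non-harmonic and 480 Hz becomes the primary cap frequency.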

Interaction system, non-transitory computer readable storage medium, and method for controlling interaction system
11538491 · 2022-12-27

An interaction system that interacts with a user is disclosed. The interaction system includes: an input device that receives a speech signal of the user; a computing device that determines a speech content of the interaction system for a speech content acquired from the speech signal of the user such that a frequency distribution of speech feature values of the speech content of the interaction system approaches an ideal frequency distribution; and an output device that outputs the determined speech content of the interaction system.
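Steering the system's speech toward an ideal distribution can be sketched as a greedy choice among candidate responses. The feature labels, the L1 distance, and the candidate format (text, feature) are assumptions for this example; the abstract does not specify them.

```python
from collections import Counter


def distance_to_ideal(counts, ideal):
    """L1 distance between the observed feature distribution and the ideal."""
    total = sum(counts.values())
    return sum(
        abs((counts.get(feature, 0) / total if total else 0.0) - target)
        for feature, target in ideal.items()
    )


def choose_response(candidates, history_counts, ideal):
    """Pick the candidate whose feature moves the history closest to the ideal.

    candidates: list of (text, feature_label) pairs (hypothetical format).
    """
    def score(candidate):
        trial = Counter(history_counts)
        trial[candidate[1]] += 1
        return distance_to_ideal(trial, ideal)

    return min(candidates, key=score)
```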

METHOD FOR SELECTING OUTPUT WAVE BEAM OF MICROPHONE ARRAY
20220399028 · 2022-12-15

A method for selecting an output wave beam of a microphone array, comprising: (a) receiving a plurality of voice signals from the microphone array comprising a plurality of microphones, and performing beamforming on the voice signals to obtain a plurality of wave beams and corresponding wave beam output signals (102); (b) performing the following operation on each wave beam: converting the wave beam output signal of a current wave beam to frequency domain from time domain to obtain a frequency spectrum vector and a power spectrum vector of the current wave beam (104); on the basis of the frequency spectrum vector and the power spectrum vector of the current wave beam, calculating comprehensive voice signal energy of the current wave beam, wherein the comprehensive voice signal energy is the product of comprehensive energy of the current wave beam and a comprehensive voice existence probability, the comprehensive energy indicates the energy level of the wave beam output signal of the current wave beam, the comprehensive voice existence probability indicates an existence probability of voice in the wave beam output signal of the current wave beam, and the comprehensive voice existence probability and the comprehensive energy are scalar quantities (106); and (c) selecting the wave beam with a maximal comprehensive voice signal energy value as the output wave beam (110).
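Steps (b) and (c) can be sketched as below. The abstract does not say how the comprehensive energy or the voice existence probability is computed, so this sketch assumes the mean of the power spectrum for the former and the fraction of power inside a 300 to 3400 Hz speech band for the latter. A naive DFT keeps the example dependency-free; an FFT library would normally be used instead.

```python
import cmath


def power_spectrum(signal):
    """One-sided power spectrum via a naive DFT (bins 0..N/2)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def comprehensive_voice_energy(signal, sample_rate, band=(300.0, 3400.0)):
    """Scalar energy times scalar voice existence probability (both assumed)."""
    power = power_spectrum(signal)
    total = sum(power)
    energy = total / len(power)
    in_band = sum(
        p for k, p in enumerate(power)
        if band[0] <= k * sample_rate / len(signal) <= band[1]
    )
    presence = in_band / total if total > 0 else 0.0
    return energy * presence


def select_output_beam(beam_signals, sample_rate):
    """Return the index of the beam with the maximal score, plus all scores."""
    scores = [comprehensive_voice_energy(b, sample_rate) for b in beam_signals]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Multiplying the two scalars means a loud beam dominated by non-speech energy can still lose to a quieter beam that is mostly speech.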