Patent classifications
G10L25/60
Method, apparatus, device and computer storage medium for generating speech packet
A method, apparatus, device and computer storage medium for generating a speech packet are disclosed, relating to the technical field of speech. The method may include: providing a speech recording interface to a user; obtaining speech data entered by the user after obtaining an event of triggering speech recording on the speech recording interface; uploading the speech data to a server side in response to determining that the speech data meets requirements for training a speech synthesis model; and receiving a downloading address of the speech packet generated by the server side after training the speech synthesis model with the speech data. An ordinary user may customize a personalized speech packet through the speech recording interface provided by the client, without using professional recording equipment, which may substantially reduce the production cost of the speech packet.
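The client-side gate described above — checking that a recording "meets requirements for training a speech synthesis model" before upload — might look like the following sketch. The specific thresholds (minimum duration, minimum sample rate, silence check) are illustrative assumptions; the abstract does not specify the actual requirements.

```python
def meets_training_requirements(samples, sample_rate,
                                min_seconds=3.0, min_rate=16000):
    """Return True if a recording is plausibly usable for TTS training.

    samples: sequence of float PCM values in [-1.0, 1.0]
    sample_rate: capture rate in Hz
    Thresholds are hypothetical, for illustration only.
    """
    duration = len(samples) / sample_rate
    if sample_rate < min_rate or duration < min_seconds:
        return False
    # Reject near-silent recordings via a simple peak-amplitude check.
    peak = max(abs(s) for s in samples)
    return peak > 0.01
```

In the described flow, only recordings passing such a check would be uploaded to the server for model training.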
Systems and methods for avoiding inadvertently triggering a voice assistant
Systems and methods are provided herein for avoiding inadvertently triggering a voice assistant with audio played through a speaker. An audio signal is captured by sampling a microphone of the voice assistant at a sampling frequency that is higher than an expected finite sampling frequency of previously recorded audio played through the speaker to generate a voice data sample. A quality metric of the generated voice data sample is calculated by determining whether the generated voice data sample comprises artifacts resulting from previous compression or approximation by the expected finite sampling frequency. Based on the calculated quality metric, it is determined whether the captured audio signal is previously recorded audio played through the speaker. Responsive to determining that the captured audio signal is previously recorded audio played through the speaker, activation of the voice assistant is suppressed.
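One way to test for the spectral artifact the abstract describes: audio replayed from a recording sampled at rate f_s carries essentially no energy above f_s/2, so a capture made at a higher rate can be checked for that cutoff. The sketch below is an illustrative assumption of such a check (function names, the naive DFT, and the energy-ratio threshold are not from the patent).

```python
import cmath


def high_band_ratio(samples, capture_rate, expected_source_rate):
    """Fraction of spectral energy above the expected source's Nyquist limit.

    Uses a naive DFT for clarity; a real implementation would use an FFT.
    """
    n = len(samples)
    cutoff_bin = int((expected_source_rate / 2) / capture_rate * n)
    spectrum = [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n)))
                for k in range(n // 2)]
    total = sum(e * e for e in spectrum) or 1.0
    high = sum(e * e for e in spectrum[cutoff_bin:])
    return high / total


def looks_prerecorded(samples, capture_rate, expected_source_rate,
                      threshold=0.01):
    """Flag the capture as replayed audio if the high band is nearly empty."""
    return high_band_ratio(samples, capture_rate, expected_source_rate) < threshold
```

Live speech captured at, say, 48 kHz contains energy above 8 kHz; audio replayed from a 16 kHz recording does not, so its high-band ratio falls below the threshold.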
METHOD FOR GENERATING SPEECH PACKAGE, AND ELECTRONIC DEVICE
A method for generating a speech package, an electronic device and a storage medium are disclosed. The method includes: determining a number of texts to be displayed and a speech recording condition based on a type of a recording mode selection control, in response to the recording mode selection control being triggered; acquiring an amount of speech data matching the number, based on the speech recording condition; sending the speech data to a server; and acquiring a speech package generated by the server using the speech data.
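The mode-dependent setup step — the type of the triggered recording-mode control deciding how many prompt texts to show and which recording condition applies — can be sketched as a simple lookup. The mode names, text counts, and condition strings below are invented for illustration; the abstract does not name concrete modes.

```python
# Hypothetical recording modes; the real control types are not disclosed.
RECORDING_MODES = {
    "quick":   {"num_texts": 20,  "condition": "quiet room"},
    "premium": {"num_texts": 300, "condition": "studio-grade quiet"},
}


def setup_recording(mode):
    """Map a recording-mode control type to (number of texts, condition)."""
    cfg = RECORDING_MODES[mode]
    return cfg["num_texts"], cfg["condition"]
```

The client would then collect one recording per displayed text under the returned condition before sending the batch to the server.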
AUDIOMETRIC RECEIVER SYSTEM TO DETECT AND PROCESS AUDIO SIGNALS
In an approach for detecting and processing multiple audio signals simultaneously, an audiometric receiver system comprises a transmitter, wherein the transmitter comprises a digital signal processor, and wherein the digital signal processor comprises a quality check component, an amplifier or attenuator component, a mixer component, a modulator component, and an encrypter component; and a receiver, wherein the receiver comprises a decrypter component, a demodulator component, a splitter component, and a second amplifier or attenuator component.
ADAPTING SIBILANCE DETECTION BASED ON DETECTING SPECIFIC SOUNDS IN AN AUDIO SIGNAL
A method is disclosed herein for adapting parameters of a sibilance detector. Time-frequency features are extracted from an audio signal being received. Based on those time-frequency features, a determination is made of whether the audio signal includes a short-term feature or a long-term feature. In accordance with determining that the audio signal includes the short-term feature or the long-term feature, one or more parameters of a sibilance detector for detecting sibilance in the audio signal are adapted. Sibilance in the audio signal is detected using the sibilance detector with the one or more adapted parameters.
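The adaptation loop above can be sketched as adjusting a parameter set based on which feature type was detected. The parameter names, adjustment factors, and the energy-threshold detection rule are all illustrative assumptions; the abstract does not publish concrete values.

```python
# Hypothetical default detector parameters.
DEFAULT_PARAMS = {"band_low_hz": 4000.0, "band_high_hz": 10000.0,
                  "energy_threshold": 0.3}


def adapt_sibilance_params(params, has_short_term, has_long_term):
    """Return a copy of params adjusted for the detected feature type."""
    adapted = dict(params)
    if has_short_term:
        # Transient content present: raise the threshold to cut false positives.
        adapted["energy_threshold"] = params["energy_threshold"] * 1.5
    if has_long_term:
        # Sustained high-band content: narrow the analysis band.
        adapted["band_low_hz"] = params["band_low_hz"] + 1000.0
    return adapted


def detect_sibilance(frame_band_energy, params):
    """Flag a frame as sibilant if its in-band energy exceeds the threshold."""
    return frame_band_energy > params["energy_threshold"]
```

A frame that trips the default threshold may no longer trip the adapted one, which is the point of the adaptation: the detector becomes stricter when transient features are present.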
Device arbitration by multiple speech processing systems
A device can perform device arbitration, even when the device is unable to communicate with a remote system over a wide area network (e.g., the Internet). Upon detecting a wakeword in an utterance, the device can wait a period of time for data to arrive at the device, which, if received, indicates to the device that another speech interface device in the environment detected an utterance. If the device receives data prior to the period of time lapsing, the device can determine the earliest-occurring wakeword based on multiple wakeword occurrence times, and may designate whichever device that detected the wakeword first as the designated device to perform an action with respect to the user speech. To account for differences in sound capture latency between speech interface devices, a pre-calculated time offset value can be applied to wakeword occurrence time(s) during device arbitration.
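The arbitration rule described above — subtract each device's pre-calculated capture-latency offset from its reported wakeword time and let the earliest adjusted time win — can be sketched as follows. The data shapes and field names are assumptions for illustration, not from the patent.

```python
def arbitrate(reports, offsets_ms):
    """Pick the device whose offset-corrected wakeword time is earliest.

    reports:    {device_id: wakeword occurrence time in ms}
    offsets_ms: {device_id: pre-calculated sound-capture latency offset in ms}
    Devices without a known offset are assumed to have zero latency.
    """
    adjusted = {dev: t - offsets_ms.get(dev, 0.0)
                for dev, t in reports.items()}
    return min(adjusted, key=adjusted.get)
```

For example, a device that reports a later raw timestamp but has a larger known capture latency can still win arbitration once its offset is applied, which is exactly the correction the abstract motivates.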