G10L2021/0135

HEARING SYSTEM INCLUDING A HEARING INSTRUMENT AND METHOD FOR OPERATING THE HEARING INSTRUMENT
20230047868 · 2023-02-16

A hearing system includes a hearing instrument for capturing a sound signal from an environment of the hearing instrument. The captured sound signal is processed, and the processed sound signal is output to a user of the hearing instrument. In a speech recognition step, the captured sound signal is analyzed to recognize speech intervals, in which the captured sound signal contains speech. In a speech enhancement procedure performed during recognized speech intervals, the amplitude of the processed sound signal is periodically varied according to a temporal pattern that is consistent with a stress rhythmic pattern of the user. A method for operating the hearing instrument is also provided.
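The speech enhancement procedure can be illustrated with a minimal sketch, assuming the speech recognition step has already produced a per-sample speech mask. The names `enhance_speech`, `rhythm_hz`, and `depth` are hypothetical; the abstract does not say how the temporal pattern is derived from the user's stress rhythm, so a fixed raised-cosine gain stands in for it here.

```python
import math

def enhance_speech(samples, speech_mask, sample_rate, rhythm_hz=2.0, depth=0.25):
    """Periodically vary the gain, but only inside recognized speech intervals.

    rhythm_hz and depth are assumed stand-ins for the user's stress rhythm.
    """
    out = []
    for n, (x, is_speech) in enumerate(zip(samples, speech_mask)):
        if is_speech:
            phase = 2.0 * math.pi * rhythm_hz * n / sample_rate
            # Raised cosine: gain swings between 1.0 and 1.0 + depth.
            gain = 1.0 + depth * 0.5 * (1.0 - math.cos(phase))
        else:
            gain = 1.0  # non-speech samples pass through unchanged
        out.append(x * gain)
    return out
```

In this sketch the modulation phase runs continuously over the whole signal rather than restarting at each speech interval; either choice is compatible with the abstract.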

Speaker identity and content de-identification

One embodiment of the invention provides a method for speaker identity and content de-identification under privacy guarantees. The method comprises receiving input indicative of privacy protection levels to enforce, extracting features from a speech recorded in a voice recording, recognizing and extracting textual content from the speech, parsing the textual content to recognize privacy-sensitive personal information about an individual, generating de-identified textual content by anonymizing the personal information to an extent that satisfies the privacy protection levels and conceals the individual's identity, and mapping the de-identified textual content to a speaker who delivered the speech. The method further comprises generating a synthetic speaker identity based on other features that are dissimilar from the features to an extent that satisfies the privacy protection levels, and synthesizing a new speech waveform based on the synthetic speaker identity to deliver the de-identified textual content. The new speech waveform conceals the speaker's identity.
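The text de-identification step might look like the following minimal sketch, which covers only the anonymization of recognized textual content, not feature extraction or speech synthesis. The regular expressions, the placeholder labels, and the three numeric levels are all assumptions for illustration; a real system would likely use a trained entity recognizer and the privacy model named in the full specification.

```python
import re

# Hypothetical patterns for two kinds of personal information.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def deidentify(text, level):
    """Anonymize personal information to an extent set by `level`.

    level 0: leave the text unchanged
    level 1: replace each match with a typed placeholder such as [PHONE]
    level 2: replace each match with a generic [REDACTED] marker
    """
    if level == 0:
        return text
    for label, pattern in PATTERNS.items():
        placeholder = f"[{label}]" if level == 1 else "[REDACTED]"
        text = pattern.sub(placeholder, text)
    return text
```

For example, `deidentify("Call 555-123-4567 or mail jo@example.com.", 1)` yields `"Call [PHONE] or mail [EMAIL]."`, while level 2 hides even the type of the removed information.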

Removal of identifying traits of a user in a virtual environment

A virtual environment platform may receive, from a user device, a request to access a virtual reality (VR) environment and may verify, based on the request, a user of the user device to allow the user device access to the VR environment. The virtual environment platform may receive, after verifying the user of the user device, user voice input and user handwritten input from the user device. The virtual environment platform may generate processed user speech by processing the user voice input, wherein a characteristic of the processed user speech differs from a corresponding characteristic of the user voice input. The platform may also generate formatted user text by processing the user handwritten input, wherein the formatted user text is machine-encoded text. The virtual environment platform may cause the processed user speech to be audibly presented and the formatted user text to be visually presented in the VR environment.

PERSONALIZED VOICE CONVERSION SYSTEM

A personalized voice conversion system includes a cloud server and an intelligent device that communicates with the cloud server. The intelligent device uploads an original voice signal to the cloud server. The cloud server converts the original voice signal into an intelligible voice signal based on an intelligible voice conversion model. The intelligent device downloads and plays the intelligible voice signal. Based on the original voice signal and the corresponding intelligible voice signal, the cloud server and the intelligent device train an off-line voice conversion model provided to the intelligent device. When the intelligent device stops communicating with the cloud server, the intelligent device converts a new original voice signal into a new intelligible voice signal based on the off-line voice conversion model and plays the new intelligible voice signal.

Generation and detection of watermark for real-time voice conversion
11538485 · 2022-12-27

A method watermarks speech data by using a generator to generate speech data including a watermark. The generator is trained to generate the speech data including the watermark. The training process generates first speech data from the generator. The first speech data is configured to represent speech and includes a candidate watermark. The training also produces an inconsistency message as a function of at least one difference between the first speech data and at least authentic speech data. The training further includes transforming the first speech data, including the candidate watermark, using a watermark robustness module to produce transformed speech data including a transformed candidate watermark. The training further produces a watermark-detectability message, using a watermark detection machine learning system, relating to one or more desirable watermark features of the transformed candidate watermark.

METHOD OF CONVERTING SPEECH, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM
20220383876 · 2022-12-01

A method of converting speech, an electronic device, and a readable storage medium are provided, which relate to the field of artificial intelligence technologies such as speech and deep learning, and in particular to speech conversion technology. The method of converting speech includes: acquiring a first speech of a target speaker; acquiring a speech of an original speaker; extracting a first feature parameter of the first speech of the target speaker; extracting a second feature parameter of the speech of the original speaker; processing the first feature parameter and the second feature parameter to obtain Mel spectrum information; and converting the Mel spectrum information to output a second speech of the target speaker having a tone identical to a tone of the first speech of the target speaker and a content identical to a content of the speech of the original speaker.

METHODS AND SYSTEMS FOR IMAGE PROCESSING USING A LEARNING ENGINE

Systems and methods configured to train an autoencoder are disclosed. A training data set is generated comprising images of different faces. A first autoencoder configuration is generated, comprising a first encoder and a first decoder. The first autoencoder configuration is trained using dataset images, wherein weights associated with the first encoder and weights associated with the first decoder are modified. A second autoencoder configuration is generated comprising the first encoder and a second decoder. The second decoder is trained using images of a first target face. First encoder weights are substantially maintained, and weights associated with the second decoder are modified. An autoencoder comprising the trained first encoder and the trained second decoder generates an output using a source image of a first face having a facial expression, where the facial expression of the first face from the source image is applied to the first target face.

Real-Time Accent Conversion Model
20220358903 · 2022-11-10

Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data including the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent. The computing device is configured to convert the synthesized audio data into a synthesized version of the received speech content having the second accent.

AUDIO PROCESSING METHOD, AUDIO PROCESSING APPARATUS AND COMPUTER STORAGE MEDIUM

An audio processing method applied to a first terminal is described. The method includes: in response to receiving audio data input by a user at the first terminal and determining that a voice change function is turned on, determining change parameters; and, based on the change parameters, performing change processing on the audio data.
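The control flow of this method can be sketched as follows, assuming that the change parameters reduce to a single pitch factor chosen from a preset. The preset names and the resampling-based pitch change are illustrative assumptions; the abstract does not specify what the change parameters are or how the change processing is performed.

```python
def change_pitch(samples, factor):
    """Resample by linear interpolation.

    A factor above 1 raises the pitch and shortens the clip; this is a
    simple stand-in for whatever change processing the method uses.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(samples) < 2:
        return list(samples)
    n_out = max(1, int(len(samples) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor
        j = int(pos)
        if j >= len(samples) - 1:
            out.append(samples[-1])
        else:
            frac = pos - j
            out.append(samples[j] * (1.0 - frac) + samples[j + 1] * frac)
    return out

def process_audio(samples, voice_change_on, preset="none"):
    # Hypothetical presets standing in for the "change parameters".
    presets = {"none": 1.0, "high": 1.5, "low": 0.75}
    if not voice_change_on:
        return list(samples)          # function off: audio is left as received
    factor = presets[preset]          # determine the change parameters
    return change_pitch(samples, factor)  # perform the change processing
```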

VOICE CONVERSION DEVICE, VOICE CONVERSION METHOD, AND VOICE CONVERSION PROGRAM

The present invention provides a voice conversion apparatus and the like using a differential spectral method which is capable of implementing both high voice quality and real-time performance even in wideband. A voice conversion apparatus 10 includes: an acquiring unit 11 configured to acquire a signal of a voice of a subject; a dividing unit 12 configured to divide the signal into sub-band signals corresponding to a plurality of frequency bands; a converting unit configured to convert one or a plurality of sub-band signals corresponding to one or a plurality of lower frequency bands, out of the sub-band signals corresponding to the plurality of frequency bands; and a synthesizing unit 16 configured to generate a synthesized voice by synthesizing the one or plurality of sub-band signals after conversion and the remaining sub-band signals that are not converted.
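The divide, convert, and synthesize structure can be illustrated with a minimal two-band sketch. It assumes a moving-average low-pass split in place of the apparatus's actual filter bank, and a caller-supplied `convert` function in place of the differential spectral conversion, neither of which is detailed in the abstract. All function names are hypothetical.

```python
def split_bands(signal, window=5):
    """Two-band split: a moving-average low band plus the residual high band."""
    half = window // 2
    low = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        low.append(sum(signal[lo:hi]) / (hi - lo))
    high = [x - l for x, l in zip(signal, low)]
    return low, high

def convert_low_band(signal, convert, window=5):
    """Convert only the low band and add back the untouched high band."""
    low, high = split_bands(signal, window)
    converted = convert(low)  # conversion is applied to the low band only
    return [c + h for c, h in zip(converted, high)]  # high band passes through
```

Because the high band is defined as the residual, an identity `convert` reconstructs the input up to floating-point rounding, which makes it easy to check that the unconverted band really passes through unchanged.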