G10L21/007

Personalized Accent and/or Pace of Speaking Modulation for Audio/Video Streams
20230267941 · 2023-08-24

Aspects of the disclosure relate to generating personalized accent and/or pace of speaking modulation for audio/video streams. In some embodiments, a computing platform may train an artificial intelligence model on audio or video samples associated with different geographic regions. The computing platform may receive, via a communication interface, an audio or video stream associated with a first geographic region. The computing platform may identify a second geographic region different from the first geographic region. The computing platform may transform the audio or video stream to correspond to the second geographic region different from the first geographic region. The computing platform may send, via the communication interface, the transformed audio or video stream to a user device associated with the second geographic region.
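
The abstract does not say how pace is changed. The following is a minimal sketch that assumes pace modulation is done with a plain overlap-add (OLA) time stretch on a mono list of samples. The function name `ola_time_stretch`, the frame and hop sizes, and the idea of a per-region pace factor are all hypothetical. Plain OLA leaves audible artifacts. Practical systems more often use WSOLA or a phase vocoder, and accent transformation is not shown.

```python
import math

def hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def ola_time_stretch(x, rate, frame=256, hop_out=64):
    """Change speaking pace by overlap-add (hypothetical helper).

    rate > 1 speeds speech up (shorter output); rate < 1 slows it down.
    Frames are read every hop_out * rate input samples and written every
    hop_out output samples, so duration scales by roughly 1 / rate while
    pitch is roughly preserved.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    win = hann(frame)
    hop_in = hop_out * rate
    n_frames = max(1, int((len(x) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        start = int(round(k * hop_in))
        for i in range(frame):
            src = start + i
            s = x[src] if src < len(x) else 0.0  # zero-pad past the end
            y[k * hop_out + i] += s * win[i]
            norm[k * hop_out + i] += win[i]
    # Divide by the accumulated window weight to remove amplitude ripple.
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]
```

For example, `ola_time_stretch(samples, 1.25)` would make speech about 25% faster. How a target region would map to a particular rate is not specified in the abstract.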

Machine-learned differentiable digital signal processing

Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.
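
Placing DSP "within the training loop" only works if the DSP component is differentiable. A loss on the synthesized waveform can then be back-propagated to the parameters that drive it. The following is a minimal pure-Python sketch that assumes a fixed-f0 additive (harmonic) synthesizer whose harmonic amplitudes are the learnable parameters. The gradient of the mean-squared error is derived by hand. In a full system a neural network would predict these parameters. Here they are optimized directly, and all function names are hypothetical.

```python
import math

def harmonic_basis(f0, n_harmonics, n_samples, sr):
    # basis[k][t] = sin(2*pi*(k+1)*f0*t/sr): one sinusoid per harmonic.
    return [[math.sin(2 * math.pi * (k + 1) * f0 * t / sr) for t in range(n_samples)]
            for k in range(n_harmonics)]

def synthesize(amps, basis):
    # Additive synthesis y[t] = sum_k amps[k] * basis[k][t]; linear in amps.
    n = len(basis[0])
    return [sum(a * b[t] for a, b in zip(amps, basis)) for t in range(n)]

def mse_and_grad(amps, basis, target):
    y = synthesize(amps, basis)
    n = len(target)
    err = [yt - tt for yt, tt in zip(y, target)]
    loss = sum(e * e for e in err) / n
    # Chain rule through the synthesizer: dL/da_k = (2/n) * sum_t err[t] * basis[k][t]
    grad = [2.0 / n * sum(e * b for e, b in zip(err, bk)) for bk in basis]
    return loss, grad

def fit_amplitudes(target, f0, n_harmonics, sr, lr=0.5, steps=200):
    # Gradient descent on the harmonic amplitudes through the DSP block.
    basis = harmonic_basis(f0, n_harmonics, len(target), sr)
    amps = [0.0] * n_harmonics
    loss = float("inf")
    for _ in range(steps):
        loss, grad = mse_and_grad(amps, basis, target)
        amps = [a - lr * g for a, g in zip(amps, grad)]
    return amps, loss
```

The synthesizer is small and fixed, so the learnable part only has to predict a few control parameters rather than raw samples. This is one way to read the abstract's claim of smaller models.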

AUDIO SIGNAL CONVERSION MODEL LEARNING APPARATUS, AUDIO SIGNAL CONVERSION APPARATUS, AUDIO SIGNAL CONVERSION MODEL LEARNING METHOD AND PROGRAM

A voice signal conversion model learning device including: a generation unit configured to generate a conversion destination voice signal on the basis of an input voice signal that is a voice signal of an input voice and conversion destination attribute information indicating an attribute of a voice represented by the conversion destination voice signal that is a voice signal of a conversion destination of the input voice signal; and an identification unit configured to execute a voice estimation process of estimating whether a voice signal represents a voice actually uttered by a person on the basis of the conversion destination voice signal, wherein the generation unit executes characteristic processing that is processing based on a neural network with respect to information indicating characteristics of the input voice signal and processing of converting a result of the characteristic processing based on a conversion mapping that is a mapping updated in accordance with an estimation result of the identification unit and is a mapping according to the conversion destination voice signal, and the generation unit and the identification unit perform learning on the basis of an estimation result of the voice estimation process.
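
The claim applies a "conversion mapping" to the output of the neural characteristic processing. The mapping is chosen according to the conversion destination and updated from the identification unit's estimates. The following is a minimal sketch of the mapping step only. It assumes the mapping is a per-attribute affine (scale and shift) transform on a feature vector, similar to conditional normalization. The class, the attribute keys, and the externally supplied gradient are hypothetical, and the neural network and identification unit are not shown.

```python
from dataclasses import dataclass, field

@dataclass
class ConversionMapping:
    """Hypothetical per-attribute affine mapping: out = scale[a] * z + shift[a]."""
    dim: int
    scale: dict = field(default_factory=dict)
    shift: dict = field(default_factory=dict)

    def _ensure(self, attr):
        # Each new destination attribute starts as the identity mapping.
        if attr not in self.scale:
            self.scale[attr] = [1.0] * self.dim
            self.shift[attr] = [0.0] * self.dim

    def apply(self, features, attr):
        self._ensure(attr)
        return [s * z + b for s, z, b in zip(self.scale[attr], features, self.shift[attr])]

    def update(self, features, attr, grad_out, lr=0.01):
        # grad_out = dLoss/d(output), assumed to be back-propagated from the
        # identification unit. d out_i/d scale_i = z_i and d out_i/d shift_i = 1.
        self._ensure(attr)
        self.scale[attr] = [s - lr * g * z
                            for s, g, z in zip(self.scale[attr], grad_out, features)]
        self.shift[attr] = [b - lr * g for b, g in zip(self.shift[attr], grad_out)]
```

Each destination attribute has its own parameters, so updating the mapping for one attribute leaves the others unchanged.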

SOUND SYNTHESIZING METHOD AND PROGRAM
20230260493 · 2023-08-17

A sound synthesizing method according to one aspect of the present disclosure is realized by a computer and includes receiving musical score data and acoustic data via a user interface; and generating, based on each of the musical score data and the acoustic data, acoustic features of a sound waveform having a desired timbre.
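
One acoustic feature that can be read directly from musical score data is a frame-level fundamental-frequency (f0) contour. The following is a minimal sketch that assumes the score is a list of (MIDI note number, duration in seconds) pairs, with `None` for rests. It uses 12-tone equal temperament with A4 = 440 Hz at MIDI note 69. The frame rate and function names are hypothetical, and the timbre-generating model in the disclosure is not shown.

```python
def midi_to_hz(note, a4=440.0):
    # 12-tone equal temperament: f = 440 * 2 ** ((n - 69) / 12)
    return a4 * 2.0 ** ((note - 69) / 12.0)

def score_to_f0(score, frame_rate=100):
    """score: list of (midi_note or None for a rest, duration_seconds).

    Returns one f0 value in Hz per frame; 0.0 marks unvoiced (rest) frames.
    """
    f0 = []
    for note, dur in score:
        n_frames = int(round(dur * frame_rate))
        value = 0.0 if note is None else midi_to_hz(note)
        f0.extend([value] * n_frames)
    return f0
```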

DATA ANONYMIZATION FOR DATA LABELING AND DEVELOPMENT PURPOSES
20220129582 · 2022-04-28

A method and system are disclosed for anonymizing data for labeling and development purposes. A data storage backend has a database of non-anonymous data that is received from a data source. An anonymization engine of the data storage backend generates anonymized data by removing personally identifiable information from the non-anonymous data. These anonymized data are made available to human labelers who manually provide labels based on the anonymized data using a data labeling tool. These labels are then stored in association with the corresponding non-anonymous data, which can then be used for training one or more machine learning models. In this way, non-anonymous data having personally identifiable information can be manually labelled for development purposes without exposing the personally identifiable information to any human labelers.
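
The key point of the flow is that labelers see only anonymized text, while their labels are stored against the original record. The following is a minimal sketch of that flow. The abstract does not say how personally identifiable information is removed, so this sketch uses regular expressions for e-mail addresses and phone-number-like digit runs as a stand-in. The patterns, the `Backend` class, and the record layout are hypothetical, and regex scrubbing alone is far from complete anonymization.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns; a real anonymization engine would cover many more PII types.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def anonymize(text):
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

@dataclass
class Backend:
    raw: dict = field(default_factory=dict)     # record_id -> non-anonymous data
    labels: dict = field(default_factory=dict)  # record_id -> human-provided label

    def ingest(self, record_id, text):
        self.raw[record_id] = text

    def labeling_queue(self):
        # Labelers only ever receive anonymized text, keyed by record id.
        return [(rid, anonymize(t)) for rid, t in self.raw.items()]

    def submit_label(self, record_id, label):
        # The label is attached to the ORIGINAL record, not the anonymized copy.
        self.labels[record_id] = label

    def training_set(self):
        # Pairs of non-anonymous data and labels for model training.
        return [(self.raw[rid], lab) for rid, lab in self.labels.items()]
```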

Singing voice conversion
11721318 · 2023-08-08

A method, computer program, and computer system are provided for converting a first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.
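
The abstract aligns encoded phonemes to target acoustic frames but does not say how. A common stand-in from non-autoregressive TTS is a length regulator, as in FastSpeech, which repeats each phoneme encoding for its duration in frames. This is a swapped-in technique, not necessarily the patent's method. The following is a minimal sketch that assumes integer per-phoneme frame durations are already known. In singing, these might come from note lengths. The function name is hypothetical, and the recursive mel-spectrogram generation is not shown.

```python
def length_regulate(encodings, durations):
    """Expand per-phoneme encodings to frame level (hypothetical helper).

    encodings: list of per-phoneme vectors (lists of floats).
    durations: number of target acoustic frames each phoneme spans.
    Returns (frames, frame_to_phoneme) so each frame records its source phoneme.
    """
    if len(encodings) != len(durations):
        raise ValueError("one duration per phoneme is required")
    frames, frame_to_phoneme = [], []
    for idx, (enc, dur) in enumerate(zip(encodings, durations)):
        if dur < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector per frame so frames do not share one list object.
        frames.extend(list(enc) for _ in range(dur))
        frame_to_phoneme.extend([idx] * dur)
    return frames, frame_to_phoneme
```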

VOICE CONVERSION LEARNING DEVICE, VOICE CONVERSION DEVICE, METHOD, AND PROGRAM

The aim is to enable conversion to a voice having a desired attribute. A learning unit trains a converter to minimize the value of a learning criterion of the converter, trains a voice identifier to minimize the value of a learning criterion of the voice identifier, and trains an attribute identifier to minimize the value of a learning criterion of the attribute identifier.
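
The abstract names three learning criteria but does not define them. The following is a minimal sketch that assumes they take a common StarGAN-VC-like form:

- The voice identifier is a real/converted discriminator trained with binary cross-entropy.
- The identifier of the voice's attribute is trained with cross-entropy on real speech.
- The converter tries to fool the voice identifier and to make the converted speech classified as the target attribute.

Network outputs are passed in as probabilities so that no model code is needed. All function names and the weight `lam` are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)

def bce(p, target):
    # Binary cross-entropy for a single probability p in (0, 1).
    return -(target * math.log(p + EPS) + (1 - target) * math.log(1 - p + EPS))

def voice_identifier_loss(d_real, d_fake):
    # The voice identifier should output 1 for real speech and 0 for converted speech.
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def attribute_identifier_loss(class_probs_real, true_attr):
    # Cross-entropy: recognize the attribute of REAL speech.
    return -math.log(class_probs_real[true_attr] + EPS)

def converter_loss(d_fake, class_probs_fake, target_attr, lam=1.0):
    # Fool the voice identifier (non-saturating form) and make the converted
    # speech be classified as the desired target attribute.
    adv = bce(d_fake, 1.0)
    cls = -math.log(class_probs_fake[target_attr] + EPS)
    return adv + lam * cls
```

Each criterion is minimized only with respect to its own network's parameters, and the three updates alternate during training.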

VOICE PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND COMPUTER READABLE MEDIUM
20230306979 · 2023-09-28

A voice processing method, comprising: segmenting a voice to be processed into at least one voice segment; generating at least one first voice on the basis of a clustering result of the at least one voice segment; performing feature extraction on each of the at least one first voice, to obtain a voiceprint feature vector corresponding to each first voice; and generating a second voice on the basis of the voiceprint feature vector, the second voice being an unmixed voice of the same sound source. Further disclosed are a voice processing apparatus, an electronic device, and a computer readable medium. By performing feature extraction on the first voice and further performing voice separation on the first voice, a more accurate second voice is obtained, thereby improving the overall voice segmentation effect.
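
The method clusters voice segments and groups them into "first voices", but the abstract does not name the clustering algorithm. The following is a minimal sketch that assumes each segment already has a voiceprint embedding. Segments are clustered greedily by cosine similarity to running cluster means, using a hypothetical threshold. The function names and the threshold are assumptions, and the separation step that yields the unmixed "second voice" is not shown.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_segments(embeddings, threshold=0.8):
    """Greedy online clustering of segment embeddings (hypothetical helper).

    A segment joins the most similar existing cluster if its cosine similarity
    to that cluster's mean embedding is >= threshold; otherwise it starts a new
    cluster. Returns one cluster label per segment.
    """
    centroids, counts, labels = [], [], []
    for emb in embeddings:
        best, best_sim = -1, -1.0
        for c, cen in enumerate(centroids):
            sim = cosine(emb, cen)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            n = counts[best]
            # Update the running mean embedding of the chosen cluster.
            centroids[best] = [(m * n + e) / (n + 1) for m, e in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels

def group_segments(segments, labels):
    # Gather the segments of each cluster: one candidate "first voice" per label.
    groups = {}
    for seg, lab in zip(segments, labels):
        groups.setdefault(lab, []).append(seg)
    return groups
```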
