G10L21/01

METHOD AND SYSTEM FOR TIME AND FEATURE MODIFICATION OF SIGNALS

The application relates to a computer-implemented method and system for modifying at least one feature of an input audio signal based on features in a guide audio signal. The method comprises: determining matchable and unmatchable sections of the guide and input audio signals; generating a time-alignment path for modifying the at least one feature of the input audio signal in the matchable sections of the input audio signal based on corresponding features in the matchable sections of the guide audio signal; and, based on the time-alignment path, modifying the at least one feature in the matchable sections of the input audio signal.
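The time-alignment step described above resembles classical dynamic time warping between frame-level features of the guide and input signals. A minimal sketch under that assumption (the function name, scalar features, and distance metric are illustrative, not from the patent):

```python
import numpy as np

def dtw_path(guide_feats, input_feats):
    """Compute a time-alignment path between two feature sequences
    via dynamic time warping (absolute distance on scalar frames)."""
    n, m = len(guide_feats), len(input_feats)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(guide_feats[i - 1] - input_feats[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],  # match
                                 cost[i - 1, j],      # guide advances
                                 cost[i, j - 1])      # input advances
    # Backtrack to recover the path of (guide_frame, input_frame) pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

The recovered path pairs each matchable input frame with a guide frame, which is the information the modification step would consume.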

Sound processing method, sound processing apparatus, and recording medium
11646044 · 2023-05-09

A method obtains a first sound signal representative of a first sound, including a first spectrum envelope contour and a first reference spectrum envelope contour; obtains a second sound signal, representative of a second sound differing in sound characteristics from the first sound, including a second spectrum envelope contour and a second reference spectrum envelope contour; generates a synthesis spectrum envelope contour by transforming the first spectrum envelope contour based on a first difference between the first spectrum envelope contour and the first reference spectrum envelope contour at a first time point of the first sound signal, and a second difference between the second spectrum envelope contour and the second reference spectrum envelope contour at a second time point of the second sound signal; and generates a third sound signal representative of the first sound that has been transformed using the generated synthesis spectrum envelope contour.
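One plausible reading of the envelope transformation above is that the first sound's deviation from its reference contour is swapped for the second sound's deviation. A sketch under that assumption (the combination rule and function name are illustrative, not the claimed formula):

```python
import numpy as np

def synthesize_envelope(env1, ref1, env2, ref2):
    """Sketch: transform the first spectrum envelope contour by replacing
    its deviation from its reference (env1 - ref1) with the second
    sound's deviation (env2 - ref2). Contours are per-bin arrays taken
    at the first and second time points respectively."""
    diff1 = env1 - ref1   # first difference, at the first time point
    diff2 = env2 - ref2   # second difference, at the second time point
    return env1 - diff1 + diff2
```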

Voice signal processing apparatus and voice signal processing method
09761242 · 2017-09-12

A voice signal processing apparatus and a voice signal processing method are provided. A first sampling point of an m.sup.th original frequency-lowered signal frame phase-matched to the sampling point corresponding to a phase reference sampling point number is determined according to the phase reference sampling point number of an (m−1).sup.th original frequency-lowered signal frame corresponding to a middle sampling point of an (m−1).sup.th renovating frequency-lowered signal frame. The q consecutive sampling points starting from the first sampling point are used as the sampling points of an m.sup.th renovating frequency-lowered signal frame.
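A rough sketch of the frame-renovation idea: find the first sampling point in the current original frame that is phase-matched to a reference sample, then take q consecutive samples from there. The matching criterion below (same amplitude, same slope direction) is a simplified stand-in for the patent's phase-reference-sampling-point scheme:

```python
import numpy as np

def renovate_frame(orig_frame, ref_value, ref_slope_sign, q):
    """Return q consecutive samples starting at the first sampling point
    of orig_frame that is phase-matched to a reference sample, where
    "phase-matched" is approximated as equal amplitude and equal slope
    direction (an illustrative criterion)."""
    for k in range(len(orig_frame) - q):
        slope = orig_frame[k + 1] - orig_frame[k]
        if abs(orig_frame[k] - ref_value) < 1e-6 and np.sign(slope) == ref_slope_sign:
            return orig_frame[k:k + q]
    raise ValueError("no phase-matched sampling point found")
```

Starting each renovated frame at a phase-matched point avoids phase discontinuities at frame boundaries.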

Voice conversion training method and server and computer readable storage medium

The present disclosure discloses a voice conversion training method. The method includes: forming a first training data set including a plurality of training voice data groups; selecting two of the training voice data groups from the first training data set to input into a voice conversion neural network for training; forming a second training data set including the first training data set and a first source speaker voice data group; inputting one of the training voice data groups selected from the first training data set and the first source speaker voice data group into the network for training; forming a third training data set including a second source speaker voice data group and a personalized voice data group that form a parallel corpus with respect to each other; and inputting the second source speaker voice data group and the personalized voice data group into the network for training.
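The abstract describes a three-stage curriculum over progressively more specific data. A data-pipeline sketch of that staging (group names and the pairing logic are illustrative assumptions; the network itself is omitted):

```python
import random

def build_training_stages(training_groups, source1, source2, personalized):
    """Sketch of the three training stages: many-to-many pretraining,
    adaptation to the first source speaker, then fine-tuning on the
    parallel corpus of the second source speaker and the target voice."""
    # Stage 1: pairs drawn from the first training data set.
    first_set = list(training_groups)
    stage1_pair = random.sample(first_set, 2)

    # Stage 2: the second set adds the first source speaker's data;
    # pair one group from the first set with the source speaker group.
    second_set = first_set + [source1]
    stage2_pair = (random.choice(first_set), source1)

    # Stage 3: the third set is the parallel corpus of the second
    # source speaker's data and the personalized (target) voice data.
    third_set = [source2, personalized]
    stage3_pair = (source2, personalized)

    return stage1_pair, stage2_pair, stage3_pair
```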

Music synthesis method, system, terminal and computer-readable storage medium

A music synthesis method, a system, a terminal, and a computer-readable storage medium are provided. The method includes: receiving a track selected by a user; obtaining a text; receiving speech data recorded by the user on the basis of the text; and forming a music file in accordance with the selected track and the speech data. Through the music synthesis method of the present application, the user's speech can be combined with the track and an optimal musical effect can be simulated, so that the user can participate in the singing and presentation of a piece of music, making music more entertaining.
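The four-step flow in the abstract can be sketched as a simple pipeline; the callables below are hypothetical placeholders for the user-facing and mixing components, not APIs from the patent:

```python
def synthesize_music(select_track, get_lyrics, record_speech, mix):
    """Sketch of the flow: track selection, text retrieval, user
    recording, and combination into a music file."""
    track = select_track()        # track chosen by the user
    text = get_lyrics(track)      # lyric text for that track
    speech = record_speech(text)  # user records speech from the text
    return mix(track, speech)     # form the music file
```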