G10L25/30

Volume leveler controller and controlling method

Volume leveler controller and controlling method are disclosed. In one embodiment, a volume leveler controller includes an audio content classifier for identifying the content type of an audio signal in real time, and an adjusting unit for adjusting a volume leveler in a continuous manner based on the identified content type. The adjusting unit may be configured to positively correlate the dynamic gain of the volume leveler with informative content types of the audio signal, and to negatively correlate the dynamic gain with interfering content types of the audio signal.
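
The continuous adjustment described above can be sketched as a gain factor driven by classifier confidences. Everything here (the function name, the `alpha` sensitivity parameter, and the choice of speech/noise as the informative/interfering types) is a hypothetical illustration, not the patented implementation:

```python
def leveler_gain(base_gain, speech_conf, noise_conf, alpha=1.0):
    """Continuously adjust a volume leveler's dynamic gain.

    speech_conf / noise_conf are classifier confidences in [0, 1] for an
    informative type (speech) and an interfering type (background noise).
    The gain rises with informative confidence and falls with interfering
    confidence, never dropping below zero.
    """
    factor = 1.0 + alpha * (speech_conf - noise_conf)
    return max(0.0, base_gain * factor)
```

A confidence-weighted factor like this changes smoothly with the classifier output, avoiding the audible jumps a hard on/off switch between content types would cause.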

DYNAMIC TEMPERED SAMPLING IN GENERATIVE MODELS INFERENCE
20230237986 · 2023-07-27

A method of sampling output audio samples includes, during a packet loss concealment event, obtaining a sequence of previous output audio samples. At each time step during the event, the method includes generating a probability distribution over possible output audio samples for the time step. Each possible sample has a respective probability indicating a likelihood that the corresponding sample represents a portion of an utterance at the time step. The method also includes determining a temperature sampling value based on a function of the number of time steps that precede the time step and on initial, minimum, and maximum temperature sampling values. The method also includes applying the temperature sampling value to the probability distribution to adjust the probability of selecting each possible sample, and randomly selecting one of the possible samples based on the adjusted probabilities. The method also includes generating synthesized speech using the randomly selected sample.
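
As a rough sketch, the schedule and the tempered draw might look like the following. The exponential-decay form and the `decay` rate are assumptions; the abstract only states that the temperature is a function of the step count and of initial, minimum, and maximum values:

```python
import math
import random

def temperature_at(step, t_init, t_min, t_max, decay=0.05):
    """Assumed schedule: start near t_init and decay toward t_min as the
    concealment event lengthens, clamped to [t_min, t_max]."""
    t = t_min + (t_init - t_min) * math.exp(-decay * step)
    return min(t_max, max(t_min, t))

def tempered_sample(logits, temperature, rng=random):
    """Apply the temperature to a categorical distribution and draw a sample."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Lower temperatures sharpen the distribution (safer, more repetitive samples); decaying the temperature over a long concealment event biases the model toward conservative continuations as uncertainty grows.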

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
20230233931 · 2023-07-27

An information processing apparatus is configured to acquire sound data, acquire, as teacher vibration data, vibration data that is created on the basis of the sound data and that is used to cause a vibration device to vibrate, and execute machine learning by using the sound data and the teacher vibration data, to generate learned model data that is used to convert an input sound waveform into an output vibration waveform. The information processing apparatus executes the machine learning by using a value obtained by analysis of a frequency spectrum of the sound data as an input feature amount.
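
The "frequency spectrum analysis" input feature could be as simple as a magnitude spectrum per audio frame. The naive DFT below is only a stand-in for whatever analysis the apparatus actually performs:

```python
import cmath

def spectrum_features(frame):
    """Magnitude spectrum of one audio frame via a naive DFT, normalized by
    frame length; a stand-in for the abstract's 'value obtained by analysis
    of a frequency spectrum' used as the input feature amount."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(frame))) / n
            for k in range(n // 2 + 1)]
```

In training, each spectrum vector would be paired with the teacher vibration waveform for the same frame, so the learned model maps spectral content to actuator output rather than raw samples to samples.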

DISEASE PREDICTION DEVICE, PREDICTION MODEL GENERATION DEVICE, AND DISEASE PREDICTION PROGRAM

Provided is a device that performs machine learning by extracting acoustic feature values from conversational voice data and predicts a disease level of a subject on the basis of a disease prediction model generated by the machine learning, the device including: a matrix calculation unit 23 that calculates a spatial delay matrix using a relation value of a plurality of types of acoustic feature values; and a matrix decomposition unit 24 that calculates a matrix decomposition value from the spatial delay matrix. A relation value reflecting a non-linear, non-stationary relationship among the feature values can be obtained by calculating at least one of a DCCA coefficient and a mutual information amount as the relation value of the plurality of types of acoustic feature values, and the disease level of the subject can be predicted on the basis of that relation value.
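
Of the two relation values named in the abstract, the mutual information amount is straightforward to sketch for quantized feature series. The discretization into bins and the natural-log base are assumptions here:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information (in nats) between two quantized acoustic
    feature series of equal length; one possible 'relation value' alongside
    the DCCA coefficient mentioned in the abstract."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())
```

Unlike a linear correlation coefficient, mutual information captures arbitrary statistical dependence between the feature series, which fits the abstract's emphasis on non-linear, non-stationary relationships.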

WEARABLE DEVICE FOR PROVIDING MULTI-MODALITY AND OPERATION METHOD THEREOF

Provided are a wearable device for providing a multi-modality and an operation method of the wearable device. The operation method includes: obtaining source data including at least one of image data, text data, or sound data; determining whether the image data, the text data, and the sound data are included in the source data; based on determining that at least one of the image data, the text data, or the sound data is not included in the source data, generating the data not included in the source data by using a generator of a generative adversarial network (GAN) that receives the source data as an input; generating a pulse-width modulation (PWM) signal based on the sound data; and outputting the multi-modality based on the image data, the text data, the sound data, and the PWM signal.
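
The sound-to-PWM step can be illustrated with a minimal amplitude-to-duty-cycle mapping. The linear mapping and the `full_scale` parameter are assumptions, as the abstract does not specify how the PWM signal is derived from the sound data:

```python
def pwm_duty_cycles(samples, full_scale=1.0):
    """Map sound sample amplitudes to PWM duty cycles in [0, 1]: louder
    samples drive the haptic actuator with a wider pulse. Amplitudes beyond
    full_scale saturate at a 100% duty cycle."""
    return [min(1.0, abs(s) / full_scale) for s in samples]
```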

METHOD FOR PROCESSING AN AUDIO STREAM AND CORRESPONDING SYSTEM

A method and a system for processing an audio stream are described, wherein at least one database of classified voices and at least one database of classified background sounds are provided, and the classified voices and background sounds are compared with the voices and sounds extracted from a suitably re-processed audio stream in order to identify possible matches.

SPEECH SYNTHESIS METHOD, AND ELECTRONIC DEVICE
20230005466 · 2023-01-05

The disclosure provides a speech synthesis method and an electronic device. The technical solution is as follows. A text to be synthesized and speech features of a target user are obtained. First acoustic features are predicted based on the text to be synthesized and the speech features. A target template audio is obtained from a template audio library based on the text to be synthesized, and second acoustic features of the target template audio are extracted. Target acoustic features are generated by splicing the first acoustic features and the second acoustic features. Speech synthesis is then performed on the text to be synthesized based on the target acoustic features and the speech features, to generate a target speech for the text to be synthesized.
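
One plausible reading of "splicing" is frame-wise concatenation of the predicted and template feature vectors; the abstract does not specify the axis, so this interpretation is an assumption:

```python
def splice_features(first, second):
    """Frame-wise concatenation of two aligned acoustic feature sequences:
    each output frame is the predicted feature vector joined with the
    template-audio feature vector for the same frame."""
    assert len(first) == len(second), "feature sequences must align frame-wise"
    return [f + s for f, s in zip(first, second)]
```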

Machine learning method, audio source separation apparatus, and electronic instrument
11568857 · 2023-01-31

A machine learning method for training a learning model includes: transforming a first audio type of audio data into a first image type of image data, wherein a first audio component and a second audio component are mixed in the first audio type of audio data, and the first image type of image data corresponds to the first audio type of audio data; transforming a second audio type of audio data into a second image type of image data, wherein the second audio type of audio data includes the first audio component without mixture of the second audio component, and the second image type of image data corresponds to the second audio type of audio data; and performing machine learning on the learning model with training data including sets of the first image type of image data and the second image type of image data.
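
The audio-to-image transform is most naturally read as a spectrogram: rows indexed by frame (time), columns by DFT bin (frequency). The frame length and hop size below are arbitrary illustration values, and the naive DFT stands in for whatever transform the patent uses:

```python
import cmath

def audio_to_image(samples, frame_len=8, hop=4):
    """Convert an audio sample sequence into a 2-D magnitude 'image':
    one row per analysis frame, one column per DFT frequency bin."""
    image = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        row = [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / frame_len)
                       for i, x in enumerate(frame)))
               for k in range(frame_len // 2 + 1)]
        image.append(row)
    return image
```

Pairing the image of the mixed audio with the image of the isolated first component yields exactly the kind of input/target training sets the method describes, letting image-to-image models perform the separation.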

Audio stem identification systems and methods

Methods, systems and computer program products are provided for determining acoustic feature vectors of query and target items in a first vector space, and mapping the acoustic feature vectors to a second vector space having a lower dimension. The distribution of vectors in the second vector space can then be used to identify items from the same songs, and/or items that are complementary. A mapping function is trained using a machine learning algorithm, such that complementary audio items are closer in the second vector space than the first, according to a given distance metric.
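
The mapping to the second, lower-dimensional space can be sketched as a learned linear projection plus a distance metric. In the patent the mapping function is trained so that complementary items end up closer; the toy weight matrix here is an arbitrary placeholder for those learned parameters:

```python
import math

def project(vec, weights):
    """Map an acoustic feature vector from the first vector space to the
    second, lower-dimensional one; `weights` plays the role of the learned
    mapping (one row per output dimension)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def euclidean(a, b):
    """Distance metric used to compare mapped query and target items."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

At query time, a stem would be projected once and compared against the projected catalog; small distances under the metric flag items from the same song or complementary stems.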