Patent classifications
G10L21/007
ADAPTIVE COEFFICIENTS AND SAMPLES ELIMINATION FOR CIRCULAR CONVOLUTION
Technologies are disclosed for improving the efficiency of real-time audio processing, and specifically for improving the efficiency of continuously modifying a real-time audio signal. Efficiency is improved by reducing memory bandwidth requirements and by reducing the amount of processing used to modify the real-time audio signal. In some configurations, memory bandwidth requirements are reduced by selectively transferring active samples in the frequency domain, e.g., avoiding the transfer of samples with amplitudes of zero or near-zero. This is particularly important when specialized hardware retrieves samples from main memory in real time. In some configurations, the amount of processing needed to modify the audio signal is reduced by omitting operations that do not meaningfully affect the output audio signal. For example, a multiplication of two samples may be skipped when at least one of them has an amplitude of zero or near-zero.
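The zero-skipping multiplication described in the abstract can be sketched as below. This is a minimal illustration, not the patented implementation: the function name, threshold value, and toy signals are all assumptions, and the sparsity test here simply masks out frequency bins where either operand is at or below a small magnitude threshold before multiplying.

```python
import numpy as np

def sparse_freq_multiply(x_spec, h_spec, threshold=1e-8):
    """Pointwise multiply two frequency-domain arrays, performing the
    multiplication only on 'active' bins where both operands exceed the
    magnitude threshold. Skipped bins would contribute (near) zero to the
    output anyway, so omitting them saves work and data movement."""
    out = np.zeros_like(x_spec)
    active = (np.abs(x_spec) > threshold) & (np.abs(h_spec) > threshold)
    out[active] = x_spec[active] * h_spec[active]
    return out

# Circular convolution via FFT, multiplying only the active bins:
x = np.array([1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
h = np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = np.real(np.fft.ifft(sparse_freq_multiply(np.fft.fft(x), np.fft.fft(h))))
```

With a sufficiently small threshold the result matches the full pointwise product to within numerical tolerance, since every skipped product is itself near zero.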
GLOBAL PROSODY STYLE TRANSFER WITHOUT TEXT TRANSCRIPTIONS
A computer-implemented method is provided of using a machine learning model for disentanglement of prosody in spoken natural language. The method includes encoding, by a computing device, the spoken natural language to produce content code. The method further includes resampling, by the computing device without text transcriptions, the content code to obscure the prosody by applying an unsupervised technique to the machine learning model to generate prosody-obscured content code. The method additionally includes decoding, by the computing device, the prosody-obscured content code to synthesize speech indirectly based upon the content code.
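One common way such unsupervised resampling is realized (a sketch under assumptions, not the claimed method) is to cut the content-code sequence into random-length segments and stretch or compress each segment by a random factor, so the timing of the code no longer carries reliable prosodic information. All names and parameter values below are illustrative:

```python
import numpy as np

def random_resample(content_code, rng, min_seg=2, max_seg=5,
                    factors=(0.5, 1.0, 1.5)):
    """Obscure rhythm/prosody by splitting the content code (frames x dims)
    into random-length segments and resampling each segment along time by
    a randomly chosen factor (nearest-neighbour interpolation)."""
    T = content_code.shape[0]
    out, t = [], 0
    while t < T:
        seg_len = int(rng.integers(min_seg, max_seg + 1))
        seg = content_code[t:t + seg_len]
        t += seg_len
        factor = float(rng.choice(factors))
        new_len = max(1, int(round(len(seg) * factor)))
        # map each output frame back to a source frame in the segment
        idx = np.minimum((np.arange(new_len) / factor).astype(int),
                         len(seg) - 1)
        out.append(seg[idx])
    return np.concatenate(out, axis=0)

rng = np.random.default_rng(0)
code = np.random.default_rng(1).normal(size=(40, 8))  # toy content code
obscured = random_resample(code, rng)
```

The feature dimension is preserved while the time axis is randomly warped, which is the sense in which prosody is "obscured" before decoding.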
Encoding machine-learning models and determining ownership of machine-learning models
Methods, systems, and non-transitory computer readable storage media are disclosed for generating a machine-learning model and encoding ownership information in the machine-learning model. For example, the disclosed system can generate parameters of a machine-learning model utilizing digital content items modified by a filter. The disclosed system can then process digital content items modified by the filter to generate first outputs based on the digital content items being modified by the filter. The disclosed system can also process digital content items unmodified by the filter to generate second outputs based on the digital content items not being modified by the filter. The disclosed system can determine that the second outputs are degraded relative to the first outputs. Accordingly, the disclosed system can determine ownership of the machine-learning model based on detecting that information about the filter is embedded in parameters of the machine-learning model.
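The ownership test described above can be sketched as a comparison of output quality on filtered versus unfiltered inputs. This is a hypothetical harness, not the disclosed system: `model`, `apply_filter`, and `quality` are placeholder callables standing in for the trained model, the secret filter, and an output quality metric, and the margin is an assumed decision threshold.

```python
import numpy as np

def verify_ownership(model, items, apply_filter, quality, margin=0.1):
    """Ownership check sketch: a model whose parameters embed the filter
    should score measurably better on filter-modified inputs (first
    outputs) than on the same inputs left unmodified (second outputs)."""
    first = np.mean([quality(model(apply_filter(x))) for x in items])
    second = np.mean([quality(model(x)) for x in items])
    # Degraded second outputs suggest the filter is embedded in the model.
    return (first - second) > margin

# Toy stand-ins: a model that "expects" the filter's +1 offset.
owned = verify_ownership(lambda x: x, [0.0, 1.0, 2.0],
                         lambda x: x + 1.0, lambda y: y)
```

In practice `quality` might be a perceptual or task metric, and the margin would be calibrated so that models trained without the filter fail the test.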
Dynamic creation and insertion of content
In an aspect, during a presentation of a presentation material, viewers of the presentation material can be monitored. Based on the monitoring, new content can be determined for insertion into the presentation material. The new content can be automatically inserted into the presentation material in real time. In another aspect, during the presentation, a presenter of the presentation material can be monitored. The presenter's speech can be intercepted and analyzed to detect a level of confidence. Based on the detected level of confidence, the presenter's speech can be adjusted and the adjusted speech can be played back automatically, for example, in lieu of the presenter's original, intercepted speech.
Deep Learning Based Method and System for Processing Sound Quality Characteristics
The present invention provides a deep learning based method and system for processing sound quality characteristics. The method comprises: obtaining data characteristics of audio data to be processed by extracting features from user preference data that includes the audio data to be processed; and, based on the data characteristics, generating a sound quality processing result for the audio data to be processed by using a trained baseline model, wherein the baseline model is a neural network model trained using audio data, behavioral data, and other relevant data from multiple users or a single user.
SOUND SIGNAL PROCESSING SYSTEM AND SOUND SIGNAL PROCESSING METHOD
A sound signal processing system includes: a first obtainer that obtains recurrence plot information indicating a characteristic of a first sound; a second obtainer that obtains a sound signal of a second sound different from the first sound; a generator that, based on the recurrence plot information obtained by the first obtainer, generates a sound signal in which the characteristic of the first sound is reflected in the sound signal of the second sound obtained by the second obtainer; and an outputter that outputs the generated sound signal.
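For reference, a recurrence plot in its standard form (which the abstract does not define) marks which pairs of points in a signal lie within a distance threshold of each other, capturing repeating structure such as periodicity. A minimal sketch, with an assumed threshold and toy signal:

```python
import numpy as np

def recurrence_plot(signal, eps):
    """Binary recurrence plot: R[i, j] = 1 when samples i and j of the
    signal are within eps of each other. Repeating structure in the
    signal shows up as diagonal line patterns in R."""
    d = np.abs(signal[:, None] - signal[None, :])  # pairwise distances
    return (d < eps).astype(np.uint8)

t = np.linspace(0, 4 * np.pi, 64)
R = recurrence_plot(np.sin(t), eps=0.1)
```

Such a matrix, or information derived from it, could serve as the "recurrence plot information" that the generator uses to impose the first sound's characteristic on the second sound's signal.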