G10L25/18

HYBRID INPUT MACHINE LEARNING FRAMEWORKS
20230005496 · 2023-01-05 ·

There is a need for more accurate and more efficient hybrid-input prediction operations. This need can be addressed by, for example, techniques for efficient joint processing of data objects. In one example, a method includes: processing an audio data object using an audio processing machine learning model to generate an audio-based feature data object; processing an acceleration data object using an acceleration processing machine learning model to generate an acceleration-based feature data object; processing the audio-based feature data object and the acceleration-based feature data object using a feature synthesis machine learning model to generate a hybrid-input prediction data object; and performing one or more prediction-based actions based at least in part on the hybrid-input prediction data object.
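The three-stage pipeline described above (per-modality feature models feeding a synthesis model) can be sketched as follows. This is a minimal illustration, not the patented implementation: the linear-plus-tanh "models", the feature sizes, and the three prediction classes are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def audio_feature_model(audio, weights):
    """Stand-in for the audio processing model: one linear layer + tanh."""
    return np.tanh(audio @ weights)

def accel_feature_model(accel, weights):
    """Stand-in for the acceleration processing model."""
    return np.tanh(accel @ weights)

def feature_synthesis_model(audio_feat, accel_feat, weights):
    """Fuse the two feature data objects into a hybrid-input prediction."""
    joint = np.concatenate([audio_feat, accel_feat])
    logits = joint @ weights
    return np.exp(logits) / np.exp(logits).sum()  # softmax over classes

audio = rng.standard_normal(128)        # audio data object (illustrative size)
accel = rng.standard_normal(32)         # acceleration data object
w_audio = rng.standard_normal((128, 16))
w_accel = rng.standard_normal((32, 16))
w_fuse = rng.standard_normal((32, 3))   # 3 hypothetical prediction classes

prediction = feature_synthesis_model(
    audio_feature_model(audio, w_audio),
    accel_feature_model(accel, w_accel),
    w_fuse,
)
```

The "prediction-based actions" would then branch on `prediction`, e.g. acting on its argmax class.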

Machine learning method, audio source separation apparatus, and electronic instrument
11568857 · 2023-01-31 ·

A machine learning method for training a learning model includes: transforming a first audio type of audio data into a first image type of image data, wherein a first audio component and a second audio component are mixed in the first audio type of audio data, and the first image type of image data corresponds to the first audio type of audio data; transforming a second audio type of audio data into a second image type of image data, wherein the second audio type of audio data includes the first audio component without mixture of the second audio component, and the second image type of image data corresponds to the second audio type of audio data; and performing machine learning on the learning model with training data including sets of the first image type of image data and the second image type of image data.
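The audio-to-image transformation at the heart of this method is, in practice, typically a magnitude spectrogram. A minimal sketch of producing such a paired training example, assuming a short-time Fourier transform as the transform (the frame and hop sizes are illustrative, not from the patent):

```python
import numpy as np

def to_spectrogram_image(audio, frame=256, hop=128):
    """Transform 1-D audio data into a 2-D magnitude-spectrogram 'image'."""
    window = np.hanning(frame)
    frames = [audio[i:i + frame] * window
              for i in range(0, len(audio) - frame + 1, hop)]
    mags = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return mags.T  # rows = frequency bins, columns = time frames

sr = 8000
t = np.arange(sr) / sr
# First audio type: two audio components mixed together.
mixed = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
# Second audio type: the first component alone, without the second.
clean = np.sin(2 * np.pi * 440 * t)

training_pair = (to_spectrogram_image(mixed), to_spectrogram_image(clean))
```

Sets of such `(mixed image, clean image)` pairs would form the training data for the separation model.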

ANONYMIZING SPEECH DATA

A computer includes a processor and a memory, and the memory stores instructions executable by the processor to receive first speech data, remove a first vector of speaker-identifying characteristics from the first speech data to generate extracted first speech data, generate a random vector of the speaker-identifying characteristics, and generate second speech data by applying the random vector to the extracted first speech data.
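The remove-then-reapply scheme can be illustrated on feature vectors. This sketch assumes, purely for illustration, that the speaker-identifying vector is the mean frame of a speech-feature matrix; the real extractor would be a learned model.

```python
import numpy as np

rng = np.random.default_rng(1)

DIM = 64  # hypothetical size of the speaker-identifying vector

def extract_speaker_vector(speech):
    """Illustrative extractor: treat the mean frame as the speaker vector."""
    return speech.mean(axis=0)

def anonymize(speech):
    speaker_vec = extract_speaker_vector(speech)
    extracted = speech - speaker_vec       # remove speaker characteristics
    random_vec = rng.standard_normal(DIM)  # random replacement vector
    return extracted + random_vec          # apply it to the extracted data

first_speech = rng.standard_normal((100, DIM))  # 100 frames of features
second_speech = anonymize(first_speech)
```

The second speech data keeps the frame-to-frame content variation of the first while its speaker-level statistics follow the random vector instead.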

Using classified sounds and localized sound sources to operate an autonomous vehicle
11567510 · 2023-01-31 ·

An ambient sound environment is captured by a microphone array of an autonomous vehicle traveling in the ambient sound environment. A perception module of the autonomous vehicle classifies sounds and localizes sound sources in the ambient sound environment. Classification is performed using spectrum analysis and/or machine learning. In an embodiment, sound sources within a field of view (FOV) of an image sensor of the autonomous vehicle are localized in a visual scene generated by the perception module. In an embodiment, one or more sound sources outside the FOV of the image sensor are localized in a static digital map. Localization is performed using parametric or non-parametric techniques and/or machine learning. The output of the perception module is input into a planning module of the autonomous vehicle to plan a route or trajectory for the autonomous vehicle in the ambient sound environment.
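The spectrum-analysis branch of the classifier can be sketched as labeling a sound by its dominant frequency band. The labels and band boundaries below are invented for illustration; the patent does not specify them.

```python
import numpy as np

def classify_sound(audio, sr):
    """Toy spectrum-analysis classifier: label by dominant frequency."""
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), 1 / sr)
    peak_hz = freqs[spectrum.argmax()]
    if peak_hz < 300:
        return "engine"
    if peak_hz < 1500:
        return "horn"
    return "siren"

sr = 16000
t = np.arange(sr) / sr
siren_like = np.sin(2 * np.pi * 2000 * t)  # high-pitched tone
horn_like = np.sin(2 * np.pi * 440 * t)    # mid-band tone
```

A learned classifier would replace the hand-written thresholds; the class label would then flow, with the localized source position, into the planning module.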

Encoding parameter adjustment method and apparatus, device, and storage medium

An encoding parameter adjustment method is performed at a computer device. The method includes: obtaining a first audio signal, and determining a psychoacoustic masking threshold within a service frequency band in the first audio signal; obtaining a second audio signal, and determining a background environmental noise estimation value of the frequency within the service frequency band in the second audio signal; determining a masking tag corresponding to the service frequency band according to the psychoacoustic masking threshold of the first audio signal and the background environmental noise estimation value of the second audio signal; determining a masking rate of the service frequency band according to the masking tag corresponding to the frequency within the service frequency band; determining a first reference bit rate according to the masking rate of the service frequency band; and configuring an encoding bit rate of an audio encoder based on the first reference bit rate.
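The tag-rate-bitrate chain above can be sketched numerically. The masked-when-noise-exceeds-threshold convention and the linear bit-rate mapping are assumptions for illustration; the patent leaves both unspecified.

```python
import numpy as np

def masking_tags(mask_threshold_db, noise_estimate_db):
    """Tag a frequency bin as masked when the background noise estimate
    meets or exceeds the psychoacoustic masking threshold (assumed rule)."""
    return noise_estimate_db >= mask_threshold_db

def masking_rate(tags):
    """Fraction of the service frequency band that is masked."""
    return tags.mean()

def reference_bit_rate(rate, max_bps=64000, min_bps=16000):
    """Hypothetical mapping: the more of the band is masked, the fewer bits."""
    return int(max_bps - rate * (max_bps - min_bps))

thresh = np.array([40.0, 35.0, 30.0, 25.0])  # per-bin masking thresholds (dB)
noise = np.array([45.0, 30.0, 32.0, 20.0])   # per-bin noise estimates (dB)

tags = masking_tags(thresh, noise)
rate = masking_rate(tags)
bps = reference_bit_rate(rate)
```

The resulting `bps` would be configured as the encoder's bit rate; bins where ambient noise already drowns the signal contribute no bits.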

Techniques to perform fast Fourier transform

Apparatuses, systems, and techniques to perform a fast Fourier transform operation. In at least one embodiment, a fast Fourier transform operation is performed based on one or more parameters, wherein the one or more parameters indicate information about one or more operands of the fast Fourier transform.
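The idea of parameters that describe the transform's operands is familiar from plan-based FFT APIs: a plan is built from operand metadata (length, dtype, direction), then executed. A minimal sketch, with `make_fft_plan` as a hypothetical helper rather than any real library API:

```python
import numpy as np

def make_fft_plan(length, dtype, direction="forward"):
    """Parameters describing the operand (length, dtype, direction)
    select which transform is executed."""
    op = np.fft.fft if direction == "forward" else np.fft.ifft

    def plan(x):
        assert len(x) == length and x.dtype == dtype, "operand mismatch"
        return op(x)

    return plan

# A complex exponential at bin 3 of an 8-point transform.
signal = np.exp(2j * np.pi * 3 * np.arange(8) / 8).astype(np.complex128)
plan = make_fft_plan(length=8, dtype=np.complex128)
spectrum = plan(signal)
```

Real implementations use the operand metadata to pick radix decompositions and memory layouts ahead of time; here it only validates the input.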
