G10L21/0364

METHOD AND APPARATUS FOR DETERMINING PARAMETERS OF A GENERATIVE NEURAL NETWORK
20230229892 · 2023-07-20

Described herein is a method of determining parameters for a generative neural network for processing an audio signal, wherein the generative neural network includes an encoder stage mapping to a coded feature space and a decoder stage, each stage including a plurality of convolutional layers with one or more weight coefficients, the method comprising a plurality of cycles with sequential processes of: pruning the weight coefficients of either or both stages based on pruning control information, the pruning control information determining the number of weight coefficients that are pruned for respective convolutional layers; training the pruned generative neural network based on a set of training data; determining a loss for the trained and pruned generative neural network based on a loss function; and determining updated pruning control information based on the determined loss and a target loss. Further described are corresponding apparatus, programs, and computer-readable storage media.
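The prune/train/evaluate/update cycle described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed method. Weights are plain lists per layer, and magnitude-based pruning is assumed as the selection rule. The pruning control information is taken to be a per-layer pruning fraction. The update rule (prune more while the loss stays at or below the target, back off otherwise), `train`, and `loss_fn` are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: each convolutional layer's weights flattened to a list.
Weights = Dict[str, List[float]]
# Pruning control information (assumed form): fraction of weights to prune per layer.
PruningControl = Dict[str, float]


def prune_layer(w: List[float], ratio: float) -> List[float]:
    """Zero out the `ratio` fraction of weights with the smallest magnitude."""
    n_prune = int(len(w) * ratio)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else x for i, x in enumerate(w)]


def update_control(control: PruningControl, loss: float, target_loss: float,
                   step: float = 0.05, max_ratio: float = 0.9) -> PruningControl:
    """Hypothetical rule: prune more while the loss meets the target, back off otherwise."""
    if loss <= target_loss:
        return {k: min(r + step, max_ratio) for k, r in control.items()}
    return {k: max(r - step, 0.0) for k, r in control.items()}


def pruning_cycles(weights: Weights, control: PruningControl,
                   train: Callable[[Weights], Weights],
                   loss_fn: Callable[[Weights], float],
                   target_loss: float, n_cycles: int) -> Tuple[Weights, PruningControl]:
    for _ in range(n_cycles):
        # 1. Prune each layer according to the current control information.
        weights = {k: prune_layer(w, control[k]) for k, w in weights.items()}
        # 2. Train the pruned network (stand-in callable).
        weights = train(weights)
        # 3. Determine the loss of the trained, pruned network.
        loss = loss_fn(weights)
        # 4. Update the control information from the loss and the target loss.
        control = update_control(control, loss, target_loss)
    return weights, control
```

Weights zeroed in one cycle are not restored when a fraction later decreases; a fuller implementation would typically keep an explicit mask per layer.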

INFORMATION PROVIDING DEVICE, INFORMATION PROVIDING METHOD, AND INFORMATION PROVIDING PROGRAM

The present invention provides an information provision device and the like capable of reliably providing necessary guidance information while reducing the amount of audio information that must be processed.

The information provision device is provided with: a guidance voice output control unit 1 and a speaker 10, which output a guidance voice; and a microphone 11 and a collected sound transmission unit 2, which collect sounds made after output of the guidance voice has started. The guidance voice output control unit 1 and the speaker 10 output a re-guidance voice corresponding to the guidance voice on the basis of a determination of whether the collected sounds include a specific sound.
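A minimal sketch of the re-guidance decision, assuming collected audio arrives as timestamped frames. The `loud_sound_detector` is a hypothetical level-threshold stand-in for whatever detects the "specific sound", and the choice that a detection triggers re-guidance is likewise an assumption; the abstract only states that re-guidance depends on the determination.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (timestamp in seconds, audio samples) for each collected frame.
Frame = Tuple[float, Sequence[float]]


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a frame; 0.0 for an empty frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def loud_sound_detector(samples: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical 'specific sound' detector: a simple level threshold."""
    return rms(samples) >= threshold


def should_reguide(frames: List[Frame], guidance_start: float,
                   detector: Callable[[Sequence[float]], bool] = loud_sound_detector) -> bool:
    """Return True if any sound collected after the guidance started is detected."""
    # Only frames at or after the guidance start are examined, which limits
    # the amount of audio that has to be processed.
    collected = [s for t, s in frames if t >= guidance_start]
    return any(detector(s) for s in collected)
```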

Order post to enable parallelized order taking using artificial intelligence engine(s)

In some aspects, a computing device receives a scan of a code displayed on an order post located near a restaurant, determines that the code is associated with the restaurant, automatically opens a software application, and navigates the software application to an ordering page associated with the restaurant. The computing device begins receiving, via the software application, input associated with an order, sends the input to a machine-learning-based software agent executing on a server, receives a predicted response to the input, and provides the predicted response as audio output and/or displays it on a touchscreen display device. After the order is complete, the computing device sends order data associated with the order to the restaurant. After receiving an indication from the restaurant that the order is ready, the computing device indicates that the order is ready to be picked up.
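The client-side flow can be sketched as a small state machine. Everything here is hypothetical: `CODE_REGISTRY`, the state names, the shape of the order data, and the `agent` callable (a stand-in for the server-side machine-learning agent) are illustrative, and no real network, audio, or display API is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical registry mapping order-post codes to restaurant identifiers.
CODE_REGISTRY: Dict[str, str] = {"POST-001": "restaurant-42"}


@dataclass
class OrderSession:
    restaurant_id: str
    # Stand-in for the server-side agent: input text -> predicted response.
    agent: Callable[[str], str]
    transcript: List[str] = field(default_factory=list)
    state: str = "ordering"

    def send(self, user_input: str) -> str:
        """Forward order input to the agent and return its predicted response."""
        if self.state != "ordering":
            raise RuntimeError("order is no longer open")
        reply = self.agent(user_input)
        self.transcript.extend([user_input, reply])
        return reply  # the app would speak and/or display this

    def complete(self) -> Dict[str, object]:
        """Close the order and build the order data sent to the restaurant."""
        if self.state != "ordering":
            raise RuntimeError("order is no longer open")
        self.state = "submitted"
        return {"restaurant_id": self.restaurant_id, "transcript": list(self.transcript)}

    def on_ready(self) -> None:
        """Handle the restaurant's 'order ready' indication."""
        if self.state != "submitted":
            raise RuntimeError("order has not been submitted")
        self.state = "ready_for_pickup"


def open_session(scanned_code: str, agent: Callable[[str], str],
                 registry: Dict[str, str] = CODE_REGISTRY) -> Optional[OrderSession]:
    """Resolve a scanned code to a restaurant and open an ordering session."""
    restaurant_id = registry.get(scanned_code)
    if restaurant_id is None:
        return None  # code is not associated with any restaurant
    return OrderSession(restaurant_id, agent)
```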

AUTOMATIC GAIN CONTROL BASED ON MACHINE LEARNING LEVEL ESTIMATION OF THE DESIRED SIGNAL
20230215451 · 2023-07-06

The method includes receiving audio data at a server device from a plurality of input devices, where the audio data of each input device corresponds to a time-related portion of the audio data. The method determines a speech energy level for each input device by providing the time-related audio portion as input to a trained model. For each input device, a statistical value associated with the speech energy level is determined, and the strongest input device is identified based on these statistical values. The statistical value of each input device other than the strongest input device is compared to the statistical value of the strongest input device. Depending on the comparison, the method determines whether to update the gain value of an input device to an estimated target gain value based on the statistical value of that device's speech energy level.
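A sketch of the gain-update decision, assuming the trained model has already produced per-device speech levels in dB. The median as the statistical value, the `target_level_db` and `max_gap_db` parameters, and the rule that a device much quieter than the strongest keeps its gain (treated as likely crosstalk) are all assumptions the abstract does not specify. Updating the strongest device's own gain is likewise an assumption.

```python
import statistics
from typing import Dict, List, Tuple


def plan_gain_updates(levels_db: Dict[str, List[float]],
                      gains_db: Dict[str, float],
                      target_level_db: float = -26.0,
                      max_gap_db: float = 10.0) -> Tuple[str, Dict[str, float]]:
    """Return the strongest device and the (possibly) updated gain of each device.

    levels_db: per-device speech energy levels (dB) for a time-aligned window,
               standing in for the trained model's per-frame estimates.
    """
    # Statistical value per device (median here; an assumption).
    stats = {dev: statistics.median(v) for dev, v in levels_db.items()}
    strongest = max(stats, key=stats.get)
    new_gains = dict(gains_db)
    for dev, stat in stats.items():
        # Assumed rule: a device far quieter than the strongest is probably
        # hearing the same talker as crosstalk, so its gain is left unchanged.
        if dev != strongest and stats[strongest] - stat > max_gap_db:
            continue
        # Estimated target gain that would bring this device to the target level.
        new_gains[dev] = target_level_db - stat
    return strongest, new_gains
```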

DYNAMIC SIDE-TONE TO CONTROL VOICE CATEGORY
20230215455 · 2023-07-06

A method for providing sidetone adjustment comprises generating an audio signal representing user speech, determining a spectral distribution of the audio signal, determining a voice category from that spectral distribution, applying an adjustment to the audio signal based on the determined voice category to generate an adjusted audio signal, and providing audio output based on the adjusted audio signal to the user as sidetone. The adjustment may comprise adjustments to a plurality of frequency bands in the audio signal, and may further comprise boosting the levels of frequency bands in a high-frequency speech band.
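One way to picture the category-dependent adjustment is sketched below. It assumes a five-band filterbank whose band energies and amplitudes are already available. The band edges, the high-band energy ratio used to pick a category, and the category names, thresholds, and gain tables are all hypothetical. The abstract only says the category comes from the spectral distribution and that high-frequency bands may be boosted.

```python
from typing import Dict, List

# Hypothetical band edges in Hz: five bands, 0-500, 500-1k, 1k-2k, 2k-4k, 4k-8k.
BAND_EDGES_HZ = [0, 500, 1000, 2000, 4000, 8000]
HIGH_BAND_START = 3  # index of the first band at or above 2 kHz

# Hypothetical per-band sidetone gains (dB) for each voice category.
SIDETONE_GAINS_DB: Dict[str, List[float]] = {
    "dark":    [0.0, 0.0, 0.0, 4.0, 6.0],
    "neutral": [0.0, 0.0, 0.0, 2.0, 3.0],
    "bright":  [0.0, 0.0, 0.0, 0.0, 0.0],
}


def high_band_ratio(band_energy: List[float]) -> float:
    """Fraction of total energy in the high-frequency speech bands."""
    total = sum(band_energy)
    return sum(band_energy[HIGH_BAND_START:]) / total if total > 0 else 0.0


def voice_category(band_energy: List[float],
                   dark_below: float = 0.1, bright_above: float = 0.3) -> str:
    """Classify the spectral distribution into a (hypothetical) voice category."""
    r = high_band_ratio(band_energy)
    if r < dark_below:
        return "dark"
    if r > bright_above:
        return "bright"
    return "neutral"


def adjust_band_amplitudes(band_amplitudes: List[float], category: str) -> List[float]:
    """Apply the category's per-band gains (dB converted to amplitude factors)."""
    gains = SIDETONE_GAINS_DB[category]
    return [a * 10 ** (g / 20) for a, g in zip(band_amplitudes, gains)]
```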

Post-processing gains for signal enhancement

A method, an apparatus, and logic to post-process raw gains determined by input processing to generate post-processed gains, using one or both of delta gain smoothing and decision-directed gain smoothing. Delta gain smoothing applies a smoothing filter to the raw gain with a smoothing factor that depends on the gain delta, i.e., the absolute value of the difference between the raw gain for the current frame and the post-processed gain for a previous frame. Decision-directed gain smoothing converts the raw gain to a signal-to-noise ratio, applies a smoothing filter with a smoothing factor to that ratio to calculate a smoothed signal-to-noise ratio, and converts the smoothed signal-to-noise ratio back into a gain (a second smoothed gain); this smoothing factor may also depend on the gain delta.
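The two smoothing schemes can be sketched per frame as below. Several assumptions apply. Gains are taken to lie in [0, 1], and gain and SNR are related by the Wiener-style rule G = SNR / (1 + SNR). Each smoother is a first-order filter anchored on the previous post-processed gain. The linear delta-to-factor mapping in `smoothing_factor` is hypothetical: larger deltas track the raw gain faster.

```python
from typing import List


def gain_to_snr(g: float, eps: float = 1e-6) -> float:
    """Invert the assumed Wiener-style rule G = SNR / (1 + SNR)."""
    g = min(max(g, 0.0), 1.0 - eps)  # clamp to avoid division by zero at G = 1
    return g / (1.0 - g)


def snr_to_gain(snr: float) -> float:
    return snr / (1.0 + snr)


def smoothing_factor(delta: float, a_min: float = 0.1, a_max: float = 0.9) -> float:
    """Hypothetical mapping: larger gain delta -> faster tracking of the raw gain."""
    return a_min + (a_max - a_min) * min(delta, 1.0)


def delta_gain_smoothing(raw: float, prev_post: float) -> float:
    delta = abs(raw - prev_post)  # gain delta vs. previous post-processed gain
    a = smoothing_factor(delta)
    return prev_post + a * (raw - prev_post)  # first-order smoothing filter


def decision_directed_smoothing(raw: float, prev_post: float) -> float:
    delta = abs(raw - prev_post)
    a = smoothing_factor(delta)
    # Smooth in the SNR domain, anchored on the previous post-processed gain,
    # then convert the smoothed SNR back into a gain.
    prev_snr, raw_snr = gain_to_snr(prev_post), gain_to_snr(raw)
    smoothed_snr = prev_snr + a * (raw_snr - prev_snr)
    return snr_to_gain(smoothed_snr)


def post_process(raw_gains: List[float], method: str = "delta",
                 initial: float = 1.0) -> List[float]:
    """Run one smoother frame by frame over a sequence of raw gains."""
    step = delta_gain_smoothing if method == "delta" else decision_directed_smoothing
    out, prev = [], initial
    for g in raw_gains:
        prev = step(g, prev)
        out.append(prev)
    return out
```

For example, `post_process([0.2, 0.2, 0.2])` starting from a gain of 1.0 decays toward 0.2 over successive frames rather than jumping there at once.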
