G10L21/028

SIGNAL PROCESSING APPARATUS, SIGNAL PROCESSING METHOD, AND PROGRAM
20220375485 · 2022-11-24

A signal processing apparatus is provided that includes a sound source separation section configured to apply sound source separation processing to a mixed sound signal including a mixture of signals of a plurality of sound sources, and band extension sections configured to apply frequency band extension processing to respective sound source separation signals obtained by separation by the sound source separation section.

Context aware hearing optimization engine

One or more context aware processing parameters and an ambient audio stream are received. One or more sound characteristics associated with the ambient audio stream are identified using a machine learning model. One or more actions to perform are determined using the machine learning model and based on the one or more context aware processing parameters and the identified one or more sound characteristics. The one or more actions are performed.

SERVER FOR IDENTIFYING FALSE WAKEUP AND METHOD FOR CONTROLLING THE SAME
20220358918 · 2022-11-10

A server is provided. The server includes communication circuitry and at least one processor operatively connected with the communication circuitry. The at least one processor may be configured to, in response to the traffic of a plurality of speeches to wake up a voice assistant feature received within a preset period being a preset value or more, generate a plurality of clusters based on similarities between the plurality of speeches, and determine whether to respond to each of the speeches included in each of the plurality of clusters based on similarities between the speeches included in each of the plurality of clusters.
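
The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the feature vectors, the greedy clustering rule, the thresholds, and the rule "suppress clusters whose members are nearly identical" are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_speeches(features, sim_threshold=0.9):
    """Greedy clustering (assumed rule): a speech joins the first cluster
    whose first member is at least sim_threshold similar to it."""
    clusters = []
    for idx, feat in enumerate(features):
        for cluster in clusters:
            if cosine(features[cluster[0]], feat) >= sim_threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def decide_responses(features, traffic_threshold=4,
                     sim_threshold=0.9, reject_threshold=0.95):
    """Return one boolean per speech: True means the server responds.

    Clustering only runs when the number of wake-up speeches in the
    period reaches traffic_threshold (the "preset value").
    """
    n = len(features)
    if n < traffic_threshold:
        return [True] * n
    respond = [True] * n
    for cluster in cluster_speeches(features, sim_threshold):
        if len(cluster) < 2:
            continue  # a lone speech is treated as a genuine wake-up
        pairs = list(combinations(cluster, 2))
        mean_sim = sum(cosine(features[i], features[j]) for i, j in pairs) / len(pairs)
        # Assumption: near-identical speeches from many devices suggest a
        # broadcast source (a false wake-up), so the whole cluster is suppressed.
        if mean_sim >= reject_threshold:
            for i in cluster:
                respond[i] = False
    return respond
```

For example, `decide_responses([[1, 0], [1, 0.01], [0.99, 0], [0, 1]])` suppresses the first three near-identical speeches and responds only to the fourth.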

CONTEXT-BASED SPEAKER COUNTER FOR A SPEAKER DIARIZATION SYSTEM
20230103060 · 2023-03-30

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining the number of speakers in a video and its corresponding audio using visual context. In one aspect, a method includes detecting multiple speakers within the video, determining a bounding box for each detected speaker that includes the detected person and objects within a threshold distance of the detected person in an image frame, determining a unique descriptor for that person based in part on image information depicting the objects within the bounding box, determining a cardinality of unique speakers in the video, and providing the cardinality of unique speakers to the speaker diarization system.

Acoustic object extraction device and acoustic object extraction method

In the acoustic object extraction device, beam forming processing units generate a first acoustic signal by beam forming in an arrival direction of a signal from an acoustic object with respect to a microphone array and generate a second acoustic signal by beam forming in an arrival direction of a signal from the acoustic object with respect to a microphone array, and a common component extraction unit extracts, on the basis of a similarity between the spectrum of the first acoustic signal and the spectrum of the second acoustic signal and from the first acoustic signal and the second acoustic signal, a signal containing a common component corresponding to the acoustic object. The common component extraction unit divides the spectrums of the first acoustic signal and the second acoustic signal into a plurality of frequency sections and calculates a similarity for each of the frequency sections.
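
The per-section similarity calculation can be sketched as follows. This is only an illustration under stated assumptions: the inputs are magnitude spectra of the two beam-formed signals, similarity is taken to be cosine similarity, and a section is kept (as the element-wise minimum of the two spectra) when its similarity reaches a threshold. The abstract does not specify any of these choices, and the function names and `section_size` parameter are hypothetical.

```python
import math


def section_similarity(a, b):
    """Cosine similarity of two spectrum sections (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def extract_common(spec1, spec2, section_size, threshold=0.8):
    """Split both magnitude spectra into sections of section_size bins,
    compute one similarity per section, and keep only similar sections.

    Returns (common_spectrum, similarities).
    """
    if len(spec1) != len(spec2):
        raise ValueError("spectra must have the same number of bins")
    common, sims = [], []
    for start in range(0, len(spec1), section_size):
        s1 = spec1[start:start + section_size]
        s2 = spec2[start:start + section_size]
        sim = section_similarity(s1, s2)
        sims.append(sim)
        if sim >= threshold:
            # Assumed rule: the common component is the smaller magnitude per bin.
            common.extend(min(x, y) for x, y in zip(s1, s2))
        else:
            common.extend(0.0 for _ in s1)
    return common, sims
```

Computing the similarity per section rather than over the whole spectrum lets a section dominated by interference be rejected without discarding sections where both beams agree.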

AI-BASED DJ SYSTEM AND METHOD FOR DECOMPOSING, MIXING AND PLAYING OF AUDIO DATA
20230089356 · 2023-03-23

The present invention relates to a method for processing and playing audio data comprising the steps of receiving mixed input data and playing recombined output data. Furthermore, the invention relates to a device 10 for processing and playing audio data, preferably DJ equipment, comprising an audio input unit for receiving a mixed input signal, a recombination unit 32 and a playing unit 34 for playing recombined output data. In addition, the present invention relates to a method and a device for representing audio data, i.e. on a display.

METHOD AND SYSTEM TO IMPROVE VOICE SEPARATION BY ELIMINATING OVERLAP

Aspects disclosed herein generally relate to a method and a system for improving voice separation by eliminating overlaps or overlapping points. The time-frequency points from the two recorded mixtures are separated by using a degenerate unmixing estimation technique (DUET) algorithm. The method or system further eliminates the overlapping time-frequency points that belong to neither of the original sound sources.
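
A minimal sketch of the idea follows. It uses the standard DUET features (symmetric attenuation and relative delay computed from the ratio of the two mixtures at each time-frequency point). The rule used here to detect an overlapping point, namely that its features lie farther than `max_dist` from every source peak, is an assumption, as are the function names and the precomputed `peaks`. The abstract does not state how overlap is determined.

```python
import cmath


def duet_features(x1, x2, omega):
    """DUET features for one time-frequency point.

    x1, x2: complex STFT values of the two mixtures; omega: angular frequency (> 0).
    Returns (symmetric attenuation, relative delay).
    """
    ratio = x2 / x1
    a = abs(ratio)
    alpha = a - 1.0 / a
    delta = -cmath.phase(ratio) / omega
    return alpha, delta


def assign_points(points, peaks, max_dist):
    """Label each (x1, x2, omega) point with the index of the nearest source peak,
    or None if it is too far from every peak (treated here as an overlapping point).

    peaks: list of (alpha, delta) pairs, one per source, assumed already estimated.
    """
    labels = []
    for x1, x2, omega in points:
        if x1 == 0 or x2 == 0:
            labels.append(None)  # features are undefined for an empty bin
            continue
        alpha, delta = duet_features(x1, x2, omega)
        dists = [math_dist(alpha, delta, pa, pd) for pa, pd in peaks]
        best = min(range(len(peaks)), key=dists.__getitem__)
        labels.append(best if dists[best] <= max_dist else None)
    return labels


def math_dist(a1, d1, a2, d2):
    """Euclidean distance in the (attenuation, delay) plane."""
    return ((a1 - a2) ** 2 + (d1 - d2) ** 2) ** 0.5
```

Points labeled `None` would be dropped before each source is resynthesized from its remaining points, so that bins where both speakers are active do not leak into either output.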