G10L25/45

DETERMINING WHEN A SUBJECT IS SPEAKING BY ANALYZING A RESPIRATORY SIGNAL OBTAINED FROM A VIDEO
20170294193 · 2017-10-12

What is disclosed is a system and method for determining when a subject is speaking from a respiratory signal obtained from a video of that subject. A video of a subject is received, and a respiratory signal is extracted from a time-series signal obtained by processing pixels in image frames of the video. The respiratory signal comprises an inspiratory signal and an expiratory signal. Cycle-level features are extracted from the respiratory signal and used to identify expiratory signals during which speech is likely to have occurred. The identified expiratory signals are divided into time intervals. Frame-level features are determined for each time interval, and an amount of distortion in the expiratory signal for that time interval is quantified. The amount of distortion is compared to a threshold, and in response to the comparison, a determination is made that speech occurred during the interval. The process repeats for all time intervals.
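The per-interval distortion test described above can be sketched as follows. The abstract does not specify the distortion measure, so this sketch assumes it is the RMS deviation of each interval from its linear trend; the function name and parameters are illustrative, not from the patent.

```python
import numpy as np

def detect_speech_intervals(expiratory_signal, interval_len, threshold):
    """Flag intervals of an expiratory signal whose distortion exceeds a
    threshold, indicating likely speech. Distortion is approximated here
    as the RMS residual after removing each interval's linear trend."""
    speech_flags = []
    n = len(expiratory_signal) // interval_len
    for i in range(n):
        seg = expiratory_signal[i * interval_len:(i + 1) * interval_len]
        t = np.arange(len(seg))
        # Fit and remove a linear trend; residual energy proxies "distortion".
        coeffs = np.polyfit(t, seg, 1)
        residual = seg - np.polyval(coeffs, t)
        distortion = np.sqrt(np.mean(residual ** 2))
        speech_flags.append(distortion > threshold)
    return speech_flags
```

A smooth (undistorted) expiration yields a near-zero residual, while speech superimposes oscillation on the airflow and raises the residual above the threshold.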

Stereo signal encoding method and encoding apparatus

A stereo signal encoding method includes determining a window length of an attenuation window based on an inter-channel time difference, determining a modified linear prediction analysis window based on the window length of the attenuation window, where values of at least some points from a point (L−sub_window_len) to a point (L−1) in the modified linear prediction analysis window are less than values of corresponding points from a point (L−sub_window_len) to a point (L−1) in an initial linear prediction analysis window, and the window length of the modified linear prediction analysis window is equal to a window length of the initial linear prediction analysis window, and performing linear prediction analysis on a to-be-processed sound channel signal based on the modified linear prediction analysis window.
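The window modification above — attenuating the last `sub_window_len` points of the initial linear prediction analysis window while keeping its overall length — can be sketched as below. The patent does not give the attenuation shape, so a half-cosine fade is assumed here for illustration.

```python
import numpy as np

def modify_lp_window(init_window, sub_window_len):
    """Return a modified LP analysis window whose last sub_window_len
    points (L - sub_window_len .. L - 1) are attenuated below the
    corresponding points of the initial window. The total window
    length L is unchanged. The half-cosine fade is an assumption."""
    L = len(init_window)
    w = init_window.copy()
    # Attenuation factors strictly below 1, decaying to ~0 at point L - 1.
    fade = np.cos(np.pi / 2 * np.arange(1, sub_window_len + 1) / sub_window_len)
    w[L - sub_window_len:] *= fade
    return w
```

In the method, `sub_window_len` would itself be derived from the inter-channel time difference before the modified window is applied to the channel signal.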

Generating audio fingerprints based on audio signal complexity
09728205 · 2017-08-08

An audio identification system accounts for an audio signal's complexity when generating a test audio fingerprint for identification of the audio signal. In particular, the audio identification system determines a complexity of an audio signal to be fingerprinted. For example, the audio signal's complexity may be determined by performance of an autocorrelation on the audio signal. Based on the determined complexity, the audio identification system determines a length of a sample of the audio signal used to generate a test audio fingerprint. A sample having the length is then obtained and used to generate a test audio fingerprint for the audio signal. The test audio fingerprint may be compared to a set of reference audio fingerprints to identify the audio signal.
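A minimal sketch of the complexity-to-sample-length mapping described above, assuming (as the abstract suggests) that complexity is derived from an autocorrelation of the signal; the lag-1 proxy and the linear interpolation between bounds are illustrative choices, not the patent's formula.

```python
import numpy as np

def sample_length_for_fingerprint(signal, min_len, max_len):
    """Choose a fingerprint sample length from the signal's complexity.

    Complexity is proxied here by 1 minus the normalized lag-1
    autocorrelation (an assumption): highly self-similar audio gets a
    longer sample, complex audio a shorter one."""
    x = signal - np.mean(signal)
    denom = np.dot(x, x)
    r1 = np.dot(x[:-1], x[1:]) / denom if denom > 0 else 0.0
    complexity = 1.0 - abs(r1)          # roughly in [0, 1]
    # Complex signals are distinctive with less audio, so they map
    # toward min_len; self-similar signals map toward max_len.
    return int(round(max_len - complexity * (max_len - min_len)))
```

A pure tone (highly self-similar) thus receives a sample length near `max_len`, while white noise receives one near `min_len`.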

Alias cancelling during audio coding mode transitions

An apparatus for processing an audio signal and a method thereof are disclosed. The present invention includes receiving, by an audio processing apparatus, an audio signal including first data of a first block encoded with a rectangular coding scheme and second data of a second block encoded with a non-rectangular coding scheme; receiving a compensation signal corresponding to the second block; estimating a prediction of an aliasing part using the first data; and obtaining a reconstructed signal for the second block based on the second data, the compensation signal, and the prediction of the aliasing part.
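The final reconstruction step above combines three terms. In the sketch below, the aliasing prediction is modeled as the time-reversed tail of the first (rectangular-coded) block — the folding term an MDCT-style overlap introduces — which is a simplification of the patent's estimator; the function and its inputs are illustrative.

```python
import numpy as np

def reconstruct_transition_block(second_data, compensation, first_data):
    """Reconstruct a transition block as: decoded second-block data
    + compensation signal + aliasing prediction from the first block.

    The aliasing prediction here is the time-reversed tail of the
    first block (a simplified stand-in for the patent's estimate)."""
    n = len(second_data)
    predicted_alias = first_data[-n:][::-1]
    return second_data + compensation + predicted_alias
```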

Concealing a lost audio frame by adjusting spectrum magnitude of a substitute audio frame based on a transient condition of a previously reconstructed audio signal

In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus thereof for controlling a concealment method for a lost audio frame of a received audio signal. A method for a decoder of concealing a lost audio frame comprises detecting in a property of the previously received and reconstructed audio signal, or in a statistical property of observed frame losses, a condition for which the substitution of a lost frame provides relatively reduced quality. In case such a condition is detected, the concealment method is modified by selectively adjusting a phase or a spectrum magnitude of a substitution frame spectrum.
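The selective magnitude/phase adjustment above can be sketched as follows. The patent does not define the detection condition or the attenuation schedule; here the adverse condition is modeled as a burst of consecutive losses, with a per-frame dB attenuation and optional phase randomization — all illustrative assumptions.

```python
import numpy as np

def conceal_frame(prev_spectrum, burst_count, attenuation_db_per_frame=3.0,
                  randomize_phase=False, rng=None):
    """Build a substitution-frame spectrum from the last good frame.

    When an adverse condition is detected (modeled here as
    burst_count > 1), the magnitude is progressively attenuated and
    the phase may be randomized to avoid tonal artifacts."""
    mag = np.abs(prev_spectrum)
    phase = np.angle(prev_spectrum)
    if burst_count > 1:
        mag = mag * 10 ** (-attenuation_db_per_frame * (burst_count - 1) / 20)
        if randomize_phase:
            rng = rng or np.random.default_rng()
            phase = rng.uniform(-np.pi, np.pi, size=phase.shape)
    return mag * np.exp(1j * phase)
```

An isolated loss thus repeats the previous frame's spectrum unchanged, while longer bursts fade the substitution frames out rather than sustaining a stale, potentially transient-contaminated spectrum.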

Automated clinical documentation system and method

A method, computer program product, and computing system for determining a time delay between a first audio signal received on a first audio detection system and a second audio signal received on a second audio detection system. The first and second audio detection systems are located within a monitored space. The first audio detection system is located with respect to the second audio detection system within the monitored space.
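The time-delay determination above is not spelled out in the abstract; a standard way to estimate it is to locate the peak of the cross-correlation between the two detection systems' signals, sketched below (the function and sample signals are illustrative).

```python
import numpy as np

def estimate_time_delay(sig_a, sig_b, sample_rate):
    """Estimate the delay (in seconds) of sig_b relative to sig_a by
    locating the peak of their full cross-correlation — a standard
    time-difference-of-arrival approach, used here as a stand-in for
    the patent's unspecified method."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index len(sig_a) - 1 corresponds to zero lag.
    lag = np.argmax(corr) - (len(sig_a) - 1)
    return lag / sample_rate
```

Given the known placement of the two detection systems within the monitored space, such a delay estimate can help attribute speech to positions in the room.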

A DIALOG DETECTOR
20220199074 · 2022-06-23

The present application relates to a method of extracting audio features in a dialog detector in response to an input audio signal, the method comprising dividing the input audio signal into a plurality of frames, extracting frame audio features from each frame, determining a set of context windows, each context window including a number of frames surrounding a current frame, deriving, for each context window, a relevant context audio feature for the current frame based on the frame audio features of the frames in that context window, and concatenating each context audio feature to form a combined feature vector representing the current frame. Context windows of different lengths can improve both response speed and robustness.
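The concatenation of per-context features can be sketched as below. The abstract leaves the derivation of the "relevant context audio feature" open, so this sketch assumes it is the mean of the frame features inside each window; the function name and parameters are illustrative.

```python
import numpy as np

def combined_feature_vector(frame_features, current, context_lens):
    """Concatenate per-context features for the current frame.

    For each context half-length, the context feature is derived here
    as the mean of the frame features inside the window (an assumed
    derivation); the results are concatenated into one vector."""
    parts = []
    for half in context_lens:
        lo = max(0, current - half)
        hi = min(len(frame_features), current + half + 1)
        parts.append(np.mean(frame_features[lo:hi], axis=0))
    return np.concatenate(parts)
```

Short windows react quickly to dialog onsets while long windows smooth over transient noise, which is how mixing window lengths trades response speed against robustness.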