MONAURAL SPEECH INTELLIGIBILITY PREDICTOR UNIT, A HEARING AID AND A BINAURAL HEARING SYSTEM

Abstract

Signal processing methods are proposed for predicting the intelligibility of speech, e.g., in the form of an index that correlates highly with the fraction of words that an average listener (amongst a group of listeners with similar hearing profiles) would be able to understand from some speech material. Specifically, solutions are described to the problem of predicting the intelligibility of speech signals which are distorted, e.g., by noise or reverberation, and which might have been passed through some signal processing device, e.g., a hearing aid. In summary, the disclosure presents solutions to the following problems: 1. Monaural, non-intrusive intelligibility prediction of noisy/processed speech signals; 2. Binaural, non-intrusive intelligibility prediction of noisy/processed speech signals; 3. Monaural and binaural intelligibility enhancement of noisy speech signals.

Claims

1. A monaural speech intelligibility predictor unit adapted for receiving an information signal x comprising either a clean or noisy and/or processed version of a target speech signal, the speech intelligibility predictor unit being configured to provide as an output a speech intelligibility predictor value d for the information signal, the speech intelligibility predictor unit comprising
a) An input unit for providing a time-frequency representation x(k,m) of said information signal x, k being a frequency bin index, k=1, 2, . . . , K, and m being a time index;
b) An envelope extraction unit for providing a time-frequency sub-band representation x.sub.j(m) of the information signal x representing temporal envelopes, or functions thereof, of frequency sub-band signals x.sub.j(m) of said information signal x, j being a frequency sub-band index, j=1, 2, . . . , J, and m being the time index;
c) A time-frequency segment division unit for dividing said time-frequency representation x.sub.j(m) of the information signal x into time-frequency segments X.sub.m corresponding to a number N of successive samples of said sub-band signals;
d) A segment estimation unit for estimating essentially noise-free time-frequency segments S.sub.m or normalized and/or transformed versions {tilde over (S)}.sub.m thereof, among said time-frequency segments X.sub.m, or normalized and/or transformed versions {tilde over (X)}.sub.m thereof, respectively;
e) An intermediate speech intelligibility calculation unit adapted for providing intermediate speech intelligibility coefficients d.sub.m estimating an intelligibility of said time-frequency segment X.sub.m, said intermediate speech intelligibility coefficients d.sub.m being based on said estimated essentially noise-free time segments S.sub.m or normalized and/or transformed versions {tilde over (S)}.sub.m thereof, and said time-frequency segments X.sub.m, or normalized and/or transformed versions {tilde over (X)}.sub.m thereof, respectively;
f) A final speech intelligibility calculation unit for calculating a final speech intelligibility predictor d estimating an intelligibility of said information signal x by combining, e.g. averaging or applying a MIN or MAX-function, said intermediate speech intelligibility coefficients d.sub.m, or a transformed version thereof, over time.

2. A monaural speech intelligibility predictor unit according to claim 1 comprising a normalization and transformation unit configured to provide at least one normalization and/or transformation operation of rows and at least one normalization and/or transformation operation of columns of said time-frequency segments S.sub.m and X.sub.m.

3. A monaural speech intelligibility predictor unit according to claim 1 comprising a normalization and transformation unit configured to provide normalization and/or transformation of rows and columns of said time-frequency segments S.sub.m and X.sub.m, wherein said normalization and/or transformation of rows comprises at least one of the following operations R1) mean normalization of rows, R2) unit-norm normalization of rows, R3) Fourier transform of rows, R4) providing a Fourier magnitude spectrum of rows, and R5) providing the identity operation, and wherein said normalization and/or transformation of columns comprises at least one of the following operations C1) mean normalization of columns, and C2) unit-norm normalization of columns.

4. A monaural speech intelligibility predictor unit according to claim 2 comprising a normalization and/or transformation unit adapted for providing normalized and/or transformed versions {tilde over (X)}.sub.m of said time-frequency segments X.sub.m, wherein the normalization and/or transformation unit is configured to apply one or more of the following algorithms to the time-frequency segments X.sub.m:
R1) Normalization of rows to zero mean: $g_{1}(X) = X - \mu_{x}^{r} \mathbf{1}^{T},$ where $\mu_{x}^{r}$ is a J×1 vector whose j'th entry is the mean of the j'th row of X (hence the superscript r in $\mu_{x}^{r}$), where $\mathbf{1}$ denotes an N×1 vector of ones, and where superscript T denotes matrix transposition;
R2) Normalization of rows to unit-norm: $g_{2}(X) = D^{r}(X) X,$ where $D^{r}(X) = \mathrm{diag}\left(\left[ 1/\sqrt{X(1,:) X(1,:)^{H}} \;\ldots\; 1/\sqrt{X(J,:) X(J,:)^{H}} \right]\right),$ and where X(j,:) denotes the j'th row of X, such that $D^{r}(X)$ is a J×J diagonal matrix with the inverse norm of each row on the main diagonal, and zeros elsewhere (the superscript H denotes Hermitian transposition). Pre-multiplication with $D^{r}(X)$ normalizes the rows of the resulting matrix to unit-norm;
R3) Fourier transformation applied to each row: $g_{3}(X) = X F,$ where F is an N×N Fourier matrix;
R4) Fourier transformation applied to each row followed by computing the magnitude of the resulting complex-valued elements: $g_{4}(X) = |X F|,$ where |•| computes the element-wise magnitudes;
R5) The identity operator: $g_{5}(X) = X;$
C1) Normalization of columns to zero mean: $h_{1}(X) = X - \mathbf{1} (\mu_{x}^{c})^{T},$ where $\mu_{x}^{c}$ is an N×1 vector whose i'th entry is the mean of the i'th column of X, and where $\mathbf{1}$ denotes a J×1 vector of ones;
C2) Normalization of columns to unit-norm: $h_{2}(X) = X D^{c}(X),$ where $D^{c}(X) = \mathrm{diag}\left(\left[ 1/\sqrt{X(:,1)^{H} X(:,1)} \;\ldots\; 1/\sqrt{X(:,N)^{H} X(:,N)} \right]\right),$ where X(:,n) denotes the n'th column of X, such that $D^{c}(X)$ is a diagonal N×N matrix with the inverse norm of each column on the main diagonal, and zeros elsewhere. Post-multiplication with $D^{c}(X)$ normalizes the columns of the resulting matrix to unit-norm.

5. A monaural speech intelligibility predictor unit according to claim 1 adapted to extract said temporal envelope signals as $x_{j}(m) = f\left(\sqrt{\sum_{k = k1(j)}^{k2(j)} |x(k, m)|^{2}}\right),$ where j=1, . . . , J and m=1, . . . , M, k1(j) and k2(j) denote DFT bin indices corresponding to lower and higher cut-off frequencies of the j.sup.th sub-band, J is the number of sub-bands, and M is the number of signal frames in the signal in question, and ƒ(•) is a function.

6. A monaural speech intelligibility predictor unit according to claim 5 wherein the function ƒ(•)=ƒ(w), where w represents $\sqrt{\sum_{k = k1(j)}^{k2(j)} |x(k, m)|^{2}},$ is selected among the following functions: ƒ(w)=w representing the identity, ƒ(w)=w.sup.2 providing power envelopes, ƒ(w)=2·log w or ƒ(w)=w.sup.β, 0<β<2, allowing the modelling of the compressive non-linearity of the healthy cochlea, or combinations thereof.

7. A monaural speech intelligibility predictor unit according to claim 1 wherein the segment estimation unit is configured to estimate the essentially noise-free time-frequency segments {tilde over (S)}.sub.m from time-frequency segments {tilde over (X)}.sub.m representing the information signal based on statistical methods.

8. A monaural speech intelligibility predictor unit according to claim 1 wherein the segment estimation unit is configured to estimate said essentially noise-free time-frequency segments S.sub.m or normalized and/or transformed versions {tilde over (S)}.sub.m thereof based on super-vectors {tilde over (x)}.sub.m derived from time-frequency segments X.sub.m or from normalized and/or transformed time-frequency segments {tilde over (X)}.sub.m of the information signal, and an estimator r({tilde over (x)}.sub.m) that maps the super vectors {tilde over (x)}.sub.m of the information signal to estimates {tilde over (ŝ)}.sub.m of super vectors {tilde over (s)}.sub.m representing the essentially noise-free, optionally normalized and/or transformed time-frequency segments {tilde over (S)}.sub.m.

9. A monaural speech intelligibility predictor unit according to claim 1 wherein the segment estimation unit is configured to estimate the essentially noise-free time-frequency segments {tilde over (S)}.sub.m based on a linear estimator.

10. A monaural speech intelligibility predictor unit according to claim 9 wherein the segment estimation unit is configured to estimate the essentially noise-free, optionally normalized and/or transformed, time-frequency segments (S.sub.m, {tilde over (S)}.sub.m) based on a pre-estimated J·N×J·N sample correlation matrix $\hat{R}_{\tilde{z}} = \frac{1}{\tilde{M}} \sum_{m = 1}^{\tilde{M}} \tilde{z}_{m} \tilde{z}_{m}^{H},$ across a training set of super vectors {tilde over (z)}.sub.m derived from optionally normalized and/or transformed segments of noise-free speech signals z.sub.m, where {tilde over (M)} is the number of entries in the training set.

11. A monaural speech intelligibility predictor unit according to claim 1 wherein the final speech intelligibility calculation unit is adapted to calculate the final speech intelligibility predictor d from the intermediate speech intelligibility coefficients d.sub.m, optionally transformed by a function u(d.sub.m), as an average over time of said information signal x: $d = \frac{1}{M} \sum_{m = 1}^{M} u(d_{m}),$ where M represents the duration in time units of the speech active parts of said information signal x.

12. A hearing aid adapted for being located at or in left and right ears of a user, or for being fully or partially implanted in the head of the user, the hearing aid comprising a monaural speech intelligibility predictor unit according to claim 1.

13. A hearing aid according to claim 12 comprising a) A number of input units IU.sub.i, i=1, . . . , M, M being larger than or equal to one, each being configured to provide a time-variant electric input signal y′.sub.i representing a sound input received at an i.sup.th input unit, the electric input signal y′.sub.i comprising a target signal component and a noise signal component, the target signal component originating from a target signal source; b) A configurable signal processor for processing the electric input signals and providing a processed signal u; c) An output unit for creating output stimuli configured to be perceivable by the user as sound based on an electric output either in the form of the processed signal u from the signal processor or a signal derived therefrom; and d) A hearing loss model unit operatively connected to the monaural speech intelligibility predictor unit and configured to apply a frequency dependent modification of the electric output signal reflecting a hearing impairment of the corresponding left or right ear of the user to provide information signal x to the monaural speech intelligibility predictor unit.

14. A hearing aid according to claim 13 wherein the configurable signal processor is adapted to control or influence the processing of the respective electric input signals based on said final speech intelligibility predictor d provided by the monaural speech intelligibility predictor unit.

15. A binaural hearing system comprising left and right hearing aids according to claim 12, wherein each of the left and right hearing aids comprises antenna and transceiver circuitry for allowing a communication link to be established and information to be exchanged between said left and right hearing aids.

16. A binaural hearing system according to claim 15 further comprising a binaural speech intelligibility prediction unit for providing a final binaural speech intelligibility measure d.sub.binaural of the predicted speech intelligibility of the user, when exposed to said sound input, based on the monaural speech intelligibility predictor values d.sub.left, d.sub.right of the respective left and right hearing aids.

17. A binaural hearing system according to claim 16 wherein the final binaural speech intelligibility measure d.sub.binaural is determined as the maximum of the speech intelligibility predictor values d.sub.left, d.sub.right of the respective left and right hearing aids: d.sub.binaural=max(d.sub.left, d.sub.right).

18. A method of providing a monaural speech intelligibility predictor for estimating a user's ability to understand an information signal x comprising either a clean or noisy and/or processed version of a target speech signal, the method comprising
Providing a time-frequency representation x(k,m) of said information signal x, k being a frequency bin index, k=1, 2, . . . , K, and m being a time index;
Extracting temporal envelopes of said time-frequency representation x(k,m) providing a time-frequency sub-band representation x.sub.j(m) of the information signal x representing temporal envelopes, or functions thereof, in the form of frequency sub-band signals x.sub.j(m), j being a frequency sub-band index, j=1, 2, . . . , J, and m being the time index;
Dividing said time-frequency representation x.sub.j(m) of the information signal x into time-frequency segments X.sub.m corresponding to a number N of successive samples of said sub-band signals;
Estimating essentially noise-free time-frequency segments S.sub.m or normalized and/or transformed versions {tilde over (S)}.sub.m thereof, among said time-frequency segments X.sub.m, or normalized and/or transformed versions {tilde over (X)}.sub.m thereof, respectively;
Providing intermediate speech intelligibility coefficients d.sub.m estimating an intelligibility of said time-frequency segment X.sub.m, said intermediate speech intelligibility coefficients d.sub.m being based on said estimated essentially noise-free time segments S.sub.m or normalized and/or transformed versions {tilde over (S)}.sub.m thereof, and said time-frequency segments X.sub.m, or normalized and/or transformed versions {tilde over (X)}.sub.m thereof, respectively;
Calculating a final speech intelligibility predictor d estimating an intelligibility of said information signal x by combining, e.g. averaging, said intermediate speech intelligibility coefficients d.sub.m, or a transformed version thereof, over time, e.g. in a single scalar value.

19. A data processing system comprising a processor and program code means for causing the processor to perform the steps of the method according to claim 18.

20. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method according to claim 18.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0134] The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effect will be apparent from and elucidated with reference to the illustrations described hereinafter in which:

[0135] FIG. 1A schematically shows a time variant analogue signal (Amplitude vs time) and its digitization in samples, the samples being arranged in a number of time frames, each comprising a number N.sub.s of samples, and

[0136] FIG. 1B illustrates a time-frequency map representation of the time variant electric signal of FIG. 1A,

[0137] FIG. 2A symbolically shows a monaural speech intelligibility predictor unit providing a monaural speech intelligibility predictor d based on a time-frequency representation x.sub.j(m) of an information signal x, and

[0138] FIG. 2B shows an embodiment of a monaural speech intelligibility predictor unit,

[0139] FIG. 3A shows a monaural speech intelligibility predictor unit in combination with a hearing loss model and an evaluation unit,

[0140] FIG. 3B shows a monaural speech intelligibility predictor unit in combination with a signal processor and an evaluation unit,

[0141] FIG. 3C shows a first combination of a monaural speech intelligibility predictor unit with a hearing loss model, a signal processor and an evaluation unit, and

[0142] FIG. 3D shows a second combination of a monaural speech intelligibility predictor unit with a hearing loss model, a signal processor and an evaluation unit,

[0143] FIG. 4 shows an embodiment of a monaural speech intelligibility predictor according to the present disclosure,

[0144] FIG. 5A symbolically shows a binaural speech intelligibility predictor in combination with a hearing loss model, and

[0145] FIG. 5B shows an embodiment of a binaural speech intelligibility predictor based on a combination of two monaural speech intelligibility predictors in combination with a hearing loss model according to the present disclosure,

[0146] FIG. 6 schematically shows processing steps of a method of providing a non-intrusive binaural speech intelligibility predictor according to the present disclosure,

[0147] FIG. 7 schematically shows a method of providing an intrusive binaural speech intelligibility predictor d.sub.binaural for adapting the processing of a binaural hearing aid system to maximize the intelligibility of output speech signal(s),

[0148] FIG. 8A shows an embodiment of a hearing aid according to the present disclosure comprising a monaural speech intelligibility predictor for estimating intelligibility of an output signal and using the predictor to adapt the signal processing of an input speech signal to maximize the monaural speech intelligibility predictor,

[0149] FIG. 8B shows a first embodiment of a binaural hearing aid system according to the present disclosure comprising a binaural speech intelligibility predictor for estimating intelligibility of respective left and right output signals of the binaural hearing aid system and using the predictor to adapt the binaural signal processing of a number of input signals comprising speech to maximize the binaural speech intelligibility predictor, and

[0150] FIG. 8C shows a second embodiment of a binaural hearing aid system according to the present disclosure comprising left and right hearing aids and a binaural speech intelligibility predictor for estimating intelligibility of output signals of the respective left and right hearing aids and using the predictor to adapt the signal processing of a number of input signals comprising speech of each of the left and right hearing aids to maximize the binaural speech intelligibility predictor,

[0151] FIG. 9 illustrates an exemplary hearing aid formed as a receiver in the ear (RITE) type of hearing aid comprising a part adapted for being located behind pinna and a part comprising an output transducer (e.g. a loudspeaker/receiver) adapted for being located in an ear canal of the user, and

[0152] FIG. 10A shows a binaural hearing aid system according to the present disclosure comprising first and second hearing aids and an auxiliary device, and

[0153] FIG. 10B shows the auxiliary device comprising a user interface in the form of an APP for controlling and displaying data related to the speech intelligibility predictors.

[0154] The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.

[0155] Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.

DETAILED DESCRIPTION OF EMBODIMENTS

[0156] The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practised without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.

[0157] The electronic hardware may include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

[0158] The present application relates to the field of hearing aids.

[0159] The present invention relates specifically to signal processing methods for predicting the intelligibility of speech, e.g., in the form of an index that correlates highly with the fraction of words that an average listener (amongst a group of listeners with similar hearing profiles) would be able to understand from some speech material. Specifically, we present solutions to the problem of predicting the intelligibility of speech signals, which are distorted, e.g., by noise or reverberation, and which might have been passed through some signal processing device, e.g., a hearing aid. The invention is characterized by the fact that the intelligibility prediction is based on the noisy/processed signal only; in the literature, such methods are called non-intrusive intelligibility predictors, e.g. [1]. The non-intrusive class of methods, which we focus on in the present invention, is in contrast to the much larger class of methods which require a noise-free and unprocessed reference speech signal to be available too (e.g. [2,3,4], etc.); this class of methods is called intrusive.

[0160] The core of the invention is a method for monaural, non-intrusive intelligibility prediction. In other words, given a noisy speech signal, picked up by a single microphone, and potentially passed through some signal processing stages, e.g. of a hearing aid system, we wish to estimate its intelligibility. In the first part of the text below, we will provide an extensive description of a novel, general class of methods for solving this problem.

[0161] Next, we extend the invention to deal with the binaural, non-intrusive intelligibility problem.

[0162] The reason for this extension is that listening to acoustic scenes using two ears (i.e., binaurally) can in certain situations increase the intelligibility dramatically over using only one ear (or presenting the same signal to both ears) [5].

[0163] Finally, we extend the invention even further to be used for monaural or binaural speech intelligibility enhancement. The problem solved here is the following: given noisy/reverberant speech signals, e.g. picked up by the microphones of a hearing aid system, process them in such a way that their intelligibility is improved or even maximized when presented binaurally to the user.

[0164] In summary, the disclosure presents solutions to the following problems:

1. Monaural, non-intrusive intelligibility prediction of noisy/processed speech signals
2. Binaural, non-intrusive intelligibility prediction of noisy/processed speech signals
3. Monaural and binaural intelligibility enhancement of noisy speech signals.

[0165] Much of the signal processing of the present disclosure is performed in the time-frequency domain, where a time domain signal is transformed into the (time-)frequency domain by a suitable mathematical algorithm (e.g. a Fourier transform algorithm) or filter (e.g. a filter bank).

[0166] FIG. 1A schematically shows a time variant analogue signal (Amplitude vs time) and its digitization in samples, the samples being arranged in a number of time frames, each comprising a number N.sub.s of digital samples. FIG. 1A shows an analogue electric signal (solid graph), e.g. representing an acoustic input signal, e.g. from a microphone, which is converted to a digital audio signal in an analogue-to-digital (AD) conversion process, where the analogue signal is sampled with a predefined sampling frequency or rate f.sub.s, f.sub.s being e.g. in the range from 8 kHz to 40 kHz (adapted to the particular needs of the application) to provide digital samples x(n) at discrete points in time n, as indicated by the vertical lines extending from the time axis with solid dots at their endpoints coinciding with the graph, and representing its digital sample value at the corresponding distinct point in time n. Each (audio) sample x(n) represents the value of the acoustic signal at n by a predefined number N.sub.b of bits, N.sub.b being e.g. in the range from 1 to 16 bits. A digital sample x(n) has a length in time of 1/f.sub.s, e.g. 50 μs for f.sub.s=20 kHz. A number of (audio) samples N.sub.s are arranged in a time frame, as schematically illustrated in the lower part of FIG. 1A, where the individual (here uniformly spaced) samples are grouped in time frames (1, 2, . . . , N.sub.s). As also illustrated in the lower part of FIG. 1A, the time frames may be arranged consecutively to be non-overlapping (time frames 1, 2, . . . , m, . . . , M) or overlapping (here 50%, time frames 1, 2, . . . , m, . . . , M′), where m is the time frame index. In an embodiment, a time frame comprises 64 audio data samples. Other frame lengths may be used depending on the practical application.
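The grouping of samples into time frames described above may, purely as an illustration, be sketched as follows. The function name `frame_signal`, its parameters, and the test sequence are assumptions introduced for this sketch and are not part of the disclosure; the frame length of 64 samples, the 50% overlap and the 20 kHz sampling rate are the example values mentioned in the text.

```python
import numpy as np

def frame_signal(x, frame_len=64, overlap=0.0):
    """Split a 1-D sample sequence x(n) into time frames of frame_len samples.

    overlap is the fraction of a frame shared with the next frame
    (0.0 = non-overlapping, 0.5 = 50% overlap). Trailing samples that do
    not fill a whole frame are dropped (a choice made for this sketch).
    """
    x = np.asarray(x)
    hop = int(round(frame_len * (1.0 - overlap)))
    if hop < 1:
        raise ValueError("overlap must be smaller than 1")
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row m holds the sample indices of time frame m.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

fs = 20_000                     # sampling rate in Hz (example value from the text)
sample_period_us = 1e6 / fs     # duration of one sample in microseconds
x = np.arange(256)              # stand-in for the digitized samples x(n)
frames = frame_signal(x, frame_len=64)                   # non-overlapping frames
frames_50 = frame_signal(x, frame_len=64, overlap=0.5)   # 50% overlapping frames
```

With 256 samples, the non-overlapping arrangement gives M = 4 frames, whereas the 50% overlapping arrangement gives M′ = 7 frames, each new frame starting 32 samples after the previous one.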

[0167] FIG. 1B schematically illustrates a time-frequency representation of the (digitized) time variant electric signal x(n) of FIG. 1A. The time-frequency representation comprises an array or map of corresponding complex or real values of the signal in a particular time and frequency range. The time-frequency representation may e.g. be a result of a Fourier transformation converting the time variant input signal x(n) to a (time variant) signal x(k,m) in the time-frequency domain. In an embodiment, the Fourier transformation comprises a discrete Fourier transform algorithm (DFT). The frequency range considered by a typical hearing device (e.g. a hearing aid) from a minimum frequency f.sub.min to a maximum frequency f.sub.max comprises a part of the typical human audible frequency range from 20 Hz to 20 kHz, e.g. a part of the range from 20 Hz to 12 kHz. In FIG. 1B, the time-frequency representation x(k,m) of signal x(n) comprises complex values of magnitude and/or phase of the signal in a number of DFT-bins defined by indices (k,m), where k=1, . . . , K represents a number K of frequency values (cf. vertical k-axis in FIG. 1B) and m=1, . . . , M (M′) represents a number M (M′) of time frames (cf. horizontal m-axis in FIG. 1B). A time frame is defined by a specific time index m and the corresponding K DFT-bins (cf. indication of Time frame m in FIG. 1B). A time frame m represents a frequency spectrum of signal x at time m. A DFT-bin (k,m) comprising a real or complex value x(k,m) of the signal in question is illustrated in FIG. 1B by hatching of the corresponding field in the time-frequency map. Each value of the frequency index k corresponds to a frequency range Δf.sub.k, as indicated in FIG. 1B by the vertical frequency axis f. Each value of the time index m represents a time frame. The time Δt.sub.m spanned by consecutive time indices depends on the length of a time frame (e.g. 25 ms) and the degree of overlap between neighbouring time frames (cf. horizontal t-axis in FIG. 1B).
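As a non-limiting illustration of how a time-frequency map x(k,m) may be obtained with a DFT, consider the following sketch. The function name `time_frequency_map`, the Hann analysis window, the restriction to non-negative frequency bins (as returned by a real-input FFT), and the 2.5 kHz test tone are all assumptions made for the example; the disclosure does not prescribe them. Note also that the code indexes bins and frames from 0, whereas the text uses k=1, . . . , K and m=1, . . . , M.

```python
import numpy as np

def time_frequency_map(x, frame_len=64, hop=32):
    """Return x(k, m): rows are DFT bins k, columns are time frames m."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)      # analysis window: an assumption
    cols = [np.fft.rfft(window * x[m * hop : m * hop + frame_len])
            for m in range(n_frames)]
    return np.stack(cols, axis=1)       # shape (K, M), K = frame_len // 2 + 1

fs = 20_000                              # sampling rate in Hz
n = np.arange(2048)
x = np.sin(2 * np.pi * 2500 * n / fs)    # hypothetical 2.5 kHz test tone
X = time_frequency_map(x, frame_len=64, hop=32)

bin_width_hz = fs / 64                   # frequency range covered by one bin
frame_step_s = 32 / fs                   # time between consecutive frame indices
peak_bin = int(np.argmax(np.abs(X[:, 0])))   # strongest bin of the first frame
```

In this example each bin spans 312.5 Hz and consecutive time indices are 1.6 ms apart, which shows how the time step follows from the frame length and the degree of overlap.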

[0168] In the present application, a number J of (non-uniform) frequency sub-bands with sub-band indices j=1, 2, . . . , J is defined, each sub-band comprising one or more DFT-bins (cf. vertical Sub-band j-axis in FIG. 1B). The j.sup.th sub-band (indicated by Sub-band j (x.sub.j(m)) in the right part of FIG. 1B) comprises DFT-bins with lower and upper indices k1(j) and k2(j), respectively, defining lower and upper cut-off frequencies of the j.sup.th sub-band, respectively. A specific time-frequency unit (j,m) is defined by a specific time index m and the DFT-bin indices k1(j)-k2(j), as indicated in FIG. 1B by the bold framing around the corresponding DFT-bins. A specific time-frequency unit (j,m) contains complex or real values of the j.sup.th sub-band signal x.sub.j(m) at time m.
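The grouping of DFT-bins into sub-bands may be illustrated, under stated assumptions, by evaluating the envelope expression recited in claim 5, i.e. the function ƒ applied to the square root of the summed squared magnitudes of the bins k1(j) to k2(j). The function name `subband_envelopes`, the band edges and the constant test data are hypothetical and serve only to make the sketch self-contained; bins are indexed from 0 in the code.

```python
import numpy as np

def subband_envelopes(X, k1, k2, f=lambda w: w):
    """x_j(m) = f( sqrt( sum_{k=k1(j)}^{k2(j)} |x(k, m)|^2 ) ).

    X  : (K, M) array of DFT values x(k, m)
    k1 : lower bin index of each sub-band (inclusive)
    k2 : upper bin index of each sub-band (inclusive)
    f  : envelope function; the identity by default
    """
    rows = [np.sqrt(np.sum(np.abs(X[lo : hi + 1, :]) ** 2, axis=0))
            for lo, hi in zip(k1, k2)]
    return f(np.stack(rows, axis=0))     # shape (J, M)

# Hypothetical non-uniform band layout over K = 9 bins (J = 3 sub-bands).
k1 = [0, 2, 5]
k2 = [1, 4, 8]
X = np.ones((9, 4)) * (1 + 1j)           # |x(k, m)|^2 = 2 in every bin
env = subband_envelopes(X, k1, k2)       # magnitude envelopes, f(w) = w
power_env = subband_envelopes(X, k1, k2, f=lambda w: w ** 2)   # f(w) = w^2
```

Because the sub-bands here contain 2, 3 and 4 bins, the power envelopes are 4, 6 and 8, respectively, and the magnitude envelopes are their square roots.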

[0169] FIG. 2A symbolically illustrates a monaural speech intelligibility predictor unit (MSIP) providing a monaural speech intelligibility predictor d based on a time domain version x(n) (n being a time (sample) index), a time-frequency band representation x(k,m) (k being a frequency index, m being a time (frame) index) or a sub-band representation x.sub.j(m) (j being a frequency sub-band index) of an information signal x comprising speech.

[0170] FIG. 2B shows an embodiment of a monaural speech intelligibility predictor unit (MSIP) adapted for receiving an information signal x(n) comprising either a clean or noisy and/or processed version of a target speech signal, the speech intelligibility predictor unit being configured to provide as an output a speech intelligibility predictor value d for the information signal. The speech intelligibility predictor unit (MSIP) comprises

[0171] An input unit (IU) for providing a time-frequency representation x(k,m) of said information signal x, k being a frequency bin index, k=1, 2, . . . , K, and m being a time (frame) index;

[0172] An envelope extraction unit (AEU) for providing a time-frequency sub-band representation x.sub.j(m) of the information signal x from said time-frequency representation x(k,m) of said information signal x, representing temporal envelopes, or functions thereof, j being a frequency sub-band index, j=1, 2, . . . , J, and m being the time index;

[0173] A time-frequency segment division unit (SDU) for dividing said time-frequency sub-band representation x.sub.j(m) of the information signal x into time-frequency segments X.sub.m corresponding to a number N of successive samples of said sub-band signals;

[0174] An optional (indicated by dashed outline) normalization and/or transformation unit (N/TU) adapted for providing normalized and/or transformed versions {tilde over (X)}.sub.m of the time-frequency segments X.sub.m;

[0175] A segment estimation unit (SEU) for estimating essentially noise-free time-frequency segments S.sub.m or normalized and/or transformed versions {tilde over (S)}.sub.m thereof, among said time-frequency segments X.sub.m, or normalized and/or transformed versions {tilde over (X)}.sub.m thereof, respectively;

[0176] An intermediate speech intelligibility calculation unit (ISIU) adapted for providing intermediate speech intelligibility coefficients d.sub.m estimating an intelligibility of said time-frequency segment X.sub.m, said intermediate speech intelligibility coefficients d.sub.m being based on said estimated essentially noise-free time segments S.sub.m or normalized and/or transformed versions {tilde over (S)}.sub.m thereof, and said time-frequency segments X.sub.m, or normalized and/or transformed versions {tilde over (X)}.sub.m thereof, respectively;

[0177] A final speech intelligibility calculation unit (FSIU) for calculating a final speech intelligibility predictor d estimating an intelligibility of the information signal x by combining, e.g. averaging or applying a MIN or MAX-function, the intermediate speech intelligibility coefficients d.sub.m, or a transformed version thereof, over time.

[0178] FIG. 3A shows a monaural speech intelligibility predictor unit (MSIP) in combination with a hearing loss model (HLM) and an (optional) evaluation unit (EVAL). The Monaural Speech Intelligibility Predictor (MSIP) estimates an intelligibility index d, which reflects the intelligibility of a noisy and potentially processed speech signal. A noisy/reverberant speech signal y, which potentially has been passed through some signal processing device, e.g. a hearing aid (cf. e.g. signal processor (SPU) in FIG. 3B, 3C, 3D), is considered for analysis by the monaural speech intelligibility predictor (MSIP). The present disclosure proposes an algorithm, which can predict the intelligibility of the noisy/processed signal, as perceived by a group of listeners with similar hearing profiles, e.g. normal hearing or hearing impaired listeners. In the embodiment of FIG. 3A, the signal under study, y, is passed through a hearing loss model (HLM), to model the imperfections of an impaired auditory system, providing information signal x. This is done to simulate the potential decrease in intelligibility due to a hearing loss. Several methods for simulating a hearing loss exist (cf. e.g. [6]). Perhaps the simplest consists of adding to the input signal a statistically independent noise signal, which is spectrally shaped according to the audiogram of the listener (cf. e.g. [7]). In the embodiments of FIG. 3A (and 3B, 3C, 3D), an evaluation unit (EVAL) is included to evaluate the resulting speech intelligibility predictor value d. The evaluation unit (EVAL) may e.g. further process the speech intelligibility predictor value d, to e.g. graphically and/or numerically display the current and/or recent historic values, derive trends, etc. Alternatively, or additionally, the evaluation unit may propose actions to the user (or a communication partner or caring person), such as add directionality, move closer, speak louder, activate SI-enhancement mode, etc. 
The evaluation unit may e.g. be implemented in a separate device, e.g. acting as a user interface to the speech intelligibility predictor unit (MSIP) and/or to a hearing aid including such unit, e.g. implemented as a remote control device, e.g. as an APP of a smartphone (cf. FIG. 10A, 10B).

[0179] FIG. 3B shows a monaural speech intelligibility predictor unit (MSIP) in combination with a signal processor (SPU) and an (optional) evaluation unit (EVAL). A noisy/reverberant speech signal y is passed through a signal processor (SPU) and the processed output signal x thereof is used as an input to the monaural speech intelligibility predictor (MSIP) providing the resulting speech intelligibility predictor value d, which is fed to the evaluation unit (EVAL) for further processing, analysis and/or display.

[0180] FIG. 3C shows a first combination of a monaural speech intelligibility predictor unit (MSIP) with a hearing loss model (HLM), a signal processor (SPU) and an (optional) evaluation unit (EVAL). A noisy signal, y, comprising speech is passed through a hearing loss model (HLM) to model the imperfections of an impaired auditory system providing noisy hearing loss shaped signal x, which is passed through a signal processor (SPU) and the processed output signal x thereof is used as an input to the monaural speech intelligibility predictor (MSIP). The MSIP-unit provides the resulting speech intelligibility predictor value d, which is fed to the evaluation unit (EVAL) for further processing, analysis and/or display.

[0181] FIG. 3D shows a second combination of a monaural speech intelligibility predictor unit (MSIP) with a hearing loss model (HLM), a signal processor (SPU) and an (optional) evaluation unit (EVAL). The embodiment of FIG. 3D is similar to the embodiment of FIG. 3C apart from the two units HLM and SPU being swapped in order. The embodiment of FIG. 3D may reflect a setup used in a hearing aid to evaluate the intelligibility of a processed signal u from a signal processor (SPU) (e.g. intended for presentation to a user). The noisy signal comprising speech y is passed through the signal processor (SPU) and the processed output signal u thereof is passed through a hearing loss model (HLM) to model the imperfections of an impaired auditory system and providing noisy hearing loss shaped signal x, which is used by the monaural speech intelligibility predictor unit (MSIP) to determine the resulting speech intelligibility predictor value d, which is fed to the evaluation unit (EVAL) for further processing, analysis and/or display.

[0182] FIG. 4 shows an embodiment of a monaural speech intelligibility predictor unit (MSIP) according to the present disclosure. The embodiment of a monaural speech intelligibility predictor shown in FIG. 4 is decomposed into a number of sub-units (e.g. representing separate tasks of a corresponding method). Each sub-unit (process step) is described in more detail in the following. Sub-units (process steps) that are symbolized with dashed outline are optional.

Voice Activity Detection.

[0183] Speech intelligibility (SI) relates to regions of the input signal with speech activity; silence regions do not contribute to SI. Hence, in some realizations of the invention, the first step is to detect voice activity regions in the input signal (in other realizations, voice activity detection is performed implicitly at a later stage of the algorithm). The explicit voice activity detection can be done with any of a range of existing algorithms, e.g., [8,9] or the references therein. Let us denote the input signal with speech activity by x′(n), where n is a discrete-time index.

Frequency Decomposition and Envelope Extraction

[0184] The first step is to perform a frequency decomposition of the signal x(n). This may be achieved in many ways, e.g., using a short-time Fourier transform (STFT), a band-pass filterbank (e.g., a Gamma-tone filter bank), etc. Subsequently, the temporal envelopes of each sub-band signal are extracted. This may, e.g., be achieved using a Hilbert transform, or by low-pass filtering the magnitude of complex-valued STFT signals, etc.

[0185] As an example, we describe in the following how the frequency decomposition and envelope extraction can be achieved using an STFT. Let us assume a sampling frequency of 10000 Hz. First, a time-frequency representation is obtained by segmenting x′(n) into (e.g. 50%) overlapping, windowed frames; normally, some tapered window, e.g. a Hanning-window is used. The window length could e.g. be 256 samples when the sample rate is 10000 Hz. Then, each frame is Fourier transformed using a fast Fourier transform (FFT) (potentially after appropriate zero-padding). The resulting DFT bins may be grouped in perceptually relevant sub-bands. For example, one could use one-third octave bands (e.g. as in [4]), but it should be clear that any other sub-band division can be used (for example, the grouping could be uniform, i.e., unrelated to perception in this respect). In the case of one-third octave bands and a sampling rate of 10000 Hz, there are 15 bands which cover the frequency range 150-5000 Hz (cf. e.g. [4]). Other numbers of bands and another frequency range can be used. We refer to the time-frequency tiles defined by these frames and sub-bands as time-frequency (TF) units (or STFT coefficients). Applying this to the noisy/processed input signal x(n) leads to (generally complex-valued) STFT coefficients x(k,m), where k and m denote frequency and frame (time) indices, respectively. Temporal envelope signals may then be extracted as

[00012] $x_{j}(m) = f\left(\sqrt{\sum_{k = k_{1}(j)}^{k_{2}(j)} \left| x(k, m) \right|^{2}}\right),$

j=1, . . . J, and m=1, . . . M,
where k1(j) and k2(j) denote DFT bin indices corresponding to lower and higher cut-off frequencies of the j'th sub-band, J is the number of sub-bands, and M is the number of signal frames in the signal in question, and where the function ƒ(•)=ƒ(w), where w represents

[00013] $\left(\sqrt{\sum_{k = k_{1}(j)}^{k_{2}(j)} \left| x(k, m) \right|^{2}}\right),$

is included for generality. In an embodiment, x.sub.j(m) is real (i.e. ƒ(•) represents a real (non-complex) function). For example, with ƒ(w)=w, we get the temporal envelope used in [4]; with ƒ(w)=w.sup.2, we extract power envelopes; and with ƒ(w)=2·log w or ƒ(w)=w.sup.β, 0<β<2, we can model the compressive non-linearity of the healthy cochlea (cf. e.g. [10, 11]). It should be clear that other reasonable choices for ƒ(w) exist.
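As an illustration only, the STFT-based envelope extraction described above may be sketched as follows. The function names, the zero-padded FFT length of 512, the choice of 150 Hz as the lowest band centre, the band edges at one-sixth octave on either side of each centre, and the nearest-bin mapping are assumptions made for this sketch and are not prescribed by the disclosure.

```python
import numpy as np

def stft(x, frame_len=256, hop=128, n_fft=512):
    """Hann-windowed, 50% overlapping STFT; returns x(k, m) with shape (K, M)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[m * hop:m * hop + frame_len] * window for m in range(n_frames)],
        axis=1,
    )
    # rfft zero-pads each frame to n_fft samples (assumed padding length)
    return np.fft.rfft(frames, n=n_fft, axis=0)

def third_octave_bins(fs=10000, n_fft=512, n_bands=15, f_min=150.0):
    """Lower/upper DFT bin indices k1(j), k2(j) of one-third octave bands."""
    centers = f_min * 2.0 ** (np.arange(n_bands) / 3.0)
    bin_freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    nearest = lambda f: int(np.argmin(np.abs(bin_freqs - f)))
    k1 = [nearest(c * 2.0 ** (-1.0 / 6.0)) for c in centers]
    k2 = [nearest(c * 2.0 ** (1.0 / 6.0)) for c in centers]
    return k1, k2

def envelopes(x, f=lambda w: w, fs=10000):
    """x_j(m) = f(sqrt(sum_k |x(k, m)|^2)) over the bins of sub-band j."""
    power = np.abs(stft(x)) ** 2
    k1, k2 = third_octave_bins(fs=fs)
    w = np.sqrt(np.stack([power[a:b + 1].sum(axis=0) for a, b in zip(k1, k2)]))
    return f(w)
```

Passing `f=lambda w: w ** 2` yields power envelopes instead of magnitude envelopes; other choices of ƒ(w) can be supplied in the same way.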

[0186] As mentioned, other envelope representations may be implemented, e.g., using a Gammatone filterbank, followed by a Hilbert envelope extractor, etc, and functions ƒ(w) may be applied to these envelopes in a similar manner as described above for STFT based envelopes. In any case, the result of this procedure is a time-frequency representation in terms of sub-band temporal envelopes, x.sub.j(m), where j is a sub-band index, and m is a time index (cf. e.g. FIG. 1B).

Time-Frequency Segments

[0187] Next, we divide the time-frequency representation x.sub.j(m) into segments, i.e., spectrograms corresponding to N successive samples of all sub-band signals. For example, the m'th segment is defined by the J×N matrix

[00014] $X_{m} = \begin{bmatrix} x_{1}(m - N + 1) & \cdots & x_{1}(m) \\ \vdots & & \vdots \\ x_{J}(m - N + 1) & \cdots & x_{J}(m) \end{bmatrix}.$

[0188] It should be understood that other versions of the time-segments could be used, e.g., segments, which have been shifted in time to operate on frame indices m−N/2+1 through m+N/2, to be centered around the current value of frame index m.
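A minimal sketch of the segment division is given below, assuming the sub-band envelopes are held in a J×M array and that frame indices are 0-based (the text uses 1-based indices). The function names and the default N=30 are chosen for illustration.

```python
import numpy as np

def segment(env, m, N=30):
    """Return the J x N segment X_m covering frames m-N+1 .. m (0-based m)."""
    if m < N - 1 or m >= env.shape[1]:
        raise ValueError("frame index m does not allow a full segment")
    return env[:, m - N + 1:m + 1]

def all_segments(env, N=30):
    """All full segments, one per frame index m = N-1, ..., M-1."""
    return [segment(env, m, N) for m in range(N - 1, env.shape[1])]
```

A centered variant would slice `env[:, m - N // 2 + 1:m + N // 2 + 1]` instead, corresponding to frame indices m−N/2+1 through m+N/2.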

Normalizations and Transformation of Time-Frequency Segments

[0189] The rows and columns of each segment X.sub.m may be normalized/transformed in various ways.

[0190] In particular, we consider the following row normalizations/transformations: [0191] Normalization of rows to zero mean:

g.sub.1(X)=X−μ.sub.x.sup.r1.sup.T, [0192] where μ.sub.x.sup.r is a J×1 vector whose j'th entry is the mean of the j'th row of X (hence the superscript r in μ.sub.x.sup.r), where 1 denotes an N×1 vector of ones, and where superscript T denotes matrix transposition. [0193] Normalization of rows to unit-norm:

g.sub.2(X)=D.sup.r(X)X, [0194] where D.sup.r(X)=diag(└1/√{square root over (X(1,:)X(1,:).sup.H)} . . . 1/√{square root over (X(J,:)X(J,:).sup.H)}┘). Here X(j,:) denotes the j'th row of X, such that D.sup.r(X) is a J×J diagonal matrix with the inverse norm of each row on the main diagonal, and zeros elsewhere (the superscript H denotes Hermitian transposition). Pre-multiplication with D.sup.r(X) normalizes the rows of the resulting matrix to unit-norm. [0195] Fourier transformation applied to each row

g.sub.3(X)=XF, [0196] where F is an N×N Fourier matrix. [0197] Fourier transformation applied to each row followed by computing the magnitude of the resulting complex-valued elements

g.sub.4(X)=|XF| [0198] where |•| computes the element-wise magnitudes; [0199] The identity operator

g.sub.5(X)=X

[0200] We further consider the following column normalizations [0201] Normalization of columns to zero mean:

h.sub.1(X)=X−1μ.sub.x.sup.c.sup.T, [0202] where μ.sub.x.sup.c is an N×1 vector whose i.sup.th entry is the mean of the i.sup.th column of X, and where 1 denotes a J×1 vector of ones. [0203] Normalization of columns to unit-norm:

h.sub.2(X)=XD.sup.c(X), [0204] where D.sup.c(X)=diag(└1/√{square root over (X(:,1).sup.HX (:,1))} . . . 1/√{square root over (X(:,N).sup.HX(:,N))}┘). Here X(:,n) denotes the n'th column of X, such that D.sup.c(X) is a diagonal N×N matrix with the inverse norm of each column on the main diagonal, and zeros elsewhere. Post-multiplication with D.sup.c(X) normalizes the columns of the resulting matrix to unit-norm.

[0205] The row- and column normalizations/transformations listed above may be combined in different ways.

[0206] One combination of particular interest is where, first, the rows are normalized to zero-mean and unit-norm, followed by a similar mean and norm normalization of the columns. This particular combination may be written as

{tilde over (X)}.sub.m=h.sub.2(h.sub.1(g.sub.2(g.sub.1(X.sub.m)))),

where {tilde over (X)}.sub.m is the resulting row- and column normalized matrix.
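The row and column normalizations, and the particular combination h.sub.2(h.sub.1(g.sub.2(g.sub.1(X.sub.m)))), may be sketched as follows for a real-valued J×N segment. The small constant `EPS`, which guards against division by zero for all-zero rows or columns, is an assumption of this sketch and is not part of the definitions above.

```python
import numpy as np

EPS = 1e-12  # assumed guard against division by zero

def g1(X):
    """Rows to zero mean."""
    return X - X.mean(axis=1, keepdims=True)

def g2(X):
    """Rows to unit norm."""
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + EPS)

def h1(X):
    """Columns to zero mean."""
    return X - X.mean(axis=0, keepdims=True)

def h2(X):
    """Columns to unit norm."""
    return X / (np.linalg.norm(X, axis=0, keepdims=True) + EPS)

def normalize_segment(X):
    """Row mean/norm normalization followed by column mean/norm normalization."""
    return h2(h1(g2(g1(X))))
```

After this combination, each column has zero mean and unit norm; the row statistics set by g.sub.1 and g.sub.2 are in general altered by the subsequent column operations.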

[0207] Another transformation of interest is to apply a Fourier transform to each row of matrix X.sub.m. With the introduced notation, this may be written simply as

{tilde over (X)}.sub.m=g.sub.3(X.sub.m),

where {tilde over (X)}.sub.m is the resulting (complex-valued) J×N matrix.

[0208] Other combinations of these normalizations/transformations may be of interest, e.g., {tilde over (X)}.sub.m=g.sub.2(g.sub.1(h.sub.2(h.sub.1(X.sub.m)))) (mean- and norm-standardization of the columns followed by mean- and norm-standardization of the rows), {tilde over (X)}.sub.m=g.sub.2(g.sub.1(g.sub.3(X.sub.m))) (mean- and norm-standardization of Fourier-transformed rows), {tilde over (X)}.sub.m=g.sub.5(X.sub.m), which completely bypasses the normalization stage, etc.

[0209] A still further combination is to provide at least one normalization and/or transformation operation of rows and at least one normalization and/or transformation operation of columns of said time-frequency segments S.sub.m and X.sub.m.

Estimation of Noise-Free Time-Frequency Segments

[0210] The next step involves estimation of the underlying noise-free normalized/transformed time-frequency segment {tilde over (S)}.sub.m. Obviously, this matrix cannot be observed in practice, since only the noisy/processed normalized/transformed time-frequency segment in matrix {tilde over (X)}.sub.m is available. So, we estimate {tilde over (S)}.sub.m based on {tilde over (X)}.sub.m.

[0211] To this end, let us define a J·N×1 super-vector {tilde over (x)}.sub.m by stacking the columns of matrix {tilde over (X)}.sub.m, i.e.,

{tilde over (x)}.sub.m=[{tilde over (X)}.sub.m(:,1).sup.T{tilde over (X)}.sub.m(:,2).sup.T . . . {tilde over (X)}.sub.m(:,N).sup.T].sup.T.

[0212] Similarly, we define the corresponding noise-free/unprocessed super-vector {tilde over (s)}.sub.m as

{tilde over (s)}.sub.m=[{tilde over (S)}.sub.m(:,1).sup.T{tilde over (S)}.sub.m(:,2).sup.T . . . {tilde over (S)}.sub.m(:,N).sup.T].sup.T.

[0213] The goal is now to derive an estimate {tilde over (ŝ)}.sub.m of {tilde over (s)}.sub.m based on {tilde over (x)}.sub.m, i.e.,

{tilde over ({circumflex over (s)})}.sub.m=r({tilde over (x)}.sub.m),

where r(.) is an estimator that maps J·N×1 noisy super-vectors to estimates of noise-free J·N×1 super-vectors.

[0214] The problem of estimating an un-observable target vector {tilde over (s)}.sub.m based on a related, but distorted, observation {tilde over (x)}.sub.m is a well-known problem in many engineering contexts, and many methods can be applied to solve it. These include (but are not limited to) methods based on neural networks, e.g. where the map r(.) is pre-estimated off-line, e.g. using supervised learning techniques, Bayesian techniques, e.g., where the joint probability density function of ({tilde over (s)}.sub.m,{tilde over (x)}.sub.m) is estimated off-line and used for providing estimates of {tilde over (s)}.sub.m, which are optimal in some statistical sense, e.g., minimum mean-square error (mmse) sense, maximum a posteriori (MAP) sense, or maximum likelihood (ML) sense, etc.

[0215] A particularly simple class of solutions involves maps r(.) which are linear in the observations {tilde over (x)}.sub.m. In this solution class, we form a linear estimate {tilde over (ŝ)}.sub.m of the corresponding noise-free J·N×1 super-vector {tilde over (s)}.sub.m from linear combinations of the entries in {tilde over (x)}.sub.m, i.e.,

{tilde over ({circumflex over (s)})}.sub.m=G{tilde over (x)}.sub.m,

where G is a pre-estimated J·N×J·N matrix (see e.g. below for an example of how G can be found). Finally, an estimate {tilde over (Ŝ)}.sub.m is found of the clean normalized/transformed segment by simply reshaping the super-vector estimate {tilde over (ŝ)}.sub.m to a time-frequency segment matrix,

{tilde over ({circumflex over (S)})}.sub.m=[{tilde over ({circumflex over (s)})}.sub.m(1:J) {tilde over ({circumflex over (s)})}.sub.m(J+1:2J) . . . {tilde over ({circumflex over (s)})}.sub.m(J(N−1)+1:JN)],

where {tilde over (ŝ)}.sub.m(r:q) denotes a vector consisting of entries of vector {tilde over (ŝ)}.sub.m with index r through q.
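The stacking, linear estimation and reshaping steps may be sketched as below. Column-major (`order="F"`) reshaping is used so that the super-vector consists of the columns of the segment placed one after another, as in the definitions above; the function names are hypothetical.

```python
import numpy as np

def stack_columns(X):
    """J x N matrix -> (J*N,) super-vector of its columns, first column first."""
    return X.reshape(-1, order="F")

def unstack_columns(v, J, N):
    """(J*N,) super-vector -> J x N matrix; inverse of stack_columns."""
    return v.reshape(J, N, order="F")

def estimate_clean_segment(X_tilde, G):
    """Linear estimate of the clean segment: reshape(G @ x_tilde)."""
    J, N = X_tilde.shape
    return unstack_columns(G @ stack_columns(X_tilde), J, N)
```

With G set to the identity matrix the estimate equals the input segment, which is a convenient sanity check on the stacking order.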

Estimation of Intermediate Intelligibility Coefficients

[0216] The estimated normalized/transformed time-frequency segment {tilde over (Ŝ)}.sub.m may now be used together with the corresponding noisy/processed segment {tilde over (X)}.sub.m to compute an intermediate intelligibility index d.sub.m, reflecting the intelligibility of the signal segment {tilde over (X)}.sub.m. To do so, let us first define the sample correlation coefficient d(a,b) of the elements in two K×1 vectors a and b:

[00015] $d(a, b) = \frac{\sum_{k = 1}^{K} (a(k) - \mu_{a})(b(k) - \mu_{b})}{\sqrt{\sum_{k = 1}^{K} (a(k) - \mu_{a})^{2} \, \sum_{k = 1}^{K} (b(k) - \mu_{b})^{2}}},$ where $\mu_{a} = \frac{1}{K} \sum_{k = 1}^{K} a(k)$ and $\mu_{b} = \frac{1}{K} \sum_{k = 1}^{K} b(k).$

[0217] Several options exist for computing the intermediate intelligibility index d.sub.m. In particular, d.sub.m may be defined as [0218] 1) the average sample correlation coefficient of the columns in {tilde over (Ŝ)}.sub.m and

[00016] ${\tilde{X}}_{m}$, i.e., $d_{m} = \frac{1}{N} \sum_{n = 1}^{N} d\left({\hat{\tilde{S}}}_{m}(:, n), {\tilde{X}}_{m}(:, n)\right),$

or [0219] 2) the average sample correlation coefficient of the rows in {tilde over (Ŝ)}.sub.m and

[00017] ${\tilde{X}}_{m}$, i.e., $d_{m} = \frac{1}{J} \sum_{j = 1}^{J} d\left({\hat{\tilde{S}}}_{m}(j, :)^{T}, {\tilde{X}}_{m}(j, :)^{T}\right),$

or [0220] 3) the sample correlation coefficient of all elements in {tilde over (Ŝ)}.sub.m and {tilde over (X)}.sub.m, i.e.,

d.sub.m=d({tilde over ({circumflex over (s)})}.sub.m,{tilde over (x)}.sub.m).
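The three options for the intermediate intelligibility index may be sketched as follows for real-valued segments (an assumption; complex-valued segments, e.g. after g.sub.3, would need a suitably adapted correlation measure). Returning 0 when a vector has zero variance is a further assumption of this sketch.

```python
import numpy as np

def corr(a, b):
    """Sample correlation coefficient d(a, b) of two equal-length real vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def d_columns(S_hat, X):
    """Option 1: average correlation over the N columns."""
    return float(np.mean([corr(S_hat[:, n], X[:, n]) for n in range(X.shape[1])]))

def d_rows(S_hat, X):
    """Option 2: average correlation over the J rows."""
    return float(np.mean([corr(S_hat[j, :], X[j, :]) for j in range(X.shape[0])]))

def d_all(S_hat, X):
    """Option 3: correlation of all elements (the stacked super-vectors)."""
    return corr(S_hat.reshape(-1, order="F"), X.reshape(-1, order="F"))
```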

[0221] Alternatively, the noisy/processed segment {tilde over (X)}.sub.m and the corresponding estimate of the underlying clean segment {tilde over (Ŝ)}.sub.m may be used to generate an estimate of the noise-free, unprocessed speech signals, which can be used with the noisy, processed signals as input to any existing intrusive intelligibility prediction scheme, e.g., the STOI algorithm (cf. e.g. [4]).

Estimation of Final Intelligibility Coefficient

[0222] The final intelligibility coefficient d, which reflects the intelligibility of the noisy/processed input signal x(n), is defined as the average of the intermediate intelligibility coefficients, potentially transformed via a function u(d.sub.m), across the duration of the speech-active parts of x(n) i.e.,

[00018] $d = \frac{1}{M} \sum_{m = 1}^{M} u(d_{m}).$

[0223] The function u(d.sub.m) may for example be

[00019] $u (d_{m}) = \log (\frac{1}{1 - d_{m}^{2}}),$

to link the intermediate intelligibility coefficients to information measures (cf. e.g. [14]), but it should be clear that other choices exist.

[0224] The “do-nothing” function u(d.sub.m)=d.sub.m may also be used, as has been done in the STOI algorithm (cf. [4]).
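The final averaging step, with either choice of u(d.sub.m), may be sketched as follows. The clipping constant `eps`, which keeps the logarithm finite when |d.sub.m| approaches 1, is an assumption of this sketch.

```python
import numpy as np

def u_identity(d):
    """The 'do-nothing' transform u(d_m) = d_m."""
    return d

def u_info(d, eps=1e-12):
    """u(d_m) = log(1 / (1 - d_m^2)), clipped to stay finite near |d_m| = 1."""
    return np.log(1.0 / np.maximum(1.0 - d ** 2, eps))

def final_intelligibility(d_m, u=u_identity):
    """Average of the (optionally transformed) intermediate coefficients."""
    d_m = np.asarray(d_m, dtype=float)
    return float(np.mean(u(d_m)))
```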

Pre-Computation of Linear Map

[0225] As outlined above, many methods exist for estimating the noise-free (potentially normalized/transformed) supervector {tilde over (s)}.sub.m, based on the entries in the noisy/processed (and optionally normalized/transformed) supervector {tilde over (x)}.sub.m. In this section—to demonstrate a particularly simple realization of the invention—we constrain our attention to linear estimators, i.e., where the estimate of {tilde over (s)}.sub.m is found as an appropriate linear combination of the entries in {tilde over (x)}.sub.m. Any such linear combination may be written compactly as

{tilde over ({circumflex over (s)})}.sub.m=G{tilde over (x)}.sub.m,

where G is a pre-estimated J·N×J·N matrix. In general, J and N can be chosen according to the application in question. N may preferably be chosen with a view to characteristics of the human vocal system. In an embodiment, N is chosen, so that a time spanned by N (possibly overlapping) time frames is in the range from 50 ms or 100 ms to 1 s, e.g. between 300 ms and 600 ms. In an embodiment, N is chosen to represent the (e.g. average or maximum) duration of a basic speech element of the language in question. In an embodiment, N is chosen to represent the (e.g. average or maximum) duration of a syllable (or word) of the language in question. In an embodiment, J=15. In an embodiment, N=30. In an embodiment, J·N=450. In an embodiment, a time frame has a duration of 10 ms or more, e.g. 25 ms or more, e.g. 40 ms or more (e.g. depending on a degree of overlap). In an embodiment, a time frame has a duration in the range between 10 ms and 40 ms.

[0226] As described in more detail in the following, the matrix G may be pre-estimated (i.e. off-line, prior to application of the proposed method or device) using a training set of noise-free speech signals. We can think of G as a way of building a priori knowledge of the statistical structure of speech signals into the estimation process. Many variants of this approach exist. In the following, one of them is described. This approach has the advantage of being computationally relatively simple, and hence well suited for applications (such as portable electronic devices, e.g. hearing aids) where power consumption is an important design parameter (restriction).

[0227] Let us for convenience assume that all noise-free training speech signals are concatenated into a (potentially very long) training speech signal z(n). Assume that the steps described above to find noisy super vectors {tilde over (x)}.sub.m are applied to the training speech signal z(n). In other words, z(n) is subject to voice activity detection, collection of samples into time-frequency segment matrices, applying relevant normalizations/transformations of the form g.sub.i(X), h.sub.i(X), to the matrices, and stacking the columns of the resulting matrices into super vectors {tilde over (z)}.sub.m, m=1, . . . , {tilde over (M)}, where {tilde over (M)} denotes the total number of segments in the entire noise-free speech training set.

[0228] We compute the J·N×J·N sample correlation matrix across the training set as

[00020] ${\hat{R}}_{\tilde{z}} = \frac{1}{\tilde{M}} \sum_{m = 1}^{\tilde{M}} {\tilde{z}}_{m} {\tilde{z}}_{m}^{H},$

and compute the eigen-value decomposition of this matrix,

{circumflex over (R)}.sub.{tilde over (z)}=U.sub.{tilde over (z)}Λ.sub.{tilde over (z)}U.sub.{tilde over (z)}.sup.H,

where Λ.sub.{tilde over (z)} is a diagonal J·N×J·N matrix with real-valued eigenvalues in decreasing order, and where the columns of the J·N×J·N matrix U.sub.{tilde over (z)} are the corresponding eigenvectors.

[0229] Finally let us partition the eigen vector matrix U.sub.{tilde over (z)} into two submatrices

U.sub.{tilde over (z)}=└U.sub.{tilde over (z)},1U.sub.{tilde over (z)},2┘,

where U.sub.{tilde over (z)},1 is a J·N×L matrix with the eigenvectors corresponding to the L<J·N dominant eigenvalues, and U.sub.{tilde over (z)},2 has the remaining eigenvectors as columns. As an example, L/(J·N) may be less than 80%, such as less than 50%, e.g. less than 33%, such as less than 20% or less than 10%. In the above example of J·N=450, L may e.g. be 100 (leading to U.sub.{tilde over (z)},1 being a 450×100 matrix (dominant sub-space), and U.sub.{tilde over (z)},2 being a 450×350 matrix (inferior sub-space)).

[0230] The (J·N×J·N) matrix G may then be computed as

G=U.sub.{tilde over (z)},1U.sub.{tilde over (z)},1.sup.H.

[0231] This example of matrix G may be recognized as an orthogonal projection operator (cf. e.g. [12]). In this case, forming the estimate {tilde over (ŝ)}.sub.m=G{tilde over (x)}.sub.m simply projects the noisy/processed super vector {tilde over (x)}.sub.m orthogonally onto the linear subspace spanned by the columns in U.sub.{tilde over (z)},1.
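The pre-computation of G from a set of clean training super-vectors may be sketched as follows. The training super-vectors are assumed to be supplied as the columns of an array `Z`; this layout and the function name are illustrative choices. `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted to obtain the dominant eigenvectors.

```python
import numpy as np

def projection_from_training(Z, L):
    """Orthogonal projector onto the L-dimensional dominant subspace.

    Z : (J*N, M_tilde) array whose columns are clean training super-vectors.
    L : number of dominant eigenvectors to keep (L < J*N).
    """
    # Sample correlation matrix R = (1/M_tilde) * sum_m z_m z_m^H
    R = (Z @ Z.conj().T) / Z.shape[1]
    eigvals, eigvecs = np.linalg.eigh(R)   # Hermitian input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # indices in decreasing eigenvalue order
    U1 = eigvecs[:, order[:L]]             # dominant eigenvectors as columns
    return U1 @ U1.conj().T                # G = U1 U1^H
```

The resulting G is idempotent and Hermitian, as expected of an orthogonal projection, and its trace equals L.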

Binaural, Non-Intrusive Intelligibility Prediction.

[0232] In principle, methods from the class of monaural, non-intrusive intelligibility predictors proposed above are able to predict the intelligibility of speech signals, when the listener listens with one ear. While this can already give a good indication of the intelligibility that can be achieved when listening with both ears, there exist acoustic situations, where two-ear listening is much more advantageous than listening with one ear (cf. e.g. [5]). To take this effect into account, a first binaural, non-intrusive speech intelligibility predictor d.sub.binaural (e.g. taking on values between −1 and 1) is proposed. The monaural intelligibility predictor described above serves as the basis for the proposed first binaural intelligibility predictor.

[0233] The general block diagram of the proposed binaural intelligibility predictor is shown in FIG. 5A. FIG. 5A shows a first binaural speech intelligibility predictor in combination with a hearing loss model. The Binaural Speech Intelligibility Predictor (BSIP) estimates an intelligibility index d.sub.binaural, which reflects the intelligibility of a listener listening to two noisy and potentially processed information signals comprising speech x.sub.left and x.sub.right (presented to the listener's left and right ears, respectively). Optionally, (noisy and/or processed) binaural signals y.sub.left and y.sub.right comprising speech are passed through a binaural hearing loss model (BHLM) first, to model the imperfections of an impaired auditory system, providing noisy and/or processed hearing loss shaped signals x.sub.left and x.sub.right for use by the binaural speech intelligibility predictor (BSIP).

[0234] As for the monaural case, a potential hearing loss may be modelled by simply adding independent noise to the input signals, spectrally shaped according to the audiogram of the listener—this approach was e.g. used in [7].

Better-Ear Non-Intrusive Binaural Intelligibility Prediction

[0235] A simple method for binaural speech intelligibility prediction is to apply the monaural model described above independently to the left- and right-ear input signals x.sub.left and x.sub.right, resulting in intelligibility indices d.sub.left and d.sub.right, respectively. Assuming that the listener is able to mentally adapt to the ear with the best intelligibility, the resulting better-ear intelligibility predictor d.sub.binaural is given by:

d.sub.binaural=max(d.sub.left,d.sub.right).

[0236] A block diagram of this approach is given in FIG. 5B.

[0237] FIG. 5B shows an embodiment of a binaural speech intelligibility predictor based on a combination of two monaural speech intelligibility predictors in combination with a hearing loss model. FIG. 5B illustrates processing steps for determining a better-ear non-intrusive binaural intelligibility predictor d.sub.binaural. Along the lines of FIG. 5A, FIG. 5B shows noisy and/or processed binaural signals y.sub.left and y.sub.right comprising speech, which (in each of the left and right monaural speech intelligibility predictors) are passed through respective hearing loss models (HLM) for the left and right ears, providing noisy and/or processed hearing loss shaped signals x.sub.left and x.sub.right. Together, the hearing loss models (HLM) for the left and right ears may constitute or form part of the binaural hearing loss model (BHLM) of FIG. 5A. The left and right information signals x.sub.left and x.sub.right are used by the monaural speech intelligibility predictors (MSIP) of the left and right ears, respectively, to provide left and right (monaural) speech intelligibility predictors d.sub.left and d.sub.right. A maximum value of the left and right speech intelligibility predictors d.sub.left and d.sub.right is determined by calculation unit (max) and used as the binaural intelligibility predictor d.sub.binaural. Together, the monaural speech intelligibility predictors (MSIP) of the left and right ears and the calculation unit (max) may constitute or form part of the binaural speech intelligibility predictor (BSIP) of FIG. 5A.

General Non-Intrusive Binaural Intelligibility Prediction

[0238] While the better-ear intelligibility prediction approach described above will work well in a wide range of acoustic situations (see e.g. [5] for a discussion of binaural intelligibility), there are acoustic situations where it is too simple. To account for this, we propose to combine the steps of the monaural non-intrusive intelligibility predictor, outlined above, with ideas from the binaural, intrusive intelligibility predictor described in [13], to arrive at a general, novel non-intrusive binaural intelligibility predictor.

[0239] The processing steps of the proposed non-intrusive binaural intelligibility predictor are outlined in FIG. 6. The individual processing blocks in FIG. 6 are identical to the blocks used in the monaural, non-intrusive speech intelligibility predictor proposed above (FIG. 4), except for the Equalization-Cancellation stage (EC) (as indicated with a bold-faced box in FIG. 6). This stage, on the other hand, is completely described in [13]. In the following, the EC-stage is briefly outlined. For a detailed treatment, see [13] and the references therein.

[0240] The EC-stage operates independently on different frequency sub-bands (hence, the frequency decomposition stage before the EC-stage). In each sub-band (index j), the EC-stage time-shifts the input signals (from left and right ear) and adjusts their amplitudes in order to find the time shift and amplitude adjustment that leads to the maximum predicted intelligibility (d.sub.binaural in FIG. 5, hence, the bold dashed arrow from the output of the model leading back to the EC-stage). In an embodiment, d.sub.binaural is maximized in each frequency band, whereby a resulting binaural speech intelligibility predictor can be provided, e.g. as a single scalar value. In general, no closed-form solution exists for the optimal time-shift/amplitude adjustment, but the optimal parameter pairs may at least be found by a brute-force search across a suitable range of parameter values (see [13] for details of such exhaustive search approach).
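The brute-force parameter search mentioned above may be sketched, for a single sub-band, as follows. This is not the EC-stage of [13]: combining the two ear signals by subtraction after a circular time shift and a scalar gain is an assumed simplification, and the `score` callable is a placeholder for the remainder of the predictor that would return the predicted intelligibility for the combined signal.

```python
import numpy as np

def ec_search(left, right, score, shifts, gains):
    """Exhaustive search for the (shift, gain) pair maximizing score.

    left, right : 1-D arrays holding one sub-band signal per ear.
    score       : callable mapping a combined 1-D signal to a scalar
                  (stand-in for the predicted intelligibility).
    shifts      : iterable of integer sample shifts to try.
    gains       : iterable of amplitude factors to try.
    Returns (best_score, best_shift, best_gain).
    """
    best = (-np.inf, None, None)
    for tau in shifts:
        shifted = np.roll(right, tau)        # circular shift (assumption)
        for g in gains:
            combined = left - g * shifted    # equalize, then cancel
            d = score(combined)
            if d > best[0]:
                best = (d, tau, g)
    return best
```

The cost grows with the product of the number of candidate shifts and gains, and the search is repeated for each sub-band, which is why the range of parameter values is typically kept small.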

Monaural and Binaural Intelligibility Enhancement Using Intelligibility Predictors

[0241] The methods proposed in the previous sections for non-intrusive monaural and binaural speech intelligibility prediction can be used for online adaptation of the signal processing taking place in a hearing aid system (or another communication device), in order to maximize the speech intelligibility of its output. This general idea is depicted in FIG. 7 for a binaural setting: noisy/reverberant signals y.sub.1(n), . . . , y.sub.L(n) are picked up by a total of L microphones.

[0242] FIG. 7 shows a method of providing a non-intrusive binaural speech intelligibility predictor d.sub.binaural for adapting the processing of a binaural hearing aid system to maximize the intelligibility of output speech signal(s).

[0243] In the binaural setting, the L microphone signals y′.sub.1, y′.sub.2, . . . , y′.sub.L are processed in a binaural signal processor (BSPU) to produce a left- and a right-ear signal, u.sub.left and u.sub.right, e.g. to be presented to a user. In FIG. 7, all L microphones of the hearing aid system are shown together; one or more microphones are generally available from the left- and right-ear hearing aids, respectively, but microphone signals could also be available from external devices, e.g., table microphones, microphones positioned at a target talker, etc. The microphone signals from spatially separated locations are assumed to be transmitted wirelessly (or wired) for processing in the hearing aid system. To estimate the intelligibility experienced by the user when listening binaurally to the left- and right-ear signals, u.sub.left and u.sub.right, the signals are passed through the binaural intelligibility model (BSIP) proposed above, where the binaural hearing loss model (BHLM, see above for some details) is optional. The resulting estimated intelligibility index d.sub.binaural is returned to the processing unit (BSPU) of the hearing aid system, which adapts the parameters of relevant signal processing algorithms to maximize d.sub.binaural.

[0244] The adaptation of processing could take place as follows. Let us assume that the hearing aid system has at its disposal a number of processing schemes, which could be relevant for a particular acoustic situation. For example, in a speech-in-noise situation, the hearing aid system may be equipped with three different noise reduction schemes: mild, medium, and aggressive. In this situation, the hearing aid system applies (e.g. successively) each of the noise reduction schemes to the input signal and chooses the one that leads to maximum (estimated) intelligibility. The hearing aid user need not suffer the perceptual annoyance of the hearing aid system “trying out” processing schemes. Specifically, the hearing aid system could try out the processing schemes “internally”, i.e., without presenting the result of each of the tried-out processing schemes through the loudspeakers; only the output signal which has the largest (estimated) intelligibility needs to be presented to the user.

[0245] It should be obvious that this procedure can be applied on a more detailed level as well. In particular, even the value of a single parameter in the hearing aid system, e.g., the maximum attenuation of a noise reduction system in a particular frequency band, may be optimized with respect to intelligibility by trying out a range of candidate values and choosing the one leading to maximum (estimated) intelligibility.
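The “internal” try-out of candidate processing schemes, and the same procedure applied to a single parameter, can be sketched as follows. This is an illustrative sketch only: select_processing, make_noise_reduction, the threshold-based noise reduction and toy_predictor are hypothetical stand-ins introduced here, and toy_predictor uses a known clean signal solely so that the example is verifiable. In the disclosure, the score would be the non-intrusive predictor value d (or d.sub.binaural).

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Signal = List[float]


def select_processing(
    y: Sequence[float],
    schemes: Dict[Hashable, Callable[[Sequence[float]], Signal]],
    predict: Callable[[Signal], float],
) -> Tuple[Hashable, Signal, float]:
    """Run every candidate internally and return (key, output, score) of the best.

    Only the returned output would be presented to the user.
    """
    best_key, best_out, best_d = None, [], float("-inf")
    for key, scheme in schemes.items():
        u = scheme(y)    # internal trial, not played back
        d = predict(u)   # estimated intelligibility of this candidate
        if d > best_d:
            best_key, best_out, best_d = key, u, d
    return best_key, best_out, best_d


def make_noise_reduction(gain_floor: float, threshold: float = 0.5):
    """Hypothetical noise reduction: scale samples below `threshold` by `gain_floor`.

    A smaller gain_floor corresponds to a larger maximum attenuation.
    """
    def scheme(y: Sequence[float]) -> Signal:
        return [s if abs(s) >= threshold else s * gain_floor for s in y]
    return scheme


# Toy data (assumed): one weak speech sample (0.4) sits below the threshold,
# so attenuating too aggressively removes speech along with the noise.
clean = [1.0, 0.4, -1.0, 0.0, 1.0, 0.0]
noisy = [1.0, 0.4, -1.0, 0.2, 1.0, -0.2]


def toy_predictor(u: Sequence[float]) -> float:
    """Hypothetical stand-in for the predictor: negative squared error to `clean`."""
    return -sum((a - b) ** 2 for a, b in zip(u, clean))


# Choosing among three predefined schemes.
schemes = {
    "mild": make_noise_reduction(0.9),
    "medium": make_noise_reduction(0.6),
    "aggressive": make_noise_reduction(0.1),
}
name, output, d = select_processing(noisy, schemes, toy_predictor)

# Sweeping a single parameter over a range of candidate values.
sweep = {g: make_noise_reduction(g) for g in [i / 10 for i in range(11)]}
best_floor, _, _ = select_processing(noisy, sweep, toy_predictor)
print(name, best_floor)
```

With this toy data neither extreme wins: the "medium" scheme is selected, and the finer sweep settles on a gain floor of 0.7. This mirrors the trade-off between suppressing noise and damaging the target speech.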

[0246] The idea of using non-intrusive speech intelligibility predictors for speech intelligibility enhancement has been described in a general binaural model context. It should be obvious that exactly the same idea could be executed for the better-ear non-intrusive intelligibility model described above, or for a monaural listening situation, using the monaural non-intrusive intelligibility model. These aspects are further described in the following in connection with FIGS. 8A, 8B, and 8C.

[0247] FIG. 8A shows an embodiment of a hearing aid (HD) according to the present disclosure comprising a monaural speech intelligibility predictor unit (MSIP) for estimating intelligibility of an output signal u and using the predictor to adapt the signal processing of an input speech signal y′ to maximize the monaural speech intelligibility predictor d. The hearing aid HD comprises at least one input unit (here a microphone, e.g. two or more). The microphone provides a time-variant electric input signal y′ representing a sound input y received at the microphone. The electric input signal y′ is assumed to comprise a target signal component and a noise signal component (at least in some time segments). The target signal component originates from a target signal source, e.g. a person speaking. The hearing aid further comprises a configurable signal processor (SPU) for processing the electric input signal y′ and providing a processed signal u. The hearing aid further comprises an output unit for creating output stimuli configured to be perceivable by the user as sound based on an electric output either in the form of the processed signal u from the signal processor or a signal derived therefrom. In the embodiment of FIG. 8A, a loudspeaker is directly connected to the output of the signal processor (SPU), thus receiving output signal u. The hearing aid further comprises a hearing loss model unit (HLM) connected to the monaural speech intelligibility predictor unit (MSIP) and the output of the signal processor, and configured to modify the electric output signal u to reflect a hearing impairment of the relevant ear of the user and thereby provide information signal x to the monaural speech intelligibility predictor unit (MSIP). The monaural speech intelligibility predictor unit (MSIP) provides an estimate of the user's intelligibility of the output signal in the form of the (final) speech intelligibility predictor d, which is fed to a control unit of the configurable signal processor to modify the signal processing so as to optimize d.

[0248] FIG. 8B shows a first embodiment of a binaural hearing aid system according to the present disclosure comprising a binaural speech intelligibility predictor unit (BSIP) for estimating the perceived intelligibility of the user when presented with the respective left and right output signals u.sub.left and u.sub.right of the binaural hearing aid system, and for using the predictor d.sub.binaural to adapt the processing, in the binaural signal processor (BSPU), of input signals y′.sub.left and y′.sub.right comprising speech to maximize the binaural speech intelligibility predictor d.sub.binaural. This is done by feeding the output signals u.sub.left and u.sub.right, presented to the user via respective output units (here loudspeakers), to a binaural hearing loss model that models the (impaired) auditory system of the user and presents resulting left and right signals x.sub.left and x.sub.right to the binaural speech intelligibility predictor unit (BSIP).

[0249] The configurable binaural signal processor (BSPU) is adapted to control the processing of the respective electric input signals y′.sub.left and y′.sub.right based on the final binaural speech intelligibility measure d.sub.binaural to optimize said measure, thereby maximizing the user's intelligibility of the input sound signals y.sub.left and y.sub.right.

[0250] A more detailed embodiment of the binaural hearing aid system of FIG. 8B is shown in FIG. 8C. FIG. 8C shows an embodiment of a binaural hearing system comprising left and right hearing aids (HD.sub.left, HD.sub.right) according to the present disclosure. The left and right hearing aids (HD.sub.left, HD.sub.right) are adapted to be located at or in left and right ears (Left Ear, Right Ear in FIG. 8C) of a user. The signal processing of each of the left and right hearing aids is guided by an estimate of the speech intelligibility experienced by the hearing aid user, the binaural speech intelligibility predictor d.sub.binaural (cf. control signal d.sub.binaural from the binaural speech intelligibility predictor (BSIP) to the respective signal processors (SPU) of the left and right hearing aids). The binaural speech intelligibility predictor unit (BSIP) is configured to take as inputs the output signals u.sub.left, u.sub.right of the left and right hearing aids as modified by a hearing loss model (HLM.sub.left, HLM.sub.right, respectively, in FIG. 8C) for the respective left and right ears of the user (to model imperfections of an impaired auditory system of the user). In this example, the speech intelligibility estimation/prediction takes place in the left-ear hearing aid (Left Ear: HD.sub.left). The output signal u.sub.right of the right-ear hearing aid (Right Ear: HD.sub.right) is transmitted to the left-ear hearing aid (Left Ear: HD.sub.left) via communication link LINK. The communication link (LINK) may be based on a wired or wireless connection. The hearing aids are preferably wirelessly connected.

[0251] Each of the hearing aids (HD.sub.left, HD.sub.right) comprises two microphones, a signal processing block (SPU), and a loudspeaker. Additionally, one or both of the hearing aids comprise a binaural speech intelligibility predictor unit (BSIP). The two microphones of each of the left and right hearing aids (HD.sub.left, HD.sub.right) each pick up a potentially noisy (time-varying) signal y(t) (cf. y.sub.1,left, y.sub.2,left and y.sub.1,right, y.sub.2,right in FIG. 8C), which generally consists of a target signal component s(t) (cf. s.sub.1,left, s.sub.2,left and s.sub.1,right, s.sub.2,right in FIG. 8C) and an undesired signal component v(t) (cf. v.sub.1,left, v.sub.2,left and v.sub.1,right, v.sub.2,right in FIG. 8C). In FIG. 8C, the subscripts 1, 2 indicate a first and second (e.g. front and rear) microphone, respectively, while the subscripts left, right indicate whether it is the left or right ear hearing aid (HD.sub.left, HD.sub.right, respectively).

[0252] Based on the binaural speech intelligibility predictor d.sub.binaural, the signal processors (SPU) of each hearing aid may be (individually) adapted (cf. control signal d.sub.binaural). Since the binaural speech intelligibility predictor is determined in the left-ear hearing aid (HD.sub.left), adaptation of the processing in the right-ear hearing aid (HD.sub.right) requires control signal d.sub.binaural to be transmitted from the left-ear to the right-ear hearing aid via communication link (LINK).

[0253] In FIG. 8C, each of the left and right hearing aids comprises two microphones. In other embodiments, each (or one) of the hearing aids may comprise three or more microphones. Likewise, in FIG. 8C, the binaural speech intelligibility predictor (BSIP) is located in the left hearing aid (HD.sub.left). Alternatively, the binaural speech intelligibility predictor (BSIP) may be located in the right hearing aid (HD.sub.right), or alternatively in both, preferably performing the same function in each hearing aid. The latter embodiment consumes more power and requires a two-way exchange of output audio signals (u.sub.left, u.sub.right), whereas the exchange of processing control signals (d.sub.binaural in FIG. 8C) can be omitted. In still another embodiment, the binaural speech intelligibility predictor unit (BSIP) is located in a separate auxiliary device, e.g. a remote control (e.g. embodied in a SmartPhone), requiring that an audio link can be established between the hearing aids and the auxiliary device for receiving output signals (u.sub.left, u.sub.right) from, and transmitting processing control signals (d.sub.binaural) to, the respective hearing aids (HD.sub.left, HD.sub.right).

[0254] The processing performed in the signal processors (SPU) and controlled or influenced by the control signals (d.sub.binaural) of the respective left and right hearing aids (HD.sub.left, HD.sub.right) from the binaural speech intelligibility predictor (BSIP) may in principle include any processing algorithm influencing speech intelligibility, e.g. spatial filtering (beamforming) and noise reduction, compression, feedback cancellation, etc. The adaptation of the signal processing of a hearing aid based on the estimated binaural speech intelligibility predictor includes (but is not limited to):

[0255] 1. Adapting the aggressiveness of beamformers of the hearing system. Specifically, for binaural beamformers, it is well-known that the beamformer configuration involves a trade-off between noise reduction and spatial correctness of the noise cues. In one extreme setting, the noise is maximally reduced, but all noise signals sound as if originating from the direction of the target signal source. The trade-off that leads to maximum SI is generally time-varying and generally unknown. With the proposed approach, however, it is possible to adapt the beamformer stage of a given hearing aid to produce maximum SI at all times.

[0256] 2. Adapting the aggressiveness of a (single-channel (SC)) noise reduction system. Often a beamformer stage is followed by an SC noise reduction stage (cf. e.g. FIG. 6). The aggressiveness of the SC noise reduction filter is adaptable (e.g. by changing the maximum attenuation allowed by the SC noise reduction filter). The proposed approach allows choosing the SI-optimal trade-off, i.e., a system that suppresses an appropriate amount of noise without introducing SI-disturbing artefacts in the target speech signal.

[0257] 3. For systems with adaptable analysis/synthesis filterbanks, the analysis/synthesis filterbank leading to maximum SI may be chosen. This implies changing the time-frequency tiling, i.e., the bandwidths and/or sampling rate used in individual subbands, to deliver maximum SI in accordance with the target signal and acoustic situation (e.g., noise type, level, spatial distribution, etc.).

[0258] 4. If the binaural speech intelligibility predictor unit estimates the maximum SI of the binaural hearing system to be so low that it is of no use for the user, then an indication may be given to the user (e.g. via a sound signal) that the hearing aid system is unable to operate in the given acoustical conditions. It may then adapt its processing, e.g. to at least not introduce sound quality degradations, or to go to a “power-saving” mode, where the signal processing is limited to save power.

[0259] FIG. 9 illustrates an exemplary hearing aid (HD) formed as a receiver in the ear (RITE) type of hearing aid comprising a part (BTE) adapted for being located behind the pinna and a part (ITE) comprising an output transducer (OT, e.g. a loudspeaker/receiver) adapted for being located in an ear canal of the user. The BTE-part and the ITE-part are connected (e.g. electrically connected) by a connecting element (IC). In the embodiment of a hearing aid of FIG. 9, the BTE part comprises an input unit comprising two (individually selectable) input transducers (e.g. microphones) (MIC.sub.1, MIC.sub.2) each for providing an electric input audio signal representative of an input sound signal. The input unit further comprises two (individually selectable) wireless receivers (WLR.sub.1, WLR.sub.2) for providing respective directly received auxiliary audio and/or information signals. The hearing aid (HD) further comprises a substrate SUB whereon a number of electronic components are mounted, including a configurable signal processor (SPU), a monaural speech intelligibility predictor unit (MSIP), and a hearing loss model unit (coupled to each other and to input and output units via electrical conductors Wx), as e.g. described above in connection with FIG. 8A. The configurable signal processor (SPU) provides an enhanced audio signal (cf. e.g. signal u in FIG. 8A), which is intended to be presented to a user. In the embodiment of a hearing aid device in FIG. 9, the ITE part comprises an output unit in the form of a loudspeaker (receiver) (OT) for converting an electric signal (e.g. u in FIG. 8A) to an acoustic signal. The ITE-part further comprises a guiding element, e.g. a dome, (DO) for guiding and positioning the ITE-part in the ear canal of the user.

[0260] The hearing aid (HD) exemplified in FIG. 9 is a portable device and further comprises a battery (BAT) for energizing electronic components of the BTE- and ITE-parts.

[0261] The hearing aid device comprises an input unit for providing an electric input signal representing sound. The input unit comprises one or more input transducers (e.g. microphones) (MIC.sub.1, MIC.sub.2) for converting an input sound to an electric input signal. The input unit comprises one or more wireless receivers (WLR.sub.1, WLR.sub.2) for receiving (and possibly transmitting) a wireless signal comprising sound and for providing corresponding directly received auxiliary audio input signals. In an embodiment, the hearing aid device comprises a directional microphone system (beamformer) adapted to enhance a target acoustic source among a multitude of acoustic sources in the local environment of the user wearing the hearing aid device. In an embodiment, the directional system is adapted to detect (such as adaptively detect) from which direction a particular part of the microphone signal originates.

[0262] The hearing aid of FIG. 9 may form part of a hearing aid and/or a binaural hearing aid system according to the present disclosure.

[0263] FIG. 10A shows an embodiment of a binaural hearing system comprising left and right hearing aids (HD.sub.left, HD.sub.right) in communication with a portable (handheld) auxiliary device (AD) functioning as a user interface (UI) for the binaural hearing aid system (cf. FIG. 10B). In an embodiment, the binaural hearing system comprises the auxiliary device (Aux, and the user interface UI). In the embodiment of FIG. 10A, wireless links denoted IA-WL (e.g. an inductive link between the left and right hearing aids) and WL-RF (e.g. RF-links (e.g. Bluetooth) between the auxiliary device Aux and the left HD.sub.left, and between the auxiliary device Aux and the right HD.sub.right, hearing aid, respectively) are indicated (implemented in the devices by corresponding antenna and transceiver circuitry, indicated in FIG. 10A in the left and right hearing aids as RF-IA-Rx/Tx-l and RF-IA-Rx/Tx-r, respectively).

[0264] FIG. 10B shows the auxiliary device (Aux) comprising a user interface (UI) in the form of an APP for controlling and displaying data related to the speech intelligibility predictors. The user interface (UI) comprises a display (e.g. a touch sensitive display) displaying a screen of a Speech intelligibility SI-APP for controlling the hearing aid system and a number of predefined actions regarding functionality of the binaural (or monaural) hearing system. In the exemplified (part of the) APP, a user (U) has the option of influencing a mode of operation via the selection of a SI-prediction mode to be a Monaural SIP or Binaural SIP mode. In the screen shown in FIG. 10B, the un-shaded buttons are selected, i.e. Binaural SIP. Further, Show SI-estimate has been activated, resulting in a current predicted value of the binaural speech intelligibility predictor, d.sub.binaural=85%, being displayed. The grey shaded button Monaural SIP may be selected instead of Binaural SIP. Further, the SI-enhancement mode may be selected to activate processing of the input signal that optimizes the (monaural or binaural) speech intelligibility predictor.

[0265] It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.

[0266] As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but intervening elements may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.

[0267] It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.

[0268] The claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.

[0269] Accordingly, the scope should be judged in terms of the claims that follow.

REFERENCES

[0270] [1] T. H. Falk, V. Parsa, J. F. Santos, K. Arehart, O. Hazrati, R. Huber, J. M. Kates, and S. Scollie, “Objective Quality and Intelligibility Prediction for Users of Assistive Listening Devices,” IEEE Signal Processing Magazine, Vol. 32, No. 2, pp. 114-124, March 2015.

[0271] [2] American National Standards Institute, “ANSI S3.5, Methods for the Calculation of the Speech Intelligibility Index,” New York 1995.

[0272] [3] K. S. Rhebergen and N. J. Versfeld, “A speech intelligibility index based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners,” J. Acoust. Soc. Am., vol. 117, no. 4, pp. 2181-2192, 2005.

[0273] [4] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2125-2136, September 2011.

[0274] [5] A. W. Bronkhorst, “The cocktail party phenomenon: A review on speech intelligibility in multiple-talker conditions,” Acta Acustica United with Acustica, vol. 86, no. 1, pp. 117-128, January 2000.

[0275] [6] B. C. J. Moore, “Cochlear Hearing Loss, Physiological, Psychological and Technical Issues,” Wiley, 2007.

[0276] [7] R. Beutelmann and T. Brand, “Prediction of intelligibility in spatial noise and reverberation for normal-hearing and hearing-impaired listeners,” J. Acoust. Soc. Am., Vol. 120, no. 1, pp. 331-342, April 2006.

[0277] [8] J. R. Deller, J. G. Proakis, and J. H. L. Hansen, “Discrete-Time Processing of Speech Signals,” IEEE Press, 2000.

[0278] [9] P. C. Loizou, “Speech Enhancement—Theory and Practice,” CRC Press, 2007.

[0279] [10] T. Dau, D. Püschel, and A. Kohlrausch, “A quantitative model of the “effective” signal processing in the auditory system. I. Model structure,” J. Acoust. Soc. Am., Vol. 99, no. 6, pp. 3615-3622, 1996.

[0280] [11] J. Jensen and Z.-H. Tan, “Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features—A Theoretically Consistent Approach,” IEEE Trans. Audio, Speech, Language Process., Vol. 23, No. 1, pp. 186-197, 2015.

[0281] [12] Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE Trans. Speech, Audio Proc., vol. 3, no. 4, pp. 251-266, 1995.

[0282] [13] A. H. Andersen, J. M. de Haan, Z.-H. Tan, and J. Jensen, “A method for predicting the intelligibility of noisy and non-linearly enhanced binaural speech,” Proc. Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pp. 4995-4999, March 2016.

[0283] [14] J. Jensen and C. H. Taal, “Speech Intelligibility Prediction based on Mutual Information,” IEEE Trans. Audio, Speech, and Language Processing, vol. 22, no. 2, February 2014, pp. 430-440.


CPC classification: G10L25/60 (PHYSICS); H04R2225/51 (ELECTRICITY); H04R25/552 (ELECTRICITY); H04R25/554 (ELECTRICITY); H04R25/505 (ELECTRICITY); G10L21/0272 (PHYSICS); H04R2225/43 (ELECTRICITY)

International classification: H04R25/00 (ELECTRICITY)
