HEARING DEVICE OR SYSTEM COMPRISING A COMMUNICATION INTERFACE

20220337960 · 2022-10-20

Abstract

A hearing device, e.g. a hearing aid, comprises a) at least one input transducer for converting sound in the environment of the hearing device to respective at least one acoustically received electric input signal or signals representing said sound; b) a wireless receiver for receiving an audio signal from a wireless transmitter of a sound capturing device for picking up sound in said environment and providing a wirelessly received electric input signal representing said sound; and c) a processor configured c1) to receive said at least one acoustically received electric input signal or signals, or a processed version thereof; c2) to receive said wirelessly received electric input signal; and c3) to provide a processed signal. The processor comprises a signal predictor for estimating future values of said wirelessly received electric input signal in dependence of a multitude of past values of said signal, thereby providing a predicted signal. The hearing device further comprises d) an output transducer for presenting output stimuli perceivable as sound to the user in dependence of said processed signal from said processor, or a further processed version thereof. The processor is configured to provide said processed signal in dependence of the predicted signal or a processed version thereof 1) alone, or 2) mixed with said at least one acoustically received electric input signal or signals, or a processed version thereof. A hearing device comprising an earpiece and a separate audio processing device is further disclosed. The invention may e.g. be used in hearing devices in wireless communication with audio capture devices in an immediate environment of the user wearing the hearing device.

Claims

1. A hearing device comprising at least one input transducer for converting sound in the environment of the hearing device to respective at least one acoustically received electric input signal or signals representing said sound; a wireless receiver for receiving an audio signal from a wireless transmitter of a sound capturing device for picking up sound in said environment and providing a wirelessly received electric input signal representing said sound; a processor configured to receive said at least one acoustically received electric input signal or signals, or a processed version thereof; to receive said wirelessly received electric input signal; and to provide a processed signal, the processor comprising a signal predictor for estimating future values of said wirelessly received electric input signal in dependence of a multitude of past values of said signal, thereby providing a predicted signal; an output transducer for presenting output stimuli perceivable as sound to the user in dependence of said processed signal from said processor, or a further processed version thereof, wherein the processor is configured to provide said processed signal in dependence of the predicted signal or a processed version thereof alone, or mixed with said at least one acoustically received electric input signal or signals, or a processed version thereof.

2. A hearing device according to claim 1 wherein the processor comprises a delay estimator configured to estimate a time-difference-of-arrival of sound from a given sound source in said environment at said processor between said acoustically received electric input signal or signals, or a processed version thereof, and said wirelessly received electric input signal.

3. A hearing device according to claim 1 comprising a wireless transmitter for transmitting data to another device.

4. A hearing device according to claim 1 wherein the processor comprises a selection controller configured to include said estimated predicted signal or parts thereof in said processed signal in dependence of a sound quality measure.

5. A hearing device according to claim 1 comprising a transform unit, or respective transform units, for providing said at least one acoustically received electric input signal or signals, or a processed version thereof, and/or said wirelessly received electric input signal in a transform domain.

6. A hearing device according to claim 5 wherein said transform units are configured to provide said signals in the frequency domain.

7. A hearing device according to claim 6 wherein the processor is configured to include said estimated future values of said wirelessly received electric input signal in the processed signal only in a limited part of an operating frequency range of the hearing device.

8. A hearing device according to claim 7 wherein the processed signal comprises future values of said wirelessly received electric input signal only in frequency bands or time-frequency regions that fulfil a sound quality criterion.

9. A hearing device according to claim 1 comprising a beamformer configured to provide a beamformed signal based on said at least one acoustically received electric input signal or signals and said predicted signal.

10. A hearing device according to claim 1 configured to apply spatial cues to the predicted signal before it is presented to the user.

11. A hearing device according to claim 2 configured to only activate the signal predictor in case the time-difference-of-arrival is larger than a minimum value.

12. A hearing device comprising at least one earpiece configured to be worn at or in an ear of a user; and a separate audio processing device; the at least one earpiece comprising an input transducer for converting sound in the environment of the hearing device to an acoustically received electric input signal representing said sound; a wireless transmitter for transmitting said acoustically received electric input signal, or a part thereof, to said audio processing device; a wireless receiver for receiving a first processed signal from said audio processing device, at least in a normal mode of operation of the hearing device; and an output transducer for converting a final processed signal to stimuli perceived by said user as sound, the audio processing device comprising a wireless receiver for receiving said acoustically received electric input signal, or a part thereof, from the earpiece, and to provide a received signal representative thereof; a computing device for processing said received signal, or a signal originating therefrom, and to provide a first processed signal; a transmitter for transmitting said first processed signal to said earpiece; wherein said earpiece or said audio processing device comprises a signal predictor for estimating future values of said received signal, or a processed version thereof, in dependence of a multitude of past values of said signal, thereby providing a predicted signal; wherein said signal predictor is configured to fully or partially compensate for a processing delay incurred by one or more, such as all, of said transmission of the acoustically received electric input signal from the hearing device to the audio processing device, said processing in the audio processing device, and said transmission of the predicted signal or a processed version thereof to said earpiece and its reception therein; wherein the final processed signal, at least in a normal mode of operation of the hearing device, is constituted by or comprises at least a part of said predicted signal.

13. A hearing device according to claim 12 wherein the audio processing device comprises said signal predictor.

14. A hearing device according to claim 12 wherein the earpiece comprises an earpiece computing device configured to process said acoustically received electric input signal and/or said first processed signal received from the audio processing device, and to provide said final processed signal, and wherein the earpiece computing device, at least in a normal mode of operation of the hearing device, is configured to mix the acoustically received electric input signal, or the modified signal, with a predicted signal received from the audio processing device and to provide the mixture as the final processed signal to the output transducer.

15. A hearing device according to claim 14 wherein the earpiece computing device, in an earpiece-mode of operation, where said first processed signal is not received from the audio processing device, or is received with inferior quality, is configured to provide the final processed signal to the output transducer in dependence of the acoustically received input signal.

16. A hearing device or system comprising at least one earpiece configured to be worn at or in an ear of a user and to receive an acoustic signal and to present a final processed signal to the user; and a separate audio processing device in communication with the at least one earpiece; wherein the earpiece is configured to transmit said acoustic signal to the audio processing device; and wherein the audio processing device comprises a signal predictor for estimating future values of the acoustic signal received by the at least one earpiece, or a processed version thereof, in dependence of a multitude of past values of said signal, thereby providing a predicted signal; and wherein the predictor is configured to compensate for or reduce the delay incurred by the processing being conducted in the separate audio processing device.

17. A hearing device or system according to claim 16 wherein the audio processing device is configured to transmit the predicted signal or a processed version thereof to said earpiece; and wherein the earpiece is configured to determine said final processed signal in dependence of said predicted signal.

18. A hearing device or system according to claim 16 wherein the signal predictor is configured to fully or partially compensate for a processing delay incurred by one or more, such as all of a) a transmission of the acoustically received electric input signal from the hearing device to the audio processing device, b) a processing in the audio processing device providing a predicted signal, and c) a transmission of the predicted signal or a processed version thereof to said earpiece and its reception therein.

19. A hearing device according to claim 1 comprising a hearing instrument, a headset, an earphone, an ear protection device or a combination thereof.

20. A hearing device according to claim 12 comprising a hearing instrument, a headset, an earphone, an ear protection device or a combination thereof.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0113] The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details needed to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter in which:

[0114] FIG. 1A illustrates a situation where a hearing aid system comprising left and right hearing aids receives a wireless speech signal s(n−T) and an acoustic signal s(n), where it is assumed that the wireless signal arrives at the hearing aids later than the acoustic signal (T>0),

[0115] FIG. 1B shows an exemplary waveform of amplitude versus time for a wirelessly received signal representing speech,

[0116] FIG. 1C schematically shows a time-frequency representation of the waveform of FIG. 1B, and

[0117] FIG. 1D schematically shows a time-domain representation of the waveform of FIG. 1B,

[0118] FIG. 2A schematically shows a time variant analogue signal (Amplitude vs time) and its digitization in samples, the samples being arranged in a number of time frames, each comprising a number N.sub.s of samples, and

[0119] FIG. 2B schematically illustrates a time-frequency representation of the time variant electric signal of FIG. 2A, in relation to a prediction algorithm according to the present disclosure,

[0120] FIG. 3 shows an embodiment of a hearing system comprising a hearing device according to the present disclosure in communication with a sound capturing device,

[0121] FIG. 4 shows an embodiment of a hearing aid comprising a signal predictor and respective signal quality estimators according to the present disclosure,

[0122] FIG. 5 shows an embodiment of a hearing aid comprising a signal predictor and a beamformer according to the present disclosure,

[0123] FIG. 6A shows an embodiment of a hearing aid comprising an earpiece and a (e.g. body-worn) processing device in communication with each other, wherein the body-worn processing device comprises a signal predictor according to the present disclosure;

[0124] FIG. 6B shows an embodiment of a hearing aid comprising an earpiece and a (e.g. body-worn) processing device as shown in FIG. 6A, but where the earpiece further comprises a processing unit allowing a signal from the microphone and/or from the audio processing device to be processed, e.g. to provide the predicted signal in the earpiece; and

[0125] FIG. 6C shows an embodiment of a hearing aid comprising an earpiece and a (e.g. body-worn) processing device as shown in FIG. 6A, where the earpiece further comprises a processing unit allowing a signal from the microphone and/or from the audio processing device to be processed, and where the signal predictor is located in the audio processing device (as in FIG. 6A), but works in the time domain.

[0126] The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.

[0127] Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.

DETAILED DESCRIPTION OF EMBODIMENTS

[0128] The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.

[0129] The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

[0130] The present application relates to the field of hearing systems, e.g. hearing aids or headsets. It relates in particular to a situation where a wearer of the hearing system receives acoustic as well as electromagnetically transmitted versions of sound from a sound environment around the wearer of the hearing system. The mutual timing of the arrival of the two representations matters (in particular if they differ in propagation/processing time). Too large a difference in time of arrival (e.g. more than 10-20 ms) of the same sound ‘content’ of the two representations at the user's ear leads to confusion and disturbance, rather than improved perception (e.g. speech intelligibility) by the wearer.

[0131] Consider a situation, where a user is in a very noisy environment—the target talker is speaking, but the SNR at the user is low, because he or she is located at a distance from the target talker and the user cannot understand the target talker. However, a wireless microphone is located on the table, very close to the target talker and can pick up an essentially noise-free version of the target speech signal. Unfortunately, the essentially noise-free signal picked up by the wireless microphone is delayed by T.sub.D (e.g. 30 ms) relative to the direct sound when it arrives at the user (and thus the hearing system worn by the user), and it cannot be presented to the user for the reasons described above.

[0132] However, the hearing system may use the received (essentially clean, but delayed) signal to predict the clean signal T.sub.D (e.g. 30 ms) into the future—the prediction will obviously not be perfect, but parts of the predicted signal (in particular low-frequency parts) will be a good representation of the actual clean signal T.sub.D (e.g. 30 ms) in the future. This predicted part can be usefully presented to the user, either directly, or combined with the microphone signals of the hearing system.

[0133] Further, a gain varying in time and frequency may be extracted from the predicted signal and applied to the hearing aid microphone signals. The gain may e.g. depend on the level of the signal of interest, such that only the time-frequency regions of the external signal with a high amount of energy are preserved (and the low-energy (or low-SNR) regions are attenuated).
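
As an illustration of such a gain extraction, the Python sketch below derives a simple time-frequency gain from the level of the predicted signal and applies it to the microphone STFT. The function name, the energy-threshold rule and all parameter values are assumptions chosen for illustration, not the specific method of the present disclosure.

import numpy as np

def level_gain_from_prediction(Z, Y, rel_floor=0.01, g_min=0.1):
    # Z, Y: complex STFTs (frames x bands) of the predicted external
    # signal and the local microphone signal, respectively.
    power = np.abs(Z) ** 2
    threshold = rel_floor * power.max()              # keep only energetic tiles
    gain = np.where(power >= threshold, 1.0, g_min)  # attenuate low-energy regions
    return gain * Y                                  # enhanced microphone signal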

[0134] More specifically, consider the situation depicted in FIG. 1A, 1B, 1C, where the speech signal of a target talker (TT) is picked up by a wireless microphone (M.sub.ex). FIG. 1A shows a situation where a hearing aid system comprising left and right hearing aids (HD1, HD2) receives a wireless speech signal s(n−T) and an acoustic signal s(n), where it is assumed that the wireless signal arrives at the hearing aids a time period T (e.g. in ms) later than the acoustic signal (i.e. T>0). FIG. 1B shows an exemplary waveform of amplitude versus time for the wirelessly received (relatively high quality) speech signal s.

[0135] It should be noted that this is not always the case. It takes approximately 3 ms for sound to travel 1 meter. If e.g. the sound source is 5 meters away, the transmission delay through air is 15 ms, so in fact the wirelessly transmitted sound may arrive prior to the sound picked up by the microphones. It may thus be advantageous to know the TDOA, and only apply prediction when T>0, or T>5 ms, or T>10 ms (where T=T.sub.1−T.sub.2, and T.sub.1 and T.sub.2 are the times of arrival at the hearing device of the wirelessly received signal and the ‘corresponding’ acoustically received signal, respectively, cf. below).

[0136] In the wireless microphone (M.sub.ex), the speech signal is encoded and transmitted (WTS) to the hearing aid user (U), where it is received T.sub.1 ms later (e.g. in one or both hearing aids (HD1, HD2) or in a separate processing device in communication with the hearing aid(s)). Meanwhile, the acoustic speech signal (ATS) emitted from the target talker (TT) is received at the microphones of the hearing aid user T.sub.2 ms later. Hence, the wirelessly transmitted signal (WTS) is delayed by T=T.sub.1−T.sub.2 ms compared to the acoustic signal (ATS) received at the hearing aid user (U). There may be differences between the times of arrival of the acoustic signals (equal to the interaural time difference, ITD) (and theoretically also between the times of arrival of the wirelessly transmitted signal) at the two hearing aids (HD1, HD2).

[0137] In practice, the time-difference-of-arrival (TDOA) T may be estimated by a similarity measurement between the relevant signals, e.g. by correlating the acoustic signal and the wirelessly received signal of a given hearing aid to determine the time difference T. Alternatively or additionally, the time-difference-of-arrival T may be estimated by other means, e.g. via ultra wide band (UWB) technology.
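
A minimal sketch of such a correlation-based TDOA estimate is given below, assuming SciPy is available; the function name and the search-window length are illustrative assumptions. In line with claim 11, the returned T could be compared against a minimum value before the signal predictor is activated.

import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_tdoa(acoustic, wireless, fs, max_lag_ms=50.0):
    # Cross-correlate the two versions of the same sound and pick the
    # lag with the strongest correlation; T > 0 (in samples) means the
    # wirelessly received signal arrives later than the acoustic one.
    c = correlate(acoustic, wireless, mode="full")
    lags = correlation_lags(len(acoustic), len(wireless), mode="full")
    max_lag = int(fs * max_lag_ms / 1000.0)
    mask = np.abs(lags) <= max_lag               # restrict to plausible delays
    return int(lags[mask][np.argmax(np.abs(c[mask]))])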

[0138] In a binaural hearing aid setup comprising left and right hearing instruments (cf. e.g. HD1, HD2 in FIG. 1A), the TDOA may preferably be estimated jointly, e.g. as the smallest of the TDOAs of the two hearing instruments. Alternatively, the TDOA may be estimated separately for each instrument.

[0139] If T is too large, e.g. larger than 10 ms, the wirelessly received signal cannot be used for real-time presentation to the user. In that case, we propose to use signal samples that are available in the hearing aid system, s(n−T−K+1), . . . , s(n−T), to predict future samples s(n) (relative to the signal available in the hearing aid system (HD1, HD2)), where K is the number of past samples of the wirelessly received signal used for the prediction (s(n−T) being the most recent wirelessly received sample). The prediction may be performed in the time domain or in other domains, e.g. the (time-) frequency domain.

[0140] FIG. 1C schematically illustrates a time-frequency domain representation of the waveform illustrated in FIG. 1B. The time samples of FIG. 1B are transformed into the frequency domain as a number Q of (e.g. complex) values of the time domain-signal s(n), as e.g. provided by a Fourier transform algorithm (such as Short-Time Fourier Transform, STFT, or the like). In FIG. 1C the same time index n is used for time samples and time frames. This is done for illustrational simplicity. In practice, a number of samples, e.g. 64, are included in a time frame, and consecutive time frames may overlap. In each frequency band, or as illustrated in FIG. 1C, in the lower frequency bands, e.g. below a threshold frequency f.sub.th, future ‘samples’ s.sub.q(n) are predicted based on K past samples (or frames) s.sub.q(n−T−K+1), . . . , s.sub.q(n−T), for the q'th frequency band FB.sub.q (i.e. for the frequency bands FB.sub.1, . . . FB.sub.qth below the threshold frequency, f.sub.th). The predicted values are indicated in FIG. 1C by the light dotted shading of time frequency-bins at time index n. The values whereon the predicted values are based are indicated in FIG. 1C by the dark dotted shading of time frequency-bins at time indices n−T−(K−1), . . . , n−T. In case the prediction is limited to the low-frequency bands (FB.sub.1, . . . FB.sub.qth) below the threshold frequency, the frequency bands (FB.sub.qth+1, . . . , FB.sub.Q) above the threshold frequency (f.sub.th) may be represented by one of the noisy microphone signals, or by a beamformed signal generated as a combination of the two or more microphone signals (as indicated by the cross hatched time frequency-bins at time index n).

[0141] Different use cases of the concept of the present disclosure may be envisioned, e.g. the following situations:

1) The wireless microphone is an external wireless microphone—e.g. a microphone unit clipped on to a target speaker, a table microphone, a microphone in a smart-phone, etc. (cf. e.g. M.sub.ex in FIG. 1A).
2) The “wireless microphone” is in fact the opposite hearing device (HDx, x=1 or 2 as the case may be), and the target signal is an external sound source: the sound signal is picked up by the left hearing device (HD2 for example) and sent to the right hearing device (HD1 for example) using signal prediction to reduce latency, so that (parts of) the microphone signal from the left (HD2) may replace, or be combined with, the right (HD1) microphone signals (see below for a more detailed description).
3) The “wireless microphone” is not a microphone. We consider the situation where we would like to export computations to an external (processing) device, e.g. a smart-phone: A sound signal is picked up by the hearing device user's microphones, sent from the hearing device to an external device for computations (potentially using signal prediction to reduce latency), and sent back from the external device to the hearing device (potentially using signal prediction to reduce latency).

[0142] Prediction of future samples s(n) based on past samples s(n−T−K+1), . . . , s(n−T) is illustrated in FIG. 1C, 1D. FIG. 1C schematically shows a time-frequency representation of the waveform of FIG. 1B, and FIG. 1D schematically shows a time-domain representation of the waveform of FIG. 1B. As illustrated in FIG. 1D, the predicted signal z(n) is a function ‘f’ of a number K of past samples of the wirelessly received signal s (from t.sub.now=n−T and back): s(n−T−K+1), . . . , s(n−T). The time t.sub.now=n−T is the time of arrival of the wireless signal at the hearing device and indicates a delay T relative to the time of arrival of the (corresponding) acoustically propagated signal y. Hence, at t.sub.now=n−T, the hearing device has access to samples of s at t=t.sub.now and any number of past sample values at t.sub.now−1, t.sub.now−2, etc. of s.

[0143] Prediction of future samples s(n) based on past samples s(n−T−K+1), . . . , s(n−T) is a well-known problem with well-known existing solutions. For example, prediction of future samples s(n) may be based on linear prediction, see e.g. [1], where an estimate z(n) of s(n) is formed as a linear combination of past samples, i.e.,


z(n) = \sum_{k=0}^{P-1} a_k \, s(n-T-k),  (1)

where a.sub.k, k=0, . . . , P−1 are time-varying, signal-dependent coefficients derived from past samples s(n−T−K+1), . . . , s(n−T) [1], and where P denotes the order of the linear predictor.
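
A minimal sketch of this linear predictor is shown below: the coefficients a.sub.k are fitted here by least squares over the available past samples (a Levinson-Durbin solution of the normal equations would be the classical alternative); the function name, the default order P and the fitting procedure are assumptions for illustration.

import numpy as np

def lpc_gap_predict(s_past, T, P=16):
    # s_past: available past samples, most recent (= s(n-T)) last;
    # requires len(s_past) > T + P - 1.
    N = len(s_past)
    n0 = T + P - 1                       # first target index with a full regressor
    # Regression: target s(m) against s(m-T-k), k = 0..P-1
    X = np.stack([s_past[n0 - T - k:N - T - k] for k in range(P)], axis=1)
    y = s_past[n0:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    # z(n) = sum_k a_k * s(n-T-k), cf. eq. (1)
    return float(a @ s_past[::-1][:P])   # s(n-T), s(n-T-1), ..., s(n-T-P+1)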

[0144] Many other ways of predicting s(n) exist. For example, an estimate z(n) of s(n) could be computed using non-linear methods, such as deep neural networks (DNNs), i.e.,


z(n) = G\big(s(n-T-K+1), \ldots, s(n-T); \Theta, T\big),  (2)

where Θ denotes the set of parameters of the DNN and G(., Θ, T) represents the network. In this situation, the network G(., Θ, T) would be trained off-line, before deployment, to predict signal samples separated by T samples, using a training set of clean speech signals, cf. e.g. chapter 5 in [2].

[0145] More generally, an estimate z(n) of s(n) could be computed using a DNN of the form,


z(n) = G\big(s(n-T-K+1), \ldots, s(n-T), x(n); \Theta\big),  (3)

where x(n) represents a microphone signal captured at the hearing aid (i.e., a noisy version of what the network tries to predict). In this situation, we removed the network dependency on T, because it can be estimated internally in the DNN by comparing the wirelessly received samples s(n−T−K+1), . . . , s(n−T) with the local microphone signal x(n), for example by correlating the two sequences. This configuration, which has access to an up-to-date, but potentially very noisy signal x(n), is particularly well-suited for prediction of transients/speech onsets, which may otherwise be challenging.
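
As a hedged sketch of what such a network could look like, the PyTorch module below maps K past wireless samples plus a window of the local noisy microphone signal to an estimate z(n), as in eq. (3); dropping the microphone branch gives the eq. (2) form. The architecture, layer sizes and names are illustrative assumptions, not the design of the present disclosure; the network would be trained off-line, e.g. with a mean-squared-error loss on clean speech.

import torch
import torch.nn as nn

class GapPredictor(nn.Module):
    def __init__(self, K=256, L=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(K + L, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),        # scalar estimate z(n)
        )

    def forward(self, s_past, x_local):
        # s_past:  (batch, K) wireless samples s(n-T-K+1), ..., s(n-T)
        # x_local: (batch, L) local noisy microphone samples around time n
        return self.net(torch.cat([s_past, x_local], dim=-1)).squeeze(-1)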

[0146] In yet another generalized version of the predictor,


z(n) = G\big(s(n-T-K+1), \ldots, s(n-T), x_1(n), \ldots, x_M(n); \Theta\big),  (4)

the estimate z(n) is a function of the (out-dated, but potentially relatively noise-free) received wireless signal and multiple local microphone signals x.sub.1(n), . . . , x.sub.M(n) (which are up-to-date, but potentially very noisy). This latter configuration has as a special case the situation where z(n) is computed (partly) as a function of a beamformed signal y(n), computed at the hearing aid using the local microphone signals,


y(n) = H\big(x_1(n), \ldots, x_M(n)\big),  (5)

where H(.) represents a beamforming operation. Yet other prediction methods exist.

[0147] Obviously, prediction is not limited to time domain signals s(n) as described above. For example, (linear) prediction could also take place in the time-frequency domain or in other domains (e.g. cosine, wavelet, Laplace, etc.).

[0148] In general, one can write the predicted signal as


z(n) = s(n) + e(n),  (6)

where e(n) is the estimation error. If e(n) is considered as a noise term, the prediction process may be seen as simply ‘trading’ a delayed (outdated), essentially noise-free signal s(n−T) for an up-to-date, but generally noisy signal z(n)=s(n)+e(n). The more accurate the prediction, the smaller the noise (prediction error). The signal-to-noise ratio (SNR) ξ(n) in the predicted signal may be estimated from the available past samples, for example as


\hat{\xi}(n) = \frac{\sum_{n'=n-T-K+1}^{n-T} s^2(n')}{\sum_{n'=n-T-K+1}^{n-T} e^2(n')},  (7)

where the sums are taken over the available past samples. Alternatively, the SNR may be computed offline as a long-term average SNR to be expected for a particular value of T.

[0149] The SNR may also be estimated in the time-frequency domain (m,q),


\hat{\xi}(m,q) = \frac{\sum_{m'=m-T'-K'+1}^{m-T'} |s(m',q)|^2}{\sum_{m'=m-T'-K'+1}^{m-T'} |e(m',q)|^2},  (8)

where s(m, q) and e(m, q) denote time-frequency representations (for example, short-time Fourier transforms) of the signals s(n) and e(n), and T′ and K′ are time-frequency analogues of T and K, where m is a time index (e.g. a time-frame index) and q is a frequency index (e.g. a frequency band index).
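
Both estimates reduce to the same energy-ratio computation, sketched below (names are assumptions); passing 1-D arrays of past samples gives the time-domain estimate of eq. (7), while 2-D (frames x bands) arrays give one SNR per frequency band as in eq. (8).

import numpy as np

def predicted_snr(s_past, e_past):
    # Ratio of signal energy to prediction-error energy over the
    # available past samples (summed along the time axis).
    num = np.sum(np.abs(s_past) ** 2, axis=0)
    den = np.sum(np.abs(e_past) ** 2, axis=0) + 1e-12  # guard against /0
    return num / den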

[0150] The predicted signal z(n) may be used in several ways in the hearing device.

[0151] For example, if the SNR ξ(n) is sufficiently high, one could simply substitute the noisy signal x(n) picked up at a microphone of the hearing device with the predicted signal z(n), for example in signal regions where the SNR ξ(n) in the predicted signal would be higher than the SNR in the microphone signal x(n) (as estimated by an SNR estimation algorithm on board the hearing device).

[0152] Alternatively, rather than substituting signal samples z(n), one could perform the substitution in frequency bands. For example, one could decompose and substitute z(n) in frequency channels (e.g. low frequencies), for which it is known that the predicted signal is generally of better quality than the hearing aid microphone signal. More generally, substitution could even be performed in the time-frequency domain according to an estimate of the SNR in time-frequency tiles, cf. eq. (8) above.
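
A per-tile version of this substitution might look as follows (a sketch; the hard switch could be replaced by a soft, SNR-weighted mix, and all names are assumptions):

import numpy as np

def substitute_tiles(Y, Z, snr_y, snr_z):
    # Y, Z: complex STFTs (frames x bands) of the microphone and the
    # predicted signal; snr_y, snr_z: matching per-tile SNR estimates.
    return np.where(snr_z > snr_y, Z, Y)   # take the better tile per (m, q)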

[0153] Alternatively, the signal z(n) may be combined with one or more of the microphone signals of the hearing device in various beamforming schemes in order to produce a final noise-reduced signal. In this situation, the signal z(n) is simply considered yet another microphone signal with a noisy realization of the target signal.
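
For instance, treating z as one more channel, classic per-band MVDR weights could be computed as sketched below; the covariance and steering-vector inputs, and the function name, are assumptions for illustration.

import numpy as np

def mvdr_weights(R_vv, d):
    # Textbook MVDR: w = R_vv^{-1} d / (d^H R_vv^{-1} d), for one band.
    # The channel vector stacks the local microphone(s) and the
    # predicted signal z as an extra 'microphone'.
    Rinv_d = np.linalg.solve(R_vv, d)      # R_vv: (C, C) noise covariance
    return Rinv_d / (np.conj(d) @ Rinv_d)  # d: (C,) steering vector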

[0154] The description above assumed the predictor to be part of the receiver (i.e., the hearing system, e.g. a hearing device). However, it is also possible to do the prediction in the wireless microphone—assuming it has processing capabilities and can be informed of the time-difference-of-arrival T. In other words, the predicted signal z(n) is formed in the wireless microphone and transmitted to the hearing system (e.g. hearing device(s)), potentially together with side information such as the estimated SNR {circumflex over (ξ)}(n).

[0155] The description above has assumed that the wireless microphone is a single microphone that captures the essentially noise-free signal s(n). However, it could also consist of a microphone array (i.e., more than one microphone). In this case, a beamforming system could be implemented in the wireless device, and the output of the beamformer plays the role of the essentially noise-free signal s(n). Further, the external (sound capturing) device may e.g. be constituted by or comprise a table microphone array capable of extracting at least one noise-free signal.

[0156] FIG. 2A schematically illustrates a time variant analogue signal (Amplitude vs time) and its digitization in samples x(n), the samples being arranged in time frames, each comprising a number N.sub.s of samples. FIG. 2A shows an analogue electric signal (solid graph), e.g. representing an acoustic input signal, e.g. from a microphone, which is converted to a digital audio signal x(n) in an analogue-to-digital (AD) conversion process, where the analogue signal is sampled with a predefined sampling rate f.sub.s, f.sub.s being e.g. in the range from 8 kHz to 48 kHz (adapted to the particular needs of the application) to provide digital samples x(n) at discrete points in time n, as indicated by the vertical lines extending from the time axis with solid dots at their endpoints ‘coinciding’ with the graph, each representing the digital sample value at the corresponding distinct point in time n. Each (audio) sample x(n) represents the value of the acoustic signal at time n by a predefined number N.sub.b of (quantization) bits, N.sub.b being e.g. in the range from 1 to 48 bits, e.g. 24 bits. Each audio sample is hence quantized using N.sub.b bits (resulting in 2.sup.Nb different possible values of the audio sample).

[0157] In an analogue to digital (AD) process, a digital sample x(n) has a length in time of 1/f.sub.s, e.g. 50 μs for f.sub.s=20 kHz. A number of (audio) samples N.sub.s are e.g. arranged in a time frame, as schematically illustrated in the lower part of FIG. 2A, where the individual (here uniformly spaced) samples are grouped in time frames x(m) (comprising individual sample elements #1, 2, . . . , N.sub.s), where m is the frame number. As also illustrated in the lower part of FIG. 2A, the time frames may be arranged consecutively to be non-overlapping (time frames 1, 2, . . . , m, . . . , N.sub.M), where m is a time frame index. Alternatively, the time frames may be overlapping (e.g. by 50% or more, as illustrated in the lower part of FIG. 2A). In an embodiment, a time frame comprises 64 audio data samples. Other frame lengths may be used depending on the practical application. A time frame may e.g. have a duration of 3.2 ms (e.g. corresponding to 64 samples at a sampling rate of 20 kHz).
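
The framing described above can be sketched in a few lines (parameter names are assumptions; with n_s=64 and 50% overlap at 20 kHz, each frame spans 3.2 ms and a new frame starts every 1.6 ms):

import numpy as np

def frame_signal(x, n_s=64, overlap=0.5):
    # Arrange samples x(n) in (possibly overlapping) frames of N_s samples;
    # assumes len(x) >= n_s.
    hop = max(1, int(n_s * (1.0 - overlap)))
    n_frames = 1 + (len(x) - n_s) // hop
    return np.stack([x[m * hop:m * hop + n_s] for m in range(n_frames)])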

[0158] FIG. 2B schematically illustrates a time-frequency map (or frequency sub-band) representation of the time variant electric signal x(n) of FIG. 2A in relation to a prediction algorithm according to the present disclosure. The time-frequency representation X.sub.m(q) (q=1, . . . , Q, where q is a frequency index) comprises an array or map of corresponding complex or real values of the signal in a particular time and frequency range. The time-frequency representation may e.g. be a result of a Fourier transformation converting the time variant input signal x(n) to a (time variant) signal X(m,q) in the time-frequency domain. In an embodiment, the Fourier transformation comprises a discrete Fourier transform algorithm (DFT), e.g. a short time Fourier transformation (STFT) algorithm. The frequency range considered by a typical hearing device (e.g. a hearing aid) from a minimum frequency f.sub.min to a maximum frequency f.sub.max comprises a part of the typical human audible frequency range from 20 Hz to 20 kHz, e.g. a part of the range from 20 Hz to 12 kHz. In FIG. 2B, the time-frequency representation X(m,q) of signal x(n) comprises complex values of magnitude and/or phase of the signal in a number of DFT-bins (or tiles) defined by indices (m,q), where q=1, . . . , Q represents a number Q of frequency values (cf. vertical q-axis in FIG. 2B) and m=1, . . . , N.sub.M represents a number N.sub.M of time frames (cf. horizontal m-axis in FIG. 2B). A time frame is defined by a specific time index m and the corresponding Q DFT-bins (cf. indication of Time frame m in FIG. 2B). A time frame m (or X.sub.m) represents a frequency spectrum of signal x at time m. A DFT-bin or tile (m,q) comprising a real or complex value X(m,q) of the signal in question is illustrated in FIG. 2B by hatching of the corresponding field in the time-frequency map (cf. DFT-bin=time frequency unit (m,q): X(m,q)=|X|·e^{iφ} in FIG. 2B), where |X| represents a magnitude and φ represents a phase of the signal in that time-frequency unit. Each value of the frequency index q corresponds to a frequency range Δf.sub.q, as indicated in FIG. 2B by the vertical frequency axis f. Each value of the time index m represents a time frame. The time T.sub.F spanned by consecutive time indices depends on the length of a time frame and the degree of overlap between neighbouring time frames (cf. horizontal time-axis in FIG. 2B).

[0159] A time frame of an electric signal may e.g. comprise a number N.sub.s of consecutive samples, e.g. 64 (written as vector x.sub.m), of the digitized electric signal representing sound, m being a time index, cf. e.g. FIG. 2A. A time frame of an electric signal may, however, alternatively be defined to comprise a magnitude spectrum (written as vector X.sub.m) of the electric signal at a given point in time (as e.g. provided by a Fourier transformation algorithm, e.g. an STFT (Short Time Fourier Transform) algorithm, cf. e.g. the schematic illustration of a TF-map in FIG. 2B). The time frame x.sub.m representing a number of time samples, and the time frame X.sub.m representing a magnitude spectrum (of the same time samples) of the electric signal, are tied together by Fourier transformation, as e.g. given by the expression X.sub.m=F·x.sub.m, where F is a matrix representing the Fourier transform.

[0160] The electric input signal(s) representing sound may be provided as a number of frequency sub-band signals. The frequency sub-band signals may e.g. be provided by an analysis filter bank, e.g. based on a number of band-pass filters, or on a Fourier transform algorithm as indicated above (e.g. by consecutively extracting respective magnitude spectra from the Fourier transformed data).

[0161] As indicated in FIG. 2B, a prediction algorithm according to the present disclosure may be provided on a frequency sub-band level (instead of on the full-band (time-domain) signal as described above, cf. e.g. FIG. 1D). Thereby a down-sampling of the update rate of the respective (frequency sub-band) prediction algorithms is provided (e.g. by a factor of 20 or more). The bold ‘stair-like’ polygon in FIG. 2B enclosing a number of historic time-frequency units (DFT-bins) of the (wirelessly received) input signal (from time ‘now’ (index m, cf. time ‘n−T’ in FIG. 1C, 1D) and K.sub.q time units backwards in time) indicates the part of the known input data that—for a given frequency band q—are used to predict future values z of the (wirelessly received) signal s.sub.WLR at a prediction time T later (index m+T), cf. bold rectangle with dotted filling at time unit m+T. The prediction algorithm may be executed in all frequency bands q=1, . . . , Q, and may e.g. use the same number K.sub.q of historic values to predict the future value (or use different values for some frequency bands). But the prediction algorithm may be executed only in selected frequency bands, e.g. frequency bands having the most importance for speech intelligibility, e.g. frequency bands below a threshold frequency (cf. e.g. f.sub.th in FIG. 1C), or, as indicated in the schematic illustration of FIG. 2B, above a low-frequency threshold frequency f.sub.th,low and below a high-frequency threshold frequency f.sub.th,high. The high-frequency threshold frequency f.sub.th,high may e.g. be 4 kHz (typically prediction is difficult at higher frequencies), or 3 kHz, or 2 kHz or smaller, e.g. 1 kHz. This is due in part to voice at frequencies above the high-frequency threshold originating mainly from turbulent air streams in the mouth and throat region, which by nature are less predictable than voice at frequencies below the low-frequency threshold, which is mainly created by vibration of the vocal cords. The low-frequency threshold frequency f.sub.th,low may e.g. be larger than or equal to 100 Hz (typically human hearing perception is low below 100 Hz), or larger than or equal to 200 Hz, or larger than or equal to 500 Hz. The parameter K.sub.q indicating the number of past time-frequency units that are used to predict a future time-frequency unit may be different per band, e.g. decreasing with increasing frequency (as illustrated in FIG. 2B), e.g. to mimic the increasing time period of the fundamental frequency with decreasing frequency. Likewise, the weighting factor a.sub.i applied to each previous value (time-frequency unit) of a given frequency sub-band signal may be frequency dependent, a.sub.i=a.sub.i(q)=a.sub.i,q. Even the prediction time T (e.g. due to different values of the parameter K.sub.q) may be frequency dependent (T=T(q)=T.sub.q). The individual prediction algorithms may be executed according to the present disclosure as discussed above for the full-band signal. Instead of operating on uniform frequency bands (the band width Δf.sub.q being independent of frequency index q) as shown in FIG. 2B, the prediction algorithms may operate on non-uniform frequency bands, e.g. having increasing width with increasing frequency (reflecting the logarithmic nature of the human auditory system).
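
The per-band operation described above is sketched below with a deliberately crude per-band rule (mean magnitude, last phase carried forward) standing in for a real predictor such as eq. (1); the band-dependent history length K_q and the band limits correspond to FIG. 2B, and all names are assumptions.

import numpy as np

def predict_subbands(S, K_q, q_lo, q_hi):
    # S: STFT of the wirelessly received signal, shape (frames, Q),
    # newest frame last; band q uses its own history length K_q[q];
    # only bands q_lo..q_hi-1 (between f_th,low and f_th,high) are
    # predicted; assumes frames >= max(K_q).
    n_frames, Q = S.shape
    Z = np.zeros(Q, dtype=complex)
    for q in range(q_lo, q_hi):
        hist = S[n_frames - K_q[q]:, q]        # last K_q(q) frames of band q
        Z[q] = np.mean(np.abs(hist)) * np.exp(1j * np.angle(hist[-1]))
    return Z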

[0162] FIG. 3 shows an embodiment of a hearing system comprising a hearing device (HD) according to the present disclosure in communication with a sound capturing device (M.sub.ex−Tx). The hearing device (HD), e.g. a hearing aid, comprises an input unit (IU) comprising a multitude M (M≥2) of input units (IU.sub.1, . . . , IU.sub.M) comprising respective input transducers (IT.sub.1, . . . , IT.sub.M) (e.g. microphones) for converting sound (‘Acoustic input’, y.sub.1(t), . . . , y.sub.M(t)) in the environment of the hearing device to a corresponding multitude of acoustically received electric input signals (y′.sub.1(n), . . . , y′.sub.M(n)) representing said sound as a stream or streams of digital samples. The input units (e.g. the input transducers) may comprise appropriate analogue to digital converters, to provide the acoustically received electric input signals as digital samples. The input units (IU.sub.1, . . . , IU.sub.M) further comprise respective analysis filter banks (AFB) for providing the acoustically received electric input signals in a time-frequency representation (m, q) as signal Y=(Y.sub.1(m,q), . . . , Y.sub.M(m, q)). The hearing device (HD) (here the input unit (IU)) further comprises an auxiliary input unit IU.sub.aux comprising a wireless receiver (Rx) for receiving an audio signal (‘Audio input’, s(t)) from a wireless transmitter (Tx) of a sound capturing device (M.sub.ex) for picking up (a target) sound (S) in the environment and providing a wirelessly received electric input signal s.sub.WLR(n) representing said sound as a stream of digital samples (s.sub.WLR(n)). The auxiliary input unit IU.sub.aux may comprise an appropriate analogue to digital converter, to provide the wirelessly received electric input signal as digital samples (s.sub.WLR(n)). The auxiliary input unit IU.sub.aux further comprises an analysis filter bank (AFB) for providing the wirelessly received electric input signal (s.sub.WLR(n)) in a time-frequency representation (m, q) as signal S.sub.WLR(m,q). The hearing aid further comprises a beamformer (BF) configured to provide a beamformed signal Y.sub.BF(m, q) based on the multitude of acoustically received electric input signals (Y.sub.1(m,q), . . . , Y.sub.M(m,q)). The hearing aid further comprises a processor (PRO) configured to receive the beamformed signal Y.sub.BF(m,q) and the wirelessly received electric input signal S.sub.WLR(m,q). The processor (PRO) comprises a signal predictor (PRED) configured to estimate future samples (or time-frequency units) of the wirelessly received electric input signal s.sub.WLR(n) (or S.sub.WLR(m,q)) in dependence of a multitude of past samples (or time-frequency units) of the signal, thereby providing a predicted signal z(n) (or Z(m,q)). The signal predictor (PRED) may be configured to run an estimation algorithm as outlined above (and in the prior art, cf. e.g. EP3681175A1). The signal predictor (PRED) may be configured to estimate a time difference of arrival between an acoustically received electric input signal (e.g. y′.sub.i(n), i=1, . . . , M, or the beamformed signal Y.sub.BF(m,q)) and the wirelessly received electric input signal (s.sub.WLR(n′), or S.sub.WLR(m,q)), cf. e.g. delay estimator (DEST) in FIG. 4, e.g. by finding a time lag that optimizes a correlation function between the two signals. The time difference of arrival (cf. T in FIG. 4) may be fed to a prediction algorithm to determine the prediction time of the algorithm. The time difference of arrival (cf. T in FIG. 4) may be determined based on the time-frequency domain signals (Y.sub.BF(m, q) and S.sub.WLR(m, q)), cf. dashed input Y.sub.BF(m, q) to the signal predictor (PRED). The hearing device further comprises a selection controller (SEL-MIX-CTR) configured to include the predicted signal based on the wirelessly received electric input signal in the (resulting) processed signal (Ŝ(m, q)) in dependence of a control signal, e.g. a sound quality measure, e.g. an SNR-estimate (cf. e.g. FIG. 4). The processed signal (Ŝ(m, q)) may comprise or be constituted by the predicted signal (Z(m,q)). The processed signal (Ŝ(m, q)) may be a mixture of the acoustically received, beamformed signal (Y.sub.BF(m,q)) and the predicted signal (Z(m,q)) based on the wirelessly received electric input signal. The embodiment of a hearing device of FIG. 3 further comprises an output unit (OU) comprising a synthesis filter bank (FBS) and an output transducer for presenting output stimuli perceivable as sound to the user in dependence of the processed signal from said processor (Ŝ(m, q)), or a further processed version thereof (cf. e.g. FIG. 4). The output unit may comprise a digital to analogue converter as the case may be. The output transducer may comprise a loudspeaker of an air conduction type hearing device. The output transducer may comprise a vibrator of a bone conduction type hearing device. The output transducer may comprise a multi-electrode array of a cochlear implant type hearing device. In the latter case, the synthesis filter bank can be dispensed with.

[0163] FIG. 4 shows an embodiment of a hearing aid according to the present disclosure. The hearing aid comprises an input transducer (here a microphone (M)) for converting sound (‘Acoustic input’ y(t), t representing time) in the environment of the hearing device to an acoustically received electric input signal (y(n)=s(n)+v(n)) representing said sound as a stream or streams of digital samples, n being a time index. The input transducer is assumed to provide the electric input signal as a stream of digital samples y(n), e.g. by using a MEMS microphone or by including an analogue to digital converter as appropriate. The acoustic input y(t) (as well as the electric input signal y(n) based thereon) may comprise a target sound s and noise v from the environment (or from the user, or from the hearing aid (e.g. acoustic feedback)). The hearing aid further comprises a wireless receiver (Rx) for receiving an audio signal from a wireless transmitter of a sound capturing device for picking up sound in said environment. The wireless receiver (Rx) may e.g. comprise an antenna and corresponding electronic circuitry for receiving and extracting a payload (audio) signal and providing a wirelessly received electric input signal s.sub.WLR(n′) representing said sound as a stream of digital samples, n′ being a time index. The hearing aid comprises respective analysis filter banks (or Fourier transform algorithms) for providing each of the digitized electric input signals y(n) and s.sub.WLR(n′) in a frequency sub-band or time-frequency representation Y(m,q) and S.sub.WLR(m′,q), respectively, where m and m′ are time indices and q is a frequency index. The hearing aid further comprises a signal predictor (PRED) configured to estimate future samples (e.g. as values of time-frequency units (m,q) in one or more frequency bands, q′) of the wirelessly received electric input signal S.sub.WLR(m,q) in dependence of a multitude of past samples of said signal S.sub.WLR(m′,q), thereby providing a predicted signal Z(m,q). The hearing aid further comprises a delay estimator (DEST) configured to estimate a time-difference-of-arrival (T) of sound from a given sound source in the environment at the hearing aid (e.g. at the inputs of the delay estimator) between the acoustically received electric input signal y(n) and the wirelessly received electric input signal s.sub.WLR(n′). The time-difference-of-arrival (T) provided as an output of the delay estimator is fed to the signal predictor (PRED) to define a prediction time period of the signal predictor. The time-difference-of-arrival (T) may e.g. be determined by correlating the acoustically received signal with the wirelessly received signal. The hearing aid further comprises respective SNR estimators (SNRestA, SNRestP) configured to provide an estimate of the signal to noise ratio of the acoustically received electric input signal (Y(m,q)) and the predicted signal (Z(m,q)), respectively. SNR-estimation may in general be provided in a number of ways, e.g. involving a voice activity detector, and e.g. estimating a noise level <N(m.sub.0, q)> during speech pauses and providing an SNR estimate as Y(m,q)/<N(m.sub.0,q)>, where m.sub.0 is the last time index where noise was estimated (last speech pause). More sophisticated SNR estimation schemes, or other signal quality estimates, are available, see e.g. US20190378531A1.
The SNR estimators (SNRestA, SNRestP) applied to the acoustically received and the predicted signals, respectively, may be based on different principles. The SNR estimate (SNRestP) of the predicted signal (Z) may be based on the previous values of the wirelessly received electric input signal (S.sub.WLR) (cf. dashed input to the SNR estimator (SNRestP)) and a prediction error signal e (Z=S+e), cf. the outline above in connection with eq. (7) and (8) (for respective time domain and time-frequency domain implementations). The hearing aid further comprises a selection controller (SEL-MIX-CTR) configured to include said estimated future samples of said wirelessly received electric input signal in said processed signal in dependence of a sound quality measure, here in dependence of the SNR-estimates SNR.sub.Y(m,q) and SNR.sub.Z(m,q) of the acoustically received and predicted signals Y(m, q) and Z(m, q), respectively. The estimated future samples of said wirelessly received electric input signal may be included in time regions where the predicted signal fulfils a sound quality criterion (e.g. in that the sound quality measure, here the SNR-estimate SNR.sub.Z(m,q), is larger than a first threshold value SNR.sub.TH1(q), or larger than the SNR estimate of the acoustically received signal Y(m,q)). For example, if the SNR estimate is sufficiently high, e.g. larger than a second threshold value SNR.sub.TH2(q), the selection controller may be configured to substitute the (noisy) acoustically received electric input signal y(n) picked up at a microphone of the hearing aid with the predicted signal z(n), for example in time regions where the estimated SNR in the predicted signal is higher than the estimated SNR in the microphone signal y(n) (as estimated by an SNR estimation algorithm on board the hearing aid), n being a time (sample) index. In the time-frequency domain, such a scheme may equivalently be applied on a frequency sub-band level (q) or even on a time-frequency unit level (i.e. individually for each TF-unit (m, q)). The selection controller (SEL-MIX-CTR) thus receives as audio inputs the signals Y(m, q) and Z(m, q) and provides as an output an (enhanced) audio signal Ŝ(m,q) in dependence of the control signals SNR.sub.Y(m,q) and SNR.sub.Z(m,q). The hearing aid further comprises a signal processing unit (HAG) configured to provide a frequency dependent gain and/or a level dependent compression, e.g. to compensate for a hearing impairment of a user. The thus determined hearing aid gain may be applied to the (enhanced) audio signal Ŝ(m,q), providing the (user adapted) processed signal S.sub.out(m, q). The hearing aid further comprises a synthesis filter bank (FBS) for converting a signal in the time-frequency domain (S.sub.out(m,q)) to a signal in the time domain s.sub.out(n). The hearing aid further comprises an output transducer (OT) for presenting output stimuli perceivable as sound to the user in dependence of the processed signal s.sub.out(n) from the processor, or a further processed version thereof. The output transducer may e.g. comprise a loudspeaker, or a vibrator, or an implanted electrode array. Some of the functional components of the hearing aid of FIG. 4 may be included in a (e.g. digital signal) processor. The processor is configured to receive a) the at least one acoustically received electric input signal or signals, or a processed version thereof, and b) the wirelessly received electric input signal, and to provide a processed signal in dependence thereof.
The (digital signal) processor may comprise the following functional blocks of the embodiment of FIG. 4: a) the analysis filter banks (FBA), b) the delay estimator (DEST), c) the signal predictor (PRED), d) the SNR estimators (SNRest), e) the selection controller (SEL-MIX-CTR), f) the signal processing unit (HAG), and g) the synthesis filter bank (FBS). Other functional blocks, e.g. related to feedback control, or further analysis and control blocks, e.g. related to own voice estimation/voice control, etc., may as well be included in the (digital signal) processor.

[0164] FIG. 5 shows an embodiment of a hearing aid comprising a signal predictor and a beamformer according to the present disclosure. The embodiment of FIG. 5 is similar to the embodiment of FIG. 4, except that no SNR estimators (SNRestA, SNRestP) to control the mixture of the acoustically received electric input signal (Y) and the predicted signal (Z) are indicated. Further, instead of the selection controller (SEL-MIX-CTR) of FIG. 4, a beamformer filter (BF) is included in the embodiment of FIG. 5. Or in other words, the selection controller (SEL-MIX-CTR) may be embodied in the beamformer filter (BF) providing the (enhanced) audio signal Ŝ(m, q). In this embodiment, the predicted signal (z(n) or Z(m, q)) is combined with one or more of the microphone signals of the hearing aid in various beamforming schemes in order to produce a final noise-reduced signal (here only one microphone signal, Y(m, q), is shown, but there may in other embodiments be a multitude M of electric input signals, cf. e.g. FIG. 3). In this situation, the predicted signal is simply considered yet another microphone signal with a noisy realization of the target signal.

[0165] FIG. 6A shows an embodiment of a hearing device (HD) comprising an earpiece (EP) and a body-worn audio processing device (APD) in communication with each other. The (possibly) body-worn processing device (APD) comprises a computing device (CPD.sub.apd, e.g. an audio signal processor or similar) comprising a signal predictor (PRED) according to the present disclosure. The hearing device, e.g. a hearing aid, comprises at least one earpiece (EP) configured to be worn at or in an ear of a user and a separate audio processing device (APD) configured to be worn or carried by the user (or at least located sufficiently close to the user to stay in communication with the earpiece via the wireless link implemented by the transceivers of the respective devices).

[0166] The at least one earpiece (EP) comprises an input transducer (here a microphone (M)) for converting sound in the environment of the hearing device to an acoustically received electric input signal y(n) representing the sound. The earpiece further comprises a wireless transmitter (Tx) for transmitting the acoustically received electric input signal y(n), or a part (e.g. a filtered part, e.g. a lowpass filtered part) thereof, to the audio processing device (APD). The earpiece (EP) further comprises a wireless receiver for receiving a predicted signal from said audio processing device, at least in a normal mode of operation of the hearing device. The wireless transmitter and receiver may be provided as antenna and transceiver circuitry for establishing an audio communication link (WL) according to a standardized or proprietary (short range) protocol. The earpiece (EP) further comprises an output transducer (here a loudspeaker (SPK)) for converting a (final) processed signal s′.sub.out(n) to stimuli perceived by the user as sound. The processed signal (s′.sub.out(n)) may, at least in a normal mode of operation of the hearing device, be constituted by or comprise at least a part of the predicted signal (provided by the audio processing device (or by the earpiece as in FIG. 6B), see in the following).

[0167] The audio processing device (APD) comprises a wireless receiver (Rx) for receiving the acoustically received electric input signal y(n), or a part thereof, from the earpiece (EP), and is configured to provide a received signal y(n′) representative thereof. The audio processing device (APD) (e.g. the computing device (CPD.sub.apd)) further comprises a processor part (HAP) for applying a processing algorithm (e.g. including a neural network) to said received signal (y(n′)), or to a signal originating therefrom, e.g. a transformed version thereof (Y), and to provide a modified signal (Y′). The processor part (HAP) may e.g. be configured to compensate for a hearing impairment of the user (e.g. by applying a compressive amplification algorithm, e.g. providing a frequency and/or level dependent gain (or attenuation) to be applied to the input signal (y(n′) or Y)). The audio processing device (APD) (e.g. the computing device (CPD.sub.apd)) further comprises a signal predictor (PRED) for estimating future values of the modified signal (y′, Y′) in dependence of a multitude of past values of the signal, thereby providing a predicted signal (z, Z). The signal predictor (PRED) may comprise a prediction algorithm (working either in the time domain or in a transform domain, e.g. the time-frequency domain) configured to predict future values of an input signal based on past values of the input signal, and knowledge of a prediction time, e.g. a processing delay (cf. input T) between the first future value(s) and the latest past value(s) of the input signal (cf. e.g. FIG. 1C, 1D). The total processing delay (T) may be the sum of the delay (T.sub.link) incurred by the wireless link between the earpiece (EP) and the separate processing device (APD) and the processing delay (T.sub.apd) in the audio processing device (i.e. T=T.sub.link+T.sub.apd). The processing delay (T.sub.link) of the wireless link depends on the technology (communication protocol) used for establishing the link and may be known or estimated in advance (or during use). The processing delay (T.sub.apd) of the audio processing device depends on the processing blocks of the audio path through the device from the receiver to the transmitter and may likewise be known or estimated in advance (or during use). The audio processing device (APD) further comprises a transmitter (Tx) for transmitting said predicted signal (Z) or a processed version thereof (s.sub.out(n)) (termed the ‘first processed signal’) to the earpiece (EP).
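
The delay bookkeeping of this paragraph can be made concrete with a short sketch. The sample rate and delay values below are illustrative assumptions, not figures from the disclosure; only the relation T=T.sub.link+T.sub.apd is taken from the text.

```python
# Prediction horizon handed to PRED: T = T_link + T_apd, converted to samples.
fs = 16_000            # sample rate in Hz (assumed)
T_link = 5.0e-3        # one-way link delay in seconds (assumed)
T_apd = 3.0e-3         # APD audio-path processing delay in seconds (assumed)
T = T_link + T_apd     # total delay to be compensated by the predictor
horizon = round(T * fs)  # -> 128 samples at 16 kHz for a total delay of 8 ms
```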

[0168] The signal predictor (PRED) is configured to fully or partially compensate for a processing delay incurred by a) the transmission of the acoustically received electric input signal (y(n)) from the earpiece (EP) to the audio processing device (APD), b) the processing in the audio processing device (APD) (through its audio signal processing path from receiver (Rx) to transmitter (Tx)), and c) the transmission of the predicted signal (z(n)), or a processed version thereof, to said earpiece (EP) and its reception therein (as signal s′.sub.out(n)). This is achieved by providing an estimate T of the total processing delay (T=T.sub.link+T.sub.apd) as input to the prediction algorithm (PRED).
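
One conceivable realization of such a delay-compensating predictor is plain autoregressive extrapolation over the estimated horizon. The sketch below is an assumption standing in for the (unspecified, possibly neural-network-based) prediction algorithm of the disclosure; the model order and the default horizon (matching the 8 ms example above) are chosen arbitrarily.

```python
import numpy as np

def ar_predict(past, order=16, horizon=128):
    """Least-squares AR fit on `past`, then recursive horizon-step extrapolation."""
    past = np.asarray(past, dtype=float)
    # Design matrix: each row holds the `order` most recent samples, newest first.
    X = np.array([past[i:i + order][::-1] for i in range(len(past) - order)])
    y = past[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    buf = list(past[-order:])               # last `order` samples, oldest first
    out = []
    for _ in range(horizon):
        nxt = float(np.dot(a, buf[::-1]))   # one-step-ahead prediction
        out.append(nxt)
        buf = buf[1:] + [nxt]               # slide the context window forward
    return np.array(out)

# Usage: predict T seconds of "future" signal to mask the round-trip delay.
z_future = ar_predict(np.sin(np.arange(2_000) * 0.05), horizon=128)
```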

[0169] In the embodiment of FIG. 6A, the audio processing device (APD) (e.g. the computing device (CPD.sub.apd)) further comprises respective transform domain and inverse transform domain units (TRF, I-TRF) to convert a signal in the time domain (here the received signal y(n′) from the earpiece) to a transform domain (e.g. the time-frequency domain), cf. signal Y, and back again (here the predicted signal Z in the transform domain to z(n) in the time domain). In the embodiment of FIG. 6A, the signal predictor (PRED) is implemented in the transform domain. In the embodiment of FIG. 6C, the signal predictor (PRED) is implemented in the time domain. This can be chosen as a design feature according to the specific configuration (e.g. partitioning) of the device/system.

[0170] FIG. 6B shows an embodiment of a hearing aid (HD) comprising an earpiece (EP) and a (e.g. body-worn) processing device (APD) as shown in FIG. 6A, but where the earpiece further comprises a computing device (CPD.sub.ep) (e.g. a signal processing unit) allowing a signal from the microphone (M) (signal y(n)) and/or from the wireless receiver (Rx) (signal s′.sub.out(n)) to be processed in the earpiece. The computing device (CPD.sub.ep) provides a final processed signal (s″.sub.out(n) in FIG. 6B) that is fed to the output transducer (here loudspeaker (SPK)) for presentation to the user. In the embodiment of FIG. 6B, the signal predictor (PRED) is implemented in the time domain.

[0171] FIG. 6C shows an embodiment of a hearing aid comprising an earpiece and a body-worn processing device as shown in FIG. 6A, but where the signal predictor works in the time domain (in that the order of the inverse transform domain unit (I-TRF) and the signal predictor (PRED) has been reversed). A further difference is that the earpiece (EP) comprises a computing device (CPD.sub.ep) allowing the earpiece to process the acoustically received signal (y(n)) and/or the first processed signal received from the separate audio processing device (APD). The optional processing of the acoustically received signal (y(n)) (as indicated by the dashed input to the computing device (CPD.sub.ep)) may e.g. be of interest in a mode of operation where no connection to the audio processing device (APD) can be established (e.g. to provide the user with basic functions of the hearing device, such as hearing loss compensation).
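
A hedged sketch of that fallback behaviour is given below. The function name, the use of None to signal a dropped link, and the trivial broadband gain are all assumptions; the disclosure only speaks of "basic functions", e.g. hearing loss compensation.

```python
def earpiece_output(y, s_out=None, local_gain=2.0):
    """CPD_ep sketch: prefer the first processed signal from the APD; if the
    link is down (s_out is None), fall back to simple local amplification."""
    if s_out is not None:
        return s_out           # normal mode of operation: use APD output
    return local_gain * y      # fallback: basic hearing loss compensation
```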

[0172] Regarding the embodiments of FIG. 6A, 6B, 6C, it should be mentioned that the signals transmitted from the earpiece (EP) to the (external) audio processing device (APD), via the wireless link (WL), and/or from the audio processing device (APD) to the earpiece (EP), do not necessarily have to be ‘audio signal(s)’ as such. They may as well be features derived from the audio signal(s). For example, instead of transmitting an audio signal back to the hearing device, a gain derived from the predicted signal could be transmitted back.
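
To illustrate the feature-transmission variant, the APD might send only a per-band gain derived from the predicted signal. The Wiener-style rule below is an assumption, since the disclosure does not fix how the gain is derived; the noise estimate input is likewise hypothetical.

```python
import numpy as np

def gain_from_prediction(Z, noise_psd, g_min=0.1):
    """Per-bin gain from predicted-signal power vs. an assumed noise PSD estimate.

    Only this (small, real-valued) gain vector would be sent back to the
    earpiece over the wireless link, not the predicted audio itself.
    """
    snr = np.abs(Z) ** 2 / (noise_psd + 1e-12)
    return np.maximum(snr / (1.0 + snr), g_min)   # floored Wiener-style gain
```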

[0173] The term ‘or a processed version thereof’ may e.g. cover such features extracted from an original audio signal. The term may e.g. also cover an original audio signal that has been subject to a processing algorithm applying gain or attenuation to the original audio signal, resulting in a modified audio signal (preferably enhanced in some sense, e.g. noise reduced relative to a target signal).

[0174] It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.

[0175] As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.

[0176] It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.

[0177] The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
