Hearing system configured to localize a target sound source

10945079 · 2021-03-09

Abstract

A hearing system is adapted to be worn by a user and configured to capture sound in an environment of the user and comprises a) a sensor array comprising M transducers for providing M electric input signals representing said sound and having a known geometrical configuration relative to each other; b) a detector unit for detecting movements over time of the hearing system, and providing location data of said sensor array at different points in time t, t=1, . . . , N; c) a first processor for receiving said electric input signals and, in case said sound comprises sound from a localized sound source S, for extracting sensor array configuration specific data τ.sub.ij of said sensor array indicative of differences between a time of arrival of sound from said localized sound source S at said respective input transducers, at said different points in time t, t=1, . . . , N; and d) a second processor configured to estimate data indicative of a location of said localized sound source S relative to the user based on corresponding values of said location data and said sensor array configuration specific data at said different points in time t, t=1, . . . , N.

Claims

1. A hearing system adapted to be worn by a user and configured to capture sound in an environment of the user, the hearing system comprising a sensor array of M input transducers, where M≥2, each for providing an electric input signal representing said sound in said environment, said input transducers p.sub.i, i=1, . . . , M, of said array having a geometrical configuration relative to each other, when worn by the user, and a detector unit for detecting movements over time of the hearing system when worn by the user, and providing location data of said sensor array at different points in time t, t=1, . . . , N; a first processor for receiving said electric input signals and, in case said sound comprises sound from a localized sound source S, for extracting sensor array configuration specific data τ.sub.ij of said sensor array indicative of differences between a time of arrival of sound from said localized sound source S at said respective input transducers, at said different points in time t, t=1, . . . , N; a second processor configured to estimate data indicative of a location of said localized sound source S relative to the user based on corresponding values of said location data and said sensor array configuration specific data at said different points in time t, t=1, . . . , N.

2. A hearing system according to claim 1 wherein the detector unit is configured to detect rotational and/or translational movements of the hearing system.

3. A hearing system according to claim 1 wherein said data indicative of a location of said localized sound source S relative to the user at said different points in time t, t=1, . . . , N constitutes or comprises a direction of arrival of sound from said sound source S.

4. A hearing system according to claim 1 wherein said data indicative of a location of said localized sound source S relative to the user at said different points in time t, t=1, . . . , N comprises coordinates of said sound source relative to said user, or direction of arrival of sound from and distance to said sound source relative to said user.

5. A hearing system according to claim 1 wherein said detector unit comprises a number of IMU-sensors including at least one of an accelerometer, a gyroscope and a magnetometer.

6. A hearing system according to claim 5 wherein at least one of said IMU-sensors is located in a separate device.

7. A hearing system according to claim 1 wherein said second processor is configured to estimate data indicative of a location of said localized sound source S relative to the user based on the following expression for stacked residual vectors r(S.sup.e) originating from said time instances t=1, . . . , N
r(S.sup.e)=y.sub.t.sup.ij−h.sub.ij(S.sup.e,R.sub.t,T.sub.t.sup.e), where S.sup.e represents the position of said sound source in an inertial frame of reference, R.sub.t and T.sub.t.sup.e are matrices describing a rotation and a translation, respectively, of the sensor array with respect to the inertial frame at time t, and y.sub.t.sup.ij=τ.sub.ij+e.sub.t represents said sensor array configuration specific data, where τ.sub.ij represents said differences between a time of arrival of sound from said localized sound source S at said respective input transducers i, j, and e.sub.t represents measurement noise, where (i,j)=1, . . . , M, j>i, and wherein h.sub.ij is a model of the time differences τ.sub.ij between each microphone pair p.sub.i and p.sub.j.

8. A hearing system according to claim 7 wherein the second processor is configured to solve the problem represented by the stacked residual vectors r(S.sup.e) in a maximum likelihood framework.

9. A hearing system according to claim 7 wherein the second processor is configured to solve the problem represented by the stacked residual vectors r(S.sup.e) using an Extended Kalman filter (EKF) algorithm.

10. A hearing system according to claim 1 comprising first and second hearing devices, adapted to be located at or in left and right ears of the user, or to be fully or partially implanted in the head at the left and right ears of the user, each of the first and second hearing devices comprising at least one input transducer for providing an electric input signal representing sound in said environment, at least one output transducer for providing stimuli perceivable to the user as representative of said sound in the environment, wherein said at least one input transducer of said first and second hearing devices constitutes or forms part of said sensor array.

11. A hearing system according to claim 10 wherein each of the first and second hearing devices comprises circuitry for wirelessly exchanging said electric input signals, or parts thereof, with the other hearing device, and/or with an auxiliary device.

12. A hearing system according to claim 10 wherein the first and second hearing devices are constituted by or comprise respective first and second hearing aids.

13. A hearing system according to claim 1 comprising a hearing aid, a headset, an earphone, an ear protection device or a combination thereof.

14. A hearing system according to claim 1 comprising an auxiliary device comprising said second processor.

15. A hearing system according to claim 1 comprising a carrier configured to carry at least some of the M input transducers of the sensor array, wherein the carrier has a dimension larger than 0.10 m.

16. A hearing system according to claim 15 wherein the carrier may be configured to carry at least some of the sensors of the detector unit.

17. A hearing system according to claim 1 wherein the number M of input transducers is larger than or equal to 8.

18. A hearing system according to claim 1 comprising one or more cameras.

19. A hearing system according to claim 1 comprising a number of EOG sensors or an eye tracking camera for eye-tracking, and a scene camera for Simultaneous Localization and Mapping (SLAM) combined with a number of Inertial Measurement Units (IMUs) for motion tracking/head rotation.

20. A method of operating a hearing system adapted to be worn by a user and configured to capture sound in an environment of the user, when said hearing system is operationally mounted on the user, the hearing system comprising a sensor array of M input transducers, where M≥2, each for providing an electric input signal representing said sound in said environment, said input transducers p.sub.i, i=1, . . . , M, of said array having a geometrical configuration relative to each other, when worn by the user, the method comprising detecting movements over time of the hearing system when worn by the user, and providing location data of said sensor array at different points in time t, t=1, . . . , N; when said sound comprises sound from a localized sound source S, extracting sensor array configuration specific data τ.sub.ij of said sensor array indicative of differences between a time of arrival of sound from said localized sound source S at said respective input transducers, at said different points in time t, t=1, . . . , N, from said electric input signals; and estimating data indicative of a location of said localized sound source S relative to the user based on corresponding values of said location data and said sensor array configuration specific data at said different points in time t, t=1, . . . , N.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter in which:

(2) FIG. 1A shows a sound source located in a three dimensional coordinate system defining Cartesian (x, y, z) and spherical (r, θ, φ) coordinates of the sound source, and

(3) FIG. 1B shows a sound source located in a three dimensional coordinate system relative to a microphone array comprising two microphones located on the x-axis symmetrically around the origin of the coordinate system (the microphones being e.g. located in the left and right hearing devices, respectively), and

(4) FIG. 1C is a further illustration of an example of the geometry of 3D direction of arrival, where the bold line is the direction to the source, S.sup.e, depicted with a solid dot (●), the diamonds on the line coinciding with the y-axis represent sensor nodes (e.g. microphone locations), p.sub.i, i=1, . . . , M, φ is the azimuth angle, θ is the elevation angle, and ψ is the broadside angle,

(5) FIG. 2 shows an illustration of the orientation, R, and position, T.sup.e, of the array (p.sub.1, p.sub.2, . . . , p.sub.M) with respect to the e frame of reference,

(6) FIG. 3 shows a first embodiment of a hearing system according to the present disclosure,

(7) FIG. 4 shows an embodiment of a hearing device according to the present disclosure,

(8) FIG. 5 shows a second embodiment of a hearing system according to the present disclosure in communication with an auxiliary device,

(9) FIG. 6 shows a third embodiment of a hearing system according to the present disclosure,

(10) FIG. 7 shows a fourth embodiment of a hearing system according to the present disclosure, and

(11) FIG. 8 shows a fifth embodiment of a hearing system according to the present disclosure.

(12) The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.

(13) Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.

DETAILED DESCRIPTION OF EMBODIMENTS

(14) The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as elements). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.

(15) The electronic hardware may include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

(16) The present application relates to the field of hearing devices, e.g. hearing aids, and to hearing systems, e.g. binaural hearing aid systems.

(17) Direction Of Arrival (DOA) estimation and source-location estimation are becoming increasingly important. Some examples are power saving and user tracking in WiFi access points and mobile cell towers, and detection and tracking of acoustic sources. With modern array processing techniques, applications such as Massive Multiple Input Multiple Output (M-MIMO) and Active Electronically Scanned Array (AESA) radars can steer the output energy or the antenna sensitivity in the desired direction. Both AESA and M-MIMO are based on planar arrays yielding directionality in azimuth and elevation. However, some systems are limited to linear arrays for computing the DOA; e.g., binaural Hearing Aid Systems (HAS), which use one microphone per ear, and towed arrays in deep-sea exploration can only estimate one angle.

(18) In this disclosure, linear arrays with two or more sensors receiving a signal from a source are considered. When the sensors are equidistantly spaced a so called uniform linear array (ULA) is obtained and it gives a uniform spatial sampling of the wavefield. This sampling eases non-parametric narrowband DOA methods, such as MUltiple SIgnal Classification (MUSIC) and Minimum Variance Distortionless Response (MVDR), as they seek the direction with strongest power.

(19) To overcome the limitations of linear arrays, several methods have been proposed in order to estimate the 3D source direction or its full position. A chest-worn planar microphone array may be used to estimate the direction, while Head-Related Transfer Functions (HRTFs) are used to estimate the position.

(20) The proposed methods utilize the geometrical properties of the array when subject to motion. The aperture is the space occupied by the array, and the simple idea utilized here is that the motion of the array synthesizes a larger space. A nonlinear least-squares (NLS) formulation utilizing known motion is proposed, together with two sequential solutions. The formulation is extended to include uncertainty in the motion, allowing estimation of the source locations and the motion simultaneously.

(21) FIG. 1A shows a sound source S located in a three dimensional coordinate system defining Cartesian (x, y, z) and spherical (r, θ, φ) coordinates of the sound source S. A direction of arrival (DOA) of sound from the sound source S at a microphone array located along the x-axis is defined by the angle between the sound source vector r.sub.s and the microphone axis (x), indicated by the bold dashed arc DOA.

(22) FIG. 1B shows a sound source S located in a three dimensional coordinate system (x, y, z) relative to a microphone array comprising two microphones (mic.sub.1, mic.sub.2) located a distance d=2a apart on the x-axis symmetrically around the origin (0, 0, 0) of the coordinate system (i.e. centred in (a, 0, 0) and (−a, 0, 0), respectively). The angle between the sound source vector r.sub.s and the microphone array vector (termed the DOA array vector) is indicated in FIG. 1B by the bold dashed arc (DOA). The microphones are e.g. located in the left and right hearing devices, respectively, or are e.g. both located in the same hearing device.

(23) The setting illustrated in FIG. 1B is a linear array with two sensors (here microphones) receiving a signal from a sound source S. For simplicity, a free field assumption is made, which results in unobstructed waves impinging on the array. It is also assumed that the wave-front is planar.

(24) When the sources are not perpendicular to the array, the distances between the sensors and the source will differ, resulting in a time difference between the received signals. With known speed of the medium (here e.g. air), the time difference can be converted to a distance, and with known separation between the sensors, the angle to the source can be calculated.

(25) FIG. 1C is a further illustration of an example of the geometry of 3D direction of arrival, where the bold line is the direction to the source, S.sup.e, depicted with a solid dot (●), the diamonds on the line coinciding with the y-axis represent sensor nodes (e.g. microphone locations), p.sub.i, i=1, . . . , M, φ is the azimuth angle, θ is the elevation angle, and ψ is the broadside angle.

(26) For simplicity, a free field assumption is made, which results in unobstructed waves impinging on the array. It is also assumed that the wave-front is planar. When the sources are not perpendicular to the array, the distances between the sensors and the source will differ, resulting in a time difference between the received signals. With known speed of the medium, the time difference can be converted to a distance, and with known separation between the sensors, the angle to the source can be calculated.

(27) When the sensors are not necessarily equidistantly spaced the DOA on a linear sensor array, as illustrated in FIG. 1C, can be described by

(28) $$\sin\psi = \frac{c\,\tau_{ij}}{\lVert p_i - p_j\rVert} \qquad (1)$$

(29) where ψ ∈ [−90°, 90°] is the DOA, τ.sub.ij is the time difference between the signals at sensors p.sub.i and p.sub.j with distance ‖p.sub.i−p.sub.j‖, and c is the transmission speed of the medium (e.g. air). Time difference measurements can for instance be obtained with time-domain methods based on Generalized Cross Correlation (cf. e.g. [Knapp & Carter; 1976]).
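As an illustration, eq. (1) may be evaluated numerically as sketched below; the sensor distance, time difference and noise handling are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

# Minimal sketch of eq. (1) under the free-field, plane-wave assumption.
c = 343.0          # transmission speed of the medium (air) [m/s]
d = 0.15           # distance between the two sensors ||p_i - p_j|| [m]
tau_ij = 2.0e-4    # measured time difference of arrival [s]

# sin(psi) = c * tau_ij / ||p_i - p_j||; clip to keep arcsin well defined
# when measurement noise pushes the ratio slightly outside [-1, 1].
sin_psi = np.clip(c * tau_ij / d, -1.0, 1.0)
psi_deg = np.degrees(np.arcsin(sin_psi))
print(f"DOA (broadside angle) ~ {psi_deg:.1f} degrees")
```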

(30) A common setting is to consider the array and the source all lying in the same plane (e.g. the xy-plane in FIG. 1B). However, a more general case is to consider the array as a vector in ℝ.sup.3 and the source as a point in the same space, as illustrated in FIG. 1C. Then the DOA is the angle between the vector from the source to the origin of the array, and the array itself (cf. e.g. FIG. 1B). This is of course nothing but the scalar product, also known as the inner product. It is also common to consider the angle the source vector makes to a vector perpendicular to the array. This angle is called the broadside angle; it is zero for sources perpendicular to the array (along the z-axis in FIG. 1C), and its sine is given by the normalized scalar product.

(31) The source direction then has two degrees of freedom (DOF), namely, the azimuth (φ) and polar (or elevation) (θ) angles, see e.g. FIG. 1B, 1C. The distance to the source cannot be obtained from angular measurements without translation of the array. When the elevation angle (θ) is zero, then the azimuth (φ) and the broadside (ψ) angles are the same.

(32) A body fixed coordinate (b) frame containing the array, at which the sensor nodes are located with X.sup.b ∈ ℝ.sup.3, is defined. The orientation of the b frame with respect to an inertial frame of reference (e) is described with a rotation matrix {R ∈ ℝ.sup.3×3; det R=1; R.sup.T=R.sup.−1}. Hence, for pure orientation changes, vectors between these frames are related by X.sup.b=RX.sup.e and trivially X.sup.e=R.sup.−1X.sup.b=R.sup.TX.sup.b. Denote the translation, i.e., the position, of the array vector by T.sup.e ∈ ℝ.sup.3 and the position of the point source by S.sup.e ∈ ℝ.sup.3; then the source expressed in the b frame is

$$S^b = R\,(S^e - T^e). \qquad (2)$$

(33) This rigid body transformation of the array vector and the position of the source is illustrated in FIG. 2.
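As an illustration, the rigid body transformation of eq. (2) may be sketched as follows; the rotation (a yaw of 30 degrees) and the positions are illustrative values, not taken from the disclosure.

```python
import numpy as np

# Minimal sketch of eq. (2): S^b = R (S^e - T^e).
def yaw_rotation(angle_deg: float) -> np.ndarray:
    """Rotation matrix for a rotation about the z-axis."""
    a = np.radians(angle_deg)
    return np.array([[ np.cos(a), np.sin(a), 0.0],
                     [-np.sin(a), np.cos(a), 0.0],
                     [       0.0,       0.0, 1.0]])

S_e = np.array([2.0, 1.0, 0.5])   # source position in the inertial (e) frame
T_e = np.array([0.1, 0.0, 0.0])   # array position in the inertial (e) frame
R = yaw_rotation(30.0)            # orientation of the body (b) frame

S_b = R @ (S_e - T_e)             # eq. (2): source expressed in the b frame
print(S_b)
```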

(34) FIG. 2 is an illustration of the orientation, R, and position, T.sup.e, of the sensor array (p.sub.1, p.sub.2, . . . , p.sub.M) with respect to the e frame of reference. The body fixed array vector is aligned with the y.sup.b vector. The source location, S.sup.e, is illustrated with a solid dot (●).

(35) Let the pairwise difference between the M nodes be denoted by X.sub.ij.sup.b=p.sub.i−p.sub.j ∈ ℝ.sup.3, (i, j)=1, . . . , M, j>i. The DOA in the b-frame is given by the scalar product between the vectors X.sub.ij.sup.b and S.sup.b. Using eq. (1), the time difference measurement can be expressed as

(36) $$\tau_{ij} = \frac{(S^b)^T X_{ij}^b}{\lVert S^b\rVert\, c} = \frac{\big(R(S^e - T^e)\big)^T X_{ij}^b}{\lVert R(S^e - T^e)\rVert\, c} = h_{ij}(S^e, R, T^e) \qquad (3)$$

(37) where h.sub.ij is a model of the time differences τ.sub.ij between each microphone pair p.sub.i and p.sub.j. Thus, the time difference between each node pair can be expressed as a nonlinear function of the source position, the array length, its position and orientation. Furthermore, with S.sup.e=[x,y,z], the azimuth and elevation angles can be defined as

(38) $$\varphi = \arctan\frac{y}{x} \quad \text{and} \quad \theta = \arccos\frac{z}{\lVert S^e\rVert},$$
respectively.
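A small sketch of the measurement model h.sub.ij of eq. (3) and the angle definitions above is given below; variable names follow the text, the free-field assumption is retained, and all numerical values are illustrative only.

```python
import numpy as np

C_SOUND = 343.0  # transmission speed of the medium (air) [m/s]

def h_ij(S_e, R, T_e, X_ij_b, c=C_SOUND):
    """Predicted time difference tau_ij (eq. (3)) for a node pair with baseline X_ij_b."""
    S_b = R @ (S_e - T_e)                       # eq. (2): source in the b frame
    return float(S_b @ X_ij_b) / (np.linalg.norm(S_b) * c)

def azimuth_elevation(S_e):
    """Azimuth and polar (elevation) angle of S^e = [x, y, z], in degrees."""
    x, y, z = S_e
    phi = np.degrees(np.arctan2(y, x))
    theta = np.degrees(np.arccos(z / np.linalg.norm(S_e)))
    return phi, theta

# Example: two microphones 0.15 m apart along the body y-axis.
X_12_b = np.array([0.0, 0.15, 0.0])
S_e = np.array([2.0, 1.0, 0.5])
print(h_ij(S_e, np.eye(3), np.zeros(3), X_12_b))
print(azimuth_elevation(S_e))
```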

(39) The unknown variable S.sup.e only has two DOF since distance is not observed, and it is therefore convenient to assume ‖S.sup.e‖=1. In this case, the DOA measurements and the measurement function correspond to a system of nonlinear equations.

(40) Rotation only: If there is no translation, i.e., T.sub.t.sup.e=0, t=1, . . . , N, then the distance to the source cannot be found. Hence, S.sup.e has two DOF and can only be determined up to an unknown scale. In the case that there is only one measurement, N=1, the nonlinear system is underdetermined since max rank H=1. In the case N≥2, there exists a search direction, by the corresponding normal equations, only if rank H=2, since this is also the DOF of the unknown parameter S.sup.e. The rank of the Jacobian is a function of the rotation and the location of the source.

(41) As discussed earlier, the general DOA problem has geometrical ambiguities resulting in rotational invariance for certain configurations. This invariance means that DOA remains the same since the relative distance to the source is not changed by the rotation.

(42) A rotation around the DOA array itself corresponds to a change in pitch. This is because any vector is rotationally invariant to rotations around its own axis i.e., X.sup.b=R(X.sup.b)X.sup.b, where R(X.sup.b) denotes a rotation around the vector X.sup.b. Thus, for rotations around the DOA array the two angles to the source cannot be resolved.

(43) Rotation and translation: When there is translation of the array, then all three DOF of S.sup.e can be considered on the basis of triangulation. Assume that X.sup.b undergoes known rotation and translation {R.sub.t, T.sub.t.sup.e, t=1, . . . , N} and there is a set of DOA measurements, as before. The corresponding measurement function (3) is parametrized by h(S.sup.e, R.sub.t, T.sub.t.sup.e). The basic requirement is that the number of measurements is greater than or equal to the DOF, i.e., N≥3. The motion resulting in rank H<3, from which a search direction cannot be found, is translation along vectors parallel to S.sup.e−T.sup.e, with any rotation. This result is immediate from (2), since the only information about S.sup.e that affects the measurements (3) is related to orientation changes. From the discussion above, it was established that orientation could only contribute to finding two DOF of S.sup.e. The intuition is that such motion does not result in any parallax, which is needed for triangulation.
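These rank conditions can be checked numerically; the sketch below stacks a numerical Jacobian of the h_ij model (from the sketch above) over a short pose trajectory and inspects its rank. The trajectories are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

def numerical_jacobian(S_e, poses, X_ij_b, eps=1e-6):
    """Stack d h_ij / d S^e (central differences) over the poses (R_t, T_t^e)."""
    rows = []
    for R, T_e in poses:
        grad = np.zeros(3)
        for k in range(3):
            dS = np.zeros(3); dS[k] = eps
            grad[k] = (h_ij(S_e + dS, R, T_e, X_ij_b)
                       - h_ij(S_e - dS, R, T_e, X_ij_b)) / (2 * eps)
        rows.append(grad)
    return np.vstack(rows)

def yaw(a_deg):
    a = np.radians(a_deg)
    return np.array([[np.cos(a), np.sin(a), 0], [-np.sin(a), np.cos(a), 0], [0, 0, 1]])

S_e = np.array([2.0, 1.0, 0.5])
X_12_b = np.array([0.0, 0.15, 0.0])

rot_only = [(yaw(a), np.zeros(3)) for a in (0, 10, 20)]                           # RO motion
rot_trans = [(yaw(a), np.array([0.0, 0.01 * a, 0.01 * a])) for a in (0, 10, 20)]  # RT motion

# The tolerance absorbs numerical-differentiation noise.
print(np.linalg.matrix_rank(numerical_jacobian(S_e, rot_only, X_12_b), tol=1e-10))   # rank 2: direction only
print(np.linalg.matrix_rank(numerical_jacobian(S_e, rot_trans, X_12_b), tol=1e-10))  # rank 3 when there is parallax
```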

(44) Estimation:

(45) Assume that all rotations and translations (the pose trajectory) {R.sub.t, T.sub.t.sup.e, t=1, . . . , N} of the array vector X.sup.b are available (e.g. from movement monitoring sensors, such as IMUs), and there is a corresponding set of time difference measurements (e.g. based on maximizing respective correlation estimates between the signals in question)
$$\{\, y_t^{ij} = \tau_{ij} + e_t,\;\; (i,j)=1,\ldots,M,\; j>i,\;\; t=1,\ldots,N \,\}$$

(46) Here y.sub.t.sup.ij is the measurement at the i.sup.th node compared to node j at time t such that j>i and e.sub.t is noise. The collection of measurements at each time t is called a snap-shot. With a stationary source S.sup.e the stacked residual vector for one time instant t=1 can be written as

(47) $$r_1(S^e) = \begin{bmatrix}
y_1^{12} - h_{12}(S^e, R_1, T_1^e)\\
y_1^{13} - h_{13}(S^e, R_1, T_1^e)\\
\vdots\\
y_1^{1M} - h_{1M}(S^e, R_1, T_1^e)\\
y_1^{23} - h_{23}(S^e, R_1, T_1^e)\\
y_1^{24} - h_{24}(S^e, R_1, T_1^e)\\
\vdots\\
y_1^{2M} - h_{2M}(S^e, R_1, T_1^e)\\
\vdots\\
y_1^{(M-1)M} - h_{(M-1)M}(S^e, R_1, T_1^e)
\end{bmatrix} \qquad (4)$$

(48) And by stacking the N residual vectors (for t=1, . . . , N), we obtain
$$r(S^e) = [\,r_1(S^e)^T, \ldots, r_N(S^e)^T\,]^T \qquad (5)$$

(49) where $r(S^e) \in \mathbb{R}^{B \times 1}$ and $B = N \sum_{i=1}^{M-1} i$. The squared form of (5) is

$$V(S^e) = \lVert r(S^e)\rVert_2^2 \qquad (6)$$

(50) which is a nonlinear least-squares (NLS) formulation. NLS problems are readily solved using e.g. the Levenberg-Marquardt (LM) method, cf. e.g. [Levenberg; 1944], [Marquardt; 1963]. LM uses only gradient information to perform a quasi-Newton search. The gradient of (6) is

(51) $$\frac{dV(S^e)}{dS^e} = H\,r \;\in\; \mathbb{R}^{3 \times 1}$$

(52) where H is the Jacobian, i.e., the matrix of first order partial derivatives

(53) $$\frac{dr(S^e)}{dS^e} = H \;\in\; \mathbb{R}^{3 \times B}$$
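As an illustration, the NLS problem (6) may be solved with a standard solver; the sketch below uses scipy.optimize.least_squares (whose 'lm' method implements Levenberg-Marquardt) on simulated measurements from a single microphone pair. The pose trajectory, source position, noise level and initial guess are illustrative assumptions, and h_ij is the model function sketched earlier.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
X_12_b = np.array([0.0, 0.15, 0.0])            # single microphone pair (M = 2)
S_true = np.array([2.0, 1.0, 0.5])             # true source position (to be estimated)

def pose(t):                                   # known pose trajectory {R_t, T_t^e}
    a = np.radians(float(t))                   # one degree of yaw per step
    R = np.array([[np.cos(a), np.sin(a), 0], [-np.sin(a), np.cos(a), 0], [0, 0, 1]])
    T = np.array([0.0, 0.01 * t, 0.01 * t])    # translation providing parallax
    return R, T

N = 31
poses = [pose(t) for t in range(N)]
y = np.array([h_ij(S_true, R, T, X_12_b) for R, T in poses])
y = y + rng.normal(scale=1e-6, size=N)         # measurement noise e_t

def residuals(S_e):                            # stacked residual vector r(S^e), eq. (5)
    return np.array([y[t] - h_ij(S_e, R, T, X_12_b) for t, (R, T) in enumerate(poses)])

sol = least_squares(residuals, x0=np.array([1.0, 1.0, 1.0]), method="lm")
print(sol.x)                                   # estimate of S^e
```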

(54) It is also preferable to use a weighting strategy for the NLS problem, taking into account that the measurement noise may vary over time and/or differ between measurements. The corresponding residuals in (6) are then weighted by the inverse of the measurement covariance, $r_i^T R_i^{-1} r_i$, or for the whole batch as

$$V_R(S^e) = \lVert r(S^e)\rVert^2_{R^{-1}} \qquad (7)$$

(55) where R=diag(R.sub.1, . . . , R.sub.B). When the measurement errors are Gaussian, e.sub.t ∼ 𝒩(0, R), the cost function (7) corresponds to the Maximum Likelihood (ML) criterion.
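In practice, this weighting can be realized by whitening the residual vector with a factor of the inverse covariance before the NLS solve; a small sketch, reusing the residuals function and N from the sketch above, with an assumed diagonal covariance:

```python
import numpy as np

# Sketch of the weighted NLS criterion (7): whiten the residuals so that the
# ordinary two-norm of the whitened residuals equals ||r||^2_{R^{-1}}.
def whiten(residual_fn, R_cov):
    L = np.linalg.cholesky(np.linalg.inv(R_cov))   # R^{-1} = L L^T
    return lambda S_e: L.T @ residual_fn(S_e)

R_cov = np.diag(np.full(N, 1e-12))                 # assumed per-measurement noise variance
weighted_residuals = whiten(residuals, R_cov)      # pass this to least_squares instead of residuals
```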

(56) The array is said to be unambiguous if the spatial distribution of the nodes yields a well-defined estimation problem. It turns out that there are two motions for which the array is ambiguous and the S.sup.e cannot be estimated. The first is rotation only (RO) for which only the source direction can be found as long as the rotation is not around the array axis. The second is rotation and translation (RT) of the array. From such general motion the source location is implicitly triangulated by the NLS solution as long as the translation is non-parallel to S.sup.e-T.sup.e.

(57) Target tracking and SLAM: With the NLS problem defined for a stationary source and known motion of the array, it is straightforward to define more challenging cases. If the source is allowed to move, then the parameter S.sup.e is changed to be time-varying, S.sub.t.sup.e, t=1, . . . , N, in eq. (6), and the problem is that of target tracking. This is not well-defined since there are more DOFs in the parameter than what can be obtained from the measurements. A remedy may be to include a dynamic model of the parameter into the residual.

(58) $$V_R^{TT}(S_t^e) = \left\lVert \begin{bmatrix} r(S_t^e) \\ X_{t+1} - F X_t \end{bmatrix} \right\rVert^2_{\operatorname{diag}(R^{-1},\, Q^{-1})} \qquad (8)$$

(59) where
$$X_{t+1} = \operatorname{vec}\, S_i^e,\; i=2,\ldots,N+1, \qquad F = I_{3N}, \qquad X_t = \operatorname{vec}\, S_i^e,\; i=1,\ldots,N$$

(60) And Q is a diagonal covariance matrix of appropriate dimension. In an embodiment, Q is large.

(61) When there is uncertainty in both the position of sources and the motion of the array a Simultaneous Localization and Mapping (SLAM) problem is obtained. The Maximum Likelihood (ML) version of SLAM does not consider any motion model and thus the following NLS problem is obtained
$$V_R(S_k^e, T_t^e, R_t) = \lVert r(S_k^e, T_t^e, R_t)\rVert^2_{R^{-1}} \qquad (9)$$
and there are K stationary sources S.sub.k.sup.e, k=1, . . . , K. This kind of formulation is common in computer vision where it is called Bundle Adjustment.

(62) Sequential solutions: In many applications it is desired to process data in an on-line fashion. By construction, NLS is an off-line solution, but sequential recursive methods are easily derived therefrom. A well-known algorithm is the Extended Kalman filter (EKF) [Jazwinski; 1970], which can be viewed as a special case of NLS without iterations. This naturally leads to iterated solutions which, in general, result in increased performance. In order to compute a search direction for the RO case, at least two snapshots are needed at each update. Similarly, at least three snapshots are needed in the RT case.

(63) Sequential Nonlinear Least-Squares: A simple sequential NLS (S-NLS) solution can be obtained as follows. Given an initial guess (x).sup.0 of the unknown parameter x, then, for an appropriate number of snapshots, iterate
$$x_{i+1} = x_i + \alpha_i\,(H^T H)^{-1} H\, r \qquad (10)$$

(64) until convergence. Here H and r are parametrized by the current iterate x.sub.i, and α.sub.i ∈ [0, 1] is a step-size, which can be computed with e.g. backtracking. In the RO case (x=S.sup.e), x can only be estimated up to scale, and the estimate should therefore be normalized at each iteration as

(65) $$x_{i+1} := \frac{x_{i+1}}{\lVert x_{i+1}\rVert} \qquad (11)$$
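A compact sketch of one S-NLS iteration, eqs. (10)-(11), for the rotation-only case is given below. It reuses the h_ij model from the earlier sketch, builds the Jacobian numerically, uses a pseudo-inverse to stay robust to the rank-2 RO geometry, and the fixed step size is an illustrative choice.

```python
import numpy as np

def snls_update(x, ys, poses, X_ij_b, alpha=0.5, eps=1e-6):
    """One S-NLS iteration (eq. (10)) over a small batch of snapshots,
    followed by the normalization of eq. (11) for the RO case."""
    r = np.array([y - h_ij(x, R, T, X_ij_b) for y, (R, T) in zip(ys, poses)])
    H = np.zeros((3, len(ys)))                 # H holds d h_ij / d x, stacked column-wise
    for t, (R, T) in enumerate(poses):
        for k in range(3):
            dx = np.zeros(3); dx[k] = eps
            H[k, t] = (h_ij(x + dx, R, T, X_ij_b)
                       - h_ij(x - dx, R, T, X_ij_b)) / (2 * eps)
    step = np.linalg.pinv(H @ H.T) @ (H @ r)   # Gauss-Newton-type step, eq. (10)
    x = x + alpha * step
    return x / np.linalg.norm(x)               # eq. (11): keep the estimate at unit norm
```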

(66) Iterated Extended Kalman filter: State space models are an important tool as they admit dynamic assumptions on the otherwise stationary parameter through a process model. As usual, the state is assumed to evolve according to some process model
x.sub.t+1=f(x.sub.t,w.sub.t),(12)

(67) where w.sub.t is process noise. The iterated Extended Kalman filter (IEKF) can be seen as an NLS solver for state space models. The IEKF generally obtains smaller residual errors and is to be preferred over the standard EKF when the nonlinearities are severe and computational resources are available. The iterations are performed in the measurement update, where the maximum a posteriori (MAP) cost function is minimized with respect to the unknown state. The cost function can be used to ensure cost decrease and to decide when the iterations should terminate. A basic version of the measurement update in the IEKF is summarized in Algorithm 1; the literature provides a complete description and further options.

(68) Algorithm 1 Iterated Extended Kalman Measurement Update:

(69) Require an initial state, x̂.sub.0|0=(x).sup.0, and an initial state covariance, P̂.sub.0|0.

(70) 1. Measurement update iterations

(71) $$H_i = \left.\frac{\partial h(s)}{\partial s}\right|_{s = x_i} \qquad (13a)$$
$$K_i = \hat P_{t|t-1} H_i^T \left(H_i \hat P_{t|t-1} H_i^T + R_t\right)^{-1} \qquad (13b)$$
$$x_{i+1} = x_i + \alpha_i\left(\hat x - x_i + K_i\left(y_t - h(x_i) - H_i(\hat x - x_i)\right)\right) \qquad (13c)$$

(72) 2. Update the state and the covariance
$$\hat x_{t|t} = x_{i+1}, \qquad (14a)$$
$$\hat P_{t|t} = (I - K_i H_i)\,\hat P_{t|t-1} \qquad (14b)$$
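A minimal sketch of the iterated measurement update (13a)-(14b) is given below. The numerical linearization, the generic measurement function h, and the iteration count and step size are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def iekf_measurement_update(x_pred, P_pred, y, R, h, n_iter=5, alpha=1.0, eps=1e-6):
    """One IEKF measurement update, eqs. (13a)-(14b).

    x_pred, P_pred: predicted (time-updated) state and covariance.
    y, R: measurement snapshot and its covariance.
    h: function mapping the state to the predicted measurement vector.
    """
    x_i = x_pred.copy()
    for _ in range(n_iter):
        # (13a) numerical Jacobian of h at the current iterate
        H_i = np.zeros((len(y), len(x_i)))
        for k in range(len(x_i)):
            dx = np.zeros(len(x_i)); dx[k] = eps
            H_i[:, k] = (h(x_i + dx) - h(x_i - dx)) / (2 * eps)
        # (13b) Kalman gain
        S = H_i @ P_pred @ H_i.T + R
        K_i = P_pred @ H_i.T @ np.linalg.inv(S)
        # (13c) iterate update around the prediction x_pred
        innovation = y - h(x_i) - H_i @ (x_pred - x_i)
        x_i = x_i + alpha * (x_pred - x_i + K_i @ innovation)
    # (14a)-(14b) state and covariance update
    P_upd = (np.eye(len(x_i)) - K_i @ H_i) @ P_pred
    return x_i, P_upd
```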

(73) Example Stationary Target

(74) With a stationary target initialized at S.sup.e=[10, 10, 10].sup.T+w, where w ∼ 𝒩(0.sub.3×1, I.sub.3), the cases of rotation only (RO) and rotation and translation (RT) are evaluated in a Monte Carlo (MC) fashion. For each case, the measurements are from an array with M=2 and ‖p.sub.1−p.sub.2‖=0.3, giving y.sub.t=τ.sub.12+e.sub.t, t=1, . . . , 31, where e.sub.t ∼ 𝒩(0, 0.01). The rotation sequence is given by a roll, pitch and yaw motion as R.sub.t=[0, 0, 0].sup.T → [30, 30, 30].sup.T [°] in increments of one degree. The translation sequence is T.sub.t.sup.e=[0, 0, 0].sup.T → [0, 0.3, 0.3].sup.T [m] in increments of 0.01 m for the yz coordinates. For both cases, twenty runs were made and all estimators were run until no significant progress could be made. The dynamic model used in the IEKF is constant position, x.sub.t+1=x.sub.t+w.sub.t, where w.sub.t ∼ 𝒩(0, Q=0.01 I.sub.3). The measurement covariance is R=0.01 I, where I is either I.sub.2 for RO or I.sub.3 for RT. For all three methods, a fixed step size α=0.5 was chosen, and the initial point in each MC iterate was (S.sup.e).sup.0=S.sup.e+w.sup.init, where w.sup.init ∼ 𝒩(0, 0.5.sup.2 I.sub.3). In Table 1, the RMSE over the MC estimation results from the proposed methods for the two cases is shown. All three methods work fine and, as expected, the two sequential solutions perform slightly worse than NLS.

(75) TABLE 1. RMSE of estimates obtained with the proposed methods for the case of rotation only (RO) and the case of rotation and translation (RT).

Case  NLS     S-NLS   IEKF
RO    0.0069  0.1526  0.2222
RT    0.5737  0.7298  0.6762

(76) Example (Fixed Microphone Distance):

(77) The direction of arrival (DOA) of a soundwave, assumed to be a free-field and planar wave front, impinging the array can be described by

(78) $$\sin\psi = \frac{\big(R(S^e - T^e)\big)^T X^b}{\lVert R(S^e - T^e)\rVert\, d} = h(S^e, R, T^e). \qquad (1)$$

(79) where ψ represents the DOA, R is the 3D orientation of the array, S.sup.e (=(x.sub.s, y.sub.s, z.sub.s) in FIG. 1B) is the position of the sound source, where superscript e denotes an inertial reference frame, T.sup.e is the position of the array (=(0, 0, 0) in FIG. 1B), X.sup.b (=(2a, 0, 0) in FIG. 1B) is the array vector described in the body fixed coordinate frame, and d (=2a in FIG. 1B) is the length of the array, i.e. (here with two microphones) the distance between the microphones. The nonlinear expression can be stacked into a nonlinear equation system

(80) $$r(S^e) = \begin{bmatrix} y_1 - h(S^e, R_1, T_1^e)\\ \vdots\\ y_N - h(S^e, R_N, T_N^e) \end{bmatrix}, \qquad (4)$$

(81) where the y's are the DOA measurements found via e.g. delay-and-sum or beamforming. Then the two-norm of the residual vector r(S.sup.e) can be minimized in two scenarios:
1. Given two, or more, DOA measurements from distinct orientations, which are not a rotation around the array axis X.sup.b, the corresponding equation system can be solved with respect to S.sup.e. In this scenario, only the direction, (φ, θ), to the source can be found, i.e., not the distance r. This method requires that the orientation of the array can be computed. This can be done using inertial measurement units (IMU), e.g. a 3D-gyroscope and/or a 3D-accelerometer.
2. Given three, or more, DOA measurements at distinct positions, and provided the translation is not along the DOA vector, the corresponding equation system can be solved with respect to S.sup.e. In this scenario the full three degrees of freedom of the system can be found. This method requires that the position of the array can be computed. This can be done using the IMU over short time intervals.

(82) The minimization procedure can be any nonlinear least squares (NLS) method such as Levenberg-Marquardt or standard NLS with line-search.

(83) FIG. 3 shows a first embodiment of a hearing system according to the present disclosure. The hearing system (HD) is adapted to be worn by a user and configured to capture sound in an environment of the user, when the hearing system is operationally mounted on the user's head. The hearing system comprises a sensor array of M=2 input transducers, here microphones M1, M2. Each microphone provides an electric input signal representing sound in the environment. The input transducers of the array have a known geometrical configuration relative to each other, when worn by the user (here defined by microphone distance d between M1 and M2). Each microphone path comprises an analogue to digital converter (AD) for sampling an analogue electric signal, thereby converting it to a digital electric input signal (e.g. using a sampling frequency of 20 kHz or more). Each microphone path further comprises an analysis filter bank (FBA) for providing a digitized electric input signal in a number of frequency sub-bands (e.g. K=64 or more). Each frequency sub-band signal (e.g. represented by index k) may comprise a time-variant complex representation of the input signal in successive time instances m, m+1, . . . (time frames).

(84) The hearing system further comprises a detector unit (DET) (or is configured for receiving corresponding signals from separate sensors) for detecting movements over time of the hearing system when worn by the user, and providing location data of said sensor array at different points in time t, t=1, . . . , N. The detector (DET) provides data indicative of a track of the user (hearing system) relative to the sound source (cf. signal(s) trac, e.g. from Q different sensors or comprising Q different signals)

(85) The hearing system further comprises a first processor (PRO1) for receiving said electric input signals and, in case said sound comprises sound from a localized sound source S, for extracting sensor array configuration specific data τ.sub.ij (cf. signal tau) of the sensor array indicative of differences between a time of arrival of sound from the localized sound source S at said respective input transducers (M1, M2), at different points in time t, t=1, . . . , N.

(86) FIG. 3 illustrates propagation paths (in a plane wave approximation (acoustic far-field)) from the localized sound source (S), e.g. a talker, for the situation at time t=1. It can be seen that sound from source S will arrive later at the second microphone M2 than at the first microphone M1. The time difference, denoted τ.sub.12, is determined in the first processor based on the two electric input signals (e.g. by determining the time difference, τ.sub.12, as the time lag that maximizes a correlation measure between the two electric input signals). A movement of the user and the sound source (S) relative to each other is schematically indicated by the spatial displacement of the sound source S indicated by time instants t=2 and t=3, respectively.
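A minimal sketch of such a correlation-based estimate of τ.sub.12 is shown below; it uses a plain cross-correlation over one short frame (a full GCC-type frequency weighting could be substituted), and the sampling rate, frame length and test signal are illustrative assumptions.

```python
import numpy as np

def estimate_tau12(x1, x2, fs):
    """Estimate the delay of x2 relative to x1 [s] as the lag maximizing the cross-correlation."""
    n = len(x1)
    corr = np.correlate(x2, x1, mode="full")   # lags from -(n-1) to (n-1)
    lag = np.argmax(corr) - (n - 1)            # positive lag: x2 lags x1
    return lag / fs

fs = 20_000                                    # sampling rate [Hz]
t = np.arange(0, 0.02, 1 / fs)                 # one 20 ms frame
x1 = np.sin(2 * np.pi * 500 * t) + 0.05 * np.random.randn(len(t))
x2 = np.roll(x1, 4)                            # test signal: x1 delayed by 4 samples
print(estimate_tau12(x1, x2, fs))              # ~ 4 / 20000 = 2.0e-4 s
```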

(87) The hearing system further comprises a second processor (PRO2) configured to estimate data indicative of a location of said localized sound source S relative to the user based on corresponding values of said location data and said sensor array configuration data at said different points in time t, t=1, . . . , N. The data indicative of a location of said localized sound source S relative to the user may e.g. be a direction of arrival (cf. signal doa from the processor (PRO2) to the beamformer filtering unit BF)

(88) The embodiment of a hearing system in FIG. 3 further comprises (as already mentioned) a beamformer filtering unit (BF) for spatially filtering the electric input signals from microphones M1 and M2 and providing a beamformed signal. The beamformer filtering unit (BF) is a consumer of location data from the second processor (PRO2) to allow the generation of a beamformer that attenuates signals from the sound source S less than signals from other directions (e.g. an MVDR beamformer, cf. e.g. EP2701145A1). In the embodiment of FIG. 3 the beamformer filtering unit (BF) receives data indicative of a direction of arrival of the (target) sound relative to the user (and thus to the sensor array M1, M2) as indicated in FIG. 3 (solid arrow denoted DOA from S to midway between M1 and M2). Alternatively, the beamformer filtering unit (BF) may receive a location of the target sound source (S), e.g. including a distance from the source (S) to the user.

(89) The embodiment of a hearing system in FIG. 3 further comprises a signal processor (SPU) for processing the spatially filtered (and possibly further noise reduced) signal from the beamformer filtering unit in a number of frequency sub-bands. The signal processor (SPU) is e.g. configured to apply further processing algorithms, e.g. compressive amplification (to apply a frequency and level dependent amplification or attenuation to the beamformed signal), feedback suppression, etc. The signal processor (SPU) provides a processed signal that is fed to a synthesis filter bank (FBS) for conversion from the time-frequency domain to the time domain. The output of the synthesis filter bank (FBS) is fed to an output unit (here a loudspeaker) for providing stimuli representative of sound to the user (based on the electric input signals representative of sound in the environment).

(90) The embodiment of a hearing system in FIG. 3 may be partitioned in different ways. In an embodiment, the hearing system comprises first and second hearing devices adapted for being located at left and right ears of the user (e.g. so that the first and second microphones (M1, M2) are located at the left and right ears of the user, respectively).

(91) FIG. 4 shows an embodiment of a hearing device according to the present disclosure. FIG. 4 shows an embodiment of a hearing system comprising a hearing device (HD) comprising a BTE-part (BTE) adapted for being located behind pinna and a part (ITE) adapted for being located in an ear canal of the user. The ITE-part may, as shown in FIG. 4, comprise an output transducer (e.g. a loudspeaker/receiver) adapted for being located in an ear canal of the user and to provide an acoustic signal (providing, or contributing to, an acoustic signal at the ear drum). In the latter case, a so-called receiver-in-the-ear (RITE) type hearing aid is provided. The BTE-part (BTE) and the ITE-part (ITE) are connected (e.g. electrically connected) by a connecting element (IC), e.g. comprising a number of electric conductors. Electric conductors of the connecting element (IC) may e.g. have the purpose of transferring electrical signals from the BTE-part to the ITE-part, e.g. comprising audio signals to the output transducer, and/or of functioning as an antenna for providing a wireless interface. The BTE part (BTE) comprises an input unit comprising two input transducers (e.g. microphones) (IT.sub.11, IT.sub.12) each for providing an electric input audio signal representative of an input sound signal from the environment. In the scenario of FIG. 4, the input sound signal S.sub.BTE includes a contribution from sound source S (and possibly additive noise from the environment). The hearing aid (HD) of FIG. 4 further comprises two wireless transceivers (WLR.sub.1, WLR.sub.2) for transmitting and/or receiving respective audio and/or information signals and/or control signals (possibly including localization data from external detectors, and/or one or more audio signals from a contra-lateral hearing device or an auxiliary device). The hearing aid (HD) further comprises a substrate (SUB) whereon a number of electronic components are mounted, functionally partitioned according to the application in question (analogue, digital, passive components, etc.), but including a configurable signal processor (SPU) (e.g. comprising a processor for executing a number of processing algorithms, e.g. to compensate for a hearing loss of a wearer of the hearing device), a processor (PRO, cf. e.g. PRO1, PRO2 of FIG. 3) for extracting localization data according to the present disclosure, and a detector unit (DET), coupled to each other and to input and output transducers and wireless transceivers via electrical conductors Wx. Typically, a front end IC for interfacing to the input and output transducers, etc. is further included on the substrate. The mentioned functional units (as well as other components) may be partitioned in circuits and components according to the application in question (e.g. with a view to size, power consumption, analogue vs. digital processing, etc.), e.g. integrated in one or more integrated circuits, or as a combination of one or more integrated circuits and one or more separate electronic components (e.g. inductor, capacitor, etc.). The configurable signal processor (SPU) provides a processed audio signal, which is intended to be presented to a user. In the embodiment of a hearing device in FIG. 4, the ITE part (ITE) comprises an input transducer (e.g. a microphone) (IT.sub.2) for providing an electric input audio signal representative of an input sound signal from the environment (including from sound source S) at or in the ear canal.
In another embodiment, the hearing aid may comprise only the BTE-microphones (IT.sub.11, IT.sub.12). In another embodiment, the hearing aid may comprise only the ITE-microphone (IT2). In yet another embodiment, the hearing aid may comprise an input unit located elsewhere than at the ear canal in combination with one or more input units located in the BTE-part and/or the ITE-part. The ITE-part may further comprise a guiding element, e.g. a dome (DO) or equivalent, for guiding and positioning the ITE-part in the ear canal of the user.

(92) The hearing aid (HD) exemplified in FIG. 4 is a portable device and further comprises a battery, e.g. a rechargeable battery, (BAT) for energizing electronic components of the BTE- and possibly of the ITE-parts.

(93) In an embodiment, the hearing device (HD) of FIG. 4 forms part of a hearing system according to the present disclosure for localizing a target sound source in the environment of a user.

(94) The hearing aid (HD) may e.g. comprise a directional microphone system (including a beamformer filtering unit) adapted to spatially filter out a target acoustic source among a multitude of acoustic sources in the local environment of the user wearing the hearing aid, and to suppress noise from other sources in the environment. The beamformer filtering unit may receive as inputs the respective electric signals from input transducers IT.sub.11, IT.sub.12, IT.sub.2 (and possibly further input transducers) (or any combination thereof) and generate a beamformed signal based thereon. In an embodiment, the directional system is adapted to detect (such as adaptively detect) from which direction a particular part of the microphone signal (e.g. a target part and/or a noise part) originates. In an embodiment, the beamformer filtering unit is adapted to receive inputs from a user interface (e.g. a remote control or a smartphone) regarding the present target direction. A memory unit (MEM) may e.g. comprise predefined (or adaptively determined) complex, frequency dependent constants (W.sub.ij) defining predefined (or adaptively determined) or fixed beam patterns (e.g. omni-directional, target cancelling, pointing in a number of specific directions relative to the user), together defining a beamformed signal Y.sub.BF.

(95) The hearing aid of FIG. 4 may constitute or form part of a hearing aid and/or a binaural hearing aid system according to the present disclosure. The processing of an audio signal in a forward path of the hearing aid (the forward path including the input transducer(s), the signal processor, and the output transducer) may e.g. be performed fully or partially in the time-frequency domain. Likewise, the processing of signals in an analysis or control path of the hearing aid may be fully or partially performed in the time-frequency domain.

(96) The hearing aid (HD) according to the present disclosure may comprise a user interface UI, e.g. as shown in FIG. 5 implemented in an auxiliary device (AD), e.g. a remote control, e.g. implemented as an APP in a smartphone or other portable (or stationary) electronic device.

(97) FIG. 5 shows a second embodiment of a hearing system according to the present disclosure in communication with an auxiliary device. FIG. 5 shows an embodiment of a binaural hearing system comprising left and right hearing devices (HD.sub.left, HD.sub.right) and an auxiliary device (AD) in communication with each other according to the present disclosure. The left and right hearing devices are adapted for being located at or in left and right ears and/or for fully or partially being implanted in the head at left and right ears of a user. The left and right hearing devices and the auxiliary device (e.g. a separate processing or relaying device, e.g. a smartphone or the like) are configured to allow an exchange of data between them (cf. links IA-WL (localization data LOC.sub.left, LOC.sub.right, respectively) and AD-WL (control-information signals X-CNT.sub.left/right) in FIG. 5), including exchanging localization data, audio data, control data, information, or the like. The binaural hearing system comprises a user interface (UI) fully or partially implemented in the auxiliary device (AD), e.g. as an APP, cf. Source localization APP screen of the auxiliary device (AD) in FIG. 5. The APP allows a display of a current localization of a sound source S relative to the user (wearing the hearing system), and allows the user to control functionality of the hearing system, e.g. an activation or deactivation of source localization according to the present disclosure.

(98) The left and right hearing devices each comprise a forward path between M input units IU.sub.i, i=1, . . . , M (each comprising e.g. an input transducer, such as a microphone or a microphone system, and/or a direct electric input (e.g. a wireless receiver)) and an output unit (SP), e.g. an output transducer, here a loudspeaker. A beamformer or selector (BF) and a signal processor (SPU) are located in the forward path. In an embodiment, the signal processor is adapted to provide a frequency dependent gain according to a user's particular needs. In the embodiment of FIG. 5, the forward path comprises appropriate analogue to digital converters and analysis filter banks (AD/FBA) to provide input signals IN.sub.1, . . . , IN.sub.M (and to allow signal processing to be conducted) in frequency sub-bands (in the (time-) frequency domain). In another embodiment, some or all signal processing of the forward path is conducted in the time domain. The weighting unit (beamformer or mixer or selector) (BF) provides a beamformed or mixed or selected signal Y.sub.BF based on one or more of the input signals IN.sub.1, . . . , IN.sub.M. The function of the weighting unit (BF) is controlled via the signal processor (SPU), cf. signal CTR, e.g. influenced by the user interface (signal X-CNT) and/or the localization signals doa and r.sub.s representing direction of arrival and distance, respectively, to a currently active sound source in the environment (as determined according to the present disclosure). The forward path further comprises a synthesis filter bank and appropriate digital to analogue converter (FBS/DA) to prepare the processed frequency sub-band signals OUT from the signal processor (SPU) as an analogue time domain signal for presentation to a user via the output transducer (loudspeaker) (SP). The respective configurable signal processors (SPU) are in communication with the respective processors (PRO) for determining localization data (doa and r.sub.s) via signals ctr and LOC. The control signal ctr from unit SPU to unit PRO may e.g. allow the signal processor (SPU) to control a mode of operation of the system (e.g. via the user interface), e.g. to activate or deactivate source localization (or otherwise influence it). Data signals LOC may be exchanged between the two processing units, e.g. to allow localization data from a contra-lateral hearing device to influence the resulting localization data applied to the beamformer filtering unit (BF), e.g. exchanged via the link IA-WL (LOC.sub.left, LOC.sub.right). The interaural wireless link IA-WL for the transfer of audio and/or control signals between the left and right hearing devices may e.g. be based on near-field communication, e.g. magnetic induction technologies (such as NFC or proprietary schemes).

(99) FIG. 6 shows a third embodiment of a hearing system (HS) according to the present disclosure. FIG. 6 shows an embodiment of a hearing system according to the present disclosure comprising left and right hearing devices and a number of sensors mounted on a spectacle frame. The hearing system (HS) comprises a number of sensors S.sub.1i, S.sub.2i (i=1, . . . , N.sub.S) associated with (e.g. forming part of or connected to) left and right hearing devices (HD.sub.1, HD.sub.2), respectively. The first, second and third sensors S.sub.11, S.sub.12, S.sub.13 and S.sub.21, S.sub.22, S.sub.23 are mounted on a spectacle frame of the glasses (GL). In the embodiment of FIG. 6, sensors S.sub.11, S.sub.12 and S.sub.21, S.sub.22 are mounted on the respective sidebars (SB.sub.1 and SB.sub.2), whereas sensors S.sub.13 and S.sub.23 are mounted on the cross bar (CB) having hinged connections to the right and left side bars (SB.sub.1 and SB.sub.2). Glasses or lenses (LE) of the spectacles are mounted on the cross bar (CB). The left and right hearing devices (HD.sub.1, HD.sub.2) comprise respective BTE-parts (BTE.sub.1, BTE.sub.2), and may e.g. further comprise respective ITE-parts (ITE.sub.1, ITE.sub.2). The ITE-parts may e.g. comprise electrodes for picking up body signals from the user, e.g. forming part of sensors S.sub.1i, S.sub.2i (i=1, . . . , N.sub.S) for monitoring physiological functions of the user, e.g. brain activity or eye movement activity or temperature. The sensors (detectors, cf. detector unit DET in FIG. 3) mounted on the spectacle frame may e.g. comprise one or more of an accelerometer, a gyroscope, a magnetometer, a radar sensor, an eye camera (e.g. for monitoring pupillometry), etc., or other sensors for localizing or contributing to localization of a sound source of interest to the user wearing the hearing system.

(100) FIG. 7 shows an embodiment of a hearing system according to the present disclosure. The hearing system comprises a hearing device (HD), e.g. a hearing aid, here illustrated as a particular style (sometimes termed receiver-in-the ear, or RITE, style) comprising a BTE-part (BTE) adapted for being located at or behind an ear of a user, and an ITE-part (ITE) adapted for being located in or at an ear canal of the user's ear and comprising a receiver (loudspeaker, SPK). The BTE-part and the ITE-part are connected (e.g. electrically connected) by a connecting element (IC) and internal wiring in the ITE- and BTE-parts (cf. e.g. wiring Wx in the BTE-part). The connecting element may alternatively be fully or partially constituted by a wireless link between the BTE- and ITE-parts.

(101) In the embodiment of a hearing device in FIG. 7, the BTE part comprises three input units comprising respective input transducers (e.g. microphones) (M.sub.BTE1, M.sub.BTE2, M.sub.BTE3), each for providing an electric input audio signal representative of an input sound signal (S.sub.BTE) (originating from a sound field S around the hearing device). The input unit further comprises two wireless receivers (WLR.sub.1, WLR.sub.2) (or transceivers) for providing respective directly received auxiliary audio and/or control input signals (and/or allowing transmission of audio and/or control signals to other devices, e.g. a remote control or processing device). The input unit further comprises a video camera (VC) located in the housing of the BTE-part, e.g. so that its field of view (FOV) is directed in a look direction of the user wearing the hearing device (here next to the electric interface to the connecting element (IC)). The video camera (VC) may e.g. be coupled to a processor and arranged to constitute a scene camera for SLAM. The hearing device (HD) comprises a substrate (SUB) whereon a number of electronic components are mounted, including a memory (MEM) e.g. storing different hearing aid programs (e.g. parameter settings defining such programs, or parameters of algorithms (e.g. for implementing SLAM), e.g. optimized parameters of a neural network) and/or hearing aid configurations, e.g. input source combinations (M.sub.BTE1, M.sub.BTE2, M.sub.BTE3, M.sub.ITE1, M.sub.ITE2, WLR.sub.1, WLR.sub.2, VC), e.g. optimized for a number of different listening situations. The substrate further comprises a configurable signal processor (DSP, e.g. a digital signal processor, e.g. including a processor (e.g. PRO in FIG. 2A) for applying a frequency and level dependent gain, e.g. providing beamforming, noise reduction (including improvements using the camera), filter bank functionality, and other digital functionality of a hearing device according to the present disclosure). The configurable signal processor (DSP) is adapted to access the memory (MEM) and for selecting and processing one or more of the electric input audio signals and/or one or more of the directly received auxiliary audio input signals, and/or the camera signal based on a currently selected (activated) hearing aid program/parameter setting (e.g. either automatically selected, e.g. based on one or more sensors, or selected based on inputs from a user interface). The mentioned functional units (as well as other components) may be partitioned in circuits and components according to the application in question (e.g. with a view to size, power consumption, analogue vs. digital processing, etc.), e.g. integrated in one or more integrated circuits, or as a combination of one or more integrated circuits and one or more separate electronic components (e.g. inductor, capacitor, etc.). The configurable signal processor (DSP) provides a processed audio signal, which is intended to be presented to a user. The substrate further comprises a front-end IC (FE) for interfacing the configurable signal processor (DSP) to the input and output transducers, etc., and typically comprising interfaces between analogue and digital signals. The input and output transducers may be individual separate components, or integrated (e.g. MEMS-based) with other electronic circuitry.

(102) The hearing system (here, the hearing device HD) further comprises a detector unit comprising one or more inertial measurement units (IMU), e.g. a 3D gyroscope, a 3D accelerometer and/or a 3D magnetometer, here denoted IMU1 and located in the BTE-part (BTE). Inertial measurement units (IMUs), e.g. accelerometers, gyroscopes, and magnetometers, and combinations thereof, are available in a multitude of forms (e.g. multi-axis, such as 3D-versions), e.g. constituted by or forming part of an integrated circuit, and thus suitable for integration, even in miniature devices, such as hearing devices, e.g. hearing aids. The sensor IMU1 may thus be located on the substrate (SUB) together with other electronic components (e.g. MEM, FE, DSP). One or more movement sensors (IMU) may alternatively or additionally be located in or on the ITE part (ITE) or in or on the connecting element (IC).
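
As a minimal, hedged sketch of how such a movement sensor could be used for head-rotation tracking, the Python snippet below integrates a z-axis gyroscope rate into a yaw angle; the sampling rate, units and function name are assumptions for illustration, and a practical estimator would additionally fuse accelerometer/magnetometer data to limit drift.

    import numpy as np

    def integrate_yaw(gyro_z_dps: np.ndarray, fs_hz: float) -> np.ndarray:
        # Cumulative integration of the angular rate (deg/s) gives a head-yaw
        # track over time; drift compensation is omitted in this sketch.
        return np.cumsum(gyro_z_dps) / fs_hz

    # Example: a constant 30 deg/s head turn for 2 s, sampled at 100 Hz
    yaw = integrate_yaw(np.full(200, 30.0), fs_hz=100.0)
    print(yaw[-1])  # approx. 60 degrees of accumulated head rotation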

(103) The hearing device (HD) further comprises an output unit (e.g. an output transducer) providing stimuli perceivable by the user as sound based on a processed audio signal from the processor or a signal derived therefrom. In the embodiment of a hearing device in FIG. 7, the ITE part comprises the output unit in the form of a loudspeaker (also termed a receiver) (SPK) for converting an electric signal to an acoustic (air borne) signal, which (when the hearing device is mounted at an ear of the user) is directed towards the ear drum (Ear drum), where the sound signal (S.sub.ED) is provided. The ITE-part further comprises a guiding element, e.g. a dome (DO), for guiding and positioning the ITE-part in the ear canal (Ear canal) of the user. The ITE part (e.g. a housing or a soft or rigid or semi-rigid dome-like structure) comprises a number of electrodes or electric potential sensors (EPS) (EL1, EL2) for picking up signals (e.g. potentials or currents) from the body of the user, when mounted in the ear canal. The signals picked up by the electrodes or EPS may e.g. be used for estimating an eye gaze angle of the user (using EOG). The ITE-part further comprises two further input transducers, e.g. microphones (M.sub.ITE1, M.sub.ITE2), for providing respective electric input audio signals representative of the sound field (S.sub.ITE) at the ear canal.
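
Purely as an illustration of the EOG idea, the sketch below maps the differential potential between the two in-ear electrodes to a horizontal gaze angle via a calibrated linear model; the scale and offset values and the function name are hypothetical, not prescribed by the disclosure.

    import numpy as np

    def eog_gaze_angle_deg(v_el1_uv, v_el2_uv, scale_deg_per_uv=0.05, offset_uv=0.0):
        # Differential EOG potential (microvolts) between electrodes EL1 and EL2,
        # mapped to a gaze angle by an assumed per-user linear calibration.
        diff_uv = np.asarray(v_el1_uv) - np.asarray(v_el2_uv) - offset_uv
        return scale_deg_per_uv * diff_uv

    print(eog_gaze_angle_deg(250.0, -150.0))  # -> 20.0 (degrees)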

(104) An auxiliary electric signal derived from visual information from the video camera (VC) may be used in a mode of operation where it is combined with an electric sound signal from one or more of the input transducers (e.g. the microphones) to localize sound sources relative to the user. In another mode of operation, a beamformed signal is provided by appropriately combining electric input signals from the input transducers (M.sub.BTE1, M.sub.BTE2, M.sub.BTE3, M.sub.ITE1, M.sub.ITE2), e.g. by applying appropriate complex weights to the respective electric input signals (beamformer). In a mode of operation, the auxiliary electric signal is used as input to a processing algorithm (e.g. a single channel noise reduction algorithm) to enhance a signal of the forward path, e.g. a beamformed (spatially filtered) signal.
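
The following sketch illustrates one common way such complex weights could be formed and applied, namely a far-field delay-and-sum beamformer in the short-time Fourier transform domain; it is a simplified stand-in under free-field assumptions, not the specific beamformer of the disclosure, and the function names are assumptions.

    import numpy as np

    def delay_and_sum_weights(mic_pos_m, doa_unit_vec, freqs_hz, c=343.0):
        # Complex weights steering an M-microphone array towards the unit
        # direction-of-arrival vector (far-field, free-field model).
        delays_s = mic_pos_m @ doa_unit_vec / c                      # shape (M,)
        return np.exp(2j * np.pi * np.outer(freqs_hz, delays_s)) / mic_pos_m.shape[0]

    def beamform(X_fm, W_fm):
        # Apply the weights to the multichannel STFT X (frequency bins x microphones).
        return np.sum(W_fm.conj() * X_fm, axis=-1)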

(105) The electric input signals (from input transducers M.sub.BTE1, M.sub.BTE2, M.sub.BTE3, M.sub.ITE1, M.sub.ITE2) may be processed in the time domain or in the (time-) frequency domain (or partly in the time domain and partly in the frequency domain as considered advantageous for the application in question).
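
As a minimal sketch of processing in the time-frequency domain (using the SciPy STFT as a stand-in for the hearing device's analysis/synthesis filter bank; the sampling rate, segment length and dummy signal are arbitrary choices for illustration):

    import numpy as np
    from scipy.signal import stft, istft

    fs = 16000
    x = np.random.randn(fs)                   # 1 s dummy microphone signal

    f, t, X = stft(x, fs=fs, nperseg=128)     # analysis filter bank
    X_proc = X                                # per-bin processing (e.g. gains) goes here
    _, y = istft(X_proc, fs=fs, nperseg=128)  # synthesis back to the time domain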

(106) The hearing device (HD) exemplified in FIG. 7 is a portable device and further comprises a battery (BAT), e.g. a rechargeable battery, e.g. based on Li-Ion battery technology, e.g. for energizing electronic components of the BTE- and possibly ITE-parts. In an embodiment, the hearing device, e.g. a hearing aid, is adapted to provide a frequency dependent gain and/or a level dependent compression and/or a transposition (with or without frequency compression) of one or more frequency ranges to one or more other frequency ranges, e.g. to compensate for a hearing impairment of a user.
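
As a hedged illustration of a frequency and level dependent gain, the sketch below computes the gain in a single frequency band with a simple compressor above an assumed 50 dB SPL knee point; the knee, compression ratio and prescribed gain are hypothetical values, not a fitting prescription from the disclosure.

    import numpy as np

    def band_gain_db(level_db, gain_at_knee_db=20.0, knee_db=50.0, ratio=2.0):
        # Linear gain below the knee; compression with the given ratio above it.
        excess = np.maximum(level_db - knee_db, 0.0)
        return gain_at_knee_db - excess * (1.0 - 1.0 / ratio)

    print(band_gain_db(np.array([50.0, 70.0, 90.0])))  # -> [20. 10.  0.]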

(107) The hearing device in FIG. 7 may thus implement a hearing system comprising a combination of EOG (based on the EOG sensors (EL1, EL2), e.g. electrodes) for eye-tracking, a scene camera (VC) for SLAM, and movement sensors (IMU1) for motion tracking/head rotation.

(108) FIG. 8 shows a further embodiment of a hearing system according to the present disclosure. The hearing system comprises a spectacle frame comprising a number of input transducers, here 12 microphones: 3 on each of the left and right side bars, and 6 on the cross-bar. Thereby an acoustic image of (most of) the sound scene of interest to the user can be monitored. Further, the hearing system comprises a number of movement sensors (IMU), here two, one on each of the left and right side bars, for picking up movement of the user, incl. rotation of the user's head. The hearing system further comprises a number of cameras, here three, all located on the cross-bar. Two of the cameras (denoted Eye-tracking cameras in FIG. 8) are located and oriented towards the face of the user to allow monitoring of the user's eyes, e.g. to provide an estimate of a current eye gaze of the user. The third camera (denoted Front-facing camera in FIG. 8) is located in the middle of the cross-bar and oriented to allow it to monitor the environment in front of the user, e.g. in a look direction of the user.
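
For illustration, the known geometrical configuration of such a spectacle-mounted array could be represented simply as a table of microphone coordinates in a head-fixed frame; the coordinates below are invented placeholders, not measurements of the depicted frame.

    import numpy as np

    # Hypothetical microphone positions (metres, head-fixed frame: x right, y forward):
    # 3 on each side bar and 6 on the cross-bar of the spectacle frame.
    MIC_POS_M = np.array(
        [[-0.07, 0.12 - 0.03 * k] for k in range(3)] +   # left side bar
        [[+0.07, 0.12 - 0.03 * k] for k in range(3)] +   # right side bar
        [[-0.05 + 0.02 * k, 0.14] for k in range(6)]     # cross-bar
    )
    assert MIC_POS_M.shape == (12, 2)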

(109) The hearing system in FIG. 8 may thus implement a hearing system comprising a carrier (here in the form of a spectacle frame) configured to host at least some of the input transducers of the system (here 12 microphones) and a number of cameras (a scene camera, e.g. for Simultaneous Localization and Mapping (SLAM), and two eye-tracking cameras for eye gaze estimation). The hearing system may e.g. further comprise one or two hearing devices adapted to be located at the ears of a user (e.g. mounted on or connected to the carrier (spectacle frame)) and operationally coupled to the (12) microphones and the (3) cameras. The hearing system may thus be configured to localize sound sources in the environment of the user and to use this localization to improve the processing of the hearing device(s), e.g. to compensate for a hearing impairment of a user and/or to assist a user in a difficult sound environment.
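
To make the localization idea concrete, the sketch below triangulates a static source from direction-of-arrival estimates taken at different sensor-array positions (e.g. obtained as the user moves or turns the head) by a linear least-squares intersection of the bearing rays; it is a minimal 2-D illustration under idealized assumptions, not the estimator claimed in the disclosure.

    import numpy as np

    def triangulate_source(positions, doa_unit_vecs):
        # Least-squares intersection of bearing rays: at time t the sensor array
        # is at positions[t] (world frame) and the source is heard from the unit
        # direction doa_unit_vecs[t]; returns the point minimizing the summed
        # squared distance to all rays.
        A = np.zeros((2, 2))
        b = np.zeros(2)
        for p, d in zip(positions, doa_unit_vecs):
            P = np.eye(2) - np.outer(d, d)   # projector orthogonal to the ray
            A += P
            b += P @ p
        return np.linalg.solve(A, b)

    # Example: head movement gives three vantage points; the source sits at (2, 1).
    src = np.array([2.0, 1.0])
    positions = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.0, 0.5])]
    doas = [(src - p) / np.linalg.norm(src - p) for p in positions]
    print(triangulate_source(positions, doas))  # approx. [2. 1.]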

(110) It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.

(111) As used, the singular forms "a," "an," and "the" are intended to include the plural forms as well (i.e. to have the meaning "at least one"), unless expressly stated otherwise. It will be further understood that the terms "includes," "comprises," "including," and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.

(112) It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" or "an aspect" or features included as "may" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.

(113) The claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean "one and only one" unless specifically so stated, but rather "one or more." Unless specifically stated otherwise, the term "some" refers to one or more.

(114) Accordingly, the scope should be judged in terms of the claims that follow.
