Abstract
A hearing device adapted to be worn by a user and for picking up sound containing the user's own voice is provided. The hearing device comprises a) an input unit comprising first and second input transducers for converting sound to first and second electric input signals, respectively, representing said sound; b) a processor configured to receive said first and second electric input signals and to provide a combined signal as a linear combination of the first and second electric input signals, wherein the combined signal comprises an estimate of the user's own voice; c) wherein said hearing device is configured to provide that said first and second input transducers are located on said user at first and second locations, when worn by said user; and d) wherein said first and second locations are selected to provide that said first and second electric input signals exhibit substantially different directional responses for sound from the user's mouth as well as for sound from sound sources located in an environment around the user. A method of operating a hearing device is further disclosed. Thereby an improved quality of an own voice estimate may be provided.
Claims
1. A hearing device adapted to be worn by a user and for picking up sound containing the user's own voice, the hearing device comprising an input unit comprising first and second input transducers for converting sound to first and second electric input signals, respectively, representing said sound; a processor configured to receive said first and second electric input signals and to provide a combined signal as a linear combination of the first and second electric input signals, wherein the combined signal comprises an estimate of the user's own voice, and wherein said hearing device is configured to provide that said first and second input transducers are located on said user at first and second locations, when worn by said user; and wherein said first and second locations are selected to provide that said first and second electric input signals exhibit substantially different directional responses for sound from the user's mouth as well as for sound from sound sources located in an environment around the user.
2. A hearing device according to claim 1 wherein the processor comprises one or more beamformers each providing a spatially filtered signal by filtering and summing said first and second electric input signals, wherein one of said beamformers is an own voice beamformer and wherein said spatially filtered signal comprises an estimate of the user's own voice.
3. A hearing device according to claim 1 comprising an in-the-ear (ITE) part that provides an open fitting between the first and second locations.
4. A hearing device according to claim 1 wherein the first input transducer is located in an ear canal of the user facing the eardrum and wherein the second input transducer is located at or in said ear canal of the user facing the environment.
5. A hearing device according to claim 1 comprising an output unit comprising an output transducer, e.g. a loudspeaker, for converting an electric signal representing sound to an acoustic signal representing said sound.
6. A hearing device according to claim 5 wherein the output transducer is located in the hearing device between the first and second input transducers.
7. A hearing device according to claim 5 comprising an earpiece adapted to be located at or in an ear of the user, whereon or wherein said first input transducer and/or said output transducer is/are supported or located.
8. A hearing device according to claim 7 wherein said earpiece is configured to contribute to an at least partial sealing between the first and second locations.
9. A hearing device according to claim 8 comprising a sealing element configured to contribute to said at least partial sealing between the first and second locations.
10. A hearing device according to claim 1 comprising a transmitter, e.g. a wireless transmitter, configured to transmit said estimate of the user's own voice or a processed version thereof to another device or system.
11. A hearing device according to claim 1 comprising a keyword detector configured to receive said estimate of the user's own voice or a processed version thereof.
12. A hearing device according to claim 1 wherein said processor comprises a beamformer block configured to provide one or more beamformers each being configured to filter said first and second electric input signals, and to provide a spatially filtered (beamformed) signal, and wherein said one or more beamformers comprises an own voice beamformer comprising predetermined or adaptively updated own voice filter weights, wherein an estimate of the user's own voice is provided in dependence on said own voice filter weights and said first and second electric input signals.
13. A hearing device according to claim 1 comprising one or more further input transducers for providing one or more further electric signals representing sound in the environment of the user.
14. A hearing device according to claim 13 wherein at least one of said one or more further input transducers is located off a line through said first and second input transducers.
15. A hearing device according to claim 1 wherein said first and second input transducers comprise at least one microphone.
16. A hearing device according to claim 1 wherein said first and second input transducers comprise at least one vibration sensor, e.g. an accelerometer.
17. A hearing device according to claim 1 comprising a hearing aid, a headset, an earphone, an ear protection device or a combination thereof.
18. A hearing device adapted to be worn by a user and for picking up sound containing the user's own voice, the hearing device comprising an input unit comprising first and second input transducers for converting sound to first and second electric input signals, respectively, representing said sound; a processor configured to receive said first and second electric input signals and to provide a combined signal as a linear combination of the first and second electric input signals, wherein the combined signal comprises an estimate of the user's own voice, and wherein said hearing device is configured to provide that said first and second input transducers are located on said user at first and second locations, when worn by said user; and wherein said first and second locations are defined by properties of the respective first and second electric input signals being different in that they exhibit a difference in signal to noise ratio of an own voice signal ΔSNR_OV = SNR_OV,1 − SNR_OV,2 larger than an SNR-threshold TH_SNR, where SNR_OV,1 > SNR_OV,2, and where noise is taken to be all environmental acoustic signals other than those originating from the user's own voice.
19. A hearing device according to claim 18 comprising an in-the-ear (ITE) part that fully or partially (acoustically) blocks (occludes) the ear canal between the first and second locations.
20. A method of operating a hearing device adapted to be worn by a user and for picking up sound containing the user's own voice, the method comprising converting sound to first and second electric input signals, respectively, representing said sound using first and second input transducers; providing a spatially filtered signal by filtering and summing said first and second electric input signals, wherein said spatially filtered signal comprises an estimate of the user's own voice; providing that said first and second input transducers are located on said user at first and second locations, when worn by said user; and selecting said first and second locations to provide that said first and second electric input signals exhibit substantially different directional responses for sound from the user's mouth as well as for sound from sound sources located in an environment around the user.
21. A method according to claim 20 further comprising providing an open fitting between the first and second locations.
22. A method of operating a hearing device adapted to be worn by a user and for picking up sound containing the user's own voice, the method comprising converting sound to first and second electric input signals, respectively, representing said sound using first and second input transducers; providing a spatially filtered signal by filtering and summing said first and second electric input signals, wherein said spatially filtered signal comprises an estimate of the user's own voice; providing that said first and second input transducers are located on said user at first and second locations, when worn by said user; and selecting said first and second locations to provide that said first and second electric input signals exhibit a difference in signal to noise ratio of an own voice signal ΔSNR_OV = SNR_OV,1 − SNR_OV,2 larger than an SNR-threshold TH_SNR, where SNR_OV,1 > SNR_OV,2, and where noise is taken to be all environmental acoustic signals other than those originating from the user's own voice.
23. A method according to claim 22 further comprising providing that the ear canal between the first and second locations is fully or partially acoustically occluded.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter in which:
(2) FIG. 1A schematically shows first and second acoustic environments according to an aspect of the present disclosure and first exemplary first and second locations of first and second input transducers of a hearing device according to an embodiment of the present disclosure,
(3) FIG. 1B schematically shows second exemplary first and second locations of first and second input transducers of a hearing device according to an embodiment of the present disclosure,
(4) FIG. 1C schematically shows third exemplary first and second locations of first and second input transducers of a hearing device according to an embodiment of the present disclosure,
(5) FIG. 1D schematically shows fourth exemplary first and second locations of first and second input transducers of a hearing device according to an embodiment of the present disclosure, and
(6) FIG. 1E schematically shows fifth exemplary first and second locations of first and second input transducers of a hearing device according to an embodiment of the present disclosure,
(7) FIG. 2A schematically shows a first embodiment of an earpiece constituting or forming part of a hearing device according to the present disclosure, e.g. a headset or a hearing aid, configured to be located, at least partially, at or in an ear canal of a user, and
(8) FIG. 2B schematically shows a second embodiment of an earpiece constituting or forming part of a hearing device according to the present disclosure, e.g. a headset or a hearing aid, configured to be located, at least partially, at or in an ear canal of a user,
(9) FIG. 3 schematically shows an embodiment of a hearing device, e.g. a headset or a hearing aid, according to the present disclosure, the hearing device comprising an earpiece adapted to be worn in an ear canal of a user,
(10) FIG. 4A schematically shows a first embodiment of a hearing device according to the present disclosure, the hearing device comprising an earpiece comprising 1st and 2nd microphones adapted to be located in an ear canal of a user;
(11) FIG. 4B schematically shows a second embodiment of a hearing device according to the present disclosure, the hearing device comprising an earpiece comprising 1st and 2nd microphones adapted to be located in an ear canal of a user, the earpiece comprising a guiding or sealing element;
(12) FIG. 4C schematically shows a third embodiment of a hearing device according to the present disclosure, the hearing device comprising an earpiece comprising 1st and 2nd microphones, the earpiece being adapted to be located in an ear canal of a user, and the hearing device further comprising a (third) microphone located outside the ear canal (e.g. in concha);
(13) FIG. 4D schematically shows a fourth embodiment of a hearing device according to the present disclosure, the hearing device comprising an earpiece comprising a 1st microphone, the earpiece being adapted to be located in an ear canal of a user, and the hearing device further comprising a 2nd microphone located outside the ear canal (e.g. in concha);
(14) FIG. 4E schematically shows a fifth embodiment of a hearing device according to the present disclosure, the hearing device comprising an earpiece comprising a 1st microphone, the earpiece being adapted to be located in an ear canal of a user, and the hearing device further comprising a 2nd microphone located outside the ear canal (e.g. outside concha), e.g. on a boom arm, e.g. extending in a direction of the user's mouth,
(15) FIG. 5A shows a first embodiment of a microphone path of a hearing device from an input unit to a transmitter for providing an estimate of an own voice of a user wearing the hearing device and transmitting the estimate to another device or system, and
(16) FIG. 5B shows a second embodiment of a microphone path of a hearing device from an input unit to a transmitter for providing an estimate of an own voice of a user wearing the hearing device and transmitting the estimate to another device or system,
(17) FIG. 6 shows an embodiment of a headset or a hearing aid comprising own voice estimation and the option of transmitting the own voice estimate to another device, and to receive sound from another device for presentation to the user via a loudspeaker, e.g. mixed with sound from the environment of the user,
(18) FIG. 7A shows an embodiment of an adaptive beamformer filtering unit for providing a beamformed signal based on two microphone inputs,
(19) FIG. 7B shows an adaptive (own voice) beamformer configuration comprising an omnidirectional beamformer and a target cancelling beamformer, respectively, where the adaptation factor β(k) is determined based on smoothed versions of their output signals, and
(20) FIG. 7C shows an embodiment of an own voice beamformer including a post filter, e.g. for the telephone or headset mode illustrated in FIG. 6,
(21) FIG. 8A shows a top view of an embodiment of a hearing system comprising first and second hearing devices integrated with a spectacle frame,
(22) FIG. 8B shows a front view of the embodiment in FIG. 8A, and
(23) FIG. 8C shows a side view of the embodiment in FIG. 8A,
(24) FIG. 9 shows an embodiment of a hearing aid according to the present disclosure, and
(25) FIG. 10 shows an embodiment of a headset according to the present disclosure.
(26) The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.
(27) Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
DETAILED DESCRIPTION OF EMBODIMENTS
(28) The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
(29) The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
(30) The disclosure relates to hearing devices, e.g. headsets, headphones, hearing aids, ear protection devices, or combinations thereof, in particular to the pick-up of a user's own voice. In the present context, a ‘target signal’ is generally (unless otherwise stated) the user's own voice.
(31) In the present application, an own voice capturing system that captures the voice of the user and transfers it to an application (e.g. locally in the hearing device or in an external device or system) is provided. The capturing is achieved by using at least two input transducers, e.g. microphones. The conventional use of the at least two microphones is to apply spatial filtering (e.g. beamforming) or source separation (e.g. blind source separation, BSS) to the external sounds from the environment in order to separate unwanted acoustical signals (‘noise’) from wanted acoustical signals. In a ‘normal mode’ hearing aid application, target signals typically arrive from the frontal direction (e.g. to pick up the voice of a communication partner). In a headset application (or in a hearing aid with a telephone mode or a voice interface), target signals typically arrive from a direction towards the mouth of the user (to pick up the user's own voice).
(32) Placing input transducers (e.g. microphones) of the hearing device in or at the ear canal of the user, and e.g. (partially) sealing the ear canal towards the outside, offers some interesting opportunities, e.g. for own voice estimation. The input transducers (e.g. microphones) inside the ear canal will pick up own voice signals (OV). The quality of the signal (OV) will depend primarily on the seal of the ear canal. The present application provides a combination of in-ear input transducers (e.g. microphones or vibration sensors) with standard input transducers (e.g. microphones) located outside the (possibly) sealed off part of the ear canal, e.g. completely outside the ear canal (e.g. at or in or behind pinna, or further towards the user's mouth). The use of binaural in-ear microphones may also improve signal quality. The two types of locations of the input transducers provide wanted acoustical signals (own voice) that are highly correlated. In the sealed use case, the two types of input transducers (e.g. microphones, or an (external) microphone and an (internal) vibration sensor) also provide noise signals that tend to be uncorrelated.
(33) According to the present disclosure, an estimate of the user's own voice is provided from a linear combination of signals created by input transducers located in different acoustic environments (e.g. relying on bone conduction and air conduction, respectively). The possible symmetry of binaural detection (with regard to the location of the mouth) using input transducers at both ears of the user could greatly aid the quality of own-voice estimation. The environmental noise (unwanted noise) will not exhibit these symmetries. Hence an algorithm may distinguish wanted from unwanted acoustical signals by investigating the correlation between the signals experienced by input transducers located in two different acoustic environments, e.g. outside and inside an ear canal of the user, e.g. outside and inside a seal of the ear canal, as sketched below. The present disclosure may e.g. rely on standard beamforming procedures, e.g. the MVDR formalism, to determine linear filters or beamformer weights to extract the user's voice from the electric input signals.
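As a non-limiting illustration of the correlation-based discrimination described above, the following Python sketch flags a signal frame as own voice when the normalized cross-correlation between the two electric input signals exceeds a threshold. The function name, frame length and threshold are illustrative assumptions, as is the simplification that the two signals are already time-aligned and spectrally equalized (e.g. compensating the bone-conduction coloring):

import numpy as np

def own_voice_flags(x_inner, x_outer, frame_len=256, threshold=0.6):
    # Frame-wise own-voice detection: own voice is highly correlated at the
    # two input transducers, whereas environmental noise at a (partially)
    # occluded inner transducer tends to be uncorrelated with the outer one.
    n_frames = min(len(x_inner), len(x_outer)) // frame_len
    flags = np.zeros(n_frames, dtype=bool)
    for m in range(n_frames):
        a = x_inner[m * frame_len:(m + 1) * frame_len]
        b = x_outer[m * frame_len:(m + 1) * frame_len]
        denom = np.sqrt(np.sum(a * a) * np.sum(b * b)) + 1e-12
        rho = abs(np.sum(a * b)) / denom  # normalized cross-correlation, in [0, 1]
        flags[m] = rho > threshold        # high correlation -> own voice dominant
    return flags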
(34) The hearing device comprises at least two (first and second) input transducers (e.g. microphones or vibration sensors) located at, in or near an ear of the user. The first and/or second input transducers may be located at or in an ear canal of the user, or elsewhere on the head of the user. The first and second input transducers provide first and second electric input signals, respectively. FIGS. 1A-1E below illustrate a number of exemplary first and second locations of the first and second input transducers, respectively. The first and second locations of the first and second input transducers, when the hearing device is (operationally) located at the ear of the user, are achieved by appropriate adaptation of the hearing device (considering the form and dimensions of the human ear, e.g. specifically adapted to the user's ear). The first and second locations may be selected (and the hearing device specifically adapted) to provide that the first and second input transducers experience first and second different acoustic environments, when the hearing device is mounted on the user. The first and second electric input signals may advantageously be used in combination to provide an estimate of the user's own voice (e.g. based on correlation between, and/or filtering and subsequent summation of, the first and second electric input signals).
(35) In the embodiments of FIG. 1A-1E, the transducers from acoustic sound to electric signals representing the sound are denoted ‘input transducers’. The input transducers may e.g. be embodied as microphones or vibration sensors depending on the application, e.g. one being a microphone (e.g. the second input transducer) and the other being a vibration sensor (e.g. the first input transducer), or both of the first and second input transducers being microphones. The microphones may e.g. be omni-directional microphones. Directional microphones may, however, be used depending on the application (e.g. the second input transducer may be a directional microphone having a preferred direction towards the user's mouth, when the hearing device is worn by the user). A vibration sensor may e.g. comprise an accelerometer. It may be beneficial that a vibration sensor is located so that it is in direct or indirect contact with the skin (in the soft or bony part) of the ear canal (or elsewhere on the head of the user).
(36) In the embodiments of FIG. 1A-1E, only two input transducers are shown. This is the minimum number, but it is not intended to (necessarily) limit the number of input transducers to two. Other embodiments may exhibit three or more input transducers. The additional (one or more) input transducers may be located in the first acoustic environment or in the second acoustic environment. However, one or more additional input transducers may be located in both acoustic environments (e.g. one in the first and one in the second acoustic environment, etc.). For example, it may be advantageous (e.g. for a headset application) to include a number of further input transducers (e.g. microphones) in a direction towards the user's mouth, e.g. as a linear array of microphones located on an earpiece or a separate carrier (e.g. to increase the (own voice) SNR experienced by such input transducers). It may be further advantageous to include a number of additional input transducers (e.g. microphones) in the ear canal. Additional microphones in the ear canal may be used to estimate an ear canal geometry and/or to detect possible leaks of sound from the ear canal. Further, an improved calibration of beamformers, e.g. of an own voice beamformer, e.g. to provide personalized linear filters or beamformer weights, can be supported by microphones located in the ear canal.
(37) In the embodiments of FIG. 1A-1E, a ‘Transition region’ between the first and second acoustic environments is indicated by a solid ‘line’. The transition region may e.g. be implemented by creating a minimum distance in the ear canal (e.g. ≥5 mm, or ≥10 mm, or ≥20 mm, e.g. in the region between 5 mm and 25 mm, e.g. between 10 mm and 20 mm), to thereby change the acoustic conditions of an acoustic signal impinging on an input transducer located on each side of the transition region (e.g. its directional properties, and/or its spectral properties, and/or its SNR). The transition region may e.g. be implemented by an object which fully or partially occludes the ear canal, e.g. an ITE-part (e.g. an earpiece). The object may e.g. comprise a sealing element. The sealing element may be partially open (i.e. e.g. comprise one or more openings allowing a certain exchange of air and sound with the environment, e.g. to decrease a sense of occlusion by the user).
(38) FIG. 1A schematically shows an ear canal (‘Ear canal’) of the user wearing a hearing device. The hearing device is not shown in FIG. 1A (see instead FIGS. 2A, 2B). For simplicity, the ear canal is shown as a straight cylindrical opening in pinna from the environment to the eardrum (‘Eardrum’). In reality, the ear canal may have a non-cylindrical extension, exhibit a varying cross-section, and have a curved extension between its opening and the eardrum. The walls of the first, relatively soft (‘fleshy’), part of the ear canal (closest to the ear canal opening) are denoted ‘Skin/tissue’ in FIGS. 1A-1E (and 2A, 2B), whereas the walls of the relatively hard part of the ear canal are denoted ‘bony part’ in FIGS. 1A-1E (and 2A, 2B). The vertical parts of the outer ear (pinna or auricle) denoted ‘Skin/tissue/bone’ in FIG. 1A-1E define the ear canal opening (‘aperture’, e.g. visualized by (virtually) connecting opposite parts of the vertical outer walls close to the opening). The bony parts of the outer ear close to the ear canal opening (e.g. close to tragus) may serve as a location for an input transducer (e.g. a vibration sensor) configured to pick up bone-conducted sound.
(39) An ear canal opening may be used as a reference point for the location of the input transducers (e.g. microphones) of the hearing device, e.g. the first input transducer may be located on the internal side of the ear canal opening (and/or on a bony part of the head), termed ‘a 1st acoustic environment’ in FIG. 1A. The 1st acoustic environment (indicated by a cross-hatched filling) may be characterized by the availability of the user's own voice in a bone conducted version (that may be spectrally distorted, e.g. above a threshold frequency, e.g. 2 kHz-3 kHz), cf. indication in FIG. 1A ‘Own voice (bone conducted)’ next to a dashed arrow denoted ‘Direction towards mouth (Own voice)’. The 2nd input transducer may be located on the external side (FIG. 1A) or on the internal side (FIG. 1B) of the ear canal opening, but further towards the environment than the first input transducer. The 2nd acoustic environment (indicated by a quadratically hatched filling) may be characterized by the availability of the user's own voice in an air-borne version (that is spectrally (substantially) undistorted, or at least less spectrally distorted than in the 1st acoustic environment), cf. indication in FIG. 1A ‘Own voice (air borne)’ next to a dashed arrow denoted ‘Direction towards mouth (Own voice)’. The 2nd acoustic environment may be extended to the volume around the ear where an air borne version of the user's own voice can be received with a level above a threshold level (or an SNR above a threshold SNR). In the present context, ‘internal side’ is taken to mean towards the eardrum, and ‘external side’ is taken to mean towards the environment, as seen from the ear canal opening (e.g. from a reference point thereon), see e.g. FIG. 1A. The first and second input transducers may both be located in the ear canal (i.e. on the internal side of the ear canal opening), cf. e.g. FIG. 1B, FIG. 4A, 4B, 4C. Such location may benefit from a good sealing between the first and second acoustic environments.
(40) The ear canal opening is in the present context taken to be defined by (e.g. a center point of) a typically oval cross section where the ear canal joins the outer ear (pinna), cf. e.g. FIG. 1A-1E.
(41) FIG. 1B shows a further exemplary configuration of locations of first and second transducers in or around an ear canal of the user. The configuration of FIG. 1B is similar to the configuration of FIG. 1A apart from the fact that the location of the second transducer is shifted further towards the ear drum, to be located just inside the ear canal opening. Thereby an earpiece located fully in the ear canal (see e.g. FIG. 3) can be implemented, while still maintaining the advantages of the respective first and second acoustic environments. To provide optimal own voice estimation according to the present disclosure, this location of the second input transducer may benefit from a sealing between the first and second acoustic environments, e.g. using a sealing element around the earpiece housing to make a tight fit to the walls of the ear canal, see e.g. FIG. 4B.
(42) FIG. 1C shows a further exemplary configuration of locations of first and second transducers in or around an ear canal of the user. The configuration of FIG. 1C is similar to the configuration of FIG. 1A apart from the fact that the location of the second transducer is shifted towards the mouth of the user, so that the location (‘2nd location’) of the second input transducer is outside the ear canal, in the ear (pinna), e.g. near tragus or antitragus. This has the advantage that the first and second acoustic environments can be fully exploited. The second input transducer (e.g. a microphone) is located closer to the user's mouth and will be exposed to an improved SNR for air-borne reception of the user's own voice. The second input transducer may alternatively be located elsewhere in pinna (e.g. in the upper part of concha, or at the top of pinna, such as e.g. in a BTE-part of a hearing device, such as a hearing aid).
(43) FIG. 1D shows a further exemplary configuration of locations of first and second transducers in or around an ear of the user. The configuration of FIG. 1D is similar to the configuration of FIG. 1C apart from the fact that the location of the first input transducer (IT1) is outside the ear canal, located at or behind pinna (or elsewhere), in contact with bone of the skull, e.g. the mastoid bone. The first input transducer (IT1) may preferably be implemented as a vibration sensor to fully exploit the advantages of bone conduction (e.g. originating from the user's mouth and comprising at least a spectral part of the user's own voice).
(44) FIG. 1E shows a further exemplary configuration of first and second transducers in first and second acoustic environments around an ear of the user wearing the hearing device. The configuration of FIG. 1E is similar to the configuration of FIG. 1A apart from the fact that both transducers are shifted further towards the environment. The first input transducer (IT1) is located in the ear canal (‘1st location’, in a ‘1st acoustic environment’) a distance L(IT1) from the ear canal opening. The second input transducer (IT2) is located outside the ear canal (‘2nd location’, in a ‘2nd acoustic environment’) a distance L(IT2) from the ear canal opening. The distances L(IT1) and L(IT2) may be different. L(IT1) may be larger than L(IT2). The distances L(IT1) and L(IT2) may, however, be essentially equal, each being e.g. in the range between 5 mm and 15 mm, e.g. between 5 mm and 10 mm. This configuration may have the advantage that the second input transducer, e.g. a microphone, is located (just) outside the ear canal to fully provide the benefit of air-borne sound (incl. from the user's mouth), while also getting the benefits of the acoustical properties of the ear (pinna). Further, the location of the first input transducer (e.g. a microphone) just inside the opening of the ear canal has the advantage of avoiding an earpiece that extends deep into the ear canal (a shallow construction), while still having the benefit of the first acoustic environment (providing an own voice signal with a good SNR).
(45) It is the intention that the configurations of FIG. 1A-1E can be provided with extra input transducers located at other relevant positions inside or outside the ear canal. It is further the intention that the exemplary configurations can be mixed where appropriate (e.g. so that a configuration comprises a vibration sensor located at the mastoid bone, as well as a microphone in the 1st acoustic environment of the ear canal).
(46) FIGS. 2A and 2B illustrate respective first and second embodiments of an earpiece constituting or forming part of a hearing device according to the present disclosure, e.g. a headset or a hearing aid, configured to be located, at least partially, at or in an ear canal of a user.
(47) The embodiments of a hearing device (HD) illustrated in FIG. 2A and FIG. 2B each comprise first and second microphones (M1, M2), a loudspeaker (SPK), a wireless transceiver (comprising receiver (Rx) and transmitter (Tx)) and a processor (PRO). The processor (PRO) may be connected to the first and second microphones, to the loudspeaker and to the transceiver (Rx, Tx). The processor (PRO) may be configured to (at least in a specific communication mode of operation) generate an estimate of the user's own voice (signal ‘To’) based on the first and second electric input signals from the first and second microphones (M1, M2), and to feed it to the transmitter (Tx) for transmission to another device or application. The processor may thus e.g. comprise a noise reduction system comprising a beamformer (e.g. an MVDR beamformer) for estimating the user's own voice in dependence of the first and second (and possibly more) electric input signals. The processor (PRO) may further be configured to (at least in a specific communication mode of operation) (possibly process and) feed a signal (‘From’) received from another device or application via the receiver (Rx) to the loudspeaker (SPK) for presentation to the user of the hearing device.
(48) In the embodiments of FIGS. 2A and 2B, the first microphone (M1) is located in an earpiece or ITE-part (denoted HD in FIGS. 2A, 2B) (constituting or forming part of the hearing device) adapted for reaching at least partially into the ear canal (‘Ear canal’) of the user. The location of the first microphone in the earpiece may (in principle) be (at least partially) open for sound propagation from or towards the environment. However, in the embodiments of FIGS. 2A and 2B, the location of the first microphone (M1) in the earpiece is (at least partially) closed (e.g. sealed) for sound propagation from or towards the environment (cf. ‘Environment sound’). The earpiece (HD) may comprise a sealing element (‘Seal’) and a guiding element (‘Guide’, FIG. 2A). The sealing element is intended to make a tight fit (seal) of the housing of the earpiece to the walls of the ear canal. Thereby a volume between the earpiece and the eardrum (‘Eardrum’), termed the residual volume (‘residual volume’), is at least partially sealed from the environment (outside the ear canal (‘Ear canal’)). This volume is (in the embodiments of FIGS. 2A, 2B) termed the ‘1st acoustic environment’ (cf. also FIG. 1A-1E). The part of the earpiece facing the eardrum may comprise a ventilation channel (‘Vent’) having an opening (‘Vent opening’) located in the housing of the earpiece closer to the ear canal opening than the sealing element (‘Seal’), allowing a limited exchange of air (and sound) between the residual volume and the environment to thereby reduce the user's (annoying) sensation of occlusion. The vent opening may alternatively be located closer to the eardrum than the seal, if the seal allows some exchange of air and sound with the environment (or if other parts of the construction allow such exchange).
(49) In FIG. 2A, the (optional) guiding element (‘Guide’) may be configured to guide the earpiece (e.g. in collaboration with the sealing element) so that it can be inserted into the ear canal in a controlled manner, e.g. so that it is centered along a central axis of the ear canal. The guiding element may be made of a flexible material allowing a certain adaptation to variations in the ear canal cross section. The guiding element may comprise one or more openings allowing air (and sound) to pass it. Alternatively, the guiding element (as well as the seal) may be made of a relatively rigid material.
(50) The loudspeaker (SPK) is located in the earpiece (HD) to play sound towards the eardrum into the residual volume (‘Ear canal (residual volume)’). A loudspeaker outlet (‘SPK outlet’) directs the sound towards the eardrum. Instead of (or in addition to) the loudspeaker, the hearing device (HD) may comprise a vibrator for transferring stimuli as vibrations of skull-bone or a multi-electrode array for electric stimulation of the hearing nerve.
(51) In the embodiments of FIGS. 2A and 2B, the first microphone (M1) is located in a loudspeaker outlet (‘SPK outlet’) and is configured to pick up sound from the 1st acoustic environment (including the residual volume), e.g. provided to the residual volume as bone conducted sound, e.g. from the user's mouth (own voice). In the embodiments of FIGS. 2A and 2B, the loudspeaker is located between the first and second microphones.
(52) The first microphone (M1) may be substituted by a vibration sensor e.g. located at the same position as the first microphone, or in direct or indirect contact with the skin in the soft or bony part of the ear canal (the vibration sensor, e.g. comprising an accelerometer, being particularly adapted to pick up bone conducted sound). In another embodiment, the first microphone (M1) may be substituted (or supplemented) by a vibration sensor located outside the ear canal at a location suited to pick up bone conducted sound from the user's mouth, e.g. at an ear of the user in a mastoid part of the temporal bone, or e.g. near the bony part of the ear canal, cf. e.g. FIG. 1D.
(53) In the embodiment of FIG. 2A, the second microphone (M2) is located in the earpiece (HD) near (just outside) the opening of the ear canal (‘Ear canal opening’), e.g. so that the directional cues and filtering effects of the outer ear (pinna) are substantially maintained (e.g. more than 50% maintained), and so that the user's own voice is received (mainly) as air conducted sound (and so that its frequency spectrum is substantially undistorted). A location exhibiting the mentioned properties is denoted a ‘second acoustic environment’ (different from the ‘first acoustic environment’). In the embodiment of FIG. 2A, the second microphone is located so that it faces the environment outside the ear canal, e.g. in a microphone inlet (‘M2 inlet’). In the embodiment of FIG. 2B, the first and second microphones (M1, M2) (and the loudspeaker (SPK) located therebetween) of FIG. 2A are moved outwards, away from the eardrum in a direction towards the environment, as also illustrated and discussed in connection with FIG. 1E. However, in the embodiment of FIG. 2B, the ‘second’ microphone (M2), aimed at receiving a good quality, air-borne own voice signal, is moved to the bottom surface of the outer part of the earpiece (and the location of the second microphone in FIG. 2A is ‘occupied’ by an additional, third microphone, M3).
(54) The embodiment of a hearing device shown in FIG. 2B comprises the same elements as the embodiment of FIG. 2A. In FIG. 2B, the earpiece has an external part that has a larger cross section than the ear canal (opening). The earpiece is still configured to be partially inserted into the ear canal (but not as deeply as the embodiment of FIG. 2A). The external part comprises partly open sealing elements (‘(open) Seal’, indicated by ‘zebra-stripes’) adapted to contact the user's skin around (and in) the ear canal opening to make a comfortable and partially open fitting to the user's ear. The part of the earpiece adapted to extend into the ear canal when worn by the user comprises another sealing element (‘Seal’, indicated by black filling) adapted to make a tight(er) fit (and to guide the earpiece in the ear canal). In addition to the first and second microphones (M1, M2), the earpiece comprises third and fourth microphones (M3, M4) located near the outer surface of the earpiece facing the environment. The third and fourth microphones may be used for picking up sound from the (far-field) acoustic environment of the user (particularly relevant for a hearing aid application). The hearing device, e.g. the processor (PRO), may comprise one or more beamformers each providing a spatially filtered signal by filtering and summing at least two of the first, second, third and fourth electric input signals, wherein one of the beamformers is an own voice beamformer and wherein the spatially filtered signal comprises an estimate of the user's own voice. Another beamformer may be aimed at a target or noise signal in the environment (e.g. in a particular mode of operation), e.g. aimed at cancelling such target or noise signal or at maintaining such target signal (e.g. from a communication partner in the environment). Owing to the microphone inlets, the microphones, although inherently omni-directional, provide microphone signals that exhibit a degree of directionality. In particular, the second microphone M2, configured to pick up the user's own voice, has the advantage of being directed towards the user's mouth.
(55) In an embodiment, the earpiece has only two microphones (M1, M2), e.g. located as outlined in FIG. 1E.
(56) The second microphone (M2) may in another embodiment be located in the ear canal away from its opening (‘Ear canal opening’) in a direction towards the eardrum, e.g. confined to the soft (non-bony) part of the ear canal, e.g. less than 10 mm from the opening (cf. e.g. FIG. 4A, 4B, 4C).
(57) In general, the second microphone (M2) may be located a distance away from the first microphone (M1), e.g. in the same physical part of the hearing device (e.g. an earpiece) as the first microphone (as e.g. shown in FIG. 2A, 2B, and FIG. 3), e.g. so that the first and second microphones are located on a line parallel to a ‘longitudinal direction of the ear canal’ (cf. e.g. FIG. 1A, 1B, 1E, 2A, 2B). The second microphone (M2) may, however, be located in an ATE-part (ATE=At the ear) separate from the earpiece. The ATE-part may be adapted to be located outside the ear canal, e.g. in concha (cf. e.g. FIG. 1C, 1D, 4C, 4D), or at or behind pinna or elsewhere at or around the ear (pinna), e.g. on a boom arm reaching towards the mouth of the user (e.g. FIG. 4E), when the hearing device is mounted (ready for normal operation) on the user.
(58) The hearing device of FIG. 2A, 2B may represent a headset as well as a hearing aid.
(59) The distance between the first and second input transducers, e.g. microphones (M1, M2), may be in the range from 5 mm to 100 mm, such as between 10 mm and 50 mm, or between 10 mm and 30 mm.
(60) The hearing device (HD) may comprise three or more input transducers, e.g. microphones, e.g. one or more located on a boom arm pointing towards the user's mouth (such microphone(s) being e.g. located in the 2nd acoustic environment). Two of the at least three microphones may be located around and just outside, respectively, the ear canal opening, e.g. 10-20 mm outside (in the 2nd acoustic environment). Two of the at least three microphones may e.g. be located in the ear canal relatively close to the ear drum, e.g. in the 1st or 2nd acoustic environment.
(61) The first microphone may be located at or in the ear canal. The first microphone may be located closer to the ear drum than the second microphone. The second microphone may be located closer to the ear drum than a third microphone, etc.
(62) The first and second microphones may be located at or in the ear canal of the user so that they experience first and second acoustic environments, wherein the first and second acoustic environments are at least partially acoustically isolated from each other when the user wears the hearing device, e.g. a headset. In the below table, internal and external may refer to first and second, respectively.
(63) Properties (in a relative sense) of the first (‘internal’) and second (‘external’) input transducers:
(64)
                         Spectral shape (‘coloring’)    SNR    Noise
   Internal (1st) mic.               −                   +     + (point-like)
   External (2nd) mic.               +                   −     − (diffuse)
(65) The first (internal) input transducer signal has the advantage of a good SNR (some of the noise from the environment has been filtered out by the directional properties of the outer ear and head and possibly torso), and the noise source (cf. ‘Noise’ in the table) will hence be more localized (point-like), which facilitates its attenuation by a null (or minimum) of the beamformer in the direction away from the ear (e.g. perpendicular to the side of the head, and definitely not in a direction of the mouth, so the chance of (accidentally) attenuating the target signal is minimal). The spectral shape (coloring) of the signal from the first input transducer may, however, be poorer (e.g. confined to lower frequencies, e.g. less than 2 or 3 kHz), depending on the actual location (depth) in the ear canal and the degree of sealing of the first input transducer, and may thus sound un-natural, if listened to. The first electric input signal from the first (internal) input transducer may experience a boost in dependence on leakage and residual volume. This boost is therefore difficult to “calibrate”.
(66) The second (‘external’ (or ‘less internal’)) input transducer signal has the advantage of a good spectral shape that makes it more pleasant for a (far-end) listener to listen to, but it has the downside of being ‘polluted’ by noise from the environment (which may be at least partially removed by spatial filtering (beamforming) and optionally post-filtering). But compared to the first input transducer, the second input transducer may experience a more diffuse noise distribution.
(67) The hearing device may preferably comprise a beamformer, e.g. an MVDR beamformer, configured to provide an estimate of the user's voice based on beamformer weights applied to the first and second electric input signals. A property of an MVDR beamformer is that it will always provide a beamformed signal that exhibits an SNR that is larger than or equal to any of the input signals (it does not destroy SNR). In the present case, the ‘external’ (second) input transducer may preferably be the reference microphone for which a ‘distortionless response’ is provided by the MVDR-beamformer.
(68) The filter weights (w) of the MVDR-beamformer may be adaptively determined. Typically, the noise field (e.g. represented by a noise covariance matrix C_v) is updated during speech pauses of the user (no own voice), or speech pauses in general (no voice). The transfer functions d_ov,i from the user's mouth to each of the at least two microphones (i=1, . . . , M, M≥2) may be determined in advance of use of the hearing device or be adaptively determined during use (e.g. when the hearing device is powered up or repeatedly during use), when the user's own voice is present (and preferably when the noise level is below a threshold value). The transfer functions d_ov,i may be represented by a look vector d_ov=(d_ov,1, . . . , d_ov,M)^T, where superscript T indicates transposition.
(69) In case the first input transducer is in acoustic communication with the environment, the MVDR-beamformer may rely on a predetermined look vector (e.g. determined in advance of use of the hearing device). In case the first input transducer is occluded (substantially (acoustically) sealed off from the environment), the look vector of the MVDR-beamformer may be adaptively updated.
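The MVDR computation referred to above can be summarized, per frequency band, as w = C_v^{-1} d_ov/(d_ov^H C_v^{-1} d_ov). A minimal Python sketch is given below, assuming the noise covariance matrix C_v and the look vector d_ov have been estimated as described; the normalization to the second (‘external’) input transducer as distortionless reference follows the preference stated above, while the function and variable names are illustrative:

import numpy as np

def mvdr_weights(C_v, d_ov, ref=1):
    # MVDR weights for one frequency band (M microphones):
    #   w = C_v^{-1} d / (d^H C_v^{-1} d), with d normalized to the reference
    #   microphone so that w^H d = 1 (distortionless response at the reference).
    d = d_ov / d_ov[ref]
    Cinv_d = np.linalg.solve(C_v, d)   # C_v^{-1} d, avoiding an explicit inverse
    return Cinv_d / (d.conj() @ Cinv_d)

# The own-voice estimate in band k is then Y_OV(k) = w(k)^H x(k), where x(k)
# holds the (complex) microphone signals of band k.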
(70) FIG. 3 shows an embodiment of a hearing device, e.g. a headset or a hearing aid, according to the present disclosure. The hearing device (HD) of FIG. 3 comprises or is constituted by an earpiece configured to be inserted into an ear canal of a user. The hearing device comprises three microphones (M1, M2, M3), a loudspeaker (SPK), a processor (PRO) and first and second beamformers (OV-BF, ENV-BF) for providing, respectively, an estimate of the user's own voice and, optionally, an estimate of a sound signal from the environment, e.g. a target speaker (e.g. activated in two different modes of operation). The hearing device (HD) may further comprise respective transmitters (Tx) and receivers (Rx) for transmitting the estimate of the user's voice (OV_est) to another device and for receiving a signal representative of sound (FEV) from another device, respectively. The first microphone (M1) is located in the earpiece at an eardrum-facing surface suitable for picking up sound from the residual volume (‘Residual volume’). The second and third microphones (M2, M3) are located in the earpiece at an environment-facing surface suitable for picking up sound from the environment. The own voice beamformer (OV-BF) is configured to provide the (spatially filtered) estimate of the user's own voice, e.g. based on the three electric input signals from the three microphones (M1, M2, M3), or at least from M1, M2. The environment beamformer (ENV-BF) is e.g. configured to provide the estimate of sound from the environment based on the second and third microphones (M2, M3). The earpiece of the hearing device (HD) of FIG. 3 is shown to follow the (schematic) form of the ear canal of the user (e.g. due to customization of the earpiece). Thereby an improved estimate of the user's own voice may be provided. The earpiece may comprise a ventilation channel (e.g. an (electrically) controllable ventilation channel).
(71) FIGS. 4A-4E show embodiments of a hearing device HD, e.g. a hearing aid or a headset, or an ITE-part (earpiece) thereof, in the context of own voice estimation. Only the input transducers are shown in the ITE-part of the hearing device of FIG. 4A-4E to focus on their number and location, while other components of the hearing device are implicit, e.g. located in other parts of the hearing device, e.g. a BTE-part (see e.g. FIG. 9). The electric input signals provided by the shown microphones are assumed to be used as inputs to a beamformer (e.g. an MVDR beamformer) for providing the estimate of the user's own voice. An example of a block diagram of such an own voice beamformer is shown in FIG. 7C. The possible symmetry of binaural in-ear microphones (i.e. microphones located at or in left and right ears, respectively) may improve the quality of the own voice estimate.
(72) The hearing device of FIG. 4A comprises first and second microphones (M1, M2). The first microphone (M1) is located in the earpiece closer to the ear drum (‘eardrum’) than the second microphone (M2). The earpiece partially occludes the ear canal, thereby creating a separation between first and second acoustic environments for the first and second microphones. Thereby, the first microphone (M1) is predominantly exposed to a bone conducted version of the user's own voice, while the second microphone (M2) is predominantly exposed to an air borne version of the user's own voice.
(73) In the embodiment of FIG. 4B, the earpiece further comprises a guide or seal (‘Guide/seal’) configured to at least partially seal a residual volume (1st acoustic environment), wherein the first microphone (M1) is located, from the environment (2nd acoustic environment), where the second microphone (M2) is located. The earpiece/ITE-part may further be customized to the ear canal of the user, e.g. to thereby increase the effect of the sealing (i.e. to minimize leakage) between the housing and the walls (‘Skin/tissue’) of the ear canal (‘Ear canal’). Sound from an external sound source (e.g. in the acoustic far field of the user) is indicated by S_ENV. Sound from the user's mouth is indicated by a solid arrow denoted S_OV. By the seal and possible customization of the earpiece, the differences between the properties of the 1st and 2nd environments will be enhanced and the quality of the own voice estimate may be increased.
(74) In the embodiment of FIG. 4C, the hearing device comprises a third microphone (M3) in addition to the first and second microphones of the embodiment of FIG. 4A or 4B. The third microphone is located in a direction towards the mouth of the user, and thus in the 2nd acoustic environment, aimed at picking up air-borne signals, including such signals from the user's mouth. FIG. 4C does not include a seal, but a seal between a housing of the ITE-part of the hearing device and the walls of the ear canal would improve the isolation between the 1st and 2nd environments (cf. structure ‘Guide/seal’ in FIG. 4B or ‘Guide’, ‘Seal’ in FIG. 2A). The same can be said of the embodiment of FIG. 4D. Dependent on the sealing effect of the hearing device, the first microphone M1 facing the eardrum has a significantly higher SNR compared to the second and third microphones M2, M3 facing the environment.
(75) The embodiment of FIG. 4D is identical to the embodiment of FIG. 4C except that it only contains two microphones (M1, M2). In the embodiment of FIG. 4D, the second microphone (M2) is located in a direction towards the mouth of the user (at the location of the additional third microphone of the embodiment of FIG. 4C). Again, the second microphone (M2) is located in a 2nd acoustic environment where it will predominantly receive air conducted sound (including air-conducted sound from the user's mouth).
(76) The embodiment of FIG. 4E is identical to the embodiment of FIG. 4D except that the second microphone (M2) is located outside the outer ear (pinna), e.g. on a boom arm directed towards the mouth of the user (thereby, other things being equal, increasing the SNR of the (own voice) signal received by the microphone). Again, the second microphone (M2) is located in a 2nd acoustic environment, where it will predominantly receive air conducted sound (including air-conducted sound from the user's mouth).
(77) FIGS. 5A and 5B schematically illustrate respective first and second embodiments of a microphone path of a hearing device from an input unit to a transmitter for providing an estimate of an own voice of a user wearing the hearing device and transmitting the estimate to another device or system.
(78) FIG. 5A illustrates an embodiment of a part of a hearing device comprising a directional system according to the present disclosure. The hearing device (HD) is configured to be located at or in an ear of a user, e.g. fully or partially in an ear canal of the user. The hearing device comprises an input unit IU comprising a multitude (N) of input transducers (M1, . . . , MN) (here microphones) for providing respective electric input signals (IN1, IN2, . . . , INN) representing sound in an environment of the user. The hearing device further comprises a transmitter (Tx) for wireless communication with an external device (AD), e.g. a telephone or other communication device. The hearing device further comprises a spatial filter or beamformer (w1, w2, . . . , wN, CU) connected to the input unit IU configured to provide a spatially filtered output signal Y_OV based on the multitude of electric input signals and configurable beamformer weights w1p, w2p, . . . , wNp, where p is a beamformer weight set index. The spatial filter comprises weighting units w1, w2, . . . , wN, e.g. multiplication units, each being adapted to apply respective beamformer weights w1p, w2p, . . . , wNp (from the p-th set of beamformer weights) to the respective electric input signals IN1, IN2, . . . , INN and to provide respective weighted input signals Y_1, Y_2, . . . , Y_N. The weighting units w1, w2, . . . , wN may in an embodiment e.g. be implemented as linear filters in the time domain. The spatial filter further comprises a combination unit CU, e.g. a summation unit, for combining the weighted (or linearly filtered) input signals to one or more spatially filtered signals, here one, the beamformed signal Y_OV comprising an estimate of the user's own voice, which is fed to the transmitter Tx for transmission to another device or system (e.g. to a telephone or a network device (AD) via a wireless link (WL)). In the embodiment of FIG. 5A, the beamformed signal Y_OV is fed to an optional processor (PRO), e.g. for applying one or more processing algorithms, e.g. further noise reduction, to the beamformed signal Y_OV from the spatial filter (beamformer), before the processed signal OUT is forwarded to the transmitter (Tx).
(79) The hearing device (HD), e.g. the beamformer, further comprises a spatial filter controller SCU configured to apply at least a first set (p=1) of beamformer weights (w1p, w2p, . . . , wNp) (or linear filters, e.g. FIR-filters) to the multitude of electric input signals (IN1, IN2, . . . , INN). The first set of beamformer weights (p=1) (or linear filters) is applied to provide spatial filtering of an external sound field (e.g. from a sound source located at the user's mouth), cf. signals (Y.sub.1, Y.sub.2, . . . , Y.sub.N). The hearing device further comprises a memory MEM accessible from the spatial filter controller SCU. The spatial filter controller SCU is configured to adaptively select an appropriate set of beamformer weights (signal wip) (or linear filters) among two or more sets (p=1, 2, . . . ) of beamformer weights (or linear filters) stored in the memory (including the first set of beamformer weights (or linear filters)). At a given point in time, an appropriate set of beamformer weights (or linear filters) may e.g. be selected from sets of different beamformer weights (or linear filter coefficients) stored in the memory, or such appropriate (updated) beamformer weights (or linear filters) may be adaptively determined, e.g. dependent on a change in source location (e.g. in a case where the user's own voice is NOT of interest). The beamformer weights (or filter coefficients of linear filters, e.g. FIR-filters) may be determined by any method known in the art, e.g. using the MVDR procedure.
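As an illustration of the MVDR procedure mentioned above, the following sketch computes a distortionless set of beamformer weights for one frequency band from a noise covariance matrix and a steering vector (e.g. towards the user's mouth). Names and the solver choice are assumptions; the disclosure does not prescribe this implementation.

```python
import numpy as np

def mvdr_weights(Cv, d):
    # Cv: (N, N) noise covariance matrix; d: (N,) target steering vector.
    # Minimise the noise power w^H Cv w subject to w^H d = 1 (MVDR).
    Cv_inv_d = np.linalg.solve(Cv, d)        # Cv^{-1} d
    return Cv_inv_d / (d.conj() @ Cv_inv_d)  # distortionless normalisation
```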
(80) The part of a hearing device illustrated in FIG. 5A may implement a microphone path from input transducer to wireless transceiver of a normal headset, or of a hearing aid in a specific communication mode of operation (e.g. a telephone mode). The hearing device may of course additionally comprise an output unit comprising an output transducer, e.g. a loudspeaker, for presenting stimuli perceivable as sound to the user of the hearing device, e.g. in the form of voice from a remote communication partner received via a wireless receiver and/or sound from the environment of the user picked up by the input transducers of the hearing device. The same applies to the embodiment of FIG. 5B. The microphone path may be provided in the time domain or in the frequency domain (here termed the 'time-frequency domain' to indicate that the frequency spectra are (typically) time variant).
(81) The embodiment of FIG. 5B is similar to the embodiment of FIG. 5A but exhibits the following differences. The input unit (IU) of the hearing device of FIG. 5B comprises two input transducers in the form of microphones (M1, M2) and two analysis filter banks (FB-A1, FB-A2) for providing the respective electric input signals (IN1, IN2) as frequency sub-band signals X.sub.1, X.sub.2 in a time-frequency representation (k,m), where k and m are frequency and time indices, respectively. Correspondingly, the beamformer receives the two input signals X.sub.1, X.sub.2 in K frequency bands (k=1, . . . , K) and applies beamformer weights w1p(k), w2p(k), likewise provided in K frequency bands, to the respective electric input signals X.sub.1, X.sub.2 in filter units (w1, w2). The filtered signals (Y1, Y2) are added together in the SUM unit '+' (implemented as combination unit (CU) in FIG. 5A). In the embodiment of FIG. 5B, the own voice estimate Y.sub.OV from the beamformer is fed directly to a synthesis filter bank (FB-S) providing a resulting signal (OUT) as a time-domain signal. The output signal OUT comprising the own voice estimate is fed to the transmitter and sent to the external device or system (AD) via the wireless link (WL) and/or a network or the cloud. The number K of frequency bands may be any number larger than two, e.g. 8, 24 or 64.
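In the time-frequency domain, the beamforming of FIG. 5B amounts to a per-band complex weighting followed by summation. A minimal sketch, assuming the analysis filter banks have already produced STFT matrices for the two microphones (K bands by M frames):

```python
import numpy as np

def subband_beamform(X1, X2, w1, w2):
    # X1, X2: (K, M) sub-band signals; w1, w2: (K,) per-band complex weights.
    # Broadcasting applies each band's weight to all M time frames.
    return w1[:, None] * X1 + w2[:, None] * X2  # Y_OV(k, m)
```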
(82) FIG. 6 shows an embodiment of a headset or a hearing aid comprising own voice estimation and the option of transmitting the own voice estimate to another device, and of receiving sound from another device for presentation to the user via a loudspeaker, e.g. mixed with sound from the environment of the user. FIG. 6 shows an embodiment of a hearing device (HD), e.g. a hearing aid, comprising two microphones (M1, M2) to provide electric input signals IN1, IN2 representing sound in the environment of a user wearing the hearing device. The hearing device further comprises spatial filters DIR and Own Voice DIR, each providing a spatially filtered signal (ENV and OV, respectively) based on the electric input signals. The spatial filter DIR may e.g. implement a target-maintaining, noise-cancelling beamformer. The spatial filter Own Voice DIR is a spatial filter according to the present disclosure. The spatial filter Own Voice DIR implements an own voice beamformer directed at the mouth of the user (its activation being e.g. controlled by an own voice presence control signal, and/or a telephone mode control signal, and/or a far-end talker presence control signal, and/or a user-initiated control signal). In a specific telephone mode of operation, the user's own voice is picked up by the microphones M1, M2 and spatially filtered by the own voice beamformer of spatial filter 'Own Voice DIR' providing signal OV, which (optionally via own voice processor (OVP)) is fed to transmitter Tx and transmitted (by cable or wireless link) to another device or system (e.g. a telephone, cf. dashed arrow denoted 'To phone' and telephone symbol). In the specific telephone mode of operation, signal PHIN may be received by (wired or wireless) receiver Rx from another device or system (e.g. a telephone, as indicated by the telephone symbol and the dashed arrow denoted 'From Phone'). When a far-end talker is active, signal PHIN contains speech from the far-end talker, e.g. transmitted via a telephone line (e.g. fully or partially wirelessly, but typically at least partially cable-borne). The 'far-end' telephone signal PHIN may be selected or mixed with the environment signal ENV from the spatial filter DIR in a combination unit (here selector/mixer SEL-MIX), and the selected or mixed signal PHENV is fed to output transducer SPK (e.g. a loudspeaker, or a vibrator of a bone conduction hearing device) for presentation to the user as sound. Optionally, as shown in FIG. 6, the selected or mixed signal PHENV may be fed to processor PRO for applying one or more processing algorithms to the selected or mixed signal PHENV to provide processed signal OUT, which is then fed to the output transducer SPK. The embodiment of FIG. 6 may represent a headset, in which case the received signal PHIN may be selected for presentation to the user without mixing with an environment signal. The embodiment of FIG. 6 may represent a hearing aid, in which case the received signal PHIN may be mixed with an environment signal before presentation to the user, as sketched after this paragraph (to allow the user to maintain a sensation of the surrounding environment; the same may of course be relevant for a headset application, depending on the use-case). Further, in a hearing aid, the processor (PRO) may be configured to compensate for a hearing impairment of the user of the hearing device (hearing aid).
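For illustration only, the selector/mixer (SEL-MIX) of FIG. 6 may be thought of as a convex combination of the far-end and environment signals; the rule below is a hedged sketch, not the disclosed implementation.

```python
def sel_mix(ph_in, env, mix=1.0):
    # mix = 1.0 selects the phone signal PHIN only (headset-style presentation);
    # 0 < mix < 1 blends in the environment signal ENV (hearing-aid style).
    return mix * ph_in + (1.0 - mix) * env
```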
(83) Example of an Own-Voice Beamformer:
(84) An adaptive (own voice) beamformer may comprise first and second beamformers C.sub.1 and C.sub.2, wherein the adaptive beamformer filter is configured to provide a resulting directional signal (comprising an estimate of the user's own voice) Y.sub.BF(k)=C.sub.1(k)−β(k)·C.sub.2(k), where β(k) is an adaptively updated adaptation factor. This is illustrated in FIG. 7A.
(85) The beamformers C.sub.1 and C.sub.2 may comprise a beamformer C.sub.1 which is configured to leave a signal from a target direction un-altered, and an orthogonal beamformer C.sub.2 which is configured to cancel the signal from the target direction.
(86) In this case, the target direction is the direction of the user's mouth (the target sound source is equal to the user's own voice).
(87) FIG. 7A shows a part of a hearing device comprising an embodiment of an adaptive beamformer filtering unit (BFU) for providing a beamformed signal based on two microphone inputs. The hearing device comprises first and second microphones (M.sub.1, M.sub.2) providing first and second electric input signals IN.sub.1 and IN.sub.2, respectively, and a beamformer providing a beamformed signal Y.sub.BF (here Y.sub.OV) based on the first and second electric input signals. A direction from the target signal to the hearing aid is e.g. defined by the microphone axis and indicated in FIG. 7A by the arrow denoted 'Target sound'. The target direction can be any direction, e.g., as here, a direction to the user's mouth (to pick up the user's own voice). An adaptive beam pattern Y(k), for a given frequency band k, k being a frequency band index, is e.g. obtained by linearly combining an omnidirectional delay-and-sum-beamformer C.sub.1(k) and a delay-and-subtract-beamformer C.sub.2(k) in that frequency band. The adaptive beam pattern arises by scaling the delay-and-subtract-beamformer C.sub.2(k) by a complex-valued, frequency-dependent, adaptive scaling factor β(k) (generated by beamformer block BF) before subtracting it from the delay-and-sum-beamformer C.sub.1(k), i.e. providing the beam pattern Y,
Y(k)=C.sub.1(k)−β(k)·C.sub.2(k).
(88) It should be noted that the sign in front of β(k) might as well be +, if the sign(s) of the beamformer weights constituting the delay-and-subtract beamformer C.sub.2 are appropriately adapted. The beamformed signal Y.sub.BF is expressed as Y.sub.BF=Y.sub.OV=(w.sub.C1(k)−β(k)·w.sub.C2(k)).sup.H·IN(k), where bold face (x) indicates a vector, e.g. IN(k)=(IN.sub.1(k), IN.sub.2(k)) in the case of two electric input signals, as illustrated in FIG. 7A (in which case β(k) is a scalar, but in a general case, with more input signals, a matrix). The beamformer weights (w.sub.C1(k), w.sub.C2(k)) may be predefined and stored in a memory (MEM) of the hearing device. The beamformer weights may be updated during use, e.g. either provoked by certain events (e.g. power-on), or adaptively.
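In vector notation, the adaptive beam pattern may be evaluated per frequency bin as sketched below (a minimal sketch assuming the weight vectors are available, e.g. from the memory MEM; names are illustrative):

```python
import numpy as np

def adaptive_beamform(IN, w_C1, w_C2, beta):
    # IN: (N,) microphone STFT values for one bin; w_C1, w_C2: (N,) weights
    # of the delay-and-sum (C1) and delay-and-subtract (C2) beamformers.
    w = w_C1 - beta * w_C2   # combined weight vector for this band
    return w.conj() @ IN     # Y_BF = (w_C1 - beta * w_C2)^H IN
```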
(89) The beamformer (BFU) may e.g. be adapted to work optimally in situations where the microphone signals consist of a point target sound source in the presence of additive noise sources. In this situation, the scaling factor β(k) (β in FIG. 7A) is adapted to minimize the noise under the constraint that the sound impinging from the target direction (at least at one frequency) is essentially unchanged. For each frequency band k, the adaptation factor β(k) can be found in different ways.
(90) The adaptation factor β(k) may be expressed as
(91) β(k)=<C.sub.1(k)·C.sub.2*(k)>/(<|C.sub.2(k)|.sup.2>+c)
where * denotes complex conjugation and <⋅> denotes the statistical expectation operator, which may be approximated in an implementation as a time average, k is the frequency index, and c is a constant (e.g. 0). The expectation operator <⋅> may be implemented using e.g. a first order IIR filter, possibly with different attack and release time constants. Alternatively, the expectation operator may be implemented using an FIR filter.
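A minimal sketch of this estimator, approximating the expectation operator by a first-order IIR average and gating the update with a voice activity detector as in FIG. 7B; the class and parameter names are assumptions, and a single smoothing coefficient stands in for separate attack and release constants:

```python
import numpy as np

class BetaEstimator:
    # beta(k) = <C1(k) * conj(C2(k))> / (<|C2(k)|^2> + c), with the
    # expectation <.> approximated by a first-order IIR time average.

    def __init__(self, num_bands, alpha=0.9, c=1e-8):
        self.alpha = alpha                       # IIR smoothing coefficient
        self.c = c                               # small constant (the text allows c = 0)
        self.num = np.zeros(num_bands, complex)  # running <C1 * conj(C2)>
        self.den = np.zeros(num_bands)           # running <|C2|^2>

    def update(self, C1, C2, speech_pause=True):
        # C1, C2: (K,) complex beamformer outputs for the current frame.
        # The smoothing is gated by the VAD so that beta(k) is only
        # updated during speech pauses (noise only).
        if speech_pause:
            a = self.alpha
            self.num = a * self.num + (1 - a) * C1 * np.conj(C2)
            self.den = a * self.den + (1 - a) * np.abs(C2) ** 2
        return self.num / (self.den + self.c)
```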
(92) In a further embodiment, the adaptive beamformer processing unit is configured to determine the adaptation parameter β.sub.opt(k) from the following expression
(93) β.sub.opt(k)=(w.sub.C1.sup.H·C.sub.v·w.sub.C2)/(w.sub.C2.sup.H·C.sub.v·w.sub.C2)
where w.sub.C1 and w.sub.C2 are the beamformer weights for the delay-and-sum C.sub.1 and the delay-and-subtract C.sub.2 beamformers, respectively, C.sub.v is the noise covariance matrix, and H denotes Hermitian transposition.
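A one-band evaluation of this expression is sketched below. Since the original equation image did not survive, the reconstructed form above (and hence this sketch) is an assumption based on the surrounding definitions; an optional regularisation constant c is included as an argument.

```python
import numpy as np

def beta_opt(w_C1, w_C2, Cv, c=0.0):
    # Numerator w_C1^H Cv w_C2; denominator w_C2^H Cv w_C2 (+ optional c).
    num = w_C1.conj() @ Cv @ w_C2
    den = (w_C2.conj() @ Cv @ w_C2).real + c
    return num / den
```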
(94) The adaptive beamformer (BF) may e.g. be implemented as a generalized sidelobe canceller (GSC) structure, e.g. as a Minimum Variance Distortionless Response (MVDR) beamformer, as is known in the art.
(95) FIG. 7B shows an adaptive (own voice) beamformer configuration, wherein the outputs of an omnidirectional beamformer and an (own voice) target cancelling beamformer, respectively, are smoothed, and based thereon, the adaptation factor β(k) is determined. FIG. 7B implements an embodiment of a determination of the adaptive parameter
(96) β(k)=<C.sub.1(k)·C.sub.2*(k)>/(<|C.sub.2(k)|.sup.2>+c)
(97) The beamformers C.sub.1(k) and C.sub.2(k) (defined by respective sets of complex beamformer weights (w.sub.11(k), w.sub.12(k)) and (w.sub.21(k), w.sub.22(k))), as illustrated in FIG. 7B, define an omnidirectional beamformer (C.sub.1(k)) and a target (own voice) cancelling beamformer (C.sub.2(k)), respectively. LP is an (optional) low-pass filtering (smoothing) unit. The unit (Conj) provides the complex conjugate of its input signal. The unit |⋅|.sup.2 provides the magnitude squared of its input signal. A voice activity detector (VAD) controls the smoothing units (LP) via control signal N-VAD to provide that β(k) is updated during speech pauses (noise only).
(98) FIG. 7C shows an embodiment of an own voice beamformer, e.g. for the telephone mode illustrated in FIG. 6, implemented using the configuration comprising two microphones. FIG. 7C shows an own voice beamformer according to the present disclosure including an own voice-enhancing post filter (OV-PF) providing a post filter gain (G.sub.OV,BF(k)), which is applied to the beamformed signal Y.sub.BF. The own voice gains are determined on the basis of a current noise estimate, here provided by a combination of an own voice cancelling beamformer (C.sub.2(k)), defined by (frequency dependent, cf. frequency index k) complex beamformer weights (w.sub.ov_cnc1_1(k), w.sub.ov_cnc1_2(k)), and the output of the own voice beamformer (Y.sub.BF) containing the own voice signal, enhanced by the own voice beamformer. In the embodiment of FIG. 7C, the own voice beamformer is adaptive, provided by the adaptively updated parameter β(k), cf. e.g. FIG. 7B, so that Y.sub.BF=C.sub.1(k)−β(k)·C.sub.2(k). A direction from the user's mouth, when the hearing device is operationally mounted, is schematically indicated (cf. solid arrow denoted 'Own Voice' in FIG. 7C). The resulting signal G.sub.OV,BF(k)·Y.sub.BF(k) provides the (enhanced, noise reduced) own voice estimate Y.sub.OV(k). The own voice estimate may (e.g. in an own-voice mode of operation of the hearing aid, e.g. when a connection to a telephone or other remote device is established (cf. e.g. FIG. 6)) be transmitted to a remote device via a transmitter (cf. e.g. Tx in FIG. 6), e.g. to a far-end listener of a telephone call, or used in a keyword detector, e.g. for a voice control interface of the hearing device. In the 'own voice mode', noise from external sound sources may be reduced by the beamformer.
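The exact post filter gain rule G.sub.OV,BF(k) is not given above. The sketch below therefore uses a generic, floor-limited Wiener-like gain, purely as an assumed illustration of how a noise estimate from the own-voice-cancelling beamformer could drive the post filter:

```python
import numpy as np

def ov_postfilter_gain(Y_BF, C2_out, g_min=0.1):
    # Y_BF: enhanced own-voice signal; C2_out: output of the own-voice-
    # cancelling beamformer, used as a per-band noise estimate.
    noise_psd = np.abs(C2_out) ** 2
    sig_psd = np.maximum(np.abs(Y_BF) ** 2, 1e-12)
    gain = 1.0 - np.minimum(noise_psd / sig_psd, 1.0)
    return np.maximum(gain, g_min)  # floor the gain to limit artefacts
```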
(99) A binaural hearing system comprising first and second hearing devices (e.g. hearing aids, or first and second earpieces of a headset) as described above may be provided. The first and second hearing devices may be configured to allow the exchange of data, e.g. audio data, between them and with another device, e.g. a telephone, a speakerphone, or a computer (e.g. a PC or a tablet). Own voice estimation may be provided based on signals from microphones in the first and second hearing devices. Own voice detection may be provided in both hearing devices. A final own voice detection decision may be based on own voice detection values from both hearing devices, or directly on signals from microphones in the first and second hearing devices, e.g. as sketched below.
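A final binaural decision could, for example, combine the per-ear own voice detection values as sketched below; the combination rule and threshold are assumptions, not part of the disclosure.

```python
def binaural_ov_decision(ov_left, ov_right, threshold=0.5):
    # ov_left, ov_right: own-voice detection values in [0, 1] from the
    # first and second hearing devices; average them and apply a threshold.
    return 0.5 * (ov_left + ov_right) > threshold
```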
(100) FIG. 8A shows a top view of a first embodiment of a hearing system comprising first and second hearing devices integrated with a spectacle frame. FIG. 8B shows a front view of the embodiment in FIG. 8A, and FIG. 8C shows a side view of the embodiment in FIG. 8A.
(101) The hearing system (HS) according to the present disclosure comprises first and second hearing devices HD.sub.1, HD.sub.2 (e.g. first and second hearing aids of a binaural hearing aid system, or first and second earpieces of a headset) configured to be worn on the head of a user, and a head-worn carrier, here embodied in a spectacle frame.
(102) The hearing system comprises left and right hearing devices and a number of microphones, and possibly vibration sensors, mounted on the spectacle frame. Glasses or lenses (LE) of the spectacles may be mounted on the cross bar (CB) and nose sub-bars (NSB.sub.1, NSB.sub.2). The left and right hearing devices (HD.sub.1, HD.sub.2) comprise respective BTE-parts (BTE.sub.1, BTE.sub.2), and further comprise respective ITE-parts (ITE.sub.1, ITE.sub.2). The hearing system may further comprise a multitude of input transducers, here shown as microphones, and here configured in three separate microphone arrays (MA.sub.R, MA.sub.L, MA.sub.F) located on the right side bar, the left side bar and the (front) cross bar, respectively. Each microphone array (MA.sub.R, MA.sub.L, MA.sub.F) comprises a multitude of microphones (MIC.sub.R, MIC.sub.L, MIC.sub.F, respectively), here four, four and eight, respectively. The microphones may form part of the hearing system (e.g. be associated with the right and left hearing devices (HD.sub.1, HD.sub.2), respectively) and contribute to localising and spatially filtering sound from the respective sound sources of the environment around the user (and possibly to the estimation of the user's own voice). In an embodiment, all microphones of the system are located on the glasses and/or on the BTE-part and/or in the ITE-part. The hearing system (e.g. the ITE-parts) may e.g. comprise electrodes for picking up body signals from the user, e.g. forming part of sensors for monitoring physiological functions of the user, e.g. brain activity, eye movement activity or temperature.
(103) However, as taught by the present disclosure, for own voice estimation, it may be advantageous to locate a first input transducer (e.g. a microphone or a vibration sensor) in the (preferably partially occluded part of the) ear canal. It might alternatively, or additionally, be advantageous to locate a first input transducer (e.g. a vibration sensor) on the mastoid bone, e.g. in the form of a vibration sensor contacting the skin of the user covering the mastoid bone, possibly forming part of the BTE-part, or located on a specifically adapted carrier part of the spectacle frame.
(104) Other sensors (not shown) may be located on the spectacle frame (camera, radar, etc.).
(105) The BTE- and ITE-parts (BTE and ITE) of the hearing devices are electrically connected, either wirelessly or wired, as indicated by the dashed connection between them in FIG. 8C. The ITE-part may comprise one or more input transducers (e.g. microphones) and/or a loudspeaker (cf. e.g. SPK in FIGS. 2 and 6) located in the ear canal during use. One or more of the microphones (MIC.sub.L, MIC.sub.R, MIC.sub.F) on the spectacle frame may be 'second input transducers' in the sense of the present disclosure, i.e. be located in a '2.sup.nd acoustic environment' well suited to receive air-borne sound from the user's mouth, and participate in own-voice estimation according to the present disclosure.
(106) Instead of a spectacle frame, the carrier may be a dedicated frame for carrying the first and second hearing devices and for appropriately locating the first and second (and possible further) input transducers on the head (e.g. at the respective ears) of the user.
(107) FIG. 9 shows an embodiment of a hearing device, e.g. a hearing aid, according to the present disclosure. The hearing aid is here illustrated as a particular style (sometimes termed receiver-in-the-ear, or RITE, style) comprising a BTE-part (BTE) adapted for being located at or behind an ear (pinna) of a user, and an ITE-part (ITE) adapted for being located in or at an ear canal of the user's ear and comprising a loudspeaker (SPK). The BTE-part and the ITE-part are connected (e.g. electrically connected) by a connecting element (IC) and internal wiring in the ITE- and BTE-parts (cf. e.g. wiring Wx in the BTE-part). The connecting element may alternatively be fully or partially constituted by a wireless link between the BTE- and ITE-parts.
(108) In the embodiment of a hearing device in FIG. 9, the BTE part comprises an input unit comprising three input transducers (e.g. microphones) (M.sub.BTE1, M.sub.BTE2, M.sub.BTE3), each for providing an electric input audio signal representative of an input sound signal (S.sub.BTE) (originating from a sound field S around the hearing device). The input unit further comprises two wireless receivers (WLR.sub.1, WLR.sub.2) (or transceivers) for providing respective directly received auxiliary audio and/or control input signals (and/or allowing transmission of audio and/or control signals to other devices, e.g. a remote control or processing device). The hearing device (HD) comprises a substrate (SUB) whereon a number of electronic components are mounted, including a memory (MEM) e.g. storing different hearing aid programs (e.g. parameter settings defining such programs, or parameters of algorithms, e.g. optimized parameters of a neural network, e.g. beamformer weights of one or more (e.g. an own voice) beamformer(s)) and/or hearing aid configurations, e.g. input source combinations (M.sub.BTE1, M.sub.BTE2, M.sub.BTE3, M.sub.1, M.sub.2, M.sub.3, WLR.sub.1, WLR.sub.2), e.g. optimized for a number of different listening situations or modes of operation. One mode of operation may be a communication mode, where the user's own voice is picked up by microphones of the hearing aid (e.g. M.sub.1, M.sub.2, M.sub.3) and transmitted to another device or system via one of the wireless interfaces (WLR.sub.1, WLR.sub.2). The substrate further comprises a configurable signal processor (DSP, e.g. a digital signal processor, e.g. including a processor (e.g. PRO in FIG. 2A, 2B) for applying a frequency and level dependent gain, e.g. providing beamforming, noise reduction, filter bank functionality, and other digital functionality of a hearing device according to the present disclosure). The configurable signal processor (DSP) is adapted to access the memory (MEM) and to select and process one or more of the electric input audio signals and/or one or more of the directly received auxiliary audio input signals based on a currently selected (activated) hearing aid program/parameter setting (e.g. either automatically selected, e.g. based on one or more sensors, or selected based on inputs from a user interface). The mentioned functional units (as well as other components) may be partitioned in physical circuits and components according to the application in question (e.g. with a view to size, power consumption, analogue vs. digital processing, etc.), e.g. integrated in one or more integrated circuits, or as a combination of one or more integrated circuits and one or more separate electronic components (e.g. inductor, capacitor, etc.). The configurable signal processor (DSP) provides a processed audio signal, which is intended to be presented to a user. The substrate further comprises a front-end IC (FE) for interfacing the configurable signal processor (DSP) to the input and output transducers, etc., and typically comprising interfaces between analogue and digital signals. The input and output transducers may be individual separate components, or integrated (e.g. MEMS-based) with other electronic circuitry.
(109) The hearing system (here, the hearing device HD) may further comprise a detector unit comprising one or more inertial measurement units (IMU), e.g. a 3D gyroscope, a 3D accelerometer and/or a 3D magnetometer, here denoted IMU.sub.1 and located in the BTE-part (BTE). Inertial measurement units (IMUs), e.g. accelerometers, gyroscopes, and magnetometers, and combinations thereof, are available in a multitude of forms (e.g. multi-axis, such as 3D-versions), e.g. constituted by or forming part of an integrated circuit, and thus suitable for integration, even in miniature devices, such as hearing devices, e.g. hearing aids. The sensor IMU.sub.1 may thus be located on the substrate (SUB) together with other electronic components (e.g. MEM, FE, DSP). One or more movement sensors (IMU) may alternatively or additionally be located in or on the ITE part (ITE) or in or on the connecting element (IC), e.g. used to pick up sound from the user's mouth (own voice).
(110) The hearing device (HD) further comprises an output unit (e.g. an output transducer) providing stimuli perceivable by the user as sound based on a processed audio signal from the processor or a signal derived therefrom. In the embodiment of a hearing device in FIG. 9, the ITE part comprises the output unit in the form of a loudspeaker (also sometimes termed a ‘receiver’) (SPK) for converting an electric signal to an acoustic (air borne) signal, which (when the hearing device is mounted at an ear of the user) is directed towards the ear drum (Ear drum), where sound signal (S.sub.ED) is provided (possibly including bone conducted sound from the user's mouth, and sound from the environment ‘leaking around or through’ the ITE-part and into the residual volume). The ITE-part further comprises a sealing and guiding element (‘Seal’) for guiding and positioning the ITE-part in the ear canal (Ear canal) of the user, and for separating the ‘Residual volume’ (1.sup.st acoustic environment) from the environment (2.sup.nd acoustic environment), cf. e.g. FIG. 1A-1E, 2A, 2B. The ITE part (earpiece) may comprise a housing or a soft or rigid or semi-rigid dome-like structure.
(111) The electric input signals (from input transducers M.sub.BTE1, M.sub.BTE2, M.sub.BTE3, M.sub.1, M.sub.2, M.sub.3, IMU.sub.1) may be processed in the time domain or in the (time-) frequency domain (or partly in the time domain and partly in the frequency domain as considered advantageous for the application in question).
(112) The hearing device (HD) exemplified in FIG. 9 is a portable device and further comprises a battery (BAT), e.g. a rechargeable battery, e.g. based on Li-Ion battery technology, e.g. for energizing electronic components of the BTE- and possibly ITE-parts. In an embodiment, the hearing device, e.g. a hearing aid, is adapted to provide a frequency dependent gain and/or a level dependent compression and/or a transposition (with or without frequency compression) of one or more frequency ranges to one or more other frequency ranges, e.g. to compensate for a hearing impairment of a user.
(113) FIG. 10 shows an embodiment of a hearing device (HD), e.g. a headset, according to the present disclosure. The headset of FIG. 10 comprises a loudspeaker signal path (SSP), a microphone signal path (MSP), and a control unit (CONT) for dynamically controlling signal processing of the two signal paths. The loudspeaker signal path (SSP) comprises a receiver unit (Rx) for receiving an electric signal (In) from a remote device and providing it as an electric received input signal (S-IN), an SSP-signal processing unit (G1) for processing the electric received input signal (S-IN) and providing a processed output signal (S-OUT), and a loudspeaker unit (SPK), operationally connected to each other and configured to convert the processed output signal (S-OUT) to an acoustic sound signal (OS) originating from the signal (In) received by the receiver unit (Rx). The microphone signal path (MSP) comprises an input unit (IU) comprising at least first and second microphones for converting an acoustic input sound (IS) (e.g. from a wearer of the headset) to respective electric input signals (M-IN), an MSP-signal processing unit (G2) for processing the electric microphone input signals (M-IN) and providing a processed output signal (M-OUT), and a transmitter unit (Tx), operationally connected to each other and configured to transmit the processed signal (M-OUT) originating from an input sound (IS) (e.g. comprising the user's own voice) picked up by the input unit (IU) to a remote end as a transmitted signal (On). The control unit (CONT) is configured to dynamically control the processing of the SSP- and MSP-signal processing units (G1 and G2, respectively), e.g. based on one or more control input signals (not shown).
(114) The input signals (S-IN, M-IN) to the headset (HD) may be presented in the (time-) frequency domain or converted from the time domain to the (time-) frequency domain by appropriate functional units, e.g. included in the receiver unit (Rx) and input unit (IU) of the headset. A headset according to the present disclosure may e.g. comprise a multitude of time to time-frequency conversion units (e.g. one for each input signal that is not otherwise provided in a time-frequency representation, e.g. the analysis filter banks (FB-A1, FB-A2) of FIG. 5B) to provide each input signal in a number of frequency bands k and a number of time instances m (the entity (k,m) being defined by corresponding values of the indices k and m, and being termed a TF-bin or DFT-bin or TF-unit).
(115) The headset (HD) is configured to provide an estimate of the user's own voice as disclosed in the present application. The MSP-signal processing unit (G2) may e.g. comprise an own voice beamformer as described in the present disclosure (see e.g. FIG. 7A-7C). The input transducers may e.g. be located on the headset as disclosed in the present application, e.g. as proposed in FIG. 1A-1E, FIG. 2A, 2B, FIG. 3, FIG. 4A-4E.
(116) It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
(117) As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
(118) It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
(119) The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
(120) Accordingly, the scope should be judged in terms of the claims that follow.