NOISE SUPPRESSION SYSTEM AND METHOD
20170213567 · 2017-07-27
Assignee
- Koninklijke Kpn N.V. (The Hague, NL)
- Nederlandse Organisatie Voor Toegepast-Natuurwetenschappelijk Onderzoek TNO ('s-Gravenhage, NL)
Inventors
Cpc classification
G10L21/0308
PHYSICS
International classification
Abstract
A play-out device is provided for playing out an audio signal via a speaker to provide a sound signal, and a recording device for recording the sound signal to obtain a recorded signal comprising a recording of at least the sound signal. The play-out device is configured for generating noise suppression data comprising the audio signal, or a reference thereto, and timing information for enabling the audio signal to be correlated in time with the recorded signal. A noise suppression subsystem is provided with the recorded signal and the noise suppression data. The noise suppression subsystem comprises a timing manager for synchronizing the audio signal with the recorded signal based on the timing information, and a noise suppressor for processing the recorded signal based on said synchronized audio signal to obtain a processed signal in which the recording of the sound signal is suppressed. The noise suppression subsystem is thus enabled to perform noise suppression, even when not comprised in the play-out device but rather in another device such as the recording device.
Claims
1. A system for noise suppression, comprising: a play-out device for playing out an audio signal via a speaker to provide a sound signal; a recording device for recording the sound signal to obtain a recorded signal comprising a recording of at least the sound signal, wherein: the play-out device is configured for providing noise suppression data to a communication channel, the noise suppression data comprising: i) the audio signal, or a reference to the audio signal which enables the audio signal to be accessed; and ii) timing information for enabling the audio signal to be correlated in time with the recorded signal; and wherein the system further comprises a noise suppression subsystem configured for obtaining the recorded signal and the noise suppression data, the noise suppression subsystem comprising: a timing manager for synchronizing the audio signal with the recorded signal based on the timing information to obtain a synchronized audio signal; and a noise suppressor for processing the recorded signal based on the synchronized audio signal to obtain a processed signal in which the recording of the sound signal is suppressed.
2. The system according to claim 1, wherein the audio signal obtained by the noise suppression subsystem comprises one or more content timestamps, and wherein the timing manager is configured for synchronizing the audio signal with the recorded signal further based on the one or more content timestamps.
3. The system according to claim 2, wherein the audio signal played-out by the play-out device comprises one or more watermarks, the one or more watermarks being associated with one or more watermark timestamps having a known relation in time with the one or more content timestamps, wherein the noise suppression subsystem comprises a watermark detector for detecting the one or more watermarks in the recorded signal, and wherein the timing manager is configured for synchronizing the audio signal with the recorded signal by correlating the one or more watermark timestamps in time with the one or more content timestamps.
4. The system according to claim 3, wherein the one or more watermark timestamps are play-out timestamps of the one or more watermarks at the play-out device, and wherein the timing information provided by the play-out device is constituted at least in part by the one or more play-out timestamps.
5. The system according to claim 3, wherein the one or more watermark timestamps are encoded in respective ones of the one or more watermarks.
6. The system according to claim 1, wherein the play-out device comprises a clock, wherein the timing information provided by the play-out device comprises one or more play-out timestamps associated with one or more content timestamps of the audio signal, wherein the one or more play-out timestamps are derived from the clock during play-out of the audio signal, wherein the recording device comprises a further clock having a known relation in time with the clock of the play-out device, wherein the recording device derives one or more recording timestamps from the further clock during recording of the sound signal, and wherein the timing manager is configured for synchronizing the audio signal with the recorded signal by correlating the one or more recording timestamps in time with the one or more content timestamps of the audio signal using the one or more play-out timestamps.
7. The system according to claim 1, wherein the audio signal obtained by the noise suppression subsystem comprises one or more watermarks matching one or more watermarks in the recorded signal, wherein the noise suppression subsystem comprises a watermark detector for detecting the one or more watermarks in the audio signal and in the recorded signal, and wherein the timing manager is configured for synchronizing the audio signal with the recorded signal by aligning in time the one or more watermarks in the audio signal and in the recorded signal.
8. The system according to claim 1, wherein the recorded signal comprises, in addition to the recording of the sound signal, a recording of a further sound signal, and wherein the noise suppressor processes the recorded signal to obtain the processed signal having the recording of the sound signal suppressed with respect to the recording of the further sound signal.
9. The system according to claim 8, wherein the further sound signal is constituted by speech of a user.
10. A recording device as used in the system according to claim 1, comprising an input interface for receiving the noise suppression data from the play-out device via the communication channel.
11. The recording device according to claim 10, comprising the noise suppression subsystem.
12. A communication system for enabling speech communication between users, comprising at least one instance of the recording device according to claim 10.
13. A play-out device as used in the system according to claim 1, comprising an output interface for providing the noise suppression data to the noise suppression subsystem via the communication channel.
14. The play-out device according to claim 13, comprising at least one of: a watermark inserter for inserting one or more watermarks in the audio signal prior to play-out and/or transmission via the communication channel; and a timestamp function unit for determining one or more play-out timestamps during play-out of the audio signal for use in the timing information.
15. Noise suppression data as generated by the play-out device according to claim 13.
16. A method for suppressing noise, comprising: obtaining a recorded signal comprising a recording of at least a sound signal, the sound signal being provided by a play-out device playing out an audio signal via a speaker; obtaining, via a communication channel, noise suppression data from the play-out device, the noise suppression data comprising: i) the audio signal, or a reference to the audio signal which enables the audio signal to be accessed; and ii) timing information for enabling the audio signal to be correlated in time with the recorded signal; synchronizing the audio signal with the recorded signal based on the timing information to obtain a synchronized audio signal; and processing the recorded signal based on the synchronized audio signal to obtain a processed signal in which the recording of the sound signal is suppressed.
17. A computer program product comprising instructions for causing a processing system to perform the method according to claim 16.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0051] These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter. In the drawings,
[0052]
[0053]
[0054]
[0055]
[0056]
[0057]
[0058]
[0059]
[0060]
[0061]
[0062]
[0063]
[0064] It should be noted that items which have the same reference numbers in different Figures, have the same structural features and the same functions, or are the same signals. Where the function and/or structure of such an item has been explained, there is no necessity for repeated explanation thereof in the detailed description.
LIST OF REFERENCE NUMERALS
[0065] The following list of reference numbers is provided for facilitating the interpretation of the drawings and shall not be construed as limiting the claims.
[0066] 020 communication channel
[0067] 040 sound signal
[0068] 060 providing of timing information via communication channel
[0069] 080 providing of audio signal via communication channel
[0070] 100 system for noise suppression
[0071] 120 speaker
[0072] 140 microphone
[0073] 200 play-out device
[0074] 210 output interface
[0075] 220 clock
[0076] 250 watermark inserter
[0077] 252 combination of watermark inserter and timestamp function unit
[0078] 260 timestamp function unit
[0079] 270 decoder
[0080] 280 encoder
[0081] 290 audio buffer
[0082] 300 recording device
[0083] 310 input interface
[0084] 320 clock
[0085] 330 timing manager
[0086] 340 noise suppressor
[0087] 342 impulse response estimator
[0088] 350 watermark detector
[0089] 352 combination of watermark detector and timestamp extractor
[0090] 360 timestamp extractor
[0091] 370 decoder
[0092] 380 recording buffer
[0093] 390 audio buffer
[0094] 400 noise suppression data
[0095] 410 audio signal
[0096] 412 audio signal or reference
[0097] 420 timing information
[0098] 430 watermark
[0099] 440 watermark encoding timestamp
[0100] 460 recorded signal
[0101] 470 synchronized audio signal
[0102] 480 processed signal
[0103] 500 method for noise suppression
[0104] 510 obtaining recorded signal
[0105] 520 obtaining noise suppression data
[0106] 530 synchronizing audio signal using noise suppression data
[0107] 540 processing recorded signal using synchronized audio signal
[0108] 600 computer readable medium
[0109] 610 computer program stored as non-transitory data
DETAILED DESCRIPTION OF EMBODIMENTS
[0110]
[0111]
[0112] The play-out device 200 may be configured for providing, via the communication channel 020, noise suppression data 400 to the recording device 300. For that purpose, the play-out device 200 is shown to comprise an output interface 210 for outputting data to the communication channel 020, and the recording device 300 is shown to comprise an input interface 310 for receiving data from the communication channel 020. Each respective interface may take any suitable form. For example, for providing Bluetooth-based data communication, the output interface may be a Bluetooth transmitter and the input interface may be a Bluetooth receiver.
[0113] The noise suppression data 400 generated by play-out device 200 may comprise the audio signal. Alternatively, although not shown in
[0114]
[0115] The system may be advantageously used in use-cases where the recorded signal comprises, in addition to the recording of the sound signal, a recording of a further sound signal. As such, the noise suppressor may provide a processed signal in which the recording of the sound signal is suppressed with respect to the recording of the further sound signal. For example, in case the further sound signal is constituted by speech of a user, the sound signal of the play-out device may be suppressed with respect to the speech of the user, thereby improving the intelligibility of the speech.
[0116] Examples of advantageous use-cases include the following:
[0117] Social television (TV). Here, two or more parties may view the same TV program at different locations and at the same time communicate with each other via an audio communication channel. In this use case, each respective party may hear the TV audio of the other party through the audio communication channel in addition to the TV audio of their own TV. Moreover, even if the TV audio at each location is synchronized, the transmission delay of the audio communication channel will delay the TV audio, causing annoying echoes that make it harder to hear the other party correctly. In addition, the TV's audio volume might be loud, further reducing intelligibility. The system may be employed here to suppress the TV audio in the recorded signal at one or more parties, prior to transmitting the recorded signal to another party.
[0118] Speech control. If a user is trying to control an electronic device using his/her speech, background noise such as TV audio may severely limit the usability of speech control. The system may be employed here to suppress the TV audio in the recorded signal prior to applying speech recognition to the recorded signal.
[0119] Forensic audio enhancement. Here, law enforcement may attempt to listen in on a target using audio surveillance, while the target may attempt to hinder such eavesdropping by turning the volume of a play-out device, such as a home or car stereo, very high. Here, the system may be employed to suppress the sound signal of the play-out device in the recorded signal obtained by law enforcement.
[0120] Audio communication. In general, in audio communication, it may be desirable to avoid transmitting the sound signal of a TV or radio playing in the background in order to avoid letting the other party know which TV program you are watching or what radio station you are listening to, e.g., for reasons of privacy. The system may be employed here to suppress such sound signals in the recorded signal at one or both parties, prior to transmitting the recorded signal to the other party.
[0121] Audio recording. It may be desirable to record your own speech on some recording device, e.g., for taking personal notes, without recording background audio. Likewise, the system may be employed to suppress background noise.
[0122] Referring further to
[0123] It is further noted that the synchronization of the audio signal with the recorded signal may be a coarse synchronization in that there may, after synchronization, still be a delay remaining between the synchronized audio signal and the recorded signal. A reason for this may be that the system may not always be able to account for all factors contributing to the delay between the audio signal and the recorded signal. For example, there is normally a propagation delay of the sound signal from the speaker of the play-out device to the microphone of the recording device. For certain configurations of the system, as elucidated further from
[0124] In this respect, it is noted that noise suppression techniques which are capable of compensating for smaller delays between input signals, e.g., up to 128 ms, are known and may be used by the noise suppressor. An example of such a technique is noise suppression using adaptive filters. However, in view of the coarse synchronization performed by the timing manager, such noise suppression techniques may be simpler, e.g., by using shorter adaptive filters, requiring fewer iterations, etc.
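By way of illustration only, the following minimal sketch shows how a short adaptive filter may suppress the play-out sound once the audio signal has been coarsely synchronized. The normalized least-mean-squares (NLMS) update is chosen here merely as one common adaptive filter; the text above does not prescribe a particular algorithm. The function name nlms_suppress, the parameters taps, mu and eps, and the test data are assumptions made for this example.

```python
import math
import random


def nlms_suppress(recorded, reference, taps=8, mu=0.5, eps=1e-8):
    """Suppress the play-out sound in `recorded` using an NLMS adaptive filter.

    Returns (processed, weights): the residual signal and the final filter taps.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # latest reference samples, newest first
    processed = []
    for d, x in zip(recorded, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what remains after removing the estimate
        power = sum(h * h for h in history) + eps
        step = mu * error / power
        weights = [w + step * h for w, h in zip(weights, history)]
        processed.append(error)
    return processed, weights


# Illustrative data (assumed): the microphone picks up a delayed, attenuated
# copy of the reference plus a quiet tone standing in for the user's speech.
random.seed(0)
reference = [random.uniform(-1.0, 1.0) for _ in range(4000)]
speech = [0.05 * math.sin(2 * math.pi * n / 50) for n in range(4000)]
recorded = [
    0.6 * (reference[n - 2] if n >= 2 else 0.0) + speech[n]
    for n in range(4000)
]
processed, weights = nlms_suppress(recorded, reference)
```

In this sketch the number of taps only has to span the residual delay left after coarse synchronization, which is why a short filter may suffice.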
[0125]
[0126]
[0127] The timing manager may then synchronize the audio signal with the recorded signal by correlating in time one or more content timestamps of the audio signal with the one or more recording timestamps. For that purpose, the timing manager may match the recording timestamps of the recorded signal to the play-out timestamps of the audio signal and thereby to the associated content timestamps. As such, the audio signal may be synchronized with the recorded signal so as to obtain a synchronized audio signal. It is noted that the matching of the recording timestamps to the play-out timestamps may be a one-to-one matching which may assume no delay existing between the play-out and subsequent recording of the sound signal.
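The one-to-one matching described above may be sketched as follows. This is a hedged illustration, assuming that the timing information is available as a list of (play-out timestamp, content timestamp) pairs, that the offset between the two clocks is known, and that content advances at the same rate as the clock between pairs. The names content_time_at, pairs and clock_offset are hypothetical.

```python
from bisect import bisect_right


def content_time_at(recording_ts, pairs, clock_offset=0.0):
    """Map a recording timestamp to a content timestamp.

    pairs: list of (playout_ts, content_ts) tuples sorted by playout_ts.
    clock_offset: recording clock minus play-out clock (assumed known).
    No acoustic propagation delay is accounted for (one-to-one matching).
    """
    playout_ts = recording_ts - clock_offset
    times = [p for p, _ in pairs]
    # Select the latest pair at or before the play-out time (clamped to range).
    i = bisect_right(times, playout_ts) - 1
    i = max(0, min(i, len(pairs) - 1))
    p0, c0 = pairs[i]
    # Assumption: content time and clock time advance at the same rate.
    return c0 + (playout_ts - p0)
```

For example, with pairs [(1000.0, 0.0), (1010.0, 10.0)], a sample recorded at clock time 1005.0 would be matched to content time 5.0 under these assumptions.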
[0128] In practice, however, there may be a delay constituted at least in part by a propagation time of the sound signal from the speaker to the microphone. By disregarding such a delay, the synchronization may effectively be a coarse synchronization, as previously discussed, thereby yielding a coarsely synchronized audio signal. The timing manager may also compensate for such delay, e.g., by assuming a predefined delay value or by estimating the delay, e.g., by applying a cross-correlation technique to the coarsely synchronized audio signal and the recorded signal to determine the delay.
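The delay estimation mentioned above may, for example, be sketched as a search for the lag that maximizes the cross-correlation between the recorded signal and the coarsely synchronized audio signal. The function name estimate_delay and the parameter max_lag are assumptions for this example; a practical implementation may normalize the correlation or use a transform-based method instead.

```python
def estimate_delay(recorded, reference, max_lag):
    """Return the lag (in samples) at which `reference` best matches `recorded`.

    Only non-negative lags up to max_lag are searched, i.e. the recording is
    assumed to lag behind the coarsely synchronized audio signal.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(recorded) - lag)
        # Unnormalized cross-correlation at this lag.
        score = sum(recorded[lag + k] * reference[k] for k in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

The returned lag may then be used to shift the coarsely synchronized audio signal before noise suppression is applied.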
[0129]
[0130]
[0131]
[0132] It is noted that in the above examples of
[0133] It is further noted that the term play-out timestamp may refer to a timestamp representing the actual time, e.g., in relation to a wall clock, at which the play-out device is presenting. Moreover, the term content timestamp may refer to a timestamp marking a specific point in the content, e.g., the audio signal. An example of a content timestamp is a presentation timestamp included in an MPEG transport stream (TS) for the purpose of synchronizing different elementary streams.
[0134]
[0135] In general, the play-out device 200 may comprise an output interface 210 for outputting the noise suppression data to the communication channel. The play-out device 200 may comprise a clock 220. The clock 220 may be, but does not need to be, synchronized or have a known relation in time with a clock in the recording device. The play-out device 200 may comprise a watermark inserter 250 which may insert one or more watermarks into the audio signal during or prior to play-out and/or prior to transmission via the communication channel. The play-out device 200 may comprise a timestamp function unit 260 which may determine one or more play-out timestamps. The play-out timestamps may be play-out timestamps of watermarks. The timestamp function unit 260 may make use of the clock 220 in determining the play-out timestamps. The timestamp function unit 260 may cooperate with the watermark inserter, e.g., by being integrated therein, to allow the play-out timestamps to be encoded in respective watermarks. The play-out device 200 may comprise a decoder 270. The decoder 270 may be used to decode the audio signal from a received audio stream. The play-out device 200 may comprise an encoder 280. The encoder 280 may be used to encode the audio signal prior to transmission via the communication channel. Such encoding may comprise lossless or lossy compression. The play-out device 200 may comprise an audio buffer 290. The audio buffer 290 may be used to delay the play-out of the audio signal to pre-compensate for a transmission delay of the noise suppression data.
[0136] Although not explicitly shown in
[0137]
[0138] In general, the recording device 300 may comprise an input interface 310 for receiving the noise suppression data from the communication channel. The recording device 300 may comprise a clock 320. The clock 320 may be, but does not need to be, synchronized or have a known relation in time with a clock in the play-out device. The recording device 300 may comprise a timing manager 330 for synchronizing the audio signal with the recorded signal based on timing information. The recording device 300 may comprise a noise suppressor 340 for processing the recorded signal based on the synchronized audio signal to obtain a processed signal in which the recording of the sound signal is suppressed. Together, the timing manager 330 and the noise suppressor 340 may form (part of) a noise suppression subsystem.
[0139] The recording device 300 may comprise an impulse response estimator 342. The impulse response estimator 342 may estimate an impulse response of the speaker, the room and the microphone from the recorded signal. The impulse response may be applied to the (synchronized) audio signal prior to being subtracted from the recorded signal. As such, it may be possible to compensate for the sound signal being recorded no longer perfectly matching the audio signal from which the sound signal originated due to imperfect reproduction by the speaker, reverberations within the room, and imperfect recording by the microphone. The recording device 300 may comprise a watermark detector 350 which may detect one or more watermarks in the recorded signal and/or the (synchronized) audio signal. Alternatively, a combination 352 of watermark detector and timestamp extractor may be provided which may comprise a timestamp extractor 360. The timestamp extractor 360 may extract timestamps from watermarks in cases where the watermarks encode the timestamps. It is noted that the components described in this paragraph may be part of the noise suppression subsystem, also when located externally of the recording device.
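The step of applying an impulse response to the (synchronized) audio signal and subtracting the result from the recorded signal may be sketched as follows. The impulse response is assumed here to be already available as a short list of coefficients; how it is estimated is not shown. The function names apply_impulse_response and subtract_playback are hypothetical.

```python
def apply_impulse_response(audio, impulse_response):
    """Convolve `audio` with `impulse_response` (output has len(audio) samples)."""
    out = []
    for n in range(len(audio)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * audio[n - k]
        out.append(acc)
    return out


def subtract_playback(recorded, audio, impulse_response):
    """Remove the predicted play-out sound from the recorded signal."""
    predicted = apply_impulse_response(audio, impulse_response)
    return [r - p for r, p in zip(recorded, predicted)]
```

If the impulse response matches the actual speaker, room and microphone path, the result of subtract_playback would contain mainly the further sound signal, e.g., the speech of the user.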
[0140] The recording device 300 may comprise a decoder 370 for decoding an encoded audio signal as received via the communication channel. The recording device 300 may comprise a recording buffer 380. The recording buffer 380 may be used to buffer the recorded signal prior to noise suppression so as to account for a transmission delay of the noise suppression data. The recording device 300 may comprise an audio buffer 390. The audio buffer 390 may be used to buffer the audio signal received via the communication channel in cases where it runs ahead of the recorded signal. This may occur when the play-out device delays the play-out of the audio signal with respect to the transmission of the noise suppression data.
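A buffer of the kind described above may be sketched as a fixed-length delay line. This is only an illustration, assuming that the delay is a known whole number of samples; the class name DelayBuffer is hypothetical.

```python
from collections import deque


class DelayBuffer:
    """Delay a sample stream by a fixed number of samples (zero-filled at start)."""

    def __init__(self, delay_samples):
        self._queue = deque([0.0] * delay_samples)

    def push(self, sample):
        """Store the newest sample and return the one delayed by delay_samples."""
        self._queue.append(sample)
        return self._queue.popleft()
```

The same structure could serve to hold back the recorded signal until the noise suppression data arrives, or to hold back an audio signal that runs ahead of the recorded signal.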
[0141] In general, the play-out device may take various forms, such as, but not limited to, a television, a stereo, a computer, etc. The recording device may also take various forms, such as, but not limited to, a computer, a tablet device, a mobile phone, a home phone, etc. In particular, the recording device may be comprised in, or constituted by, a communication device. The communication device may, together with another communication device and optionally a server, form a communication system which enables speech communication between users. In addition to speech communication, the communication system may, but does not need to, provide video communication. For that purpose, the communication device may comprise a camera.
[0142]
[0143] Alternatively, the noise suppression data may comprise a reference 412 to the audio signal from which the audio signal may be accessed. The reference 412 may be a reference to a resource. The resource may be a network resource such as a streaming server. For example, the reference may be to a stream representing a broadcast of a television channel, a stream representing a broadcast of a radio channel, or to a video-on-demand stream, etc. The content timestamps may be the timestamps originally present in the audio signal or its stream before reception by the play-out device. Watermarks may also be present in the audio signal, in which case the play-out device may make use of the watermarks. Also, in such a case, it may not be needed for the play-out device itself to insert watermarks in the audio signal.
[0144] It is noted that the audio signal accessed on the resource may comprise the same content timestamps as the audio signal available to the play-out device. For example, in case the content timestamps are constituted by presentation timestamps included in an MPEG transport stream, the play-out device and the noise suppression subsystem may have access to the same content timestamps when accessing the MPEG transport stream. Accordingly, the play-out device may directly use the content timestamps in generating the timing information. Alternatively, if the audio signal accessed by the noise suppression subsystem comprises different content timestamps than those available to the play-out device, these different content timestamps may be correlated in time using correlation information. Such correlation information is described in WO 2010/106075 A1 for the purpose of media stream synchronization, and may be used to correlate the content timestamps at the play-out device to the (different) content timestamps at the noise suppression subsystem.
[0145] The noise suppression data 400 is further shown to comprise the timing information 420. The timing information 420 may comprise one or more play-out timestamps. In addition, the timing information 420 may comprise one or more content timestamps which are associated with the one or more play-out timestamps, or may comprise other information which may enable the timing manager to associate the play-out timestamps with the content timestamps of the audio signal 412. The timing information 420 may be formatted as a metadata stream. Accordingly, the play-out device may stream the timing information 420 via the communication channel. The metadata stream may be multiplexed with the audio stream to obtain a multiplexed stream such as an MPEG Transport Stream (TS). Such multiplexing may take place in cases where the audio signal 412 does not comprise content timestamps. Accordingly, the play-out timestamps or other information provided by the timing information 420 may be associated with respective parts of the audio signal 412.
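As a hedged illustration of how noise suppression data of this kind might be represented in software, the following sketch uses simple data classes. The class and field names (TimingEntry, NoiseSuppressionData, audio_samples, audio_reference, timing) are assumptions for this example and do not reflect any defined format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TimingEntry:
    """One item of timing information."""
    playout_ts: float                   # clock time at play-out
    content_ts: Optional[float] = None  # associated content timestamp, if any


@dataclass
class NoiseSuppressionData:
    """The audio signal itself, or a reference to it, plus timing information."""
    audio_samples: Optional[List[float]] = None
    audio_reference: Optional[str] = None  # e.g., a stream locator (hypothetical)
    timing: List[TimingEntry] = field(default_factory=list)

    def __post_init__(self):
        # At least the audio signal or a reference to it must be present.
        if self.audio_samples is None and self.audio_reference is None:
            raise ValueError("audio_samples or audio_reference is required")
```

A content timestamp may be left empty in entries where the association with the audio signal is instead established by multiplexing, as described above.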
[0146] In general, the noise suppression data may comprise i) an audio stream representing the audio signal, the audio stream comprising content timestamps, and ii) a metadata stream representing the timing information, the metadata stream comprising at least one combination of a play-out timestamp and a content timestamp. Alternatively, the noise suppression data may comprise i) an audio stream representing the audio signal and ii) a metadata stream representing the timing information, the metadata stream comprising at least one play-out timestamp, the metadata stream being multiplexed with the audio stream so as to associate the at least one play-out timestamp with respective part(s) of the audio signal. The audio stream may comprise a watermark, e.g., as described with reference to
[0147]
[0148] The operations of the method 500 may be performed in any suitable order. For example, the obtaining 510 of the recorded signal and the obtaining 520 of the noise suppression data may be performed sequentially, or in parallel.
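The ordering of the operations may be sketched as follows, with the two obtaining operations performed in parallel before synchronization and processing. All functions are placeholders with made-up data, and the known whole-sample delay and the plain subtraction are simplifying assumptions for this example only.

```python
from concurrent.futures import ThreadPoolExecutor


def obtain_recorded_signal():
    # Placeholder for operation 510 (assumed data).
    return [0.0, 0.5, 0.7, 0.2]


def obtain_noise_suppression_data():
    # Placeholder for operation 520 (assumed data and field names).
    return {"audio": [0.5, 0.5, 0.0], "delay_samples": 1}


def synchronize(audio, delay_samples, length):
    # Operation 530: shift the audio signal and fit it to the recording length.
    padded = [0.0] * delay_samples + list(audio)
    return (padded + [0.0] * length)[:length]


def process(recorded, synchronized):
    # Operation 540, reduced here to a plain sample-wise subtraction.
    return [r - s for r, s in zip(recorded, synchronized)]


def run_method():
    with ThreadPoolExecutor(max_workers=2) as pool:
        recorded_future = pool.submit(obtain_recorded_signal)
        data_future = pool.submit(obtain_noise_suppression_data)
        recorded = recorded_future.result()
        data = data_future.result()
    synced = synchronize(data["audio"], data["delay_samples"], len(recorded))
    return process(recorded, synced)
```

Performing the two obtaining operations sequentially instead would yield the same result in this sketch.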
[0149] It will be appreciated that a method according to the invention may be implemented in the form of a computer program which comprises instructions for causing a processor system to perform the method. The method may also be implemented in hardware, or as a combination of hardware and software.
[0150] The computer program may be stored in a non-transitory manner on a computer readable medium. Said non-transitory storing may comprise providing a series of machine readable physical marks and/or a series of elements having different electrical, e.g., magnetic, or optical properties or values.
[0151] It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments.
[0152] In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb "comprise" and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.