RECOGNITION APPARATUS, RECOGNITION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
20210397649 · 2021-12-23
CPC classification
A61B5/6803
HUMAN NECESSITIES
G06F18/21
PHYSICS
A61B5/7264
HUMAN NECESSITIES
G06F21/32
PHYSICS
A61B5/126
HUMAN NECESSITIES
International classification
A61B5/00
HUMAN NECESSITIES
Abstract
A recognition apparatus 100 for ear acoustic recognition includes a feature normalizer 101 that reads input ear acoustic data and removes the earphone's resonance effect from the input ear acoustic data to produce normalized data at the output, a feature extractor 102 that extracts acoustic features from the normalized data, and a classifier 103 that reads the acoustic features as input and classifies them into their corresponding class.
Claims
1. A recognition apparatus for ear acoustic recognition comprising: a feature normalizer that reads input ear acoustic data and removes the earphone's resonance effect from the input ear acoustic data to produce normalized data at the output; a feature extractor that extracts acoustic features from the normalized data; and a classifier that reads the acoustic features as input and classifies them into their corresponding class.
2. The recognition apparatus according to claim 1, wherein the feature normalizer reads the input ear acoustic data along with the type of earphone used for capturing the input ear acoustic data, searches for the earphone's acoustic resonance in a dictionary of acoustic resonances of various earphones, removes the searched earphone's resonance from the input ear acoustic data, and produces the normalized ear acoustic data at the output.
3. The recognition apparatus according to claim 2, wherein the acoustic resonances of earphones in the dictionary are made by capturing acoustic responses of a hollow tube with the earphones attached in it and separating the acoustic resonances of the earphones from that of the hollow tube.
4. The recognition apparatus according to claim 3, wherein the acoustic resonances of earphones are obtained by blind source separation that extracts signal components which are common over earphones and signal components which are unique to individual earphones from captured acoustic responses.
5. The recognition apparatus according to claim 4, wherein the acoustic resonances of earphones are obtained by using non-negative matrix factorization as a blind source separation technique.
6. A recognition method for ear acoustic recognition comprising: reading input ear acoustic data and removing the earphone's resonance effect from the input ear acoustic data to produce normalized data at the output; extracting acoustic features from the normalized data; and reading the acoustic features as input and classifying them into their corresponding class.
7. The recognition method according to claim 6, wherein in the reading, reading the input ear acoustic data along with the type of earphone used for capturing the input ear acoustic data, searching for the earphone's acoustic resonance in a dictionary of acoustic resonances of various earphones, removing the searched earphone's resonance from the input ear acoustic data, and producing the normalized ear acoustic data at the output.
8. The recognition method according to claim 7, wherein in the reading, the acoustic resonances of earphones in the dictionary are made by capturing acoustic responses of a hollow tube with the earphones attached in it and separating the acoustic resonances of the earphones from that of the hollow tube.
9. The recognition method according to claim 8, wherein in the reading, the acoustic resonances of earphones are obtained by blind source separation that extracts signal components which are common over earphones and signal components which are unique to individual earphones from captured acoustic responses.
10. The recognition method according to claim 9, wherein in the reading, the acoustic resonances of earphones are obtained by using non-negative matrix factorization as a blind source separation technique.
11. A non-transitory computer-readable medium having recorded thereon a program for ear acoustic recognition by a computer, the program including instructions for causing the computer to execute: reading input ear acoustic data and removing the earphone's resonance effect from the input ear acoustic data to produce normalized data at the output; extracting acoustic features from the normalized data; and reading the acoustic features as input and classifying them into their corresponding class.
12. The non-transitory computer-readable medium according to claim 11, wherein in the reading, reading the input ear acoustic data along with the type of earphone used for capturing the input ear acoustic data, searching for the earphone's acoustic resonance in a dictionary of acoustic resonances of various earphones, removing the searched earphone's resonance from the input ear acoustic data, and producing the normalized ear acoustic data at the output.
13. The non-transitory computer-readable medium according to claim 12, wherein in the reading, the acoustic resonances of earphones in the dictionary are made by capturing acoustic responses of a hollow tube with the earphones attached in it and separating the acoustic resonances of the earphones from that of the hollow tube.
14. The non-transitory computer-readable medium according to claim 13, wherein in the reading, the acoustic resonances of earphones are obtained by blind source separation that extracts signal components which are common over earphones and signal components which are unique to individual earphones from captured acoustic responses.
15. The non-transitory computer-readable medium according to claim 14, wherein in the reading, the acoustic resonances of earphones are obtained by using non-negative matrix factorization as a blind source separation technique.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0042] The drawings, together with the detailed description, serve to explain the principles of the inventive method. The drawings are for illustration and do not limit the application of the technique.
DETAILED DESCRIPTION
[0051] Principle of the Invention
[0052] To solve the technical problems discussed above, an overall approach is summarized here. The approach has two stages: a training stage and a test stage.
[0053] In the training stage, a feature normalization block reads the training ear acoustic data and produces normalized data as output by removing the earphone's resonance effect. An acoustic feature extractor reads the normalized data as input and extracts the corresponding acoustic features.
[0054] A classifier reads the extracted features as input and estimates their class labels. An objective function calculator reads the original labels of the input features and the class labels estimated by the classifier, and calculates the cost of the classification as the classification error between the original labels and the estimated class labels.
[0055] A parameter updater updates the parameters of the classifier so as to minimize the cost function. This process is repeated until convergence. After convergence, the parameter updater stores the parameters of the classifier in storage.
[0056] In the test stage, the feature normalization block reads the given test acoustic data and produces normalized data. Then, the feature extractor reads the normalized data as input and extracts the corresponding acoustic features. Following this, the classifier reads the extracted acoustic features as input and predicts the corresponding class.
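By way of a non-limiting illustration, the following Python sketch shows one possible arrangement of the two stages described above. The helper names (normalize, extract_features), the use of direct spectral subtraction for normalization, and the nearest-centroid classifier are assumptions introduced only for this sketch; the description above permits, for example, support vector machines or neural networks instead.

```python
# Illustrative sketch of the training and test stages (assumed helper names).
import numpy as np

def normalize(ear_acoustic, earphone_resonance):
    # Feature normalization: remove the earphone's resonance (direct
    # subtraction in the magnitude-spectrum domain is assumed here).
    return np.maximum(ear_acoustic - earphone_resonance, 0.0)

def extract_features(normalized):
    # Acoustic feature extraction: a log-magnitude spectrum is assumed here.
    return np.log(normalized + 1e-8)

class NearestCentroidClassifier:
    # A deliberately simple stand-in for the SVM / neural network classifier.
    def fit(self, features, labels):
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack(
            [features[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, features):
        dists = np.linalg.norm(
            features[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dists, axis=1)]

def train(train_data, train_labels, earphone_resonance):
    # Training stage: normalize, extract features, and fit the classifier.
    feats = np.stack([extract_features(normalize(x, earphone_resonance))
                      for x in train_data])
    return NearestCentroidClassifier().fit(feats, train_labels)

def test(classifier, test_data, earphone_resonance):
    # Test stage: the same normalization and feature extraction, then prediction.
    feats = np.stack([extract_features(normalize(x, earphone_resonance))
                      for x in test_data])
    return classifier.predict(feats)
```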
[0057] The feature normalization block performs two-step processing. The first step is to prepare a dictionary of acoustic resonances of various kinds of earphones. This step is carried out before the block is used in the ear acoustic recognition system.
[0058] In this step, first, a collector collects the acoustic responses of a hollow cylindrical tube with the help of a mic-integrated earphone by transmitting white noise. Secondly, a separator performs source separation on each of the recorded acoustic responses of the hollow tube to separate the resonance of the earphone from that of the captured hollow tube, using signal processing such as non-negative matrix factorization source separation. Thirdly, a storage stores the separated acoustic resonance of the earphone in the dictionary with the type of earphone as the label.
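A minimal sketch of this first (dictionary preparation) step is given below, under the assumption that each captured tube response is available as a magnitude spectrum; collect_tube_response and separate_earphone_resonance are hypothetical placeholder functions, and taking the element-wise minimum across earphones as the common tube component is an illustrative simplification of the separation described above.

```python
# Illustrative dictionary preparation (Step 1); names and shapes are assumptions.
import numpy as np

def collect_tube_response(earphone_type):
    # Placeholder for the collector: in practice, white noise is played through
    # the mic-integrated earphone attached to the hollow tube and the response
    # is recorded. A random magnitude spectrum stands in for the recording here.
    return np.random.default_rng().random(257)

def separate_earphone_resonance(tube_response, common_tube_resonance):
    # Placeholder for the separator: removing the component common to all
    # earphones (the tube's own resonance) leaves the earphone's resonance.
    return np.maximum(tube_response - common_tube_resonance, 0.0)

def build_resonance_dictionary(earphone_types):
    responses = {t: collect_tube_response(t) for t in earphone_types}
    # The component shared by all recordings approximates the tube resonance.
    common = np.min(np.stack(list(responses.values())), axis=0)
    # Store each earphone's separated resonance with its type as the label.
    return {t: separate_earphone_resonance(r, common)
            for t, r in responses.items()}

resonance_dictionary = build_resonance_dictionary(["earphone_A", "earphone_B"])
```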
[0059] The second step of the block is performed in the system during both the training and test stages for the normalization of the input ear acoustic features. In this step, a resonance remover reads the input ear acoustic data and the type of earphone used to capture it.
[0060] Then, it looks up the acoustic resonance of the used earphone in the dictionary prepared in Step 1. After that, the remover removes the earphone's resonance from the input data and outputs the normalized data. The remover can use direct subtraction or a source separation technique for the removal.
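The second (normalization) step can be pictured with the following minimal sketch, which looks up the resonance by earphone type and removes it by direct subtraction; the dictionary format and the names carried over from the sketch above are assumptions, not a prescribed implementation.

```python
# Illustrative resonance remover (Step 2); direct subtraction is assumed.
import numpy as np

def remove_resonance(ear_acoustic, earphone_type, resonance_dictionary):
    # Look up the acoustic resonance of the earphone used for the capture.
    resonance = resonance_dictionary[earphone_type]
    # Remove it from the input ear acoustic data; a source separation
    # technique could be used here instead of direct subtraction.
    return np.maximum(ear_acoustic - resonance, 0.0)

# Example (assumed inputs):
# normalized = remove_resonance(captured_spectrum, "earphone_A", resonance_dictionary)
```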
Embodiment
[0061] Hereinafter, a recognition apparatus, a recognition method, and a program of exemplary embodiments of the present invention will be described in detail with reference to
Device Configuration
[0062] First, the schematic configuration of the recognition apparatus of the embodiment will be described.
[0063] A recognition apparatus 100 of the embodiment shown in
[0064] The feature normalizer 101 reads input ear acoustic data and removes the earphone's resonance effect from the input ear acoustic data to produce normalized data at the output. The feature extractor 102 extracts acoustic features from the normalized data. The classifier 103 reads the acoustic features as input and classifies them into their corresponding class.
[0065] In this way, with the recognition apparatus 100, the resonance effect of the earphone is removed from the acoustic data. For this reason, it is possible to improve pattern recognition accuracy.
[0066] Next, the configuration of the recognition apparatus 100 of the embodiment will be described in detail with reference to
[0068] As shown in
[0069] In the training stage, the feature normalizer 101 reads captured ear acoustic data x and the type t of earphone used for capturing the data. Then, the feature normalizer 101 looks up the resonance of earphone t, removes it from the input ear acoustic data, and produces the normalized ear acoustic data y at the output.
[0070] The feature extractor 102 reads the normalized acoustic data y as input and extracts acoustic features z at the output. The classifier 103 receives the extracted acoustic features z as input and classifies them into their corresponding classes o. The classifier 103 can be any classifier, such as a support vector machine or a neural network.
[0071] The objective function calculator 104 calculates a cost 1041 as a classification error 1042 between the estimated classes o of the input features and the original class labels l. The parameter updater 105 updates the parameters of the classifier according to the minimization of the cost. This process continues until convergence, when the cost function can no longer be reduced. After convergence, the parameter updater 105 stores the parameters of the trained classifier in the storage 106.
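As a non-limiting sketch of the interplay between the classifier 103, the objective function calculator 104, the parameter updater 105, and the storage 106, the following trains a simple softmax classifier by gradient descent until the cost no longer decreases; the cross-entropy cost, learning rate, tolerance, and file name are assumptions made only for illustration.

```python
# Illustrative training loop: classifier (103), objective function calculator
# (104), and parameter updater (105) iterating until convergence.
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_classifier(z, labels, lr=0.1, tol=1e-6, max_iter=10000):
    labels = np.asarray(labels)
    n, d = z.shape
    classes = np.unique(labels)
    one_hot = (labels[:, None] == classes[None, :]).astype(float)
    W = np.zeros((d, len(classes)))              # classifier parameters
    prev_cost = np.inf
    for _ in range(max_iter):
        o = softmax(z @ W)                       # classifier 103: estimated classes o
        cost = -np.mean(np.sum(one_hot * np.log(o + 1e-12), axis=1))  # calculator 104
        if prev_cost - cost < tol:               # convergence: cost no longer reduced
            break
        W -= lr * (z.T @ (o - one_hot) / n)      # parameter updater 105
        prev_cost = cost
    np.save("classifier_parameters.npy", W)      # storage 106 (assumed file name)
    return W, classes
```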
[0072] In the test stage, the feature normalizer 101 reads the input test data x′ and produces normalized data y′ as output. The feature extractor 102 reads the normalized data as input and extracts the corresponding features z′ at the output. The classifier 103 reads its stored structure and parameters from the storage 106. The classifier 103 then reads the test acoustic features as input and predicts their class o′ at the output.
[0074] The first step is the preparation of the resonance dictionary, using the collector 1011, which collects the acoustic resonance of a hollow tube into the storage 1012, the separator 1013, and the storage 1014. The second step is resonance removal using the resonance remover 1015.
[0075] In the first step, the collector 1011 collects the acoustic responses of a hollow cylindrical tube with the help of a mic-integrated earphone by transmitting white noise, and stores them in the storage 1012.
[0076] Then, the separator 1013 performs source separation on each of the recorded acoustic responses of the hollow tube to separate the resonance of the earphone from that of the captured hollow tube, using signal processing such as non-negative matrix factorization (NMF) source separation.
[0077] NMF reads the spectrogram of the captured acoustic data and performs source separation on it to produce two spectrograms at the output, corresponding to two sources. One source is common among all the inputs, namely the air resonance of the hollow tube, and the other source is the earphone's acoustic resonance. The separated acoustic resonance of the earphone is stored in the dictionary in the storage 1014, with the type of earphone as the label.
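A minimal sketch of such an NMF separation, using the standard multiplicative update rules on a magnitude spectrogram with a rank of two, is shown below; which of the two separated components corresponds to the tube's air resonance and which to the earphone's resonance is decided, as described above, by comparing the components obtained across different earphones. The rank, iteration count, and stand-in input data are illustrative assumptions.

```python
# Illustrative NMF separation of a captured magnitude spectrogram V
# (frequency x time) into two components: the shared hollow-tube resonance
# and the earphone's own resonance.
import numpy as np

def nmf(V, rank=2, n_iter=200, eps=1e-12):
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps   # spectral bases (columns)
    H = rng.random((rank, T)) + eps   # activations (rows)
    for _ in range(n_iter):
        # Multiplicative updates minimizing the Euclidean distance ||V - WH||.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Stand-in spectrogram for one earphone's recorded tube response.
V = np.random.default_rng(1).random((257, 100))
W, H = nmf(V, rank=2)
source_spectrograms = [np.outer(W[:, k], H[k, :]) for k in range(2)]
# Which basis is the tube's air resonance (common over earphones) and which is
# the earphone's resonance is decided by comparing the bases W obtained from
# recordings made with different earphones.
```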
[0078] In the second step, the resonance remover 1015 reads the input ear acoustic data and the type of earphone used to capture it. Then, the resonance remover 1015 looks up the acoustic resonance of the used earphone in the storage 1014, which holds the resonance dictionary.
[0079] After that, the resonance remover 1015 removes the obtained earphone's resonance from the input data and outputs the normalized data. The remover can use direct subtraction or a source separation technique for the removal; spectrograms of the ear acoustics are taken as input.
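Beyond direct subtraction, the removal can also be sketched as a supervised source separation in the spectrogram domain: the earphone's resonance from the dictionary is held fixed as one NMF basis while an ear-specific basis and the activations are estimated, and only the ear-specific component is kept as the normalized output. The following minimal sketch assumes the dictionary stores one spectral basis vector per earphone; all names are illustrative.

```python
# Illustrative removal by supervised NMF: the earphone's resonance spectrum
# from the dictionary is held fixed and only the remaining parts are learned.
import numpy as np

def remove_resonance_nmf(spectrogram, earphone_basis, n_iter=100, eps=1e-12):
    F, T = spectrogram.shape
    rng = np.random.default_rng(0)
    W = np.hstack([earphone_basis.reshape(F, 1) + eps,   # fixed earphone basis
                   rng.random((F, 1)) + eps])            # learned ear-specific basis
    H = rng.random((2, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ spectrogram) / (W.T @ W @ H + eps)
        W_new = W * ((spectrogram @ H.T) / (W @ H @ H.T + eps))
        W[:, 1] = W_new[:, 1]        # only the ear-specific basis is updated
    # The normalized output keeps only the ear-specific component.
    return np.outer(W[:, 1], H[1, :])
```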
Device Operation
[0080] Next, operations performed by the recognition apparatus 100 of the embodiment will be described with reference to
[0081] First, with reference to
[0082] In the training stage, the feature normalizer 101 reads the training ear acoustic data and the type of earphone used to capture the data (step A01). Next, the feature normalizer 101 produces normalized data as output by removing the earphone's resonance effect (step A02). Next, the acoustic feature extractor 102 reads the normalized data as input and extracts the corresponding acoustic features (step A03).
[0083] Then, the classifier 103 reads the extracted features as input and estimates their class labels (step A04). Next, the objective function calculator 104 reads the original labels of the input features and the class labels estimated by the classifier. The objective function calculator 104 calculates the cost of the classification as the classification error between the original labels and the estimated class labels (step A05).
[0084] Then, the parameter updater 105 updates the parameters of the classifier 103 according to the minimization of the cost function (step A06). The parameter updater 105 keeps executing step A06 until the parameters of the classifier 103 converge (step A07). After convergence, the parameter updater 105 stores the parameters of the classifier 103 in the storage 106 (step A08).
[0085] Next, with reference to
[0086] As shown in
[0087] Next, the acoustic feature extractor 102 reads the normalized data as input and extracts the corresponding features at the output (step B04). After that, the classifier 103 reads its stored structure and parameters from the storage 106. The classifier 103 reads the test acoustic features as input and predicts their class at the output (step B05).
[0088] Second flow chart
[0089] As shown in
[0090] Next, the acoustic feature extractor 102 reads the normalized data as input and extracts the corresponding features at the output (step C04). Then, the classifier 103 reads its stored structure and parameters from the storage. Next, the classifier 103 reads the test acoustic features as input and transforms them into discriminative features using its trained matrix (step C05).
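A minimal sketch of this alternative output mode, assuming the trained parameter matrix W from the earlier training sketch, is given below; projecting the test features through W yields the pre-softmax scores, which serve here as the discriminative features.

```python
# Illustrative transformation of test features into discriminative features
# using the trained parameter matrix W (assumed from the training sketch above).
import numpy as np

def to_discriminative_features(z_test, W):
    # Project the acoustic features through the trained matrix; the resulting
    # scores can then be compared or thresholded for recognition.
    return np.asarray(z_test) @ W
```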
Program
[0091] It is sufficient that the program of the embodiment is a program for causing a computer to execute steps A01 to A08 shown in
[0092] Note that the program of the embodiment may be executed by a computer system that is constituted by multiple computers. In this case, the computers may respectively function as the feature normalizer 101, the feature extractor 102, the classifier 103, the objective function calculator 104, and the parameter updater 105, for example.
Physical Configuration
[0093] The following describes a computer that realizes the recognition apparatus by executing the program of the embodiment, with reference to
[0094] As shown in
[0095] The CPU 11 deploys programs (code) of this embodiment, which are stored in the storage device 13, to the main memory 12, and executes various types of calculation by executing the programs in a predetermined order. The main memory 12 is typically a volatile storage device such as a DRAM (Dynamic Random-Access Memory). The programs of this embodiment are provided in a state of being stored in a computer-readable recording medium 20. Note that the programs of this embodiment may be distributed over the Internet, which is accessed via the communication interface 17.
[0096] Other specific examples of the storage device 13 include a hard disk and a semiconductor storage device such as a flash memory. The input interface 14 mediates the transfer of data between the CPU 11 and an input device 18 such as a keyboard or a mouse. The display controller 15 is connected to a display device 19, and controls screens displayed by the display device 19.
[0097] The data reader/writer 16 mediates the transfer of data between the CPU 11 and the recording medium 20, and executes the readout of programs from the recording medium 20 and the writing of processing results obtained by the computer 10 to the recording medium 20. The communication interface 17 mediates the transfer of data between the CPU 11 and another computer.
[0098] Specific examples of the recording medium 20 include a general-purpose semiconductor storage device such as a CF (Compact Flash (registered trademark)) card or an SD (Secure Digital) card, a magnetic storage medium such as a Flexible Disk, and an optical storage medium such as a CD-ROM (Compact Disk Read Only Memory).
[0099] Note that the recognition apparatus of the above embodiments can also be realized by using hardware that corresponds to the various units, instead of a computer in which a program is installed. Furthermore, a configuration is possible in which a portion of the recognition apparatus is realized by a program, and the remaining portion is realized by hardware.
[0100] Part or all of the embodiments described above can be realized by Supplementary Notes 1 to 15 described below, but the present invention is not limited to the following descriptions.
(Supplementary Note 1)
[0101] A recognition apparatus for ear acoustic recognition comprising:
[0102] a feature normalizer that reads input ear acoustic data and removes the earphone's resonance effect from the input ear acoustic data to produce normalized data at the output;
[0103] a feature extractor that extracts acoustic features from the normalized data;
[0104] and a classifier that reads the acoustic features as input and classifies them into their corresponding class.
(Supplementary Note 2)
[0105] The recognition apparatus according to supplementary note 1,
[0106] wherein the feature normalizer reads the input ear acoustic data along with the type of earphone used for capturing the input ear acoustic data, searches for the earphone's acoustic resonance in a dictionary of acoustic resonances of various earphones, removes the searched earphone's resonance from the input ear acoustic data, and produces the normalized ear acoustic data at the output.
(Supplementary Note 3)
[0107] The recognition apparatus according to supplementary note 2,
[0108] wherein the acoustic resonances of earphones in the dictionary are made by capturing acoustic responses of a hollow tube with the earphones attached in it and separating the acoustic resonances of the earphones from that of the hollow tube.
(Supplementary Note 4)
[0109] The recognition apparatus according to supplementary note 3,
[0110] wherein the acoustic resonances of earphones are obtained by blind source separation that extracts signal components which are common over earphones and signal components which are unique to individual earphones from captured acoustic responses.
(Supplementary Note 5)
[0111] The recognition apparatus according to supplementary note 4,
[0112] wherein the acoustic resonances of earphones are obtained by using non-negative matrix factorization as a blind source separation technique.
(Supplementary Note 6)
[0113] A recognition method for ear acoustic recognition comprising:
[0114] (a) a step of reading input ear acoustic data and removing the earphone's resonance effect from the input ear acoustic data to produce normalized data at the output;
[0115] (b) a step of extracting acoustic features from the normalized data;
[0116] (c) a step of reading the acoustic features as input and classifying them into their corresponding class.
(Supplementary Note 7)
[0117] The recognition method according to supplementary note 6,
[0118] wherein in the step (a), reading the input ear acoustic data along with the type of earphone used for capturing the input ear acoustic data, searching for the earphone's acoustic resonance in a dictionary of acoustic resonances of various earphones, removing the searched earphone's resonance from the input ear acoustic data, and producing the normalized ear acoustic data at the output.
(Supplementary Note 8)
[0119] The recognition method according to supplementary note 7,
[0120] wherein in the step (a), the acoustic resonances of earphones in the dictionary are made by capturing acoustic responses of a hollow tube with the earphones attached in it and separating the acoustic resonances of the earphones from that of the hollow tube.
(Supplementary Note 9)
[0121] The recognition method according to supplementary note 8,
[0122] wherein in the step (a), the acoustic resonances of earphones are obtained by blind source separation that extracts signal components which are common over earphones and signal components which are unique to individual earphones from captured acoustic responses.
(Supplementary Note 10)
[0123] The recognition method according to supplementary note 9,
[0124] wherein in the step (a), the acoustic resonances of earphones are obtained by using non-negative matrix factorization as a blind source separation technique.
(Supplementary Note 11)
[0125] A computer-readable medium having recorded thereon a program for ear acoustic recognition by a computer, the program including instructions for causing the computer to execute:
[0126] (a) a step of reading input ear acoustic data and removing the earphone's resonance effect from the input ear acoustic data to produce normalized data at the output;
[0127] (b) a step of extracting acoustic features from the normalized data;
[0128] (c) a step of reading the acoustic features as input and classifying them into their corresponding class.
(Supplementary Note 12)
[0129] The computer-readable medium according to supplementary note 11,
[0130] wherein in the step (a), reading the input ear acoustic data along with the type of earphone used for capturing the input ear acoustic data, searching for the earphone's acoustic resonance in a dictionary of acoustic resonances of various earphones, removing the searched earphone's resonance from the input ear acoustic data, and producing the normalized ear acoustic data at the output.
(Supplementary Note 13)
[0131] The computer-readable medium according to supplementary note 12,
[0132] wherein in the step (a), the acoustic resonances of earphones in the dictionary are made by capturing acoustic responses of a hollow tube with the earphones attached in it and separating the acoustic resonances of the earphones from that of the hollow tube.
(Supplementary Note 14)
[0133] The computer-readable medium according to supplementary note 13,
[0134] wherein in the step (a), the acoustic resonances of earphones are obtained by blind source separation that extracts signal components which are common over earphones and signal components which are unique to individual earphones from captured acoustic responses.
(Supplementary Note 15)
[0135] The computer-readable medium according to supplementary note 14,
[0136] wherein in the step (a), the acoustic resonances of earphones are obtained by using non-negative matrix factorization as a blind source separation technique.
[0137] As a final point, it should be clear that the processes, techniques, and methodology described and illustrated here are not limited or related to a particular apparatus. They can be implemented using a combination of components. Also, various types of general-purpose devices may be used in accordance with the instructions herein. The present invention has also been described using a particular set of examples.
[0138] However, these are merely illustrative and not restrictive. For example, the described software may be implemented in a wide variety of languages, such as C++, Java, Python, and Perl. Moreover, other implementations of the inventive technology will be apparent to those skilled in the art.
INDUSTRIAL APPLICABILITY
[0139] According to the present invention, it is possible to remove the resonance effect of the earphone from acoustic data. The present invention is useful in ear acoustic recognition.
REFERENCE SIGNS LIST
[0140] 10 Computer
[0141] 11 CPU
[0142] 12 Main memory
[0143] 13 Storage device
[0144] 14 Input interface
[0145] 15 Display controller
[0146] 16 Data reader/writer
[0147] 17 Communication interface
[0148] 18 Input device
[0149] 19 Display device
[0150] 20 Recording medium
[0151] 21 Bus
[0152] 100 Recognition apparatus
[0153] 101 Feature normalizer
[0154] 102 Feature extractor
[0155] 103 Classifier
[0156] 104 Objective function calculator
[0157] 105 Parameter updater
[0158] 106 Storage
[0159] 1011 Collector
[0160] 1012 Storage
[0161] 1013 Separator
[0162] 1014 Storage
[0163] 1015 Resonance remover