Music Detection
20180342260 · 2018-11-29
CPC classification
G10L25/18
PHYSICS
G10H2210/046
PHYSICS
Abstract
The invention provides a method for detecting music in audio speech processing by decomposing an audio signal into component signals in one or more bandwidths. The invention then detects energy levels across preselected time and frequency windows within the narrowest bandwidth components. A predetermined number of detections at predetermined detection levels indicates that music is likely present in that window.
Claims
1. A method for detecting music, comprising decomposing a first signal into wide bandwidth components; medium bandwidth components; and narrow bandwidth components; subtracting said wide bandwidth components from said first signal to form a second signal; subtracting said medium bandwidth components from said second signal to form a third signal; detecting narrow bandwidth components from said third signal; summing said narrow bandwidth components from said third signal over a predetermined time period and predetermined frequency range; and determining music is present in said first signal within said predetermined time period when said summing exceeds a predetermined threshold.
2. In the method of claim 1, said predetermined time period is determined by the temporal length of a search window.
3. In the method of claim 1, said predetermined frequency range is determined by an upper and a lower frequency for a search window.
4. In the method of claim 1, said predetermined threshold is determined by setting a number of narrow bandwidth detections within a search window.
5. An article of manufacture comprising a non-transitory storage medium and a plurality of programming instructions stored therein, said programming instructions being configured to program an apparatus to implement on said apparatus one or more subsystems or services, including: decomposition of a first signal into wide bandwidth components; medium bandwidth components; and narrow bandwidth components; subtraction of said wide bandwidth components from said first signal to form a second signal; subtraction of said medium bandwidth components from said second signal to form a third signal; detection of narrow bandwidth components from said third signal; summation of said narrow bandwidth components from said third signal over a predetermined time period and predetermined frequency range; and determination that music is present in said first signal within said predetermined time period when said summation exceeds a predetermined threshold.
6. In the article of manufacture of claim 5, said predetermined time period is determined by the temporal length of a search window.
7. In the article of manufacture of claim 5, said predetermined frequency range is determined by an upper and a lower frequency for a search window.
8. In the article of manufacture of claim 5, said predetermined threshold is determined by setting a number of narrow bandwidth detections within a search window.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0017] The invention described herein provides a capability to detect music signals where the music signal is considered an interfering signal. The present invention does not attempt to identify the genre of music, nor does it attempt to remove or mitigate the music signal. For some applications, music may be considered an interfering signal or background noise. For other applications, music detection may provide search capabilities to locate songs or genres of interest.
[0018] Most prior-art approaches to music detection compute several features and feed them to a classifier. The present invention avoids the need to provide music and non-music examples to train a classifier, such as a neural network. Instead, the present approach defines what music is, which makes it robust to varying recording settings, contaminating signals, and other artifacts. The goal is an accurate music detection algorithm that works in poor conditions and also succeeds in clean recording environments.
[0019] Referring to
[0020] The present invention begins the music detection process by decomposing an input signal into wide, medium, and narrow band components. The Adjustable Bandwidth Concept (ABC) (see U.S. Pat. No. 5,257,211) is one technique for this: it performs automated spectral decomposition and requires little or no a priori knowledge about the digital signal. By estimating an individual noise threshold for each file, the ABC algorithm finds narrowband signals that are buried in wider-bandwidth, noisy signals. This avoids requiring an operator to adjust multiple (and often confusing) parameters. Because no assumptions are made about the type of signal, noise, or interference, the ABC algorithm can succeed even when multiple spectrally overlapping, time-coincident signals are present.
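The decompose-and-subtract structure of claim 1 can be sketched as follows. This is a minimal sketch and not the ABC algorithm of U.S. Pat. No. 5,257,211: as a stand-in, it assumes median filtering along frequency, with a wide kernel estimating the wide bandwidth components and a narrower kernel estimating the medium bandwidth components, applied to one spectrum frame given as a list of power values. The function names, kernel widths, and the median-based noise floor (`k`, `min_level`) are hypothetical.

```python
from statistics import median


def median_filter(values, width):
    """Centered running median; the window is truncated at the edges."""
    half = width // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def narrowband_residual(spectrum, wide_width=31, medium_width=7):
    """Return the 'third signal': spectrum minus wide, then minus medium components."""
    wide = median_filter(spectrum, wide_width)             # wide bandwidth estimate
    second = [s - w for s, w in zip(spectrum, wide)]       # first signal - wide components
    medium = median_filter(second, medium_width)           # medium bandwidth estimate
    return [s - m for s, m in zip(second, medium)]         # second signal - medium components


def detect_narrowband(spectrum, wide_width=31, medium_width=7, k=4.0, min_level=1e-6):
    """Return indices of bins whose residual exceeds a per-frame noise threshold.

    The threshold (k times the median absolute residual, floored at min_level)
    is an illustrative assumption, not the ABC per-file noise threshold.
    """
    third = narrowband_residual(spectrum, wide_width, medium_width)
    noise = median([abs(x) for x in third])
    threshold = max(k * noise, min_level)
    return [i for i, x in enumerate(third) if x > threshold]
```

Median filtering is chosen here because a single narrow peak barely moves the median of a wide window, so the wideband estimate follows the background (including broad plateaus) rather than the tones; a moving average would instead smear each peak into its neighbors.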
[0021] Instead of looking for specific types of signals, the present invention focuses on detecting broad classes of signals. For a signal, such as the spectrogram in
[0022] Referring to
[0023] Referring to
[0024] Referring back to
[0025] It is within the scope of the present invention to implement it in a combination of hardware and software. In certain embodiments, a speech signal may already be in digitized form, ready for immediate decomposition and downstream processing. In other embodiments, the invention may comprise an audio capture means followed by analog-to-digital conversion prior to the decomposition step. It is envisioned that in all embodiments, all functions performed by the invention can be implemented in software on a computer or, alternatively, as firmware in a dedicated hardware embodiment of the invention.
Results
[0026] A set of 199 files was used to validate the present invention. To handle strong harmonics, such as rotor noise, a length parameter is introduced: if a tone is too long, it is not counted. Likewise, an energy parameter prevents low-level tones from being counted. Using an approach like the ABC process for signal decomposition provides a simple, robust, and efficient technique for detecting the presence of music in noisy, diverse files. However, it is within the scope of the present invention to use any other compatible signal decomposition method in lieu of the ABC process.
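The length and energy parameters can be sketched as a post-processing step on narrowband detections. As an assumption for illustration, each detection is a (frame, bin, energy) triple, a tone is a run of detections in consecutive frames at the same frequency bin, and a tone's energy is the mean over the run. `Tone`, `link_tones`, `countable_tones`, `max_length`, and `min_energy` are hypothetical names, and the source does not give values for either parameter.

```python
from dataclasses import dataclass


@dataclass
class Tone:
    freq_bin: int   # frequency bin the tone occupies
    start: int      # first frame index
    length: int     # number of consecutive frames
    energy: float   # mean residual energy over the run


def link_tones(detections):
    """Group (frame, bin, energy) detections into runs of consecutive frames per bin."""
    by_bin = {}
    for frame, freq_bin, energy in sorted(detections, key=lambda d: (d[1], d[0])):
        by_bin.setdefault(freq_bin, []).append((frame, energy))
    tones = []
    for freq_bin, hits in by_bin.items():
        start, prev, energies = hits[0][0], hits[0][0], [hits[0][1]]
        for frame, energy in hits[1:]:
            if frame != prev + 1:  # gap in time: close the current run, start a new one
                tones.append(Tone(freq_bin, start, len(energies), sum(energies) / len(energies)))
                start, energies = frame, []
            energies.append(energy)
            prev = frame
        tones.append(Tone(freq_bin, start, len(energies), sum(energies) / len(energies)))
    return tones


def countable_tones(tones, max_length, min_energy):
    """Drop tones that are too long (e.g. rotor harmonics) or too weak to count."""
    return [t for t in tones if t.length <= max_length and t.energy >= min_energy]
```

A persistent rotor harmonic forms one long tone and is discarded by `max_length`, while short musical notes survive. A practical version might also let a tone drift by a bin or two between frames, which this sketch does not.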
[0027] Adjusting the parameters (lower/upper frequency, search window length, and threshold) affects the hits, misses, and false alarms on the data. A low setting for the lower frequency might allow more noise into the search window. Depending on the parameter choice, there may be more hits and more false alarms, or fewer hits (more misses) and fewer false alarms. If fewer misses is the goal, a lower threshold is necessary; if fewer false alarms is the goal, a higher threshold is necessary. In the end, a compromise among hits, misses, and false alarms is required.
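Claims 1 through 4, together with the parameters listed above, suggest a simple counting rule. The sketch below assumes that detections are (frame, frequency in Hz) pairs, that search windows are non-overlapping blocks of `window_frames` frames, and that a window is flagged as music when its count of in-band detections exceeds `threshold`. The function and parameter names are hypothetical, and the source does not say whether search windows overlap.

```python
def music_windows(detections, n_frames, window_frames, lower_hz, upper_hz, threshold):
    """Flag each non-overlapping search window whose in-band detection count exceeds threshold.

    detections: iterable of (frame_index, frequency_hz) narrowband detections.
    Returns one boolean per search window (True = music likely present).
    """
    n_windows = (n_frames + window_frames - 1) // window_frames  # ceiling division
    counts = [0] * n_windows
    for frame, freq in detections:
        # Only detections inside the frequency range and the signal duration are summed.
        if lower_hz <= freq <= upper_hz and 0 <= frame < n_frames:
            counts[frame // window_frames] += 1
    return [count > threshold for count in counts]
```

Raising `threshold` trades misses for fewer false alarms and lowering it does the reverse, which is the compromise described in this paragraph.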
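[0028] The F1 measure combines the hits, misses, and false alarms into one number. It is the harmonic mean of precision and recall, scaled to the interval [0, 100], with its best score at 100 and its worst score at 0. The precision of the test is calculated by:
$$\text{Precision} = \frac{\text{hits}}{\text{hits} + \text{false alarms}}$$
The recall of the test is calculated by:
$$\text{Recall} = \frac{\text{hits}}{\text{hits} + \text{misses}}$$
Combining the precision and recall gives the F1 measure:
$$F_1 = 100 \cdot \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$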
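Precision, recall, and F1 follow directly from the hit, miss, and false-alarm counts. The sketch below uses a hypothetical helper, `precision_recall_f1`, that returns all three values on the [0, 100] scale; the guards for empty denominators are an added assumption, since the source does not say how those cases are scored.

```python
def precision_recall_f1(hits, misses, false_alarms):
    """Return (precision, recall, F1), each scaled to [0, 100]."""
    # Guard against empty denominators (assumption: score such cases as 0).
    precision = hits / (hits + false_alarms) if hits + false_alarms else 0.0
    recall = hits / (hits + misses) if hits + misses else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return 100 * precision, 100 * recall, 100 * f1
```

For example, 3 hits, 2 misses, and 1 false alarm give a precision of 75, a recall of 60, and an F1 of about 66.7.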
[0029] The 199 files are divided into two datasets. The first dataset is used to develop empirical thresholds for the parameters (lower/upper frequency, search window length, and threshold), while the second dataset is used to compute an F1 value. The roles are then reversed: the second dataset is used to develop the thresholds and the first dataset to compute an F1 value. The average F1 value for the 199 files using this approach is 80.75. This is still a good result, since the F1 measure reflects all three outcome types (hits, misses, and false alarms). Additionally, as stated previously, this is real-world data with wide variation in music genre, recording quality, signal-to-noise ratio, and language, all of which complicate the process.
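The two-fold protocol can be sketched as follows. For brevity, this hypothetical version tunes only the detection threshold, whereas the source also tunes the lower/upper frequency and search window length. Each item is assumed to be one search window given as (detection count, is-music label), and ties between candidate thresholds go to the first candidate. The sketch illustrates the protocol only; it does not reproduce the reported 80.75.

```python
def f1_score(predicted, actual):
    """F1 on the [0, 100] scale from parallel lists of boolean predictions and labels."""
    hits = sum(p and a for p, a in zip(predicted, actual))
    false_alarms = sum(p and not a for p, a in zip(predicted, actual))
    misses = sum(a and not p for p, a in zip(predicted, actual))
    if hits == 0:
        return 0.0
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    return 100 * 2 * precision * recall / (precision + recall)


def evaluate(fold, threshold):
    """Score one fold of (count, is_music) windows at a given threshold."""
    predicted = [count > threshold for count, _ in fold]
    actual = [label for _, label in fold]
    return f1_score(predicted, actual)


def tune_threshold(fold, candidates):
    """Pick the candidate threshold with the best F1 on this fold (first one wins ties)."""
    return max(candidates, key=lambda t: evaluate(fold, t))


def two_fold_f1(fold_a, fold_b, candidates):
    """Tune on one fold, score on the other, swap roles, and average the two F1 values."""
    f1_b = evaluate(fold_b, tune_threshold(fold_a, candidates))
    f1_a = evaluate(fold_a, tune_threshold(fold_b, candidates))
    return (f1_a + f1_b) / 2
```

Scoring each half only with thresholds learned on the other half keeps the reported F1 from being inflated by tuning on the same files it is measured on.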
[0030] Having described preferred embodiments of the invention with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention as defined in the appended claims.