SOUND PROCESSING METHOD AND DEVICE USING DJ TRANSFORM
20250349310 · 2025-11-13
CPC classification: G10L25/18 (Physics)
Abstract
Research findings indicate that human hearing is not restricted by the Fourier uncertainty principle. The present disclosure proposes a sound processing method and device using the DJ transform, a new frequency extraction method derived from an understanding of human hearing that improves temporal resolution and frequency resolution simultaneously, based on the operating principle of the hair cells constituting the cochlea.
Claims
1. A sound processing device comprising: a spring modeling unit that calculates displacement and velocity of each of a plurality of springs by modeling the plurality of springs, each of which has a different natural frequency and vibrates according to an input sound, and calculates displacement, velocity, energy, and amplitude of each of the plurality of springs by modeling the plurality of springs, each of which has a different natural frequency and vibrates according to an input pure tone; a frequency extraction unit that extracts the natural frequency of the spring corresponding to a local maximum among the filtered pure tone amplitudes calculated by the spring modeling unit; a sound recognition and synthesis unit that recognizes and synthesizes sound by using the amplitude or natural frequency of the input pure tone; and an error inspection unit that checks an excess error of the frequency conversion result when the frequency of the input sound applied to the plurality of springs changes, and inspects the error between the pure tone frequencies.
2. The device according to claim 1, wherein the spring modeling unit comprises: a spring frequency modeling module that models natural frequencies of a plurality of springs having different natural frequencies and vibrating according to input sound; a filtered pure tone amplitude determination module that determines filtered pure tone amplitudes of the plurality of springs; an amplitude calculation module that calculates transient pure tone amplitudes of the modeled plurality of springs, calculates expected steady-state amplitudes of the modeled plurality of springs, calculates predicted pure tone amplitudes based on the expected steady-state amplitudes, and calculates filtered pure tone amplitudes by multiplying the transient pure tone amplitude by the predicted pure tone amplitude; an expected steady-state amplitude estimation module that estimates the expected steady-state amplitude of a spring having the largest amplitude among the modeled plurality of springs; a spring energy calculation module that calculates the energy of at least one spring having the largest amplitude among the plurality of springs based on the expected steady-state amplitude; and an input pure tone amplitude calculation module that calculates the amplitude of the input pure tone based on the energy.
3. The device according to claim 1, wherein the sound recognition and synthesis unit is characterized by performing speech recognition; speaker verification; speaker identification; source separation; sound direction detection; sound-based disease diagnostics; sound-based machine fault diagnostics; or sonar for navigating undersea terrain or ranging objects.
4. The device according to claim 1, wherein the error inspection unit is characterized in that, when the frequency of the input sound applied to the plurality of springs is maintained at a first value until a certain point of time and turns to a second value at the certain point, the frequency conversion result up to the certain point is indicated as the first value, and immediately after the turning point, the transient error from the first value to the second value is checked to be within 10%, thereby inspecting the error between pure tone frequencies.
5. A sound processing method comprising the steps of: modeling, by a spring modeling unit, natural frequencies of a plurality of springs, the plurality of springs having natural frequencies different from each other and oscillating according to an input sound; determining, by the spring modeling unit, filtered pure-tone amplitudes of the plurality of springs; calculating, by the spring modeling unit, transient-state-pure-tone amplitudes of the plurality of modeled springs; calculating, by the spring modeling unit, expected steady-state amplitudes of the plurality of modeled springs; calculating, by the spring modeling unit, predicted pure-tone amplitudes based on the expected steady-state amplitudes; calculating, by the spring modeling unit, filtered pure-tone amplitudes by multiplying the transient-state-pure-tone amplitudes with the predicted pure-tone amplitudes; extracting, by a frequency extraction unit, a natural frequency of at least one spring of the plurality of springs which corresponds to a local maximum value among the filtered pure-tone amplitudes; and using, by a sound recognition and synthesis unit, the natural frequency for sound recognition or sound synthesis.
6. The method according to claim 5, wherein said expected steady-state amplitude is calculated based on the amplitudes at at least two time points within a duration of the input sound.
7. The method according to claim 5, wherein said expected steady-state amplitude is calculated by the equation below:

A_i,s = (A_i(t_2) − A_i(t_1)·e^(−γ(t_2−t_1))) / (1 − e^(−γ(t_2−t_1)))

where t_1 and t_2 are two different time points within a duration of the input sound, t_2 > t_1, and γ is a decay constant of the corresponding spring.
8. The method according to claim 6, wherein a difference between the two different time points is a period of the natural frequency of the corresponding spring.
9. The method according to claim 6, wherein if one of the two time points is t_1, a sampling rate of the input sound is SR, and a period of the natural frequency of the corresponding spring is T, then the other t_2 of the two time points is calculated by the equation below:

t_2 = ⌊t_1·SR + SR·T + 0.5⌋ / SR

where ⌊·⌋ denotes the floor function.
10. The method according to claim 6, wherein the expected steady-state amplitude is calculated by substituting amplitudes at at least two points in the duration of the input sound into the following equation and using a linear regression analysis:
11. The method according to claim 5, wherein the spring modeling unit is characterized by performing the steps of: measuring displacements and velocities at each time point for each of the plurality of springs; calculating an energy at each time point for each of the plurality of springs based on the displacements and the velocities; and calculating an amplitude at each time point for each of the plurality of springs based on the energy.
12. The method according to claim 5, wherein the number of the plurality of springs is determined based on a range and a resolution of the frequency to be extracted.
13. A sound processing method comprising the steps of: sampling, by a spring modeling unit, natural frequencies of a plurality of springs, the plurality of springs having natural frequencies different from each other and oscillating according to an input sound; estimating, by the spring modeling unit, an expected steady-state amplitude of the spring of which the amplitude is the highest among the plurality of modeled springs; calculating, by the spring modeling unit, an energy of at least one spring of the plurality of springs of which the amplitude is the highest based on the expected steady-state amplitudes; calculating, by the spring modeling unit, an amplitude of the input pure tone based on the energy; and using, by a sound recognition and synthesis unit, the amplitude of the input pure tone for sound recognition or sound synthesis.
14. The method according to claim 13, wherein said expected steady-state amplitude is calculated by the equation below:

A_i,s = (A_i(t_2) − A_i(t_1)·e^(−γ(t_2−t_1))) / (1 − e^(−γ(t_2−t_1)))

where t_1 and t_2 are two different time points within a duration of the input sound, t_2 > t_1, and γ is a decay constant of the corresponding spring.
15. The method according to claim 13, wherein the spring modeling unit is characterized by performing the steps of: measuring a displacement and a velocity at each time point for each of the plurality of springs; calculating an energy at each time point for each of the plurality of springs based on the displacement and the velocity; and calculating an amplitude at each time point for each of the plurality of springs based on the energy.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
[0025]-[0041] (Brief descriptions of the individual accompanying drawings of the present invention.)
DETAILED DESCRIPTION
[0042] The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the present disclosure.
[0043] Referring to
[0044] The spring modeling unit (110) models the movement of hair cells using a plurality of springs that have different natural frequencies and vibrate according to input sounds.
[0045] Hair cells change mechanical signals generated from the basilar membrane into electrical signals and transmit the signals to the primary auditory cortex. Hair cells are composed of approximately 3,500 inner hair cells and 12,000 outer hair cells, and each hair cell is sensitive to sounds of its own characteristic frequency. This characteristic of hair cells is similar to the phenomenon in which a spring resonates and its amplitude increases when it receives an external force of a frequency that matches its own natural frequency. Utilizing this similarity, the spring modeling unit (110) models the movement of hair cells using a plurality of springs.
[0046] The spring modeling unit (110) can calculate the displacement and velocity of each of the plurality of springs by modeling a plurality of springs, each of which has a different natural frequency and vibrates according to the input sound. In addition, the spring modeling unit (110) can calculate the displacement, velocity, energy, and amplitude of each of the plurality of springs by modeling a plurality of springs, each of which has a different natural frequency and vibrates according to the input pure tone.
[0047] The spring modeling unit (110) can calculate the transient pure tone amplitude of the modeled plurality of springs, calculate the expected steady-state amplitude of the modeled plurality of springs, calculate the predicted pure tone amplitude based on the expected steady-state amplitude, multiply the transient pure tone amplitude by the predicted pure tone amplitude to calculate the filtered pure tone amplitude, and estimate the expected steady-state amplitude of the spring with the largest amplitude.
[0048] To this end, as illustrated in
[0049] The spring frequency modeling module (111) performs a function of modeling the natural frequencies of a plurality of springs that have different natural frequencies and vibrate according to the input sound.
[0050] The filtered pure tone amplitude determination module (112) performs a function of determining the filtered pure tone amplitudes of the plurality of springs.

[0051] The amplitude calculation module (113) performs a function of calculating transient pure tone amplitudes of the plurality of modeled springs, a function of calculating expected steady-state amplitudes of the plurality of modeled springs, a function of calculating predicted pure tone amplitudes based on the expected steady-state amplitudes, and a function of calculating filtered pure tone amplitudes by multiplying the predicted pure tone amplitudes by the transient pure tone amplitudes.
[0052] The expected steady state amplitude estimation module (114) performs a function of estimating the expected steady state amplitude of a spring having the largest amplitude among the plurality of modeled springs.
[0053] The spring energy calculation module (115) performs a function of calculating the energy of at least one spring having the largest amplitude among the plurality of springs based on the expected steady state amplitudes.
[0054] Here, the spring energy calculation module (115) can measure displacement and velocity for each of the plurality of springs at each point in time, and calculate energy for each of the plurality of springs at each point in time based on the displacement and velocity.
[0055] The input pure tone amplitude calculation module (116) performs a function of calculating the amplitude of the input pure tone based on the energy.
[0056] The frequency extraction unit (120) extracts the natural frequency of the spring corresponding to the local maximum among the filtered pure tone amplitudes calculated by the spring modeling unit (110).
[0057] The sound recognition and synthesis unit (130) determines the filtered pure tone amplitudes of the plurality of springs and performs sound recognition or sound synthesis using the natural frequencies.
[0058] To this end, as shown in
[0059] The sound recognition module (131) performs a function of recognizing sound using the amplitude or natural frequency of the input pure tone.
[0060] Here, sound recognition includes voice recognition in a narrow sense of converting human speech into text, speaker recognition that determines whose voice the input sound corresponds to, sound source separation such as distinguishing a specific person's voice when multiple speakers' voices are mixed, separating voice from noise when noise is mixed in the voice, or separating vocals excluding instruments in a song, sound direction detection, sound-based disease diagnosis such as coughing or breathing sounds, sound-based machine failure diagnosis using machine sounds, sonar for underwater terrain exploration or object distance measurement, etc.
[0061] The sound synthesis module (132) performs a function of synthesizing sound using the amplitude or natural frequency of the input pure tone.
[0062] The error inspection unit (140) inspects the error between pure tone frequencies. When an input sound whose frequency is maintained at a first value until a certain point in time and changes to a second value at that point is applied to the plurality of springs, the frequency conversion result up to that point should be represented as the first value, and immediately after the changing point the error inspection unit checks whether the transient error from the first value to the second value is within 10%.

[0063] In one embodiment, the sound processing device (100) of the present invention may be configured as a SoC (System-on-a-Chip) that receives sound in the form of wav data and extracts frequency at a constant cycle (e.g., 1 msec). Therefore, each of the components, a spring modeling unit (110), a frequency extraction unit (120), a sound recognition and synthesis unit (130), and an error inspection unit (140), may be components that operate through the hard-wired logic of the SoC.
[0064] In another embodiment, the sound processing device (100) of the present invention may be a DSP (Digital Signal Processor) that receives sound in the form of wav data and extracts frequency at a constant cycle (e.g., 1 msec). Therefore, each of the components, a spring modeling unit (110), a frequency extraction unit (120), a sound recognition and synthesis unit (130), and an error inspection unit (140), may be components of a programming code that operates in the DSP.
[0065] Through the above examples, the sound processing device (100) of the present invention can be used as a main component of a voice device such as a volume amplifier, a voice recognizer, a noise canceller, etc.
[0066] The process in which the sound processing device (100) of the present invention performs sound recognition and sound synthesis by utilizing the similarity between hair cells and springs is as follows.
[0067] Hair cells convert mechanical signals generated in the basilar membrane into electrical signals and transfer the signals to the primary auditory cortex. Hair cells consist of about 3,500 inner hair cells and 12,000 outer hair cells, and each hair cell reacts sensitively to the sound of its own natural frequency. This characteristic of hair cells is similar to the phenomenon in which the amplitude of a spring increases because of resonance when the spring receives an external force with a frequency that matches the natural frequency of the spring. Using this similarity, the sound processing device (100) of the present invention models the behavior of hair cells using a plurality of springs.
[0068] The human audible frequency range is known to be 20 to 20,000 Hz, and the human voice frequency range is known to be 80 to 8,000 Hz. The frequency range covered in fields such as speech recognition is within 8 kHz. Considering this, when used for voice processing, the natural frequencies of the springs from 50 Hz to 8 kHz are spaced at 1 Hz intervals, and 7,951 different types of springs can be used based on those natural frequencies. This means that the frequency resolution is 1 Hz. However, this is only an example, and widening the frequency range or increasing the resolution by using more springs is possible.
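The sizing arithmetic above can be sketched as follows; this is a minimal illustration, and the function name and defaults are ours rather than the patent's:

```python
# Build the bank of spring natural frequencies described above:
# 50 Hz to 8 kHz at 1 Hz intervals gives 7,951 springs.
def spring_bank(f_min=50.0, f_max=8000.0, step=1.0):
    n = int((f_max - f_min) / step) + 1
    return [f_min + i * step for i in range(n)]

freqs = spring_bank()
print(len(freqs))  # 7951
```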
[0069] The behavior of a hair cell modeled by a spring can be represented as a differential equation of motion for driven harmonic oscillations. A sound corresponds to an external force, made up of a combination of various sine waves, which is applied to a spring. Each spring has its own natural frequency and draws its own motion trajectory for a series of sound samples. The motion trajectory of each spring can be obtained by calculating the solution of the differential equation of motion for driven harmonic oscillations using numerical analysis techniques such as the Runge-Kutta method.
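As one way to picture this numerical integration, the sketch below integrates the driven, damped oscillator m·x″ + b·x′ + k·x = F_0·cos(ωt) with a classical fourth-order Runge-Kutta step; all parameter values and names are illustrative assumptions, not taken from the patent.

```python
import math

def simulate_spring(f_nat, f_drive, F0=1.0, m=1.0, zeta=0.001,
                    sr=44100, duration=0.5):
    """RK4 integration of m*x'' + b*x' + k*x = F0*cos(w*t) for one spring."""
    w_i = 2.0 * math.pi * f_nat        # natural angular frequency
    k = m * w_i ** 2                   # spring constant (w_i = sqrt(k/m))
    b = 2.0 * zeta * math.sqrt(m * k)  # friction coefficient from damping ratio
    w = 2.0 * math.pi * f_drive        # driving angular frequency
    dt = 1.0 / sr

    def accel(t, x, v):
        return (F0 * math.cos(w * t) - b * v - k * x) / m

    x, v, t = 0.0, 0.0, 0.0            # springs start at rest
    xs = []
    for _ in range(int(duration * sr)):
        k1x, k1v = v, accel(t, x, v)
        k2x, k2v = v + 0.5 * dt * k1v, accel(t + 0.5 * dt, x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = v + 0.5 * dt * k2v, accel(t + 0.5 * dt, x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = v + dt * k3v, accel(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        t += dt
        xs.append(x)
    return xs
```

A spring whose natural frequency matches the driving tone accumulates a much larger amplitude than its neighbours, which is the resonance property the frequency extraction relies on.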
[0070] Assume that ω_i is the natural frequency of a spring S_i (1 ≤ i ≤ N). The spring S_i is used to model the response of the hair cell that is most sensitive to the sound of frequency ω_i among the hair cells constituting the human hearing system.
[0071] When the sound F_0·cos(ωt) is input, the reaction x_i(t) of the spring S_i to the sound can be represented by the equation of motion of the following equation (1):

m·x_i″(t) + b_i·x_i′(t) + k_i·x_i(t) = F_0·cos(ωt)   (1)

where x_i is the length of the spring which deviates from the balance point (displacement), m is the mass of the object suspended in the spring, ζ is the damping ratio determined by the friction coefficient b_i (ζ = b_i/(2√(m·k_i))), and k_i is the spring constant. ω_i is the natural frequency of the spring when both ζ and F_0 are zero, and ω_i = √(k_i/m).
[0072] Equation (1) is a differential equation with a general solution. When ζ < 1, the solution is the same as the equation (2) below:

x_i(t) = A_i·e^(−ζω_i·t)·cos(ω_i·√(1 − ζ²)·t + φ_i) + (F_0/(Z_i·ω))·cos(ωt + q_i)   (2)

where A_i and φ_i are determined by the initial conditions of the spring, and Z_i and q_i are as below:

Z_i = √( b_i² + m²·(ω² − ω_i²)²/ω² )   (3)

q_i = arctan( b_i·ω / (m·(ω² − ω_i²)) ) + n·180°   (4)

[0073] The integer n is specified so that q_i is between −180° and 0°. If F_0 = 0, the spring is subjected to periodically damping oscillation as shown in
[0074] Consider a situation in which a sound having a frequency identical with the natural frequency ω_i of a spring S_i at rest is applied to the spring as an external force. The behavior of the spring in the process of reaching a steady state is described by the equation (6) below:

A_i(t) = A_i,s·(1 − e^(−γt))   (6)

where γ is a decay constant determined by the damping ratio and the natural frequency of the spring.

[0075] Therefore, the amplitude A_i(t) of the spring gradually increases along the trajectory of the equation (6) and finally becomes the steady-state amplitude A_i,s = F_0/(Z_i·ω) given by the equation (5).

[0076] As the external force disappears at the point t_0, the amplitude of the spring gradually decreases to zero. This corresponds to F_0 = 0 in the equation (2), and the amplitude change in this process follows the equation (7) below:

A_i(t) = A_i(t_0)·e^(−γ(t − t_0))   (7)
[0078] According to the embodiments of the sound processing device (100) of the present invention, two methods for extracting the frequency and amplitude of the input sound are proposed based on the behavior of the spring modeled as hair cells.
Method I for Extracting the Frequency and Amplitude of the Input Sound
[0079] 1. In a Steady State

[0080] (1) Extraction of Frequency
[0081] Based on the characteristic that a resonating spring oscillates with a greater amplitude than other springs, a frequency of an input sound can be extracted.
[0082] Given a pure tone F_0·cos(ωt), the amplitude of a spring S_i in a steady state becomes A_i,s = F_0/(Z_i·ω) by the equation (5). If the mass m of the object suspended in each spring is equal to each other, the spring with the greatest amplitude is the spring having the minimum Z_i. The relationship between the natural frequency ω_i of the spring and the frequency ω of the pure tone can be obtained by differentiating the equation (3) with respect to ω_i, and the result is the equation (8):

ω = ω_i / √(1 − 2ζ²)   (8)

where ζ < 1/√2. If ζ is a small value near zero, then ω ≈ ω_i. For example, ζ could be 0.001.
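A small helper for this resonance correction, assuming the form ω = ω_i/√(1 − 2ζ²) (our reading of the derivation; the helper name is illustrative):

```python
import math

def corrected_frequency(w_i, zeta):
    """Map the natural frequency w_i of the maximally responding spring to the
    frequency of the input pure tone, assuming equation (8) has the form
    w = w_i / sqrt(1 - 2*zeta**2); valid for zeta < 1/sqrt(2)."""
    return w_i / math.sqrt(1.0 - 2.0 * zeta ** 2)
```

For ζ = 0.001 the correction is about one part per million, which is why ω ≈ ω_i for small ζ.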
[0083] In order to find the spring having the greatest amplitude, a numerical analysis method such as Runge-Kutta, which solves differential equations, is used. Given a pure tone F_0·cos(ωt), the displacement x_i(t) and the velocity v_i(t) of each spring S_i, which correspond to the solution of the equation (1), are calculated using the numerical analysis method. Since the energy of each spring is the sum of a kinetic energy and a potential energy, the energy of the spring S_i can be obtained by the equation (9):

E_i(t) = (1/2)·m·v_i(t)² + (1/2)·k_i·x_i(t)²   (9)
[0084] The energy of the spring that has reached a steady state maintains a constant value. Thus, the displacement x_i at the time when the velocity v_i is 0 becomes the amplitude of the spring S_i. Therefore, the amplitude A_i of the spring S_i in a steady state can be calculated by the equation (10) below:

A_i = √(2·E_i / k_i)   (10)
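The energy and amplitude relations of equations (9) and (10) can be sketched directly; m, k and the sample values used are illustrative:

```python
import math

def spring_energy(x, v, m, k):
    # Equation (9): total energy = kinetic + potential.
    return 0.5 * m * v ** 2 + 0.5 * k * x ** 2

def amplitude_from_energy(E, k):
    # Equation (10): at the turning point the velocity is zero, so
    # E = (1/2)*k*A**2 and therefore A = sqrt(2*E/k).
    return math.sqrt(2.0 * E / k)
```

Because the energy stays constant in a steady state, the amplitude can be read off at any sample, not only at an instant where v = 0.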
[0085] The spring having the largest amplitude among the extracted amplitudes of the springs is the resonating spring. Therefore, it is possible to obtain the frequency of an input pure tone by using both the natural frequency ω_i of the spring having the largest amplitude and the equation (8).
(2) Extraction of Amplitude
[0086] In a steady state, the trajectory of the spring is given by the equation (5). Therefore, the relationship between the energy E_i,s of a spring in a steady state and the amplitude F_0 of a given pure tone can be represented by the equation (11):

E_i,s = k_i·F_0² / (2·Z_i²·ω²)   (11)
[0087] In addition, the energy E_i,s in a steady state can be obtained by putting the displacement x_i and the velocity v_i in the steady state, which are obtained by solving the equation (1) with the numerical analysis method, into the equation (9). Therefore, the amplitude F_0 of a given pure tone becomes as below:

F_0 = Z_i·ω·√(2·E_i,s / k_i)   (12)
[0088] The natural frequency ω_i of the spring that resonates with an external force is almost the same as the frequency ω of the external force. Therefore, putting ω = ω_i into the equation (3) gives Z_i·ω = 2ζ·m·ω_i². Putting both of this result and ω_i = √(k_i/m) into the equation (12), the amplitude F_0 of the input pure tone can be calculated by the equation (13):

F_0 = 2ζ·√(2·k_i·E_i,s)   (13)
[0089] 2. In a Transient State

[0090] (1) Extraction of Frequency

[0091] Assume that a pure tone F_0·cos(ωt) is given over a time interval [t_a, t_b]. All springs start to move in an initial state where both displacements and velocities are zero. Using the numerical analysis technique, the energies of the springs are calculated at each time point, and the calculated results are put into the equation (10) to obtain the amplitudes of the springs at each time point. After that, the natural frequency of the spring having the largest amplitude is substituted into the equation (8) to calculate the frequency of the given pure tone.
[0092] (2) Extraction of Amplitude
[0093] Assume that the energy of a resonating spring S_i found by the numerical analysis is E_i(t). The amplitude A_i(t) of the spring S_i at time t can be calculated from E_i(t) using the equation (10).
[0094] According to the general solution of the equation (1), the amplitude A_i(t) of the spring S_i resonating with a given sound wave follows the trajectory of the equation (6), so that the spring S_i follows the trajectory of A_i(t) = (1 − e^(−γ(t − t_a)))·A_i,s.
[0095] The energies E_i(t_1) and E_i(t_2) at two time points t_1 and t_2 within the time interval [t_a, t_b] can be obtained with the numerical analysis method. Therefore, the amplitudes A_i(t_1) and A_i(t_2) can be obtained by substituting these results into the equation (10). The expected steady-state amplitude A_i,s can be obtained by putting these results into A_i(t) = (1 − e^(−γ(t − t_a)))·A_i,s and eliminating t_a, which yields the equation (14):

A_i,s = (A_i(t_2) − A_i(t_1)·e^(−γ(t_2−t_1))) / (1 − e^(−γ(t_2−t_1)))   (14)
[0096] Next, regarding the case where the frequency is the same but the volume of the sound changes, assume that the amplitude of the sound given at the point t_c has changed from F_1 to F_2. Let A_c be the amplitude of a spring at the time point t_c and let A_s be the amplitude the spring will have approached in a steady state after the external force changes to F_2. The behavior of the amplitude over time can be described by the following equation:

A(t) = A_s + (A_c − A_s)·e^(−γ(t − t_c))
[0097] Given the amplitudes A(t_1) and A(t_2) at two time points t_1 and t_2 within the time interval in which the amplitude changes from A_c to A_s, it can be seen that the obtained A_s is the same as the equation (14).
[0098] For example, consider the case where the external force becomes F_2 = 0 at the time point t_c. When the external force disappears, the energy of the spring decreases exponentially according to the equation (7). Namely, the measured amplitude of the spring after T seconds from the time when the external force disappears will be A(t_c + T) = A(t_c)·e^(−γT). Putting this measurement result into the equation (14) makes A_s = 0, which means the external force has disappeared.
[0099] Therefore, the expected steady-state amplitude A_s can be obtained by measuring the energy of the spring more than once. Using the equation (10), which represents the correlation between amplitude and energy, the energy E_s in the steady state can be calculated, and consequently the amplitude F_0 of a given pure tone can be calculated using the equation (13).
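The two-point estimate of equation (14) can be sketched as follows; the decay constant gamma is passed in rather than fixed, since its exact definition depends on the spring parameters:

```python
import math

def expected_steady_state(A1, A2, t1, t2, gamma):
    """Two measured amplitudes A1 = A(t1), A2 = A(t2) of a trajectory
    A(t) = A_s + (A_c - A_s)*exp(-gamma*(t - t_c)) determine the expected
    steady-state amplitude A_s (equation (14))."""
    d = math.exp(-gamma * (t2 - t1))
    return (A2 - A1 * d) / (1.0 - d)
```

When the external force disappears, A2 = A1·e^(−γ(t_2−t_1)) and the estimate returns 0, matching the behavior described above.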
[0100] Since the force applied to the spring is in the form of a periodic function, the energy does not increase uniformly within a period of a transient state. Considering this characteristic, when selecting the two time points t_1 and t_2 described above, the time interval between them is made equal to one period.
[0101] In this regard, it may not be possible to select two time points whose time difference is exactly one period, due to the relationship between the sampling rate of the sound data and the natural frequency of the spring. In this case, an error may occur, and two methods can be used to correct this error.
[0102] The first method is to select an adjacent sample which shows the least difference from one period. When the position S_1 of a sample and the period T of the audio data are given, the position S_2 of the second sample is calculated as S_2 = ⌊S_1 + SR·T + 0.5⌋, where SR is the sampling rate and ⌊·⌋ denotes the floor function. The expected steady-state amplitude A_s is calculated by putting the time information of the two points and the amplitudes at the two points into the equation (14).
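The nearest-sample rule above, as a one-liner; SR·T is rounded to the closest whole number of samples:

```python
def second_sample_position(s1, sr, T):
    # s2 = floor(s1 + sr*T + 0.5): the sample closest to one period T after s1.
    return int(s1 + sr * T + 0.5)
```

For SR = 44,100 Hz and a 440 Hz spring (T ≈ 2.27 ms), SR·T ≈ 100.23, so the second sample is taken 100 samples after the first.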
[0103] The second method uses a linear regression analysis. After extracting the amplitude at several points and putting the extracted data into the equation (15), the expected steady-state amplitude, A.sub.s, is calculated by the linear regression analysis.
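Equation (15) did not survive extraction; purely as an assumption, one plausible linearization regresses each amplitude sample on the previous one, since A(t+Δ) = d·A(t) + A_s·(1 − d) is linear in A(t) with d = e^(−γΔ):

```python
def steady_state_by_regression(amplitudes):
    """Least-squares fit of A(t+dt) = slope*A(t) + intercept over successive,
    uniformly spaced amplitude samples; then A_s = intercept / (1 - slope).
    A hedged sketch, not the patent's equation (15)."""
    xs, ys = amplitudes[:-1], amplitudes[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept / (1.0 - slope)
```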
[0104] Based on the above theoretical background, a method for extracting a frequency of an input sound can be proposed as below.
[0105] Referring
[0110] The step (a) may comprise the steps of: measuring displacements x.sub.i(t) and velocities v.sub.i(t) at time points for each of the plurality of springs (see the equation 1); calculating energy E.sub.i(t) at each time point for each of the plurality of springs based on the displacements and the velocities (see the equation 9); and calculating an amplitude A.sub.i(t) of each of the plurality of springs based on the energies E.sub.i(t) (see the equation 10).
[0111] In the step (b), the expected steady-state amplitude can be calculated with the equation (14).
[0112] In the step (b), said expected steady-state amplitude, A.sub.i,s(t), can be calculated based on the amplitudes at two different time points within a duration of the input sound.
[0113] A difference between the two different time points can be a period of the natural frequency of the corresponding spring.
[0114] When one of the two time points is t_1, a sampling rate of the input sound is SR, and the period of the natural frequency of the corresponding spring is T, the other t_2 of the two time points can be calculated by means of the equation below:

t_2 = ⌊t_1·SR + SR·T + 0.5⌋ / SR
[0115] The number of the plurality of springs N may be determined based on a range and a resolution of the frequency to be extracted.
Method II for Extracting the Frequency and Amplitude of the Input Sound
[0119] According to the method I for extracting the frequency and amplitude of the input sound described above, if the input sound is a pure tone, the frequency and amplitude of the input sound can be effectively extracted.
[0120] Now, assume that there are n types of pure tones constituting a complex tone F(t) = Σ_j F_j·cos(ω_j·t + φ_j). If n = 1, the pure tone of a given sound can be found by selecting the spring having the largest amplitude among the springs. However, if n > 1, it is difficult to find the pure tones constituting the complex tone by selecting the top n springs in the order of amplitude.
[0121] The first reason is that the amplitude of a spring of which the frequency is adjacent to the spring having the largest amplitude could be greater than the amplitude of the spring which resonates with other pure tones constituting the complex tone. The second reason is that, as shown in the trajectory after 0.8 seconds in
[0122] Accordingly, in this embodiment, instead of finding the local maximum value among the spring amplitudes at each time point, a method of finding the local maximum value from the results of multiplying an expected steady-state amplitude and a transient-state-pure-tone amplitude is proposed.
1. Expected Steady-State Amplitude and Filtered Pure-Tone Amplitude
[0123] First, in order to extract the pure tones constituting a complex tone, the amplitude A_i(t) of each spring S_i is calculated by applying the step (a) of the method I for extracting the frequency of an input sound to each spring.
[0124] Next, an expected steady-state amplitude A_i,s(t) is calculated by applying the step (b) of the method I for extracting the frequency of an input sound to the amplitude A_i(t) of each spring S_i. However, the equation (14), which calculates the expected steady-state amplitude, is derived from the equation (7), which describes the behavior of a resonating spring. Therefore, high amplitudes can result even at frequencies away from the resonant frequency as in
[0125] Accordingly, the following steps are performed. The third step is to calculate a transient-state-pure-tone amplitude, F.sub.i,t(t), by putting the amplitude A.sub.i(t) of the spring S.sub.i into the equation (13). In addition, a predicted pure-tone amplitude, F.sub.i,s(t), is calculated by applying steps (c) and (d) of the method I for extracting the frequency of the input sound to the expected steady-state amplitude, A.sub.i,s(t).
[0126] As the final step, a filtered pure-tone amplitude F_i,p(t) is calculated by multiplying the transient-state-pure-tone amplitude F_i,t(t) with the predicted pure-tone amplitude F_i,s(t), as in F_i,p(t) = F_i,t(t)·F_i,s(t). Additionally, the result of the multiplication may be divided by the maximum amplitude of the sound so that the normalized result does not exceed 1. For example, if the sound is expressed as a 16-bit integer, the result is divided by 32,767.
[0127] A filtered pure-tone amplitude has the characteristic that 1) the amplitude becomes 0 when the sound disappears, and 2) the amplitudes of frequencies away from a resonant frequency in the frequency domain are low.
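The multiplication and normalization step above can be sketched as follows (the 16-bit ceiling of 32,767 comes from the text; the function name is ours):

```python
def filtered_amplitudes(transient, predicted, max_amp=32767.0):
    # F_i,p = F_i,t * F_i,s for each spring, normalized by the maximum
    # possible sound amplitude (32,767 for 16-bit audio) so it stays <= 1.
    return [(a * b) / max_amp for a, b in zip(transient, predicted)]
```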
2. Finding a Pure Tone from Local Maximum Values
[0131] However, if the frequency interval between pure tones is narrow, a separate local maximum might not exist for each of two adjacent pure tones.
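Picking pure-tone candidates at the local maxima of the filtered amplitudes can be sketched as follows; names are illustrative:

```python
def local_maxima(filtered, freqs):
    # Return the natural frequencies of springs whose filtered pure-tone
    # amplitude exceeds both neighbours: the pure-tone candidates.
    return [freqs[i] for i in range(1, len(filtered) - 1)
            if filtered[i] > filtered[i - 1] and filtered[i] > filtered[i + 1]]
```

Two pure tones closer together than the peak width can merge into a single maximum, which is the narrow-interval limitation noted above.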
[0132] Based on the theoretical background described above, the following method for extracting the frequency of the input sound is proposed.
[0133] Referring
[0141] The step (1) may comprise the steps of: measuring displacements x.sub.i(t) and velocities v.sub.i(t) at different time points for each of the plurality of springs (see the equation 1); calculating an energy E.sub.i(t) at each time point for each of the plurality of springs based on the displacements x.sub.i(t) and the velocities v.sub.i(t) (see the equation 9); and calculating an amplitude A.sub.i(t) at each time point for each of the plurality of springs based on the energy E.sub.i(t) (see the equation 10).
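Step (1) of paragraph [0141] can be illustrated with the standard harmonic-oscillator relations. The patent's equations (1), (9), and (10) are not reproduced here; this sketch assumes unit mass and the textbook forms E = (1/2)mv.sup.2+(1/2)kx.sup.2 and A={square root over (2E/k)}, which may differ in detail from the patent's equations.

```python
import math

# Illustrative sketch of step (1): for each spring, the energy at a time
# point is the sum of kinetic and potential energy, and the amplitude is
# the displacement at which all of that energy would be potential.
# Unit mass (m = 1) is an assumption made for illustration.

def spring_energy(x, v, omega, m=1.0):
    """E = (1/2) m v^2 + (1/2) k x^2, with stiffness k = m * omega^2."""
    k = m * omega ** 2
    return 0.5 * m * v ** 2 + 0.5 * k * x ** 2

def spring_amplitude(energy, omega, m=1.0):
    """A = sqrt(2 E / k), recovering the oscillation amplitude from energy."""
    k = m * omega ** 2
    return math.sqrt(2.0 * energy / k)
```

Evaluating both functions at each time point for each of the N springs yields the amplitude tracks A.sub.i(t) used by the later steps.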
[0142] The equation 13 can be used in the step (2), the equation 14 can be used in the step (3), and the equation 13 can be used in the step (4).
[0143] The number of the plurality of springs, N, may be determined based on a range and a resolution of the frequencies to be extracted.
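The relationship in paragraph [0143] between range, resolution, and spring count can be sketched as below; the function name is an assumption, but the arithmetic matches the experimental configuration described later (50 Hz to 8,000 Hz at 1 Hz intervals gives 7,951 springs).

```python
# Illustrative sketch of [0143]: the number of springs N follows from the
# frequency range [f_min, f_max] and the resolution df of the spring bank.

def spring_bank(f_min, f_max, df):
    """Natural frequencies spaced df apart covering [f_min, f_max]."""
    n = int(round((f_max - f_min) / df)) + 1
    return [f_min + i * df for i in range(n)]
```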
[0144] In the step (3), the expected steady-state amplitudes, A.sub.i,s(t), can be calculated based on the amplitudes at two time points within a duration of the input sound.
[0145] In the step (3), the expected steady-state amplitudes, A.sub.i,s(t), can be calculated by means of the equation below:
where t.sub.1 and t.sub.2 are the two different time points within the duration of the input sound, t.sub.2>t.sub.1, A.sub.i(t.sub.1) is an amplitude of any spring among the plurality of springs at t.sub.1, A.sub.i(t.sub.2) is an amplitude of said spring at t.sub.2, ζ is a damping ratio of said spring, and ω.sub.d satisfies the equation ω.sub.d=ω.sub.i{square root over (1−ζ.sup.2)}, where ω.sub.i is the natural frequency of said spring.
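The equation body referenced in paragraph [0145] is not reproduced in this text. Purely as an illustration of how a steady-state amplitude can be extrapolated from two transient samples, the sketch below assumes an exponential approach of the form A(t)=A.sub.s(1−e.sup.−ζωt); under that assumption, two samples determine A.sub.s algebraically. This assumed form and the resulting formula are not taken from the patent.

```python
import math

# Hypothetical extrapolation: if the transient amplitude approaches the
# steady state as A(t) = A_s * (1 - exp(-zeta * omega * t)), then from two
# samples a1 = A(t1) and a2 = A(t2) one can solve for A_s:
#   r   = exp(-zeta * omega * (t2 - t1))
#   A_s = (a2 - r * a1) / (1 - r)

def expected_steady_state_amplitude(a1, a2, t1, t2, zeta, omega):
    """Extrapolate the steady-state amplitude from two transient samples."""
    r = math.exp(-zeta * omega * (t2 - t1))
    return (a2 - r * a1) / (1.0 - r)
```

Under the assumed model the extrapolation is exact: feeding in two samples generated from a known A.sub.s recovers that A.sub.s.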
[0146] A difference between the two different time points can be a period of the natural frequency of the corresponding spring.
[0147] When one of the two time points is t.sub.1, a sampling rate of the input sound is SR, and a period of the natural frequency of the corresponding spring is T, the other time point t.sub.2 is calculated by the equation below.
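The equation body for paragraph [0147] is not reproduced in this text. One plausible reading, consistent with paragraph [0146] stating that the two time points are one natural period apart, is that the second sample index lies SR·T samples after the first. The sketch below is that reading only, with names chosen for illustration.

```python
# Plausible reading of [0146]-[0147]: with sampling rate SR (samples/s) and
# natural period T (s), the second time point lies one period, i.e.
# round(SR * T) samples, after the first.

def second_time_index(t1_index, sample_rate, period):
    """Index of the second time point, one natural period after the first."""
    return t1_index + int(round(sample_rate * period))
```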
[0148] In step (7), the natural frequency may be used for sound recognition or sound synthesis.
[0149] The sound processing method and sound processing device (100) of the present invention can be applied not only to the human voice but also to all types of sounds, including those of musical instruments and animals. In the present disclosure, sound recognition includes: speech recognition in the sense of converting human speech into text; speaker verification/speaker identification for determining whose voice an input sound corresponds to; source separation, such as discriminating a specific person's voice when the voices of a plurality of speakers are mixed, separating voice from noise, and separating vocals from the instrumental parts of songs; sound direction detection; sound-based medical diagnostics, such as analysis of coughing or breathing; sound-based machine fault diagnostics based on mechanical sounds; and sonar for navigating undersea terrain, ranging objects, and more.
[0150] Sound recognition and sound synthesis are examples to which the natural frequency obtained by the present invention can be applied, and the scope of the present invention is not limited thereto. The present invention can be applied to any field in which periodic properties or Fourier transforms are used, such as price prediction for cryptocurrencies and stocks, and image processing such as denoising.
[0151] Hereinafter, the experimental results of the sound processing device (100) of the present invention will be described. To show the performance of the DJ transform according to the present disclosure, the results of the DJ transform were compared with those of the STFT. In the DJ transform, 7,951 springs with natural frequencies from 50 Hz to 8,000 Hz were used. The frequency interval between springs was 1 Hz. A 25 millisecond window was used for the STFT.
[0152] The DJ transform was performed in an NVIDIA M40 GPU environment with 3,072 cores and 12 GB of memory and was implemented using the C language API of CUDA Toolkit 8.0. It took about 0.6 seconds to perform the DJ transform on 1 second of audio data.
[0153]
[0154] As shown in
[0155] Three experiments were conducted to compare the results of the DJ transform with the STFT in terms of temporal resolution.
[0156] The first experiment was to check the frequency extracted at the time point where an input frequency changes.
[0157] The second experiment was to extract frequencies from sounds that appear and disappear rapidly. The first rows of
[0158] In
[0159] The upper drawing in
[0160] The third experiment is an extension of the second experiment and shows the frequency extraction results when 1 kHz and 2 kHz pure tones are alternately generated for 5 milliseconds each from 200 milliseconds to 800 milliseconds (
[0161] The first rows of
[0162] As can be seen in
[0163] Since the complex tone is composed of 400 Hz and 440 Hz, the amplitude fluctuates in a 40 Hz cycle as shown in the bottom of
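The 40 Hz fluctuation noted in paragraph [0163] is the standard beat phenomenon: the sum of 400 Hz and 440 Hz pure tones has an envelope that fluctuates at the 40 Hz difference frequency, i.e. with a 25 ms period. The sketch below is a generic illustration of that arithmetic, not the patent's test signal definition.

```python
import math

# The complex tone of [0163]: a 400 Hz and a 440 Hz pure tone summed.
# The envelope of the sum fluctuates at 440 - 400 = 40 Hz, so the whole
# signal repeats every 1/40 s = 25 ms (10 cycles of 400 Hz, 11 of 440 Hz).

def complex_tone(t):
    return math.sin(2 * math.pi * 400 * t) + math.sin(2 * math.pi * 440 * t)

beat_period = 1.0 / (440 - 400)  # 0.025 s, the 40 Hz fluctuation cycle
```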
[0164] Although the present disclosure has been described above with reference to preferred embodiments, the present disclosure is not limited thereto, and various changes and applications can be made without departing from the technical spirit of the present disclosure, which is obvious to a person skilled in the art. Therefore, the scope of protection of the present disclosure should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto should be interpreted as being included in the scope of the present disclosure.