METHOD AND A SYSTEM FOR DECOMPOSITION OF ACOUSTIC SIGNAL INTO SOUND OBJECTS, A SOUND OBJECT AND ITS USE
20180233120 · 2018-08-16
Inventors
Cpc classification
G10H2210/056
PHYSICS
G10H2240/145
PHYSICS
G10H2210/066
PHYSICS
G10H1/06
PHYSICS
G10H2250/055
PHYSICS
International classification
Abstract
A method and a system for decomposition of acoustic signal into sound objects having the form of signals with slowly-varying amplitude and frequency, as well as sound objects and their use. The object is achieved by a method for decomposing an acoustic signal into digital sound objects, a digital sound object representing a component of the acoustic signal, the component having a waveform, comprising the steps of converting the analogue acoustic signal into a digital input signal (PIN); determining an instantaneous frequency component of the digital input signal, using a digital filter bank; determining an instantaneous amplitude of the instantaneous frequency component; determining an instantaneous phase of the digital input signal associated with the instantaneous frequency; creating at least one digital sound object, based on the determined instantaneous frequency, phase and amplitude; and storing the digital sound object in a sound object database.
Claims
1. A method for decomposing an acoustic signal into digital sound objects, a digital sound object representing a component of the acoustic signal, the component having a waveform, the method comprising: converting the analogue acoustic signal into a digital input signal (PIN), wherein the digital signal comprises samples of the acoustic signal; determining, for each sample, an instantaneous frequency component of the digital input signal, using a digital filter bank comprising digital filters(n); determining, for each sample, an instantaneous amplitude of the instantaneous frequency component; determining, for each sample, an instantaneous phase of the digital input signal associated with the instantaneous frequency; creating at least one digital sound object, wherein the digital sound object includes the determined instantaneous frequency, phase and amplitude; and storing the digital sound object in a sound object database, characterized in that, for each sample, for each filter(n), locations of frequencies present in the acoustic signal are determined based on an intersection of a value of an angular frequency at the output of each filter(n) and its nominal angular frequency.
2. The method of claim 1, wherein a digital filter in the digital filter bank has a window length proportional to its central frequency.
3. The method of claim 2, wherein central frequencies of the filter bank are distributed according to a logarithmic scale.
4. The method of claim 1, characterized in that an operation improving the frequency-domain resolution of said filtered signal is executed sample by sample.
5. The method of claim 1, wherein the step of determining an instantaneous frequency component takes into account one or more instantaneous frequency components determined using adjacent digital filters of the digital filter bank.
6. The method of claim 1, wherein the instantaneous frequency is tracked over subsequent samples of the digital input signal.
7. The method of claim 6, characterized in that values of the envelope of amplitude and values of frequency and their corresponding time instants are determined in order to create characteristic points with coordinates in time-frequency-amplitude space describing the waveform of said sound object.
8. The method of claim 7, characterized in that the values are determined not less frequently than once per period of duration of a given filter's window W(n).
9. The method of claim 6, further comprising the step of correcting an amplitude and/or frequency of selected sound objects as to reduce an expected distortion in said sound objects, the distortion being introduced by said digital filter bank.
10. The method of claim 3, characterized in that improving the frequency-domain resolution of said filtered signal further comprises a step of increasing the window length of selected filters.
11. The method of claim 4, characterized in that the operation of improving the frequency-domain resolution of said filtered signal further comprises the step of subtracting an expected spectrum of located adjacent sound objects from the spectrum at the output of the filters.
12. The method of claim 4, characterized in that the operation of improving the frequency-domain resolution of said filtered signal further comprises a step of subtracting an audio signal generated based on located adjacent sound objects from said input signal.
13. A digital sound object, the digital sound object comprising at least one parameter set representing a waveform of at least one component of an acoustic signal, generated by a method according to claim 1.
14. Non-volatile, non-transient computer-readable medium, storing a sound object generated according to claim 1.
15. A method for generating an audio signal, comprising the steps of: receiving a digital sound object according to claim 13; decoding the digital sound object in order to extract at least one parameter set describing a waveform of at least one component of the audio signal; generating the waveform from the parameter set; synthesizing the audio signal, based on the generated waveform; and outputting the audio signal.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0037] The invention has been depicted in an embodiment with reference to the drawings.
[0038]-[0058] [Figure descriptions not reproduced.]
DETAILED DESCRIPTION OF EMBODIMENTS
[0059] In the present patent application the term "connected", in the context of a connection between any two systems, should be understood in the broadest possible sense as any possible single or multipath, as well as direct or indirect, physical or operational connection.
[0060] A system 1 for decomposition of acoustic signal into sound objects according to the invention is shown schematically in
[0061] In order to extract sound objects from an acoustic signal, a time-domain and frequency-domain signal analysis has been used. Said digital input signal is input to the filter bank 2 sample by sample. Preferably, said filters are SOI filters. It is shown in
[0062] Since the main task of the method and the system according to the invention is to localize all sound objects in the spectrum, two issues are important: the accuracy with which the signal's parameters can be determined, and the resolution of simultaneously appearing sounds. The filter bank should provide a high frequency-domain resolution, i.e. more than 2 filters per semitone, making it possible to separate two adjacent semitones. In the presented examples 4 filters per semitone are used.
[0063] Preferably, the method and the system according to the invention adopt a scale corresponding to the parameters of the human ear, with a logarithmic distribution; however, a person skilled in the art will know that other distributions of the filters' central frequencies are allowed within the scope of the invention. Preferably, the pattern for the distribution of the filters' central frequencies is the musical scale, in which each subsequent octave begins with a tone of twice the frequency of the previous octave. Each octave is divided into 12 semitones, i.e. the frequencies of two adjacent semitones differ by 5.94% (e.g. e1=329.62 Hz, f1=349.20 Hz). To increase accuracy, the method and the system according to the invention use four filters for each semitone, each filter listening to its own frequency, which differs from the adjacent frequency by 1.45%. It has been assumed that the lowest audible frequency is C2=16.35 Hz. Preferably, the number of filters is greater than 300. The particular number of filters for a given embodiment depends on the sampling rate. With sampling at 22050 samples per second the highest frequency is e6=10548 Hz, with 450 filters in this range. With sampling at 44100 samples per second the highest frequency is e7=21096 Hz, with 498 filters in this range.
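The distribution of central frequencies described above can be sketched as follows. This is a minimal illustration, assuming that filter number 0 sits at 16.35 Hz and that the filters are spaced at 48 per octave (12 semitones times 4 filters), which is consistent with the tone numbers listed in the tables of this description. The function names are hypothetical, and the exact filter count for a given sampling rate depends on how the end points are counted.

```python
import math

F_LOWEST_HZ = 16.35                              # lowest frequency given in the text
FILTERS_PER_SEMITONE = 4                         # value used in the presented examples
FILTERS_PER_OCTAVE = 12 * FILTERS_PER_SEMITONE   # 48 filters per octave


def nominal_frequency(n: int) -> float:
    """Nominal (central) frequency in Hz of filter n; n = 0 is assumed to be the lowest filter."""
    return F_LOWEST_HZ * 2.0 ** (n / FILTERS_PER_OCTAVE)


def filter_bank_frequencies(f_max_hz: float) -> list:
    """Nominal frequencies of all filters from 16.35 Hz up to f_max_hz."""
    # Small epsilon guards against floating-point error when f_max_hz lies on the grid.
    n_max = math.floor(FILTERS_PER_OCTAVE * math.log2(f_max_hz / F_LOWEST_HZ) + 1e-9)
    return [nominal_frequency(n) for n in range(n_max + 1)]


# Relative step between adjacent filters and the frequency of one tabulated tone.
step = 2.0 ** (1 / FILTERS_PER_OCTAVE) - 1.0
print(f"adjacent filters differ by {100 * step:.2f}%")
print(f"filter 276 -> {nominal_frequency(276):.1f} Hz")
```

A geometric progression is used because equal steps on a logarithmic scale correspond to a constant frequency ratio between neighbouring filters.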
[0064] A general principle of operation of a passive filter bank is shown in
[0065]
$$W(n) = K \cdot \frac{\mathrm{fp}}{\mathrm{FN}(n)} \qquad (1)$$
[0066] where:
W(n): window width of a filter n
[0067] fp: sampling rate (e.g. 44100 Hz)
[0068] FN(n): nominal (central) frequency of a filter n
[0069] K: window width coefficient (e.g. 16)
[0070] Since a higher frequency-domain resolution is necessary in the lower range of the musical scale, the filter windows are widest for this range of frequencies. Thanks to the introduction of the coefficient K and the normalization to the filter's nominal frequency FN, all the filters have identical amplitude and phase characteristics.
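As a worked illustration of equation (1), the sketch below computes the window width for a few nominal frequencies. Rounding to a whole number of samples is an assumption made here for illustration, and the function name is hypothetical.

```python
def window_length(nominal_hz: float, sample_rate_hz: float = 44100.0, k: float = 16.0) -> int:
    """W(n) = K * fp / FN(n), rounded to a whole number of samples (rounding is an assumption)."""
    return round(k * sample_rate_hz / nominal_hz)


# Lower nominal frequencies give longer windows, hence finer frequency resolution.
for f in (16.35, 880.0, 10548.0):
    w = window_length(f)
    print(f"FN = {f:9.2f} Hz -> W = {w:6d} samples ({1000.0 * w / 44100.0:.1f} ms)")
```

Because the window always spans K periods of the filter's own nominal frequency, its duration in seconds shrinks as the nominal frequency rises.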
[0071] With regard to the implementation of said filter bank, a skilled person will know that one possible way of obtaining the coefficients of a SOI-type band-pass filter is to determine the impulse response of the filter. An exemplary impulse response of a filter 20 according to the invention is shown in
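$$y_i(n) = \cos(\omega(n) \cdot i) \cdot \left( A - B \cdot \cos\frac{2\pi i}{W(n)} + C \cdot \cos\frac{4\pi i}{W(n)} \right) \qquad (2)$$
[0072] where: $\omega(n) = 2\pi \cdot \mathrm{FN}(n) / \mathrm{fp}$
[0073] W(n), FN(n), fp are defined above
TABLE-US-00001
Window type       A         B         C
Hann (Hanning)    0.5       0.5       0
Hamming           0.53836   0.46164   0
Blackman          0.42      0.5       0.08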
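The impulse response of a single filter can be sketched as a cosine at the nominal angular frequency multiplied by a cosine-sum window with the coefficients A, B and C from the table. The sketch assumes that the third window term uses the argument 4πi/W(n), as in the standard Blackman window, and that the window width is rounded to whole samples. The names `WINDOWS` and `impulse_response` are hypothetical.

```python
import math

# Window coefficients (A, B, C) as listed in the table above.
WINDOWS = {
    "hann": (0.5, 0.5, 0.0),
    "hamming": (0.53836, 0.46164, 0.0),
    "blackman": (0.42, 0.5, 0.08),
}


def impulse_response(nominal_hz, sample_rate_hz=44100.0, k=16.0, window="hann"):
    """Windowed cosine: cos(omega * i) * (A - B*cos(2*pi*i/W) + C*cos(4*pi*i/W))."""
    a, b, c = WINDOWS[window]
    w = round(k * sample_rate_hz / nominal_hz)          # window width in samples
    omega = 2.0 * math.pi * nominal_hz / sample_rate_hz  # nominal angular frequency
    response = []
    for i in range(w):
        envelope = a - b * math.cos(2.0 * math.pi * i / w) + c * math.cos(4.0 * math.pi * i / w)
        response.append(math.cos(omega * i) * envelope)
    return response


h = impulse_response(880.0, window="blackman")
print(len(h), "coefficients")
```

The Hann and Blackman windows start at zero, so the response fades in smoothly, while the Hamming window starts at a small non-zero value.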
[0074] The operations performed by each of the filters 20 have been shown in
By using trigonometric identities for products of trigonometric functions in equations (3) and (4), one obtains the dependence of the components FC(n) and FS(n) on the values of these components for the previous sample of the audio signal, on the value of the sample inputted to the filter P_IN, and on the one outputted from the filter P_OUT, according to the equation shown in
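The kind of recurrence described here, in which the new components depend on their previous values, on the sample entering the window and on the sample leaving it, can be illustrated with a generic sliding-DFT update. This is a sketch under stated assumptions, not the equation of the invention: it uses a rectangular window, takes FC as the real part and FS as the imaginary part of a complex running sum, and the class name `SlidingFilter` is hypothetical.

```python
import cmath
import math
from collections import deque


class SlidingFilter:
    """Running sum F(t) = sum over i of x(t - i) * exp(j * omega * i) across the last W samples."""

    def __init__(self, nominal_hz, sample_rate_hz=44100.0, k=16.0):
        self.w = round(k * sample_rate_hz / nominal_hz)
        self.omega = 2.0 * math.pi * nominal_hz / sample_rate_hz
        self.rot = cmath.exp(1j * self.omega)              # rotation by one sample
        self.rot_w = cmath.exp(1j * self.omega * self.w)   # rotation across the whole window
        self.history = deque([0.0] * self.w, maxlen=self.w)
        self.state = 0j

    def push(self, p_in):
        """Feed one input sample and return (FC, FS) for the current window."""
        p_out = self.history[0]       # sample that leaves the window
        self.history.append(p_in)     # deque drops the oldest sample automatically
        # Rotate the previous sum, add the new sample, remove the departing one.
        self.state = self.rot * self.state + p_in - self.rot_w * p_out
        return self.state.real, self.state.imag


flt = SlidingFilter(880.0)
for t in range(2000):
    fc, fs = flt.push(math.sin(2.0 * math.pi * 880.0 * t / 44100.0))
print(f"amplitude estimate: {2.0 * math.hypot(fc, fs) / flt.w:.3f}")
```

The update costs a constant number of operations per sample regardless of window length, which is what makes sample-by-sample analysis with long windows practical. In a long-running implementation the accumulated rounding error of such a recurrence would need to be controlled.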
[0075] Values of the real component FC(n) and the imaginary component FS(n) of the sample obtained after each subsequent sample of the input signal are forwarded from each filter's 20 output to a system for tracking sound objects 3, and in particular to a spectrum analyzing system 31 comprised therein (as shown in
[0076] A spectrum analyzing system 31, being a component of the system for tracking objects 3 (as shown in
TABLE-US-00002
Tone No.   FN         Note
276        880.0 Hz   a2
288        1046 Hz    c3
304        1318 Hz    e3
324        1760 Hz    a3
[0077] There are shown in
[0078] Due to some typical phenomena in the signal processing domain, relying only on the maxima of the amplitude spectrum is not effective. The presence of a given tone in the input signal affects the value of the amplitude spectrum at adjacent frequencies, which leads to a severely distorted spectrum when the signal comprises two tones close to each other. To illustrate this phenomenon, and to illustrate the functionality of the spectrum analyzing system 31 according to the invention, a signal comprising sounds of the following frequencies has also been analyzed:
TABLE-US-00003
Tone No.   FN         Note
276        880.0 Hz   a2
284        987.8 Hz   h2
304        1318 Hz    e3
312        1480 Hz    #f3
As shown in
[0079] The fundamental task of the system for tracking objects 3, a block diagram of which is shown in
[0080] In other words, said voting system performs an operation of calculating votes, namely an operation of collecting the votes of each filter(n) for a specific nominal angular frequency; a filter votes by outputting an angular frequency close to the one for which the vote is given. Said votes are shown as a curved line FQ[n]. An exemplary implementation of said voting system 32 could be a register in which calculated values are collected in specific cells. The consecutive number of the filter, i.e. the number of the cell in the register in which a given value should be collected, is determined from the angular frequency outputted by a specific filter, said outputted angular frequency serving as an index to the register. The person skilled in the art will know that the value of the outputted angular frequency is rarely an integer, so said index should be determined based on a certain assumption, for example that the value of the instantaneous angular frequency is rounded up or rounded down. The value to be collected under the determined index can be, for example, a value equal to 1 multiplied by the amplitude outputted by the voting filter, or a value equal to the difference between the outputted angular frequency and the closest nominal frequency multiplied by the amplitude outputted by the voting filter. Such values can be collected in the corresponding cell of the register by addition, subtraction, multiplication or any other mathematical operation reflecting the number of voting filters. In this way the voting system 32 calculates a weighted value for a specific nominal frequency based on parameters acquired from the spectrum analyzing system. This operation of calculating votes takes into account three sets of input values: the values of the nominal angular frequencies of the filters, the values of the instantaneous angular frequencies of the filters, and the values of the amplitude spectrum FA(n) for each filter.
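The register-based voting described above can be sketched as follows. The sketch makes several assumptions for illustration: frequencies are given in Hz rather than as angular frequencies, the register index is found by rounding to the nearest filter on the 48-per-octave scale starting at 16.35 Hz, and each vote is weighted by the amplitude of the voting filter (the first of the options named in the text). The function name `collect_votes` is hypothetical.

```python
import math


def collect_votes(inst_freq_hz, amplitudes, f_lowest_hz=16.35, filters_per_octave=48):
    """Accumulate amplitude-weighted votes; cell n collects votes for nominal frequency n."""
    n_filters = len(inst_freq_hz)
    register = [0.0] * n_filters
    for f_inst, amp in zip(inst_freq_hz, amplitudes):
        if f_inst <= 0.0:
            continue  # no meaningful frequency estimate from this filter
        # Round the (generally non-integer) position on the filter scale to the nearest cell.
        index = round(filters_per_octave * math.log2(f_inst / f_lowest_hz))
        if 0 <= index < n_filters:
            register[index] += amp
    return register


# Nine neighbouring filters all report an instantaneous frequency of 880 Hz.
n = 300
inst = [16.35 * 2.0 ** (i / 48) for i in range(n)]
amps = [0.0] * n
for i in range(272, 281):
    inst[i] = 880.0
    amps[i] = 1.0
votes = collect_votes(inst, amps)
print("strongest cell:", votes.index(max(votes)))
```

A tone present in the signal pulls the instantaneous frequency of several neighbouring filters towards its own frequency, so their votes pile up in one cell and the maximum of the register marks the tone more sharply than the amplitude spectrum alone.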
[0081] As is shown in
[0082] When an active object is associated with an object sufficiently close to it, a matching function is further calculated in the system for associating objects 33, which comprises the following weighted values: amplitude matching, phase matching and object duration time. This functionality of the system for associating objects 33 according to the invention is of essential importance when, in a real input signal, a component signal from one and the same source has changed frequency, because as a result of the frequency change a number of active objects may come closer to each other. Therefore, after calculating the matching function, the system for associating objects 33 checks whether at the given time instant there is a second sufficiently close object in the database 34. The system 33 decides which object will continue the objects which join together. The selection is decided by comparing the results of the matching function. The best matched active object will be continued, and an instruction to terminate will be issued for the remaining ones. A resolution improvement system 36 also cooperates with the active objects database 34. It tracks the mutual frequency-domain distance of the objects present in the signal. If the frequencies of active objects are detected to be too close, the resolution improvement system 36 sends a control signal to start one of the three processes improving the frequency-domain resolution. As mentioned previously, when a few frequencies are close to each other, their spectra overlap. To distinguish them the system has to listen intently to the sound. It can achieve this by elongating the window in which the filter samples the signal. In this situation a window adjustment signal 301 is activated, informing the filter bank 2 that the windows in the given range should be elongated. Window elongation impedes the analysis of the signal dynamics; therefore, if no close objects are detected, the resolution improvement system 36 enforces a subsequent shortening of the filter's 20 window.
[0083] In the solution according to the invention a window with length of 12 to 24 periods of nominal frequency of the filter 20 is assumed. The relation of the frequency-domain resolution with the window's width is shown in
TABLE-US-00004
Window width    Detects objects       Tracks objects
(in periods)    in the distance of    in the distance of
12              17.4%                 23.2%
16              14.5%                 17.4%
20              8.7%                  14.5%
24              5.9%                  11.6%
[0084] In another embodiment the system listens intently to a sound by modifying the filter bank's spectrum, what is schematically illustrated in
$$\mathrm{FS}(n) = \mathrm{FA}(n) \cdot \exp\left(-\frac{(x - \mathrm{FX}(n))^2}{2\sigma^2(W(n))}\right) \cdot \sin\big(\mathrm{FD}(n) \cdot (x - \mathrm{FX}(n)) + \mathrm{FF}(n)\big)$$
$$\mathrm{FC}(n) = \mathrm{FA}(n) \cdot \exp\left(-\frac{(x - \mathrm{FX}(n))^2}{2\sigma^2(W(n))}\right) \cdot \cos\big(\mathrm{FD}(n) \cdot (x - \mathrm{FX}(n)) + \mathrm{FF}(n)\big)$$
where $\sigma$ is a function of the width of the window (when the width of the window is 20, $\sigma^2 = 10$), i.e. based on the known instantaneous frequency and subtracts them from the real spectrum, so that the spectrum of adjacent elements is not interfered with so strongly. The spectrum analyzing system 31 and the voting system 32 perceive only adjacent elements and a variation of the subtracted object. However, the system for associating objects 33 further takes into account the subtracted parameters while comparing the detected elements with the active objects database 34. Unfortunately, implementing this frequency-domain resolution improvement method requires a very large number of computations, and a risk of positive feedback exists.
[0085] In a yet another embodiment, the frequency-domain resolution can be improved by subtracting from the input signal an audio signal generated based on well localized (like in the previous embodiment) adjacent objects. Such operation is shown schematically in
[0086] According to the invention, the information contained in the active objects database 34 is also used by a shape forming system 37. The expected result of the sound signal decomposition according to the invention is to obtain sound objects having the form of sinusoidal waveforms with slowly-varying amplitude envelope and frequency. Therefore, the shape forming system 37 tracks variations of the amplitude envelope and frequency of the active objects in the database 34 and calculates online the subsequent characteristic points of amplitude and frequency, which are the local maxima, local minima and inflection points. Such information makes it possible to describe sinusoidal waveforms unambiguously. The shape forming system 37 forwards this characteristic information, in the form of points describing an object, online to the active objects database 34. It has been assumed that the distance between points to be determined should be no less than 20 periods of the object's frequency. Distances between points which are proportional to frequency can effectively represent the dynamics of the objects' variation. Exemplary sound objects have been shown in
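The selection of characteristic points (local maxima, local minima and inflection points) from a tracked envelope can be sketched with finite differences. This is a simplified offline illustration, not the online procedure of the shape forming system 37: extrema are found from sign changes of the first difference, inflection points from sign changes of the second difference, and `min_distance` (in samples) is a hypothetical stand-in for the minimum spacing between points.

```python
import math


def characteristic_points(values, min_distance=1):
    """Return (index, kind) pairs for local maxima, local minima and inflection points."""

    def curvature(i):
        # Second finite difference centred on sample i.
        return values[i + 1] - 2.0 * values[i] + values[i - 1]

    points = []
    for i in range(1, len(values) - 1):
        rise_in = values[i] - values[i - 1]
        rise_out = values[i + 1] - values[i]
        if rise_in > 0 and rise_out <= 0:
            kind = "max"
        elif rise_in < 0 and rise_out >= 0:
            kind = "min"
        elif i >= 2 and curvature(i - 1) * curvature(i) < 0:
            kind = "inflection"
        else:
            continue
        # Keep the point only if it is far enough from the previously kept one.
        if not points or i - points[-1][0] >= min_distance:
            points.append((i, kind))
    return points


# Slowly varying envelope: two periods of a sine, slightly offset from the sample grid.
envelope = [math.sin(2.0 * math.pi * (t + 0.3) / 100.0) for t in range(201)]
for index, kind in characteristic_points(envelope):
    print(index, kind)
```

A handful of such points per period is enough to rebuild a slowly varying curve by interpolation, which is why the object description stays compact.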
[0087] The description of sound objects shown in the table
[0088] 1) Header: The notation starts with a header having as an essential element a header tag comprising a four-byte keyword, indicating that what follows is a description of sound objects. Next, the number of channels (tracks) is specified in two bytes, followed by two bytes defining the time unit. The header occurs only once, at the beginning of a file.
[0089] 2) Channel: Information about channels (tracks) from this field serves to separate the group of sound objects being in an essential relation, e.g. left or right channel in stereo, vocal track, percussion instruments track, recording from a defined microphone etc. The channel field comprises the channel identifier (number), the number of objects in the channel and the position of the channel from the beginning of an audio signal, measured in defined units.
[0090] 3) Object: An identifier contained in the first byte determines the type of the object. Identifier 0 denotes the basic unit in the signal record, which is the sound object. Value 1 can denote a folder containing a group of objects, for example a basic tone and its harmonics. Other values can be used to define other elements related to objects. The description of the fundamental sound object includes the number of points. The number of points does not include the first point, which is defined by the object itself. Specifying the maximal amplitude in the object's parameters makes it possible to control the amplification of all points of the object simultaneously. In the case of a folder of objects, this affects the amplitude of all the objects contained in the folder. Analogously, specifying information about frequency (applying notation: number of tone*4 of a filter bank=notes*16) makes it possible to control the frequency of all the elements related to an object simultaneously. Furthermore, defining the position of the beginning of an object in relation to a higher-level element (e.g. a channel) makes it possible to shift the object in time.
[0091] 4) Point: Points are used to describe the shape of the sound object in the time-frequency-amplitude domain. Their values are relative to the parameters defined by the sound object. One byte of amplitude defines what fraction of the maximal amplitude defined by the object the point has. Similarly, tone variation defines by what fraction of a tone the frequency has changed. The position of a point is defined relative to the previously defined point in the object.
[0092] The multilevel structure of recording and relative associations between the fields allow a very flexible operation on sound objects, making them effective tools for designing and modifying audio signals.
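The multilevel record described above (header, channel, object, point) can be sketched as a set of data classes plus a packing routine for the header, whose field sizes are stated in the text: a four-byte keyword, two bytes for the number of channels and two bytes for the time unit. Everything else is an assumption for illustration: the byte order, the keyword value `b"SOBJ"`, the field names and the integer types of the remaining fields.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Point:
    amplitude: int        # one byte: fraction of the object's maximal amplitude
    tone_variation: int   # fraction of a tone by which the frequency changed (encoding assumed)
    position: int         # offset from the previously defined point, in time units


@dataclass
class SoundObject:
    kind: int             # 0 = sound object, 1 = folder containing a group of objects
    max_amplitude: int    # scales the amplitude of every point of the object
    frequency: int        # tone number on the filter-bank scale
    position: int         # start of the object relative to its channel
    points: list = field(default_factory=list)


@dataclass
class Channel:
    identifier: int       # channel (track) number
    position: int         # position from the beginning of the audio signal
    objects: list = field(default_factory=list)


HEADER_FORMAT = "<4sHH"   # keyword, number of channels, time unit; little-endian is assumed


def pack_header(keyword: bytes, channels: int, time_unit: int) -> bytes:
    """Serialize the file header into 8 bytes."""
    return struct.pack(HEADER_FORMAT, keyword, channels, time_unit)


def unpack_header(data: bytes):
    """Read (keyword, channels, time_unit) back from the start of a byte string."""
    return struct.unpack(HEADER_FORMAT, data[:struct.calcsize(HEADER_FORMAT)])


header = pack_header(b"SOBJ", 2, 1000)
print(len(header), unpack_header(header))
```

Storing point values relative to their parent object means that changing one field of the object, such as its maximal amplitude or start position, affects all of its points at once.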
[0093] Condensed recording of information about sound objects according to the invention, in the format shown in
[0094] Identification of sound objects in an audio signal is not an unambiguous mathematical transformation. The audio signal created as a composition of the objects obtained as a result of a decomposition differs from the input signal. The task of the system and the method according to the invention is to minimize this difference. The sources of differences are of two types: some are expected and result from the applied technology, while others can result from interference or unexpected properties of the input audio signal. To reduce the difference between the audio signal composed of sound objects according to the invention and the input signal a correcting system 4, shown in
[0095] The first type of correction of sound objects according to the invention, performed by the correcting system 4, is shown in
[0096] A further type of correction according to the invention, performed by the correcting system 4, has been shown in
[0097] Yet another type of correction according to the invention, performed by the correcting system 4, is shown in
[0098] A task of the correcting system 4 is also to remove objects having an insignificant influence on the audio signal's sound. According to the invention it was decided that such objects can be the ones whose maximal amplitude is lower than 1% of the maximal amplitude present in the whole signal at a given time instant. A change in the signal at the level of −40 dB should not be audible.
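The relation between the 1% amplitude threshold and the 40 dB figure follows from the usual definition of amplitude level, 20·log10(ratio), which gives −40 dB for a ratio of 0.01. The sketch below shows this arithmetic together with a simplified filter that compares each object's maximal amplitude with the largest one in a list. Treating the whole list as one time instant is a simplification, and the function names are hypothetical.

```python
import math


def amplitude_ratio_to_db(ratio: float) -> float:
    """Level in decibels of an amplitude ratio: 20 * log10(ratio)."""
    return 20.0 * math.log10(ratio)


def remove_insignificant(objects, threshold=0.01):
    """Keep (name, max_amplitude) pairs whose amplitude is at least threshold * largest amplitude."""
    peak = max(amplitude for _, amplitude in objects)
    return [(name, amplitude) for name, amplitude in objects if amplitude >= threshold * peak]


print(f"1% of the peak amplitude corresponds to {amplitude_ratio_to_db(0.01):.0f} dB")
print(remove_insignificant([("a", 1.0), ("b", 0.005), ("c", 0.02)]))
```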
[0099] In general, the correcting system removes all irregularities in the shape of sound objects. These operations can be classified as: joining of discontinuous objects, removal of objects' oscillations near adjacent objects, and removal of insignificant objects as well as interfering ones that last too briefly or are too weakly audible.
[0100] To illustrate the results of the use of the method and the system for sound signal decomposition a fragment of stereo audio signal sampled at 44100 samples per second has been tested. The signal is a musical composition including sound of guitar and singing. The plot shown in
[0101]-[0106] [Figure descriptions not reproduced.]
[0107] As a result of using the method and the system for signal decomposition according to the invention one obtains sound objects according to the invention, which can serve for an acoustic signal synthesis.
[0108] More specifically, a sound object comprises an identifier indicating the object's location relative to the beginning of the track and the number of points included in the object. Each point contains the position of the point in relation to the previous point, the change of the amplitude with respect to the previous point, and the change of pulsation (expressed on a logarithmic scale) relative to the pulsation of the previous point. In a properly built object the amplitude of the first and the last point should be zero. If it is not, the amplitude jump can be perceived in the acoustic signal as a crack. An important assumption is that objects begin with a phase equal to zero. If not, the starting point should be moved to the location in which the phase is zero, otherwise the whole object will be out of phase.
[0109] Such information is sufficient to construct the audio signal represented by an object. In the simplest case, the parameters included in the points can be used to determine a polygonal line of the amplitude envelope and a polygonal line of pulsation changes. To improve the sound signal and remove the high frequencies generated at the breaks of the curves, one can generate a smooth curve in the form of a polynomial of second or higher order whose subsequent derivatives are equal at the vertices of the polygonal line (e.g. a cubic spline).
[0110] In the case of linear interpolation, the equation describing the section of the audio signal from one to the next point may be in the form:
$$\mathrm{AudioSignalP}_i(t) = \left(A_i + t \cdot \frac{A_{i+1}}{P_{i+1}}\right) \cdot \cos\left(\Phi_i + t \cdot \left(\omega_i + \frac{\omega_{i+1}}{P_{i+1}}\right)\right)$$
where:
$A_i$: amplitude of point i
[0111] $P_i$: position of point i
[0112] $\omega_i$: angular frequency of point i
[0113] $\Phi_i$: phase of point i, $\Phi_0 = 0$
[0114] Object's audio signal composed of the P points is the sum of offset segments described above. In the same way, the complete audio signal is the sum of offset signals of objects.
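Synthesis from points by linear interpolation can be sketched as follows. The sketch is not a transcription of the equation above. It assumes that each point carries an absolute amplitude and an absolute frequency in Hz (rather than changes relative to the previous point), interpolates both linearly within a segment, and obtains the phase by accumulating the instantaneous angular frequency sample by sample, starting from zero. The function names and the tuple layout are hypothetical.

```python
import math


def synthesize_object(points, sample_rate_hz=44100.0):
    """points: list of (offset_from_previous_point_in_samples, amplitude, frequency_hz)."""
    samples = []
    phase = 0.0  # objects are assumed to start with zero phase
    for (_, a0, f0), (length, a1, f1) in zip(points, points[1:]):
        for t in range(length):
            frac = t / length
            amplitude = a0 + frac * (a1 - a0)   # polygonal amplitude envelope
            frequency = f0 + frac * (f1 - f0)   # polygonal frequency line
            samples.append(amplitude * math.cos(phase))
            phase += 2.0 * math.pi * frequency / sample_rate_hz
    return samples


def mix(objects, total_length):
    """Sum objects given as (start_offset_in_samples, samples) into one signal."""
    signal = [0.0] * total_length
    for start, samples in objects:
        for i, value in enumerate(samples):
            if start + i < total_length:
                signal[start + i] += value
    return signal


# One object: fade in over 100 samples, fade out over 100 samples, at a constant 440 Hz.
tone = synthesize_object([(0, 0.0, 440.0), (100, 1.0, 440.0), (100, 0.0, 440.0)])
signal = mix([(0, tone), (50, tone)], 300)
print(len(tone), len(signal))
```

Accumulating the phase keeps the waveform continuous across segment boundaries even when the frequency changes, which avoids the clicks that a phase jump would cause.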
[0115] A synthesized test signal in
[0116] The sound objects according to the invention have a number of properties enabling their multiple applications, in particular in processing, analysis and synthesis of sound signals. Sound objects can be acquired with the use of the method for signal decomposition according to the invention as a result of an audio signal decomposition. Sound objects can be also formed analytically, by defining values of parameters shown in
[0117] 1) Based on parameters describing sound objects it is possible to determine the function of amplitude and frequency variation, and to determine location in respect to other objects, so that an audio signal can be composed of them.
[0118] 2) One of the parameters which describe sound objects is the time, thanks to which the objects can be shifted, shortened and lengthened in the time domain.
[0119] 3) A second parameter of sound objects is the frequency, thanks to which the objects can be shifted and modified in the frequency domain.
[0120] 4) A next parameter of sound objects is the amplitude, thanks to which envelopes of sound objects can be modified.
[0121] 5) Sound objects can be grouped, e.g. by selecting the ones present at the same time and/or the ones whose frequencies are harmonics.
[0122] 6) Grouped objects can be separated from or appended to an audio signal. This makes it possible to create a new signal from a number of other signals or to split a single signal into a number of independent signals.
[0123] 7) Grouped objects can be amplified (by increasing their amplitude) or silenced (by decreasing their amplitude).
[0124] 8) By modifying proportions of harmonic amplitude included in a group of objects it is possible to modify the timbre of the grouped objects.
[0125] 9) It is possible to modify the value of all grouped frequencies by increasing or decreasing frequencies of harmonics.
[0126] 10) It is possible to modify audible emotions contained in sound objects, by modifying the slope (falling or rising) of component frequencies.
[0127] 11) By presenting an audio signal in the form of objects described by points with three coordinates it is possible to significantly reduce the number of required data bytes without loss of information contained in the signal.
[0128] Considering the properties of sound objects, many applications can be defined for them. Exemplary ones include:
[0129] 1) Separation of audio signal sources such as instruments or speakers, based on proper grouping of sound objects present in the signal.
[0130] 2) Automatic generation of musical notation for individual instruments from an audio signal.
[0131] 3) Devices for automatic tuning of musical instruments during ongoing musical performance.
[0132] 4) Forwarding the voice of separated speakers to speech recognition systems.
[0133] 5) Recognition of emotion contained in separated voices.
[0134] 6) Identification of separated speakers.
[0135] 7) Modification of the timbre of recognized instruments.
[0136] 8) Swapping the instruments (e.g. a guitar playing instead of a piano).
[0137] 9) Modification of a voice of a speaker (raising, lowering, conversion of emotion, intonation).
[0138] 10) Swapping of voices of speakers.
[0139] 11) Synthesis of a voice with the possibility of emotion and intonation control.
[0140] 12) Smooth joining of speeches.
[0141] 13) Voice control of devices, even in an environment with interference.
[0142] 14) Generation of new sounds, samples, unusual sounds.
[0143] 15) New musical instruments.
[0144] 16) Spatial management of sound.
[0145] 17) Additional possibilities of data compression.
Further Embodiments
[0146] According to an embodiment of the invention, a method for decomposition of acoustic signal into sound objects having the form of sinusoidal wave with slowly-varying amplitude and frequency comprises a step of determining parameters of a short term signal model and a step of determining parameters of a long term signal model based on said short term parameters, wherein the step of determining parameters of a short term signal model comprises a conversion of the analogue acoustic signal into a digital input signal P_IN, and wherein in said step of determining parameters of the short term signal model the input signal P_IN is then split into adjacent sub-bands with central frequencies distributed according to a logarithmic scale by feeding samples of the acoustic signal to the digital filter bank's input, each digital filter having a window length proportional to the nominal central frequency,
[0147] at each filter's (20) output the real value FC(n) and the imaginary value FS(n) of the filtered signal are determined sample by sample, and then based on this
[0148] the frequency, the amplitude and the phase of all detected constituent elements of said acoustic signal are determined sample by sample,
[0149] an operation improving the frequency-domain resolution of said filtered signal is executed sample by sample and involves at least a step of determining the frequency of all detected constituent elements based on maximum values of the function FG(n) resulting from a mathematical operation reflecting the number of neighboring filters (20) outputting an angular frequency value substantially similar to an angular frequency value of each consecutive filter (20),
[0150] and in that in said step of determining parameters of the long term signal model:
[0151] for each detected element of said acoustic signal an active object in an active objects database (34) is created for its tracking,
[0152] subsequent detected elements of said acoustic signal are associated sample by sample with at least selected active objects in said active objects database (34) to create a new active object, or to append said detected element to an active object, or to close an active object,
[0153] for each active object in the database (34) values of the envelope of amplitude and values of frequency and their corresponding time instants are determined not less frequently than once per period of duration of a given filter's (20) window W(n) so as to create characteristic points describing the slowly-varying sinusoidal waveform of said sound object,
[0154] at least one selected closed active object is transferred to a database of sound objects (35) to obtain at least one decomposed sound object, defined by a set of characteristic points with coordinates in time-frequency-amplitude space.
[0155] The method may further comprise a step of correcting selected sound objects, which involves a step of correcting the amplitude and/or frequency of selected sound objects so as to reduce an expected distortion in said sound objects, the distortion being introduced by said digital filter bank.
[0156] Improving the frequency-domain resolution of said filtered signal may further comprise a step of increasing window length of selected filters.
[0157] The operation of improving the frequency-domain resolution of said filtered signal may further comprise a step of subtracting an expected spectrum of assuredly located adjacent sound objects from the spectrum at the output of the filters.
[0158] The operation of improving the frequency-domain resolution of said filtered signal may further comprise a step of subtracting an audio signal generated based on assuredly located adjacent sound objects from said input signal.
[0159] A system for decomposition of acoustic signal into sound objects having the form of sinusoidal waveforms with slowly-varying amplitude and frequency according to a further embodiment of the invention comprises a sub-system for determining parameters of a short term signal model and a sub-system for determining parameters of a long term signal model based on said parameters, wherein said subsystem for determining short term parameters comprises a converter system for conversion of the analogue acoustic signal into a digital input signal P_IN, wherein said subsystem for determining short term parameters further comprises a filter bank (2) with filter central frequencies distributed according to a logarithmic distribution, each digital filter having a window length proportional to the central frequency, wherein each filter (20) is adapted to determine a real value FC(n) and an imaginary value FS(n) of said filtered signal, said filter bank (2) being connected to a system for tracking objects (3), wherein said system for tracking objects (3) comprises a spectrum analyzing system (31) adapted to detect all constituent elements of the input signal P_IN, a voting system (32) adapted to determine the frequency of all detected constituent elements based on maximum values of the function FG(n) resulting from a mathematical operation reflecting the number of neighboring filters (20) which output an angular frequency value substantially similar to an angular frequency value of each consecutive filter (20), and in that said subsystem for determining long term parameters comprises
[0160] a system for associating objects (33), a shape forming system (37) adapted to determine characteristic points describing slowly-varying sinusoidal waveforms, an active objects database (34) and a sound objects database (35).
[0161] The system for tracking objects (3) may further be connected with a correcting system (4) adapted to correct the amplitude and/or the frequency of individual selected sound objects so as to reduce an expected distortion in said sound objects introduced by said digital filter bank and/or adapted to combine discontinuous objects and/or to remove selected sound objects.
[0162] The system may further comprise a resolution improvement system (36) adapted to increase the window length of selected filters and/or to subtract an expected spectrum of assuredly located adjacent sound objects from the spectrum at the output of the filters and/or to subtract an audio signal generated based on assuredly located adjacent sound objects from said input signal.