PSD OPTIMIZATION APPARATUS, PSD OPTIMIZATION METHOD, AND PROGRAM
20220343932 · 2022-10-27
Abstract
Sound source enhancement technology is provided that is capable of improving sound source enhancement capabilities in accordance with settings of usage and applications. A PSD optimization device includes a PSD updating unit that takes a target sound PSD input value {circumflex over ( )}φS(ω, τ), an interference noise PSD input value {circumflex over ( )}φIN(ω, τ), and a background noise PSD input value {circumflex over ( )}φBN(ω, τ) as input, and generates a target sound PSD output value φS(ω, τ), an interference noise PSD output value φIN(ω, τ), and a background noise PSD output value φBN(ω, τ), by solving an optimization problem for a cost function relating to a variable uS representing a target sound PSD, a variable uIN representing an interference noise PSD, and a variable uBN representing a background noise PSD. The optimization problem for the cost function is defined using at least one of a constraint relating to a frequency structure of a sound source or a convex cost term relating to the frequency structure of the sound source, a constraint relating to a temporal structure of the sound source or a convex cost term relating to the temporal structure of the sound source, and a constraint relating to a spatial structure of the sound source or a convex cost term relating to the spatial structure of the sound source.
Claims
1. A power spectral density (PSD) optimization device including a PSD updating unit that, with u.sub.S as a variable representing a target sound PSD, u.sub.IN as a variable representing an interference noise PSD, and u.sub.BN as a variable representing a background noise PSD, takes a target sound PSD input value {circumflex over ( )}φ.sub.S(ω, τ), an interference noise PSD input value {circumflex over ( )}φ.sub.IN(ω, τ), and a background noise PSD input value {circumflex over ( )}φ.sub.BN(ω, τ) as input, and generates a target sound PSD output value φ.sub.S(ω, τ), an interference noise PSD output value φ.sub.IN(ω, τ), and a background noise PSD output value φ.sub.BN(ω, τ), by solving an optimization problem for a cost function relating to the variable u.sub.S, the variable u.sub.IN, and the variable u.sub.BN, wherein the optimization problem for the cost function is defined using at least one of a constraint relating to a frequency structure of a sound source or a convex cost term relating to the frequency structure of the sound source, a constraint relating to a temporal structure of the sound source or a convex cost term relating to the temporal structure of the sound source, and a constraint relating to a spatial structure of the sound source or a convex cost term relating to the spatial structure of the sound source.
2. The PSD optimization device according to claim 1, wherein a convex cost term L relating to the frequency structure is defined by
3. The PSD optimization device according to claim 1, wherein, with u.sub.BN,τ as a variable representing the background noise PSD at a time frame τ, and {circumflex over ( )}φ.sub.BN,τ−1 as a background noise PSD estimation value at time frame τ−1, a convex cost term L relating to the temporal structure is defined by
4. The PSD optimization device according to claim 1, wherein, with c as a PSD estimation value of enhanced signals of the sound source at the target sound direction-of-arrival θ.sub.S, the constraint relating to the spatial structure is defined by
u.sub.S+u.sub.IN+u.sub.BN=c.
5. The PSD optimization device according to claim 1, wherein, with u=[u.sub.S.sup.T, u.sub.IN.sup.T, u.sub.BN.sup.T].sup.T, and v as an auxiliary variable of variable u, the optimization problem for the cost function is defined as a problem in which inf.sub.u,v F.sub.1(u)+F.sub.2(v) (where F.sub.1 and F.sub.2 are convex functions) is solved under linear constraints relating to the variables u and v, and where the linear constraints relating to the variables u and v include at least one of a constraint relating to the frequency structure of the sound source, a constraint relating to the temporal structure of the sound source, and a constraint relating to the spatial structure of the sound source, or F.sub.1(u)+F.sub.2(v) includes at least one of a convex cost term relating to the frequency structure of the sound source, a convex cost term relating to the temporal structure of the sound source, and a convex cost term relating to the spatial structure of the sound source.
6. The PSD optimization device according to claim 5, wherein linear constraints regarding the variables u and v are
Au=v, Bu=c, u≥0 where A=[Λ, 0, 0], B=[I, I, I], c is the PSD estimation value of enhanced signals of the sound source at the target sound direction-of-arrival θ.sub.S, Λ(∈R.sup.Ω×Ω) is a predetermined sparse matrix, I(∈R.sup.Ω×Ω) is an identity matrix, and Ω is the number of frequency bands, F.sub.1(u) and F.sub.2(v) are each
D.sub.p.sup.⋅=D.sub.p.sup.−1, D.sub.q.sup.⋅=D.sub.q.sup.−1, D.sub.r.sup.⋅=D.sub.r.sup.−1
{tilde over (p)}=∇D.sub.p(p),{tilde over (q)}=∇D.sub.q(q),{tilde over (r)}=∇D.sub.r(r) wherein the PSD updating unit includes a first variable calculator that calculates u.sup.t+1, which is a result of updating the variable u for a t+1′th time, by the following Expression,
{tilde over (p)}.sup.t+1/2={tilde over (p)}.sup.t−2Au.sup.t+1 a second dual variable calculator that calculates ˜q.sup.t+1, which is a result of updating the dual variable ˜q for the t+1′th time, by the following Expression,
{tilde over (q)}.sup.t+1={tilde over (q)}.sup.t−2(Bu.sup.t+1−c) a third dual variable calculator that calculates ˜r.sup.t+1/2, which is an intermediate updating result of the dual variable ˜r for the t+1′th time, by the following Expression,
{tilde over (r)}.sup.t+1/2={tilde over (r)}.sup.t−2u.sup.t+1 a second variable calculator that calculates v.sup.t+1, which is a result of updating the auxiliary variable v for the t+1′th time, by the following Expression,
{tilde over (p)}.sup.t+1={tilde over (p)}.sup.t+1/2+2v.sup.t+1 a fifth dual variable calculator that sets ˜r=[˜r.sub.1.sup.T, ˜r.sub.2.sup.T, ˜r.sub.3.sup.T].sup.T, and calculates ˜r.sup.t+1, which is a result of updating the dual variable ˜r for the t+1′th time, by the following Expression
7. A power spectral density (PSD) optimization method including a PSD updating step, in which, with u.sub.S as a variable representing a target sound PSD, u.sub.IN as a variable representing an interference noise PSD, and u.sub.BN as a variable representing a background noise PSD, a PSD optimization device takes a target sound PSD input value {circumflex over ( )}φ.sub.S, an interference noise PSD input value {circumflex over ( )}φ.sub.IN, and a background noise PSD input value {circumflex over ( )}φ.sub.BN as input, and generates a target sound PSD output value φ.sub.S, an interference noise PSD output value φ.sub.IN, and a background noise PSD output value φ.sub.BN, by solving an optimization problem for a cost function relating to the variable u.sub.S, the variable u.sub.IN, and the variable u.sub.BN, wherein the optimization problem for the cost function is defined using at least one of a constraint relating to a frequency structure of a sound source or a convex cost term relating to the frequency structure of the sound source, a constraint relating to a temporal structure of the sound source or a convex cost term relating to the temporal structure of the sound source, and a constraint relating to a spatial structure of the sound source or a convex cost term relating to the spatial structure of the sound source.
8. A non-transitory computer-readable medium having computer-readable instructions stored thereon, which, when executed, cause a computer including a memory and a processor to execute a set of operations, comprising: obtaining a target sound power spectral density (PSD) input value {circumflex over ( )}φ.sub.S(ω, τ), an interference noise PSD input value {circumflex over ( )}φ.sub.IN(ω, τ), and a background noise PSD input value {circumflex over ( )}φ.sub.BN(ω, τ) as input, with u.sub.S as a variable representing a target sound PSD, u.sub.IN as a variable representing an interference noise PSD, and u.sub.BN as a variable representing a background noise PSD, and generating a target sound PSD output value φ.sub.S(ω, τ), an interference noise PSD output value φ.sub.IN(ω, τ), and a background noise PSD output value φ.sub.BN(ω, τ), by solving an optimization problem for a cost function relating to the variable u.sub.S, the variable u.sub.IN, and the variable u.sub.BN, wherein the optimization problem for the cost function is defined using at least one of: a constraint relating to a frequency structure of a sound source or a convex cost term relating to the frequency structure of the sound source, a constraint relating to a temporal structure of the sound source or a convex cost term relating to the temporal structure of the sound source, and a constraint relating to a spatial structure of the sound source or a convex cost term relating to the spatial structure of the sound source.
9. The PSD optimization method according to claim 7, wherein the target sound power spectral density (PSD) input value {circumflex over ( )}φ.sub.S(ω, τ), the interference noise PSD input value {circumflex over ( )}φ.sub.IN(ω, τ), and the background noise PSD input value are obtained based on temporal region observation signals x.sub.m(t) obtained by a microphone element m.
10. The PSD optimization method according to claim 9, wherein the set of operations further comprises: generating frequency region observation signals X.sub.m(ω, τ) by transforming the temporal region observation signals x.sub.m(t) according to frequency region.
11. The PSD optimization method according to claim 10, wherein the set of operations further comprises: generating enhanced signals Y.sub.θ_S(ω, τ) by performing linear filtering of the frequency region observation signals X.sub.m(ω, τ).
12. The PSD optimization method according to claim 10, wherein the set of operations further comprises: generating enhanced signals Y.sub.θ_j by performing linear filtering of the frequency region observation signals X.sub.m(ω, τ).
13. The non-transitory computer-readable medium according to claim 8, wherein a convex cost term L relating to the frequency structure is defined by
14. The non-transitory computer-readable medium according to claim 13, wherein, with u.sub.BN,τ as a variable representing the background noise PSD at a time frame τ, and {circumflex over ( )}φ.sub.BN,τ−1 as a background noise PSD estimation value at time frame τ−1, a convex cost term L relating to the temporal structure is defined by
15. The non-transitory computer-readable medium according to claim 13, wherein, with c as a PSD estimation value of enhanced signals of the sound source at the target sound direction-of-arrival θ.sub.S, the constraint relating to the spatial structure is defined by
u.sub.S+u.sub.IN+u.sub.BN=c.
16. The non-transitory computer-readable medium according to claim 13, wherein, with u=[u.sub.S.sup.T, u.sub.IN.sup.T, u.sub.BN.sup.T].sup.T, and v as an auxiliary variable of variable u, the optimization problem for the cost function is defined as a problem in which inf.sub.u,v F.sub.1(u)+F.sub.2(v) (where F.sub.1 and F.sub.2 are convex functions) is solved under linear constraints relating to the variables u and v, and where the linear constraints relating to the variables u and v include at least one of a constraint relating to the frequency structure of the sound source, a constraint relating to the temporal structure of the sound source, and a constraint relating to the spatial structure of the sound source, or F.sub.1(u)+F.sub.2(v) includes at least one of a convex cost term relating to the frequency structure of the sound source, a convex cost term relating to the temporal structure of the sound source, and a convex cost term relating to the spatial structure of the sound source.
17. The non-transitory computer-readable medium according to claim 16, wherein linear constraints regarding the variables u and v are
Au=v, Bu=c, u≥0 where A=[Λ, 0, 0], B=[I, I, I], c is the PSD estimation value of enhanced signals of the sound source at the target sound direction-of-arrival θ.sub.S, Λ(∈R.sup.Ω×Ω) is a predetermined sparse matrix, I(∈R.sup.Ω×Ω) is an identity matrix, and Ω is the number of frequency bands, F.sub.1(u) and F.sub.2(v) are each
D.sub.p.sup.⋅=D.sub.p.sup.−1,D.sub.q.sup.⋅=D.sub.q.sup.−1,D.sub.r.sup.⋅=D.sub.r.sup.−1
{tilde over (p)}=∇D.sub.p(p),{tilde over (q)}=∇D.sub.q(q), {tilde over (r)}=∇D.sub.r(r) wherein the PSD updating unit includes a first variable calculator that calculates u.sup.t+1, which is a result of updating the variable u for a t+1′th time, by the following Expression,
{tilde over (p)}.sup.t+1/2={tilde over (p)}.sup.t−2Au.sup.t+1 a second dual variable calculator that calculates ˜q.sup.t+1, which is a result of updating the dual variable ˜q for the t+1′th time, by the following Expression,
{tilde over (q)}.sup.t+1={tilde over (q)}.sup.t−2(Bu.sup.t+1−c) a third dual variable calculator that calculates ˜r.sup.t+1/2, which is an intermediate updating result of the dual variable ˜r for the t+1′th time, by the following Expression,
{tilde over (r)}.sup.t+1/2={tilde over (r)}.sup.t−2u.sup.t+1 a second variable calculator that calculates v.sup.t+1, which is a result of updating the auxiliary variable v for the t+1′th time, by the following Expression,
{tilde over (p)}.sup.t+1={tilde over (p)}.sup.t+1/2+2v.sup.t+1 a fifth dual variable calculator that sets ˜r=[˜r.sub.1.sup.T, ˜r.sub.2.sup.T, ˜r.sub.3.sup.T].sup.T, and calculates ˜r.sup.t+1, which is a result of updating the dual variable ˜r for the t+1′th time, by the following Expression
Description
DESCRIPTION OF EMBODIMENTS
[0020] Embodiments of the present invention will be described in detail below. Note that components that have the same function are denoted by the same signs, and repetitive description will be omitted.
[0021] Notation used in the present specification will be explained before the embodiments are described.
[0022] An _ (underscore) represents a subscript index. For example, x.sup.y_z represents that y.sub.z is a superscript index of x, and x.sub.y_z represents that y.sub.z is a subscript index of x.
[0023] Also, the superscript indices “{circumflex over ( )}” and “˜” such as {circumflex over ( )}x or ˜x regarding a certain character x should actually be shown directly above the “x”, but are written as {circumflex over ( )}x and ˜x due to limitations in notation of description in the specification.
TECHNICAL BACKGROUND
[0024] Embodiments of the present invention are to perform optimization processing with regard to a PSD of a target sound, a PSD of interference noise, and a PSD of background noise, estimated by the technique according to NPL 1, so that sound source enhancement capabilities are improved. Accordingly, the technique of NPL 1, which is a conventional technique, will be described first.
[0025] <<Conventional Technique>>
[0026] A sound source enhancement device 900 will be described below with reference to
[0027] Operations of the sound source enhancement device 900 will be described following
[0028] In S910, the microphone array 910, which is made up of M (where M is an integer of 2 or greater) microphone elements, generates and outputs temporal region observation signals x.sub.m(t) (m=0, 1, . . . , M−1) collected by microphone element m. Accordingly, m serves as a number indicating by which microphone element the signal has been observed.
[0029] In S920, the frequency region transform unit 920 takes the temporal region observation signals x.sub.m(t) (m=0, 1, . . . , M−1) generated in S910 as input, and transforms each of temporal region observation signals x.sub.m(t) (m=0, 1, . . . , M−1) into frequency region, thereby generating frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1), which are output. Discrete Fourier transform can be used for transforming into the frequency region, for example.
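The transform of S920 can be illustrated with a short sketch. It assumes a framewise discrete Fourier transform with a Hann window; the frame length, hop size, window, and the function name `to_frequency_region` are illustrative assumptions and are not specified in this description.

```python
import numpy as np

def to_frequency_region(x, frame_len=512, hop=256):
    """Transform temporal region signals x (shape (M, T)) into
    frequency region signals X (shape (M, n_bins, n_frames)).

    frame_len, hop and the Hann window are assumed values for illustration.
    """
    M, T = x.shape
    window = np.hanning(frame_len)
    n_frames = 1 + (T - frame_len) // hop
    X = np.empty((M, frame_len // 2 + 1, n_frames), dtype=complex)
    for tau in range(n_frames):
        # Cut out one frame per microphone element and apply the window.
        seg = x[:, tau * hop: tau * hop + frame_len] * window
        # One-sided DFT along the time axis gives X_m(omega, tau).
        X[:, :, tau] = np.fft.rfft(seg, axis=1)
    return X
```

Here the first axis corresponds to the microphone element m, the second to the angular frequency bin ω, and the third to the time frame τ.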
[0030] The frequency region observation signals X.sub.m(ω, τ) are modeled below by the following Expression using target sound s(ω, τ)∈C, a K number (where K is an integer of 1 or greater) of interference noise ν.sub.k(ω, τ)∈C, and background noise ε.sub.m(ω, τ)∈C.
[0031] Here, ω, τ represent the angular frequency bin and time frame No., respectively. Also, h.sub.m.sup.S(ω)∈C is a transfer function between the sound source of the target sound and the microphone element m, and h.sub.k,m.sup.IN(ω)∈C (k=1, . . . , K) is a transfer function between an interference noise source k and each microphone element m.
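The Expression referred to in [0030] is not reproduced in this text. Based only on the symbols defined in [0030] and [0031], an additive model of the following form would be consistent with the description. It is an assumed reconstruction and should not be read as the Expression of the original publication.

```latex
X_m(\omega,\tau) = h_m^{S}(\omega)\, s(\omega,\tau)
  + \sum_{k=1}^{K} h_{k,m}^{IN}(\omega)\, \nu_k(\omega,\tau)
  + \varepsilon_m(\omega,\tau)
```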
[0032] In this model, a problem can be handled in which the direction of arrival (DOA) of the target sound is known, while information relating to noise, such as the direction of arrival and number of interference noises and the noise level of background noise, is unknown.
[0033] In S930, the first beamformer unit 930 takes the frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1) generated in S920 as input, and generates and outputs enhanced signals Y.sub.θ_S(ω, τ) of a sound source at a target sound direction-of-arrival θ.sub.S (hereinafter referred to as first enhanced signals Y.sub.θ_S (ω, τ)) by performing linear filtering of the frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1). In a case in which the arrival time difference of the target sound direction-of-arrival θ.sub.S is known, the first enhanced signals Y.sub.θ_S are calculated from the following Expression, by a beamforming linear filter (i.e., a beamforming linear filter constructed using the arrival time difference of the target sound direction-of-arrival θ.sub.S) w.sub.θ_S.sup.H∈C.sup.M.
Y.sub.θ_S(ω,τ)=w.sub.θ_S.sup.H X(ω,τ) [Math. 2]
[0034] Here, ⋅.sup.H represents a complex conjugate transpose. Also, X(ω, τ)=[X.sub.0(ω, τ), . . . , X.sub.M-1(ω, τ)].sup.T holds.
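The linear filtering of S930 can be sketched as follows. The filter construction shown is a plain delay-and-sum design, which is only one possible beamforming linear filter built from arrival time differences. The sign convention of the phase term and the function names are assumptions made for illustration.

```python
import numpy as np

def delay_and_sum_weights(delays, omega):
    """Assumed delay-and-sum filter w for one angular frequency omega (rad/s).

    delays: arrival time differences (seconds) of the target direction
    at each of the M microphone elements.
    """
    delays = np.asarray(delays, dtype=float)
    return np.exp(-1j * omega * delays) / len(delays)

def enhance(w, X):
    """Compute Y = w^H X for one (omega, tau) pair.

    np.vdot conjugates its first argument, so this is the
    complex conjugate transpose of w applied to X.
    """
    return np.vdot(w, X)
```

With this convention, a signal arriving exactly from the steered direction passes through the filter with unit gain.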
[0035] In S940, the second beamformer unit 940 takes the frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1) generated in S920 as input, and generates and outputs L−1 (where L−1 is an integer that is K or greater) enhanced signals Y.sub.θ_j(ω, τ) (j=1, . . . , L−1) of sound sources of directions θ.sub.j other than the target sound direction-of-arrival (hereinafter referred to as second enhanced signals Y.sub.θ_j(ω, τ)) by linear filtering of the frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1). The second beamformer unit 940 calculates the second enhanced signals Y.sub.θ_j (ω, τ) by the same method as the first beamformer unit 930. That is to say, the second beamformer unit 940 calculates the second enhanced signals Y.sub.θ_j (ω, τ) by the beamforming linear filter constructed using the arrival time difference of directions θ.sub.j other than the target sound direction-of-arrival, set in advance.
[0036] In S950, the PSD generating unit 950 takes the first enhanced signals Y.sub.θ_S(ω, τ) generated in S930 and the second enhanced signals Y.sub.θ_j(ω, τ) (j=1, . . . , L−1) generated in S940 as input, and generates and outputs target sound PSD φ.sub.S(ω, τ), interference noise PSD φ.sub.IN(ω, τ), and background noise PSD φ.sub.BN(ω, τ), using the first enhanced signals Y.sub.θ_S(ω, τ) and the second enhanced signals Y.sub.θ_j(ω, τ) (j=1, . . . , L−1).
[0037] The PSD generating unit 950 will be described below with reference to
[0038] The operations of the PSD estimating unit 950 will be described following
[0039] In S951, the first PSD estimating unit 951 takes the first enhanced signals Y.sub.θ_S(ω, τ) generated in S930 and the second enhanced signals Y.sub.θ_j(ω, τ) (j=1, . . . , L−1) generated in S940 as input. The first PSD estimating unit 951 then performs local PSD estimation using the first enhanced signals Y.sub.θ_S(ω, τ) and the second enhanced signals Y.sub.θ_j(ω, τ) (j=1, . . . , L−1), thereby estimating the target sound PSD ˜φ.sub.S(ω, τ) and the interference noise PSD ˜φ.sub.IN(ω, τ), which are output. Local PSD estimation is a technique for estimating the target sound PSD and the interference noise PSD using the difference in gain that arises from the spatial positions of the target sound and the interference noise. The relation between the PSD φ.sup.BF(ω, τ)=[φ.sub.0.sup.BF(ω, τ), φ.sub.1.sup.BF(ω, τ), . . . , φ.sub.L-1.sup.BF(ω, τ)].sup.T∈R.sup.L of the first enhanced signals Y.sub.θ_S(ω, τ) and the L−1 second enhanced signals Y.sub.θ_j(ω, τ), and the PSD φ.sup.G(ω, τ)=[φ.sub.0.sup.G(ω, τ), φ.sub.1.sup.G(ω, τ), . . . , φ.sub.N-1.sup.G(ω, τ)].sup.T∈R.sup.N of the target sound and interference noise grouped in N (where N is an integer of two or more) directions, can be approximately expressed in the form of a linear transform, as in the following Expression.
[0040] Note, however, that φ.sub.0.sup.BF(ω, τ) is the PSD of the first enhanced signals Y.sub.θ_S(ω, τ). Accordingly, φ.sub.0.sup.BF(ω, τ)=|Y.sub.θ_S(ω, τ)|.sup.2 holds. Also, D.sub.j,n(ω)∈R.sup.L′N′Q is sensitivity as to direction n at the angular frequency bin ω and beamformer j. Here, beamformer 0 is the beamformer at the target sound direction-of-arrival θ.sub.S, and beamformer j (j=1, . . . , L−1) is a beamformer at a direction θ.sub.j other than the target sound direction-of-arrival.
[0041] Solving this Expression yields φ.sup.G(ω, τ)∈R.sup.N. That is to say, first, the first PSD estimating unit 951 solves this expression to obtain φ.sup.G(ω, τ)∈R.sup.N.
φ.sup.G(ω,τ)=[D.sub.power*(ω)φ.sup.BF(ω,τ)].sub.+ [Math. 4]
[0042] Here, ⋅* and [⋅]+ respectively indicate a pseudo-inverse matrix and an operator that makes each element of the matrix to be a non-negative value.
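The computation of [Math. 4] can be sketched for one (ω, τ) pair as follows. The operator [⋅].sub.+ is interpreted here as clipping negative elements to zero, which is an assumption about its exact definition. The function name `local_psd` and the example sensitivity values are hypothetical.

```python
import numpy as np

def local_psd(D_power, phi_bf):
    """Estimate the grouped PSD phi_G from the beamformer output PSD phi_BF.

    D_power: (L, N) sensitivity matrix for one frequency bin.
    phi_bf:  (L,) PSD of the L enhanced signals.
    """
    # Pseudo-inverse of the sensitivity matrix, then matrix-vector product.
    phi_g = np.linalg.pinv(D_power) @ phi_bf
    # [.]_+ : replace negative elements with zero (assumed interpretation).
    return np.maximum(phi_g, 0.0)
```

When L is larger than N, the pseudo-inverse gives the least-squares solution, so using more beamformers than direction groups is admissible in this sketch.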
[0043] Note that in order to reduce computation amount, performing PSD estimation with a frequency filter bank integrated into several frequency bands is effective.
[0044] In the above Expression, assuming that the PSD of the sound source of the target sound direction-of-arrival θ.sub.S is included at direction 0, and the PSD of a group at a different direction from the target sound is included at direction 1 through direction N−1, the target sound PSD ˜φ.sub.S(ω, τ) and the interference noise PSD ˜φ.sub.IN(ω, τ) are estimated by the following Expression. That is to say, next, the first PSD estimating unit 951 estimates the target sound PSD ˜φ.sub.S(ω, τ) and the interference noise PSD ˜φ.sub.IN(ω, τ) by the following Expressions.
[0045] In S952, the second PSD estimating unit 952 takes the target sound PSD ˜φ.sub.S(ω, τ) and the interference noise PSD ˜φ.sub.IN(ω, τ) estimated in S951 as input, and estimates and outputs the target sound PSD φ.sub.S(ω, τ), the interference noise PSD φ.sub.IN(ω, τ), and the background noise PSD φ.sub.BN(ω, τ), using the target sound PSD ˜φ.sub.S(ω, τ) and the interference noise PSD ˜φ.sub.IN(ω, τ). The estimation method will be described below. The background noise can be assumed to be stationary. Accordingly, first, the second PSD estimating unit 952 uses the PSDs {dot over (φ)}.sub.S(ω, τ) and {dot over (φ)}.sub.IN(ω, τ), smoothed by recursive smoothing computation, to calculate two background noise PSDs φ.sub.BN_S(ω, τ) and φ.sub.BN_IN(ω, τ) as minimums in a certain section Γ.
φ.sub.BN_S(ω,τ)=min.sub.τ′∈Γ {dot over (φ)}.sub.S(ω,τ′)
φ.sub.BN_IN(ω,τ)=min.sub.τ′∈Γ {dot over (φ)}.sub.IN(ω,τ′)
{dot over (φ)}.sub.S(ω,τ)=β.sub.S{tilde over (φ)}.sub.S(ω,τ)+(1−β.sub.S){tilde over (φ)}.sub.S(ω,τ−1) (0<β.sub.S≤1)
{dot over (φ)}.sub.IN(ω,τ)=β.sub.IN{tilde over (φ)}.sub.IN(ω,τ)+(1−β.sub.IN){tilde over (φ)}.sub.IN(ω,τ−1) (0<β.sub.IN≤1) [Math. 6]
[0046] Here, β.sub.S and β.sub.IN are each forgetting coefficients. Note that β.sub.S and β.sub.IN are decided taking into consideration temporal energy change of the target sound, interference noise, and background noise.
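The smoothing of [Math. 6] and the minimum over the section Γ can be sketched for a single frequency bin as follows. The smoothing follows the Expression as printed above. Taking Γ to be the most recent `gamma_len` frames is an assumption, as are the function names.

```python
import numpy as np

def smooth(phi_tilde, beta):
    """Smooth a PSD sequence over time frames as written in [Math. 6].

    phi_tilde: (n_frames,) PSD of one frequency bin; 0 < beta <= 1.
    The first frame has no predecessor and is left unchanged.
    """
    phi_tilde = np.asarray(phi_tilde, dtype=float)
    phi_dot = phi_tilde.copy()
    phi_dot[1:] = beta * phi_tilde[1:] + (1.0 - beta) * phi_tilde[:-1]
    return phi_dot

def background_floor(phi_dot, gamma_len):
    """Minimum of the smoothed PSD over the last gamma_len frames.

    Treating the section Gamma as a trailing window is an assumption.
    """
    phi_dot = np.asarray(phi_dot, dtype=float)
    return np.array([phi_dot[max(0, t - gamma_len + 1): t + 1].min()
                     for t in range(len(phi_dot))])
```

A larger `gamma_len` makes the floor follow slow changes only, which matches the assumption that the background noise varies little over time.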
[0047] The second PSD estimating unit 952 then estimates the target sound PSD φ.sub.S(ω, τ), interference noise PSD φ.sub.IN(ω, τ), and background noise PSD φ.sub.BN(ω, τ), by the following Expressions.
φ.sub.S(ω,τ)={tilde over (φ)}.sub.S(ω,τ)−φ.sub.BN_S(ω,τ)
φ.sub.IN(ω,τ)={tilde over (φ)}.sub.IN(ω,τ)−φ.sub.BN_IN(ω,τ)
φ.sub.BN(ω,τ)={tilde over (φ)}.sub.BN.sub.
[0048] In S960, the sound source enhancing unit 960 takes, as input, the first enhanced signals Y.sub.θ_S(ω, τ) generated in S930, and the target sound PSD φ.sub.S(ω, τ), interference noise PSD φ.sub.IN(ω, τ), and background noise PSD φ.sub.BN(ω, τ), generated in S950. The sound source enhancing unit 960 then uses the first enhanced signals Y.sub.θ_S(ω, τ), target sound PSD φ.sub.S(ω, τ), interference noise PSD φ.sub.IN(ω, τ), and background noise PSD φ.sub.BN(ω, τ), to generate and output frequency region target sound signals Z(ω, τ)∈C. Specifically, the sound source enhancing unit 960 calculates the frequency region target sound signals Z(ω, τ) from the following Expression, using a Wiener filter calculated from the target sound PSD φ.sub.S(ω, τ), interference noise PSD φ.sub.IN(ω, τ), and background noise PSD φ.sub.BN(ω, τ).
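The filtering of S960 can be sketched as follows. The sketch assumes the standard Wiener gain, namely the ratio of the target sound PSD to the sum of the three PSDs. That form, the small constant `eps` guarding against division by zero, and the function name are assumptions and are not taken from the Expression of this description.

```python
import numpy as np

def wiener_enhance(Y, phi_s, phi_in, phi_bn, eps=1e-12):
    """Apply an assumed Wiener gain to the first enhanced signals Y.

    All arguments may be scalars or arrays of the same shape
    (for example, indexed by frequency bin and time frame).
    """
    # Gain lies between 0 and 1 when all PSDs are non-negative.
    gain = phi_s / (phi_s + phi_in + phi_bn + eps)
    return gain * Y
```

Because the gain depends only on the three PSDs, any improvement in the PSD estimates translates directly into the enhancement result.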
[0049] In S970, the temporal region transform unit 970 takes the frequency region target sound signals Z(ω, τ) generated in S960 as input, and generates and outputs temporal region target sound signals z(t)∈R by transforming the frequency region target sound signals Z(ω, τ) into the temporal region. The inverse discrete Fourier transform, for example, can be used for the transform into the temporal region.
[0050] <<Optimization of PSD>>
[0051] A method will be described here regarding optimization of the PSD generated by the technique according to NPL 1, to improve sound source enhancement capabilities in accordance with settings of usage and applications.
[0052] There are the following three features in this optimization method.
(1) At least one PSD of the target sound PSD, interference noise PSD, and background noise PSD is optimized.
(2) The optimization processing of (1) is formulated as an optimization problem of a cost function represented as one convex cost term or a sum of a plurality of convex cost terms relating to a variable representing the PSD, under constraints relating to the PSD.
(3) The optimization problem of (2) is defined using, for example, the constraint or convex cost term of (a), the constraint of (b), and the constraint or convex cost term of (c), listed below. Note, however, that two or more of the constraints or convex cost terms of (c) may be used. Also, including the constraint or convex cost term of (a) and the constraint of (b) is not indispensable.
[0053] (a) constraint or convex cost term based on assumption that a certain level of estimation has been achieved by the conventional PSD estimation (i.e., output of the PSD estimating unit 950)
[0054] (b) non-negative constraint of PSD
[0055] (c) constraint or convex cost term relating to PSD, based on structure of sound source
[0056] Note that here, the structure of sound source means the frequency structure, temporal structure, and spatial structure (inter-channel structure) of the target sound, interference noise, and background noise.
[0057] In the above optimization problem, constraints relating to PSD are expressed by linear equalities or inequalities, and cost functions are expressed as functions combining one or more convex cost terms relating to a variable representing PSD (cost term that is a closed proper convex function). That is to say, the optimization problem is a convex optimization problem with linear constraint. The optimized PSD is then obtained as the solution to this optimization problem.
[0058] One or more convex cost terms and zero or more constraints are used for this convex optimization problem with linear constraint. Increasing the number of convex cost terms or constraints makes the optimization problem more complicated, but the problem can still be solved with a computation amount low enough to enable real-time sound source enhancement processing, by using later-described Bregman monotone operator splitting (B-MOS).
[0059] Hereinafter, the target sound PSD φ.sub.S(ω, τ), interference noise PSD φ.sub.IN(ω, τ), and background noise PSD φ.sub.BN(ω, τ), estimated by the second PSD estimating unit 952 will be respectively written as {circumflex over ( )}φ.sub.S(ω, τ), {circumflex over ( )}φ.sub.IN(ω, τ), and {circumflex over ( )}φ.sub.BN(ω, τ).
[0060] (1: Specific Examples of Constraints and Convex Cost Terms)
[0061] Specific examples of constraints and convex cost terms for (a) through (c) will be described here. The constraints or convex cost terms of (c) can be classified as follows.
(c-1) constraints or convex cost terms based on frequency structure of sound source
(c-2) constraints or convex cost terms based on temporal structure of sound source
(c-3) constraints or convex cost terms based on spatial structure (inter-channel structure) of sound source
[0062] First, variables that are the object of optimization in an optimization problem will be described.
[0063] (1-1: Definition of Variables)
[0064] PSD is organized in arbitrary frequency bands. The number of frequency bands here is Ω.
[0065] A variable representing the target sound PSD, a variable representing the interference noise PSD, and a variable representing the background noise PSD, in time frame τ, are u.sub.S,τ, u.sub.IN,τ, and u.sub.BN,τ, respectively. Also, a target sound PSD input value, an interference noise PSD input value, and a background noise PSD input value, in time frame τ, are {circumflex over ( )}φ.sub.S,τ, {circumflex over ( )}φ.sub.IN,τ, and {circumflex over ( )}φ.sub.BN,τ, respectively. That is to say,
u.sub.i,τ=[φ.sub.i(0,τ), . . . , φ.sub.i(Ω−1,τ)].sup.T, i∈{S,IN,BN}
{circumflex over (φ)}.sub.i,τ=[{circumflex over (φ)}.sub.i(0,τ), . . . , {circumflex over (φ)}.sub.i(Ω−1,τ)].sup.T, i∈{S,IN,BN} [Math. 9]
[0066] hold. Also, u=[u.sub.S,τ.sup.T, u.sub.IN,τ.sup.T, u.sub.BN,τ.sup.T].sup.T, and {circumflex over ( )}φ.sub.τ=[{circumflex over ( )}φ.sub.S,τ.sup.T, {circumflex over ( )}φ.sub.IN,τ.sup.T, {circumflex over ( )}φ.sub.BN,τ.sup.T].sup.T hold.
[0067] Also, c.sub.τ∈R.sup.Ω is defined by the following Expression for the PSD φ.sub.Y_θ_S of the first enhanced signals Y.sub.θ_S(ω, τ) in time frame τ (i.e., signals beamformed in the target sound direction-of-arrival θ.sub.S).
[0068] Accordingly, c.sub.τ is the PSD estimation value of enhanced signals of the sound source at the target sound direction-of-arrival θ.sub.S in time frame τ.
[0069] Hereinafter, in cases of describing constraints or convex cost terms that are not dependent on preceding or following time frames, the time frame index τ will be omitted.
[0070] (1-2: Constraint or Convex Cost Term Based on Assumption that Certain Level of Estimation has been Achieved by Conventional PSD Estimation (i.e., Output of PSD Estimating Unit 950))
[0071] The value of the variable u=[u.sub.S.sup.T, u.sub.IN.sup.T, u.sub.BN.sup.T].sup.T is assumed to be a value close to the PSD input value {circumflex over ( )}φ=[{circumflex over ( )}φ.sub.S.sup.T, {circumflex over ( )}φ.sub.IN.sup.T, {circumflex over ( )}φ.sub.BN.sup.T].sup.T. A convex cost term corresponding to this assumption can be expressed by a quadratic function such as in the following Expressions, for example.
[0072] Here, w.sub.i∈R.sup.+ (i∈{S, IN, BN}) is a coefficient for adjusting weighting of convex cost terms (weighting coefficient). Note that R.sup.+ represents a set of positive real numbers.
[0073] These convex cost terms may also be combined and used. For example, in a case of optimizing the three PSDs of the target sound, interference noise, and background noise, convex cost terms such as in the following Expression can be used.
L(u)=L(u.sub.S)+L(u.sub.IN)+L(u.sub.BN) [Math. 12]
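The combined cost of [Math. 12] can be sketched as follows. Each term is assumed to be a weighted squared distance between a variable and its PSD input value, with a factor of one half. The exact scaling of the quadratic terms, and the function names, are assumptions made for illustration.

```python
import numpy as np

def closeness_cost(u_i, phi_hat_i, w_i):
    """Assumed quadratic term: (w_i / 2) * ||u_i - phi_hat_i||^2."""
    diff = np.asarray(u_i, dtype=float) - np.asarray(phi_hat_i, dtype=float)
    return 0.5 * w_i * float(diff @ diff)

def total_closeness_cost(u, phi_hat, w):
    """L(u) = L(u_S) + L(u_IN) + L(u_BN), as in [Math. 12].

    u, phi_hat and w are dictionaries keyed by "S", "IN" and "BN".
    """
    return sum(closeness_cost(u[i], phi_hat[i], w[i])
               for i in ("S", "IN", "BN"))
```

A larger weighting coefficient w.sub.i keeps the corresponding output PSD closer to its input value, so the weights express how far each conventional estimate is trusted.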
[0074] (1-3: Non-Negative Constraint of PSD)
[0075] PSDs are non-negative values. Accordingly, constraints can be applied by inequalities of u.sub.S≥0, u.sub.IN≥0, u.sub.BN≥0, i.e., u≥0.
[0076] (1-4: Constraint or Convex Cost Term Based on Frequency Structure of Sound Source)
[0077] The frequency structure of the target sound will be described here as one example.
[0078] The target sound PSD input value {circumflex over ( )}φ.sub.S contains, as small values, interference noise PSD and background noise PSD that have not been completely split off. In a case where the target sound is speech, for example, a harmonic structure can be assumed for the target sound PSD, and accordingly prior knowledge can be used, such as that the PSD is sparse in the frequency direction, that there is an overtone structure in the frequency direction, that there are co-occurrence relations in frequency bands adjacent to overtones, and so forth. It is therefore anticipated that the target sound PSD and noise PSD (i.e., interference noise PSD and background noise PSD) can be split by using constraints and convex cost terms based on such prior knowledge. Here, a convex cost term corresponding to the above assumption is expressed using an L.sub.1 norm. Note, however, that the sparse target sound PSD is estimated in a region weighted using Λ∈R.sup.Ω×Ω, to keep from deleting components that are small values but are auditorily important. Also, in order to stabilize the optimization algorithm, a squared error with respect to a signal in which the target sound PSD input value {circumflex over ( )}φ.sub.S is transformed by Λ is added to the cost term. To summarize the above, the cost term of the target sound can be expressed by the following Expression.
[0079] Here, μ, ρ(∈R.sup.+) are weighting coefficients. Also, Λ(∈R.sup.Ω×Ω) is a predetermined sparse matrix.
[0080] Specific examples of Λ∈R.sup.Ω×Ω are the following (α) and (β). These (α) and (β) may be combined.
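The cost term described above can be sketched as follows. Because the Expression itself is not reproduced in this text, the form μ‖Λu.sub.S‖.sub.1+(ρ/2)‖Λu.sub.S−Λ{circumflex over ( )}φ.sub.S‖.sup.2 used here is an assumption inferred from the surrounding description, and the helper names are hypothetical.

```python
def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def frequency_cost(u_s, phi_hat_s, lam, mu, rho):
    """Assumed form: mu * ||Lam u_S||_1 + (rho/2) * ||Lam u_S - Lam phi_hat_S||^2."""
    lam_u = matvec(lam, u_s)
    lam_phi = matvec(lam, phi_hat_s)
    l1_term = sum(abs(x) for x in lam_u)                          # sparsity in the weighted region
    sq_term = sum((a - b) ** 2 for a, b in zip(lam_u, lam_phi))   # stabilizing squared error
    return mu * l1_term + 0.5 * rho * sq_term


# Two frequency bands; the second band is down-weighted by Lam (illustrative values).
lam = [[1.0, 0.0], [0.0, 0.5]]
freq_cost = frequency_cost([2.0, 0.0], [2.0, 1.0], lam, mu=0.1, rho=2.0)
```

Raising μ pushes more weighted components of u.sub.S to zero, while raising ρ keeps the weighted estimate closer to the weighted input value.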
(α) frequency weighting matrix Λ.sub.w
[0081] (β) matrix Λ.sub.nb for smoothing with adjacent frequency band
[0082] In a case of taking the moving average with one band to each of the left and the right, the matrix Λ.sub.nb is as in the following Expression.
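A minimal sketch of such a smoothing matrix is given below. It assumes equal weights of 1/3 on a band and its two neighbors, with rows at the band edges simply truncated; the actual coefficients and edge handling of Λ.sub.nb are not reproduced in this text, so both are assumptions.

```python
def moving_average_matrix(n_bands, weight=1.0 / 3.0):
    """Tridiagonal matrix averaging each band with one neighbor on each side."""
    lam = [[0.0] * n_bands for _ in range(n_bands)]
    for i in range(n_bands):
        for j in (i - 1, i, i + 1):
            if 0 <= j < n_bands:      # truncate at the lowest and highest bands
                lam[i][j] = weight
    return lam


lam_nb = moving_average_matrix(4)
```

The result is sparse (at most three nonzero entries per row), which is consistent with Λ being described as a predetermined sparse matrix.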
[0083] (1-5: Constraints or Convex Cost Terms Based on Temporal Structure of Sound Source)
[0084] Smoothing with the PSD of the immediately-preceding time frame will be described here as one example.
[0085] Suppression of distortion can be anticipated by assuming that the value of PSD will smoothly change between preceding and following time frames. A convex cost term corresponding to this assumption can be expressed as a term using a squared error such as in the following Expression, for example.
[0086] Note that {circumflex over ( )}φ.sub.BN,τ−1 is a background noise PSD estimation value at time frame τ−1. Also, γ.sub.BN (∈R.sup.+) is a weighting coefficient.
[0087] Estimation of background noise PSD that is smooth in the temporal direction can be performed by minimizing this cost term. Note that in a case in which the target sound or interference noise is singing, musical instruments, or the like, for example, the target sound and the interference noise are also smooth in the temporal direction, and accordingly a convex cost term such as in the above Expression for background noise can be used for the target sound and interference noise as well (see following Expressions).
[0088] Note however, that {circumflex over ( )}φ.sub.S,τ−1 and {circumflex over ( )}φ.sub.IN, τ−1 respectively are target sound PSD estimation value at time frame τ−1, and interference noise PSD estimation value at time frame τ−1. Also, γ.sub.S, γ.sub.IN (∈R.sup.+) are weighting coefficients.
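To see what such a temporal term does, consider it in isolation together with a squared-error term toward the current input value. Assuming the forms (w/2)‖u−{circumflex over ( )}φ.sub.τ‖.sup.2 and (γ/2)‖u−{circumflex over ( )}φ.sub.τ−1‖.sup.2 (the factors of 1/2 are assumptions), the minimizer is a per-band weighted average of the two values. The sketch below computes that closed form; it ignores all other constraints and is not the full optimization described in this document.

```python
def smoothed_estimate(phi_hat_tau, phi_prev, w, gamma):
    """Minimizer of (w/2)||u - phi_hat_tau||^2 + (gamma/2)||u - phi_prev||^2."""
    return [(w * a + gamma * b) / (w + gamma) for a, b in zip(phi_hat_tau, phi_prev)]


# Background-noise PSD over two bands: current input value and previous-frame estimate.
u_bn = smoothed_estimate([0.4, 0.0], [0.2, 0.2], w=1.0, gamma=3.0)
```

With γ large relative to w the result stays near the previous frame, which corresponds to a PSD that changes slowly in the temporal direction.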
[0089] (1-6: Constraints or Convex Cost Terms Based on Spatial Structure of Sound Source)
[0090] Additivity constraint of PSD will be described here as an example.
[0091] Assuming additivity of PSD in the frequency region, the sum of target sound PSD, interference noise PSD, and background noise PSD is close to a PSD estimation value c of enhanced signals of the sound source at the target sound direction-of-arrival θ.sub.S. A constraint regarding this assumption can be expressed by the following linear constraint, for example.
u.sub.S+u.sub.IN+u.sub.BN=c [Math. 18]
[0092] It is anticipated that by using this constraint, distortion will be reduced and components lost in upstream processing (i.e., at the output of the PSD generating unit 950) will be restored, consequently improving PSD estimation precision.
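The effect of the additivity constraint in [Math. 18] can be illustrated with a Euclidean projection onto the set where the three PSDs sum to c in every band. This projection is only an illustration of the constraint; the optimization described later handles it through a dual problem, and non-negativity is not enforced here. The function name is hypothetical.

```python
def project_additivity(u_s, u_in, u_bn, c):
    """Euclidean projection of (u_S, u_IN, u_BN) onto {u_S + u_IN + u_BN = c}."""
    proj_s, proj_in, proj_bn = [], [], []
    for s, i, b, target in zip(u_s, u_in, u_bn, c):
        shift = (s + i + b - target) / 3.0   # spread the per-band residual equally
        proj_s.append(s - shift)
        proj_in.append(i - shift)
        proj_bn.append(b - shift)
    return proj_s, proj_in, proj_bn


# Band 0 is missing 0.3 of power relative to c; band 1 already satisfies the constraint.
p_s, p_in, p_bn = project_additivity([1.0, 0.5], [0.2, 0.2], [0.1, 0.3], [1.6, 1.0])
```

In band 0 each PSD is raised by 0.1, which shows how power lost upstream can be redistributed so that the total again matches the PSD of the first enhanced signals.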
[0093] (1-7: Summarization)
[0094] To summarize the above, the optimization problem is an optimization problem for a cost function relating to the variable u.sub.S, the variable u.sub.IN, and the variable u.sub.BN, and is defined using at least one of the following constraints or convex cost terms:
(1) constraints relating to frequency structure of the sound source or convex cost terms relating to frequency structure of the sound source,
(2) constraints relating to temporal structure of the sound source or convex cost terms relating to temporal structure of the sound source, and
(3) constraints relating to spatial structure of the sound source or convex cost terms relating to spatial structure of the sound source.
[0095] Note that the optimization problem may be defined in a form in which a constraint or convex cost term based on assumption that a certain level of estimation has been achieved by the conventional PSD estimation (i.e., output of the PSD estimating unit 950) and non-negative constraint of PSD are used in conjunction, as a matter of course.
[0096] (2: Application Example)
[0097] A specific example of the optimization problem, and an optimization algorithm for solving this optimization problem, will be described here.
[0098] A problem defined using the constraints and convex cost terms of (a), (b), (c-1), and (c-3), will be considered as a specific example of the optimization problem.
[0099] Here, μ, ρ(∈R.sup.+) are weighting coefficients. Also, with Λ(∈R.sup.Ω×Ω) as a predetermined sparse matrix, and I(∈R.sup.Ω×Ω) as an identity matrix, matrices A and B, vectors c and {circumflex over ( )}v.sub.{circumflex over ( )}φ_S, and matrices W and W.sup.1/2, are given by the following Expressions.
[0100] The cost function F.sub.1+F.sub.2 in this optimization problem uses, besides a latent variable u, an auxiliary variable v of the latent variable u. Also, constraints of this optimization problem are linear constraints regarding the variables u and v, i.e., Au=v, Bu=c, u≥0.
[0101] Solving a dual problem instead of solving the above optimization problem will be considered. The dual problem is shown in the following Expression.
[0102] By organizing dual variables p, q, and r as ζ=[p, q, r].sup.T, the dual problem can be expressed as in the following Expression.
[0103] Here, F.sub.1* and F.sub.2* are convex conjugate functions of F.sub.1 and F.sub.2, and are expressed as in the following Expressions.
[0104] Also, I.sub.(r≥0)(r) is an indicator function that guarantees the non-negativity of r.
[0105] It can be seen from the above that the cost function of the dual problem is expressed as the sum of two closed proper convex functions G.sub.1 and G.sub.2.
[0106] In order to realize sound source enhancement in real time, an algorithm is necessary that solves the above dual problem inf.sub.ζ G.sub.1(ζ)+G.sub.2(ζ) at high speed. The Bregman monotone operator splitting (B-MOS) disclosed in reference NPL 1 is used here.
[0107] (Reference NPL 1: K. Niwa and W. B. Kleijn, “Bregman monotone operator splitting”, https://arxiv.org/abs/1807.04871, 2018.)
[0108] Specifically, a Bregman-Peaceman-Rachford (B-P-R) type optimization solver is used. The B-P-R type optimization solver uses a recursive update expression obtained from a fixed-point condition where 0∈∂G.sub.1(ζ)+∂G.sub.2(ζ).
ζ∈C.sub.2C.sub.1(ζ) [Math. 25]
[0109] This Expression is configured using the following D-Cayley operator C.sub.i.
[0110] Here, .sup.⋅−1 represents inverse mapping. Also, D is a function used for defining Bregman divergence. A differentiable, strongly convex function that satisfies ∇D(0)=0 is used as the function D.
[0111] Also, R.sub.i and I are respectively a D-resolvent operator and an identity operator, and the D-resolvent operator R.sub.i is given by the following Expression.
R.sub.i=(I+(∇D).sup.−1∂G.sub.i).sup.−1 (i=1,2) [Math. 27]
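The fixed-point iteration can be illustrated in the simplest setting. The sketch below assumes the Euclidean special case D(x)=‖x‖.sup.2/2, for which ∇D is the identity and R.sub.i reduces to the ordinary resolvent (I+∂G.sub.i).sup.−1, and it assumes the standard Cayley relation C.sub.i=2R.sub.i−I. It solves a scalar toy problem, not the dual problem of this document, and all function names are hypothetical.

```python
def soft_threshold(x, t):
    """Resolvent of t*|x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def resolvent_quadratic(z, a, k):
    """Resolvent of G_1(x) = (k/2)(x - a)^2: solves x + k(x - a) = z for x."""
    return (z + k * a) / (1.0 + k)


def peaceman_rachford(a, k, lam, n_iter=50):
    """Iterate z <- C_2(C_1(z)) for G_1 = (k/2)(x - a)^2 and G_2 = lam*|x|."""
    z = 0.0
    for _ in range(n_iter):
        y = 2.0 * resolvent_quadratic(z, a, k) - z   # Cayley operator C_1 = 2 R_1 - I
        z = 2.0 * soft_threshold(y, lam) - y         # Cayley operator C_2 = 2 R_2 - I
    return resolvent_quadratic(z, a, k)              # recover the minimizer x = R_1(z)


x_star = peaceman_rachford(a=3.0, k=2.0, lam=1.0)
```

For this toy problem the minimizer is known in closed form as the soft-thresholding of a by lam/k, so the iteration can be checked against it. Replacing the identity metric by a problem-adapted ∇D is what distinguishes the Bregman variant from this Euclidean sketch.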
[0112] The optimization algorithm shown in
[0113] Accordingly, ∇D.sub.p, ∇D.sub.q, and ∇D.sub.r are respectively given by the following Expressions.
∇D.sub.p=AW.sup.−1A.sup.T
∇D.sub.q=BW.sup.−1B.sup.T
∇D.sub.r=W.sup.−1 [Math. 29]
[0114] Accordingly, the gradients of the strongly convex functions D.sub.p, D.sub.q, and D.sub.r at zero are 0.
[0115] Also, Bregman divergence is used in the regularization term of the proximal operator in updating of the primary variable u in the algorithm in
J.sub.D_p.sup.⋅(Au∥{tilde over (p)})=D.sub.p.sup.⋅(Au)−D.sub.p.sup.⋅({tilde over (p)})−⟨∇D.sub.p.sup.⋅({tilde over (p)}), Au−{tilde over (p)}⟩ [Math. 30]
[0116] Here, D.sub.p.sup.⋅=D.sub.p.sup.−1.
[0117] Generally, ∇(D.sup.−1)=(∇D).sup.−1 holds with respect to the differential operator of the strongly convex function D, and accordingly ∇D.sub.p.sup.⋅=∇(D.sub.p.sup.−1)=(∇D.sub.p).sup.−1=(AW.sup.−1A.sup.T)* holds. This is the same for ∇D.sub.q.sup.⋅ and ∇D.sub.r.sup.⋅ as well. Accordingly, ∇D.sub.p.sup.⋅, ∇D.sub.q.sup.⋅, and ∇D.sub.r.sup.⋅ are given by the following Expressions.
∇D.sub.p.sup.⋅=(AW.sup.−1A.sup.T)*
∇D.sub.q.sup.⋅=(BW.sup.−1B.sup.T)*
∇D.sub.r.sup.⋅=(W.sup.−1)* [Math. 31]
[0118] In the algorithm in
First Embodiment
[0119] A sound source enhancement device 100 will be described below with reference to
[0120] Operations of the sound source enhancement device 100 will be described following
[0121] In S910, the microphone array 910, which is made up of M (where M is an integer of 2 or greater) microphone elements, generates and outputs temporal region observation signals x.sub.m(t) (m=0, 1, . . . , M−1) collected by a microphone element m.
[0122] In S920, the frequency region transform unit 920 takes the temporal region observation signals x.sub.m(t) (m=0, 1, . . . , M−1) generated in S910 as input, and transforms each of the temporal region observation signals x.sub.m(t) (m=0, 1, . . . , M−1) into the frequency region, thereby generating and outputting frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1).
[0123] In S930, the first beamformer unit 930 takes the frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1) generated in S920 as input, and generates and outputs enhanced signals Y.sub.θ_S(ω, τ) of a sound source at a target sound direction-of-arrival θ.sub.S (hereinafter referred to as first enhanced signals Y.sub.θ_S(ω, τ)) by performing linear filtering of the frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1).
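The linear filtering in S930 is not specified further in this section. As one simple instance, the sketch below uses a delay-and-sum beamformer for a single frequency bin and time frame; the choice of delay-and-sum, the per-microphone delays, and the function name are all assumptions made for illustration.

```python
import cmath


def delay_and_sum(x_mics, omega, delays):
    """Average the microphone spectra after undoing the assumed propagation delays."""
    m = len(x_mics)
    # A source arriving at microphone m with delay d_m appears as S * exp(-j*omega*d_m),
    # so multiplying by exp(+j*omega*d_m) aligns the phases before averaging.
    return sum(cmath.exp(1j * omega * d) * x for x, d in zip(x_mics, delays)) / m


# Toy check: one source, two microphones, 500 Hz bin (illustrative values only).
source = 1 + 2j
omega = 2 * cmath.pi * 500.0
delays = [0.0, 0.001]
x_mics = [source * cmath.exp(-1j * omega * d) for d in delays]
y_target = delay_and_sum(x_mics, omega, delays)
```

Steering the same filter toward other directions, i.e., using other delay sets, would give signals analogous to the second enhanced signals of S940.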
[0124] In S940, the second beamformer unit 940 takes the frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1) generated in S920 as input, and generates and outputs L−1 (where L−1 is an integer that is K or greater) enhanced signals Y.sub.θ_j(ω, τ) (j=1, . . . , L−1) of sound sources of directions θ.sub.j other than the target sound direction-of-arrival (hereinafter referred to as second enhanced signals Y.sub.θ_j (ω, τ)) by performing linear filtering of the frequency region observation signals X.sub.m(ω, τ) (m=0, 1, . . . , M−1).
[0125] In S950, the PSD generating unit 950 takes the first enhanced signals Y.sub.θ_S(ω, τ) generated in S930 and the second enhanced signals Y.sub.θ_j(ω, τ) (j=1, . . . , L−1) generated in S940 as input, and generates and outputs target sound PSD {circumflex over ( )}φ.sub.S(ω, τ), interference noise PSD {circumflex over ( )}φ.sub.IN(ω, τ), and background noise PSD {circumflex over ( )}φ.sub.BN(ω, τ), using the first enhanced signals Y.sub.θ_S(ω, τ) and the second enhanced signals Y.sub.θ_j(ω, τ) (j=1, . . . , L−1). Note that although {circumflex over ( )} is affixed here to the signs representing the target sound PSD, the interference noise PSD, and the background noise PSD, the operations of the PSD generating unit 950 in S950 are the same as those described by way of
[0126] Hereinafter, the target sound PSD {circumflex over ( )}φ.sub.S(ω, τ), the interference noise PSD {circumflex over ( )}φ.sub.IN (ω, τ), and the background noise PSD {circumflex over ( )}φ.sub.BN (ω, τ) will be referred to as target sound PSD input value {circumflex over ( )}φ.sub.S(ω, τ), interference noise PSD input value {circumflex over ( )}φ.sub.IN (ω, τ), and background noise PSD input value {circumflex over ( )}φ.sub.BN(ω, τ). Also, u.sub.S is a variable representing the target sound PSD, u.sub.IN is a variable representing the interference noise PSD, and u.sub.BN is a variable representing the background noise PSD.
[0127] In S150, the PSD updating unit 150 takes the target sound PSD input value {circumflex over ( )}φ.sub.S(ω, τ), the interference noise PSD input value {circumflex over ( )}φ.sub.IN(ω, τ), and the background noise PSD input value {circumflex over ( )}φ.sub.BN(ω, τ), generated in S950, as input, and solves the optimization problem for the cost function relating to the variable u.sub.S, the variable u.sub.IN, and the variable u.sub.BN, thereby generating and outputting a target sound PSD output value φ.sub.S(ω, τ), an interference noise PSD output value φ.sub.IN(ω, τ), and a background noise PSD output value φ.sub.BN(ω, τ). That is to say, the PSD updating unit 150 is a component that solves the optimization problem described in the <Technical Background>. This optimization problem is an optimization problem for a cost function relating to the variable u.sub.S, the variable u.sub.IN, and the variable u.sub.BN, and is defined using at least one of the following constraints or convex cost terms:
(1) constraints relating to frequency structure of the sound source or convex cost terms relating to frequency structure of the sound source,
(2) constraints relating to temporal structure of the sound source or convex cost terms relating to temporal structure of the sound source, and
(3) constraints relating to spatial structure of the sound source or convex cost terms relating to spatial structure of the sound source.
[0128] An example of the constraints and convex cost terms of (1) through (3) will be described below. For example, as a convex cost term L relating to the frequency structure of the sound source,
[0129] (where μ, ρ(∈R.sup.+) are weighting coefficients, Λ(∈R.sup.Ω×Ω) is a predetermined sparse matrix, and Ω is the number of frequency bands) can be used.
[0130] Also, for example, with u.sub.BN,τ as a variable representing the background noise PSD at time frame τ, {circumflex over ( )}φ.sub.BN,τ−1 as the background noise PSD estimation value at time frame τ−1, u.sub.S,τ as a variable representing the target sound PSD at time frame τ, {circumflex over ( )}φ.sub.S,τ−1 as the target sound PSD estimation value at time frame τ−1, u.sub.IN,τ as a variable representing the interference noise PSD at time frame τ, and {circumflex over ( )}φ.sub.IN,τ−1 as the interference noise PSD estimation value at time frame τ−1,
[0131] can be used as a convex cost term L relating to the temporal structure of the sound source. Note that γ.sub.BN, γ.sub.S, and γ.sub.IN (∈R.sup.+) are weighting coefficients.
[0132] Also, for example, with c as a PSD estimation value of enhanced signals of the sound source at the target sound direction-of-arrival θ.sub.S,
u.sub.S+u.sub.IN+u.sub.BN=c [Math. 34]
[0133] can be used as a constraint relating to the spatial structure of the sound source. Note that in this case, the PSD updating unit 150 may take the first enhanced signals Y.sub.θ_S(ω, τ) generated in S930 as input as well, obtain the PSD φ.sub.Y_θ_S of the first enhanced signals Y.sub.θ_S(ω, τ), and use the PSD estimation value c of enhanced signals of the sound source at the target sound direction-of-arrival θ.sub.S defined by this PSD φ.sub.Y_θ_S.
[0134] Also, this optimization problem for the cost function relating to the variable u.sub.S, the variable u.sub.IN, and the variable u.sub.BN can be formulated as a problem in which, with u=[u.sub.S.sup.T, u.sub.IN.sup.T, u.sub.BN.sup.T].sup.T and v as an auxiliary variable of the variable u, inf.sub.u,v F.sub.1(u)+F.sub.2(v) (where F.sub.1 and F.sub.2 are convex functions) is solved under linear constraints relating to the variables u and v. Now, the linear constraints relating to the variables u and v include at least one of constraints relating to the frequency structure of the sound source, constraints relating to the temporal structure of the sound source, and constraints relating to the spatial structure of the sound source. Alternatively, F.sub.1(u)+F.sub.2(v) includes at least one of convex cost terms relating to the frequency structure of the sound source, convex cost terms relating to the temporal structure of the sound source, and convex cost terms relating to the spatial structure of the sound source.
[0135] An example of the optimization problem formulated in this way will be described below.
[0136] Linear constraints regarding the variables u and v are given by the following Expression.
Au=v, Bu=c, u≥0 [Math. 35]
[0137] (where A=[Λ 0 0], B=[I, I, I], c is the PSD estimation value of enhanced signals of the sound source at the target sound direction-of-arrival θ.sub.S, Λ(∈R.sup.Ω×Ω) is a predetermined sparse matrix, I(∈R.sup.Ω×Ω) is an identity matrix, and Ω is the number of frequency bands)
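The block structure of A and B can be made concrete with a small sketch. With u stacked as [u.sub.S, u.sub.IN, u.sub.BN], the product Au picks out Λu.sub.S and Bu gives the per-band sum of the three PSDs. The helper names and the toy values are hypothetical.

```python
def hstack(blocks):
    """Concatenate equally tall matrices (lists of rows) side by side."""
    return [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


n_bands = 2
lam = [[1.0, 0.0], [0.5, 0.5]]                      # toy stand-in for the sparse matrix
zero = [[0.0] * n_bands for _ in range(n_bands)]
eye = [[1.0 if i == j else 0.0 for j in range(n_bands)] for i in range(n_bands)]

A = hstack([lam, zero, zero])                       # A = [Lam, 0, 0]
B = hstack([eye, eye, eye])                         # B = [I, I, I]

u_s, u_in, u_bn = [2.0, 4.0], [1.0, 1.0], [0.5, 0.5]
u = u_s + u_in + u_bn                               # stacked variable (list concatenation)
v = matvec(A, u)                                    # equals Lam applied to u_S
total = matvec(B, u)                                # equals u_S + u_IN + u_BN per band
```

The constraint Au=v therefore ties the auxiliary variable to the weighted target sound PSD, and Bu=c is the additivity constraint written in matrix form.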
[0138] Also, F.sub.1(u) and F.sub.2(v) are each given by the following Expressions.
[0139] and μ, ρ(∈R.sup.+) are weighting coefficients)
[0140] The PSD updating unit 150 that solves the optimization problem will be described below with reference to
[0141] The operations of the PSD updating unit 150 will be described following
D.sub.p.sup.⋅=D.sub.p.sup.−1,D.sub.q.sup.⋅=D.sub.q.sup.−1,D.sub.r.sup.⋅=D.sub.r.sup.−1
{tilde over (p)}=∇D.sub.p(p),{tilde over (q)}=∇D.sub.q(q), {tilde over (r)}=∇D.sub.r(r) [Math. 38]
[0142] In S151, the initializing unit 151 initializes a counter t. Specifically, the initializing unit 151 initializes t to t=0. The initializing unit 151 also initializes the dual variables .sup.˜p, .sup.˜q, and .sup.˜r. Specifically, .sup.˜p.sup.0, .sup.˜q.sup.0, and .sup.˜r.sup.0 are set as initial values of the dual variables .sup.˜p, .sup.˜q, and .sup.˜r (the result of having updated the dual variables .sup.˜p, .sup.˜q, and .sup.˜r for the 0′th time).
[0143] In S1521, the first variable calculating unit 1521 calculates u.sup.t+1, which is the result of updating the variable u for a t+1′th time, by the following Expression.
[0144] In S1522, the first dual variable calculating unit calculates .sup.˜p.sup.t+1/2, which is the intermediate updating result of the dual variable .sup.˜p for the t+1′th time, by the following Expression.
{tilde over (p)}.sup.t+1/2={tilde over (p)}.sup.t−2Au.sup.t+1 [Math. 40]
[0145] In S1523, the second dual variable calculating unit calculates .sup.˜q.sup.t+1, which is the result of updating the dual variable .sup.˜q for the t+1′th time, by the following Expression.
{tilde over (q)}.sup.t+1={tilde over (q)}.sup.t−2(Bu.sup.t+1−c) [Math. 41]
[0146] In S1524, the third dual variable calculating unit calculates .sup.˜r.sup.t+1/2, which is an intermediate updating result of the dual variable .sup.˜r for the t+1′th time, by the following Expression.
{tilde over (r)}.sup.t+1/2={tilde over (r)}.sup.t−2u.sup.t+1 [Math. 42]
[0147] In S1525, the second variable calculating unit calculates v.sup.t+1, which is the result of updating the auxiliary variable v for the t+1′th time, by the following Expression.
[0148] In S1526, the fourth dual variable calculating unit calculates .sup.˜p.sup.t+1, which is the result of updating the dual variable .sup.˜p for the t+1′th time, by the following Expression.
{tilde over (p)}.sup.t+1={tilde over (p)}.sup.t+1/2+2v.sup.t+1 [Math. 44]
[0149] In S1527, the fifth dual variable calculating unit sets .sup.˜r=[.sup.˜r.sub.1.sup.T, .sup.˜r.sub.2.sup.T, .sup.˜r.sub.3.sup.T].sup.T, and calculates .sup.˜r.sup.t+1, which is the result of updating the dual variable .sup.˜r for the t+1′th time, by the following Expression.
[0150] In S153, the counter updating unit 153 increments the counter t by 1. Specifically, t←t+1 is set.
[0151] In S154, in a case where the counter t has reached a predetermined update count T (where T is a value that is an integer of 1 or greater, and that has been set taking real-time nature into consideration) (i.e., when t>T−1 is reached, and ending conditions are satisfied), the ending condition determining unit 154 outputs a value u.sup.T of the variable u at that time, and ends processing. Otherwise, the flow returns to the processing of S1521. That is to say, the PSD updating unit 150 repeats the processing of S1521 through S154.
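The control flow of S151 through S154 can be sketched as a fixed-count loop. Only the updates of [Math. 40], [Math. 41], [Math. 42], and [Math. 44] are written out, since they appear in the text. The updates of u, v, and the final .sup.˜r are not reproduced here, so they are passed in as hypothetical callables (`update_u`, `update_v`, `update_r`) whose argument lists are assumptions.

```python
def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def psd_update_loop(A, B, c, p0, q0, r0, update_u, update_v, update_r, T):
    """Run T passes of S1521 to S1527 and return the last u (u^T)."""
    p, q, r = list(p0), list(q0), list(r0)          # S151: initial dual variables, t = 0
    u = None
    for _ in range(T):                              # S153/S154: stop after T updates
        u = update_u(p, q, r)                       # S1521 (expression not reproduced)
        Au, Bu = matvec(A, u), matvec(B, u)
        p_half = [pi - 2.0 * x for pi, x in zip(p, Au)]             # S1522, [Math. 40]
        q = [qi - 2.0 * (x - ci) for qi, x, ci in zip(q, Bu, c)]    # S1523, [Math. 41]
        r_half = [ri - 2.0 * ui for ri, ui in zip(r, u)]            # S1524, [Math. 42]
        v = update_v(p_half)                        # S1525 (expression not reproduced)
        p = [ph + 2.0 * vi for ph, vi in zip(p_half, v)]            # S1526, [Math. 44]
        r = update_r(r_half)                        # S1527 (expression not reproduced)
    return u


# Placeholder callables that only exercise the control flow; they do NOT implement
# the u, v, and r updates of this document.
calls = []

def dummy_update_u(p, q, r):
    calls.append(1)
    return [0.5, 0.5]

u_out = psd_update_loop(
    A=[[1.0, 0.0]], B=[[1.0, 1.0]], c=[1.0],
    p0=[0.0], q0=[0.0], r0=[0.0, 0.0],
    update_u=dummy_update_u,
    update_v=lambda p_half: [0.0] * len(p_half),
    update_r=lambda r_half: r_half,
    T=3,
)
```

Because T is fixed in advance, the cost per time frame is bounded, which matches the stated aim of setting T with the real-time nature of the processing in mind.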
[0152] In S960, the sound source enhancing unit 960 takes, as input, the first enhanced signals Y.sub.θ_S(ω, τ) generated in S930, and the target sound PSD output value φ.sub.S(ω, τ), interference noise PSD output value φ.sub.IN(ω, τ), and background noise PSD output value φ.sub.BN(ω, τ), generated in S150. The sound source enhancing unit 960 then uses the first enhanced signals Y.sub.θ_S(ω, τ), target sound PSD output value φ.sub.S(ω, τ), interference noise PSD output value φ.sub.IN(ω, τ), and background noise PSD output value φ.sub.BN(ω, τ), to generate and output frequency region target sound signals Z(ω, τ)∈C.
[0153] In S970, the temporal region transform unit 970 takes the frequency region target sound signals Z(ω, τ) generated in S960 as input, and generates and outputs temporal region target sound signals z(t)∈R by transforming the frequency region target sound signals Z(ω, τ) into the temporal region.
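This section does not state how the sound source enhancing unit 960 combines its inputs. One common choice, shown below purely as an assumption, is a Wiener-type gain that scales each frequency bin of the first enhanced signal by the ratio of the target sound PSD to the total PSD. The function name and the small floor `eps` are hypothetical.

```python
def wiener_enhance(y, phi_s, phi_in, phi_bn, eps=1e-12):
    """Apply the gain phi_S / (phi_S + phi_IN + phi_BN) to each bin of Y (assumed rule)."""
    z = []
    for yk, s, i, b in zip(y, phi_s, phi_in, phi_bn):
        gain = s / max(s + i + b, eps)   # floor the denominator to avoid division by zero
        z.append(gain * yk)
    return z


# Two bins of one time frame: the first is dominated by the target, the second by noise.
z_bins = wiener_enhance([1 + 1j, 2 - 2j], [3.0, 0.0], [0.5, 1.0], [0.5, 1.0])
```

Under this rule the quality of the output depends directly on the three PSD output values, which is why refining them in S150 can improve the enhanced signal.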
[0154] Note that the PSD updating unit 150 can be configured as a standalone device (hereinafter referred to as PSD optimization device 200).
[0155] According to the invention of the present embodiment, sound source enhancement capabilities can be efficiently improved in accordance with settings of usage and applications.
[0156] <Notes>
[0158] The device according to the present invention, as a standalone hardware entity for example, has an input unit to which a keyboard or the like can be connected, and an output unit to which a liquid crystal display or the like can be connected, a communication unit connectable to a communication device (e.g., communication cable) that can communicate externally from the hardware entity, a CPU (Central Processing Unit, may have cache memory, registers, etc.), RAM and ROM that are memory, an external storage device that is a hard disk, and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so as to be capable of exchanging data therebetween. Also, a device (drive) that can read from and write to a recording medium such as a CD-ROM or the like, and so forth, may be provided to the hardware entity as necessary. Examples of physical entities having such hardware resources include a general purpose computer or the like.
[0159] The external storage device of the hardware entity stores programs necessary for realizing the above-described functions, and data and so forth necessary for processing of the programs (this is not limited to the external storage device, and programs may be stored in ROM that is a read-only storage device, for example). Data and so forth obtained by processing performed by these programs is stored in RAM, the external storage device, and so forth, as appropriate.
[0160] In the hardware entity, the programs stored in the external storage device (or ROM or the like) and data necessary for processing of the programs are read into memory as necessary, and subjected to interpreting processing by the CPU as appropriate. As a result, the CPU realizes predetermined functions (the components described above as so-and-so unit, so-and-so means, and so forth).
[0161] The present invention is not limited to the above-described embodiments, and modifications can be made as appropriate without departing from the essence of the present invention. Also, processing described in the above embodiments is not restricted to being executed in the order of the time sequence described therein, and may be executed in parallel or individually, in accordance with the processing capabilities of the device executing processing, or as necessary.
[0162] In a case of realizing the processing functions at the hardware entity (device of the present invention) described in the above embodiments by a computer, the contents of processing for the function which the hardware entity should have are described as a program, as mentioned earlier. Executing this program on a computer realizes the processing functions of the above hardware entity on the computer.
[0163] The program describing these contents of processing can be recorded in a computer-readable recording medium. Any computer-readable recording medium may be used, such as magnetic recording devices, optical discs, opto-magnetic recording media, semiconductor memory, and so forth, for example. Specifically, examples of a magnetic recording device that can be used include hard disk devices, flexible disks, magnetic tape, and so forth. Examples of optical discs include DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), and so forth, examples of opto-magnetic recording media include MO (Magneto-Optical disc) and so forth, and examples of semiconductor memory include EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) and so forth.
[0164] Also, distribution of this program is performed by sales, transfer, lending, and so forth of a transportable recording medium such as a DVD, CD-ROM, or the like, in which the program is recorded, for example. Further, a configuration for distribution of the program may be made by storing the program in a storage device of a server computer, and transferring the program from the server computer to other computers via a network.
[0165] A computer that executes such a program first stores, in its own storage device, the program recorded in a transportable recording medium or the program transferred from a server computer, for example. Then, at the time of executing the processing, the computer reads the program stored in its own storage device, and executes processing following the program that has been read out. As a separate form of executing the program, the computer may directly read the program from the transportable recording medium and execute processing following the program. Further, each time the program is transferred from the server computer to this computer, the computer may successively execute processing following the program that has been received. Also, a configuration may be made where the above-described processing is executed by a so-called ASP (Application Service Provider) type service, where the program is not transferred from the server computer to this computer, and the processing functions are realized just by instructions for execution and acquisition of results. Note that the program according to this form includes information provided for processing by electronic computers that is equivalent to programs (data or the like that is not direct instructions to a computer but has a nature of defining processing of the computer).
[0166] Also, in this form, the hardware entity is configured by executing a predetermined program on a computer, but at least part of these contents of processing may be realized by hardware.
[0167] The above description of the embodiment of the present invention has been given for exemplification and description. The description is not intended to be exhaustive, nor is the invention intended to be strictly limited to the disclosed form. Modifications and variations can be made from the above teachings. The embodiment has been selectively expressed to provide the best exemplification of the principles of the present invention, and to enable one skilled in the art to carry out the present invention in various embodiments that are well thought out to be applied to practical use, with various modifications added thereto. All such modifications and variations are within the scope of the present invention set forth in the attached Claims, as interpreted according to the breadth justly and legally fairly imparted thereto.