Method for speech coding, method for speech decoding and their apparatuses

09852740 ยท 2017-12-26

Assignee

Inventors

Cpc classification

International classification

Abstract

A high quality speech is reproduced with a small data amount in speech coding and decoding for performing compression coding and decoding of a speech signal to a digital signal. In speech coding method according to a code-excited linear prediction (CELP) speech coding, a noise level of a speech in a concerning coding period is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information, and various excitation codebooks are used based on an evaluation result.

Claims

1. A speech decoding method at a decoder for synthesizing speech signals, the method comprising: receiving, at the decoder, a coded speech signal including an adaptive code, an excitation code, and a gain code; determining an adaptive code vector from an adaptive codebook based on the adaptive code; determining a decoded adaptive code vector gain by decoding the gain code; determining a decoded excitation vector by decoding the excitation code, the decoded excitation vector having a number of samples with zero amplitude; modifying, at the decoder, the decoded excitation vector based on the decoded adaptive code vector gain such that the number of samples with zero amplitude is changed; weighting the adaptive code vector by the decoded adaptive code vector gain; and synthesizing a speech signal based on the modified decoded excitation vector and the weighted adaptive code vector.

2. The method of claim 1, wherein the gain code is decoded in a decoding period associated with the received coded speech.

3. The method of claim 1, wherein the decoded excitation vector is modified based on a noise level associated with the received coded speech.

4. The method of claim 1, further comprising: weighting the modified decoded excitation vector; and adding together the weighted adaptive code vector and the weighted modified decoded excitation vector.

5. The method of claim 1, wherein the adaptive codebook is based on a past excitation.

6. The method of claim 4, further comprising: determining a decoded linear prediction parameter by decoding a linear prediction parameter code associated with the received coded speech; and wherein the speech signal is synthesized using the decoded linear prediction parameter and the added weighted adaptive code vector and weighted modified decoded excitation vector.

7. The method of claim 6, wherein the decoded linear prediction parameter corresponds to coefficients of a synthesis filter.

8. A speech decoding apparatus for synthesizing speech signals, comprising: a memory; and at least one hardware processor communicatively coupled with the memory and configured to: receive a coded speech signal including an adaptive code, an excitation code, and a gain code; determine an adaptive code vector from an adaptive codebook based on the adaptive code; determine a decoded adaptive code vector gain by decoding the gain code; determine a decoded excitation vector by decoding the excitation code, the decoded excitation vector having a number of samples with zero amplitude; modify the decoded excitation vector based on the decoded adaptive code vector gain such that the number of samples with zero amplitude is changed; weight the adaptive code vector by the decoded adaptive code vector gain; and synthesize a speech signal based on the modified decoded excitation vector and the weighted adaptive code vector.

9. The apparatus of claim 8, wherein the gain code is decoded in a decoding period associated with the received coded speech.

10. The apparatus of claim 8, wherein the decoded excitation vector is modified based on a noise level associated with the received coded speech.

11. The apparatus of claim 8, wherein the at least one hardware processor is further configured to: weight the modified decoded excitation vector; and add together the weighted adaptive code vector and the weighted modified decoded excitation vector.

12. The apparatus of claim 8, wherein the adaptive codebook is based on a past excitation.

13. The apparatus of claim 11, wherein the at least one hardware processor is further configured to: determine a decoded linear prediction parameter by decoding a linear prediction parameter code associated with the received coded speech; and synthesize the speech signal using the decoded linear prediction parameter and the added weighted adaptive code vector and weighted modified decoded excitation vector.

14. The apparatus of claim 13, wherein the decoded linear prediction parameter corresponds to coefficients of a synthesis filter.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows a block diagram of a whole configuration of a speech coding and speech decoding apparatus in embodiment 1 of this invention;

(2) FIG. 2 shows a table for explaining an evaluation of a noise level in embodiment 1 of this invention illustrated in FIG. 1;

(3) FIG. 3 shows a block diagram of a whole configuration of a speech coding and speech decoding apparatus in embodiment 3 of this invention;

(4) FIG. 4 shows a block diagram of a whole configuration of a speech coding and speech decoding apparatus in embodiment 5 of this invention;

(5) FIG. 5 shows a schematic line chart for explaining a decision process of weighting in embodiment 5 illustrated in FIG. 4;

(6) FIG. 6 shows a block diagram of a whole configuration of a CELP speech coding and decoding apparatus according to the related art;

(7) FIG. 7 shows a block diagram of a whole configuration of an improved CELP speech coding and decoding apparatus according to the related art; and

(8) FIG. 8 shows a block diagram of a whole configuration of a speech coding and decoding apparatus according to embodiment 8 of the invention.

DETAILED DESCRIPTION OF THE INVENTION

(9) Explanations are made on embodiments of this invention with reference to drawings.

Embodiment 1

(10) FIG. 1 illustrates a whole configuration of a speech coding method and speech decoding method in embodiment 1 according to this invention. In FIG. 1, an encoder 1, a decoder 2, a multiplexer 3, and a divider 4 are illustrated. The encoder 1 includes a linear prediction parameter analyzer 5, linear prediction parameter encoder 6, synthesis filter 7, adaptive codebook 8, gain encoder 10, distance calculator 11, first excitation codebook 19, second excitation codebook 20, noise level evaluator 24, excitation codebook switch 25, and weighting-adder 38. The decoder 2 includes a linear prediction parameter decoder 12, synthesis filter 13, adaptive codebook 14, first excitation codebook 22, second excitation codebook 23, noise level evaluator 26, excitation codebook switch 27, gain decoder 16, and weighting-adder 39. In FIG. 1, the linear prediction parameter analyzer 5 is a spectrum information analyzer for analyzing an input speech S1 and extracting a linear prediction parameter, which is spectrum information of the speech. The linear prediction parameter encoder 6 is a spectrum information encoder for coding the linear prediction parameter, which is the spectrum information and setting a coded linear prediction parameter as a coefficient for the synthesis filter 7. The first excitation codebooks 19 and 22 store pluralities of non-noise time series vectors, and the second excitation codebooks 20 and 23 store pluralities of noise time series vectors. The noise level evaluators 24 and 26 evaluate a noise level, and the excitation codebook switches 25 and 27 switch the excitation codebooks based on the noise level.

(11) Operations are explained.

(12) In the encoder 1, the linear prediction parameter analyzer 5 analyzes the input speech S1, and extracts a linear prediction parameter, which is spectrum information of the speech. The linear prediction parameter encoder 6 codes the linear prediction parameter. Then, the linear prediction parameter encoder 6 sets a coded linear prediction parameter as a coefficient for the synthesis filter 7, and also outputs the coded linear prediction parameter to the noise level evaluator 24.

(13) Explanations are made on coding of excitation information.

(14) An old excitation signal is stored in the adaptive codebook 8, and a time series vector corresponding to an adaptive code inputted by the distance calculator 11, which is generated by repeating an old excitation signal periodically, is outputted. The noise level evaluator 24 evaluates a noise level in a concerning coding period based on the coded linear prediction parameter inputted by the linear prediction parameter encoder 6 and the adaptive code, e.g., a spectrum gradient, abort-term prediction gain, and pitch fluctuation as shown in FIG. 2, and outputs an evaluation result to the excitation codebook switch 25. The excitation codebook switch 25 switches excitation codebooks for coding based on the evaluation result of the noise level. For example, if the noise level is low, the first excitation codebook 19 is used, and if the noise level is high, the second excitation codebook 20 is used.

(15) The first excitation codebook 19 stores a plurality of non-noise time series vectors, e.g., a plurality of time series vectors trained by reducing a distortion between a speech for training and its coded speech. The second excitation codebook 20 stores a plurality of noise time series vectors, e.g., a plurality of time series vectors generated from random noises. Each of the first excitation codebook 19 and the second excitation codebook 20 outputs a time series vector respectively corresponding to an excitation code inputted by the distance calculator 11. Each of the time series vectors from the adaptive codebook 8 and one of first excitation codebook 19 or second excitation codebook 20 are weighted by using a respective gain provided by the gain encoder 10, and added by the weighting-adder 38. An addition result is provided to the synthesis filter 7 as excitation signals, and a coded speech is produced. The distance calculator 11 calculates a distance between the coded speech and the input speech S1, and searches sin adaptive code, excitation code, and gain for minimizing the distance. When this coding is over, the linear prediction parameter code and an adaptive code, excitation code, and gain code for minimizing the distortion between the input speech and the coded speech are outputted as a coding result S2. These are characteristic operations in the speech coding method in embodiment 1.

(16) Explanations are made on the decoder 2. In the decoder 2, the linear prediction parameter decoder 12 decodes the linear prediction parameter code to the linear prediction parameter, and sets the decoded linear prediction parameter as a coefficient for the synthesis filter 13, and outputs the decoded linear prediction parameter to the noise level evaluator 26.

(17) Explanations are made on decoding of excitation information. The adaptive codebook 14 outputs a time series vector corresponding to an adaptive code, which is generated by repeating an old excitation signal periodically. The noise level evaluator 26 evaluates a noise level by using the decoded linear prediction parameter inputted by the linear prediction parameter decoder 12 and the adaptive code in a same method with the noise level evaluator 24 in the encoder 1, and outputs an evaluation result to the excitation codebook switch 27. The excitation codebook switch 27 switches the first excitation codebook 22 and the second excitation codebook 23 based on the evaluation result of the noise level in a same method with the excitation codebook switch 25 in the encoder 1.

(18) A plurality of non-noise time series vectors, e.g., a plurality of time series vectors generated by training for reducing a distortion between a speech for training and its coded speech, is stored in the first excitation codebook 22. A plurality of noise time series vectors, e.g., a plurality of vectors generated from random noises, is stored in the second excitation codebook 23. Each of the first and second excitation codebooks outputs a time series vector respectively corresponding to an excitation code. The time series vectors from the adaptive cod book 14 and one of first excitation codebook 22 or second excitation codebook 23 are weighted by using respective gains, decoded from gain codes by the gain decoder 16, and added by the weighting-adder 39. An addition result is provided to the synthesis filter 13 as an excitation signal, and an output speech S3 is produced. These are operations are characteristic operations in the speech decoding method in embodiment 1.

(19) In embodiment 1, the noise level of the input speech is evaluated by using the code and coding result, and various excitation codebooks are used based on the evaluation result. Therefore, a high quality speech can be reproduced with a small data amount.

(20) In embodiment 1, the plurality of time series vectors is stored in each of the excitation codebooks 19, 20, 22, and 23. However, this embodiment can be realized as far as at least a time series vector is stored in each of the excitation codebooks.

Embodiment 2

(21) In embodiment 1, two excitation codebooks are switched. However, it is also possible that three or more excitation codebooks are provided and switched based on a noise level.

(22) In embodiment 2, a suitable excitation codebook can be used even for a medium speech, e.g., slightly noisy, in addition to two kinds of speech, i.e., noise and non-noise. Therefore, a high quality speech can be reproduced.

Embodiment 3

(23) FIG. 3 shows a whole configuration of a speech coding method and speech decoding method in embodiment 3 of this invention. In FIG. 3, same signs are used for units corresponding to the units in FIG. 1. In FIG. 3, excitation codebooks 28 and 30 store noise time series vectors, and samplers 29 and 31 set an amplitude value of a sample with a low amplitude in the time series vectors to zero.

(24) Operations are explained. In the encoder 1, the linear prediction parameter analyzer 5 analyzes the input speech S1, and extracts a linear prediction parameter, which is spectrum information of the speech. The linear prediction parameter encoder 6 codes the linear prediction parameter. Then, the linear prediction parameter encoder 6 sets a coded linear prediction parameter as a coefficient for the synthesis filter 7, and also outputs the coded linear prediction parameter to the noise level evaluator 24.

(25) Explanations are made on coding of excitation information. An old excitation signal is stored in the adaptive codebook 8, and a time series vector corresponding to an adaptive code inputted by the distance calculator 11, which is generated by repeating an old excitation signal periodically, is outputted. The noise level evaluator 24 evaluates a noise level in a concerning coding period by using the coded linear prediction parameter, which is inputted from the linear prediction parameter encoder 6, and an adaptive code, e.g., a spectrum gradient, short-term prediction gain, and pitch fluctuation, and outputs an evaluation result to the sampler 29.

(26) The excitation codebook 28 stores a plurality of time series vectors generated from random noises, for example, and outputs a time series vector corresponding to an excitation code inputted by the distance calculator 11. If the noise level is low in the evaluation result of the noise, the sampler 29 outputs a time series vector, in which an amplitude of a sample with an amplitude below a determined value in the time series vectors, inputted from the excitation codebook 28, is set to zero, for example. If the noise level is high, the sampler 29 outputs the time series vector inputted from the excitation codebook 28 without modification. Each of the times series vectors from the adaptive codebook 8 and the sampler 29 is weighted by using a respective gain provided by the pin encoder 10 and added by the weighting-adder 38. An addition result is provided to the synthesis filter 7 as excitation signals, and a coded speech is produced. The distance calculator 11 calculates a distance between the coded speech and the input speech S1, and searches an adaptive code, excitation code, and gain for minimizing the distance. When coding is over, the linear prediction parameter code and the adaptive code, excitation code, and gain code for minimizing a distortion between the input speech and the coded speech are outputted as a coding result S2. These am characteristic operations in the speech coding method in embodiment 3.

(27) Explanations are made on the decoder 2. In the decoder 2, the linear prediction parameter decoder 12 decodes the linear prediction parameter code to the linear prediction parameter. The linear prediction parameter decoder 12 sets the linear prediction parameter as a coefficient for the synthesis filter 13, and also outputs the linear prediction parameter to the noise level evaluator 26.

(28) Explanations are made on decoding of excitation information. The adaptive codebook 14 outputs a time series vector corresponding to an adaptive code, generated by repeating an old excitation signal periodically. The noise level evaluator 26 evaluates a noise level by using the decoded linear prediction parameter inputted from the linear prediction parameter decoder 12 and the adaptive code in a same method with the noise level evaluator 24 in the encoder 1, and outputs an evaluation result to the sampler 31.

(29) The excitation codebook 30 outputs a time series vector corresponding to an excitation code. The sampler 31 outputs a time series vector based on the evaluation result of the noise level in same processing with the sampler 29 in the encoder 1. Each of the time series vectors outputted from the adaptive codebook 14 and sampler 31 are weighted by using a respective gain provided by the gain decoder 16, and added by the weighting-adder 39. An addition result is provided to the synthesis filter 13 as an excitation signal, and an output speech S3 is produced.

(30) In embodiment 3, the excitation codebook storing noise time series vectors is provided, and an excitation with a low noise level can be generated by sampling excitation signal samples based on an evaluation result of the noise level the speech. Hence, a high quality speech can be reproduced with a small data amount. Further, since it is not necessary to provide a plurality of excitation codebooks, a memory amount for storing the excitation codebook can be reduced.

Embodiment 4

(31) In embodiment 3, the samples in the time series vectors are either sampled or not. However, it is also possible to change a threshold value of an amplitude for sampling the samples based on the noise level. In embodiment 4, a suitable time series vector can be generated and used also for a medium speech, e.g., slightly noisy, in addition to the two types of speech, i.e., noise and non-noise. Therefore, a high quality speech can be reproduced.

Embodiment 5

(32) FIG. 4 shows a whole configuration of a speech coding method and a speech decoding method in embodiment 5 of this invention, and same signs are used for units corresponding to the units in FIG. 1.

(33) In FIG. 4, first excitation codebooks 32 and 35 store noise time series vectors, and second excitation codebooks 33 and 36 store non-noise time series vectors. The weight determiners 34 and 37 are also illustrated.

(34) Operations are explained. In the encoder 1, the linear prediction parameter analyzer 5 analyzes the input speech S1, and extracts a linear prediction parameter, which is spectrum information of the speech. The linear prediction parameter encoder 6 codes the linear prediction parameter. Then, the linear prediction parameter encoder 6 sets a coded linear prediction parameter as a coefficient for the synthesis filter 7, and also outputs the coded prediction parameter to the noise level evaluator 24.

(35) Explanations are made on coding of excitation information. The adaptive codebook 8 stores an old excitation signal, and outputs a time series vector corresponding to an adaptive code inputted by the distance calculator 11, which is generated by repeating an old excitation signal periodically. The noise level evaluator 24 evaluates a noise level in a concerning coding period by using the coded linear prediction parameter, which is inputted from the linear prediction parameter encoder 6 and the adaptive code, e.g., a spectrum gradient, short-term prediction gain, and pitch fluctuation, and outputs an evaluation result to the weight determiner 34.

(36) The first excitation codebook 32 stores a plurality of noise time series vectors generated from random noises, for example, and outputs a time series vector corresponding to an excitation code. The second excitation codebook 33 stores a plurality of time series vectors generated by training for reducing a distortion between a speech for training and its coded speech, and outputs a time series vector corresponding to an excitation code inputted by the distance calculator 11. The weight determiner 34 determines a weight provided to the time series vector from the first excitation codebook 32 and the time series vector from the second excitation codebook 33 based on the evaluation result of the noise level inputted from the noise level evaluator 24, as illustrated in FIG. 5, for example. Each of the time series vectors from the first excitation codebook 32 and the second excitation codebook 33 is weighted by using the weight provided by the weight determiner 34, and added. The time series vector outputted from the adaptive codebook 8 and the time series vector, which is generated by being weighted and added, are weighted by using respective gains provided by the gain encoder 10, and added by the weighting-adder 38. Then, an addition result is provided to the synthesis filter 7 as excitation signals, and a coded speech is produced. The distance calculator 11 calculates a distance between the coded speech and the input speech S1, and searches an adaptive code, excitation code, and gain for minimizing the distance. When coding is over, the linear prediction parameter code, adaptive code excitation code, and gain code for minimizing a distortion between the input speech and the coded speech, are outputted as a coding result.

(37) Explanations are made on the decoder 2. In the decoder 2, the linear prediction parameter decoder 12 decodes the linear prediction parameter code to the linear prediction parameter. Then, the linear prediction parameter decoder 12 sets the linear prediction parameter as a coefficient for the synthesis filter 13, and also outputs the linear prediction parameter to the noise evaluator 26.

(38) Explanations are made on decoding of excitation information. The adaptive codebook 14 outputs a time series vector corresponding to an adaptive code by repeating an old excitation signal periodically. The noise level evaluator 26 evaluates a noise level by using the decoded linear prediction parameter, which is inputted from the linear prediction parameter decoder 12, and the adaptive code in a same method with the noise level evaluator 24 in the encoder 1, and outputs an evaluation result to the weight determiner 37.

(39) The first excitation codebook 35 and the second excitation codebook 36 output time series vectors corresponding to excitation codes. The weight determiner 37 weights based on the noise level evaluation result inputted from the noise level evaluator 26 in a same method with the weight determiner 34 in the encoder 1. Each of the time series vectors from the first excitation codebook 33 and the second excitation codebook 36 is weighted by using a respective weight provided by the weight determiner 37, and added. The time series vector outputted from the adaptive codebook 14 and the time series vector, which is generated by being weighted and added, are weighted by using respective gains decoded from the gain codes by the gain decoder 16, and added by the weighting-adder 39. Then, an addition result is provided to the synthesis filter 13 as an excitation signal, and an output speech S3 is produced.

(40) In embodiment 5, the noise level of the speech is evaluated by using a code and coding result, and the noise time series vector or non-noise time series vector are weighted based on the evaluation result, and added. Therefore, a high quality speech can be reproduced with a small data amount.

Embodiment 6

(41) In embodiments 1-5, it is also possible to change gain codebooks based on the evaluation result of the noise level. In embodiment 6, a most suitable gain codebook can be used based on the excitation codebook. Therefore, a high quality speech can be reproduced.

Embodiment 7

(42) In embodiments 1-6, the noise level of the speech is evaluated, and the excitation codebooks are switched based on the evaluation result. However, it is also possible to decide and evaluate each of a voiced onset, plosive consonant, etc., and switch the excitation codebooks based on an evaluation result. In embodiment 7, in addition to the noise state of the speech, the speech is classified in more details, e.g., voiced onset, plosive consonant, etc., and a suitable excitation codebook can be used for each state. Therefore, a high quality speech can be reproduced.

Embodiment 8

(43) In embodiments 1-6, the noise level in the coding period is evaluated by using a spectrum gradient; short-term prediction gain, pitch fluctuation. However, it is also possible to evaluate the noise level by using a ratio of a gain value against an output from the adaptive codebook as illustrated in FIG. 8, in which similar elements are labeled with the same reference numerals.

INDUSTRIAL APPLICABILITY

(44) In the speech coding method, speech decoding method, speech coding apparatus, and speech decoding apparatus according to this invention, a noise level of a speech in a concerning coding period is evaluated by using a code or coding result of at least one of the spectrum information, power information, and pitch information, and various excitation codebooks are used based on the evaluation result. Therefore, a high quality speech can be reproduced with a small data amount.

(45) In the speech coding method and speech decoding method according to this invention, a plurality of excitation codebooks storing excitations with various noise levels is provided, and the plurality of excitation codebooks is switched based on the evaluation result of the noise level of the speech. Therefore, a high quality speech can be reproduced with a small data amount.

(46) In the speech coding method and speech decoding method according to this invention, the noise levels of the time series vectors stored in the excitation codebooks are changed based on the evaluation result of the noise level of the speech. Therefore, a high quality speech can be reproduced with a small data amount.

(47) In the speech coding method and speech decoding method according to this invention, an excitation codebook storing noise time series vectors is provided, and a time series vector with a low noise level is generated by sampling signal samples in the time series vectors based on the evaluation result of the noise level of the speech. Therefore, a high quality speech can be reproduced with a small data amount.

(48) In the speech coding method and speech decoding method according to this invention, the first excitation codebook storing noise time series vectors and the second excitation codebook storing non-noise time series vectors are provided, and the time series vector in the first excitation codebook or the time series vector in the second excitation codebook is weighted based on the evaluation result of the noise level of the speech, and added to generate a time series vector. Therefore a high quality speech can be reproduced with a small data amount.