Image or video encoding and decoding

10972747 · 2021-04-06

Assignee

Inventors

Cpc classification

International classification

Abstract

A method of encoding image or video content within a set of targets such as a time, a complexity, quality, and bitrate, using an encoder having a plurality of coding configurations, comprises setting an initial coding configuration in the encoder, where coding configurations differ one from the other in the number of options tested for each of one or more tools. A first part of the content is encoded in the initial configuration and measures are made of the number of times a specified tool is used and a representative time taken to test an option for that specified tool. From those measures a prediction is made of the time difference between the time taken to encode content using the initial coding configuration and the time taken to encode content using another coding configuration in which a different number of options are tested for that specified tool.

Claims

1. A method of encoding image or video content within a set of fixed or adaptable targets, which may include a target time to encode, a target complexity, a target output quality, or a target output bitrate, using an encoder having a plurality of coding configurations and utilising a plurality of coding tools each having a set of selectable options, the method comprising the steps of: selecting an initial coding configuration; encoding a first part of the content using the encoder in the initial configuration; determining content based usage measures for the initial configuration; deriving from those measures predictions of the time difference between the time taken to encode content using the initial coding configuration and the time taken to encode content using at least some of the other coding configurations; determining from the predictions of the time difference and the given targets a second coding configuration meeting given targets; and encoding a second or subsequent part of the content using the second or subsequent coding configuration; wherein options are selected dynamically for each tool during encoding by testing of options, wherein said coding configurations differ one from the other in the number of options tested for each of one or more tools; and wherein the step of determining content based usage measures for the initial configuration comprises measuring the number of times a tool is used and measuring a representative time taken to test an option for a tool.

2. The method of claim 1, wherein a reduction in the number of options tested for a tool is contingent upon an analysis of an element of content indicating use of a particular option for that tool for that element of content.

3. The method of claim 1, wherein a reduction in the number of options tested for a tool is contingent upon an analysis of an element of content contra-indicating use of one or more options for that tool for that element of content.

4. The method of claim 1, wherein in the second configuration options are tested for a particular tool for which options were not tested in the first configuration.

5. The method of claim 1, wherein in the first configuration options are tested for a particular tool for which options are not tested in the second configuration.

6. The method of claim 1, in which the plurality of coding configurations are ranked independently of content on the a priori effect on the quality of encoding and said step of selecting a second coding configuration meeting the given targets takes into consideration said ranking.

7. The method of claim 1, wherein the given targets include an overall target time to encode and transmit content, and where a target time taken to encode content is derived from an overall target time to encode and transmit content and from a bit rate measure.

8. An encoder configured to perform a method of encoding in accordance with claim 1.

9. A method of decoding image or video content that has been encoded using a method according to claim 1.

10. A decoder configured to perform a method of decoding in accordance with claim 9.

11. A method of encoding image or video content utilising a plurality of tools each having a set of selectable options, where options are selected dynamically during encoding by testing of options; wherein to vary the speed of encoding the number of tools used, or options tested for a tool, may be changed; characterised by predicting repeatedly throughout encoding by analysis of content the difference in the time taken to encode content associated with a respective change in the number of tools used or options tested for one or more of the tools; wherein the step of varying the number of tools used or options tested comprises for each tool the conduct of an analysis on an element of content and dependent on that analysis precluding testing of one or more options for that tool on that element of content, and wherein the step of predicting the difference in the time taken to encode content associated with a respective change in the number of options tested for one or more of the tools comprises, for each tool, measuring the number of times that through said analysis testing of one or more options for that tool was or would have been precluded and measuring a representative time taken to test an option.

12. The method of claim 11, further comprising the step of determining a target time difference from a target time taken to encode the content and from the time elapsed in encoding and through use of said predictions changing the number of tools used or options tested for at least one tool accordingly.

13. The method of claim 11, wherein the precluding of testing of one or more options for a tool may be enabled or disabled for each tool according to a configuration.

14. The method of claim 13, wherein said analysis is conducted for a tool irrespective of whether the precluding of testing of one or more options for that tool is enabled or disabled in the current configuration.

15. The method of claim 11, in which the effects of each change in the number of options tested are ranked on the a priori effect on the quality of encoding.

16. The method of claim 11, wherein a target time taken to encode content is derived from an overall target time to encode and transmit content and from a bit rate measure.

Description

(1) The invention will now be described by way of example with reference to the accompanying drawings, in which:

(2) FIG. 1 illustrates (a) all options are tested by an encoder and (b) the effect of a speed-up

(3) FIG. 2 illustrates the time required by an encoder in all possible configurations using two nested speed-ups

(4) Assume that the encoder can be configured in C different possible configurations. An initial configuration i is selected. The selection of such initial configuration can be performed by means of pre-determined statistics, or pre-analysis of the sequence. Typical encoders will encode the whole sequence using the initial configuration i. Conversely by means of performing an analysis based on information extracted during the encoding of a given part of the sequence with a given configuration, a decision can be made to encode the subsequent part of the encoding with a different configuration in order to meet specific targets. Such decision can then be repeated after encoding each part of the sequence.

(5) As an example, assume that such targets consist of a total encoding time T. The sequence can be split into a number of N parts. An initial configuration c.sub.0 is selected, and the first part is considered. The residual time left to encode the remaining N parts in the sequence is initialised to the total available time, or T.sub.0=T.

(6) Then: 1. The current part n of the sequence is encoded using the selected configuration c.sub.n. 2. The time necessary to encode the current part using the selected configuration is measured as t[c.sub.n] 3. An analysis of the encoding is performed. Based on such analysis, accurate estimates of the time that it would have taken to encode part n using the other possible are computed as t[j], where j=0, . . . , C−1 4. The residual time left to encode the remaining subsequent parts in the sequence is computed as T.sub.n+1=T.sub.n−t[c.sub.n]. The average time available for each remaining part is consequently computed as

(7) t _ n + 1 = T n + 1 N - n - 1 5. The configuration c.sub.n+1 among all j=0, . . . , C−1 such that t[c.sub.n+1] is the closest to t.sub.n+1 and such that t[i.sub.n+1]≤t.sub.n+1, is determined. 6. If n<N−1 the next part n+1 is considered, and Step 1 is repeated.

(8) As a further example, assume that such targets consist of a total encoding time T and a maximum output quality achievable in such total encoding time. Assume that the available configurations j=0, . . . , C−1 can be pre-determinedly sorted according to their impact on the quality when encoding with each configuration. The sequence can be split into a number of N parts. An initial configuration i.sub.0 is selected and the first part is considered. The available time to encode the remaining parts in the sequence is initialised to the total available time, as T.sub.0=T.

(9) Then: 1. The current part n of the sequence is encoded using the selected configuration c.sub.n. 2. The time necessary to encode using the selected configuration is measured as t[c.sub.n] 3. An analysis of the encoding is performed. Based on such analysis, an accurate estimate of the time that it would have taken to encode part n using other possible configurations j=0, . . . , C−1 is computed as t[j] 4. The residual time left to encode the remaining subsequent parts in the sequence is computed as T.sub.n+1=T.sub.n−t[c.sub.n]. The average time available for each remaining part is consequently computed as

(10) t _ n + 1 = T n + 1 N - n - 1 5. The configuration c.sub.n+1 such that the quality associated with c.sub.n+1 is the maximum among all the configurations {tilde over (J)} that satisfy t[{tilde over (J)}]≤t.sub.n+1 is determined. 6. If n<N−1 the next part n+1 is considered, and Step 1 is repeated.

(11) As a further example, assume that such targets consist of a total target time T, consisting of the time for encoding T.sub.e and time for transmitting the sequence T.sub.t, and a maximum output quality achievable in such total time. Assume that information related on the network status is available at any given time instant so that it is always possible to compute the time necessary to transmit given information on the bit rate of the content that is being transmitted. Assume that the available configurations j=0, . . . , C−1 can be pre-determinedly sorted according to their impact on the quality when encoding with each configuration. The sequence can be split into a number of N parts. An initial configuration c.sub.0 is selected and the first part is considered. The available time to encode and transmit the remaining parts in the sequence is initialised to the total available time, as T.sub.0=T.

(12) Then: 1. The current part n of the sequence is encoded using the selected configuration c.sub.n. 2. The time necessary to encode the current part using the selected configuration is measured as t.sub.e[i.sub.n]. Based on the achieved bit rate, the time necessary to transmit the current part is computed as t.sub.t[i.sub.n] 3. An analysis of the encoding is performed. Based on such analysis, an accurate estimate of the time that it would have taken to encode part n using other possible configurations j=0, . . . , C−1 is computed as t[j]. 4. An analysis of the encoding is performed. Based on such analysis, an accurate estimate of the bit rate that it would have been produced to encode part n using other possible configurations j=0, . . . , C−1 is computed. Based on such bit rates, the time necessary to transmit the part using other possible configurations j=0, . . . , 2.sup.R−1 is computed as T.sub.t[j]. 5. The residual time left to encode and transmit the remaining subsequent parts in the sequence is computed as T.sub.n+1=T.sub.n−t.sub.e[c.sub.n]−t.sub.t[c.sub.n]. The average time available for each remaining part is consequently computed as

(13) t _ n + 1 = T n + 1 N - n - 1 6. The configuration c.sub.n+1 such that the quality associated with c.sub.n+1 is the maximum among the configurations {tilde over (J)} such that t[{tilde over (J)}]+t.sub.t[{tilde over (J)}]≤t.sub.n+1 is determined. 7. If n<N−1 the next part n+1 is considered, and Step 1 is repeated.

(14) The above examples make use of an analysis step in which, based on information extracted while encoding a part of the sequence using a given configuration i, an accurate estimate of the time that it would have taken to encode part n using other possible configurations is computed.

(15) Assume that the encoder has availability of a number of R speed-ups, each of which can be enabled or disabled. The encoder can therefore be configured to encode a given part of the sequence in 2.sup.R different possible configurations. In order to identify whether a speed-up is enabled or disabled in a specific configuration, define a matrix S of size 2.sup.R×R with elements s.sub.i,r, i=0, . . . , 2.sup.R−1, r=0, . . . , R−1, where element s.sub.i,r=0 (s.sub.i,r=1) denotes that speed-up r is disabled (enabled) in configuration i.

(16) Assume now that the encoder is run on a given part of the sequence using an initial configuration c.sub.init. The analysis in the aforementioned examples requires the encoder to be able to perform an accurate estimate of the time necessary to encode the same part if a different final configuration c.sub.fin was used instead. Formally, denote the time required to encode the considered part using the initial configuration as t[c.sub.init]; the time required to encode the considered part in a different configuration can be defined as t[c.sub.fin]=t[c.sub.init]+Δt.sub.init,fin, where Δt.sub.init,fin is the time difference between encoding using the two configurations. In order to provide these estimates, the effects of the changes between the two configurations must be taken into account.

(17) Single Speed-Up in Isolation:

(18) First, the effects of changing a single speed-up taken in isolation are considered; namely, it is considered how enabling a speed-up affects the encoding with respect to encoding with such speed-up disabled, or conversely, how disabling a speed-up affects the encoding with respect to encoding with such speed-up enabled.

(19) Consider a specific speed-up r which affects the execution of a given tool used within a module in the encoder loop. The execution of the tool is repeated on multiple instances, each time the specific module is called: for instance, if the tool is used within the prediction step, then it will be executed for each prediction unit in the sequence. When r is disabled, each time the encoder executes the given tool, it has to test a number of possible options, denote such number as K.sub.r. Conversely, in case r is enabled, then before execution of the tool, the encoder computes an analysis. This analysis results in the encoder triggering or not triggering the speed-up r while executing the current instance of the tool. In such case, instead of testing the whole set of possible K.sub.r options, only a sub-set is actually tested. Notice that, in case when triggered, the speed-up reduces the number of options to one, the encoder still has to perform the operations related with executing such option. For simplicity, assume in the rest of this example that when r is triggered, the speed-up always reduces the number of options to one; the example can be easily extended to the case when a sub-set of multiple options are tested.

(20) If p.sub.r is the average time necessary to execute one option, then each time r is triggered when executing one instance of the tool, the encoder would go from requiring a time of K.sub.r×p.sub.r, to only requiring p.sub.r. Denote as c.sub.˜r and c.sub.r the two specific configurations corresponding to the case in which r is disabled and enabled, respectively, and in which all other speed-ups are left unchanged, or s.sub.i,m=s.sub.j,m, ∀m≠r. Obviously Δt.sub.˜r,r≤0, in that the encoder will be equally fast or faster when r is enabled than when it is disabled. Also obviously Δt.sub.r,˜r=−Δt.sub.˜r,r.

(21) It will be understood that if a speed-up r is enabled, the appropriate analysis will be carried out. Based on the computation of specific metrics, a particular option may be prima facie indicated and therefore selected without testing. It may also be the case—depending on the content—that no option is prima facie indicated and all options still require testing. In the latter case, even though the speed-up is enabled, no time saving ensues. It can be said that the enabled speed-up is “triggered” in the former case where there is no testing (or reduced testing) and an enabled speed-up is not “triggered” when, despite analysis, all options are still tested.

(22) It is also pointed out that—in order to make the measurements made below—the analysis necessarily carried out when a speed-up is enabled, is here carried out each time a tool is used, even where the speed-up is disabled. Whilst such analysis on a disabled speed-up cannot reduce testing time (and is itself to a degree time consuming), the extra benefits outlined below are found greatly to exceed the “cost” of the analysis.

(23) This estimation can be performed as follows.

(24) During the encoding, the following parameters can be measured: 1. The total number of instances the tool is executed, denoted as N.sub.r. 2. The total number of times that r is triggered (or would be triggered, if c.sub.init=c.sub.˜r) as N.sub.r,true. Denote as N.sub.r,false=N.sub.r−N.sub.r,true. 3. The average time spent executing an individual option, denoted as p.sub.r.

(25) Eventually, Δt.sub.init,fin can be computed as follows:
Δt.sub.init,fin=(s.sub.c.sub.init.sub.,r−s.sub.c.sub.fin.sub.,r)p.sub.r(K.sub.r−1)N.sub.r,true

(26) Single Speed-Up in Isolation Assuming Uneven Time Distribution:

(27) The above approach is based on the assumption that time is uniformly distributed while testing different options in different parts of the encoder, namely that the time necessary to execute each option is constant. In most cases though this is not the case and a better estimation can be obtained. In particular, assume that the speed-up r is considered again in isolation. The idea is that the time necessary to execute each option in case r is (or would be) triggered is different than in case it is not triggered. That is say that for content where analysis would indicated a particular tool option, the time taken to use or test that tool option might be expected to be less than for content where use of that tool was not prima facie indicated.

(28) In order to compensate for this, a new estimation can be performed as follows. During the encoding, the following parameters can be measured: 1. The total number of instances the tool is executed, denoted as N.sub.r. 2. The total number of times that r is triggered (or would be triggered, if c.sub.init=c.sub.˜r) as N.sub.r,true. Denote as N.sub.r,false=N.sub.r−N.sub.r,true. 3. The average time spent executing an individual option, denoted as p.sub.r. 4. The average time spent executing an individual option in case r is (or would be) triggered, p.sub.r,true 5. The average time spent executing an individual option in case r is not (or would not be) triggered, p.sub.r,false

(29) After encoding, the ratio between p.sub.r,false and p.sub.r,true can be computed as:

(30) β r = p r , false p

(31) Finally:
Δt.sub.init,fin=β.sub.r(s.sub.c.sub.init.sub.,r−s.sub.c.sub.fin.sub.,r)p.sub.r(K.sub.r−1)N.sub.r,true

(32) Multiple Speed-Ups in Combination:

(33) Consider now the effects of another speed-up q. In case the tool affected by q is orthogonal to the tool affected by r, namely the two tools affect independent parts of the encoding loop, then the effects of the two speed-ups are additive. An example of this could be if the speed-up q affects inter-prediction whereas the speed-up r affects intra-prediction. In this case, the processes affected by the two encoders are completely independent which means their effects can be computed also independently. Switching from a configuration in which q and r are either enabled or disabled can be performed considering the two speed-ups in isolation: first, the effects of enabling (or disabling) r are computed, leading to a certain time difference as from the aforementioned formulas. Then, the effects of enabling (or disabling) q are computed, leading to another time difference. Finally, the two time differences could be added together to obtain the final total time difference.

(34) The above only holds in case the two speed-ups are independent, namely if they are not nested one within the other. A speedup q is defined as nested within a speed-up r if the tool affected by q is evaluated while testing each individual option whilst evaluating the tool affected by r. Consider for instance that, while executing an option among the possible K.sub.r options that are tested during the execution of the tool affected by r, the encoder must further select among a variety of possible K.sub.q sub-options. A speedup q may be designed to speed-up this process. If enabled, when q is triggered the encoder could avoid the testing of all K.sub.q sub-options, and instead perform some analytics so that a single sub-option is executed instead

(35) Consider a starting configuration c.sub.init where s.sub.c.sub.init.sub.,r={0,1} and s.sub.c.sub.init.sub.,q={0,1}, and a final configuration c.sub.fin where s.sub.c.sub.fin.sub.,m=s.sub.c.sub.init.sub.,m, ∀m≠r, q. The goal is again that of computing the time difference Δt.sub.init,fin. Unfortunately, in case s.sub.c.sub.init.sub.,r=1, namely in case r is enabled in the initial configuration, then in some instances when testing the related tool, not all K.sub.r options are executed. In case the execution of one option is skipped, obviously the information related to whether the nested speed-up is triggered or not cannot be computed (because the encoder skips execution of that option altogether). This means that information regarding encoder decisions or operations on the other skipped options is not available: among these missing data, the number of times the nested speed-up q would have been triggered or not, is also missing. This means that in order to compute the time difference to switch from a configuration in which s.sub.c.sub.init.sub.,r=1 to a configuration in which s.sub.c.sub.fin.sub.,r=0, the missing data must be recovered using other information available at encoding time. The following illustrates this process.

(36) During the encoding, the following parameters can be measured: 1. The total number of instances that the tool affected by r is executed, denoted as N.sub.r. 2. The total number of times that r is (or would be, if s.sub.c.sub.init.sub.,r=0) triggered as N.sub.r,true. Denote as N.sub.r,false=N.sub.r−N.sub.r,true. 3. The total number of instances that the tool affected by q is executed, denoted as N.sub.q. Due to the fact that ifs is not triggered the tool affected by q is tested K.sub.r times, whereas if it is triggered, the tool affected by q is tested only once, then: N.sub.q=N.sub.r,falseK.sub.r+N.sub.r,true. 4. For each tool in which r is not (or would not be) triggered, the following parameters can be measured: a. The total number of times that q is (or would be, if s.sub.c.sub.init.sub.,q=0) triggered as N.sub.q,true,false b. The total number of times that q is not (or would not be, if s.sub.c.sub.init.sub.,q=0) triggered as N.sub.q,false,false. It is easy to show that: N.sub.q,true,false+N.sub.q,false,false=K.sub.rN.sub.r,false 5. For each tool in which r is (or would be) triggered, only one option is performed if s.sub.c.sub.init.sub.,r=1, whereas all options are tested if s.sub.c.sub.init.sub.,r=0. The following parameters can be measured: a. The total number of times that q is (or would be, if s.sub.c.sub.init.sub.,q=0) triggered while testing an option that is finally selected, as N.sub.q,true,true b. In case s.sub.c.sub.init.sub.,q=0, the total number of times that q would be triggered while testing an option that is not finally selected, as Ñ.sub.q,true,true. This number cannot be computed if s.sub.c.sub.init.sub.,q=1. c. The total number of times that q is not (or would not be, if s.sub.c.sub.init.sub.,q=0) triggered as N.sub.q,false,true. 6. When r is not triggered, the ratio between number of times q is triggered over the total number of times q is tested can be computed as D.sub.q=N.sub.q,true,false/N.sub.r,falseK.sub.r 7. The average time spent testing or performing an individual sub-option, denoted as p.sub.q.
In case Ñ.sub.q,true,true is available, then the total number of times q is (or would be) triggered for tools in which r was not triggered can be then computed as:
N.sub.q,true,true=N.sub.q,true,true.sub.q,true,true

(37) On the other hand, if Ñ.sub.q,true,true is not available, then the following can be used, where N.sub.q,true,true is estimated by looking at the average number of times that q was triggered among the tools in which r is not triggered. This information is in fact always available.

(38) Formally:
N.sub.q,true,true≅N.sub.r,trueD.sub.qK.sub.q

(39) Eventually Δt.sub.init,fin can be computed as follows:

(40) Δ t init , fin = p q { ( s c init , q - s c fin , q ) [ ( K q - 1 ) ( N q , true , true + N q , true , false ) ] + ( s c init , r - s c fin , r ) N r , true ( K r - 1 ) K q }

(41) What have here been referred to as “speed-ups” can of course take a wide variety of forms and will vary from one coding standard (or technique) to another.

(42) Examples are given below in the context of an HEVC software codec implementation, in particular the Turing codec HEVC software implementation: ECU—early Coding Unit determination (this speed-up avoids testing of smaller block sizes obtained partitioning the current block, and directly selects the current block size instead) ESD—early SKIP detection (this speed-up avoids testing of inter-prediction modes in a given block, and directly selects the SKIP mode instead) APS—adaptive partition selection (this speed-up avoids testing of specific inter-prediction modes, leaving the encoder the choice among a reduced set of inter-prediction modes) CFM—Coded Flag Mode (this speed-up avoids testing of specific inter-prediction modes, leaving the encoder the choice among a reduced set of inter-prediction modes) FDM—Fast decision for merge (this speed-up avoids testing of inter-prediction modes, and directly selects the merge mode instead) FDAM—Fast decision for all modes (this speed-up avoids testing of specific inter-prediction modes, leaving the encoder the choice among a reduced set of inter-prediction modes) MET—Multiple early termination (this speed-up avoids testing of some candidates during the motion search and directly selects the initial starting candidate instead) RCU—reverse CU algorithm (this speed-up avoids testing of specific block sizes obtained partitioning the current block, leaving the encoder the choice among a reduced set of block sizes) No sub-pixel ME—no sub-pixel motion estimation (this speed-up avoids testing of sub-pixel motion search candidates) No quarter-pixel ME—no quarter-pixel motion estimation (this speed-up avoids testing of quarter-pixel motion search candidates) Fast Intra-prediction (this speed-up avoids testing specific intra-prediction modes in a rate-distortion sense, leaving the encoder the choice among a reduced set of intra-prediction modes) No RQT (this speed-up disables the RQT search during the transform and quantisation steps and directly selects the maximum transform size instead)