Hypothetical reference decoder

RE048953 · 2022-03-01

Abstract

A hypothetical reference decoder.

Claims

1. A method comprising: (a) .[.defining.]. .Iadd.receiving .Iaddend.a first set of .[.at least one value.]. .Iadd.multiple values, each value in the first set being .Iaddend.characteristic of a transmission bit rate for a first .[.segment.]. .Iadd.access point at a start .Iaddend.of a video .[.having an associated first segment presentation start time and an associated first segment presentation end time.]. .Iadd.sequence.Iaddend.; (b) .[.defining.]. .Iadd.receiving .Iaddend.a second set of .[.at least one value.]. .Iadd.multiple values each .Iaddend.characteristic of a buffer size for said first .[.segment.]. .Iadd.access point.Iaddend.; (c) .[.defining.]. .Iadd.receiving .Iaddend.a third set of .[.at least one value.]. .Iadd.multiple values each .Iaddend.characteristic of an initial .[.decoder buffer fullness for said first segment.]. .Iadd.delay for said first access point.Iaddend.; (d) .[.wherein each value within said first set, said second set, and said third set, respectively, is defined so that data received by a decoder for constructing a plurality of video frames of said first segment is free from an underflow state in a buffer of said decoder when said constructing begins at said first segment presentation start time.]. .Iadd.receiving a fourth set of multiple values characteristic of an initial delay for other access points of the video sequence, the other access points being distinct access points from the first access point.Iaddend.; .Iadd.wherein.Iaddend. (e) .[.defining a fourth set of at least one value characteristic of said transmission bit rate for a second segment of said video having an associated second segment presentation start time and an associated second segment presentation end time, said second segment presentation start time being later than first segment presentation start time and said second segment presentation end time being the same as, or earlier, than said first segment presentation end time.]. 
.Iadd.the values within said first set, said second set, and said third set, respectively, are defined so that data received by a decoder for constructing a plurality of video frames is free from an overflow state for said first access point.Iaddend.; (f) .[.defining a fifth set of at least one value characteristic of said buffer size for said second segment;.]. .Iadd.the values within said first set, said second set, and said fourth set, respectively, are defined so that data received by a decoder for constructing a plurality of video frames is free from an overflow state for each of said other access points.Iaddend. .[.(g) defining a sixth set of at least one value characteristic of said initial decoder buffer fullness for said second segment; (h) wherein each value within said fourth set, said fifth set, and said sixth set, respectively, is defined so that data received by said decoder for constructing a plurality of video frames of said second segment is free from an underflow state in said buffer of said decoder when said constructing begins at said second segment presentation start time; and (i) allowing a user to begin presentation at a user-selected one of said first segment presentation start time, and said second segment presentation start time associated with said second segment.]..

2. The method of claim 1 wherein said first set, second set, and third set of respective values together define at least one leaky bucket model for a buffer of a hypothetical reference decoder.

.[.3. The method of claim 1 wherein said second segment presentation start time corresponds to a local maximum buffer fullness state of a said leaky bucket model constructed using values defined for said first segment of said video..].

4. The method of claim 2 wherein said at least one leaky bucket model uses a fixed transmission bit rate.

5. The method of claim 2 wherein said at least one leaky bucket model uses a variable transmission bit rate.

.[.6. The method of claim 1 including defining further respective sets of at least one value characteristic of a transmission bit rate, a buffer size, and an initial buffer fullness, respectively, each respective further set associated with another respective segment of said video having a presentation start time later than said second segment presentation start time, and a presentation end time the same as, or earlier, than said first segment presentation end time..].

7. The method of claim 1 wherein steps (a) through (.[.h.]. .Iadd.f.Iaddend.) are performed at an encoder having a buffer fullness state complementary to .[.said.]. .Iadd.a .Iaddend.buffer of .[.said.]. .Iadd.a corresponding .Iaddend.decoder.

.[.8. The method of claim 2 wherein said sixth set of at least one value is at least 90% of the buffer size of said at least one leaky bucket model..].

.[.9. The method of claim 1 wherein said fourth set of at least one value equals said first set of at least one value..].

.[.10. The method of claim 1 wherein said fifth set of at least one value equals said second set of at least one value..].

.[.11. The method of claim 1 wherein said sixth set of at least one value equals said third set of at least one value..].

.Iadd.12. The method of claim 1 wherein at least one of said first access point or said other access points correspond to a local maximum buffer fullness state of at least one leaky bucket model for a buffer of a hypothetical reference decoder..Iaddend.

.Iadd.13. The method of claim 1 wherein at least one of said first access point or said other access points correspond to a local minimum buffer fullness state of at least one leaky bucket model for a buffer of a hypothetical reference decoder..Iaddend.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 illustrates decoder buffer fullness.

(2) FIG. 2 illustrates an R-B curve.

(3) FIGS. 3A and 3B illustrate plots of decoder buffer fullness for some bit streams operating in CBR and VBR modes, respectively.

(4) FIG. 4 illustrates a set of N leaky bucket models and their interpolated or extrapolated (R, B) values.

(5) FIG. 5 illustrates initial buffering B.sub.j for any point the user seeks to when the rate is R.sub.j.

(6) FIG. 6 illustrates sets of (R, B, F) defined in a forward looking fashion for the particular video stream.

(7) FIG. 7 illustrates the initial buffer fullness (in bits) for a video segment.

(8) FIG. 8 illustrates the selection criteria of a set of 10 points for FIG. 7.

(9) FIG. 9 illustrates selection criteria.

(10) FIG. 10 illustrates delay reductions.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

(11) As previously described, the JVT standard (WD-2) allows the storing of (N>=1) leaky buckets, (R.sub.1, B.sub.1, F.sub.1), . . . , (R.sub.N, B.sub.N, F.sub.N) values which are contained in the bit stream. These values may be stored in the header. Using F.sub.i as the initial buffer fullness and B.sub.i as the buffer size guarantees that the decoder buffer will not underflow when the input stream comes in at the rate R.sub.i. This will be the case if the user desires to present the encoded video from start to end. In a typical video-on-demand application the user may want to seek to different portions of the video stream. The point that the user desires to seek to may be referred to as the access point. During the process of receiving video data and constructing video frames the amount of data in the buffer fluctuates. After consideration, the present inventor came to the realization that if the F.sub.i value of the initial buffer fullness (when the channel rate is R.sub.i) is used before starting to decode the video from the access point, then it is possible that the decoder will have an underflow. For example, at the access point or sometime thereafter the number of bits necessary for video reconstruction may be greater than the number of bits currently in the buffer, resulting in underflow and inability to present video frames in a timely manner. It can likewise be shown that in a video stream the value of initial buffer fullness required to make sure there is no underflow at the decoder varies based on the point to which the user seeks. This value is bounded by B.sub.i. Accordingly, the combination of B and F provided for the entire video sequence, if used for an intermediate point in the video, is unlikely to be appropriate, resulting in an underflow and thus frozen frames.
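The underflow risk described above can be illustrated with a minimal sketch of a leaky bucket decoder buffer. The function name, the frame sizes, and the simplified timing (one frame removed per frame interval, with the fullness capped at B) are assumptions made for illustration and are not taken from the JVT draft.

```python
def first_underflow(frame_bits, rate, buffer_size, initial_fullness, frame_rate):
    """Return the index of the first frame that underflows, or None.

    Simplified model (an assumption): the buffer holds `initial_fullness`
    bits when the first frame is removed, each frame is removed instantly,
    and rate / frame_rate bits arrive between consecutive removals.
    """
    fullness = initial_fullness
    bits_per_interval = rate / frame_rate
    for index, bits in enumerate(frame_bits):
        if bits > fullness:
            return index  # not enough data to construct this frame in time
        fullness -= bits
        # Cap at the buffer size, as in a variable bit rate leaky bucket.
        fullness = min(buffer_size, fullness + bits_per_interval)
    return None


# Hypothetical stream: one large frame late in the sequence.
frames = [1000, 1000, 1000, 6000, 1000]
from_start = first_underflow(frames, 2000, 6000, 3000, 1)
after_seek = first_underflow(frames[2:], 2000, 6000, 3000, 1)
print(from_start, after_seek)
```

In this hypothetical stream, F=3000 is sufficient when decoding starts at the first frame, but the same F causes an underflow when decoding starts at the third frame, because less data has accumulated before the large frame is needed.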

(12) Based upon this previously unrealized underflow potential, the present inventor then came to the realization that if only a set of R, B, and F values are defined for an entire video segment, then the system should wait until the buffer B for the corresponding rate R is full or substantially full (or greater than 90% full) to start decoding frames when a user jumps to an access point. In this manner, the initial fullness of the buffer will be at a maximum and thus there is no potential of underflow during subsequent decoding starting from the access point. This may be achieved without any additional changes to the existing bit stream, thus not impacting existing systems. Accordingly, the decoder would use the value of initial buffering B.sub.j for any point the user seeks to when the rate is R.sub.j, as shown in FIG. 5. However, this unfortunately sometimes results in a significant delay until video frames are presented after selecting a different location (e.g., access point) from which to present video.

(13) The initial buffer fullness (F) may likewise be characterized as a delay until the video sequence is presented (e.g., initial_cpb_removal_delay). The delay is temporal in nature, being related to the time necessary to achieve initial buffer fullness (F). The delay and/or F may be associated with the entire video or the access points. It is likewise to be understood that delay may be substituted for F in all embodiments described herein (e.g., (R, B, delay)). One particular value for the delay may be calculated as delay=F/R, expressed in a special time unit (units of a 90 kHz clock).
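The relation delay=F/R can be sketched as follows, assuming F is given in bits and R in bits per second. The function names and the rounding to whole clock ticks are illustrative assumptions.

```python
CLOCK_HZ = 90_000  # ticks per second of the 90 kHz clock


def fullness_to_delay_ticks(fullness_bits, rate_bps):
    """Convert an initial buffer fullness F (bits) to a delay in 90 kHz ticks."""
    return round(fullness_bits * CLOCK_HZ / rate_bps)


def delay_ticks_to_fullness(ticks, rate_bps):
    """Convert a delay in 90 kHz ticks back to a buffer fullness in bits."""
    return ticks * rate_bps / CLOCK_HZ


# Hypothetical values: 500,000 bits at 1 Mbit/s is half a second of buffering.
ticks = fullness_to_delay_ticks(500_000, 1_000_000)
print(ticks)
```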

(14) To reduce the potential delay the present inventor came to the realization that sets of (R, B, F) may be defined for a particular video stream at each access point. Referring to FIG. 6, these sets of (R, B, F) are preferably defined in a forward looking fashion for the particular video stream. For example, a set of (R, B, F) values may be computed in the previously existing manner for the video stream as a whole. In addition, a set of F values for the same (R, B) values as those for the whole video stream may be computed in the previously existing manner for the video stream from position “2” looking forward, etc. The same process may be used for the remaining access points. The access points may be any frame within the video sequence, I frames of the sequence, B frames of the sequence, or P frames of the sequence (I, B, and P frames are typically used in MPEG based video encoding). Accordingly, the user may select one of the access points and thereafter use the respective F.sub.ij for the desired initial fullness (assuming that the buffer B.sub.j and rate R.sub.j remain unchanged) or otherwise a set of two or more of R.sub.i, B.sub.i, F.sub.ij.
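One possible way to obtain forward looking F values for every candidate access point at a fixed (R, B) is a backward pass over the frame sizes. This is a sketch under a simplified buffer model (instantaneous frame removal, fullness capped at B) and is not necessarily the computation contemplated above; the function name and sample data are hypothetical.

```python
def forward_looking_fullness(frame_bits, rate, buffer_size, frame_rate):
    """Return, for each frame k, the smallest initial fullness (bits) that
    avoids underflow when decoding starts at frame k and runs to the end."""
    bits_per_interval = rate / frame_rate
    need = [0.0] * (len(frame_bits) + 1)  # need[n] = 0 after the last frame
    for i in range(len(frame_bits) - 1, -1, -1):
        # Frame i itself must be present, plus whatever the later frames
        # need beyond what arrives during the next frame interval.
        need[i] = frame_bits[i] + max(0.0, need[i + 1] - bits_per_interval)
        if need[i] > buffer_size:
            raise ValueError("no underflow-free fullness fits in this buffer")
    return need[:-1]


# Hypothetical stream with one large frame late in the sequence.
per_point = forward_looking_fullness([1000, 1000, 1000, 6000, 1000], 2000, 6000, 1)
print(per_point)
```

Under these assumptions the required fullness rises as the start point approaches the large frame, which illustrates why a single F computed for the whole stream may be too small at a later access point.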

(15) The sets of R, B, F values for each access point may be located at any suitable location, such as for example, at the start of the video sequence together with sets of (R, B, F) values for the entire video stream or before each access point which avoids the need for an index; or stored in a manner external to the video stream itself which is especially suitable for a server/client environment.

(16) This technique may be characterized by the following model:
(R.sub.1, B.sub.1, F.sub.1, M.sub.1, f.sub.11, t.sub.11, . . . , f.sub.M11, t.sub.M11) . . . , (R.sub.N, B.sub.N, F.sub.N, M.sub.N, f.sub.1N, t.sub.1N, . . . , f.sub.MNN, t.sub.MNN),
where f.sub.kj denotes the initial buffer fullness value at rate R.sub.j at access point t.sub.kj (time stamp), for k = 1, . . . , M.sub.j. The values of M.sub.j may be provided as an input parameter or may be automatically selected.
For example, M.sub.j may include the following options: (a) M.sub.j may be set equal to the number of access points. In this manner the values of f.sub.kj may be stored for each access point at each rate R.sub.j (either at the start of the video stream, within the video stream, distributed through the video stream, or otherwise in any location). (b) M.sub.j may be set equal to zero if no seekability support is desired. (c) M.sub.j values for each rate R.sub.j may be automatically selected (described later).
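The model above may be represented in memory along the following lines. The class and field names are hypothetical and do not correspond to any bit stream syntax; M.sub.j is simply derived as the number of stored (t.sub.kj, f.sub.kj) pairs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LeakyBucket:
    """One (R_j, B_j, F_j) set plus its per-access-point fullness values."""
    rate: float          # R_j, bits per second
    buffer_size: float   # B_j, bits
    fullness: float      # F_j, bits, for playback from the start
    # (t_kj, f_kj) pairs: time stamp and initial fullness at that access point
    access_points: List[Tuple[float, float]] = field(default_factory=list)

    @property
    def m(self) -> int:
        """M_j: number of access points stored for this rate."""
        return len(self.access_points)


# Option (b): no seekability support, so M_j is zero.
plain = LeakyBucket(rate=1_000_000, buffer_size=2_000_000, fullness=500_000)
# Option (a): one entry per access point (values are made up).
seekable = LeakyBucket(1_000_000, 2_000_000, 500_000,
                       access_points=[(0.0, 500_000), (10.0, 1_200_000)])
print(plain.m, seekable.m)
```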

(17) The system may, for a given R.sub.j, use an initial buffer fullness equal to f.sub.kj if the user seeks an access point t.sub.kj. This occurs when the user selects to start at an access point, or otherwise the system adjusts the user's selection to one of the access points.

(18) It is noted that in the case that a variable bit rate (in bit stream) is used, the initial buffer fullness value (or delay) is preferably different than the buffer size, albeit it may be the same. In the case of variable bit rate in MPEG-2, the VBV buffer is filled until it is full, i.e., F=B (the value of B is represented by vbv_buffer_size).

(19) If the system permits the user to jump to any frame of the video in the manner of an access point, then the decoding data set would need to be provided for each and every frame. While permissible, the resulting data set would be excessively large and consume a significant amount of the bitrate available for the data. A more reasonable approach would be to limit the user to specific access points within the video stream, such as every second, 10 seconds, 1 minute, etc. While an improvement, the resulting data set may still be somewhat extensive resulting in excessive data for limited bandwidth devices, such as mobile communication devices.

(20) In the event that the user selects a position that is not one of the access points with an associated data set, then the initial buffer fullness may be equal to max(f.sub.kj, f.sub.(k+1)j) for a time between t.sub.kj and t.sub.(k+1)j, especially if the access points are properly selected. In this manner, the system is guaranteed of having a set of values that will be free from resulting in an underflow condition, or otherwise reduce the likelihood of an underflow condition, as explained below.
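The lookup described above can be sketched as follows for a single rate R.sub.j, assuming the access point time stamps are stored in ascending order. The function name and the choice to reject seek times outside the stored range are assumptions.

```python
from bisect import bisect_right


def initial_fullness_for_seek(times, fullness, seek_time):
    """Return f_kj at an access point, or max(f_kj, f_(k+1)j) between two."""
    if not times or seek_time < times[0] or seek_time > times[-1]:
        raise ValueError("seek time outside the stored access points")
    k = bisect_right(times, seek_time) - 1  # last access point at or before seek_time
    if times[k] == seek_time:
        return fullness[k]
    # Strictly between t_k and t_(k+1): use the larger of the two values.
    return max(fullness[k], fullness[k + 1])


# Hypothetical access points at 0, 10, and 20 seconds.
times = [0, 10, 20]
values = [3000, 5000, 2000]
print(initial_fullness_for_seek(times, values, 15))
```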

(21) To select a set of values that will ensure no underflow condition (or otherwise reduce its likelihood) when the above-referenced selection criteria are used, reference is made to FIG. 7. FIG. 7 illustrates the initial buffer fullness (in bits) for a video segment, where the forward looking initial buffer fullness is calculated for 10 second increments. The system preferably selects an access point at the start of the video sequence and an access point at the end of the video segment. Between the start and the end of the video segment, the system selects the local maximums to include as access points. Also, the system may select the local minimums to include as access points. Preferably, if a limited set of access points is desired, the system first selects the local maximums, then the local minimums, which helps to ensure no underflow. Thereafter, the system may further select intermediate points, as desired.
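The selection order described above (start and end points, then local maximums, then local minimums) can be sketched as follows. The function name, the handling of plateaus, and the rule of taking extrema in time order until a limit is reached are assumptions; the sample values are made up and are not those of FIG. 7.

```python
def select_access_points(fullness, limit):
    """Pick up to `limit` sample indices: endpoints first, then local
    maximums, then local minimums, each taken in time order."""
    n = len(fullness)
    if n < 2:
        return list(range(n))
    interior = range(1, n - 1)
    maxima = [i for i in interior
              if fullness[i - 1] < fullness[i] >= fullness[i + 1]]
    minima = [i for i in interior
              if fullness[i - 1] > fullness[i] <= fullness[i + 1]]
    chosen = [0, n - 1]
    for i in maxima + minima:  # maximums are considered before minimums
        if len(chosen) >= limit:
            break
        chosen.append(i)
    return sorted(chosen)


# Hypothetical forward looking fullness sampled at regular increments.
samples = [3, 5, 4, 2, 6, 1, 4]
print(select_access_points(samples, 4))
print(select_access_points(samples, 10))
```

When the endpoints and every local maximum are selected, the curve between two neighboring selected points never rises above the larger of its two endpoints, which is why taking the maximum of the two neighboring values is safe in that case. If the limit cuts off some local maximums, that property may no longer hold.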

(22) Based upon the selection criteria a set of 10 points for FIG. 7 may be selected as indicated in FIG. 8. Referring to FIG. 9, the 10 selected points are shown by the dashed curve. The resulting initial buffer fullness values at all access points are shown by the solid curve. The solid curve illustrates a “safe” set of values for all points in the video so that the decoder buffer will not underflow. If extreme fluctuations occurred in the bit rate of the actual bit stream that were not detected in the processing, such as a sharp spike, then an underflow could result, though this is normally unlikely. The optimal initial buffer fullness values at all access points are shown by the dash-dotted curve. A significant reduction in the buffering time delay is achieved, in contrast to requiring a full buffer when accessing an access point, as illustrated in FIG. 10.
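The size of the delay reduction can be estimated with a short sketch, assuming the buffering delay at rate R is F/R seconds and comparing a full buffer (F=B) against a per-access-point fullness value. The function names and numbers are hypothetical and are not read from FIG. 10.

```python
def seek_delay_seconds(fullness_bits, rate_bps):
    """Time needed to accumulate the given fullness at the channel rate."""
    return fullness_bits / rate_bps


def delay_reduction_seconds(buffer_size_bits, access_fullness_bits, rate_bps):
    """Delay saved by starting at the access point fullness instead of a full buffer."""
    full_wait = seek_delay_seconds(buffer_size_bits, rate_bps)
    reduced_wait = seek_delay_seconds(access_fullness_bits, rate_bps)
    return full_wait - reduced_wait


# Hypothetical values: B = 2 Mbit, access point fullness = 0.5 Mbit, R = 1 Mbit/s.
saved = delay_reduction_seconds(2_000_000, 500_000, 1_000_000)
print(saved)
```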

(23) In addition, if the bit rate and the buffer size remain the same while selecting a different access point, then merely the modified buffer fullness, F, needs to be provided or otherwise determined.

(24) All the references cited herein are incorporated by reference.

(25) The terms and expressions that have been employed in the foregoing specification are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims that follow.