ADAPTIVE BIT RATE RATIO CONTROL
20220116626 · 2022-04-14
CPC classification: H04N19/115, H04N19/14, H04N7/24 (Electricity)
International classification: H04N19/115, H04N19/14 (Electricity)
Abstract
A system for encoding a sequence of frames of a data signal. The system comprises: a first encoding system comprising at least: a first encoder configured to encode the sequence of frames according to a first encoding algorithm; and a first rate control unit configured to control a first bit rate at which the first encoder encodes said sequence of frames; a second encoding system comprising at least: a second encoder configured to encode a second sequence of frames associated with the sequence of frames according to a second encoding algorithm; and a second rate control unit configured to control a second bit rate at which the second encoder encodes said second sequence of frames associated with the sequence of frames.
Claims
1. (canceled)
2. A system for encoding a sequence of frames of an input data signal, the system comprising: a second encoding system comprising at least: a down-scaler to down scale the sequence of frames of the input data signal to generate a first sequence of frames; an interface to enable communications between the second encoding system and a first encoding system, wherein the first encoding system comprises at least a first encoder configured to encode the first sequence of frames according to a first encoding algorithm and a first rate control unit to control a first bit rate at which the first encoder encodes the first sequence of frames, wherein the first encoding system outputs a first stream corresponding to an encoded version of the first sequence of frames, wherein the second encoding system further comprises at least: an up-scaler to up scale a second stream corresponding to a decoded version of the first stream to generate a second sequence of frames associated with the first sequence of frames; a second encoder configured to encode data derived from the second sequence of frames according to a second encoding algorithm to output a third stream; a second rate control unit to control a second bit rate at which the second encoder encodes the data derived from the second sequence of frames; and a configuration manager configured to receive configuration parameters and to control the first encoding system and the second encoding system, wherein the configuration manager is configured to communicate with the first rate control unit and the second rate control unit to maintain a specified combined transmission rate for the first and third streams.
3. The system of claim 2, wherein the second encoding system is configured to encode a first difference between the sequence of frames of the input data signal and the second sequence of frames.
4. The system of claim 3, wherein the second encoding system is configured to compute a second difference between the first sequence of frames and the second stream and to further encode the second difference to generate an encoded correction data stream, the encoded correction data stream forming part of the third stream output by the second encoder.
5. The system of claim 4, wherein the second encoding system is configured to decode the encoded correction data stream to generate a decoded correction data stream, the decoded correction data stream being summed with the second stream prior to up-scaling by the up-scaler.
6. The system of claim 2, wherein the first encoding system comprises a legacy first encoding system and the system comprises a plug-in for the legacy first encoding system.
7. The system of claim 2, wherein the configuration manager is configured to measure a temporal and/or spatial complexity of frames of the input data signal and to dynamically control the first and second bit rates based on the measured temporal and/or spatial complexity.
8. The system of claim 7, wherein the temporal and/or spatial complexity is based on one or more measures of entropy for one or more frames of the input data signal.
9. The system of claim 2, wherein a minimum value for a Quantization Parameter (QP) to be used by the first encoder is communicated from the second encoding system.
10. The system of claim 9, wherein the value of the Quantization Parameter (QP) is lowered for more complex frames and raised for less complex frames.
11. The system of claim 2, wherein the first rate control unit is configured to send an instruction via the interface to the second rate control unit indicating a bit rate available for the second encoder.
12. The system of claim 11, wherein the second encoding system is configured to send a signal via the interface to the first encoding system indicating one or more of a request for a bit rate to be allocated to the second encoder and a measure of complexity for a frame.
13. The system of claim 2, wherein the configuration manager is configured to control a size of filler data inserted into the first stream, wherein the third stream is used in place of the filler data within a combined stream comprising the first and third streams.
14. The system of claim 2, wherein the configuration manager is configured to dynamically control a ratio of the first bit rate to the second bit rate.
15. A method for encoding a sequence of frames of an input data signal, the method comprising: downscaling the sequence of frames of the input data signal to generate a first sequence of frames; receiving a decoding of an encoding of the first sequence of frames, the encoding being generated using a first encoding system; upscaling data derived from said decoding to generate a second sequence of frames associated with the first sequence of frames; and encoding data derived from the second sequence of frames using a second encoding system, wherein the first encoding system is adapted to generate a first encoded data stream according to a first bit rate, wherein the second encoding system is adapted to generate a second encoded data stream according to a second bit rate, and wherein the method further comprises: receiving configuration parameters to control the first encoding system and the second encoding system; and communicating with the first encoding system and the second encoding system to maintain a specified combined transmission rate for the first and second encoded data streams.
16. The method of claim 15, comprising: determining a first difference between the sequence of frames of the input data signal and the second sequence of frames; encoding the first difference as part of the second encoded data stream; determining a second difference between the first sequence of frames and the decoding of the encoding of the first sequence of frames; and encoding the second difference as part of the second encoded data stream.
17. The method of claim 16, wherein a decoding of the second difference is added to the decoding of the encoding of the first sequence of frames prior to upscaling.
18. The method of claim 15, further comprising: measuring a temporal and/or spatial complexity for frames of the input data signal; and dynamically controlling the first and second bit rates based on the measured temporal and/or spatial complexity.
19. The method of claim 15, further comprising: communicating a minimum value for a Quantization Parameter (QP) to be used by the first encoding system from the second encoding system, wherein the value of the Quantization Parameter (QP) is lowered for more complex frames and raised for less complex frames.
20. The method of claim 15, further comprising: dynamically controlling a ratio of the first bit rate to the second bit rate.
21. A non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: receive configuration parameters to control a first encoding system and a second encoding system; downscale a sequence of frames of an input data signal to generate a first sequence of frames; receive a decoding of an encoding of the first sequence of frames, the encoding being generated using the first encoding system; upscale data derived from said decoding to generate a second sequence of frames associated with the first sequence of frames; and encode data derived from the second sequence of frames using the second encoding system, wherein the first encoding system is adapted to generate a first encoded data stream according to a first bit rate, wherein the second encoding system is adapted to generate a second encoded data stream according to a second bit rate, and wherein the first bit rate of the first encoding system and the second bit rate of the second encoding system are controlled based on the configuration parameters to maintain a specified combined transmission rate for the first and second encoded data streams.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0021] Referring to
[0022] At this point, a difference is taken between the data stream and the up-scaled data stream to produce a difference data stream. The difference data stream is encoded using a second encoder, which typically uses a second encoding algorithm. The second encoding algorithm takes the difference data stream, applies a specific set of transformation matrices, and encodes the resulting transformed data stream using an entropy encoder to produce an encoded reconstruction data stream. The second encoding algorithm may also include a quantization process before the entropy encoder. A full discussion of this second encoding algorithm, how it works, and what type of matrices are used is given in International patent application Pub. No. WO 2013/171173, which is incorporated herein by reference. The reconstruction data stream can also include information on how to reconstruct a rendition of the data stream starting from a decoded version of the encoded data stream.
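A minimal sketch of the residual path, under stated assumptions: frames are small integer grids, down-scaling averages 2x2 blocks, up-scaling replicates pixels, and the first encoder is treated as lossless so the up-scaled frame is built directly from the down-scaled one. The transformation matrices of WO 2013/171173 are not reproduced here; a plain 2x2 Hadamard transform stands in for them, followed by a simple truncating quantizer. All function names are hypothetical, and the entropy-coding stage is omitted.

```python
from typing import List, Tuple

Frame = List[List[int]]


def downscale(frame: Frame) -> Frame:
    """Halve width and height by averaging each 2x2 block (integer mean)."""
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            for x in range(0, len(frame[0]), 2)
        ]
        for y in range(0, len(frame), 2)
    ]


def upscale(frame: Frame) -> Frame:
    """Double width and height by nearest-neighbour pixel replication."""
    out: Frame = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def residual(original: Frame, upscaled: Frame) -> Frame:
    """Difference data: original minus up-scaled prediction, pixel by pixel."""
    return [[o - u for o, u in zip(ro, ru)] for ro, ru in zip(original, upscaled)]


def hadamard_2x2(a: int, b: int, c: int, d: int) -> Tuple[int, int, int, int]:
    """Unnormalised 2x2 Hadamard transform of the block [[a, b], [c, d]]:
    sum, left-right difference, top-bottom difference, diagonal difference."""
    return (a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d)


def transform_and_quantize(res: Frame, step: int) -> List[Tuple[int, ...]]:
    """Transform each 2x2 residual block and quantize by truncating division."""
    coeffs = []
    for y in range(0, len(res), 2):
        for x in range(0, len(res[0]), 2):
            t = hadamard_2x2(res[y][x], res[y][x + 1], res[y + 1][x], res[y + 1][x + 1])
            coeffs.append(tuple(int(c / step) for c in t))
    return coeffs
```

In a real encoder the up-scaled prediction would come from the decoded output of the first encoder, so the residual would also absorb the first encoder's coding errors.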
[0023] The reconstruction data stream can be provided to a supplemental enhancement information (SEI) unit to produce a single elementary stream. Typically, the SEI unit is adapted to combine two or more SEI-encapsulated streams into a single elementary stream, in the present case the reconstruction data stream and the encoded data stream. The encapsulation may be performed by the SEI unit directly, or the streams may be encapsulated before being fed to the SEI unit. The SEI mechanism is defined in standards such as ITU-T H.264. Alternatively (or in addition), the encoded data stream and the reconstruction data stream are multiplexed together by a transport stream multiplexer to produce a multiplexed transport data stream. Alternatively, or in addition (but not shown in the Figure), the single elementary stream and the reconstruction data stream can be multiplexed together by the transport stream multiplexer (or another transport stream multiplexer) to produce a multiplexed transport data stream.
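As an illustration of SEI encapsulation, the sketch below wraps an arbitrary payload (such as a chunk of the reconstruction data stream) in an H.264 SEI NAL unit using the user_data_unregistered payload (payloadType 5), which carries a 16-byte UUID followed by free-form bytes. The description does not say which SEI payload type is used, so that choice, the UUID, and the helper names are assumptions; only a single SEI message per NAL unit is handled.

```python
import uuid

USER_DATA_UNREGISTERED = 5  # H.264 SEI payloadType for user_data_unregistered


def sei_varlen(value: int) -> bytes:
    """Code payloadType/payloadSize as in sei_message(): one 0xFF byte per 255, then the rest."""
    out = bytearray()
    while value >= 255:
        out.append(0xFF)
        value -= 255
    out.append(value)
    return bytes(out)


def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 wherever two zero bytes would be followed by a byte <= 0x03."""
    out = bytearray()
    zeros = 0
    for byte in rbsp:
        if zeros >= 2 and byte <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def wrap_in_sei_nal(payload: bytes, tag: uuid.UUID) -> bytes:
    """Annex B NAL unit (start code + header) carrying one user_data_unregistered SEI."""
    body = tag.bytes + payload  # 16-byte uuid_iso_iec_11578, then the user data
    rbsp = (sei_varlen(USER_DATA_UNREGISTERED) + sei_varlen(len(body))
            + body + b"\x80")  # 0x80 = rbsp_stop_one_bit followed by alignment zeros
    nal_header = b"\x06"  # forbidden_zero_bit=0, nal_ref_idc=0, nal_unit_type=6 (SEI)
    return b"\x00\x00\x00\x01" + nal_header + add_emulation_prevention(rbsp)
```

A real SEI unit would interleave such NAL units with the first encoder's access units; that multiplexing step is not shown.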
[0024] The reconstruction data stream provides some significant advantages, some of which are explained here. A first advantage is that it allows a video stream of a first quality and/or resolution (e.g., High Definition, HD) to be reproduced after decoding at the decoder side, starting from a video stream of a second quality and/or resolution (e.g., Standard Definition, SD) which would otherwise be obtained by decoding only the encoded data stream produced by the first encoding system, the first quality and/or resolution being higher than the second quality and/or resolution. The overall bit rate used by the combination of the reconstruction data stream and the encoded data stream is lower than the bit rate which the first encoder would require to produce, on its own, an encoded data stream which, when decoded and played at the decoder side, would result in a video stream of quality and/or resolution comparable to the first quality and/or resolution. Another advantage is that it allows backward compatibility with existing decoding systems and/or compatibility with multiple device types, whereby one existing decoding system could decode based only on the encoded data stream and another decoding system could decode based on both the encoded data stream and the reconstruction data stream.
[0025] The multiplexed data stream and/or the single elementary stream is then sent over a transmission channel (e.g., over the air, cable, etc.) to a decoding system (not shown), which uses the combination of the encoded data stream and the reconstruction data stream to reconstruct a rendition of the original data stream. In particular, the encoded data stream is provided to a first decoder, which decodes it using a decoding mechanism corresponding to the encoding mechanism used by the first encoder (e.g., a standard-based MPEG decoding algorithm such as H.264). The decoded stream is then used as a “base” layer by a second decoder, which combines it with the reconstruction data stream to reconstruct a rendition of the original data stream. The second decoder uses a decoding mechanism corresponding to the encoding mechanism used by the second encoder. More details on the encoding and decoding mechanisms, as well as on the overall mechanisms described above, can be found in International patent application Pub. No. WO 2014/170819, which is incorporated herein by reference.
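On the decoder side, the combination step can be sketched as follows, assuming the same pixel-replication up-scaler as the encoder and an already-decoded (de-quantized, inverse-transformed) residual. The names are hypothetical; a real second decoder would first entropy-decode and inverse-transform the reconstruction data stream.

```python
from typing import List

Frame = List[List[int]]


def upscale(frame: Frame) -> Frame:
    """Nearest-neighbour 2x up-scaler; must match the encoder-side up-scaler."""
    out: Frame = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def reconstruct(decoded_base: Frame, decoded_residual: Frame) -> Frame:
    """Up-scale the decoded base layer and add the decoded residual pixel by pixel."""
    prediction = upscale(decoded_base)
    return [
        [p + r for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(prediction, decoded_residual)
    ]
```

If the residual was transmitted without quantization, the reconstruction matches the original exactly; with quantization it is an approximation whose quality depends on the second bit rate.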
[0026] The ratio between the bit rate allocated to and/or associated with the first data stream (e.g., the encoded data stream) encoded with the first encoding system and the bit rate allocated to and/or associated with the second data stream (e.g., the reconstruction data stream) encoded with the second encoding system is an important factor in ensuring that the quality of the reconstructed rendition of the original data stream is optimised while the bit rate of the multiplexed data stream is kept under control. A typical range of ratio values is between 60:40 and 90:10, where the first number indicates the percentage of bit rate allocated to and/or associated with the first data stream and the second number indicates the percentage of bit rate allocated to and/or associated with the second data stream. One possible ratio value is 70:30.
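The ratio arithmetic is straightforward; a sketch, assuming the ratio is expressed as the first stream's integer percentage share of a combined bit rate in bit/s (the function names are illustrative only):

```python
from typing import Tuple


def split_bit_rate(total_bps: int, first_share_pct: int) -> Tuple[int, int]:
    """Split a combined bit rate into (first stream, second stream) by percentage."""
    if not 0 <= first_share_pct <= 100:
        raise ValueError("first_share_pct must be between 0 and 100")
    first_bps = total_bps * first_share_pct // 100
    return first_bps, total_bps - first_bps


def in_typical_range(first_share_pct: int) -> bool:
    """True if the ratio lies within the 60:40 to 90:10 range cited above."""
    return 60 <= first_share_pct <= 90
```

For example, a 70:30 split of a 5 Mbit/s channel gives 3.5 Mbit/s to the encoded data stream and 1.5 Mbit/s to the reconstruction data stream.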
[0027] In the specific example of
[0028] Importantly, the rate control unit typically interacts with the first encoder to ensure that the encoded data stream maintains a certain reference bit rate (e.g., a Constant Bit Rate, CBR). Rate control units are well known in the art. An example of how they work is described in J. Ribas-Corbera et al., “A Generalized Hypothetical Reference Decoder for H.264/AVC”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, Jul. 2003, whose contents are incorporated herein by reference. In particular, this paper describes the concept/strategy of the leaky bucket. This rate control technique compares the rate at which an encoder generates encoded data with the specific rate (e.g., a CBR) at which that encoded data is sent. Since the rate at which the encoder generates data typically varies over time while the data is sent at said specific rate, the amount of encoded data waiting to be sent may fall below a threshold. For example, if the specific rate is a CBR of 3 Mbit/s, and the encoder generates 6 Mbit during the first two seconds and 10 Mbit during the next four seconds, then during the first two seconds the bucket remains at the level it was at before those 6 Mbit of encoded data were generated (since 6 Mbit are also transmitted at the constant bit rate of 3 Mbit/s), but during the next four seconds the level of encoded data in the bucket decreases by 2 Mbit (since 12 Mbit of encoded data are transmitted while “only” 10 Mbit are generated over the same period). If the level of encoded data available for transmission falls below a certain threshold, the rate control unit would instruct the encoder to “compensate” for this shortfall by generating more encoded bits. This could be done, for example, by changing encoding parameters such as the QP so that the encoder uses more bits for encoding.
If the encoder is not capable of generating enough additional bits to raise the level of available encoded bits above the threshold, the rate control unit would enable the generation of default bits (e.g., zeros) to “fill” the bucket of available data, thus ensuring that the constant transmission bit rate is maintained. Conversely, a higher threshold could also be monitored to avoid having too many available encoded bits compared to the rate at which they can be transmitted, since that would create a problem at the decoder side, where frames to be decoded could consequently be dropped. For example, if the level of encoded data available for transmission goes above a certain threshold, the rate control unit would instruct the encoder to generate fewer encoded bits by, for example, changing the QP so that a coarser quantization is performed.
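The worked example above can be reproduced with a toy leaky-bucket model. This is a simplified sketch, not the hypothetical reference decoder of the cited paper: the bucket level simply rises by the bits generated and falls by the bits sent at the constant rate, and the two thresholds and the 4 Mbit starting level are arbitrary assumptions.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Toy CBR buffer model: level rises with generated bits, falls with sent bits."""
    rate_mbit_s: float   # constant transmission rate
    level_mbit: float    # encoded data currently waiting to be sent
    low_mbit: float      # below this, ask for more bits (or insert filler)
    high_mbit: float     # above this, ask for fewer bits

    def step(self, generated_mbit: float, seconds: float) -> str:
        self.level_mbit += generated_mbit - self.rate_mbit_s * seconds
        if self.level_mbit < self.low_mbit:
            # Lower QP (finer quantization, more bits); insert filler if that is not enough.
            return "lower_qp_or_fill"
        if self.level_mbit > self.high_mbit:
            # Raise QP (coarser quantization, fewer bits).
            return "raise_qp"
        return "ok"


bucket = LeakyBucket(rate_mbit_s=3.0, level_mbit=4.0, low_mbit=3.0, high_mbit=8.0)
print(bucket.step(6.0, 2.0), bucket.level_mbit)   # ok 4.0
print(bucket.step(10.0, 4.0), bucket.level_mbit)  # lower_qp_or_fill 2.0
```

With a 3 Mbit/s CBR, the first period leaves the level unchanged and the second drops it by 2 Mbit, below the assumed 3 Mbit low threshold, which triggers the compensation path.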
[0029] During experimental tests performed on video signals (e.g., a sequence of video frames), it was observed that in scenes with high spatial complexity and low temporal complexity the first encoder required a low bit rate; therefore only a small percentage of the available bit rate should be given to the first encoder, allowing the second encoder to use a higher bit rate by taking all or part of the remaining bit rate. On the other hand, in scenes with low spatial complexity but high temporal complexity it is best to give most of the available bit rate to the first encoder. In general, spatial complexity is inversely related to the spatial correlation present within a single frame. In other words, high spatial complexity means that the elements in the scene are less correlated (e.g., a scene with a large number of non-repetitive details, such as a crowd at a football stadium), whereas low spatial complexity means that the elements in the scene are more correlated. Similarly, temporal complexity is inversely related to the temporal correlation between frames. In other words, high temporal complexity means that consecutive frames are less correlated (e.g., they differ significantly because many elements change position and/or shape), whereas low temporal complexity means that consecutive frames are more correlated.
[0030] One reason for this behaviour is that the first encoder may be optimised to exploit the underlying temporal correlation in a sequence of frames efficiently, which typically results in a better compression rate. For example, in the case of MPEG-based encoding algorithms, a sequence of frames is typically encoded as Groups of Pictures (GOPs), in which an initial I-frame (i.e., a frame which can be decoded using only data encoded for that specific frame) is followed by a series of P-frames (i.e., frames which require data encoded from previous frames for decoding, but which therefore allow higher compression rates than I-frames) and/or B-frames (i.e., frames which require data encoded from previous and subsequent frames for decoding, but which therefore allow higher compression rates than I-frames and P-frames). The more strongly successive frames are correlated, the more P-frames and B-frames can be used, and therefore the lower the bit rate the encoder requires. On the other hand, the second encoder is usually one that exploits spatial correlation; therefore, when the spatial correlation is higher, giving a higher bit rate to the second encoder increases the quality of the reconstructed rendition of the original data stream by allowing more bits for the second data stream.
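The observations above amount to a mapping from scene complexity to a bit-rate split. A minimal sketch, assuming complexity has already been classified as low or high; apart from the 85% figure discussed in the next paragraph, the 70:30 example ratio, and the 60:40 lower end of the typical range, the mapping is an illustrative assumption rather than a rule from the description:

```python
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


def first_encoder_share(spatial: Level, temporal: Level) -> int:
    """Percentage of the combined bit rate for the first encoder (illustrative values)."""
    if temporal is Level.HIGH:
        # Temporally complex: the first encoder needs most of the rate.
        return 85
    if spatial is Level.HIGH:
        # Sharp but static: the first encoder needs little, so favour the second encoder
        # (60 is the lowest first-encoder share in the typical 60:40 to 90:10 range).
        return 60
    # Neither dimension complex: fall back to the 70:30 example ratio.
    return 70
```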
[0031] A further observation is that, when low bit rates are available and/or used for transmission, a large proportion of the available and/or used bit rate should be given to the first encoder (for example, 85% or more) in order to achieve a sufficiently good level of quality for the reconstructed rendition of the original data stream in temporally complex scenes. However, that would mean setting a ratio, in the current example of 85:15, which in turn implies that when, for example, there is a scene change to a very sharp new scene (i.e., high spatial complexity) with low temporal complexity, 15% of the bit rate is likely not to be enough for the second encoder to provide sufficient reconstruction data in the reconstruction data stream to obtain a sufficiently good level of quality for the reconstructed rendition of the original data stream. At the same time, because of the low temporal complexity, the first encoder would only be using a small portion of the bits it is allowed, and would generate filler in order not to underflow the decoder buffer. This filler corresponds to “wasted” bits, which are inserted in the bit stream only to keep a constant bit rate, without any further benefit for the encoded sequence.
[0032] Accordingly, a possible solution would be to prevent the first encoder from putting the filler in the bit stream, and instead to add bits from the reconstruction data stream, thereby decreasing the ratio and giving a bigger percentage of the available bit rate to the reconstruction data stream.
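A worked example of the filler reuse, assuming a combined rate of 4 Mbit/s, an 85:15 nominal split, and a first encoder that only needs 2.8 Mbit/s for a static scene (all three numbers are assumptions chosen for illustration; the function name is hypothetical):

```python
from typing import Tuple


def reuse_filler(total_bps: int, first_share_pct: int, first_used_bps: int) -> Tuple[int, int]:
    """Return (filler_bps, second_stream_bps) when the first encoder's unused budget,
    which would otherwise be filler, is handed to the reconstruction data stream."""
    first_budget = total_bps * first_share_pct // 100
    second_budget = total_bps - first_budget
    filler = max(0, first_budget - first_used_bps)
    return filler, second_budget + filler


filler, second = reuse_filler(4_000_000, 85, 2_800_000)
print(filler, second)  # 600000 1200000
```

Here 0.6 Mbit/s of would-be filler moves to the reconstruction data stream, so the effective split becomes 70:30 without changing the combined rate.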
[0033] In an exemplary embodiment as shown in
[0034] In particular, the rate control unit, based on information about the combined transmission of the encoded data (e.g., the encoded data stream and the reconstruction data stream), can determine that the portion of the encoded data stream which is, or is intended to be, used for filler could instead be used for additional data for the reconstruction data stream. For example, the rate control unit can instruct the second encoder to encode using more bits, thus increasing the bit rate of the reconstruction data stream and consequently improving the data stream once decoded. One mechanism for this determination is a “joint” leaky bucket, which is a modification of the leaky bucket described above. In particular, the “joint” leaky bucket is used to keep track of the combined encoded data available for transmission (i.e., both the available encoded data generated by the second encoder and the available encoded data generated by the first encoder) and to maintain a specific combined transmission rate (e.g., a CBR). As in the mechanism described above, the rate control unit using said “joint” leaky bucket would control the second bit rate of the second encoder in order to ensure that the combined transmission rate is maintained. In particular, if it detects that filler is, or is about to be, inserted by the first encoding system, it would instruct the second encoder to generate more bits (e.g., by using a finer quantization granularity) in order to use them in place of all or part of the filler. The rate control unit would then send a signal via the API instructing the first encoding system not to generate such filler. In addition, the rate control unit will also use the “joint” leaky bucket to determine whether to increase the number of bits generated by the second encoder when the level of combined available encoded data falls below a threshold.
This is because in that case the second encoder could “compensate” for the shortfall by generating more bits (e.g., by using a finer quantization granularity), thereby increasing the quality of the reconstruction data stream. If the second encoder is unable to further increase the bits used for encoding and the level of combined available encoded data is still below the threshold, the rate control unit may enable the generation of filler by the second encoding system. In this way, the timing and buffer management of the combined stream do not change, but the quality of the video played at the decoder is improved.
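The joint leaky bucket can be sketched per frame interval as below. This is a simplified model under assumptions: bits are counted per frame, the second encoder has a known per-frame ceiling, the threshold is an arbitrary parameter, and the names are hypothetical. It follows the order of decisions described above: first replace the first system's filler with reconstruction bits, then top up any shortfall with more reconstruction bits, and only then fall back to filler from the second encoding system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class JointBucket:
    """Toy joint buffer for the combined first + second streams (bits per frame)."""
    rate_bits: int   # combined CBR budget drained per frame interval
    level: int       # combined encoded bits waiting to be sent
    low: int         # lower threshold on the combined level

    def plan_frame(self, base_bits: int, base_filler: int,
                   enh_bits: int, enh_max_bits: int) -> Tuple[int, int, int]:
        """Return (enhancement bits, remaining first-system filler, second-system filler)."""
        # 1. Use reconstruction bits in place of the first system's filler,
        #    up to what the second encoder can still produce.
        swap = min(base_filler, enh_max_bits - enh_bits)
        enh_bits += swap
        base_filler -= swap
        # 2. Project the combined level after sending one interval at the CBR.
        projected = self.level + base_bits + base_filler + enh_bits - self.rate_bits
        # 3. On a shortfall, ask the second encoder for more bits first...
        shortfall = max(0, self.low - projected)
        extra = min(shortfall, enh_max_bits - enh_bits)
        enh_bits += extra
        # 4. ...and only then let the second encoding system insert filler.
        enh_filler = shortfall - extra
        self.level = projected + extra + enh_filler
        return enh_bits, base_filler, enh_filler
```

Because filler bits are replaced one-for-one by reconstruction bits, the combined level, and hence the stream timing, is the same as if the first system had inserted the filler itself.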
[0035] Alternatively (or in addition), the rate control unit may decide, based on the temporal and/or spatial complexity of the original data stream, that it would be beneficial to dedicate more bits to the second encoder so as to generate more bits for the reconstruction data stream. For example, as described above, if the scene has high spatial complexity and low temporal complexity, the rate control unit may decide to dedicate more bits to the second encoder. In that case, upon verifying, based on the additional information, that the first encoding system intends to use some filler, the rate control unit may send a signal via the API instructing the first encoding system not to generate such filler. This signal may be sent to the rate control unit of the first encoding system, which in turn may instruct the first encoder not to generate this filler. Alternatively, the rate control unit may send a signal via the API indicating to the first encoding system that a certain number of bits are required for the reconstruction data stream and should be “reserved” by the first encoding system. As a consequence, the first encoding system should only generate filler (if any) up to the difference between the size of the filler that the first encoder has added, or is planning to add, to said encoded data stream and the number of bits required for the reconstruction data stream.
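The reservation rule reduces to simple arithmetic. A sketch with hypothetical function names, assuming that a reservation is only requested for spatially complex, temporally simple scenes, as in the example above:

```python
from typing import Tuple


def apply_reservation(planned_filler_bits: int, reserved_bits: int) -> Tuple[int, int]:
    """Return (filler still inserted by the first system, bits freed for the second stream)."""
    freed = min(planned_filler_bits, reserved_bits)
    return planned_filler_bits - freed, freed


def reservation_request(spatial_high: bool, temporal_high: bool, required_bits: int) -> int:
    """Ask for a reservation only for sharp, static scenes (illustrative policy)."""
    return required_bits if spatial_high and not temporal_high else 0
```

The first system's remaining filler is never negative: if the reservation exceeds the planned filler, all of the filler is freed and no filler is inserted.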
[0036] The temporal and/or spatial complexity may be measured in various ways. One possible way is to measure the entropy associated with a frame and/or a sequence of frames within the original data stream. Entropy is defined as the expected value (e.g., average) of the information contained in a data set. For example, in the case of a frame, it can be seen as the expected value of the information contained in that frame, so that a frame with a high information content (e.g., a frame showing a crowd in a stadium) will have a high entropy. In the case of a sequence of frames, it can be seen as the expected value of the information across frames (e.g., high for a sequence of frames with fast-moving objects). The measured entropy is compared against a threshold: if it is above the threshold, the corresponding complexity is determined to be high; if it is below the threshold, the corresponding complexity is determined to be low. Of course, multiple thresholds could be used, for example two thresholds: a higher one above which complexity is determined to be high, and a lower one below which complexity is determined to be low. The measured entropy could refer to a spatial entropy (e.g., the entropy within a frame), a temporal entropy (e.g., the entropy between frames), or a combination of the two. The thresholds could then be adapted to the exact type of complexity being measured. For example, where the entropy is associated with both temporal and spatial complexity, there could be either a single combined set of thresholds (i.e., thresholds that take both spatial and temporal complexity into account, as would be the case for a combined entropy measure that is a function of both spatial entropy and temporal entropy) or two sets of thresholds, one per dimension of entropy (i.e., one set for spatial entropy and the other for temporal entropy).
The latter would apply to an entropy measure comprising two separate measurements, one of spatial entropy and one of temporal entropy.
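One way to obtain such measurements is Shannon entropy over pixel-value histograms. In the sketch below, spatial entropy is taken over a frame's pixel values and temporal entropy over the pixel-wise difference between consecutive frames; using the difference histogram as the temporal measure, the two-threshold classifier, and all names are assumptions, and the threshold values are left to the caller.

```python
import math
from collections import Counter
from typing import Iterable, List

Frame = List[List[int]]


def shannon_entropy(values: Iterable[int]) -> float:
    """Entropy in bits of the empirical distribution of the given values."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def spatial_entropy(frame: Frame) -> float:
    """Entropy of the pixel values within one frame."""
    return shannon_entropy(v for row in frame for v in row)


def temporal_entropy(prev: Frame, cur: Frame) -> float:
    # Assumption: temporal entropy measured on the pixel-wise frame difference.
    return shannon_entropy(c - p for rp, rc in zip(prev, cur) for p, c in zip(rp, rc))


def classify(entropy: float, low: float, high: float) -> str:
    """Two-threshold classification; values in between are 'medium'."""
    if entropy > high:
        return "high"
    if entropy < low:
        return "low"
    return "medium"
```

Keeping spatial and temporal entropy separate corresponds to the "one set of thresholds per dimension" option; a combined measure would instead feed a single value into one classifier.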
[0037] In a further embodiment, the signal may also include a minimum value for the Quantization Parameter (QP) to be used by the first encoder. The QP is used for quantizing data when performing lossy compression. This in turn enables control of the size of the data stream encoded by the first encoder and of the size of the filler, so that the bit rate of the reconstruction data stream can be managed in a manner that ensures a sufficiently good level of quality for the reconstructed rendition of the original data stream. For example, this mechanism could decrease the size of the filler and therefore free more bit rate for the reconstruction data stream, thus increasing the quality of the video played at the decoder side.
[0038] In a further embodiment, it was observed that for highly complex scenes (e.g., both temporally and spatially complex) both the first encoder and the second encoder would have difficulty encoding efficiently using a lossy compression mechanism, and the resulting encoded stream is therefore likely to produce artefacts when decoded. From a visual and perceptual perspective (e.g., from the perspective of a person viewing the decoded stream), temporal and low-resolution artefacts caused by the first encoder are more noticeable and affect the visual perception of the decoded stream, resulting in a perception of overall low quality and an unsatisfactory visual experience. Such artefacts are, for example, “blocky” scenes (i.e., scenes in which blocks of the frames either disappear or have visible boundaries) or black spots in the frame. These temporal and low-resolution artefacts degrade the visual perception of the decoded stream to a greater extent than a lack of high-resolution detail in the decoded stream (e.g., due to too little bit rate being given to the second encoder). Note that such high-resolution details are usually the details added when decoding the reconstruction data stream produced by the second encoding system. Accordingly, it was concluded that it may be better to use most of the combined bit rate for the first encoder (thus increasing the first bit rate), so that a lower QP can be used and more bits are given to the encoded data stream. Accordingly, when a scene is determined to be highly complex, a signal can be sent from the second encoding system to the first encoding system via the API, for example to lower the QP used by the first encoder. This signal can, for example, be sent as part of the signal described above.
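A sketch of how a minimum-QP hint might be derived and applied, assuming H.264-style QP values (0 to 51) where the quantizer step size doubles every 6 QP. The base value of 26, the offset of 4, and the function names are hypothetical; the description only states that the minimum QP is lowered for more complex frames and raised for less complex ones.

```python
# H.264 quantizer step sizes for QP 0..5; the step size doubles every 6 QP.
_QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantizer step size for an H.264 QP value."""
    return _QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def min_qp_hint(complexity: str, base_min_qp: int = 26, offset: int = 4) -> int:
    """Hypothetical policy: lower the floor for complex frames, raise it for simple ones."""
    if complexity == "high":
        return base_min_qp - offset
    if complexity == "low":
        return base_min_qp + offset
    return base_min_qp


def clamp_qp(requested_qp: int, min_qp: int, max_qp: int = 51) -> int:
    """QP the first encoder would actually use after applying the communicated floor."""
    return max(min_qp, min(requested_qp, max_qp))
```

A lower floor lets the first encoder quantize more finely (a floor of 22 allows a step size of 8.0 instead of 16.0 at QP 28), so it spends more of the combined bit rate on complex scenes, as described above.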
[0039] In an exemplary embodiment as shown in
[0041] Although at least some aspects of the examples described herein with reference to the drawings comprise computer processes performed in processing systems or processors, examples described herein also extend to computer programs, for example computer programs on or in a carrier, adapted for putting the examples into practice. The carrier may be any entity or device capable of carrying the program.
[0042] The use of a modular structure such as the one depicted in any of the Figures also provides an advantage from an implementation and integration point of view, enabling simple integration into, as well as compatibility with, legacy systems. By way of example, the second encoding system could be embodied as a plug-in (including libraries and/or source code) to existing firmware and/or software which already embodies a legacy first encoding system (for example, one that is already installed in legacy encoder systems). The first encoding system and the second encoding system may be embodied as a single system, or as two separate systems. In addition, the API may need to be included, and the necessary modifications to the first encoding system would need to be made based on the specific embodiment required. The first encoding system may provide the second encoding system with either the encoded data stream or the decoded version of the encoded data stream.
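The interface between the two systems could be expressed as a small set of messages. The sketch below is hypothetical: the message fields mirror signals mentioned in the claims and description (available bit rate, bit-rate request, frame complexity, minimum QP, filler suppression), but the class names and the toy adapter for a legacy first encoding system are assumptions, not a defined API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class BitRateGrant:
    """First -> second system: bit rate currently available to the second encoder."""
    available_bps: int


@dataclass
class EnhancementRequest:
    """Second -> first system: requests and hints (all fields optional)."""
    requested_bps: Optional[int] = None
    frame_complexity: Optional[float] = None
    min_qp: Optional[int] = None
    suppress_filler: bool = False


class FirstEncodingSystem(Protocol):
    def on_request(self, request: EnhancementRequest) -> BitRateGrant: ...


class LegacyAdapter:
    """Toy wrapper around a legacy first encoder with a fixed base-layer rate."""

    def __init__(self, total_bps: int, base_bps: int) -> None:
        self.total_bps = total_bps
        self.base_bps = base_bps

    def on_request(self, request: EnhancementRequest) -> BitRateGrant:
        spare = self.total_bps - self.base_bps
        if request.requested_bps is not None:
            spare = min(spare, request.requested_bps)
        return BitRateGrant(available_bps=spare)


adapter: FirstEncodingSystem = LegacyAdapter(total_bps=5_000_000, base_bps=3_500_000)
```

Keeping the messages as plain data makes the plug-in independent of how the legacy encoder is invoked; only the adapter needs to know the legacy system's internals.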
[0043] It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with at least one feature of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.