Robust packet loss handling in recording real-time video
10110930 · 2018-10-23
CPC classification: H04N19/166 · H04N21/4425 · H04N21/44004 · H04N19/188 · H04N19/107 (Section H, Electricity)
International classification: H04N5/92 · H04N21/44 · H04L1/16 · H04N19/107 · H04N19/166 · H04N19/86 · H04N19/169 (Section H, Electricity)
Abstract
Improved systems and methods of video decoding and recording in real-time video communications for use in lossy network environments. The disclosed systems and methods can employ a plurality of wait time thresholds for retransmission of missing video packets, based at least on the processing performed on the respective video packets, such processing including video decoding in a real-time video communication between client devices, and video recording and storing in a video file. The disclosed systems and methods can also adaptively perform error concealment on video frames in the bitstream domain prior to recording and storing encoded video frame data in a video file, based at least on estimates of the complexities of the respective video frames.
Claims
1. A method of video decoding and recording in real-time video communications in a lossy network environment, comprising: obtaining, at a video receiver, one or more video packets of a real-time video communication, the video receiver including a video decoder for decoding data of at least one video frame, and a video recorder for recording data of the at least one video frame; determining whether the one or more obtained video packets of the real-time video communication have one or more missing video packets, the one or more missing video packets being indicative of a possible eventual packet loss at the video receiver; having determined that the one or more obtained video packets of the real-time video communication have one or more missing video packets, transmitting, by the video receiver, at least one retransmission request for at least some of the missing video packets; monitoring a time elapsed since transmission of the at least one retransmission request; in the event the monitored time elapsed exceeds a first wait time threshold based on a first type of processing relating to the decoding of the data of the at least one video frame, determining a first occurrence of eventual packet loss relating to the decoding of the data of the at least one video frame; and in the event the monitored time elapsed exceeds a second wait time threshold based on a second type of processing relating to the recording of the data of the at least one video frame, determining a second occurrence of eventual packet loss relating to the recording of the data of the at least one video frame.
2. The method of claim 1 further comprising: having determined the first occurrence of eventual packet loss relating to decoding the data of the at least one video frame, transmitting, by the video receiver, at least one request for an intra-coded video frame; and directing the video decoder to at least temporarily stop the decoding of the data until receipt of the requested intra-coded video frame at the video receiver.
3. The method of claim 2 further comprising: upon the receipt of the requested intra-coded video frame, directing the video decoder to resume the decoding of the data of the at least one video frame in a real-time fashion.
4. The method of claim 1 further comprising: having determined the second occurrence of eventual packet loss relating to recording the data of the at least one video frame, transmitting, by the video receiver, at least one request for an intra-coded video frame; and directing the video recorder to at least temporarily stop the recording of the data until receipt of the requested intra-coded video frame at the video receiver.
5. The method of claim 4 further comprising: upon the receipt of the requested intra-coded video frame, directing the video recorder to resume the recording of the data of the at least one video frame in a non-real-time fashion.
6. The method of claim 1 wherein the second wait time threshold is greater than the first wait time threshold.
7. The method of claim 1 wherein the transmitting of the at least one retransmission request includes transmitting at least one Generic Negative Acknowledgement (GNACK) message.
8. The method of claim 1 wherein the transmitting of the at least one request for the intra-coded video frame includes transmitting at least one Picture Loss Indication (PLI) message.
9. The method of claim 1 further comprising: in the event the one or more obtained video packets of the real-time video communication do not have one or more missing video packets, reconstructing data of at least one encoded video frame from the one or more video packets, and decoding the data of the at least one encoded video frame.
10. The method of claim 9 further comprising: providing the at least one video frame in a video sequence for viewing on a display of a client device, or for re-encoding for transmission of re-encoded data to another client device.
11. The method of claim 1 further comprising: in the event the obtained video packets of the real-time video communication do not have one or more missing video packets, recording, by the video recorder, the data of the at least one video frame in a video file.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one or more embodiments described herein and, together with the Detailed Description, explain these embodiments. In the drawings:
DETAILED DESCRIPTION
(13) Improved systems and methods of video decoding and recording in real-time video communications are disclosed for use in lossy network environments. The disclosed systems and methods can employ a plurality of wait time thresholds for retransmission of missing video packets, based at least on the processing performed on the respective video packets, such processing including video decoding in real-time video communications between client devices, and/or video recording and storing in video files. The disclosed systems and methods can also adaptively perform error concealment on video frames in the bitstream domain prior to recording and storing encoded video frame data in video files, based at least on estimates of the complexities of the respective video frames.
(15) In a real-time video communication between the video sender 102 and the video receiver 104, the video sender 102 can receive, generate, or otherwise obtain a sequence of video frames (also referred to herein as a/the video sequence) at the video encoder 108, which is operative to encode data of the respective video frames, and to provide the encoded video frame data to the video packetizer 110. The video packetizer 110 is operative to packetize the encoded video frame data into one or more video packets, and to provide the video packets to the transmission/retransmission buffer 114 of the network adaptor 112. The network adaptor 112 is operative to transmit, over the network 106, the video packets from the transmission/retransmission buffer 114 to the video receiver 104. The video receiver 104 can receive or otherwise obtain the video packets transmitted over the network 106, and temporarily store the video packets in the jitter buffer 124 of its network adaptor 116. The network adaptor 116 is operative to provide the video packets from the jitter buffer 124 to the video depacketizer 118. The video depacketizer 118 is operative to reconstruct the encoded video frame data from the video packets, and to provide the encoded video frame data in a video frame bitstream to the video decoder 120. The video decoder 120 is operative to decode the encoded video frame data in order to obtain video frames, and to provide the video frames in the video sequence for viewing by a human user on a display of the client device, or for further processing in another functional module or component, such as re-encoding the video data for transmission over the network 106 to a client device of another party. The video depacketizer 118 is further operative to provide the encoded video frame data to the video recorder 122 for storage in a video file within a suitable video storage area.
(18) As depicted in block 210, in the event one or more missing video packets are detected, a wait time clock is initialized (e.g., an elapsed time, T_wait, can be set to zero (0) or any other suitable value) by the wait time monitor 134, and at least one request, such as a real-time control protocol (RTCP) message (e.g., a GNACK message), for retransmission of the missing video packets is transmitted, by the GNACK message transmitter 136, over the network 106 to the video sender 102. For example, the GNACK message transmitter 136 can transmit such a GNACK message over a wired and/or wireless communication path 128a to the network 106, and over a wired and/or wireless communication path 128b from the network 106 to the network adaptor 112 of the video sender 102. Such a GNACK message can be configured to identify one or more video packets that have been detected as being missing at the video receiver 104, and to request the video sender 102 to retransmit the identified missing video packets to the video receiver 104.
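The GNACK message described above is standardized as the RTCP Generic NACK transport-layer feedback message of RFC 4585 (payload type 205, FMT 1). The following sketch shows how such a message could be packed; the function name and the grouping logic are illustrative, and sequence-number wraparound across entry boundaries is not handled:

```python
import struct

def build_gnack(sender_ssrc, media_ssrc, missing_seqs):
    """Pack an RTCP Generic NACK feedback message (RFC 4585, PT=205, FMT=1)."""
    # Group missing sequence numbers into (PID, BLP) pairs: each FCI entry
    # covers a packet ID plus a 16-bit mask of the following 16 sequence numbers.
    entries = []
    for seq in sorted(missing_seqs):
        if entries and 0 < seq - entries[-1][0] <= 16:
            pid, blp = entries[-1]
            entries[-1] = (pid, blp | (1 << (seq - pid - 1)))
        else:
            entries.append((seq, 0))
    length = 2 + len(entries)                      # RTCP length: 32-bit words minus one
    pkt = struct.pack("!BBH", 0x81, 205, length)   # V=2, P=0, FMT=1 | PT=RTPFB | length
    pkt += struct.pack("!II", sender_ssrc, media_ssrc)
    for pid, blp in entries:
        pkt += struct.pack("!HH", pid, blp)
    return pkt
```

Because each FCI entry flags up to 16 further losses in its bitmask, a burst of nearby missing packets fits in a single four-byte entry.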
(19) As depicted in block 212, one or more video packets (including any retransmitted video packets) are received or otherwise obtained at the network adaptor 116 of the video receiver 104 over the network 106 from the video sender 102. For example, the network adaptor 112 of the video sender 102 can retransmit the video packets identified as being missing over a wired and/or wireless communication path 130a to the network 106, and over a wired and/or wireless communication path 130b from the network 106 to the network adaptor 116 of the video receiver 104. The received video packets are temporarily stored in the jitter buffer 124. As depicted in block 214, a determination is made, by the missing packet detector 132, as to whether the received video packets have any missing video packets. In the event no missing video packets are detected, the received video packets are depacketized (see block 206) by the video depacketizer 118, encoded video frame data are reconstructed from the video packets, and the encoded video frame data are decoded (see block 208) to YUV video frames by the video decoder 120. The method of
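The missing packet detector 132 can, for example, flag losses by watching for gaps in RTP sequence numbers. A minimal sketch (the function name is hypothetical) that accounts for the 16-bit wraparound of RTP sequence numbers:

```python
def find_missing(expected_seq, received_seq):
    """Return the RTP sequence numbers skipped between the next expected
    sequence number and a newly received one (16-bit arithmetic)."""
    gap = (received_seq - expected_seq) & 0xFFFF
    if gap == 0 or gap >= 0x8000:      # duplicate, or an old/reordered packet
        return []
    return [(expected_seq + i) & 0xFFFF for i in range(gap)]
```

Treating very large apparent gaps as reordered late packets, rather than losses, avoids spurious retransmission requests near the wraparound point.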
(20) As depicted in block 216, in the event one or more missing packets are detected, a determination is made, by the wait time monitor 134, as to whether the elapsed time, T_wait, of the wait time clock exceeds a first wait time threshold, THR_wait_dec. For example, the value of the first wait time threshold, THR_wait_dec, can be set as a function of the round trip delay between the video sender 102 and the video receiver 104, and can be optimized by a tradeoff between the overall delay and the eventual packet loss ratio at the video receiver 104. In the case of real-time video communications, a higher value for the first wait time threshold, THR_wait_dec, can be used to decrease the eventual packet loss ratio while increasing the overall delay, and a lower value for the first wait time threshold, THR_wait_dec, can be used to possibly increase the eventual packet loss ratio while decreasing the overall delay. Because the overall delay can be a significant performance factor in real-time communications, the first wait time threshold, THR_wait_dec, can be set to a relatively small value, e.g., 500 milliseconds or lower, depending on the round trip delay between the video sender 102 and the video receiver 104.
(21) As depicted in block 222, in the event the elapsed time, T_wait, of the wait time clock does not exceed the first wait time threshold, THR_wait_dec, at least one further request (e.g., a GNACK message) for retransmission of the missing video packets is transmitted, by the GNACK message transmitter 136, over the network 106 to the video sender 102. As depicted in block 218, in the event the elapsed time, T_wait, of the wait time clock exceeds the first wait time threshold, THR_wait_dec, at least one request, such as a real-time control protocol (RTCP) message (e.g., a PLI message), for transmission of an I-frame is transmitted, by the PLI message transmitter 138, over the network 106 to the video sender 102. For example, the PLI message transmitter 138 can transmit such a PLI message over the wired and/or wireless communication path 128a to the network 106, and over the wired and/or wireless communication path 128b from the network 106 to the network adaptor 112 of the video sender 102. Such a PLI message can be configured to indicate the loss of an unspecified amount of video packets, and to request the transmission of an intra-coded frame (also referred to herein as an/the I-frame). As depicted in block 220, the video decoder 120 is directed, by the video depacketizer 118, to at least temporarily stop decoding encoded video frame data until the requested I-frame is received and processed at the network adaptor 116 of the video receiver 104.
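The PLI message is likewise standardized in RFC 4585, as a payload-specific feedback message (payload type 206, FMT 1) that carries no FCI, so the packet is exactly three 32-bit words. A minimal packing sketch (the function name is hypothetical):

```python
import struct

def build_pli(sender_ssrc, media_ssrc):
    """Pack an RTCP Picture Loss Indication (RFC 4585, PT=206, FMT=1)."""
    pkt = struct.pack("!BBH", 0x81, 206, 2)   # V=2, P=0, FMT=1 | PT=PSFB | length=2
    pkt += struct.pack("!II", sender_ssrc, media_ssrc)
    return pkt
```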
(23) As depicted in block 232, in the event one or more missing video packets are detected, the wait time clock is initialized (e.g., the elapsed time, T_wait, can be set to zero (0) or any other suitable value) by the wait time monitor 134, and at least one request (e.g., a GNACK message) for retransmission of the missing video packets is transmitted, by the GNACK message transmitter 136, over the network 106 to the video sender 102. As depicted in block 234, one or more video packets (including any retransmitted video packets) are received or otherwise obtained at the network adaptor 116 of the video receiver 104 over the network 106 from the video sender 102. The received video packets are temporarily stored in the jitter buffer 124. As depicted in block 236, a determination is made, by the missing packet detector 132, as to whether the received video packets have any missing video packets. In the event no missing video packets are detected, the received video packets are depacketized (see block 228) by the video depacketizer 118, encoded video frame data are reconstructed from the video packets, and the encoded video frame data is recorded and stored (see block 230) by the video recorder 122 in a video file. The method of
(24) As depicted in block 238, in the event one or more missing packets are detected, a determination is made, by the wait time monitor 134, as to whether the elapsed time, T_wait, of the wait time clock exceeds a second wait time threshold, THR_wait_rec. For example, the second wait time threshold, THR_wait_rec, can be greater than the first wait time threshold, THR_wait_dec, or any other suitable value. Because the recording of a real-time video communication typically does not involve the concurrent viewing of the video in real-time, the second wait time threshold, THR_wait_rec, can be set to a value that is greater than the value of the first wait time threshold, THR_wait_dec, as follows:
THR_wait_rec = α × THR_wait_dec, (1)
in which α is greater than one (1) (e.g., α = 3).
(25) As depicted in block 244, in the event the elapsed time, T_wait, of the wait time clock does not exceed the second wait time threshold, THR_wait_rec, at least one further request (e.g., a GNACK message) for retransmission of the missing video packets is transmitted, by the GNACK message transmitter 136, over the network 106 to the video sender 102. As depicted in block 240, in the event the elapsed time, T_wait, of the wait time clock exceeds the second wait time threshold, THR_wait_rec, at least one request (e.g., a PLI message) for transmission of an I-frame is transmitted, by the PLI message transmitter 138, over the network 106 to the video sender 102. As depicted in block 242, the video recorder 122 is directed, by the video depacketizer 118, to at least temporarily stop recording and storing the encoded video frame data until the requested I-frame is received and processed at the network adaptor 116 of the video receiver 104.
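The decoding-path and recording-path flowcharts can be summarized as a single decision on the elapsed wait time: keep requesting retransmission below the decode threshold, switch the decoding path to PLI recovery above it, and give up on the recording path only above the larger recording threshold of equation (1). A simplified sketch of that combined logic (the function name and the returned action strings are illustrative, not from the source):

```python
def loss_actions(t_wait, thr_wait_dec, alpha=3):
    """Map the elapsed wait time onto retransmission/recovery actions,
    using the decode threshold and the recording threshold of equation (1)."""
    thr_wait_rec = alpha * thr_wait_dec    # equation (1), alpha > 1
    actions = []
    if t_wait <= thr_wait_dec:
        actions.append("resend GNACK")     # still worth waiting for retransmission
    else:
        actions.append("send PLI, pause decoder until I-frame")
    if t_wait > thr_wait_rec:
        actions.append("pause recorder until I-frame")
    return actions
```

Because the recorded file is viewed later, the recorder can tolerate a longer wait for retransmitted packets than the real-time decoder can.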
(27) As depicted in block 260, in the event one or more missing video packets are detected, the wait time clock is initialized (e.g., the elapsed time, T_wait, can be set to zero (0) or any other suitable value) by the wait time monitor 134, and at least one request (e.g., a GNACK message) for retransmission of the missing video packets is transmitted, by the GNACK message transmitter 136, over the network 106 to the video sender 102. As depicted in block 262, one or more video packets are further received at the network adaptor 116 of the video receiver 104 over the network 106 from the video sender 102, and the video packets are temporarily stored in the jitter buffer 124. As depicted in block 264, a further determination is then made, by the missing packet detector 132, as to whether the received video packets have any missing video packets. In the event no missing video packets are detected, the received video packets are depacketized (see block 250) by the video depacketizer 118, encoded video frame data are reconstructed from the video packets, and the encoded video frame data are decoded (see block 252) by the video decoder 120 in order to obtain video frames. The video frames can be provided, by the video decoder 120, in the video sequence for viewing by the human user on the display of the client device, or for further processing in another functional module or component, such as re-encoding the video data for transmission over the network 106 to a client device of another party. A further determination is made (see block 254), at the video recorder 122, as to whether a recording of the encoded video frame data is desired. In the event a recording of the encoded video frame data is desired, the encoded video frame data is recorded (see block 258) and stored by the video recorder 122 in the video file. In the event a recording of the encoded video frame data is not desired, the method of
(28) As depicted in block 265, in the event one or more missing packets are detected, a determination is made, by the wait time monitor 134, as to whether an elapsed time, T_wait, of the wait time clock exceeds the first wait time threshold, THR_wait_dec. As depicted in block 266, in the event the elapsed time, T_wait, of the wait time clock does not exceed the first wait time threshold, THR_wait_dec, at least one further request (e.g., a GNACK message) for retransmission of the missing video packets is transmitted, by the GNACK message transmitter 136, over the network 106 to the video sender 102. As depicted in block 270, in the event the elapsed time, T_wait, of the wait time clock exceeds the first wait time threshold, THR_wait_dec, at least one request (e.g., a PLI message) for transmission of an I-frame is transmitted, by the PLI message transmitter 138, over the network 106 to the video sender 102. Further, as depicted in block 272, the video decoder 120 is directed, by the video depacketizer 118, to at least temporarily stop decoding encoded video frame data until the requested I-frame is received and processed at the network adaptor 116 of the video receiver 104. Further, as depicted in block 268, in the event the elapsed time, T_wait, of the wait time clock exceeds the first wait time threshold, THR_wait_dec, a determination is made as to whether a recording of the encoded video frame data is desired. In the event a recording of the encoded video frame data is not desired, the method of
(29) As depicted in block 274 (see
(31) In the event the encoded video frame data is determined to be incomplete, the error concealer 126 of the video recorder 122 can perform error concealment on the incomplete video frame in the bitstream domain, as follows. First, the missing MBs in the data of the video frame are identified (see block 308), and motion vectors (MVs) are estimated for the missing MBs (see block 310). For example, the error concealer 126 can estimate such MVs for the missing MBs in the data of the video frame, using any suitable temporal error concealment technique. For the H.264 coding method, such a temporal error concealment technique can be used to conceal missing MBs in P-frames and/or B-frames by estimating the MVs of the missing MBs from the MVs of their neighboring MBs. Further, a boundary matching technique can be used to select the best MV estimate. Using the estimated MVs for the missing MBs, a video frame bitstream that corresponds to the missing MBs is generated (see block 312). The video frame bitstream that corresponds to the missing MBs is then incorporated into the depacketized video frame bitstream provided by the video depacketizer 118, thereby generating a video frame bitstream that corresponds to a complete video frame (see block 316). As depicted in block 318, a video frame bitstream that includes the concealed complete video frame is then recorded and stored by the video recorder 122 in a video file.
(32) In one embodiment, motion vector (MV) estimation for a missing macroblock (MB) can be performed based on the motion vectors (MVs) of macroblocks (MBs) neighboring the missing MB in a current video frame, as well as MVs of the previous video frame and the next video frame. Further, a suitable frame delay can be employed to make the MVs of the MBs in the next video frame available.
MV_est(i,j) = f(MV_neighbors, MV_prev_frame, MV_next_frame), (2)
in which MV_neighbors collectively corresponds to the MVs of the respective neighboring MBs 402-409, MV_prev_frame corresponds to the MVs of the previous video frame, MV_next_frame corresponds to the MVs of the next video frame, and f(·) can be a median function, a weighted average function, or any other suitable function. It is noted that the MVs used in MV estimation are dependent on the locations of the corresponding MBs, as well as the pattern of the missing MBs, in the current video frame. Further, if a scene change is detected in a video sequence, the MVs of MBs in video frames from a different scene are preferably not used in such MV estimation, in order to assure an acceptable level of MV estimation performance.
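One common choice for f(·) in equation (2) is a component-wise median over the candidate MVs. A minimal sketch (the function name and the flat pooling of all candidates are simplifying assumptions; in practice the contribution of each MV depends on the MB locations and the missing-MB pattern, as noted above):

```python
from statistics import median

def estimate_mv(neighbor_mvs, prev_frame_mvs, next_frame_mvs):
    """Equation (2) with f(.) taken as a component-wise median over the
    candidate motion vectors; each MV is an (mvx, mvy) pair."""
    candidates = list(neighbor_mvs) + list(prev_frame_mvs) + list(next_frame_mvs)
    mvx = median(mv[0] for mv in candidates)
    mvy = median(mv[1] for mv in candidates)
    return (mvx, mvy)
```

The median is robust to a single outlier MV, which is why it is often preferred over a plain average for temporal concealment.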
(34) In the event the encoded video frame data is determined to be incomplete, the error concealer 126 of the video recorder 122 can adaptively perform error concealment on the incomplete video frame in the bitstream domain, as follows. First, the complexity of the video frame is estimated (see block 508), using any suitable video frame complexity estimation technique. For example, the complexity of the video frame can be estimated using (1) the ratio of missing MBs to existing MBs in the respective video frame (R_MB_missing), (2) the average quantization step size employed for the existing MBs in the respective video frame (QP_avg), (3) the average number of bits per MB (Bits_MB_avg), (4) the average number of bits associated with MVs per MB (Bits_MB_MV_avg), and/or (5) the average number of bits associated with video transform coefficients per MB (Bits_MB_Coeff_avg). It is noted that the average number of bits associated with video transform coefficients per MB, Bits_MB_Coeff_avg, can be obtained as the difference between the average number of bits per MB, Bits_MB_avg, and the average number of bits associated with MVs per MB, Bits_MB_MV_avg. Accordingly, the complexity of the video frame (Complexity_frame) can be obtained, as follows:
Complexity_frame = g(R_MB_missing, Bits_MB_MV_avg, Bits_MB_Coeff_avg, QP_avg). (3)
In one embodiment, the function g(·) in equation (3) can be expressed as follows:
g(R_MB_missing, Bits_MB_MV_avg, Bits_MB_Coeff_avg, QP_avg) = R_MB_missing × (Bits_MB_MV_avg + Bits_MB_Coeff_avg × β × QP_avg), (4)
in which β can be set to eight (8) or any other suitable value.
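Equation (4) can be sketched directly in code. Note that the placement of β between the coefficient-bit and quantization terms follows one plausible reading of the (garbled) formula and is an assumption, and the function name is hypothetical:

```python
def frame_complexity(r_mb_missing, bits_mb_mv_avg, bits_mb_coeff_avg, qp_avg, beta=8):
    """Equations (3)/(4): frame complexity from the missing-MB ratio, the
    average MV bits, the average coefficient bits, and the average QP.
    The exact placement of beta is an assumption of this sketch."""
    return r_mb_missing * (bits_mb_mv_avg + bits_mb_coeff_avg * beta * qp_avg)
```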
(35) A determination is then made (see block 510) as to whether the complexity of the video frame exceeds a predetermined video frame complexity threshold, THR_complexity. In the event the complexity of the video frame does not exceed the predetermined video frame complexity threshold, THR_complexity, the error concealer 126 of the video recorder 122 performs error concealment on the incomplete video frame in the bitstream domain to generate a video frame bitstream that corresponds to a complete video frame. For example, the missing MBs in the data of the video frame are identified (see block 520), and motion vectors (MVs) are estimated for the missing MBs (see block 522), using any suitable temporal error concealment technique. Using the estimated MVs for the missing MBs, a video frame bitstream that corresponds to the missing MBs is generated (see block 524). The bitstream that corresponds to the missing MBs is then incorporated into the depacketized video frame bitstream provided by the video depacketizer 118, thereby generating a video frame bitstream that corresponds to a complete video frame (see block 528). As depicted in block 530, a video frame bitstream that includes the complete video frame is then recorded and stored by the video recorder 122 in a video file.
(36) As depicted in block 512, in the event the complexity of the video frame exceeds the predetermined video frame complexity threshold, THR_complexity, the video frame is dropped by the error concealer 126 of the video receiver 104. As depicted in block 514, one or more video packets are further received, substantially in real-time, at the network adaptor 116 of the video receiver 104 over the network 106 from the video sender 102. The video packets are temporarily stored in the jitter buffer 124, and encoded video frame data are reconstructed from the video packets by the video depacketizer 118. As depicted in block 516, the encoded video frame data are then decoded and reconstructed to obtain a YUV video frame by the video decoder 120. As depicted in block 518, a determination is then made, by the error concealer 126 of the video recorder 122, as to whether the video frame is an I-frame. In the event the video frame is not an I-frame, the method of
(40) As depicted in block 808, in the event error concealment was performed on the video frame, a determination is made, by the error concealment information extractor 610, as to whether or not to play back the recorded video frame. For example, the determination as to whether to play back the recorded video frame can be made based on the distance (in seconds) to the next I-frame, as well as the complexity of the video frame, each of which can be obtained from the stored error concealment information for the video frame, or determined from such stored error concealment information. In one embodiment, the decision not to play back the recorded video frame can be made if one of the following conditions is met:
Complexity_frame > D_I-frame × THR_drop, if D_I-frame > 0.5, (5)
Complexity_frame > 2 × THR_drop, otherwise, (6)
in which D_I-frame is the distance (in seconds) to the next I-frame, and THR_drop is a threshold value that can be set to sixty (60) or any other suitable value. It is noted that the next I-frame was unavailable when the concealed video frame was subjected to error concealment in the video recording process, whereas the next I-frame is available in the playback process, thereby allowing for better decision making in order to provide an improved video quality of experience to a human viewer during playback of the video.
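Conditions (5) and (6) amount to a single playback-time drop decision. A minimal sketch (the function name is hypothetical):

```python
def drop_on_playback(complexity_frame, d_iframe, thr_drop=60):
    """Equations (5) and (6): skip playback of a concealed frame when its
    complexity is high relative to the distance to the next I-frame."""
    if d_iframe > 0.5:
        return complexity_frame > d_iframe * thr_drop   # equation (5)
    return complexity_frame > 2 * thr_drop              # equation (6)
```

The farther away the next I-frame is, the longer any concealment artifacts would propagate, so the drop threshold scales with that distance.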
(41) In the event the determination is made to play back the recorded video frame, the method of
(42) It is noted that the operations herein described are purely exemplary and imply no particular order. Further, the operations can be used in any sequence when appropriate and can be partially used. With the above illustrative embodiments in mind, it should be understood that the above-described systems and methods could employ various computer-implemented operations involving data transferred or stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, and/or otherwise manipulated.
(43) Moreover, any of the operations described herein that form part of the above-described systems and methods are useful machine operations. The above-described systems and methods also relate to a device or an apparatus for performing such operations. The apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated or configured by a software program stored in the computer. In particular, various general-purpose machines employing one or more processors coupled to one or more computer readable media can be used with software programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
(44) The above-described systems and methods can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of such computer readable media include hard drives, read-only memory (ROM), random-access memory (RAM), CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable media can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
(45) It will be appreciated by those of ordinary skill in the art that modifications to and variations of the above-described systems and methods may be made without departing from the inventive concepts disclosed herein. Accordingly, the invention should not be viewed as limited except as by the scope and spirit of the appended claims.