Method and system for generating an event video sequence, and camera comprising such system
10779003 · 2020-09-15
Assignee
Inventors
- Viktor Edpalm (Lund, SE)
- Bjarne Rosengren (Södra Sandby, SE)
- Jonas Holmberg (Klagerup, SE)
- Stefan Lundberg (Lund, SE)
Cpc classification
- H04N7/181 (ELECTRICITY)
- H04N19/137 (ELECTRICITY)
- H04N19/46 (ELECTRICITY)
- G11B27/031 (PHYSICS)
- H04N19/107 (ELECTRICITY)
- G08B13/19667 (PHYSICS)
- G08B13/19676 (PHYSICS)
- H04N19/114 (ELECTRICITY)
- H04N19/44 (ELECTRICITY)
- H04N19/57 (ELECTRICITY)
International classification
- H04N19/57 (ELECTRICITY)
- H04N19/114 (ELECTRICITY)
- H04N19/46 (ELECTRICITY)
- H04N7/18 (ELECTRICITY)
- H04N19/107 (ELECTRICITY)
- H04N19/137 (ELECTRICITY)
- H04N19/44 (ELECTRICITY)
Abstract
A method, system, and non-transitory computer readable storage medium having instructions for generating an event video sequence. The method comprises receiving input image frames encoded into a sequence of image frames arranged in groups of pictures, each group of pictures having a GOP structure. The method further comprises storing a pre-event sequence of image frames in a first-in-first-out buffer, said pre-event sequence having a first predetermined GOP structure, upon receiving an event notification, retrieving the pre-event sequence from the buffer, and generating the event video sequence by combining the pre-event sequence and a post-event sequence of image frames, said post-event sequence having a second GOP structure, which is different from said first GOP structure. The system may be included in a camera.
Claims
1. A method of generating an event video sequence, the method comprising: receiving input image frames encoded into a sequence of image frames arranged in groups of pictures, each group of pictures comprising an intra-frame and zero or more subsequent inter-frames, each group of pictures in said sequence of image frames having a group of pictures (GOP) structure; storing a pre-event sequence of image frames in a first-in-first-out buffer, said pre-event sequence having a first GOP structure; receiving an event notification; upon receiving said event notification, retrieving said pre-event sequence from said buffer; and generating the event video sequence by combining said pre-event sequence and a post-event sequence of image frames, said post-event sequence having a second GOP structure, which is different from said first GOP structure, wherein the generated event video sequence only includes data from the pre-event sequence of image frames and the post-event sequence of image frames, wherein said first GOP structure differs from said second GOP structure in that a first GOP length of said first GOP structure differs from a second GOP length of said second GOP structure, each GOP length being defined by a number of frames contained in a respective group of pictures, and wherein said first GOP length is set by user input and is used to determine the length of the pre-event sequence, and said second GOP length is dynamic.
2. The method according to claim 1, wherein said second GOP length is longer than said first GOP length.
3. The method according to claim 1, wherein said dynamic GOP length is set based on at least one of a level of motion determined for one or more input image frames, a noise level of one or more input image frames, a frame rate at which said input image frames are encoded, or a bit rate limit set for encoding said image frames.
4. A system for generating an event video sequence, said system comprising: an image receiving circuit arranged to receive input image frames encoded into a sequence of image frames arranged in groups of pictures, each group of pictures comprising an intra-frame and zero or more subsequent inter-frames, each group of pictures in said sequence of image frames having a group of pictures (GOP) structure, wherein the input image frames originate from a camera; a buffer arranged to store a pre-event sequence of the received image frames on a first-in-first-out basis, said pre-event sequence having a first GOP structure; a processor arranged to receive event notifications, retrieve the pre-event sequence from said buffer, and generate the event video sequence by combining the pre-event sequence with a post-event sequence of image frames upon receipt of an event notification, said post-event sequence having a second GOP structure, which is different from said first GOP structure, wherein the generated event video sequence only includes data from the pre-event sequence of image frames and the post-event sequence of image frames, wherein said first GOP structure differs from said second GOP structure in that a first GOP length of said first GOP structure differs from a second GOP length of said second GOP structure, each GOP length being defined by a number of frames contained in a respective group of pictures, and wherein said first GOP length is set by user input and is used to determine the length of the pre-event sequence, and said second GOP length is dynamic; and a graphical user interface arranged to receive the user input for setting said first GOP length.
5. The system according to claim 4, wherein said second GOP length is longer than said first GOP length.
6. The system according to claim 4, wherein the processor is further arranged to set said dynamic GOP length based on at least one of a level of motion determined for one or more input image frames, a noise level of one or more input image frames, a frame rate at which said input image frames are encoded, or a bit rate limit set for encoding said image frames.
7. The system according to claim 4, wherein the system is found in a camera.
8. An article of manufacture including a non-transitory computer-readable storage medium having instructions stored thereon, that when executed by a processor cause a computing device to execute instructions, the instructions causing the computing device to perform operations comprising: receiving input image frames encoded into a sequence of image frames arranged in groups of pictures, each group of pictures comprising an intra-frame and zero or more subsequent inter-frames, each group of pictures in said sequence of image frames having a group of pictures (GOP) structure; storing a pre-event sequence of image frames in a first-in-first-out buffer, said pre-event sequence having a first GOP structure; receiving an event notification; upon receiving said event notification, retrieving said pre-event sequence from said buffer; and generating the event video sequence by combining said pre-event sequence and a post-event sequence of image frames, said post-event sequence having a second GOP structure, which is different from said first GOP structure, wherein the generated event video sequence only includes data from the pre-event sequence of image frames and the post-event sequence of image frames, wherein said first GOP structure differs from said second GOP structure in that a first GOP length of said first GOP structure differs from a second GOP length of said second GOP structure, each GOP length being defined by a number of frames contained in a respective group of pictures, and wherein said first GOP length is set by user input and is used to determine the length of the pre-event sequence, and said second GOP length is dynamic.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The invention will now be described in more detail by way of example and with reference to the accompanying schematic drawings, in which:
(2)
(3)
(4)
(5)
(6)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(7) In
(8) The camera 1 captures images of the scene, and a video motion detection algorithm is performed on the captured images. The camera 1 is set up to record video based on events from the motion detection algorithm. Via a graphical user interface the user may define a region of interest in images of the scene in which the motion detection algorithm is to be performed. For instance, the user may be interested only in motion in front of the door 3, such as on the path 5, but not in swaying movements of the bush 4. Thus, the user may define a region of interest that covers the desired area in front of the door 3, but not the bush 4.
(9) In order to be able to include a pre-event video sequence in recordings, the camera 1 has a FIFO buffer (12 in
(10) Before the event video sequence may be recorded or transmitted, it is encoded using a block-based hybrid video codec, such as an H.264 codec. As discussed above, the images are encoded as I-frames or P-frames, and these are grouped in groups of pictures, also referred to as GOPs.
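By way of illustration only, the grouping of encoded frames into groups of pictures may be sketched as follows. The sketch is in Python, and the function name split_into_gops and the representation of a frame sequence as a string of "I" and "P" characters are assumptions made for the example, not part of the described embodiment. Each group starts at an I-frame and contains zero or more subsequent P-frames, and the GOP length is the number of frames in the group.

```python
def split_into_gops(frame_types: str) -> list[str]:
    """Split a string of frame types ("I"/"P") into groups of pictures.

    Hypothetical helper: a new group starts at every I-frame. Leading
    P-frames (before any I-frame) end up in a group without an I-frame,
    which a decoder could not decode on its own.
    """
    gops: list[str] = []
    for frame_type in frame_types:
        if frame_type == "I" or not gops:
            gops.append(frame_type)      # start a new group of pictures
        else:
            gops[-1] += frame_type       # inter-frame joins the current group
    return gops


if __name__ == "__main__":
    gops = split_into_gops("IPPIPPPPI")
    print(gops)                          # the groups of pictures
    print([len(g) for g in gops])        # the GOP length of each group
```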
(11) Reference may now be made to
(12) When a person enters the monitored scene, the motion detection algorithm determines that there is motion in the region of interest and an event is triggered. Upon notification of this event, the image frames that are currently in the pre-event buffer are recorded, and recording of image frames captured beginning at the event is started. When the motion detection algorithm no longer detects motion in the region of interest, recording is continued until the end of a post-event recording time set by the user. The post-event recording time may be, e.g., 5 seconds.
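The event handling described in the preceding paragraph may, purely as a non-limiting sketch, be expressed in Python as follows. The names Frame and EventRecorder, the per-frame motion flag, and the expression of the post-event recording time as a number of frames are all assumptions made for the example. The sketch also assumes that buffered frames preceding the first buffered I-frame are discarded, since they could not be decoded, and it does not model how the frames are encoded.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    index: int   # capture order
    kind: str    # "I" (intra-frame) or "P" (inter-frame)


class EventRecorder:
    """Hypothetical recorder: FIFO pre-event buffer plus post-event hold time."""

    def __init__(self, buffer_frames: int, post_event_frames: int):
        self.buffer = deque(maxlen=buffer_frames)   # oldest frame drops out first
        self.post_event_frames = post_event_frames  # assumed to be at least 1
        self.recording: list[Frame] = []            # event video sequence
        self.active = False
        self.frames_left = 0                        # post-event frames still to record

    def push(self, frame: Frame, motion: bool) -> None:
        if not self.active and motion:
            # Event notification: retrieve the buffered pre-event frames,
            # starting at the first I-frame so that they can be decoded.
            buffered = list(self.buffer)
            start = next(
                (i for i, f in enumerate(buffered) if f.kind == "I"),
                len(buffered),
            )
            self.recording.extend(buffered[start:])
            self.buffer.clear()
            self.active = True
        if self.active:
            if motion:
                self.frames_left = self.post_event_frames  # restart hold time
            else:
                self.frames_left -= 1
            self.recording.append(frame)
            if self.frames_left == 0:
                self.active = False                        # post-event time elapsed
        else:
            self.buffer.append(frame)
```

In this sketch, each frame with detected motion restarts the post-event countdown, so recording stops only once the configured number of frames has passed without motion.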
(13) In order to control bit rate, it is desired to use a dynamic GOP length for the recordings. In this manner, a long GOP length may be used when there is little or no motion in the scene, thereby reducing the output bit rate, since P-frames generally require fewer bits for encoding than I-frames. When there is a lot of motion in the scene, shorter GOP lengths may be used, leading to higher output bit rate, but also reducing encoding artifacts. As already mentioned, this makes it difficult, or even impossible, to determine a pre-event buffer length that ensures that a desired pre-event time may be decoded and viewed. Once again using the example with 3 seconds set by the user as pre-event recording time and a frame rate of 30 fps, but with an increased GOP length of 256, the pre-event buffer would have to hold about 11.5 seconds worth of pre-event video (3 seconds plus one group of pictures of 256/30 ≈ 8.5 seconds). If only the previously calculated 7.3 seconds of video were stored in the pre-event buffer, there is great risk that there will not be an I-frame at or before the start of the desired 3 seconds of pre-event recording. It may be pointed out that, with this combination of frame rate and GOP length, a group of pictures is longer than the 3 seconds set by the user as pre-event recording time.
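The buffer-length arithmetic in this example may be sketched as follows. The sketch is in Python and rests on an assumption: that, in the worst case, the buffer must hold the desired pre-event time plus the duration of one full group of pictures, so that an I-frame is available at or before the start of the desired pre-event time. The function names gop_duration_seconds and required_buffer_seconds are hypothetical.

```python
def gop_duration_seconds(gop_length: int, fps: float) -> float:
    """Duration of one group of pictures at the given frame rate."""
    return gop_length / fps


def required_buffer_seconds(pre_event_s: float, gop_length: int, fps: float) -> float:
    """Worst-case buffer duration under the assumed rule:
    pre-event time plus one full group of pictures."""
    return pre_event_s + gop_duration_seconds(gop_length, fps)


if __name__ == "__main__":
    pre_event_s, fps, gop_length = 3.0, 30.0, 256
    print(f"GOP duration: {gop_duration_seconds(gop_length, fps):.2f} s")
    print(f"Required buffer: {required_buffer_seconds(pre_event_s, gop_length, fps):.2f} s")
```

Because the required buffer duration grows with the GOP length, a GOP length that varies dynamically leaves the required buffer size unpredictable.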
(14) According to the invention, this is solved by using another GOP structure for the buffer, in this example another GOP length. If a predetermined GOP length is used for the buffer, the necessary length of the buffer may be predictably calculated as described above. In
(15) It may be noted that, although the discussion above has referred to recording of the event video sequence, the same principle of generating an event video sequence is useful for other purposes, such as for displaying the event video sequence, e.g. to a security guard in a control center. It is also useful for transmitting the event video sequence, e.g., via e-mail or ftp.
(16) The method of generating an event video sequence may be described in more general terms with reference to
(17) A system that may be used in accordance with the method described above will now be described with reference to
(18) It will be appreciated that a person skilled in the art can modify the above-described embodiments in many ways and still use the advantages of the invention as shown in the embodiments above. As an example, it may be noted that in the description above, only P-frames are used as inter-frames. However, B-frames may also be used.
(19) As already mentioned, the generation of the event video sequence may be performed for various reasons, such as for recording, as discussed above, or for displaying or transmitting, e.g., via e-mail or ftp.
(20) The system 10 for generating an event video sequence may be integrated in a camera, such as the monitoring camera 1 shown in
(21) The dynamic GOP length is in the example above based on level of motion. However, the GOP length of the post-event sequence may be varied also using other schemes. The dynamic control of the GOP length may be based on a noise level of one or more input image frames, a frame rate at which said input image frames are encoded, or a bit rate limit set for encoding said image frames.
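One conceivable way of controlling the GOP length of the post-event sequence dynamically is sketched below in Python. The linear mapping, the normalized control value in the range 0 to 1, the default limits of 32 and 256 frames, and the function name dynamic_gop_length are all assumptions made for the example and are not taken from the embodiments. The control value could represent the level of motion, or it could be derived from any of the other quantities mentioned above.

```python
def dynamic_gop_length(level: float, min_gop: int = 32, max_gop: int = 256) -> int:
    """Map a control value in [0, 1] to a GOP length (hypothetical linear rule).

    A low value (e.g. little motion) gives a long GOP and thus a lower bit
    rate; a high value gives a short GOP with more frequent I-frames.
    """
    clamped = min(max(level, 0.0), 1.0)   # keep the control value in range
    return round(max_gop - clamped * (max_gop - min_gop))


if __name__ == "__main__":
    for level in (0.0, 0.5, 1.0):
        print(level, dynamic_gop_length(level))
```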
(22) Further, an I-frame may need to be inserted after a loss of network connection for ensuring a new starting point for decoding.
(23) Additionally or alternatively, the user may have set a GOP length to be used for recordings, which is too long for the available pre-event buffer. A shorter GOP length that fits in the buffer may in such case be used for the pre-event sequence, whereas the longer GOP length set by the user is used for the post-event sequence.
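The selection of a shorter GOP length for the pre-event sequence may be sketched as follows. The sketch is in Python and assumes, as a simplification, that the buffer must hold the frames of the desired pre-event time plus one full group of pictures. The function name pre_event_gop_length and its parameters are hypothetical.

```python
def pre_event_gop_length(user_gop: int, buffer_frames: int, pre_event_frames: int) -> int:
    """Pick the GOP length for the pre-event sequence.

    Returns the user's GOP length if it fits in the buffer, otherwise the
    longest GOP length that still leaves room for the pre-event frames.
    """
    room = buffer_frames - pre_event_frames   # frames available for one GOP
    if room < 1:
        raise ValueError("buffer too small for the requested pre-event time")
    return min(user_gop, room)


if __name__ == "__main__":
    # Buffer of 150 frames (5 s at 30 fps), 3 s pre-event time, user GOP of 256.
    first_gop = pre_event_gop_length(user_gop=256, buffer_frames=150, pre_event_frames=90)
    second_gop = 256                          # user's GOP length, post-event sequence
    print(first_gop, second_gop)
```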
(24) The second GOP structure may differ from the first GOP structure in ways other than, or in addition to, GOP length. Just as an example, the first GOP structure may be made up of I-frames, P-frames and B-frames, and the second GOP structure may be made up of only I-frames and P-frames.
(25) In addition to changes in GOP structure, there may be other encoding differences between the pre-event sequence and the post-event sequence. For example, one compression value may be used for the pre-event sequence, and another compression value for the post-event sequence. Alternatively, a static compression value may be used for the pre-event sequence, and a dynamically controlled compression value may be used for the post-event sequence.
(26) In the example described above the event is a motion detection event. It should be appreciated that the method may be used equally well with events from other video analysis algorithms, such as an object tracking algorithm, a cross-line detection algorithm, a wrong way detection algorithm, a face detection algorithm, or a license plate recognition algorithm. The event may also be of other alarm event types, e.g., originating from an external sensor, such as a microphone, a door contact or a PIR sensor. Such devices may be integrated in the camera, connected to the camera via an I/O port, or separate. Separate devices may submit event notifications to the system or to the camera, e.g., via an Ethernet connection.
(27) The event need not be an alarm event, but may be a system event or a user input event. A system event may be generated by a camera system, and may be an error notification, such as a network loss notification. A user event may be an input of a command, such as a manual start of recording.
(28) In the embodiment shown in
(29) A camera from which the input image frames may originate may be any kind of camera, such as a camera employing visible light, an IR camera or a thermal camera. The camera may be a monitoring camera.
(30) The input image frames may generally be generated by a visible light sensor, a thermal sensor, a time-of-flight sensor, or other types of image generating sensors capable of generating information to be encoded using intra-frame and inter-frame encoding technologies.
(31) The encoder used for encoding the image frames may be any kind of block-based hybrid video codec. The encoder may be integrated in the event video sequence generating system or may be a separate device or module operatively connected to the event video sequence generating system.
(32) The system may be embodied as software, firmware or hardware, or a combination thereof.
(33) Thus, the invention should not be limited to the shown embodiments but should only be defined by the appended claims.