Computer-implemented method and system of augmenting a video stream of an environment
11189319 · 2021-11-30
Assignee
Inventors
Cpc classification
H04N21/234345
ELECTRICITY
H04N21/23418
ELECTRICITY
H04N21/21805
ELECTRICITY
G11B27/031
PHYSICS
International classification
H04N9/80
ELECTRICITY
G06F21/62
PHYSICS
G11B27/031
PHYSICS
Abstract
In a computer-implemented method and system of augmenting a video stream of an environment, a processing device receives a first video stream including a plurality of image frames provided from at least one camera, processes the plurality of image frames of the first video stream to define a spatial scope of image data within a respective one of the image frames, modifies the plurality of image frames with frame-based modification information configured to obfuscate or delete or restrict at least one portion of a respective one of the image frames which is outside of the spatial scope of image data within the respective one of the image frames, and outputs a second video stream based on the modified plurality of image frames with the frame-based modification information for transmission to a receiving device, such as a display device.
Claims
1. A computer-implemented method of augmenting a video stream of an environment, comprising: defining a spatial scope of image data based on one or more markers positioned in three-dimensional space of a real environment and that each correspond to a position in the three-dimensional space; receiving, by a processing device, a first video stream including a plurality of image frames of a real environment provided from at least one camera; processing, by the processing device, the plurality of image frames of the first video stream to define the spatial scope of image data within a respective one of the plurality of image frames, wherein said receiving said first video stream and said processing the plurality of image frames of the first video stream comprises receiving the first video stream with image frames capturing the environment which is annotated with the one or more markers to define the spatial scope, and defining the spatial scope based on at least one of the one or more markers captured in the respective one of the plurality of image frames; modifying, by the processing device, the plurality of image frames with frame-based modification information configured to obfuscate or delete or restrict at least one portion of a respective one of the plurality of image frames which is outside of the spatial scope of image data within the respective one of the plurality of image frames; and outputting, by the processing device, a second video stream based on a modified plurality of image frames with the frame-based modification information to transmit to a receiving device.
2. The method of claim 1, wherein said modifying the plurality of image frames with frame-based modification information configured to obfuscate or delete or restrict at least one portion of a respective one of the plurality of image frames comprises one or more of deleting, blurring, replacing by other visual information, coarsening, and encrypting the at least one portion of the respective one of the plurality of image frames.
3. The method of claim 1, further comprising defining a set of markers of the one or more markers to define the spatial scope, and defining the spatial scope based on at least one of the one or more markers captured in the respective one of the plurality of image frames comprises defining the spatial scope only if the set of markers is captured in the respective one of the plurality of image frames and within a boundary interconnecting respective markers of the set of markers.
4. The method of claim 1, wherein said defining the spatial scope based on at least one of the one or more markers captured in the respective one of the plurality of image frames comprises defining the spatial scope with a specific spatial layout based on at least one of the one or more markers.
5. The method of claim 4, wherein the specific spatial layout is defined by at least one of the one or more markers as points of a closed polygon, or of a convex border.
6. The method of claim 5, wherein each marker of said one or more markers is uniquely identifiable and a relative position in space of said each marker is decodable.
7. The method of claim 1, wherein said processing the plurality of image frames of the first video stream comprises defining the spatial scope based on one or more modulated light emitting elements captured in a respective one of the plurality of image frames.
8. The method of claim 7, wherein said defining the spatial scope based on one or more modulated light emitting elements captured in a respective one of the plurality of image frames comprises defining the spatial scope based on light of a certain modulation recognized from a respective one of the plurality of image frames.
9. The method of claim 1, wherein said processing the plurality of image frames of the first video stream comprises defining the spatial scope based on depth information associated with at least one of the plurality of image frames when providing the first video stream defining a visual depth from the at least one camera.
10. The method of claim 9, wherein the depth information is provided from one or more of at least one depth sensor, a stereo camera, and depth-from-motion differential images.
11. The method of claim 9, wherein the depth information is provided based upon at least one 2D/3D model of an object registered in at least one of the plurality of image frames and tracked within the plurality of image frames, with the depth information derived from the at least one 2D/3D model and detection of the object in the respective one of the plurality of image frames associated with the at least one 2D/3D model.
12. A computer program product comprising a non-transitory computer usable medium with software code sections that, when loaded into an internal memory of a processing device, cause the processing device to perform a method of augmenting a video stream of an environment, the method comprising the steps of defining a spatial scope of image data based on one or more markers positioned in three-dimensional space of a real environment and that each correspond to a position in the three-dimensional space; receiving, by the processing device, a first video stream including a plurality of image frames of a real environment provided from at least one camera; processing, by the processing device, the plurality of image frames of the first video stream to define the spatial scope of image data within a respective one of the plurality of image frames, wherein said receiving said first video stream and said processing the plurality of image frames of the first video stream comprises receiving the first video stream with image frames capturing the environment which is annotated with the one or more markers to define the spatial scope, and defining the spatial scope based on at least one of the one or more markers captured in the respective one of the plurality of image frames; modifying, by the processing device, the plurality of image frames with frame-based modification information configured to obfuscate or delete or restrict at least one portion of a respective one of the plurality of image frames which is outside of the spatial scope of image data within the respective one of the plurality of image frames; and outputting, by the processing device, a second video stream based on a modified plurality of image frames with the frame-based modification information to transmit to a receiving device.
13. A system for augmenting a video stream of an environment, comprising: a processing device operatively coupled to a memory comprising instructions stored thereon that, when executed by the processing device, cause the processing device to: define a spatial scope of image data based on one or more markers positioned in three-dimensional space of a real environment and that each correspond to a position in the three-dimensional space; receive a first video stream including a plurality of image frames of a real environment provided from at least one camera; process the plurality of image frames of the first video stream to define the spatial scope of image data within a respective one of the plurality of image frames, wherein said receive said first video stream and said process the plurality of image frames of the first video stream comprises receive the first video stream with image frames capturing the environment which is annotated with the one or more markers to define the spatial scope, and define the spatial scope based on at least one of the one or more markers captured in the respective one of the plurality of image frames; modify the plurality of image frames with frame-based modification information configured to obfuscate or delete or restrict at least one portion of a respective one of the plurality of image frames which is outside of the spatial scope of image data within the respective one of the plurality of image frames; and output a second video stream based on a modified plurality of image frames with the frame-based modification information to transmit to a receiving device.
14. The system of claim 13, further comprising the receiving device which comprises at least one display device, wherein the processing device is coupled with the receiving device to display the second video stream on the at least one display device such that the at least one portion of a respective one of the plurality of image frames which is outside the spatial scope of image data is obfuscated or deleted or restricted.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The above and other aspects, features and advantages of the invention will be more apparent from the following more particular description thereof, presented in conjunction with the following drawings wherein:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
DETAILED DESCRIPTION OF THE INVENTION
(11) In the following exemplary description numerous specific details are set forth in order to provide a more thorough understanding of embodiments of the invention. It will be apparent, however, to an artisan of ordinary skill that embodiments of the invention may be practiced without incorporating all aspects of the specific details described herein. In other instances, specific features, quantities, or measurements well known to those of ordinary skill in the art have not been described in detail so as not to obscure the invention. Readers should note that although examples of the invention are set forth herein, the claims, and the full scope of any equivalents, are what define the metes and bounds of the invention.
(12)
(13)
(14) The video stream 10 includes a plurality of image frames of a real environment as commonly known, which in their sequence represent the video stream 10 of the real environment, e.g. an environment inside a manufacturing plant, in the form of moving images. As set out in more detail below, the processing device 4 processes the plurality of image frames of the video stream 10 to define a spatial scope of image data within a respective one of the image frames, and accordingly modifies the plurality of image frames with frame-based modification information.
(15) The processing device 4 then outputs to an output unit 5 a video stream 20 (herein referred to as second video stream) based on the modified plurality of image frames with the frame-based modification information for transmission to one or more receiving devices 2 and/or 3. Particularly, the video stream 20 includes, or is formed by, the modified plurality of image frames with the frame-based modification information. For example, one of the receiving devices may be a storing device 2 for storing a file of the video stream, or a display device 3, such as a display device of a stationary device, closed VR (virtual reality) glasses, or a mobile or wearable device, e.g. a semitransparent display device in the form of semitransparent data glasses. The storing device 2 and/or display device 3 may be on site, e.g. coupled with the output unit 5 in a wired manner or wirelessly, or remote from the processing device 4, for instance remotely accessible, e.g. through the Internet. For instance, the video stream 20 is used to record training sessions, document features, or communicate with remote persons, or is used for computer vision-based tasks such as feature, object, and/or gesture recognition.
(16) According to aspects of the invention, the system 7 and method implemented in processing device 4 modify the image frames of the video stream 10 with frame-based modification information to form the modified video stream 20, such that at least a portion (e.g. pixels within a particular region) of a respective image frame of the video stream 20, which is outside of a spatial scope of image data, is obfuscated or deleted or restricted. The spatial scope, as explained in more detail below, may be set such that an entity, private or personal information which is present outside of the spatial scope in the environment may be prevented from being displayed to a viewer of the video stream 20, e.g. on display device 3.
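By way of a non-limiting illustration, the per-frame flow of receiving the video stream 10, defining a spatial scope, modifying each frame, and outputting the video stream 20 may be sketched as follows. The names `augment_stream`, `left_half_scope` and `delete_outside`, and the representation of a frame as rows of grayscale integers, are assumptions made for this sketch only and are not taken from the described embodiments.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[List[int]]   # grayscale frame: rows of pixel intensities
Mask = List[List[bool]]   # True = pixel lies inside the spatial scope


def augment_stream(
    first_stream: Iterable[Frame],
    define_scope: Callable[[Frame], Mask],
    modify: Callable[[Frame, Mask], Frame],
) -> Iterator[Frame]:
    """Yield the frames of the second stream, one per input frame."""
    for frame in first_stream:
        scope = define_scope(frame)   # spatial scope for this frame
        yield modify(frame, scope)    # modify what lies outside the scope


# Hypothetical stand-ins, for illustration only.
def left_half_scope(frame: Frame) -> Mask:
    """Pretend that the left half of every frame is the spatial scope."""
    width = len(frame[0])
    return [[x < width // 2 for x in range(width)] for _ in frame]


def delete_outside(frame: Frame, scope: Mask) -> Frame:
    """Delete (set to 0) every pixel that is outside the spatial scope."""
    return [[p if keep else 0 for p, keep in zip(row, mask_row)]
            for row, mask_row in zip(frame, scope)]


stream_10 = [[[9, 9, 9, 9], [9, 9, 9, 9]]]   # a single 2x4 frame
stream_20 = list(augment_stream(stream_10, left_half_scope, delete_outside))
```

In such a sketch, the scope definition and the modification are passed in as separate functions, so that a marker-based, light-based or depth-based scope definition could be exchanged without touching the modification step.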
(17) In this regard,
(18) According to examples of individual image frames 51 and 52 of a potential video stream 20 output by the processing device 4, the portion 512, in particular image pixels of portion 512, is obfuscated, for example blurred, replaced by other visual information (e.g., patterns, colors, etc.), or coarsened. According to the example of image frame 53 of a potential video stream 20 output by the processing device 4, the portion 512, in particular image pixels of portion 512, is deleted. According to the example of image frame 54 of a potential video stream 20 output by the processing device 4, the portion 512, in particular image pixels of portion 512, is restricted, e.g. encrypted. In this case, if someone with the correct credentials watches the video stream 20, the portion 512 can be reconstructed so that access is granted.
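As a non-limiting illustration of the coarsening option, flagged pixels may be replaced by the mean value of the tile they fall in. The function name `coarsen`, the tile size `block` and the grayscale list-of-rows frame are assumptions of this sketch; the embodiments do not prescribe a particular coarsening technique.

```python
from typing import List

Frame = List[List[int]]
Mask = List[List[bool]]  # True = pixel belongs to the portion to be coarsened


def coarsen(frame: Frame, portion: Mask, block: int = 2) -> Frame:
    """Replace flagged pixels by the integer mean of their block x block tile."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]          # leave the input frame untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [(y, x)
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            mean = sum(frame[y][x] for y, x in tile) // len(tile)
            for y, x in tile:
                if portion[y][x]:
                    out[y][x] = mean
    return out
```

A larger `block` value removes more detail from the flagged portion; pixels that are not flagged are passed through unchanged.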
(19) For example, the frame-based modification information (or frame-based restriction information) of each image frame calculated by the processing device 4 may contain a binary description/annotation for a set of pixels to be displayed or obfuscated, or deleted, or restricted. That is, each frame may contain such frame-based modification information in the form of a binary description/annotation for a set of pixels.
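A binary description/annotation of this kind may, purely as an example, be packed as one bit per pixel and applied to the frame it belongs to. The bit layout (row-major, most significant bit first) and the names `pack_annotation` and `apply_annotation` are assumptions of this sketch and are not specified in the embodiments.

```python
from typing import List

Frame = List[List[int]]


def pack_annotation(keep: List[List[bool]]) -> bytes:
    """Pack a per-pixel keep/modify flag into bytes, row-major, MSB first."""
    bits = [flag for row in keep for flag in row]
    out = bytearray((len(bits) + 7) // 8)
    for i, flag in enumerate(bits):
        if flag:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def apply_annotation(frame: Frame, annotation: bytes, fill: int = 0) -> Frame:
    """Keep pixels whose bit is 1; replace all others by `fill` (deletion)."""
    width = len(frame[0])
    result = []
    for y, row in enumerate(frame):
        new_row = []
        for x, pixel in enumerate(row):
            i = y * width + x
            bit = annotation[i // 8] & (0x80 >> (i % 8))
            new_row.append(pixel if bit else fill)
        result.append(new_row)
    return result
```

Here a set bit means "display" and a cleared bit means "delete"; the same annotation could equally drive obfuscation or restriction of the cleared pixels.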
(20)
(21) In particular, the environment is annotated with markers defining a spatial scope of video data to be transmitted to the receiving device 2, 3. These markers are processed in the processing device 4 and a spatial restriction for video transmission is calculated. In the depicted illustration, the environment is annotated with markers 14 for defining the spatial scope 15 based on one or more of the markers 14 captured in the respective image frame. For example, a set of four markers 14 is defined for defining the spatial scope 15. In an embodiment, the spatial scope 15 is defined only if the set of four markers 14 is captured in the respective image frame and within a boundary 16 interconnecting the respective markers 14 of the set of four markers. In another embodiment, the spatial scope 15 is defined if at least some of the markers 14 are captured in the respective image frame and within a boundary 16 interconnecting the captured markers 14 of the set of four markers.
(22) According to some embodiments, the processing device 4 may need to access information from an external source (not shown in the Figures), such as a server accessible through the Internet, to acknowledge and distinguish a configuration of markers. It is not necessary that all such information needed is (completely) stored locally.
(23) Aspects of the invention may be implemented according to one or more of the following embodiments:
Complete Marker Visibility
(24) Video transmission to the receiving device 2, 3 is calculated or performed for image data within boundary 16 (here: bounding box) of the set of markers only if the complete set of defined markers 14 is captured in the respective image frame.
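A minimal sketch of the complete-visibility rule, assuming that a marker detector (not shown) has already produced a mapping from marker identifiers to pixel positions; the function name `scope_if_complete` and the string identifiers are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Point = Tuple[int, int]            # (x, y) pixel position in the frame
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max), inclusive


def scope_if_complete(detected: Dict[str, Point],
                      required: Set[str]) -> Optional[Box]:
    """Return the bounding box of the marker set, or None if any is missing."""
    if not required or not required.issubset(detected):
        return None
    xs = [detected[m][0] for m in required]
    ys = [detected[m][1] for m in required]
    return min(xs), min(ys), max(xs), max(ys)
```

For example, with four required markers detected at (10, 5), (90, 5), (90, 60) and (10, 60), the returned box is (10, 5, 90, 60); if one of the four is not captured, the function returns `None` and no spatial scope is defined for that frame.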
Partial Marker Visibility
(25) Markers 14 are used to define a convex border (only contents within the hull are allowed to be transmitted to the receiving device 2, 3). Video transmission to the receiving device 2, 3 is calculated or performed only if at least two or three of the markers 14 are captured, and then for image data within boundary 16 (here: bounding box), if boundary 16 is of, e.g., rectangular or square shape, or within the convex hull of the set of markers 14.
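The convex-hull variant may be sketched as follows. Andrew's monotone chain is used here as one well-known hull algorithm; the embodiments do not name a specific algorithm, and the function names are assumptions of this sketch. In this sketch a scope is only reported once at least three captured markers span an area.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); its sign gives the turn direction."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; collinear points are dropped from the hull."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull: List[Point], p: Point) -> bool:
    """True if p lies inside or on a hull with at least three vertices."""
    if len(hull) < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % len(hull)], p) >= 0
               for i in range(len(hull)))
```

With only three of four markers captured, the hull shrinks to a triangle, so that a pixel which would be inside the full quadrilateral may fall outside the partial scope.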
Implicit Geometry Processing
(26) A specific spatial layout is defined by markers 14 as points of a closed polygon, wherein each marker 14 is uniquely identifiable and the relative position in space is decodable. Video transmission of data to the receiving device 2, 3 is calculated for the partial polygon reconstructed from captured markers 14. The geometry can be processed from a visual scan of markers 14.
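As a non-limiting sketch of the partial polygon, it is assumed here that each marker decodes to an integer identifier and that the layout fixes the order in which the markers form the closed polygon; both the identifier scheme and the function names are hypothetical. Membership is tested with even-odd ray casting, which also handles non-convex layouts.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def partial_polygon(layout_order: Sequence[int],
                    captured: Dict[int, Point]) -> List[Point]:
    """Vertices of the captured markers, in the order fixed by the layout."""
    return [captured[mid] for mid in layout_order if mid in captured]


def inside_polygon(polygon: List[Point], p: Point) -> bool:
    """Even-odd ray casting; works for convex and non-convex polygons."""
    if len(polygon) < 3:
        return False
    x, y = p
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[i - 1]          # previous vertex, wrapping around
        if (y1 > y) != (y2 > y):         # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Note that, in this sketch, omitting a marker at a concave corner enlarges the reconstructed polygon; an implementation may therefore wish to treat missing markers conservatively.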
(27)
(28) According to an embodiment, modulated light, e.g. from modulated LEDs (light emitting diodes), is used to define a spatial scope for content transmission to receiving device 2, 3. Image data of a respective image frame of video stream 10 is allowed to be transmitted only if light of a certain modulation can be recognized by the at least one camera 1 (or processing device 4); otherwise, it is not allowed to be transmitted. LEDs with specific signatures can be attached to devices of interest. If emitted light is captured, a respective transmission rule may be executed.
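One conceivable way of recognizing a modulation is sketched below under strong simplifying assumptions: the LED is on/off keyed with one symbol per image frame, the brightness of the LED image region has already been measured for consecutive frames, and the signature is a short bit pattern. None of these details, nor the function names or the threshold value, are given in the embodiment.

```python
from typing import List, Sequence


def to_bits(brightness: Sequence[float], threshold: float) -> List[int]:
    """One bit per frame: 1 if the LED region is brighter than the threshold."""
    return [1 if value > threshold else 0 for value in brightness]


def has_signature(bits: Sequence[int], signature: Sequence[int]) -> bool:
    """True if the signature occurs as a contiguous run within the bits."""
    n = len(signature)
    if n == 0:
        return False
    return any(list(bits[i:i + n]) == list(signature)
               for i in range(len(bits) - n + 1))


def transmission_allowed(brightness: Sequence[float],
                         signature: Sequence[int],
                         threshold: float = 128.0) -> bool:
    """Allow transmission only if the expected modulation is recognized."""
    return has_signature(to_bits(brightness, threshold), signature)
```

A constantly lit light source yields only ones and therefore does not match a signature that contains a zero, which is what distinguishes the modulated LED from ambient lighting in this sketch.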
(29)
(30) According to an embodiment, the spatial scope 35 is defined based on depth information (here: depth range 36) associated with at least one of the image frames when providing the first video stream 10. The depth information defines a visual depth from the at least one camera 1. In the present embodiment, the depth information may be provided from a depth sensor 34, e.g. fixedly associated with or installed on the camera 1, which measures a range of objects 31 within a field of view 37. A particular depth range 36 may be preset by the user or processing device 4 up to which captured objects 31 shall be contained in the video stream 20, whereas objects 31 outside this range shall be obfuscated, deleted, or restricted. The depth range 36 thus defines a visual depth from the camera 1 and accordingly a spatial scope 35, here from the lens of the camera 1 to the depth range 36.
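The depth-range rule may be illustrated by a per-pixel test on a depth map. The depth map layout, the unit (metres), the name `scope_from_depth` and the choice to treat pixels without a depth reading as outside the scope are assumptions of this sketch.

```python
from typing import List, Optional

DepthMap = List[List[Optional[float]]]  # metres per pixel; None = no reading


def scope_from_depth(depth: DepthMap, depth_range: float) -> List[List[bool]]:
    """True where a valid depth reading lies within the preset depth range."""
    return [[d is not None and 0.0 <= d <= depth_range for d in row]
            for row in depth]
```

Pixels for which the result is `False` would then be obfuscated, deleted, or restricted in the corresponding frame of the video stream 20.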
(31) According to embodiments, the depth information or range 36 is provided from the at least one depth sensor 34, a stereo camera, and/or from Depth-from-Motion differential images.
(32) According to further embodiments, depth information (such as the depth range 36) may additionally, or alternatively, be provided based upon at least one 2D/3D model of an object (such as one of the objects 31) registered in at least one of the image frames and tracked within the plurality of image frames of video stream 10. The depth information may then be derived from the at least one 2D/3D model and detection of the object in the respective image frame associated with the at least one 2D/3D model.
(33) According to an embodiment, calibrated additional depth information is used to restrict visual data to a defined visual depth, i.e., only visual data within a certain range is transmitted to the receiving device 2, 3.
(34) In a first variant thereof, depth information of a dedicated image and depth sensor, e.g., of a commonly known RGB-D sensor, is used to restrict the area of visual data transmission to the receiving device 2, 3. Only the contents of the video stream 10 within a defined distance from the camera 1 are transmitted to the receiving device 2, 3. Distance is represented by depth values of the sensor. An RGB-D sensor is a specific type of depth sensing device that works in association with an RGB camera and is able to augment a conventional image with depth information (related to the distance to the sensor), for example on a per-pixel basis.
(35) In a further variant, depth information is additionally or alternatively generated by a stereo camera to restrict the area of visual data transmission to receiving device 2, 3. Only contents of the video stream 10 within a defined distance from the camera 1 are transmitted to the receiving device 2, 3. For example, distance is represented by depth values of the sensor associated with (e.g. mounted on) the camera 1.
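For a rectified stereo pair, depth is commonly obtained from the pixel disparity d via the pinhole relation Z = f * B / d, with focal length f in pixels and baseline B in metres. This relation is standard stereo geometry and is not stated in the embodiment; the function and parameter names below are assumptions of this sketch.

```python
from typing import Optional


def depth_from_disparity(disparity_px: float,
                         focal_length_px: float,
                         baseline_m: float) -> Optional[float]:
    """Pinhole stereo relation Z = f * B / d; None if d is not positive."""
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


def within_distance(disparity_px: float, focal_length_px: float,
                    baseline_m: float, max_distance_m: float) -> bool:
    """True if the pixel is measurably within the defined distance."""
    depth = depth_from_disparity(disparity_px, focal_length_px, baseline_m)
    return depth is not None and depth <= max_distance_m
```

Pixels without a valid disparity are treated as not within the defined distance, so that they would not be transmitted unmodified.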
(36) In a further variant, depth information is additionally or alternatively generated by so-called Depth-from-Motion. That is, differential images, e.g., taken at different time instances are used to calculate depth information for defining the spatial scope of image data for restriction, obfuscating, or deletion.
(37) In a further variant, a model-based definition of the spatial scope may additionally or alternatively be applied. For example, a 2D/3D model of an object is recognized and tracked. For example, a 2D/3D model of an object 31 may be used which is registered in the respective image frame according to the position of the object 31, so that depth information may be generated for the object 31 from the 3D coordinates or dimensions of the model in three-dimensional space. Image information can then be obfuscated, deleted, or restricted based on the detection of the object 31. For example, the 2D/3D model can be obtained from CAD (computer aided design) data, video data, or blueprints known in the art.
(38) According to an embodiment, a MARS sensor (magnetic-acoustic-resonator sensor) (inertial measurement unit, IMU, etc.) is used to track the camera 1 and to obfuscate or delete or restrict the area or portion of a respective one of the image frames which is outside of the spatial scope for transmission to the receiving device 2, 3. According to another embodiment, a MARS sensor is supported by visual input for obfuscating or deleting or restricting the information (like with SLAM, Simultaneous Localization and Mapping). Further embodiments include location-based obfuscating or deletion or restriction, e.g. by external positioning/fencing employing, e.g., GPS, WiFi (wireless LAN), Bluetooth®, UWB (ultra-wideband), IR (infra-red) and/or other sensors.
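A location-based restriction by fencing may, for instance, compare a GPS position of the camera with a circular fence. The circular fence, the haversine great-circle distance on a spherical Earth model, and the names `haversine_m` and `inside_fence` are assumptions of this sketch; the embodiments leave the positioning and fencing technique open.

```python
import math
from typing import Tuple

LatLon = Tuple[float, float]   # (latitude, longitude) in degrees

EARTH_RADIUS_M = 6371000.0     # mean Earth radius, spherical approximation


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two positions."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def inside_fence(camera: LatLon, centre: LatLon, radius_m: float) -> bool:
    """True if the camera position lies within the circular fence."""
    return haversine_m(camera, centre) <= radius_m
```

Frames captured while `inside_fence` returns `False` would then be obfuscated, deleted, or restricted before transmission to the receiving device 2, 3.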
(39) Thus, according to aspects of the invention, the system and computer-implemented method as described herein may be employed, for example, for POV (point-of-view) video recording. During recording and streaming, with the surrounding environment constantly captured by the camera system, critical content within the video stream, for instance with respect to privacy, production line, products, procedures, etc., may be restricted, deleted, or obfuscated, so that corresponding problems with such critical content may be taken into account. Advantageously, the system and method do not require any elaborate user pre-settings or resource-intensive object or image recognition procedures for identifying admissible or critical objects in the environment. They are thus relatively user-friendly to handle and require relatively low resource capabilities for preventing objects of an environment from being viewed in a video stream of the environment.
(40) While the invention herein disclosed has been described by means of specific embodiments and applications thereof, numerous modifications and variations could be made thereto by those skilled in the art without departing from the scope of the invention set forth in the claims.