Method and system for dynamically analyzing, modifying, and distributing digital images and video
11605227 · 2023-03-14
Inventors
- David M Ludwigsen (Riverside, CA, US)
- Dirk Dewar Brown (Lexington, SC, US)
- Mark Bradshaw (Weston, FL, US)
CPC classification
G06V10/255
PHYSICS
G06F18/2137
PHYSICS
H04N21/23424
ELECTRICITY
G06V20/46
PHYSICS
H04N5/2723
ELECTRICITY
G06F16/7837
PHYSICS
International classification
H04N21/234
ELECTRICITY
Abstract
The present invention discloses a new method for analyzing, modifying, and distributing digital images and video in a quick, efficient, practical, and/or cost-effective way. The method of processing video can take a given region or object and replace the pixels in the frames of the scenes that comprise the features and characteristics of the identified region or object with a different set of pixels. The replacement or other customization of the frames and scenes leads to a naturally integrated video or image whose modification is indistinguishable to the human eye or other visual system. In one embodiment, this invention can be used to insert different advertising elements into an image or set of images for different viewers, or to enable a viewer to control elements within a video and add their own preferences or other elements.
Claims
1. A computer implemented method for modifying a video, the method comprising: i) Identifying, on a processor, one or more elements in each frame of a video in view of one or more characteristics associated with each element and stored in an object database; ii) Detecting, on said processor, a zone on a frame suitable for modification; iii) Identifying, on said processor, one or more scenes in said video by comparing the elements in each frame with the elements in the previous frame and subsequent frame, wherein frames having common elements above a threshold number will be considered to be in the same scene; iv) Constructing, on said processor, one or more 3D spatial maps with all elements in each frame based on the characteristics in one or more previous frames and one or more subsequent frames in each scene; and v) by applying said one or more 3D spatial maps, modifying, via said processor, said zone in all frames associated with all elements in said zone in one or more scenes, thus modifying said video, wherein all frames with said modified zone appear to be part of the original frames in a 3D environment prior to said modification.
2. The method of claim 1, wherein said one or more 3D spatial maps construct all elements in a 3D environment based on said one or more characteristics.
3. The method of claim 1, wherein said one or more characteristics comprise position, dimension, reflection, lighting, shadows, warping, rotation, blurring and occlusion in a 3D environment.
4. The method of claim 1, wherein said object database comprises one or more element-identification algorithms.
5. The method of claim 1, wherein said one or more elements are objects, regions, or parts thereof in all frames.
6. The method of claim 1, wherein said zone for modification is detected by a detection algorithm which is stored in a detection algorithm database or selected in view of an input from a user.
7. The method of claim 1, wherein said one or more said scenes of step (iii) are stored in a scene database.
8. The method of claim 1, wherein, in step (v), said zone is modified by: a) removing one or more selected elements from said zone in all frames containing said one or more selected elements in said one or more scenes and adjusting said zone without said one or more selected elements in all frames being modified by applying said one or more 3D spatial maps; b) removing one or more selected elements from all frames containing said one or more selected elements, applying a new element to said zone in said one or more scenes and adjusting said zone with said new elements in all frames being modified by applying said one or more 3D spatial maps; or c) warping a desired element therein in all frames containing said desired element in said one or more scenes and adjusting said zone with said warped element in all frames being modified by applying said one or more 3D spatial maps.
9. The method of claim 8, wherein said new element is an advertisement image or element.
10. The method of claim 1, wherein said video is modified in a real-time manner or near real-time manner.
11. The method of claim 1, further comprising delivering the modified video of step (v) by streaming or downloading.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
DETAILED DESCRIPTION OF THE INVENTION
(8) The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
(9) A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough ‘understanding’ of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
(10) The present invention relates to a system and method in which a video may be broken down into scenes that may relate to each other, and into objects and areas that exist across a single scene or multiple scenes. By doing this, the invention allows a user to ‘understand’ the context of a video, and allows the video to be quickly, efficiently, realistically, and/or inexpensively customized by pre-processing the identified scenes and frames of the video together with the identified objects and areas and other types of metadata associated with an object or area or with the scene itself. These elements can then be replaced, altered, or removed.
(11) In some embodiments of the present invention, a system and method for identifying and tracking scenes, objects, or all or a portion of an area in a video is described. In some embodiments, the method is configured to identify scenes that relate to each other during the video. In some embodiments, in each related scene, objects or parts of an object, or areas or parts of an area, are identified along with associated characteristics such as lighting, shadows, and occlusion, which are used to calculate a set of algorithms for each pixel in each object or area that can be applied for rapid replacement of all pixels in each object or area and its associated characteristics. In some embodiments, the method is used to allow a user or machine to replace the identified object or area in each scene in the video with a logo, object, or replacement image such that the resultant logo, object, or replacement image would appear to have been there all along. In some embodiments, the method is used to allow a user or machine to remove the object or area such that it would appear to have never been there at all. In some embodiments, the method is used to reconstruct 3D spatial maps for each frame.
(13) In some embodiments of the present invention, a system and method for analyzing and correlating scenes are described. A scene is categorized as one or more frames in a video that are related in some way. Frames in a video are analyzed in a scene preprocessing stage in which element-identification algorithms are used to identify elements within each frame of the video to determine which frames are associated with each other. The algorithms identify objects, like-pixel areas, sequences of continuous action, lighting, locations, and other elements that can be compared from frame to frame. As an example, a car chase sequence may be identified by identifying two cars and the characteristics of each car (color, type, branding), the drivers of each car, the surrounding location where the cars are driven, and other identifiable elements in continuous frames, and by assigning a weight to each identifiable object, area, or characteristic in order to compare it to a previous or subsequent frame. In a different example, a bedroom location may be automatically detected by identifying furniture and the associated characteristics of each piece (e.g., color, type, branding, scratches), and other elements such as artwork on the wall, carpeting, doors, etc. These objects or areas can be identified by a variety of algorithms including, but not limited to, DRIFT, KAZE, SIFT (Scale-Invariant Feature Transform), SURF (Speeded Up Robust Features), Haar classifiers, and FLANN (Fast Library for Approximate Nearest Neighbors). When the number of common elements identified between two sequential frames, or groups of frames, decreases past a threshold number, the scene is considered to have changed. In a normal sequential scene change, the number of common elements drops from a large number within a scene to zero for the next scene.
During fades or other gradual scene transitions, groups of frames can be used to determine the transition point from one scene to another. For example, where a scene starts to fade into another, the element-identification algorithms begin to identify fewer common elements in sequential frames and pick up an increasing number of common elements in a new set of frames belonging to the next scene. The transition point between scenes can be determined in a number of ways, including the midpoint of the faded transition, defined as the frame number halfway between the last frame of the first scene in which the maximum number of common elements can be identified and the first frame of the second scene in which the maximum number of common elements can be identified. The transition point in fading from one scene to another can also be defined differently for different elements in the scene, depending on when each element first fades in or fades out.
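The threshold-based scene segmentation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: each frame is represented as a set of element tags standing in for the output of the element-identification algorithms, and the names `segment_scenes` and `threshold` are assumptions for the example.

```python
def segment_scenes(frame_elements, threshold=2):
    """Group consecutive frames into scenes.

    frame_elements: list of sets, one per frame, holding the tags of
    elements identified in that frame (a stand-in for detector output).
    A new scene starts when the number of elements shared with the
    previous frame falls below `threshold`, per the hard-cut case
    described in the text.
    """
    scenes, current = [], [0]
    for i in range(1, len(frame_elements)):
        common = frame_elements[i] & frame_elements[i - 1]
        if len(common) < threshold:
            scenes.append(current)   # common elements dropped: scene change
            current = [i]
        else:
            current.append(i)
    scenes.append(current)
    return scenes
```

On the car-chase/bedroom example, four frames whose element sets change from `{"car", "road"}`-like to `{"bed", "lamp"}`-like split into two scenes; the fade-transition case would instead compare groups of frames rather than single adjacent pairs.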
(14) Once the object and area comparisons in a previous or subsequent frame have determined that the current frame belongs to a different scene, the previous scene and all of its characteristics can be stored in a database, as shown in
(15) A scene processing server, as shown in
(16) In some embodiments of the present invention, a system and method for analyzing frames in a video are described. Individual frames from a video are analyzed through a frame preprocessing stage to automatically identify all objects and areas by comparing to a database of previously trained objects, areas, locations, actions, and other representations and by finding contiguous areas of space by examining like adjacent pixels. As illustrated in
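The "contiguous areas of space by examining like adjacent pixels" step above can be sketched as a flood fill. This is a hedged illustration: the grayscale 2D-list frame representation, the `tol` tolerance parameter, and the function name are assumptions for the example, not details from the patent.

```python
from collections import deque

def like_pixel_region(img, seed, tol=10):
    """Return the coordinates of the contiguous region around `seed`
    whose pixel values differ from the seed value by at most `tol`.

    img: 2D list of ints (a grayscale stand-in for a video frame).
    Uses a breadth-first flood fill over 4-connected neighbors.
    """
    h, w = len(img), len(img[0])
    sy, sx = seed
    base = img[sy][sx]
    region, queue = {(sy, sx)}, deque([(sy, sx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and (ny, nx) not in region
                    and abs(img[ny][nx] - base) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region
```

A production system would likely operate on multi-channel pixels and combine this with the trained-object comparison the paragraph describes; the sketch shows only the like-pixel grouping.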
(17) After identification, the associated characteristics of each object or area, such as lighting, shadows, warping, rotation, blurring, and occlusion can be determined in the overall frame, as shown in
(18) Identification databases are collections of datasets that identify specific objects, areas, actions such as playing a football game, locations such as cities, or environments such as a beach. The system uses multiple methods of collecting this data for comparison and later identification of specific objects, areas, actions, locations, and environments. The identification databases are broken into multiple specific subcategories of groupings of objects, with tags associated with them for identification. Reducing the databases, or “datasets,” into specific datasets allows the method to search n number of datasets on specific identification worker nodes very quickly (in less than the time needed to create or render a frame). This allows the invention to process a single frame against n number of nodes, each with its own set of datasets, so that millions of objects can be processed in the time it takes to process a single frame and the video can be played back at near the speed of standard video buffering and playback.
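The fan-out of one frame across n dataset shards can be sketched with a thread pool standing in for the identification worker nodes. This is a simplified assumption-laden sketch: shards are modeled as dicts mapping element tags to feature signatures, and "matching" is plain set membership rather than a real classifier.

```python
from concurrent.futures import ThreadPoolExecutor

def identify(frame_features, shards):
    """Fan a frame out to one worker per dataset shard and merge hits.

    frame_features: set of feature signatures extracted from the frame.
    shards: list of dicts mapping element tags to signatures
    (hypothetical stand-ins for the trained identification datasets).
    Each shard is searched concurrently, mirroring the per-node
    dataset split described in the text.
    """
    def match(shard):
        return {tag for tag, sig in shard.items() if sig in frame_features}

    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        hits = pool.map(match, shards)
    return set().union(*hits)
```

In the architecture described, each shard would live on its own worker node and hold a trained metadata file; the thread pool here merely illustrates that the per-shard searches are independent and can run in parallel.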
(19) Another aspect of the invention involves a tool for gathering and training image datasets of specific objects that can be used for identification both in the preprocessing phase and in the replacement or removal phase. By using image analysis and training algorithms such as, but not limited to, PICO, Haar classifiers, and supervised learning, the tool can quickly collect and, if necessary, crop image data from either locally stored image sets or the internet by searching key tags of desired images. Once collected and cropped, the image data is converted into trained metadata files that can be placed onto server nodes and used to identify specific items or groupings of items on a per-thread/node/server basis at a later time. In another embodiment, this gathering and training process may be done on local computers or may be split across networked servers for faster training. The tool allows for testing against multiple datasets to make sure that trained datasets are working properly before being stored in an identification database and deployed to servers.
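The pre-deployment testing step mentioned above can be sketched as a simple held-out validation gate. The `classifier` callable, the accuracy metric, and the `min_accuracy` threshold are all illustrative assumptions; the patent does not specify how dataset quality is measured.

```python
def validate_dataset(classifier, labeled_samples, min_accuracy=0.95):
    """Check a trained dataset against held-out labeled samples before
    it is stored in an identification database and deployed to servers.

    classifier: callable mapping a sample to a predicted tag
    (a hypothetical stand-in for a trained Haar/PICO model).
    labeled_samples: list of (sample, expected_tag) pairs.
    Returns (passed, accuracy).
    """
    correct = sum(1 for sample, tag in labeled_samples
                  if classifier(sample) == tag)
    accuracy = correct / len(labeled_samples)
    return accuracy >= min_accuracy, accuracy
```

A dataset that fails the gate would be retrained or augmented with more tagged images before deployment, matching the workflow the paragraph describes.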
(20) Another aspect of the invention employs high-speed detection of items within the video. This process, which uses the methods from an identification database, benefits the system by extracting information about the video that can help identify the interests of the viewer. For example, the process can be used to detect human faces, logos, specific text, pornographic images, violent scenes, adult content, etc. As another example, the identified elements can be further customized by replacement, removal, or other modifications. Using this user-interest metadata, and combining it with other sets of information that define what a user is interested in, as shown in
(21) Another aspect of the invention involves creation of a 3D spatial map of each frame that consists of all of the objects, areas, light sources, shadows, and occluded objects that have been identified, as well as the context of each frame. As the invention is able to identify objects, areas, locations, environments, and other important data required for a complete ‘understanding’ of a 3D scene, such as shadows, lighting, and occlusion, the invention can reconstruct all or a portion of a 3D environment by using such data.
(22) Another aspect of the invention allows a user to select replacement zones by finding a specific frame in the video where they believe a replacement is warranted. The algorithms then search all related scenes, as well as previous and subsequent frames, to perform the replacement across the full span of video. Users can either select an area to serve as the replacement zone, or select a single point and allow the system to detect the extent of the replacement zone based on the user's input/suggestion.
(23) Another aspect of the invention involves high-speed, distributed replacement of objects or areas in n number of frames. Once an object or area has been identified for replacement, alteration, or removal, and the object maps have been generated (which may or may not occur prior to the completion of the entire video pre-processing), the system can identify n number of alteration worker nodes that can work on each individual frame in which the object has been identified as existing and marked for replacement, alteration, or removal, with each node processing that particular object or area, or a collective set of overlapping objects or areas. By doing this, the alteration process for m number of objects or areas can be near real-time.
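The per-frame distribution of the alteration work can be sketched as follows, with one worker per frame and a precomputed per-frame zone mask. This is a toy model under stated assumptions: frames are 2D lists of scalar pixel values, the zone is a set of coordinates, and the "patch" is a single replacement value rather than a warped, lighting-adjusted image.

```python
from concurrent.futures import ThreadPoolExecutor

def replace_zone(frames, zone_by_frame, patch):
    """Replace a zone's pixels in every frame where it was found,
    dispatching one worker per frame as in the distributed scheme.

    frames: list of 2D lists (grayscale stand-ins for video frames).
    zone_by_frame: dict mapping frame index -> set of (y, x)
    coordinates for the zone in that frame (the per-frame object map).
    patch: replacement pixel value. Returns new frames; the originals
    are left untouched.
    """
    def alter(i):
        out = [row[:] for row in frames[i]]          # copy the frame
        for y, x in zone_by_frame.get(i, ()):        # apply the zone map
            out[y][x] = patch
        return out

    with ThreadPoolExecutor() as pool:
        return list(pool.map(alter, range(len(frames))))
```

In the full system each worker would apply the 3D spatial map (lighting, shadows, warping, occlusion) rather than a flat value, but the frame-level parallelism is the same.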
(24) Referring to
(25) In one embodiment, the method for processing a video in this invention is characterized by: (i) Identifying one or more elements in each frame of said video; (ii) Identifying one or more scenes from said video by comparing the elements in each frame with the elements in the previous frame and subsequent frame, wherein frames having common elements above a threshold number will be considered to be in the same scene; (iii) Obtaining one or more associated characteristics for each element in each frame; (iv) Generating a map of the 3D environment in each frame based on the associated characteristics in one or more previous frames and one or more subsequent frames; and (v) Modifying one or more scenes in said video based on said map.
(26) In one embodiment, the method disclosed in this invention is further characterized in that the element in step (i) may be an object or a selected area in a scene of said video, with the one or more elements in step (i) identified by comparison with the characteristics stored in an object database; said element may be detected automatically by a detection algorithm stored in a detection algorithm database, or selected via user input.
(27) In one embodiment, the method disclosed in this invention further comprises the step (ii), in which the two or more said scenes are correlated by the elements in each of said scenes and stored in a scene database.
(28) In one embodiment, the associated characteristics in step (iii) include, but are not limited to, position, dimension, reflection, lighting, shadows, warping, rotation, blurring and occlusion.
(29) In one embodiment, the step (v) comprises modifying the one or more scenes by removing one or more elements and applying the map generated in the step (iv) to average the one or more removed elements in each frame within one or more scenes.
(30) In one embodiment, the step (v) comprises modifying said one or more scenes by warping a desired element and applying the map generated in the step (iv) over the desired element in each frame within one or more scenes.
(31) In one embodiment, the method disclosed in this invention further comprises delivering the modified video of step (v) by streaming or downloading.
(32) In one embodiment, the computer-implemented system for processing a video in this invention can be, but is not necessarily, characterized by the steps below: (i) Identifying one or more elements in each frame of said video; (ii) Identifying one or more scenes from said video by comparing the elements in each frame with the elements in the previous frame and subsequent frame, wherein frames having common elements above a threshold number will be considered to be in the same scene; (iii) Obtaining one or more associated characteristics for each element in each frame; (iv) Generating a map of the 3D environment in each frame based on the associated characteristics in one or more previous frames and one or more subsequent frames; and (v) Modifying one or more scenes in said video based on said map.
(33) In one embodiment, the computer-implemented system disclosed in this invention is further characterized in that the element in step (i) can be an object or a selected area in a scene of said video, and the one or more elements in step (i) can be identified by comparison with the characteristics stored in an object database; said element may be detected automatically by a detection algorithm stored in a detection algorithm database, or selected via user input.
(34) In one embodiment, the computer-implemented system disclosed in this invention further comprises the step (ii), in which the two or more said scenes are correlated by the elements in each of said scenes and stored in a scene database.
(35) In one embodiment, the associated characteristics in step (iii) include, but are not limited to, position, dimension, reflection, lighting, shadows, warping, rotation, blurring and occlusion.
(36) In one embodiment, the step (v) could comprise modifying the one or more scenes by removing one or more elements and applying the map generated in the step (iv) to average the one or more removed elements in each frame within one or more scenes.
(37) In one embodiment, the step (v) could comprise modifying said one or more scenes by warping a desired element and applying the map generated in the step (iv) over the desired element in each frame within one or more scenes.
(38) In one embodiment, the computer-implemented system disclosed in this invention could further comprise delivering the modified video of step (v) by streaming or downloading.