DYNAMIC TARGETING OF PREFERRED OBJECTS IN VIDEO STREAM OF SMARTPHONE CAMERA
20220269396 · 2022-08-25
Inventors
- Alexander Pashintsev (Cupertino, CA, US)
- Boris Gorbatov (Sunnyvale, CA, US)
- Eugene Livshitz (San Mateo, CA, US)
- Vitaly Glazkov (Moscow, RU)
CPC classification
G06F1/1694
PHYSICS
G06F3/04842
PHYSICS
G06F3/017
PHYSICS
G06F3/002
PHYSICS
H04M1/72454
ELECTRICITY
G06F1/1626
PHYSICS
International classification
G06F3/04842
PHYSICS
Abstract
Selecting objects in a video stream of a smart phone includes detecting quiescence of frame content in the video stream, detecting objects in a scene corresponding to the frame content, presenting at least one of the objects to a user of the smart phone, and selecting at least one of the objects in a group of objects in response to input by the user. Detecting quiescence of frame content in the video stream may include using motion sensors in the smart phone to determine an amount of movement of the smart phone. Detecting quiescence of frame content in the video stream may include detecting changes in view angles and distances of the smart phone with respect to the scene. Detecting objects in a scene may use heuristics, custom user preferences, and/or specifics of scene layout. At least one of the objects may be a person or a document.
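As an illustrative sketch (not part of the original disclosure), the quiescence detection described in the abstract could be implemented by watching recent motion-sensor magnitudes from the smartphone; the threshold and window size below are assumed values, not ones specified by the application:

```python
from collections import deque


class QuiescenceDetector:
    """Declares the frame content quiescent when recent motion-sensor
    readings stay below a threshold for a full observation window."""

    def __init__(self, threshold=0.05, window=30):
        self.threshold = threshold           # max tolerated motion magnitude
        self.samples = deque(maxlen=window)  # most recent sensor magnitudes

    def add_sample(self, magnitude):
        """Record one accelerometer/gyroscope magnitude sample."""
        self.samples.append(magnitude)

    def is_quiescent(self):
        # Quiescent only once the window is full and every sample is small.
        return (len(self.samples) == self.samples.maxlen
                and all(m < self.threshold for m in self.samples))
```

A real implementation would feed this from the device's inertial sensors (and could additionally compare view angles and distances between frames, as the abstract suggests).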
Claims
1. A method of capturing a subset of objects within a video stream captured by an electronic device, the method comprising: receiving a video stream captured by an electronic device; detecting within a frame of the video stream one or more objects for capture; determining a plurality of scenarios based on the one or more objects, wherein each scenario of the plurality of scenarios is a distinct subset of the one or more objects; displaying, via the electronic device, the frame of the video stream in conjunction with a first scenario of the plurality of scenarios; responsive to a first user input rejecting the first scenario of the plurality of scenarios, displaying, via the electronic device, the frame of the video stream in conjunction with a second scenario of the plurality of scenarios; and responsive to a second user input selecting the second scenario of the plurality of scenarios, extracting a respective subset of the one or more objects corresponding to the second scenario from the frame of the video stream.
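The accept/reject loop of claim 1 can be summarized as a sketch (illustrative only; scenario contents and the `user_response` callback are assumptions, not the claimed implementation):

```python
def select_scenario(scenarios, user_response):
    """Walks the scenario list in order; user_response(scenario) returns
    True to accept or False to reject, mirroring the first and second
    user inputs of claim 1. Returns the accepted scenario, or None."""
    for scenario in scenarios:
        if user_response(scenario):
            return scenario  # extract this subset of detected objects
    return None              # user rejected every scenario


# Example: each scenario is a distinct subset of the detected objects.
scenarios = [{"person"}, {"person", "document"}, {"document"}]
chosen = select_scenario(scenarios, lambda s: "document" in s)
```

In the claimed method the rejected scenario's frame overlay would be replaced by the next scenario's overlay before the next user input is collected.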
2. The method of claim 1, further comprising: after determining the plurality of scenarios, pre-selecting the first scenario of the plurality of scenarios to be displayed, via the electronic device, based on a third user input.
3. The method of claim 2, wherein the third user input includes one or more of a change in a view angle and a change in a distance of the electronic device with respect to the one or more objects.
4. The method of claim 1, wherein: displaying the frame of the video stream in conjunction with the first scenario of the plurality of scenarios includes displaying the first scenario with an overlay highlighting a respective subset of the one or more objects corresponding to the first scenario; and displaying the frame of the video stream in conjunction with the second scenario of the plurality of scenarios includes displaying the second scenario with an overlay highlighting a respective subset of the one or more objects corresponding to the second scenario.
5. The method of claim 1, wherein the one or more objects include one or more of a person and a document.
6. The method of claim 1, further comprising: after extracting the respective subset of the one or more objects corresponding to the second scenario from the frame of the video stream, displaying, via the electronic device, the respective subset of the one or more objects corresponding to the second scenario.
7. The method of claim 1, further comprising: after extracting the respective subset of the one or more objects corresponding to the second scenario from the frame of the video stream, displaying, via the electronic device, one or more affordances including: a first affordance that allows the user to store the respective subset of the one or more objects corresponding to the second scenario, and a second affordance that allows the user to share the respective subset of the one or more objects corresponding to the second scenario.
8. The method of claim 1, wherein detecting within the frame of the video stream the one or more objects for capture includes one or more of determining one or more objects in focus, determining one or more objects with a predetermined distance relative to the electronic device, and determining one or more unobstructed objects.
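A minimal sketch of the candidate filtering in claim 8, assuming hypothetical per-object fields for focus, distance, and obstruction (the field names and distance limit are illustrative):

```python
def detect_capture_candidates(objects, max_distance=2.0):
    """Keeps objects that are in focus, within a preset distance of the
    electronic device, and unobstructed, per claim 8's three criteria."""
    return [obj for obj in objects
            if obj["in_focus"]
            and obj["distance_m"] <= max_distance
            and not obj["obstructed"]]
```

Any one of the three tests could be used alone, since the claim recites "one or more of" the criteria.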
9. The method of claim 1, wherein the first user input includes one or more of selection of a rejection affordance displayed on the electronic device and a rejection gesture including shaking the electronic device left-and-right.
10. The method of claim 1, wherein the second user input includes one or more of selection of an approval affordance displayed on the electronic device, allowing a predetermined amount of time to elapse without moving the electronic device, eye-tracking, spatial gestures, and facial expressions.
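One of the approval inputs in claim 10, letting a predetermined amount of time elapse without moving the device, amounts to a dwell timer. A hedged sketch (the clock injection and two-second hold are assumptions for testability, not claimed parameters):

```python
import time


class DwellAcceptor:
    """Treats the displayed scenario as accepted once `hold_seconds`
    elapse without device movement, per one option in claim 10."""

    def __init__(self, hold_seconds=2.0, now=time.monotonic):
        self.hold = hold_seconds
        self.now = now
        self.still_since = None  # timestamp when the device last became still

    def update(self, moving):
        """Call per sensor tick with the current movement flag; returns
        True once the device has been still for the full hold period."""
        t = self.now()
        if moving:
            self.still_since = None  # movement resets the dwell timer
            return False
        if self.still_since is None:
            self.still_since = t
        return t - self.still_since >= self.hold
```

The same update loop could be fed by the quiescence test used for scene analysis, so acceptance and stillness detection share one motion signal.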
11. An electronic device, comprising: one or more processors; and memory storing one or more programs for execution by the one or more processors, the one or more programs including instructions for: receiving a video stream captured by the electronic device; detecting within a frame of the video stream one or more objects for capture; determining a plurality of scenarios based on the one or more objects, wherein each scenario of the plurality of scenarios is a distinct subset of the one or more objects; displaying, via the electronic device, the frame of the video stream in conjunction with a first scenario of the plurality of scenarios; responsive to a first user input rejecting the first scenario of the plurality of scenarios, displaying, via the electronic device, the frame of the video stream in conjunction with a second scenario of the plurality of scenarios; and responsive to a second user input selecting the second scenario of the plurality of scenarios, extracting a respective subset of the one or more objects corresponding to the second scenario from the frame of the video stream.
12. The electronic device of claim 11, wherein the one or more programs further include instructions for: after determining the plurality of scenarios, pre-selecting the first scenario of the plurality of scenarios to be displayed, via the electronic device, based on a third user input.
13. The electronic device of claim 12, wherein the third user input includes one or more of a change in a view angle and a change in a distance of the electronic device with respect to the one or more objects.
14. The electronic device of claim 11, wherein: displaying the frame of the video stream in conjunction with the first scenario of the plurality of scenarios includes displaying the first scenario with an overlay highlighting a respective subset of the one or more objects corresponding to the first scenario; and displaying the frame of the video stream in conjunction with the second scenario of the plurality of scenarios includes displaying the second scenario with an overlay highlighting a respective subset of the one or more objects corresponding to the second scenario.
15. The electronic device of claim 11, wherein the one or more objects include one or more of a person and a document.
16. A non-transitory computer-readable storage medium storing one or more programs for execution by one or more processors of an electronic device, the one or more programs comprising instructions for: receiving a video stream captured by the electronic device; detecting within a frame of the video stream one or more objects for capture; determining a plurality of scenarios based on the one or more objects, wherein each scenario of the plurality of scenarios is a distinct subset of the one or more objects; displaying, via the electronic device, the frame of the video stream in conjunction with a first scenario of the plurality of scenarios; responsive to a first user input rejecting the first scenario of the plurality of scenarios, displaying, via the electronic device, the frame of the video stream in conjunction with a second scenario of the plurality of scenarios; and responsive to a second user input selecting the second scenario of the plurality of scenarios, extracting a respective subset of the one or more objects corresponding to the second scenario from the frame of the video stream.
17. The non-transitory computer-readable storage medium of claim 16, wherein the one or more programs further include instructions for: after determining the plurality of scenarios, pre-selecting the first scenario of the plurality of scenarios to be displayed, via the electronic device, based on a third user input.
18. The non-transitory computer-readable storage medium of claim 17, wherein the third user input includes one or more of a change in a view angle and a change in a distance of the electronic device with respect to the one or more objects.
19. The non-transitory computer-readable storage medium of claim 16, wherein: displaying the frame of the video stream in conjunction with the first scenario of the plurality of scenarios includes displaying the first scenario with an overlay highlighting a respective subset of the one or more objects corresponding to the first scenario; and displaying the frame of the video stream in conjunction with the second scenario of the plurality of scenarios includes displaying the second scenario with an overlay highlighting a respective subset of the one or more objects corresponding to the second scenario.
20. The non-transitory computer-readable storage medium of claim 16, wherein the one or more objects include one or more of a person and a document.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] Embodiments of the system described herein will now be explained in more detail in accordance with the figures of the drawings, which are briefly described as follows.
[0027]
[0028]
[0029]
[0030]
[0031]
DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS
[0032] The system described herein provides a mechanism for identifying preferred objects in frames of a preview video stream of a smartphone camera, building possible object selection scenarios, providing a user with choice options and tools, and creating photographs of the chosen objects or their combinations for subsequent use.
[0033]
[0034]
[0035]
[0036] In contrast with
[0037]
[0038] An original position of the smartphone 110 with the camera 120 indicates the frame 340 pre-selected by the user according to
[0039] Referring to
[0040] If it is determined at the test step 515 that the change in position and view angle (if applicable) of the device is not occurring rapidly, processing proceeds from the step 515 to a step 525, where the system enters a scene analysis mode. After the step 525, processing proceeds to a step 530, where the system selects a frame from the video stream for processing. After the step 530, processing proceeds to a step 535, where the system detects preferred object candidates in the scene. After the step 535, processing proceeds to a test step 540, where it is determined whether a set of candidates for preferred objects is stable over a period of time (time-based sequencing for scene analysis, described elsewhere herein, is not shown in
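The stability check at step 540 can be sketched as counting consecutive frames in which the detected candidate set does not change (an illustrative reading of the paragraph; the required repeat count is an assumed parameter):

```python
class StabilityTracker:
    """Counts consecutive frames whose detected candidate set is
    unchanged; the set is considered stable after `required` repeats."""

    def __init__(self, required=5):
        self.required = required
        self.last = None   # candidate set seen in the previous frame
        self.count = 0     # consecutive frames with an identical set

    def update(self, candidates):
        """Feed one frame's candidate set; returns True when stable."""
        candidates = frozenset(candidates)
        self.count = self.count + 1 if candidates == self.last else 1
        self.last = candidates
        return self.count >= self.required
```

Once `update` returns True, processing would continue to scenario construction; any change in the candidate set restarts the count.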
[0041] After the step 555, processing proceeds to a step 560, where a first scenario is selected and a corresponding pictogram for displaying to the user is built, as illustrated, for example, by items 420a-420e in
[0042] Various embodiments discussed herein may be combined with each other in appropriate combinations in connection with the system described herein. Additionally, in some instances, the order of steps in the flowcharts, flow diagrams and/or described flow processing may be modified, where appropriate. Similarly, elements and areas of screens described in screen layouts may vary from the illustrations presented herein. Further, various aspects of the system described herein may be implemented using software, hardware, a combination of software and hardware and/or other computer-implemented modules or devices having the described features and performing the described functions. The smartphone may include software that is pre-loaded with the device, installed from an app store, installed from a desktop (after possibly being pre-loaded thereon), installed from media such as a CD, DVD, etc., and/or downloaded from a Web site. The smartphone 110 may use an operating system selected from the group consisting of: iOS, Android OS, Windows Phone OS, Blackberry OS and mobile versions of Linux OS. The smartphone 110 may be connected by various types of wireless and other connections, such as cellular connections in Wide Area Networks, Wi-Fi, Bluetooth, NFC, USB, infrared, ultrasound and other types of connections. A mobile device other than a smartphone may be used. Note that the system described herein may be used with other devices capable of taking a photograph and providing appropriate feedback to a user, such as a wireless digital camera with a screen for providing messages to the user and a mechanism for providing an intermediate image stream.
[0043] Software implementations of the system described herein may include executable code that is stored in a computer readable medium and executed by one or more processors. The computer readable medium may be non-transitory and include a computer hard drive, ROM, RAM, flash memory, portable computer storage media such as a CD-ROM, a DVD-ROM, a flash drive, an SD card and/or other drive with, for example, a universal serial bus (USB) interface, and/or any other appropriate tangible or non-transitory computer readable medium or computer memory on which executable code may be stored and executed by a processor. The software may be bundled (pre-loaded), installed from an app store or downloaded from a location of a network operator. The system described herein may be used in connection with any appropriate operating system.
[0044] Other embodiments of the invention will be apparent to those skilled in the art from a consideration of the specification or practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.