THREE DIMENSIONAL MULTIPLE OBJECT TRACKING SYSTEM WITH ENVIRONMENTAL CUES
20180286259 · 2018-10-04
Inventors
Cpc classification
A63F13/92
HUMAN NECESSITIES
G06F3/04815
PHYSICS
G06F3/011
PHYSICS
A63F13/577
HUMAN NECESSITIES
A63F13/5372
HUMAN NECESSITIES
A63F13/235
HUMAN NECESSITIES
G09B5/02
PHYSICS
G09B19/00
PHYSICS
International classification
G09B5/02
PHYSICS
Abstract
A multiple object tracking system has a system controller with a placement block placing target objects and distractor objects within a 3D display space upon a representation of a solid ground, an assignment block assigning respective trajectories for movement of each of the objects, and an animation block defining an animated sequence of images showing the ground and the objects following the respective trajectories. A visual display presents images to a user including the animated sequence of images and a ground representation. A manual input device is adapted to respond to manual input from the user to select objects believed to be the target objects after presentation of the animated sequence. Preferably, the animation block incorporates a plurality of 3D cues applied to each of the objects, such as 3D perspective, parallax, 3D illumination, binocular disparity, and differing occlusion.
Claims
1. A multiple object tracking system, comprising: a system controller having a placement block placing target objects and distractor objects within a 3D display space upon a representation of a solid ground within the display space, an assignment block assigning respective trajectories for movement of each of the objects along the ground, and an animation block defining an animated sequence of images showing the ground and the objects following the respective trajectories; a visual display presenting images from the system controller to a user, wherein the presented images include the animated sequence of images; and a manual input device coupled to the system controller adapted to respond to manual input from the user to select objects believed to be the target objects after presentation of the animated sequence.
2. The system of claim 1 wherein the animation block incorporates a plurality of 3D cues applied to each of the objects, and wherein the 3D cues are comprised of at least one of 3D perspective, parallax, and 3D illumination.
3. The system of claim 2 wherein perspective is comprised of distance scaling and convergence.
4. The system of claim 2 wherein 3D illumination is comprised of shading and shadowing.
5. The system of claim 2 wherein the visual display presents stereoscopic views to a left eye and a right eye of the user, and wherein the 3D cues are comprised of at least one of binocular disparity and differing occlusion.
6. The system of claim 1 wherein the respective trajectories includes at least one curved path.
7. The system of claim 1 wherein the respective trajectories includes at least one path having a collision followed by a rebound segment along the ground.
8. The system of claim 1 wherein the presentation of images by the visual display includes an indication phase identifying the target objects, a mixing phase advancing through the animated sequence of images with the objects following the respective trajectories, and a selection phase responsive to the manual input.
9. A method for multiple object tracking comprising the steps of: placing target objects and distractor objects within a 3D display space upon a representation of a solid ground within a display space; assigning respective trajectories for movement of each of the objects along the ground; defining an animated sequence of images showing the ground and the objects following the respective trajectories; presenting the animated sequence of images to a user; receiving manual input from a user selecting objects believed by the user to be the target objects after presentation of the animated sequence; and updating a user score in response to comparing identities of the target objects to select objects.
10. The method of claim 9 further comprising the step of incorporating a plurality of 3D cues applied to each of the objects, wherein the 3D cues are comprised of at least one of 3D perspective, parallax, and 3D illumination.
11. The method of claim 10 wherein perspective is comprised of distance scaling and convergence.
12. The method of claim 10 wherein 3D illumination is comprised of shading and shadowing.
13. The method of claim 10 wherein the step of presenting the animated sequence of images include respective stereoscopic views presented to a left eye and a right eye of the user, and wherein the 3D cues are comprised of at least one of binocular disparity and differing occlusion.
14. The method of claim 9 wherein the respective trajectories includes at least one curved path.
15. The method of claim 9 wherein the respective trajectories includes at least one path having a collision followed by a rebound segment along the ground.
16. The method of claim 9 comprising an indication phase identifying the target objects, a mixing phase advancing through the animated sequence of images with the objects following the respective trajectories, and a selection phase responsive to the manual input.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0010]
[0011]
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0020] The present invention is a system and method for training and evaluating a user (i.e., a person) on the cognitive capacity for multiple object tracking (MOT) in 3D space, which presents a series of tests to the subject in a three-dimensional environment with reference to a ground surface. In each test, a sequence of animated images is presented to the subject on a 3D display, which can be either a computer-based 3D screen or a head-mounted display (HMD) of a type used in virtual reality (VR) applications wherein separate images are presented to the user's left and right eyes. In the animated image sequence, a series of objects is presented on the ground surface, wherein a number of targets are indicated as a subset of the objects during a first time period (the remaining objects being distractors). Thereafter, the indications are removed so that the targets mix with distractors. All objects, including targets and distractors, start to move during a second time period. At the end of the second time period, the subject is instructed to identify the targets. The subject's response is evaluated so that the difficulty of the next test is adjusted accordingly. At the end of the series of tests, the subject's attentional capacity may be calculated. Repeated performance of the tests can be carried out over several days to improve the subject's cognitive function and attentional capacity.
[0021] The invention presents the subject with much richer depth information from different sources, such as ground surface, perspective, motion parallax, occlusion, relative size, and binocular disparity. The invention takes real-world 3D conditions into consideration when measuring and training visual attention and cognition in a more realistic 3D space. The method and apparatus will have much greater ecological validity and can better represent everyday 3D environments. The inventor has found that training with 3D MOT not only improves the subject's performance on trained MOT tasks but also generalizes to untrained visual attention in space. The application has broader implications where performance on many everyday activities can benefit from having used the invention. The invention can be used by, for example, insurance companies, senior service facilities, driving rehabilitation service providers, and team sports coaches/managers (e.g., football or basketball coaches) at different levels (e.g., grade school or college).
[0022] In each test of the assessment or training sessions, the target and distractor locations and initial motion vectors are pseudo-randomly generated. A predetermined number among the total number of objects (e.g., 10 spheres) are indicated as targets at the beginning of each test. Then all objects travel in predetermined trajectories (e.g., linear or curved) until making contact with a wall or other object, at which point they are deflected. Thus, the objects may appear to bounce off each other.
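The placement and deflection behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the square ground patch, the function names (`place_objects`, `random_velocity`, `step`), and the use of rejection sampling are all hypothetical choices, and only straight-line motion with wall deflection is shown (object-to-object collisions are omitted for brevity).

```python
import math
import random


def place_objects(n, half_width, radius, rng, max_tries=10_000):
    """Rejection-sample n non-overlapping sphere centers on a square ground patch.

    Centers are (x, z) ground coordinates; the patch spans [-half_width, half_width].
    """
    centers = []
    tries = 0
    while len(centers) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all objects; ground too crowded")
        x = rng.uniform(-half_width + radius, half_width - radius)
        z = rng.uniform(-half_width + radius, half_width - radius)
        # Accept the candidate only if it does not overlap an already placed sphere.
        if all(math.hypot(x - cx, z - cz) >= 2 * radius for cx, cz in centers):
            centers.append((x, z))
    return centers


def random_velocity(speed, rng):
    """Pick a pseudo-random initial motion vector of fixed speed along the ground."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (speed * math.cos(angle), speed * math.sin(angle))


def reflect(coord, vel, limit):
    """Mirror one coordinate back inside [-limit, limit] and flip its velocity."""
    if coord > limit:
        return 2 * limit - coord, -vel
    if coord < -limit:
        return -2 * limit - coord, -vel
    return coord, vel


def step(pos, vel, dt, half_width, radius):
    """Advance one object by dt, deflecting it off the walls of the ground patch.

    Assumes dt is small enough that at most one wall is crossed per axis.
    """
    limit = half_width - radius
    x, vx = reflect(pos[0] + vel[0] * dt, vel[0], limit)
    z, vz = reflect(pos[1] + vel[1] * dt, vel[1], limit)
    return (x, z), (vx, vz)
```

Seeding the generator (for example `random.Random(0)`) makes a trial reproducible, which can be convenient when the same initial conditions need to be replayed.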
[0023] Once the object motion phase ends, users will be instructed to indicate which items they believe to be targets by using a mouse/keyboard (when using a PC) or using a custom controller (when using smartphone-based or gaming console-based VR). The number of correctly selected targets will count towards a positive number of earned credits. At the end of an assessment/training session, an overall score will be assigned to the user, with his/her own top five historical performances displayed as a reference.
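The credit counting and top-five display could be computed along the following lines. This is only a sketch: the function names are hypothetical, and treating the overall session score as the sum of per-trial credits is an assumption, since the text does not specify how the overall score is derived.

```python
def trial_credits(target_ids, selected_ids):
    """Credits for one trial: the number of selected objects that are true targets."""
    return len(set(target_ids) & set(selected_ids))


def session_score(per_trial_credits):
    """Overall session score (assumed here to be the sum of per-trial credits)."""
    return sum(per_trial_credits)


def top_five(history):
    """The user's five best historical session scores, best first."""
    return sorted(history, reverse=True)[:5]
```

For example, a user who selects objects 1, 7, and 9 when the targets were 1, 4, and 7 earns two credits for that trial.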
[0024]
[0025] The random placement of objects 12 in the prior art has included use of a simulated three-dimensional space in which one object can pass behind another. The placed objects 12 are assigned respective random trajectories 15 to follow during the mixing phase as shown in
[0026]
[0027] In placing and assigning trajectories to objects 24 and 25, a downward force of gravity is simulated by controlling the appearance and movement of objects 24 and 25 to be upon and along ground surface 21. Various techniques for defining ground surface 21 and objects 24 and 25 are well known in the field of computer graphics (e.g., as used in gaming applications). Additional 3D cues may preferably be included in the presentation of 3D objects on a monitor (i.e., a display screen simultaneously viewed by both eyes), such as adding perspective (e.g., depth convergence) to the environment and objects, simulating motion parallax, occlusion of objects moving behind another, scaling the relative sizes of objects based on depth, 3D illumination (e.g., shading and shadows), and adding 3D surface textures.
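As an illustration of the perspective and relative-size cues, the following sketch uses a simple pinhole camera model. The model itself, the camera placement (at a hypothetical `eye_height` above the ground, looking horizontally along the depth axis), and all names are assumptions made for the example; a real implementation would normally rely on the projection machinery of the graphics library in use.

```python
def project(point, focal_length, eye_height):
    """Project a world point (x, y, z) to image coordinates with a pinhole model.

    The camera sits at height eye_height above the ground (y = 0) and looks
    along +z. Ground points approach the horizon line (image y = 0) as depth
    grows, which produces the convergence cue.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_length * x / z, focal_length * (y - eye_height) / z)


def apparent_radius(radius, depth, focal_length):
    """Approximate on-screen radius of a sphere: inversely proportional to depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_length * radius / depth
```

Doubling an object's depth halves its apparent radius under this model, which is the distance scaling referred to in the text.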
[0028] Other embodiments of the invention may present different left and right images to the left and right eyes for enhanced 3D effects using virtual reality (VR) headsets. The VR headset can be a standalone display (i.e., containing separate left and right display screens), such as the Oculus Rift headset available from Oculus VR, LLC, or the Vive headset available from HTC Corporation. The VR headset can alternatively be comprised of a smartphone-based (e.g., Android phone or iPhone) VR headset having left and right lenses/eyepieces adjacent a slot for receiving a phone. Commercially available examples include the Daydream View headset from Google, LLC, and the Gear VR headset from Samsung Electronics Company, Ltd. Images from the display screen of the phone are presented to the eyes separately by the lenses. A typical VR headset is supplied with a wireless controller that communicates with the smartphone or standalone headset via Bluetooth.
[0029] A VR-headset-based embodiment is shown in
[0030]
[0031] A functional block diagram of the invention is shown in
[0032] In decision block 58, performance of users can be evaluated in an adaptive way in order to progress successive trials to more difficult or challenging test conditions when the user exhibits successful performance or to progress to easier conditions otherwise. An adaptive process helps ensure that the user continues to be challenged while avoiding frustration from having extremely difficult test conditions.
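One simple way to realize such an adaptive rule is a one-up/one-down staircase, sketched below. The integer difficulty levels, the bounds, and the step size are assumptions for illustration; the text does not state which adaptive procedure or which difficulty parameter (e.g., object speed or number of targets) is used.

```python
def next_level(level, trial_succeeded, min_level=1, max_level=10):
    """Move one level harder after a success and one level easier otherwise.

    The result is clamped to [min_level, max_level].
    """
    delta = 1 if trial_succeeded else -1
    return max(min_level, min(max_level, level + delta))


def run_staircase(start_level, outcomes):
    """Return the difficulty level used on each trial for a list of outcomes."""
    levels = []
    level = start_level
    for succeeded in outcomes:
        levels.append(level)
        level = next_level(level, succeeded)
    return levels
```

With this rule the level oscillates around the point where the user succeeds on roughly half of the trials, which keeps the task challenging without becoming extremely difficult.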
[0033] Display block 59 handles the creation and animation of the 3D objects and environment. A three-dimensional scene may be created corresponding to the example initial conditions shown in
[0034] Each object 64 is preferably comprised of a substantially identical sphere. Although spheres are shown, other shapes can also be used. Although objects 64 may preferably all have the same color, texture, or other salient characteristics (at least prior to adding 3D cues as discussed below), they can alternatively exhibit differences in appearance such as color or texture as long as they do not reveal the identities of tracked objects. Uniform spheres are generally the most preferred objects because they are the most featureless 3D objects. Thus, any training benefits will not be restricted to the trained type of object and will better generalize to the numerous object shapes and types in the real world. Nevertheless, it is possible to modify the display to meet a special need in a certain context (e.g., having soldiers keep track of a number of military vehicles, such as tanks).
[0035] Display block 59 may be organized according to the block diagram of
[0036] There is a variety of 3D information that the human visual system uses to perceive and judge depth/distance of objects in 3D environments. The 3D cues include binocular information (which requires different images being sent to each eye, such as with a stereoscopic display) and monocular information (which uses a single image display). Binocular disparity, one source of binocular information, is the angular difference between the two monocular retinal images (which any scene projects to the back of our eyes). Another binocular 3D cue is differing occlusion, wherein different portions of an object are obscured by an intervening object for each eye.
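To make the angular nature of binocular disparity concrete, the sketch below computes the relative disparity between a fixated point and a second point, both assumed to lie straight ahead on the midline between the eyes. The default interpupillary distance of 0.063 m is an illustrative assumption, as are the function names; the sign convention (positive for objects nearer than fixation) is also a choice made for this example.

```python
import math


def vergence_angle(distance, ipd=0.063):
    """Angle (radians) between the two eyes' lines of sight to a midline point."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return 2.0 * math.atan(ipd / (2.0 * distance))


def relative_disparity(fixation_distance, object_distance, ipd=0.063):
    """Angular disparity (radians) of an object relative to the fixation point.

    Positive values mean the object is nearer than fixation, negative values
    mean it is farther away, and zero means it lies at the fixation distance.
    """
    return vergence_angle(object_distance, ipd) - vergence_angle(fixation_distance, ipd)
```

Because the vergence angle falls off roughly as the inverse of distance, the same depth separation yields a much smaller disparity far from the viewer than close to the viewer.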
[0037] Monocular 3D cues do not rely on binocular processing (i.e., you can close one eye and will still experience a 3D view). Monocular cues include texture gradient, light illumination (i.e., shading and shadowing), motion parallax, perspective, and occlusion. Texture gradients arise because the farther away a textured surface is, the smaller the projected retinal image of its texture elements (e.g., tiles, grass, or surface features). Motion parallax is a dynamic depth cue referring to the fact that when we are in motion, near objects appear to move rapidly in the opposite direction. Objects beyond fixation, however, will appear to move much more slowly, often in the same direction we are moving.
[0038] 3D cues can be added by animation block 67 using known tools and methods. For example, computer graphics software such as the OpenGL library and Unreal Engine 4 has been successfully used in an application written in the C++ programming language to create animated sequences.
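The motion parallax relationship can be illustrated with a small-angle approximation for an observer translating sideways while keeping the eyes on a fixation point. The formula below is a standard geometric approximation valid at the moment the object is straight ahead, not something stated in the text, and the function name and sign convention are assumptions for the example.

```python
def parallax_rate(observer_speed, object_distance, fixation_distance):
    """Approximate angular velocity (rad/s) of an object relative to fixation.

    Positive values mean apparent motion in the observer's direction of travel;
    negative values mean apparent motion in the opposite direction.
    """
    if object_distance <= 0 or fixation_distance <= 0:
        raise ValueError("distances must be positive")
    return observer_speed * (1.0 / fixation_distance - 1.0 / object_distance)
```

For an observer moving at 1 m/s and fixating at 4 m, an object at 1 m yields a rate of -0.75 rad/s (fast, opposite to the direction of travel), whereas an object at 16 m yields 0.1875 rad/s (slower, in the direction of travel), consistent with the description above.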
[0039] The invention is adapted to operate well in a system for testing and improving cognitive capacity of visual attention.
[0040]