G06V20/41

Method and apparatus for overlaying themed imagery onto real-world objects in a head-mounted display device

A computing device receives, from a head-mounted display device (HMDD), depth data that identifies distances from the HMDD to surfaces of a plurality of objects in a user space. The computing device detects at least one object in the user space based at least in part on the depth data. The computing device determines a classification of video content being presented on a display system of the HMDD. The computing device selects, based on the classification, a particular image theme from a plurality of different image themes, the image theme comprising one or more image textures. The computing device sends, to the HMDD, at least one image texture for overlaying the at least one object during presentation of the at least one object on the display system of the HMDD in conjunction with the video content.
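
The theme-selection step can be illustrated with a minimal sketch. The theme names, object labels, texture file names, and the fallback to a default theme are all hypothetical; the abstract only states that a theme is selected from the content classification and that its textures are sent for the detected objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageTheme:
    name: str
    textures: dict  # hypothetical mapping: object label -> texture identifier


# Hypothetical catalogue of themes keyed by video-content classification.
THEMES = {
    "western": ImageTheme("western", {"chair": "hay_bale.png", "table": "wagon.png"}),
    "sci-fi": ImageTheme("sci-fi", {"chair": "pilot_seat.png", "table": "console.png"}),
}


def select_theme(classification, themes=THEMES, default="sci-fi"):
    """Pick the theme matching the classification (assumed fallback: a default theme)."""
    return themes.get(classification, themes[default])


def textures_for_objects(theme, detected_labels):
    """Return the textures to send for each detected object the theme covers."""
    return {label: theme.textures[label]
            for label in detected_labels if label in theme.textures}
```

Objects that the selected theme has no texture for are simply left out of the result, which is one possible design choice among several.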

NODE-BASED NEAR-MISS DETECTION
20230237804 · 2023-07-27

A system includes one or more video capture devices and a processor coupled to each video capture device. Each processor is operable to direct its respective video capture device to obtain an image of a monitored area and process the image to identify objects of interest represented in the image. The processor is also operable to generate bounding perimeter virtual objects for the identified objects of interest, each bounding perimeter virtual object surrounding at least part of its respective object of interest. The processor is further operable to determine danger zones for the identified objects of interest based on the bounding perimeter virtual objects. The processor is further operable to determine at least one near-miss condition based at least in part on an actual or predicted overlap of danger zones for multiple objects of interest, and may optionally generate an alert at least partially in response to the near-miss condition.
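
A minimal sketch of the overlap test, assuming axis-aligned rectangular bounding perimeters and a danger zone formed by growing each rectangle by a fixed margin. Both are assumptions, since the abstract does not fix the shape of either. Only actual overlap is covered here, not predicted overlap.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def expand(self, margin):
        """Danger zone (assumed): the bounding box grown by `margin` on every side."""
        return Box(self.x_min - margin, self.y_min - margin,
                   self.x_max + margin, self.y_max + margin)

    def overlaps(self, other):
        """True when the two rectangles share interior area."""
        return (self.x_min < other.x_max and other.x_min < self.x_max
                and self.y_min < other.y_max and other.y_min < self.y_max)


def near_misses(boxes, margin):
    """Return name pairs whose danger zones overlap.

    `boxes` maps an object name to its bounding Box.
    """
    zones = {name: box.expand(margin) for name, box in boxes.items()}
    return [(a, b) for a, b in combinations(sorted(zones), 2)
            if zones[a].overlaps(zones[b])]
```

A predicted-overlap variant could run the same test on boxes shifted along each object's estimated velocity.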

METHOD FOR VIDEO RECOGNITION AND RELATED PRODUCTS
20230005264 · 2023-01-05

A method for video recognition and related products are provided. The method includes the following. An original set of clip descriptors is obtained by providing multiple clips of a video as an input of a 3D CNN of a neural network, where the neural network includes the 3D CNN and at least one first fully connected layer, and each of the multiple clips includes at least one frame. An attention vector corresponding to the original set of clip descriptors is determined. An enhanced set of clip descriptors is obtained based on the original set of clip descriptors and the attention vector. The enhanced set of clip descriptors is input into the at least one first fully connected layer and video recognition is performed based on an output of the at least one first fully connected layer.
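
The abstract does not say how the attention vector is computed or applied. The sketch below assumes one weight per clip, obtained as a softmax over each descriptor's mean activation, and applies it by scaling each descriptor. The 3D CNN and the fully connected layers are omitted, and the descriptors are taken as plain lists of floats.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_vector(descriptors):
    """One weight per clip (assumed scoring: mean activation of the descriptor)."""
    scores = [sum(d) / len(d) for d in descriptors]
    return softmax(scores)


def enhance(descriptors, attention):
    """Scale each clip descriptor by its attention weight."""
    return [[a * v for v in d] for d, a in zip(descriptors, attention)]
```

In a full pipeline, the enhanced descriptors would then be fed to the first fully connected layer for recognition.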

HEALTH TESTING AND DIAGNOSTICS PLATFORM

Systems and methods for providing a universal platform for at-home health testing and diagnostics are provided herein. In particular, a health testing and diagnostic platform is provided to connect medical providers with patients and to generate a unique, private testing environment. In some embodiments, the testing environment may facilitate administration of a medical test to a patient with the guidance of a proctor. In some embodiments, the patient may be provided with step-by-step instructions for test administration by the proctor within a testing environment. The platform may display unique, dynamic testing interfaces to the patient and proctor to ensure proper testing protocols and accurate test result verification.

Merging events in interactive data processing systems

This disclosure describes interactive data processing systems configured to facilitate selection by a human associate of tentative results generated by an automated system from sensor data. In one implementation, an event may take place in a materials handling facility. The event may comprise a pick or place of an item from an inventory location, movement of a user, and so forth. The sensor data associated with the event is processed by an automated system to determine tentative results associated with the event. In some situations, an uncertainty may exist as to which of the tentative results accurately reflects the actual event. The system may then determine whether the event is to be merged with one or more temporally and spatially proximate events and, if so, the sensor data and tentative results for the merged event are sent to a human associate. The associate may select one of the tentative results.
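
One way to picture the merge decision is a proximity test in time and space. In the sketch below, the `Event` fields, the two thresholds, and the greedy grouping are illustrative assumptions, not the disclosed method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    t: float  # timestamp in seconds (assumed unit)
    x: float  # location in the facility (assumed planar coordinates)
    y: float


def proximate(a, b, max_dt, max_dist):
    """True when two events are close in both time and space."""
    return (abs(a.t - b.t) <= max_dt
            and math.hypot(a.x - b.x, a.y - b.y) <= max_dist)


def merge_events(events, max_dt, max_dist):
    """Greedily group events: each event joins the first group holding a proximate event."""
    groups = []
    for ev in sorted(events, key=lambda e: e.t):
        for group in groups:
            if any(proximate(ev, other, max_dt, max_dist) for other in group):
                group.append(ev)
                break
        else:
            groups.append([ev])
    return groups
```

This single pass does not fuse two existing groups when a later event is proximate to both. A union-find structure would handle that case.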

Electronic device and controlling method thereof

An electronic device and a controlling method thereof are provided. A controlling method of an electronic device according to the disclosure includes: performing first learning for a neural network model for acquiring a video sequence including a talking head of a random user based on a plurality of learning video sequences including talking heads of a plurality of users, performing second learning for fine-tuning the neural network model based on at least one image including a talking head of a first user different from the plurality of users and first landmark information included in the at least one image, and acquiring a first video sequence including the talking head of the first user based on the at least one image and pre-stored second landmark information using the neural network model for which the first learning and the second learning were performed.

Real time tracking of shelf activity supporting dynamic shelf size, configuration and item containment

A system may be configured to accurately track shelf activity in real-time with support for dynamic shelf size, configuration, and item containment. In some aspects, the system may parse regions of a video frame to determine a region of interest representation corresponding to a physical location (e.g., a shelf compartment), determine an enhanced region of interest representation based at least in part on the region of interest representation and an image enhancement pipeline, determine edge information of one or more objects based on the enhanced region of interest representation, compare a reference representation of the physical location to the edge information, and determine the amount of available space for the physical location based on the comparing.

Techniques for optimizing the display of videos
11716300 · 2023-08-01

The disclosed embodiments provide techniques for optimizing the display of videos. During operation, a computing device receives a video stream to be displayed. The computing device determines a preferred orientation for the video stream, determines a present orientation for the computing device, and determines a mismatch between the preferred orientation and the present orientation. The computing device adjusts the video stream while displaying the video stream on the display. As the video stream plays, the computing device monitors for rotation of the computing device and, when rotation is detected, re-adjusts how the video stream is displayed.
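
A minimal sketch of the mismatch check. It assumes the preferred orientation is inferred from the stream's aspect ratio and that the adjustment is a 90-degree rotation. The abstract specifies neither, and the function names are hypothetical.

```python
def preferred_orientation(width, height):
    """Assumed heuristic: wider-than-tall streams prefer landscape."""
    return "landscape" if width >= height else "portrait"


def display_rotation(video_size, device_orientation):
    """Return the rotation in degrees to apply: 0 on a match, 90 on a mismatch."""
    preferred = preferred_orientation(*video_size)
    return 0 if preferred == device_orientation else 90
```

Calling `display_rotation` again whenever the device reports a new orientation corresponds to the re-adjustment step during playback.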

Contour-based detection of closely spaced objects

A system includes a sensor and a client. The client receives a set of frames of top-view depth images generated by the sensor. The client identifies a frame of the received frames in which a first contour associated with a first object is merged with a second contour associated with a second object. The client determines, at a first depth in the identified frame, a merged-contour region which is associated with the merged contours. The client detects a third contour at a second depth that is less than the first depth and determines a first region associated with the third contour. The client detects a fourth contour at the second depth and determines a second region associated with the fourth contour. If criteria are satisfied, the client associates the first region with a position of the first object and associates the second region with a position of the second object.
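
The two-depth idea can be sketched with a plain flood fill. The sketch assumes depth is measured as distance from an overhead sensor, so a smaller threshold keeps only the highest points (for example, heads). It uses 4-connected regions in place of true contour extraction, and it omits the criteria for associating regions with objects.

```python
def regions_at_depth(depth_map, max_depth):
    """Return 4-connected regions (sets of (row, col)) with depth <= max_depth."""
    rows, cols = len(depth_map), len(depth_map[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or depth_map[r][c] > max_depth:
                continue
            # Flood fill from this unvisited cell.
            stack, region = [(r, c)], set()
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                region.add((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and depth_map[nr][nc] <= max_depth):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            regions.append(region)
    return regions


# Toy top-view frame: floor at depth 9, shoulders at 5, heads at 3.
FRAME = [
    [9, 9, 9, 9, 9, 9, 9],
    [9, 5, 3, 5, 3, 5, 9],
    [9, 9, 9, 9, 9, 9, 9],
]
```

On `FRAME`, a threshold of 5 yields a single merged region, while a threshold of 3 separates it into two regions, one per head.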

Video clip classification using feature vectors of a trained image classifier
11712621 · 2023-08-01

In various examples, potentially highlight-worthy video clips are identified from a gameplay session that a gamer might then selectively share or store for later viewing. The video clips may be identified in an unsupervised manner based on analyzing game data for durations of predicted interest. A classification model may be trained in an unsupervised manner to classify those video clips without requiring manual labeling of game-specific image or audio data. The gamer can select the video clips as highlights (e.g., to share on social media, store in a highlight reel, etc.). The classification model may be updated and improved based on new video clips, such as by creating new video-clip classes.