Patent classifications
G06V40/103
Multi-Camera Video Stream Selection For In-Person Conference Participants
A best available video stream is determined for each of multiple conference participants within a conference room including multiple cameras based on scores determined for video streams obtained from the cameras. The scores are determined based on representations of the conference participants within the video streams, for example, based on percentages of conference participant faces visible within the video streams, directions of conference participant faces relative to the cameras, directions of eye gaze of the conference participants relative to the cameras, and/or degrees to which conference participant faces are obscured within the video streams. The best available video streams are output for rendering within separate user interface tiles of conferencing software.
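The per-participant selection described above can be sketched as a weighted score over the four cues the abstract lists. This is only an illustration: the `Observation` fields, the weights, and the linear scoring formula are assumptions, since the abstract does not say how the cues are combined.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One participant as seen in one camera stream (all fields hypothetical)."""
    camera_id: str
    participant_id: str
    face_visible: float  # fraction of the face visible, 0..1
    face_toward: float   # 1.0 means the face points straight at the camera
    gaze_toward: float   # 1.0 means the gaze points straight at the camera
    occlusion: float     # fraction of the face that is obscured, 0..1


# Illustrative weights only; the abstract does not specify any.
WEIGHTS = {"face_visible": 0.4, "face_toward": 0.3, "gaze_toward": 0.2, "occlusion": 0.1}


def score(obs: Observation) -> float:
    """Combine the four cues into one score; higher is better."""
    return (
        WEIGHTS["face_visible"] * obs.face_visible
        + WEIGHTS["face_toward"] * obs.face_toward
        + WEIGHTS["gaze_toward"] * obs.gaze_toward
        + WEIGHTS["occlusion"] * (1.0 - obs.occlusion)
    )


def best_streams(observations: list[Observation]) -> dict[str, str]:
    """Map each participant to the camera whose stream scores highest."""
    best: dict[str, Observation] = {}
    for obs in observations:
        current = best.get(obs.participant_id)
        if current is None or score(obs) > score(current):
            best[obs.participant_id] = obs
    return {pid: obs.camera_id for pid, obs in best.items()}
```

Each value in the returned mapping names the stream that would be rendered in that participant's user interface tile.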
INTELLIGENT BIRD FEEDING METHOD, ELECTRONIC DEVICE AND BIRD FEEDER
An intelligent bird feeding method may include: shooting video information of a bird in a preset area through a camera component; transmitting the video information to an electronic device to instruct the electronic device to determine a category of the bird and a state of the bird based on the video information; determining whether bird food needs to be fed, and a category of the bird food to be fed, based on the category of the bird and the state of the bird; and, when bird food needs to be fed, selecting bird food of the corresponding category for feeding.
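The decision step of this method might look like the following sketch. The bird categories, states, and food names are hypothetical placeholders, and the classification of the video itself is assumed to have already happened on the electronic device.

```python
from typing import Optional

# Hypothetical lookup tables; the abstract names no categories, states, or foods.
FOOD_BY_BIRD = {"sparrow": "millet", "finch": "sunflower seeds"}
FEEDING_STATES = {"hungry", "foraging"}


def decide_feeding(bird_category: str, bird_state: str) -> Optional[str]:
    """Return the food category to dispense, or None if no feeding is needed."""
    if bird_state not in FEEDING_STATES:
        return None  # the bird's state does not call for feeding
    # An unknown bird category also yields None (no matching food).
    return FOOD_BY_BIRD.get(bird_category)
```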
CONTENT SELECTION DEVICE, CONTENT DISPLAY SYSTEM, AND CONTENT SELECTION METHOD
A content selection device includes: an image acquisition unit configured to acquire an image captured by an image capture device configured to capture a person; a human detection unit configured to detect one or more persons included in the image; and a selection unit configured to select a first person who has a slower moving speed than at least one other person from among the one or more persons, and select a first content according to an attribute of the first person as a content to be displayed on a display device.
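A minimal sketch of the selection unit follows, assuming each detected person comes with a short track of positions and a single attribute label. The `TrackedPerson` type, the `catalog` mapping, and the choice of the slowest person (one way to satisfy "slower than at least one other person") are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedPerson:
    """A detected person with a position track (hypothetical structure)."""
    person_id: str
    attribute: str                        # e.g. an estimated age group
    positions: list[tuple[float, float]]  # one (x, y) per frame


def speed(person: TrackedPerson, frame_interval_s: float) -> float:
    """Average speed along the track, in position units per second."""
    if len(person.positions) < 2:
        return 0.0
    distance = sum(
        math.dist(a, b) for a, b in zip(person.positions, person.positions[1:])
    )
    return distance / (frame_interval_s * (len(person.positions) - 1))


def select_content(
    people: list[TrackedPerson],
    catalog: dict[str, str],
    frame_interval_s: float = 1.0,
) -> Optional[str]:
    """Pick the slowest person and return the content matching their attribute."""
    if not people:
        return None
    slowest = min(people, key=lambda p: speed(p, frame_interval_s))
    return catalog.get(slowest.attribute)
```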
Action recognition method and apparatus, and human-machine interaction method and apparatus
A computer device extracts a plurality of target windows from a target video. Each of the target windows comprises a respective plurality of consecutive video frames. For each of the target windows, the device performs action recognition on the respective plurality of consecutive video frames corresponding to the target window to obtain respective first action feature information of the target window. The device obtains a similarity between the first action feature information of the target window and preset feature information. The device determines, from the respective obtained similarities corresponding to the plurality of target windows, a highest first similarity and a first target window corresponding to the highest first similarity. The device also determines a dynamic action corresponding to the highest first similarity as the preset dynamic action in accordance with threshold settings.
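The window-and-similarity procedure above can be sketched as follows. The abstract does not name a similarity measure or a feature extractor, so cosine similarity and the caller-supplied `featurize` function are assumptions, as are the `size`, `stride`, and `threshold` parameters.

```python
import math
from typing import Callable, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def extract_windows(frames: list, size: int, stride: int) -> list[list]:
    """Slice the video into windows of `size` consecutive frames."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, stride)]


def match_action(
    frames: list,
    preset_feature: Sequence[float],
    featurize: Callable[[list], Sequence[float]],
    size: int,
    stride: int,
    threshold: float,
) -> tuple[bool, Optional[int], float]:
    """Return (matched, index of best window, highest similarity)."""
    best_similarity, best_index = -1.0, None
    for index, window in enumerate(extract_windows(frames, size, stride)):
        similarity = cosine_similarity(featurize(window), preset_feature)
        if similarity > best_similarity:
            best_similarity, best_index = similarity, index
    # The action counts as the preset action only if the best window clears the threshold.
    matched = best_index is not None and best_similarity >= threshold
    return matched, best_index, best_similarity
```

In a real system `featurize` would be the action recognition model; here it is any function that maps a window of frames to a feature vector.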
METHOD AND SYSTEM FOR HUMAN ACTIVITY RECOGNITION IN AN INDUSTRIAL SETTING
Example implementations described herein involve a system for training and managing machine learning models in an industrial setting. Specifically, by leveraging the similarity across certain production areas, it is possible to group such areas together to efficiently train models that use human pose data to predict human activities or specific task(s) that the workers are engaged in. Example implementations eliminate the previous approach of constructing an independent model for each production area and take advantage of the commonality among different environments.
Neural-Symbolic Action Transformers for Video Question Answering
Mechanisms are provided for performing artificial intelligence-based video question answering. A video parser parses an input video data sequence to generate situation data structure(s), each situation data structure comprising data elements corresponding to entities, and first relationships between entities, identified by the video parser as present in images of the input video data sequence. First machine learning computer model(s) operate on the situation data structure(s) to predict second relationship(s) between the situation data structure(s). Second machine learning computer model(s) execute on a received input question to predict an executable program to execute to answer the received question. The program is executed on the situation data structure(s) and predicted second relationship(s). An answer to the question is output based on results of executing the program.
Classification of Image Data with Adjustment of the Degree of Granulation
A device for classifying image data includes a trainable pre-processing unit configured to retrieve, from a trained context, and based on the image data, at least one specification in terms of how a degree of granulation of the image data is to be reduced, and to reduce the degree of granulation of the image data in accordance with the at least one specification. The device further includes a trainable classifier configured to map the granulation-reduced image data onto an assignment to one or more classes of a specified classification.
Enhanced input using recognized gestures
A representation of a user can move with respect to a graphical user interface based on input of a user. The graphical user interface comprises a central region and interaction elements disposed outside of the central region. The interaction elements are not shown until the representation of the user is aligned with the central region. A gesture of the user is recognized, and, based on the recognized gesture, the display of the graphical user interface is altered and an application control is outputted.
DYNAMIC ADAPTATION OF PARAMETER SET USED IN HOT WORD FREE ADAPTATION OF AUTOMATED ASSISTANT
Hot word free adaptation of function(s) of an automated assistant, responsive to determining, based on gaze measure(s) and/or active speech measure(s), that a user is engaging with the automated assistant. Implementations relate to techniques for mitigating false positive occurrences and/or false negative occurrences of hot word free adaptation through utilization of a permissive parameter set in some situation(s) and a restrictive parameter set in other situation(s). For example, the restrictive parameter set is utilized when it is determined that a user is engaged in conversation with additional user(s). The permissive parameter set includes permissive parameter(s) that are more permissive than counterpart(s) in the restrictive parameter set. A parameter set is utilized in determining whether condition(s) are satisfied, where those condition(s), if satisfied, indicate that the user is engaging in hot word free interaction with the automated assistant and result in adaptation of function(s) of the automated assistant.
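The switch between parameter sets can be illustrated with a small sketch. The two threshold fields, their values, and the rule requiring both measures to pass are assumptions; the abstract only states that the restrictive set is stricter and is used when the user is conversing with others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSet:
    """Thresholds that gate hot word free adaptation (hypothetical fields)."""
    min_gaze_confidence: float
    min_speech_confidence: float


# Illustrative values only; the restrictive set demands stronger evidence.
PERMISSIVE = ParameterSet(min_gaze_confidence=0.5, min_speech_confidence=0.5)
RESTRICTIVE = ParameterSet(min_gaze_confidence=0.8, min_speech_confidence=0.85)


def choose_parameter_set(in_conversation_with_others: bool) -> ParameterSet:
    """Use the restrictive set when the user is talking with other people."""
    return RESTRICTIVE if in_conversation_with_others else PERMISSIVE


def should_adapt(
    gaze_measure: float,
    speech_measure: float,
    in_conversation_with_others: bool,
) -> bool:
    """Decide whether the assistant should adapt without a hot word."""
    params = choose_parameter_set(in_conversation_with_others)
    return (
        gaze_measure >= params.min_gaze_confidence
        and speech_measure >= params.min_speech_confidence
    )
```

With these example thresholds, the same moderate gaze and speech evidence triggers adaptation when the user is alone but not when the user is conversing with someone else, which is the false-positive mitigation the abstract describes.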
Recording medium, information processing system, and display method
A program executed by a processor causes the processor to identify, based on a result of an image captured by a first camera configured to capture an image of a user in a real space, a position of the user; display, based on the result of the image captured by the first camera, on a display, a first image representative of the user; and display, at a position that is based on an identification result obtained by identifying the position of the user, a second image that corresponds to an object in the real space.