Patent classifications
G06V20/49
COMPUTER VISION-BASED SURGICAL WORKFLOW RECOGNITION SYSTEM USING NATURAL LANGUAGE PROCESSING TECHNIQUES
Systems, methods, and instrumentalities are disclosed for computer vision-based surgical workflow recognition using natural language processing (NLP) techniques. Surgical video of surgical procedures may be processed and analyzed, for example, to achieve workflow recognition. Surgical phases may be determined based on the surgical video and segmented to generate an annotated video representation. The annotated video representation of the surgical video may provide information associated with the surgical procedure. For example, the annotated video representation may provide information on surgical phases, surgical events, surgical tool usage, and/or the like.
System for the automated, context sensitive, and non-intrusive insertion of consumer-adaptive content in video
Described herein is a method and system for automated, context-sensitive and non-intrusive insertion of consumer-adaptive content in video. It assesses ‘context’ in the video that a consumer is viewing through multiple modalities and metadata about the video. The method and system described herein analyze relevance for a consumer based on multiple factors, such as the end user's profile information, content history, social media, consumer interests, and professional or educational background, using patterns drawn from multiple sources. The system also determines local context by using search techniques to localize sufficiently large, homogeneous regions in the image that do not obscure protagonists or objects in focus but are viable candidate regions for inserting the intended content. This makes relevant, curated content available to a user in the most effortless manner without hampering the viewing experience of the main video.
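The abstract names "search techniques" for localizing large homogeneous regions without saying which. Here is one possible sketch, assuming the frame has already been divided into a grid of cells with a per-cell pixel variance and a per-cell flag marking overlap with a protagonist or object in focus. A cell counts as free if it is low-variance and uncovered, and the largest all-free rectangle is found with the standard maximal-rectangle (histogram plus stack) algorithm. The grid, the threshold, and this choice of search are assumptions, not the patented method.

```python
def free_cell_mask(variance, object_mask, max_variance):
    """A cell is a candidate if it is homogeneous (low variance) and not covered by an object in focus."""
    return [[v <= max_variance and not covered for v, covered in zip(v_row, o_row)]
            for v_row, o_row in zip(variance, object_mask)]


def largest_free_rectangle(mask):
    """Return (area, top, left, height, width) of the largest all-True rectangle of cells."""
    if not mask:
        return (0, 0, 0, 0, 0)
    cols = len(mask[0])
    heights = [0] * cols
    best = (0, 0, 0, 0, 0)
    for r, row in enumerate(mask):
        # heights[c] counts consecutive free cells in column c ending at row r.
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        # Largest rectangle under this histogram, via a stack of increasing heights.
        stack = []
        for c in range(cols + 1):
            h = heights[c] if c < cols else 0
            while stack and heights[stack[-1]] >= h:
                height = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                width = c - left
                if height * width > best[0]:
                    best = (height * width, r - height + 1, left, height, width)
            stack.append(c)
    return best


# Hypothetical 4x5 grid: one textured cell in row 0, two object-covered cells in row 3.
variance = [[2, 2, 90, 3, 1], [1, 2, 2, 2, 2], [3, 1, 2, 1, 2], [2, 2, 1, 2, 2]]
objects = [[False] * 5, [False] * 5, [False] * 5, [True, False, False, False, True]]
mask = free_cell_mask(variance, objects, max_variance=10)
print(largest_free_rectangle(mask))  # → (10, 1, 0, 2, 5)
```

Maximizing area is only one reasonable objective. A real system might instead prefer regions with a given aspect ratio that matches the content to be inserted.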
GRATITUDE DELIVERY FOR MANUFACTURED PRODUCTS
Systems, methods, and other embodiments described herein relate to an improved approach to conveying gratitude between consumers and workers. In one embodiment, a method includes acquiring, from a camera within a manufacturing facility, source video of different stages of manufacturing a product. The method includes identifying, from the source video, segments associated with the product. The method includes generating a combined video from the segments. The method includes providing the combined video to a consumer associated with the product.
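The abstract does not say how segments associated with a product are identified. Here is a minimal sketch, assuming each camera produces timestamped "sightings" of a product identifier (for example, a serial number read from a label, which is a hypothetical mechanism) and that each camera covers a known manufacturing stage. Sightings of one product are merged into segments whenever they are close in time, and the segments are then ordered by stage to form the edit list for the combined video. The class, field names, and gap tolerance are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    camera: str
    stage: int        # hypothetical manufacturing stage index covered by this camera
    time_s: float     # timestamp within that camera's source video
    product_id: str   # e.g. a serial number read from a label (assumption)


def product_segments(sightings, product_id, max_gap_s=2.0):
    """Merge one product's sightings into (stage, camera, start_s, end_s) segments, in stage order."""
    mine = sorted((s for s in sightings if s.product_id == product_id),
                  key=lambda s: (s.stage, s.camera, s.time_s))
    segments = []
    for s in mine:
        if segments:
            stage, camera, start, end = segments[-1]
            if stage == s.stage and camera == s.camera and s.time_s - end <= max_gap_s:
                segments[-1] = (stage, camera, start, s.time_s)  # extend the open segment
                continue
        segments.append((s.stage, s.camera, s.time_s, s.time_s))
    return segments


sightings = [
    Sighting("cam1", 1, 10.0, "P1"), Sighting("cam1", 1, 11.0, "P1"),
    Sighting("cam1", 1, 12.0, "P1"), Sighting("cam1", 1, 30.0, "P2"),
    Sighting("cam2", 2, 100.0, "P1"), Sighting("cam2", 2, 101.0, "P1"),
    Sighting("cam1", 1, 20.0, "P1"),
]
# Edit list for the combined video of product P1, in manufacturing order.
print(product_segments(sightings, "P1"))
```

A single isolated sighting yields a zero-length segment, so a real pipeline would likely pad each segment by a few seconds before cutting and concatenating the clips.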
VIDEO CLIPPING METHOD AND MODEL TRAINING METHOD
Provided are a video clipping method and a model training method, relating to the field of video technologies and, in particular, to the field of short video technologies. The video clipping method includes: acquiring interaction behavior data for an original video file; determining, according to the interaction behavior data, interaction heat at respective time points of the original video file; selecting the N time points with the highest interaction heat as interest points of the original video file, where N is a positive integer; and clipping the original video file based on the respective interest points to obtain N clipped video files. In this way, high-quality short video files can be generated.
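The clipping steps above can be sketched as follows. This assumes interaction behavior data arrives as (timestamp, interaction type) events and that heat is a weighted count per whole second. The weights, the per-second bucketing, and the fixed clip window around each interest point are hypothetical, since the abstract specifies none of them.

```python
from collections import defaultdict

# Hypothetical per-interaction weights; the abstract does not say how heat is computed.
DEFAULT_WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}


def interaction_heat(events, weights=DEFAULT_WEIGHTS):
    """events: iterable of (time_s, kind). Returns {whole second: accumulated heat}."""
    heat = defaultdict(float)
    for time_s, kind in events:
        heat[int(time_s)] += weights.get(kind, 0.0)
    return dict(heat)


def top_interest_points(heat, n):
    """The N time points with the highest heat (an earlier time wins a tie)."""
    ranked = sorted(heat.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:n]]


def clip_windows(points, duration_s, before_s=5, after_s=10):
    """One (start_s, end_s) window per interest point, clamped to the video length."""
    return [(max(0, t - before_s), min(duration_s, t + after_s)) for t in sorted(points)]


events = [(12.3, "like"), (12.9, "comment"), (40.0, "share"), (40.5, "like"), (3.0, "like")]
points = top_interest_points(interaction_heat(events), n=2)
print(points)                               # → [40, 12]
print(clip_windows(points, duration_s=45))  # → [(7, 22), (35, 45)]
```

Taking the raw top N can return adjacent seconds from the same burst of activity and therefore overlapping clips. A practical variant would suppress neighbors of an already-selected point before picking the next one.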
Enhanced Emotive Engagement with Volumetric Content
A volumetric content enhancement system (“the system”) can annotate at least a portion of a plurality of voxels from a volumetric video with contextual data. The system can determine at least one actionable position within the volumetric video. The system can create an annotated volumetric video that includes the volumetric video, an annotation with the contextual data, and the at least one actionable position. The system can provide the annotated volumetric video to a volumetric content playback system. The system can obtain viewer feedback associated with a viewer and can determine an emotional state of the viewer based, at least in part, upon the viewer feedback. The system can receive viewer position information that identifies a specific actionable position of the viewer. The system can generate manipulation instructions to instruct the volumetric content playback system to manipulate the annotated volumetric video to achieve a desired emotional state of the viewer.
Methods and Systems for Operative Analysis and Management
Embodiments of the application provide methods and devices for analyzing surgeries. Embodiments may include recording images of a surgery with a camera, wherein the images may include a visual element chosen from a surgeon's hands during the surgery, a patient's surgery area, equipment used in the surgery, instruments used in the surgery, and the like; saving images of the surgery; displaying a timestamp in the images; chapterizing the images into different chapters; leveraging additional radiologic imaging clinical data; maximizing treatment cost/benefit; and perhaps even analyzing recorded images of the surgery. Embodiments may use artificial intelligence, computer learning, and machine learning.
Apparatuses and methods for selectively inserting text into a video resume
Aspects relate to apparatuses and methods for selectively inserting text into a video resume. An exemplary apparatus includes a processor and a memory communicatively connected to the processor, the memory containing instructions configuring the processor to: receive a video resume from a user; divide the video resume into temporal sections; acquire a plurality of textual inputs from the user, wherein the plurality of textual inputs pertains to the same user as the received video resume; classify the plurality of textual inputs into corresponding temporal sections of the received video resume; and display, as a function of the classification, the received video resume with the corresponding plurality of textual inputs.
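The classification step could look like the following minimal sketch. It assumes each temporal section already has a transcript (for example, from speech-to-text, which is not shown). In place of whatever classifier the patent uses, it substitutes a plain bag-of-words Jaccard similarity: each textual input is assigned to the section whose transcript it overlaps most. The function names, section times, and sample text are hypothetical.

```python
import re


def tokens(text):
    """Lower-cased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_inputs(sections, inputs):
    """sections: list of (start_s, end_s, transcript); returns {text input: section index}."""
    section_tokens = [tokens(transcript) for _, _, transcript in sections]
    assignment = {}
    for text in inputs:
        words = tokens(text)
        # Pick the section whose transcript overlaps the input most (the first section wins a tie).
        assignment[text] = max(range(len(sections)),
                               key=lambda i: jaccard(words, section_tokens[i]))
    return assignment


def overlay_plan(sections, assignment):
    """Group the inputs by section so each can be shown while its section plays."""
    plan = [(start, end, []) for start, end, _ in sections]
    for text, index in assignment.items():
        plan[index][2].append(text)
    return plan


sections = [(0, 30, "Hi, I am Ana, a data engineer with five years of experience"),
            (30, 60, "I built streaming pipelines in Python and Kafka"),
            (60, 90, "I hold a master's degree in computer science")]
inputs = ["Python and Kafka pipelines", "MSc in Computer Science", "Data engineer, 5 years"]
for start, end, texts in overlay_plan(sections, classify_inputs(sections, inputs)):
    print(start, end, texts)
```

Word overlap misses paraphrases such as "MSc" versus "master's degree". A learned text classifier or embedding similarity would be the natural replacement for the `jaccard` scoring.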
Ensemble Deep Learning Method for Identifying Unsafe Behaviors of Operators in Maritime Working Environment
The present invention proposes an ensemble deep learning method for identifying unsafe behaviors of operators in a maritime working environment. First, features of maritime images are extracted with the You Only Look Once (YOLO) V3 model, and multi-scale detection capability is enhanced by introducing a feature pyramid structure. Second, instance-level features and time memory features of the operators and devices in the maritime working environment are obtained with the Joint Learning of Detection and Embedding (JDE) paradigm. Third, spatial-temporal interaction information is transferred into a feature memory pool, and the time memory features are updated with an asynchronous memory updating algorithm. Finally, interactions among the operators and devices, as well as unsafe behaviors, are identified with an asynchronous interaction aggregation network. The proposed invention can accurately determine the unsafe behaviors of the operators and thus support operational decisions in maritime management activities.
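Of these steps, the feature memory pool is the one that can be illustrated without a trained network. Here is a minimal sketch, assuming clips are keyed by an integer timestamp and each clip stores per-track feature vectors. Writes replace a clip's entry as soon as that clip is recomputed, and reads return whatever neighboring entries exist at that moment, which may be stale or missing. This mirrors the general idea of asynchronous memory updating, not the patented algorithm. The class name, window size, and track IDs are hypothetical.

```python
class FeatureMemoryPool:
    """Hypothetical memory pool keyed by clip timestamp; holds per-track feature vectors."""

    def __init__(self, window=2):
        self.window = window  # how many neighboring clips to read on each side
        self.store = {}       # timestamp -> {track_id: feature vector}

    def write(self, t, features):
        # Asynchronous-style update: overwrite the entry for clip t whenever it is
        # (re)computed, without waiting for neighboring clips to be refreshed.
        self.store[t] = dict(features)

    def read(self, t):
        """Return the currently stored features of neighboring clips (excluding clip t)."""
        out = {}
        for dt in range(-self.window, self.window + 1):
            if dt != 0 and (t + dt) in self.store:
                out[t + dt] = self.store[t + dt]
        return out


pool = FeatureMemoryPool(window=2)
pool.write(1, {"operator_1": [0.1, 0.2]})
pool.write(3, {"operator_1": [0.3, 0.4], "crane": [0.5, 0.5]})
# Clip 2 sees whatever clips 1 and 3 hold right now, even if they are later recomputed.
print(pool.read(2))
```

Reading possibly stale neighbors is the trade-off that lets each clip be processed independently during training, instead of recomputing every neighbor's features in the same pass.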
Automated video cropping
The disclosed computer-implemented method may include receiving, as an input, segmented video scenes, where each video scene includes a specified length of video content. The method may further include scanning the video scenes to identify objects within each video scene and determining a relative importance value for the identified objects. The relative importance value may include an indication of which objects are to be included in a cropped version of the video scene. The method may also include generating a video crop that is to be applied to the video scene such that the resulting cropped version of the video scene includes those identified objects that are to be included based on the relative importance value. The method may also include applying the generated video crop to the video scene to produce the cropped version of the video scene. Various other methods, systems, and computer-readable media are also disclosed.
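Here is a minimal sketch of generating and applying such a crop. It assumes objects arrive with pixel bounding boxes and an importance value in [0, 1], and that the target is a full-height crop of a fixed aspect ratio (for example, reframing landscape video to 9:16). Objects at or above a hypothetical threshold are kept, and the crop is centered on their combined horizontal extent, then clamped to the frame. The threshold, aspect ratio, and class name are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str
    box: tuple         # (x0, y0, x1, y1) in pixels
    importance: float  # relative importance value, assumed to lie in [0, 1]


def plan_crop(objects, frame_w, frame_h, aspect=9 / 16, threshold=0.5):
    """Full-height crop (x0, 0, x1, frame_h) of width frame_h * aspect, centered on important objects."""
    crop_w = min(frame_w, round(frame_h * aspect))
    keep = [o for o in objects if o.importance >= threshold]
    if keep:
        center = (min(o.box[0] for o in keep) + max(o.box[2] for o in keep)) / 2
    else:
        center = frame_w / 2  # nothing important: fall back to a center crop
    x0 = max(0, min(round(center - crop_w / 2), frame_w - crop_w))
    return (x0, 0, x0 + crop_w, frame_h)


def apply_crop(frame, crop):
    """Apply a crop to a frame stored as a list of pixel rows."""
    x0, y0, x1, y1 = crop
    return [row[x0:x1] for row in frame[y0:y1]]


objects = [DetectedObject("speaker", (800, 100, 1000, 700), 0.9),
           DetectedObject("logo", (0, 0, 100, 50), 0.1)]
print(plan_crop(objects, 1280, 720, aspect=0.5))  # → (720, 0, 1080, 720)
```

This guarantees inclusion only when the kept objects fit within the crop width together. If they do not, a fuller implementation would have to drop the least important objects or widen the crop.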
METHOD FOR ACTION RECOGNITION IN VIDEO AND ELECTRONIC DEVICE
A method for action recognition in a video is described. The method includes inputting a plurality of consecutive clips divided from the video into a convolutional neural network (CNN) to obtain a set of clip descriptors; processing the set of clip descriptors with a Bi-directional Attention mechanism to obtain a global representation of the video; and performing video classification on the global representation of the video, thereby achieving action recognition.
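Here is a minimal sketch of the descriptor-to-label path. It assumes "bi-directional" means unmasked self-attention, so that every clip attends to both earlier and later clips. The CNN is replaced by hand-written descriptor vectors. The attention uses no learned query, key, or value projections. The global representation is taken as the mean of the attended descriptors, which is an assumption because the abstract does not name the pooling. The class weights are hypothetical, so this illustrates the data flow only.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_attention(descriptors):
    """Unmasked scaled dot-product self-attention: each clip attends to clips before and after it."""
    d = len(descriptors[0])
    scale = 1 / math.sqrt(d)
    out = []
    for q in descriptors:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in descriptors])
        out.append([sum(w * v[j] for w, v in zip(weights, descriptors)) for j in range(d)])
    return out


def global_representation(descriptors):
    """Mean-pool the attended clip descriptors into one video-level vector."""
    attended = bidirectional_attention(descriptors)
    n = len(attended)
    return [sum(vec[j] for vec in attended) / n for j in range(len(attended[0]))]


def classify(global_vec, class_weights):
    """class_weights: {label: weight vector}; returns the label with the highest linear score."""
    return max(class_weights,
               key=lambda c: sum(w * x for w, x in zip(class_weights[c], global_vec)))


# Hypothetical descriptors for three consecutive clips (stand-ins for CNN outputs).
clip_descriptors = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2]]
class_weights = {"run": [1.0, 0.0], "jump": [0.0, 1.0]}
print(classify(global_representation(clip_descriptors), class_weights))  # → run
```

Because no causal mask is applied, the first clip's attended vector already mixes in information from later clips. That is the property the "bi-directional" label suggests, as opposed to a left-to-right recurrent summary.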