Patent classifications
G06V20/47
Information processing apparatus and information processing method
A clip image to be used as a highlight image, a replay image, or the like in a broadcast or the like is enabled to be generated easily and precisely. For this purpose, an information processing apparatus performs first processing for converting a received image signal into an image signal for real-time processing and transmitting the image signal to an analysis engine that is located outside. Furthermore, the information processing apparatus performs second processing for receiving event extraction information from the analysis engine and generating setting information of a clip image by using the event extraction information.
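As a rough illustration only (not the patented implementation), the two processing steps might be organized as in the Python sketch below; the conversion step, the analysis-engine interface, and the pre/post clip margins are assumptions.

```python
# Illustrative sketch: forward converted frames to an assumed external analysis
# engine, then turn returned event extraction info into clip setting information.
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

@dataclass
class ClipSetting:
    in_point: float   # seconds
    out_point: float  # seconds
    label: str

def first_processing(frames: Iterable, convert: Callable, send_to_engine: Callable) -> None:
    # Convert each received image into a form suited to real-time processing
    # and transmit it to the external analysis engine (assumed API).
    for frame in frames:
        send_to_engine(convert(frame))

def second_processing(event_info: List[Dict], pre: float = 5.0, post: float = 5.0) -> List[ClipSetting]:
    # Generate clip setting information from event extraction information,
    # padding each event time with assumed margins.
    return [ClipSetting(max(0.0, ev["time"] - pre), ev["time"] + post, ev.get("type", "event"))
            for ev in event_info]
```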
Dynamic audiovisual segment padding for machine learning
Techniques for padding audiovisual clips (for example, audiovisual clips of sporting events) so that the padded clip has a predetermined duration and can be evaluated for viewer interest by a machine learning (ML) algorithm. The unpadded clip is padded with audiovisual segment(s) chosen so that the padded clip has the level of viewer interest it would have had if the unpadded clip had been longer. In some embodiments, the padding segments are synthetic images generated by a generative adversarial network such that the synthetic images have the same level of viewer interest (as judged by an ML algorithm) as if the unpadded clip had been shot to be longer.
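A minimal sketch of the padding idea, assuming a fixed target length in frames and a generate_synthetic_segment callable standing in for the generative adversarial network:

```python
# Illustrative sketch: pad a clip to an assumed fixed length so an interest
# model can score it; generate_synthetic_segment is a hypothetical stand-in
# for the GAN described in the abstract.
def pad_clip(frames, target_len, generate_synthetic_segment):
    padded = list(frames)
    while len(padded) < target_len:
        # Condition each synthetic frame on the clip so far, so the padded clip
        # keeps the interest level it would have had if shot longer.
        padded.append(generate_synthetic_segment(padded))
    return padded[:target_len]
```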
Image capture device with an automatic image capture capability
An image capture device may automatically capture images. An image sensor may generate visual content based on light that becomes incident thereon. A depiction of interest within the visual content may be identified, and one or more images may be generated to include one or more portions of the visual content including the depiction of interest.
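A minimal sketch of the capture logic, assuming frames are array-like images and detect() is a hypothetical detector returning bounding boxes with confidence scores:

```python
# Illustrative sketch: keep crops of frames around any depiction of interest
# whose detection score clears an assumed threshold.
def auto_capture(visual_content, detect, threshold=0.8):
    captures = []
    for frame in visual_content:
        for (x, y, w, h), score in detect(frame):
            if score >= threshold:
                captures.append(frame[y:y + h, x:x + w])  # portion containing the depiction of interest
    return captures
```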
DYNAMICALLY CREATING A COMPOSITION REFERENCE VIDEO TO SUPPORT A USER ACTIVITY
A computer-implemented method, a computer program product, and a computer system for dynamically creating a composition reference video to support a user activity. In response to a user selecting a reference video for performing an activity, the computer system identifies a search query for a reference video. The computer system identifies personalized parameters of the user, based on a knowledge corpus of user preferences for performing activities and on the search query. The computer system identifies appropriate videos and video transcripts in an online video repository and identifies textual contents through document and text search, based on a prediction about how the user is to perform the activity. The computer system draws a series of images based on the textual contents. The computer system normalizes contents from the appropriate videos and the series of images and normalizes voices in the contents from the appropriate videos.
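A pipeline skeleton, purely illustrative; every helper named here (query building, personalization, search, transcript retrieval, text-to-image drawing, normalization) is an assumed stand-in for the components the abstract describes:

```python
# Illustrative sketch of the composition pipeline; all helpers are hypothetical.
def compose_reference_video(user, activity, repo, helpers):
    query = helpers["build_query"](activity, helpers["personalize"](user))
    videos = helpers["search_videos"](repo, query)
    transcripts = [helpers["transcript"](v) for v in videos]
    texts = helpers["text_search"](transcripts, query)          # textual contents
    drawn = [helpers["draw_image"](t) for t in texts]           # series of images from text
    clips = helpers["normalize_content"](videos, drawn)         # unify resolution/framing
    return helpers["normalize_voice"](clips)                    # unify narration voices
```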
SELECTING SUPPLEMENTAL AUDIO SEGMENTS BASED ON VIDEO ANALYSIS
Aspects of the present application correspond to generation of supplemental content based on processing information associated with content to be rendered. More specifically, aspects of the present application correspond to the generation of audio track information, such as music tracks, created for playback during the presentation of video content. Illustratively, one or more frames of the video content are processed by machine-learned algorithm(s) to generate processing results indicative of one or more attributes characterizing individual frames of the video content. A selection system can then identify a potential music track or other audio data in view of the processing results.
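A minimal sketch of the selection step, assuming classify_frame is a hypothetical machine-learned attribute model and the catalog is a list of tagged tracks:

```python
# Illustrative sketch: aggregate per-frame attributes, then pick the catalog
# track whose tags best overlap the detected attributes.
def select_track(frames, classify_frame, catalog):
    attributes = set()
    for frame in frames:
        attributes.update(classify_frame(frame))      # e.g. {"outdoor", "fast-motion"}
    def overlap(track):
        return len(attributes & set(track["tags"]))
    return max(catalog, key=overlap) if catalog else None
```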
ROBUST VIEW CLASSIFICATION AND MEASUREMENT IN ULTRASOUND IMAGING
For robust view classification and measurement estimation in sequential ultrasound imaging, the classification and/or measurements for a given image or sequence of images are gated. To prevent oscillation in results, the gating provides consistent output.
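One common way to gate such output is simple hysteresis; the sketch below is an assumption-laden illustration, not the patented gating rule:

```python
# Illustrative sketch: keep the previous view label unless a different label is
# predicted confidently for several consecutive images (thresholds assumed).
def gated_view(prev_view, candidate, confidence, streak, min_conf=0.9, min_streak=3):
    if candidate == prev_view or confidence < min_conf:
        return prev_view, 0
    streak += 1
    if streak >= min_streak:
        return candidate, 0          # switch only after a stable, confident run
    return prev_view, streak
```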
Systems and methods for generating a video summary
Systems and methods of generating video summaries are presented herein. Information defining a video may be obtained. The video may include a set of frame images. Parameter values for parameters of individual frame images of the video may be determined. Interest weights for the frame images may be determined. An interest curve for the video that characterizes the video by interest weights as a function of progress through the set of frame images may be generated. One or more curve attributes of the interest curve may be identified, and one or more interest curve values of the interest curve that correspond to individual curve attributes may be determined. Interest curve values of the interest curve may be compared to threshold curve values. A subset of frame images of the video to include within a video summary of the video may be identified based on the comparison.
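A minimal sketch of the thresholding step, assuming a moving-average interest curve and a single threshold value (both assumptions, not the claimed method):

```python
# Illustrative sketch: smooth per-frame interest weights into an interest curve
# and keep the indices of frames whose curve value clears the threshold.
def summarize(interest_weights, threshold, window=5):
    n = len(interest_weights)
    curve = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        curve.append(sum(interest_weights[lo:hi]) / (hi - lo))   # moving average
    return [i for i, v in enumerate(curve) if v >= threshold]    # frames to include
```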
Method and system for augmented reality (AR) content creation
A method and a system for Augmented Reality (AR) content creation are disclosed. The method includes creating a feature vector corresponding to each of a sequence of frames extracted from a video, based on a plurality of captured features. The method further includes determining a vector distance between each two consecutive frames from the sequence of frames, based on the feature vector associated with each of the two consecutive frames. The method further includes dividing the video into a plurality of frames based on the determined vector distance. The method further includes creating a storyline based on an object and an action associated with the object in each of the plurality of frames, and generating a set of instructions for a user based on the storyline created for each of the plurality of frames and a real-time video stream capturing a current state of a user environment.
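A minimal sketch of the frame-splitting step, assuming Euclidean distance between consecutive feature vectors and a fixed threshold (both assumptions):

```python
# Illustrative sketch: start a new group of frames wherever the distance between
# consecutive feature vectors exceeds a threshold.
import math

def segment_frames(feature_vectors, threshold):
    segments, current = [], [0]
    for i in range(1, len(feature_vectors)):
        if math.dist(feature_vectors[i - 1], feature_vectors[i]) > threshold:
            segments.append(current)          # boundary between consecutive frames
            current = []
        current.append(i)
    segments.append(current)
    return segments
```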
TRANSFORMER-BASED TEMPORAL DETECTION IN VIDEO
With rapidly evolving technologies and emerging tools, sports-related videos generated online are rapidly increasing. To automate the sports video editing/highlight generation process, a key task is to precisely recognize and locate events-of-interest in videos. Embodiments herein comprise a two-stage paradigm to detect categories of events and when these events happen in videos. In one or more embodiments, multiple action recognition models extract high-level semantic features, and a transformer-based temporal detection module locates target events. These novel approaches achieved state-of-the-art performance in both action spotting and replay grounding. While presented in the context of sports, it shall be noted that the systems and methods herein may be used for videos comprising other content and events.
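A schematic PyTorch sketch of such a two-stage layout; the dimensions, head count, and output parameterization are assumptions, not the values used in the embodiments:

```python
# Illustrative sketch: per-frame semantic features from action recognition models
# feed a transformer encoder that predicts an event class and a temporal offset
# at each position.
import torch.nn as nn

class TemporalDetector(nn.Module):
    def __init__(self, feat_dim=1024, d_model=256, num_classes=17):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.cls_head = nn.Linear(d_model, num_classes)   # which event
        self.loc_head = nn.Linear(d_model, 1)             # when it happens

    def forward(self, features):              # features: (batch, time, feat_dim)
        x = self.encoder(self.proj(features))
        return self.cls_head(x), self.loc_head(x)
```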
SYSTEM AND METHOD FOR CROWDSOURCING A VIDEO SUMMARY FOR CREATING AN ENHANCED VIDEO SUMMARY
System and method for crowdsourcing a video summary for creating an enhanced video summary are disclosed. The method includes receiving videos, analyzing the videos, creating the video summary of the videos using a building block model, storing the video summary in a video library database, crowdsourcing the video summary to at least one of the plurality of users, enabling the at least one of the plurality of users to review the video summary and identify at least one new characteristic, enabling the at least one of the plurality of users to share the at least one new characteristic on the platform, comparing at least one existing characteristic of the building block model with the corresponding new characteristic, reconciling the video summary along with the at least one inserted new characteristic, creating a new building block model, and editing the video summary to create the enhanced video summary.
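A minimal sketch of the reconciliation step, assuming building block characteristics can be represented as a flat name-to-value mapping (an assumption made only for illustration):

```python
# Illustrative sketch: merge crowdsourced characteristics into the building block
# model, recording only those that differ from the existing characteristics.
def reconcile(existing_blocks, crowdsourced):
    updated = dict(existing_blocks)
    changes = {}
    for name, new_value in crowdsourced.items():
        if updated.get(name) != new_value:     # compare existing vs. new characteristic
            changes[name] = new_value
            updated[name] = new_value
    return updated, changes                    # new building block model + edits to apply
```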