METHOD AND SYSTEM FOR ENCODING, DECODING AND PLAYBACK OF VIDEO CONTENT IN CLIENT-SERVER ARCHITECTURE
20220182691 · 2022-06-09
Inventors
CPC classification (Section H, ELECTRICITY)
H04N21/8543
H04N19/119
H04N19/46
H04N19/25
H04N21/44012
H04N21/23412
H04N21/23418
H04N19/23
H04N19/14
International classification (Section H, ELECTRICITY)
H04N21/234
H04N19/119
H04N19/14
H04N19/25
H04N19/46
H04N21/239
Abstract
One or more methods and systems are provided for encoding, decoding and playback of a video content in a client-server architecture. The invention proposes a video encoding and decoding method that includes identification of activities in the video content, identification of the corresponding API's with related parameters for each activity, and storing those API's along with the base frame and object frames in the database. In this invention, animation API functions are created for unknown/random activities. The playback involves decoding the data, which is a set of instructions to play the animation with the given objects and base frames, and animating the object frames over the base frame using said API functions.
Claims
1. A method for encoding, decoding and playback of a video content in a client-server architecture, the method comprising: processing, by a video processor module, the video content for dividing said video content into a plurality of parts based on one or more category of instructions; detecting, by an object and base frame detection module, one or more object frames and a base frame from the plurality of parts of the video content based on one or more related parameters; segregating, by an object and base frame segregation module, the object frame and the base frame from the plurality of parts of the video content based on the related parameters; detecting, by an activity detection module, a plurality of activities in the object frame; storing, in a second database, the object frame, the base frame, the plurality of activities and the related parameters; identifying and mapping, by an activity updating module, a plurality of API's corresponding to the plurality of activities based on the related parameters; receiving, by a server, a request for playback of the video content from one of a plurality of client devices; and merging, by an animator module, the plurality of activities with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
2. The method as claimed in claim 1, wherein processing, by the video processor module, the video content for dividing said video content into the plurality of parts based on one or more category of instructions, further comprises: processing, by the video processor module, the received video content; detecting, by a scene detection module, one or more types of the video content; applying, by a first database, one or more category of instructions on a type of the video content; and dividing, by a video division module, the video content into the plurality of parts based on the one or more category of instructions from the first database.
3. The method as claimed in claim 1, further comprises: identifying, by the activity updating module, a plurality of unknown activities; creating, by the activity updating module, a plurality of API's for the plurality of unknown activities; and mapping, by the activity updating module, the created plurality of API's with the plurality of unknown activities.
4. The method as claimed in claim 1, wherein processing, by the video processor module, for dividing said video content into the plurality of parts based on one or more category of instructions, further comprises: extracting, by the video processor module, the related parameters of the object frames from the video content.
5. The method as claimed in claim 1, wherein the identifying and mapping, by the activity updating module, the plurality of API's corresponding to the plurality of activities further comprises: storing, by a timestamp module, a plurality of timestamps corresponding to the plurality of activities; storing, by an object locating module, a plurality of location details and an orientation of a relevant object corresponding to the plurality of activities; and generating and storing, by a file generation module, a plurality of data tables based on the timestamp and location information.
6. The method as claimed in claim 1, further comprises: storing, in the second database, an additional information corresponding to the object frame; detecting an interaction input on the object frame during playback of the video content; and displaying the additional information along with the object frame.
7. The method as claimed in claim 1, wherein a first database is a video processing cloud, and wherein the video processing cloud further comprises: providing instructions related to the detecting of a scene from the plurality of parts of the video content to the video processor module; determining the instructions for providing to each of the plurality of parts of the video content; assigning each of the plurality of parts of the video content to the server, wherein said server provides the instructions; and providing a buffer of instructions for downloading at the server.
8. A system for encoding, decoding and playback of a video content in a client-server architecture, the system comprising: a video processor module configured to process the video content to divide said video content into a plurality of parts based on one or more category of instructions; an object and base frame detection module configured to detect one or more object frames and a base frame from the plurality of parts of the video content based on one or more related parameters; an object and base frame segregation module configured to segregate the object frame and the base frame from the plurality of parts of the video content based on the related parameters; an activity detection module configured to detect a plurality of activities in the object frame; a second database configured to store the object frame, the base frame, the plurality of activities and the related parameters; an activity updating module configured to: identify a plurality of API's corresponding to the plurality of activities based on the related parameters; and map the plurality of API's corresponding to the plurality of activities based on the related parameters; and a server configured to receive a request for playback of the video content from one of a plurality of client devices; and an animator module configured to merge the plurality of activities with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
9. The system as claimed in claim 8, wherein the video processor module configured to process the video content to divide said video content into the plurality of parts based on one or more category of instructions, further comprises: the video processor module configured to process the received video content; a scene detection module configured to detect one or more types of the video content; a first database configured to apply one or more category of instructions on a type of the video content; and a video division module configured to divide the video content into the plurality of parts based on the one or more category of instructions from the first database.
10. The system as claimed in claim 8, wherein the video processor module configured to divide said video content into the plurality of parts based on one or more category of instructions, further comprises: the video processor module configured to extract the related parameters of the object frames from the video content.
11. The system as claimed in claim 8, wherein the object and base frame detection module configured to detect one or more object frames and a base frame further comprises: an object segregation module configured to detect a foreign object and a relevant object from the object frame.
12. The system as claimed in claim 8, wherein the activity detection module configured to detect the plurality of activities in the object frame further comprises: an activity segregation module configured to segregate the plurality of activities that are irrelevant in the video content.
13. The system as claimed in claim 8, wherein the activity updating module configured to identify and map the plurality of API's corresponding to the plurality of activities further comprises: a timestamp module configured to store a plurality of timestamps corresponding to the plurality of activities; an object locating module configured to store a plurality of location details and an orientation of a relevant object corresponding to the plurality of activities; and a file generation module configured to generate and store a plurality of data tables based on the timestamp and location information.
14. The system as claimed in claim 8, wherein the activity updating module configured to identify and map the plurality of API's corresponding to the plurality of activities is based on related parameters, wherein the related parameters include the API, the object frame, the base frame, the activity performed by the object on the base frame and the like.
15. The system as claimed in claim 8, wherein, the second database is configured to store an additional information corresponding to the object frame; the object and base frame detection module is configured to detect an interaction input on the object frame during playback of the video content; and the one client device configured to display the additional information along with the object frame.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and modules.
[0076] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION
[0077] The various embodiments of the present disclosure provide a method and system for animation based encoding, decoding and playback of a video content in a client-server architecture. The invention, more particularly, relates to animating actions on the video content during playback after decoding the encoded video content, wherein a video compression, decompression and playback technique is used to save bandwidth and storage for the video content.
[0078] In the following description, for purpose of explanation, specific details are set forth in order to provide an understanding of the present claimed subject matter. It will be apparent, however, to one skilled in the art that the present claimed subject matter may be practiced without these details. One skilled in the art will recognize that embodiments of the present claimed subject matter, some of which are described below, may be incorporated into a number of systems.
[0079] However, the methods and systems are not limited to the specific embodiments described herein. Further, structures and devices shown in the figures are illustrative of exemplary embodiments of the present claimed subject matter and are meant to avoid obscuring of the present claimed subject matter.
[0080] Furthermore, connections between components and/or modules within the figures are not intended to be limited to direct connections. Rather, these components and modules may be modified, re-formatted or otherwise changed by intermediary components and modules.
[0081] The present claimed subject matter provides an improved method and system for animation based encoding, decoding and playback of a video content in a client-server architecture.
[0082] Various embodiments herein may include one or more methods and systems for animation based encoding, decoding and playback of a video content in a client-server architecture. In one of the embodiments, the video content is processed for dividing the video content into a plurality of parts based on one or more category of instructions. Further, one or more object frames and a base frame are detected from the plurality of parts of the video based on one or more related parameters. The one or more related parameters include the physical and behavioural nature of the relevant object, the action performed by the relevant object, the speed, angle and orientation of the relevant object, the time and location of the plurality of activities and the like. Further, the detected object frame and the base frame are segregated from the plurality of parts of the video based on the related parameters. Further, a plurality of activities are detected in the object frame, and the object frame, the base frame, the plurality of activities and the related parameters are stored in a second database. Further, a plurality of API's corresponding to the plurality of activities are identified and mapped based on the related parameters. Further, a request for playback of the video content is received from one of a plurality of client devices. Here, the plurality of client devices include smartphones, tablet computers, web interfaces, camcorders and the like. Upon receiving the request for playback of the video content, the plurality of activities, the object frame and the base frame are merged together for outputting a formatted video playback based on the related parameters.
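As an illustrative sketch only (the disclosure does not prescribe a data layout), the encoding step above can be modelled as bundling a base frame, object frames, and activity-to-API mappings into one record; all type names and the API table below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str          # e.g. "walk", "wave" (illustrative)
    api: str           # animation API mapped to this activity
    start_ts: float    # seconds from the start of the part
    end_ts: float
    location: tuple    # (x, y) coordinates of the relevant object

@dataclass
class EncodedPart:
    base_frame: bytes                                   # background image
    object_frames: list = field(default_factory=list)   # object images
    activities: list = field(default_factory=list)      # Activity records

# Hypothetical mapping from known activities to animation API's.
API_TABLE = {"walk": "api.walk_cycle", "wave": "api.wave_arm"}

def encode_part(base_frame, object_frames, detected_activities):
    """Map each detected activity to an animation API and bundle the part."""
    part = EncodedPart(base_frame=base_frame, object_frames=list(object_frames))
    for name, start, end, loc in detected_activities:
        # Unknown activities fall back to a generic (created) API.
        api = API_TABLE.get(name, "api.generic")
        part.activities.append(Activity(name, api, start, end, loc))
    return part
```

The encoded part replaces full video frames with one base image, a few object images, and a compact list of animation instructions, which is where the bandwidth saving comes from.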
[0083] In another embodiment, the video content is captured for playback. Further, the captured video content is processed for dividing the video content into a plurality of parts based on one or more category of instructions. Further, one or more object frames and a base frame are detected from the plurality of parts of the video based on one or more related parameters. Further, the detected object frame and the base frame are segregated from the plurality of parts of the video based on the related parameters. Further, a plurality of activities are detected in the object frame and the object frame, the base frame, the plurality of activities and the related parameters are stored in a second database. Further, a plurality of API's corresponding to the plurality of activities are identified and mapped based on the related parameters. Further, the plurality of activities are merged with the object frame and the base frame together for outputting a formatted video playback based on the related parameters.
[0084] In another embodiment, a request is received for playback of the video content from one of a plurality of client devices. Further, the received video content is processed for dividing the video content into a plurality of parts based on one or more category of instructions. Further, one or more object frames and a base frame are detected from the plurality of parts of the video based on one or more related parameters. Further, the detected object frame and the base frame are segregated from the plurality of parts of the video based on the related parameters. Further, a plurality of activities are detected in the object frame and the object frame, the base frame, the plurality of activities and the related parameters are stored in a second database. Further, a plurality of API's corresponding to the plurality of activities are identified and mapped based on the related parameters. Further, the plurality of activities are merged with the object frame and the base frame together for outputting a formatted video playback based on the related parameters.
[0085] In another embodiment, a video player is configured to send a request for playback of video content to the server. Further, one or more object frames, a base frame, plurality of API's corresponding to a plurality of activities and one or more related parameters are received from the server. Furthermore, the object frames and the base frame are merged with the corresponding plurality of activities associated with the plurality of API's and the video player is further configured to play the merged video.
[0086] In another embodiment, the video player is further configured to download one or more object frames, the base frame, the plurality of API's corresponding to the plurality of activities and one or more related parameters and to store one or more object frames, the base frame, the plurality of API's corresponding to the plurality of activities and one or more related parameters. The video player which is configured to play the merged video further creates buffer of the merged video and the downloaded video.
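A minimal sketch of the player-side flow described above, assuming a hypothetical `fetch_parts` source of decoded parts; "playing" a part stands in for the animator module's merge of activities with the object and base frames:

```python
def playback(fetch_parts, buffer_size=2):
    """Client-side sketch: download parts into a buffer, then play them.

    `fetch_parts` is any iterable of decoded parts (dicts with an "id" key
    here, purely for illustration). Playing the oldest buffered part while
    newer parts download models the buffering behaviour described above.
    """
    buffer, played = [], []
    for part in fetch_parts:
        buffer.append(part)                     # download into the buffer
        if len(buffer) >= buffer_size:
            played.append(buffer.pop(0)["id"])  # play oldest buffered part
    while buffer:                               # drain the remaining buffer
        played.append(buffer.pop(0)["id"])
    return played
```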
[0087] In another embodiment, a video processor module is configured to process the video content to divide the video content into a plurality of parts based on one or more category of instructions. Further, an object and base frame detection module is configured to detect one or more object frames and a base frame from the plurality of parts of the video based on one or more related parameters. Further, an object and base frame segregation module is configured to segregate the object frame and the base frame from the plurality of parts of the video based on the related parameters. Further, an activity detection module is configured to detect a plurality of activities in the object frame. Furthermore, a second database is configured to store the object frame, the base frame, the plurality of activities and the related parameters. Further, an activity updating module is configured to identify a plurality of API's corresponding to the plurality of activities based on the related parameters and to map a plurality of API's corresponding to the plurality of activities based on the related parameters. Further, a server is configured to receive a request for playback of the video content from one of a plurality of client devices. Further, an animator module is configured to merge the plurality of activities with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
[0088] In another embodiment, the object frame and the base frame are stored in the form of an image and the plurality of activities are stored in the form of an action with the location and the timestamp.
[0089] In another embodiment, the video content is processed for dividing said video content into a plurality of parts based on one or more category of instructions, wherein the received video content is processed by the video processor module. Further, one or more types of the video content are detected and one or more category of instructions are applied on the type of the video content by a first database. The video content is then divided into a plurality of parts based on the one or more category of instructions from the first database.
[0090] In another embodiment, a plurality of unknown activities are identified by the activity updating module. A plurality of API's are created for the plurality of unknown activities by the activity updating module. These created plurality of API's are mapped with the plurality of unknown activities. Moreover, the created plurality of API's for the plurality of unknown activities are updated in a third database.
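The lookup-or-create behaviour for unknown activities might be sketched as follows; the registry class and the `api.auto_` naming scheme are assumptions for illustration, not part of the disclosure:

```python
class ApiRegistry:
    """Sketch of the activity updating module's lookup-or-create step.

    Known activities map to existing animation API's (the third database);
    an unknown activity gets a new API entry created and mapped, and the
    creation is recorded so the third database can be updated.
    """
    def __init__(self, known):
        self.apis = dict(known)   # activity name -> API identifier
        self.created = []         # API's created for unknown activities

    def resolve(self, activity):
        if activity not in self.apis:
            api = f"api.auto_{activity}"   # hypothetical naming scheme
            self.apis[activity] = api      # update the third database
            self.created.append(api)
        return self.apis[activity]
```

Once created, the API is mapped permanently, so a second occurrence of the same activity resolves without creating a duplicate.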
[0091] In another embodiment, the related parameters of the object frames are extracted from the video content.
[0092] In another embodiment, identifying the plurality of unknown activities by the activity updating module further comprises detecting the plurality of API's corresponding to the plurality of activities in the third database and segregating the plurality of activities from the plurality of unknown activities by the activity updating module.
[0093] In another embodiment, a foreign object and a relevant object from the object frame are detected by an object segregation module.
[0094] In another embodiment, the plurality of activities that are irrelevant in the video content are segregated by an activity segregation module.
[0095] In another embodiment, a plurality of timestamps corresponding to the plurality of activities are stored by a timestamp module. Further, a plurality of location details and the orientation of the relevant object corresponding to the plurality of activities are stored by an object locating module. A plurality of data tables are generated based on the timestamp and location information and stored by a file generation module.
[0096] In another embodiment, the location is a set of coordinates corresponding to the plurality of activities, and the plurality of timestamps correspond to the start and end of the plurality of activities with respect to the location.
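The data tables of timestamps and coordinates could, for example, be aggregated as below; the event tuple layout is an assumption about how the timestamp and object locating modules might emit observations:

```python
def build_data_table(events):
    """Aggregate per-activity start/end timestamps and locations into rows.

    `events` is a list of (activity, timestamp, (x, y)) observations.
    Each activity gets one row holding its earliest and latest timestamp
    (start and end) and the ordered coordinate path of the object.
    """
    table = {}
    for activity, ts, loc in events:
        row = table.setdefault(activity, {"start": ts, "end": ts, "path": []})
        row["start"] = min(row["start"], ts)
        row["end"] = max(row["end"], ts)
        row["path"].append(loc)
    return table
```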
[0097] In another embodiment, additional information corresponding to the object frame is stored in the second database. Further, an interaction input is detected on the object frame during playback of the video content, and the additional information is displayed along with the object frame.
[0098] In another embodiment, the first database is a video processing cloud, and the video processing cloud further provides instructions related to the detecting of the scene from the plurality of parts of the video to the video processor module and determines the instructions for providing to each of the plurality of parts of the video. Further, each of the plurality of parts of the video is assigned to the server, wherein said server provides the required instructions, and a buffer of instructions is provided for downloading at the server.
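The disclosure does not specify how parts are assigned to servers, so one plausible (purely illustrative) reading is a simple round-robin scheduler:

```python
def assign_parts(parts, servers):
    """Round-robin assignment of video parts to processing servers.

    A stand-in for the video processing cloud's scheduling step; the
    policy shown here is an assumption, chosen only for simplicity.
    """
    assignment = {s: [] for s in servers}
    for idx, part in enumerate(parts):
        server = servers[idx % len(servers)]
        assignment[server].append(part)
    return assignment
```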
[0099] In another embodiment, the second database is a storage cloud.
[0100] In another embodiment, the third database is an API cloud and the API cloud further stores the plurality of API's and provides the plurality of API's corresponding to the plurality of activities and a buffer of the plurality of API's at the client device.
[0101] In another embodiment, the first database, second database and the third database correspond to a single database providing a virtual division among themselves.
[0102] In another embodiment, the server is connected with the client and the storage cloud by a server connection module, and the client is connected with the server and the storage cloud by a client connection module.
[0103] In another embodiment, a plurality of instructions are generated for video playback corresponding to the object frame, the base frame and the plurality of activities based on the related parameters by a file generation module.
[0104] It should be noted that the description merely illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described herein, embody the principles of the present invention. Furthermore, all examples recited herein are principally intended expressly to be only for explanatory purposes to help the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass equivalents thereof.
[0106] In the present implementation, the server 110 includes, but is not limited to, a proxy server, a mail server, a web server, an application server, a real-time communication server, an FTP server and the like.
[0107] In the present implementation, the client devices or user devices include, but are not limited to, mobile phones (for e.g. a smart phone), Personal Digital Assistants (PDAs), smart TVs, wearable devices (for e.g. smart watches and smart bands), tablet computers, Personal Computers (PCs), laptops, display devices, content playing devices, IoT devices, devices on content delivery network (CDN) and the like.
[0108] In the present implementation, the system 100 further includes one or more processor(s). The processor may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in a memory.
[0109] In the present implementation, the database may be implemented as, but not limited to, an enterprise database, a remote database, a local database, and the like. Further, the databases may be located either within the vicinity of each other or at different geographic locations. Furthermore, the database may be implemented inside or outside the system 100, and may be implemented as a single database or as a plurality of parallel databases connected to each other and to the system 100 through the network. Further, the database may reside in each of the plurality of client devices, such as the client 112.
[0110] In the present implementation, the audio/video input is the input source to the video processor module 102. The audio/video input can be an analog video signal or digital video data that is processed and interpreted by the video processor module 102. It may also be in an existing video format such as .mp4, .avi, and the like.
[0111] In the present implementation, the video processing cloud 114 is configured to provide the appropriate algorithm to process a part of the video content. The video processing cloud 114 is configured to provide scene detection algorithms to the video processor module 102. It further divides the video into a plurality of parts or sub frames and determines the algorithm to be used for each of the plurality of parts. Further, the video processing cloud 114 is configured to assign the plurality of parts or sub frames to the video processing server 110, which provides the appropriate algorithms to deduce the object frame, the base frame and the plurality of activities of the video content. Further, the video processing cloud 114 is configured to detect and store a plurality of unknown activities in the form of animation in the API cloud 118. Further, a buffer of algorithms is provided which can be downloaded at the server 110. Further, the video processing cloud 114 is configured to maintain the video processing standards.
[0112] In the present implementation, the API cloud 118 is configured to store a plurality of animations that the video processing cloud 114 has processed. It further provides the accurate API as per the activity segregated out by the video processor module 102. The API cloud 118 is further configured to create an optimized and a Graphics Processing Unit (GPU) safe library. It is configured to provide a buffer of API's at the client 112 where the video is played.
[0113] In the present implementation, the storage cloud 116 is configured to store the object frame, the base frame and the plurality of activities that are segregated by the video processor module 102. The storage cloud 116 sits between the server 110 and the client 112 through the connection modules (104, 106). Here, the video processing cloud 114 is the first database, the storage cloud 116 is the second database and the API cloud 118 is the third database. The first database, the second database and the third database may correspond to a single database providing a virtual division among themselves.
[0114] Further, the system 100 includes a video processor module 102, a connection module (104, 106) and an animator module 108. The video processor module 102 is configured to process the analog video input and to segregate the entities which includes the objects also referred to as the object frame, the background frames also referred to as the base frame and the plurality of actions also referred to as the plurality of activities. The video processor module 102 is further configured to store these entities in the animator module 108. The video processor module 102 works in conjunction with the video processing cloud. Further, the conventional algorithms of the video processing techniques are used to deduce about the object frame, base frame and plurality of activities of the video content. Further, the system 100 includes the connection module which includes the server connection module 104 and the client connection module 106. The server connection module 104 is configured to connect the server 110 with the client 112 and the storage cloud 116. It also sends the output of the video processor module 102 to the storage cloud 116. The client connection module 106 is configured to connect the client 112 with the server 110 and the storage cloud 116. It also fetches the output of the video processor module 102 from the storage cloud 116. Further, the system 100 includes the animator module 108 which is configured to merge the plurality of activities with the object frame and the base frame and to animate a video out of it. The animator module 108 is connected to the API cloud 118 which helps it to map the plurality of activities with the animation API. It further works in conjunction with the API cloud 118.
[0115] In the present implementation, the system 100 includes the storage, which includes the server storage 120 and the client storage 122. The server storage 120 is the storage device at the server side in which the output of the video processor module 102 is stored. The output of the video processor module 102 comes as the object frame, the base frame and the plurality of activities involved. These object frames and base frames are stored as images, and the plurality of activities are stored as actions with location and timestamp. Further, the client storage 122 is configured to store the same data obtained from the storage cloud 116.
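The storage layout described above (frames stored as images, activities stored as an action with location and timestamp) might be flattened into records like this; the field names are illustrative only:

```python
def to_storage_records(part_id, base_frame, object_frames, activities):
    """Flatten one encoded part into flat storage records.

    Frames become image-blob records; each activity becomes a row with
    its action name, (x, y) location and timestamp, matching the storage
    format described for the server and client storage.
    """
    records = [{"part": part_id, "kind": "base", "image": base_frame}]
    records += [{"part": part_id, "kind": "object", "image": img}
                for img in object_frames]
    records += [{"part": part_id, "kind": "activity", "action": action,
                 "location": loc, "timestamp": ts}
                for action, loc, ts in activities]
    return records
```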
[0116] Further, the audio/video output is obtained using the animator module 108 which is configured to merge the plurality of activities with the object frame and the base frame.
[0117]
[0118] Further, the scene detection module 202 is configured to detect the type of algorithm to be used on the video content. Each of the plurality of parts of the video content may need a different type of processing algorithm. The scene detection module 202 is configured to detect the algorithm to be used as per the change in the video content. Further, the type of the video is obtained to apply the appropriate processing algorithm, and the appropriate algorithms are deployed to detect the type of the scene. The video processing cloud 114 obtains the type of the scene from the scene detection module 202 and then determines which of the one or more category of instructions to apply as per the relevance of the scene. Further, the video division module 204 is configured to divide the video into a plurality of parts as per the processing algorithm required. The video can be divided into parts and even sub frames to apply processing and make it available as a video thread for the video processors. Further, many known methods are used for detecting scene changes in video content, such as colour change, motion change and the like, and for automatically splitting the video into separate clips. Once the division of each of the plurality of parts is completed, each of said parts is sent to the video processing cloud 114, where the available server is assigned the tasks to process the video. The video is divided into a plurality of parts as per the video processing algorithm to be used.
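A toy version of scene-change splitting, using mean absolute pixel difference between consecutive frames as the (assumed) change criterion; frames are represented as flat lists of pixel intensities, and the threshold value is arbitrary:

```python
def split_on_scene_change(frames, threshold=30.0):
    """Split a frame sequence into parts wherever the mean absolute pixel
    change between consecutive frames exceeds `threshold`.

    The disclosure only requires *some* scene-change criterion (colour
    change, motion change, etc.); this difference heuristic is one simple
    stand-in for those known methods.
    """
    parts, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:       # scene change: start a new part
            parts.append(current)
            current = []
        current.append(cur)
    parts.append(current)
    return parts
```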
[0119] Further, the objects and base frames detection module 206 is configured to detect one or more object frames present in the part of the video content. The three key steps in the analysis of a video are: detecting moving objects in video frames, tracking the detected object or objects from one frame to another, and studying the tracked object paths to estimate their behaviours. Mathematically, every image frame is a matrix of order i×j, and the fth image frame can be defined as such a matrix:
where i and j are the width and height of the image frame respectively. The pixel intensity or gray value at location (m, n) at time t is denoted by (m, n, t). Further, the objects and base frames segregation module 208 is configured to segregate the object frame and the base frame. The fundamental objective of image segmentation algorithms is to partition a picture into comparable regions. Each segmentation algorithm typically addresses two issues: deciding the criteria on which the segmentation of images is based, and the technique for attaining effective division. The segmentation methods that may be used include image segmentation using graph cuts (normalized cuts), mean-shift clustering, active contours and the like. Further, the objects segregation module 210 is configured to detect whether an object is relevant to the context. Appropriate machine learning algorithms are used to differentiate a relevant object from a foreign object in the object frame. The present invention discloses a characterization of optimal decision rules: if anomalies are local, optimal decision rules are local even when the nominal behaviour exhibits global spatial and temporal statistical dependencies. This helps collapse the large ambient data dimension for detecting local anomalies. Consequently, consistent data-driven local decision rules with provable performance can be derived with limited training data. The rules are based on score functions derived from local nearest-neighbour distances. These rules aggregate statistics across spatio-temporal locations and scales, and produce a single composite score for video segments.
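The nearest-neighbour score functions mentioned above can be illustrated with a small sketch. This is not the claimed detector; it simply shows the shape of the idea: a local score is the distance to the k-th nearest training feature vector, and per-location scores are aggregated into a single composite score for a segment. The feature vectors, k, and the max-aggregation are assumptions.

```python
def knn_score(sample, training, k=2):
    """Anomaly score = distance to the k-th nearest training feature vector."""
    dists = sorted(
        sum((a - b) ** 2 for a, b in zip(sample, vec)) ** 0.5
        for vec in training
    )
    return dists[min(k, len(dists)) - 1]

def composite_score(segment, training, k=2):
    """Aggregate per-location scores into one composite score for a segment."""
    return max(knn_score(feat, training, k) for feat in segment)

# Nominal behaviour clusters near the origin; a far-away feature scores higher.
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
usual_segment = [(0.05, 0.05)]
odd_segment = [(5.0, 5.0)]
print(composite_score(usual_segment, normal) < composite_score(odd_segment, normal))  # True
```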
[0120] Further, the activity detection module 212 is configured to detect the plurality of activities in the video content. The activities can be motion detection, illuminance change detection, colour change detection and the like. In an exemplary implementation, human activity detection/recognition is provided herein. Human activity recognition can be separated into three levels of representation, namely the low-level core technology, the mid-level human activity recognition systems and the high-level applications. In the first level of core technology, three main processing stages are considered, i.e., object segmentation, feature extraction and representation, and activity detection and classification algorithms. The human object is first segmented out from the video sequence. The characteristics of the human object such as shape, silhouette, colours, poses and body motions are then properly extracted and represented by a set of features. Subsequently, an activity detection or classification algorithm is applied on the extracted features to recognize the various human activities. Moreover, in the second level of human activity recognition systems, three important recognition systems are discussed, including single-person activity recognition, multiple-people interaction and crowd behaviour, and abnormal activity recognition. Finally, the third level of applications discusses the recognized results applied in surveillance environments, entertainment environments or healthcare systems. In the first stage of the core technology, the object segmentation is performed on each frame in the video sequence to extract the target object. Depending on the mobility of the cameras, the object segmentation can be categorized into two types: static-camera segmentation and moving-camera segmentation.
In the second stage of the core technology, characteristics of the segmented objects such as shape, silhouette, colours and motions are extracted and represented in some form of features. The features can be categorized as four groups, space-time information, frequency transform, local descriptors and body modelling. In the third stage of the core technology, the activity detection and classification algorithms are used to recognize various human activities based on the represented features. They can be categorized as dynamic time warping (DTW), generative models, discriminative models and others.
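Of the classification tools named above, dynamic time warping (DTW) is the simplest to sketch. The following illustrative implementation computes the DTW distance between two 1-D feature sequences; a smaller distance means the two activity traces are more alike even when one is performed more slowly. The example sequences are invented for illustration.

```python
def dtw(seq_a, seq_b):
    """Classic O(n*m) dynamic time warping distance for 1-D sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three admissible alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

walk = [0, 1, 2, 3, 2, 1, 0]
walk_slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same activity, slower
jump = [0, 5, 0, 5, 0, 5, 0]                             # a different activity
# The time-warped copy of the same activity stays much closer than a different one.
print(dtw(walk, walk_slow) < dtw(walk, jump))  # True
```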
[0121] Furthermore, the activity segregation module 214 is configured to segregate the irrelevant activities from the video content. For example, an irrelevant activity can be an insect dancing in front of a CCTV camera. Further, the activity updating module 216 is configured to identify a plurality of unknown activities. Further, the timestamp module 218 is configured to store timestamps of each of the plurality of activities. Time-stamping, time-coding and spotting are all crucial parts of audio and video workflows, especially for captioning, subtitling and translation services. This refers to the process of adding timing markers, also known as timestamps, to a transcription. The timestamps can be added at regular intervals, or when certain events happen in the audio or video file. Usually the timestamps contain just minutes and seconds, though they can sometimes contain frames or milliseconds as well. Further, the object locating module 220 is configured to store the location details of the plurality of activities. It can store a motion as the start and end points of the motion and the curvature of the motion. Further, the file generation module 222 is configured to generate a plurality of data tables based on the timestamp and location information. Examples of the generated data tables are shown below:
TABLE 1: Activity to animation map
  Activity      Animation API
  Riding        QueenHorse( )
  Travelling    SoldiersTravel( )
  Leading       QueenLeading( )
  Smiling       Smiling( )

TABLE 2: Activity to time map
  Activity      Timestamp
  Riding        T2
  Travelling    T0
  Leading       T1
  Smiling       T3

TABLE 3: Activity to location map
  Activity      Start   End   Motion equation
  Riding        L1      L2    EQ0: straight line
  Travelling    L0      L2    EQ1: path curve
  Leading       L3      L4    EQ2: random curve
  Smiling       L5      L6    EQ3: smile curve
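The file generation step that produces such tables can be sketched as follows. This is an illustrative sketch only: the row tuple layout and the activity and API names echo the example tables above and are not a prescribed format.

```python
def generate_tables(activities):
    """Build the three example maps from rows of
    (name, api, timestamp, start, end, motion)."""
    animation_map = {name: api for name, api, *_ in activities}
    time_map = {name: ts for name, _, ts, *_ in activities}
    location_map = {
        name: {"start": s, "end": e, "motion": m}
        for name, _, _, s, e, m in activities
    }
    return animation_map, time_map, location_map

rows = [
    ("Riding", "QueenHorse()", "T2", "L1", "L2", "EQ0: straight line"),
    ("Travelling", "SoldiersTravel()", "T0", "L0", "L2", "EQ1: path curve"),
]
anim, times, locs = generate_tables(rows)
print(times["Travelling"])        # T0
print(locs["Riding"]["motion"])   # EQ0: straight line
```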
[0122] Further, the video processor module 102 is configured to output the activity details of the video content as: the type of the activity, i.e. the activity; who performs the activity, i.e. the object; on whom the activity is performed, i.e. the base frame; when the activity is performed, i.e. the timestamp; and where the activity is performed, i.e. the location. The output is a formatted video playback based on the related parameters. The related parameters include the physical and behavioural nature of the relevant object, the action performed by the relevant object, the speed, angle and orientation of the relevant object, the time and location of the plurality of activities, and the like.
[0123]
[0124]
TABLE 4: Activity-animation similarity
  Activity   Animation API               Similarity
  Riding     RideHorse( . . . )          0.49
             QueenHorseRide( . . . )     0.95
             SoldierHorseRide( . . . )   0.68
             KingHorseRide( . . . )      0.86
[0125] Further, the animation API animates the activity that has occurred. It needs the basic parameters required for the animation to run. Some examples are shown below:
TABLE 5: Animation parameters
  Animation API        Parameters
  RideHorse( . . . )   Horse speed, orientation, angle, turns, facing, sitting position, etc.
  BouncingBowl( )      Speed, angle, orientation, bowl type, no. of bounces, rotate on bounce, etc.
  CarMoving( . . . )   Speed, angle, orientation, tyre angular speed, etc.
  Fight( )             Combat value, no. of punches, energy, movement, etc.
[0126] Further, the player is an application capable of reading the object frame and the base frame and drawing activities on and with them so as to give the illusion of a video. It is made up of simple image linkers and animation APIs, and is an application compatible with playback of a video in the format file. Further, the video player provides animation modules which are called in association with one or more objects. Further, the playback buffer is obtained by first downloading the contents, which are the data of the plurality of activities, the object frame and the base frame, then merging the object frame and the base frame with the APIs associated with the plurality of activities, and playing the merged video.
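The playback step just described can be sketched as a toy player. All structures here are assumed for illustration: activities are dicts, animation APIs are simple callables, and the "frames" are plain descriptions rather than rendered images.

```python
def play(base_frame, object_frame, activities, apis):
    """Compose one 'frame description' per activity, in timestamp order."""
    timeline = []
    for activity in sorted(activities, key=lambda a: a["timestamp"]):
        draw = apis[activity["activity"]]          # look up the animation API
        timeline.append({
            "base": base_frame,
            "object": object_frame,
            "at": activity["timestamp"],
            "drawn": draw(activity["location"]),
        })
    return timeline

apis = {"Riding": lambda loc: f"ride at {loc}",
        "Smiling": lambda loc: f"smile at {loc}"}
acts = [{"activity": "Smiling", "timestamp": 3, "location": "L5"},
        {"activity": "Riding", "timestamp": 2, "location": "L1"}]
out = play("palace.png", "queen.png", acts, apis)
print([f["drawn"] for f in out])  # ['ride at L1', 'smile at L5']
```

Note that the activities arrive unordered and the player replays them by timestamp, mirroring the timestamp table produced at encoding time.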
[0127]
[0128]
[0129]
[0130]
[0131]
[0132]
[0133]
[0134]
[0135]
[0136]
[0137] Object Flower=new Object( )
[0138] BaseFrame Soil=new BaseFrame( )
[0139] Further, a cactus would be irrelevant to grow in this soil. Thus, the object is irrelevant to the context base frame, and the cactus would be a foreign object to this soil.
[0140]
TABLE 6: Timestamp table for the detected scenario
  Activity        Timestamp
  Planted flower  T0
  NONE            T1
  Blossom         T2

TABLE 7: Location table for the detected scenario
  Activity        Start   End   Motion details
  Planted flower  L0      L0    EQ1: appear
  NONE            L0      L0    NULL
  Blossom         L0      L1    EQ2: appear with size change
[0141] Further, a plurality of data tables based on the timestamp and location information, as shown above, are generated by the file generation module. As the above data tables are generated for the given video scenario, the activity is animated at the given time and location with the applicable animation APIs. Further, in this figure, the mapped animation API is downloaded and initiated at the node to play the animation. For example, the F Blossom( ) API is downloaded for the flower's blossom activity.
[0142]
[0143] O: Set of foreground Objects
[0144] B: Set of Background Object
[0145] A: Action
[0146] Further, the video processor module 102 is configured to generate a function, called an action function G(O, A, B), which is obtained after merging the entities O, A and B. Thus, G(O, A, B) is denoted as follows:
[0147] G(O,A,B):MovingCar(Car, Highway, Moving);
[0148] Such that,
[0149] O: Car
[0150] B: Highway
[0151] A: Moving
[0152] Here, O and B, being the images of the car and the highway, also hold the physical and behavioural data. Thus, O and B represent the objects, or computer-readable variables, which hold the values of the object frame and the background frame. In
[0153] F(S): MovingCarAnimation(S)
[0154] Such that,
[0155] S={speed, angle, curvature, . . . }
[0156] Further, the animation-action mapping function is configured to calculate the most similar Animation function mapped to the input action function, which is given as below:
H(G)˜F
[0157] Thus, H(G) gives the most similar Animation Function F corresponding to given Action Function G which is shown in the below table:
TABLE 8: Animation function F corresponding to a given action function G
  Action (G)   Animation (F)
  G1           F1
  G2           F2
  G3           F3
  G4           F4
  G5           F5
  .            .
  Gx           Fx
  .            .
  Gn           Fn
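The mapping H(G), which selects the animation function most similar to a given action function, can be sketched as a simple arg-max over precomputed similarity scores. The scores below echo Table 4 and are purely illustrative; how the scores themselves are computed is a separate matter.

```python
def most_similar(action, similarity_map):
    """Return the animation API name with the highest similarity score,
    i.e. a toy stand-in for H(G) ~ F."""
    return max(similarity_map[action], key=similarity_map[action].get)

similarity_map = {
    "Riding": {
        "RideHorse": 0.49,
        "QueenHorseRide": 0.95,
        "SoldierHorseRide": 0.68,
        "KingHorseRide": 0.86,
    }
}
print(most_similar("Riding", similarity_map))  # QueenHorseRide
```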
[0158] Further, if an animation F is produced by an action G, then the animation F can also produce an action F−1, which is G. For example, if MovingCarAnimation (F) is produced due to MovingCarAction (G), then MovingCarAnimation (F) can also produce an action which would be MovingCarAction (G). In simple terms, the moving-car animation can produce the moving-car action if the moving-car animation is produced by the moving-car action, and vice versa. The action function G(O, A, B) is the inverse of F; thus, F−1 = G. This implies:
[0159] If, G → F
[0160] Then, F → G
[0161] Hence, F ↔ G
[0162] Thus, the Similarity function is the measure of how inverse an animation-action pair is. As shown in
TABLE 9: Adding a new animation function to the map
  Before                       After
  Action (G)   Animation (F)   Action (G)   Animation (F)
  G1           F1              G1           F1
  G2           F2              G2           F2
  G3           F3              G3           F3
  G4           F4              G4           F4
  G5           (none)          G5           F5
[0163] For example, there is no action-animation pair in the map for a car moving without gravity, as such a video has never been processed. Thus, when such an action is detected, the action function Gc is created by the video processor module 102, but a similar function Fc is not found in the map. Thus, the create module 1404 creates a new animation function Fc for this action. As shown in
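The map-or-create decision just described can be sketched as follows. This is an illustrative sketch, not the claimed create module 1404: the similarity threshold, the naming scheme for new animation functions, and the similarity function itself are all assumptions.

```python
def map_or_create(action, mapping, similarity, threshold=0.5):
    """Return the mapped animation name; register a new one if no existing
    animation function is similar enough to the action function."""
    best = max(mapping, key=lambda anim: similarity(action, anim), default=None)
    if best is not None and similarity(action, best) >= threshold:
        return best
    new_name = f"{action}Animation"    # hypothetical naming convention
    mapping[new_name] = action         # add the new action-animation pair
    return new_name

known = {"MovingCarAnimation": "MovingCar"}
# Toy similarity: 1.0 when the animation name starts with the action name.
sim = lambda act, anim: 1.0 if anim.startswith(act) else 0.0

print(map_or_create("MovingCar", known, sim))           # MovingCarAnimation
print(map_or_create("MovingCarNoGravity", known, sim))  # MovingCarNoGravityAnimation
print("MovingCarNoGravityAnimation" in known)           # True
```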
[0164]
[0165]
[0166] G: Action function for the motion of the car for parking it,
[0167] V.P.: The vertical plane of the background frame, and
[0168] H.P.: The horizontal plane of the background frame.
[0169] In the
EQ1: y = a
where ‘a’ is a constant distance from the H.P. As the motion is horizontal, EQ1 is parallel to the H.P. After reaching the parking lot, the car needs to rotate by some angle to adjust the turns as shown in
EQ2: (x−a)² + (y−b)² = r²
where,
[0170] a: distance between H.P. and the center of the circle;
[0171] b: distance between V.P. and the center of the circle; and
[0172] r: radius of the circle
[0173] Further, this motion could also be represented by the equation for the arc of a circle, given by:
EQ2′: arc length = 2πr(θ/360)
where,
[0174] r: radius of the arc; and
[0175] θ: central angle of the arc in degrees
[0176]
EQ3: y = mx + c
where,
[0177] m: slope/gradient; and
[0178] c: intercept (value of y when x=0)
[0179] Further, the other motions shown in
EQ4: (x−a)² + (y−b)² = r²
EQ5: y = mx + c
EQ6: (x−a)² + (y−b)² = r²
EQ7: y = mx + c
EQ8: (x−a)² + (y−b)² = r²
[0180]
EQ9: x = b
where ‘b’ is a constant distance from the V.P. As the motion is vertical, EQ9 is parallel to the V.P. Hence, the action function G is represented as below:
G=EQ1>EQ2>EQ3>EQ4>EQ5>EQ6>EQ7>EQ8>EQ9>null
where,
[0181] >: a special type of binary function such that,
[0182] If A>B, A happens before B; and
[0183] Null marks the end of the function.
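The ordered ">" composition above can be sketched in code. This is an illustrative sketch only: an action function is modelled as a tuple of motion-equation labels terminated by None (the "null" above), and the before-relation of ">" becomes a position comparison.

```python
def make_action(*equations):
    """Build the ordered chain EQ1 > EQ2 > ... > null as a Python tuple,
    with None playing the role of null (end of the function)."""
    return tuple(equations) + (None,)

def happens_before(action, first, second):
    """True when `first` occurs before `second` in the action function,
    mirroring the A > B relation above."""
    return action.index(first) < action.index(second)

G = make_action("EQ1", "EQ2", "EQ3", "EQ4", "EQ5", "EQ6", "EQ7", "EQ8", "EQ9")
print(G[-1] is None)                    # True: null marks the end
print(happens_before(G, "EQ2", "EQ7"))  # True: EQ2 happens before EQ7
```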
[0184] Thus, G is the combination of all the motions that have taken place. Further, the animation function F, as discussed above, is used while playing the video. During the search, the action functions are generated with the help of the animation function. The action functions similar to the occurred action are received by the video processor module. It is the decision of the video processor module either to map the action to an animation API or, if there is no similarity, to create a new animation API corresponding to the action that occurred.
[0185] In the example above, the animation-action map stores the linear and the rotary motions of the car. Thus, many action functions would be downloaded until all these types of motion functions are obtained, i.e. from EQ1 to EQ9. The set of similar functions is downloaded until all of EQ1 to EQ9 are found. In case any of the motion functions is not found, then an animation function for the action function is created and added into the map, as shown in the table below:
TABLE 10: Activity-animation similarity
  Action function               Similar action functions                    Similarity
  G = EQ1 > EQ2 > EQ3 >         G1: EQ1 > EQ4                               2/9
  EQ4 > EQ5 > EQ6 >             G2: EQ3 > EQ10                              2/9
  EQ7 > EQ8 > EQ9 > null        G3: EQ4 > EQ5 > EQ2                         3/9
                                G4: EQ9                                     1/9
                                G5: EQ2 > EQ3 > EQ4 > EQ11 > EQ12 > EQ13    3/9
                                G6: EQ1                                     1/9
                                G7: EQ6 > EQ7 > EQ8                         3/9
                                G8: EQ1 > EQ9                               2/9
[0186] Thus,
G = G1 ∪ G2 ∪ G3 ∪ G4 ∪ G7, or G3 ∪ G5 ∪ G7 ∪ G8.
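The download step of Table 10 can be sketched in set terms: similarity is the fraction of G's motion equations a candidate contains, and candidates are unioned until every equation of G is covered. The greedy selection order is an assumption for illustration; the candidate sets mirror Table 10.

```python
def similarity(target, candidate):
    """Fraction of the target's motion equations found in the candidate."""
    return len(target & candidate) / len(target)

def cover(target, candidates):
    """Greedily union candidate action functions until the target's
    equations (EQ1..EQ9 in the example) are all covered."""
    remaining, chosen = set(target), []
    for name, eqs in sorted(candidates.items(),
                            key=lambda kv: -len(kv[1] & target)):
        if remaining & eqs:
            chosen.append(name)
            remaining -= eqs
        if not remaining:
            break
    return chosen, remaining

G = {f"EQ{i}" for i in range(1, 10)}
candidates = {
    "G1": {"EQ1", "EQ4"}, "G3": {"EQ4", "EQ5", "EQ2"},
    "G4": {"EQ9"}, "G7": {"EQ6", "EQ7", "EQ8"},
    "G8": {"EQ1", "EQ9"}, "G2": {"EQ3", "EQ10"},
}
chosen, missing = cover(G, candidates)
print(missing == set())  # True: EQ1..EQ9 are fully covered
```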
[0187]
[0188]
[0189]
[0190]
[0191]
[0192]
[0193] In another exemplary embodiment, match highlights can be made by analysing the frequencies of the video and sound waves. Further, the most important data related to the game is obtained. For example, a football goal kick could be kept in the highlights.
[0194]
[0195]
[0196]
[0197]
[0198] Scene1: Wedding of Blood bride:
[0199] Part1:
[0200] time <actor, action, base frame>
[0201] T0<bride, gets ready, wedding set>
[0202] T1<bride, listening to wedding prayers, wedding set>
[0203] Part 2:
[0204] T2<bridegroom, holds hand, wedding set>
[0205] T3<bridegroom, dies, wedding set>
[0206] Scene 2: Killing by blood bride:
[0207] Part3:
[0208] time <actor, action, base frame>
[0209] Tx <bride, dies, wedding set>
[0210] Ty <bride, becomes ghost, wedding set>
[0211] Part 2:
[0212] Tz <bride, kills X bride's bridegroom, X's wedding set>
[0213] The actors of the scene are detected and their physical and behavioural data traits are obtained. Further, the present invention provides a very refined and advanced video search engine, wherein even if the movie name is not known, the search can still return a relevant result.
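Such an activity-based search can be sketched as follows. Each scene part stores (time, actor, action, base frame) tuples, so a query over actors and actions can locate content even when its title is unknown. The records echo the example scenes above; the matching rules are assumptions made for illustration.

```python
# Records in the (time, actor, action, base frame) shape used above.
records = [
    ("T0", "bride", "gets ready", "wedding set"),
    ("T3", "bridegroom", "dies", "wedding set"),
    ("Ty", "bride", "becomes ghost", "wedding set"),
]

def search(records, actor=None, action=None):
    """Return every record matching the given actor and/or action words."""
    return [r for r in records
            if (actor is None or actor == r[1])
            and (action is None or action in r[2])]

print(len(search(records, actor="bride")))   # 2
print(search(records, action="dies")[0][1])  # bridegroom
```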
[0214]
[0215] In
[0216] It should be noted that the description merely illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described herein, embody the principles of the present invention. Furthermore, all the use cases recited herein are principally intended expressly to be only for explanatory purposes, to help the reader in understanding the principles of the invention and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited use cases and conditions. Moreover, all statements herein reciting principles, aspects and embodiments of the invention, as well as specific examples thereof, are intended to encompass equivalents thereof.