SYSTEM AND METHOD OF INTRUDER DETECTION
20230105423 · 2023-04-06
Assignee
Inventors
- Anurag Anil DESHMUKH (Gujarat, IN)
- Jitendra BHATIA (Navi Mumbai, IN)
- Shardrul Shailendra PARAB (Pune, IN)
CPC classification
H04N7/181
ELECTRICITY
G06V20/46
PHYSICS
G06V20/52
PHYSICS
G08B13/19645
PHYSICS
H04N7/188
ELECTRICITY
G06V10/25
PHYSICS
G08B13/19613
PHYSICS
G06V40/171
PHYSICS
International classification
H04N7/18
ELECTRICITY
G06V20/52
PHYSICS
Abstract
The present invention provides a robust and effective solution to an entity or an organization by enabling them to implement a system for facilitating intruder detection that is highly accurate, especially for enhancing critical security zones. The intruder detection system (IDS) can provide high true positives and extremely low false positives. The system may be developed using techniques from artificial intelligence and computer vision and is focused on highly optimised intruder detection with equal attention paid to both accuracy and speed. The area of use is highly diverse, mainly for surveillance and security purposes, ranging from outdoor areas such as perimeter walls and campuses to indoor areas such as malls, factory floors, and the like.
Claims
1. A system (110) for facilitating detection of one or more intruders (102) in and around an entity (114), said system comprising: a plurality of camera sensors (104); and one or more processors (202) wirelessly coupled to the plurality of camera sensors (104) through a network (106), said one or more processors (202) coupled with a memory (204), wherein said memory (204) stores instructions which, when executed by the one or more processors (202), cause the system (110) to: receive a plurality of video streams from the plurality of camera sensors (104); extract, by an artificial intelligence (AI) engine (214), a first set of attributes from each video stream, the first set of attributes pertaining to an image associated with a combination of a region of interest (ROI) and a buffer zone, wherein the AI engine (214) is associated with the one or more processors (202); based on the extracted first set of attributes, correlate, by the AI engine (214), the first set of attributes extracted with a knowledgebase having an image background of the ROI and the buffer zone; and detect, by a detection processing module (216), presence of the one or more intruders based on the correlation of the first set of attributes and the knowledgebase, wherein the detection processing module (216) is associated with the one or more processors (202).
2. The system as claimed in claim 1, wherein an input module associated with the AI engine (214) receives from a user a first set of parameters associated with the plurality of camera sensors, wherein the first set of parameters pertain to directional parameters of the plurality of camera sensors to direct the plurality of camera sensors towards the ROI and the buffer zone.
3. The system as claimed in claim 1, wherein on detection of the one or more intruders (102), the system (110) alerts the entity (114) about a possible intrusion at the ROI and the buffer zone.
4. The system as claimed in claim 1, wherein the one or more intruders (102) includes any moving object, a person, an animal, an immobile object, and any suspicious object.
5. The system as claimed in claim 1, wherein the AI engine (214) processes a plurality of video frames present in the video streams received from the plurality of camera sensors (104) in real time.
6. The system as claimed in claim 1, wherein the AI engine (214) is associated with an online multi-frame logic (OML) module, wherein the OML module calculates an instantaneous velocity of the one or more intruders (102).
7. The system as claimed in claim 1, wherein the system (110) is further configured to: generate, by the detection processing module (216), a bounding box for each intruder detected; and send, by the detection processing module (216), the bounding box for each intruder to the online multi-frame logic module.
8. The system as claimed in claim 7, wherein the system (110) is further configured to link, by the OML module (218), one or more detections associated with the bounding box for each intruder (102) across a plurality of video frames received from the plurality of camera sensors.
9. The system as claimed in claim 8, wherein the system (110) is further configured to: determine, by the OML module (218), missed one or more detections across the plurality of video frames; based on the determination of missed one or more detections, improve, by the OML module (218), true positives associated with the one or more detections and reduce the number of false positives associated with the missed one or more detections.
10. The system as claimed in claim 8, wherein the system (110) is further configured to: measure, by the OML module (218), a velocity of each intruder, enabling more control and providing additional criteria for raising alarms.
11. The system as claimed in claim 8, wherein the system (110) is further configured to: instantly detect, by the AI engine (214), the one or more intruders, even when only a part of the body of the one or more intruders is present in the ROI.
12. The system as claimed in claim 1, wherein the system (110) is further configured to be agnostic to the type of video streams from the plurality of camera sensors being input to the system (110), wherein a feed checking and model switching module associated with each camera sensor is used to process the plurality of video frames based on the type of the camera sensor feed.
13. The system as claimed in claim 1, wherein the system (110) is further configured to: log a plurality of stats such as alarm stats, camera stats, and individual tubelets associated with the detection of the one or more intruders.
14. A user equipment (UE) for facilitating detection of one or more intruders (102) in and around an entity (114), said UE comprising: a plurality of camera sensors (104); and one or more processors (202) wirelessly coupled to the plurality of camera sensors (104) through a network (106), said one or more processors (202) coupled with a memory (204), wherein said memory (204) stores instructions which, when executed by the one or more processors (202), cause the UE to: receive a plurality of video streams from the plurality of camera sensors (104); extract, by an artificial intelligence (AI) engine (214), a first set of attributes from each video stream, the first set of attributes pertaining to an image associated with a combination of a region of interest (ROI) and a buffer zone, wherein the AI engine (214) is associated with the one or more processors (202); based on the extracted first set of attributes, correlate, by the AI engine (214), the first set of attributes extracted with a knowledgebase having an image background of the ROI and the buffer zone; and detect, by a detection processing module (216), presence of the one or more intruders based on the correlation of the first set of attributes and the knowledgebase, wherein the detection processing module (216) is associated with the one or more processors (202).
15. A method (110) for facilitating detection of one or more intruders (102) in and around an entity (114), said method comprising: receiving, by one or more processors (202), a plurality of video streams from a plurality of camera sensors (104), wherein the plurality of camera sensors (104) are wirelessly coupled to the one or more processors (202) through a network (106), the one or more processors (202) coupled with a memory (204), wherein said memory (204) stores instructions executed by the one or more processors (202); extracting, by an artificial intelligence (AI) engine (214), a first set of attributes from each video stream, the first set of attributes pertaining to an image associated with a combination of a region of interest (ROI) and a buffer zone, wherein the AI engine (214) is associated with the one or more processors (202); based on the extracted first set of attributes, correlating, by the AI engine (214), the first set of attributes extracted with a knowledgebase having an image background of the ROI and the buffer zone; and detecting, by a detection processing module (216), presence of the one or more intruders based on the correlation of the first set of attributes and the knowledgebase, wherein the detection processing module (216) is associated with the one or more processors (202).
16. The method as claimed in claim 15, wherein the method further comprises the step of: receiving, by an input module associated with the AI engine (214), a first set of parameters from a user, said first set of parameters associated with the plurality of camera sensors, wherein the first set of parameters pertain to directional parameters of the plurality of camera sensors to direct the plurality of camera sensors towards the ROI and the buffer zone.
17. The method as claimed in claim 15, wherein on detection of the one or more intruders (102), the method further comprises the step of: alerting the entity (114) about a possible intrusion at the ROI and the buffer zone.
18. The method as claimed in claim 15, wherein the method further comprises the step of: calculating, by an online multi-frame logic (OML) module associated with the AI engine (214), an instantaneous velocity of the one or more intruders (102).
19. The method as claimed in claim 15, wherein the method further comprises the steps of generating, by the detection processing module (216), a bounding box for each intruder detected; sending, by the detection processing module (216), the bounding box for each intruder to the online multi-frame logic module (218).
20. The method as claimed in claim 19, wherein the method further comprises the steps of: linking, by the OML module (218), one or more detections associated with the bounding box for each intruder (102) across a plurality of video frames received from the plurality of camera sensors; determining, by the OML module (218), missed one or more detections across the plurality of video frames; based on the determination of the missed one or more detections, improving, by the OML module (218), true positives associated with the one or more detections and reducing the number of false positives associated with the missed one or more detections; and measuring, by the OML module (218), a velocity of each intruder, enabling more control and providing additional criteria for raising alarms.
21. The method as claimed in claim 15, wherein the method further comprises the steps of: instantly detecting, by the AI engine (214), the one or more intruders, even when only a part of the body of the one or more intruders is present in the ROI; and logging, by the one or more processors, a plurality of stats such as alarm stats, camera stats, and individual tubelets associated with the detection of the one or more intruders.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0029] The accompanying drawings, which are incorporated herein and constitute a part of this invention, illustrate exemplary embodiments of the disclosed methods and systems, in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that such drawings include the electrical components, electronic components or circuitry commonly used to implement such components.
[0038] The foregoing shall be more apparent from the following more detailed description of the invention.
BRIEF DESCRIPTION OF INVENTION
[0039] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0040] The present invention provides a robust and effective solution to an entity or an organization by enabling them to implement a system for facilitating intruder detection that is highly accurate, especially for enhancing critical security zones. The intruder detection system (IDS) can provide high true positives and extremely low false positives. The system may be developed using techniques from artificial intelligence and computer vision and is focused on highly optimised intruder detection with equal attention paid to both accuracy and speed. The area of use is highly diverse, mainly for surveillance and security purposes, ranging from outdoor areas such as perimeter walls and campuses to indoor areas such as malls, factory floors, and the like.
[0041] Referring to
[0042] More specifically, the exemplary architecture (100) includes a system (110) equipped with an artificial intelligence (AI) engine (214) for facilitating detection of the intruders (102). The system (110) can be operatively coupled to the camera sensors (104) configured to detect the presence of any intruder. In an exemplary embodiment, the camera sensors (104) can include one or more cameras, webcams, RGB cameras, IR cameras, and the like, but are not limited thereto. The system (110) can receive a plurality of video streams from the plurality of camera sensors (104). The plurality of video streams may cover the predefined vicinity of the entity (114), and the predefined vicinity may include a region of interest (ROI) and a buffer zone.
[0043] The system (110) may include a database (210) that may store a knowledgebase having a background of the ROI and the buffer zone. The computing device (108) may be communicably coupled to the centralized server (112) through the network (106) to facilitate communication therewith. In an embodiment, the AI engine (214) of the system (110) may extract a first set of attributes from each video stream pertaining to an image associated with a combination of the ROI and the buffer zone. Based on the extracted first set of attributes, the AI engine (214) may correlate the first set of attributes extracted with the knowledgebase having an image background of the ROI and the buffer zone. The system (110) may further be configured to detect, by using a detection processing module (216), the presence of one or more intruders based on the correlation of the first set of attributes and the knowledgebase.
[0044] In an embodiment, an input module may be associated with the AI engine (214) and may receive from a user (116) a first set of parameters associated with the plurality of camera sensors (104). In an embodiment, the first set of parameters pertain to directional parameters of the plurality of camera sensors to direct the plurality of camera sensors towards the ROI and the buffer zone.
[0045] In an embodiment, on detection of the intruders, the system (110) may alert the entity (114) about a possible intrusion at the detected ROI and the buffer zone.
[0046] In an exemplary embodiment, the one or more intruders may include any moving object, a person, an animal, an immobile object, or any suspicious object, but are not limited thereto.
[0047] In an exemplary embodiment, the AI engine (214) may process a plurality of video frames received from the plurality of camera sensors in real time.
[0048] In an embodiment, the AI engine (214) may further include an online multiframe logic (OML) module with instantaneous velocity calculation. In an exemplary embodiment, the detection processing module generates a bounding box for each detected intruder, which may then be sent to the online multiframe logic module to link detections across a plurality of video frames received from the plurality of camera sensors. As such, missed detections across the plurality of frames may be avoided, which may further improve true positives and also reduce the number of false positives. The multiframe logic module may further be configured to measure the velocity of the intruder, enabling more control and providing additional criteria for raising alarms.
[0049] In an exemplary embodiment, the buffer zone functionality may be provided by the AI engine (214) for instant detections even when only a part of the body may be present in the region of interest. It is not necessary for the entire body, or a majority of the intruder's body, to be inside the ROI. The buffer zone frames may also be sent to the OML for further processing.
[0050] In an exemplary embodiment, the system (110) may be configured to be agnostic to the type of camera feed being input to the system (110). The input can be RGB or IR or a combination thereof. In each case, a feed checking and model switching module is provided, which facilitates robust intruder detection.
[0051] In an exemplary embodiment, the system (110) may log a plurality of stats such as alarm stats, camera stats, individual tubelets and the like. Logging is important for investigating any undesirable/unexpected behaviour by the code. For example, alarm stats may include all of the alarms raised and details about the API call such as: the timestamp at which the alarm was called, response time and message, a snapshot of the frame which triggered the alarm and the like.
[0052] By way of example and not as a limitation, for the camera stats, the FPS of each camera may be logged every 2 minutes, but not limited thereto. If the FPS drops below a threshold of, for example, 20, a warning may be logged; this warning is not dependent on the 2-minute timer but is instantaneous. The stream status, offline or online, may also be logged every 2 minutes.
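The camera-stats behaviour described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name, the 20-FPS threshold, and the in-memory `log` list are assumptions for the example.

```python
import time  # in a real system, `now` would come from time.time()

FPS_WARN_THRESHOLD = 20   # assumed warning threshold from the description
LOG_INTERVAL_S = 120      # stats are logged every 2 minutes

def check_camera_stats(fps, last_log_time, now, log):
    """Log FPS every LOG_INTERVAL_S seconds; warn immediately on low FPS.

    Returns the (possibly reset) timestamp of the last periodic log.
    """
    if fps < FPS_WARN_THRESHOLD:
        # the warning is instantaneous, independent of the 2-minute timer
        log.append(("WARN", f"FPS dropped to {fps}"))
    if now - last_log_time >= LOG_INTERVAL_S:
        log.append(("INFO", f"FPS={fps}"))
        return now  # reset the periodic timer
    return last_log_time
```

A per-camera thread would call this on every stats tick, with a parallel check logging the stream status (online/offline) on the same 2-minute cadence.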
[0053] By way of example and not as a limitation, for the individual tubelets, each detected tubelet may be individually stored in a respective folder. Each folder may contain all of the frames with bounding boxes drawn over them.
[0054] In an exemplary embodiment, the logging process may be an at least two-step logging process. For example, in case of a detection where one or more intruders are detected in the frame, the frame may be saved to disk with the frame_id of the frame in the name, and the detection may be stored in a text file. A secondary script may read the text file and generate the final tubelet folders. This ensures that log generation is independent of the main detection process, so that any failure or lag in logging does not affect the main process.
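The two-step, decoupled logging described above can be sketched as two small functions: the main process only appends lightweight text records, and a separate pass rebuilds tubelet data from them. File names and the JSON-lines record format are assumptions for this sketch, not taken from the patent.

```python
import json
import os
import tempfile

def log_detection(log_dir, frame_id, bboxes):
    """Step 1 (main detection process): append one record per detection.

    The frame image itself would be saved separately, e.g. as
    frame_<frame_id>.jpg; only this cheap text append happens inline,
    so a logging failure cannot stall detection.
    """
    with open(os.path.join(log_dir, "detections.txt"), "a") as f:
        f.write(json.dumps({"frame_id": frame_id, "bboxes": bboxes}) + "\n")

def read_detection_log(log_dir):
    """Step 2 (secondary script): parse the records to build tubelet folders."""
    records = []
    with open(os.path.join(log_dir, "detections.txt")) as f:
        for line in f:
            records.append(json.loads(line))
    return records
```

The secondary script can run (and fail, and be re-run) entirely independently of the detector, which is the point of the two-step design.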
[0055] In an exemplary embodiment, the system may perform a volume-based detection instead of a line-based detection, but is not limited thereto.
[0056] In an embodiment, the computing device (108) may communicate with the system (110) via a set of executable instructions residing on any operating system, including but not limited to Android™, iOS™, Kai OS™ and the like. In an embodiment, the computing device (108) may include, but is not limited to, any electrical, electronic, or electro-mechanical equipment, or a combination of one or more of the above devices, such as a mobile phone, smartphone, virtual reality (VR) device, augmented reality (AR) device, laptop, general-purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, smart TV, Set Top Box (STB) or any other computing device, wherein the computing device may include one or more in-built or externally coupled accessories including, but not limited to, a visual aid device such as a camera, an audio aid, a microphone, a keyboard, and input devices for receiving input from a user such as a touch pad, touch-enabled screen, electronic pen and the like. It may be appreciated that the computing device (104) and/or the user device (120) may not be restricted to the mentioned devices and various other devices may be used. A smart computing device may be one of the appropriate systems for storing data and other private/sensitive information.
[0057] In an exemplary embodiment, a network (106) may include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. A network may include, by way of example but not limitation, one or more of: a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a public-switched telephone network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, some combination thereof.
[0058] In another exemplary embodiment, the centralized server (112) may include or comprise, by way of example but not limitation, one or more of: a stand-alone server, a server blade, a server rack, a bank of servers, a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, one or more processors executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, some combination thereof.
[0059] In an embodiment, the system (110) may include one or more processors coupled with a memory, wherein the memory may store instructions which when executed by the one or more processors may cause the system to detect intruders in a vicinity of an entity.
[0060] In an embodiment, the system (110)/centralized server (112) may include an interface(s) 204. The interface(s) 204 may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) 204 may facilitate communication of the system (110). The interface(s) 204 may also provide a communication pathway for one or more components of the system (110) or the centralized server (112). Examples of such components include, but are not limited to, processing engine(s) 208 and a database 210.
[0061] The processing engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208). In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) (208) may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) (208) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208). In such examples, the system (110)/centralized server (112) may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the system (110)/centralized server (112) and the processing resource. In other examples, the processing engine(s) (208) may be implemented by electronic circuitry.
[0062] The processing engine (208) may include one or more engines selected from any of a data acquisition engine (212), an artificial intelligence engine (214), a detection processing module (216), an online multiframe logic (OML) module (218) and other engines (220). The other engines may include, an alert generation module and other processing modules.
[0064] In an exemplary embodiment, the input module (302) may have the following functionalities:
[0065] The details of the camera, such as the IP address and the like, may be input by a user in the frontend.
[0066] Next, the user may select the region of interest and the buffer zone from the camera feed.
[0067] The details may then be sent to the detection processing module after the camera has been onboarded and is ready for running the detection of intruders.
[0068] In an exemplary embodiment, the detection processing module (304) may perform the following functionalities:
[0069] The frames from one or more camera sensors may be sent to the pre-processing component.
[0070] The pre-processing component converts the frames into the format expected by the AI-based detection model, and the frames are then batched together.
[0071] First, the feed checking sub-module checks whether the input feed is RGB or IR, and decides whether the frames need to be sent to the RGB or the IR intruder detection AI model.
[0072] Once the frames are passed to the AI model, detection outputs in the form of bounding boxes and confidence scores may be obtained in case an intruder is present in the region of interest. The outputs may also be run through a post-processing algorithm, after which they are sent to the alarm raising module.
[0073] In an exemplary embodiment, the Alarm Raising module may provide the following functionalities:
[0074] The main component of the Alarm Raising module is the online multiframe logic (OML) submodule, which decides whether an alarm can be raised or not.
[0075] For each onboarded camera, a new instance of the OML is created.
[0076] Every detection output is passed to this instance, and the results are aggregated only if the detection lies in the buffer zone/region of interest.
[0077] Based on the calculations/algorithm performed on the aggregated detection results, an alarm raising request for the intruder is generated.
[0080] At 436, the detection processing module may send detection outputs to the alarm raising module. The alarm raising module is responsible for taking the detection outputs from the detection processing module and passing them to the online multiframe logic (OML) submodule. If the OML suggests that it is an actual intruder detection, an alarm request is raised. If, at 434, an intruder is not detected, then at 438 the process returns to receiving frames from the camera.
[0082] The feed type checking submodule may include, at 444, calculating the mean of the R, G, B channels and getting new frames at 448. If, at 450, the difference between the mean and the original value is greater than a threshold value, then the RGB model may be used at 448. If the difference between the mean and the original value is less than the threshold value, then the IR model may be used at 452. For example, during daytime the cameras stream RGB frames, and some cameras switch to IR to capture objects in low light. At least two different models may be used to detect intruders in these two different conditions. The system may automatically detect if the video stream switches to IR and start detecting with the IR model, and switch back to the RGB model when the stream switches back to RGB.
[0083] In an exemplary embodiment, the two models follow the same network structure but may be trained differently to get the best accuracy on the different use cases. The two models may be loaded into the GPU when the streams are added. There will be a thread running for each stream, which checks every 5 seconds whether the frames being streamed are IR or RGB. It changes the model if the stream shifts from RGB to IR or from IR to RGB. The feed type is checked by comparing the mean of the channel values to each of the channel values.
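The channel-mean comparison above can be sketched as a small classifier. In an IR (grayscale-like) frame the three channel means are nearly identical, so their deviation from the overall mean is small; a colourful RGB frame separates them. The function name and the threshold value are assumptions for this sketch, not values from the patent.

```python
import numpy as np

def detect_feed_type(frame, threshold=5.0):
    """Classify an H x W x 3 frame as 'RGB' or 'IR'.

    Compares each per-channel mean against the overall mean; a large
    deviation implies real colour content (RGB), a small one implies a
    grayscale-like IR frame. `threshold` is an assumed tuning value.
    """
    channel_means = frame.reshape(-1, 3).mean(axis=0)   # mean R, G, B
    overall_mean = channel_means.mean()
    deviation = np.abs(channel_means - overall_mean).max()
    return "RGB" if deviation > threshold else "IR"
```

A per-stream thread would call this every 5 seconds and swap the loaded model when the result changes.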
[0085] In an embodiment, the OML (460) may, at 462, obtain new detections and match the new detections with existing tubelets at 464. The OML may further, at 470, insert new detections into matched tubelets if a tubelet match was found at 466. If a tubelet match was not found at 466, then the OML may create a new tubelet at 468. The OML, at 472, may then filter the tubelets based on age and confidence. At 474, the OML may perform tubelet-level linking, fuse the tubelets, and then raise an alarm. By way of example and not as a limitation, the algorithm for the OML may be given by:
[0086] 1: Match the new detections with existing tubelets
[0087] 2: Insert new detections into matched tubelets
[0088] 3: For the detections that do not have a match, start a new tubelet
[0089] 4: if mean velocity >= velocity_thresh then
[0090] 5: has_moved <- True
[0091] 6: end if
[0092] 7: Delete all tubelets that are:
[0093] 8: older than n frames
[0094] 9: have mean conf. < mean conf. threshold
[0095] 10: Perform tubelet-level bbox linking
[0096] 11: Fuse matched tubelets
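One OML update step can be sketched as below. This is an illustrative reading of the algorithm, not the patent's code: the dictionary keys, the `match_fn` callback, and the default thresholds are all assumptions.

```python
def oml_step(tubelets, detections, frame_id, match_fn,
             velocity_thresh=0.1, max_age=30, min_conf=0.5):
    """One OML update: match, insert, spawn, flag movement, prune.

    `tubelets` is a list of dicts with keys 'bboxes', 'mean_conf',
    'last_frame_id', 'velocity', 'has_moved' (names are illustrative).
    `match_fn(tubelets, detections)` returns (tubelet_idx, det_idx) pairs.
    """
    pairs = match_fn(tubelets, detections)
    matched = set()
    for ti, di in pairs:
        t = tubelets[ti]
        t["bboxes"].append(detections[di]["bbox"])   # insert into matched tubelet
        t["last_frame_id"] = frame_id
        matched.add(di)
        if t["velocity"] >= velocity_thresh:
            t["has_moved"] = True                    # latched: never unset
    for di, det in enumerate(detections):
        if di not in matched:                        # unmatched -> new tubelet
            tubelets.append({"bboxes": [det["bbox"]],
                             "mean_conf": det["conf"],
                             "last_frame_id": frame_id,
                             "velocity": 0.0,
                             "has_moved": False})
    # prune tubelets older than max_age frames or below the confidence floor
    return [t for t in tubelets
            if frame_id - t["last_frame_id"] <= max_age
            and t["mean_conf"] >= min_conf]
```

Tubelet-level linking and fusion (steps 10 and 11) would run after this, on the surviving tubelets.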
[0097] An algorithm for matching new detections with existing tubelets may be given by
TABLE-US-00001
Input: I bboxes detected in previous frame and J bboxes detected in current frame
distance_matrix = create_matrix(I, J)
For i = 1 to I do
  For j = 1 to J do
    Set distance_matrix(i, j) to distance between i-th bbox of previous frame and j-th bbox of current frame
  End for
End for
Set pairs to empty list
Repeat
  Set i, j to line, column of minimum value in distance_matrix
  Add (i, j) to pairs
  Set i-th line of distance_matrix to infinity
  Set j-th column of distance_matrix to infinity
Until minimum value of distance_matrix is infinity
Output: pairs
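The greedy matching routine above can be rendered directly in Python. This is a literal sketch of the pseudocode (repeatedly take the global minimum of the distance matrix, then invalidate its row and column), with the function name chosen for the example.

```python
import math

def greedy_match(dist):
    """Greedily pair previous-frame and current-frame boxes by minimum
    distance. `dist` is an I x J matrix (list of lists of floats);
    returns a list of (i, j) index pairs."""
    dist = [row[:] for row in dist]          # work on a copy
    pairs = []
    while True:
        best, bi, bj = math.inf, -1, -1
        for i, row in enumerate(dist):       # find the global minimum
            for j, v in enumerate(row):
                if v < best:
                    best, bi, bj = v, i, j
        if best == math.inf:                 # nothing left to match
            break
        pairs.append((bi, bj))
        for j in range(len(dist[bi])):       # invalidate row bi
            dist[bi][j] = math.inf
        for i in range(len(dist)):           # invalidate column bj
            dist[i][bj] = math.inf
    return pairs
```

Note this greedy scheme is cheaper but not globally optimal; an optimal assignment would use the Hungarian algorithm instead.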
[0098] In an exemplary implementation, the following TABLE highlights the tubelet data structure:
TABLE-US-00002
Variable: tubelet id — data type: string — Description: Unique ID to identify a tubelet. The DateTime of the first detection is used as the tubelet ID.
Variable: bbox list — data type: list of list — Description: A list of all detected bounding boxes in the format [x1, y1, x2, y2]. x1, y1 are the pixel coordinates of the top-left corner of the bounding box; x2, y2 are the pixel coordinates of the bottom-right corner of the bounding box. The last detection is used to link newly detected bounding boxes.
Variable: mean conf — data type: float — Description: A running average of confidence, to avoid drastic changes due to one detection having very low or very high confidence.
Variable: first frame id — data type: int — Description: The frame number from which the tubelet starts.
Variable: last frame id — data type: int — Description: The frame number of the last frame in the tubelet.
Variable: current velocity — data type: float — Description: Current instantaneous velocity calculated over the past n frames (configurable). The velocity is scale invariant, as it is calculated as a proportion of the diagonal of the bounding box.
Variable: has moved — data type: boolean — Description: A boolean that is changed to true if the current velocity crosses a threshold. Once set to true, the variable is never again unset.
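The tubelet record in the table can be rendered, for illustration, as a Python dataclass. The class name, the `insert` helper, and the incremental running-mean update are assumptions of this sketch, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Tubelet:
    """One possible rendering of the tubelet record described above."""
    tubelet_id: str                 # DateTime of the first detection
    bbox_list: List[List[float]] = field(default_factory=list)  # [x1,y1,x2,y2]
    mean_conf: float = 0.0          # running average of confidence
    first_frame_id: int = 0
    last_frame_id: int = 0
    current_velocity: float = 0.0   # scale-invariant, over past n frames
    has_moved: bool = False         # latched True once velocity crosses threshold

    def insert(self, bbox, conf):
        """Append a detection and update the running mean confidence."""
        self.bbox_list.append(bbox)
        n = len(self.bbox_list)
        # incremental running mean: mean += (x - mean) / n
        self.mean_conf += (conf - self.mean_conf) / n
```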
[0099] In an embodiment, the distance metric used in the above algorithm is the reciprocal of Intersection over Union (IoU), a metric commonly used in segmentation. It is scale invariant and thus works for both farther and closer objects:
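A minimal sketch of the reciprocal-IoU distance follows; the small `eps` guard against division by zero for disjoint boxes is an assumption of this example.

```python
def iou(a, b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_distance(a, b, eps=1e-6):
    """Reciprocal-of-IoU distance: small for strongly overlapping boxes,
    very large (effectively infinite) for disjoint ones. Because IoU is
    a ratio of areas, the distance is scale invariant."""
    return 1.0 / (iou(a, b) + eps)
```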
[0100] In another embodiment, once new detections are matched with existing tubelets, those detections have to be inserted into the respective tubelets, and the properties of the tubelets have to be updated. Every time a new detection is inserted into a tubelet, the following values are modified:
[0101] bbox list: The new detection is appended at the end of the bbox list.
mean conf: The mean confidence is updated with:
[0105] In an embodiment, IDS detectors may be prone to false positives in poor lighting or in areas of vegetation. While a lot of these are not consistent enough to create consistent tubelets, in some areas small objects might resemble a person. Usually, these small objects are stationary. One way to reduce these false positive cases is to measure the movement of the object.
[0106] In an exemplary embodiment, instantaneous velocity calculation may be used to remove false positives.
[0107] For example, velocity is measured as the L.sub.2 norm of the mean of the instantaneous velocity over the past n frames (eq. 4). Here the instantaneous velocity is a vector. This is important, as a stationary object might have slight changes in center from time to time and might thus accumulate motion, which is not desirable. In the case of vectors, these slight changes will cancel out. The instantaneous velocity is measured in the X and Y directions as pixel distances between two consecutive frames, normalized by the pixel length of the diagonal of the latest detection bounding box. The normalization makes the velocity scale invariant, which allows a single velocity threshold and range for objects at any distance.
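The velocity measure can be sketched as follows, a minimal interpretation of eq. 4 (the window size n and function name are illustrative): the instantaneous velocity vectors are averaged first, so jitter around a stationary object cancels out before the L.sub.2 norm is taken.

```python
import math

def current_velocity(bbox_list, n=5):
    """Scale-invariant velocity of a tubelet over the past n frames."""
    boxes = bbox_list[-(n + 1):]
    if len(boxes) < 2:
        return 0.0
    x1, y1, x2, y2 = boxes[-1]
    diag = math.hypot(x2 - x1, y2 - y1)  # diagonal of the latest detection
    centers = [((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for b in boxes]
    # instantaneous velocity vectors between consecutive frames,
    # normalized by the diagonal to make them scale invariant
    vxs = [(c2[0] - c1[0]) / diag for c1, c2 in zip(centers, centers[1:])]
    vys = [(c2[1] - c1[1]) / diag for c1, c2 in zip(centers, centers[1:])]
    mean_vx = sum(vxs) / len(vxs)
    mean_vy = sum(vys) / len(vys)
    return math.hypot(mean_vx, mean_vy)  # L2 norm of the mean vector
```

Note that a box that jitters back and forth yields velocity zero, while steady motion accumulates, which is exactly the stationary-false-positive behaviour the text describes.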
[0108] Detectors commonly miss some intermediate frames of detections. This could cause tubelet links to be broken. To avoid this, older tubelets are checked for a match with currently running tubelets and are fused.
[0109] In an exemplary embodiment, to identify such tubelets that can be of the same object (person), all existing tubelets in memory are taken, and for all tubelet pairs t.sub.i, t.sub.j it is checked whether the last detection of t.sub.i matches the first detection of tubelet t.sub.j. It is ensured that last frame id(t.sub.i)<first frame id(t.sub.j), and that the difference between both frame ids is within n, a configurable number.
[0110] If a match between two tubelets t.sub.i and t.sub.j is found, the tubelets have to be fused to form a single tubelet. Suppose last frame id(t.sub.i)<first frame id(t.sub.j); the following updates would be made to t.sub.i: [0111] bbox list: The intermediate frame detections are stored in a list, inter bbox list, of size first frame id(t.sub.j)−last frame id(t.sub.i). The elements of the list are filled by linearly interpolating between the last detection of t.sub.i and the first detection of t.sub.j as follows:
The three boxes are then appended one after the other to form one complete list:
bbox list(t.sub.i)=bbox list(t.sub.i)+inter bbox list+bbox list(t.sub.j) (10) [0112] mean conf: The mean confidence is updated as a length-weighted average of the two tubelets' mean confidences: mean conf(t.sub.i)=(mean conf(t.sub.i)×len(t.sub.i)+mean conf(t.sub.j)×len(t.sub.j))/(len(t.sub.i)+len(t.sub.j)) (11)
last frame id(t.sub.i)=last frame id(t.sub.j) (12) [0115] current velocity: The current velocity is recomputed over the fused bbox list. [0116] has moved: Set to True if either of has moved(t.sub.i) or has moved(t.sub.j) is set to True.
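The fusion updates of eqs. 10-12 can be sketched as follows (a sketch under assumptions: field names mirror the table above, and the number of interpolated boxes is taken as the count of missing frames between the two tubelets):

```python
def fuse_tubelets(t_i, t_j):
    """Fuse two tubelets of the same object; assumes
    last_frame_id(t_i) < first_frame_id(t_j)."""
    gap = t_j["first_frame_id"] - t_i["last_frame_id"] - 1  # missing frames
    a, b = t_i["bbox_list"][-1], t_j["bbox_list"][0]
    # linearly interpolate the missing intermediate detections
    inter = [
        [a[k] + (b[k] - a[k]) * step / (gap + 1) for k in range(4)]
        for step in range(1, gap + 1)
    ]
    n_i, n_j = len(t_i["bbox_list"]), len(t_j["bbox_list"])
    t_i["bbox_list"] = t_i["bbox_list"] + inter + t_j["bbox_list"]  # eq. 10
    t_i["mean_conf"] = (t_i["mean_conf"] * n_i
                        + t_j["mean_conf"] * n_j) / (n_i + n_j)     # eq. 11
    t_i["last_frame_id"] = t_j["last_frame_id"]                     # eq. 12
    t_i["has_moved"] = t_i["has_moved"] or t_j["has_moved"]
    return t_i
```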
[0117] In an exemplary embodiment, after tubelet insertion, filtering and fusion are done. Each tubelet is checked for whether it is in the ROI AND has moved (by checking the has_moved variable) AND whether the tubelet length is greater than or equal to a threshold.
[0118] In an exemplary embodiment, the raise-an-alarm module may maintain a list of timers, one for each tubelet, and each tubelet sends an alarm signal after its timer runs out. The timer is then reset, thus raising alarms at regular intervals.
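The filtering and timed-alarm logic above can be sketched together as follows (class name, interval and minimum-length values are illustrative assumptions):

```python
import time

class AlarmScheduler:
    """Sketch of the raise-an-alarm module: one timer per tubelet,
    re-armed after each alarm so alerts repeat at regular intervals."""
    def __init__(self, interval_s=10.0, min_length=5):
        self.interval_s = interval_s
        self.min_length = min_length
        self.next_alarm = {}  # tubelet_id -> earliest time an alarm may fire

    def should_alarm(self, tubelet, in_roi, now=None):
        # tubelet must be in the ROI AND have moved AND be long enough
        if not (in_roi and tubelet["has_moved"]
                and len(tubelet["bbox_list"]) >= self.min_length):
            return False
        now = time.monotonic() if now is None else now
        due = self.next_alarm.get(tubelet["tubelet_id"], 0.0)
        if now >= due:
            # fire and re-arm the timer for this tubelet
            self.next_alarm[tubelet["tubelet_id"]] = now + self.interval_s
            return True
        return False
```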
[0120] In an exemplary embodiment, in the drone delivery system (504), the actual geographical coordinates may be calculated based on the bounding box coordinates obtained from the human detection system. For example, using the four bounding box coordinates, the latitude and longitude of the intruder may be estimated based on the latitude and longitude range of the area covered by the field of view of the camera. Once the real-world coordinates are calculated, an alarm will be sent to the drone delivery system to dispatch a drone fitted with a camera. This drone, upon reaching the estimated location of the intruder, will perform facial recognition analysis.
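A minimal sketch of this bounding-box-to-geo-coordinate estimate is shown below. It assumes the camera's field of view maps linearly onto a known latitude/longitude rectangle, which ignores perspective and lens distortion and is only a rough approximation; the function name and the choice of the box's bottom-center as the ground position are also assumptions.

```python
def bbox_to_latlon(bbox, frame_w, frame_h, lat_range, lon_range):
    """Estimate geo-coordinates from a detection bounding box.
    bbox is [x1, y1, x2, y2] in pixels; lat_range/lon_range are the
    (min, max) extents of the area covered by the camera's field of view."""
    x = (bbox[0] + bbox[2]) / 2 / frame_w  # horizontal fraction of frame
    y = bbox[3] / frame_h                  # bottom edge fraction (the feet)
    lat = lat_range[0] + (lat_range[1] - lat_range[0]) * y
    lon = lon_range[0] + (lon_range[1] - lon_range[0]) * x
    return lat, lon
```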
[0121] In an exemplary embodiment, the face recognition system (506) may be activated by the drone delivery system (504). For example, the drone, upon reaching the estimated location of the intruder, will perform facial recognition analysis. The steps for facial recognition may include: [0122] a) performing face detection based on face key points; [0123] b) obtaining a crop of the detected face; [0124] c) running a search on the intruder database using this crop; [0125] d) if a match is found, raising an alarm with the name of the intruder, else adding the face of the intruder as a new entity in the database; [0126] e) additionally, searching every intruder in a global checklist for high-level security targets.
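Steps c) and d) can be sketched as an embedding search (a sketch only: the use of cosine distance, the threshold value, and the idea of pre-computed face embeddings are illustrative assumptions, not the system's stated method):

```python
import math

def match_intruder(face_embedding, database, threshold=0.6):
    """Search a face embedding against the intruder database.
    Returns the matched name, or None when the face is unknown
    (in which case it would be added as a new entity)."""
    best_name, best_dist = None, float("inf")
    for name, ref in database.items():
        dot = sum(a * b for a, b in zip(face_embedding, ref))
        norm = (math.sqrt(sum(a * a for a in face_embedding))
                * math.sqrt(sum(b * b for b in ref)))
        dist = 1.0 - dot / norm  # cosine distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None
```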
[0127] The AI-based human detection system (502) may be based on a CSP approach, which scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. The next task at hand was to choose and build datasets for the specific task. While certain datasets have a large number of images, most of them have a close-up view of a person or group of people. Thus, a dataset was created from actual location video footage. The dataset was created to train, test and benchmark the model on how well it performs at detecting intruders accurately. Some videos were collected using cameras at RCP at different locations.
[0128] The dataset is further classified into three categories based on the bounding box size of the person/object in the frame. There is a text file corresponding to each image. In each text file, there are n lines representing the n bounding boxes in the corresponding image (n is some non-negative integer). Each line contains 5 values separated by spaces: <class id> <x center> <y center> <width> <height>. The coordinates are normalized to the image's height and width. The class id is 0 for every bounding box because there is only one class, which is "person".
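A parser for this normalized label format can be sketched as follows (the function name is illustrative; the field layout is exactly as described above):

```python
def parse_label_file(text, img_w, img_h):
    """Parse one label file: <class id> <x center> <y center> <width> <height>
    per line, all coordinates normalized to image size.
    Returns (class_id, [x1, y1, x2, y2]) pairs in pixel space."""
    boxes = []
    for line in text.strip().splitlines():
        cls, xc, yc, w, h = line.split()
        # de-normalize back to pixels
        xc, yc = float(xc) * img_w, float(yc) * img_h
        w, h = float(w) * img_w, float(h) * img_h
        # convert center/size to corner coordinates
        boxes.append((int(cls), [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]))
    return boxes
```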
[0129] In an exemplary embodiment, a detailed training scheme was prepared and at least 80-90 combinations were trained. The training models include an RGB model and an IR model.
[0130] In an exemplary embodiment, the RGB model may be trained for 300 epochs in PyTorch format with Adam as optimizer and with a resolution of 1280*1280, and the following are the hyperparameters:
[0131] lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
[0132] momentum: 0.937 # SGD momentum/Adam beta1
[0133] weight_decay: 0.0005 # optimizer weight decay 5e-4
[0134] giou: 0.05 # GIoU loss gain
[0135] cls: 0.5 # cls loss gain
[0136] cls_pw: 1.0 # cls BCELoss positive_weight
[0137] obj: 1.0 # obj loss gain (scale with pixels)
[0138] obj_pw: 1.0 # obj BCELoss positive_weight
[0139] iou_t: 0.20 # IoU training threshold
[0140] anchor_t: 4.0 # anchor-multiple threshold
[0141] fl_gamma: 0.0 # focal loss gamma
[0142] hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
[0143] hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
[0144] hsv_v: 0.4 # image HSV-Value augmentation (fraction)
[0145] degrees: 0.0 # image rotation (+/− deg)
[0146] translate: 0.5 # image translation (+/− fraction)
[0147] scale: 0.5 # image scale (+/− gain)
[0148] shear: 0.0 # image shear (+/− deg)
[0149] perspective: 0.0 # image perspective (+/− fraction), range 0-0.001
[0150] flipud: 0.0 # image flip up-down (probability)
[0151] fliplr: 0.5 # image flip left-right (probability)
[0152] mixup: 0.0 # image mixup (probability)
[0153] Other modes of training were also tried, using multiscale training, rectangular training and SGD as optimizer, but the maximum mAP is attained by the above-mentioned parameters and hyperparameters.
[0154] In the IR model, images may be converted from RGB to grayscale format and trained with the above-mentioned hyperparameters. In an embodiment, the IR model was trained for 300 epochs with a resolution of 1280*1280, but is not limited to it.
[0155] Additionally, other salient features of the model are: [0156] real-time day and night intruder detection up to 70 m, but not limited to the like; [0157] support for multiple intruder profiles.
[0161] In an exemplary embodiment, the user can select the region of interest while adding the streams to the database; for each stream the coordinates of the ROI region may be stored and will be used to raise alarms for intrusions and track the intruder. These coordinates will be stored in a database for each stream, and they can also be modified later. For example, the coordinates may be stored as a list of (x, y) tuples, but not limited to the like. So, if there are n coordinates, there will be a list of length n, and each tuple represents a vertex of the polygon selected on the frame. While performing detections across frames, the person's movement is tracked as the intruder enters or exits the region of interest. Alarms will be raised in real time when the person enters the region of interest, and once the person moves out, the alarm will be raised again if the person re-enters the region.
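Deciding whether a detection lies inside the stored polygon can be sketched with a standard ray-casting test (the function name is illustrative; the polygon is the list of (x, y) tuples described above):

```python
def point_in_roi(x, y, polygon):
    """Ray-casting point-in-polygon test for the stored ROI,
    where polygon is a list of (x, y) vertex tuples."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does the edge cross the horizontal ray from (x, y) to the right?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

In practice the same test could be run on, for example, the bottom-center of a detection's bounding box to decide whether the person has entered the ROI.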
[0169] All the steps are done serially for each of the two images in the batch and once the images are pre-processed, they may be stacked along the batch axis for inference.
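The serial pre-processing and batch-stacking step can be sketched as follows (a sketch under assumptions: normalization to [0, 1] and the channel-first layout expected by PyTorch models are illustrative pre-processing choices):

```python
import numpy as np

def preprocess_batch(frames):
    """Pre-process frames one at a time, then stack along a new batch axis."""
    processed = []
    for frame in frames:                      # serial, one frame at a time
        x = frame.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
        x = np.transpose(x, (2, 0, 1))        # HWC -> CHW (channel-first)
        processed.append(x)
    return np.stack(processed, axis=0)        # batch axis: (N, C, H, W)
```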
[0170] In an exemplary implementation, all the pixels except for the region of interest and buffer zone in the frame are blacked out by the system before processing the frame. This feature helps the system reduce the false positive detection rate. Using the coordinates of the ROI and the virtual buffer zone that are stored in the database, a mask will be applied over the frame to black out all the pixels that are not in the ROI and the virtual buffer zone. Each stream has different ROI coordinates and a different virtual buffer zone, so blacked-out frames have different shapes and sizes for different streams.
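Applying the per-stream mask can be sketched as follows (a sketch only: it assumes the boolean mask has already been rasterized from the stored ROI and buffer-zone polygon coordinates, e.g. with a polygon-fill routine):

```python
import numpy as np

def blackout_outside(frame, mask):
    """Zero all pixels outside the combined ROI + buffer-zone mask.
    `mask` is a per-stream HxW boolean array, precomputed from the
    ROI and virtual buffer-zone coordinates stored in the database."""
    out = frame.copy()
    out[~mask] = 0  # black out everything not in the ROI or buffer zone
    return out
```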
[0173] In an embodiment, this generated data was then used to train models, taking the detection rate for profiles like crawling from 30% to 98%.
[0175] Bus 820 communicatively couples processor(s) 870 with the other memory, storage and communication blocks.
[0176] Optionally, operator and administrative interfaces, e.g. a display, keyboard, and a cursor control device, may also be coupled to bus 820 to support direct operator interaction with a computer system. Other operator and administrative interfaces can be provided through network connections. Components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system limit the scope of the present disclosure.
[0177] Thus, the present disclosure provides a unique and inventive solution for real-time detection of intruders. As soon as an intruder approaches the area of interest, the real-time intruder detection system may detect the intruder and subsequently raise an alarm for intrusion. The alarm for intrusion may contain the relevant geographical coordinates. The geographical coordinates may be sent to the relevant drone delivery system. After the coordinates are processed, the drone may be dispatched to the concerned area where the intrusion has taken place. A camera attached to the drone will observe the face of the intruder and run a facial recognition algorithm to store these details in the database. Along with this, a subsequent search mechanism will be run to check if the intruder is present in the list of any previous intruders.
[0178] While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be implemented merely as illustrative of the invention and not as a limitation.
[0179] A portion of the disclosure of this patent document contains material which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, IC layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
ADVANTAGES OF THE PRESENT DISCLOSURE
[0180] The present disclosure provides for a system and a method for facilitating an online multi-frame logic submodule that enables an increase in true positives and a reduction in false positives, helping achieve high accuracy for such critical high-security use-cases.
[0181] The present disclosure provides for a system and a method for an online/real-time working submodule which helps in avoiding any latency for intruder detection.
[0182] The present disclosure provides for a system and a method for facilitating a buffer zone for instant detections, which ensures that a large part of the intruder's body need not be within the Region of Interest; instead, by using the buffer zone, alerts can be raised if even a small part of the body enters the region of interest.
[0183] The present disclosure provides for a system and a method for facilitating a Feed type checking submodule that enables the intruder detection system to be completely agnostic to the camera feed type (RGB or IR), without any manual intervention.
[0184] The present disclosure provides for a system and a method for avoiding false alarms from pressure sensors outside houses and providing more accuracy with an online camera in the X and Y directions.