VISION SYSTEM FOR OBJECT DETECTION, RECOGNITION, CLASSIFICATION AND TRACKING AND THE METHOD THEREOF

20230145405 · 2023-05-11

    Abstract

    The present invention relates to a method (100) for object detection (140), recognition, classification and tracking using a distributed networked architecture comprising one or more sensor units (20), in which the image acquisition and the initial feature extraction are performed, and a gateway processor (30) for further data processing. The present invention also relates to a vision system (10) for object detection (140) in which the method may be implemented, to the devices of the vision system (10), and to the algorithms implemented in the vision system (10) for executing the method acts.

    Claims

    1-17. (canceled)

    18. A method of object detection, identification and localization, the method including acts of: acquiring an image from a camera; generating a pre-processed image by performing image pre-processing of said acquired image; detecting and identifying an object in the pre-processed image using a computer vision detection algorithm; and localizing the object; wherein localizing the object includes approximating a distance of the detected object to the camera.

    19. The method of claim 18, wherein the acts are performed on a single image.

    20. The method of claim 18, further comprising an act of: extracting a feature on the detected and identified object using a computer vision data feature extraction algorithm (DFE algorithm) and generating a reduced dataset comprising extracted data features.

    21. The method of claim 18, wherein the act of approximating the distance is performed by: acquiring a pixel object distance of the detected object; and comparing the pixel object distance with tabulated physical object height(s) and tabulated camera parameter(s).

    22. The method of claim 20, wherein the act of approximating the distance is performed by: acquiring a pixel object distance of the detected object from the reduced dataset; and comparing the pixel object distance with a tabulated physical object height and a tabulated camera parameter.

    23. The method of claim 18, being performed in a sequence and further comprising an act of motion tracking the localized object.

    24. The method of claim 20, further comprising an act of approximating an object-camera angle between a feature point in the feature and a center point in a feature plane that is parallel to an image plane of the camera.

    25. The method of claim 24, wherein the acts are performed on a single image.

    26. The method of claim 24, further comprising an act of combining the approximation of the distance and the approximation of angle to improve the localization of the object.

    27. The method of claim 18 performed by acquiring images from multiple cameras.

    28. The method of claim 24, wherein the angle approximation may be used on one object, using two sensors with overlapping fields of view and triangulation, for a more precise object location.

    29. The method of claim 27 further including acts of approximating a first object-camera-distance to a detected object in a first pre-processed image, approximating a second object-camera-distance to a detected object in a second pre-processed image, where the first pre-processed image captures a first scene, and the second pre-processed image captures a second scene which completely or partly overlaps the first scene, and using the first and second object-camera-distances to validate that the detected object in the first and second pre-processed images is the same object.

    30. The method of claim 27 further including an act of estimating an orientation of the object.

    31. The method of claim 27, further including an act of self-calibration based on at least two approximated distances.

    32. The method of claim 24, further including an act of self-calibration based on at least two approximated angles.

    33. The method of claim 27, further including an act of self-calibration based on at least two approximated angles.

    34. The method of claim 27, further including an act of time-synchronization of acquiring a plurality of images.

    35. The method of claim 27, further including an act of spatial coordination of acquiring cameras by deducing relative geometries of the cameras from their pixel correspondence.

    36. A sensor unit configured to perform the acts of claim 18.

    37. The sensor unit according to claim 36 further comprising sensor communication means arranged for transmitting detected, identified and localized object data.

    38. A vision system comprising one or more sensor units according to claim 36.

    Description

    DESCRIPTION OF THE DRAWINGS

    [0128] FIG. 1 illustrates one embodiment of the method for object detection.

    [0129] FIG. 2 illustrates another embodiment of the method for object detection.

    [0130] FIG. 3 illustrates one embodiment of the method acts of image pre-processing.

    [0131] FIG. 4 illustrates another embodiment of the method for object detection.

    [0132] FIGS. 5A and 5B illustrate one embodiment of parameters and a method for estimating the object-camera distance.

    [0133] FIGS. 6A and 6B illustrate one embodiment of object tracking.

    [0134] FIGS. 7A and 7B illustrate two embodiments of the vision system.

    [0135] TABLE-US-00001 Detailed Description of the Invention

    No.  Item
    10   Vision system
    20   Sensor unit
    22   Sensor communication means
    24   Camera
    26   Pre-processor means
    28   Camera parameter
    30   Gateway processor
    32   Gateway communication means
    40   Management server
    42   Object data
    50   Computer program product
    52   Computer-readable medium
    60   Acquired image
    62   Full-frame image
    64   Sub-frame image
    70   Pre-processed image
    80   Reduced dataset
    90   Detected object
    92   Pixel object height
    94   Physical object height
    96   Object-camera distance
    97   Object-camera angle
    100  Method
    110  Acquiring
    112  Performing
    114  Transmitting
    116  Receiving
    118  Obtaining
    120  Generating
    122  Feeding
    124  Comparing
    126  Approximating
    130  Pre-processing
    140  Object detection
    142  Object feature
    150  Object recognition
    160  Object tracking
    180  Object classification
    190  Data feature extraction (DFE)
    192  Extracted data features
    210  Computer vision detection algorithm
    220  Computer vision DFE algorithm
    240  Machine learning algorithm
    242  Machine learning model

    [0136] FIG. 1 illustrates one embodiment of the method 100 for object detection 140. The method 100 comprises a number of acts; in connection with some of the acts, intermediate products are illustrated. The method 100 is illustrated by a dotted line surrounding the method acts, and the individual method acts are likewise illustrated by dotted lines. The intermediate products are illustrated by solid lines, as are the units in which the acts are performed. The units include a sensor unit 20 comprising a camera 24 and a gateway processor 30 comprising gateway communication means 32. The camera 24 acquires 110 an image 60. A method act of performing 112 image pre-processing 130 is performed on the acquired image 60, thereby obtaining a pre-processed image 70. The pre-processing is performed using the pre-processor means 26.
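
    The acts of this paragraph may be summarized in a short code sketch. The following is a minimal illustration, assuming OpenCV as the imaging library; the concrete pre-processing operations (grayscale conversion, denoising, histogram equalization) are illustrative choices, since the specification leaves the content of the pre-processing 130 open.

```python
# Minimal sketch of the acquire (110) and pre-process (130) acts on the
# sensor unit (20), assuming OpenCV. The pre-processing steps shown here
# are illustrative; the specification does not prescribe them.
import cv2

def acquire_image(camera_index: int = 0):
    """Acquire (110) a single full-frame image (60/62) from the camera (24)."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("image acquisition failed")
    return frame

def preprocess(image):
    """Generate a pre-processed image (70): grayscale, denoise, equalize."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)
    return cv2.equalizeHist(denoised)
```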

    [0137] The pre-processed image 70 is used for performing 112 object detection 140. The object detection 140 is performed using a computer vision detection algorithm 210. In another method act of performing 112 data feature extraction 190, a reduced dataset 80 is generated. The data feature extraction 190 is performed using a computer vision DFE algorithm 220. The pre-processed image 70, information from the performed object detection 140, and object features 142 are used in the computer vision DFE algorithm 220 to generate the reduced dataset 80 comprising extracted data features 192. The reduced dataset 80 is transmitted 114 from the sensor unit 20 to the gateway processor 30 using the sensor communication means 22. Optionally, object features 142 may also be transmitted to the gateway processor 30, either as separate data or comprised in the reduced dataset 80. In the gateway processor 30, the reduced dataset 80 is received 116 using the gateway communication means 32.
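
    A hedged sketch of the detection 140 and data feature extraction 190 acts follows. A Haar cascade stands in for the computer vision detection algorithm 210 and ORB keypoint descriptors for the computer vision DFE algorithm 220; the specification commits to neither, and the layout of the reduced dataset 80 is an assumption made for illustration.

```python
# Sketch of object detection (140) and data feature extraction (190) on the
# sensor unit. The detector and descriptor choices are stand-ins, not the
# patent's prescribed algorithms.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
orb = cv2.ORB_create(nfeatures=50)

def extract_reduced_dataset(pre_processed):
    """Return a reduced dataset (80): bounding boxes plus compact descriptors."""
    reduced = []
    for (x, y, w, h) in detector.detectMultiScale(pre_processed, 1.1, 5):
        roi = pre_processed[y:y + h, x:x + w]
        keypoints, descriptors = orb.detectAndCompute(roi, None)
        reduced.append({"bbox": (x, y, w, h),        # pixel object extent
                        "descriptors": descriptors})  # far smaller than the image
    return reduced
```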

    [0138] FIG. 1 further illustrates an embodiment wherein the gateway processor 30 is configured with a machine learning model 242 configured to execute a machine learning algorithm 240 comprising instructions to cause the gateway processor 30 to execute the method act of performing object recognition 150. The reduced dataset 80, extracted data features 192 and optionally object features 142 are fed 122 into the machine learning model 242.

    [0139] The gateway processor 30 and the sensor unit(s) 20 may each comprise a computer program product 50 comprising instructions, which, when executed by a computer, may cause the computer to carry out one or more of the illustrated method acts.

    [0140] The gateway processor 30 and the sensor unit(s) 20 may each comprise a computer-readable medium 52 comprising instructions which, when executed by a computer, may cause the computer to carry out one or more of the illustrated method acts.

    FIG. 2 illustrates another embodiment of the method 100 for object detection 140. Aspects described for FIG. 1 may also pertain to the details disclosed in this embodiment. The difference between the two embodiments concerns the use of the object features 142. In this embodiment, the object features 142 are transmitted 114 to the gateway processor 30 for further processing or analysis. In the gateway processor 30, the reduced dataset 80 and the object features 142 are received 116 using the gateway communication means 32.

    [0141] FIG. 2 further illustrates an embodiment wherein the gateway processor 30 is configured with a machine learning model 242 configured to execute a machine learning algorithm 240 comprising instructions to cause the gateway processor 30 to execute the method act of performing object recognition 150. The reduced dataset 80, extracted data features 192, and object features 142 are fed 122 into the machine learning model 242.

    [0142] One embodiment of the method acts of image pre-processing 130 is illustrated in FIG. 3. The method acts are performed 112 on the acquired image 60. The method acts are illustrated by dotted lines, and the intermediate products by solid lines. In the pre-processing 130, the acquired image 60 is received as a full-frame image 62. One or more sub-frame images 64 are obtained 118 within the full-frame image 62; the full-frame image 62 is thus divided into a number of sub-frame images 64. In this embodiment, the full-frame image 62 is divided into four sub-frame images 64. The sub-frame images may be defined by a set of sub-frame boundaries, and the sub-frames may be generated such that the sub-frame boundaries of the different sub-frame images overlap. One or more of the sub-frame images 64 may be further pre-processed for generating 120 a pre-processed image 70, as sketched below.
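
    A minimal sketch of this sub-frame division, assuming a 2x2 grid and a fixed overlap margin; both values are illustrative and not mandated by the specification.

```python
# Sketch of the obtaining act (118): divide a full-frame image (62) into a
# grid of sub-frame images (64) whose boundaries overlap by a margin.
def split_into_subframes(full_frame, rows=2, cols=2, overlap=16):
    h, w = full_frame.shape[:2]
    sub_h, sub_w = h // rows, w // cols
    subframes = []
    for r in range(rows):
        for c in range(cols):
            y0, x0 = max(0, r * sub_h - overlap), max(0, c * sub_w - overlap)
            y1 = min(h, (r + 1) * sub_h + overlap)
            x1 = min(w, (c + 1) * sub_w + overlap)
            subframes.append(full_frame[y0:y1, x0:x1])
    return subframes
```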

    [0143] FIG. 4 illustrates one embodiment of an act which may be performed in the gateway processor 30: performing object recognition 150, object classification 180 and/or object tracking 160 by feeding 122 the reduced dataset 80 into a machine learning model 242. The machine learning model 242 may execute a machine learning algorithm 240 adapted to perform object recognition 150, object tracking 160 and/or object classification 180 based on the reduced dataset 80. The act may be comprised in the method 100 as an additional act. In particular, the illustrated act may be inserted in the embodiments illustrated in FIGS. 1 and 2.
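
    The feeding act 122 may be sketched as follows. A k-nearest-neighbour classifier stands in for the machine learning model 242, and pooling the descriptors into one fixed-length vector per object is an assumption for illustration; the specification names neither a model family nor a feature encoding.

```python
# Sketch of feeding (122) a reduced dataset (80) into a machine learning
# model (242) on the gateway processor (30) to perform recognition (150).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=3)  # stand-in for model (242)

def train(feature_vectors, labels):
    """Fit the model on previously extracted data features (192)."""
    model.fit(np.asarray(feature_vectors), labels)

def recognize(reduced_dataset):
    """Perform object recognition (150) on a reduced dataset (80)."""
    vectors = [entry["descriptors"].mean(axis=0)  # one vector per detected object
               for entry in reduced_dataset
               if entry["descriptors"] is not None]
    return model.predict(np.asarray(vectors)) if vectors else []
```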

    [0144] FIG. 5B illustrates an embodiment of method acts, which may be performed in the sensor unit 20 and/or in the gateway processor 30, for estimating the object-camera distance 96. Estimating the object-camera distance 96 may be performed by acquiring 110 a pixel object height 92 of a detected object 90 from the reduced dataset 80. A further act of comparing 124 the pixel object height 92 with tabulated physical object height(s) 94 and tabulated camera parameter(s) 28 may be performed to approximate 126 the distance of the detected object(s) 90 to the camera 24, being the object-camera distance 96.

    [0145] FIG. 5A illustrates the heights and distances used in the method. An image 60 is acquired by the camera 24 in the sensor unit 20. The sensor unit may be defined by tabulated camera parameter(s) 28, which may be stored in the sensor unit 20 or in the gateway processor 30. The detected object 90 in this embodiment is illustrated as a cup, which is only an example and should be perceived merely as such. The tabulated physical object height 94 for a cup may be the distance from the bottom of the cup to the point where the upper part of the handle is connected to the cup itself. The pixel object height 92 of the cup is acquired from the reduced dataset 80; as this distance may be a distinct feature for detecting a cup, it may be comprised in the reduced dataset 80. From the two heights, the distance 96 from the camera to the object may be approximated. Here the distance is illustrated as extending from the centre point of the sensor to the centre point of the cup. Other distances may be used, e.g. from the camera lens to the closest point of the cup facing the camera.
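
    Under a pinhole camera model, the comparison reduces to: object-camera distance = focal length (in pixels) x physical object height / pixel object height. A minimal sketch, where the tabulated values are placeholders for the stored camera parameter(s) 28 and physical object height(s) 94:

```python
# Sketch of approximating (126) the object-camera distance (96) with the
# pinhole model. The tabulated numbers are placeholder assumptions.
TABULATED_PHYSICAL_HEIGHT_M = {"cup": 0.10, "face": 0.24}  # physical object height (94)
FOCAL_LENGTH_PX = 1000.0                                   # tabulated camera parameter (28)

def object_camera_distance(object_class: str, pixel_height: float) -> float:
    """Approximate the object-camera distance (96) from the pixel object height (92)."""
    physical_height = TABULATED_PHYSICAL_HEIGHT_M[object_class]
    return FOCAL_LENGTH_PX * physical_height / pixel_height

# Example: a cup imaged 100 px tall with f = 1000 px is roughly 1 m away.
print(object_camera_distance("cup", 100.0))  # -> 1.0
```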

    [0146] One embodiment of object tracking is illustrated in FIGS. 6A and 6B. The object tracking may also be referred to as feature tracking, as the object tracking may be performed by tracking object features 142. In this embodiment, the detected object 90 to be tracked is a face. FIG. 6A illustrates an acquired image 60 in which three faces are present. FIG. 6B illustrates the acquired image 60 or full-frame image 62 comprising a sub-frame image 64. The sub-frame image may be one amongst several sub-frame images comprised in the full-frame image. The sub-frame image 64 is pre-processed such that a pre-processed image 70 is obtained, in which the detected object 90 is a face. The face may be detected as a face or as a collection of features such as eyes, nose, mouth etc. For the object tracking, the object features 142 may be used. The object features in the illustrated embodiment are marked by X's and are here chosen as the corners of the mouth, two points on the forehead, and the cheeks. Using the object features instead of the face as the objects to be tracked has the effect that when the face is turned, e.g. by 90 degrees, some of the object features are still visible in the image, whereas the face is no longer completely visible for detection. This may improve detection of the object even when it is rotated or partly occluded by another object.

    [0147] The object tracking may thus be performed by tracking object features 142. The object tracking may be performed with only a minor degree of analysis of the subsequent sub-frame images, where only the object features are tracked and the sub-frame image is not analyzed for new objects. For the subsequent full-frame images, the other sub-frame images may be successively analyzed.
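
    One way to realize such feature-only tracking is pyramidal Lucas-Kanade optical flow, sketched below; this is an illustrative choice, as the specification does not prescribe a particular tracking algorithm.

```python
# Sketch of tracking object features (142) between consecutive frames with
# Lucas-Kanade optical flow, so only the feature points are re-analyzed.
import cv2
import numpy as np

def track_features(prev_gray, next_gray, feature_points):
    """Return updated positions of the tracked object features (142)."""
    pts = np.float32(feature_points).reshape(-1, 1, 2)
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    return [tuple(p.ravel()) for p, ok in zip(new_pts, status.ravel()) if ok]
```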

    [0148] Using object features for tracking may also enable further uses of the method and the vision system. The object features may reveal the mood of a person, by estimating the distance from the eyes to the mouth corners, a change in eye size, or a change in the position of the shoulders, to mention a few features which may be used.

    [0149] One embodiment of the use of the vision system 10 is illustrated in FIG. 7A. Seven sensor units 20 are placed in a room imaging different scenes. The illustrated embodiment is a meeting taking place in the room where seven persons x1-x7 participate. The seven participants are placed around a table. The room is illustrated with a top view as seen from e.g. the ceiling.

    [0150] This embodiment illustrates the use of multiple sensor units. The illustration shows how one or more persons may be imaged by multiple sensor units, each imaging a scene different from the scenes of the other sensor units. Person x4 is illustrated as being imaged by five sensor units. In the case where x4 is seated facing the table, he is imaged from the back, the side, frontally and semi-frontally. This embodiment may illustrate the feature referred to in the description of the invention as mitigation of doublets.

    [0151] This illustrated embodiment may have the effect of mitigating the appearance of doublets of objects when the reduced datasets are further analyzed after being transmitted from the sensor units, thereby increasing the quality and the robustness of the vision system 10.
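
    A minimal sketch of such doublet mitigation, following the validation of claim 29: two sensor units each approximate an object-camera distance 96, and the detections can only refer to the same physical object if the two range spheres around the (assumed known) sensor positions intersect. The tolerance value is an illustrative assumption.

```python
# Sketch of doublet mitigation: check whether two distance approximations
# (96) from sensors at known positions are geometrically consistent with a
# single object (triangle inequality on the range spheres).
import math

def same_object_candidate(sensor_a, sensor_b, dist_a, dist_b, tol=0.2):
    """Validate that two detections may be the same object (cf. claim 29)."""
    baseline = math.dist(sensor_a, sensor_b)
    return abs(dist_a - dist_b) - tol <= baseline <= dist_a + dist_b + tol
```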

    [0152] The embodiment in FIG. 7A further illustrates a vision system comprising a gateway processor 30 and a management server 40, where the sensor units 20 transmit reduced datasets 80 to the gateway processor 30, and object data 42 are transmitted from the gateway processor 30 to the management server 40.

    [0153] Furthermore, FIG. 7A illustrates an embodiment wherein the gateway processor 30 is configured with a machine learning model 242 configured to execute a machine learning algorithm 240 comprising instructions to cause the gateway processor 30 to execute the method act of performing object recognition 150.

    [0154] Another embodiment of the use of the vision system 10 is illustrated in FIG. 7B. For this embodiment, only the placement of the sensor units 20 is illustrated. The remaining parts of the system and use hereof are as illustrated in FIG. 7A.

    [0155] The room in FIG. 7B is illustrated with a side view as seen from e.g. a wall. Here, two sensor units 20 are placed in a room imaging different scenes, with the field of view of each sensor unit being apart from the other, i.e. non-overlapping fields of view.