OBJECT IDENTIFICATION SYSTEM AND COMPUTER-IMPLEMENTED METHOD
20210256296 · 2021-08-19
Inventors
Cpc classification
G06V10/454
PHYSICS
G06F18/2433
PHYSICS
G06V10/26
PHYSICS
G06V20/52
PHYSICS
G06V10/75
PHYSICS
International classification
Abstract
An object identification system and computer implemented method are described. The system includes a classification database encoding data on each of a plurality of pre-classified objects, an imaging input interface configured to receive imaging data of an object from an imaging scanner, the imaging data including imaging data on internal components of the object, an imaging processor configured to receive the imaging data from the imaging input interface and to orient and scale the imaging data according to a predetermined grid reference to generate corrected image data, and a classifier configured to process the corrected image data to segment the image, the classifier being further configured to match the object to one of the pre-classified objects in the classification database in dependence on the segments of the image and on the encoded data in the classification database, the classifier being further configured to identify and output differences between one or more segments of the image and the matched pre-classified object.
Claims
1. An object identification system comprising: a classification database encoding data on each of a plurality of pre-classified objects; an imaging input interface configured to receive imaging data of an object from an imaging scanner, the imaging data including imaging data on internal components of the object; a processor configured to execute computer program code for executing an image processing, including: computer program code configured to receive the imaging data from the imaging input interface and to orient and scale the imaging data according to a predetermined grid reference to generate corrected image data; a processor configured to execute computer program code for executing a classification system, including: computer program code configured to execute a classifier configured to process the corrected image data to segment the image, the classifier being further configured to match the object to one of the pre-classified objects in the classification database in dependence on the segments of the image and on the encoded data in the classification database, the classifier being further configured to identify and output differences between one or more segments of the image and the matched pre-classified object.
2. The object identification system of claim 1, further comprising a user interface configured to receive a designation of an object from a user, the designation corresponding to one of the pre-classified objects in the classification database, the classifier being configured to match the object to the designated pre-classified object and identify and output differences between the segments of the image and the designated pre-classified object.
3. The object identification system of claim 2, wherein the user interface is configured to receive a designation of a category, the classifier being configured to match the object to the pre-classified objects in the category and identify and output differences between the segments of the image and a closest pre-classified object.
4. The object identification system of claim 1, wherein upon an object not being matched to one in the database, the system is configured to apply the image data to a deep classification algorithm comprising a 3-layer architecture, the first two layers being configured to narrow the search space, the third layer comprising a convolutional neural network configured to show similarity between candidates in the narrowed search space and objects in the classification database.
5. The object identification system of claim 4, wherein the first and second layers are selected from classifiers including Hu Invariants Matching classifiers and Shape and Pixel Intensity Matching classifiers.
6. The object identification system of claim 4, wherein the third layer comprises a Siamese convolutional neural network.
7. The object identification system of claim 1, wherein the imaging data comprises imaging data for the object of differing energies, the classifier being further configured to subtract the images of the imaging data of corresponding energies and z effective to determine residual images.
8. The object identification system of claim 7, wherein the system is configured to obtain largest connected segments using the residual images and extract features therefrom.
9. The object identification system of claim 1, wherein the imaging data comprises imaging data from a high energy scan, imaging data from a low energy scan and z effective imaging data derived from the high and low energy scans, the system being configured to input the matched reference and trial device data into a trained residual convolutional neural network to predict residuals conducive of a threat.
10. The object identification system of claim 1, wherein the imaging data from the imaging scanner is 3-dimensional, the system being configured to flatten the imaging data into a 2d image prior to processing by the imaging processor and classifier.
11. The object identification system of claim 7, wherein the system is configured to determine summative and geometric features from the trial, matched reference, residual and LCC images, and input into a gradient boosting algorithm configured to determine the probability, based on the features, of the device being a threat.
12. A computer implemented object identification method comprising: encoding, in a classification database, data on each of a plurality of pre-classified objects; receiving, at an imaging input interface, imaging data of an object from an imaging scanner, the imaging data including imaging data on internal components of the object; orienting and scaling the imaging data by an imaging processor according to a predetermined grid reference to generate corrected image data; processing the corrected image data to segment the image, matching the object to one of the pre-classified objects in the classification database in dependence on the segments of the image and on the encoded data in the classification database, and identifying and outputting differences between one or more segments of the image and the matched pre-classified object.
13. The computer implemented method of claim 12, further comprising receiving, via a user interface, a designation of an object from a user, the designation corresponding to one of the pre-classified objects in the classification database, matching the object to the designated pre-classified object and identifying and outputting differences between the segments of the image and the designated pre-classified object.
14. The computer implemented method of claim 13, further comprising receiving, via the user interface, a designation of a category, matching the object to the pre-classified objects in the category and identifying and outputting differences between the segments of the image and a closest pre-classified object.
15. The computer implemented method of claim 12, wherein upon an object not being matched to one in the database, applying the image data to a deep classification algorithm comprising an image segmentation based CNN which identifies segments of the device x-ray images (high, low, z effective) that contain features conducive of a particular substance such as an explosive or other substance(s) of interest.
16. The computer implemented method of claim 15, wherein the first and second layers are selected from classifiers including Hu Invariants Matching classifiers and Shape and Pixel Intensity Matching classifiers and the third layer comprises a Siamese convolutional neural network.
17. The computer implemented method of claim 12, wherein the imaging data comprises imaging data for the object of differing energies, the method further comprising subtracting the images of the imaging data of corresponding energies and z effective values to determine residual images.
18. The computer implemented method of claim 17, further comprising obtaining largest connected segments using the residual images and extracting features therefrom.
19. The computer implemented method of claim 17, wherein the imaging data comprises imaging data from a high energy scan, imaging data from a low energy scan and z effective imaging data derived from the high and low energy scans, the method further comprising inputting matched reference and scanned device images containing the high, low and z effective images into a trained convolutional neural network to predict residuals conducive of a threat.
20. The computer implemented method of claim 12, wherein the imaging data from the imaging scanner is 3-dimensional, the method comprising flattening the imaging data into a 2d image prior to processing by the imaging processor and classifier.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] Embodiments of the present invention will now be described by way of example only with reference to the accompanying drawings in which:
[0039]
[0040]
DETAILED DESCRIPTION
[0041]
[0042] The object identification system 10 includes an imaging input interface 20, an imaging processor 30, a classifier 40 and a classification database 50.
[0043] The classification database 50 encodes data on each of a plurality of pre-classified objects.
[0044] The imaging input interface 20 is configured to receive imaging data of an object from an imaging scanner such as an X-ray scanner, gamma scanner, a CT scanner or the like. It passes the data to the imaging processor 30 which is configured to process the imaging data to orient and scale it according to a predetermined grid reference, generating corrected image data.
[0045] The imaging processor 30 acts to ensure that imaging data is normalised and can be matched on a like-for-like basis. It may take into account calibration information from the imaging scanner, knowledge of the imaging scanner type, brand etc. It may also consider the imaging data itself and apply image processing based on content identified in the image data and/or attributes of the image data to correct the image data for issues such as distortion, rotation, scale and aperture.
[0046] Once the image data has been corrected, it is passed to a classifier 40. The classifier 40 may be a single computing system executing various processes discussed below or may be a series of systems that may be local or remote. The classifier 40 is configured to process the corrected image data to segment the image. The segments correspond to individual objects or object parts identifiable from the corrected image data. While in the ideal world all components would be separately segmented, it will be appreciated that this is dictated by what is discernible from the imaging data. Once segmented, the classifier 40 matches the object(s) to one of the pre-classified objects in the classification database. This is done in dependence on the segments of the image and on the encoded data in the classification database. Various ways of doing this are discussed below. If a match is found, the classifier identifies and outputs differences between the segments of the image and the matched pre-classified object. For example, if a component is missing or the battery replaced by something else, these would be alerted to the operator either visually by superimposing over the image of the object or else via an alarm, log file or other approach.
[0047] Where an object cannot be matched to one in the database 50, a deep classification algorithm described below may be used. The deep classification algorithm includes a classifier that has been trained on data containing features that the system should detect and is more accurate in classification than the approach initially taken by the classifier 40. It will, however, be appreciated that the deep classification algorithm could be operated in a stand-alone mode or in conjunction with other systems and without being limited to being used only on non-recognised objects.
[0048] In one embodiment, the database 50 encodes data on pre-classified objects including one or a number of scans of that device, name and optionally other data such as manufacturer, model, part number/code etc. These are reference images that the system can match to.
[0049]
[0050] A z effective image is an image where locations in the image are represented by an effective atomic number calculated from the low and high energy scans. An example of how this is calculated is set out in Calculation of Effective Atomic Number and Normal Density Using a Source Weighting Method in a Dual Energy X-ray Inspection System by Park et al, Journal of the Korean Physical Society, Vol. 59, No. 4, October 2011, pp. 2709-2713.
[0051] The processing of the classification system results in a match to a pre-classified object and absolute differences to that object are shown in the red highlighted section of
[0052] One example of the approach taken by the image processor 30 is set out below, although it will be appreciated that there are other image processing methods that could be applied.
[0053] Firstly, a mapping between the actual and ‘restored’ grid coordinates of the image data is identified. Once the mapping has been found, points in the image data are triangulated to form a mesh, with each triangle being used to find a local affine transformation. Pixel values are identified using bi-linear interpolation. http://www.sci.utah.edu/~acoste/uou/Image/project3/ArthurCOSTE_Project3.pdf (the content of which is incorporated by reference) contains details on this approach.
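The two building blocks named above can be sketched as follows. This is a minimal illustration only, assuming a single triangle of the mesh and a 2-D greyscale image; the function names `affine_from_triangle` and `bilinear_sample` are hypothetical and do not come from the described system. The first function solves the local affine transformation from three corresponding grid points, and the second reads a pixel value at a fractional position using bi-linear interpolation.

```python
import numpy as np

def affine_from_triangle(src_tri, dst_tri):
    """Solve the 2x3 affine matrix A such that A @ [x, y, 1] maps each
    source vertex onto the corresponding destination vertex."""
    src = np.hstack([np.asarray(src_tri, dtype=float), np.ones((3, 1))])  # 3x3
    dst = np.asarray(dst_tri, dtype=float)                                # 3x2
    return np.linalg.solve(src, dst).T                                    # 2x3

def bilinear_sample(image, x, y):
    """Sample a 2-D image at a fractional (x, y) position (x = column, y = row)."""
    h, w = image.shape
    # Clamp to the image so positions on the border remain valid.
    x = min(max(float(x), 0.0), w - 1.0)
    y = min(max(float(y), 0.0), h - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom
```

In use, each pixel of the restored grid that falls inside a given triangle would be mapped through that triangle's matrix to a position in the distorted scan, and its value read with `bilinear_sample`.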
[0054] This correction is applied to all images that are generated from this x-ray/CT device to ensure images are spatially correct. A standard pixel/spatial reference is set to achieve this.
[0055] This correction can be utilised on both 2d and 3d images to correct for distortion and scale. Essentially this spatially corrects images so that the object within can be matched effectively.
[0056] The classifier 40 segments objects from the corrected image data. One way to segment objects is described below, although it will be appreciated that other approaches could also be taken. To segment an object, the corrected image data is thresholded (binarised). Using the binarised image, connected segments are then identified.
[0057] Preferably, segments under a predetermined area size are eliminated so that particularly small areas such as small air gaps etc do not result in irrelevant segments. The remaining segments are considered to be devices or objects or components of devices or objects (all referred to as objects below).
[0058] Images of objects are extracted based on the coordinates of the minimum enclosing rectangle that surrounds each segment. Object-level features are extracted and preferably each object is converted into a data object.
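The segmentation steps described above (thresholding, connected segments, elimination of small segments, enclosing rectangle) can be sketched as below. This is an illustrative sketch under stated assumptions: segments are taken to be 4-connected, the enclosing rectangle is taken to be axis-aligned, and the `threshold` and `min_area` values are supplied by the caller. The function name `segment_objects` is hypothetical.

```python
import numpy as np
from collections import deque

def segment_objects(image, threshold, min_area):
    """Binarise the image, find 4-connected segments, drop segments smaller
    than min_area and return one (top, left, bottom, right) box per segment."""
    binary = np.asarray(image) > threshold
    visited = np.zeros(binary.shape, dtype=bool)
    h, w = binary.shape
    boxes = []
    for r in range(h):
        for c in range(w):
            if not binary[r, c] or visited[r, c]:
                continue
            # Breadth-first flood fill collects one connected segment.
            queue = deque([(r, c)])
            visited[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                ys, xs = zip(*pixels)
                boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return boxes
```

Each returned box can be used to crop the object image, for example `image[top:bottom, left:right]`.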
[0059] In one embodiment, this data object is a custom written python class to handle device image data, keypoints, and many other features. https://escholarship.org/uc/item/7jg5d1zn (the content of which is incorporated by reference) describes the algorithm to find segments.
[0060] This approach can be utilised on both 2d and 3d images. In the case of 3d, for best effectiveness, the image data preferably includes a “top down” view of the 3d image that is used to segment the object.
[0061] Preferably, the classifier 40 uses a 3-layer approach. The first two layers narrow the search space for the third one, which is preferably a Siamese CNN. The first two layers are preferably: Hu Invariants Matching, Shape and Pixel Intensity Matching.
[0062] Weighted Hu Invariants are extracted from an image of the object and used to find the 5 best matches from the reference set. In the case of x-ray, low and high energy scans may be used. In other scanning technologies, multiple scanning modalities may be used to capture multiple scan images. The same principle is used with Shape and Pixel Intensity Matching. Euclidean distance is then used as a similarity metric in both layers to select candidates to be passed to the second and third layer.
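The first layer can be illustrated with the sketch below. It is a simplified example, not the described implementation: only the first four of the seven Hu invariants are computed, the `weights` vector is a placeholder because the weighting is not specified here, and the names `hu_invariants` and `best_matches` are hypothetical. Candidates are ranked by Euclidean distance between weighted invariant vectors.

```python
import numpy as np

def hu_invariants(image):
    """Return the first four Hu moment invariants of a 2-D intensity image."""
    img = np.asarray(image, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00

    def eta(p, q):
        # Normalised central moment of order (p, q).
        mu = (((xs - xc) ** p) * ((ys - yc) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
    ])

def best_matches(trial, references, weights, k=5):
    """Indices of the k references closest to the trial image, using
    Euclidean distance between weighted Hu invariant vectors."""
    t = hu_invariants(trial) * weights
    dists = [np.linalg.norm(t - hu_invariants(r) * weights) for r in references]
    return [int(i) for i in np.argsort(dists)[:k]]
```

Because the invariants are unchanged by translation and rotation, a rotated scan of the same device should rank ahead of a differently shaped one.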
[0063] The classifier is arranged such that the third layer (CNN) receives only ~10 candidates for identification. The convolutional neural network does not predict the object model but rather shows the similarity between the trial and the reference. The network outputs values between 0 and 1, where 1 demonstrates perfect similarity between trial and reference. Other feature extractors could also or alternatively be used including VGG16, ResNet, YOLO.
[0064] Preferably a threshold value is set. Below this value we consider the device to be unknown. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8124469 (the content of which is incorporated by reference) discusses the Siamese CNN. https://www.researchgate.net/publication/224146066_Analysis_of_Hu's_moment_invariants_on_image_scaling_and_rotation (the content of which is incorporated by reference) describes the theory behind Hu Invariant Moments.
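The thresholding decision described above can be sketched as follows. The similarity scores are assumed to have already been produced by the third layer for each candidate reference; the function name `identify`, the dictionary layout and the default threshold of 0.5 are illustrative assumptions only.

```python
def identify(similarities, threshold=0.5):
    """Return the reference with the highest similarity score, or None
    (unknown device) when the best score falls below the threshold.

    similarities: dict mapping a reference name to a score in [0, 1].
    """
    if not similarities:
        return None
    name, score = max(similarities.items(), key=lambda item: item[1])
    return name if score >= threshold else None
```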
[0065] Although the 3-layer approach described above is preferred, it will be appreciated that other approaches for classification are possible including only a Siamese CNN or only a standard CNN.
[0066] As indicated above, the 3-layer approach could be used independently, for example, without a reference database for threat identification on unknown objects. Such a system would operate as described above but without the attempt to classify the objects as a precursor. In independent operation, the segmented images derived from raw images of the scanning system are input to the first layer and a probability is output from the third layer.
[0067] As before, this step can be utilised on both 2d and 3d images. In the case of 3d, a “top down” view of the 3d image is used to identify the object. While top-down is mentioned here and below, it will be appreciated that other angles of scanning are possible.
[0068] Once the object has been identified, the classifier 40 aligns the image from the image data (the trial image) with the reference image. It may be rotated/scaled so that it is aligned with the reference. In a preferred embodiment, image alignment is based on SIFT keypoints. Alternatively, it may be based on matching the corner coordinates.
[0069] A transformation matrix is computed by matching the SIFT descriptors/corner coordinates.
[0070] Preferably, all the instances of the reference object in the database are used to find the best possible alignment. The best alignment is considered to have the minimum mean intensity of the residual image (reference − trial). https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf (the content of which is incorporated by reference) describes the SIFT algorithm. https://ieeexplore.ieee.org/document/6396024 (the content of which is incorporated by reference) describes image alignment.
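The selection rule above can be sketched as below, assuming the trial image has already been warped onto each reference instance (for example using a transformation computed from matched keypoints, which is not shown). The sketch uses the mean absolute residual so that positive and negative differences do not cancel; that choice, and the name `best_alignment`, are assumptions made for illustration.

```python
import numpy as np

def best_alignment(candidates):
    """Pick the candidate with the smallest mean residual intensity.

    candidates: list of (reference, aligned_trial) pairs of equally sized
    2-D arrays. Returns (index, mean_residual) of the best pair.
    """
    scores = [
        float(np.abs(np.asarray(ref, dtype=float) - np.asarray(trial, dtype=float)).mean())
        for ref, trial in candidates
    ]
    idx = int(np.argmin(scores))
    return idx, scores[idx]
```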
[0071] This step is utilised to align 2d images; however, it can also be utilised on a top view of a 3d dataset to align the images.
[0072] The classifier 40 then preferably subtracts the images of corresponding energies and z effective values. An opening filter (erosion followed by dilation) is preferably applied with a relatively small 5×5 kernel to remove noise. https://docs.opencv.org/trunk/d9/d61/tutorial_py_morphological_ops.html (the content of which is incorporated by reference) discusses postfiltering.
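The subtraction and opening filter can be sketched in plain NumPy as follows. This is an illustrative sketch under stated assumptions: the reference and trial images are already aligned, the residual is taken as an absolute difference, and the images are passed as dictionaries keyed by channel name (the keys used in the example are hypothetical). The opening is implemented as a greyscale minimum filter followed by a maximum filter over a square window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_filter(image, size, reducer, pad_value):
    """Apply reducer (np.min or np.max) over every size x size window."""
    pad = size // 2
    padded = np.pad(image, pad, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1))

def opening(image, size=5):
    """Morphological opening: erosion (min filter) followed by dilation (max filter)."""
    image = np.asarray(image, dtype=float)
    eroded = _window_filter(image, size, np.min, np.inf)   # border does not erode
    return _window_filter(eroded, size, np.max, -np.inf)   # border does not dilate

def residual_images(reference, trial, size=5):
    """Subtract corresponding channels and remove small noise with an opening."""
    return {
        key: opening(np.abs(np.asarray(reference[key], dtype=float)
                            - np.asarray(trial[key], dtype=float)), size)
        for key in reference
    }
```

With a 5×5 window, an isolated differing pixel is removed while a difference region at least 5 pixels wide in both directions is preserved.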
[0073] If the image is 3d this step may be applied by segmenting the image into slices and performing the same process on each slice.
[0074] Having obtained the residuals, a number of largest connected components (LCCs) are extracted.
[0075] First, the residual image is thresholded (binarised). An average threshold value is computed, preferably by several thresholding methods: Otsu's method, Li's Minimum Cross Entropy method, Ridler-Calvard method.
[0076] Once the binary images are obtained, using the same principle as in Device Segmentation, the largest connected segments (LCCs) are obtained and features extracted from them. http://www.sciencedirect.com/science/article/pii/003132039390115D (the content of which is incorporated by reference) discusses minimum cross entropy thresholding. https://en.wikipedia.org/wiki/Otsu%27s_method (the content of which is incorporated by reference) discusses Otsu's method. https://ieeexplore.ieee.org/document/4310039 (the content of which is incorporated by reference) discusses the Ridler-Calvard method.
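Averaging thresholds from several methods can be sketched as below. For brevity the sketch averages only Otsu's method and the Ridler-Calvard (iterative mean) method; Li's minimum cross entropy method is omitted and would be added to the average in the same way. The sketch assumes an 8-bit residual image, and the function names are hypothetical.

```python
import numpy as np

def otsu_threshold(image):
    """Otsu's method on an 8-bit image: maximise the between-class variance."""
    hist = np.bincount(np.asarray(image, dtype=np.uint8).ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                 # pixel count at or below each level
    w1 = hist.sum() - w0                 # pixel count above each level
    s0 = np.cumsum(hist * levels)
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = s0 / w0
        mu1 = (s0[-1] - s0) / w1
        between = np.nan_to_num(w0 * w1 * (mu0 - mu1) ** 2)
    return float(np.argmax(between))

def ridler_calvard_threshold(image, tol=0.5, max_iter=100):
    """Iteratively set the threshold midway between the two class means."""
    img = np.asarray(image, dtype=float)
    t = img.mean()
    for _ in range(max_iter):
        low, high = img[img <= t], img[img > t]
        if low.size == 0 or high.size == 0:
            break
        new_t = 0.5 * (low.mean() + high.mean())
        converged = abs(new_t - t) < tol
        t = new_t
        if converged:
            break
    return float(t)

def average_threshold(image):
    """Average of the individual thresholds, used to binarise the residual."""
    return 0.5 * (otsu_threshold(image) + ridler_calvard_threshold(image))
```

The binary image is then obtained as `image > average_threshold(image)`.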
[0077] If the image is 3d this is applied by segmenting the image into slices and performing the same process on each slice.
[0078] Having done all the steps above and extracted device-level and LCC-level features, the system can predict whether a difference between the imaged object and the reference in the database is likely to be a threat.
[0079] From this point, a number of features are extracted, depending on which threat detection process is used. This can be either:
[0080] GBM process
[0081] RCNN process
GBM Process
[0082] In a preferred embodiment, a gradient boosting (GBM) algorithm is used to predict (by outputting a value such as 0-1 or a percentage) whether the device contains a threat or is benign.
[0083] In this process the following features are calculated (by performing geometric calculations) from the image in comparison to the reference:
[0084] Features current device (device being scanned)
[0085] Device area
[0086] Convex area
[0087] Eccentricity
[0088] Equivalent diameter
[0089] Euler number
[0090] Extent
[0091] Filled area
[0092] Height
[0093] Size
[0094] Perimeter
[0095] Max length
[0096] Max width
[0097] Min length
[0098] Min width
[0099] Solidity
[0100] Features from high energy image, low energy image and z effective image:
[0101] Mean pixel value
[0102] Standard deviation
[0103] Features reference device (the matched device reference image)
[0104] Device area
[0105] Convex area
[0106] Eccentricity
[0107] Equivalent diameter
[0108] Euler number
[0109] Extent
[0110] Filled area
[0111] Height
[0112] Size
[0113] Perimeter
[0114] Max length
[0115] Max width
[0116] Min length
[0117] Min width
[0118] Solidity
[0119] Features from high energy image, low energy image and z effective image:
[0120] Mean pixel value
[0121] Standard deviation
[0122] Features from difference between reference and device image:
[0123] Difference in terms of percentage from 1 to 10% (in increments, so 10 features for each image).
[0124] Features from the largest connected component (the largest part of the image showing difference), on high, low and z effective difference images:
[0125] Q
[0126] Convex area
[0127] Eccentricity
[0128] Equivalent diameter
[0129] Euler number
[0130] Extent
[0131] Filled area
[0132] Height
[0133] Hu moments
[0134] Size
[0135] Perimeter
[0136] Max length
[0137] Max width
[0138] Min length
[0139] Min width
[0140] Solidity
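A few of the listed features can be computed as in the sketch below. It covers only a subset (area, height, width, extent, equivalent diameter, mean pixel value and standard deviation) and uses common region-property definitions, which are assumptions here: extent is the segment area divided by its bounding-box area, and equivalent diameter is the diameter of a circle with the same area. The function name `basic_features` is hypothetical.

```python
import numpy as np

def basic_features(mask, intensity):
    """Compute a small set of geometric and summary features for one segment.

    mask: 2-D boolean array marking the segment (must contain at least one pixel).
    intensity: 2-D array of the same shape (e.g. a high energy image).
    """
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    area = int(mask.sum())
    height = int(ys.max() - ys.min() + 1)
    width = int(xs.max() - xs.min() + 1)
    values = np.asarray(intensity, dtype=float)[mask]
    return {
        "area": area,
        "height": height,
        "width": width,
        "extent": area / (height * width),
        "equivalent_diameter": float(np.sqrt(4.0 * area / np.pi)),
        "mean_pixel_value": float(values.mean()),
        "std_pixel_value": float(values.std()),
    }
```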
[0141] Once the features have been extracted, they are input into the GBM, which determines the probability, based on all the features, of the device being a threat. The output is a probability. The input features may be displayed within the GUI.
[0142] Like other boosting methods, gradient boosting combines “weak” learners into a single model in an iterative fashion. https://statweb.stanford.edu/~jhf/ftp/trebst.pdf (the content of which is incorporated by reference) discusses Gradient Boosting. See also https://en.wikipedia.org/wiki/Gradient_boosting
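The iterative combination of weak learners can be illustrated with the minimal sketch below. It is not the described implementation: it uses single-split decision stumps as the weak learners, fits each stump to the negative gradient of the log-loss, and assumes at least one feature takes two or more distinct values. The round count, learning rate and all function names are illustrative assumptions.

```python
import numpy as np

def fit_stump(X, r):
    """Least-squares decision stump (one feature, one split) fitted to residuals r."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:      # skip the max so both sides are non-empty
            left = X[:, j] <= thr
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r - np.where(left, lv, rv)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, thr, lv, rv)
    return best[1:]

def stump_predict(stump, X):
    j, thr, lv, rv = stump
    return np.where(X[:, j] <= thr, lv, rv)

def fit_gbm(X, y, rounds=50, lr=0.5):
    """Boost stumps on the log-loss; returns (initial score, learning rate, stumps)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f0 = float(np.log(p0 / (1 - p0)))            # initial log-odds
    F = np.full(len(y), f0)
    stumps = []
    for _ in range(rounds):
        residual = y - 1.0 / (1.0 + np.exp(-F))  # negative gradient of the log-loss
        stump = fit_stump(X, residual)
        stumps.append(stump)
        F = F + lr * stump_predict(stump, X)
    return f0, lr, stumps

def predict_proba(model, X):
    """Probability of the positive (threat) class for each row of X."""
    f0, lr, stumps = model
    X = np.asarray(X, dtype=float)
    F = f0 + lr * sum(stump_predict(s, X) for s in stumps)
    return 1.0 / (1.0 + np.exp(-F))
```

Each row of `X` would hold the features extracted for one device, and `y` would be 1 for devices labelled as containing a threat and 0 for benign devices.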
RCNN Process
[0143] In an alternative to GBM (or one that may be used in parallel), a residual based convolutional neural network (RCNN) may be used that accepts residuals of high, low and z effective images and outputs a value (0-1, or a percentage etc) on whether the object contains a threat or not.
[0144] In this process two tensors of shape (224, 224, 3) (Low, High, Z effective) are rescaled to the range [0,1] and input to the RCNN: one for the reference and another for the trial image.
[0145] The RCNN processes these inputs and directly outputs the threat area.
[0146] The output is a tensor of shape (224, 224, 1), giving the predicted threat mask and class as an array of threat probabilities (0-1). These segments are normally classified by a specific threshold. The ratio (threat area to overall device size) or overall number of segments displayed above this threshold is typically used to classify whether the device contains a threat or not.
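The ratio-based decision can be sketched as follows. The per-pixel threshold of 0.5 and the ratio threshold of 0.01 are placeholder values chosen for illustration, the device mask is assumed to be available from the earlier segmentation, and the function name `classify_from_mask` is hypothetical.

```python
import numpy as np

def classify_from_mask(threat_mask, device_mask, pixel_threshold=0.5, ratio_threshold=0.01):
    """Classify a device from a per-pixel threat probability mask.

    threat_mask: array of shape (H, W, 1) or (H, W) with probabilities in [0, 1].
    device_mask: boolean array of shape (H, W) marking device pixels.
    Returns (ratio of threat area to device area, is_threat).
    """
    device = np.asarray(device_mask, dtype=bool)
    probs = np.asarray(threat_mask, dtype=float).reshape(device.shape)
    threat_pixels = (probs > pixel_threshold) & device
    ratio = float(threat_pixels.sum()) / max(int(device.sum()), 1)
    return ratio, ratio > ratio_threshold
```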
[0147] Training:
[0148] Training used a dataset of x-ray scan images of known devices, comprising both benign devices and devices containing labelled threats. Each device image containing a threat was paired with one of the benign images of the same type and input, and the labelled data was utilised to train the RCNN as to the correct result.
[0149] This RCNN is a CNN trained on images with the threat/modification area explicitly labelled. The residual is calculated for each image and, using the labelled data, the CNN is trained on residuals showing a threat. The preferred CNN model is VGG16, although other CNN architectures can also be trained by the same method.
[0150] If the image is 3d this is applied by inputting the same device level and LCC features albeit for each slice.
[0151] Optionally, both the GBM and RCNN processes may be operated in parallel and their results combined/compared.
[0152] Optionally, the system may also output the absolute difference between the reference device and the image of the object being analysed. The algorithm compares the aligned images pixel by pixel and generates the difference.
[0153] If the image is 3d this is applied by segmenting the image into slices and performing the same process on each slice.
[0154] The Deep classifier model/algorithm is a Fully Convolutional Network (FCN). Such an architecture can efficiently learn to make dense predictions for per-pixel tasks like semantic segmentation. In this embodiment three segments are predicted: background, device and threat.
[0155] The deep classifier takes as input 3 images of the segmented device (1 high energy, 1 low energy and 1 z effective). The CNN takes these and outputs a suspected threat area.
[0156] As a backbone model (for the purpose of feature extraction), the pre-trained VGG16 model is preferably utilised. It is a deep neural network trained on millions of images and designed for image classification tasks. The FCN network may be trained from images segmented using the above described process. The anomaly area and classification are provided by the known device process described above in which known good device/object scan images are provided as annotated training data.
[0157] Alternatively or additionally it may be trained using manually annotated images where the position of the threat is known. The pre-processing steps include the following:
[0158] Segment the scan into a set of objects (as discussed above)
[0159] Stack Low/High/Z-effective images of each device into a three-dimensional array
[0160] Resize to 224×224×3 preserving the aspect ratio
[0161] Normalize every channel to have values in range [0,1]
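The stacking, resizing and normalisation steps can be sketched as below. Several details are assumptions made for illustration: nearest-neighbour resampling is used, the aspect ratio is preserved by scaling the longer side to 224 and zero-padding the shorter side, and each channel is normalised by its own minimum and maximum. The function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(low, high, zeff, size=224):
    """Stack three equally sized 2-D images and return a (size, size, 3) array in [0, 1]."""
    stack = np.stack([low, high, zeff], axis=-1).astype(float)   # (H, W, 3)
    h, w = stack.shape[:2]
    scale = size / max(h, w)
    new_h = max(1, min(size, round(h * scale)))
    new_w = max(1, min(size, round(w * scale)))
    # Nearest-neighbour resize: pick the source row/column for each target pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = stack[rows][:, cols]
    # Normalise every channel to the range [0, 1].
    for c in range(3):
        lo, hi = resized[..., c].min(), resized[..., c].max()
        resized[..., c] = (resized[..., c] - lo) / (hi - lo) if hi > lo else 0.0
    # Centre the resized image on a zero canvas to keep the aspect ratio.
    out = np.zeros((size, size, 3))
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```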
[0162] This model can be used for unknown objects or in conjunction with the main algorithm above. https://arxiv.org/pdf/1605.06211.pdf (the content of which is incorporated by reference) discusses the Fully Convolutional Network (FCN). http://www.image-net.org/challenges/LSVRC/ (the content of which is incorporated by reference) discusses the ImageNet Visual Recognition Challenge. https://neurohive.io/en/popular-networks/vgg16/ (the content of which is incorporated by reference) discusses the VGG16 architecture.
[0163] If the image is 3d this is applied by inputting the entire 3 dimensional array of high, low and z-effective values in the FCN.
[0164] In the case of 3D imaging, such as is possible using CT machines, objects may be identified using Volumetric Object Recognition Using 3-D CNN. This produces a 3d array of density, in the form of a high energy and low energy array, within the identified object box. The 3d arrays for a given object box, containing for example a device such as a laptop, are preferably then flattened into a 2d image and analysed in the same manner that 2d x-ray images are processed as discussed above.
[0165] Using the method, objects can be extracted from their environment (for example they may be scanned while within a bag, tray or other container that may also then be represented in the scan image). To achieve this, once identified, the object can be rotated and an array of its values can be extracted from the 3d array. The extracted 3d array is then flattened along its shortest edge, by taking the sum of all array values along the shortest axis.
[0166] The process for this is as follows:
[0167] Electronic object identified in 3d space using Volumetric Object Recognition Using 3-D CNN.
[0168] 3d array extracted of object within a “box”.
[0169] Shortest dimension identified.
[0170] All density values along the shortest dimension of the array are summed (to give an overall density for the object).
[0171] This produces effectively a 2d density image of the object similar to a 2d x-ray image.
[0172] This is analysed in the same way as with 2d x-ray images above, for both known and unknown devices.
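The flattening step can be sketched in a few lines, assuming the object has already been extracted as an axis-aligned 3d array of density values. The function name `flatten_shortest_axis` is hypothetical.

```python
import numpy as np

def flatten_shortest_axis(volume):
    """Sum a 3-D density array along its shortest dimension to give a 2-D image."""
    volume = np.asarray(volume, dtype=float)
    shortest_axis = int(np.argmin(volume.shape))
    return volume.sum(axis=shortest_axis)
```

For a laptop-like box of shape (40, 3, 60), the result is a 40×60 image in which each pixel holds the summed density through the thin dimension.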
[0173] This process preferably takes place for low energy and high energy arrays generated by the CT device.
[0174] There are two modes in which the system typically operates: unsupervised and auto-review. In unsupervised mode, the software runs automatically and an alarm is raised if a threat is detected.
[0175] In auto-review mode, the user can set the software to auto review a threat decision, negating the need for them to attend unless there is an issue after review, in which case an alarm is raised.
[0176] Any x-ray/CT device can be calibrated for use in providing imaging data using a calibration tool. A scan is taken of a plate with holes of equal spacing. These scans are utilised to spatially correct the image as discussed above.
[0177] In one embodiment, the classification can be reviewed by a review panel. A scan can be selected for review by the user by selecting a “review” button in a front-end user interface. It is then communicated to a review team. In one embodiment, the system may include a review parameter. This is the number of reviews a scan must pass through before a decision is given. If one review result is “Inconsistency found”, this is the result returned.
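The review parameter logic can be sketched as below. Two details are assumptions made for illustration: an “Inconsistency found” result is returned as soon as it appears, without waiting for the remaining reviews, and the label “No inconsistency found” for a passed scan is a placeholder. The function name `review_decision` is hypothetical.

```python
def review_decision(results, review_parameter):
    """Combine reviewer results for one scan.

    results: list of result strings submitted so far.
    review_parameter: number of reviews a scan must pass through.
    Returns the decision, or None while more reviews are still required.
    """
    if "Inconsistency found" in results:
        return "Inconsistency found"
    if len(results) < review_parameter:
        return None
    return "No inconsistency found"
```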
[0178] If an object image is of sufficient quality, the reviewer can add the object to the reference database. All reviewers (the set number as previously described) must approve the device for databasing before it is added.
[0179] It will be appreciated that the reference database, containing “known” devices, may take various forms including a central or distributed file store, database (such as SQL or other relational or non-relational database types). It may be implemented using storage devices such as hard disks, random access memories, solid state disks or any other forms of storage media. It will also be appreciated that the processor discussed herein may represent a single processor or a collection of processors acting in a synchronised, semi-synchronised or asynchronous manner.
[0180] It is to be appreciated that certain embodiments of the invention as discussed below may be incorporated as code (e.g., a software algorithm or program) residing in firmware and/or on computer useable medium having control logic for enabling execution on a computer system having a computer processor. Such a computer system typically includes memory storage configured to provide output from execution of the code which configures a processor in accordance with the execution. The code can be arranged as firmware or software, and can be organized as a set of modules such as discrete code modules, function calls, procedure calls or objects in an object-oriented programming environment. If implemented using modules, the code can comprise a single module or a plurality of modules that operate in cooperation with one another.
[0181] Optional embodiments of the invention can be understood as including the parts, elements and features referred to or indicated herein, individually or collectively, in any or all combinations of two or more of the parts, elements or features, and wherein specific integers are mentioned herein which have known equivalents in the art to which the invention relates, such known equivalents are deemed to be incorporated herein as if individually set forth.