OBJECT CLASSIFICATION WITH CONTENT AND LOCATION SENSITIVE CLASSIFIERS
20210357750 · 2021-11-18
Inventors
- Chaithanya Kumar Mummadi (Heimsheim, DE)
- Anna Khoreva (Stuttgart, DE)
- Kaspar Sakmann (Wien, AT)
- Kilian Rambach (Stuttgart, DE)
- Piyapat Saranrittichai (Korntal-Muenchingen, DE)
- Volker Fischer (Renningen, DE)
CPC classification
G06V10/454
PHYSICS
G06F18/254
PHYSICS
G06F18/241
PHYSICS
International classification
Abstract
A system and method are provided for classifying objects in spatial data using a machine learned model, as well as a system and method for training the machine learned model. The machine learned model may comprise a content sensitive classifier, a location sensitive classifier and at least one outlier detector. Both classifiers may jointly distinguish between objects in spatial data being in-distribution or marginal-out-of-distribution. The outlier detection part may be trained on inlier examples from the training data, while the presence of actual outliers in the input data of the machine learnable model may be mimicked in the feature space of the machine learnable model during training. The combination of these parts may provide a more robust classification of objects in spatial data with respect to outliers, without having to increase the size of the training data.
Claims
1. A computer-implemented method for training a machine learnable model for classification of objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the method comprising the following steps: accessing training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes; providing the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part; generating, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and training the content classification part on the content information-specific feature map; generating a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and training the location classification part on the location information-specific feature map; providing, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data; and generating, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, wherein modifying the one or more previously generated feature maps
includes at least one of: removing feature information from the previously generated feature maps, pseudo-randomly shuffling locations of the feature information in the previously generated feature maps; mixing the feature information between feature maps of different object classes; swapping the feature information at different locations in the previously generated feature maps, and training the outlier detection part on the pseudo outlier feature map.
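The four feature-map modifications recited in claim 1 can be illustrated by the following non-limiting NumPy sketch. It assumes H×W×C feature maps; the function names, the 50% masking/mixing rates, and the single-pair swap are illustrative choices only, not part of the claim:

```python
import numpy as np

rng = np.random.default_rng(0)

def remove_information(fmap, drop_prob=0.5):
    """Mimic an outlier by zeroing out randomly chosen entries of a feature map."""
    mask = rng.random(fmap.shape) >= drop_prob
    return fmap * mask

def shuffle_locations(fmap):
    """Pseudo-randomly permute the spatial positions (H, W) of the feature vectors."""
    h, w, c = fmap.shape
    flat = fmap.reshape(h * w, c)
    return flat[rng.permutation(h * w)].reshape(h, w, c)

def mix_between_classes(fmap_a, fmap_b):
    """Mix feature information between feature maps of two different object classes:
    at each spatial location, take the feature vector from one map or the other."""
    mask = rng.random(fmap_a.shape[:2]) < 0.5
    return np.where(mask[..., None], fmap_a, fmap_b)

def swap_locations(fmap):
    """Swap the feature information at two randomly chosen spatial locations."""
    out = fmap.copy()
    h, w, _ = fmap.shape
    (y1, x1), (y2, x2) = rng.integers(0, [h, w], size=(2, 2))
    out[y1, x1], out[y2, x2] = fmap[y2, x2].copy(), fmap[y1, x1].copy()
    return out
```

Each transform preserves the H×W×C shape, so the resulting pseudo outlier feature map can be fed to the same outlier detection part as an unmodified feature map.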
2. The method according to claim 1, wherein the machine learnable model includes a location-and-content outlier detection part, and wherein the method further comprises generating the pseudo outlier feature map for the location-and-content outlier detection part by modifying the feature information which is contained in the one or more previously generated feature maps and which is associated with both the location information and the content information.
3. The method according to claim 1, wherein the machine learnable model includes a location outlier detection part, and wherein the method further comprises generating the pseudo outlier feature map for the location outlier detection part by modifying the feature information which is contained in the one or more previously generated feature maps and which is associated with the location information.
4. The method according to claim 3, wherein the location outlier detection part is implemented by the location classification part by providing the pseudo outlier feature map to the location classification part as part of a separate outlier object class to be learned.
5. The method according to claim 1, wherein the machine learnable model includes a content outlier detection part, and wherein the method further comprises generating the pseudo outlier feature map for the content outlier detection part by modifying the feature information which is contained in the one or more previously generated feature maps and which is associated with the content information.
6. The method according to claim 1, wherein each of the one or more feature maps generated by the convolutional part has at least two spatial dimensions associated with the location information and wherein feature values of the one or more feature maps at each respective spatial coordinate together form a feature vector representing content information at the respective spatial coordinate, and wherein: the removing of the location information from the one or more feature maps includes aggregating the one or more feature maps over the spatial dimensions to form a content information-specific feature map comprising one feature vector; the removing of the content information from the one or more feature maps includes aggregating the feature values per spatial coordinate over the one or more feature maps to form the location information-specific feature map having at least two spatial dimensions and one feature value channel.
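The two aggregations recited in claim 6 may, for example, be realized by mean pooling, as in this minimal NumPy sketch. Mean pooling is one possible aggregation operator; the claim does not prescribe a particular one:

```python
import numpy as np

def content_feature(fmaps):
    """Remove location information: aggregate over the H and W spatial axes,
    yielding a single C-dimensional feature vector (a 1x1xC map)."""
    return fmaps.mean(axis=(0, 1))             # shape (C,)

def location_feature(fmaps):
    """Remove content information: aggregate over the channel axis per spatial
    coordinate, yielding an HxWx1 map of spatial activation."""
    return fmaps.mean(axis=-1, keepdims=True)  # shape (H, W, 1)
```

The content classification part would then operate on the 1×1×C vector and the location classification part on the H×W×1 map, matching reference numerals 524 and 514.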
7. The method according to claim 1, wherein the machine learnable model is a deep neural network, wherein the convolutional part is a convolutional part of the deep neural network and wherein the content classification part and the location classification part are respective classification heads of the deep neural network.
8. A computer-implemented method for classifying objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the method comprising the following steps: accessing a machine learned model, wherein the machine learned model is a machine learnable model trained by: accessing training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes, providing the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part, generating, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and training the content classification part on the content information-specific feature map, generating a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and training the location classification part on the location information-specific feature map, providing, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data, and generating, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, wherein
modifying the one or more previously generated feature maps includes at least one of: removing feature information from the previously generated feature maps, pseudo-randomly shuffling locations of the feature information in the previously generated feature maps, mixing the feature information between feature maps of different object classes, swapping the feature information at different locations in the previously generated feature maps, and training the outlier detection part on the pseudo outlier feature map; accessing first input data, the first input data including an instance of spatial data, the instance of the spatial data including an object to be classified; applying the convolutional part of the machine learned model to the first input data to generate one or more first feature maps; generating a first content information-specific feature map by removing location information from one of the one or more first feature maps, and applying the content classification part to the first content information-specific feature map to obtain a content-based object classification result; generating a first location information-specific feature map by removing content information from one of the one or more first feature maps, and applying the location classification part to the first location information-specific feature map to obtain a location-based object classification result; applying the outlier detection part to one or more previously generated first feature maps which are generated for the instance of the spatial data, to obtain an outlier detection result; and classifying the object in the spatial data in accordance with the content-based object classification result, the location-based object classification result and the outlier detection result, wherein the classifying includes classifying the first input data in accordance with an object class when the content-based object classification result and the location-based object classification result both 
indicate the object class and when the outlier detection result does not indicate a presence of an outlier.
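The final classifying step of claim 8 amounts to a conjunction: an object class is accepted only when both marginal-sensitive classifiers agree on it and the outlier detection result flags no outlier. A minimal sketch (the string labels and the single aggregated outlier flag are illustrative simplifications):

```python
def classify(content_pred, location_pred, outlier_detected):
    """Accept a class only when the content-based and location-based
    classification results agree and no outlier is flagged; otherwise
    report the input as an outlier."""
    if not outlier_detected and content_pred == location_pred:
        return content_pred
    return "outlier"
```

For example, a face-like arrangement of fruits would yield disagreeing predictions (content "fruit", location "face") and thus be reported as an outlier rather than force-classified.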
9. The method according to claim 8, wherein the training of the machine learnable model is performed before using the machine learned model to classify the objects in the spatial data.
10. A non-transitory computer-readable medium on which is stored a computer program for training a machine learnable model for classification of objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the computer program, when executed by a computer, causing the computer to perform the following steps: accessing training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes; providing the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part; generating, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and training the content classification part on the content information-specific feature map; generating a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and training the location classification part on the location information-specific feature map; providing, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data; and generating, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the
input data of the machine learnable model, wherein modifying the one or more previously generated feature maps includes at least one of: removing feature information from the previously generated feature maps, pseudo-randomly shuffling locations of the feature information in the previously generated feature maps; mixing the feature information between feature maps of different object classes; swapping the feature information at different locations in the previously generated feature maps, and training the outlier detection part on the pseudo outlier feature map.
11. A system for training a machine learnable model for classification of objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the system comprising: an input interface configured to access training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes; a processor subsystem configured to: provide the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part; generate, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and train the content classification part on the content information-specific feature map; generate a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and train the location classification part on the location information-specific feature map; provide, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data; and generate, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, wherein modifying the one or more previously generated feature
maps comprises at least one of: removing feature information from said feature maps; pseudo-randomly shuffling locations of feature information in said feature maps; mixing feature information between feature maps of different object classes; swapping feature information at different locations in said feature maps, and train the outlier detection part on the pseudo outlier feature map; and an output interface configured to output machine learned model data representing the machine learnable model after training.
12. A system for classifying objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the system comprising: an input interface for accessing first input data, the first input data including an instance of spatial data, the instance of the spatial data including an object to be classified; a processor subsystem configured to: access a machine learned model, wherein the machine learned model is a machine learnable model trained by: accessing training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes, providing the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part, generating, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and training the content classification part on the content information-specific feature map, generating a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and training the location classification part on the location information-specific feature map, providing, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data, and generating, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously
generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, wherein modifying the one or more previously generated feature maps includes at least one of: removing feature information from the previously generated feature maps, pseudo-randomly shuffling locations of the feature information in the previously generated feature maps, mixing the feature information between feature maps of different object classes, swapping the feature information at different locations in the previously generated feature maps, and training the outlier detection part on the pseudo outlier feature map; apply the convolutional part of the machine learned model to the first input data to generate one or more first feature maps; generate a first content information-specific feature map by removing location information from one of the one or more first feature maps, and apply the content classification part to the first content information-specific feature map to obtain a content-based object classification result; generate a first location information-specific feature map by removing content information from one of the one or more first feature maps, and apply the location classification part to the first location information-specific feature map to obtain a location-based object classification result; apply the outlier detection part to one or more previously generated first feature maps which are generated for the instance of the spatial data, to obtain an outlier detection result; classify the object in the spatial data in accordance with the content-based object classification result, the location-based object classification result and the outlier detection result, wherein the classifying includes classifying the first input data in accordance with an object class when the content-based object classification result and the location-based object classification result both 
indicate the object class and when the outlier detection result does not indicate a presence of an outlier.
13. The system according to claim 12, wherein the input interface is a sensor interface to a sensor, wherein the sensor is configured to acquire the spatial data.
14. The system according to claim 12, wherein the system is a control system configured to adjust a control parameter based on the classification of the object.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] These and other aspects of the present invention will be apparent from and elucidated further with reference to the embodiments described by way of example in the following description and with reference to the figures.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0054] It should be noted that the figures are purely diagrammatic and not drawn to scale. In the figures, elements which correspond to elements already described may have the same reference numerals.
LIST OF REFERENCE NUMBERS
[0055] The following list of reference numbers is provided for facilitating the interpretation of the drawings and shall not be construed as limiting the present invention.
[0056] 20 sensor
[0057] 22 camera
[0058] 40 actuator
[0059] 42 electric motor
[0060] 60 physical environment
[0061] 80 (semi-)autonomous vehicle
[0062] 100 system for training machine learnable model
[0063] 160 processor subsystem
[0064] 180 data storage interface
[0065] 190 data storage
[0066] 192 training data
[0067] 194 data representation of machine learnable model
[0068] 196 data representation of machine learned model
[0069] 200 method of training machine learnable model
[0070] 210 accessing training data for training
[0071] 220 providing machine learnable model
[0072] 225 providing outlier detection part
[0073] 230 training the machine learnable model
[0074] 240 generating content information-specific feature map
[0075] 245 training content classification part
[0076] 250 generating location information-specific feature map
[0077] 255 training location classification part
[0078] 260 generating pseudo outlier feature map
[0079] 265 training outlier detection part
[0080] 212 accessing input data for inference
[0081] 220 training machine learnable model
[0082] 222 using machine learned model for inference
[0083] 230 providing state memory
[0084] 240 extracting previous internal state information
[0085] 250 updating state memory with current internal state
[0086] 300 object (person) classifiable by content and location information
[0087] 310 violations of content and location information
[0088] 320 fruits in facial arrangement of person
[0089] 330 elements of face with locations permuted
[0090] 340 randomly shuffled facial elements
[0091] 400 location (spatial arrangement) marginal
[0092] 410 content marginal
[0093] 420 in-distribution sample
[0094] 430 joint-out-of-distribution sample
[0095] 440 marginal-out-of-distribution sample
[0096] 450 full-out-of-distribution sample
[0097] 500 input instance
[0098] 510 location sensitive classifier
[0099] 512 feature aggregation
[0100] 514 H×W×1 feature map
[0101] 516 N+1 class classification
[0102] 520 content sensitive classifier
[0103] 522 spatial aggregation
[0104] 524 1×1×C feature map
[0105] 526 N-class classification
[0106] 528 outlier detection for content marginal distribution
[0107] 530 location-and-content outlier detection part
[0108] 532 flatten
[0109] 534 H×W×C feature map
[0110] 536 outlier detection for joint marginal distribution
[0111] 540 intermediate feature maps
[0112] 600 intermediate feature maps of inliers
[0113] 610 class A
[0114] 620 class B
[0115] 650 intermediate feature maps of pseudo outliers
[0116] 660 feature map generated by removal of information
[0117] 670 feature map generated by random location shuffle
[0118] 680 feature map generated by content-mixing between classes
[0119] 690 feature map generated by location swapping
[0120] 700 system for control or monitoring using machine learned model
[0121] 720 sensor data interface
[0122] 722 sensor data
[0123] 740 actuator interface
[0124] 742 control data
[0125] 760 processor subsystem
[0126] 780 data storage interface
[0127] 790 data storage
[0128] 800 method for classifying objects in spatial data
[0129] 810 accessing machine learned model
[0130] 820 accessing input data
[0131] 830 generating feature map(s)
[0132] 840 generating content information-specific feature map
[0133] 850 generating location information-specific feature map
[0134] 860 generating content-based object classification result
[0135] 870 generating location-based object classification result
[0136] 880 generating outlier detection result
[0137] 890 classifying object in spatial data
[0138] 900 computer-readable medium
[0139] 910 non-transitory data
[0140] The following provides with reference to
[0142] As shown in
[0143] In some embodiments of the present invention, the data storage 190 may further comprise a data representation 194 of an untrained version of the machine learnable model which may be accessed by the system 100 from the data storage 190. It will be appreciated, however, that the training data 192 and the data representation 194 of the machine learnable model may also each be accessed from a different data storage, e.g., via a different subsystem of the data storage interface 180. Each subsystem may be of a type as is described above for the data storage interface 180. In other embodiments, the data representation 194 of the untrained version of the machine learnable model may be internally generated by the system 100, for example on the basis of design and/or architectural parameters for the machine learnable model, and therefore may not be explicitly stored in the data storage 190.
[0144] The system 100 may further comprise a processor subsystem 160 which may be configured to, during operation of the system 100 and as part of the training of the machine learnable model, generate a content information-specific feature map by removing location information from the one or more feature maps and train the content classification part on the content information-specific feature map; generate a location information-specific feature map by removing content information from the one or more feature maps and train the location classification part on the location information-specific feature map; provide, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data; and as part of the training of the machine learnable model, generate a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, and train the outlier detection part on the pseudo outlier feature map. It will be appreciated that these aspects of the operation of the system 100 will be further explained with reference to
[0145] The system 100 may further comprise an output interface for outputting a data representation 196 of the trained machine learnable model, this model also being referred to as a machine ‘learned’ model and its data also being referred to as trained model data 196. For example, as also illustrated in
[0146]
[0147] The following describes the classification of objects in spatial data with the example of images. It will be appreciated, however, that the measures described below may also be applied to other types of spatial data which may not directly be considered images.
[0148]
[0149] A standard classification network may be biased towards the spatial locations of elements and thus may also classify the second image 320, showing fruits placed in the facial arrangement of the first image 300, as a face. Alternatively, a standard classification network may be biased towards the elements themselves, e.g., the content, and thus classify the third image 330 with shuffled facial elements as a face. However, for a human observer, these images do not present a challenge. Namely, while humans would, to a certain degree, recognize all three images as faces, they would recognize that only the first image 300 shows a true face and the other images show only certain facial attributes.
[0150] The machine learnable/learned model as described in this specification represents a classification framework which may be empowered to mimic a human response. Namely, the classification framework may on the one hand recognize true faces and on the other hand may also detect an outlier face and provide an explanation of its nature, e.g., by containing true facial elements but in false locations. This ability to detect and interpret outliers while performing classification may be of great importance in many applications and for example may be used for active labeling to improve network generalization or for quality-control in manufacturing industries. In particular, the described classification framework may learn the feature representation to be specifically sensitive towards the content and location factors. Besides the standard classification, this may provide additional explanation about image ambiguities in terms of content and location.
[0151]
[0152] In this example, the two classes (faces and fruits) require the content and the location each to fall within a select marginal distribution, with these marginal distributions then jointly defining the object class. The marginal distributions are schematically indicated along the respective axes as graphs, with the graph 400 defining two marginal distributions in terms of location (spatial arrangement), namely a face-like arrangement and a vertical stack arrangement, and with the graph 410 defining two marginal distributions in terms of content, namely face features (e.g., nose, ears, eyes) and fruit features (e.g., banana, apple, cherry). Throughout this specification, the samples for which both factors fall within the respective marginal distributions and which then jointly define an object class will be referred to as ‘in-distribution’ samples, see reference numeral 420. These inliers belong to the two possible object classes: faces, in which the content information falls within the ‘face feature’ marginal distribution and the location information falls within the ‘face-like arrangement’ marginal distribution, and fruits, in which the content information falls within the ‘fruit feature’ marginal distribution and the location information falls within the ‘vertical stack arrangement’ marginal distribution. All other samples may be referred to as ‘out-of-distribution’ samples or ‘outliers’, see reference numerals 430-450. The out-of-distribution samples as a group may be further segregated into three distinct parts:
[0153] 1. The joint-out-of-distribution samples 430 for which both factors are in their marginal distributions but their association is wrong, e.g., a sample having the content of one class but the location of another class (regions with dashed borders in
[0154] 2. The marginal-out-of-distribution samples 440 for which one factor is out of its marginal distribution and the other is in its marginal distribution (striped regions in
[0155] 3. The full-out-of-distribution samples 450 for which all factors are out of their marginal distributions (dotted regions in
[0156] In general, the structure of the out-of-distribution space may be more complex, e.g., with more factors besides content and location, but for the sake of simplicity, the two-factor case is considered, which leads to the three types of outliers described above.
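The two-factor taxonomy described above may be expressed compactly as a small decision function. This is a non-limiting sketch; the Boolean arguments are illustrative abstractions of the marginal and association checks performed by the classifiers and outlier detectors:

```python
def outlier_type(content_in_marginal, location_in_marginal, association_valid):
    """Map the two-factor marginal checks to the sample types of
    paragraphs [0152]-[0155]."""
    if content_in_marginal and location_in_marginal:
        # Both factors fit their marginals; only the association can be wrong.
        return "in-distribution" if association_valid else "joint-out-of-distribution"
    if content_in_marginal or location_in_marginal:
        # Exactly one factor is outside its marginal distribution.
        return "marginal-out-of-distribution"
    # All (here: both) factors are outside their marginal distributions.
    return "full-out-of-distribution"
```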
[0157] As will also be described with reference to
[0158] However, both classifiers may not be able to distinguish the other two out-of-distribution types, namely the joint-out-of-distribution and full-out-of-distribution samples. To also distinguish the joint-out-of-distribution case, a separate outlier detector may be provided in the classification framework. To train this outlier detector, hard negative examples may be generated in the feature space by mixing information from different samples. This ‘joint’ outlier detector may now detect samples for which both the content and the location information are valid factors for a class, but which are wrongly associated. For example, the face in the third image 330 in
[0159] The classifiers described above, which may also be referred to as ‘marginal-sensitive’ classifiers, may also provide high-confidence classifications even in the full-out-of-distribution case. To address this, a further outlier detector may be added for each of the classifiers. To predict outliers, each outlier detector may be trained in a similar way as the respective classifier, namely by deliberately removing or masking certain information from the internal representation used as input to the respective classifier and by modifying the internal representation to mimic the presence of outliers in the input of the machine learnable model. The output of the outlier detectors may be used to distinguish, in the classification outcome of the marginal-sensitive classifiers, between inliers and the corresponding out-of-marginal outliers. Notably, the machine learnable model incorporating the above-described classification framework may be trained using only in-distribution samples.
[0160] In some examples of the present invention, the machine learnable model may thus comprise at least two classifiers and three outlier detectors which together provide additional information about the content and location factors of an image. More specifically, the output of each of the classifiers and the outlier detectors may be used to differentiate between different types of outliers and to provide explanations for ambiguous cases. Such outputs may for example be logged or output to an operator, or may be used to make subsequent decisions. The following briefly summarizes the three different outlier detectors and their output when applied to the different types of input instances. It can be seen that each type of input instance may be uniquely identified by the combination of outputs of the respective outlier detectors:
TABLE-US-00001
  Input instance                 Outlier detection     Outlier detection      Outlier detection
                                 for content marginal  for location marginal  for joint marginal
  Inlier                         No outlier            No outlier             No outlier
  Outlier in content marginal    Outlier               No outlier             Outlier
  Outlier in location marginal   No outlier            Outlier                Outlier
  Outlier in joint marginal      No outlier            No outlier             Outlier
  Outlier in both marginals      Outlier               Outlier                Outlier
[0161]
[0162] Content-sensitive classifier (CSC, 520): This classifier's decisions may be based upon the content of object elements/parts independent of their spatial location. Let the dimension of the input feature map F^input 540 of the CSC branch 520 be H×W×C, where H, W, and C are the height, width, and number of channels of the feature map. Note that the spatial resolution H×W may capture spatial information and the channels C may encode feature representations, and that such an input feature map of H×W×C may also be considered to represent a C-tuple of H×W×1 feature maps. Upon removing the spatial information from the channels, the content-sensitive classifier may be directed to respond to the features encoded in the different channels irrespective of their spatial location. For that purpose, the spatial information H×W may be aggregated 522 across the channels C in order to remove the spatial information and to allow the classifier to make its decisions based on the feature representation encoded in the spatially aggregated channels F^c 524 with size 1×1×C, which may elsewhere also be referred to as a content information-specific feature map. The spatial aggregation of F^input at channel k may be formulated as: F^c_k = Σ_{i=1}^{H} Σ_{j=1}^{W} F^input_{ijk}.
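As a minimal illustration, the spatial aggregation above may be sketched in NumPy as follows (the function name is illustrative, not from the source):

```python
import numpy as np

def spatial_aggregation(f_input: np.ndarray) -> np.ndarray:
    """Sum an H x W x C feature map over its spatial dimensions,
    yielding the 1 x 1 x C content information-specific map F^c."""
    h, w, c = f_input.shape
    return f_input.sum(axis=(0, 1)).reshape(1, 1, c)

# Example: aggregating an 8 x 8 map with 16 channels keeps only
# per-channel evidence, with no record of where it occurred.
f_in = np.random.rand(8, 8, 16)
f_c = spatial_aggregation(f_in)
print(f_c.shape)  # (1, 1, 16)
```

Because all spatial positions are summed, the classifier downstream of F^c cannot base its decision on where a feature occurred, only on how strongly it occurred.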
[0163] Outlier detection for content marginal (ODC): Given that the CSC 520 may classify an input sample into one of the object classes, it may be desirable to provide an outlier detector to identify outliers of which the content lies outside of the marginal distribution. It may be desired to detect outliers which are unseen during training while only using the inlier samples from the training data. The following explains how potential outliers may be generated and how the outlier detector may be trained in a self-supervised manner. Namely, hard negative examples of outliers may be generated by augmenting the intermediate feature maps F^input 540 before the spatial aggregation 522. Here, the samples with the entire content present may be considered as inliers, whereas the samples with absent, incomplete and/or mismatched content may be considered as outliers. Such outliers may be generated in the feature map F^input by removing a part of the information or by blending the content of samples from one class with the content of samples from another class. For example, the blending may be accomplished by replacing a patch of size h×w×C from the feature map F^input(class1) with a patch of the same size from the feature map of a different class F^input(not class1), where h<H and w<W. Information may for example be removed by setting all values in the patch to 0. Such self-generated outliers in the feature space may be referred to as pseudo outliers F^pseudo. This set of outliers may be generated in every training iteration and the outlier detector may be trained on both the inlier and pseudo outlier feature maps. Note that the CSC may be trained solely on the valid inlier training data, and that the pseudo outliers may be used only to train the outlier detector (ODC).
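A sketch of the ODC pseudo outlier generation described above, assuming NumPy feature maps and a patch placed at the same position in both maps (all names are illustrative):

```python
import numpy as np

def make_pseudo_outlier(f_inlier, f_other_class, ph, pw, rng, blend=True):
    """Create a pseudo outlier F^pseudo from an inlier feature map by
    replacing a ph x pw x C patch either with the corresponding patch
    from another class's feature map (blending) or with zeros (removal)."""
    h, w, _ = f_inlier.shape
    i = int(rng.integers(0, h - ph + 1))  # random top-left corner
    j = int(rng.integers(0, w - pw + 1))
    f_pseudo = f_inlier.copy()
    if blend:
        f_pseudo[i:i+ph, j:j+pw, :] = f_other_class[i:i+ph, j:j+pw, :]
    else:
        f_pseudo[i:i+ph, j:j+pw, :] = 0.0
    return f_pseudo
```

In each training iteration, such pseudo outliers would be labeled as outliers and mixed with the unmodified inlier feature maps to train the detector.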
[0164] Location-sensitive classifier (LSC, 510): This classifier's decisions may be based on the spatial locations of the object's parts/elements but not on their content. Consider the input feature map F^input 540 of the LSC with dimensions H×W×C. The spatial resolution H×W may contain spatial information and the channels C may encode feature representations. It may be desired to capture only the spatial information and thereby possibly discard the feature representations. Similar to the spatial aggregation in the CSC branch 520, feature aggregation 512 may be applied to the intermediate feature map F^input 540 to integrate out the content information, resulting in the feature map F^l 514 with dimensions H×W×1. The feature aggregation at every location i,j in F^input may be formulated as: F^l_{ij} = Σ_{m=1}^{C} F^input_{ijm}, for i,j ∈ H×W. This feature aggregation may weaken the content representation but may not affect the spatial information. The classifier branch with this feature aggregation as a component (branch 510 in
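The complementary feature aggregation above may be sketched as follows (illustrative NumPy, not the source implementation):

```python
import numpy as np

def feature_aggregation(f_input: np.ndarray) -> np.ndarray:
    """Sum an H x W x C feature map over its channels, yielding the
    H x W x 1 location information-specific map F^l: spatial positions
    are preserved, channel-wise content is integrated out."""
    return f_input.sum(axis=2, keepdims=True)
```

Note the symmetry with the CSC branch: the CSC sums over H×W and keeps C, while the LSC sums over C and keeps H×W.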
[0165] Outlier detection for location marginal (ODL): This outlier detector may categorize the samples with unknown or incomplete spatial locations of object elements as outliers, as they may not correspond to the known location marginal distribution. Similar to the ODC, the outlier detection may be trained in a self-supervised manner. For example, pseudo outlier samples may be generated from the aggregated feature map F^l 514. As mentioned above, potential outliers are the samples with an unknown or incomplete spatial arrangement. Such outlier samples may be generated from the feature map F^l either by removing a part of the information or by randomly shuffling locations in the feature map. It was found that the ODL may not need to be implemented as a separate outlier detector but that the outlier detection may be implemented as an additional class in the classification branch 510. It is hypothesized that learning the spatial arrangement may be ‘easier’ than learning distinct feature representations, and thus that the former may be included as an additional class in the classification branch, which is shown in the corresponding figure. The loss for a pseudo outlier sample may be formulated as: L_pseudo outlier = −log(P_outliers) + λ log(P_class). Here, P_outliers and P_class are the outlier and inlier class probabilities. In a specific example, λ may be set to 0.05. Considering outliers as an additional class and computing the above loss may improve the outlier detection. This may not hold true for the other outlier detectors, as initial empirical evidence appears to show.
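Under the loss stated above, a pseudo outlier sample could be scored as in the sketch below. Treating P_class as the total probability mass assigned to the inlier classes is an interpretation made here for illustration, not a detail stated in the text:

```python
import numpy as np

def pseudo_outlier_loss(probs, outlier_idx, lam=0.05):
    """L_pseudo_outlier = -log(P_outliers) + lam * log(P_class) for a
    pseudo outlier sample, where the outliers form an extra class of the
    classification branch. probs is the softmax output over all classes
    including the outlier class; P_class is assumed to be the summed
    probability of the inlier classes."""
    p_outlier = probs[outlier_idx]
    p_class = probs.sum() - p_outlier
    return -np.log(p_outlier) + lam * np.log(p_class)
```

The first term drives the network to assign pseudo outliers to the outlier class; the small λ-weighted second term is the stated regularization on the inlier class probability.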
[0166] Outlier detection for joint marginal distribution (ODJ): This outlier detector may treat the samples that share similar attributes with the training data as inliers and the rest as outliers. Similar to the ODC, the pseudo outliers may be generated from the intermediate feature map 540 with dimensions H×W×C. In the ODC, the pseudo outliers may be subject to spatial aggregation 522, which may result in the loss of spatial information. On the other hand, the ODL may simulate pseudo outliers from the aggregated feature map and may have no cues for the content information. Unlike these two cases, the ODJ may be desired to be sensitive to both the content and location marginals. Hence, the entire feature map 540 of dimension H×W×C may, after flattening 532, be used as input for outlier detection so that both the content and spatial information persist. The augmentation strategy may be similar to the ODC, where a part of the information may be removed or information may be blended between inter-class features. In addition to the above strategies, any two locations with patch size h×w×C may also be swapped within the same feature map. This additional strategy may allow the outlier detector to be sensitive to changes in either of the two marginals.
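The within-map patch swap described above may be sketched as follows (illustrative NumPy; the caller is assumed to pick non-overlapping patch locations):

```python
import numpy as np

def swap_patches(f_input, loc1, loc2, ph, pw):
    """Swap two non-overlapping ph x pw x C patches within the same
    feature map: the content itself stays valid, but the spatial
    arrangement of that content becomes inconsistent, producing a
    pseudo outlier for the joint marginal."""
    (i1, j1), (i2, j2) = loc1, loc2
    f = f_input.copy()
    patch1 = f[i1:i1+ph, j1:j1+pw, :].copy()
    f[i1:i1+ph, j1:j1+pw, :] = f[i2:i2+ph, j2:j2+pw, :]
    f[i2:i2+ph, j2:j2+pw, :] = patch1
    return f
```

Because the swapped map contains exactly the same channel content as the original, only a detector that sees both content and location (the flattened H×W×C input) can flag it.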
[0167]
[0168] With respect to the training, it is noted that the inlier examples from the training data and any generated pseudo outliers may be used to train the outlier detectors. All three outlier detectors and the two classifiers may be trained independently of each other, but may be trained in a same training session, e.g., in same or separate iterations of the training session. The following describes specific examples of training parameters and architecture/network parameters, which are merely exemplary and entirely non-limiting.
[0169] In a specific example, both the CSC and ODJ may be trained for 100 epochs starting with a learning rate of 0.001, with the learning rate being dropped by a factor of 0.1 after 80 epochs. It was found that the LSC may be trained for a smaller number of epochs than the other branches. In a specific example, it was sufficient to train for only 25 epochs with a learning rate of 0.001. In a specific example, a batch size of 128 and the Adam optimizer with no weight decay may be used. In a specific example, the network weights may be initialized with Xavier initialization. In a specific example, the mean square error loss and the tanh activation function may be used for outlier detection in both the CSC and ODJ. Here, the labels may also be flipped with a probability of 0.1 in every training batch, flipping the label of inlier samples to outliers and of self-generated outlier samples to inliers, to avoid the network overfitting on the inlier samples.
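The label flipping mentioned here may be sketched as follows (illustrative, assuming 0/1 encoding for inlier/outlier labels):

```python
import numpy as np

def flip_labels(labels, p=0.1, rng=None):
    """Flip binary inlier(0)/outlier(1) labels with probability p per
    sample, so that the outlier detector does not overfit on the
    inlier samples."""
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(labels.shape) < p  # per-sample flip decision
    return np.where(mask, 1 - labels, labels)
```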
[0170] In a specific example, for the CSC, training may be started with only the classification loss until epoch 8, later also including the loss for outlier detection. Such a setting may pretrain the weights and stabilize the training for outlier detection. A classifier similar to the CSC may be used in the ODJ branch to pretrain the weights until epoch 8, after which both the classification loss and the loss for outlier detection may be used to stabilize the rest of the training. The classifier including the outlier class in the LSC may be trained from scratch.
[0171] In a specific example, the architecture details of the LSC may be as follows: input1c-conv16c-conv32c-conv64c-FeatureAggregation1c-conv1c-fc4. The kernel sizes of the convolutional layers may be [5, 5, 3, 3] respectively. Here, input1c, conv16c, and fc4 refer to an input with 1 channel, a convolutional layer with 16 feature maps, and a classifier with 4 classes (3 object classes + 1 outlier class), respectively. In a specific example, the base architecture details of the CSC may be as follows: input1c-conv16c-conv32c-conv64c-conv16c-SpatialAggregation16c-fc512-fc128. The classification and outlier heads are as follows: fc128-fc3 and fc128-fc1. Here, fc3 represents a classifier for 3 object classes and fc1 refers to the outlier detection neuron. In a specific example, the architecture details of the ODJ may be as follows: input1c-conv16c-conv32c-conv64c-conv4c-reshape(4×16×16)-fc512-fc128-fc1. The classification head used to stabilize the training of the ODJ may be as follows: fc128-fc3.
[0172] In a specific example, for the generation of outlier samples in all three outlier branches during training, a patch size of either 3×3 or 5×5 may be chosen to remove or mix information from another class, or to swap locations within the feature map, as discussed elsewhere in this specification. In a specific example, instead of choosing the entire patch in a random manner, the center of the patch may be selected based on the highest activation location in the feature map. The feature map of size H×W×C may be aggregated along the channels to obtain a feature map of size H×W, which may then be normalized to [0, 1] so that it can be treated as a probability map from which a location is picked with probability p. In the CSC and ODJ, the aggregation of channels may be performed only to pick a location with high activations and need not be treated as an input to the network, whereas in the LSC the feature map is already aggregated. During the course of training, this scheme may enable the highest activation locations to be modified, which would potentially result in outliers.
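The activation-based patch-center selection may be sketched as follows (illustrative NumPy; min-shifting the map before normalizing it into a probability distribution is an assumption about how the values are brought into [0, 1]):

```python
import numpy as np

def sample_patch_center(f_input, rng):
    """Aggregate an H x W x C feature map along the channels, normalize
    the resulting H x W map, and treat it as a probability map from
    which a patch-center location is sampled, so that highly activated
    locations are modified more often."""
    act = f_input.sum(axis=2)           # H x W activation map
    act = act - act.min()               # shift into [0, inf)
    total = act.sum()
    if total == 0:                      # constant map: fall back to uniform
        probs = np.full(act.size, 1.0 / act.size)
    else:
        probs = (act / total).ravel()   # values in [0, 1], summing to 1
    flat = rng.choice(act.size, p=probs)
    return np.unravel_index(flat, act.shape)
```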
[0173] Experiments show that the measures described in this specification may allow outliers in the content marginal, location marginal and joint marginal distributions to be detected with high accuracy by training the machine learnable model (substantially) only on the in-distribution samples from the training data. The framework comprising the outlier detectors along with the classifiers may provide explicit cues for interpretable decision-making on unseen outlier samples. The following table illustrates these explicit cues for a few examples. As shown in this table, it is possible to interpret the type of outlier sample from the output of the outlier detectors and the class decision from the classifiers. When using the machine learned model on new input data, the system and/or method may provide the final inference as output or use the final inference in its final decision-making.
TABLE-US-00002
  Input example  ODC          ODL          ODJ          CSC      LSC      Final inference
  Ex. 1          Not outlier  Not outlier  Not outlier  Class A  Class A  In-distribution
  Ex. 2          Not outlier  Not outlier  Outlier      Class B  Class A  Joint-out-distribution
  Ex. 3          Outlier      Not outlier  Outlier      Class A  Class A  Outlier in content, inlier in location
[0174]
[0175] The system 700 may further comprise a processor subsystem 760 which may be configured to, during operation of the system 700, apply the convolutional part of the machine learned model to the input data to generate one or more feature maps, generate a content information-specific feature map by removing location information from one of the one or more feature maps, and apply the content classification part to the content information-specific feature map to obtain a content-based object classification result. The processor subsystem 760 may be further configured to generate a location information-specific feature map by removing content information from one of the one or more feature maps, and apply the location classification part to the location information-specific feature map to obtain a location-based object classification result. The processor subsystem 760 may be further configured to apply the outlier detection part to one or more previously generated feature maps which are generated for the instance of the spatial data, to obtain an outlier detection result, and to classify the object in the spatial data in accordance with the content-based object classification result, the location-based object classification result and the outlier detection result, wherein said classifying comprises classifying the input data in accordance with an object class if the content-based object classification result and the location-based object classification result both indicate the object class and if the outlier detection result does not indicate a presence of an outlier.
[0176] In general, the processor subsystem 760 may be configured to perform any of the functions as previously described with reference to
[0177]
[0178] In some embodiments of the present invention, the system 700 may comprise an actuator interface 740 for providing control data 742 to an actuator 40 in the environment 60. Such control data 742 may be generated by the processor subsystem 760 to control the actuator 40 based on the classification result, as may be generated by the machine learned model when applied to the input data 722. For example, the actuator 40 may be an electric, hydraulic, pneumatic, thermal, magnetic and/or mechanical actuator. Specific yet non-limiting examples include electrical motors, electroactive polymers, hydraulic cylinders, piezoelectric actuators, pneumatic actuators, servomechanisms, solenoids, stepper motors, etc. Such type of control is described with reference to
[0179] In other embodiments of the present invention (not shown in
[0180] In general, each system described herein, including but not limited to the system 100 of
[0181]
[0182]
[0183] The method 800 is shown to comprise, in a step titled “ACCESSING MACHINE LEARNED MODEL”, accessing 810 a machine learned model as described elsewhere in this specification and in a step titled “ACCESSING INPUT DATA”, accessing 820 input data, the input data comprising an instance of spatial data, the instance of the spatial data comprising an object to be classified. The method 800 is further shown to comprise, in a step titled “GENERATING FEATURE MAP(S)”, applying the convolutional part of the machine learned model to the input data to generate 830 one or more feature maps, and in a step titled “GENERATING CONTENT INFORMATION-SPECIFIC FEATURE”, generating 840 a content information-specific feature map by removing location information from one of the one or more feature maps. The method 800 is further shown to comprise, in a step titled “GENERATING LOCATION INFORMATION-SPECIFIC FEATURE”, generating 850 a location information-specific feature map by removing content information from one of the one or more feature maps, and in a step titled “GENERATING CONTENT-BASED OBJECT CLASSIFICATION RESULT”, applying 860 the content classification part to the content information-specific feature map to obtain a content-based object classification result. The method 800 is further shown to comprise, in a step titled “GENERATING LOCATION-BASED OBJECT CLASSIFICATION RESULT”, applying 870 the location classification part to the location information-specific feature map to obtain a location-based object classification result, and in a step titled “GENERATING OUTLIER DETECTION RESULT”, applying 880 the outlier detection part to one or more previously generated feature maps which are generated for the instance of the spatial data, to obtain an outlier detection result. 
The method 800 is further shown to comprise, in a step titled “CLASSIFYING OBJECT IN SPATIAL DATA”, classifying 890 the object in the spatial data in accordance with the content-based object classification result, the location-based object classification result and the outlier detection result, wherein said classifying comprises classifying the input data in accordance with an object class if the content-based object classification result and the location-based object classification result both indicate the object class and if the outlier detection result does not indicate a presence of an outlier.
[0184] It will be appreciated that, in general, the operations or steps of the computer-implemented methods 200 and 800 of respectively
[0185] Each method, algorithm or pseudo-code described in this specification may be implemented on a computer as a computer implemented method, as dedicated hardware, or as a combination of both. As also illustrated in
[0186] Examples, embodiments or optional features, whether indicated as non-limiting or not, are not to be understood as limiting the present invention.
[0187] It is noted that a system and method may be provided for classifying objects in spatial data using a machine learned model, as well as a system and method for training the machine learned model. The machine learned model may comprise a content sensitive classifier, a location sensitive classifier and at least one outlier detector. Both classifiers may jointly distinguish between objects in spatial data being in-distribution or marginal-out-of-distribution. The outlier detection part may be trained on inlier examples from the training data, while the presence of actual outliers in the input data of the machine learnable model may be mimicked in the feature space of the machine learnable model during training. The combination of these parts may provide a more robust classification of objects in spatial data with respect to outliers, without having to increase the size of the training data.
[0188] It should be noted that the above-mentioned embodiments illustrate rather than limit the present invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the present invention. Use of the verb “comprise” and its conjugations does not exclude the presence of elements or stages other than those stated. The article “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. Expressions such as “at least one of” when preceding a list or group of elements represent a selection of all or of any subset of elements from the list or group. For example, the expression, “at least one of A, B, and C” should be understood as including only A, only B, only C, both A and B, both A and C, both B and C, or all of A, B, and C. The present invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are described separately does not indicate that a combination of these measures cannot be used to advantage.