OBJECT CLASSIFICATION WITH CONTENT AND LOCATION SENSITIVE CLASSIFIERS
20210357750 · 2021-11-18
Inventors
- Chaithanya Kumar Mummadi (Heimsheim, DE)
- Anna Khoreva (Stuttgart, DE)
- Kaspar Sakmann (Wien, AT)
- Kilian Rambach (Stuttgart, DE)
- Piyapat Saranrittichai (Korntal-Muenchingen, DE)
- Volker Fischer (Renningen, DE)
CPC classification
G06V10/454
PHYSICS
G06F18/254
PHYSICS
G06F18/241
PHYSICS
International classification
Abstract
A system and method are provided for classifying objects in spatial data using a machine learned model, as well as a system and method for training the machine learned model. The machine learned model may comprise a content sensitive classifier, a location sensitive classifier and at least one outlier detector. Both classifiers may jointly distinguish between objects in spatial data being in-distribution or marginal-out-of-distribution. The outlier detection part may be trained on inlier examples from the training data, while the presence of actual outliers in the input data of the machine learnable model may be mimicked in the feature space of the machine learnable model during training. The combination of these parts may provide a more robust classification of objects in spatial data with respect to outliers, without having to increase the size of the training data.
Claims
1. A computer-implemented method for training a machine learnable model for classification of objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the method comprising the following steps: accessing training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes; providing the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part; generating, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and training the content classification part on the content information-specific feature map; generating a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and training the location classification part on the location information-specific feature map; providing, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data; and generating, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, wherein modifying the one or more previously generated feature maps
includes at least one of: removing feature information from the previously generated feature maps, pseudo-randomly shuffling locations of the feature information in the previously generated feature maps; mixing the feature information between feature maps of different object classes; swapping the feature information at different locations in the previously generated feature maps, and training the outlier detection part on the pseudo outlier feature map.
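The four feature-map modifications recited in claim 1 can be illustrated by the following non-limiting NumPy sketch. It assumes H×W×C feature maps; the function names, the 50% masking/mixing rates, and the single-pair swap are illustrative choices only, not part of the claim:

```python
import numpy as np

rng = np.random.default_rng(0)

def remove_information(fmap, drop_prob=0.5):
    """Mimic an outlier by zeroing out randomly chosen entries of a feature map."""
    mask = rng.random(fmap.shape) >= drop_prob
    return fmap * mask

def shuffle_locations(fmap):
    """Pseudo-randomly permute the spatial positions (H, W) of the feature vectors."""
    h, w, c = fmap.shape
    flat = fmap.reshape(h * w, c)
    return flat[rng.permutation(h * w)].reshape(h, w, c)

def mix_between_classes(fmap_a, fmap_b):
    """Mix feature information between feature maps of two different object classes:
    at each spatial location, take the feature vector from one map or the other."""
    mask = rng.random(fmap_a.shape[:2]) < 0.5
    return np.where(mask[..., None], fmap_a, fmap_b)

def swap_locations(fmap):
    """Swap the feature information at two randomly chosen spatial locations."""
    out = fmap.copy()
    h, w, _ = fmap.shape
    (y1, x1), (y2, x2) = rng.integers(0, [h, w], size=(2, 2))
    out[y1, x1], out[y2, x2] = fmap[y2, x2].copy(), fmap[y1, x1].copy()
    return out
```

Each transform preserves the H×W×C shape, so the resulting pseudo outlier feature map can be fed to the same outlier detection part as an unmodified feature map.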
2. The method according to claim 1, wherein the machine learnable model includes a location-and-content outlier detection part, and wherein the method further comprises generating the pseudo outlier feature map for the location-and-content outlier detection part by modifying the feature information which is contained in the one or more previously generated feature maps and which is associated with both the location information and the content information.
3. The method according to claim 1, wherein the machine learnable model includes a location outlier detection part, and wherein the method further comprises generating the pseudo outlier feature map for the location outlier detection part by modifying the feature information which is contained in the one or more previously generated feature maps and which is associated with the location information.
4. The method according to claim 3, wherein the location outlier detection part is implemented by the location classification part by providing the pseudo outlier feature map to the location classification part as part of a separate outlier object class to be learned.
5. The method according to claim 1, wherein the machine learnable model includes a content outlier detection part, and wherein the method further comprises generating the pseudo outlier feature map for the content outlier detection part by modifying the feature information which is contained in the one or more previously generated feature maps and which is associated with the content information.
6. The method according to claim 1, wherein each of the one or more feature maps generated by the convolutional part has at least two spatial dimensions associated with the location information and wherein feature values of the one or more feature maps at each respective spatial coordinate together form a feature vector representing content information at the respective spatial coordinate, and wherein: the removing of the location information from the one or more feature maps includes aggregating the one or more feature maps over the spatial dimensions to form a content information-specific feature map comprising one feature vector; the removing of the content information from the one or more feature maps includes aggregating the feature values per spatial coordinate over the one or more feature maps to form the location information-specific feature map having at least two spatial dimensions and one feature value channel.
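The two aggregations recited in claim 6 may, for example, be realized by mean pooling, as in this minimal NumPy sketch. Mean pooling is one possible aggregation operator; the claim does not prescribe a particular one:

```python
import numpy as np

def content_feature(fmaps):
    """Remove location information: aggregate over the H and W spatial axes,
    yielding a single C-dimensional feature vector (a 1x1xC map)."""
    return fmaps.mean(axis=(0, 1))             # shape (C,)

def location_feature(fmaps):
    """Remove content information: aggregate over the channel axis per spatial
    coordinate, yielding an HxWx1 map of spatial activation."""
    return fmaps.mean(axis=-1, keepdims=True)  # shape (H, W, 1)
```

The content classification part would then operate on the 1×1×C vector and the location classification part on the H×W×1 map, matching reference numerals 524 and 514.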
7. The method according to claim 1, wherein the machine learnable model is a deep neural network, wherein the convolutional part is a convolutional part of the deep neural network and wherein the content classification part and the location classification part are respective classification heads of the deep neural network.
8. A computer-implemented method for classifying objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the method comprising the following steps: accessing a machine learned model, wherein the machine learned model is a machine learnable model trained by: accessing training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes, providing the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part, generating, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and training the content classification part on the content information-specific feature map, generating a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and training the location classification part on the location information-specific feature map, providing, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data, and generating, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, wherein
modifying the one or more previously generated feature maps includes at least one of: removing feature information from the previously generated feature maps, pseudo-randomly shuffling locations of the feature information in the previously generated feature maps, mixing the feature information between feature maps of different object classes, swapping the feature information at different locations in the previously generated feature maps, and training the outlier detection part on the pseudo outlier feature map; accessing first input data, the first input data including an instance of spatial data, the instance of the spatial data including an object to be classified; applying the convolutional part of the machine learned model to the first input data to generate one or more first feature maps; generating a first content information-specific feature map by removing location information from one of the one or more first feature maps, and applying the content classification part to the first content information-specific feature map to obtain a content-based object classification result; generating a first location information-specific feature map by removing content information from one of the one or more first feature maps, and applying the location classification part to the first location information-specific feature map to obtain a location-based object classification result; applying the outlier detection part to one or more previously generated first feature maps which are generated for the instance of the spatial data, to obtain an outlier detection result; and classifying the object in the spatial data in accordance with the content-based object classification result, the location-based object classification result and the outlier detection result, wherein the classifying includes classifying the first input data in accordance with an object class when the content-based object classification result and the location-based object classification result both 
indicate the object class and when the outlier detection result does not indicate a presence of an outlier.
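The final classifying step of claim 8 amounts to a conjunction: an object class is accepted only when both marginal-sensitive classifiers agree on it and the outlier detection result flags no outlier. A minimal sketch (the string labels and the single aggregated outlier flag are illustrative simplifications):

```python
def classify(content_pred, location_pred, outlier_detected):
    """Accept a class only when the content-based and location-based
    classification results agree and no outlier is flagged; otherwise
    report the input as an outlier."""
    if not outlier_detected and content_pred == location_pred:
        return content_pred
    return "outlier"
```

For example, a face-like arrangement of fruits would yield disagreeing predictions (content "fruit", location "face") and thus be reported as an outlier rather than force-classified.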
9. The method according to claim 8, wherein the training of the machine learnable model is performed before using the machine learned model to classify the objects in the spatial data.
10. A non-transitory computer-readable medium on which is stored a computer program for training a machine learnable model for classification of objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the computer program, when executed by a computer, causing the computer to perform the following steps: accessing training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes; providing the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part; generating, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and training the content classification part on the content information-specific feature map; generating a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and training the location classification part on the location information-specific feature map; providing, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data; and generating, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the
input data of the machine learnable model, wherein modifying the one or more previously generated feature maps includes at least one of: removing feature information from the previously generated feature maps, pseudo-randomly shuffling locations of the feature information in the previously generated feature maps; mixing the feature information between feature maps of different object classes; swapping the feature information at different locations in the previously generated feature maps, and training the outlier detection part on the pseudo outlier feature map.
11. A system for training a machine learnable model for classification of objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the system comprising: an input interface configured to access training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes; a processor subsystem configured to: provide the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part; generate, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and train the content classification part on the content information-specific feature map; generate a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and train the location classification part on the location information-specific feature map; provide, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data; and generate, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, wherein modifying the one or more previously generated feature
maps comprises at least one of: removing feature information from said feature maps; pseudo-randomly shuffling locations of feature information in said feature maps; mixing feature information between feature maps of different object classes; swapping feature information at different locations in said feature maps, and train the outlier detection part on the pseudo outlier feature map; and an output interface configured to output machine learned model data representing the machine learnable model after training.
12. A system for classifying objects in spatial data, wherein the objects are classifiable into different object classes by combining content information and location information contained in the spatial data, the system comprising: an input interface for accessing first input data, the first input data including an instance of spatial data, the instance of the spatial data including an object to be classified; a processor subsystem configured to: access a machine learned model, wherein the machine learned model is a machine learnable model trained by: accessing training data, the training data including instances of spatial data, the instances of the spatial data including objects belonging to different object classes, providing the machine learnable model, wherein the machine learnable model includes a convolutional part for generating one or more feature maps from an instance of spatial data, and a content classification part and a location classification part, generating, as part of the training of the machine learnable model, a content information-specific feature map by removing location information from the one or more feature maps, wherein the location information characterizes spatial arrangements of parts of the objects, and training the content classification part on the content information-specific feature map, generating a location information-specific feature map by removing content information from the one or more feature maps, wherein the content information characterizes presence of parts composing the objects, and training the location classification part on the location information-specific feature map, providing, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data, and generating, as part of the training of the machine learnable model, a pseudo outlier feature map by modifying one or more previously
generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, wherein modifying the one or more previously generated feature maps includes at least one of: removing feature information from the previously generated feature maps, pseudo-randomly shuffling locations of the feature information in the previously generated feature maps, mixing the feature information between feature maps of different object classes, swapping the feature information at different locations in the previously generated feature maps, and training the outlier detection part on the pseudo outlier feature map; apply the convolutional part of the machine learned model to the first input data to generate one or more first feature maps; generate a first content information-specific feature map by removing location information from one of the one or more first feature maps, and apply the content classification part to the first content information-specific feature map to obtain a content-based object classification result; generate a first location information-specific feature map by removing content information from one of the one or more first feature maps, and apply the location classification part to the first location information-specific feature map to obtain a location-based object classification result; apply the outlier detection part to one or more previously generated first feature maps which are generated for the instance of the spatial data, to obtain an outlier detection result; classify the object in the spatial data in accordance with the content-based object classification result, the location-based object classification result and the outlier detection result, wherein the classifying includes classifying the first input data in accordance with an object class when the content-based object classification result and the location-based object classification result both 
indicate the object class and when the outlier detection result does not indicate a presence of an outlier.
13. The system according to claim 12, wherein the input interface is a sensor interface to a sensor, wherein the sensor is configured to acquire the spatial data.
14. The system according to claim 12, wherein the system is a control system configured to adjust a control parameter based on the classification of the object.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] These and other aspects of the present invention will be apparent from and elucidated further with reference to the embodiments described by way of example in the following description and with reference to the figures.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0054] It should be noted that the figures are purely diagrammatic and not drawn to scale. In the figures, elements which correspond to elements already described may have the same reference numerals.
LIST OF REFERENCE NUMBERS
[0055] The following list of reference numbers is provided for facilitating the interpretation of the drawings and shall not be construed as limiting the present invention.
[0056] 20 sensor
[0057] 22 camera
[0058] 40 actuator
[0059] 42 electric motor
[0060] 60 physical environment
[0061] 80 (semi-)autonomous vehicle
[0062] 100 system for training machine learnable model
[0063] 160 processor subsystem
[0064] 180 data storage interface
[0065] 190 data storage
[0066] 192 training data
[0067] 194 data representation of machine learnable model
[0068] 196 data representation of machine learned model
[0069] 200 method of training machine learnable model
[0070] 210 accessing training data for training
[0071] 220 providing machine learnable model
[0072] 225 providing outlier detection part
[0073] 230 training the machine learnable model
[0074] 240 generating content information-specific feature map
[0075] 245 training content classification part
[0076] 250 generating location information-specific feature map
[0077] 255 training location classification part
[0078] 260 generating pseudo outlier feature map
[0079] 265 training outlier detection part
[0080] 212 accessing input data for inference
[0081] 220 training machine learnable model
[0082] 222 using machine learned model for inference
[0083] 230 providing state memory
[0084] 240 extracting previous internal state information
[0085] 250 updating state memory with current internal state
[0086] 300 object (person) classifiable by content and location information
[0087] 310 violations of content and location information
[0088] 320 fruits in facial arrangement of person
[0089] 330 elements of face with locations permuted
[0090] 340 randomly shuffled facial elements
[0091] 400 location (spatial arrangement) marginal
[0092] 410 content marginal
[0093] 420 in-distribution sample
[0094] 430 joint-out-of-distribution sample
[0095] 440 marginal-out-of-distribution sample
[0096] 450 full-out-of-distribution sample
[0097] 500 input instance
[0098] 510 location sensitive classifier
[0099] 512 feature aggregation
[0100] 514 H×W×1 feature map
[0101] 516 N+1 class classification
[0102] 520 content sensitive classifier
[0103] 522 spatial aggregation
[0104] 524 1×1×C feature map
[0105] 526 N-class classification
[0106] 528 outlier detection for content marginal distribution
[0107] 530 location-and-content outlier detection part
[0108] 532 flatten
[0109] 534 H×W×C feature map
[0110] 536 outlier detection for joint marginal distribution
[0111] 540 intermediate feature maps
[0112] 600 intermediate feature maps of inliers
[0113] 610 class A
[0114] 620 class B
[0115] 650 intermediate feature maps of pseudo outliers
[0116] 660 feature map generated by removal of information
[0117] 670 feature map generated by random location shuffle
[0118] 680 feature map generated by content-mixing between classes
[0119] 690 feature map generated by location swapping
[0120] 700 system for control or monitoring using machine learned model
[0121] 720 sensor data interface
[0122] 722 sensor data
[0123] 740 actuator interface
[0124] 742 control data
[0125] 760 processor subsystem
[0126] 780 data storage interface
[0127] 790 data storage
[0128] 800 method for classifying objects in spatial data
[0129] 810 accessing machine learned model
[0130] 820 accessing input data
[0131] 830 generating feature map(s)
[0132] 840 generating content information-specific feature map
[0133] 850 generating location information-specific feature map
[0134] 860 generating content-based object classification result
[0135] 870 generating location-based object classification result
[0136] 880 generating outlier detection result
[0137] 890 classifying object in spatial data
[0138] 900 computer-readable medium
[0139] 910 non-transitory data
[0140] The following provides with reference to
[0142] As shown in
[0143] In some embodiments of the present invention, the data storage 190 may further comprise a data representation 194 of an untrained version of the machine learnable model which may be accessed by the system 100 from the data storage 190. It will be appreciated, however, that the training data 192 and the data representation 194 of the machine learnable model may also each be accessed from a different data storage, e.g., via a different subsystem of the data storage interface 180. Each subsystem may be of a type as is described above for the data storage interface 180. In other embodiments, the data representation 194 of the untrained version of the machine learnable model may be internally generated by the system 100, for example on the basis of design and/or architectural parameters for the machine learnable model, and therefore may not be explicitly stored in the data storage 190.
[0144] The system 100 may further comprise a processor subsystem 160 which may be configured to, during operation of the system 100 and as part of the training of the machine learnable model, generate a content information-specific feature map by removing location information from the one or more feature maps and train the content classification part on the content information-specific feature map; generate a location information-specific feature map by removing content information from the one or more feature maps and train the location classification part on the location information-specific feature map; provide, as part of the machine learnable model, at least one outlier detection part for detecting outliers in input data of the machine learnable model which do not fit a distribution of the training data; and as part of the training of the machine learnable model, generate a pseudo outlier feature map by modifying one or more previously generated feature maps which are generated for the instance of the spatial data, to mimic a presence of an actual outlier in the input data of the machine learnable model, and train the outlier detection part on the pseudo outlier feature map. It will be appreciated that these aspects of the operation of the system 100 will be further explained with reference to
[0145] The system 100 may further comprise an output interface for outputting a data representation 196 of the trained machine learnable model, this model also being referred to as a machine ‘learned’ model and its data also being referred to as trained model data 196. For example, as also illustrated in
[0146]
[0147] The following describes the classification of objects in spatial data with the example of images. It will be appreciated, however, that the measures described below may also be applied to other types of spatial data which may not directly be considered images.
[0148]
[0149] A standard classification network may be biased towards the spatial locations of elements and thus may also classify the second image 320, showing fruits placed in the facial arrangement of the first image 300, as a face. Alternatively, a standard classification network may be biased towards the elements themselves, e.g., the content, and thus classify the third image 330 with shuffled facial elements as a face. However, for a human observer, these images do not present a challenge. Namely, while humans would, to a certain degree, recognize all three images as faces, they would recognize that only the first image 300 shows a true face and the other images show only certain facial attributes.
[0150] The machine learnable/learned model as described in this specification represents a classification framework which may be empowered to mimic a human response. Namely, the classification framework may on the one hand recognize true faces and on the other hand may also detect an outlier face and provide an explanation of its nature, e.g., by containing true facial elements but in false locations. This ability to detect and interpret outliers while performing classification may be of great importance in many applications and for example may be used for active labeling to improve network generalization or for quality-control in manufacturing industries. In particular, the described classification framework may learn the feature representation to be specifically sensitive towards the content and location factors. Besides the standard classification, this may provide additional explanation about image ambiguities in terms of content and location.
[0151]
[0152] In this example, the two classes (faces and fruits) require the content and the location each to fall within a select marginal distribution, with these marginal distributions then jointly defining the object class. The marginal distributions are schematically indicated along the respective axes as graphs, with the graph 400 defining two marginal distributions in terms of location (spatial arrangement), namely a face-like arrangement and a vertical stack arrangement, and with the graph 410 defining two marginal distributions in terms of content, namely face features (e.g., nose, ears, eyes) and fruit features (e.g., banana, apple, cherry). Throughout this specification, the samples for which both factors fall within the respective marginal distributions and which then jointly define an object class will be referred to as ‘in-distribution’ samples, see reference numeral 420. These inliers belong to the two possible object classes: faces, in which the content information falls within the ‘face feature’ marginal distribution and the location information falls within the ‘face-like arrangement’ marginal distribution, and fruits, in which the content information falls within the ‘fruit feature’ marginal distribution and the location information falls within the ‘vertical stack arrangement’ marginal distribution. All other samples may be referred to as ‘out-of-distribution’ samples or ‘outliers’, see reference numerals 430-450. The out-of-distribution samples as a group may be further segregated into three distinct parts:
[0153] 1. The joint-out-of-distribution samples 430 for which both factors are in their marginal distributions but their association is wrong, e.g., a sample having the content of one class but the location of another class (regions with dashed borders in
[0154] 2. The marginal-out-of-distribution samples 440 for which one factor is out of its marginal distribution and the other is in its marginal distribution (striped regions in
[0155] 3. The full-out-of-distribution samples 450 for which all factors are out of their marginal distributions (dotted regions in
[0156] In general, the structure of the out-of-distribution space may be more complex, e.g., with more factors besides content and location, but for the sake of simplicity, the two-factor case is considered, which leads to the three types of outliers described above.
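The two-factor taxonomy described above may be expressed compactly as a small decision function. This is a non-limiting sketch; the Boolean arguments are illustrative abstractions of the marginal and association checks performed by the classifiers and outlier detectors:

```python
def outlier_type(content_in_marginal, location_in_marginal, association_valid):
    """Map the two-factor marginal checks to the sample types of
    paragraphs [0152]-[0155]."""
    if content_in_marginal and location_in_marginal:
        # Both factors fit their marginals; only the association can be wrong.
        return "in-distribution" if association_valid else "joint-out-of-distribution"
    if content_in_marginal or location_in_marginal:
        # Exactly one factor is outside its marginal distribution.
        return "marginal-out-of-distribution"
    # All (here: both) factors are outside their marginal distributions.
    return "full-out-of-distribution"
```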
[0157] As will also be described with reference to
[0158] However, both classifiers may not be able to distinguish the other two out-of-distribution types, namely the joint-out-of-distribution and full-out-of-distribution samples. To also distinguish the joint-out-of-distribution case, a separate outlier detector may be provided in the classification framework. To train this outlier detector, hard negative examples may be generated in the feature space by mixing information from different samples. This ‘joint’ outlier detector may now detect samples for which both the content and the location information are valid factors for a class, but which are wrongly associated. For example, the face in the third image 330 in
[0159] The classifiers described above, which may also be referred to as ‘marginal-sensitive’ classifiers, may also provide high-confidence classifications even in the full-out-of-distribution case. To address this, a further outlier detector may be added for each of the classifiers. To predict outliers, each outlier detector may be trained in a similar way as the respective classifier, namely by deliberately removing or masking certain information from the internal representation used as input to the respective classifier and by modifying the internal representation to mimic the presence of outliers in the input of the machine learnable model. The output of the outlier detectors may be used to distinguish, in the classification outcome of the marginal-sensitive classifiers, between inliers and the corresponding out-of-marginal outliers. Notably, the machine learnable model incorporating the above-described classification framework may be trained using only in-distribution samples.
[0160] In some examples of the present invention, the machine learnable model may thus comprise at least two classifiers and three outlier detectors which together provide additional information about the content and location factors of an image. More specifically, the output of each of the classifiers and the outlier detectors may be used to differentiate between different types of outliers and to provide explanations for ambiguous cases. Such outputs may for example be logged or output to an operator, or may be used to make subsequent decisions. The following briefly summarizes the three different outlier detectors and their output when applied to the different types of input instances. It can be seen that each type of input instance may be uniquely identified by the combination of outputs of the respective outlier detectors:
TABLE-US-00001
  Input instance                 Outlier detection     Outlier detection      Outlier detection
                                 for content marginal  for location marginal  for joint marginal
  Inlier                         No outlier            No outlier             No outlier
  Outlier in content marginal    Outlier               No outlier             Outlier
  Outlier in location marginal   No outlier            Outlier                Outlier
  Outlier in joint marginal      No outlier            No outlier             Outlier
  Outlier in both marginals      Outlier               Outlier                Outlier
[0161]
[0162] Content-sensitive classifier (CSC, 520): This classifier's decisions may be based upon the content of object elements/parts independent of their spatial location. Let the dimension of the input feature map F^input 540 of the CSC branch 520 be H×W×C, where H, W, and C are the height, width, and number of channels of the feature map. Note that the spatial resolution H×W may capture spatial information and the channels C may encode feature representations, and that such an input feature map of H×W×C may also be considered to represent a C-tuple of H×W×1 feature maps. Upon removing the spatial information from the channels, the content-sensitive classifier may be directed to respond to the features encoded in the different channels irrespective of their spatial location. For that purpose, the spatial information H×W may be aggregated 522 across the channels C in order to remove the spatial information and to allow the classifier to make its decisions based on the feature representation encoded in the spatially aggregated channels F^c 524 with size 1×1×C, which may elsewhere also be referred to as a content information-specific feature map. The spatial aggregation of F^input at channel k may be formulated as: F^c_k = Σ_{i=1}^{H} Σ_{j=1}^{W} F^input_{ijk}.
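As a minimal illustration, the spatial aggregation above may be sketched in NumPy as follows (the function name is illustrative, not from the source):

```python
import numpy as np

def spatial_aggregation(f_input: np.ndarray) -> np.ndarray:
    """Sum an H x W x C feature map over its spatial dimensions,
    yielding the 1 x 1 x C content information-specific map F^c."""
    h, w, c = f_input.shape
    return f_input.sum(axis=(0, 1)).reshape(1, 1, c)

# Example: aggregating an 8 x 8 map with 16 channels keeps only
# per-channel evidence, with no record of where it occurred.
f_in = np.random.rand(8, 8, 16)
f_c = spatial_aggregation(f_in)
print(f_c.shape)  # (1, 1, 16)
```

Because all spatial positions are summed, the classifier downstream of F^c cannot base its decision on where a feature occurred, only on how strongly it occurred.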
[0163] Outlier detection for content marginal (ODC): Given that the CSC 520 may classify an input sample into one of the object classes, it may be desirable to provide an outlier detector to identify outliers of which the content lies outside of the marginal distribution. It may be desired to detect outliers which are unseen during training while only using the inlier samples from the training data. The following explains how potential outliers may be generated and how the outlier detector may be trained in a self-supervised manner. Namely, hard negative examples of outliers may be generated by augmenting the intermediate feature maps F^input 540 before the spatial aggregation 522. Here, the samples with the entire content present may be considered as inliers, whereas the samples with absent, incomplete and/or mismatched content may be considered as outliers. Such outliers may be generated in the feature map F^input by removing a part of the information or by blending the content of samples from one class with the content of samples from another class. For example, the blending may be accomplished by replacing a patch of size h×w×C from the feature map F^input(class1) with a patch of the same size from the feature map of a different class F^input(not class1), where h<H and w<W. Information may for example be removed by setting all values in the patch to 0. Such self-generated outliers in the feature space may be referred to as pseudo outliers F^pseudo. This set of outliers may be generated in every training iteration and the outlier detector may be trained on both the inlier and pseudo outlier feature maps. Note that the CSC may be trained solely on the valid inlier training data, and that the pseudo outliers may be used only to train the outlier detector (ODC).
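A sketch of the ODC pseudo outlier generation described above, assuming NumPy feature maps and a patch placed at the same position in both maps (all names are illustrative):

```python
import numpy as np

def make_pseudo_outlier(f_inlier, f_other_class, ph, pw, rng, blend=True):
    """Create a pseudo outlier F^pseudo from an inlier feature map by
    replacing a ph x pw x C patch either with the corresponding patch
    from another class's feature map (blending) or with zeros (removal)."""
    h, w, _ = f_inlier.shape
    i = int(rng.integers(0, h - ph + 1))  # random top-left corner
    j = int(rng.integers(0, w - pw + 1))
    f_pseudo = f_inlier.copy()
    if blend:
        f_pseudo[i:i+ph, j:j+pw, :] = f_other_class[i:i+ph, j:j+pw, :]
    else:
        f_pseudo[i:i+ph, j:j+pw, :] = 0.0
    return f_pseudo
```

In each training iteration, such pseudo outliers would be labeled as outliers and mixed with the unmodified inlier feature maps to train the detector.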
[0164] Location-sensitive classifier (LSC, 510): This classifier's decisions may be based on the spatial locations of the object's parts/elements but not on their content. Consider the input feature map F^input 540 of the LSC with dimensions H×W×C. The spatial resolution H×W may contain spatial information and the channels C may encode feature representations. It may be desired to capture only the spatial information and thereby possibly discard the feature representations. Similar to the spatial aggregation in the CSC branch 520, feature aggregation 512 may be applied to the intermediate feature map F^input 540 to integrate out the content information, resulting in the feature map F^l 514 with dimensions H×W×1. The feature aggregation at every location i,j in F^input may be formulated as: F^l_{ij} = Σ_{m=1}^{C} F^input_{ijm}, for i,j ∈ H×W. This feature aggregation may weaken the content representation but may not affect the spatial information. The classifier branch with this feature aggregation as a component (branch 510 in
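The complementary feature aggregation above may be sketched as follows (illustrative NumPy, not the source implementation):

```python
import numpy as np

def feature_aggregation(f_input: np.ndarray) -> np.ndarray:
    """Sum an H x W x C feature map over its channels, yielding the
    H x W x 1 location information-specific map F^l: spatial positions
    are preserved, channel-wise content is integrated out."""
    return f_input.sum(axis=2, keepdims=True)
```

Note the symmetry with the CSC branch: the CSC sums over H×W and keeps C, while the LSC sums over C and keeps H×W.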
[0165] Outlier detection for location marginal (ODL): This outlier detector may categorize the samples with unknown or incomplete spatial locations of object elements as outliers, as they may not correspond to the known location marginal distribution. Similar to the ODC, the outlier detection may be trained in a self-supervised manner. For example, pseudo outlier samples may be generated from the aggregated feature map F^l 514. As mentioned above, potential outliers are the samples with an unknown or incomplete spatial arrangement. Such outlier samples may be generated from the feature map F^l either by removing a part of the information or by randomly shuffling locations in the feature map. It was found that the ODL may not need to be implemented as a separate outlier detector but that the outlier detection may be implemented as an additional class in the classification branch 510. It is hypothesized that learning the spatial arrangement may be ‘easier’ than learning distinct feature representations, and thus that the former may be included as an additional class in the classification branch, which is shown in the corresponding figure. The loss for a pseudo outlier sample may be formulated as: L_pseudo outlier = −log(P_outliers) + λ log(P_class). Here, P_outliers and P_class are the outlier and inlier class probabilities. In a specific example, λ may be set to 0.05. Considering outliers as an additional class and computing the above loss may improve the outlier detection. This may not hold true for the other outlier detectors, as initial empirical evidence appears to show.
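Under the loss stated above, a pseudo outlier sample could be scored as in the sketch below. Treating P_class as the total probability mass assigned to the inlier classes is an interpretation made here for illustration, not a detail stated in the text:

```python
import numpy as np

def pseudo_outlier_loss(probs, outlier_idx, lam=0.05):
    """L_pseudo_outlier = -log(P_outliers) + lam * log(P_class) for a
    pseudo outlier sample, where the outliers form an extra class of the
    classification branch. probs is the softmax output over all classes
    including the outlier class; P_class is assumed to be the summed
    probability of the inlier classes."""
    p_outlier = probs[outlier_idx]
    p_class = probs.sum() - p_outlier
    return -np.log(p_outlier) + lam * np.log(p_class)
```

The first term drives the network to assign pseudo outliers to the outlier class; the small λ-weighted second term is the stated regularization on the inlier class probability.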
[0166] Outlier detection for joint marginal distribution (ODJ): This outlier detector may treat the samples that share similar attributes with the training data as inliers and the rest as outliers. Similar to the ODC, the pseudo outliers may be generated from the intermediate feature map 540 with dimensions H×W×C. In the ODC, the pseudo outliers may be subject to spatial aggregation 522, which may result in the loss of spatial information. On the other hand, the ODL may simulate pseudo outliers from the aggregated feature map and may have no cues for the content information. Unlike these two cases, the ODJ may be desired to be sensitive to both the content and location marginals. Hence, the entire feature map 540 of dimension H×W×C may, after flattening 532, be used as input for outlier detection so that both the content and spatial information persist. The augmentation strategy may be similar to the ODC, where a part of the information may be removed or information may be blended between inter-class features. In addition to the above strategies, any two locations with patch size h×w×C may also be swapped within the same feature map. This additional strategy may allow the outlier detector to be sensitive to changes in either of the two marginals.
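The within-map patch swap described above may be sketched as follows (illustrative NumPy; the caller is assumed to pick non-overlapping patch locations):

```python
import numpy as np

def swap_patches(f_input, loc1, loc2, ph, pw):
    """Swap two non-overlapping ph x pw x C patches within the same
    feature map: the content itself stays valid, but the spatial
    arrangement of that content becomes inconsistent, producing a
    pseudo outlier for the joint marginal."""
    (i1, j1), (i2, j2) = loc1, loc2
    f = f_input.copy()
    patch1 = f[i1:i1+ph, j1:j1+pw, :].copy()
    f[i1:i1+ph, j1:j1+pw, :] = f[i2:i2+ph, j2:j2+pw, :]
    f[i2:i2+ph, j2:j2+pw, :] = patch1
    return f
```

Because the swapped map contains exactly the same channel content as the original, only a detector that sees both content and location (the flattened H×W×C input) can flag it.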
[0167]
[0168] With respect to the training, it is noted that the inlier examples from the training data and any generated pseudo outliers may be used to train the outlier detectors. All three outlier detectors and the two classifiers may be trained independently of each other, but may be trained in a same training session, e.g., in same or separate iterations of the training session. The following describes specific examples of training parameters and architecture/network parameters, which are merely exemplary and entirely non-limiting.
[0169] In a specific example, both the CSC and ODJ may be trained for 100 epochs starting with a learning rate of 0.001, with the learning rate being dropped by a factor of 0.1 after 80 epochs. It was found that the LSC may be trained for a smaller number of epochs than the other branches. In a specific example, it was sufficient to train for only 25 epochs with a learning rate of 0.001. In a specific example, a batch size of 128 and the Adam optimizer with no weight decay may be used. In a specific example, the network weights may be initialized with Xavier initialization. In a specific example, the mean square error loss and the tanh activation function may be used for outlier detection in both the CSC and ODJ. Here, the labels may also be flipped with a probability of 0.1 in every training batch, flipping the label of inlier samples to outliers and of self-generated outlier samples to inliers, to avoid the network overfitting on the inlier samples.
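The label flipping mentioned here may be sketched as follows (illustrative, assuming 0/1 encoding for inlier/outlier labels):

```python
import numpy as np

def flip_labels(labels, p=0.1, rng=None):
    """Flip binary inlier(0)/outlier(1) labels with probability p per
    sample, so that the outlier detector does not overfit on the
    inlier samples."""
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(labels.shape) < p  # per-sample flip decision
    return np.where(mask, 1 - labels, labels)
```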
[0170] In a specific example, for the CSC, training may be started with only the classification loss until epoch 8, later also including the loss for outlier detection. Such a setting may pretrain the weights and stabilize the training for outlier detection. A classifier similar to the CSC may be used in the ODJ branch to pretrain the weights until epoch 8, after which both the classification loss and the loss for outlier detection may be used to stabilize the rest of the training. The classifier including the outlier class in the LSC may be trained from scratch.
[0171] In a specific example, the architecture details of the LSC may be as follows: input1c-conv16c-conv32c-conv64c-FeatureAggregation1c-conv1c-fc4. The kernel sizes of the convolutional layers may be [5, 5, 3, 3] respectively. Here, input1c, conv16c, and fc4 refer to an input with 1 channel, a convolutional layer with 16 feature maps, and a classifier with 4 classes (3 object classes + 1 outlier class), respectively. In a specific example, the base architecture details of the CSC may be as follows: input1c-conv16c-conv32c-conv64c-conv16c-SpatialAggregation16c-fc512-fc128. The classification and outlier heads are as follows: fc128-fc3 and fc128-fc1. Here, fc3 represents a classifier for 3 object classes and fc1 refers to the outlier detection neuron. In a specific example, the architecture details of the ODJ may be as follows: input1c-conv16c-conv32c-conv64c-conv4c-reshape(4×16×16)-fc512-fc128-fc1. The classification head used to stabilize the training of the ODJ may be as follows: fc128-fc3.
[0172] In a specific example, for the generation of outlier samples in all three outlier branches during training, a patch size of either 3×3 or 5×5 may be chosen to remove or mix information from another class, or to swap locations within the feature map, as discussed elsewhere in this specification. In a specific example, instead of choosing the entire patch in a random manner, the center of the patch may be selected based on the highest activation location in the feature map. The feature map of size H×W×C may be aggregated along the channels to obtain a feature map of size H×W, which may then be normalized to [0, 1] so that it can be treated as a probability map from which a location is picked with probability p. In the CSC and ODJ, the aggregation of channels may be performed only to pick a location with high activations and need not be treated as an input to the network, whereas in the LSC the feature map is already aggregated. During the course of training, this scheme may enable the highest activation locations to be modified, which would potentially result in outliers.
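The activation-based patch-center selection may be sketched as follows (illustrative NumPy; min-shifting the map before normalizing it into a probability distribution is an assumption about how the values are brought into [0, 1]):

```python
import numpy as np

def sample_patch_center(f_input, rng):
    """Aggregate an H x W x C feature map along the channels, normalize
    the resulting H x W map, and treat it as a probability map from
    which a patch-center location is sampled, so that highly activated
    locations are modified more often."""
    act = f_input.sum(axis=2)           # H x W activation map
    act = act - act.min()               # shift into [0, inf)
    total = act.sum()
    if total == 0:                      # constant map: fall back to uniform
        probs = np.full(act.size, 1.0 / act.size)
    else:
        probs = (act / total).ravel()   # values in [0, 1], summing to 1
    flat = rng.choice(act.size, p=probs)
    return np.unravel_index(flat, act.shape)
```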
[0173] Experiments show that the measures described in this specification may allow outliers in the content marginal, location marginal and joint marginal distributions to be detected with high accuracy by training the machine learnable model (substantially) only on the in-distribution samples from the training data. The framework comprising the outlier detectors along with the classifiers may provide explicit cues for interpretable decision-making on unseen outlier samples. The following table illustrates these explicit cues for a few examples. As shown in this table, it is possible to interpret the type of outlier sample from the output of the outlier detectors and the class decision from the classifiers. When using the machine learned model on new input data, the system and/or method may provide the final inference as output or use the final inference in its final decision-making.
TABLE-US-00002
  Input example  ODC          ODL          ODJ          CSC      LSC      Final inference
  Ex. 1          Not outlier  Not outlier  Not outlier  Class A  Class A  In-distribution
  Ex. 2          Not outlier  Not outlier  Outlier      Class B  Class A  Joint-out-distribution
  Ex. 3          Outlier      Not outlier  Outlier      Class A  Class A  Outlier in content, inlier in location
[0174]
[0175] The system 700 may further comprise a processor subsystem 760 which may be configured to, during operation of the system 700, apply the convolutional part of the machine learned model to the input data to generate one or more feature maps, generate a content information-specific feature map by removing location information from one of the one or more feature maps, and apply the content classification part to the content information-specific feature map to obtain a content-based object classification result. The processor subsystem 760 may be further configured to generate a location information-specific feature map by removing content information from one of the one or more feature maps, and apply the location classification part to the location information-specific feature map to obtain a location-based object classification result. The processor subsystem 760 may be further configured to apply the outlier detection part to one or more previously generated feature maps which are generated for the instance of the spatial data, to obtain an outlier detection result, and to classify the object in the spatial data in accordance with the content-based object classification result, the location-based object classification result and the outlier detection result, wherein said classifying comprises classifying the input data in accordance with an object class if the content-based object classification result and the location-based object classification result both indicate the object class and if the outlier detection result does not indicate a presence of an outlier.
[0176] In general, the processor subsystem 760 may be configured to perform any of the functions as previously described with reference to
[0177]
[0178] In some embodiments of the present invention, the system 700 may comprise an actuator interface 740 for providing control data 742 to an actuator 40 in the environment 60. Such control data 742 may be generated by the processor subsystem 760 to control the actuator 40 based on the classification result, as may be generated by the machine learned model when applied to the input data 722. For example, the actuator 40 may be an electric, hydraulic, pneumatic, thermal, magnetic and/or mechanical actuator. Specific yet non-limiting examples include electrical motors, electroactive polymers, hydraulic cylinders, piezoelectric actuators, pneumatic actuators, servomechanisms, solenoids, stepper motors, etc. Such type of control is described with reference to
[0179] In other embodiments of the present invention (not shown in
[0180] In general, each system described herein, including but not limited to the system 100 of
[0181]
[0182]
[0183] The method 800 is shown to comprise, in a step titled “ACCESSING MACHINE LEARNED MODEL”, accessing 810 a machine learned model as described elsewhere in this specification and in a step titled “ACCESSING INPUT DATA”, accessing 820 input data, the input data comprising an instance of spatial data, the instance of the spatial data comprising an object to be classified. The method 800 is further shown to comprise, in a step titled “GENERATING FEATURE MAP(S)”, applying the convolutional part of the machine learned model to the input data to generate 830 one or more feature maps, and in a step titled “GENERATING CONTENT INFORMATION-SPECIFIC FEATURE”, generating 840 a content information-specific feature map by removing location information from one of the one or more feature maps. The method 800 is further shown to comprise, in a step titled “GENERATING LOCATION INFORMATION-SPECIFIC FEATURE”, generating 850 a location information-specific feature map by removing content information from one of the one or more feature maps, and in a step titled “GENERATING CONTENT-BASED OBJECT CLASSIFICATION RESULT”, applying 860 the content classification part to the content information-specific feature map to obtain a content-based object classification result. The method 800 is further shown to comprise, in a step titled “GENERATING LOCATION-BASED OBJECT CLASSIFICATION RESULT”, applying 870 the location classification part to the location information-specific feature map to obtain a location-based object classification result, and in a step titled “GENERATING OUTLIER DETECTION RESULT”, applying 880 the outlier detection part to one or more previously generated feature maps which are generated for the instance of the spatial data, to obtain an outlier detection result. 
The method 800 is further shown to comprise, in a step titled “CLASSIFYING OBJECT IN SPATIAL DATA”, classifying 890 the object in the spatial data in accordance with the content-based object classification result, the location-based object classification result and the outlier detection result, wherein said classifying comprises classifying the input data in accordance with an object class if the content-based object classification result and the location-based object classification result both indicate the object class and if the outlier detection result does not indicate a presence of an outlier.
[0184] It will be appreciated that, in general, the operations or steps of the computer-implemented methods 200 and 800 of respectively
[0185] Each method, algorithm or pseudo-code described in this specification may be implemented on a computer as a computer implemented method, as dedicated hardware, or as a combination of both. As also illustrated in
[0186] Examples, embodiments or optional features, whether indicated as non-limiting or not, are not to be understood as limiting the present invention.
[0187] It is noted that a system and method may be provided for classifying objects in spatial data using a machine learned model, as well as a system and method for training the machine learned model. The machine learned model may comprise a content sensitive classifier, a location sensitive classifier and at least one outlier detector. Both classifiers may jointly distinguish between objects in spatial data being in-distribution or marginal-out-of-distribution. The outlier detection part may be trained on inlier examples from the training data, while the presence of actual outliers in the input data of the machine learnable model may be mimicked in the feature space of the machine learnable model during training. The combination of these parts may provide a more robust classification of objects in spatial data with respect to outliers, without having to increase the size of the training data.
[0188] It should be noted that the above-mentioned embodiments illustrate rather than limit the present invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the present invention. Use of the verb “comprise” and its conjugations does not exclude the presence of elements or stages other than those stated. The article “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. Expressions such as “at least one of” when preceding a list or group of elements represent a selection of all or of any subset of elements from the list or group. For example, the expression, “at least one of A, B, and C” should be understood as including only A, only B, only C, both A and B, both A and C, both B and C, or all of A, B, and C. The present invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are described separately does not indicate that a combination of these measures cannot be used to advantage.