METHOD FOR ANNOTATING TRAINING DATA
20230394803 · 2023-12-07
Abstract
The invention relates to a method of annotating training data for an artificial intelligence comprising the following steps: storing, in a database, a set of data to be annotated, storing, in said database, at least a first description of a first facet for data selection in said set of data, said first description being associated with a first task to be performed by said artificial intelligence, selecting said first facet in said database, applying said first facet to data in said set of data to obtain first filtered data, receiving at least a first annotation of said first filtered data, and storing said first annotation in the database in association with said first facet.
Claims
1. A method of annotating training data for artificial intelligence comprising the following steps: storing, in a database, a set of data to be annotated; storing, in said database, at least a first description of a first facet for data selection in said set of data, said first description being associated with a first task to be performed by said artificial intelligence; selecting said first facet in said database; applying said first facet to data in said set of data to obtain first filtered data; receiving at least a first annotation of said first filtered data; and storing said first annotation in the database in association with said first facet.
2. The method of claim 1, wherein said database comprises a plurality of descriptions of a plurality of facets for data selection in said set of data and wherein: said first description includes a hierarchical link to a second description of a second facet for data selection in the database; and said first facet is applied to second filtered data obtained by applying said second facet to data of said set of data.
3. The method according to claim 2, wherein: said second facet covers a plurality of regions in said set of data; and the first facet is applied on each region on which the second facet is applied.
4. The method according to claim 3, wherein: annotations are associated with some of said regions covered by said second facet as well as with said second facet; and the first facet is applied on each region carrying an annotation associated with the second facet.
5. The method according to claim 2, wherein the description of the first facet comprises a filtering condition applied to the annotations associated with said regions as well as to said second facet and wherein the first facet is applied only for those regions for which said filtering condition is verified.
6. The method according to claim 5, wherein said filtering condition is associated with the regions annotated by said second facet and wherein the first facet is applied only to data from a cropping by these regions and for which the condition is verified.
7. The method according to claim 6, wherein said annotation generates the definition of a region in said set of data, wherein said region is stored in a database in relation to the region used to crop the annotated data and wherein said annotation is stored in said database in relation to said first facet and said region.
8. The method according to claim 6, wherein said annotation does not create a new region, and wherein said annotation is stored in said database in relation to said first facet as well as the region used to crop the annotated data.
9. The method according to claim 1, further comprising a step of displaying said first filtered data to a user, said annotation being received from said user.
10. The method according to claim 1, wherein said first filtered data is provided as input to an artificial intelligence module implementing said task, said annotation being received from said module.
11. A machine learning method for performing a task by an artificial intelligence, comprising the following steps: accessing a database comprising a set of data and at least one definition of at least one facet for data selection in said set of data, said one definition further comprising at least one annotation associated with said facet; applying said data selection facet to said set of data to obtain first filtered data; storing said first filtered data in an annotated training data memory; associating said first filtered data with annotations; and performing said task by said artificial intelligence.
12. The method according to claim 11, wherein said annotation is generated according to a method of annotating training data for artificial intelligence comprising the following steps: storing, in a database, a set of data to be annotated; storing, in said database, at least a first description of a first facet for data selection in said set of data, said first description being associated with a first task to be performed by said artificial intelligence; selecting said first facet in said database; applying said first facet to data in said set of data to obtain first filtered data; receiving at least a first annotation of said first filtered data; and storing said first annotation in the database in association with said first facet.
13. A device comprising a processing unit configured to implement steps according to the method according to claim 1.
14. A device comprising a processing unit configured to implement steps according to the method according to claim 11.
Description
DETAILED DESCRIPTION OF THE INVENTION
[0077] In the following, embodiments are described that provide data annotation in a hierarchical manner. They allow, for example, the design of image or video recognition applications based on machine learning. The invention is not limited to this type of application and other types of data can be used.
[0078] The embodiments of the invention make it possible to manipulate and store the hierarchical character of annotated concepts, to annotate more efficiently and to decouple the notion of machine learning model from the notion of dataset.
[0079] According to the invention, the embodiments take advantage of the hierarchical structure of the data to be annotated to facilitate annotation even though this hierarchical structure is a source of problems in the prior art.
[0080] This makes it possible to split an annotation job into several independent subtasks.
[0081] The embodiments address the problem of error propagation in prior art models. For example, they allow class changes to be taken into account and passed on automatically.
[0082] The embodiments also allow annotating data without causing inflation of the data as the annotation proceeds. The generation phase of the actually annotated data can be postponed to an actual training phase.
[0083] As described in detail in the following, annotation according to the embodiments takes into account the hierarchical structure of the annotations to be performed. Automatic cropping of the data around the relevant regions to be annotated can be performed for annotation without generating new subsets of the data. It is also possible to filter the images to be annotated to display only the relevant data regions.
[0084] In what follows, the principles applicable to the various embodiments of the invention are first presented.
[0085] Contrary to the annotation techniques of the prior art, the embodiments of the invention will not focus on creating “sub-datasets” as annotations are made on the dataset. Thus, the inflation of the dataset by the annotation process, which is the source of error propagation in the prior art, is not repeated.
[0086] On the contrary, the initial dataset will be kept and we will create a dynamic construction system of datasets (the “facets” or “views”) to which annotations will be associated. Thus, it will be possible to make any manipulation, including corrections on this construction system, without touching the dataset. The “sub-datasets” will be generated on demand, according to the desired use (for example, training an AI) once the construction system has been validated.
[0087] The context for implementing the embodiments of the invention is thus as follows.
[0088] We consider a set of unstructured data to be annotated to allow the training of an artificial intelligence on a certain number of tasks.
[0089] “Data” (unannotated) refers to any type of data that can typically be produced by a sensor, capture device, or manual input device. It can be an image, a video stream, a point cloud, a 3D image made of voxels, a sound stream, a text, a time series, etc.
[0090] Annotations are the additional information linked to a piece of data and related to its content. For example, the position of an object in an image and/or its “class” (or “type”).
[0091] A “task” refers to any action of a machine that allows it to automatically predict the annotations of a datum, or a subset of the annotations. A large number of tasks can be cited. Some examples are given below.
[0092] For a classification task, the annotation to be predicted is a category of object, otherwise called a “class”, among a predetermined list of possible classes. For example, given a picture of an animal, we want to know which animal it is.
[0093] For a detection task, the annotation to be predicted is a list of objects present in the image among a list of classes of interest. Each predicted object must indicate a simple delimitation of the object, typically in the form of a bounding box as well as its class. For example, given a video captured by an autonomous car, we want to know where all the vehicles, pedestrians, cyclists, etc. are.
[0094] For a segmentation task, the annotation to be predicted is the same as for a detection task, but the objects must be delimited to the nearest pixel.
[0095] For an OCR (Optical Character Recognition) task, the goal is to predict the text in an image. For example, reading a license plate number from a photo of a license plate.
[0096] For a pose estimation task, the goal is to predict the “pose” of a deformable object. Typically, key parts of the object are identified beforehand and linked by a tree. The task is then to predict the position of each node of the tree that is visible, or to indicate that the node is invisible. For example, in the case of estimating the pose of a person, we typically try to locate the head, hands, feet, etc. of each person present in the image.
[0097] For a regression task, the goal is to predict a number or a vector (i.e. a list of N numbers, N being known beforehand). For example, the task is to predict the age of a person given a picture of a face.
[0098] This list of tasks is of course non-exhaustive, but it illustrates the variety of tasks that can be asked of an AI and thus the variety of possible annotations in a dataset.
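As an illustration only, the variety of annotations listed above could be represented by one small record type per task. The sketch below is written in Python; all class and function names are hypothetical and do not come from the described embodiments, and segmentation is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class ClassLabel:
    """Classification: one class among a predetermined list."""
    name: str


@dataclass
class Box:
    """Detection: a bounding box and the class of the detected object."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str


@dataclass
class Text:
    """OCR: the text read in the image."""
    value: str


@dataclass
class Pose:
    """Pose estimation: one position per node of the tree, None if invisible."""
    nodes: List[Optional[Tuple[float, float]]]


@dataclass
class Vector:
    """Regression: a list of N numbers, N being known beforehand."""
    values: List[float]


# A detection annotation is a list of boxes; the other tasks yield one record.
Annotation = Union[ClassLabel, List[Box], Text, Pose, Vector]


def task_of(annotation: Annotation) -> str:
    """Return the name of the task that an annotation payload belongs to."""
    if isinstance(annotation, list):
        return "detection"
    names = {ClassLabel: "classification", Text: "ocr",
             Pose: "pose estimation", Vector: "regression"}
    return names[type(annotation)]
```

Keeping one payload type per task makes it explicit that the exact form of an annotation depends on the task to which it belongs.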
[0099] When these different tasks contribute to the solution of the same problem by relying on one or more automatic recognition algorithms, it is common that these tasks are organized in a hierarchical manner.
[0100] The hierarchy relationship between a task A and a task B can for example appear when the need to annotate data for task B depends on the annotation of that data for task A. This is for example the case where the need to annotate for task B depends on the object class specified in A. In the example of the connection check given in the introduction, this is for example the annotation of the power meter screen which is only relevant in the context where the image actually shows a power meter and has been annotated as such in the first annotation level. It is therefore a question of being able to “filter” the regions to be annotated for task B according to a certain condition on the annotation of task A.
[0101] The hierarchy relation can also appear when task B consists in annotating the same data regions as those defined by task A. In the example given in the introduction concerning the meal trays, it is for example a question of annotating the nature of the dish on the image resulting from the cropping of the original image by the region defined in task A.
[0102] Hierarchy relationships may appear for any other reason that induces the fact that the annotation for task A can be directly reused to define the annotation need or simplify the annotation process for task B.
[0103] As mentioned above, it is common to annotate the same region of a data item in different tasks. For example, this is the case when a classification task focuses on annotating the detailed class of an object created during an upstream detection task. A region is a sub-part of a data item defined by its extremal values on the different axes of the data (x, y, z for spatial data and/or t for video). When the temporal axis is involved (for example for a video), the spatial position (i.e. in x, y, z) can vary for each value of t between t_min and t_max. The interest of a region is that it can be used to crop a data item to produce a new (sub)data item to annotate.
[0104] In accordance with the embodiments described in the following, in order to represent the hierarchical relationships between tasks and regions, the notion of “facet” is used. Each annotation task is assigned a unique facet (i.e. a task is linked to a unique facet and vice versa).
[0105] A facet is understood by analogy to the term facet in the field of faceted classification-based information retrieval, which gives users the ability to filter data based on the selected facet. In what follows, and without loss of generality, the term “view” is used.
[0106] In addition to being linked to a task, a view can be hierarchically attached to another view, called a “parent view” (or conversely a “child view”). A view that does not have a parent view is called a “root view”.
[0107] When a view is a child view, it sets a “filter condition”, or “condition”, on the annotation of the parent view. The data in the parent view that satisfies the condition defines the child view. This makes it possible to create sub-datasets.
[0108] An annotation process can have several root views. Formally, the relationship between the views induces a forest structure in the sense of a set of disjoint trees as defined in graph theory.
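The parent/child relationship between views described above can be sketched as a small data structure. The following Python example is a minimal illustration, not the implementation of the embodiments; the names `View`, `roots`, `children` and `is_forest` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class View:
    name: str
    task: str                      # e.g. "classification", "detection"
    parent: Optional[str] = None   # None marks a root view


def roots(views: Dict[str, View]) -> List[str]:
    """Names of the views that have no parent view."""
    return [name for name, view in views.items() if view.parent is None]


def children(views: Dict[str, View], name: str) -> List[str]:
    """Names of the views whose parent is the given view."""
    return [n for n, view in views.items() if view.parent == name]


def is_forest(views: Dict[str, View]) -> bool:
    """True if every parent exists and no view is its own ancestor."""
    for name in views:
        seen = set()
        current: Optional[str] = name
        while current is not None:
            if current in seen or current not in views:
                return False       # cycle, or reference to an unknown view
            seen.add(current)
            current = views[current].parent
    return True
```

Walking up the parent links from every view is sufficient here because each view stores at most one parent, so the only ways to break the forest structure are a cycle or a dangling parent reference.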
[0109] As described in the following, the views as defined above allow, in embodiments of the invention, filtering and cropping of the regions produced by the parent view to present the annotating user with valid data focused on the regions of interest. As explained in the following, this allows for more efficient annotation. It also makes it possible to manipulate data on the fly in the annotation phase and to reserve the annotated datasets generation phase for the AI training phase.
[0110] A more formal definition of views is given below.
[0111] We define the set of all possible regions R and we associate a root region r_{i,0} ∈ R with each data item d_i ∈ D of the dataset. Thus, each dataset has at least one root view.
[0112] We define the cropping function crop: D × R → D, which takes as input a data item d and a region r and returns the sub-data resulting from the cropping of d by r. Cropping by a root region does not change the data: ∀i, crop(d_i, r_{i,0}) = d_i.
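The cropping function can be illustrated for the simple case of a two-dimensional image. The Python sketch below assumes that an image is a list of pixel rows and that a region is an axis-aligned box with exclusive upper bounds; the names `Image`, `Region`, `root_region` and `crop` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for any data type


@dataclass(frozen=True)
class Region:
    x_min: int
    y_min: int
    x_max: int  # exclusive
    y_max: int  # exclusive


def root_region(d: Image) -> Region:
    """Region that encompasses the whole data item."""
    width = len(d[0]) if d else 0
    return Region(0, 0, width, len(d))


def crop(d: Image, r: Region) -> Image:
    """Return the sub-data of d delimited by r, without modifying d."""
    return [row[r.x_min:r.x_max] for row in d[r.y_min:r.y_max]]
```

With this encoding, cropping by the root region returns a copy equal to the original data, which matches the property crop(d_i, r_{i,0}) = d_i.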
[0113] We note {v_j | j = 1, ..., n} the set of n views and p: ℕ → ℕ (with ℕ the set of natural numbers) the parent function such that p(j) = 0 if v_j is a root view and p(j) = k if v_k is the parent view of v_j.
[0114] We note annotated(j, r) ∈ {0, 1} the function that defines whether the region r is annotated for the view v_j (in addition, we set ∀r ∈ R, annotated(0, r) = 1) and A_j the set of possible annotations for v_j (the exact form of A_j depends on the type of task associated with v_j: classification, detection, etc.).
[0115] We also note a_j: R → A_j the function that retrieves the annotations of a region already annotated in the view v_j.
[0116] The set of regions that are annotated in a region r by the view v_j is noted regions(j, r).
[0117] For example, in a detection task, regions(j, r) corresponds to the set of regions defined by the newly annotated bounding boxes. If no region is created (as in the case of a classification task), then we have regions(j, r) = {r}. To handle the case of root views, we set regions(0, r_{i,0}) = {r_{i,0}}.
[0118] We note c_j: R → {0, 1} the function which is 1 if the filtering condition of the view v_j is satisfied for the region and 0 otherwise (for any root view v_k we define ∀r ∈ R, c_k(r) = 1).
[0119] We call R_{i,j} the set of regions to annotate for the view v_j on the data d_i. We set R_{i,0} = {r_{i,0}}, R_{i,j}^{valid} = {r | r ∈ R_{i,p(j)} such that annotated(p(j), r) = 1 and c_j(r) = 1} and R_{i,j} = ⋃_{r ∈ R_{i,j}^{valid}} regions(p(j), r).
[0120] The data presented for annotation for the view v_j is the set of data cropped by the regions to be annotated, i.e. D_j = {crop(d_i, r) | d_i ∈ D and r ∈ R_{i,j}}. This data can then be annotated in the same way as in the state of the art, i.e. by considering D_j as a stream of data to be annotated that can be taken in isolation from the rest of the annotation process.
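The recursive definition of the regions to annotate can be sketched as follows. This Python example is an illustration under stated assumptions, not the implementation of the embodiments: it assumes that R_{i,j} is the union of regions(p(j), r) over the valid regions r, a reading consistent with the convention regions(0, r_{i,0}) = {r_{i,0}}; it evaluates the condition c_j on the annotation made by the parent view; and it handles a single data item. The names `View`, `regions_to_annotate`, `a` and `regions` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Region = Tuple[int, int, int, int]   # hypothetical encoding: x_min, y_min, x_max, y_max
Key = Tuple[int, Region]             # (view index j, region r)


@dataclass
class View:
    parent: int = 0                                        # p(j); 0 means root view
    condition: Optional[Callable[[object], bool]] = None   # c_j; None means always 1


def regions_to_annotate(j: int, root: Region, views: Dict[int, View],
                        a: Dict[Key, object],
                        regions: Dict[Key, List[Region]]) -> List[Region]:
    """Compute R_{i,j} for one data item d_i whose root region is `root`.

    a[(k, r)] is the annotation a_k(r); its presence stands for annotated(k, r) = 1.
    regions[(k, r)] lists the regions created by view k in r (default: [r]).
    """
    if j == 0:
        return [root]                                      # R_{i,0} = {r_{i,0}}
    p = views[j].parent
    cond = views[j].condition
    result: List[Region] = []
    for r in regions_to_annotate(p, root, views, a, regions):
        if p != 0 and (p, r) not in a:
            continue                                       # annotated(p(j), r) = 0
        if cond is not None and not cond(a.get((p, r))):
            continue                                       # c_j(r) = 0
        result.extend(regions.get((p, r), [r]))            # regions(p(j), r)
    return result


root = (0, 0, 640, 480)
screen = (100, 50, 300, 150)
views = {
    1: View(),                                                # root classification view
    2: View(parent=1, condition=lambda c: c == "Wattmeter"),  # detection view
    3: View(parent=2),                                        # OCR view, no condition
}
a = {(1, root): "Wattmeter", (2, root): ["Screen"]}
regions = {(2, root): [screen]}

before = regions_to_annotate(3, root, views, a, regions)   # [screen]
a[(1, root)] = "Cabinet"                                   # correct the parent annotation
after = regions_to_annotate(3, root, views, a, regions)    # []
```

Because the list is recomputed from the stored annotations each time, correcting the annotation of the parent view immediately changes what the child views present for annotation. The data D_j is then obtained by cropping each data item by the regions returned.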
[0121] Thus, the interaction of a view with its parent view (through the notions of region and condition, defined above) is the means that allows the implementation of the necessary manipulations during annotation in order to accelerate and make reliable the annotation process of the task to which it is attached. This makes it possible to avoid the inflation of data during the annotation phase according to the techniques of the prior art.
[0122] In other words, the notion of “view” allows a dynamic annotation of the data, so that no sub-datasets are created as the annotation proceeds. According to the embodiments described below, it is possible to annotate the data without modifying the data itself. It is thus possible to modify any annotation at any time and, which is not possible in the prior art, to pass these modifications on as deeply as desired, since the list of data to be annotated for a given task and the cropping to the zone of interest are calculated on the fly from the annotations already present in the parent task.
[0123] The definitions and formulas stated above formally define the data to be presented to a user (or an automatic process), from an initial set, for annotation. As described in the following, instead of directly linking data and annotations, the annotation process links annotations to a formalization of the way to obtain, from the initial data, the data presented to the user (or the process). This association improves the annotation process.
[0124] In the following, the formal equations given above are illustrated for different cases.
[0125] With reference to
[0126] the data presented for annotation is exactly the raw data d_i of the data set to be annotated. Indeed, the region to be annotated is necessarily the root region 103 which encompasses all the data d_i 102. This is found via the above formalism because p(j) = 0, therefore R_{i,j}^{valid} = R_{i,0} = {r_{i,0}}, and so the data presented for annotation is crop(d_i, r_{i,0}) = d_i. The root view 101 thus behaves like a “standard” annotation task that would follow the state of the art practice of adding annotations 104 to the data set D for the task related to v_j. The difference is that the annotation is attached to the root region 103 and not directly to the data 102. In the case where there are multiple root views, all annotations are attached to the same root region 103.
[0127] In what follows, we consider cases where the regions to be annotated for a view B are this time those resulting from an annotation of the view A.
[0128] In the example of
[0129] It is assumed that the annotations in region 205 (solid line) verify a condition B while those in region 204 do not (dashed line). Thus only the data 203 cropped by region A 205 verifying condition B will be presented in the annotation view according to view B. The data resulting from a cropping by region 204 is not presented.
[0130] In general, the data to be annotated for view B 202 is the original data cropped by the regions annotated by view A 200 whose annotation verifies condition B. After the application of view B 202, the annotation 209 is associated with the region 205 that was used to crop the annotated data.
[0131] The example in
[0132] In the example in
[0133] According to the prior art, instead of keeping a single dataset 203, two subsets of the dataset would have been created corresponding to the two tasks of views A and B. For view A, which could for example be a detection view, the dataset would include the data 203 cropped by region 208, annotated by two regions (e.g. bounding boxes and their classes) 204+206 and 205+207. For view B, which could be a classification view, the dataset would have the data 203 cropped by region 205 and annotated by 209.
[0134] According to the invention, the data 203 is not modified and the annotations are not directly attached to it. Instead, a representation of views A and B, as well as regions, is created and annotations are associated with these regions. During the annotation process, no new subsets of data are created. Only views are used to generate the data to be annotated and allow the user (or an automatic process) to annotate them. These same views can be used to actually generate the annotated data to give as input to an AI in training phase. In the meantime, the storage and processing of annotations is greatly facilitated because the hierarchy between annotations is automatically preserved, which makes the process of updating (modifying or deleting) annotations more reliable.
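The on-demand generation of annotated data for a training phase can be sketched as a lazy iteration over the stored annotations. The Python example below is a simplified illustration: it assumes that annotations are keyed by view, data item and region, that images are lists of pixel rows, and that regions are boxes with exclusive upper bounds. All names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Image = List[List[int]]
Region = Tuple[int, int, int, int]   # x_min, y_min, x_max, y_max (max exclusive)
Key = Tuple[str, str, Region]        # (view name, data id, region)


def crop(d: Image, r: Region) -> Image:
    """Return the part of d delimited by r; d itself is left untouched."""
    x0, y0, x1, y1 = r
    return [row[x0:x1] for row in d[y0:y1]]


def training_samples(view: str, data: Dict[str, Image],
                     annotations: Dict[Key, str]) -> Iterator[Tuple[Image, str]]:
    """Lazily yield (cropped data, annotation) pairs for one view.

    Nothing is materialized until the caller iterates, so the stored
    dataset is never duplicated during annotation.
    """
    for (v, data_id, region), label in annotations.items():
        if v == view:
            yield crop(data[data_id], region), label


data = {"img1": [[1, 2, 3], [4, 5, 6]]}
annotations = {
    ("A", "img1", (0, 0, 3, 2)): "class_a",   # annotation of view A on the whole image
    ("B", "img1", (1, 0, 3, 1)): "class_b",   # annotation of view B on a sub-region
}
samples_b = list(training_samples("B", data, annotations))
```

Here the cropped sub-image for view B only exists once `training_samples` is iterated, while the stored image and the annotations remain the single source from which any sub-dataset can be regenerated.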
[0135] In the example of
[0136] We start by applying a parent view A 300 (parent of view B 301) to a data item 302 from the dataset to be annotated. As indicated, this parent view A 300 does not create new regions, unlike in the previous example. We therefore find the same region 303 after the application of view A 300. This region is associated with the annotation 304. We assume that the view A is applied to a root region 303 to simplify the figure (the case of a non-root region could also be considered).
[0137] It is assumed that the annotations 304 verify the condition for view B 301. The data to be annotated for view B 301 is then the initial data 302 cropped by region 303, that is, the region already annotated by view A 300. After the application of view B 301, annotation 305 is associated with region 303.
[0138] We can see in the example of
[0139] According to the prior art, instead of keeping a single dataset, two subsets of the dataset corresponding to the two tasks of views A and B would have been created. For view A, which could for example be a classification view, the dataset would include the data 302 cropped by region 303 and annotated by 304. For view B, which could also be, for example, a classification view, the dataset would include the data 302 cropped by region 303 and annotated by 305.
[0140] The example in
[0141] In this case, the data region will not be presented to the user for annotation and no annotation will be added (annotation 305 is missing).
[0142] Annotation of data according to the above principles can take the form described in the following.
[0143] For example, a user can access an interface 500 of an annotation system as shown in
[0144] Thus, the interface 500 includes an action area 501 with various buttons 502 (ACT1), 503 (ACT2), 504 (ACT3). For example these buttons allow the user to manage the dataset in a general way, for example by adding data, viewing a view map, deleting images, etc.
[0145] The interface 500 also includes an area 505 for displaying root views. Here, for brevity, a root view 506 (V1) is shown. Other root views could be present in this area. In this area, a button 508 allows the user to add root views.
[0146] The user can select a view, for example 506, and an area 509 similar to 500 appears to display the child views of the selected view. For example, views 510 (V1.1), 511 (V1.2), 512 (V1.3) depend on view 506 (V1) which is therefore a parent view for them. In this zone, a button 513 allows the user to add views dependent on the view selected in zone 500. For example, the user selects the view 506, then clicks on the button 513 to create a dependent view of the view 506.
[0147] The process continues recursively as long as there is depth in the view tree. For example, an area 514 is used to display dependent views of the view selected in area 509 and a button 515 is used to add a dependent view to the selected view. In the illustrated example, the view 511 is for example selected but does not contain a child view. The user then clicks on button 516 to create a first dependent view of this selected view.
[0148] As illustrated in
[0149]
[0150] The interface 600 further includes an area 605, with a number of windows allowing the user to manage the images in a view. For example, a window 606 (DISTR1 IMG) provides access to a distribution of the images in the view among different uses (one can retrieve the number of training data, unannotated data, or the like). This makes it possible to know for which use the images of the view are intended. A window 607 (DISTR2 CNCPT) gives access to another distribution concerning the concepts that the machine will have to predict. For each concept, we can see the number of images associated with it.
[0151] A window 608 (IMG) can give access to the number of images in the view. A window 609 (CNCPT) gives access to the number of concepts associated with the view.
[0152] One of the buttons in area 601 can provide the user with access to unannotated images in an interface 700 shown in
[0153] This interface 700 comprises an action zone 701 with a certain number of buttons 702 (ACT7), 703 (ACT5), 704 (ACT9) allowing the user to perform a certain number of actions. These buttons are, for example, the same as those on the interface 600 or may be supplemented by others. According to some examples, the zone may include a button allowing the user to annotate a not yet annotated image through an annotation interface specific to the type of task related to the view selected in the interface 600. This annotation interface is consistent with state of the art practice and is not described here. That said, unlike the state of the art, it does not apply directly to the annotated data but to the region of interest as described in
[0154] The interface 700 further includes an area 705 with all images 706 (IMG) not yet annotated. For example, the user selects an image in the area by clicking on it and is redirected to the image annotation interface for the selected task and view as before.
[0155] With interfaces such as those presented above, we can see that the annotation process is totally different from the state of the art. Indeed, the data to be annotated can be displayed to the user “on the fly” according to the different views. We therefore take advantage of the hierarchical nature of the tasks, without creating new data for each annotation.
[0156] The data can therefore be annotated automatically according to the parent views. The hierarchy of views in the form of a tree is therefore different from the generation of annotated data as in the prior art. This hierarchy allows a display for adding annotation, not on the data but on the regions produced by the parent views of the view being annotated.
[0157] The process is schematically presented in
[0158] The process presented above is described in the case of a manual annotation by a human user. However, the annotation can also be performed on the fly by an automatic annotation module. In this case, the process in
[0159] In practice, the notions of data, views, regions and annotations are stored in a database. When the user seeks to view the unannotated data (see
[0160] For each annotation, a new “annotation” object is created in the database. This object is linked to the view that produced it and to the region it concerns.
[0161] If the task linked to the view does not create a new region but simply enriches the region passed as input (as for example in the case of classification), then the newly created annotation is linked to this region, and the region is considered as annotated for the current view. The region then becomes available for annotation in the child views.
[0162] If the view-related task creates new regions (as for example in the case of detection), it is these new regions that become available for annotation in the child views. For practical reasons, these new regions store their parent region (see step 1305 described below) in the database, in particular to allow the deletion of child regions and annotations in cascade as explained below.
[0163] The storage of annotations, not linked to data but linked to hierarchical views, allows annotation corrections to be made in a very simple way. For example, if a region is deleted, the cascading deletion mechanism of the database can be relied upon: all regions and annotations that inherit from this deleted region in the child views are automatically deleted.
[0164] In addition, when modifying an annotation, one of the difficulties is to check whether the child view filtering condition is violated and, if so, to correctly delete the annotations that have become invalid. Moreover, a region stores the view that created it, so that all regions and annotations from a given view and its children can be deleted efficiently when this view is deleted.
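The cascading deletion described above can be illustrated with a relational database. The sketch below uses SQLite through the Python standard library; the table and column names are hypothetical, the schema is simplified (the dataset and data tables are omitted and a data item is represented by a plain identifier), and it is only one possible way of storing views, regions and annotations.

```python
import sqlite3

SCHEMA = """
CREATE TABLE view (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES view(id) ON DELETE CASCADE    -- NULL for a root view
);
CREATE TABLE region (
    id INTEGER PRIMARY KEY,
    data_id INTEGER NOT NULL,
    view_id INTEGER REFERENCES view(id) ON DELETE CASCADE,     -- view that created the region
    parent_id INTEGER REFERENCES region(id) ON DELETE CASCADE  -- NULL for a root region
);
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY,
    view_id INTEGER NOT NULL REFERENCES view(id) ON DELETE CASCADE,
    region_id INTEGER NOT NULL REFERENCES region(id) ON DELETE CASCADE,
    value TEXT NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    con.executescript(SCHEMA)
    return con


def count(con: sqlite3.Connection, table: str) -> int:
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


con = open_db()
con.execute("INSERT INTO view VALUES (1, 'A', NULL)")        # root view A
con.execute("INSERT INTO view VALUES (2, 'B', 1)")           # view B, child of A
con.execute("INSERT INTO region VALUES (1, 10, NULL, NULL)") # root region of data item 10
con.execute("INSERT INTO region VALUES (2, 10, 1, 1)")       # region created by view A
con.execute("INSERT INTO region VALUES (3, 10, 2, 2)")       # region created by view B
con.execute("INSERT INTO annotation VALUES (1, 1, 2, 'class of region 2')")
con.execute("INSERT INTO annotation VALUES (2, 2, 3, 'class of region 3')")

# Deleting the region created by view A removes its child region and both annotations.
con.execute("DELETE FROM region WHERE id = 2")
```

After the deletion, only the root region remains and no annotation is left, without any application code having to walk the hierarchy. The same mechanism applies to the `view_id` columns when a view is deleted.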
[0165]
[0166] As can be seen, both the data 901 and the views (with their conditions) 902 are linked to the dataset 900 to which they belong. The regions 903 are linked to the data 901 and to the views (with their conditions) 902 which carry them. Since a view must store a reference to its parent view (this reference is null in the case of a root view), the view table is linked to itself in
[0167] In comparison, a schematic of the same type is also shown in
[0168] The annotation of images in the case of an application for assisting technicians coming to connect a house or a building to the optical fiber according to the embodiments is now described with reference to
[0169] We first assume a first root classification view v1 1000 (V1) named “Context” with two classes (or type): “Wattmeter” and “Cabinet”. This allows us to differentiate between two types of photos taken by the technician: (i) either a photo of the device used to measure the signal power, called a wattmeter (we will then have to say whether the signal power displayed on the screen complies with the minimum threshold), (ii) or a photo of the fiber optic connection cabinet (we will then have to say whether the various connection zones are valid).
[0170] We also suppose a detection view v2 1001 (V2) named “Screen” having for parent the view v1 1000 and a single class (or type) “Screen”. Its role is to locate the screen of the power meter so that it can be read. This view 1001 has a condition: the region must be annotated “Wattmeter” in the parent view for an annotation to be associated with an image in this view.
[0171] We also suppose an OCR view v3 1002 named “Signal Quality” having for parent the v2 view 1001, whose goal is to annotate the text on the screen. This view has no condition, that is ∀r ∈ R, c_3(r) = 1 (according to the formalism described above).
[0172] We finally suppose a detection view v4 1003 named “Connection” having for parent the v1 view 1000 and two classes (or types) “OK” and “KO”. Its purpose is to identify compliant or non-compliant connection zones. This view has a condition: the region must be annotated “Cabinet” in the parent view for an annotation to be associated with an image in this view.
[0173] We now describe the link between data (in this case images) and regions 1004, annotations 1005 and views 1006 for two images 1007 and 1008. Each piece of data carries regions organized in the form of a tree starting from a root region that encompasses the whole piece of data. An annotation is attached to a region and a view. Depending on the type of view, the annotation can be a class, text, etc. Each view has a type and possibly a condition.
[0174] The image 1007 represents a power meter. We then define a first region 1009 which is a root region. We also define a sub-region 1010 around the power meter screen.
[0175] The region 1009 is thus annotated 1011 according to the “Wattmeter” class. The region 1010 receives two annotations: one is the “Screen” class 1012 and the other is the text recognized by OCR on the screen 1013 for example “−4.6 dB”. The annotations 1011, 1012, 1013 are thus respectively associated with views 1000, 1001, 1002.
[0176] The image 1008 represents a connection cabinet. We then define a first region 1014 which is a root region. We also define a sub-region 1015 and a sub-region 1016 which correspond to two different zones of the cabinet.
[0177] Region 1014 is thus annotated 1017 according to the “Cabinet” class. Region 1015 is annotated “OK” 1018 because it has a compliant connection area. The region 1016 receives a “KO” annotation 1019 because it has a non-conforming connection zone. Annotation 1017 is associated with view v1 1000 because the presence of the cabinet is a “Context” annotation. Annotations 1018 and 1019 are both associated with view v4 1003 because the good or bad connection is a “Connection” annotation.
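The fiber-connection example above can be sketched as a small in-memory data model. This is only an illustrative sketch: the `View` and `Region` classes, their field names and the `annotate` helper are assumptions introduced here, not structures defined by the description; only the reference numbers, view names and class labels come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class View:
    """Hypothetical view description: a task type, an optional parent view
    and an optional set of accepted parent classes (the condition)."""
    name: str
    task: str
    parent: Optional["View"] = None
    accepted: Optional[frozenset] = None  # None means "no condition"


@dataclass
class Region:
    """Hypothetical region: a node of the region tree carrying annotations,
    each annotation being keyed by the name of the view it belongs to."""
    ref: int
    parent: Optional["Region"] = None
    annotations: Dict[str, str] = field(default_factory=dict)

    def annotate(self, view: View, value: str) -> None:
        self.annotations[view.name] = value


# The four views of the example (reference numerals 1000 to 1003).
v1 = View("Context", "classification")
v2 = View("Screen", "detection", parent=v1, accepted=frozenset({"Wattmeter"}))
v3 = View("Signal Quality", "ocr", parent=v2)
v4 = View("Connection", "detection", parent=v1, accepted=frozenset({"Cabinet"}))

# Image 1007: root region 1009 and sub-region 1010 around the screen.
r1009 = Region(1009)
r1010 = Region(1010, parent=r1009)
r1009.annotate(v1, "Wattmeter")   # annotation 1011
r1010.annotate(v2, "Screen")      # annotation 1012
r1010.annotate(v3, "-4.6 dB")     # annotation 1013

# Image 1008: root region 1014 and two connection zones 1015 and 1016.
r1014 = Region(1014)
r1015 = Region(1015, parent=r1014)
r1016 = Region(1016, parent=r1014)
r1014.annotate(v1, "Cabinet")     # annotation 1017
r1015.annotate(v4, "OK")          # annotation 1018
r1016.annotate(v4, "KO")          # annotation 1019
```

Keying annotations by view rather than by image reflects the point made above: an annotation is attached to a region and a view, never directly to the data.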
[0178] In the above formalization, conditions are applied to annotations in the parent view in general. The preceding examples show that in particular it may be important to be able to define conditions on the annotated classes in the parent region. Variations according to embodiments are now described.
[0179] The condition of a view $v_j$ can, for instance, bear on the class annotated in the parent view (if this class is unique; see below for the multi-class case), so as to restrict annotation in $v_j$ to certain parent classes only. This makes it possible, for example, to specialize views for certain shooting contexts or objects, typically in order to structure and simplify the annotation work by having fewer classes (or types) to annotate.
[0180] Let $\mathrm{class}_j : A_j \to \mathbb{N}$ be the class function that associates an annotation of $v_j$ with its class, represented as an integer, and let $C_j$ be the set of acceptable classes. Then, given a region $r$, we have $c_j(r) = 1$ if $\mathrm{class}_{p(j)}(a_{p(j)}(r)) \in C_j$, and $c_j(r) = 0$ otherwise.
[0181] In the multi-class case, where the parent view allows annotating the region with several classes among a possible set of $N$ classes, we can represent the class function by $\mathrm{class}_j : A_j \to \{0,1\}^N$ and the acceptable classes as a Boolean formula over the classes, $C_j : \{0,1\}^N \to \{0,1\}$. In this case we have $c_j(r) = C_j(\mathrm{class}_{p(j)}(a_{p(j)}(r)))$. This variant encompasses the previous case, in which only one class is annotated at a time.
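The two forms of condition can be illustrated with a short sketch. The function names (`single_class_condition`, `multi_class_condition`, `one_hot`) and the sample class indices are assumptions made for illustration; the sketch only shows that a set of acceptable classes and a Boolean formula over a class vector yield the same kind of 0/1 condition, and that the single-class case can be expressed as a formula.

```python
from typing import Callable, FrozenSet, Tuple

Bits = Tuple[int, ...]  # a vector in {0,1}^N


def single_class_condition(accepted: FrozenSet[int]) -> Callable[[int], int]:
    """Condition equal to 1 when the parent class belongs to the accepted set."""
    def c(parent_class: int) -> int:
        return 1 if parent_class in accepted else 0
    return c


def multi_class_condition(formula: Callable[[Bits], bool]) -> Callable[[Bits], int]:
    """Condition equal to 1 when the Boolean formula holds on the class vector."""
    def c(parent_bits: Bits) -> int:
        return 1 if formula(parent_bits) else 0
    return c


def one_hot(k: int, n: int) -> Bits:
    """Encode a single class k among n classes as a vector in {0,1}^n."""
    return tuple(1 if i == k else 0 for i in range(n))


# Single-class case: accepted set {0, 2} among N = 3 parent classes.
c_single = single_class_condition(frozenset({0, 2}))

# The same condition written as a Boolean formula over the one-hot vector.
c_as_formula = multi_class_condition(lambda bits: bool(bits[0] or bits[2]))

# Multi-class case: class 0 must be present and class 1 absent.
c_multi = multi_class_condition(lambda bits: bool(bits[0] and not bits[1]))
```

Under these assumptions, `c_single(k)` and `c_as_formula(one_hot(k, 3))` agree for every class `k`, which is the sense in which the multi-class variant encompasses the single-class one.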
[0182] According to embodiments, when the annotation of the parent view does not comprise a notion of class, as for example in the case of pose estimation or a textual annotation, one can define “clusters” on which conditions can also be applied.
[0183] We use the notion of clustering when the annotation of a view $v_j$ does not directly provide a class (or type). This is for example the case for a pose estimation task, where the annotation corresponds to placing the nodes of a tree on the data. In this case, a partition of the annotation space can be calculated beforehand (clustering). This method divides the space $A_j$ into $N$ groups (or “clusters”), and any annotation $a \in A_j$ can be associated with the closest cluster, i.e. we have a function $\mathrm{class}_j : A_j \to \mathbb{N}$ that associates a class with an annotation.
[0184] In some embodiments, each annotation can be associated with multiple clusters and the class function has the general form $\mathrm{class}_j : A_j \to \{0,1\}^N$. We then fall back to the multi-class annotation case described above.
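As a rough illustration of the cluster-based class function, the sketch below assumes that annotations are points in a Euclidean space and that cluster centroids have already been computed by some clustering step. The nearest-centroid rule, the radius-based multi-cluster membership and all names are assumptions chosen for the example; the description does not prescribe a particular clustering method.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def make_class_fn(centroids: Sequence[Point]) -> Callable[[Point], int]:
    """Single-cluster variant: map an annotation to the index of the
    closest centroid."""
    def class_fn(annotation: Point) -> int:
        return min(range(len(centroids)),
                   key=lambda k: math.dist(annotation, centroids[k]))
    return class_fn


def make_multi_class_fn(centroids: Sequence[Point],
                        radius: float) -> Callable[[Point], Tuple[int, ...]]:
    """Multi-cluster variant (assumed rule): bit k is 1 when the annotation
    lies within `radius` of centroid k."""
    def class_fn(annotation: Point) -> Tuple[int, ...]:
        return tuple(1 if math.dist(annotation, c) <= radius else 0
                     for c in centroids)
    return class_fn


# Two hypothetical clusters in a 2-D annotation space.
centroids = [(0.0, 0.0), (10.0, 10.0)]
class_fn = make_class_fn(centroids)
multi_fn = make_multi_class_fn(centroids, radius=8.0)
```

The integer returned by `class_fn` can then be tested against a set of acceptable classes, and the vector returned by `multi_fn` against a Boolean formula, exactly as for views that provide classes directly.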
[0185] In the above, tasks and views have been associated in a bijective way. However, according to embodiments, a task can be divided over several views in order to notably reduce the number of classes to be annotated in each view. This allows a person (or an annotation module) annotating to focus on fewer classes, thus being more efficient and making fewer errors.
[0186] This does not change the general embodiment according to which, in practice, it is sufficient to keep the bijective link between view and task. It is only when training a machine learning algorithm on a set of annotated datasets that it is necessary to be able to combine two sister views (i.e. having the same parent) into a single view.
[0187] As explained, the views system allows for the automation of data manipulation and annotations, so that the data stream annotated by the different views corresponds to a set of datasets that would share a hierarchical structure. In this scheme, once data has been annotated for a given view, we can then train a machine learning model on the dataset that corresponds to the annotated data of that view.
[0188] According to some embodiments, this means that a model is trained on the data annotated by a certain view. However, it can be interesting to train a model on several views.
[0189] It is then necessary to be able to merge the annotated data from several views into a single annotated data set.
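One possible way to merge sister views can be sketched as follows. The representation of annotations as a mapping from view name to `{region id: label}`, the function name and the “Starters”/“Desserts” views are hypothetical; the sketch simply builds a single label space and groups the labels of all merged views by region.

```python
from typing import Dict, List, Tuple


def merge_sister_views(
    annotations_by_view: Dict[str, Dict[int, str]]
) -> Tuple[List[str], Dict[int, List[str]]]:
    """Merge the annotations of several sister views into one annotated set.

    Returns the combined list of classes and, for each region, the labels
    contributed by all the merged views.
    """
    classes: List[str] = []
    merged: Dict[int, List[str]] = {}
    for view in sorted(annotations_by_view):          # deterministic order
        for region, label in sorted(annotations_by_view[view].items()):
            if label not in classes:
                classes.append(label)
            merged.setdefault(region, []).append(label)
    return classes, merged


# Hypothetical task split over two sister views of a "Meal trays" project.
by_view = {
    "Starters": {1: "salad", 2: "soup"},
    "Desserts": {1: "fruit", 3: "cake"},
}
classes, merged = merge_sister_views(by_view)
```

Because the merge is computed from the stored annotations, each sister view can still be annotated and corrected separately before training.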
[0190] The steps of a process are now described with reference to
[0191] In what follows, we consider that data (for example images) are already stored in memory and associated with their root region. This association is made as soon as a data item is added to the database: a root region is created for each added data item. Alternatively, the following steps may provide for adding data to, or removing data from, an existing dataset, either by a user or automatically. These steps are not represented.
[0192] In the case of an application, the user can, for example, create an annotation project in which data to be annotated and annotations will be stored according to the invention. This is for example a “Meal trays” or “Fiber connection” project.
[0193]
[0194] Once the parent view is selected, a condition on that parent view is defined (step 1104) to allow filtering of the data to be offered for annotation for the view created in step 1101. The exact form of this condition depends on the task type of the parent view. For example, if the parent view is a classification or detection view where only one class can be annotated, the condition may take the form of a drop-down list from which the user selects the various parent classes of interest, thereby defining the set $C_j$ of acceptable classes defined above. If the parent task type is multi-class and allows several classes to be annotated on the same region, the condition can be defined by a Boolean formula whose clauses relate to the presence of certain parent classes. Then, the type of annotations for this view is created (step 1105), i.e. the type of associated task (classification, OCR, pose, detection or other). Depending on the type of task, the annotation configuration is then defined (step 1106). For example, for annotations based on classes, the classes of interest are defined: “Screen” or “Cabinet” in the example of the fiber connection. This step is optional because the classes can be modified after the view is saved in memory: the user can use the buttons in the action area 601 of the interface 600 in
[0195] If the view created is a root view, the process goes from step 1102 (Y) to step 1105 directly.
[0196] Once this process is completed, the view description is stored in memory (step 1107).
[0197] Next, with reference to
[0198] If it is a root view (Y), we initialize a loop (step 1203) during which we determine (step 1204) all the regions not yet annotated according to the current view. We therefore iterate over the root regions of the data in memory: if the region is not annotated by the view $v_j$, the region is selected (step 1205). Otherwise (N), the region is ignored. Iteration is typically done via a query to a database but can be implemented by a counter $i$ that traverses all root regions. In step 1206, if $i$ has not yet finished iterating over the root regions (N), the loop is incremented (step 1207); otherwise (Y), once all regions have been tested, a storage step 1208 of all regions filtered in step 1205 is performed. This is not a storage of a new dataset or the creation of a sub-dataset but the memorization of all the regions to be annotated according to the current view, typically by means of a cursor returned in response to the database query.
[0199] This step is a prerequisite to a display of the data to the user, for example, to prepare for the display of the data to be annotated in the area 705 of the interface 700 of
[0200] Back to step 1202, if the selected view is not a root view (N), then a loop is initialized (step 1209), during which, all regions of the data annotated by the parent view are determined (step 1210).
[0201] For each region annotated by the parent view (Y), it is determined whether it is annotated by the current view in step 1211. If the region is not annotated by the current view (Y), then it is determined (step 1212) whether the filtering condition defined on the parent view is met for the current region. If the condition is met (Y), then the region is selected (step 1215) to present it for annotation.
[0202] The process then continues until all regions have been considered. This is determined in step 1216. If there are still regions to be considered (N), the loop is incremented in step 1217 and the process continues at step 1210. Otherwise (Y), the process ends with step 1208 already described, storing the regions filtered in step 1215.
[0203] When testing steps 1210, 1211 and 1212, if the result is negative (N), the process continues with step 1216 as shown in
[0204] The data corresponding to the regions to be annotated, stored in step 1208, is either displayed for annotation by a user, for example via the area 705 of the interface 700, or provided to an automatic annotation module.
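The selection of regions described above, for root and non-root views, can be sketched as a database query. The two-table schema, the column names and the function `regions_to_annotate` are assumptions made for this sketch, and the condition is limited to the single-class form (a list of accepted parent classes). For region-creating views, whose annotations are attached to newly created sub-regions, a real implementation would need additional bookkeeping that is not shown here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE region (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES region(id)   -- NULL for a root region
);
CREATE TABLE annotation (
    region_id INTEGER REFERENCES region(id),
    view TEXT,
    value TEXT
);
"""


def regions_to_annotate(db, view, parent_view=None, accepted=None):
    """Return the ids of the regions to present for annotation in `view`.

    Root view: root regions not yet annotated by `view`.
    Other views: regions annotated by `parent_view`, not yet annotated by
    `view`, and whose parent annotation is one of the `accepted` classes.
    """
    not_done = ("NOT EXISTS (SELECT 1 FROM annotation a "
                "WHERE a.region_id = r.id AND a.view = ?)")
    if parent_view is None:
        sql = f"SELECT r.id FROM region r WHERE r.parent_id IS NULL AND {not_done}"
        params = [view]
    else:
        sql = ("SELECT r.id FROM region r "
               "JOIN annotation p ON p.region_id = r.id AND p.view = ? "
               f"WHERE {not_done}")
        params = [parent_view, view]
        if accepted is not None:
            sql += " AND p.value IN (%s)" % ",".join("?" * len(accepted))
            params += list(accepted)
    sql += " ORDER BY r.id"
    return [row[0] for row in db.execute(sql, params)]


# Two root regions; only 1009 has been annotated in the "Context" view.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO region VALUES (?, ?)", [(1009, None), (1014, None)])
db.execute("INSERT INTO annotation VALUES (?, ?, ?)", (1009, "Context", "Wattmeter"))

todo_context = regions_to_annotate(db, "Context")
todo_screen = regions_to_annotate(db, "Screen", "Context", ["Wattmeter"])
todo_connection = regions_to_annotate(db, "Connection", "Context", ["Cabinet"])
```

The query returns region identifiers only, so no sub-dataset is materialized: the result plays the role of the cursor mentioned for step 1208.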
[0205] This annotation (manual or automatic) is described with reference to
[0206] For each region $r$ stored in step 1208, the user (human or algorithm) is presented with the data $d$ to which $r$ is attached, cropped by $r$. Formally, at step 1300, $\mathrm{crop}(d, r)$ is presented to the user for annotation according to the view $v_j$ selected in 1201. In step 1301, an annotation is received for this data. It is then determined in step 1302 whether the task type related to $v_j$ is region-creating. If this is not the case (Y), then the annotation is stored in step 1303. As already indicated, this memorization is done in relation, not to the data, but to the current view and the region to which the data belongs.
[0207] If, on the other hand, the annotation creates one or more regions (N), denoted $\{r_k\}$, for example if it is a view linked to a detection task, we go to a step 1304 of memorizing a relationship between the created regions and the regions from which they originate. Then, for each of the created regions, a step 1305 of memorizing the annotation related to the created region and the current view is performed. For example, in the context of detection, we associate an annotated class on a bounding box with the corresponding region.
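The two storage branches (steps 1303 and 1304 to 1305) can be sketched as follows. The `Store` class, the `REGION_CREATING` set and the representation of a detection annotation as a list of (bounding box, class) pairs are assumptions made for this illustration.

```python
from itertools import count

# Assumed set of task types whose annotations create new regions.
REGION_CREATING = {"detection"}


class Store:
    """Hypothetical in-memory stand-in for the annotation database."""

    def __init__(self):
        self.parent_of = {}     # region id -> id of the region it originates from
        self.boxes = {}         # region id -> bounding box of a created region
        self.annotations = {}   # (region id, view name) -> annotation
        self._ids = count(1)

    def new_region(self, parent=None):
        rid = next(self._ids)
        self.parent_of[rid] = parent
        return rid


def store_annotation(store, region, view, task, annotation):
    """Store an annotation received for `region` in `view`.

    Returns the ids of the regions that carry the stored annotation(s).
    """
    if task not in REGION_CREATING:
        # Step 1303: attach the annotation to the region and the view.
        store.annotations[(region, view)] = annotation
        return [region]
    created = []
    for box, label in annotation:
        # Step 1304: memorize the link between created and originating region.
        rid = store.new_region(parent=region)
        store.boxes[rid] = box
        # Step 1305: memorize the annotation for the created region and view.
        store.annotations[(rid, view)] = label
        created.append(rid)
    return created


store = Store()
root = store.new_region()
store_annotation(store, root, "Context", "classification", "Cabinet")
zones = store_annotation(
    store, root, "Connection", "detection",
    [((0, 0, 10, 10), "OK"), ((20, 0, 30, 10), "KO")],
)
```

In both branches the key of a stored annotation is the pair (region, view), never the data item itself.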
[0208] At the end of the process, a database according to
[0209] As already explained, it is advantageous that no sub-datasets have been created. Thus, we can correct and update annotations at one level and easily propagate the changes to the child views, which contributes to the reliability of the annotation process.
[0210] Only a temporary visualization is generated, either to allow the user to view the data of the sub-datasets and annotate them, or to allow an automatic process to take them as inputs and annotate them. Since this process is carried out data item by data item, it remains lightweight and adds no execution complexity.
[0211] The generation of datasets is then done at the time of training the AI. Either it is done in one go, or on the fly as the training progresses.
[0212]
[0213] The process is initialized by step 1401 during which annotations are accessed, for example a database as shown in
[0214] The result of the filtering is stored in memory at step 1405, with the annotations associated with the view $v_j$. Thus, here we make the link between the data and the annotations. We thus start to build the dataset that will be used by the AI.
[0215] We then check (step 1406) if all the views have been considered. If there are still views (N), the loop is incremented (step 1407) and we return to step 1403.
[0216] If all views have been used (Y), then we proceed to step 1408 of providing the dataset to the AI which can then be trained in step 1409 depending on the application.
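A minimal sketch of dataset generation at training time is given below, written as a generator so that samples can be produced either in one go or on the fly. The storage layout (dictionaries keyed by region and view), the list-of-rows image representation and the names `crop` and `iter_samples` are assumptions of this sketch, not the described implementation.

```python
def crop(image, box):
    """Crop a list-of-rows image to the box (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def iter_samples(views, annotations, regions, images):
    """Yield (view, cropped data, annotation) for every selected view.

    annotations: {(region id, view name): annotation}
    regions:     {region id: (image id, bounding box)}
    images:      {image id: image}
    """
    for view in views:                        # loop over the views to train on
        for (region, v), label in annotations.items():
            if v != view:
                continue
            image_id, box = regions[region]
            # The link between data and annotation is only made here.
            yield view, crop(images[image_id], box), label


# Tiny hypothetical example: one 4x3 "image" with a root region and a sub-region.
images = {"img": [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]}
regions = {1: ("img", (0, 0, 4, 3)), 2: ("img", (1, 0, 3, 2))}
annotations = {(1, "Context"): "Wattmeter", (2, "Screen"): "Screen"}

screen_samples = list(iter_samples(["Screen"], annotations, regions, images))
all_samples = list(iter_samples(["Context", "Screen"], annotations, regions, images))
```

Calling `list(...)` builds the dataset in one go, whereas iterating over the generator inside a training loop produces the samples as training progresses.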
[0217] In addition, one can also foresee that the data is also linked to particular applications. In this case, the views will also be applied to a subset of the data related to the considered application.
[0218]
[0219] The system 1500 includes a communication bus to which are connected:
[0220] a processing unit 1501, such as a microprocessor, referred to as CPU;
[0221] a random access memory unit 1502, called RAM, for storing executable code of a method according to an embodiment of the invention, as well as registers adapted to store variables and parameters necessary for the implementation of a method according to embodiments, the memory capacity of which can be extended by an optional RAM connected to an expansion port, for example;
[0222] a memory unit 1503, called ROM, for storing computer programs for implementing the embodiments of the invention;
[0223] a network interface unit 1504 connected to a communication network over which the digital data to be processed is transmitted or received. The network interface 1504 may be a single network interface or composed of a set of different network interfaces (e.g., wired and wireless interfaces, or different types of wired and wireless interfaces). Data is written to the network interface for transmission or read from the network interface for reception under the control of the software application running in the CPU 1501;
[0224] a graphical user interface unit 1505 for receiving input from a user or displaying information to a user;
[0225] a hard disk 1506, denoted HD; and
[0226] an I/O module 1507 to receive/send data from/to external systems such as a video source or a display.
[0227] The executable code may be stored either in read-only memory 1503, on hard disk 1506, or on a removable digital medium such as a disk. According to one embodiment, the executable code of the programs may be received by means of a communication network, via the network interface 1504, in order to be stored in one of the storage means of the communication system 1500, such as the hard disk 1506, prior to being executed.
[0228] The central processing unit 1501 is adapted to control and direct the execution of instructions or portions of software code of the program(s) according to embodiments of the invention, such instructions being stored in any of the aforementioned storage means. After power-up, the processing unit 1501 is capable of executing instructions from the main RAM 1502 relating to a software application after such instructions have been loaded from the ROM 1503 or the hard disk (HD) 1506, for example. Such a software application, when executed by the CPU 1501, causes the steps of a method according to embodiments to be executed.
[0229] The present invention has been described and illustrated in this detailed description with reference to the accompanying figures. However, the present invention is not limited to the embodiments shown. Other variants, embodiments and combinations of features may be deduced and implemented by the person skilled in the art from the present description and the attached figures.
[0230] To meet specific needs, a person skilled in the field of the invention may apply modifications or adaptations.
[0231] In the claims, the term “comprise” does not exclude other elements or steps. The indefinite article “a” does not exclude the plural. The various features presented and/or claimed may advantageously be combined. Their presence in the description or in different dependent claims does not exclude the possibility of combining them. The reference signs should not be understood as limiting the scope of the invention.