Annotation Method of Arbitrary-Oriented Rectangular Bounding Box
20230019343 · 2023-01-19
Inventors
- Wenlong Song (Beijing, CN)
- Juan Lv (Beijing, CN)
- Changjun LIU (Beijing, CN)
- Rui Tang (Beijing, CN)
- Tao Sun (Beijing, CN)
- Xiaotao LI (Beijing, CN)
- June FU (Beijing, CN)
- He ZHU (Beijing, CN)
- Yizhu Lu (Beijing, CN)
- Long Chen (Beijing, CN)
- Hongjie Liu (Beijing, CN)
Cpc classification
G06V10/774
PHYSICS
G06V10/26
PHYSICS
G06V10/467
PHYSICS
G06V10/25
PHYSICS
International classification
G06V10/26
PHYSICS
G06V10/24
PHYSICS
Abstract
Disclosed in the present invention is an annotation method for an arbitrary-oriented rectangular bounding box. The elements of the annotation are: the coordinates of the center point C; a vector $\overrightarrow{CD}$ from the center point C to a chosen vertex D; and the ratio of the vector $\overrightarrow{CP}$ to the vector $\overrightarrow{CD}$, where $\overrightarrow{CP}$ is the projection of the vector $\overrightarrow{CE}$ onto $\overrightarrow{CD}$, and $\overrightarrow{CE}$ is the vector from the center of the bounding box to a vertex E adjacent to vertex D. It is also required that $\overrightarrow{CP}$ points in the same direction as $\overrightarrow{CD}$ and that vertex E lies on one fixed side of vertex D, either clockwise or counterclockwise. The symbolic notation of this method is $(x_c, y_c, u, v, \rho)$, where $x_c$ and $y_c$ are the two coordinates of the center point C, u and v are the two components of $\overrightarrow{CD}$, and $\rho$ is the ratio of $\overrightarrow{CP}$ to $\overrightarrow{CD}$. In addition, a binary value s indicates whether the two components of $\overrightarrow{CD}$ have the same sign, so that $\overrightarrow{CD}$ and $-\overrightarrow{CD}$ are represented together by $(|u|, |v|, s)$. This yields a method for annotating an arbitrary-oriented rectangular bounding box in which one bounding box has only one representation vector. Its symbolic notation is $(x_c, y_c, |u|, |v|, s, \rho)$, where |u| and |v| are the magnitudes of the two components of $\overrightarrow{CD}$. This method avoids loss inconsistency between representations of the same bounding box and is beneficial to model regression training.
Claims
1. An annotation method for an arbitrary-oriented rectangular bounding box, characterized in that the elements of the annotation are: the coordinates of the center point C, a vector $\overrightarrow{CD}$ from the center point C to a chosen vertex D, and the ratio of the vector $\overrightarrow{CP}$ to the vector $\overrightarrow{CD}$, where $\overrightarrow{CP}$ is the projection of the vector $\overrightarrow{CE}$ onto $\overrightarrow{CD}$, and $\overrightarrow{CE}$ is the vector from the center of the bounding box to a vertex E adjacent to vertex D; the vector $\overrightarrow{CP}$ points in the same direction as the vector $\overrightarrow{CD}$, and vertex E lies on one fixed side of vertex D, either clockwise or counterclockwise; the symbolic notation of this method is $(x_c, y_c, u, v, \rho)$, where $x_c$ and $y_c$ are the two coordinates of the center point C, u and v are the two components of $\overrightarrow{CD}$, and $\rho$ is the ratio of $\overrightarrow{CP}$ to $\overrightarrow{CD}$.
2. The annotation method for an arbitrary-oriented rectangular bounding box according to claim 1, characterized in that: a binary value s is used to indicate whether the two components of the vector $\overrightarrow{CD}$ have the same sign (both positive or both negative) or different signs (one positive and one negative), so that $\overrightarrow{CD}$ and $-\overrightarrow{CD}$ are represented together by $(|u|, |v|, s)$, which leads to one bounding box having only one representation vector; the symbolic notation is $(x_c, y_c, |u|, |v|, s, \rho)$, wherein |u| and |v| are the magnitudes of the two components of the vector $\overrightarrow{CD}$.
3. The annotation method for an arbitrary-oriented rectangular bounding box according to claim 1, characterized in that: letting $(u, v) = 2\overrightarrow{CD}$ makes this method compatible with the axis-aligned rectangle annotated by its center point, width and height; its symbolic notation is $(x_c, y_c, 2|u|, 2|v|, s, \rho)$.
4. The annotation method for an arbitrary-oriented rectangular bounding box according to claim 2, characterized in that: letting $(u, v) = 2\overrightarrow{CD}$ makes this method compatible with the axis-aligned rectangle annotated by its center point, width and height; its symbolic notation is $(x_c, y_c, 2|u|, 2|v|, s, \rho)$.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0014]
[0015]
DETAILED DESCRIPTION OF THE INVENTION
[0016] In
[0017] In
[0018] An annotation method for an arbitrary-oriented rectangular bounding box, used for defining anchor boxes, annotating sample images, and outputting predicted bounding boxes in target detection and tracking algorithms, wherein
[0019] the elements of the annotation are the coordinates of the center point C, a vector $\overrightarrow{CD}$ from the center point C to a chosen vertex D, and the ratio of the vector $\overrightarrow{CP}$ to the vector $\overrightarrow{CD}$, where $\overrightarrow{CP}$ is the projection of the vector $\overrightarrow{CE}$ onto $\overrightarrow{CD}$, and $\overrightarrow{CE}$ is the vector from the center of the bounding box to a vertex E adjacent to vertex D; the symbolic notation of this method is $(x_c, y_c, u, v, \rho)$, where $x_c$ and $y_c$ are the two coordinates of the center point C, u and v are the two components of $\overrightarrow{CD}$, and $\rho$ is the ratio of $\overrightarrow{CP}$ to $\overrightarrow{CD}$.
[0020] To reduce the number of representation vectors, the value of $\rho$ is required to lie in [0,1), i.e. the vector $\overrightarrow{CP}$ points in the same direction as the vector $\overrightarrow{CD}$, and vertex E lies on one fixed side of vertex D, either clockwise or counterclockwise. With this constraint, one bounding box has only two representation vectors. In other words, replacing $\overrightarrow{CD}$ with its opposite vector and leaving the rest unchanged still represents the same bounding box.
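As an illustration of the constraint in [0020], the following minimal sketch derives one valid annotation from the four vertices of a rectangle. The function name `encode_box`, the input format (four vertices in consecutive order) and the choice of the cross-product sign (CE × CD ≥ 0) are assumptions made for this example, not taken from the source.

```python
def encode_box(vertices):
    """Return (x_c, y_c, u, v, rho) for a rectangle given as 4 consecutive (x, y) vertices."""
    xc = sum(x for x, _ in vertices) / 4.0
    yc = sum(y for _, y in vertices) / 4.0
    for i in range(4):
        # Candidate vertex D and the vector CD = (u, v).
        u, v = vertices[i][0] - xc, vertices[i][1] - yc
        if u == 0 and v == 0:
            continue  # degenerate input: vertex coincides with the center
        for j in ((i + 1) % 4, (i - 1) % 4):
            # Candidate adjacent vertex E and the vector CE.
            ex, ey = vertices[j][0] - xc, vertices[j][1] - yc
            cross = ex * v - ey * u  # z-component of CE x CD
            dot = ex * u + ey * v    # CE . CD
            # Assumed convention: E on the side where CE x CD >= 0,
            # and CP along CD, which is equivalent to rho >= 0.
            if cross >= 0 and dot >= 0:
                return xc, yc, u, v, dot / (u * u + v * v)
    raise ValueError("vertices do not form a non-degenerate rectangle")


# Axis-aligned 4 x 2 rectangle centered at the origin.
print(encode_box([(2, 1), (-2, 1), (-2, -1), (2, -1)]))
```

For this rectangle the chosen vertex is D = (2, 1) with E = (2, -1), so ρ = 3/5. Starting from the opposite vertex (-2, -1) would give the second valid representation, with (u, v) negated and ρ unchanged.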
[0021] Since one bounding box still has two representation vectors, a means of avoiding loss inconsistency is needed: the loss function should produce the same value for a prediction against either of the two representation vectors. Because the two representations differ only in the direction of $\overrightarrow{CD}$, it suffices that the loss of a prediction $\overrightarrow{CD^*}$ is the same with respect to $\overrightarrow{CD}$ and $-\overrightarrow{CD}$. Let $\overrightarrow{CP}$ be the projection vector of $\overrightarrow{CD^*}$ onto $\overrightarrow{CD}$ (here the symbol P is reused for the projection of the prediction); then one usable loss function is:
$$\left|\overrightarrow{CD^*}-\overrightarrow{CP}\right|+\Bigl|\,\bigl|\overrightarrow{CD}\bigr|-\bigl|\overrightarrow{CP}\bigr|\,\Bigr|$$
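The loss above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `direction_free_loss` and the tuple-based vector format are hypothetical, and a real training loop would use a differentiable tensor library instead.

```python
import math


def direction_free_loss(pred, target):
    """Loss of a predicted vector CD* against a target vector CD, identical for CD and -CD."""
    px, py = pred
    dx, dy = target
    d2 = dx * dx + dy * dy
    # Projection CP of the prediction onto the line through CD.
    k = (px * dx + py * dy) / d2
    cpx, cpy = k * dx, k * dy
    # First term: distance of the prediction from that line, |CD* - CP|.
    off_axis = math.hypot(px - cpx, py - cpy)
    # Second term: difference between the lengths of CD and CP.
    length_gap = abs(math.sqrt(d2) - math.hypot(cpx, cpy))
    return off_axis + length_gap


pred = (1.5, 1.2)
print(direction_free_loss(pred, (2.0, 1.0)))
print(direction_free_loss(pred, (-2.0, -1.0)))  # same value as the line above
```

Negating the target leaves the projection unchanged, because both the scalar factor and the direction flip sign, which is why the two calls agree.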
[0022] As shown in
[0023] Because the two representations differ only in the direction of $\overrightarrow{CD}$, they can be represented together. A binary value s indicates whether the two components of $\overrightarrow{CD}$ are both positive or both negative (hereinafter "same sign") or one positive and one negative (hereinafter "different sign"). Then $\overrightarrow{CD}$ and $-\overrightarrow{CD}$ can be represented together by $(|u|, |v|, s)$, where |u| and |v| are the magnitudes of the two components of $\overrightarrow{CD}$. If the two components are of the same sign, $\overrightarrow{CD}$ and $-\overrightarrow{CD}$ are (|u|, |v|) and (−|u|, −|v|). If the two components are of different sign, $\overrightarrow{CD}$ and $-\overrightarrow{CD}$ are (−|u|, |v|) and (|u|, −|v|). This reduces the number of representation vectors of one bounding box to one; its symbolic notation is $(x_c, y_c, |u|, |v|, s, \rho)$.
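The folding of the two opposite vectors into (|u|, |v|, s) can be sketched as follows. The function names are hypothetical, and encoding "same sign" as s = 1 and "different sign" as s = 0 is an assumption, as is treating a zero component as "same sign"; the source only states that s is binary.

```python
def fold_sign(u, v):
    """Map CD = (u, v) to (|u|, |v|, s); CD and -CD give the same result."""
    s = 1 if u * v >= 0 else 0  # assumed encoding: 1 = same sign, 0 = different sign
    return abs(u), abs(v), s


def unfold_sign(au, av, s):
    """Recover the pair (CD, -CD) from (|u|, |v|, s)."""
    if s == 1:
        return (au, av), (-au, -av)
    return (-au, av), (au, -av)


print(fold_sign(2, 1), fold_sign(-2, -1))  # both fold to the same triple
print(unfold_sign(2, 1, 0))                # the two different-sign vectors
```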
[0024] Since the number of representation vectors has been reduced to one, calculating the loss is more convenient. When predicting a target box directly, the loss of $x_c$, $y_c$, |u|, |v| and $\rho$ can be calculated by regression, that is, directly from the difference between values, using for example SmoothL1 or L2. The loss of s can be calculated by classification: the model outputs two values for s, indicating the likelihood of the same-sign case and of the different-sign case. If the value for the same sign is larger, the two components are taken to have the same sign; otherwise they are taken to have different signs. The loss function can be CrossEntropy, L2, etc.
[0025] When using the feature vector to predict the regression parameters from the anchor box to the target box, it is possible to stipulate that an anchor box of the same sign regresses to a target box of the same sign, and an anchor box of the different sign regresses to a target box of the different sign. Then there is no need to calculate the loss of s.
[0026] When this method is used to annotate an axis-aligned rectangular bounding box, the magnitudes of the two components of the vector $\overrightarrow{CD}$ are half the width and half the height. So letting $(u, v) = 2\overrightarrow{CD}$ makes this method compatible with the axis-aligned rectangle annotated by its center point, width and height.
[0027] With this annotation method, the four vertices of the rectangle can be calculated by solving the following equations. The coordinates of $\overrightarrow{CE}$ are the unknowns; once $\overrightarrow{CE}$ is solved, the coordinates of the vertices can be calculated by vector addition and subtraction.
[0028] Here the first equation means that $\overrightarrow{EP}$ is perpendicular to $\overrightarrow{CD}$, the second equation means that the lengths of CE and CD are identical, and the constraint means that vertex E lies on one fixed side of vertex D, either clockwise or counterclockwise. Only one of $\overrightarrow{CE}\times\overrightarrow{CD}\geq 0$ and $\overrightarrow{CE}\times\overrightarrow{CD}\leq 0$ can be taken.
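The equations themselves are not reproduced in this text. One system consistent with the description in [0028] is (CE − ρ·CD) · CD = 0 and |CE| = |CD| with CE × CD ≥ 0; this is a reconstruction from the wording, not the original formula. Under that assumption the solution has a closed form, sketched below with a hypothetical function name `box_vertices`.

```python
import math


def box_vertices(xc, yc, u, v, rho):
    """Return the vertices D, E, -D, -E (in consecutive order) of the annotated rectangle."""
    # CE splits into rho * CD along CD plus a part perpendicular to CD.
    # |CE| = |CD| forces the perpendicular part to have length sqrt(1 - rho^2) * |CD|.
    k = math.sqrt(1.0 - rho * rho)
    # (v, -u) is CD rotated by -90 degrees; this choice gives CE x CD >= 0
    # (assumed convention; use (-v, u) for the opposite one).
    ex = rho * u + k * v
    ey = rho * v - k * u
    return [
        (xc + u, yc + v),    # D = C + CD
        (xc + ex, yc + ey),  # E = C + CE
        (xc - u, yc - v),    # vertex opposite D
        (xc - ex, yc - ey),  # vertex opposite E
    ]


for vertex in box_vertices(0.0, 0.0, 2.0, 1.0, 0.6):
    print(vertex)
```

With center (0, 0), CD = (2, 1) and ρ = 0.6, the result is the axis-aligned rectangle of width 4 and height 2, up to floating-point rounding.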
[0029] One embodiment thereof is: when annotating the sample image, the values of $x_c$, $y_c$, |u| and |v| are normalized by the image width ($w_i$) and height ($h_i$). For compatibility with the axis-aligned rectangle annotated by its center point, width and height, |u| and |v| are multiplied by 2. The corresponding values of the target bounding box in the annotation file are then $x_c/w_i$, $y_c/h_i$, $2|u|/w_i$, $2|v|/h_i$, s, $\rho$.
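A minimal sketch of this embodiment follows. The function name `annotation_row`, the tuple output and the encoding of s (1 for same sign, 0 for different sign) are assumptions for illustration; the source does not specify a file format.

```python
def annotation_row(xc, yc, u, v, rho, img_w, img_h):
    """Return the normalized annotation (x_c/w_i, y_c/h_i, 2|u|/w_i, 2|v|/h_i, s, rho)."""
    s = 1 if u * v >= 0 else 0  # assumed encoding of the sign flag
    return (
        xc / img_w,
        yc / img_h,
        2 * abs(u) / img_w,  # equals width / w_i for an axis-aligned box
        2 * abs(v) / img_h,  # equals height / h_i for an axis-aligned box
        s,
        rho,
    )


# Axis-aligned 4 x 2 box centered at (4, 2) in an 8 x 4 image.
print(annotation_row(4, 2, 2, 1, 0.6, 8, 4))
```

For an axis-aligned box the first four values coincide with the usual normalized center, width and height annotation, which is the compatibility the paragraph describes.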
[0030] Another embodiment thereof is: when it is stipulated that an anchor box of the same sign regresses to a target box of the same sign, and an anchor box of the different sign regresses to a target box of the different sign, the regression parameters from the anchor box to the target box can be defined using the following formulas:
$$t_x=(x_c^*-x_c^a)/w_a,\quad t_y=(y_c^*-y_c^a)/h_a$$
$$t_u=\ln\left(|u|^*/|u|^a\right),\quad t_v=\ln\left(|v|^*/|v|^a\right),\quad t_\rho=\ln\left(\rho^*/\rho^a\right)$$
[0031] Here $x_c^*$, $y_c^*$, $|u|^*$, $|v|^*$ and $\rho^*$ are the parameters of the target box; $x_c^a$, $y_c^a$, $|u|^a$, $|v|^a$ and $\rho^a$ are the parameters of the preset anchor box; and $t_x$, $t_y$, $t_u$, $t_v$ and $t_\rho$ are the regression parameters that transform the anchor box into the target box, which are also the values that the model needs to output directly.
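The regression parameters can be computed and inverted as in the sketch below. The function names are hypothetical, and the inverse mapping `decode_deltas` is not given in the source; it is simply the algebraic inverse of the formulas in [0030]. The scales w_a and h_a are not defined in this text, so they are passed in by the caller. The logarithms require |u|, |v| and ρ to be strictly positive for both boxes, so a square box with ρ = 0 is outside what this sketch handles.

```python
import math


def encode_deltas(target, anchor, w_a, h_a):
    """Regression parameters (t_x, t_y, t_u, t_v, t_rho) from anchor to target.

    Boxes are tuples (x_c, y_c, |u|, |v|, rho) with |u|, |v|, rho > 0.
    """
    xt, yt, ut, vt, rt = target
    xa, ya, ua, va, ra = anchor
    return (
        (xt - xa) / w_a,
        (yt - ya) / h_a,
        math.log(ut / ua),
        math.log(vt / va),
        math.log(rt / ra),
    )


def decode_deltas(deltas, anchor, w_a, h_a):
    """Apply predicted regression parameters to an anchor box (inverse of encode_deltas)."""
    tx, ty, tu, tv, tr = deltas
    xa, ya, ua, va, ra = anchor
    return (
        xa + tx * w_a,
        ya + ty * h_a,
        ua * math.exp(tu),
        va * math.exp(tv),
        ra * math.exp(tr),
    )


anchor = (10.0, 10.0, 4.0, 2.0, 0.5)
target = (12.0, 11.0, 8.0, 2.0, 0.25)
deltas = encode_deltas(target, anchor, w_a=8.0, h_a=4.0)
print(deltas)
print(decode_deltas(deltas, anchor, w_a=8.0, h_a=4.0))  # recovers the target up to rounding
```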