APPARATUS AND METHOD WITH OBJECT DETECTION
20230048497 · 2023-02-16
Assignee
Inventors
- Jinhyuk CHOI (Hwaseong-si, KR)
- SEHO SHIN (Seoul, KR)
- ByeongJu LEE (Seoul, KR)
- Sung Hyun CHUNG (Osan-si, KR)
- DAE HYUN JI (Hwaseong-si, KR)
CPC classification
G06V20/70
PHYSICS
International classification
G06V20/70
PHYSICS
Abstract
Disclosed is an apparatus and method with object detection. The method may include updating a pre-trained model based on sensing data of an image sensor, performing pseudo labeling using an interim model provided a respective training set, to generate pseudo labeled data, determining plural confidence thresholds based on an evaluation of the interim model, performing multiple trainings using the interim model and the generated pseudo labeled data, by applying the determined plural confidence thresholds to the multiple trainings, respectively, and generating an object detection model dependent on the performance of the multiple trainings, including generating an initial candidate object detection model when the interim model is the updated model.
Claims
1. A processor-implemented method, the method comprising: updating a pre-trained model based on sensing data of an image sensor; performing pseudo labeling using an interim model provided a respective training set, to generate pseudo labeled data; determining plural confidence thresholds based on an evaluation of the interim model; performing multiple trainings using the interim model and the generated pseudo labeled data, by applying the determined plural confidence thresholds to the multiple trainings, respectively; and generating an object detection model dependent on the performance of the multiple trainings, including generating an initial candidate object detection model when the interim model is the updated model.
2. The method of claim 1, wherein the updating of the pre-trained model comprises updating a first layer of the pre-trained model using the sensing data.
3. The method of claim 2, wherein the first layer is a batch normalization layer.
4. The method of claim 1, wherein the updating of the pre-trained model comprises performing image adaptation on the sensing data.
5. The method of claim 4, wherein the performing of the image adaptation on the sensing data comprises adjusting an intensity distribution of the sensing data.
6. The method of claim 1, wherein the pre-trained model is based on corresponding sensing data in a different format than the sensing data, and wherein the updating of the pre-trained model comprises converting the sensing data into the different format.
7. The method of claim 1, wherein each of the multiple trainings includes an implementing of the interim model, provided the generated pseudo labeled data, using a different confidence threshold, of the determined plural confidence thresholds, for obtaining a respective labeling result of the implemented interim model, and wherein each of the multiple trainings includes additional training based at least on the respective labeling result.
8. The method of claim 7, wherein the pre-trained model is based on sensing data of another image sensor having different characteristics than the image sensor.
9. The method of claim 7, wherein the multiple trainings are collectively repeated a plurality of times, after an initial time of the plurality of times when the interim model is the updated model, with the interim model being a previous candidate object detection model generated, in the generating of the object detection model, at an immediately previous time of the plurality of times, wherein the generating of the object detection model further includes generating another candidate object detection model at a final time of the plurality of times, and wherein the previous candidate object detection model at a time immediately after the initial time is the initial candidate object detection model.
10. The method of claim 9, wherein the generating of the object detection model includes selecting the object detection model from among plural candidate object detection models based on performance comparisons between the plural candidate object detection models, where the plural candidate object detection models include the initial candidate object detection model, the previous candidate object detection models respectively generated at the plurality of times, except at a time of the plurality of times immediately after the initial time, and the other candidate object detection model.
11. The method of claim 10, further comprising performing, by a vehicle, object detection using the generated object detection model provided an image captured by the image sensor.
12. The method of claim 9, wherein the evaluating of the interim model comprises determining a plurality of evaluation scores from respective implementations of the interim model using a plurality of thresholds and a respective validation set, wherein the determining of the plural confidence thresholds comprises: determining a first confidence threshold, of the plural confidence thresholds, used to determine a highest evaluation score of the determined evaluation scores; determining a second confidence threshold, of the plural confidence thresholds, greater than the determined first confidence threshold; and determining a third confidence threshold, of the plural confidence thresholds, less than the determined first confidence threshold.
13. The method of claim 9, wherein the multiple trainings at each of the plurality of times have respective trained model results, wherein, at each of the plurality of times, the generating of the object detection model generates a corresponding candidate object detection model by performing an ensemble of the respective trained model results.
14. The method of claim 1, wherein the generating of the object detection model comprises generating the initial candidate object detection model by performing an ensemble of the respective model results of the multiple trainings when the interim model is the updated model.
15. The method of claim 1, wherein the evaluating of the interim model comprises determining a plurality of evaluation scores from respective implementations of the interim model using a plurality of thresholds and a respective validation set, and wherein the determining of the plural confidence thresholds comprises: determining a first confidence threshold, of the plural confidence thresholds, used to determine a highest evaluation score of the determined evaluation scores; determining a second confidence threshold, of the plural confidence thresholds, greater than the determined first confidence threshold; and determining a third confidence threshold, of the plural confidence thresholds, less than the determined first confidence threshold.
16. The method of claim 15, wherein the performing of the multiple trainings comprises: performing a first training to which the determined first confidence threshold is applied using the interim model and the generated pseudo labeled data; performing a second training to which the determined second confidence threshold is applied using the interim model and the generated pseudo labeled data; and performing a third training to which the determined third confidence threshold is applied using the interim model and the generated pseudo labeled data.
17. The method of claim 1, wherein the performing of the pseudo labeling using the interim model includes: generating first pseudo labeled data by performing the pseudo labeling based on the updated model and a first unlabeled training set as the respective training set; generating second pseudo labeled data by performing the pseudo labeling based on the initial candidate object detection model and a second unlabeled training set as the respective training set; evaluating the initial candidate object detection model; determining confidence thresholds for the generated second pseudo labeled data based on a result of evaluating the initial candidate object detection model; performing multiple second trainings, among the multiple trainings, using the initial candidate object detection model and the generated second pseudo labeled data, by applying the confidence thresholds for the generated second pseudo labeled data to the multiple second trainings, respectively; and generating, in the generating the object detection model, a second candidate object detection model using results of the multiple second trainings, wherein the first unlabeled training set and the second unlabeled training set are the same or different training sets.
18. The method of claim 17, further comprising: repeating a plurality of times, after the generating of the initial candidate object detection model and except for an initial time of the plurality of times when the second candidate object detection model is generated: the performing of the pseudo labeling using, as the interim model at a corresponding time of the plurality of times, a previous candidate object detection model generated, in the generating of the object detection model, at an immediately previous time of the plurality of times; the evaluating of the interim model, at the corresponding time; the performing of the multiple trainings, at the corresponding time, with respect to the interim model; and a generating, in the generating the object detection model at the corresponding time, another candidate object detection model based on results of the multiple trainings at the corresponding time; and generating the object detection model by selecting the object detection model from among plural candidate object detection models based on performance comparisons between the plural candidate object detection models, where the plural candidate object detection models include the initial candidate object detection model, the previous candidate object detection models at the immediately previous times, and the other candidate object detection model at a final time of the plurality of times.
19. An apparatus, the apparatus comprising: a memory configured to store an object detection model; and a processor configured to perform object detection using an image from an image sensor and the object detection model, wherein, for the generation of the object detection model, the processor is configured to: update a pre-trained model based on sensing data of the image sensor; perform pseudo labeling using an interim model provided a respective training set, to generate pseudo labeled data; determine plural confidence thresholds based on an evaluation of the interim model; perform multiple trainings using the interim model and the generated pseudo labeled data, by applying the determined plural confidence thresholds to the multiple trainings, respectively; and generate the object detection model dependent on the performance of the multiple trainings, including generating a candidate object detection model when the interim model is the updated model.
20. The apparatus of claim 19, wherein the update of the pre-trained model comprises updating a first layer of the pre-trained model using sensing data of another image sensor that has the same characteristics as the image sensor.
21. The apparatus of claim 20, wherein the first layer is a batch normalization layer.
22. The apparatus of claim 19, wherein the update of the pre-trained model comprises performing image adaptation on the sensing data.
23. The apparatus of claim 19, wherein the processor is further configured to perform the evaluation by determining a plurality of evaluation scores for the interim model using a plurality of thresholds and a respective validation set, and wherein the determination of the confidence thresholds comprises: a determination of a first confidence threshold used to determine a highest evaluation score of the determined evaluation scores; a determination of a second confidence threshold greater than the determined first confidence threshold; and a determination of a third confidence threshold less than the determined first confidence threshold.
24. The apparatus of claim 23, wherein, for the performing of the multiple trainings, the processor is configured to: perform a training to which the determined first confidence threshold is applied using the interim model and the generated pseudo labeled data; perform a training to which the determined second confidence threshold is applied using the interim model and the generated pseudo labeled data; and perform a training to which the determined third confidence threshold is applied using the interim model and the generated pseudo labeled data.
25. The apparatus of claim 19, wherein the generation of the object detection model comprises generating the candidate object detection model by performing an ensemble of respective model results of the multiple trainings.
26. The apparatus of claim 19, wherein the processor is further configured to adjust an intensity distribution of the image using the object detection model.
27. The apparatus of claim 26, further comprising the image sensor.
28. The apparatus of claim 27, wherein the apparatus is a vehicle.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0054] Throughout the drawings and the detailed description, unless otherwise described or provided, the same drawing reference numerals will be understood to refer to the same or like elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
DETAILED DESCRIPTION
[0055] The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, with the exception of operations necessarily occurring in a certain order. Also, descriptions of features that are known after an understanding of the disclosure of this application may be omitted for increased clarity and conciseness.
[0056] The features described herein may be embodied in different forms and are not to be construed as being limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many possible ways of implementing the methods, apparatuses, and/or systems described herein that will be apparent after an understanding of the disclosure of this application.
[0057] The terminology used herein is for describing various examples only and is not to be used to limit the disclosure. The articles “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any one and any combination of any two or more of the associated listed items. As non-limiting examples, terms “comprise” or “comprises,” “include” or “includes,” and “have” or “has” specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, operations, members, elements, and/or combinations thereof.
[0058] Throughout the specification, when a component or element is described as being “connected to,” “coupled to,” or “joined to” another component or element, it may be directly “connected to,” “coupled to,” or “joined to” the other component or element, or there may reasonably be one or more other components or elements intervening therebetween. When a component or element is described as being “directly connected to,” “directly coupled to,” or “directly joined to” another component or element, there can be no other elements intervening therebetween. Likewise, expressions, for example, “between” and “immediately between” and “adjacent to” and “immediately adjacent to” may also be construed as described in the foregoing.
[0059] Although terms such as “first,” “second,” and “third”, or A, B, (a), (b), and the like may be used herein to describe various members, components, regions, layers, or sections, these members, components, regions, layers, or sections are not to be limited by these terms. Each of these terminologies is not used to define an essence, order, or sequence of corresponding members, components, regions, layers, or sections, for example, but used merely to distinguish the corresponding members, components, regions, layers, or sections from other members, components, regions, layers, or sections. Thus, a first member, component, region, layer, or section referred to in the examples described herein may also be referred to as a second member, component, region, layer, or section without departing from the teachings of the examples.
[0060] Unless otherwise defined, all terms, including technical and scientific terms, used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains and based on an understanding of the disclosure of the present application. Terms, such as those defined in commonly used dictionaries, are to be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the disclosure of the present application and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein. The use of the term “may” herein with respect to an example or embodiment, e.g., as to what an example or embodiment may include or implement, means that at least one example or embodiment exists where such a feature is included or implemented, while all examples are not limited thereto.
[0061] Herein, an “object” may be a target to be detected from an image, such as a person or a thing. The various types (classes) of the “object” to be detected may be defined in advance as various preset types, e.g., a person type, a vehicle type, an animal type, etc., as well as various sub-types of the same, noting that examples are not limited to only such various types.
[0062] An object detection model may be or include a deep convolutional neural network (DCNN). As a non-limiting example, the DCNN may include one or more convolutional layers, one or more pooling layers, and one or more fully connected layers. Here, the DCNN is provided as merely an example, and the object detection model may be or include a neural network or other machine learning model having a structure other than the DCNN.
[0064] Referring to
[0065] In
[0066] The computing apparatus 100 may perform model update 110 using the sensing data 101-1 of the image sensor 101 and a pre-trained model 102. The pre-trained model 102 may be a model trained based on labeled data of a previous image sensor, or a model for detecting an object from sensing data of the previous image sensor. The computing apparatus 100 may update the pre-trained model 102 using the sensing data 101-1 of the image sensor 101. An example model update 110 will be described in greater detail below with reference to
[0067] The computing apparatus 100 may perform pseudo labeling 120 using the model updated through model update 110 and a training set, e.g., a respective training set for this case of performing pseudo labeling 120 using the updated model. The training set may be a data set obtained by the image sensor 101. As will be described in greater detail below with reference to
[0068] The computing apparatus 100 may perform evaluation 130 of the model updated through model update 110, and determine confidence thresholds for the pseudo labeled data based on a result of evaluation. The confidence thresholds may have different values. An example evaluation 130 will be described in greater detail below with reference to
[0069] The computing apparatus 100 may perform a plurality of trainings using the updated model and the pseudo labeled data. Herein, such a plurality of trainings may also be referred to as multiple trainings using an interim model, e.g., where in this case the interim model would be the updated model, while in other cases the interim model may be a generated model based on model results of the multiple trainings. Such a model generated with respect to the updated model may also be referred to as an initial candidate model, while subsequent models may be generated based on repeating the multiple trainings collectively a plurality of times, such as described below with respect to
[0070] Returning to the case of the performing of the plurality of trainings with respect to the updated model, the computing apparatus 100 may perform a plurality of trainings by applying the confidence thresholds to the plurality of trainings, respectively. The computing apparatus 100 may generate an object detection model (hereinafter, referred to as “object detection model.sub.1” for ease of description) using results of the plurality of trainings.
[0071] The computing apparatus 100 may generate an object detection model.sub.2 by performing pseudo labeling 120, evaluation 130, and training 140 on the object detection model.sub.1, and generate an object detection model.sub.3 by performing pseudo labeling 120, evaluation 130, and training 140 on the object detection model.sub.2. In this way, the computing apparatus 100 may generate a plurality of object detection models, and determine an object detection model having a best or maximum performance of the plurality of object detection models, e.g., to desirably be the primary object detection model for the image sensor 101.
[0073] Referring to
[0074] The computing apparatus 100 may adjust (or change) a distribution, e.g., intensity distribution, of sensing data 101-1, e.g., sensing data 101-1 of
[0075] For smooth pseudo labeling 120, e.g., the pseudo labeling 120 of
Equation 1

if I_avg < I_thr^low and 2^μ ≤ I_avg < 2^(μ+1): I_xy = I_xy << (α − μ + 1) (1)

if I_avg > I_thr^high and 2^θ ≤ I_avg < 2^(θ+1): I_xy = I_xy >> (θ − β) (2)

[0076] 0 < I < 65536, about 16-bit RAW data
[0077] I_thr^low = 2^α, α = 11, 0 ≤ μ < 11
[0078] I_thr^high = 2^β, β = 14, 14 ≤ θ < 16
[0079] In Equation 1 above, I.sub.avg denotes an average intensity of the sensing data 101-1, I.sub.thr.sup.low denotes a lower threshold, and I.sub.thr.sup.high denotes an upper threshold.
[0080] When an image sensor 101, e.g., the image sensor 101 of
[0081] When the image sensor 101 collects the sensing data 101-1 in a high-intensity situation, e.g., situation in which a vehicle drives during the daytime, the computing apparatus 100 may adjust the intensity distribution of the sensing data 101-1 to be relatively low through (2) of Equation 1.
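The two cases of Equation 1 above can be sketched as follows; this is a minimal illustration assuming 16-bit RAW data, with the function name and numpy representation chosen for demonstration rather than taken from the source.

```python
import numpy as np

ALPHA, BETA = 11, 14                       # I_thr^low = 2**11, I_thr^high = 2**14
I_THR_LOW, I_THR_HIGH = 2 ** ALPHA, 2 ** BETA

def adjust_intensity(image: np.ndarray) -> np.ndarray:
    """Shift pixel intensities toward a mid-range average, per Equation 1."""
    i_avg = image.mean()
    out = image.astype(np.int64)
    if i_avg < I_THR_LOW:
        # mu brackets the average: 2**mu <= I_avg < 2**(mu+1)
        mu = int(np.floor(np.log2(max(i_avg, 1.0))))
        out = out << (ALPHA - mu + 1)      # raise low-intensity (e.g., nighttime) data
    elif i_avg > I_THR_HIGH:
        # theta brackets the average: 2**theta <= I_avg < 2**(theta+1)
        theta = int(np.floor(np.log2(i_avg)))
        out = out >> (theta - BETA)        # lower high-intensity (e.g., daytime) data
    return np.clip(out, 0, 65535).astype(np.uint16)
```

For example, a dark frame with an average intensity of 100 falls in the bracket μ = 6, so each pixel is shifted left by 6 bits, while a mid-range frame is left unchanged.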
[0082] The computing apparatus 100 may perform batch normalization update 220 on the pre-trained model 102 based on adjusted sensing data 210-1. More specifically, the computing apparatus 100 may update a batch normalization layer of the pre-trained model 102 through the adjusted sensing data 210-1. As an example, the computing apparatus 100 may update a mean and a variance of a batch through the below example Equation 2. Through this, the computing apparatus 100 may update the batch normalization layer of the pre-trained model 102 to match the adjusted sensing data 210-1.
[0083] At this stage, when updating the batch normalization layer, the computing apparatus 100 may not update values, e.g., weights in the pre-trained model 102, other than the mean and the variance of the batch, e.g., the computing apparatus 100 may update only the batch normalization layer of the pre-trained model 102 to match the adjusted sensing data 210-1.
[0084] For example, the sensing data 101-1 may be converted into the format of the data set used to train the pre-trained model 102. The batch normalization layer of the pre-trained model 102 may then be updated through the format-converted sensing data. For example, the sensing data 101-1 may be raw data while the data set that was used to previously train the pre-trained model 102 may have been in the RGB format, as only an example, so the sensing data 101-1 may be converted into the RGB format, and the batch normalization layer of the pre-trained model 102 may be updated using the converted sensing data in the RGB format to generate the updated model 230.
[0086] Referring to
[0087] The updated model 230 may generate first pseudo labeled data 310 by performing inference operations on the training set 301 input to the updated model 230. The updated model 230 may determine at least one bounding box in each of the images in the input training set 301, and calculate per-class probability values for the bounding box of each image. For example, the updated model 230 may determine a bounding box.sub.A in an image.sub.A in the input training set 301, calculate per-class probabilities, e.g., a probability of belonging to a vehicle class, a probability of belonging to a human class, a probability of belonging to an animal class, etc., for the bounding box.sub.A, and assign the calculated per-class probabilities as pseudo labels for the bounding box.sub.A. This may be performed for each image in the input training set 301.
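The pseudo labeling step above can be sketched as follows; `fake_detector` stands in for the updated model 230 and is an assumption introduced only for demonstration.

```python
def pseudo_label(detector, images):
    """Run the interim detector over an unlabeled set; its detections become labels."""
    labeled = []
    for image in images:
        for box, class_probs in detector(image):
            labeled.append({"image": image, "box": box, "probs": class_probs})
    return labeled

def fake_detector(image):
    # Illustrative stand-in: every image yields one bounding box with
    # per-class probabilities assigned as pseudo labels.
    return [((0, 0, 10, 10), {"vehicle": 0.8, "person": 0.15, "animal": 0.05})]
```

For example, `pseudo_label(fake_detector, training_set)` returns one pseudo-labeled record per detected box, each carrying the full per-class probability vector.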
[0089] Referring to
[0090] The computing apparatus 100 may apply a threshold th.sub.1 to the updated model 230. The updated model 230 to which the threshold th.sub.1 is applied may generate an object detection result upon receiving the validation set 401. As an example, the updated model 230 may determine a bounding box.sub.1 in an image.sub.1 in the validation set 401, calculate per-class probabilities for the bounding box.sub.1, and select an object in the bounding box.sub.1 as a final detection result when a highest probability of the calculated per-class probabilities is greater than or equal to the threshold th.sub.1. The computing apparatus 100 may classify an object detection result of the updated model 230 as false positive (FP), false negative (FN), true positive (TP), or true negative (TN) for the threshold th.sub.1. The computing apparatus 100 may calculate a precision and a recall for the threshold th.sub.1 through the classification result, and calculate a score.sub.1 using the precision and the recall. The score.sub.1 may be, for example, but not limited to, an F1 score.
[0091] Similarly, the computing apparatus 100 may calculate evaluation scores for the updated model 230 respectively when the other thresholds th.sub.2, . . . , th.sub.n are applied. As in the example shown in
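The per-threshold evaluation described above can be sketched as below; the toy detection records (confidence, correctness pairs) are assumptions for illustration, and the score is an F1 score as named in the source.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def evaluate_thresholds(detections, thresholds):
    """detections: list of (confidence, is_correct) over a validation set.

    For each candidate threshold, detections below it are discarded; correct
    detections that get discarded count as false negatives.
    """
    scores = {}
    total_correct = sum(1 for conf, ok in detections if ok)
    for th in thresholds:
        kept = [(conf, ok) for conf, ok in detections if conf >= th]
        tp = sum(1 for conf, ok in kept if ok)
        fp = sum(1 for conf, ok in kept if not ok)
        fn = total_correct - tp
        scores[th] = f1_score(tp, fp, fn)
    return scores
```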
[0092] The computing apparatus 100 may determine a threshold used to determine a highest evaluation score of the plurality of evaluation scores 420 to be a first confidence threshold β.sub.1, determine a second confidence threshold α.sub.1 greater than the first confidence threshold, and determine a third confidence threshold γ.sub.1 smaller than the first confidence threshold. As an example, the computing apparatus 100 may determine the first to third confidence thresholds β.sub.1, α.sub.1, and γ.sub.1 through the below example Equation 3.
Equation 3

β: optimal detection threshold (3)

α: β + 0.1 + ε (4)

γ: β − 0.1 + ε (5)
[0093] In Equation 3 above, optimal detection threshold denotes a threshold used for determining the highest evaluation score described above. ε denotes a constant.
[0094] Of the first to third confidence thresholds β.sub.1, α.sub.1 and γ.sub.1, the second confidence threshold α.sub.1 is the largest, and the third confidence threshold γ.sub.1 is the smallest.
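The threshold selection of Equation 3 can be sketched as follows; the function name and the example score table are illustrative assumptions.

```python
def confidence_thresholds(scores, eps=0.0):
    """Derive (beta, alpha, gamma) from per-threshold evaluation scores.

    beta is the threshold with the highest evaluation score (the optimal
    detection threshold); alpha and gamma are offset per Equation 3.
    """
    beta = max(scores, key=scores.get)
    alpha = beta + 0.1 + eps    # largest threshold: suppresses false positives
    gamma = beta - 0.1 + eps    # smallest threshold: suppresses false negatives
    return beta, alpha, gamma
```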
[0096] Referring to
[0097] The computing apparatus 100 may perform training 510 to which the first confidence threshold β.sub.1 is applied, using the updated model 230 and the first pseudo labeled data 310. For example, the computing apparatus 100 may perform training 510 of the updated model 230 through an image whose resulting pseudo label, e.g., the highest probability among the per-class probabilities for a particular pseudo label among all label classes, is greater than or equal to the first confidence threshold β.sub.1, among the plurality of images in the first pseudo labeled data 310. Of the confidence thresholds β.sub.1, α.sub.1, and γ.sub.1, the first confidence threshold β.sub.1 is the median. In training 510 to which the first confidence threshold β.sub.1 is applied, more false positives (FPs) may occur than in training 520 to which the second confidence threshold α.sub.1 is applied, and more false negatives (FNs) may occur than in training 530 to which the third confidence threshold γ.sub.1 is applied.
[0098] The computing apparatus 100 may perform training 520 to which the second confidence threshold α.sub.1 is applied, using the updated model 230 and the first pseudo labeled data 310. For example, the computing apparatus 100 may perform training 520 of the updated model 230 through an image having a pseudo label greater than or equal to the second confidence threshold α.sub.1 among the plurality of images in the first pseudo labeled data 310. Since the second confidence threshold α.sub.1 is the largest of the confidence thresholds β.sub.1, α.sub.1, and γ.sub.1, FPs may not occur in the result of training 520. In other words, the computing apparatus 100 may perform training 520 to which the highest confidence threshold α.sub.1 is applied so that FPs may not occur.
[0099] The computing apparatus 100 may perform training 530 to which the third confidence threshold γ.sub.1 is applied, using the updated model 230 and the first pseudo labeled data 310. For example, the computing apparatus 100 may perform training 530 of the updated model 230 through an image having a pseudo label greater than or equal to the third confidence threshold γ.sub.1 among the plurality of images in the first pseudo labeled data 310. Since the third confidence threshold γ.sub.1 is the smallest of the confidence thresholds β.sub.1, α.sub.1, and γ.sub.1, FNs may not occur in the result of training 530. In other words, the computing apparatus 100 may perform training 530 to which the smallest confidence threshold γ.sub.1 is applied so that FNs may not occur.
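The three trainings 510, 520, and 530 differ only in which pseudo-labeled samples clear their respective confidence threshold; a minimal sketch of that selection is below, with the sample records and threshold values chosen as illustrative assumptions.

```python
def select_training_data(pseudo_labeled, threshold):
    """Keep only samples whose best per-class probability clears the threshold."""
    return [s for s in pseudo_labeled if max(s["probs"].values()) >= threshold]

pseudo = [
    {"box": (0, 0, 8, 8), "probs": {"vehicle": 0.95, "person": 0.05}},
    {"box": (1, 1, 5, 5), "probs": {"vehicle": 0.55, "person": 0.45}},
    {"box": (2, 2, 6, 6), "probs": {"person": 0.45, "animal": 0.30}},
]
beta, alpha, gamma = 0.5, 0.6, 0.4
# Training 520 (alpha, strictest) keeps the fewest samples, reducing FP risk;
# training 530 (gamma, loosest) keeps the most, reducing FN risk.
```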
[0100] The computing apparatus 100 may generate the object detection model.sub.1 550 by performing the ensemble 540 of the results of the trainings 510, 520, and 530. For example, the computing apparatus 100 may perform the ensemble 540 of the results of the plurality of trainings 510, 520, and 530 through the below example Equation 4.
[0101] In Equation 4 above, M.sub.1 may correspond to the result of training 510, M.sub.2 may correspond to the result of training 520, M.sub.3 may correspond to the result of training 530, and ME may correspond to the object detection model.sub.1 550.
[0102] Thus, as only an example and depending on the implementation, the computing apparatus 100 may generate the object detection model.sub.1 550 by averaging respective weights of the results of the trainings 510, 520, and 530, for each weight of each of plural layers of the model. As an example, Table 1 below shows examples of weights of models trained respectively through the trainings 510, 520, and 530.
TABLE 1
Weights of model trained through training 510: w.sub.1_1, w.sub.2_1, . . . , w.sub.n_1
Weights of model trained through training 520: w.sub.1_2, w.sub.2_2, . . . , w.sub.n_2
Weights of model trained through training 530: w.sub.1_3, w.sub.2_3, . . . , w.sub.n_3
[0103] The computing apparatus 100 may generate the object detection model.sub.1 550 using the corresponding weights in Table 1 above. As an example, the computing apparatus 100 may average the corresponding weights in Table 1 above. Table 2 below shows examples of average results w.sub.1, w.sub.2, . . . , w.sub.n of the corresponding weights.
TABLE 2
  w.sub.1 = (w.sub.1_1 + w.sub.1_2 + w.sub.1_3)/3
  w.sub.2 = (w.sub.2_1 + w.sub.2_2 + w.sub.2_3)/3
  . . .
  w.sub.n = (w.sub.n_1 + w.sub.n_2 + w.sub.n_3)/3
[0104] The computing apparatus 100 may generate the object detection model.sub.1 550 having the weights w.sub.1, w.sub.2, . . . , w.sub.n of Table 2 above. As another example, the computing apparatus 100 may apply ratios to the corresponding weights of Table 1 above. The computing apparatus 100 may apply a ratio a to w.sub.1_1, apply a ratio b to w.sub.1_2, and apply a ratio c to w.sub.1_3. Table 3 below shows examples of results of applying such ratios to the corresponding weights.
TABLE 3
  w.sub.1 = a × w.sub.1_1 + b × w.sub.1_2 + c × w.sub.1_3
  w.sub.2 = a × w.sub.2_1 + b × w.sub.2_2 + c × w.sub.2_3
  . . .
  w.sub.n = a × w.sub.n_1 + b × w.sub.n_2 + c × w.sub.n_3
[0105] In Table 3 above, the sum of a, b, and c is “1”. The computing apparatus 100 may generate the object detection model.sub.1 550 having the weights w.sub.1, w.sub.2, . . . , w.sub.n of Table 3 above.
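The per-weight combinations of Tables 2 and 3 can be sketched as below. This is an illustrative sketch only, with each trained model reduced to a flat list of scalar weights (an assumption; real layers hold tensors), and the function names are not from the disclosure:

```python
def ensemble_average(m1, m2, m3):
    """Plain average, as in Table 2: w_i = (w_i_1 + w_i_2 + w_i_3) / 3."""
    return [(x + y + z) / 3 for x, y, z in zip(m1, m2, m3)]

def ensemble_ratio(m1, m2, m3, a, b, c):
    """Ratio-weighted combination, as in Table 3, with a + b + c = 1."""
    assert abs(a + b + c - 1.0) < 1e-9
    return [a * x + b * y + c * z for x, y, z in zip(m1, m2, m3)]

m_510 = [1.0, 2.0]  # weights of the model trained through training 510
m_520 = [3.0, 4.0]  # training 520
m_530 = [5.0, 6.0]  # training 530

print(ensemble_average(m_510, m_520, m_530))  # [3.0, 4.0]
print(ensemble_ratio(m_510, m_520, m_530, 0.5, 0.3, 0.2))
```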
[0106] The computing apparatus 100 may generate the object detection model.sub.2 by performing a pseudo labeling 120, an evaluation 130, and training 140 on the object detection model.sub.1 550. As an example, the pseudo labeling 120, evaluation 130, and training 140 may correspond to the repetition of the pseudo labeling 120, evaluation 130, and training 140 of
[0107] The computing apparatus 100 may generate second pseudo labeled data by performing pseudo labeling 120 based on a second training set and the object detection model.sub.1 550, e.g., a respective training set for this case of performing pseudo labeling 120 using the object detection model.sub.1 550. The second training set is a data set obtained through the image sensor 110, and may be an unlabeled data set. The second training set may be the same as or different from the training set described with reference to
[0108] The computing apparatus 100 may perform evaluation 130 on the object detection model.sub.1 550, and determine confidence thresholds β.sub.2, α.sub.2, and γ.sub.2 for the second pseudo labeled data based on a result of evaluation on the object detection model.sub.1 550. The description provided with reference to
[0109] Referring to
[0110] The computing apparatus 100 may generate an object detection model.sub.3 by again performing the pseudo labeling 120, the evaluation 130, and the training 140 on the object detection model.sub.2 650. In this way, the computing apparatus 100 may generate a plurality of object detection models, and select an object detection model having a best or maximum performance from among the plurality of object detection models. The selected object detection model may be stored in various devices, such as a memory or storage device of a vehicle, another electronic device, etc., and may perform object detection by receiving sensing data from an image sensor mounted on or in, or connected to, such a device.
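The iterative generate-and-select loop described above can be sketched as follows; `refine()` and `evaluate()` are hypothetical stand-ins for one round of the pseudo labeling 120, evaluation 130, and training 140, and the toy values are assumptions for illustration:

```python
def generate_models(initial_model, refine, evaluate, rounds):
    """Produce one candidate model per round and keep the best performer."""
    candidates = []
    model = initial_model
    for _ in range(rounds):
        model = refine(model)      # one round of pseudo labeling + training
        candidates.append(model)   # object detection model_1, model_2, ...
    return max(candidates, key=evaluate)  # select best/maximum performance

# Toy stand-ins: each round "improves" the model by 1, and the
# evaluation score is the model value itself.
best = generate_models(0, refine=lambda m: m + 1, evaluate=lambda m: m, rounds=3)
print(best)  # 3
```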
[0111] In an example, the image sensor 101 may be a newly released image sensor, and may have different characteristics from a previous image sensor that was used to generate the pre-trained model 102 of
[0112]
[0113] Referring to
[0114] In operation 720, the computing apparatus 100 may generate pseudo labeled data by performing pseudo labeling 120, e.g., any of the pseudo labeling 120 described herein, based on an updated model and training set, e.g., the updated model 230 of
[0115] In operation 730, the computing apparatus 100 may perform evaluation 130, e.g., any of the evaluations 130 described herein, on the updated model 230.
[0116] In operation 740, the computing apparatus 100 may determine confidence thresholds for the pseudo labeled data based on a result of the evaluation on the updated model 230.
[0117] In operation 750, the computing apparatus 100 may perform a plurality of trainings using the updated model 230 and the pseudo labeled data, by applying the confidence thresholds to the plurality of trainings, respectively.
[0118] As an example, as described with reference to
[0119] In operation 760, the computing apparatus 100 may generate an object detection model based on the results of the plurality of trainings.
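Operations 710 through 760 can be strung together as one function. Everything below is an illustrative sketch in which each operation is a caller-supplied function with toy stand-ins (all names and values are assumptions, not the disclosed implementation):

```python
def build_object_detection_model(pretrained, sensing_data, training_set,
                                 update, pseudo_label, evaluate,
                                 determine_thresholds, train, ensemble):
    updated = update(pretrained, sensing_data)                 # operation 710
    pseudo = pseudo_label(updated, training_set)               # operation 720
    result = evaluate(updated)                                 # operation 730
    thresholds = determine_thresholds(result)                  # operation 740
    trained = [train(updated, pseudo, t) for t in thresholds]  # operation 750
    return ensemble(trained)                                   # operation 760

# Toy stand-ins so the flow is runnable end to end.
model = build_object_detection_model(
    pretrained=1.0, sensing_data=1.0, training_set=[],
    update=lambda m, s: m + s,
    pseudo_label=lambda m, ts: ts,
    evaluate=lambda m: m,
    determine_thresholds=lambda r: [0.9, 0.5, 0.2],
    train=lambda m, p, t: m + t,
    ensemble=lambda models: sum(models) / len(models),
)
print(model)
```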
[0120] The description provided with reference to
[0121]
[0122] Referring to
[0123] As an example, the computing apparatus 800 may be any one or any combination of the computing apparatuses 100 described herein, and may further be any one or any combination of any of the computing apparatuses described herein.
[0124] The processor 810 may perform a model update 110, a pseudo labeling 120, an evaluation 130, and a training 140, such as described above with reference to the model update 110, the pseudo labeling 120, the evaluation 130, and the training 140 of any one or any combination of
[0125] The memory 820 may store information necessary for the processor 810 to perform the processing operation. As an example, the memory 820 may store instructions to be executed by the processor 810 and store sensing data 101-1, a pre-trained model 102, a training set 301, a validation set 401, e.g., the sensing data 101-1 and the pre-trained model 102 of
[0126] The memory 820 may store a result of the processing operation of the processor 810. As an example, the memory 820 may store the updated model 230, first pseudo labeled data 310, e.g., the first pseudo labeled data 310 of
[0127] Example embodiments further include the computing apparatus 800, or a combination of the computing apparatus 800 and the below computing apparatus 900 of
[0128]
[0129] Referring to
[0130] The memory 920 stores an object detection model. The processor 910 may be configured to generate any one or a plurality of the object detection models described herein through performance of any one or any combination of the respective operations, e.g., a model update 110, a pseudo labeling 120, an evaluation 130, and a training 140, described above with reference to the model update 110, or with respect to any one or any combination of such operations of
[0131] The processor 910 receives an image from an image sensor and performs object detection using the object detection model retrieved from the memory 920 and the received image. The processor 910 may provide an object detecting result including a position, e.g., reference position such as center position or corner, etc., of a bounding box in the received image, a size (width and height) of the bounding box, a class of an object, and the like.
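A detection result of this shape can be represented as a small record; the field names and the center-based reference position below are illustrative assumptions, not the disclosed format:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    cx: float      # reference position of the bounding box, e.g., center x
    cy: float      # center y
    width: float   # bounding-box width
    height: float  # bounding-box height
    label: str     # class of the detected object

def to_corners(d):
    """Convert center/size form to (x_min, y_min, x_max, y_max)."""
    return (d.cx - d.width / 2, d.cy - d.height / 2,
            d.cx + d.width / 2, d.cy + d.height / 2)

det = Detection(cx=50.0, cy=40.0, width=20.0, height=10.0, label="vehicle")
print(to_corners(det))  # (40.0, 35.0, 60.0, 45.0)
```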
[0132] In an example, the processor 910 may adjust an intensity distribution of the image using the object detection model. For example, the processor 910 may receive raw sensing data from the image sensor. The object detection model may perform image adaptation on the raw sensing data. For example, an input layer of the object detection model may be configured to adjust an intensity distribution of input data through Equation 1 above. Examples are not limited thereto, and the input layer of the object detection model may be configured to perform various operations typically performed by an ISP, e.g., contrast adjustment, distortion correction, etc. The subsequent layers of the object detection model may detect an object based on the adapted raw sensing data. Accordingly, in an example, the processor 910 may perform object detection based on the raw sensing data without an ISP for processing the raw sensing data of the image sensor.
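Equation 1 itself is not reproduced in this excerpt, so the sketch below substitutes a simple mean/standard-deviation normalization as a stand-in for an input layer that adjusts the intensity distribution of raw sensing data:

```python
def adapt_intensity(pixels, target_mean=0.0, target_std=1.0):
    """Shift and scale pixel intensities toward a target distribution.

    Stand-in for the image-adaptation input layer; the actual Equation 1
    of the disclosure may differ.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = var ** 0.5 or 1.0  # guard against a constant image
    return [(p - mean) / std * target_std + target_mean for p in pixels]

raw = [10.0, 20.0, 30.0]        # raw sensor intensities
adapted = adapt_intensity(raw)  # adapted intensities: mean ~0, std ~1
print(round(sum(adapted), 6))   # 0.0
```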
[0133] Example embodiments include computing apparatus 900 being applied in various fields. For example, the computing apparatus 900 may be, or be included in, advanced driver-assistance systems (ADAS)/autonomous driving (AD) systems of a vehicle. Examples are not limited thereto, and example embodiments include the computing apparatus 900, or a combination of the computing apparatuses 800 and 900, being a surveillance system, e.g., closed-circuit television (CCTV) surveillance, or military border surveillance, a sports game analysis system, a smart campus system, a video conference system, and the like. In addition, example embodiments include the computing apparatus 900, or the combination of the computing apparatuses 800 and 900, being configured with respect to any field, system, or device with object detection.
[0134]
[0135] In the examples of
[0136] The memory 1120 of the vehicle 1100 stores one or more object detection models, among any one or any combination of all models and data described herein with respect to captured, training, labeled, or pseudo-labeled images, as well as any sensing data with respect to the image sensors 1130, as non-limiting examples. For example, one or more of the processors 1110 may be configured to generate any one or a plurality of the object detection models described herein through performance of any one or any combination of the respective operations, e.g., a model update 110, a pseudo labeling 120, an evaluation 130, and a training 140, described above with reference to the model update 110, the pseudo labeling 120, the evaluation 130, and the training 140 of any one or any combination of the descriptions with respect to
[0137] The ADAS/AD systems of the vehicle 1000 and the ADAS/AD system 1150 of vehicle 1100 may generate information associated with the traveling of the vehicle 1000 and vehicle 1100, respectively. The information associated with the traveling of the vehicle 1000 may be data used to assist in the traveling of the vehicle 1000 or used for the traveling of the vehicle 1000, and include, for example, route guidance information, danger warning information, e.g., information about an accident such as a collision, road condition information, e.g., road congestion, and surrounding environment information. Such information may also be generated and/or provided by the ADAS/AD system 1150 and/or the information output 1170 of
[0138] The image sensor in the vehicle 1000 of
[0139] The ADAS/AD systems of the vehicle 1000 and the ADAS/AD system 1150, and/or one or more processors 1110 of the vehicle 1100, may perform or control autonomous driving based on a result of object detection by the computing apparatus. For example, the ADAS/AD systems of the vehicle 1000 may perform any one or any combination of speed control, acceleration control, and steering control of the vehicle 1000, as non-limiting examples. Likewise, as an example, the ADAS/AD system 1150 of the vehicle 1100 may perform any one or any combination of speed control, acceleration control, and steering control of the vehicle 1100, e.g., based on corresponding control or instructions from the ADAS/AD system 1150 to the vehicle operation/function 1140 of the vehicle 1100 to implement such physical controls of the speed, acceleration, and steering of the vehicle 1100. For example, the ADAS/AD systems of the vehicle 1000 or the ADAS/AD system 1150 of the vehicle 1100 (or a combination of the ADAS/AD system 1150 and the vehicle operation/function 1140) may calculate a distance to an object existing in the vicinity or environment of the vehicle 1000 or the vehicle 1100, and control, cause, or perform any one or any combination of speed changes, e.g., increase or decrease, acceleration changes, e.g., increase or decrease, and steering changes for the vehicle 1000 or the vehicle 1100 based on the distance to the object, as non-limiting examples.
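A toy version of such distance-based control might look like the following; the thresholds and the policy are assumptions for illustration, not the disclosed control law:

```python
def speed_command(distance_m, safe_distance_m=30.0):
    """Return a signed speed change based on distance to a detected object."""
    if distance_m < safe_distance_m / 2:
        return -10.0  # well inside the safe distance: brake firmly
    if distance_m < safe_distance_m:
        return -5.0   # approaching the safe distance: ease off
    return 0.0        # clear of the object: keep speed

print(speed_command(10.0))  # -10.0
print(speed_command(20.0))  # -5.0
print(speed_command(40.0))  # 0.0
```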
[0140] The computing apparatuses, the vehicles, the electronic devices, the processors, the memories, the image sensors, the vehicle/operation function hardware, the ADAS/AD systems, the displays, the information output system and hardware, the storage devices, and other apparatuses, devices, units, modules, and components described herein with respect to
[0141] The methods illustrated in
[0142] Instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above may be written as computer programs, code segments, instructions or any combination thereof, for individually or collectively instructing or configuring the one or more processors or computers to operate as a machine or special-purpose computer to perform the operations that are performed by the hardware components and the methods as described above. In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler. In another example, the instructions or software include higher-level code that is executed by the one or more processors or computers using an interpreter. The instructions or software may be written using any programming language based on the block diagrams and the flow charts illustrated in the drawings and the corresponding descriptions herein, which disclose algorithms for performing the operations that are performed by the hardware components and the methods as described above.
[0143] The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of a non-transitory computer-readable storage medium include read-only memory (ROM), random-access programmable read only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROMs, CD-Rs, CD+Rs, CD-RWs, CD+RWs, DVD-ROMs, DVD-Rs, DVD+Rs, DVD-RWs, DVD+RWs, DVD-RAMs, BD-ROMs, BD-Rs, BD-R LTHs, BD-REs, blue-ray or optical disk storage, hard disk drive (HDD), solid state drive (SSD), flash memory, a card type memory such as multimedia card micro or a card (for example, secure digital (SD) or extreme digital (XD)), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device that is configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
[0144] While this disclosure includes specific examples, it will be apparent after an understanding of the disclosure of this application that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.
[0145] Therefore, in addition to the above disclosure, the scope of the disclosure may also be defined by the claims and their equivalents, and all variations within the scope of the claims and their equivalents are to be construed as being included in the disclosure.