Adaptive boosting machine learning
11494590 · 2022-11-08
Assignee
Inventors
CPC classification
G06V10/772
PHYSICS
G06F18/2148
PHYSICS
G06F18/285
PHYSICS
International classification
Abstract
An apparatus comprising memory configured to store data to be machine-recognized (710), and at least one processing core configured to run an adaptive boosting machine learning algorithm with the data, wherein a plurality of learning algorithms are applied, wherein a feature space is partitioned into bins, wherein a distortion function is applied to features of the feature space (720), and wherein a first derivative of the distortion function is not constant (730).
Claims
1. An apparatus comprising: memory configured to store data to be machine-recognized; at least one processing core configured to run an adaptive boosting machine learning algorithm with the data, wherein a plurality of learning algorithms are applied, wherein a feature space is partitioned into bins, wherein a distortion function is applied to features of the feature space, and wherein a first derivative of the distortion function is not constant.
2. The apparatus according to claim 1, wherein each of the learning algorithms operates on a distinct bin.
3. The apparatus according to claim 1, wherein the at least one processing core is configured to partition the feature space such that at least a subset of the bins are uniformly sized.
4. The apparatus according to claim 1, wherein the at least one processing core is configured to partition the feature space such that all of the bins are uniformly sized.
5. The apparatus according to claim 1, wherein the at least one processing core is configured to partition the feature space such that there is no overlap between any two of the bins.
6. The apparatus according to claim 1, wherein the at least one processing core is configured to partition the feature space into the bins after applying the distortion function to the features.
7. The apparatus according to claim 1, wherein in the adaptive boosting machine learning algorithm, each bin is treated independently.
8. The apparatus according to claim 1, wherein the at least one processing core is configured to partition the feature space into 256 bins.
9. The apparatus according to claim 1, wherein the at least one processing core is configured to determine, in the adaptive boosting machine learning algorithm, one of the learning algorithms as an optimal classifier in each iteration.
10. The apparatus according to claim 1, wherein the at least one processing core is configured to determine, in the adaptive boosting machine learning algorithm, a final output as a weighted sum of outputs of each of the plurality of learning algorithms.
11. The apparatus according to claim 1, wherein the apparatus is a smartphone or an automobile; wherein the data is image data; and wherein a final output of the stored data that is machine-recognized is at least one of the following: a human face, a pedestrian, a bicyclist, handwriting, a traffic sign, sign language, a text, or a document.
12. The apparatus according to claim 1, wherein the apparatus is a smartphone; wherein the stored data is acceleration sensor data; and wherein a final output of the stored data that is machine-recognized is at least one of the following: a person is walking, the person is running, or the person is falling.
13. A method comprising: storing data to be machine-recognized; running an adaptive boosting machine learning algorithm with the data, wherein a plurality of learning algorithms are applied, wherein a feature space is partitioned into bins, wherein a distortion function is applied to features of the feature space, and wherein a first derivative of the distortion function is not constant.
14. The method according to claim 13, wherein each of the learning algorithms operates on a distinct bin.
15. The method according to claim 13, wherein the feature space is partitioned such that at least a subset of the bins are uniformly sized.
16. The method according to claim 13, wherein the feature space is partitioned such that all of the bins are uniformly sized.
17. The method according to claim 13, wherein the feature space is partitioned such that there is no overlap between any two of the bins.
18. The method according to claim 13, wherein partitioning the feature space into the bins takes place after applying the distortion function to the features.
19. The method according to claim 13, wherein in the adaptive boosting machine learning algorithm, each bin is treated independently.
20. The method according to claim 13, wherein the feature space is partitioned into 256 bins.
21. The method according to claim 13, further comprising determining, in the adaptive boosting machine learning algorithm, one of the learning algorithms as an optimal classifier in each iteration.
22. A non-transitory computer readable medium having stored thereon a set of computer readable instructions that, when executed by at least one processor, cause an apparatus to at least: store data to be machine-recognized, and run an adaptive boosting machine learning algorithm with the data, wherein a plurality of learning algorithms are applied, wherein a feature space is partitioned into bins, wherein a distortion function is applied to features of the feature space, and wherein a first derivative of the distortion function is not constant.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
EMBODIMENTS
(8) In an AdaBoost algorithm, a feature space may be partitioned into uniformly sized bins or into non-uniform bins. While a uniform partitioning, into bins of the same size, is not always optimal, using a non-uniform partitioning, wherein bins are of differing sizes, involves more complex processing. A distortion function may be applied to features when using uniform partitioning, to obtain processing that is simpler than in the case of non-uniform partitioning, but with at least some benefits of non-uniform partitioning. For example, the distortion function may comprise a stretching function. In some embodiments of the invention, using the distortion function results in uniform-partitioning processing that is equivalent in results to using non-uniform partitioning. Terminologically, using the distortion function may be referred to as implicitly non-uniform partitioning.
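To make the idea concrete, one possible stretching function can be written directly in Python. The arctangent form and the parameter s below are illustrative assumptions, not taken from the source; the description only requires a monotonically increasing map of the feature space onto itself whose first derivative is not constant:

```python
import math

def stretch(x, xc, s=8.0):
    """Hypothetical stretching (distortion) function on [0, 1].

    Monotonically increasing, with f(0) = 0 and f(1) = 1, and with its
    largest slope near the centre xc: features near xc are spread out
    while features elsewhere are compressed. The parameter s controls
    the strength of the stretching; as s -> 0 the map approaches the
    trivial identity f(x) = x, whose first derivative is constant.
    """
    lo = math.atan(s * (0.0 - xc))
    hi = math.atan(s * (1.0 - xc))
    return (math.atan(s * (x - xc)) - lo) / (hi - lo)
```

Because the map is normalized by its values at 0 and 1, no feature is stretched outside the feature space, matching the f(0)=0, f(1)=1 property discussed later in the description.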
(9)
(10) In
(11) An image feed from camera 130 may be provided to image recognition algorithm 140. Physically, image recognition algorithm 140 may operate in the same device as camera 130, or, alternatively, it may reside in another device. In some embodiments, image recognition algorithm 140 is arranged in a distinct computing node, which may comprise a cloud computing node, a server, or other suitable device.
(12) Camera 130 may output an image feed, which may comprise an image frame, for example. The image frame may be digital and/or rectangular. The image frame may be provided to a filter stage, which obtains, based on the image frame, a filtered dataset that comprises features extracted from the image frame. The filter stage may comprise a plurality of filters, each designed to perform a task, such as edge detection, thresholding, feature wavelength analysis, obtaining Gaussian derivatives and/or similar tasks. The filter stage may comprise at least one neural network layer, for example. The filter layer may comprise at least one Gabor filter or Gabor wavelet filter. The features may be comprised, for example, in at least one matrix or at least one vector of numerical values, extracted by filtering from the image frame.
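As a sketch of one such filter, a single Gabor kernel of the kind mentioned above can be built in plain Python. This is the generic textbook formulation; the kernel size and parameter values are illustrative assumptions, not values specified by the source:

```python
import math

def gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0, gamma=0.5):
    """Build a size x size Gabor kernel (real part) as nested lists.

    wavelength: wavelength of the sinusoidal carrier
    theta:      orientation of the filter in radians
    sigma:      standard deviation of the Gaussian envelope
    gamma:      spatial aspect ratio of the envelope
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

Convolving the image frame with a bank of such kernels at several orientations and wavelengths yields the kind of numerical feature matrix the paragraph describes.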
(13) Where the incoming data to be recognized is not visual data, the incoming data may comprise, for example, a vector of digital samples obtained from an analogue-to-digital converter. The analogue-to-digital converter may obtain an analogue feed from a microphone, for example, and generate the samples from the analogue feed. Overall, as discussed above, data of other forms may also be the subject of machine recognition. For example, accelerometer or rotation sensor data may be used to detect whether a person is walking, running or falling.
(14) The features may be modified by applying the distorting function, which may comprise a stretching function, to the features, before providing the distorted features to an AdaBoost algorithm that employs uniform partitioning of the feature space. The feature space may comprise the linear space of the at least one matrix of vector, in which the features are present as numerical values.
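The order of operations matters here: the distortion is applied first, and uniform bins are then taken in the distorted domain, which corresponds to non-uniform bins in the original domain. A minimal Python sketch (the function names and the quadratic example distortion are illustrative assumptions):

```python
def bin_index(x, n_bins, distort):
    """Map a feature x in [0, 1] to a uniform bin index after distortion.

    Uniform bins in the distorted domain correspond to non-uniform bins
    in the original domain -- the "implicitly non-uniform" partitioning.
    distort may be any monotonically increasing map of [0, 1] onto itself.
    """
    xd = distort(x)
    return min(int(xd * n_bins), n_bins - 1)
```

For example, with the (hypothetical) quadratic distortion `lambda x: x * x`, a feature at 0.5 lands in bin 64 of 256 rather than bin 128, because that map compresses the lower part of the feature space.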
(15)
(16) The lowest part of
(17)
(18) Device 300 may comprise memory 320. Memory 320 may comprise random-access memory and/or permanent memory. Memory 320 may comprise at least one RAM chip. Memory 320 may comprise solid-state, magnetic, optical and/or holographic memory, for example. Memory 320 may be at least in part accessible to processor 310. Memory 320 may be at least in part comprised in processor 310. Memory 320 may be means for storing information. Memory 320 may comprise computer instructions that processor 310 is configured to execute. When computer instructions configured to cause processor 310 to perform certain actions are stored in memory 320, and device 300 overall is configured to run under the direction of processor 310 using computer instructions from memory 320, processor 310 and/or its at least one processing core may be considered to be configured to perform said certain actions. Memory 320 may be at least in part comprised in processor 310. Memory 320 may be at least in part external to device 300 but accessible to device 300. Computer instructions in memory 320 may comprise a plurality of applications or processes. For example, machine learning algorithms, such as an AdaBoost algorithm with its classifiers, may run in one application or process, a camera functionality may run in another application or process, and an output of a machine learning procedure may be provided to a further application or process, which may comprise an automobile driving process, for example, to cause a braking action to be triggered responsive to recognition of a pedestrian in a camera view.
(19) Device 300 may comprise a transmitter 330. Device 300 may comprise a receiver 340. Transmitter 330 and receiver 340 may be configured to transmit and receive, respectively, information in accordance with at least one communication standard. Transmitter 330 may comprise more than one transmitter. Receiver 340 may comprise more than one receiver. Transmitter 330 and/or receiver 340 may be configured to operate in accordance with wireless local area network, WLAN, Ethernet, universal serial bus, USB, and/or worldwide interoperability for microwave access, WiMAX, standards, for example. Alternatively or additionally, a proprietary communication framework may be utilized.
(20) Device 300 may comprise user interface, UI, 360. UI 360 may comprise at least one of a display, a keyboard, a touchscreen, a vibrator arranged to signal to a user by causing device 300 to vibrate, a speaker and a microphone. A user may be able to operate device 300 via UI 360, for example to configure machine learning parameters and/or to switch device 300 on and/or off.
(21) Processor 310 may be furnished with a transmitter arranged to output information from processor 310, via electrical leads internal to device 300, to other devices comprised in device 300. Such a transmitter may comprise a serial bus transmitter arranged to, for example, output information via at least one electrical lead to memory 320 for storage therein. Alternatively to a serial bus, the transmitter may comprise a parallel bus transmitter. Likewise processor 310 may comprise a receiver arranged to receive information in processor 310, via electrical leads internal to device 300, from other devices comprised in device 300. Such a receiver may comprise a serial bus receiver arranged to, for example, receive information via at least one electrical lead from receiver 340 for processing in processor 310. Alternatively to a serial bus, the receiver may comprise a parallel bus receiver.
(22) Device 300 may comprise further devices not illustrated in FIG. 3.
(23) Processor 310, memory 320, transmitter 330, receiver 340, and/or UI 360 may be interconnected by electrical leads internal to device 300 in a multitude of different ways. For example, each of the aforementioned devices may be separately connected to a master bus internal to device 300, to allow for the devices to exchange information. However, as the skilled person will appreciate, this is only one example and depending on the embodiment various ways of interconnecting at least two of the aforementioned devices may be selected without departing from the scope of the present invention.
(24)
(25)
(26) where x.sub.c denotes a center of a stretching effect created by this distortion function. The area of feature space being stretched is schematically denoted as 410 in the figure. Without distortion, the function would take the trivial form f(x)=x, which has a constant first derivative. In the instant case, features around point x.sub.c are spread out, with features outside area 410 being compressed in the remaining feature space. In general, distortion functions may be monotonically increasing, for example non-linearly increasing, meaning the first derivative is not constant. In general, distortion functions may satisfy f(0)=0 and f(1)=1, when the feature space is scaled to have dimension 1, to prevent features from being stretched outside the feature space. In general, a distortion function may act to stretch a first part of feature space and compress a second part of feature space. The second part may be disposed on both sides of the first part.
(27) The weighted training error Z.sub.t is defined as in classical AdaBoost with domain-partitioning weak classifiers; reference is made to R. E. Schapire and Y. Singer: “Improved boosting algorithms using confidence-rated predictions”, Machine Learning, 37(3): 297-336, 1999. Let S=[(x.sub.1,y.sub.1), . . . , (x.sub.m,y.sub.m)] be a sequence of training samples where each instance x.sub.i belongs to a feature space or instance space χ and each label y.sub.i belongs to a binary label space Y={+1,−1}. Generally speaking, AdaBoost is an iterative process where an optimal weak classifier h.sub.t(x) is computed in each iteration t. The final output is the strong classifier
(28) H(x)=sign(Σ.sub.t=1.sup.T α.sub.t h.sub.t(x)),
which combines the weak classifiers with optimal weights α.sub.t. The goal of each iteration is to compute a weak classifier h.sub.t(x): χ→ℝ given a sequence of training samples S along with a distribution D over {1, . . . , m}, that is, over the indices of S. The weak classifier can be obtained by minimizing the weighted training error Z.sub.t (reference is here made to R. E. Schapire and Y. Singer: “Improved boosting algorithms using confidence-rated predictions”, Machine Learning, 37(3): 297-336, 1999):
(29) Z.sub.t=Σ.sub.i=1.sup.m D.sub.t(i)exp(−α.sub.t y.sub.i h.sub.t(x.sub.i))
(30) By folding α.sub.t into h.sub.t, Z.sub.t and H can be expressed as
(31) Z.sub.t=Σ.sub.i=1.sup.m D.sub.t(i)exp(−y.sub.i h.sub.t(x.sub.i)) and H(x)=sign(Σ.sub.t=1.sup.T h.sub.t(x)),
(32) respectively. For the sake of notational simplicity, in what follows we omit the subscript t of h.sub.t, D.sub.t, and α.sub.t.
(33) The minimization of Z.sub.t can be accomplished by domain partitioning, see e.g. R. E. Schapire and Y. Singer: “Improved boosting algorithms using confidence-rated predictions”, Machine Learning, 37(3): 297-336, 1999. Specifically, each weak classifier may be associated with a partition of χ into disjoint blocks X.sub.1, . . . , X.sub.N, for which h(x)=h(x′) holds for all x, x′∈X.sub.j. The response of h(x) depends only on which block X.sub.j a given sample x falls into. To be more specific, h(x) equals a function of the ratio of the weighted fraction W.sub.+.sup.j of samples falling into block j with label +1 and the weighted fraction W.sub.−.sup.j of samples falling into block j with label −1. W.sub.+.sup.j and W.sub.−.sup.j are defined respectively as
(34) W.sub.+.sup.j=Σ.sub.i:x.sub.i∈X.sub.j,y.sub.i=+1 D(i) and W.sub.−.sup.j=Σ.sub.i:x.sub.i∈X.sub.j,y.sub.i=−1 D(i). (6)
(35) With W.sub.+.sup.j and W.sub.−.sup.j, and letting c.sub.j=h(x) for x∈X.sub.j, Z.sub.t can be calculated as:
(36) Z.sub.t=Σ.sub.j=1.sup.N(W.sub.+.sup.j e.sup.−c.sub.j+W.sub.−.sup.j e.sup.c.sub.j) (7)
(37) Computing the derivative of Z.sub.t with regard to c.sub.j and setting it to zero yields the optimal solution:
(38) c.sub.j=½ ln(W.sub.+.sup.j/W.sub.−.sup.j) (8)
(39) Substituting (8) into (7) yields
(40) Z.sub.t=2Σ.sub.j=1.sup.N√{square root over (W.sub.+.sup.jW.sub.−.sup.j)} (9)
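The weighted fractions, the optimal per-block responses, and the training error described above can be computed directly. The following Python sketch is illustrative (the function name and the ε smoothing term for empty blocks are assumptions; the source does not specify an implementation):

```python
import math

def weak_classifier(samples, weights, n_bins, eps=1e-9):
    """Domain-partitioning weak classifier over n_bins uniform bins.

    samples: list of (x, y) with feature x in [0, 1] and label y in {+1, -1}
    weights: distribution D over the samples (non-negative, sums to 1)
    Returns (responses c_j per bin, training error Z_t).
    eps smooths bins that contain samples of only one label.
    """
    w_pos = [0.0] * n_bins
    w_neg = [0.0] * n_bins
    for (x, y), d in zip(samples, weights):
        j = min(int(x * n_bins), n_bins - 1)
        if y > 0:
            w_pos[j] += d
        else:
            w_neg[j] += d
    # Optimal response per bin: c_j = (1/2) ln(W+_j / W-_j)
    c = [0.5 * math.log((wp + eps) / (wn + eps))
         for wp, wn in zip(w_pos, w_neg)]
    # Training error: Z_t = 2 * sum_j sqrt(W+_j * W-_j)
    z = 2.0 * sum(math.sqrt(wp * wn) for wp, wn in zip(w_pos, w_neg))
    return c, z
```

On perfectly separable data each bin holds one label only, every product W.sub.+.sup.jW.sub.−.sup.j vanishes, and the training error Z.sub.t is zero.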
(41) The usefulness of the function is graphically supported by
(42)
(43) Now employ ƒ(x;x.sub.c) to stretch the features with the center x.sub.c inside block #5. Because ƒ(x−x.sub.c) has a large slope near x.sub.c, block #5 in
a=a.sub.1+a.sub.2+a.sub.3 (11)
(44) Meanwhile, W.sub.−.sup.5 in the upper part of
b=b.sub.1+b.sub.2+b.sub.3 (12)
(45) The training error Z.sub.t, that is, equation 9, corresponding to the upper part of
(46) Z.sub.t=2√{square root over (ab)} (13)
(47) The training error Z′.sub.t corresponding to the lower part of
(48) Z′.sub.t=2(√{square root over (a.sub.1b.sub.1)}+√{square root over (a.sub.2b.sub.2)}+√{square root over (a.sub.3b.sub.3)}) (14)
(49) It can be proved (see the proof of Theorem 1) that:
Z′.sub.t<Z.sub.t (15)
(50) Inequality 15 tells that applying the stretching function ƒ (x;x.sub.c) reduces the training error. More generally, we have the following theorem.
(51) Theorem 1. Consider a block j whose weighted fractions are a=W.sub.+.sup.j>0 and b=W.sub.−.sup.j>0, respectively. The training error Z.sub.t corresponding to the block j is Z.sub.t=2√{square root over (ab)}. Let x.sub.c be a point inside the block j. With a monotonically increasing function ƒ(x−x.sub.c), a=W.sub.+.sup.j of the block j is stretched to n blocks whose weighted fractions are a.sub.1, . . . , a.sub.n (here Σ.sub.i a.sub.i=a). Similarly, b=W.sub.−.sup.j of the same block j is stretched to n blocks whose weighted fractions are b.sub.1, . . . , b.sub.n (here Σ.sub.i b.sub.i=b). The training error Z′.sub.t corresponding to the stretched histogram is
(52) Z′.sub.t=2Σ.sub.i=1.sup.n√{square root over (a.sub.ib.sub.i)} (16)
Then it holds that
(53) Z′.sub.t≤Z.sub.t (17)
(54) Proof.
(55) A fact is that
(56) a=Σ.sub.i=1.sup.n a.sub.i (18) and b=Σ.sub.i=1.sup.n b.sub.i (19)
(57) Therefore, it is equivalent to prove
(58) (Σ.sub.i=1.sup.n√{square root over (a.sub.ib.sub.i)}).sup.2≤(Σ.sub.i=1.sup.n a.sub.i)(Σ.sub.i=1.sup.n b.sub.i) (20)
(59) Here, inequality 20 is the known Cauchy-Buniakowsky-Schwarz inequality.
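Theorem 1 can be checked numerically on an arbitrary non-negative split; the figures below are a hypothetical example, not data from the patent:

```python
import math

def split_error(parts_a, parts_b):
    """Training error 2 * sum_i sqrt(a_i * b_i) for a stretched block."""
    return 2.0 * sum(math.sqrt(a * b) for a, b in zip(parts_a, parts_b))

# A block with a = W+ = 0.30 and b = W- = 0.20, split into three
# sub-blocks by a (hypothetical) stretching of the feature axis.
parts_a = [0.05, 0.20, 0.05]   # sums to a = 0.30
parts_b = [0.12, 0.02, 0.06]   # sums to b = 0.20
z_unstretched = 2.0 * math.sqrt(sum(parts_a) * sum(parts_b))
z_stretched = split_error(parts_a, parts_b)
```

By the Cauchy-Buniakowsky-Schwarz inequality, z_stretched never exceeds z_unstretched, with equality only when the a.sub.i and b.sub.i are proportional, i.e. when the stretching separates nothing.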
(60)
(61) In the following, an algorithm will be described, wherein a training stage is described first, to be followed by a testing stage.
(62) The stretching function ƒ has a parameter x.sub.c which determines which portion is to be spread and which portion is to be narrowed, or compressed. Let x.sub.c be the center of the block j where the difference between W.sub.+.sup.j and W.sub.−.sup.j is the smallest, j=1, . . . , N.
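This choice of x.sub.c can be sketched in a few lines of Python (the helper name is hypothetical):

```python
def choose_centre(w_pos, w_neg):
    """Pick the stretching centre x_c as the centre of the bin where
    |W+_j - W-_j| is smallest, with bins uniform on [0, 1].

    w_pos, w_neg: per-bin weighted fractions of +1 and -1 samples.
    """
    n = len(w_pos)
    j = min(range(n), key=lambda k: abs(w_pos[k] - w_neg[k]))
    return (j + 0.5) / n  # centre of bin j
```

The bin with the smallest margin between the two weighted fractions is the hardest to classify, so stretching the feature axis there gives the weak classifier the most room to improve.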
(63) Training stage, in outline:
1. Uniformly partition the feature domain into N blocks: χ.sub.1, . . . , χ.sub.N.
2. Compute the weighted fractions W.sub.+.sup.j and W.sub.−.sup.j for each block j.
3. Compute the response c.sub.j of the weak classifier h.sub.t at block j, as in equation 8.
4. Stretch the features by the stretching function ƒ(x;x.sub.c) and compute the weighted fractions W′.sub.+.sup.j and W′.sub.−.sup.j for each block j.
5. Compute the response c′.sub.j of the weak classifier h.sub.t at block j.
(71) Testing Stage:
(72) It is noted that in the training stage the features may be stretched by ƒ and then, subsequently, the weighted fractions W.sub.+.sup.j and W.sub.−.sup.j may be computed, respectively. Suppose that x.sub.? is a sample to be classified in the testing stage. One strategy is to map x.sub.? by ƒ (the result is denoted by x′.sub.?, with x′.sub.?=ƒ(x.sub.?;x.sub.c)), determine which block x′.sub.? belongs to, and then calculate the classifier response according to W.sub.+.sup.i′ and W.sub.−.sup.i′, where i′=⌈x′.sub.?/Δd.sub.large⌉ indexes the block x′.sub.? belongs to. Δd.sub.large stands for the block width, or step, in the mapped domain and ⌈x⌉ stands for the round-up (ceiling) function, that is, rounding toward infinity. Suppose that the features are in the range [0,1] and the number of blocks is N (a typical value is N=256); then Δd.sub.large=1/N.
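The testing-stage lookup can be sketched as follows, assuming a distortion function ƒ(x; x.sub.c) is available; the function names are illustrative:

```python
import math

def classify_block_index(x, xc, n_bins, distort):
    """Testing-stage lookup: map sample x through the distortion, then
    find the index of the uniform block it lands in.

    Follows the description: i' = ceil(x' / delta) with delta = 1 / N,
    clamped to the valid range 1..N (1-based, as in the source).
    """
    xd = distort(x, xc)
    delta = 1.0 / n_bins
    return min(max(int(math.ceil(xd / delta)), 1), n_bins)
```

The classifier response for the sample is then read from the trained per-block responses at index i′, so classification at test time costs one function evaluation and one table lookup per weak classifier.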
(73) Overall, due to the stretching function, the proposed implicitly non-uniform domain partitioning method has less weighted training error and testing error than the classical domain-partitioning method.
(74) The INRIA pedestrian dataset is used to show the weighted training error. One can see that the weighted training error of the proposed method is steadily lower than that of DP-AdaBoost. Therefore, it is concluded that the stretching function plays a positive role in decreasing the weighted training error. The difference is illustrated in
(75) In
(76)
(77) Phase 710 comprises storing data to be machine-recognized. Phase 720 comprises running an adaptive boosting machine learning algorithm with the data, wherein a plurality of learning algorithms are applied, wherein a feature space is partitioned into bins, and wherein a distortion function is applied to features of the feature space. Finally, in phase 730, a first derivative of the distortion function is not constant.
(78) It is to be understood that the embodiments of the invention disclosed are not limited to the particular structures, process steps, or materials disclosed herein, but are extended to equivalents thereof as would be recognized by those ordinarily skilled in the relevant arts. It should also be understood that terminology employed herein is used for the purpose of describing particular embodiments only and is not intended to be limiting.
(79) Reference throughout this specification to one embodiment or an embodiment means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Where reference is made to a numerical value using a term such as, for example, about or substantially, the exact numerical value is also disclosed.
(80) As used herein, a plurality of items, structural elements, compositional elements, and/or materials may be presented in a common list for convenience. However, these lists should be construed as though each member of the list is individually identified as a separate and unique member. Thus, no individual member of such list should be construed as a de facto equivalent of any other member of the same list solely based on their presentation in a common group without indications to the contrary. In addition, various embodiments and example of the present invention may be referred to herein along with alternatives for the various components thereof. It is understood that such embodiments, examples, and alternatives are not to be construed as de facto equivalents of one another, but are to be considered as separate and autonomous representations of the present invention.
(81) Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the preceding description, numerous specific details are provided, such as examples of lengths, widths, shapes, etc., to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
(82) While the foregoing examples are illustrative of the principles of the present invention in one or more particular applications, it will be apparent to those of ordinary skill in the art that numerous modifications in form, usage and details of implementation can be made without the exercise of inventive faculty, and without departing from the principles and concepts of the invention. Accordingly, it is not intended that the invention be limited, except as by the claims set forth below.
(83) The verbs “to comprise” and “to include” are used in this document as open limitations that neither exclude nor require the existence of also un-recited features. The features recited in depending claims are mutually freely combinable unless otherwise explicitly stated. Furthermore, it is to be understood that the use of “a” or “an”, that is, a singular form, throughout this document does not exclude a plurality.
INDUSTRIAL APPLICABILITY
(84) At least some embodiments of the present invention find industrial application in optimizing machine recognition, to, for example, reduce traffic accidents in self-driving vehicles.
REFERENCE SIGNS LIST
(85) TABLE-US-00001
110 View
101 Road
125 Area of interest
120 Pedestrian
130 Camera
140 Image recognition algorithm
300-360 Structure of device of FIG. 3
410 The area of feature space being stretched
710-730 Phases of the method of FIG. 7
CITATION LIST
(86)
[1] R. Abiantun and M. Savvides. Dynamic three-bin real AdaBoost using biased classifiers: an application in face detection. In Proc. IEEE Int'l Conf. Biometrics: Theory, Applications, and Systems, 2009.
[2] Z. Fu, D. Zhang, X. Zhao, and X. Li. AdaBoost algorithm with floating threshold. In ACAI, 2012.
[3] R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3): 297-336, 1999.
[4] J. Wen and Y. Xiong. Smoothing LUT classifiers for robust face detection. In ICIST, 2013.
[5] Y. Hanai and T. Kuroda. Face detection through compact classifier using adaptive Look-Up-Table. In ICIP, 2009.
[6] C. Huang, H. Ai, T. Yamashita, S. Lao, and M. Kawade. Incremental learning of boosted face detector. In ICCV, 2007.
[7] P. Sharma, C. Huang, and R. Nevatia. Unsupervised incremental learning for improved object detection in a video. In CVPR, 2012.
[8] Z. Li and Y. Zhao. Pedestrian detection in single frame by edgelet-LBP part detectors. In AVSS, 2013.