PREDICTIVE VEHICLE DIAGNOSTIC METHOD
20210090359 · 2021-03-25
Inventors
- Anthony Peter GRIFFITHS (Newton, GB)
- Christopher Lee DAVIES (Sketty, GB)
- James Gareth DAVIES (Ann Arbor, MI, US)
- Stephen David NORRIS (London, GB)
- Patrick James TUDOR (Sketty, GB)
- Timothy Peter DAVIS (Warwick, GB)
- Kelly Marie NOCK (Cardiff, GB)
- Steffan Rees (London, GB)
Cpc classification
B60W50/14
PERFORMING OPERATIONS; TRANSPORTING
G05B23/024
PHYSICS
G05B23/0221
PHYSICS
G05B23/0224
PHYSICS
G05B23/0283
PHYSICS
B60R16/0234
PERFORMING OPERATIONS; TRANSPORTING
International classification
G07C5/08
PHYSICS
B60R16/023
PERFORMING OPERATIONS; TRANSPORTING
Abstract
A computer-implemented method of predicting vehicle faults, the method comprising, at a data processing stage: receiving: i) sets of telematics data each associated with a vehicle identifier, and ii) a vehicle fault dataset, which records historic vehicle fault events, wherein the vehicle fault events are associated in the datasets with cooperating vehicle identifiers; for each of the vehicle identifiers, determining i) a feature object by processing the associated set of telematics data to determine at least one driving style parameter therefrom, the feature object comprising the at least one driving style parameter, and ii) a training label for the feature object based on one or more of the vehicle fault events associated with that vehicle identifier; and using the feature objects and their training labels to train a predictive component, executed at the data processing stage, to learn causal associations between the driving style parameters and the vehicle fault events, such that a feature object comprising at least one target driving style parameter, associated with a target vehicle, inputted to the trained predictive component causes the predictive component to output a corresponding vehicle fault prediction.
Claims
1.-48. (canceled)
49. A computer-implemented method of predicting vehicle faults, the method comprising: receiving at a computer system: i) sets of telematics data each associated with a vehicle identifier, and ii) a vehicle fault dataset, which records historic vehicle fault events, wherein the historic vehicle fault events are associated in the datasets with cooperating vehicle identifiers; for each of the vehicle identifiers, determining i) a feature object by processing the associated set of telematics data to determine at least one driving style parameter therefrom, the feature object comprising the at least one driving style parameter, and ii) a training label for the feature object based on one or more of the historic vehicle fault events associated with that vehicle identifier; and using the feature objects and their training labels to train a predictive component to learn causal associations between the driving style parameters and the historic vehicle fault events, such that a feature object comprising at least one target driving style parameter, associated with a target vehicle, inputted to the trained predictive component causes the predictive component to output a corresponding vehicle fault prediction.
50. The method of claim 49, comprising: inputting a feature object comprising the target set of driving style parameters to the trained predictive component; and outputting the corresponding vehicle fault prediction, by the predictive component.
51. The method of claim 50, wherein the vehicle fault prediction is outputted to a user via an output device.
52. The method of claim 49, wherein the corresponding vehicle fault prediction comprises a significance value denoting a likelihood of a vehicle fault occurring with the target vehicle.
53. The method of claim 52, wherein the training labels are determined based on respective types of the historic vehicle fault events such that the significance value denotes a likelihood of a specific type or types of vehicle fault event occurring.
54. The method of claim 52, wherein the training labels are determined based on timing or usage values associated with the historic vehicle fault events such that the significance value denotes a likelihood of the specific type or types of vehicle fault occurring within the target vehicle within a predetermined period of time and/or a predetermined usage interval.
55. The method of claim 54, wherein the training labels are vectors having components corresponding to different time or usage intervals.
56. The method of claim 49, wherein the training labels are determined based on recorded resource values for the historic vehicle fault events such that the corresponding vehicle fault prediction is an expected vehicle fault resource value for the target vehicle.
57. A system for predicting vehicle faults, the system comprising: a computer interface configured to receive: i) sets of telematics data each comprising a vehicle identifier, and ii) a vehicle fault dataset, which records historic vehicle fault events, wherein the vehicle fault events are associated in the datasets with cooperating vehicle identifiers; at least one processor; and a memory configured to store executable instructions which, when executed on the at least one processor, cause the at least one processor to: process each of the sets of telematics data to determine a feature object comprising at least one driving style parameter; group the feature objects into a plurality of driving style groups, by comparing at least the driving style parameters of the feature objects; and link each of the driving style groups with one or more of the historic vehicle fault events based on the associated vehicle identifiers.
58. The system of claim 57, wherein the at least one processor is configured to receive a feature object of a target vehicle comprising at least one driving style parameter, match the feature object of the target vehicle to at least one of the driving style groups, output a vehicle fault prediction for the target vehicle based on the vehicle fault events linked to the at least one driving style group, and determine the feature object for the target vehicle by processing a set of telematics data received for the target vehicle.
59. The system of claim 57, comprising a user interface for accessing vehicle fault information for each of the driving style groups, the vehicle fault information being derived from the one or more vehicle fault events to which the driving style group is linked.
60. The system of claim 57, wherein the at least one processor is configured to group the feature objects using an unsupervised machine learning algorithm.
61. The system of claim 57, wherein the at least one processor is configured to aggregate constituent driving style parameter sets of each of the driving style groups to determine a representative driving style profile for that driving style group.
62. The system of claim 57, wherein the at least one processor is configured to aggregate, for each of the driving style groups, the historic vehicle fault events linked to it, to determine a representative historic vehicle fault profile, wherein the vehicle fault prediction is based on the representative historic vehicle fault profile of the at least one driving style group.
63. The system of claim 62, wherein the vehicle fault prediction comprises a likelihood of at least one type of vehicle fault occurring with the target vehicle, wherein the likelihood is determined based on the representative historic vehicle fault profile of the at least one driving style group.
64. The system of claim 57, wherein the at least one processor is configured to implement a plurality of predictive components, each corresponding to one of the driving style groups, wherein each of the predictive components is trained using the feature objects of the driving style group to which it corresponds and the one or more vehicle faults linked to that group, wherein the one or more vehicle faults are used to determine training labels for that driving style group.
65. The system of claim 57, wherein each of the feature objects also comprises at least one vehicle attribute and/or at least one environmental parameter.
66. The system of claim 65, wherein each of the feature objects also comprises the at least one vehicle attribute, and the at least one vehicle attribute comprises at least one of: i) an age of the vehicle, ii) a mileage of the vehicle, iii) a vehicle manufacturer, iv) a vehicle model, v) a vehicle engine type, and vi) a vehicle transmission type.
67. The system of claim 57, wherein the driving style parameters comprise at least one selected from the group consisting of: a vehicle speed metric, a driving distance metric, a vehicle acceleration metric, a vehicle engine metric, a vehicle braking metric, a total number of journeys, a total number of days, a number of journeys per day, a time per journey, a journey time per day, a moving time per journey, a moving time per day, a distance covered per journey, a distance covered per day, an average speed, a maximum speed, an average moving speed, a maximum moving speed, an average acceleration, a maximum acceleration, an average deceleration, a maximum deceleration, a total number of brakes per journey, a total number of brakes per day, an average engine revolutions per minute (RPM), a maximum engine RPM, an average engine RPM during acceleration, a maximum engine RPM during acceleration, an average engine RPM at constant speed, and a maximum engine RPM at constant speed.
68. At least one non-transitory computer readable medium having stored thereon computer readable instructions that, when executed on one or more computer processors, implement operations comprising: receiving i) sets of telematics data each associated with a vehicle identifier, and ii) a vehicle fault dataset, which records historic vehicle fault events, wherein the historic vehicle fault events are associated in the datasets with cooperating vehicle identifiers; for each of the vehicle identifiers, determining i) a feature object by processing the associated set of telematics data to determine at least one driving style parameter therefrom, the feature object comprising the at least one driving style parameter, and ii) a training label for the feature object based on one or more of the historic vehicle fault events associated with that vehicle identifier; and using the feature objects and their training labels to train a predictive component to learn causal associations between the driving style parameters and the historic vehicle fault events, such that a feature object comprising at least one target driving style parameter, associated with a target vehicle, inputted to the trained predictive component causes the predictive component to output a corresponding vehicle fault prediction.
Description
BRIEF DESCRIPTION OF FIGURES
[0090] For a better understanding of the present invention, and to show how embodiments of the same may be carried into effect, reference is made to the following figures in which:
[0091] to [0106] (figure descriptions not reproduced in this text)
DETAILED DESCRIPTION
[0107] Embodiments of the invention are described in detail below. First, some useful context to the invention is provided.
[0108]
[0109] The OBD system 2 collects various telematics data, as represented by the set of inputs labelled 4. The telematics data 4 collected by the OBD system 2 comprises raw telematics data collected from on-board sensors 5, which are coupled to the OBD system 2 and can be arranged to monitor essentially any desired property of the vehicle 1 or its various subsystems and components. The OBD system 2 can also be coupled to other on-board data sources of the vehicle 1, such as other operational components 6 (physical or software) of the on-board computer system, and the telematics data 4 can comprise data collected from such sources.
[0110] The telematics data is representative of the vehicle's internal state over time, and can also include location, speed/velocity and/or acceleration data, e.g. collected via GPS or similar.
[0111] The on-board sensors 5 may monitor, amongst others, the following vehicle data: speed, distance driven, time driven, acceleration, deceleration, engine RPM, temperature, engine temperature, engine events (e.g. on, off), engine on time (e.g. whilst stationary, whilst moving), braking (number of times, braking force).
[0112] For example,
[0113]
[0114] Embodiments of the invention will now be described by way of example only. As noted above, although the following is described with reference to vehicle repair events, the description applies equally to other types of vehicle fault event. That is, the description applies equally to other forms of vehicle fault data set and not just vehicle repair datasets.
[0115] Driving Style Profiles
[0116]
[0117] A telematics dataset 13 undergoes processing by the first processing component 12 to generate a set of driving style parameters 14. It does this for each vehicle in a population of vehicles 1P for which the analysis is being performed, so as to generate a set In particular, the processing component 12 takes, for example, raw speed data and determines parameters such as average speed, maximum speed, average moving speed and maximum moving speed. These parameters may be determined, for example, per journey or per day. A journey begins with the vehicle's engine being switched on and ends with the vehicle's engine being switched off.
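The per-journey derivation described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation used in the embodiments: the `Sample` record and its field names, equally spaced samples (so that average speed is a plain mean of samples), and the 1 km/h threshold used to decide that the vehicle is "moving" are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Sample:
    """One raw telematics sample (hypothetical schema)."""
    t: float          # seconds since recording started
    speed_kmh: float  # instantaneous vehicle speed
    engine_on: bool   # engine state reported by the OBD system


def split_journeys(samples: List[Sample]) -> List[List[Sample]]:
    """Split a time-ordered sample stream into journeys.

    A journey begins when the engine is switched on and ends when it is
    switched off; samples taken with the engine off belong to no journey.
    """
    journeys: List[List[Sample]] = []
    current: List[Sample] = []
    for s in samples:
        if s.engine_on:
            current.append(s)
        elif current:  # engine just switched off: close the journey
            journeys.append(current)
            current = []
    if current:  # stream ended mid-journey
        journeys.append(current)
    return journeys


def speed_parameters(journey: List[Sample], moving_kmh: float = 1.0) -> Dict[str, float]:
    """Per-journey speed parameters; 'moving' means speed above moving_kmh (assumed)."""
    speeds = [s.speed_kmh for s in journey]
    moving = [v for v in speeds if v > moving_kmh]
    return {
        "average_speed": mean(speeds),
        "maximum_speed": max(speeds),
        "average_moving_speed": mean(moving) if moving else 0.0,
        "maximum_moving_speed": max(moving) if moving else 0.0,
    }
```

Per-day parameters would follow the same pattern, grouping samples by calendar day instead of by engine on/off transitions.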
[0118]
[0119] The data linking component 10 receives, as inputs, the set of driving style parameters 14 for each vehicle in the population of vehicles 1P for which the analysis is being performed, and a vehicle fault dataset 15 for the same population of vehicles 1P.
[0120] Each vehicle 1 within the population 1P is uniquely identified by a vehicle identifier (ID), in the form of a vehicle identification number (VIN). As is known in the art, a VIN is a unique code that is used to identify an individual vehicle 1 throughout its life. Each set of telematics data 13 is associated with the VIN of the vehicle 1 from which it has been collected to allow it to be linked to other data for that vehicle, as described later. The set of driving style parameters 14 derived therefrom is specific to that vehicle and remains associated with its VIN.
[0121] The driving style parameter set 14 comprises driving style parameters derived from the telematics data of the kind described above with reference to
[0122] For example,
[0123]
[0124] The vehicle repair dataset 15 is shown to comprise a set of repair record(s) 15, each of which is associated with a VIN and records at least one repair operation performed on the corresponding vehicle in the population 1P. Each repair record 15 may comprise a timing value 15A. This can correspond to the time at which the repair operation was actually performed, but this is not essential; it could, for example, be a later time at which the repair record 15 was processed, and this can still be used to give reliable results. In this respect, it is noted that where this description refers to the time at which an event occurs, the relevant description applies more generally to the timing associated with that event.
[0125] The timing of the vehicle repair can be important in different use cases: as described later, a prediction can be made for a given target vehicle in terms of a risk of repair value, and it may be most appropriate to define that value within a window of time or usage (prediction window). Examples of how this can be achieved are described later with reference to
[0126] The repair records 15 can be in the form of warranty claim or service records. One of the realizations underpinning the described techniques is that, within a predetermined window of a vehicle's lifetime (the warranty period), comprehensive data about component faults/failures within that window is available to the manufacturer. This is because, during that time window, whilst the vehicle is still under warranty, it is the manufacturer who bears the responsibility for such failures/repairs. Likewise, data about component faults/failures may be recorded when the vehicle is serviced.
[0127] Each repair record 15 is shown to comprise at least one repair code (RC) relating to the type of repair operation(s) performed as part of the repair event. The RC can for example be a labour operation (LOp) code, identifying a type of labour operation performed, or a part code of a faulty vehicle component identified in the repair. Whilst such information can be used to refine the analysis that is performed, it is not in fact essential for the purposes of the invention for the repair record 15 to identify the type of repair; embodiments of the invention can be implemented using only the associated timing information 15A.
[0128] In this example, the datasets 14, 15 are generated by the pre-processing components 8, 9 applying any necessary pre-processing to, respectively, telematics data and vehicle repair data received at the data processing stage to place them in a form that allows them to be used in the manner described below. This can for example include the removal of duplicate or erroneous records, re-formatting, reformulation of telematics data etc. As will be appreciated, the level of pre-processing required will depend on the state of the initial data, and pre-processing may be omitted if the data is received in a sufficiently refined form.
[0129] Although not shown in
[0130] The set of driving style parameters 14 contains one or more of the following example parameters: total journeys, total days, number of journeys per day, time per journey, journey time per day, moving time per journey, moving time per day, distance covered per journey, distance covered per day, average speed, maximum speed, average moving speed, maximum moving speed, average acceleration, maximum acceleration, average deceleration, maximum deceleration, total number of brakes per journey, total number of brakes per day, average engine revolutions per minute (RPM), maximum engine RPM, average engine RPM during acceleration, maximum engine RPM during acceleration, average engine RPM at constant speed, and maximum engine RPM at constant speed.
[0131] The respective VINs contained in the driving style parameters and vehicle repair datasets 14, 15 cooperate in that they allow repair operations 15 recorded in the repair dataset 15 to be matched to corresponding driving style parameters 14 for the same vehicle in the set of driving style parameters 14. A function of the data linking component 10 is to link the repair record(s) 15 associated with each VIN in the repair dataset 15 to the corresponding set of driving style parameters 14 associated with the matching VIN in the driving style parameters dataset 14.
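The VIN-based linking performed by the data linking component 10 amounts to a join on the VIN. The following is a minimal sketch, assuming both datasets fit in memory as plain Python structures and that each repair record has been reduced to a (VIN, repair code) pair; the function name `link_by_vin` and the repair codes in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def link_by_vin(driving_style: Dict[str, dict],
                repair_records: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Join each VIN's driving style parameters with its repair records.

    driving_style:  VIN -> driving style parameter set (dataset 14)
    repair_records: (VIN, repair code) pairs (dataset 15)
    """
    repairs_by_vin: Dict[str, List[str]] = defaultdict(list)
    for vin, code in repair_records:
        repairs_by_vin[vin].append(code)
    # Vehicles with no repairs keep an empty list (still useful as "no fault"
    # examples); repair records whose VIN has no telematics are dropped.
    return {
        vin: {"params": params, "repairs": repairs_by_vin.get(vin, [])}
        for vin, params in driving_style.items()
    }
```

Keeping vehicles without repairs, rather than discarding them, is a design choice in this sketch: a predictive model needs examples of fault-free driving styles as well as faulty ones.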
[0132] Linking driving style parameters to repair histories ultimately allows the system to understand the causal relationships between driving style and vehicle fault events. Modern machine learning (ML) techniques can be used to learn these associations in a systematic and automated fashion. As per the examples described below, both supervised and unsupervised learning can be utilized in this context.
[0133] Once linked, the sets of driving style parameters and corresponding sets of repair records are used to train a predictive model, which, as mentioned above, may be a machine learning model. Regardless of the specific technique used, the predictive model is trained to learn causal connections between driving style parameters and repair records based on the provided training data. For example, the model may be used to determine which driving style parameters are most likely to have caused a historic repair record. As explained below, this is a form of supervised learning where the driving style parameters are used to form input feature objects, such as feature vectors, to the model and the repair history is used to derive expected outputs for the input feature objects. Such feature objects may also be referred to as driving style profiles of individual vehicles herein.
[0134] Training the predictive model may involve a number of iterations. With each iteration, the model takes one or more driving style parameters and predicts a repair operation. The model can be deemed trained when it correctly predicts repair operations known to be associated with particular driving style parameters. Note that the model does not know with certainty which driving style parameters caused a particular repair operation, but can make predictions with a degree of confidence based on the training data.
[0135] A trained model is able to take new inputs and make predictions based on those new inputs and the training data. For example, the model may make a vehicle fault prediction. That is, the model may predict one or more vehicle faults likely to occur with a target vehicle based on a target set of driving style parameters. For example, the model may output a probability of one or more (e.g. all) known repair operations occurring with the target vehicle.
[0136] Some specific examples of how different ML models can be applied to the task of making a fault prediction about a target vehicle based on data about the target vehicle, and in particular based on a set of (one or more) driving style parameters determined for the vehicle from its telematics data, will now be described. In the following examples, the prediction is a risk value, such as a probability of the vehicle requiring a repair operation. This can be the probability of it requiring any repair operation, or a specific type of repair operation, such as a brake pad replacement. The risk value may be time- or usage-specific, e.g. a probability (or other risk value) of a vehicle requiring a repair operation within: [0137] a certain interval of (absolute) time, e.g. a particular month; [0138] a certain age range, e.g. when the vehicle is between 18 and 19 months old, as measured from the date it entered active service (which can be determined from the vehicle record if necessary; this can be the date it was sold to the customer); or [0139] a certain usage (e.g. mileage or hourage) range, e.g. when the vehicle has between 1200 and 1400 miles on the clock.
[0140] The following examples are provided as a means of further illustrating certain underlying principles of the invention and its preferred embodiments. However as will be appreciated, these examples are not, and are not intended to be, exhaustive.
[0141] Variations that exploit the same or similar underlying principles will become apparent in view of the following description.
[0142] Before describing specific examples of how ML models may be adapted, some of the principles according to which a suitable feature vector may be assigned to a VIN are described with reference to
[0143] The feature vectors are determined for the training population of vehicles 1P by the second processing component 11 of the data processing stage, from the data of the linked dataset 19 in this example.
[0144]
[0145] In this example, the feature vector $v_n$ is shown as being populated with a mixture of driving style parameters 14n derived from the telematics data 13n; vehicle attributes derived from the vehicle records 21n, such as manufacturer, product group (brand), product, model, model year, age and/or usage; and environmental parameters derived from the environmental data 30, such as average temperature per journey, average number of journeys with precipitation per month, and average (e.g. most common) terrain type etc., on the basis that all of these factors might contribute to the timing and nature of required repairs.
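One way to assemble such a mixed feature vector is to concatenate the numeric quantities in a fixed order and one-hot encode categorical vehicle attributes against a fixed vocabulary, so that every VIN (and later the target vehicle) produces a vector with the same layout. This is a minimal sketch under that assumption; one-hot encoding is one common choice rather than the method prescribed here, and the function name, parameter names and attribute values are hypothetical.

```python
from typing import Dict, List, Sequence


def build_feature_vector(numeric: Dict[str, float],
                         categorical: Dict[str, str],
                         numeric_order: Sequence[str],
                         category_levels: Dict[str, Sequence[str]]) -> List[float]:
    """Assemble v_n in a fixed layout shared by every vehicle.

    numeric:     driving style parameters, numeric vehicle attributes (e.g. age)
                 and environmental parameters, keyed by name.
    categorical: categorical vehicle attributes (e.g. manufacturer, model).
    """
    # Numeric features in a fixed order so position m means the same thing for every VIN.
    vec = [float(numeric[name]) for name in numeric_order]
    # One-hot encode each categorical attribute against a fixed vocabulary;
    # an unseen value simply yields an all-zero block.
    for attr, levels in category_levels.items():
        value = categorical.get(attr)
        vec.extend(1.0 if value == level else 0.0 for level in levels)
    return vec
```

For example, a vehicle with an average speed of 42 km/h, an age of 18 months, an average journey temperature of 11.5 °C, manufacturer "B" and a diesel engine maps to `[42.0, 18.0, 11.5, 0.0, 1.0, 0.0, 1.0, 0.0]` with the levels used in the test below.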
[0146] Classification
[0147] Classifiers can broadly speaking be broken down into ones that output a probabilistic score e.g. Logistic Regression, and ones that do not e.g. SVM. However, probabilistic classifications can be converted to deterministic results, and there are also methods available to turn a non-probabilistic score into a probabilistic score (or pseudo-probability).
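A well-known way to turn a non-probabilistic score (such as an SVM decision value) into a pseudo-probability is Platt scaling, which fits a sigmoid to the scores on labelled data. The sketch below is a simplified version, assuming plain gradient descent on log loss (Platt's original procedure uses a Newton-style optimiser and smoothed targets); it also shows the reverse conversion, thresholding a probability into a deterministic result, with a hypothetical default threshold of 0.5.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores: List[float], labels: List[int],
              lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) to binary labels by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. (a*s + b)
            ga += err * s
            gb += err
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b


def pseudo_probability(score: float, a: float, b: float) -> float:
    """Map a raw classifier score to a pseudo-probability in (0, 1)."""
    return sigmoid(a * score + b)


def to_deterministic(p: float, threshold: float = 0.5) -> int:
    """Collapse a (pseudo-)probability into a hard 0/1 decision."""
    return int(p >= threshold)
```

In practice the sigmoid would be fitted on held-out data rather than on the data used to train the underlying classifier, so that the calibration is not overconfident.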
[0148] Note that all references herein to vehicle fault probabilities (or similar) given a particular input apply more generally to any significance score denoting the likelihood (in the everyday sense of the word) of a vehicle fault occurring, e.g. within a particular time or usage window, given that input, which can be a probabilistic, pseudo-probabilistic or non-probabilistic result, as derived using any suitable classifier or other suitable model.
[0149] Probabilistic Classification
[0150] Probabilistic classification is one way in which a risk value can be determined for a given feature vector.
[0151]
[0152] Although not shown in
[0153] This is based on supervised learning. The basis of supervised learning is that the model 1502 is trained to learn a function $y(v)$ given a set of example values of $y(v)$, denoted $(y_0, \ldots, y_N)$ (the training labels), for respective input vectors $(v_0, \ldots, v_N)$. Together, these make up a set of training data (training set). Each $y_n$ value can be thought of as a class label assigned to the corresponding input vector $v_n$. The power of an ML model is that it is able to generalize from the training set, to give a reliable estimate of $y(v)$ for an input vector $v$ it has not encountered before.
[0154] The model 1502 is a computer program that receives $v$ as an input, and transforms it to generate an output $y(v)$, according to a set of electronically stored model parameters $\{c_0, \ldots, c_M\}$. Strictly speaking, $y$ is a function of $v$ and the model parameters, and thus could legitimately be denoted $y(v, c_0, \ldots, c_M)$, though that is avoided herein in the interests of conciseness.
[0155] During training, the model parameters are iteratively adapted, according to a training algorithm, with the objective of minimizing a loss function:
$O\big(y(v_n), y_n\big)$
[0156] for each $(v_n, y_n)$ pairing in the training data, until a set of selected stopping criteria are met. Here, the loss function $O$ provides a measure of difference between its inputs. In practice, what is often optimized is a cost function, which can comprise an aggregation of the loss functions across the training inputs (with regularization if necessary). A variety of different loss functions can be used, such as mean squared error, cross-entropy etc. One example of a suitable training algorithm is gradient descent, though different training algorithms can be used depending on the context. These are well known per se and so are not described in any further detail.
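The training loop described in [0155] and [0156] can be illustrated with binary logistic regression, one of the probabilistic classifiers mentioned in this description. This is a minimal sketch, assuming cross-entropy as the loss $O$, a cost equal to the mean loss plus an L2 penalty, batch gradient descent, and a stopping criterion of a small change in cost or a maximum iteration count; the learning rate, penalty weight and tolerance are hypothetical values.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector v_n, binary label y_n)


def predict(c: Sequence[float], v: Sequence[float]) -> float:
    """y(v) = sigmoid(c_0 + sum_m c_m * v_m), with c_0 acting as the bias."""
    z = c[0] + sum(cm * vm for cm, vm in zip(c[1:], v))
    return 1.0 / (1.0 + math.exp(-z))


def loss(p: float, y: int, eps: float = 1e-12) -> float:
    """Cross-entropy loss O(y(v_n), y_n) for one example."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))


def cost(c: Sequence[float], data: List[Example], lam: float) -> float:
    """Mean loss over the training set plus L2 regularisation (bias not penalised)."""
    mean_loss = sum(loss(predict(c, v), y) for v, y in data) / len(data)
    return mean_loss + lam * sum(cm * cm for cm in c[1:])


def train(data: List[Example], lr: float = 0.5, lam: float = 1e-3,
          max_iter: int = 5000, tol: float = 1e-7) -> List[float]:
    dim = len(data[0][0])
    c = [0.0] * (dim + 1)
    prev = cost(c, data, lam)
    for _ in range(max_iter):
        grad = [0.0] * (dim + 1)
        for v, y in data:
            err = predict(c, v) - y  # derivative of the cross-entropy w.r.t. z
            grad[0] += err
            for m, vm in enumerate(v, start=1):
                grad[m] += err * vm
        for m in range(dim + 1):
            g = grad[m] / len(data) + (2.0 * lam * c[m] if m > 0 else 0.0)
            c[m] -= lr * g
        cur = cost(c, data, lam)
        if abs(prev - cur) < tol:  # stopping criterion: cost has stopped improving
            break
        prev = cur
    return c
```

Here each $v_n$ might hold, say, a scaled harsh-braking rate, with $y_n = 1$ if the vehicle had the repair of interest; the data in the test below is synthetic.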
[0157] The training set, that is the feature vectors and their category labels, are determined at Step 1 in
[0158] For the task at hand, feature vectors $v_0, \ldots, v_{N-1}$ are derived as described above for vehicles $n = 0, \ldots, N-1$.
[0159] Time- or usage-based predictions can, for example, be handled by defining suitable time- or usage-based class (category) labels. For binary classification, the output label y for the ith training example is a 1-d binary label, which it can be convenient to encode as a binary value (0 or 1). For multi-class classification, a 1-d output vector can be used for the purposes of collecting and labelling the data, e.g. [a, b, c, . . . ]. However, this might be encoded differently for the training phase. For example, for the purposes of probabilistic classification, this can be used to construct a k-dimensional probability vector, where each dimension corresponds to a particular time or usage interval (category) and the dimension that is 1 corresponds to the category to which the training vector belongs. For example, following the above example, if the output label for the ith training example belongs to the set {a, b, c}, each element of that set can be encoded as:
[0160] a=>[1, 0, 0]
[0161] b=>[0, 1, 0]
[0162] c=>[0, 0, 1]
[0163] Accordingly, if VINn has experienced a particular vehicle fault event having an associated timing/usage that falls within time (absolute or vehicle age) or usage interval q, e.g.:
$[t_q, t_q + \Delta t]$
[0164] then its category label q would correspond to a probability vector with the following components:
$y_n = (y_{n,0}, \ldots, y_{n,p})$
[0165] with $y_{n,q} = 1$ and all other components of $y_n$ equal to 0.
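Turning a fault timing into the category label $q$ and its one-hot probability vector can be sketched as follows, assuming equal-width intervals starting at a hypothetical origin `t0` (for example vehicle age in months since entering active service, or mileage) and `k` classes. The intervals are treated as half-open, $[t_q, t_q + \Delta t)$, so that a timing on a boundary falls in exactly one class; that choice and the function names are assumptions.

```python
from typing import List, Optional


def interval_index(t: float, t0: float, width: float, k: int) -> Optional[int]:
    """Return q such that t0 + q*width <= t < t0 + (q+1)*width,
    or None if t falls outside all k intervals."""
    if t < t0:
        return None
    q = int((t - t0) // width)
    return q if q < k else None


def one_hot(q: int, k: int) -> List[int]:
    """Probability vector with component q equal to 1 and all others 0."""
    return [1 if p == q else 0 for p in range(k)]
```

For example, with two-month intervals starting at 12 months and six classes, a fault logged at a vehicle age of 18.5 months gives $q = 3$ and the label vector `[0, 0, 0, 1, 0, 0]`.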
[0166] Once the training set has been determined in this manner, then at Step 2, it is used to train the probabilistic classification model 1502, in the manner described above.
[0167] With the model 1502 trained, then at Step 3 the trained model 1502 can be used to make a vehicle fault prediction about a target vehicle T, for which no repair data is available (strictly speaking, for which no repair data is required).
[0168] To do this, a feature vector $v_T$ is determined for the target vehicle T using the available data, in exactly the same way as the training feature vectors are determined, and inputted to the trained model 1502. The output will be a vector that looks something like:
[0169] (It goes without saying that the numerical values are merely illustrative.) Provided a suitable probabilistic classification model is chosen and trained sensibly in accordance with the principles set out above, each $y_q$ value can be interpreted as a probability that the input vector $v_T$ belongs to class $q$, which in turn can be interpreted as the probability of the target vehicle T experiencing whatever repair event is under consideration with a timing or usage value that falls within the corresponding time or usage interval.
[0170] So, for example, in the above, there is a 64% probability of the target vehicle experiencing whatever vehicle fault is under consideration with an associated timing or usage that falls within the time or usage interval of class $r$, however those are defined with reference to the training data. For instance, if the training labels have been assigned to the training data such that vehicles which have experienced a brake pad replacement when the vehicle is between 16 and 17 months old (relative to the date it entered active service) are assigned to class $r$, then $y_r = 0.64$ can be interpreted as a 64% probability of the target vehicle requiring a brake pad replacement at that point in its life.
[0171] Mathematically, this can be expressed as a conditional probability:
$y_{n,q} = \Pr(R_q \mid v_n)$
[0172] That is, the probability of a vehicle fault event ($R_q$) occurring with an associated timing or usage falling in the interval corresponding to $q$, given vehicle history $v_n$.
[0173] Examples of suitable probabilistic classification models include a logistic regression model, a gradient boosting machine, or a neural network with a probabilistic output (e.g. a softmax layer).
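As one instance of the models listed above, multinomial logistic regression with a softmax output can be trained on the one-hot interval labels and then queried with a target feature vector $v_T$; each output component is then read as an estimate of $\Pr(R_q \mid v_T)$. This pure-Python sketch assumes batch gradient descent on cross-entropy with a hypothetical learning rate and epoch count; the class name `SoftmaxRegression` is illustrative.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max logit for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


class SoftmaxRegression:
    """Multinomial logistic regression: one logit per time/usage class q."""

    def __init__(self, dim: int, k: int):
        # Row q holds [bias, w_1, ..., w_dim] for class q.
        self.W = [[0.0] * (dim + 1) for _ in range(k)]

    def predict_proba(self, v: Sequence[float]) -> List[float]:
        x = [1.0, *v]
        return softmax([sum(w * xi for w, xi in zip(row, x)) for row in self.W])

    def fit(self, V: List[Sequence[float]], Y: List[Sequence[int]],
            lr: float = 0.5, epochs: int = 3000) -> "SoftmaxRegression":
        n = len(V)
        for _ in range(epochs):
            grad = [[0.0] * len(row) for row in self.W]
            for v, y in zip(V, Y):
                p = self.predict_proba(v)
                x = [1.0, *v]
                for q, g_row in enumerate(grad):
                    err = p[q] - y[q]  # d(cross-entropy)/d(logit_q)
                    for j, xj in enumerate(x):
                        g_row[j] += err * xj
            # Batch gradient descent step on the mean cross-entropy.
            for row, g_row in zip(self.W, grad):
                for j in range(len(row)):
                    row[j] -= lr * g_row[j] / n
        return self
```

The output of `predict_proba` always sums to 1 across the k intervals, matching the probability-vector form of the training labels.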
[0174] Note that the timing or usage associated with a vehicle fault event can be the time or usage value for the vehicle at the time the fault occurred, but it does not have to be. It could for example be the time at which the fault was identified or repaired, or the usage value at that point, the time the fault was logged, or the time at which the corresponding warranty claim was processed (for example). That is, it can be, but need not be, the time at which the fault actually occurred or is believed to have occurred.
[0175] Although in the above example there are multiple classes corresponding to different time/usage intervals, the same principles could also be applied to a simpler binary classification, with the two classes corresponding to (say) a repair having an associated timing/usage in a particular interval and no repair in that interval, respectively.
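For the binary variant, the label for one vehicle reduces to asking whether any of its linked fault events has a timing or usage value inside the chosen interval. A minimal sketch, assuming the fault timings for the VIN have already been extracted from its repair records; the function name and the half-open interval are hypothetical choices.

```python
from typing import Iterable


def binary_interval_label(fault_timings: Iterable[float], start: float, end: float) -> int:
    """1 if any recorded fault timing/usage value lies in [start, end), else 0.

    fault_timings may be vehicle ages, absolute times or odometer readings,
    as long as start and end use the same unit.
    """
    return int(any(start <= t < end for t in fault_timings))
```

Using the mileage window mentioned earlier, a vehicle with a repair logged at 1250 miles would be labelled 1 for the 1200 to 1400 mile interval, and a vehicle with no repairs would be labelled 0.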
[0176] In ML terminology, a distinction is drawn between deterministic classification, in which a feature vector is assigned to a single class, and regression, in which a continuous output value is determined. Under this definition, probabilistic classification is a form of regression, with the continuous output being the class probability value(s). For this particular task, probabilistic classification may be preferred in some contexts, however deterministic classification could also be used, for example to assign a feature vector to one of a set of discrete risk categories, applying the same principles to generate feature vectors using (at least) driving style parameters, and training labels using vehicle fault history. If desired, a probabilistic classifier can be used to implement a discrete classifier, by selecting the highest probability category, or an inherently deterministic classification algorithm can be used.
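Both routes to a deterministic result can be sketched briefly: selecting the highest-probability category from a probabilistic classifier's output, or mapping a single fault probability onto discrete risk categories. The cut points and category names below are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def most_likely_class(probs: Sequence[float]) -> int:
    """Deterministic class: index of the highest-probability category."""
    return max(range(len(probs)), key=lambda q: probs[q])


def risk_category(p: float, cut_points: Sequence[float] = (0.2, 0.5),
                  names: Sequence[str] = ("low", "medium", "high")) -> str:
    """Map a fault probability to a discrete risk band.

    cut_points must be sorted ascending; a value equal to a cut point
    falls into the higher band.
    """
    return names[bisect_right(cut_points, p)]
```

For instance, an output vector of `[0.1, 0.64, 0.26]` selects class 1, and a probability of 0.64 falls into the "high" band under these assumed cut points.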
[0177] Driving Style Groups
[0178] It can also be useful to evaluate the similarity of driving styles (and other factors too, such as environmental parameters and vehicle attributes) across different vehicles. This can be achieved by using ML techniques to group the feature vectors for the training population 1P. Unsupervised learning, such as clustering, can be used to identify latent groups of the feature vectors. These are referred to herein as driving style groups; however, in the case that the feature vectors comprise other forms of parameter (e.g. environmental parameters and/or vehicle attributes), the groups may not be determined exclusively by driving style, as the other factors can also contribute.
[0179] That is, a machine learning algorithm can be used to group driving style parameters based on the similarity between the driving style parameters. For example, a classification algorithm may take the driving style parameters as inputs and output a label (sometimes referred to as a category or class) for each parameter. Driving style parameters with the same label or category form a distinct group.
[0180] Clustering
[0181] An unsupervised learning example will now be described with reference to
[0182] In this example, unsupervised learning is used to categorize the feature vectors v_0, . . . , v_{N-1} (derived as above, with reference to
[0183] This is shown at Step 1 in
[0184] This can be absolute (e.g. each vector assigned to the closest cluster mean), or probabilistic. These latent clusters are examples of driving style groups as that term is used herein. The underlying idea is to group together similar driving styles, taking into account environmental parameters and vehicle attributes, if used (so similar driving styles for similar vehicles under similar environmental conditions).
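Step 1 can be illustrated with a plain k-means implementation using hard ("absolute") assignment to the closest cluster mean. This is a sketch under stated assumptions: the two-parameter feature vectors (hypothetically mean speed and hard brakes per trip), the random initialisation and the fixed iteration cap are illustrative choices, not details given in the source. In practice, parameters on different scales would normally be standardised first.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def assign(vectors: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Hard assignment: index of the nearest centroid for each vector."""
    return [min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors]


def kmeans(vectors: List[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Return (cluster means, cluster label per feature vector)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial means
    for _ in range(iters):
        labels = assign(vectors, centroids)
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            # Mean of the members; an empty cluster keeps its old centroid.
            new_centroids.append([sum(col) / len(members) for col in zip(*members)]
                                 if members else centroids[c])
        if new_centroids == centroids:  # assignments can no longer change
            break
        centroids = new_centroids
    return centroids, assign(vectors, centroids)


# Hypothetical feature vectors: [mean speed (km/h), hard brakes per trip].
vectors = [[30.0, 1.0], [32.0, 1.2], [31.0, 0.9],
           [70.0, 6.0], [72.0, 5.5], [69.0, 6.2]]
centroids, labels = kmeans(vectors, 2)
```

A probabilistic alternative (e.g. a Gaussian mixture model) would return a membership probability per cluster instead of a single label.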
[0185] Once the clusters have been identified, at Step 2, the vehicle repair data 13n for each VINn in a particular cluster can be collated to give overall repair information for the cluster. In this example, VIN 2, 6, 7, 12 and 19 belong to cluster 1, and their repair data is collated to provide collated repair data 13C1 for cluster 1.
[0186] At Step 3, a k-means (or other cluster-based) classifier 1602 can be used to classify the feature vector v_T for the target vehicle T. The cluster-based classifier classifies v_T in terms of its relation to the latent clusters. In
[0187] The prediction could for example be generated as described above with reference to
[0188] Although k-means clustering is used as an example, other forms of unsupervised classification can also be used.
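Steps 2 and 3 can be sketched as pooling repair records per cluster and then assigning the target vehicle's feature vector to its nearest cluster mean. The data structures here (repair records as lists of part names keyed by VIN, cluster means passed in directly) and the helper `likely_repairs`, which ranks repair types by frequency, are hypothetical simplifications for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def collate_repairs(vins: Sequence[str], labels: Sequence[int],
                    repairs: Dict[str, List[str]]) -> Dict[int, List[str]]:
    """Step 2: pool the repair records of every VIN in each cluster."""
    collated: Dict[int, List[str]] = defaultdict(list)
    for vin, label in zip(vins, labels):
        collated[label].extend(repairs.get(vin, []))
    return dict(collated)


def classify_target(v_target: Sequence[float],
                    centroids: Sequence[Sequence[float]]) -> int:
    """Step 3: assign the target feature vector to its nearest cluster mean."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(v_target, centroids[c]))


def likely_repairs(cluster: int, collated: Dict[int, List[str]],
                   n: int = 3) -> List[str]:
    """Most frequent repair types recorded for the target's cluster."""
    return [part for part, _ in Counter(collated.get(cluster, [])).most_common(n)]


vins = ["VIN2", "VIN6", "VIN7"]
labels = [0, 0, 1]
repairs = {"VIN2": ["brake pads"], "VIN6": ["brake pads", "clutch"],
           "VIN7": ["starter motor"]}
collated = collate_repairs(vins, labels, repairs)
cluster = classify_target([31.0, 1.1], [[31.0, 1.0], [70.3, 5.9]])
```

Here the target vehicle falls into cluster 0, whose collated history is dominated by brake pad repairs.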
[0189] This is one example of a suitable mechanism by which the linked sets 19 of driving style parameters 14 and repair records 15 can be grouped together by the second processing component 11, based on a comparison of at least the driving style parameters 14 in each set. This may also take into account similarity of vehicle attributes and environmental factors, as noted.
[0190] The classification algorithm can be employed to determine similarity and grouping thresholds.
[0191] The second processing component 11 takes each group 16 of driving style parameter sets 14 and linked repair record sets 15 and outputs, for each group 16, a representative driving style (behaviour) profile 17 and linked representative repair record profile 18 (collectively labelled 20). For example, each driving style profile 17 may be formed of typical driving style parameters derived from the multiple sets of driving style parameters of the respective driving style group 16. The driving style profile 17 may indicate the most prevalent driving style parameters. In another example, the profile 17 may indicate the driving style parameters with values above or below a typical value (or the average value taken from other groups) by a certain amount. That is, a driving style profile 17 may represent groups with, for example, aggressive braking (high number of total brakes per trip) or excessive speeding (large maximum and/or average speed). Similarly, each representative repair record profile 18 represents the repair records 15 which make up a respective group 16.
[0192] A machine learning algorithm, such as a classification or clustering algorithm, may also be used to determine a representative driving style profile from the feature vectors in a particular driving style group.
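One simple way to form a representative profile 17 is to take the mean of each driving style parameter over the group and flag parameters that sit well above or below a baseline, such as the average across all groups. The sketch below assumes named parameters and a hypothetical relative threshold of 25%; both are illustrative choices rather than values from the source.

```python
from statistics import mean
from typing import Dict, List

Profile = Dict[str, float]


def representative_profile(group: List[Profile]) -> Profile:
    """Typical parameters of a group: the mean of each driving style parameter."""
    return {name: mean(vec[name] for vec in group) for name in group[0]}


def notable_parameters(profile: Profile, baseline: Profile,
                       rel_threshold: float = 0.25) -> Dict[str, str]:
    """Flag parameters that differ from the baseline by at least rel_threshold,
    measured as a fraction of the baseline value."""
    flags: Dict[str, str] = {}
    for name, value in profile.items():
        base = baseline[name]
        if base == 0:
            continue  # relative change is undefined for a zero baseline
        change = (value - base) / abs(base)
        if change >= rel_threshold:
            flags[name] = "above"
        elif change <= -rel_threshold:
            flags[name] = "below"
    return flags


group = [{"brakes_per_trip": 12.0, "max_speed": 70.0},
         {"brakes_per_trip": 14.0, "max_speed": 72.0}]
profile = representative_profile(group)
flags = notable_parameters(profile, {"brakes_per_trip": 8.0, "max_speed": 70.0})
```

In this example the group would be characterised as an "aggressive braking" profile, since only the braking parameter deviates markedly from the baseline.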
[0193] Matching Profile
[0194] As illustrated in
[0195] A matching component 22 identifies one of the representative driving style profiles 17 that matches with the target driving style profile 17T. For example, the target profile 17T may be representative of a driver driving with aggressive braking. This profile is compared with the representative profiles (which for example may represent excessive speeding, repetitive engine stalling, aggressive braking, driving in lower gears, etc.).
[0196] The third processing component 21 and matching component 22 are functional components of the data processing stage, i.e. they represent functions that are carried out according to computer-readable instructions (software) executed on one or more processing units of the data processing stage (such as CPUs, GPUs etc.).
[0197] An advantage of identifying a matching profile is that a vehicle fault may be predicted before the fault occurs with the target vehicle. For example, the matching profile is linked with a vehicle fault profile containing one or more commonly occurring vehicle faults. It can therefore be assumed that if a set of vehicles driven in the same style experience the same vehicle fault, a target vehicle driven in a comparable style is also likely to experience that fault. This provides the opportunity for the vehicle to be checked for a vehicle fault, or for a component to be repaired or replaced.
[0198] A matching profile may be determined by the predictive model. For example, classification techniques may be used to take the target driving style profile as an input and identify a matching representative driving style profile. In this case, the classification algorithm assigns a known label (which may denote a particular class/category) to the target driving style parameters.
[0199] The matching component 22 may determine a matching score between the target style profile 17T and the matching representative driving style profile 17. That is, the matching score may be a measure of similarity between the two (target and matching) style profiles.
[0200] For example, the classification model may predict a continuous value as the probability of a target driving style profile belonging to each representative driving style profile. The probabilities can be interpreted as the likelihood or confidence of the target driving style profile belonging to each class (i.e. each representative driving style profile). A predicted probability can be converted into a class value by selecting the class label that has the highest probability. That is, the representative driving style profile with the highest probability is identified as the matching profile.
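The probability-then-argmax matching described above can be sketched with a simple distance-based classifier. This is a swapped-in technique for illustration: scoring each representative profile by a softmax over negative Euclidean distances is an assumption, as are the profile names, the two-parameter vectors and the `temperature` parameter. Any classifier producing per-class probabilities could take its place.

```python
import math
from typing import Dict, Sequence, Tuple


def match_profile(target: Sequence[float],
                  profiles: Dict[str, Sequence[float]],
                  temperature: float = 1.0) -> Tuple[str, Dict[str, float]]:
    """Return (matching profile name, probability per representative profile)."""
    names = list(profiles)
    # Closer profiles receive higher scores; temperature controls sharpness.
    scores = [-math.dist(target, profiles[n]) / temperature for n in names]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = {n: e / total for n, e in zip(names, exps)}
    best = max(probs, key=probs.get)  # highest-probability class is the match
    return best, probs


# Hypothetical representative profiles: [hard brakes per trip, max speed].
profiles = {"aggressive braking": [12.0, 60.0],
            "excessive speeding": [4.0, 90.0]}
best, probs = match_profile([11.0, 62.0], profiles)
```

The probability assigned to the matching profile can also serve as one possible matching score, since it reflects how clearly the target profile favours that class.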
[0201] However many representative driving style profiles 17 are determined, the matching driving style profile may not be an exact match to the target driving style profile 17T. For example, the target and matching profiles may share some features whilst differing in other aspects. An advantage of the matching score is that the user can, based on the score, easily determine how similar the profiles are. If the profiles have a high matching score, they are very similar, and the target vehicle 1T is therefore likely to experience one of the vehicle faults from the linked vehicle fault profile.
[0202] In the example of
[0203] The vehicle records 21 can be sales records created when the vehicles are sold. Sales records are a convenient instrument for collecting comprehensive data about vehicles entering active service; however, any suitable form of vehicle record can be used.
[0204] The vehicle records 21 can also be updated when a vehicle is, for example, repaired or serviced. For example, a current mileage may be updated during a service of the vehicle.
[0205] It is noted that, whilst the vehicle records 21 relate to the same population of vehicles 1P, they are generally collected at different times from the repair and telematics datasets 15, 14, and from different (sometimes disparate) sources. The use of VINs in these datasets makes this possible, as it allows the disparate records to be linked.
[0206] In some examples, the vehicle records 21 (or more particularly the vehicle attributes recorded in those datasets) are used to determine more specific driving style groups. The vehicle attributes may be, for example, the make and/or model of the vehicle, the age of the vehicle (e.g. derived from a sales record), a vehicle mileage, an engine type, a transmission type, etc. When determining the plurality of driving style groups 16, not only are the sets of driving style parameters 14 compared, but the vehicle attributes are also compared. For example, groups 16 may be formed of vehicles with matching manufacturers, similar engine sizes, or similar chassis types, e.g. hatchback, coupe, saloon. As an example, a group 16 may be made up of driving style parameters (e.g. high average acceleration values) coupled with BMW vehicles, whilst another group may be made up of the same driving style parameters (high average acceleration values) coupled with Ford vehicles. Another group may be made up of BMW vehicles with a different driving style parameter (e.g. low average acceleration values).
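Because each source records the VIN, linking reduces to indexing every dataset on that key. The sketch below assumes each source is a list of dict rows with a `"vin"` field; the field names and the choice to skip vehicles without telematics data are illustrative assumptions, not the source's data model.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def index_by_vin(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Group the rows of one source by their VIN field."""
    index: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        index[row["vin"]].append(row)
    return index


def link_sources(telematics: Iterable[Row], repairs: Iterable[Row],
                 sales: Iterable[Row]) -> Dict[str, Dict[str, List[Row]]]:
    """Link the three sources on VIN; vehicles without telematics are skipped."""
    repairs_by_vin = index_by_vin(repairs)
    sales_by_vin = index_by_vin(sales)
    linked = {}
    for vin, trips in index_by_vin(telematics).items():
        linked[vin] = {
            "telematics": trips,
            "repairs": repairs_by_vin.get(vin, []),  # .get does not insert keys
            "sales": sales_by_vin.get(vin, []),
        }
    return linked


tel = [{"vin": "V1", "max_speed": 80}, {"vin": "V1", "max_speed": 75},
       {"vin": "V2", "max_speed": 60}]
rep = [{"vin": "V1", "part": "brake pads"}]
sales = [{"vin": "V2", "make": "Ford"}]
linked = link_sources(tel, rep, sales)
```

For datasets too large to hold in memory, the same key-based join would typically be performed in a database rather than in Python dictionaries.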
[0207] An advantage of this sub-group analysis is that the representative driving style profiles 17, each of which is based on a respective driving style group 16, are thereby made more specific. As each driving style profile 17 is linked with a vehicle fault profile 18, the faults linked with a style profile are made more specific to that profile. For example, a driving style profile representative of heavy acceleration may have an associated fault (e.g. transmission damage). However, that driving style profile 17 may be derived from vehicles with a range of different transmission systems, e.g. from different manufacturers. It may be that only certain types of transmission system suffer damage at the level of heavy acceleration of that profile. So, if the groups are also formed based on transmission type, the profiles derived from these groups are split. The result is that one profile may be linked with the transmission damage, whilst another profile is not.
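The effect of splitting a driving style group by a vehicle attribute can be seen by computing a fault rate per (group, attribute) pair. This is an illustrative sketch: the record layout, the transmission labels "A" and "B", and the use of a simple per-subgroup fault rate are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fault_rate_by_subgroup(vehicles: Iterable[dict], fault: str,
                           attribute: str = "transmission"
                           ) -> Dict[Tuple[int, str], float]:
    """Share of vehicles with `fault`, per (driving style group, attribute value)."""
    # key -> [vehicles with the fault, total vehicles]
    counts: Dict[Tuple[int, str], List[int]] = defaultdict(lambda: [0, 0])
    for v in vehicles:
        key = (v["style_group"], v[attribute])
        counts[key][1] += 1
        if fault in v["faults"]:
            counts[key][0] += 1
    return {key: with_fault / total for key, (with_fault, total) in counts.items()}


# Hypothetical heavy-acceleration group (0) split by transmission type.
fleet = [
    {"style_group": 0, "transmission": "A", "faults": ["transmission damage"]},
    {"style_group": 0, "transmission": "A", "faults": ["transmission damage", "brake pads"]},
    {"style_group": 0, "transmission": "B", "faults": []},
    {"style_group": 0, "transmission": "B", "faults": ["brake pads"]},
]
rates = fault_rate_by_subgroup(fleet, "transmission damage")
```

Pooled together, the group would show a 50% transmission damage rate; split by transmission type, the fault is attributable entirely to type "A".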
[0208] Similarly, the target driving style profile 17T of the target vehicle may be based on the vehicle attributes 21T (received at the processing stage) of the target vehicle 1T. A benefit of this is that the target driving style profile may be matched to a more specific representative driving style profile 17. This results in predicting linked vehicle faults which are more specific to the target vehicle.
[0209] Vehicle Fault Risk
[0210] As an optional feature, the representative historic vehicle fault profile linked to the matching representative driving style profile may be used to determine a significance value denoting the probability (or likelihood) of a type of vehicle fault occurring with the target vehicle. That is, the risk of a particular fault occurring in the target vehicle may be calculated. The risk may be that of a particular vehicle component (e.g. brake pads, head gasket, starter motor, etc.) failing partially or completely, or the probability that the component will need replacing. For example, the probability of the brake pads of a vehicle requiring replacement may be calculated. In some examples, the probability of a vehicle fault occurring after a given time period (e.g. hours of use), or mileage period, may be calculated. The vehicle fault type corresponds to a vehicle fault contained in the representative vehicle fault profile.
[0211] As discussed above, each driving style profile is linked to a vehicle fault profile, with each vehicle fault profile containing a set of repair records. As shown in
[0212] Any number of machine learning techniques (e.g. logistic regression, gradient boosting machines, neural nets, etc.) can be applied to an historic training dataset of multiple vehicles described by a set of input parameters (e.g. vehicle age, avg. speed, number of hard brakes, etc.) to determine a risk score for a particular outcome (e.g. claim in the following week) of interest.
[0213] The risk value can be calculated by applying a machine learning technique such as, for example, logistic regression, gradient boosting, neural networks, etc., to the training data.
[0214] For example, a logistic regression technique may be used, with the set of driving style parameters derived from the historic telematics data used as training data. The driving style parameters are the independent variables, and the linked vehicle fault events are the associated outcome (dependent) variables. The goal of logistic regression is to construct a model that explains the relationship between the independent variables and the outcome variables, so that the outcome can be predicted for a new data point for which the independent variables, but not the outcome, are available. The model parameters are fitted to the training data so that the vehicle fault events are predicted from the driving style parameters as accurately as possible.
[0215] The target vehicle may be equated with the new data point, with the target driving style parameters (and/or the target driving style profile) used as the new independent variables input to the trained model. The trained model outputs one or more of the dependent variables (outcomes) with an associated probability of that outcome. That is, the trained model predicts the likelihood of a vehicle fault event occurring with the target vehicle.
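The training and prediction steps can be illustrated with a small logistic regression fitted by batch gradient descent on the mean log-loss. This is a minimal sketch, not the source's implementation: the single feature (hypothetically, hard brakes per 100 km scaled to [0, 1]), the binary label (a brake-related fault in the following period), the learning rate and the epoch count are all assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(fault event | driving style parameters x); w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 1.0, epochs: int = 3000) -> List[float]:
    """Fit the weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical training population: scaled hard-braking rate -> fault (1) or not (0).
X = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
w = train_logistic(X, y)
p_target = predict_proba(w, [0.95])  # target vehicle's driving style parameters
```

Library implementations (for example scikit-learn's `LogisticRegression`) add regularisation and faster solvers; the hand-written version is shown only to make the independent/dependent variable roles explicit.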
[0216] The risk value may be the risk of the vehicle fault occurring within a given period of time or within a given usage period, given the target driving style parameters and/or profile. The usage period may, for example, be a period of mileage or a number of hours of use (e.g. the number of hours spent driving).
[0217] As another example, the trained model may be used to determine an estimated cost of (resource value for) repairs for the target vehicle. For example, each historic vehicle fault event may include a cost of repair for that particular vehicle fault. Cost is a useful metric in this context because it is a reasonably reliable indicator of the severity of the issue. The model may be trained to predict the repair cost for each historic vehicle from the training data. The trained model may then be used to predict the cost of repairs for the target vehicle, e.g. occurring within the next year, based on the target driving style parameters.
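For the cost variant the training label becomes a continuous value, so a regression model replaces the classifier. The sketch below uses ordinary least squares with a single hypothetical feature (hard brakes per 100 km) and invented annual repair costs; a real model would use many driving style parameters and a more flexible regressor.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_cost_model(x: Sequence[float], cost: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of cost = intercept + slope * x."""
    x_bar, c_bar = mean(x), mean(cost)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxc = sum((xi - x_bar) * (ci - c_bar) for xi, ci in zip(x, cost))
    slope = sxc / sxx
    return slope, c_bar - slope * x_bar


def predict_cost(model: Tuple[float, float], x_target: float) -> float:
    slope, intercept = model
    return intercept + slope * x_target


# Hypothetical history: hard brakes per 100 km vs. repair cost over one year.
model = fit_cost_model([1.0, 2.0, 3.0, 4.0], [150.0, 250.0, 350.0, 450.0])
estimate = predict_cost(model, 5.0)
```

With these illustrative values the fit is exact (slope 100, intercept 50), giving an estimated cost of 550 for a target vehicle with a rate of 5.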
[0218] The same principles as described above in relation to
[0220] In the example of
[0221] A benefit of this telematics based approach is that it allows preventative maintenance to be performed at the component level on a per-vehicle basis, based on how that vehicle has been driven compared to other (similar) vehicles. For example, where it can be seen that an individual vehicle has been driven in the same way that a number of vehicles have been, and those vehicles have all experienced the same or similar vehicle fault, one can take preventative action to prevent that same fault occurring. This approach can take into account the type of failure (e.g. how severe will the consequences of not taking preventative action be) and also the similarity between the style profiles and the number of linked faults (e.g. how confident should the user be that the fault will occur).
[0222] In other words, this allows issues with individual vehicles to be detected earlier than would otherwise be the case. This in turn allows any necessary repair/replace operations to be scheduled in advance in an appropriate manner, e.g. alongside other planned maintenance work or during normal operational hours, with less vehicle downtime (as opposed to those repair/replace operations being driven by vehicle breakdown, as might otherwise be the case).
[0223] To take an extreme example, if a target driving style profile 17T has a matching score of 95% with a matching representative driving style profile, and there is a 90% risk of the brake pads requiring replacement, the user can be sufficiently confident that the brake pads are likely to need replacing. Instead of waiting until the brake pads fail, the user can replace them and prevent any further damage to the car, or reduce the risk of an associated accident occurring. This is in contrast to reactive action, which would result in the brake pads being replaced only after the user becomes aware of the problem due to, e.g., brake failure.
[0224] By detecting the issue earlier, at the very least, this allows a maintenance operation to be scheduled for the vehicle at a convenient time (rather than having to perform maintenance in response to the failure), and in some cases, if a fault can be detected earlier it may be less burdensome to repair.
[0225] Environmental Factors
[0226] The example of
[0227] In examples, in addition to driving style and vehicle attributes, the grouping may also take into account environmental factors associated with the vehicles, such that each group corresponds not only to a particular driving style and optionally a particular type or class of vehicle, but also the type of weather or terrain the vehicle is driven in. For example, groups 16 may be formed of vehicles driven in matching temperature, humidity, terrain, etc.
[0228] The determined driving style profiles may therefore be based on these environmental factors. Similarly, the target driving style profile may be based on environmental factors, as shown in
[0229] Vehicle Alerts
[0230] One possible way of driving preventative maintenance is to provide an early-warning system based on alerts driven by a vehicle's telematics activity, that can be used in conjunction with an OBD-capable vehicle of the kind described with reference to
[0232] In some examples, the alert component may output an alert detailing the target driving style profile. The alert may contain one or more driving style parameters of the target vehicle, such as the maximum acceleration. This may prompt the user of the vehicle to adjust their driving style and reduce the likelihood of component failure/damage.
[0233] The alert 51 may identify the type of vehicle fault that is expected to occur based on the matching driving style profile and linked vehicle fault profile. In some examples, the alert 51 may comprise the matching score. This may provide the user with a confidence in the degree of similarity between the target and matching driving style profiles.
[0234] The alert component 50 can be implemented within the vehicle 1T itself, or it can be a remote component that the vehicle communicates with wirelessly, for example.
[0235] As will be appreciated, this simple thresholding is just one example of how the alert component 50 can be configured using the results of the predictive analysis. The criteria according to which the alert component 50 is configured based on the probability value(s) can be more refined than this. The alert component 50 can be configured automatically using the results, but there may be a degree of manual oversight (for example, in selecting the rules according to which the alert component 50 is configured based on the probability value(s)).
[0236] The benefit of this approach is that the user is only alerted when the probability values imply a sufficiently high likelihood of the vehicle 1T requiring preventative maintenance.
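A thresholding rule of the kind described for the alert component 50 might look as follows. This is a sketch only: the `Alert` structure, the rule of requiring both a minimum matching score and a minimum fault probability, and the threshold values (0.9 and 0.8) are hypothetical, chosen to mirror the brake pad example given earlier.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    fault_type: str
    probability: float
    matching_score: float


def build_alerts(fault_probs: Dict[str, float], matching_score: float,
                 prob_threshold: float = 0.8, min_match: float = 0.9) -> List[Alert]:
    """Alert only on faults whose predicted probability clears the threshold,
    and only if the target profile closely matched its representative profile."""
    if matching_score < min_match:
        return []
    ranked = sorted(fault_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [Alert(fault, p, matching_score) for fault, p in ranked if p >= prob_threshold]


alerts = build_alerts({"brake pads": 0.90, "starter motor": 0.20}, matching_score=0.95)
```

With a 95% match and a 90% brake pad risk, only the brake pad alert is raised; the same probabilities with a weak match raise nothing. More refined rules, or manual selection of thresholds, fit the same interface.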
[0237] In other cases, the alert 51 could be output elsewhere, not necessarily to the driver (e.g. it could be outputted to a vehicle fleet operator or manager), and need not be outputted in real-time.
[0238] Generally a user interface (comprising an output device, such as a display etc., and an input device, such as a touchscreen, mouse, track pad etc.) can be used to access any of the information computed at the data processing stage, for use in making vehicle predictions.
[0239] Extensions:
[0240] To further illustrate how the teaching presented herein can be extended to other models, it is useful to consider the general steps involved in going from receiving the telematics and claims data sets to arriving at an end solution, which may or may not be real-time.
[0241] 1. Telematics data and claims data are received. They may be stored in the same database or in separate databases, depending on the size of the data, etc. The telematics data sets could be sensor readings and/or diagnostic trouble codes. The columns in the telematics data sets will be similar from company to company but will not form a predetermined set of columns.
[0242] 2. At this stage, without any linkage between the telematics and claims data, it is possible to perform various types of data analysis, such as statistics, unsupervised machine learning, topological data analysis, etc. If the data is large enough that it needs to be stored in a distributed database, the analysis involves an extra layer of complexity, as the data needs to be collated from multiple sources. The same applies if the analysis is performed in real time.
[0243] 3. Linking the data sets provides telematics history on vehicles together with information on whether or not the vehicles/parts experience a failure, and the respective costs. This opens up the possibility of predicting whether a vehicle/part will experience a failure using techniques such as supervised machine learning.
[0244] 4. When building models using supervised ML, extra features may be crafted from the original columns in the telematics data sets together with external data. Each algorithm will have different input feature requirements, e.g. DTC counts for Bayes, sequences for RNNs (recurrent neural networks), driving style parameters as described above, etc. For predicting vehicle failure, the output variable (which is a training label, referred to herein as a significance label) could be:
[0245] a. binary, e.g. failure within a month or not;
[0246] b. multiclass, e.g. failure within various time intervals.
[0247] 5. Once the model has been built and has achieved acceptable performance, it can be deployed to predict vehicle failure on new, unlabelled data. Again, this could be in real time or as daily/weekly updates, etc.
[0248] 6. The model may be re-trained, using offline or online training/learning.
[0249] 7. A front end may be provided to display the analysis and predictive results.
[0250] That is to say, in general, a significance value can be assigned by the trained model to an unlabelled portion of vehicle diagnostics data. The significance value indicates how significant that portion is in terms of its expected consequences with regard to vehicle fault events. The model is trained using equivalent but labelled portions of vehicle diagnostics data, where the significance label assigned to each portion captures what the relevant consequences, in terms of vehicle fault events, actually were for that portion of data.
[0251] For each vehicle in the training population, at least one portion of diagnostics data collected from it is assigned a significance label, which is determined in dependence on any vehicle fault event experienced by that vehicle within a prediction time window, that is, based on any vehicle fault data available for that vehicle within the prediction time window. The prediction time window can be defined relative to the portion of diagnostics data in question.
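A binary significance label with a prediction time window defined relative to a portion of diagnostics data can be sketched as below. The 30-day window (matching the "failure within a month" example), the choice to date each portion by a single collection date, and the half-open window excluding the collection day itself are illustrative assumptions.

```python
from datetime import date, timedelta
from typing import Iterable


def significance_label(portion_date: date, fault_dates: Iterable[date],
                       window: timedelta = timedelta(days=30)) -> int:
    """1 if the vehicle has any fault event in (portion_date, portion_date + window],
    else 0. The window is anchored to the diagnostics portion being labelled."""
    end = portion_date + window
    return int(any(portion_date < d <= end for d in fault_dates))


label_hit = significance_label(date(2020, 1, 1), [date(2020, 1, 20)])
label_miss = significance_label(date(2020, 1, 1), [date(2020, 3, 1)])
```

Because the window moves with each portion, the same vehicle can contribute both positive and negative examples, depending on how close each portion of its data lies to a recorded fault.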
[0252] Although the technology has been described in relation to vehicles, the technology can also be applied to other forms of machine.
[0253] Another aspect of the present invention provides a computer-implemented method of predicting machine faults, the method comprising implementing, at a data processing stage, the following steps: receiving diagnostics data collected from a plurality of machines; receiving machine fault data recording fault events experienced by the machines; for each of the machines, determining, for at least one piece of diagnostics data collected from that machine, a significance label (training label) based on the machine fault data for that machine; and using the pieces of diagnostics data and their significance labels to make a machine fault event prediction for a target piece of diagnostics data.
[0254] Although specific embodiments of the inventions have been described, variants of the described embodiments will be apparent. The scope is not defined by the described embodiments but only by the accompanying claims.