SYSTEM AND METHOD FOR DETERMINING A DRIVER SCORE USING MACHINE LEARNING
20230083899 · 2023-03-16
Inventors
- Olivier GANDOUET (Montreal, CA)
- Jean-Christophe BOUËTTÉ (Montreal, CA)
- Ghaith KAZMA (Laval, CA)
- Maxime LAFLEUR-FORCIER (Boucherville, CA)
- Linda AIDA (Longueuil, CA)
Cpc classification
B60W2756/10
PERFORMING OPERATIONS; TRANSPORTING
G06F18/28
PHYSICS
International classification
Abstract
A computer-implemented system and method is provided for determining a risk assessment. The method comprises receiving a plurality of vehicle behaviour data over a defined data collection period. This data is input into a supervised learning prediction model which is trained on historical vehicle behaviour data over a past time period, to generate a predicted value of a frequency of expected claim submissions for the policyholder of the vehicle in a future time period. Then a Shapley estimate is computed for each feature of the behaviour data applied to the model for determining a contribution of each said feature to the predicted value. A spline approximation is applied to the Shapley estimate for each said feature to estimate the contribution of each said feature. Then, a sum of the spline approximation for each said feature is calculated and a corresponding risk score determined based on the sum.
Claims
1. A risk assessment server configured to provide a risk assessment for a policyholder's vehicle, the server communicating with a mobile device on the vehicle and comprising: a computer processor; and a non-transitory computer-readable storage medium storing instructions that when executed by the computer processor perform actions comprising: receiving a plurality of vehicle behaviour data from the mobile device associated with the vehicle aggregated over a defined data collection period, the vehicle behaviour data comprising a plurality of features relating to operating the vehicle over a defined data collection period, at least some of the features captured from a geo-tracking system on the mobile device while the vehicle is being driven; providing the vehicle behaviour data to a supervised learning prediction model, the prediction model being trained on historical vehicle behaviour data over a past time period, to generate a predicted value of a frequency of expected claim submissions submitted to an entity managing a policy of the policyholder's vehicle in a future time period; computing a Shapley estimate value for each feature of the vehicle behaviour data applied to the prediction model for determining a contribution of each said feature to the predicted value, wherein the Shapley estimate value for each said feature is determined by performing a spline approximation to an output of a Shapley function applied to each said feature to estimate the contribution of each said feature; and, generating an output of a sum of the Shapley estimate value for each said feature, the sum being correlated directly to a risk score for the risk assessment and instructing the mobile device to display the risk score on a risk assessment computer application storing a profile for the policyholder's vehicle.
2. The risk assessment server of claim 1, wherein computing the Shapley estimate value further comprises: applying the Shapley function to each said feature relative to all the other features in the plurality of features, the Shapley function providing an average expected marginal contribution of each said feature for generating a Shapley local approximation for each said feature; applying the spline approximation to the Shapley local approximation to generate a spline representation having a plurality of coefficients defining a spline curve; and, computing a sum of the coefficients to generate the sum of the Shapley estimate value.
3. The risk assessment server of claim 2, wherein the vehicle behaviour data further comprises: usage characteristics of the risk assessment computer application, associated with the policyholder's vehicle, on the mobile device in the defined data collection period.
4. The risk assessment server of claim 3, wherein an increased Shapley estimate value for a particular feature indicates a higher contribution of the particular feature in the prediction model thereby a higher risk associated with that particular feature for determining the risk score.
5. The risk assessment server of claim 2, wherein the prediction model is trained on the historical vehicle behaviour data over the past time period to predict the frequency of the claim submissions in the future time period wherein the past time period is for a same duration of time as the future time period.
6. The risk assessment server of claim 5, wherein the prediction model is initially trained to use the vehicle behaviour data comprising: a duration of trips and a distance of trips taken by the vehicle over the past time period via a regression model to predict the vehicle behaviour data over the future time period that is correlated with the frequency of the claims submissions in the future time period.
7. The risk assessment server of claim 3, wherein in response to receiving a plurality of vehicle behaviour data, the actions further comprise: extracting the features from the vehicle behaviour data, comprising: a set of frequency features pertaining to a frequency of trips taken by the vehicle within the data collection period; a set of location features pertaining to a plurality of key locations as determined from trips taken by the vehicle during the data collection period; a set of driving quality features including driving information pertaining to how the vehicle is being driven as captured from the geo-tracking system; and a set of application features derived from the usage characteristics of interacting with the risk assessment computer application for a profile associated with the policyholder's vehicle.
8. The risk assessment server of claim 7, wherein the features further comprise: the frequency features comprising metadata about how often the vehicle is driven, average duration of time that the vehicle is driven, average distance travelled by the vehicle on a given trip, and a time at which the trips are taken; the location features comprising: a source and end destination for each of the trips within the data collection period and most visited location for the vehicle; the driving quality features comprising: at risk events taken in the trips and average speed occurring within the data collection period; and, the application features comprising the usage characteristics for the risk assessment computer application relating to how often trip data points are deleted from a profile associated with the policyholder's vehicle during the data collection period.
9. The risk assessment server of claim 8, wherein the actions further comprise: determining the key locations in the location features extracted for the vehicle by applying hierarchical clustering wherein a geographical vicinity that the trip starts or ends at most frequently is considered to be a home location for the vehicle, the geographical vicinity that the trip starts or ends at a second most is considered to be a work location.
10. The risk assessment server of claim 9, wherein the location features are derived by automatically separating a start and end point of each trip within the data collection period into the key locations and averaging a number of trips that start or end at the key locations as one of the vehicle behaviour data which is input into the prediction model.
11. The risk assessment server of claim 8, wherein the application features for deletion are derived by adding up a number of times trip data was deleted from the risk assessment computer application during the data collection period, and a total distance traveled within deleted trips.
12. The risk assessment server of claim 1, wherein the risk score is assigned to the data collection period by first assigning a weight to each said feature based on a contribution that that feature has in the prediction model, and then applying a sum to a corresponding weight for each said feature to assign the risk score.
13. The risk assessment server of claim 1, wherein the prediction model is an extreme gradient boosting model wherein the model is trained in an additive manner using the historical vehicle behaviour data.
14. The risk assessment server of claim 1, wherein determining the risk score further comprises accessing a database storing a relationship between the sum of the Shapley estimate value from each said feature and a level of risk for the vehicle, the level of risk applied to calculate the risk score.
15. A computer-implemented method for providing a risk assessment for a policyholder's vehicle, the method comprising: receiving a plurality of vehicle behaviour data from a mobile device associated with the vehicle aggregated over a defined data collection period, the vehicle behaviour data comprising a plurality of features relating to operating the vehicle over a defined data collection period, at least some of the features captured from a geo-tracking system on the mobile device while the vehicle is being driven; providing the vehicle behaviour data to a supervised learning prediction model, the prediction model being trained on historical vehicle behaviour data over a past time period, to generate a predicted value of a frequency of expected claim submissions submitted to an entity managing a policy of the policyholder's vehicle in a future time period; computing a Shapley estimate value for each feature of the vehicle behaviour data applied to the prediction model for determining a contribution of each said feature to the predicted value, wherein the Shapley estimate value for each said feature is determined by performing a spline approximation to an output of a Shapley function applied to each said feature to estimate the contribution of each said feature; and, generating an output of a sum of the Shapley estimate value for each said feature, the sum being correlated directly to a risk score for the risk assessment and instructing the mobile device to display the risk score on a risk assessment computer application storing a profile for the policyholder's vehicle.
16. The method of claim 15, wherein computing the Shapley estimate value further comprises: applying the Shapley function to each said feature relative to all the other features in the plurality of features, the Shapley function providing an average expected marginal contribution of each said feature for generating a Shapley local approximation for each said feature; applying the spline approximation to the Shapley local approximation to generate a spline representation having a plurality of coefficients defining a spline curve; and, computing a sum of the coefficients to generate the sum of the Shapley estimate value.
17. The method of claim 16, wherein the vehicle behaviour data further comprises: usage characteristics of the risk assessment computer application, associated with the policyholder's vehicle, on the mobile device in the defined data collection period.
18. The method of claim 17, wherein an increased Shapley estimate value for a particular feature indicates a higher contribution of the particular feature in the prediction model thereby a higher risk associated with that particular feature for determining the risk score.
19. The method of claim 16, wherein the prediction model is trained on the historical vehicle behaviour data over the past time period to predict the frequency of the claim submissions in the future time period wherein the past time period is for a same duration of time as the future time period.
20. The method of claim 19, wherein the prediction model is initially trained to use the vehicle behaviour data comprising: a duration of trips and a distance of trips taken by the vehicle over the past time period via a regression model to predict the vehicle behaviour data over the future time period that is correlated with the frequency of the claims submissions in the future time period.
21. The method of claim 17, wherein in response to receiving a plurality of vehicle behaviour data, the method further comprises: extracting the features from the vehicle behaviour data, comprising: a set of frequency features pertaining to a frequency of trips taken by the vehicle within the data collection period; a set of location features pertaining to a plurality of key locations as determined from trips taken by the vehicle during the data collection period; a set of driving quality features including driving information pertaining to how the vehicle is being driven as captured from the geo-tracking system; and a set of application features derived from the usage characteristics of interacting with the risk assessment computer application for a profile associated with the policyholder's vehicle.
22. The method of claim 21, wherein the features further comprise: the frequency features comprising metadata about how often the vehicle is driven, average duration of time that the vehicle is driven, average distance travelled by the vehicle on a given trip, and time at which the trips are taken; the location features comprising: a source and end destination for each of the trips within the data collection period and most visited location for the vehicle; the driving quality features comprising: at risk events taken in the trips and average speed occurring within the data collection period; and, the application features comprising the usage characteristics for the risk assessment computer application relating to how often trip data points are deleted from a profile associated with the policyholder's vehicle during the data collection period.
23. The method of claim 22 further comprising: determining the key locations in the location features extracted for the vehicle by applying hierarchical clustering wherein a geographical vicinity that the trip starts or ends at most frequently is considered to be a home location for the vehicle, the geographical vicinity that the trip starts or ends at a second most is considered to be a work location.
24. The method of claim 23, wherein the location features are derived by automatically separating a start and end point of each trip within the data collection period into the key locations and averaging a number of trips that start or end at the key locations as one of the vehicle behaviour data which is input into the prediction model.
25. The method of claim 22, wherein the application features for deletion are derived by adding up a number of times trip data was deleted from the risk assessment computer application during the data collection period, and a total distance traveled within deleted trips.
26. The method of claim 15, wherein the risk score is assigned to the data collection period by first assigning a weight to each said feature based on a contribution that that feature has in the prediction model, and then applying a sum to a corresponding weight for each said feature to assign the risk score.
27. The method of claim 15, wherein the prediction model is an extreme gradient boosting model wherein the model is trained in an additive manner using the historical vehicle behaviour data.
28. The method of claim 15, wherein determining the risk score further comprises accessing a database storing a relationship between the sum of the Shapley estimate value from each said feature and a level of risk for the vehicle, the level of risk applied to calculate the risk score.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] These and other features of the disclosure will become more apparent from the following description in which reference is made to the appended drawings wherein:
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
DETAILED DESCRIPTION
[0026] Generally, in at least some embodiments, the present disclosure is directed to risk assessment systems and methods for providing a dynamic driver risk assessment including a driver risk score, by the training of computerized machine learning prediction models based on driving features to predict a frequency of future expected claim submissions. The systems and methods use a sum of estimated Shapley values for the features input into the prediction model to determine a contribution of each feature as compared to other features to the overall model output. The sum of the estimated Shapley values may then be used to generate a corresponding risk score. The Shapley values may be better estimated using a spline representation such that the coefficients of the spline curve are used to calculate the sum used to determine the risk score. Generally, the features used may include driver performance captured from telematics and/or risk application usage behaviours including how often the user drives, where the user drives, how the user drives, and whether the user decides to delete or modify certain trips or driver behaviours from their driving profile on the system.
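By way of a non-limiting illustration, the per-feature contribution and its sum may be sketched as follows. The sketch computes exact Shapley values for a small model by averaging each feature's marginal contribution over all coalitions of the other features. The names `shapley_values` and `toy_model`, the three example features, the coefficients, and the choice of replacing an absent feature with a baseline value are assumptions made for illustration only and are not taken from the disclosure. The spline approximation is not shown.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Assumption: a feature that is absent from a coalition is replaced by
    its baseline value (one common way of 'removing' a feature).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of every coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical stand-in for a trained model predicting claim frequency."""
    distance_km, harsh_events_per_km, deleted_trips = z
    return (0.001 * distance_km
            + 0.5 * harsh_events_per_km
            + 0.02 * deleted_trips * harsh_events_per_km)


x = [800.0, 0.2, 3.0]         # one driver's aggregated monthly features
baseline = [500.0, 0.1, 0.0]  # hypothetical reference driver
phi = shapley_values(toy_model, x, baseline)
contribution_sum = sum(phi)   # quantity that would be mapped to a risk score
```

In this sketch the contributions add up to the difference between the prediction for the driver and the prediction for the baseline, which is why a single sum can summarise the per-feature contributions. The exact computation grows exponentially with the number of features, so a practical system would rely on an estimate.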
[0027] The proposed system and method provide a more accurate and efficient way of evaluating a driver's safety, based on a sum of contribution averages that takes other features into account rather than on cumulative scoring. The resulting score also correlates better with future expected claim submissions, because the prediction model has been trained on historical driving features to predict future expected claim submissions. Rather than assigning a safety score to a driver based on arbitrary reductions from a score of 100 (e.g. speeding event=−10, accidents=−10, etc.), in at least some implementations, the proposed system and method are skewed towards the risk that the driver takes by assessing each event, where each event has a magnitude. In at least some aspects, the system aggregates these trip-based scores on a monthly basis in order to provide more stability to the scoring system. The monthly aggregated scores, as proposed, are determined to be more effective in predicting the number of potential claim submissions for the month immediately following that aggregated month.
[0028] Thus, in at least some aspects, the system generates for the driver of a vehicle a score for each trip that they take (e.g. based on driver performance and/or driver use of application for generating the driver score) and this score is aggregated on a monthly basis. These monthly aggregate scores correlate to the number of claims the driver is expected to make in the month immediately following the aggregate score month.
[0029]
[0030] Generally, in at least some aspects, the proposed risk assessment system provided by the system 100 operates by compiling a number of driver related features (e.g. driving behaviours captured via the geo-tracking system 110 and/or driver interaction with a computer application tracking and outputting a risk score 103 for a driver of the vehicle 106) collected over a defined time period. The computer application tracking the risk score 103 may be provided on the mobile computing device 108 associated with the vehicle 106.
[0031] In a present non-limiting example, the risk assessment server 102 may receive a request 107 for a risk score 103 associated with a driver of the vehicle 106 being a policy holder for insurance with an entity also associated with the claims server 104. The claims server 104 is in turn configured to store profiles of all drivers insured by the entity (e.g. historical driving behaviours, driver features, and customer information) as well as a set of claims submitted for the entity by each of the drivers of the system 100 (e.g. driver of vehicle 106). The request 107 may thus originate from the claims server 104 and/or the mobile computing device 108 having a risk assessment application for tracking and presenting risk scores to associated users so that they may be aware of their risk score and factors associated with the generated scores.
[0032] In response to receiving the request 107, the risk assessment server 102 is configured to access the claims server 104 to retrieve claims data 105 which includes historical claims submitted over a past time period for a driver of the vehicle 106. The risk assessment server 102 may be configured to continually track historical driver features associated with a policyholder of a vehicle (e.g. vehicle 106). Thus the system 100 is constantly capturing features 111 and related metadata defining the features, via the risk assessment server 102, relating to the user of the mobile computing device 108 accessing a risk assessment application (a GUI of which is shown in the graphical user interface 109) and, more specifically, the user's driving behavior (e.g. a driver of the vehicle 106 which may be captured via the geo-tracking system 110). As will be understood, although a single mobile computing device 108 and a single vehicle 106 have been depicted in
[0033] Generally, different types of driver features 111 may be obtained from one or more computing devices associated with the vehicle 106 including the mobile computing device 108 and the geo-tracking system 110. The geo-tracking system 110 may be a computing device and/or telematics directly located within the on-board processing system of the vehicle 106 or alternatively an external monitoring and sensing device in communication with the vehicle 106. The geo-tracking system 110 may include, but is not limited to, a global positioning system (GPS) tracking unit, on-board diagnostics system, telematics devices, a geo-tracking unit, or other electronic navigational tracking systems which allow tracking and monitoring of real-time physical locations of the vehicle 106 and associated metadata such as time or duration associated with each location. The geo-tracking system 110 may additionally track trip starting points, ending points, time, duration and other trip information of various trips taken by the vehicle 106.
[0034] The features 111 may be aggregated into defined categories of features over a defined data collection period of time (e.g. a month). As will be defined, in some aspects, the data collection period of time may be similar to how far out the projection of the claim frequency will be made by the prediction model of the risk assessment server 102 in order to calculate the risk score 103 (e.g. aggregate feature data over the last month to predict claim frequency over the next month).
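As a non-limiting sketch of this alignment between the data collection period and the prediction horizon, the following pairs the features aggregated over one month with the number of claims submitted in the month that follows. The function names, the `(year, month)` keys, and the sample values are hypothetical and chosen only for illustration.

```python
def next_month(period):
    """Return the (year, month) tuple that follows the given one."""
    year, month = period
    return (year + 1, 1) if month == 12 else (year, month + 1)


def build_training_pairs(features_by_month, claims_by_month):
    """Pair each month's aggregated features with next month's claim count.

    Months without a recorded claim count for the following month are skipped.
    """
    pairs = []
    for period in sorted(features_by_month):
        target_period = next_month(period)
        if target_period in claims_by_month:
            pairs.append((features_by_month[period],
                          claims_by_month[target_period]))
    return pairs


# Hypothetical aggregated data for a single policyholder
features_by_month = {
    (2022, 11): {"distance_km": 620.0, "deleted_trips": 0},
    (2022, 12): {"distance_km": 810.0, "deleted_trips": 2},
    (2023, 1): {"distance_km": 540.0, "deleted_trips": 0},
}
claims_by_month = {(2022, 12): 0, (2023, 1): 1}

pairs = build_training_pairs(features_by_month, claims_by_month)
```

Here the November features are paired with the December claim count and the December features with the January claim count. The January features have no target yet and are left out.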
[0035] In a non-limiting example, referring to
[0036] Location features 116 may relate to information about where the driver of the vehicle 106 drives in terms of their start and end destinations of each trip and locations most visited. These features include home location features that are the average number of trips that start or finish at home; work location features that are the average number of trips that start or finish at work; and the infrequent location features that are the average number of trips that start or finish at an unknown place. Preferably upon receiving the location features 116, in order to derive useful information therefrom, the risk assessment server 102 may be configured to perform clustering (e.g. via clustering module 218), from a large number of start and endpoints captured in the driving data, to determine patterns such as which detected vehicle locations relate to known locations (e.g. home, work, other). Additionally, clustering may be used to determine what each of the start and endpoints relate to: the driver's home, the driver's work, another location that the driver of the vehicle 106 goes to most frequently, and places that do not fit into any of these defined categories. In at least some aspects, the risk assessment server 102 may employ a hierarchical clustering method, via a clustering module 218 shown in
[0037] When hierarchical clustering is applied, the number of clusters is not specified in advance. Rather, after the hierarchy is built, a defined linkage distance may be set. Each location point may be treated as a separate cluster and with every iteration, the closest clusters get merged. This process may thus be repeated by the clustering module 218 until one single cluster remains. The linkage determines the distance between sets of points as a function of the pairwise distances between points. The process for clustering starting points is repeated for end points. After clustering the end points, one of the steps performed by the clustering module 218 is to find the closest ending point cluster for each starting point and if the distance between the starting point and its cluster is greater than a defined distance (e.g. 1 km), a new cluster is created with the starting point co-ordinates.
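The merging procedure described above may be sketched, in a simplified and non-limiting form, as follows. The sketch assumes single linkage (the distance between two clusters is the smallest pairwise distance), great-circle distances, and a 1 km linkage threshold; none of these choices is stated by the disclosure for this step. It stops merging once the closest pair of clusters is farther apart than the threshold, which for single linkage gives the same clusters as building the whole hierarchy and cutting it at that distance. The function names and coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def cluster_points(points, max_link_km=1.0):
    """Single-linkage agglomerative clustering cut at max_link_km."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(haversine_km(p, q)
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_link_km:
            break  # closest clusters are too far apart to merge
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical trip start points (latitude, longitude)
points = [
    (45.5017, -73.5673), (45.5020, -73.5670),
    (45.5015, -73.5677), (45.5018, -73.5675),
    (45.5580, -73.7200), (45.5583, -73.7204),
    (45.4000, -73.9500),
]
ranked = sorted(cluster_points(points), key=len, reverse=True)
home_cluster, work_cluster = ranked[0], ranked[1]
```

Ranking the clusters by size mirrors the idea that the most frequently visited vicinity may be treated as the home location and the second most frequent as the work location. The pairwise search is quadratic at every merge, so it is only suitable for small illustrative inputs.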
[0038] Quality features 118 may relate to information about a characterization or safety information of the driver's driving on each trip (e.g. driver associated with the vehicle 106). These quality features 118 include an average sum of events per km, where the events are scaled on magnitude levels, e.g. 1-3 and 4-5, and events may be acceleration, braking, cornering, and excess speed above threshold (speeding); the average speed of the vehicle 106 when driven by a particular driver during each trip; the average number of distracted driving events, measured by factors such as how often the driver looks at their phone during a trip; and the average battery consumption per trip (e.g. battery consumption of the mobile computing device 108).
[0039] Application features 120 captured relate to information about a user's interaction(s) with a software application for tracking and displaying a risk score 103 on a display of the mobile computing device 108 for a driver of the vehicle 106 as obtained from the risk assessment server 102 and associated reasoning (e.g. as shown in an example display of a graphical user interface 109 in
[0040] Table 1 illustrates additional examples of the features 111 tracked and captured by the computing systems associated with the vehicle 106 and provided to the risk assessment server 102 for subsequent processing.
TABLE 1: Usage Based Insurance (UBI) Example Features (Aggregated Data)
How much do you drive?
- Distance
- Duration
- Month
- Season
Where do you drive?
- Home locations: the average number of trips that start or finish at home
- Work locations: the average number of trips that start or finish at work
- Infrequent locations: the average number of trips that start or finish at an unknown place
How do you drive?
- Average sum of events per km, at magnitude levels 1-3 and 4-5: Acceleration (a1-a3 and a4-a5); Braking (b1-b3 and b4-b5); Cornering (c1-c3 and c4-c5); Excess Speed Above Threshold (s1-s3 and s4-s5)
- Average driving score
- Average speed
- Average number of distracted driving events
- Average battery consumption
Risk Assessment Application Usage (e.g. deleted or modified trip information)
- Number of deleted trips
- Total distance traveled in the deleted trips
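A minimal, non-limiting sketch of how trip records might be rolled up into a few of the aggregated features of Table 1 is given below. The record layout (`distance_km`, `duration_min`, `event_magnitudes`, `deleted`), the function names, and the decision to compute the driving features only from trips that were not deleted are assumptions for illustration. Events per km is taken here as the mean over trips of the event count divided by the trip distance, which is one possible reading of "average sum of events per km".

```python
def mean(values):
    """Arithmetic mean that returns 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def events_per_km(trip, low, high):
    """Events with magnitude in [low, high] per km (distance assumed > 0)."""
    count = sum(1 for m in trip["event_magnitudes"] if low <= m <= high)
    return count / trip["distance_km"]


def aggregate_month(trips):
    """Aggregate one data collection period of trips into example features."""
    kept = [t for t in trips if not t["deleted"]]
    deleted = [t for t in trips if t["deleted"]]
    return {
        "avg_distance_km": mean(t["distance_km"] for t in kept),
        "avg_duration_min": mean(t["duration_min"] for t in kept),
        "avg_low_events_per_km": mean(events_per_km(t, 1, 3) for t in kept),
        "avg_high_events_per_km": mean(events_per_km(t, 4, 5) for t in kept),
        "deleted_trip_count": len(deleted),
        "deleted_distance_km": sum(t["distance_km"] for t in deleted),
    }


# Hypothetical trips for one month
trips = [
    {"distance_km": 10.0, "duration_min": 20.0,
     "event_magnitudes": [1, 2, 4], "deleted": False},
    {"distance_km": 5.0, "duration_min": 10.0,
     "event_magnitudes": [3], "deleted": False},
    {"distance_km": 8.0, "duration_min": 15.0,
     "event_magnitudes": [5, 5], "deleted": True},
]
features = aggregate_month(trips)
```

The two deletion features correspond to the number of deleted trips and the total distance traveled within them, while the event features keep the two magnitude bands (1-3 and 4-5) separate.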
[0041] Referring again to
[0042] The example of
[0043] In the example of
[0044] In the example of
[0045] The risk assessment server 102, the claims server 104, the vehicle 106 (including on-board and external computing systems), the mobile computing device 108, and the geo-tracking system 110 are coupled for communication to one another via the communications network 112, which may be a wide area network (WAN) such as the Internet. Additional networks may also be coupled to the WAN of communications network 112 such as a wireless network and/or a local area network (LAN) between the WAN and computing devices shown in
[0046]
[0047] The risk assessment server 102 comprises one or more processors 202, one or more input devices 204, one or more communication units 206, one or more output devices 208 and a memory 230. Risk assessment server 102 also includes one or more storage devices 210 storing one or more computer modules such as a communications module 212, a feature tracking module 214, a claims tracking module 216, a clustering module 218, a scoring module 220 comprising a prediction model 222, a Shapley estimation module 224, a spline module 226, a risk module 228, and a set of driver features 111 capturing driver performance behaviour and/or application usage comprising: frequency features 114, location features 116, quality features 118, and application usage features 120. Examples of the features 111 are also shown in Table 1.
[0048] Communication channels 232 may couple each of the components including processor(s) 202, input device(s) 204, communication unit(s) 206, output device(s) 208, memory 230, storage device(s) 210, and the modules stored therein for inter-component communications, whether communicatively, physically and/or operatively. In some examples, communication channels 232 may include a system bus, a network connection, an inter-process communication data structure, or any other method for communicating data.
[0049] One or more processors 202 may implement functionality and/or execute instructions within the risk assessment server 102. For example, processors 202 may be configured to receive instructions and/or data from storage devices 210 to execute the functionality of the modules shown in
[0050] One or more communication units 206 may communicate with external computing devices (e.g. computing devices shown in
[0051] Input devices 204 and output devices 208 may include any of one or more buttons, switches, pointing devices, cameras, a keyboard, a microphone, one or more sensors (e.g. biometric, etc.), a speaker, a bell, one or more lights, etc. One or more of same may be coupled via a universal serial bus (USB) or other communication channel (e.g. communication channels 232).
[0052] The one or more storage devices 210 may store instructions and/or data for processing during operation of the risk assessment server 102. The one or more storage devices 210 may take different forms and/or configurations, for example, as short-term memory or long-term memory. Storage devices 210 may be configured for short-term storage of information as volatile memory, which does not retain stored contents when power is removed. Volatile memory examples include random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), etc. Storage devices 210, in some examples, also include one or more computer-readable storage media, for example, to store larger amounts of information than volatile memory and/or to store such information for long term, retaining information when power is removed. Non-volatile memory examples include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memory (EPROM) or electrically erasable and programmable (EEPROM) memory.
[0053] The risk assessment server 102 may include additional computing modules or data stores in various embodiments. Additional modules, data stores and devices that may be included in various embodiments may not be shown in
[0054] Communications module 212 may be configured to communicate various data between the risk assessment server 102, its internal modules shown in the storage 210 and other computing devices shown in
[0055] The feature tracking module 214 may be configured to track the system 100 and collect the features 111 as they become available (e.g. track for any new features and/or modifications to existing features). For example, as the system 100 operates dynamically and in real-time, the feature tracking module 214 may track whether any new trips have been taken by the vehicle 106 and extract relevant features 111. The features 111 may further then be processed as described herein to retrieve and classify them into relevant categories of information such as the frequency features 114, location features 116, quality features 118, and application features 120 as well as other feature categories and subcategories not illustrated in
[0056] In at least some aspects, the feature tracking module 214 shown in
[0057] Referring again to
[0058] The claims tracking module 216 may be configured to receive claims data 105 from the claims server 104 and to process such data such as for use in training the prediction model 222. Notably, the current claim frequency information retrieved from the claims data 105 may be used to train the prediction model 222 along with the features 111 to predict the claim frequency of a future time period. As shown in
[0059] Referring again to
[0060] The prediction model 222 is a machine learning model and preferably, in at least some embodiments, an extreme gradient boosting model, such as XGBoost, which utilizes the features 111 to predict a future likelihood of claim submissions for a policyholder of the vehicle 106 for which the features are processed.
[0061] Generally, regular gradient boosting uses the loss function of a base model (e.g. a decision tree) as a proxy for minimizing the error of the overall model, whereas XGBoost uses the second-order derivative as an approximation. Extreme gradient boosting (XGBoost) is an efficient open-source implementation of the stochastic gradient boosting ensemble algorithm, built on gradient boosted decision trees and designed for speed and performance. Advantageously, an extreme gradient boosting model, which can be used for classification or regression predictive modeling problems, allows an accurate and time-efficient prediction of the expected claim submission frequency based on the features 111, according to at least some embodiments of the present disclosure. In the extreme gradient boosting model, trees are added one at a time to the ensemble and fit to correct the prediction errors made by prior models. The prediction model 222 is thereby configured to utilize historical driver behaviour data over a past time period, e.g. a month, to predict future driver behaviour and thereby an expected claim submission frequency over a similar future time period, e.g. the next month.
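The additive, second-order fitting described above can be illustrated with a minimal sketch. It is not the XGBoost library and not necessarily the configuration used by the prediction model 222. It assumes a Poisson loss on a log claim rate (an assumption; the disclosure does not name the loss), a single hypothetical feature (hard-braking events per km), depth-one trees, and made-up data. Each round computes the gradient and second derivative of the loss and adds one tree whose leaf weights are the regularized Newton step.

```python
import math

def fit_stump(x, g, h, lam=1.0):
    """Pick the threshold split with the best second-order gain.

    Returns (threshold, left_weight, right_weight). Leaf weights are the
    regularized Newton step -G / (H + lam), where G and H are the summed
    gradients and second derivatives of the rows falling in the leaf.
    """
    g_tot, h_tot = sum(g), sum(h)
    root_w = -g_tot / (h_tot + lam)
    best = (0.0, (math.inf, root_w, root_w))  # fallback: one leaf for all rows
    for t in sorted(set(x))[:-1]:
        g_l = sum(gi for xi, gi in zip(x, g) if xi <= t)
        h_l = sum(hi for xi, hi in zip(x, h) if xi <= t)
        g_r, h_r = g_tot - g_l, h_tot - h_l
        gain = (g_l ** 2 / (h_l + lam) + g_r ** 2 / (h_r + lam)
                - g_tot ** 2 / (h_tot + lam))
        if gain > best[0]:
            best = (gain, (t, -g_l / (h_l + lam), -g_r / (h_r + lam)))
    return best[1]

def predict_log_rate(stumps, xi, eta):
    """Sum the shrunken outputs of all trees added so far."""
    return sum(eta * (wl if xi <= t else wr) for t, wl, wr in stumps)

def poisson_loss(x, y, stumps, eta):
    """Poisson negative log-likelihood (up to a constant) on the log rate."""
    total = 0.0
    for xi, yi in zip(x, y):
        f = predict_log_rate(stumps, xi, eta)
        total += math.exp(f) - yi * f
    return total

def boost(x, y, rounds=20, eta=0.5):
    """Add trees one at a time, each fit to the current errors."""
    stumps = []
    for _ in range(rounds):
        f = [predict_log_rate(stumps, xi, eta) for xi in x]
        g = [math.exp(fi) - yi for fi, yi in zip(f, y)]  # first derivative
        h = [math.exp(fi) for fi in f]                   # second derivative
        stumps.append(fit_stump(x, g, h))
    return stumps

# Hypothetical data: hard-braking events per km and next-period claim counts.
x = [0.01, 0.02, 0.03, 0.08, 0.09, 0.10]
y = [0, 0, 0, 1, 1, 2]
ETA = 0.5
model = boost(x, y, rounds=20, eta=ETA)
initial_loss = poisson_loss(x, y, [], ETA)
final_loss = poisson_loss(x, y, model, ETA)
```

In practice a library implementation would replace this loop; the sketch only shows where the second derivative enters the leaf weights and the split gain.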
[0062] The prediction model 222 may thus be trained by historical feature data and historical claims data captured from a number of policyholders (e.g. claims data 105, and features 111 captured from the system 100 for the current policyholder and other policyholders of the entity) to predict a likelihood that a particular driver's behaviours may lead to one or more accidents in the near future and thus an expected claim frequency over the future time duration. Once the prediction model 222 is trained during a training phase and used in the testing phase on actual current features 111 of a particular driver to predict a likelihood of accidents occurring and claim submissions in the near future, the trained model and its parameters may be provided to a Shapley estimation module 224. As may be envisaged, the prediction model 222, its inputs, outputs, trained model parameters, etc. may be accessed by the remaining modules of the scoring module 220 to calculate the risk score 103 in
[0063] Notably, the prediction model 222 feeds into the Shapley estimation module 224. The Shapley estimation module 224 is configured to assign a corresponding Shapley value to each data point input into the prediction model 222. Namely, each of the features 111 input into the prediction model 222 to generate an expected claim submission frequency is assigned a Shapley value. The Shapley values assess every combination of predictors (e.g. features 111 and/or claims data 105) to determine each predictor's impact on the output. Typically, as noted earlier, each category of the features 111 is aggregated over a duration of time.
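As an illustration of assessing every combination of predictors, the following minimal sketch computes exact Shapley values by enumerating all feature subsets. The model function, feature names, and baseline values are hypothetical stand-ins, not the trained prediction model 222. The sketch also assumes that an absent feature is represented by a fixed baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for one prediction.

    A feature outside the coalition takes its baseline value (assumption).
    Cost grows as 2**n in the number of features, so this suits only small n.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        z = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(z)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for r in range(n):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi

def toy_claim_rate(f):
    """Hypothetical stand-in for a trained model; ignores 'trips'."""
    return (0.05 + 0.4 * f["hard_brakes_per_km"] + 0.002 * f["night_km"]
            + 0.01 * f["hard_brakes_per_km"] * f["night_km"])

driver = {"hard_brakes_per_km": 0.2, "night_km": 50.0, "trips": 40.0}
average = {"hard_brakes_per_km": 0.05, "night_km": 10.0, "trips": 30.0}
phi = shapley_values(toy_claim_rate, driver, average)
```

The values sum to the difference between the driver's prediction and the baseline prediction, which is the property that lets a sum of per-feature contributions stand in for the model output.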
[0064]
[0065] Referring to
[0066] Referring to the example process of
[0067] Referring to
[0068] Other examples of determining the spline values for the Shapley estimation of other types of features 111 are shown in
[0069] In the example of
[0070] Referring again now to
[0071] Accordingly, in at least some embodiments, the scoring module 220 is configured to analyze the prediction model 222 by assigning a weight to each feature processed by the model (derived from the Shapley value for the feature) based on a determined influence that the feature element has on the output prediction performed by the model.
[0072] It is understood that operations may not fall exactly within the modules and/or models 212, 214, 216, 218, 220, 222, 224, 226, and 228 of
[0073] In one example, a function may be assigned by the scoring module 220 to model the contribution determined for each feature, e.g. each cornering event (an element of the quality features), and that function may be represented by a linear property (e.g. a spline). In this example, the function provides a graph representation of a weight to be provided to the feature, e.g. cornering event. In the current example, if a driver of the vehicle 106 has between 0.05 and 0.1 cornering events per km, then the contribution function may assign a weight of 0.01 (below average risk). In the current example, a weight is assigned to each feature and, within each feature set, the variable identified as most important by the Shapley process is given the most weight. These examples are not meant to be limiting.
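The contribution function in the cornering example can be sketched as a piecewise-linear curve evaluated per feature, with the per-feature weights then summed. Only the 0.01 weight for 0.05 to 0.1 cornering events per km comes from the example above. Every other knot, the second feature, and the function names are hypothetical, and a piecewise-linear curve is only one possible reading of "spline".

```python
from bisect import bisect_right

def piecewise_linear(knots, x):
    """Evaluate a piecewise-linear curve given sorted (x, weight) knots.

    Values outside the knot range are clamped to the end weights.
    """
    xs = [k[0] for k in knots]
    ys = [k[1] for k in knots]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical knots; only the flat 0.01 segment reflects the example above.
CORNERING_KNOTS = [(0.0, 0.0), (0.05, 0.01), (0.10, 0.01), (0.20, 0.05)]
NIGHT_SHARE_KNOTS = [(0.0, -0.02), (0.5, 0.0), (1.0, 0.04)]

def summed_contribution(features):
    """Sum the per-feature weights; the risk score is derived from this sum."""
    return (piecewise_linear(CORNERING_KNOTS, features["cornering_per_km"])
            + piecewise_linear(NIGHT_SHARE_KNOTS, features["night_km_share"]))

weight_mid = piecewise_linear(CORNERING_KNOTS, 0.075)
weight_high = piecewise_linear(CORNERING_KNOTS, 0.15)
total = summed_contribution({"cornering_per_km": 0.075, "night_km_share": 0.5})
```

Evaluating a fixed curve per feature is far cheaper than recomputing Shapley values for every driver, which is one plausible motivation for approximating them this way.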
[0074]
[0075] In some aspects, operations 300 receive historical and current features 111, as well as claims data 105 for a particular policyholder and if needed, other relevant policyholders to train and test a machine learning prediction model for predicting a future likelihood of claim submissions for the particular policyholder.
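One way to assemble training examples from historical data is to pair each policyholder's features for one period with that policyholder's claim count in the following period. The sketch below assumes monthly periods indexed by integers and dictionary inputs keyed by (policyholder, month). These structures and names are illustrative assumptions, not taken from the disclosure.

```python
def build_training_pairs(features_by_month, claims_by_month):
    """Pair features of month m with the claim count of month m + 1.

    Both inputs map (policyholder_id, month_index) to a value. Rows with no
    claims record for the following month are skipped.
    """
    pairs = []
    for (holder, month), feats in sorted(features_by_month.items()):
        target_key = (holder, month + 1)
        if target_key in claims_by_month:
            pairs.append((feats, claims_by_month[target_key]))
    return pairs

# Hypothetical records for two policyholders.
features_by_month = {
    ("A", 1): {"cornering_per_km": 0.08},
    ("A", 2): {"cornering_per_km": 0.06},
    ("B", 1): {"cornering_per_km": 0.02},
}
claims_by_month = {("A", 2): 1, ("A", 3): 0, ("B", 2): 0}
pairs = build_training_pairs(features_by_month, claims_by_month)
```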
[0076] The computing device for implementing the operations 300, such as the risk assessment server 102, comprises a processor configured to communicate with a display to instruct providing a GUI. The risk assessment server 102 has a communication interface to receive input features and claims-related data for policyholders, and instructions (stored in a non-transient storage device) which, when executed by the processor, configure the risk assessment server 102 to perform operations such as the operations 300.
[0077] At 302, operations of the risk assessment server 102 receive a plurality of vehicle behaviour data (e.g. features 111) from a mobile computing device 108 (and in some cases geo-tracking system 110) associated with the vehicle 106, aggregated over a defined data collection period. In some examples, the data collection period may be a month and the risk assessment server 102 is configured to determine a risk assessment for the following month. At 302, the vehicle behaviour data comprises a plurality of features 111 relating to operating the vehicle 106 over the defined data collection period. Example vehicle behaviour data provided in the features 111 include frequency features 114, location features 116, quality features 118, and application features 120 (e.g. modification or deletion of profile or trips for the vehicle 106 on a stored log of the system 100). At 302, operations of the risk assessment server 102 provide that at least some of the features (e.g. some of the frequency features 114, location features 116 and quality features 118) are captured from a geo-tracking system 110 (e.g. GPS systems, on-board diagnostic systems, other telematics systems, etc.) associated with one or more mobile computing device(s) 108 while the vehicle 106 is being driven.
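Aggregation over the data collection period can be sketched as follows, assuming trip-level records are available. The `Trip` fields and the output feature names are hypothetical. The disclosure names the feature categories (frequency, location, quality, application) but not this schema.

```python
from dataclasses import dataclass

@dataclass
class Trip:
    distance_km: float
    cornering_events: int
    hard_brakes: int
    night: bool

def aggregate_period(trips):
    """Collapse one period's trips into per-period features."""
    total_km = sum(t.distance_km for t in trips)
    night_km = sum(t.distance_km for t in trips if t.night)

    def per_km(count):
        # Guard against a period with no recorded distance.
        return count / total_km if total_km > 0 else 0.0

    return {
        "trip_count": len(trips),                                   # frequency
        "total_km": total_km,                                       # frequency
        "night_km_share": per_km(night_km),
        "cornering_per_km": per_km(sum(t.cornering_events for t in trips)),  # quality
        "hard_brakes_per_km": per_km(sum(t.hard_brakes for t in trips)),     # quality
    }

# Hypothetical month with two trips.
month = [Trip(10.0, 1, 0, False), Trip(30.0, 2, 1, True)]
features = aggregate_period(month)
empty = aggregate_period([])
```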
[0078] At 304, operations of the risk assessment server 102 provide the vehicle behaviour data (e.g. frequency features 114, location features 116, quality features 118, etc.) to a supervised learning prediction model (e.g. the prediction model 222 in
[0079] At 306, operations of the risk assessment server 102 compute a Shapley estimate value for each feature 111 of the vehicle behaviour data applied to the prediction model 222 for determining a contribution of each said feature 111 to the predicted value, wherein the Shapley estimate value for each said feature is determined by performing a spline approximation to an output of a Shapley function applied to each said feature 111 to estimate the contribution of each said feature 111. As shown in
[0080] Referring again to
[0081] Thus in at least some aspects, a risk score 103 is conveniently attributed to a driver which is based on actual real-time driving behaviours and application usage parameters and correlates to the amount of risk taken by the driver of the vehicle 106 while driving.
[0082] In at least some aspects, the risk assessment computer application (e.g. native or browser based) on the mobile computing device 108 may be configured to perform at least some of the operations of the risk assessment server 102 described herein to collect and analyze the behaviour data provided in the features 111 and display a risk score 103 thereon.
[0083] Further conveniently, in at least some aspects, the proposed methods and systems provide a direct correlation between the risk score 103 (e.g. the UBI score provided) and claim frequency. Additionally, in at least some aspects, since the system 100 tracks application features 120, which capture interactions with the risk assessment application such as deletion of driver behaviours or trips, turning off location, or de-activation of the application, such behaviours are also accounted for in the risk score determination provided by the risk assessment server 102 so as to continue to provide dynamic and accurate risk assessments.
[0084] One or more currently preferred embodiments have been described by way of example. It will be apparent to persons skilled in the art that a number of variations and modifications can be made without departing from the scope of the disclosure as defined in the claims.