Methods of using multiple regression in football tendency analysis
11452926 · 2022-09-27
Inventors
Cpc classification
G06N7/01
PHYSICS
International classification
A63B71/06
HUMAN NECESSITIES
Abstract
Methods are disclosed in which the user defines three or more categories of plays that an American football opponent may run and multiple regression techniques are used to estimate the probability that the opponent will run a play in each such category, under particular game conditions, based on data collected from the opponent's past games. The regression coefficients and game condition data are entered into a computer device, which calculates the predicted probabilities and sorts and displays the categories of plays according to such probabilities. The user may assign ratings or rankings to schemes that the user may execute, based on the expected effectiveness of each such scheme against each category of plays that the opponent may run, and such ratings or rankings may be combined with the predicted probabilities to recommend schemes to the user under various game conditions. If permitted by the rules, such predictions or recommendations may be used to assist the user in play calling during a game. The methods may also be used to enhance scouting reports, improve efficiency of practices, and/or develop more sophisticated play sheets.
Claims
1. A method of analyzing the play calling tendencies of an American football opponent, comprising: defining at least three distinct, non-overlapping categories of plays that the opponent might call, all such categories together being exhaustive of the universe of plays that the opponent might call; collecting information on the plays that the opponent actually called in prior games, assigning each such play to exactly one of the defined categories, and collecting information about the score, time remaining, down and distance, field position, personnel package, and other relevant information in the prior games when each such play was called; inputting all such information collected into a computerized statistical database; for each of the at least three defined categories of plays, performing a multiple regression wherein each observation is a play that the opponent ran in a past game, the dependent variable is a binary variable indicating whether the play that the opponent ran was in that category, and the explanatory variables reflect the game conditions during that play; using appropriate explanatory variables reflecting the game conditions, to estimate the probability that the opponent will run a play in that category given various game conditions; programming a computer to use the coefficients estimated by each such regression to model the probability that the opponent will run a play in the corresponding category under particular conditions during a game; entering sets of game conditions into the computer to use as or to calculate values for the explanatory variables of the models; and solving the models for each category of plays and displaying each category, along with the predicted probability that the opponent will run a play in that category, sorted by probability, from highest to lowest.
2. The method of claim 1 wherein the explanatory variables include one or more dummy variables reflecting the presence or absence in the game of key individual players on the opposing team.
3. The method of claim 1 further comprising inputting data about the game conditions into the computer during a game, before each play by the opponent, immediately after the conclusion of the preceding play; computing the predicted probability that the opponent will run a play in each category of plays; and displaying the categories of plays and predicted probabilities on a computer display in time to assist in the selection of a scheme by the user and for such scheme to be communicated to players on the field.
4. The method of claim 3 wherein, for plays that do not follow extended stoppages of play, the user inputs only the fact that the preceding play resulted in an incomplete pass or the field position after the play and relevant changes in the opponent's personnel package; and the computer calculates or estimates other game conditions.
5. The method of claim 1 further comprising identifying a set of schemes that the user is prepared to deploy against the opponent; assigning a numerical rating or ranking to each such scheme against each category of plays, based on the expected effectiveness of the scheme against plays in that category; identifying recommended schemes, based on the average rating or ranking of each scheme against all of the play categories, weighted by the predicted probability that the opponent will run a play in each category; and displaying the recommended schemes in order of weighted average rating or ranking, from highest to lowest.
6. The method of claim 5 further comprising displaying the recommended schemes on a computer display in time to assist in the selection of a scheme by the user and for such scheme to be communicated to players on the field.
7. The method of claim 6 wherein, for plays that do not follow extended stoppages of play, the user inputs only the fact that the preceding play resulted in an incomplete pass or the field position after the play and relevant changes in the opponent's personnel package; and the computer calculates or estimates other game conditions.
8. The method of claim 6 wherein the a priori ranks or ratings can be adjusted during the course of a game.
9. The method of claim 1 wherein the information collected includes information about the formation from which the opponent ran each play in prior games and that information is used to formulate additional explanatory variables used in the models.
10. The method of claim 1 wherein the coefficients and/or the predicted probabilities generated by the models are used in the preparation of scouting reports on the opponent.
11. The method of claim 1 wherein the coefficients and/or the predicted probabilities generated by the models are used in the preparation of play sheets to be used during a game against the opponent.
12. The method of claim 1 wherein the coefficients and/or the predicted probabilities generated by the models are used to allocate practice time.
13. The method of claim 1 wherein the coefficients and/or the predicted probabilities generated by the models are used to enable the scout teams to simulate the opponent's behavior more accurately.
14. The method of claim 1 wherein the opponent whose tendencies are analyzed is the user's team and the coefficients and/or the predicted probabilities generated by the models are used for self-scouting.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
DETAILED DESCRIPTION OF THE INVENTION
(11) The descriptions and examples of the methods disclosed herein are provided from the perspective of a user seeking to analyze the offensive play calling tendencies of an opponent, in order to develop and deploy effective defensive schemes. A skilled practitioner will recognize that the methods disclosed herein could similarly be used to analyze the opponent's defensive tendencies to facilitate the user's development of an offensive game plan or to assist in offensive play calling. There is no intention to limit the scope of the claims to the prediction of the opponent's offensive plays or to the facilitation of defensive play calling or game preparation.
(12)
(13) In step 20 of the method, the user defines a plurality of categories of offensive plays that the opponent might run. The categories should be defined so as to be exhaustive and non-overlapping, meaning that every play that the opponent might run will fall within the definition of exactly one category.
(14) In one experiment, the ten categories, described in Table 1, were used:
(15) TABLE 1
Play Category        Definition
Zone Blocking        Simple handoff with zone blocking, typically with double team at point of attack
Trap/Pull Blocking   Simple handoff or pitch with interior lineman or linemen (including tight end) trap blocking or pulling
Option Run           Handoff with extended mesh point or trailing pitch man
Draw/Delay           Delayed handoff or quarterback draw after showing pass
Reverse              Handoff or pitch to a receiver, including jet sweep, end around, reverse, flea flicker, etc.
RPO                  Quarterback roll out or bootleg
Screen               Any screen pass
Play Action          Play action pass (except where a fake handoff precedes an RPO or screen pass)
Short Dropback       Less than 3-step dropback from shotgun; less than 5-step dropback from under center
Deep Dropback        More than 2-step dropback from shotgun; more than 5-step dropback from under center
Note that these definitions provide unambiguous rules for resolution where a play seems initially to fall into more than one category—e.g., when a screen pass is preceded by play action.
(16) The play categories should be defined with sufficient precision to be useful for preparing scouting reports, planning practices, and play calling, i.e., selection of schemes likely to be effective against plays that the opponent is expected to run. As noted above, the ability to predict only two categories of offensive plays, “run” and “pass,” is likely to be of extremely limited value. Each category should be defined, however, to ensure that the statistical database contains sufficient observations of plays in that category; otherwise, the regression equation for that category will not yield meaningful coefficients. It is recommended that the categories be defined so as to provide at least three observations in each category. (Out of a total of 333 plays from five games included in the database created for the experiment described herein, three (0.9%) were in the Draw/Delay category.)
(17) While the embodiments described herein reflect particular definitions of the categories, a skilled practitioner will recognize that useful results may be obtained using different categories. There is no intention to limit the scope of the claims to the particular categories described herein.
(18) Referring again to
(19)
(20) Each observation 110 in the database 100 includes an entry in each of the fields 120. Each of the fields 120 corresponds to one of the categories of plays defined by the user; in the experiment described herein, there were ten such fields 120, corresponding to the ten defined play categories. For each observation 110, one of the fields 120 will have a value of 1, indicating that the opponent ran a play in that category; all other fields 120 will have a value of 0. The values in each of the fields 120 will be used as the dependent variable in one of the regression equations estimated in accordance with the method disclosed herein.
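The layout of the fields 120 can be illustrated with a minimal sketch, offered under stated assumptions rather than as the implementation used in the experiment. The function names `encode_play` and `encode_plays` are hypothetical; the category names are those of Table 1. Each charted play becomes a row of ten 0/1 values, and the check that every row sums to 1 reflects the requirement that the categories be exhaustive and non-overlapping.

```python
# Category names follow Table 1; the function names are illustrative only.
CATEGORIES = [
    "Zone Blocking", "Trap/Pull Blocking", "Option Run", "Draw/Delay",
    "Reverse", "RPO", "Screen", "Play Action", "Short Dropback", "Deep Dropback",
]


def encode_play(category, categories=CATEGORIES):
    """Return one 0/1 value per category field; exactly one field is 1."""
    if category not in categories:
        raise ValueError(f"unknown play category: {category!r}")
    return [1 if c == category else 0 for c in categories]


def encode_plays(observed):
    """Encode a list of charted play categories into rows of category fields."""
    rows = [encode_play(c) for c in observed]
    # Exhaustive, non-overlapping categories imply that each row sums to 1.
    assert all(sum(row) == 1 for row in rows)
    return rows


rows = encode_plays(["Screen", "Deep Dropback", "Zone Blocking"])
```

Column j of the resulting rows would then serve as the dependent variable for the regression on category j.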
(21) As shown in
(22) The fields 130 contain, for each observation 110, data relating to the opponent's play selection that is known (or can be accurately approximated) immediately after the conclusion of the preceding play—e.g., the score of the game, the time remaining, down and distance, field position, personnel in the opponent's huddle.
(23) The fields 140 contain, for each observation 110, data relating to the opponent's formation when the play begins, which, due to the opponent's ability to execute various shifts and motions, cannot be known until the ball is snapped. The data contained in the fields 140 may include, for example, the number of running backs in the backfield, whether tight ends are lined up strong right or left, the number of receivers split right or left, whether the quarterback is under center or in the shotgun, etc.
(24) Referring again to
(25) While the methods disclosed herein reflect particular specifications of the regressions, a skilled practitioner will recognize that useful results may be obtained using different explanatory variables, including different functional forms or interactions of the data used to calculate the variables described herein. There is no intention to limit the scope of the claims to the particular regression specifications described herein.
(26) In the experiment described herein, ten explanatory variables, described in Table 2, were used:
(27) TABLE 2
Explanatory Variable   Description
ODYPR          A variable intended to measure the team's perception of its opponent's relative vulnerability to passes vs. runs, calculated as [yards per pass]/[yards per rush] yielded by opponent over its preceding 10 games.
STR            A variable reflecting the score and time remaining, calculated as: [scoring margin]/[minutes remaining in game]; the scoring margin is positive if the team is leading, negative if trailing, 0 if tied.
STR²           [STR]² or, if STR is negative, −[STR]²; a quadratic form of STR appeared to improve the fit of the model.
2M1H           A variable reflecting time remaining in the first half (0 if more than two minutes) and distance from the team's own goal line on a scale from 0 to 100 (increasing with distance from own goal and as less time remains in half).
DD             A variable reflecting the down and distance situation, calculated as: [yards to go]/(4 − [down]); in other words, the average yards the team must gain on each play in order to make a first down on the third down play. (Examples: on 1st and 10, DD = 3.333; on 3rd and 1, DD = 1.) On 4th down, DD = [yards to go].
DD²            [DD]²; a quadratic form of DD appeared to improve the fit of the model.
Own GL Prox    A variable measuring the team's proximity to its own goal line on a scale from 0 to 100 (100 being closest), calculated as: ([yardage from opponent's goal line]²)/9,801. (The denominator is a scaling factor equal to 99².)
Opp GL Prox    A variable measuring the team's proximity to its opponent's goal line on a scale from 0 to 100 (100 being closest), calculated as: ([yardage from own goal line]²)/9,801. (The denominator is a scaling factor equal to 99².)
TEs            The number of tight ends on the field.
RBs            The number of running backs on the field.
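The calculations described in Table 2 can be sketched as follows. This is an illustrative sketch only: the function and parameter names are hypothetical, 2M1H is omitted because Table 2 does not give an explicit formula for it, and the goal line proximity ratio is implemented exactly as written (which, with yardage between 0 and 99, yields values between 0 and 1; a further factor of 100 would be needed if the stated 0-to-100 scale is intended).

```python
def dd(down, yards_to_go):
    """Down-and-distance variable: yards to go / (4 - down); yards to go on 4th down."""
    if down not in (1, 2, 3, 4):
        raise ValueError("down must be 1, 2, 3, or 4")
    return float(yards_to_go) if down == 4 else yards_to_go / (4 - down)


def signed_square(x):
    """Square of x carrying the sign of x, as described for STR squared."""
    return x * abs(x)


def goal_line_prox(yards):
    """Squared yardage divided by the scaling factor 99^2 = 9,801."""
    return yards ** 2 / 9801


def explanatory_variables(down, yards_to_go, scoring_margin, minutes_remaining,
                          yards_from_own_goal, tight_ends, running_backs, odypr):
    """Build the Table 2 variables (except 2M1H) for one set of game conditions.

    minutes_remaining must be greater than zero; odypr is supplied by the caller
    as [yards per pass] / [yards per rush] yielded over the preceding 10 games.
    """
    str_value = scoring_margin / minutes_remaining
    dd_value = dd(down, yards_to_go)
    return {
        "ODYPR": odypr,
        "STR": str_value,
        "STR2": signed_square(str_value),
        "DD": dd_value,
        "DD2": dd_value ** 2,
        # Own GL Prox uses the yardage from the opponent's goal line.
        "OwnGLProx": goal_line_prox(100 - yards_from_own_goal),
        "OppGLProx": goal_line_prox(yards_from_own_goal),
        "TEs": tight_ends,
        "RBs": running_backs,
    }
```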
(28) In one embodiment, the explanatory variables of the multiple regression model include one or more dummy variables reflecting the presence or absence of one or more of the opponent's key players in the game. This would provide more refined predictions if, for example, the opponent is more likely to run certain categories of plays when the first-string running back is in the game than when he is being rested. Such dummy variables could also correct for anomalies in the statistical database caused by variations in the opponent's personnel. For example, if the database comprised data from the opponent's past six games but their starting quarterback missed two of those games due to injury, such a variable could provide insight into the effect of that player's absence on the opponent's play calling.
(29)
(30) The regression statistics 220 provide diagnostic information about the regression equation as a whole. The R square, for example, is commonly cited as a measure of the “goodness of fit” or the percentage of variation in the dependent variable explained by the regression equation. Although the R squares generated by the methods described herein are lower than might be desirable for some applications of regression modeling techniques, high R squares are neither expected nor required for the methods to be of value in the current application. Because play callers intentionally try to be unpredictable—i.e., to introduce an element of random variation in their play calling—it is unsurprising that equations generated according to the claimed methods explain only limited percentages of the variations in plays called.
(31) In many applications of multiple regression techniques, the primary objective is to understand the impact, if any, of particular explanatory variables on the dependent variable. In such cases, the focus is on the magnitude and sign of the coefficients of, and the diagnostic statistics associated with, each explanatory variable; the R square of the equation is of less importance.
(32) Each of the coefficients 240 describes the mathematical relationship between the corresponding explanatory variable 230 and the dependent variable 210—i.e., the probability that the opponent will run a play in the category corresponding to dependent variable 210. If the sign of one of the coefficients 240 is positive, the probability that the opponent will run a play in that category increases as the value of the corresponding explanatory variable 230 gets larger; if the sign of the coefficient 240 is negative, the probability decreases as the value of the corresponding explanatory variable 230 gets larger.
(33) Referring to the diagnostic statistics 250, the t-statistic, in particular, is a commonly cited measure of the statistical significance of the effect of an explanatory variable on the dependent variable. A larger t-statistic provides higher confidence that the corresponding explanatory variable has a meaningful effect on the dependent variable.
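The coefficients, R square, and t-statistics discussed above can be computed as in the following sketch. It assumes that the regression is an ordinary least squares fit of the 0/1 dependent variable on the explanatory variables plus an intercept; the disclosure refers only to "multiple regression," so that choice, along with the names `fit_linear_probability` and `_inverse`, is an assumption made for illustration.

```python
def _inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_linear_probability(X, y):
    """Least squares fit of a 0/1 outcome; returns (coefficients, R square, t-stats).

    X holds one row of explanatory values per play; y holds the 0/1 category field.
    The intercept is the first coefficient. Assumes more plays than coefficients,
    at least one 0 and one 1 in y, and a fit that is not exact.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    n, k = len(A), len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    inv = _inverse(ata)
    beta = [sum(inv[i][j] * aty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in A]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1.0 - ss_res / ss_tot
    sigma2 = ss_res / (n - k)  # residual variance estimate
    t_stats = [b / (sigma2 * inv[i][i]) ** 0.5 for i, b in enumerate(beta)]
    return beta, r_square, t_stats
```

One such fit would be run for each category of plays, using the same explanatory variables and the 0/1 field for that category as the dependent variable.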
(34) As shown in
(35) While the coefficient 240 on STR² is much smaller than the coefficient on STR and bears the opposite sign, it is also statistically significant, suggesting that the effect of STR on the dependent variable 210 gets slightly smaller as STR approaches large positive or large negative values.
(36) The coefficient 240 on OwnGLProx is negative, suggesting that, when the opponent is near its own goal line, it is less likely to call a deep dropback pass, perhaps because the risk of a sack or turnover is heightened given that field position. The coefficient 240 on OppGLProx is also negative, suggesting that the opponent is less likely to call a deep dropback pass when it is near the other team's goal line, perhaps because such plays typically involve deep pass patterns that are ineffective close to the other team's end line.
(37) Not surprisingly, the coefficient 240 on DD is positive, indicating that the opponent is more likely to call a deep dropback pass on late downs with longer yardage to go. Such a call is more likely, for example, on third and 10 than on first and 10 or third and 1. The coefficient 240 on 2M1H is also positive, indicating that the opponent is more likely to call a deep dropback pass during a two-minute drill in the first half.
(38) For comparison purposes,
(39) Referring again to
(40) In the next step 60 of the method, data reflecting particular game conditions are entered into the computer, which calculates and uses a complete set of values of the explanatory variables to solve the regression model for each category of plays. The solutions represent, for each category, a probability that the opponent will run a play in that category under those particular game conditions.
(41) In the experiment described herein, five consecutive games played by a team were reviewed according to step 20. The data from those five games were used to construct the statistical database according to step 30 and that database was used to estimate regression equations according to step 40. The resulting regression coefficients were entered into a computer to create a regression model for each category of plays according to step 50. Data from each offensive play run by the team during its next game were used as or to compute, according to step 60, a complete set of values of the explanatory variables. According to step 70, those values were used to solve the models, generating, for each category of plays, a predicted probability that the team would run a play in that category; the computer was used to sort and display the categories and the associated probabilities.
(42) In step 80, the defined categories and the corresponding probabilities calculated in step 70 are sorted by probability, from highest to lowest, and displayed.
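Steps 70 and 80 can be sketched as follows, assuming that each model is a linear equation consisting of an intercept plus the sum of each coefficient multiplied by the corresponding explanatory variable. The function names are hypothetical, and the coefficients shown are made up for illustration; they are not taken from the experiment.

```python
def predict(coefficients, values):
    """Intercept plus the sum of coefficient * explanatory value for one category."""
    return coefficients["Intercept"] + sum(
        coef * values[name]
        for name, coef in coefficients.items()
        if name != "Intercept"
    )


def rank_categories(models, values):
    """Return (category, probability) pairs sorted from highest to lowest."""
    scored = [(category, predict(coefs, values)) for category, coefs in models.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up coefficients for three categories, for illustration only.
models = {
    "Deep Dropback": {"Intercept": 0.10, "DD": 0.04, "STR": -0.05},
    "Zone Blocking": {"Intercept": 0.30, "DD": -0.03, "STR": 0.02},
    "Screen": {"Intercept": 0.05, "DD": 0.01, "STR": 0.0},
}
ranking = rank_categories(models, {"DD": 5.0, "STR": -1.0})
for category, probability in ranking:
    print(f"{category}: {probability:.2f}")
```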
(43) For comparison purposes,
(44) As reflected in
(45) In the experiment described herein, the probabilities generated by the regression models were compared with the plays actually called by the team studied over the course of an entire game. As shown in Table 3, the team called a play in the category predicted to be most likely (Prediction 1) almost half the time (37 out of 81 plays). The team called a play in one of the two categories predicted to be most likely two-thirds of the time (54 out of 81 plays).
(46) TABLE 3
Prediction  Frequency  Percentage  Cumulative %
1           37         45.68       45.68
2           17         20.99       66.67
3           5          6.17        72.84
4           11         13.58       86.42
5           8          9.88        96.30
6           0          0.00        96.30
7           0          0.00        96.30
8           0          0.00        96.30
9           3          3.70        100.00
10          0          0.00        100.00
Total       81         100.00      100.00
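A tabulation of the kind shown in Table 3 can be produced as in the following sketch, which is illustrative only. The hypothetical function `prediction_rank` finds where the category actually called falls in the sorted predictions for one play, and `rank_table` tallies those positions into frequencies, percentages, and cumulative percentages. The usage lines rebuild the tally from the frequencies reported in Table 3.

```python
def prediction_rank(probabilities, actual):
    """1-based position of the actual category when sorted by probability, descending."""
    ordered = sorted(probabilities, key=probabilities.get, reverse=True)
    return ordered.index(actual) + 1


def rank_table(ranks, n_categories):
    """Rows of (prediction, frequency, percentage, cumulative percentage)."""
    total = len(ranks)
    rows, running = [], 0
    for k in range(1, n_categories + 1):
        freq = ranks.count(k)
        running += freq
        rows.append((k, freq,
                     round(100 * freq / total, 2),
                     round(100 * running / total, 2)))
    return rows


# Ranks rebuilt from the Table 3 frequencies (81 plays, ten categories).
ranks = [1] * 37 + [2] * 17 + [3] * 5 + [4] * 11 + [5] * 8 + [9] * 3
for row in rank_table(ranks, 10):
    print(row)
```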
(47) In one embodiment, the methods disclosed herein may be used to provide predicted probabilities during a game, to assist the user with play calling, when the rules permit. In this embodiment, step 60 in
(48) In another embodiment, the methods disclosed herein may be used to recommend schemes that are expected to be effective against the categories of plays that the opponent is likely to run under given game conditions.
(49) In step 35, the user assigns a numerical rating or ranking to each identified scheme against each category of plays defined in step 20, based on the expected effectiveness of the scheme against plays in that category. (For purposes of this disclosure, it is assumed that a scheme with a higher rating or ranking is expected to be more effective; more effective schemes could just as easily be denoted with lower ratings or rankings.)
(50) In step 75, after the probability is calculated for each category of plays, a weighted average rating or ranking for each scheme is calculated according to the following algorithm:
(51)
where
(52) In step 85, the schemes with the highest weighted average ratings or rankings will be displayed as recommended schemes, sorted in order of weighted average rating or ranking, from highest to lowest, under the given game conditions. The user may choose not to display every scheme but may select a smaller number of schemes with the highest weighted average ratings or rankings. The user may also choose whether to display the categories of plays and/or the predicted probability associated with each category.
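Steps 75 and 85 can be sketched as follows. The sketch assumes that the weighted average for a scheme is the sum, over all play categories, of the predicted probability for the category multiplied by the scheme's rating against that category, divided by the sum of the probabilities. The function name, scheme names, and ratings are hypothetical and are used only for illustration.

```python
def recommend_schemes(ratings, probabilities, top_n=None):
    """Rank schemes by probability-weighted average rating, highest first.

    ratings: {scheme: {category: rating}}; probabilities: {category: probability}.
    Assumes a higher rating denotes a more effective scheme.
    """
    total_p = sum(probabilities.values())
    scored = []
    for scheme, by_category in ratings.items():
        weighted = sum(p * by_category[cat] for cat, p in probabilities.items())
        scored.append((scheme, weighted / total_p))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Optionally display only the schemes with the highest weighted averages.
    return scored[:top_n] if top_n else scored


# Made-up probabilities and ratings, for illustration only.
probabilities = {"Zone Blocking": 0.5, "Deep Dropback": 0.3, "Screen": 0.2}
ratings = {
    "Scheme A": {"Zone Blocking": 8, "Deep Dropback": 4, "Screen": 5},
    "Scheme B": {"Zone Blocking": 3, "Deep Dropback": 9, "Screen": 6},
}
recommended = recommend_schemes(ratings, probabilities)
```

If lower ratings or rankings were used to denote more effective schemes, the sort order would simply be reversed.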
(53) In still another embodiment, the a priori rankings or ratings can be adjusted during the game, perhaps in response to actual outcomes. The adjusted rankings or ratings will thereafter be used to generate recommended schemes.
(54) The methods disclosed herein may be used to recommend schemes during a game, to assist the user with play calling, when the rules permit. As described above, step 60 of
(55) In order to minimize the delay between the end of the preceding play and the display of the recommended schemes for the next play, it would be advantageous to streamline the entry of new game condition data. In one embodiment, a full set of game condition data will be entered only at the beginning of a drive (which, by definition, will result from a score or other change of possession) or after some other event accompanied by an extended stoppage of play, such as a penalty or a time out. Absent such an extended stoppage of play, minimal entry of game condition data will be required.
(56)
(57) In step 515, the user is asked whether most of the game conditions can be automatically updated before the next play. The user will respond negatively if the next play is the first play of a drive or in the event of an extended stoppage in play, such as for a penalty, a time out, or the end of a quarter. If the user declines automatic updating in step 515, then, in step 520, the program will prompt the user with the current value of each game condition, allowing the user to enter new values for those conditions that need to be updated. In step 525, the user will be prompted to input relevant changes in the opponent's personnel package, if any. At that point, all game conditions necessary to solve the regression models will have been updated.
(58) Subsequent to the first play of a drive, and where there has been no extended stoppage of play, the user will select automatic updating when queried in step 515. If the user selects automatic updating, then, in step 530, the program will ask the user whether the previous play resulted in an incomplete pass. If not, in step 535, the program will prompt the user to update the field position resulting from the previous play, if any yardage was gained or lost. From that information, in step 540, the program will automatically calculate the new down and distance (although, if the ball is placed within a yard of the line to gain, the program will prompt the user to confirm whether a first down was made). Because the clock will be running, the program will also estimate the time remaining when the next play is initiated. (To improve the accuracy of such estimates, the user may be prompted as to whether the opponent is in “hurry up” mode.) The program then runs step 525, prompting the user to input any relevant changes in the opponent's personnel package.
(59) If, in step 530, the user indicates that the previous play resulted in an incomplete pass, the program will, in step 545, automatically update the down (the distance to go and field position will be unchanged) and estimate the time remaining (which, since the clock will have stopped as soon as the play ended, can be estimated with reasonable accuracy). The program then runs step 525, prompting the user to input any relevant changes in the opponent's personnel package.
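The automatic update of down, distance, and field position in steps 535 through 545 can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, field position is expressed as yards from the offense's own goal line, a new first down carries 10 yards to go or the distance remaining to the goal line if shorter, and the clock estimate and the user confirmation for a ball spotted within a yard of the line to gain are omitted. Scores, changes of possession, and penalties are outside its scope because they are handled through the full entry of step 520.

```python
def update_after_play(down, yards_to_go, yards_from_own_goal,
                      incomplete, yards_gained=0):
    """Return (down, yards_to_go, yards_from_own_goal) for the next play.

    yards_gained is negative for a loss and is ignored after an incomplete pass.
    """
    if incomplete:
        # Step 545: only the down changes after an incomplete pass.
        return down + 1, yards_to_go, yards_from_own_goal
    new_spot = yards_from_own_goal + yards_gained
    if yards_gained >= yards_to_go:
        # Line to gain reached: first down, with goal-to-go handled by min().
        return 1, min(10, 100 - new_spot), new_spot
    # Step 540: next down, with the remaining distance adjusted by the gain or loss.
    return down + 1, yards_to_go - yards_gained, new_spot
```

For example, under these assumptions a 4-yard gain on 2nd and 10 from the offense's own 25 would produce 3rd and 6 at the 29.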
(60) It will be apparent that, regardless of how the game condition data are updated, the update concludes with step 525. As soon as that input is received, in step 550, the program displays a complete set of the updated game conditions. In step 555, the program calculates any necessary explanatory variables, solves the regression models, and, in some embodiments, formulates recommendations to the user with regard to likely effective schemes. In step 560, the predicted play categories, corresponding probabilities, and/or recommended schemes are sorted and displayed.
(61) After the predicted play categories, corresponding probabilities, and/or recommended schemes are displayed, in step 565, the program evaluates whether the time remaining in the game has reached zero. If so (and the score is not tied), the program terminates. Otherwise, the program returns to step 515, to begin the process of updating game condition data for the next play.
(62) Even if the use of computer technology in the manner described herein is prohibited on the sideline or in the coaches' box during games, the methods described herein can be used to enhance game preparation and improve play calling. Those methods can be used, for example, to identify, confirm, and quantify the significance of keys that should be included in the scouting report on an opponent.
(63) In one embodiment, the explanatory variables used in the regression models include data on the opponent's formation at the snap. Referring again to
(64) The categories of plays may be defined to indicate the lateral direction, left or right, to which certain types of plays were run. Similarly, the explanatory variables may reflect data with a directional component (e.g., strong side, position relative to hash marks, the number of split receivers on each side of the formation). The resulting regression models may provide insight into the tendencies of an opponent to run particular types of plays to the left or right, to the strong or weak side, to the wide side or short side of the field, etc., under particular game conditions; such information might enhance the value of scouting reports.
(65) The methods described herein could be used to assist in play calling even without using computer technology on the sideline or in the coaches' box by, for example, facilitating the development of more sophisticated play sheets. Such play sheets could take into account not only down, distance and field position, but also the score, time remaining, and personnel packages. Potentially, a number of different play sheets could be prepared for each game; the user would select a particular sheet to use during a given drive based on the game conditions at the outset of that drive.
(66) The methods described herein could also be used to improve the efficiency of practices. Those methods could be used, for example, to focus the user on the categories of plays that the opponent is most likely to run, ensuring the most productive allocation of practice time. Those methods could enable the scout teams to simulate the opponent's behavior more accurately.
(67) Finally, an important aspect of game preparation is self-scouting. The user could utilize the methods described herein to analyze its own play calling to ensure that it does not exhibit any predictable tendencies that could be exploited by opponents.