SYSTEM AND METHOD FOR FRACTIONAL ATTRIBUTION UTILIZING AGGREGATED ADVERTISING INFORMATION
20180308123 · 2018-10-25
Abstract
Embodiments disclosed provide new approaches for determining fractional attribution using aggregate advertising information. A channel weighting approach may derive the causal influence weight of any channel on conversions. In some embodiments, the approach may include arranging the conversion rate of each channel into different funnel stages, constructing aggregate-level data, and running a multi-stage regression computation using instrumental variables. This approach works with any number of different types of advertising channels, including online and offline channels, and provides the most accurate credit to each channel or sub-channel involved.
Claims
1. A fractional attribution method, comprising: arranging, by a computer, a plurality of channels into a plurality of funnel stages based on a conversion rate associated with each channel; constructing aggregate-level data; and computing a multi-stage regression on the plurality of funnel stages using the aggregate-level data to thereby determine channel weights for the plurality of channels.
2. The method according to claim 1, wherein the arranging further comprises: overriding an arrangement of the plurality of funnel stages based on domain knowledge.
3. The method according to claim 1, wherein the arranging further comprises: splitting at least one of the plurality of channels into sub-channels.
4. The method according to claim 1, wherein the constructing further comprises: aggregating user-level data within a channel.
5. The method according to claim 1, wherein the computing further comprises: performing a causal analysis on channels at a first stage of the plurality of funnel stages using channels at other stages of the plurality of funnel stages as instrumental variables.
6. The method according to claim 1, wherein there are m levels in the plurality of funnel stages and wherein the computing further comprises: a) determining causal weights for the $m^{\text{th}}$ level channels using a two-stage least squares algorithm, using all channels above the $m^{\text{th}}$ level as instrumental variables; b) after the causal weights for the $m^{\text{th}}$ level channels are determined, determining causal weights for $(m-1)^{\text{th}}$ level channels, with residual channels as dependent variables and channels above the $(m-1)^{\text{th}}$ levels as instrumental variables; and c) repeating a) and b) until all causal weights are determined for the plurality of channels.
7. The method according to claim 1, wherein the computing further comprises: adding non-negative constraints such that the channel weights cannot be negative.
8. A computer program product comprising at least one non-transitory computer readable medium storing instructions translatable by at least one processor to perform: arranging a plurality of channels into a plurality of funnel stages based on a conversion rate associated with each channel; constructing aggregate-level data; and computing a multi-stage regression on the plurality of funnel stages using the aggregate-level data to thereby determine channel weights for the plurality of channels.
9. The computer program product of claim 8, wherein the arranging further comprises: overriding an arrangement of the plurality of funnel stages based on domain knowledge.
10. The computer program product of claim 8, wherein the arranging further comprises: splitting at least one of the plurality of channels into sub-channels.
11. The computer program product of claim 8, wherein the constructing further comprises: aggregating user-level data within a channel.
12. The computer program product of claim 8, wherein the computing further comprises: performing a causal analysis on channels at a first stage of the plurality of funnel stages using channels at other stages of the plurality of funnel stages as instrumental variables.
13. The computer program product of claim 8, wherein there are m levels in the plurality of funnel stages and wherein the computing further comprises: a) determining causal weights for the $m^{\text{th}}$ level channels using a two-stage least squares algorithm, using all channels above the $m^{\text{th}}$ level as instrumental variables; b) after the causal weights for the $m^{\text{th}}$ level channels are determined, determining causal weights for $(m-1)^{\text{th}}$ level channels, with residual channels as dependent variables and channels above the $(m-1)^{\text{th}}$ levels as instrumental variables; and c) repeating a) and b) until all causal weights are determined for the plurality of channels.
14. The computer program product of claim 8, wherein the computing further comprises: adding non-negative constraints such that the channel weights cannot be negative.
15. A system, comprising: at least one processor; and at least one non-transitory computer readable medium storing instructions translatable by the at least one processor to perform: arranging a plurality of channels into a plurality of funnel stages based on a conversion rate associated with each channel; constructing aggregate-level data; and computing a multi-stage regression on the plurality of funnel stages using the aggregate-level data to thereby determine channel weights for the plurality of channels.
16. The system of claim 15, wherein the arranging further comprises: overriding an arrangement of the plurality of funnel stages based on domain knowledge.
17. The system of claim 15, wherein the arranging further comprises: splitting at least one of the plurality of channels into sub-channels.
18. The system of claim 15, wherein the constructing further comprises: aggregating user-level data within a channel.
19. The system of claim 15, wherein the computing further comprises: performing a causal analysis on channels at a first stage of the plurality of funnel stages using channels at other stages of the plurality of funnel stages as instrumental variables.
20. The system of claim 15, wherein there are m levels in the plurality of funnel stages and wherein the computing further comprises: a) determining causal weights for the $m^{\text{th}}$ level channels using a two-stage least squares algorithm, using all channels above the $m^{\text{th}}$ level as instrumental variables; b) after the causal weights for the $m^{\text{th}}$ level channels are determined, determining causal weights for $(m-1)^{\text{th}}$ level channels, with residual channels as dependent variables and channels above the $(m-1)^{\text{th}}$ levels as instrumental variables; and c) repeating a) and b) until all causal weights are determined for the plurality of channels.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The drawings accompanying and forming part of this specification are included to depict certain aspects of the disclosure. It should be noted that the features illustrated in the drawings are not necessarily drawn to scale. A more complete understanding of the disclosure and the advantages thereof may be acquired by referring to the following description, taken in conjunction with the accompanying drawings in which like reference numbers indicate like features and wherein:
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
DETAILED DESCRIPTION
[0030] The disclosure and various features and advantageous details thereof are explained more fully with reference to the exemplary, and therefore non-limiting, embodiments illustrated in the accompanying drawings and detailed in the following description. Descriptions of known programming techniques, computer software, hardware, operating platforms and protocols may be omitted so as not to unnecessarily obscure the disclosure in detail. It should be understood, however, that the detailed description and the specific examples, while indicating the preferred embodiments, are given by way of illustration only and not by way of limitation. Thus, any examples or illustrations given herein are not to be regarded in any way as restrictions on, limits to, or express definitions of, any term or terms with which they are utilized. Instead these examples or illustrations are to be regarded as being described with respect to one particular embodiment and as illustrative only. Those of ordinary skill in the art will appreciate that any term or terms with which these examples or illustrations are utilized encompass other embodiments as well as implementations and adaptations thereof which may or may not be given therewith or elsewhere in the specification and all such embodiments are intended to be included within the scope of that term or terms. Language designating such non-limiting examples and illustrations includes, but is not limited to: for example, for instance, e.g., in one embodiment, and the like. Various substitutions, modifications, additions and/or rearrangements within the spirit and/or scope of the underlying inventive concept will become apparent to those skilled in the art from this disclosure.
[0031]
[0032] In the example of
[0033] An attribution platform 120 in accordance with embodiments of the invention allows the advertiser 116 to make informed decisions about payment for advertisements and future ad campaigns.
[0034] Data from the click 101 and ultimate conversion 118 may be collected in a variety of ways. In some embodiments, one or more computers in the network 122 may collect click data. In some embodiments, a click data collecting computer may be a server machine residing in a publisher 114's or other party's computing environment or network. In some embodiments, the click data collecting computer may collect click streams associated with visitors to one or more Web sites. In some embodiments, the collected information may be stored in one or more log files. In some embodiments, the information associated with the plurality of clicks may comprise visitor Internet Protocol (IP) address information, date and time information, publisher information, referrer information, user-agent information, searched keywords, cookies, and so on. For additional examples on collecting information provided from a visitor's Web browser application, readers are directed to U.S. patent application Ser. No. 11/796,031, filed Apr. 26, 2007, entitled METHOD FOR COLLECTING ONLINE VISIT ACTIVITY, which is fully incorporated herein by reference.
[0035] In some embodiments, the attribution platform 120 employs ad tags for monitoring impression data and page tags for monitoring click data. Ad tags can be 1×1 pixels embedded in page code at the publisher site and can be used to determine where the ad is on a page (above or below a fold, i.e., visible with or without scrolling) and whether and how long a user sees it. Page tags can be embedded in a similar manner on the landing page, and can identify whether a user has arrived and where the user comes from. Example tags are included in the attached Appendices A and B. As will be described in greater detail below, ad tags or page tags can be transmitted to the attribution platform 120 responsive to a user viewing or clicking on an ad and viewing or clicking on an associated web page.
[0036] In addition, in some embodiments, aggregate data may be provided or collected. For example, such aggregate data can include data from offline sources, such as television or radio ratings over predetermined periods, magazine and newspaper circulation on a per issue basis, and the like. In addition, in certain embodiments, user-level data from online sources may be aggregated out to correspond to similar data from offline sources. Examples of aggregate-level data include daily total impressions and click and conversion volumes by channel. External time series data such as consumer price index may also be leveraged, depending on the particular embodiment.
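As an illustration of aggregating user-level online data into the daily, per-channel totals mentioned above, the following is a minimal Python sketch. The record layout, the channel names, and the function name `aggregate_daily` are hypothetical and chosen only for this example; they are not taken from the disclosure.

```python
from collections import Counter
from datetime import datetime

# Hypothetical user-level records: (server timestamp, channel, event type).
events = [
    ("20130719180002", "display", "impression"),
    ("20130719181500", "display", "click"),
    ("20130719190000", "email", "impression"),
    ("20130720090000", "display", "impression"),
    ("20130720093000", "email", "conversion"),
]

def aggregate_daily(records):
    """Count events per (day, channel, event type)."""
    totals = Counter()
    for ts, channel, kind in records:
        # Timestamps are assumed to use the YYYYMMddHHmmss layout.
        day = datetime.strptime(ts, "%Y%m%d%H%M%S").date().isoformat()
        totals[(day, channel, kind)] += 1
    return totals

daily = aggregate_daily(events)
for key in sorted(daily):
    print(key, daily[key])
```

Rows keyed this way can be placed alongside offline series that are only available as per-period totals, which is the reason for aggregating the online data in the first place.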
[0037]
[0038] An ad server 212 may be used to maintain the ad on the publisher's web site 204. The user 202 may click an ad to arrive at a landing page 208. Embedded on the landing page 208 is a page tag 207, which identifies user accesses to the landing page 208 and may be sent to a database 214 accessible by the attribution platform 220. An advertiser 206 records a conversion 218, if any, and likewise provides the information to the attribution platform 220.
[0039] Attribution platform 220 may reside in a computing environment comprising one or more server machines. Each server machine may include a central processing unit (CPU), read-only memory (ROM), random access memory (RAM), hard drive (HD) or non-volatile memory, and input/output (I/O) device(s). An I/O device may be a keyboard, monitor, printer, electronic pointing device (e.g., mouse, trackball, etc.), or the like. The hardware configuration of this server machine can be representative of other devices and computers alike at a server site (represented by platform 220) as well as at a client site.
[0040] Embodiments of platform 220 disclosed herein may include a system and a computer program product implementing a method for fractional attribution in a network environment. In some embodiments, platform 220 may be owned and operated independent of the clients that it services. For example, company A operating platform 220 may provide attribution services to company B operating a client (not shown). In one embodiment, Companies A and B may communicate over a network. In one embodiment, Companies A and B may communicate over a secure channel in a public network such as the Internet. Example clients may include advertisers, publishers, and ad networks.
[0041] In some embodiments, the system may run on a Web server. In some embodiments, the computer program product may comprise one or more non-transitory computer readable storage media storing computer instructions translatable by multiple processors to process attribution data. The input data may be from a log file, a memory, a streaming source, or ad and page tags. Within this disclosure, the term attribution data refers to any and all data associated with online advertising events such as clicking on an ad, viewing an ad (an impression), entering a search query, conversion, and so on, and may include click history data, click intelligence data, post-click data, visitor profile data, impression data, etc.
[0042] In some embodiments, software running on a server computer in platform 220 may receive a client file containing attribution data from an attribution data collecting computer associated with a client. For example, a client may represent an online retailer and may collect click stream data from visitors to a Web site owned and/or operated by the online retailer. The attribution data thus collected can provide a detailed look at how each visitor got to the Web site, what pages were viewed by the visitor, what products and/or services the visitor clicked on, the date and time of each visit and click, and so on.
[0043] The specific attribution data that can be collected from each click stream may include a variety of entities such as the Internet Protocol (IP) address associated with a visitor (which can be a human or a bot), timestamps indicating the date and time at which each request is made or click is generated, target URL or page and network address of a server associated therewith, user-agent (which shows what browser the visitor was using), query strings (which may include keywords searched by the visitor), and cookie data. For example, if the visitor found the Web site through a search engine, the corresponding click stream may contain the referrer page of the search engine and the search words entered by the visitor. Attribution data can be created using a corporate information infrastructure that supports a Web-based enterprise computing environment. A skilled artisan can appreciate what typical attribution click streams may contain and how they are generated and stored.
[0044] Thus, in some embodiments, optimization data may include an impression/click record for every ad impression/click received from a given client of the system. An example impression/click record may include Impression/click timestamp; visitor cookie (if available, may be set up as a domain cookie for persistent visitor identification); visitor IP address; visitor browser user-agent; impression/click source (may be a publisher ID or a referrer domain); click destination (landing page Web address or bid keywords for advertisers); and conversion data (whether the visitor executed a desired conversion).
[0045] The optimization data returned from log files or tags may comprise one or more rows of data arranged in a plurality of fields. For example, in some embodiments, each row of event data includes twenty-three fields, defined as follows:
[0046] 1. Server Timestamp, in YYYYMMddHHmmss format (UTC)
[0047] 2. Request ID, generated by the server as a unique identifier for the logging call
[0048] 3. Cookie ID. Omitted if the browser does not accept cookies.
[0049] 4. Source IP
[0050] 5. Interaction/Event Type
[0051]     <empty>=Old logs/tags did not specify an interaction type; this should be processed as an impression for those, but all recent data should process this as an error
[0052]     ?=Error condition: an invalid or unknown interaction type was specified. May indicate that an old parser is processing newer log files if there is a high frequency.
[0053]     0=Impression
[0054]     1=Click
[0055]     . . .
[0056] 6. Session ID: a number generated on page load by the browser and sent on all requests from that page (Impression, On Load, Post, etc.), used to correlate those events together. Populated in JavaScript tags only, 0 for pixel tags.
[0057] 7. Campaign ID
[0058] 8. Placement ID: may be an ID generated by us, if hard-coded in the tag, or the ad server placement ID, if populated by macro on the ad server.
[0059] 9. Publisher ID: often not used (0)
[0060] 10. Creative ID: rarely used (0), but may be used to indicate the creative.
[0061] 11. Agency ID: often not used (0).
[0062] 12. Visibility: 1 if the tag is in an iFrame and visibility information cannot be collected. This prevents collection of ad seen and ad time data, as well as possibly indicating a bogus (ad server) referrer. Populated in JavaScript tags only.
[0063] 13. Location on Page. Populated in JavaScript tags only.
[0064]     0=Banner (top 20%)
[0065]     1=Left Column (left 30%)
[0066]     2=Center Column (middle 40%)
[0067]     3=Right Column (right 30%)
[0068]     4=Below the fold
[0069]     5=Everything else (off right)
[0070] 14. Ad Seen. 1 if the ad is not in an iFrame and was viewed at some point, captured by a JavaScript tag. Empty otherwise.
[0071] 15. Screen resolution, Width×Height×Bit Depth; JavaScript tag only.
[0072] 16. Time on Ad: the amount of time (seconds) the ad was scrolled into view in the client; only available from the JavaScript tag when not in an iFrame
[0073] 17. Time on Page: the amount of time (seconds) the page was viewed; only available from the JavaScript tag
[0074] 18. Source URL (URL encoded): Best effort at finding the page URL. The JavaScript attempts to climb out of iFrames when possible to determine this, though sometimes the referrer must be used. The server component will attempt to extract the actual source URL from some known ad server referrers, if possible.
[0075] 19. User Agent (URL encoded)
[0076] 20. Demographic data. Pipe (|) delimited segments from the relevant demographic provider, as indicated by the interaction type.
[0077] 21. Referrer URL (URL encoded): The referrer of the page containing the tag, if available (i.e., not in a non-friendly iFrame and for the channel.js page tag); otherwise, the actual http referrer of the pixel/tag.
[0078] 22. Revenue: if available (e.g., via Brighttag)
[0079] 23. Custom (URL encoded): any custom/unknown parameters specified on the http request, not otherwise handled. These take the form of key1=value1;key2=value2;key3 . . . . An example of usage is to pass the custom field checkout_rank=N through in this manner.
[0080] An exemplary event row is shown in Table 1 below:
TABLE 1
column  value
1       20130719180002
2       Q2I2MzMxOGRIMjAxMzA3MTkxNDAwMDI3Mg==
3       C70d37a002013041209093840
4       12.43.117.146
5       4
6       903947
7       2118
8       63507
9       0
10      0
11      0
12      1
13
14
15
16
17
18      https%3A%2F%2Fwww.ideeli.com%2Flogin%3Futm_campaign%3DDaily%26utm_medium%3Demail%26utm_source%3Dideeli%26csync%3D1
        Mozilla%2F5.0+%28Windows+NT+5.1%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F28.0.1500.72+Safari%2F537.36
19
20      https%3A%2F%2Fwww.ideeli.com%2Flogin%3Futm_campaign%3DDaily%26utm_medium%3Demail%26utm_source%3Dideeli
21
22
23
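To show how the encoded values in Table 1 might be read, the following Python sketch decodes the server timestamp and two of the URL-encoded strings using only the standard library. It assumes the strings wrapped across lines in the table are single values and rejoins them; the variable names are illustrative only.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, unquote_plus, urlsplit

# Values copied from Table 1 (wrapped strings rejoined, which is an assumption).
server_timestamp = "20130719180002"
source_url = ("https%3A%2F%2Fwww.ideeli.com%2Flogin%3Futm_campaign%3DDaily"
              "%26utm_medium%3Demail%26utm_source%3Dideeli%26csync%3D1")
user_agent = ("Mozilla%2F5.0+%28Windows+NT+5.1%29+AppleWebKit%2F537.36+"
              "%28KHTML%2C+like+Gecko%29+Chrome%2F28.0.1500.72+Safari%2F537.36")

# Field 1 is documented as YYYYMMddHHmmss in UTC.
when = datetime.strptime(server_timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

# URL-encoded fields: percent-decoding, with "+" treated as a space.
url = unquote_plus(source_url)
agent = unquote_plus(user_agent)

# Campaign parameters carried in the query string of the decoded URL.
query = parse_qs(urlsplit(url).query)

print(when.isoformat())      # 2013-07-19T18:00:02+00:00
print(url)
print(agent)
print(query["utm_medium"])   # ['email']
```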
[0081] For the sake of simplicity, hardware components (e.g., CPU, ROM, RAM, HD, I/O, etc.) are not illustrated in
[0082] It may be helpful to first describe a method for using event level or user level data for fractional attribution.
[0083] Without loss of generality, assume that a user has had three events (i.e., three interactions with a marketer's various campaigns; the definition of interactions is discussed below) prior to her conversion. The fractional attribution problem is to determine what fraction of the conversion credit goes to each of the three events. A more mathematical description can be as follows:
[0084] If a user had events $E_1$, $E_2$, and $E_3$ and then converted, what fractional credit $w_1$ goes to $E_1$, $w_2$ goes to $E_2$, and $w_3$ goes to $E_3$, subject to $\sum_{j=1}^{3} w_j = 1$?
[0085] In this example, it is assumed that the conversion event is 100% driven by the combination of the three events $\{E_1, E_2, E_3\}$. In reality this might not be true. However, it appears likely that whatever factors are not observed introduce the same bias to all the campaigns in the data. The fractional attribution results are still useful in reflecting the relative importance of different channels/campaigns or of any other entities in which one might be interested.
[0086] In some embodiments, a good attribution model may possess three desirable properties: Monotonicity (Property 1); Correlation with Conversion (Property 2); and Accounting for Event Interactions (Property 3).
[0087] The first desired property is Monotonicity, which means that if two events (e.g., $E_1$ and $E_2$) were combined into one composite event $E_{12}$, then the fractional credit $w_{12}$ for $E_{12}$ should most likely be no less than $w_1$ or $w_2$. That is, $w_{12} \ge w_1$ and $w_{12} \ge w_2$. The intuition is that two events a converted user has with a marketer's campaigns should together deserve no less credit than each of those two events individually.
[0088] The second property, Correlation with Conversion, holds that the weight for each event should be roughly correlated with the event's ability to drive conversions based on historical data. If $E_1$ historically has driven conversions better than $E_2$ and $E_3$ together, then $E_1$ deserves more credit than either $E_2$ or $E_3$.
[0089] The third property is that the model should take into account, as much as possible, the interactions among different events. For example, if individually each of the three events has driven conversions equally well, but when $E_2$ and $E_3$ are together they have driven conversions much better, a higher credit weight should be given to either $E_2$ or $E_3$ than to $E_1$.
[0090] Let conversion be represented by $C$. In mathematical terms, this means:
If $P(C \mid E_1) \approx P(C \mid E_2) \approx P(C \mid E_3)$ but $P(C \mid E_2, E_3) \gg P(C \mid E_1)$, then $w_2 \gg w_1$ and $w_3 \gg w_1$.
[0091] Embodiments make use of data-driven probabilistic models. That is, all the conditional probability estimates discussed herein are based on historical data.
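[0092] In particular, each conditional probability $P(A \mid B)$ can be derived from historical data by dividing the number of users who (at least) had events $A$ and $B$ by the number of users who (at least) had event $B$. That is,
$$P(A \mid B) = \frac{N(A, B)}{N(B)},$$
where $N(\cdot)$ denotes the number of users who had at least the listed events.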
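The counting estimate just described can be sketched in Python as follows, taking the conversion as the event of interest. The toy user history and the name `cond_prob_conversion` are hypothetical; a user counts toward the denominator if she had at least the conditioning events, regardless of any others.

```python
# Hypothetical history: each user is (set of events seen, converted?).
users = [
    ({"E1", "E2"}, True),
    ({"E1"}, False),
    ({"E1", "E2", "E3"}, True),
    ({"E2", "E3"}, False),
    ({"E1", "E3"}, False),
]

def cond_prob_conversion(history, given):
    """Estimate P(C | given) as converters / users who had at least `given`."""
    given = set(given)
    outcomes = [converted for seen, converted in history if given <= seen]
    if not outcomes:
        return None  # no user had this combination, so no estimate
    return sum(outcomes) / len(outcomes)

print(cond_prob_conversion(users, {"E1"}))        # 0.5
print(cond_prob_conversion(users, {"E1", "E2"}))  # 1.0
print(cond_prob_conversion(users, {"E4"}))        # None
```

Returning `None` for unseen combinations is a design choice made here to keep sparse cases visible rather than silently reporting zero.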
[0093] Embodiments may make use of any of a variety of models, although some may be more or less desirable, depending on the nature of the data.
[0094] A first model (Model 1) may be the Naive Bayes model:
[0095] Consider the naive Bayes model for $P(C \mid E_1, E_2, E_3)$:
$$P(C \mid E_1, E_2, E_3) \propto P(C \mid E_1) \cdot P(C \mid E_2) \cdot P(C \mid E_3). \qquad (1)$$
[0096] One natural idea would be to use
$$w_j = P(C \mid E_j), \quad j = 1, 2, 3. \qquad (2)$$
[0097] This naive choice does possess Properties 1 and 2 discussed above. However, this model assumes that the three events $\{E_1, E_2, E_3\}$ are independent given the conversion event $C$. It does not return the right answer when there are strong event correlations; that is, it does not possess Property 3. For example, in the example used for explaining Property 3, this model would NOT give a higher weight to either $E_2$ or $E_3$ than to $E_1$, although that is the desired result.
[0098] A second model (Model 2) may be the Conversion Index model:
[0099] If $w_1$ is set to be the conversion index of $E_1$
where $\bar{E}_1$ means no event $E_1$. This model turns out to be very similar to the naive Bayes model because $w_1$ in (3) is strongly positively (although nonlinearly) correlated with $P(C \mid E_1)$. As in the naive Bayes model, correlations among the three events are not taken into account.
[0100] A third model (Model 3) may be the Conditional Importance model:
[0101] Consider capturing the importance of $E_1$ by the conditional probability
$$w_1 = P(E_1 \mid E_2, E_3, C), \qquad (4)$$
which indicates how likely $E_1$ is to be observed, given that $\{E_2, E_3, C\}$ are observed.
[0102] However, with (4), $w_1$ may change in the wrong direction when the specificity of $E_1$ is increased. For example, if (4) were used to compute the importance of a composite event $E_{12} = \{E_1, E_2\}$, the result would be
$$w_{12} = P(E_1, E_2 \mid E_3, C),$$
which will most likely be smaller than $w_1$, even though according to Property 1 one would normally expect the opposite ($w_{12} > w_1$), i.e., the composite event $E_{12}$ should most likely get more conversion credit, not less.
[0103] A fourth model (Model 4) may be the Marginal Importance model:
[0104] Consider an improvement of Model 3 as follows:
$$w_1 = \frac{P(E_1 \mid E_2, E_3, C)}{P(E_1 \mid E_2, E_3)} = \frac{P(C \mid E_1, E_2, E_3)}{P(C \mid E_2, E_3)}. \qquad (5)$$
[0105] This normalizes the probability of seeing $E_1$ given $\{E_2, E_3, C\}$ in (4) by the probability of seeing $E_1$ given $\{E_2, E_3\}$. The idea is that, if $E_1$ is equally likely with or without $C$ (given $\{E_2, E_3\}$), then it is probably not that important. It also means that if $E_2$ and $E_3$ together drive conversions as well as all three events together, i.e., $P(C \mid E_2, E_3)$ is close to $P(C \mid E_1, E_2, E_3)$, then $E_1$ is probably not that important and the weight for $E_1$ should be small.
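For reference, the normalization described in this paragraph can be rewritten in terms of conversion probabilities. The short derivation below assumes the Model 4 weight is the ratio of the probability of seeing $E_1$ given $\{E_2, E_3, C\}$ to the probability of seeing $E_1$ given $\{E_2, E_3\}$, and applies only the definition of conditional probability.

```latex
\begin{align*}
w_1 &= \frac{P(E_1 \mid E_2, E_3, C)}{P(E_1 \mid E_2, E_3)}
     = \frac{P(E_1, E_2, E_3, C) \,/\, P(E_2, E_3, C)}
            {P(E_1, E_2, E_3) \,/\, P(E_2, E_3)} \\[4pt]
    &= \frac{P(E_1, E_2, E_3, C) \,/\, P(E_1, E_2, E_3)}
            {P(E_2, E_3, C) \,/\, P(E_2, E_3)}
     = \frac{P(C \mid E_1, E_2, E_3)}{P(C \mid E_2, E_3)}.
\end{align*}
```

In this form the weight compares how well all three events drive conversions against how well $E_2$ and $E_3$ do without $E_1$, so it can be estimated from the same converted and non-converted user counts as the other conditional probabilities.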
[0106] This new importance measure does not have the issue of Model 3, as the composite event $E_{12} = \{E_1, E_2\}$ would have an importance weight most likely higher than $w_1$ or $w_2$ alone. It can be imagined that
$$w_{12} = \frac{P(C \mid E_1, E_2, E_3)}{P(C \mid E_3)}$$
is most likely higher than $w_1$, as it is most likely that $P(C \mid E_3) < P(C \mid E_2, E_3)$. Again the intuition here is that normally, for a given user, the more he is advertised to, the more likely he is to convert.
[0107] This model also addresses the issue of not considering event interactions (as mentioned for Models 1 and 2). Suppose $E_1$ and $E_2$ together are effective and drive a high $P(C \mid E_1, E_2)$, but this is not the case for $P(C \mid E_1, E_3)$ and $P(C \mid E_2, E_3)$. It can then be seen from (5) that $E_1$ and $E_2$ will each get more credit than $E_3$.
[0108] A variant of Model 4 can be
[0109] This weight becomes zero when $P(C \mid E_2, E_3) = 1$.
[0110] Overall, the Marginal Importance model in (5) may provide better results than the other models discussed and possesses the three desired properties proposed above.
[0111] To generalize to the situation in which there are more than three events, say a converted user had $K$ events $\{E_1, E_2, \ldots, E_K\}$, the credit weight for $E_j$ ($j = 1, \ldots, K$) would be
$$w_j = \frac{P(C \mid E_1, E_2, \ldots, E_K)}{P(C \mid \{E_1, E_2, \ldots, E_K\} \setminus E_j)},$$
where $\{E_1, E_2, \ldots, E_K\} \setminus E_j$ means the subset of $\{E_1, E_2, \ldots, E_K\}$ without $E_j$.
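A minimal Python sketch of the generalized weight follows. It assumes the weight for each event takes the same ratio form as the three-event Marginal Importance weight: the conversion probability given all $K$ events divided by the conversion probability given all events except that one. The final rescaling so the weights sum to one is an assumption made here to satisfy the sum constraint; the toy history and function names are hypothetical.

```python
def cond_prob(history, given):
    """P(C | given), estimated from users who had at least the `given` events."""
    given = frozenset(given)
    outcomes = [conv for seen, conv in history if given <= seen]
    return sum(outcomes) / len(outcomes) if outcomes else None

def marginal_weights(history, events):
    """Leave-one-out ratio weights for each event, rescaled to sum to 1."""
    events = frozenset(events)
    p_full = cond_prob(history, events)
    if p_full is None:
        raise ValueError("no user had the full event set")
    raw = {}
    for e in events:
        p_without = cond_prob(history, events - {e})
        if not p_without:  # None or 0.0: ratio undefined
            raise ValueError(f"cannot estimate weight for {e}")
        raw[e] = p_full / p_without
    total = sum(raw.values())
    # Rescaling is an assumption of this sketch, not stated in the text.
    return {e: w / total for e, w in raw.items()}

# Hypothetical history in which E2 and E3 together convert well.
history = (
    [({"E1", "E2", "E3"}, True)] * 2 + [({"E1", "E2", "E3"}, False)]
    + [({"E2", "E3"}, True)] * 2 + [({"E2", "E3"}, False)]
    + [({"E1", "E2"}, False)] * 2
    + [({"E1", "E3"}, False), ({"E1", "E3"}, True)]
)

weights = marginal_weights(history, {"E1", "E2", "E3"})
for name in sorted(weights):
    print(name, round(weights[name], 4))
```

In this toy data, removing $E_1$ leaves the conversion probability unchanged, so $E_1$ receives the smallest share, while $E_3$, whose removal hurts most, receives the largest.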
[0112] The definition of events may vary from implementation to implementation. For example, $E_1$ could represent a user seeing one or more impressions from a specific campaign; or a user seeing one or more impressions from a specific campaign more than two weeks ago; or a user seeing exactly two impressions from a specific campaign in the last day; or a user seeing one or more impressions on a specific site in the last day; etc.
[0113] As can be appreciated, the list of possible definitions can quickly become intractable. The question is which definitions make more sense than others for a particular implementation and how to combine attribution results if one were to run attribution analysis with different event definitions.
[0114] It may be desirable to define an event as specifically as possible; e.g., a user seeing exactly n impressions from campaign x with creative y on site z exactly m days ago. However, defining events at that deep level of granularity may encounter data sparsity: often there is not enough data to robustly derive the conditional probabilities described in the previous section. This may sound counterintuitive, as the system easily collects billions of impressions and hundreds of millions of users every month from a large advertiser. However, not many users would share the same event of seeing exactly n impressions from campaign x with creative y on site z exactly m days ago. When the number of users is small, there would be low confidence in the conditional probabilities estimated.
[0115] To increase confidence levels, one can define events at a less granular level such as the campaign level. There are likely a lot of (both converted and non-converting) users sharing the event of seeing at least one impression from campaign x, making the estimates at campaign level more robust. However, if there are only estimates at the campaign level, it does not help to attribute conversion credits across different sites, different frequency or recency values for the same campaign.
[0116] In some embodiments, an attribution analysis may be run at many different granularity levels and then combined based on confidence values of different estimates. One technique for this task is hierarchical Bayesian shrinkage. The goal is to get as robust as possible an estimate at the most granular level. One way to address data sparsity at the granular level is to borrow information (or estimates) from lower granularity levels.
[0117] In some embodiments, different levels can be arranged into a hierarchy 300 like the one shown in
[0118] Likewise, parent node site 304 is parent to site+frequency 312 and site+frequency node 314 which, in turn, are parents to site+frequency+recency node 316. Nodes 310 and 316 are parents and less granular than node 318 (campaign+site+frequency+recency).
[0119] The attribution weight for a given event can be calculated for every node in the hierarchy and combined based on the confidence of each calculation. Confidence can be a function of the amount of data (i.e., the number of users) used to estimate the conditional probabilities. For example, a reasonable confidence function is the sigmoid function
$$g(n) = \frac{1}{1 + e^{-\beta (n - \alpha)}},$$
where $n$ is the number of users, and $\alpha$ and $\beta$ are adjustable parameters. The parameter $\alpha$ determines when confidence becomes 0.5 and $\beta$ controls how fast the confidence grows with $n$.
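A sigmoid confidence function of this kind might be sketched in Python as below. The logistic form, the parameter names `midpoint` and `rate`, and their default values are assumptions made for illustration: one parameter sets the user count at which confidence reaches 0.5, and the other sets how quickly confidence rises.

```python
import math

def confidence(n, midpoint=1000.0, rate=0.005):
    """Sigmoid confidence in an estimate backed by n users.

    midpoint: user count at which confidence equals 0.5 (hypothetical default).
    rate: how fast confidence grows with n (hypothetical default).
    """
    return 1.0 / (1.0 + math.exp(-rate * (n - midpoint)))

for n in (0, 500, 1000, 2000):
    print(n, round(confidence(n), 4))
```

With these defaults, an estimate backed by a handful of users has confidence close to zero, and one backed by a few thousand users has confidence close to one.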
[0120] One way of combining the attribution weights estimated at different granularity levels is to take a confidence-weighted average across different levels. That is,
$$\frac{\sum_l g_l w_l}{\sum_l g_l},$$
where $w_l$ is the attribution weight at level $l$ and $g_l$ is the confidence at level $l$. This effectively shrinks the (less robust) estimate at the most granular level towards (more robust) estimates at less granular levels, hence the name shrinkage. In statistical terms, it is a tradeoff between bias and variance. At more granular levels, the estimates have lower bias but higher variance; at less granular levels, the estimates have lower variance (i.e., are more robust) but higher bias. It will be appreciated that the actual equation may vary somewhat from implementation to implementation. For example, one embodiment may add a level-dependent weight that is fixed for each level to reflect prior knowledge about the importance of different levels. That is, if enough data can be had at a campaign+recency level, one might want to give more weight to that level than to a less granular (e.g., campaign) level.
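The confidence-weighted average across granularity levels can be sketched in Python as follows. The level names, weights, and confidence values are made-up illustration data, and the function name `shrink` is hypothetical.

```python
def shrink(estimates):
    """Confidence-weighted average of per-level attribution weights.

    estimates: iterable of (attribution weight w_l, confidence g_l) pairs.
    """
    estimates = list(estimates)
    total_conf = sum(g for _, g in estimates)
    if total_conf == 0:
        raise ValueError("all levels have zero confidence")
    return sum(g * w for w, g in estimates) / total_conf

# Hypothetical estimates for one event at three granularity levels.
levels = {
    "campaign": (0.30, 0.9),                 # coarse: many users, high confidence
    "campaign+site": (0.50, 0.5),
    "campaign+site+frequency": (0.80, 0.1),  # granular: few users, low confidence
}

combined = shrink(levels.values())
print(round(combined, 4))  # 0.4
```

The granular estimate of 0.80 is pulled most of the way toward the coarse estimate of 0.30 because very little data supports it.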
[0122] In a step 402, conversions and events are defined. As noted above, in some embodiments, a conversion is a desired activity, such as a user purchase of an advertiser's product or service. An event can be one or more user-defined events or sequences of events.
[0123] In a step 404, for each event definition (i.e., a particular granularity level), event sets for each user/conversion are created. This is essentially to arrange events by user and conversion. For each incidence of the conversion, this step may include listing all the event item exposures the user had prior to the conversion. Events are defined and tracked from the raw impression/click/conversion data obtained from the ad tags and page tags or log files or other data collected.
[0124] In a step 406, for each event definition, the event subsets that need counts are created. That is, for each event set of size K (that is associated with a conversion), the leave-one-out event subsets of size K-1 are generated as explained above.
[0125] In a step 408, for each event definition, and for each event subset generated, count the number of converted users and number of non-converting users and use the ratio between those two as the basis for computing attribution weights. The total user counts may also be used as the basis for computing confidence as described above.
[0126] In a step 410, for each event definition, populate the attribution weights down to the most granular event level, i.e., individual impressions or clicks. Depending on the event definition, each event may map to one or more impressions/clicks and the attribution weight computed for the event will be evenly distributed down to individual impressions/clicks. For example, if events are defined by a campaign+recency, an event (campaign x+3 days ago) gets a weight of 0.6 and it corresponds to 10 impressions on that day, then each of those 10 impressions would get a weight of 0.06.
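The even distribution in the example above can be sketched as follows. The function name and the impression identifiers are hypothetical.

```python
def distribute_event_weight(event_weight, impression_ids):
    """Split an event-level attribution weight evenly across its impressions."""
    if not impression_ids:
        raise ValueError("an event must map to at least one impression or click")
    share = event_weight / len(impression_ids)
    return {impression: share for impression in impression_ids}


# An event with weight 0.6 that corresponds to 10 impressions.
per_impression = distribute_event_weight(0.6, [f"imp-{i}" for i in range(10)])
print(per_impression["imp-0"])
```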
[0127] Finally, in a step 410, combine the attribution weights from different event definitions (i.e., different granularity levels) using, for example, the hierarchical Bayesian shrinkage method described above.
[0128] In some embodiments, step 406 (getting the user counts for each event subset) is computationally intensive. There can be hundreds of millions of users and hundreds of thousands of subsets. Each user is represented by an event set (all the events the user has had). The basic operation is, for each user and each subset, to determine if the user's event set contains the subset of interest (for which we want to get user counts).
[0129] One efficient way of doing the counting is to determine, for each user, which n events he has seen, and to define the subsets of size n-1. For example, if he has seen events E1, E2, E3, then the subsets are defined as follows:
S1 = {E1, E2}
S2 = {E1, E3}
S3 = {E2, E3}
[0130] For each event in any of the subsets, keep track of the list of the indexes of the subsets that contain the item.
[0131] Then, for each user, go through each event in the user's event set and add all the subset indexes to a hash and keep track of the counts. For example, for event E1, add the subset indexes of S1 and S2 to a hash; for event E2, add the subset indexes of S1 and S3 to a hash; and for event E3, add the subset indexes of S2 and S3 to a hash. If the hash count of a subset index equals the length of the subset, increase the user count for a subset.
[0132] These steps can be performed for both converted users and non-converting users, separately, to obtain the counts. Further, these steps can be easily parallelized in practice.
[0133] An additional simplification may be made by noticing that most of the users are non-converting users. As such, a sample of the non-converting users may be taken to reduce the computation. Experiments have shown that using a 10% sample of non-converting users seems to generate roughly the same attribution weights vs. using all users' data.
[0134] The process of shortcut counting of converting and non-converting users is shown below by way of an eight-event example:
[0135] Shown in Table 2 below are exemplary event data (each row in this example is a user event sequence; E1-E8 are eight events to be assigned conversion credits; C/NC stands for conversion/no conversion):
TABLE 2
E1 E2 E3 → C
E1 E2 E5 → NC
E3 E4 E5 → C
E1 E3 E4 E5 → NC
E1 E2 E6 → C
E3 E4 E5 E6 → NC
E1 E5 E6 E7 → C
E1 E2 E4 E6 E7 → NC
E2 E3 E4 E7 → NC
E1 E2 E3 E5 E7 → NC
E1 E3 E5 E6 E8 → NC
E2 E6 → NC
[0136] For each converted user, generate all leave-one-out sub-sequences. For example, from the first converted user, one gets {E1 E2}, {E2 E3}, and {E1 E3}.
[0137] Next, merge the sub-sequences from all converted users. For example, from the four converted users, one gets the following 12 sub-sequences, where the second column is an index assigned to the sub-sequences. This is shown in Table 3 below.
TABLE 3
Sub-sequence    Index
{E1 E2}         1
{E2 E3}         2
{E1 E3}         3
{E3 E4}         4
{E4 E5}         5
{E3 E5}         6
{E1 E6}         7
{E2 E6}         8
{E1 E5 E6}      9
{E1 E5 E7}      10
{E1 E6 E7}      11
{E5 E6 E7}      12
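The generation and merging of leave-one-out sub-sequences can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, sub-sequences are treated as unordered sets, and the indexes are assigned in order of first appearance, so they need not match the numbering in Table 3.

```python
from itertools import combinations

# The four converted user sequences from Table 2.
CONVERTED = [
    ("E1", "E2", "E3"),
    ("E3", "E4", "E5"),
    ("E1", "E2", "E6"),
    ("E1", "E5", "E6", "E7"),
]


def leave_one_out(events):
    """All sub-sequences obtained by dropping exactly one event."""
    return [frozenset(c) for c in combinations(events, len(events) - 1)]


def merge_subsequences(sequences):
    """Map each distinct sub-sequence to a 1-based index (first-seen order)."""
    index = {}
    for events in sequences:
        for sub in leave_one_out(events):
            index.setdefault(sub, len(index) + 1)
    return index


merged = merge_subsequences(CONVERTED)
print(len(merged))  # 13 generated sub-sequences, {E1 E2} occurs twice
```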
[0138] For each sub-sequence S, count the number of converted users ($n_{\mathrm{conv}}$) and the number of non-converting users ($n_{\mathrm{nonconv}}$) that have the sub-sequence and compute the conditional probability
$$P(C \mid S) = \frac{n_{\mathrm{conv}} + 1}{n_{\mathrm{conv}} + n_{\mathrm{nonconv}} + 2}$$
(the extra counts 1 and 2 added to the numerator and denominator, respectively, are priors used to smooth out estimates from very sparse data).
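A short sketch of the smoothed estimate, assuming the priors enter as +1 in the numerator and +2 in the denominator as stated above. The function name and the example counts are illustrative.

```python
def conversion_probability(n_conv, n_nonconv):
    """Smoothed conditional probability of conversion given a sub-sequence.

    The +1 / +2 terms act as priors, so a sub-sequence that no user has
    seen yields 0.5 instead of an undefined 0/0.
    """
    return (n_conv + 1) / (n_conv + n_nonconv + 2)


# Illustrative counts: 2 converted users and 3 non-converting users.
print(conversion_probability(2, 3))
```

With very sparse data the priors dominate; as the counts grow, the estimate approaches the raw conversion ratio.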
[0139] To get the counts (n.sub.conv and n.sub.nonconv), do the following:
[0140] Build an inverted index for each event that appeared in any converted user sequence; the index stores the indexes of the sub-sequences that contain the event. This is shown in Table 4 below.
TABLE 4
E1 → {1, 3, 7, 9, 10, 11}
E2 → {1, 2, 8}
E3 → {2, 3, 4, 6}
E4 → {4, 5}
E5 → {5, 6, 9, 10, 12}
E6 → {7, 8, 9, 11, 12}
E7 → {10, 11, 12}
[0141] For each user sequence in Table 2, use the inverted index to determine which sub-sequences in Table 3 are subsets of the user sequence, i.e., for which sub-sequences one should increment n_conv and/or n_nonconv. That is, for the first converted user sequence {E1 E2 E3 → C}, generate the following list from the inverted index in Table 4: {1,3,7,9,10,11; 1,2,8; 2,3,4,6} and then the sub-sequence counts (number of times appearing in the list), as shown in Table 5 below:
TABLE 5
Sub-sequence index    Count    Subset of user sequence
1                     2        yes
2                     2        yes
3                     2        yes
4                     1        no
6                     1        no
7                     1        no
8                     1        no
9                     1        no
10                    1        no
11                    1        no
where the last column indicates whether each sub-sequence is a subset of the user sequence (by comparing the counts in the second column to the length of the sub-sequence; e.g., sub-sequence 1 has a count of 2 in Table 5 and a length of 2 as seen in Table 3). Therefore, by going through the user sequence {E1 E2 E3 → C}, it is determined that one should increase n_conv for sub-sequences 1, 2, and 3.
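The inverted-index counting can be sketched in Python using the data of Tables 2 and 3. The function and variable names are hypothetical, and the sketch favors readability over the parallelized implementation a production system would likely need.

```python
from collections import Counter

# Table 2: (event set, converted?) for each user.
USERS = [
    ({"E1", "E2", "E3"}, True), ({"E1", "E2", "E5"}, False),
    ({"E3", "E4", "E5"}, True), ({"E1", "E3", "E4", "E5"}, False),
    ({"E1", "E2", "E6"}, True), ({"E3", "E4", "E5", "E6"}, False),
    ({"E1", "E5", "E6", "E7"}, True), ({"E1", "E2", "E4", "E6", "E7"}, False),
    ({"E2", "E3", "E4", "E7"}, False), ({"E1", "E2", "E3", "E5", "E7"}, False),
    ({"E1", "E3", "E5", "E6", "E8"}, False), ({"E2", "E6"}, False),
]

# Table 3: sub-sequence index -> events.
SUBSEQS = {
    1: {"E1", "E2"}, 2: {"E2", "E3"}, 3: {"E1", "E3"}, 4: {"E3", "E4"},
    5: {"E4", "E5"}, 6: {"E3", "E5"}, 7: {"E1", "E6"}, 8: {"E2", "E6"},
    9: {"E1", "E5", "E6"}, 10: {"E1", "E5", "E7"},
    11: {"E1", "E6", "E7"}, 12: {"E5", "E6", "E7"},
}


def build_inverted_index(subseqs):
    """Map each event to the indexes of the sub-sequences containing it."""
    inverted = {}
    for sid, events in subseqs.items():
        for event in events:
            inverted.setdefault(event, []).append(sid)
    return inverted


def contained_subseqs(user_events, inverted, subseqs):
    """Indexes of sub-sequences that are subsets of the user's event set."""
    hits = Counter()
    for event in user_events:
        hits.update(inverted.get(event, []))
    # A sub-sequence is contained when every one of its events was hit.
    return sorted(sid for sid, n in hits.items() if n == len(subseqs[sid]))


def count_users(users, subseqs):
    """Return (n_conv, n_nonconv) counters keyed by sub-sequence index."""
    inverted = build_inverted_index(subseqs)
    n_conv, n_nonconv = Counter(), Counter()
    for events, converted in users:
        target = n_conv if converted else n_nonconv
        target.update(contained_subseqs(events, inverted, subseqs))
    return n_conv, n_nonconv


n_conv, n_nonconv = count_users(USERS, SUBSEQS)
print(n_conv[1], n_nonconv[1])
```

Each user touches only the sub-sequences reachable through that user's own events, which avoids comparing every user against every sub-sequence.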
[0142] Results from operation of attribution modeling according to some embodiments will be discussed by way of example below.
[0144] For simplicity, results for many other levels are omitted and in the last row the final fractional attribution results based on applying hierarchical Bayesian shrinkage to combine the results from all different levels are shown.
[0145] After this is done for every conversion, the result is a weight for each impression/click event (i.e., at the most granular level). These final weights can then be rolled up along different dimensions for reporting. Common dimensions of interest include campaign, site, creative, etc.
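The roll-up for reporting can be sketched as a simple grouped sum. The record layout (dictionaries with "campaign", "site", and "weight" keys) and the values are hypothetical.

```python
from collections import defaultdict


def roll_up(weighted_events, dimension):
    """Sum impression/click-level attribution weights along one dimension."""
    totals = defaultdict(float)
    for event in weighted_events:
        totals[event[dimension]] += event["weight"]
    return dict(totals)


# Hypothetical impression-level weights for a single conversion.
events = [
    {"campaign": "A", "site": "news", "weight": 0.06},
    {"campaign": "A", "site": "sports", "weight": 0.34},
    {"campaign": "B", "site": "news", "weight": 0.60},
]
by_campaign = roll_up(events, "campaign")
by_site = roll_up(events, "site")
print(by_campaign, by_site)
```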
[0148] As noted above, in some cases, some but not all user-level data may be available. Additionally, some user-level data may be difficult or expensive to get. For instance, a user may be exposed to a business's advertising channels such as a direct mail campaign, an email campaign, and online ads displayed on various web sites including social networking sites, etc. Thus, this user's converting path may include one or more offline channels where user-level data may not be available as well as one or more digital channels such as social networking sites where user-level data may be difficult and/or expensive to get. In such cases, a hybrid approach may be utilized to determine appropriate attribution fractions. This hybrid approach is driven by data in that the importance (which serves as the basis of calculating attribution fraction) of each advertisement event is derived based on data on both converted users and non-converting aggregate-level data. Aspects and examples of a fractional attribution approach using user-level data and aggregate level data are provided in U.S. patent application Ser. No. ______ (Attorney Docket No. ADOM1200-1), filed Oct. 17, 2013, entitled SYSTEM AND METHOD FOR FRACTIONAL ATTRIBUTION UTILIZING USER-LEVEL DATA AND AGGREGATE LEVEL DATA, which is fully incorporated by reference herein.
[0149] At the aggregate level, the influence of a channel relative to other channels may also be important in terms of conversions. For example, a user may receive at home a flyer advertising a sale of a product on a web site, use a search site to research the product, be redirected to the web site by the search site, and ultimately purchase the product (a conversion). Accordingly, it may be desirable to find the appropriate fractional attribution across channels at the aggregate level. Various approaches may be used for channel weighting, including those using regression modeling or instrumental variables. These approaches will now be described.
[0150] A regression modeling approach can be used to build a predictive model that can predict total (multi-channel) conversions, based on channel volumes. According to embodiments, a what-if analysis may be performed to produce a delta key performance indicator (KPI) that can be attributed to a given channel. In particular, the what-if analysis sets the volume for a channel to 0 and uses the delta change in predicted conversions as a measure of the conversion contribution from the channel. The deltas may be normalized across all channels to get a channel weight.
[0151] That is, delta KPI = predicted KPI (with all channels) - predicted KPI (without [what-if] channel).
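The what-if analysis can be sketched in Python as follows. The predictor passed in stands for any fitted model; the linear toy predictor, its coefficients, and the channel volumes are made up for illustration and are not the regression model of the specification.

```python
def what_if_channel_weights(predict, volumes):
    """Normalized delta KPI per channel, obtained by zeroing each channel."""
    baseline = predict(volumes)
    deltas = {}
    for channel in volumes:
        without = {**volumes, channel: 0.0}  # what-if: this channel switched off
        deltas[channel] = baseline - predict(without)
    total = sum(deltas.values())
    if total == 0:
        raise ValueError("no channel changes the predicted KPI")
    return {channel: delta / total for channel, delta in deltas.items()}


def toy_predict(v):
    """Hypothetical fitted model: baseline plus a linear term per channel."""
    return 100.0 + 2.0 * v["TV"] + 3.0 * v["Display"] + 5.0 * v["Paid Search"]


weights = what_if_channel_weights(
    toy_predict, {"TV": 10.0, "Display": 20.0, "Paid Search": 4.0}
)
print(weights)
```

Because the baseline term is present in both predictions, it cancels out of every delta, so only channel-driven conversions are distributed.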
[0152] An exemplary regression model that may be used is provided below:
$x_0$ is the length of the time period (e.g., number of days);
$\{x_i\}_{i=1,\dots,m}$ are the volumes for different channels/placements; and
$\{w, \alpha, \beta\}$ are non-negative parameters of the non-linear regression model that are designed to capture interactions between each channel/placement and the KPI and among channels/placements.
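[0153] Then, the predicted KPI, $\hat{y}$, is given by:
$$\hat{y} = w_0 \cdot x_0 + \sum_{k=1}^{m} w_k \, g(\alpha_k x_k) + w_{m+1} \, g\Big(\sum_{k=1}^{m} \beta_k x_k\Big)$$
[0154] Here, $w_0 \cdot x_0$ captures the baseline; $g(\alpha_k x_k)$ captures channel-specific values; and $g(\sum_{k=1}^{m} \beta_k x_k)$ captures interactions.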
[0156] Table 6 below illustrates exemplary aggregate level data that may be used in conjunction with embodiments. In this example, Table 6 shows the KPI for data in predetermined periods (i.e., one week) for TV volume, Display volume, and Paid Search volume.
TABLE 6
Week Index  KPI   Period length (x0)  TV Volume (x1)  Display volume (x2)  Paid Search volume (x3)
1           1641                      2151449000      16804027             301862
2           1550                      1324139000      17105960             295913
3           1756                      1752262000      26227548             431713
4           1674                      1604994000      21903751             223286
5           1919                      1984001000      21154248             204708
6           1646                      1104399000      9013703              155295
7           2230                      664204000       8747002              142544
8           917                       1994760000      8721300              127959
9           2095                      2133997000      17462143             203469
10          2005                      2187959000      19622518             183965
11          1817                      1629374000      15305965             195570
12          839                       1066385000      7120515              110342
13          1219                      731298000       6230122              49386
14          3061                      1075845000      30407220             298963
15          2872                      1954760000      37775621             324554
16          2435                      1460215000      33495246             296442
17          2429                      2508148000      25601200             185078
18          1801                      2816486000      25195267             360340
19          1238                      2876553000      32740966             679508
20          1283                      3493989000      34464282             797808
[0157] In one embodiment, the above exemplary aggregate level data can be used to determine the weights for the aggregate-level regression model. An example of the results is shown in Table 7 below.
TABLE 7
Parameter  Value
w0         0.294972
w1         0
w2         0.460304
w3         0
w4         0.298837
alpha1     0
alpha2     0.62797
alpha3     0
beta1      0
beta2      4.20614
beta3      0
[0158] In one embodiment, the weights may be estimated using standard multiplicative gradient descent, as can be appreciated by a person of ordinary skill in the art.
[0159] Other channel weighting models may also be possible. A data-driven instrumental approach to capture true channel weights will now be described. In some embodiments, this approach may include arranging, by a computer, a plurality of channels into a plurality of funnel stages (or levels) based on the conversion rate associated with a channel, constructing aggregate-level data where appropriate, and running a multi-stage regression computation on the plurality of funnel stages.
[0160] Specifically, the conversion rate of each channel may be examined and used to arrange the channels into a funnel of multiple stages. Table 8 below shows an example of different channels arranged by their conversion rates into a funnel with a plurality of funnel stages.
TABLE 8
Channel/Sub-Channel    Funnel Stage (1 = highest)
TV                     1
Brand Display          2
Retargeting Display    3
Email                  4
Generic Paid Search    5
Brand Paid Search      6
Organic Search         7
[0161] In arranging the channels into a funnel of multiple stages, the computer may compute an attributable conversion rate for channels that have user-level data, counting the conversions if there is at least one touch point from the channel of interest. For channels that do not have user-level data, the computer may use all the conversions.
[0162] In some embodiments, where necessary, the funnel stage of a given channel can be overridden based on domain knowledge. For example, TV or email can be forced to be at the top of the funnel.
[0163] Further, multiple channels may exist at the same funnel stage. For example, Display and TV may be at the same top level.
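Arranging channels into funnel stages, with an optional domain-knowledge override, can be sketched as follows. The sketch assumes that channels with lower conversion rates sit higher in the funnel (stage 1 being the top); the specification does not state the ordering rule explicitly. The conversion rates and function name are hypothetical.

```python
def arrange_funnel(conversion_rates, overrides=None):
    """Assign funnel stages (1 = top) by ascending conversion rate.

    `overrides` maps a channel to a stage chosen from domain knowledge;
    an override may place several channels at the same stage.
    """
    ordered = sorted(conversion_rates, key=conversion_rates.get)
    stages = {channel: rank for rank, channel in enumerate(ordered, start=1)}
    stages.update(overrides or {})
    return stages


# Hypothetical conversion rates per channel.
rates = {"Paid Search": 0.02, "TV": 0.0001, "Display": 0.001, "Email": 0.005}
data_driven = arrange_funnel(rates)
with_override = arrange_funnel(rates, overrides={"Email": 1})
print(data_driven)
print(with_override)
```

In the second call Email is forced to the top of the funnel and shares stage 1 with TV.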
[0164] A channel may be split into sub-channels as needed. For example, one might want to split Retargeting Display and Non-Retargeting Display into two different sub-channels, as the conversion rates for them can differ by more than one order of magnitude. In addition, they are designed to target users at very different funnel stages. Another example is Branded Search vs. Non-Branded Search: intuitively, Branded Search is at a later stage than Non-Branded Search, as the users searching for branded keywords are likely already past the awareness stage and in the consideration stage for the particular brand.
[0165] Next, the computer may construct aggregate-level data for each channel (or sub-channel) as appropriate. For example, user-level data can be aggregated into aggregate-level data as described above.
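One way to aggregate user-level data is to count touch points per day and channel, as in the sketch below. The record layout (user, day, channel) and the sample records are hypothetical.

```python
from collections import Counter


def aggregate_daily_volume(touchpoints):
    """Count touch points per (day, channel); user identifiers are dropped."""
    return Counter((day, channel) for _user, day, channel in touchpoints)


# Hypothetical user-level touch points: (user, day, channel).
touchpoints = [
    ("u1", "2013-10-01", "Display"),
    ("u2", "2013-10-01", "Display"),
    ("u1", "2013-10-01", "Paid Search"),
    ("u3", "2013-10-02", "Display"),
]
daily = aggregate_daily_volume(touchpoints)
print(daily)
```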
[0166] The computer may then run a multi-stage least squares regression, as an extension of a two-stage least squares algorithm. Multiple regressions may be run in a stepwise fashion as exemplified below:
[0167] a. Assume there are m funnel stages (or levels), going from 1 to m top down. One may first try to determine weights for the bottom (m-th) level channels using the standard two-stage least squares algorithm, treating all channels above the m-th level as instrumental variables.
[0168] b. After the causal weights of the m-th level channels are determined, do the same for the (m-1)-th level channels, with the residuals as the target (dependent variable) and the channels in the top (m-2) levels as instrumental variables.
[0169] c. Repeat this process until the causal weights are determined for all channels.
[0170] d. Optionally, non-negative constraints can be added so that the channel weights cannot be negative.
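The stepwise procedure above can be sketched in pure Python as follows. This is a sketch under stated assumptions, not the implementation of the specification: every regression includes an intercept, the top level (which has no channels above it) falls back to ordinary least squares, non-negativity is approximated by clamping rather than by a constrained solver, and each stage needs at least as many instruments as channels. All function names and the synthetic data are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients [intercept, slopes...] via normal equations."""
    cols = [[1.0] * len(y)] + columns
    A = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(p * t for p, t in zip(ci, y)) for ci in cols]
    return solve(A, b)


def fitted(columns, coef):
    """Fitted values of a regression with intercept coef[0]."""
    return [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
            for i in range(len(columns[0]))]


def two_stage_least_squares(y, X, Z):
    """Stage 1: project each regressor on the instruments Z; stage 2: regress y."""
    if Z:  # with no instruments this reduces to ordinary least squares
        X = [fitted(Z, ols(Z, x)) for x in X]
    return ols(X, y)[1:]  # drop the intercept


def multi_stage_weights(y, volumes, stage):
    """Causal weights per channel, working from the bottom funnel level upward."""
    residual, weights = list(y), {}
    for level in sorted(set(stage.values()), reverse=True):
        current = [c for c in volumes if stage[c] == level]
        above = [c for c in volumes if stage[c] < level]
        estimates = two_stage_least_squares(
            residual, [volumes[c] for c in current], [volumes[c] for c in above])
        for channel, w in zip(current, estimates):
            weights[channel] = max(w, 0.0)  # crude non-negativity clamp
            residual = [r - weights[channel] * v
                        for r, v in zip(residual, volumes[channel])]
    return weights


# Synthetic data: TV drives Paid Search volume, and only search drives the KPI.
tv = [float(t) for t in range(1, 9)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]
search = [0.5 * t + e for t, e in zip(tv, noise)]
kpi = [10.0 + 3.0 * s for s in search]
weights = multi_stage_weights(
    kpi, {"TV": tv, "Paid Search": search}, {"TV": 1, "Paid Search": 2})
print(weights)
```

In this synthetic setup TV affects the KPI only through Paid Search, so the instrumented regression recovers the search coefficient and leaves essentially nothing in the residual for TV to explain.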
[0171] For example, the functional form of the model for the example data can be:
$$y = \sum_{k=0}^{m=3} w_k x_k$$
[0172] Suppose the following funnel stages are used:
TABLE 9
Channel/Sub-Channel    Funnel Stage (1 = highest)
TV                     1
Display                2
Paid Search            3
[0173] The channel weights learned from the example data above can be as follows:
TABLE 10
Parameter  Value
w0         0.388314
w1         0
w2         0.217415
w3         0.25
[0174] A channel weight thus determined may reflect how the key performance indicator will respond to a change to the volume (or advertising spending) of the channel at the aggregate level.
[0175] Turning now to
[0176] Embodiments can provide many advantages. For example, existing fractional attribution methods rely on marketing mix modeling or marketing mix optimization (MMM/MMO) approaches to deal with scenarios in which user-level data may not be available. Such approaches use regression models on multi-year time-series data to produce relative regression weights for different channels. Such weights are used to explain the contribution from different channels on conversions. They normally stay at the channel level and cannot assign attribution credit at more granular levels. Further, directly normalizing conversion probabilities across different channels may lead to useless results because the probabilities for different channels can differ by orders of magnitude. To address these issues, embodiments can leverage a data-driven instrumental approach to derive the causal influence weight of any channel on conversions, without preconceived bias on the importance of different channels. This general approach works with any number of different types of advertising channels, as long as daily total ad volume is reliably captured. The new approach expands beyond online advertising channels and provides accurate modeling of the causal relationships among channels (online and offline) and conversions to thereby determine the most accurate credit to each channel or sub-channel involved.
[0177] Although the invention has been described with respect to specific embodiments thereof, these embodiments are merely illustrative, and not restrictive of the invention. The description herein of illustrated embodiments of the invention, including the description in the Abstract and Summary, is not intended to be exhaustive or to limit the invention to the precise forms disclosed herein (and in particular, the inclusion of any particular embodiment, feature or function within the Abstract or Summary is not intended to limit the scope of the invention to such embodiment, feature or function). Rather, the description is intended to describe illustrative embodiments, features and functions in order to provide a person of ordinary skill in the art context to understand the invention without limiting the invention to any particularly described embodiment, feature or function, including any such embodiment feature or function described in the Abstract or Summary. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes only, various equivalent modifications are possible within the spirit and scope of the invention, as those skilled in the relevant art will recognize and appreciate. As indicated, these modifications may be made to the invention in light of the foregoing description of illustrated embodiments of the invention and are to be included within the spirit and scope of the invention. Thus, while the invention has been described herein with reference to particular embodiments thereof, a latitude of modification, various changes and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of embodiments of the invention will be employed without a corresponding use of other features without departing from the scope and spirit of the invention as set forth. 
Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit of the invention.
[0178] Reference throughout this specification to "one embodiment", "an embodiment", or "a specific embodiment" or similar terminology means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment and may not necessarily be present in all embodiments. Thus, respective appearances of the phrases "in one embodiment", "in an embodiment", or "in a specific embodiment" or similar terminology in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics of any particular embodiment may be combined in any suitable manner with one or more other embodiments. It is to be understood that other variations and modifications of the embodiments described and illustrated herein are possible in light of the teachings herein and are to be considered as part of the spirit and scope of the invention.
[0180] In the description herein, numerous specific details are provided, such as examples of components and/or methods, to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that an embodiment may be able to be practiced without one or more of the specific details, or with other apparatus, systems, assemblies, methods, components, materials, parts, and/or the like. In other instances, well-known structures, components, systems, materials, or operations are not specifically shown or described in detail to avoid obscuring aspects of embodiments of the invention. While the invention may be illustrated by using a particular embodiment, this is not and does not limit the invention to any particular embodiment and a person of ordinary skill in the art will recognize that additional embodiments are readily understandable and are a part of this invention.
[0181] Any suitable programming language can be used to implement the routines, methods or programs of embodiments of the invention described herein, including C, C++, Java, assembly language, etc. Different programming techniques can be employed such as procedural or object oriented. Any particular routine can execute on a single computer processing device or multiple computer processing devices, a single computer processor or multiple computer processors. Data may be stored in a single storage medium or distributed through multiple storage mediums, and may reside in a single database or multiple databases (or other data storage techniques). Although the steps, operations, or computations may be presented in a specific order, this order may be changed in different embodiments. In some embodiments, to the extent multiple steps are shown as sequential in this specification, some combination of such steps in alternative embodiments may be performed at the same time. The sequence of operations described herein can be interrupted, suspended, or otherwise controlled by another process, such as an operating system, kernel, etc. The routines can operate in an operating system environment or as stand-alone routines. Functions, routines, methods, steps and operations described herein can be performed in hardware, software, firmware or any combination thereof.
[0182] Embodiments described herein can be implemented in the form of control logic in software or hardware or a combination of both. The control logic may be stored in an information storage medium, such as a computer-readable medium, as a plurality of instructions adapted to direct an information processing device to perform a set of steps disclosed in the various embodiments. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the invention.
[0183] It is also within the spirit and scope of the invention to implement in software programming or code any of the steps, operations, methods, routines or portions thereof described herein, where such software programming or code can be stored in a computer-readable medium and can be operated on by a processor to permit a computer to perform any of the steps, operations, methods, routines or portions thereof described herein. The invention may be implemented by using software programming or code in one or more general purpose digital computers, or by using application specific integrated circuits, programmable logic devices, or field programmable gate arrays; optical, chemical, biological, quantum or nanoengineered systems, components and mechanisms may also be used. In general, the functions of the invention can be achieved by any means as is known in the art. For example, distributed or networked systems, components and circuits can be used. In another example, communication or transfer (or otherwise moving from one place to another) of data may be wired, wireless, or by any other means.
[0184] A computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, system or device. The computer readable medium can be, by way of example only but not by limitation, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, system, device, propagation medium, or computer memory. Such computer-readable medium shall generally be machine readable and include software programming or code that can be human readable (e.g., source code) or machine readable (e.g., object code).
[0185] A processor includes any hardware system, mechanism or component that processes data, signals or other information. A processor can include a system with a general-purpose central processing unit, multiple processing units, dedicated circuitry for achieving functionality, or other systems. Processing need not be limited to a geographic location, or have temporal limitations. For example, a processor can perform its functions in real-time, offline, in a batch mode, etc. Portions of processing can be performed at different times and at different locations, by different (or the same) processing systems.
[0186] It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. Additionally, any signal arrows in the drawings/Figures should be considered only as exemplary, and not limiting, unless otherwise specifically noted.
[0187] Furthermore, the term "or" as used herein is generally intended to mean "and/or" unless otherwise indicated. As used herein, including the claims that follow, a term preceded by "a" or "an" (and "the" when antecedent basis is "a" or "an") includes both singular and plural of such term, unless clearly indicated within the claim otherwise (i.e., that the reference "a" or "an" clearly indicates only the singular or only the plural). Also, as used in the description herein and throughout the claims that follow, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise. The scope of the present disclosure should be determined by the following claims and their legal equivalents.
[0188] Although the foregoing specification describes specific embodiments, numerous changes in the details of the embodiments disclosed herein and additional embodiments will be apparent to, and may be made by, persons of ordinary skill in the art having reference to this description. In this context, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of this disclosure. Accordingly, the scope of the present disclosure should be determined by the following claims and their legal equivalents.