SYSTEM AND METHOD FOR PROCESSING DIGITAL TRAFFIC METRICS
20200027104 · 2020-01-23
Inventors
- Andrew PRENDERGAST (South Yarra, AU)
- Paul CROSS (Brighton, AU)
- Dhruv BHATIA (Armadale, AU)
- Mark GORMLEY (South Yarra, AU)
- Ryan SANTOS (Caroline Springs, AU)
Cpc classification
International classification
Abstract
A computer-implemented method is disclosed for processing metrics via a controller. The controller comprises a processor and a memory storing program instructions which, when executed by the processor, cause implementation of the steps of generating or receiving metrics characterising digital traffic and/or related user behaviour from one or more sources, and generating or receiving a tabular dataset associated with the metrics, wherein the dataset comprises rows of metrics and dimensions in which each row represents a subset of a metric grouping characterised by a combination of dimensions. The processor further implements the steps of receiving one or more partition identifiers representing a data structure of dataset partitions, assigning one or more metric groupings to one or more partition identifiers, and analysing the dataset according to the partition identifiers.
Claims
1. A computer-implemented method of processing metrics via a controller, the controller comprising a processor and a memory storing code which when executed by the processor causes implementation of the steps of: receiving a first dataset X characterising digital traffic and/or related user behaviour from a first source, the first dataset X including data of a first metric; receiving a second dataset Y characterising digital traffic and/or related user behaviour from a second source, the second dataset Y including data of a second metric, wherein the second metric is correlated to the first metric; based on the correlation between the second metric and the first metric, generating a mapping function configured to merge the first dataset X with the second dataset Y; and merging the first dataset X with the second dataset Y into a third dataset by application of the mapping function to the first dataset X and second dataset Y, such that the third dataset includes the data of the first dataset X and the second dataset Y.
2. A computer-implemented method according to claim 1, wherein the code when executed by the processor further causes implementation of the step of: learning the mapping function from the first dataset X and the second dataset Y.
3. A computer-implemented method according to claim 2, wherein the mapping function is B = A^{-1}C, A being a matrix constructed from the second dataset Y and consisting of |T| rows and |Y| columns, each row in A containing the value of a metric M that occurs in both the first and the second datasets for a predetermined period, and each column in A containing the value of M for one level in the dimension Y; and C being a matrix constructed from the first dataset X and consisting of |T| rows and |X| columns, each row in C containing the value of M for the predetermined period, and each column in C containing the value of M for one level in the dimension X.
4. A computer-implemented method according to claim 3, wherein when B is a positive integer matrix, and the sum of all cells in the matrix B is equal to MAX(|X|,|Y|), a linear or non-linear solver is run by the processor to learn the mapping function B.
5. A computer-implemented method according to claim 3, wherein a least-squares matrix solver is run by the processor to learn the mapping function B.
6. A controller for processing metrics, the controller comprising a processor and a memory storing code which when executed by the processor causes implementation of the steps of: receiving a first dataset X characterising digital traffic and/or related user behaviour from a first source, the first dataset X including data of a first metric; receiving a second dataset Y characterising digital traffic and/or related user behaviour from a second source, the second dataset Y including data of a second metric, wherein the second metric is correlated to the first metric; based on the correlation between the second metric and the first metric, generating a mapping function configured to merge the first dataset X with the second dataset Y; and merging the first dataset X with the second dataset Y into a third dataset by application of the mapping function to the first dataset X and second dataset Y, such that the third dataset includes the data of the first dataset X and the second dataset Y.
7. The controller for processing metrics according to claim 6, wherein the code when executed by the processor further causes implementation of the step of selecting metrics and/or dimensions from the first and second datasets which are to be joined by positioning opposing ends of at least one connector onto graphic elements representing metrics and/or dimensions to be joined.
8. The controller for processing metrics according to claim 6, the steps further comprising learning the mapping function from the first dataset X and the second dataset Y.
9. The controller for processing metrics according to claim 8, wherein: the mapping function is represented by B = A^{-1}C, wherein A is a matrix constructed from the second dataset Y and consisting of |T| rows and |Y| columns, each row in A containing the value of a metric M that occurs in both the first and the second datasets for a predetermined period, and each column in A containing the value of M for one level in the dimension Y; and C is a matrix constructed from the first dataset X and consisting of |T| rows and |X| columns, each row in C containing the value of M for the predetermined period, and each column in C containing the value of M for one level in the dimension X.
10. The controller for processing metrics according to claim 9, wherein: when B is a positive integer matrix, and the sum of all cells in the matrix B is equal to MAX(|X|,|Y|), a linear or non-linear process is used to learn the mapping function B.
11. The controller for processing metrics according to claim 9, wherein the linear or non-linear process used to learn the mapping function B is a least-squares matrix process.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0060] The invention will now be described in further detail by reference to the accompanying drawings. It is to be understood that the particularity of the drawings does not supersede the generality of the preceding description of the invention.
[0061]
[0062]
[0063]
[0064]
[0065]
[0066]
[0067]
[0068]
[0069]
DETAILED DESCRIPTION
[0070] Referring firstly to
[0071] The system 10 includes a data warehouse 12 connected to a series of advertising platform databases 14 to 20 via a data network 22, such as the Internet. The advertising platform databases 14 to 20 store datasets of information relating to digital traffic and related user behaviour. The datasets stored on each of the databases 14 to 20 relate to separate traffic measurement platforms run by the proprietors of the respective databases. These datasets are provided to the data warehouse 12, specifically to a database server 24 in communication with the network 22, and are stored in a database 26 associated with the database server 24.
[0072] A terminal 28 and associated graphic user interface 30 enable a campaign manager or other user to interact with the datasets stored in the database 26. Once the datasets have been reorganised, augmented and/or merged at the data warehouse 12, the resultant datasets are transmitted to a customer terminal 32 to enable viewing of a consolidated campaign reporting board 34 on the display of the customer terminal 32, or alternatively to generate printed campaign reports from a printer 36 in communication with customer terminal 32. In addition, the consolidated datasets may be transmitted from the database server 24 to a customer database server 38 and associated database 40 in communication with the data network 22.
[0073] The data warehouse 12 enables the datasets from the various advertising platform databases 14 to 20 to be reorganised into a predetermined data structure by partitioning them, enables the datasets to be improved with additional business-specific metric data, and furthermore provides a way to combine multiple views of activity into a single de-duplicated dataset. The graphic user interface 30 provides the campaign manager with the functionality required to specify an indefinitely deep tree hierarchy 200, or other predetermined structure, and a point-and-click facility for assigning advertising activity data from multiple advertising systems to any node (partition) in this user-defined hierarchy 190. The graphic user interface 30 furthermore provides a means of entering new, or overwriting existing, metric data at any node in the hierarchy 170. Furthermore, when data from two or more advertising systems are assigned to a node in the hierarchy, a machine learning algorithm detects which dimensions in a first system are to be mapped to which dimensions in the other system.
[0074] It should be appreciated that the computer implemented method of processing metrics described herein could be applied not only to advertising datasets, but to any dataset in general. Any company or organisation with a data warehouse that has a need to reorganise their datasets, add additional data to their datasets and/or merge multiple datasets together will benefit from the advantages provided by the present invention.
[0075] The system 10 may be implemented using hardware, software or a combination thereof and may be implemented in one or more computer systems, controllers or processing systems. In particular, the functionality of the client user terminal 32 and its graphic user interface 34, as well as the server 24, may be provided by one or more computer systems capable of carrying out the above-described functionality.
[0076] An exemplary controller 50 is shown in
[0077] The secondary memory 62 may include, for example, a hard disk drive 64, magnetic tape drive, optical disk drive, etc. A removable storage drive 68 reads from and/or writes to a removable storage unit 70 in a well-known manner. The removable storage unit 70 represents a floppy disk, magnetic tape, optical disk, etc.
[0078] As will be appreciated, the removable storage unit 70 includes a computer usable non-transitory storage medium having stored therein computer software in a form of program instructions to cause the processor 52 to carry out desired functionality. In alternative embodiments, the secondary memory 62 may include other similar means for allowing computer programs or program instructions to be loaded into the controller 50. Such means may include, for example, a removable storage unit 72 and interface 74.
[0079] The controller 50 may also include a communications interface 76. Communications interface 76 allows software and data to be transferred between the controller 50 and external devices. Examples of communications interface 76 may include a modem, a network interface, a communications port, a PCMCIA slot and card, etc. Software and data transferred via the communications interface 76 are in the form of signals 78 which may be electromagnetic, electronic, optical or other signals capable of being received by the communications interface 76. The signals are provided to communications interface 76 via a communications path 80 such as a wire or cable, fibre optics, phone line, cellular phone link, radio frequency or other communications channels.
[0080] Referring now to
[0081] The tabular dataset 90 consists of rows of metrics and dimensions in which each row represents a subset of a metric grouping characterised by a combination of dimensions. Accordingly, each row in the dataset is a metric grouping for a different combination of dimensions (such as date, campaign and keyword) and records the impressions, clicks and conversions that occurred for that specific combination of dimensions. Other datasets, having different dimensions and recording different metrics against various combinations of dimensions, may be stored in the other advertising platform databases.
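The row-per-combination layout of dataset 90 can be sketched as a list of records. This is a minimal sketch assuming hypothetical dimension names (`date`, `campaign`, `keyword`) and made-up metric values; `roll_up` is a hypothetical helper, not part of the disclosure, that sums one metric over all rows sharing the values of a chosen subset of dimensions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

# Hypothetical rows in the style of dataset 90: one row per combination of
# dimensions (date, campaign, keyword) with the metrics recorded for it.
rows: List[Row] = [
    {"date": "2018-07-01", "campaign": "c1", "keyword": "k1", "impressions": 120, "clicks": 10, "conversions": 1},
    {"date": "2018-07-01", "campaign": "c1", "keyword": "k2", "impressions": 40, "clicks": 3, "conversions": 0},
    {"date": "2018-07-01", "campaign": "c2", "keyword": "k9", "impressions": 75, "clicks": 7, "conversions": 2},
    {"date": "2018-07-02", "campaign": "c1", "keyword": "k1", "impressions": 90, "clicks": 12, "conversions": 1},
]


def roll_up(rows: List[Row], by: Sequence[str], metric: str) -> Dict[Tuple[object, ...], int]:
    """Hypothetical helper: sum `metric` over rows sharing the same values for `by`."""
    totals: Dict[Tuple[object, ...], int] = defaultdict(int)
    for row in rows:
        key = tuple(row[dim] for dim in by)  # the dimension combination of interest
        totals[key] += int(row[metric])      # type: ignore[call-overload]
    return dict(totals)
```

For example, `roll_up(rows, ["campaign"], "clicks")` collapses the keyword and date dimensions and returns one clicks total per campaign.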
[0082] By use of the graphic user interface 30, a campaign manager 160 is first able to specify a hierarchy or other data structure of partitions 200 into which the dataset can be divided for subsequent analysis. Partition identifiers are used to associate rows of data in the dataset with nodes in a data structure such as a linear list, hierarchical tree or multiply connected graph structure. One such exemplary hierarchical tree data structure 100 is depicted in
[0083] Beneath the upper-level partition p1 there exist two data partitions identified by partition identifiers p2 and p3. Partitions may be defined by way of logic such as Boolean logic, set logic or the like. For example, the partition p2 includes all metrics falling within the data partition p1 and having y3 as the value of the Y dimension (defined, for example, by set logic as Y={y3}). The data partition p3 includes all metrics falling within the data partition p1 where the value of the Z dimension is either z1 or z2 and Impressions are greater than 1 (defined, for example, by Boolean logic as (Z=z1 OR Z=z2) AND Impressions>1). Finally, the data structure 100 includes two further low-level dataset partitions having partition identifiers p4 and p5 respectively. The data partition p4 includes metrics falling within the data partition p3 and having a Y dimension value of y1, whilst the data partition p5 may include all metrics falling within the data partition p3 and having a Y dimension value of y2. The partition identifiers p1 to p5 are assigned to one or more of the metric groupings (rows) depicted in the dataset 90.
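The partition tree of data structure 100 can be sketched with one predicate per node, where a row is only tested against a child partition once it satisfies the parent. This is a minimal sketch under stated assumptions: the defining rule for p1 is not given in the text, so p1 is treated here as accepting every row, and the `Partition` class and `assign` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class Partition:
    """Hypothetical node type: a partition identifier plus its membership rule."""
    pid: str
    predicate: Callable[[Row], bool]
    children: List["Partition"] = field(default_factory=list)


def assign(node: Partition, row: Row, path: Optional[List[str]] = None) -> List[str]:
    """Collect the identifiers of every partition that contains `row`.

    Children are only tested when the row satisfies the parent, so
    membership in p4 or p5 implies membership in p3 and p1.
    """
    path = [] if path is None else path
    if node.predicate(row):
        path.append(node.pid)
        for child in node.children:
            assign(child, row, path)
    return path


# Data structure 100. Assumption: p1 accepts all rows (its rule is not stated).
tree = Partition("p1", lambda r: True, [
    Partition("p2", lambda r: r["Y"] == "y3"),  # set logic: Y = {y3}
    Partition("p3", lambda r: r["Z"] in ("z1", "z2") and r["Impressions"] > 1, [  # type: ignore[operator]
        Partition("p4", lambda r: r["Y"] == "y1"),
        Partition("p5", lambda r: r["Y"] == "y2"),
    ]),
])
```

A row may carry several identifiers at once (for example p1, p3 and p4), which is what allows metrics to be aggregated at any level of the hierarchy.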
[0084]
[0085] In addition to the supplementary dimension data provided by the partition identifiers, the dataset 110 depicts supplementary metrics 112, which have been added to the metrics 92, and supplementary dimensions 113, which have been added to the dimensions 94 described in relation to the dataset 90, according to the data structure 100. In this example, the supplementary metrics define target conversions, costs and budgeted costs, while the supplementary dimensions define annotations.
[0086] In the example data structure 100, p1 contains the supplemental metric Target Conversions, which should be set to 10 with the allocation weighted according to the Clicks metric. Referring to 112, the Target Conversions column now sums to 10, with a weighted average applied according to the Clicks metric.
[0087] As another example, in the data structure 100 p4 and p5 contain supplemental metrics for Budgeted Cost which each should be set to $200. Referring again to 112, the Budgeted Cost column now sums to $400, with $200 distributed across rows 1 and 11 according to a weighted average on Impressions (p4) and an additional $200 distributed across rows 4 and 7 according to a weighted average on Clicks (p5).
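The weighted allocation used for Target Conversions and Budgeted Cost can be sketched as a proportional split: each row in a partition receives the fixed amount multiplied by its share of the weighting metric. This is a minimal sketch assuming that the "weighted average" means a split proportional to the weighting metric; the row values are made up, the `allocate` helper is hypothetical, and raising an error when the weights sum to zero is an assumption, since the text does not say how that case is handled.

```python
from typing import Dict, List

Row = Dict[str, float]


def allocate(rows: List[Row], amount: float, weight: str, target: str) -> None:
    """Hypothetical helper: set `target` on each row to its share of `amount`, weighted by `weight`."""
    total = sum(row[weight] for row in rows)
    if total == 0:
        raise ValueError(f"cannot weight by {weight!r}: it sums to zero")
    for row in rows:
        row[target] = amount * row[weight] / total


# Made-up stand-ins for rows 1 and 11 (partition p4) and rows 4 and 7 (partition p5).
p4_rows = [{"impressions": 300.0, "clicks": 20.0}, {"impressions": 100.0, "clicks": 5.0}]
p5_rows = [{"impressions": 80.0, "clicks": 30.0}, {"impressions": 60.0, "clicks": 10.0}]

allocate(p4_rows, 200.0, "impressions", "budgeted_cost")  # p4: weighted on Impressions
allocate(p5_rows, 200.0, "clicks", "budgeted_cost")       # p5: weighted on Clicks
```

Because each partition's amount is split in proportion to its own weighting metric, the Budgeted Cost column sums to the $400 total regardless of which metric each partition uses.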
[0088] As well as receiving supplementary metrics and/or dimension data and writing them, together with partition identifiers, to a particular dataset, the data warehouse 12 is also adapted to receive updated metrics and/or dimension data and write them to a dataset.
[0089] Operation of the graphic user interface 30 so as to allow a user to define a hierarchy or other data structure of dataset partitions will now be explained with reference to
[0090] As can be seen in
[0091] The graphic user interface 30 provides various interface portions depicting each created partition. The position of each partition within the hierarchical data structure can be altered by user-friendly drag and drop functionality 204 and 206, whereby a user is able either to delete a partition or to select the interface portion corresponding to a particular data partition and reposition it at a higher or lower hierarchical position with respect to the other data partitions displayed. Once the interface portions in the graphical representation 124 are arranged in the desired hierarchical structure, the campaign manager can record the changes in the database server 24.
[0092] A further interface window 126 is provided so that a user may select an interface portion corresponding to a particular data partition 190 and thereafter have displayed in the interface window 126 the various metrics associated with that particular data partition 192. In the example shown in
[0093] Functionality is also provided by the graphic user interface 30 to enable editing of that particular data partition 192. For example, rather than selecting data from the Fairfax publisher, a data partition corresponding to a different publisher may be selected on the interface window 126.
[0094] Moreover, as shown in
[0095] Although the interface portions and windows displayed in
[0096] The graphic user interface 30 also enables a user 160 to provide supplementary metrics and/or dimensions to a dataset 170. As seen in
[0101] As shown in
[0102] Once that date range has been entered, a further interface window 146, as shown in
[0103] Once a particular metric is selected for editing, a further interface window 150 is presented to the user to enable editing of that metric. In the depicted example, variable budget rate data is able to be entered in a window portion 152 and fixed budget data is able to be entered in a window portion 154.
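The two kinds of entry in window portions 152 and 154 can be sketched as two small helpers. This is a minimal sketch under stated assumptions: a variable rate multiplies a base metric row by row (for example, cost equals a fixed cost per click multiplied by clicks), and a fixed amount known outright for a date range is assumed to be spread evenly over the days of that range. The even daily split, the $0.50 rate and the helper names `apply_variable_rate` and `spread_fixed_amount` are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List

Row = Dict[str, float]


def apply_variable_rate(rows: List[Row], base: str, rate: float, target: str) -> None:
    """Hypothetical helper: derive `target` from `base` by a fixed coefficient on every row."""
    for row in rows:
        row[target] = rate * row[base]


def spread_fixed_amount(total: float, start: date, end: date) -> Dict[date, float]:
    """Hypothetical helper: split a total known outright evenly over each day in [start, end].

    The even daily split is an assumption, not taken from the disclosure.
    """
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end date precedes start date")
    return {start + timedelta(days=i): total / days for i in range(days)}


rows = [{"clicks": 10.0}, {"clicks": 4.0}]
apply_variable_rate(rows, "clicks", 0.5, "cost")  # hypothetical $0.50 cost per click
daily = spread_fixed_amount(300.0, date(2018, 7, 1), date(2018, 7, 3))
```

The variable-rate helper keeps the derived metric in step with its base metric as new rows arrive, whereas the fixed helper only needs the final total and the date range.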
[0104] In instances where a first metric is derived from a second metric by multiplying the second metric by a fixed coefficient (e.g. fixed cost per click), a user uses the panel 152 depicted in
[0105] In instances where the absolute value of a metric is known outright (for example, total spend is known in absolute terms after activity has finished running), then one uses the panel 154 depicted in
[0106] Once the selected metric has been edited, the graphic user interface 30 once again presents the interface window 146 to the user, as shown in
[0107] The aforementioned process is able to be repeated at the graphic user interface 30 for all other data segments for which supplementary metrics are desired to be added or existing metrics changed. The augmented dataset or supplementary metrics can be displayed in an interface window 158 viewed by the user prior to confirmation and updating of the dataset.
[0108]
[0109] The resultant database structure using the stored dimensions, metrics, as well as the stored partition identifiers (hierarchical information) and associated augmented metrics is shown in
[0110] The partition table 220 contains the hierarchy of partition IDs, in which a parent partition ID 222 is used to create a tree structure. Connected to this table are the filtergroups 224 and 226, which define the dimensions covered by a partition, and the datarows 228 and 230, which contain supplemental dimension 221 and metric 229 augmentations for a particular interval.
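The partition table with its parent-ID column, together with the filter-group and data-row tables keyed by partition ID, can be sketched in SQL. This is a minimal sketch using Python's built-in `sqlite3` module; the table names, column names, dates and helper functions are hypothetical. It illustrates why moving a partition in the hierarchy is cheap: only its parent ID changes, so the filter groups and data rows keyed by the partition ID are retained.

```python
import sqlite3
from typing import List

# Hypothetical schema modelled on partition table 220, filtergroups and datarows.
SCHEMA = """
CREATE TABLE partitions (id TEXT PRIMARY KEY, parent_id TEXT REFERENCES partitions(id));
CREATE TABLE filtergroups (partition_id TEXT REFERENCES partitions(id), dimension TEXT, value TEXT);
CREATE TABLE datarows (partition_id TEXT REFERENCES partitions(id),
                       interval_start TEXT, interval_end TEXT, metric TEXT, value REAL);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory copy of the example hierarchy p1..p5."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO partitions VALUES (?, ?)",
                     [("p1", None), ("p2", "p1"), ("p3", "p1"), ("p4", "p3"), ("p5", "p3")])
    conn.execute("INSERT INTO filtergroups VALUES ('p5', 'Y', 'y2')")
    conn.execute("INSERT INTO datarows VALUES "
                 "('p5', '2018-07-01', '2018-07-31', 'Budgeted Cost', 200.0)")
    return conn


def subtree(conn: sqlite3.Connection, root: str) -> List[str]:
    """Return `root` and all of its descendants by following parent_id recursively."""
    query = """
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM partitions WHERE id = ?
            UNION ALL
            SELECT p.id FROM partitions AS p JOIN sub ON p.parent_id = sub.id
        )
        SELECT id FROM sub ORDER BY id
    """
    return [pid for (pid,) in conn.execute(query, (root,))]


def move_partition(conn: sqlite3.Connection, pid: str, new_parent: str) -> sqlite3.Connection:
    """Re-parent a partition; its filter groups and data rows are left untouched."""
    conn.execute("UPDATE partitions SET parent_id = ? WHERE id = ?", (new_parent, pid))
    return conn
```

Moving p5 under p2, for instance, changes a single `parent_id` value, and the Budgeted Cost row attached to p5 travels with it.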
[0111] A data partition can contain multiple views of the same dataset (for example, data from a search platform and from a third-party advertisement server, or data from an email platform and from a website analytics package). In this instance, metrics such as cost might be present in one dataset, conversions in the other, and clicks may be counted twice. To deal with this, the datasets from the various sources can be merged by the database server 24 into a single view in which groupings (rows) are combined and duplication is removed by application of a mapping function.
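Once the levels of one dataset's dimension have been mapped onto the other's, the merge into a single de-duplicated view can be sketched as follows. This is a minimal sketch assuming a keyword-to-campaign mapping is already known; it takes cost and clicks from a hypothetical search-platform dataset and conversions from a hypothetical analytics dataset, and keeps only one of the two duplicated traffic counts (clicks, discarding visits) so that the same activity is not counted twice. Which source to keep, the field names and the `merge_views` helper are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]
Key = Tuple[str, str]  # (date, campaign)


def merge_views(search_rows: List[Row], analytics_rows: List[Row],
                keyword_to_campaign: Dict[str, str]) -> Dict[Key, Dict[str, float]]:
    """Hypothetical helper: combine two views of the same activity, one row per (date, campaign)."""
    merged: Dict[Key, Dict[str, float]] = defaultdict(
        lambda: {"clicks": 0.0, "cost": 0.0, "conversions": 0.0})
    for row in search_rows:  # roll keyword-level rows up to campaign level via the mapping
        key = (str(row["date"]), keyword_to_campaign[str(row["keyword"])])
        merged[key]["clicks"] += float(row["clicks"])  # type: ignore[arg-type]
        merged[key]["cost"] += float(row["cost"])      # type: ignore[arg-type]
    for row in analytics_rows:  # visits duplicate clicks, so only conversions are taken
        key = (str(row["date"]), str(row["campaign"]))
        merged[key]["conversions"] += float(row["conversions"])  # type: ignore[arg-type]
    return dict(merged)


search = [
    {"date": "2018-07-01", "keyword": "k1", "clicks": 10, "cost": 5.0},
    {"date": "2018-07-01", "keyword": "k2", "clicks": 3, "cost": 1.5},
    {"date": "2018-07-01", "keyword": "k9", "clicks": 7, "cost": 2.0},
]
analytics = [
    {"date": "2018-07-01", "campaign": "c1", "visits": 13, "conversions": 2},
    {"date": "2018-07-01", "campaign": "c2", "visits": 7, "conversions": 1},
]
view = merge_views(search, analytics, {"k1": "c1", "k2": "c1", "k9": "c2"})
```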
[0112] By way of explanation,
[0113] Preferably, the mapping function is one which is learned from the first and second datasets. To learn the mapping function, the database server 24 requires: the two datasets; a highly correlated (but possibly noisy) metric M that occurs in both datasets (e.g. Clicks and Visits); the name of a dimension X in the first dataset onto which the levels of some other named dimension Y in the second dataset are to be mapped; and several days T, or other periods, which co-occur in both datasets.
[0114] The map function (B) can then be recovered by solving the following linear equation:
B = A^{-1}C
[0115] subject to the following constraints on B:
[0116] B is a positive integer matrix; and
[0117] the sum of all cells in the matrix B is equal to MAX(|X|,|Y|).
Where:
[0118] A is a matrix constructed from the second dataset consisting of |T| rows and |Y| columns. Each row in the matrix contains the value of M for one whole day, and each column contains the value of M for one level in the dimension Y;
[0119] C is a matrix constructed from the first dataset consisting of |T| rows and |X| columns. Each row in the matrix contains the value of M for one whole day, and each column contains the value of M for one level in the dimension X; and
[0120] B is the map function.
[0121] When implemented by the database server 24, the following observations can apply:
[0122] a linear or non-linear solver may be used to calculate B. The same general form applies.
[0123] a least-squares matrix solver can be used without the constraint; however, a minimum of MAX(|X|,|Y|) days of data is required.
[0124] some linear algebra solvers will require the matrices to be made into square matrices. The behaviour of the algorithm is the same.
[0125] introducing the constraint reduces the number of days of data required.
[0126] if the metric M is noisy (that is, it is not a perfect mapping), then proportions should be used in its place.
[0127] an optimizer-based solution which chooses the map matrix B that minimizes the squared error in M will produce the best result, but can be computationally expensive.
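The unconstrained least-squares route in [0123] can be sketched in pure Python by solving the normal equations (A^T A)B = A^T C and rounding the result to the nearest integer. This is a minimal sketch with made-up data: three hypothetical keywords (levels of Y) mapped to two campaigns (levels of X) over three days, which meets the MAX(|X|,|Y|)-day minimum. The helper names are hypothetical, and a production system would more likely call a numerical library's least-squares routine.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(lhs: Matrix, rhs: Matrix) -> Matrix:
    """Solve lhs @ X = rhs for square lhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(lhs)
    aug = [list(lhs[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: more periods T are needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def learn_map_least_squares(a: Matrix, c: Matrix) -> List[List[int]]:
    """Hypothetical helper: estimate B (|Y| x |X|) so that A B ~= C, then round to integers."""
    at = transpose(a)
    b = solve(matmul(at, a), matmul(at, c))  # normal equations (A^T A) B = A^T C
    return [[int(round(v)) for v in row] for row in b]


# Hypothetical clicks per day (rows) for keywords k1, k2, k3 (columns of A)
# and for campaigns c1, c2 (columns of C), where c1 = k1 + k2 and c2 = k3.
A = [[10.0, 3.0, 7.0], [12.0, 5.0, 1.0], [4.0, 8.0, 6.0]]
C = [[13.0, 7.0], [17.0, 1.0], [12.0, 6.0]]
B = learn_map_least_squares(A, C)
```

Each row of the recovered B corresponds to a keyword and contains a 1 in the column of the campaign it belongs to.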
[0128] The following example uses the data from databases 252 and 254 depicted in
[0129] {c1}={k1, k2, k3, k4, k5, k6, k7, k8}
[0130] {c2}={k9, k10, k11}
then the following linear system is to be solved:
[0131]
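With campaigns as dimension X and keywords as dimension Y, A holds one column of M per keyword and C one column per campaign. The example mapping then corresponds to a B in which each keyword's row contains a single 1 in the column of its campaign, so the cells of B sum to |Y| = MAX(|X|,|Y|). The optimizer-based approach of [0127] can be sketched as an exhaustive search over such one-campaign-per-keyword assignments, keeping the one with the smallest squared error in M. This is a minimal sketch with made-up, slightly noisy click counts for four hypothetical keywords and two campaigns over only two days (fewer days than keywords, which the constraint makes workable); `learn_map_exhaustive` is a hypothetical name, and the search visits |X|^|Y| assignments, so it only suits small problems.

```python
from itertools import product
from typing import List, Sequence


def learn_map_exhaustive(a: Sequence[Sequence[float]],
                         c: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: for each level of Y (column of `a`), return the index of the
    level of X (column of `c`) it maps to, minimizing the squared error of A B against C."""
    n_y, n_x = len(a[0]), len(c[0])
    best: List[int] = []
    best_err = float("inf")
    # Each candidate puts exactly one 1 in every row of B, so the cells of B sum to |Y|.
    for assignment in product(range(n_x), repeat=n_y):
        err = 0.0
        for a_row, c_row in zip(a, c):  # one period (day) at a time
            predicted = [0.0] * n_x
            for y, x in enumerate(assignment):
                predicted[x] += a_row[y]
            err += sum((p - o) ** 2 for p, o in zip(predicted, c_row))
        if err < best_err:
            best, best_err = list(assignment), err
    return best


# Hypothetical clicks for k1..k4 (A) and noisy campaign totals for c1, c2 (C) over two days.
A = [[10, 3, 7, 4], [12, 5, 1, 9]]
C = [[13, 11], [18, 10]]
mapping = learn_map_exhaustive(A, C)  # index 0 = c1, index 1 = c2
```

Here the search recovers k1 and k2 under c1 and k3 and k4 under c2 despite the one-click discrepancy on the second day, because every other assignment produces a larger squared error.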
[0132] From the foregoing it will be appreciated that the present invention enables users to reorganise their advertising datasets, and to augment their datasets with additional dimension and metric information, before, during and after advertising activity has run.
[0133] Datasets are able to be easily segmented using a hierarchical drag and drop interface that provides a user with ease of use and flexibility. Segment definitions and custom data are retained when moving segments so that a user can continue to easily manage and update their digital advertising data should their business needs evolve.
[0134] Custom data can be entered against a range of dimensions and metrics rather than a single metric only, such as cost. Additional metrics including business entry metrics such as targets, forecasts, budgets etc. can be entered which are often used by digital marketing teams to assess the performance of digital media purchasing.
[0135] The present invention also enables a real time preview of custom data to be provided before changes are saved. This view provides an assurance layer and helps to prevent errors that could decrease the accuracy of existing data within the system.
[0136] The invention also provides a mechanism for easily splitting custom data date ranges 186 and 157, making custom data entry easier and more intuitive than existing solutions.
[0137] Custom data appearing in a particular report is also able to be limited, if so desired.
[0138] As has been previously mentioned, although the present invention has been described in relation to its application to advertising datasets, the invention is also applicable to any dataset in general. Any company with a data warehouse that needs to reorganise its datasets is able to add additional data to those datasets and merge multiple datasets together.
[0139] Although in the above-described embodiments the invention is implemented primarily using computer software, in other embodiments the invention may be implemented primarily in hardware using, for example, hardware components such as application-specific integrated circuits (ASICs). Implementation of a hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art. In other embodiments, the invention may be implemented using a combination of both hardware and software.
[0140] While the invention has been described in conjunction with a limited number of embodiments, it will be appreciated by those skilled in the art that many alternatives, modifications and variations in light of the foregoing description are possible. Accordingly, the present invention is intended to embrace all such alternatives, modifications and variations as may fall within the spirit and scope of the invention as disclosed.