DETERMINING USAGE DATA OF MOBILE APPLICATIONS FOR A POPULATION
20180165697 · 2018-06-14
Cpc classification
G06F15/173
PHYSICS
Abstract
A utility application for a mobile device inspects data packets from other mobile applications running on the device to gather and record usage data about those applications. Since users of the utility application may not reflect the true population for which the usage data is desired, a system de-biases the data reported from the utility applications using a machine learning model to predict demographics of the users of the utility application. To determine a training data set for the model, the system requests a user to provide a desired user attribute by way of an in-app questionnaire. This enables labeling utility usage data with the demographics, which can be weighted and extrapolated to determine usage across the population as a whole.
Claims
1. A method for determining usage data of one or more mobile applications for a desired population, the method comprising: obtaining usage data of one or more mobile applications by a plurality of users who have a utility application installed on a mobile device, the usage data obtained by the utility application by inspecting data packets on each mobile device; obtaining a model for predicting whether a user of the plurality of users has a user attribute, by: requesting a utility application user to provide a desired user attribute, the utility application to obtain the user attribute from a user by way of an in-app questionnaire, receiving, in response to the request, the value for the desired user attribute for a subset of the plurality of users, assigning the user attribute to each of the subset of the plurality of users, and training the model for predicting whether a user of the plurality of users has a user attribute using the assigned user attributes for a training set for the model; de-biasing the obtained usage data by: for the plurality of users who were not sent a request to provide a desired user attribute, predicting whether the user has the user attribute using the model, assigning the predicted user attribute to the plurality of users based on the predicting, comparing an aggregate of the user attribute of the plurality of users who use the utility application with an aggregate of the user attribute for a desired population, and weighting the obtained usage data based on the comparison of the aggregates of the user attribute; and generating a report of the usage data for the desired population based on the weighted usage data.
2. The method of claim 1, wherein the usage data for a user of each user attribute is extrapolated to a user from the sampled population.
3. The method of claim 1, wherein the obtained usage data for the user attribute is assigned a confidence score.
4. The method of claim 1, wherein the weighting of usage data is discounted by an associated confidence score of the usage data.
5. The method of claim 1, wherein the sampled population for each desired user attribute is obtained by conducting surveys.
6. The method of claim 1, wherein the utility application is a virtual private network (VPN) application.
7. The method of claim 1, wherein the utility application is a data collecting utility application.
8. The method of claim 1, wherein obtaining a model for predicting whether a user of the plurality of users has a user attribute further comprises: purchasing an advertisement on a third-party system, the advertisement including a targeting criterion wherein only users who have the user attribute are eligible to be served the advertisement; observing, by inspecting data packets by the utility application, a subset of the plurality of users; assigning the user attribute to each of the subset of the plurality of users; and training the model for predicting whether a user of the plurality of users has a user attribute using the assigned user attributes for a training set for the model.
9. A computer program product for tracking usage of applications on a mobile device, the computer program product comprising a computer-readable storage medium containing computer program code for: obtaining usage data of one or more mobile applications by a plurality of users who have a utility application installed on a mobile device, the usage data obtained by the utility application by inspecting data packets on each mobile device; obtaining a model for predicting whether a user of the plurality of users has a user attribute, by: requesting a utility application user to provide a desired user attribute, the utility application to obtain the user attribute from a user by way of an in-app questionnaire, receiving, in response to the request, the value for the desired user attribute for a subset of the plurality of users, assigning the user attribute to each of the subset of the plurality of users, and training the model for predicting whether a user of the plurality of users has a user attribute using the assigned user attributes for a training set for the model; de-biasing the obtained usage data by: for the plurality of users who were not sent a request to provide a desired user attribute, predicting whether the user has the user attribute using the model, assigning the predicted user attribute to the plurality of users based on the predicting, comparing an aggregate of the user attribute of the plurality of users who use the utility application with an aggregate of the user attribute for a desired population, and weighting the obtained usage data based on the comparison of the aggregates of the user attribute; and generating a report of the usage data for the desired population based on the weighted usage data.
10. The computer program product of claim 9, wherein the usage data for a user of each user attribute is extrapolated to a user from the sampled population.
11. The computer program product of claim 9, wherein the obtained usage data for the user attribute is assigned a confidence score.
12. The computer program product of claim 9, wherein the weighting of usage data is discounted by an associated confidence score of the usage data.
13. The computer program product of claim 9, wherein the sampled population for each desired user attribute is obtained by conducting surveys.
14. The computer program product of claim 9, wherein the utility application is a virtual private network (VPN) application.
15. The computer program product of claim 9, wherein the utility application is a data collecting utility application.
16. The computer program product of claim 9, wherein obtaining a model for predicting whether a user of the plurality of users has a user attribute further comprises: purchasing an advertisement on a third-party system, the advertisement including a targeting criterion wherein only users who have the user attribute are eligible to be served the advertisement; observing, by inspecting data packets by the utility application, a subset of the plurality of users; assigning the user attribute to each of the subset of the plurality of users; and training the model for predicting whether a user of the plurality of users has a user attribute using the assigned user attributes for a training set for the model.
17. A system for determining usage data of one or more mobile applications for a desired population, the system configured to: obtain usage data of one or more mobile applications by a plurality of users who have a utility application installed on a mobile device, the usage data obtained by the utility application by inspecting data packets on each mobile device; obtain a model for predicting whether a user of the plurality of users has a user attribute, by: requesting a utility application user to provide a desired user attribute, the utility application to obtain the user attribute from a user by way of an in-app questionnaire, receiving, in response to the request, the value for the desired user attribute for a subset of the plurality of users, assigning the user attribute to each of the subset of the plurality of users, and training the model for predicting whether a user of the plurality of users has a user attribute using the assigned user attributes for a training set for the model; de-bias the obtained usage data by: for the plurality of users who were not sent a request to provide a desired user attribute, predicting whether the user has the user attribute using the model, assigning the predicted user attribute to the plurality of users based on the predicting, comparing an aggregate of the user attribute of the plurality of users who use the utility application with an aggregate of the user attribute for a desired population, and weighting the obtained usage data based on the comparison of the aggregates of the user attribute; and generate a report of the usage data for the desired population based on the weighted usage data.
18. The system of claim 17, wherein the usage data for a user of each user attribute is extrapolated to a user from the sampled population.
19. The system of claim 17, wherein the obtained usage data for the user attribute is assigned a confidence score.
20. The system of claim 17, wherein the weighting of usage data is discounted by an associated confidence score of the usage data.
21. The system of claim 17, wherein the sampled population for each desired user attribute is obtained by conducting surveys.
22. The system of claim 17, wherein the utility application is a virtual private network (VPN) application.
23. The system of claim 17, wherein the utility application is a data collecting utility application.
24. The system of claim 17, wherein obtaining a model for predicting whether a user of the plurality of users has a user attribute further comprises: purchasing an advertisement on a third-party system, the advertisement including a targeting criterion wherein only users who have the user attribute are eligible to be served the advertisement; observing, by inspecting data packets by the utility application, a subset of the plurality of users; assigning the user attribute to each of the subset of the plurality of users; and training the model for predicting whether a user of the plurality of users has a user attribute using the assigned user attributes for a training set for the model.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
DETAILED DESCRIPTION
Overview
[0012] The client devices 106 are one or more computing devices capable of receiving user input as well as transmitting and/or receiving data via the network 102. In one embodiment, a client device 106 is a smartphone, a tablet, or a conventional computer system, such as a desktop or laptop computer. Alternatively, a client device 106 may be a device having computer functionality that accesses a set of mobile applications 109. A client device 106 is configured to communicate via the network 102. In one embodiment, a client device 106 executes an application 109 allowing a user of the client device 106 to interact with the de-biasing module 110. For example, a client device 106 executes a browser application to enable interaction between the client device 106 and the de-biasing module 110 via the network 102. In another embodiment, a client device 106 interacts with the de-biasing module 110 through an application programming interface (API) running on a native operating system of the client device 106, such as IOS or ANDROID. In another embodiment, a client device 106 interacts with the de-biasing module 110 via the utility application 108, which may be a data collecting application executing on the client device 106.
[0013] The client devices 106 are configured to communicate via the network 102, which may comprise any combination of local area and/or wide area networks, using wired and/or wireless communication systems. In one embodiment, the network 102 uses standard communications technologies and/or protocols. For example, the network 102 includes communication links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 3G, 4G, code division multiple access (CDMA), digital subscriber line (DSL), etc. Examples of networking protocols used for communicating via the network 102 include multiprotocol label switching (MPLS), transmission control protocol/Internet protocol (TCP/IP), hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP). Data exchanged over the network 102 may be represented using any suitable format, such as hypertext markup language (HTML) or extensible markup language (XML). In some embodiments, all or some of the communication links of the network 102 may be encrypted using any suitable technique or techniques.
[0014] The de-biasing module 110 determines usage data of one or more mobile applications for a desired population, which may differ from the population of utility application users. The de-biasing module 110 includes a machine learning model that predicts whether a user of the plurality of users has a user attribute; the user attribute may relate to demographic information of the user. In one embodiment, to determine a training set for the model, the de-biasing module 110 communicates with the utility application 108 on the client device 106 to request the user's permission to collect user attribute data of the user of the client device 106. For example, the utility application 108 may ask the user to fill out a form that includes user attribute information such as gender, age, usage of an application, and other such information. The de-biasing module 110 uses the collected information to assign a user attribute to a plurality of users of the utility application. Based on the user attribute information of these users, the model is trained to predict whether a utility application 108 user of the plurality of users has a user attribute.
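The questionnaire step can be sketched as splitting utility users into a labeled training set and a set whose attribute must later be predicted. This is a minimal sketch under stated assumptions: the `UtilityUser` record, its `app_usage` feature map, the `consented` flag, and `split_by_survey` are hypothetical names introduced for illustration; the description does not specify a data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class UtilityUser:
    user_id: str
    # Hypothetical features from packet inspection: share of observed
    # sessions spent in each app, keyed by an app identifier.
    app_usage: Dict[str, float]
    # Questionnaire answer for the desired attribute (e.g. "female"),
    # or None if the user was not asked or did not answer.
    survey_answer: Optional[str] = None
    # Whether the user granted permission to collect attribute data.
    consented: bool = False

Example = Tuple[Dict[str, float], str]

def split_by_survey(users: List[UtilityUser]) -> Tuple[List[Example], List[UtilityUser]]:
    """Return (training examples, users whose attribute must be predicted)."""
    training: List[Example] = []
    to_predict: List[UtilityUser] = []
    for user in users:
        if user.consented and user.survey_answer is not None:
            # Ground truth: the user's own answer labels their usage features.
            training.append((user.app_usage, user.survey_answer))
        else:
            to_predict.append(user)
    return training, to_predict

users = [
    UtilityUser("u1", {"toddler_shop": 0.6}, "female", True),
    UtilityUser("u2", {"sports_news": 0.8}, "male", True),
    UtilityUser("u3", {"sports_news": 0.5}, "male", False),  # answered, no consent
    UtilityUser("u4", {"toddler_shop": 0.3}),                # never surveyed
]
training, to_predict = split_by_survey(users)
```

Users without consent are routed to prediction rather than training, so only answers the user agreed to share become labels.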
[0015] In another embodiment, to determine a training set for the model, the de-biasing module 110 communicates with the utility application 108 (such as a VPN application) on the client device 106 to observe, by inspecting data packets, a subset of the plurality of users who have been served advertisements that include a targeting criterion related to a desired user attribute. The advertisement may be purchased from a third-party system and is displayed on the client device 106 of a user who is eligible to be served the advertisement, i.e., a user who has the user attribute included in the targeting criterion of the advertisement. The utility application 108 monitors the network traffic to determine which client devices downloaded and installed the advertised application. The utility application 108 may use an application rule dictionary that stores rules (e.g., regular expressions) that match the network traffic generated by an application. The utility application 108 communicates this information to the de-biasing module 110.
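An application rule dictionary of regular expressions can be sketched as follows, assuming the utility application can see the destination hostname of each connection (for example, from DNS queries or the TLS Server Name Indication field), since full URLs are usually encrypted. The app identifiers, hostnames, `APP_RULES`, and `match_app` are hypothetical and only illustrate the matching step.

```python
import re
from typing import Dict, List, Optional

# Hypothetical application rule dictionary: app identifier -> regular
# expressions matched against the hostname of each observed connection.
APP_RULES: Dict[str, List[str]] = {
    "com.example.toddlerfashion": [r"(^|\.)toddlerfashion\.example\.com$"],
    "com.example.coffee": [
        r"(^|\.)api\.coffee\.example\.net$",
        r"^cdn\d+\.coffee\.example\.net$",
    ],
}

# Compile once, since rules are evaluated for every observed flow.
COMPILED_RULES: Dict[str, List[re.Pattern]] = {
    app: [re.compile(p) for p in patterns] for app, patterns in APP_RULES.items()
}

def match_app(hostname: str) -> Optional[str]:
    """Return the app whose rules match `hostname`, or None if no rule matches."""
    host = hostname.lower().rstrip(".")
    for app, patterns in COMPILED_RULES.items():
        if any(p.search(host) for p in patterns):
            return app
    return None

print(match_app("cdn12.coffee.example.net"))  # com.example.coffee
```

Anchoring each pattern at the end of the hostname keeps a rule for one domain from matching unrelated hosts that merely contain it as a substring.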
[0016] Based on the information, the de-biasing module 110 determines which advertisements were served to which client devices 106 and uses this information to assign a user attribute to a plurality of users of the utility application 108. Based on the user attribute information of the plurality of users who were served the advertisement, the model is trained to predict whether a utility application 108 user of the plurality of users has a user attribute. The utility application 108 determines the usage of an application on the client device 106 and reports it to the de-biasing module 110. The de-biasing module 110 aggregates the received usage data to determine combined application usage statistics, and may group the usage data by a user attribute such as a specific demographic.
[0017] The usage data for an application is biased toward the user attributes of the utility application users. Since the utility application users may not be representative of the sample population (or desired population) for which the usage data is desired, the de-biasing module 110 de-biases the usage data. To de-bias the usage data, or any other data metric, the sampling module 120 obtains a sample population (or desired population) that includes the user attribute. The sampling module 120 may conduct surveys to obtain the sample population for each desired user attribute. The sampled population is sent to the de-biasing module 110. The de-biasing module 110 then weights the usage data for the user attribute by comparing the aggregate utility application usage data for a given attribute with an aggregate of the user attribute from the sampled population. Based on the weighted usage data set, a report of the usage data or any other data metric (for example, megabytes (MB) used per session) is generated for the sampled population.
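One way to formalize this comparison, offered as a sketch rather than as the claimed method, is a post-stratification weight. The symbols are introduced here for illustration: $p_{\mathrm{pop}}(a)$ is the share of the sampled population with attribute value $a$, $p_{\mathrm{util}}(a)$ is the corresponding share among utility application users, $a_u$ is the attribute label assigned to utility user $u$, and $m_u$ is that user's value of a metric such as MB used per session.

```latex
w(a) = \frac{p_{\mathrm{pop}}(a)}{p_{\mathrm{util}}(a)},
\qquad
\hat{M} = \frac{\sum_{u} w(a_u)\, m_u}{\sum_{u} w(a_u)}
```

Dividing by the sum of the weights keeps $\hat{M}$ on the same scale as the raw metric; with several attributes, $w(a_u)$ can be replaced by the product of the per-attribute weights.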
[0019] The ad targeting module 210 purchases advertisements with a targeting criterion from a third-party system; the advertisements are displayed on the client device 106, for example, within a mobile application. The targeting criterion is matched to a desired demographic such as gender, age, marital status, education, ethnicity, and other such demographics. Different advertisements are purchased for each desired user attribute, each annotated with its targeting criterion, for example, an advertisement targeted at female users in the age group 25-35 years. An example is an advertisement for a toddler fashion apparel shopping mobile application, which female users aged 25-35 years who have toddlers would generally be interested in and would download and install.
[0020] The purpose of purchasing the advertisements is to collect user attribute (e.g., demographic) data for a utility application 108 user in order to determine ground truth (i.e., to gather data for a training set for a computer model). In one embodiment, the ground truth determination module 220 obtains the user attribute information from the utility application 108. The ground truth determination module 220 may send a request to the utility application 108 (i.e., a VPN or data collection utility application) to gather a user's attribute information by presenting a questionnaire and, with the user's consent, to access the attribute information of the user. In response to the request, the user may provide the related information, such as the user's gender and age. The ground truth determination module 220 may gather the information and assign attribute group labels to each responding user of the utility application.
[0021] In another embodiment, the ground truth determination module 220 receives the advertisement information from the ad targeting module 210, i.e., the time at which and the user to whom the advertisement was served, along with the targeting criterion of the advertisement. The ground truth determination module 220 monitors the network interface to determine which advertisements were served to which users and whether the user of the client device 106 downloaded and installed the application promoted in the advertisement. The ground truth determination module 220 assigns a user attribute to the client device 106 user based on the targeting criterion of the served advertisement and the fact that the user downloaded and installed the advertised mobile application. For example, if a client device 106 user downloads the advertised toddler fashion apparel shopping application, the client device 106 user is assigned gender=female and age group=25-35 years.
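The ad-based assignment can be sketched as joining served-ad records with observed install events and copying the ad's targeting criterion onto users who installed the advertised application. The record types, `labels_from_ads`, and especially the 24-hour attribution window are assumptions for illustration; the description does not say how soon after an impression an install must occur to count.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ServedAd:
    user_id: str
    promoted_app: str
    served_at: float             # Unix time the ad was served
    targeting: Dict[str, str]    # targeting criterion, e.g. {"gender": "female"}

@dataclass
class InstallEvent:
    user_id: str
    app: str
    observed_at: float           # Unix time the install traffic was seen

# Assumed attribution window (hypothetical; not specified in the text).
ATTRIBUTION_WINDOW_S = 24 * 3600

def labels_from_ads(ads: List[ServedAd], installs: List[InstallEvent],
                    window_s: float = ATTRIBUTION_WINDOW_S) -> Dict[str, Dict[str, str]]:
    """Assign an ad's targeting criterion to users who installed the advertised app."""
    install_times: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for ev in installs:
        install_times[(ev.user_id, ev.app)].append(ev.observed_at)

    labels: Dict[str, Dict[str, str]] = {}
    for ad in ads:
        times = install_times.get((ad.user_id, ad.promoted_app), [])
        # Count the install only if it follows the ad within the window.
        if any(0 <= t - ad.served_at <= window_s for t in times):
            labels.setdefault(ad.user_id, {}).update(ad.targeting)
    return labels

crit = {"gender": "female", "age_group": "25-35"}
ads = [ServedAd(u, "toddlerfashion", 0.0, crit) for u in ("u1", "u2", "u3")]
installs = [
    InstallEvent("u1", "toddlerfashion", 3600.0),          # within the window
    InstallEvent("u2", "toddlerfashion", 3 * 24 * 3600.0),  # too late
]
labels = labels_from_ads(ads, installs)  # only u1 receives the criterion
```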
[0022] In another embodiment, the ground truth determination module 220 assigns a confidence value to each assignment of user attribute information to a user. The confidence value indicates the probability that the assignment is correct, i.e., the probability that the user actually belongs to the assigned user attribute group. A higher confidence value indicates a higher probability that the assignment is correct, and vice versa. For example, a confidence value of 90% for the assigned gender=female indicates that the user is highly likely to be female. A confidence value of 60% for the age group=25-35 years indicates a 60% probability that the user is within the age range 25-35 years.
[0023] Once the ground truth is determined, the user attribute prediction module 230 generates a computer model that predicts whether a utility application user has a user attribute. The determined ground truth (i.e., the users with assigned user attributes) is used as a training set to train the computer model. Training produces a function that fits the training set data, so that the computer model can then predict the user attribute information of other users using the function. The function includes model parameters that are adjusted based on differences between the output of the fitted function and the training set.
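The description does not name a model family. As a stand-in, the sketch below fits a logistic regression by batch gradient descent, which matches the text's outline: parameters are adjusted based on the difference between the fitted function's output and the training labels. The feature layout (per-app usage shares), the toy data, and the names `train_logistic` and `predict_proba` are assumptions; a production system would more likely use an established library.

```python
import math
from typing import Dict, List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def to_vector(usage: Dict[str, float], vocab: Sequence[str]) -> List[float]:
    """Fixed-order feature vector; apps a user never used count as 0."""
    return [usage.get(app, 0.0) for app in vocab]

def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Difference between predicted probability and ground-truth label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Move parameters against the mean gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability that the user has the attribute."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy ground truth: 1 = has the attribute (e.g. gender=female), 0 = does not.
vocab = ["toddler_shop", "sports_news"]
labeled = [
    ({"toddler_shop": 0.9, "sports_news": 0.1}, 1),
    ({"toddler_shop": 0.8, "sports_news": 0.2}, 1),
    ({"toddler_shop": 0.7}, 1),
    ({"sports_news": 0.9, "toddler_shop": 0.1}, 0),
    ({"sports_news": 0.8, "toddler_shop": 0.2}, 0),
    ({"sports_news": 0.7}, 0),
]
X = [to_vector(u, vocab) for u, _ in labeled]
y = [label for _, label in labeled]
w, b = train_logistic(X, y)

# Labeling an unsurveyed user: threshold the predicted probability at 0.5.
p = predict_proba(w, b, to_vector({"toddler_shop": 0.6, "sports_news": 0.2}, vocab))
has_attribute = p >= 0.5
```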
[0024] Once the computer model is trained, the user attribute prediction module 230 predicts a user attribute for each utility application 108 user and assigns a user attribute label to the user. For example, a user may be assigned labels gender=female, age group=25-35 years, marital status=married and other such labels.
[0025] The utility application 108 on each client device 106 can determine usage data for the other applications 109 executing on the client device 106. The utility application 108 collects the usage data for each user and reports it to the user attribute prediction module 230, which aggregates the usage data based on the user attribute labels of the users. For example, usage data for the Starbucks mobile application is collected from the utility application 108 of each client device 106 and aggregated based on the user attribute information of the user of the client device 106, e.g., the number of male users of the Starbucks application, or the number of users in the age group 25-30 years using the Starbucks application.
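The aggregation step can be sketched as counting distinct users of one application for each value of an attribute. The event format (pairs of user ID and app name), the label map, and `users_per_attribute` are hypothetical; users with no label for the attribute are skipped.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def users_per_attribute(events: Iterable[Tuple[str, str]],
                        labels: Dict[str, Dict[str, str]],
                        app: str, attribute: str) -> Dict[str, int]:
    """Count distinct users of `app` for each value of `attribute`.

    `events` are (user_id, app_name) pairs reported by the utility application;
    `labels` maps user_id -> {attribute: value} from ground truth or prediction.
    """
    users: Dict[str, Set[str]] = defaultdict(set)
    for user_id, app_name in events:
        if app_name != app:
            continue
        value = labels.get(user_id, {}).get(attribute)
        if value is not None:  # skip users with no label for this attribute
            users[value].add(user_id)
    return {value: len(ids) for value, ids in users.items()}

events = [("u1", "Starbucks"), ("u1", "Starbucks"), ("u2", "Starbucks"),
          ("u3", "Starbucks"), ("u4", "Maps"), ("u5", "Starbucks")]
labels = {
    "u1": {"gender": "female", "age_group": "25-30"},
    "u2": {"gender": "male"},
    "u3": {"gender": "female"},
    "u4": {"gender": "female"},
}
by_gender = users_per_attribute(events, labels, "Starbucks", "gender")
```

Counting distinct users (rather than events) keeps a heavy user from inflating the count for their group.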
[0026] The weight determination module 240 weights the aggregated usage data. The weight determination module 240 receives an aggregate of the user attribute of the utility application users from the user attribute prediction module 230, and an aggregate of the user attribute for a desired population from the sampling module 120. In one embodiment, the weight determination module 240 compares the two aggregates to determine a weighting ratio for that attribute. Based on the ratio, the usage data of each user of a given application who has a given user attribute is weighted, thereby correcting for the over- or under-representation of that attribute among utility application users. In another embodiment, the weight applied to the usage data of the utility application users is discounted using the confidence value associated with each user attribute label. An aggregated confidence value for a user attribute group can be determined by combining the confidence values of the users within that group. Each user included in the usage data of a user attribute group is weighted either by the regular weighting ratio or, if one is used, by the discounted weight. For example, consider the gender attribute and the share of Starbucks application users who are female. If the sample population indicates that 50% of the population is female, and the determined usage data indicates that 80% of the utility application users are female, then the weight associated with the usage data for gender=female is 50/80=0.625 (females are over-represented among utility application users). If the raw usage data indicates that 71% of Starbucks users are female, the de-biased value of the Starbucks metric is 71%*0.625=44.375%, or approximately 44.4% female.
The weighting ratios for multiple user attributes can be multiplied together to obtain an overall weight for each user or, in another embodiment, can be calculated separately for each multi-dimensional combination of attributes.
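The weighting arithmetic above can be sketched as follows. `weighting_ratios` and `overall_weight` follow the text directly (ratio of shares; product across attributes). `discount` is a hypothetical reading of the confidence discount, since the text does not give a formula: it pulls the weight toward 1.0 (no correction) as confidence falls. `renormalized_shares` is standard post-stratification, added for comparison; it assumes a binary gender split, so the male share is 29%.

```python
import math
from typing import Dict, Iterable

def weighting_ratios(pop_share: Dict[str, float],
                     util_share: Dict[str, float]) -> Dict[str, float]:
    """Weight per attribute value: desired-population share / utility-user share."""
    return {value: pop_share[value] / util_share[value] for value in pop_share}

def discount(weight: float, confidence: float) -> float:
    """Hypothetical discount: pull the weight toward 1.0 (no correction)
    as confidence in the user's attribute label falls toward 0."""
    return 1.0 + confidence * (weight - 1.0)

def overall_weight(per_attribute: Iterable[float]) -> float:
    """Multiply per-attribute weights (e.g. gender, age group) into one user weight."""
    return math.prod(per_attribute)

def renormalized_shares(raw_share: Dict[str, float],
                        weights: Dict[str, float]) -> Dict[str, float]:
    """Weight every group, then rescale so the de-biased shares sum to 1."""
    weighted = {value: raw_share[value] * weights[value] for value in raw_share}
    total = sum(weighted.values())
    return {value: x / total for value, x in weighted.items()}

# Figures from the Starbucks example; 29% male assumes a binary split.
weights = weighting_ratios({"female": 0.50, "male": 0.50},
                           {"female": 0.80, "male": 0.20})
single_group = 0.71 * weights["female"]   # about 0.444, the per-group figure in the text
shares = renormalized_shares({"female": 0.71, "male": 0.29}, weights)  # female about 0.380
```

The single-group product reproduces the arithmetic in the paragraph above, but applying each group's weight separately does not give shares that sum to 100%. Renormalizing across all groups, as post-stratification does, yields a de-biased female share of about 38.0% for the same inputs.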
[0027] The usage data determination module 245 receives the weighted (de-biased) usage data of the utility application users for the user attribute. Based on the weighted usage data, the usage data determination module 245 may extrapolate the data to the entire population using known extrapolation methods such as linear extrapolation, polynomial extrapolation, or any other such method. Further, on receiving a request to generate a usage data report for a sample population or for a specific user attribute of the sample population, the usage data determination module 245 generates the report from the de-biased usage data. Examples of usage data reports for a specific user attribute include the average length of a session, the number of active male users, the number of active female users, daily, weekly, and monthly active users, data used in MB per user, data used in MB per session, and other such data for a mobile application.
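A report built from weighted sessions might look like the sketch below. The `Session` record and `usage_report` are hypothetical. The projection to the full population is a simple linear scaling, one basic form of the extrapolation mentioned above: it assumes the weighted fraction of the panel that is active carries over to the population. `panel_weight_total` is assumed to be the summed weight of all panel users, active or not.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Session:
    user_id: str
    weight: float      # de-biasing weight of the session's user
    megabytes: float   # data transferred during the session
    seconds: float     # session length

def usage_report(sessions: List[Session], panel_weight_total: float,
                 population_size: int) -> Dict[str, float]:
    """Weighted per-session metrics plus a linearly scaled active-user estimate."""
    total_w = sum(s.weight for s in sessions)
    mb_per_session = sum(s.weight * s.megabytes for s in sessions) / total_w
    avg_session_s = sum(s.weight * s.seconds for s in sessions) / total_w
    # Each active user counts once, at their own weight.
    user_weight = {s.user_id: s.weight for s in sessions}
    weighted_active = sum(user_weight.values())
    # Assumed linear scaling from the weighted panel to the population.
    projected_active = weighted_active / panel_weight_total * population_size
    return {
        "mb_per_session": mb_per_session,
        "avg_session_seconds": avg_session_s,
        "projected_active_users": projected_active,
    }

sessions = [
    Session("u1", 0.5, 10.0, 100.0),
    Session("u1", 0.5, 20.0, 200.0),
    Session("u2", 2.0, 30.0, 300.0),
]
report = usage_report(sessions, panel_weight_total=10.0, population_size=1000)
```

Here `u2`, whose group is under-represented in the panel, dominates the weighted averages, which is the intended effect of de-biasing.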
[0029] Since the collected usage data is biased toward utility application users, the de-biasing module 110 further de-biases 314 the obtained usage data by performing steps 316 through 322. Using the computer model, the de-biasing module 110 predicts 316 whether each user among the plurality of users who were not served the advertisements has the user attribute. If the user has the user attribute, the de-biasing module 110 assigns 318 the user attribute to the user. Based on the assigned user attribute labels, the de-biasing module 110 aggregates the usage data of the utility application users by user attribute.
[0030] The de-biasing module 110 further receives an aggregate of the user attribute for a sample population. The de-biasing module 110 compares 320 the aggregate of the user attribute of the utility application users to the aggregate of the user attribute for the sample population. Based on the comparison, the usage data is weighted 322, i.e., the usage data obtained from the utility application users is adjusted to match the user attribute distribution of the sample population. Based on the weighted usage data, the de-biasing module 110 generates a report of the usage data by user attribute, e.g., MB used per session within a demographic population, and other such user attributes.
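Steps 316 through 320 can be sketched as: keep ground-truth labels where they exist, predict labels for everyone else, aggregate the attribute over all utility users, and compare with the sample population to obtain the ratios used in step 322. The `predict` callable stands in for the trained model and, like the function names here, is hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

def label_users(user_ids: Iterable[str], ground_truth: Dict[str, str],
                predict: Callable[[str], str]) -> Dict[str, str]:
    """Steps 316-318: keep ground-truth labels, predict labels for the rest."""
    return {uid: ground_truth[uid] if uid in ground_truth else predict(uid)
            for uid in user_ids}

def attribute_shares(labels: Dict[str, str]) -> Dict[str, float]:
    """Aggregate of the user attribute over all utility application users."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def compare_to_population(util_share: Dict[str, float],
                          pop_share: Dict[str, float]) -> Dict[str, float]:
    """Step 320: per-value ratio used for weighting in step 322.
    Assumes every population value also occurs among utility users."""
    return {value: pop_share[value] / util_share[value] for value in pop_share}

# Stand-in model: a hypothetical rule in place of the trained classifier.
predict = lambda uid: "female" if uid.startswith("f") else "male"
labels = label_users(["u1", "f2", "f3", "f4"], {"u1": "male"}, predict)
shares = attribute_shares(labels)
ratios = compare_to_population(shares, {"female": 0.5, "male": 0.5})
```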
Summary
[0031] The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
[0032] Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.
[0033] Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
[0034] Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a nontransitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
[0035] Embodiments of the invention may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a nontransitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.
[0036] Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.