Deep User Modeling by Behavior

Abstract

A system, method and non-transitory computer-readable medium are provided for deep user modeling of user behavior. According to the deep user modeling, user behavior vectors that represent historical user behaviors of a user are determined. Based on a concatenation of the user behavior vectors, a variable-length user behavior matrix is determined. The variable-length user behavior matrix is converted into a fixed-length embedding vector via a long short term memory network, and the fixed-length embedding vector is outputted to the user as a predicted target behavior.

Claims

1. A method for performing deep user modeling, comprising: determining user behavior vectors that represent historical user behaviors of a user; determining a variable-length user behavior matrix based on a concatenation of the user behavior vectors; converting the variable-length user behavior matrix into a fixed-length embedding vector via a long short term memory network; and outputting the fixed-length embedding vector to the user as a predicted target behavior.

2. The method according to claim 1, further comprising: updating the variable-length user behavior matrix based on the predicted target behavior.

3. The method according to claim 1, further comprising: guiding the user to a predicted destination in a vehicle based on the predicted target behavior.

4. The method according to claim 1, wherein the fixed-length embedding vector represents a user profile.

5. The method according to claim 1, further comprising: determining an error between the predicted target behavior and an actual user behavior.

6. The method according to claim 5, further comprising: updating the user behavior vectors based on the error.

7. A method for modeling behavior of a user, comprising: receiving user characteristics data of a user; transforming the user characteristics data into user behavior data based on an attention based framework; transforming the user behavior data into a predicted target of user behavior based on a long short term memory processing of the user behavior data; and outputting the predicted target to a mobile device or vehicle of the user.

8. The method according to claim 7, further comprising: determining an error between the predicted target and an actual user behavior.

9. The method according to claim 8, further comprising: updating the user behavior data based on the error.

10. A non-transitory computer-readable medium storing a program that, when executed by a processor, causes the processor to perform a method comprising: determining user behavior vectors that represent historical user behaviors of a user; determining a variable-length user behavior matrix based on a concatenation of the user behavior vectors; converting the variable-length user behavior matrix into a fixed-length embedding vector via a long short term memory network; and outputting the fixed-length embedding vector to the user as a predicted target behavior.

11. The non-transitory computer-readable medium according to claim 10, further comprising: updating the variable-length user behavior matrix based on the predicted target behavior.

12. The non-transitory computer-readable medium according to claim 10, further comprising: guiding the user to a predicted destination in a vehicle based on the predicted target behavior.

13. The non-transitory computer-readable medium according to claim 10, wherein the fixed-length embedding vector represents a user profile.

14. The non-transitory computer-readable medium according to claim 10, further comprising: determining an error between the predicted target behavior and an actual user behavior.

15. The non-transitory computer-readable medium according to claim 14, further comprising: updating the user behavior vectors based on the error.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] FIG. 1 illustrates a flow chart according to an exemplary embodiment of the present invention.

[0021] FIG. 2 illustrates a general user profile learning system according to the present invention.

[0022] FIG. 3 illustrates a standard long short term memory (LSTM) network trained under a downstream prediction task according to the present invention.

[0023] FIG. 4 illustrates an exemplary spread of data points for users i, j and k, in which user j is the most similar to user i and user k is the least similar to user i.

[0024] FIG. 5 illustrates the raw activity log for users i, j and k corresponding to FIG. 4.

[0025] FIG. 6 illustrates an exemplary embodiment of a method according to the present invention.

[0026] FIG. 7 illustrates a schematic block diagram of a system according to an exemplary embodiment of the present invention.

DETAILED DESCRIPTION OF THE DRAWINGS

[0027] FIG. 1 illustrates a flow chart according to an exemplary embodiment of the present invention. As illustrated in FIG. 1, the process 100 includes obtaining user characteristics in step 101, transforming the user characteristics in step 102 using an attention based framework, and producing a user behavior record in step 103. In step 104, the user behavior record is transformed using a modified sequence based LSTM network, which produces an observation matrix in step 105. LSTM networks are artificial recurrent neural network (RNN) architectures used in the field of deep learning. This enables deep learning of user characteristics represented by embeddings. Using the collected data as observations, the model can be estimated by minimizing a defined loss function between the target and the prediction. Any collected data can be taken as a target, with the preceding history as the input; the framework is therefore supervised, yet requires no annotation or labeling, and has the potential to be self-learning entirely from the data.

[0028] FIG. 2 illustrates a general user profile learning system according to the present invention. According to this system, the algorithm takes one behavior record as a target 201, and the historic behaviors 206 are input to the sequence modeling 204. The historical data is used to train the model. From this information, a transform for similarity measurement is performed 202, and a probability between the prediction and the target is output 203. The loss function is defined on this probability, with the target as the ground truth; one example is the cross entropy. A unique aspect of this system is that the algorithm is organized as supervised learning, yet no manual annotation or labeling is needed. After the sequence modeling based on the historical behavior learning 204 is performed, the user modeling/embedding 205 is performed.
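The probability-and-loss step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes, hypothetically, that the similarity transform 202 is a dot product between a predicted embedding and each candidate behavior embedding, that the probability 203 is a softmax over those scores, and that the loss is the cross entropy named above. The function names are illustrative only.

```python
import math
from typing import List, Sequence

def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed similarity transform: a plain dot product.
    return sum(x * y for x, y in zip(a, b))

def target_probabilities(prediction: Sequence[float],
                         candidates: Sequence[Sequence[float]]) -> List[float]:
    """Softmax over the similarities between the prediction and each candidate behavior."""
    scores = [similarity(prediction, c) for c in candidates]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_loss(prediction: Sequence[float],
                       candidates: Sequence[Sequence[float]],
                       target_index: int) -> float:
    """Negative log-probability assigned to the observed target (the ground truth)."""
    return -math.log(target_probabilities(prediction, candidates)[target_index])

prediction = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical embeddings of two candidate behaviors
print(target_probabilities(prediction, candidates))
print(cross_entropy_loss(prediction, candidates, target_index=0))
```

Because the target is simply the next observed record, this loss needs no manual label: the ground-truth index comes from the behavior log itself.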

[0029] According to the proposed algorithm, user behaviors are input, and the output is a prediction of the likelihood that a target behavior occurs, together with a user profile inference. The algorithm includes semantic modeling, in which objects (e.g., user interaction I, content O, and context C) are transformed into a semantic space. A transform is performed to provide a similarity measure between the historical behaviors and the target behavior. The possible behaviors are ranked, and the most probable behavior, having the highest similarity against the historical behaviors, is selected as the target behavior. The user modeling is based on historical behavior learning, and evaluation is performed using an N-best match (exact match: 1-best). The algorithm according to the present invention provides rich semantic modeling using discriminative training with a small similarity model, as well as an online learning capability.
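The ranking and N-best evaluation described above can be illustrated with a short sketch. It assumes, hypothetically, that a similarity score has already been computed for each candidate behavior by the similarity transform; the dictionary layout and the function names are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

def rank_behaviors(scores: Dict[str, float]) -> List[str]:
    """Sort candidate behaviors from highest to lowest similarity score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)

def n_best_match(scores: Dict[str, float], actual: str, n: int = 1) -> bool:
    """True if the actual behavior is among the top-n ranked candidates (n=1 is exact match)."""
    return actual in rank_behaviors(scores)[:n]

def n_best_accuracy(cases: Sequence[Tuple[Dict[str, float], str]], n: int = 1) -> float:
    """Fraction of (scores, actual) cases whose actual behavior is an n-best match."""
    return sum(n_best_match(scores, actual, n) for scores, actual in cases) / len(cases)

scores = {"home": 0.9, "office": 0.6, "gym": 0.1}  # hypothetical similarity scores
print(rank_behaviors(scores)[0])                     # predicted target behavior: home
cases = [(scores, "home"), (scores, "office")]
print(n_best_accuracy(cases, n=1), n_best_accuracy(cases, n=2))  # 0.5 1.0
```

With n = 1 this reduces to the exact-match (1-best) criterion; larger n gives the looser N-best measure.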

[0030] We introduce a transfer learning method to leverage previous learnings from a pre-trained model and avoid starting from scratch for the user profile learning. The pre-trained model is based on a behavior learning model that is supervised and trained on a loss defined by a prediction task, e.g., destination recommendation. User behavior is defined as taking a certain action on certain content in a given context. User interaction I, content O, and context C are all modeled to construct the feature modeling layer, which consists of the raw input. Besides the final prediction result, the embeddings of the objects are trained, yielding the following matrices:

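$E(\{I\}) = \begin{bmatrix} I_{1,1} & I_{1,2} & \cdots & I_{1,H} \\ \vdots & \vdots & \ddots & \vdots \\ I_{Q,1} & I_{Q,2} & \cdots & I_{Q,H} \end{bmatrix}$

$E(\{O\}) = \begin{bmatrix} O_{1,1} & O_{1,2} & \cdots & O_{1,H} \\ \vdots & \vdots & \ddots & \vdots \\ O_{K,1} & O_{K,2} & \cdots & O_{K,H} \end{bmatrix}$

$E(\{C\}) = \begin{bmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,H} \\ \vdots & \vdots & \ddots & \vdots \\ C_{P,1} & C_{P,2} & \cdots & C_{P,H} \end{bmatrix}$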

[0031] $r = \mathrm{concatenate}_{\mathrm{axis}=1}\big(E(I_q), E(O_k), E(C_p)\big) \times w + b$

[0032] where H is the predefined feature size of each embedding vector; Q, K, and P are the sizes of the user interaction, content, and context sets, respectively; w and b are likewise pre-trained parameters; and r represents one behavior record based on user interaction $I_q$, content $O_k$, and context $C_p$.
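The construction of one behavior record r can be sketched in plain Python. This is an illustration under stated assumptions: the embedding tables are randomly initialized stand-ins for the pre-trained E({I}), E({O}) and E({C}); w is assumed to have shape (3H, H) so that r has H entries, consistent with the (H, T) behavior matrix described for FIG. 3; and the helper names are hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def random_table(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Stand-in for a pre-trained embedding table (one row per object)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def behavior_record(E_I: Matrix, E_O: Matrix, E_C: Matrix,
                    q: int, k: int, p: int, w: Matrix, b: Vector) -> Vector:
    """r = concatenate(E(I_q), E(O_k), E(C_p)) x w + b."""
    x = E_I[q] + E_O[k] + E_C[p]  # list concatenation: a vector of length 3H
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

rng = random.Random(0)
H, Q, K, P = 4, 3, 5, 2  # illustrative sizes only
E_I, E_O, E_C = random_table(Q, H, rng), random_table(K, H, rng), random_table(P, H, rng)
w, b = random_table(3 * H, H, rng), [0.0] * H
r = behavior_record(E_I, E_O, E_C, q=1, k=4, p=0, w=w, b=b)
print(len(r))  # 4, i.e. H
```

In a transfer-learning setting, the tables, w and b would be loaded from the pre-trained model rather than drawn at random.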

[0033] In practice, the pre-trained model helps to transfer previously learned knowledge and greatly decreases the computation time. The training can be done offline, and the learned embeddings can then be deployed as features fed into the proposed user profile learning framework.

[0034] FIG. 3 illustrates a standard long short term memory (LSTM) network trained under a downstream prediction task according to the present invention. A user's behaviors consist of a sequence of user behavior records ordered by timestamp. Assuming the user has T behavior records, we concatenate all behavior records r along axis t to generate an (H, T)-size matrix $R = (r_1 r_2 \cdots r_T)_t$, where H and T may be, for example, 30 and 128, respectively. Instead of using the user behavior matrix R directly to represent the user, we apply sequence modeling to convert the variable-length matrix into a fixed-length embedding vector. Here we implement a standard LSTM network trained under a downstream prediction task, as illustrated in FIG. 3, in which element A represents an LSTM unit.

[0035] As illustrated in FIG. 3, the target behavior FT and the behavior matrix R are input to the sequence model. In FIG. 3, $x_t$ represents the input vector of the LSTM unit, $h_t$ represents the output vector of the LSTM unit, and Y represents the output, including the fixed-length embedding vector.
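The conversion of the variable-length matrix R into a fixed-length vector can be sketched with a standard LSTM forward pass in plain Python. This is a minimal sketch, not the trained network of FIG. 3: the weights are random, training under the downstream prediction task is omitted, the hidden size of 16 is an assumption, and the last hidden state $h_T$ is assumed to serve as the fixed-length embedding. R is stored as a list of its columns, and each column $r_t$ is fed to the LSTM unit as $x_t$.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def init_lstm(input_size: int, hidden_size: int, seed: int = 0) -> Dict[str, Matrix]:
    """Random weights for the input (i), forget (f), output (o) and candidate (g) gates.

    Each gate acts on [x_t; h_{t-1}; 1], so the last column is that gate's bias.
    """
    rng = random.Random(seed)
    width = input_size + hidden_size + 1
    return {gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(hidden_size)]
            for gate in ("i", "f", "o", "g")}

def lstm_encode(columns: List[Vector], params: Dict[str, Matrix], hidden_size: int) -> Vector:
    """Run the LSTM over r_1..r_T and return h_T, whose length does not depend on T."""
    h = [0.0] * hidden_size  # h_0
    c = [0.0] * hidden_size  # c_0 (cell state)
    for x_t in columns:  # one behavior record per time step
        z = x_t + h + [1.0]  # concatenation [x_t; h_{t-1}; 1]
        i = [sigmoid(v) for v in matvec(params["i"], z)]
        f = [sigmoid(v) for v in matvec(params["f"], z)]
        o = [sigmoid(v) for v in matvec(params["o"], z)]
        g = [math.tanh(v) for v in matvec(params["g"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

rng = random.Random(1)
H, hidden = 30, 16  # H = 30 follows the example above; hidden = 16 is an assumption
params = init_lstm(H, hidden)
short_R = [[rng.gauss(0.0, 1.0) for _ in range(H)] for _ in range(5)]    # T = 5
long_R = [[rng.gauss(0.0, 1.0) for _ in range(H)] for _ in range(128)]   # T = 128
print(len(lstm_encode(short_R, params, hidden)), len(lstm_encode(long_R, params, hidden)))  # 16 16
```

Both a 5-record and a 128-record history map to a 16-entry vector, which is the property that lets users with different history lengths be compared in one embedding space.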

[0036] Because a user's behavior may drift over time due to either a non-recurrent event, such as a vacation, or a periodic event, such as weekday/weekend routines, we propose a recursive representation of the user embedding that considers both the decay of past behaviors and the currently observed behaviors. Let $U_t$ be the user embedding calculated from the user's historical behaviors $R_{t:\,t_0 \sim t_0+\Delta t}$, starting from timestamp $t_0$ up to $t$. The predicted user embedding at time $t+\Delta t$ can then be calculated as follows:

$U^*_{t+\Delta t} = \alpha\, U^*_t + (1-\alpha)\, U_{t+\Delta t}$

[0037] where $U^*_t$ is the predicted value, $U_{t+\Delta t}$ is the observed value, and α weights the previous prediction against the new observation.
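The recursive update is an element-wise weighted average and can be sketched as below. The restriction of α to [0, 1] and the choice to initialize $U^*$ with the first observed embedding are assumptions made for illustration, not statements from the disclosure.

```python
from typing import List

def update_user_embedding(prev_prediction: List[float], observed: List[float],
                          alpha: float) -> List[float]:
    """U*_{t+dt} = alpha * U*_t + (1 - alpha) * U_{t+dt}, applied element-wise."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    if len(prev_prediction) != len(observed):
        raise ValueError("embeddings must have the same length")
    return [alpha * p + (1.0 - alpha) * o for p, o in zip(prev_prediction, observed)]

# Hypothetical embeddings U_t observed over three consecutive windows.
observed_windows = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
u_star = observed_windows[0]  # assumed initialization
for u_obs in observed_windows[1:]:
    u_star = update_user_embedding(u_star, u_obs, alpha=0.5)
print(u_star)  # [0.25, 0.75]
```

A larger α keeps more of the accumulated history and damps short-lived drift, while a smaller α lets the embedding follow recent behavior more closely.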

[0038] In an experiment, we explored the deployment of the proposed model on a trip pattern prediction task that predicts which location a user will visit at a certain time, given his/her trip history. The dataset includes user location tracking, including driving. Raw features of the experiment include, for example, <user ID, location_gps_grid_ID, timestamp>, covering 100 users and 1578 locations (obtained by segmenting the map into a 200 m × 200 m grid) over a 6-month period. For the task, we assume a user interaction for user u is the following:

[0039] $I_u = \{(\text{visit location } i_0 \text{ at time } t_0), \ldots, (\text{visit location } i_T \text{ at time } t_T)\}$, where each visit contains both the location i and the timestamp t. In the training set, the first k visits of $I_u$ are used to predict the (k+1)-th visit; in the test set, the first n−1 visits are used to predict the last one. We applied top-1 (1-best) matching accuracy, which is widely used in recommendation systems, to measure performance. Meanwhile, the number of parameters and the response time were reported to indicate scalability. We also evaluated our model in the online learning case for distributed training purposes.
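The training and test pairs for one user can be built as sketched below. It assumes, for illustration, that each user's own sequence supplies both sets, with the last visit held out for testing so that training never sees the test target; the tuple layout (grid ID, timestamp) and the function name are hypothetical.

```python
from typing import List, Tuple

Visit = Tuple[int, int]  # (location grid ID, timestamp), a hypothetical layout
Pair = Tuple[List[Visit], Visit]

def split_user_visits(visits: List[Visit]) -> Tuple[List[Pair], Pair]:
    """Return (train, test) for one user's time-ordered visits.

    train: the first k visits predict the (k+1)-th, for every k that leaves the last visit out.
    test:  the first n-1 visits predict the last (n-th) visit.
    """
    n = len(visits)
    if n < 2:
        raise ValueError("at least two visits are required")
    train = [(visits[:k], visits[k]) for k in range(1, n - 1)]
    test = (visits[: n - 1], visits[n - 1])
    return train, test

visits = [(3, 100), (7, 200), (3, 300), (5, 400)]  # hypothetical trip history
train, test = split_user_visits(visits)
print(len(train), test[1])  # 2 (5, 400)
```

Each pair uses an observed visit as its own target, so no manual labeling is required to build either set.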

[0040] We benchmarked the model performance under different training scenarios (online or offline) and with or without transfer learning. The prediction accuracy and response time are both evaluated on the same test set for all indexed models. The results are shown in Table 1 below.

TABLE 1

| Index | Online Learning | Transfer Learning | Training Data | Model | Prediction Accuracy (Top-1 Matching) | Trainable Parameters | Response Time (seconds/100 users) |
|---|---|---|---|---|---|---|---|
| 1 | N | N | 6-month data | Baseline | 0.81 | 324,590 | 2.445 |
| 2 | N | Y | 6-month data | Pre-trained Baseline + LSTM | 0.83 | 456,174 | 0.309 |
| 3 | Y | Y | First 5-month data for offline training, last 1-month data for online training | Pre-trained Baseline + LSTM | 0.85 | 456,174 | 0.309 |
| 4 | N | Y | Last 1-month data | Pre-trained Baseline + LSTM | 0.76 | 456,174 | 0.309 |

[0041] As shown in Table 1, when both online learning and transfer learning are enabled (index 3), the proposed algorithm improves the top-1 prediction accuracy from 0.81 for the baseline to 0.85 and decreases the response time from 2.445 to 0.309 seconds per 100 users.

[0042] FIG. 4 illustrates an exemplary spread of data points for users i, j and k, in which user j is the most similar to user i and user k is the least similar to user i. We explored the learned embeddings of 100 users. First, we computed the pairwise Euclidean distance d among users as a similarity measure. Second, we visualized the 100 embedding vectors after dimension reduction by principal component analysis. We chose the i-th user as an example for illustration. For user i, we found the user j that is the most similar and the user k that is the most different, based on the following equation:

$j = \underset{j \neq i}{\arg\min}\; d_{ij}, \qquad k = \underset{k}{\arg\max}\; d_{ik},$

where the data points of users i, j, and k are shown in FIG. 4. The distribution of points is consistent with the distance measurement: the points of user i and user j largely overlap, while user k is located in a remote area.
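The selection of the most similar user j and the most different user k can be sketched as follows. The embeddings are hypothetical 2-D values, users are indexed by position, and the principal component analysis used for visualization is omitted; `math.dist` (Python 3.8+) computes the Euclidean distance. User i itself is excluded from the search, since $d_{ii} = 0$ would otherwise make i its own nearest neighbor.

```python
import math
from typing import List, Tuple

def most_and_least_similar(embeddings: List[List[float]], i: int) -> Tuple[int, int]:
    """Return (j, k): j minimizes d_ij and k maximizes d_ik over all users other than i."""
    others = [u for u in range(len(embeddings)) if u != i]
    d = {u: math.dist(embeddings[i], embeddings[u]) for u in others}  # Euclidean d_iu
    j = min(others, key=lambda u: d[u])
    k = max(others, key=lambda u: d[u])
    return j, k

embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [1.0, 1.0]]  # hypothetical users 0..3
print(most_and_least_similar(embeddings, i=0))  # (1, 2)
```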

[0043] FIG. 5 illustrates the raw activity logs for users i, j and k corresponding to FIG. 4. The x-axis represents the trip timestamp, while the y-axis shows the visited locations, which have been re-indexed to 0 and 1 for illustration. Whenever a user changes location, the index shifts from the current value to the other. These logs show that the user embedding is consistent with the observed similarity among users.

[0044] FIG. 6 illustrates an exemplary embodiment of a method according to the present invention. In step S601, a variable-length user behavior matrix and a target behavior vector are received. In step S602, the variable-length user behavior matrix is converted into a fixed-length embedding vector. The user embedding is predicted in step S603 based on the fixed-length embedding vector, and in step S604 the target behavior is compared to the actual behavior to determine the loss (error) in the prediction. The target behavior may then be outputted to the user and/or may be recursively determined again in step S605.

[0045] FIG. 7 illustrates a schematic block diagram of a system according to an exemplary embodiment of the present invention. The system may include, for example, a vehicle 700, a modeling server 710, a mobile device 720, and cloud storage 730. Each of these devices has its own processor, memory, and one or more communication interfaces, wherein the processors are specifically programmed to perform the functions described herein. Telemetry data and the like may be received from the vehicle 700 and/or the mobile device 720. The mobile device 720 may be a smart phone, tablet computer or the like. Communication between the modeling server 710 and the vehicle 700 or mobile device 720 may occur via a cellular network, WiFi, Bluetooth, or the like. Data gathered from the vehicle 700 and the mobile device 720 may be transmitted to the modeling server 710 or transmitted directly to cloud storage 730.

[0046] In another exemplary embodiment of the present invention, a non-transitory computer-readable medium is encoded with a computer program that performs the above-described method. Common forms of non-transitory computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read.

[0047] The present invention provides a number of significant advantages over conventional systems and methods. In particular, the present invention provides a unified algorithmic framework for user modeling based on user behavior that can be extended to serve as a feature for different services. The user model can be flexibly trained for different tasks driven by user behavior, e.g., destination prediction driven by mobility behavior, feature recommendation driven by app usage behavior, etc. The semantics are enriched for users, which allows computation among users, e.g., user segmentation, user similarity based recommendation, and predictive modeling.

[0048] Also, the system and method according to the present invention have low complexity, which improves online service computation owing to the compact user modeling, and improve the user experience by leveraging personal context for better prediction performance. The present invention also addresses data sparsity. Additionally, the present invention enables transfer learning and online learning. The pre-trained model helps to transfer previously learned knowledge and greatly decreases the computation time. Meanwhile, online learning enables distributed training, addressing computational scalability for the large-scale datasets of real-world applications.

[0049] The foregoing disclosure has been set forth merely to illustrate the invention and is not intended to be limiting. Since modifications of the disclosed embodiments incorporating the spirit and substance of the invention may occur to persons skilled in the art, the invention should be construed to include everything within the scope of the appended claims and equivalents thereof.

CPC classification

G06Q30/0269, G06N3/044, G06N20/00, G01C21/3407, G06F17/16, G06V40/20, G06N3/08, G06F18/213, G06N3/045, G01C21/3484

International classification

G01C21/34, G06F17/16, G06K9/00, G06N3/04
