METHOD AND SYSTEM FOR BEHAVIOR VECTORIZATION OF INFORMATION DE-IDENTIFICATION
20220335331 · 2022-10-20
Inventors
Cpc classification
G06N5/01
PHYSICS
International classification
Abstract
A method for behavior vectorization of information de-identification, in which data concerning browsing traces, link paths, trigger events, clicks, and operation behaviors of network users on the Internet are selected by a server, a client device, or an edge device for a conversion/integration process. The integrated data are then converted into a vector that represents the usage-behavior profile of the network users. Because vectors can be quickly grouped and classified to find similar groups, the method can quickly identify the network users. The server uses supervised learning as the base method, with pre-defined network behaviors for training. Semi-supervised learning or unsupervised learning can also be employed to modify undefined network behaviors so that they better conform to the profile description of the network users.
Claims
1. A method for behavior vectorization of information de-identification, comprising the following steps: providing data by a data provider, wherein a server is connected with a data provider device, and wherein the data provider device provides and transmits a path vector learning data and a vector grouping learning data to the server; training a model, wherein, after the server receives the path vector learning data and the vector grouping learning data, a vectorization module of the server uses the path vector learning data as past data for performing a first machine learning, and wherein a grouping/classifying module of the server uses the vector grouping learning data as past data for performing a second machine learning; retrieving path data of network users, wherein, after the first machine learning and the second machine learning are completed, the server retrieves a path data of a client device and transmits the path data to the vectorization module; vectorizing path data, wherein the vectorization module performs a data vectorization action on the path data based on a result of the first machine learning such that the path data are converted into vectorized data, and wherein the vectorization module transmits the vectorized data to the grouping/classifying module; and vectorizing and grouping, wherein the grouping/classifying module performs a grouping action on the vectorized data based on a result of the second machine learning, assigns a grouping result to the vectorized data, and finally stores the grouping result to the server.
2. The method as claimed in claim 1, wherein the path vector learning data include a plurality of past path data and a plurality of past vectorized data, and wherein the past vectorized data are one of a website trigger event, a website click event, a website operation behavior, a website stay time of the past path data, or a combination thereof.
3. The method as claimed in claim 2, wherein the vector grouping learning data include a plurality of the past vectorized data and a plurality of past grouping data, and wherein the past grouping data corresponds to the plurality of past vectorized data.
4. The method as claimed in claim 1, wherein the first machine learning and the second machine learning are one of a group consisting of a supervised learning, a semi-supervised learning, a reinforcement learning, an unsupervised learning, a self-supervised learning, a heuristic algorithm, and a combination thereof.
5. The method as claimed in claim 1, wherein the path data are one of a group consisting of a website trigger event, a website click event, a website operation behavior, a website stay time, and a combination thereof.
6. The method as claimed in claim 1, wherein the data vectorization operation converts one-dimensional data into one of a two-dimensional vector matrix, a three-dimensional vector matrix, or a multi-dimensional vector matrix.
7. The method as claimed in claim 1, wherein, in the step of retrieving path data of the network users and the step of vectorizing the path data, the server first transmits the result of the first machine learning to the client device so that the client device converts the path data into the vectorized data, and then transmits the vectorized data to the server.
8. A system for behavior vectorization of information de-identification, comprising: a server having a data processing module, a data storage module, a vectorization module, and a grouping/classifying module which establish an information link with the server, respectively, the data processing module being provided for running the server, the data storage module being provided for storing data received and calculated by the server; a data provider device establishing an information link with the server, the data provider device providing a path vector learning data and a vector grouping learning data to the server; a client device establishing an information link with the server, the server retrieving a path data of the client device; wherein the vectorization module uses the path vector learning data as past data for performing a first machine learning, and wherein, after the first machine learning training is completed, a data vectorization action can be performed on the path data, and the path data can be converted into a vectorized data; and wherein the grouping/classifying module uses the vector grouping learning data as past data for performing a second machine learning, and wherein, after the second machine learning training is completed, a grouping action can be performed on the vectorized data, and a grouping result is given to the vectorized data, and finally the grouping result is stored in the data storage module.
9. The system as claimed in claim 8, wherein the path vector learning data include a plurality of past path data and a plurality of past vectorized data, and wherein the past vectorized data are one of a website trigger event, a website click event, a website operation behavior, a website stay time of the past path data, or a combination thereof.
10. The system as claimed in claim 9, wherein the vector grouping learning data include a plurality of the past vectorized data and a plurality of past grouping data, and wherein the past grouping data corresponds to the plurality of past vectorized data.
11. The system as claimed in claim 8, wherein the first machine learning and the second machine learning are one of a group consisting of a supervised learning, a semi-supervised learning, a reinforcement learning, an unsupervised learning, a self-supervised learning, a heuristic algorithm, and a combination thereof.
12. The system as claimed in claim 8, wherein the path data are one of a group consisting of a website trigger event, a website click event, a website operation behavior, a website stay time, and a combination thereof.
13. The system as claimed in claim 8, wherein the data vectorization operation converts one-dimensional data into one of a two-dimensional vector matrix, a three-dimensional vector matrix, or a multi-dimensional vector matrix.
14. The system as claimed in claim 8, wherein the server further establishes an information link with at least one edge server, and wherein the edge server assists the server and improves the computing function of the server with an edge computing function.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0023] Referring to
[0024] The server 11 establishes an information link with the data provider device 12 and the client device 13. The server 11 can receive a learning training sample provided by the data provider device 12 and build a machine learning model based on the learning training sample provided by the data provider device 12. The model can mainly retrieve network usage paths of the client device 13 for stacking and vectorization, and then group and classify the vectorized data.
[0025] The data provider device 12 can be a search engine database or another database. Any device that enables the server 11 to obtain the required learning and training samples can be employed.
[0026] The client device 13 can be one of a mobile phone, a tablet computer, a personal computer, etc. Any device that enables the server 11 to obtain the required samples to be tested can be employed.
[0027] The client device 13 is operated by a client. The client can use the Internet through the client device 13, and the server 11 can retrieve the Internet path used by the client device 13. The client of the client device 13 mainly refers to a network user, but it is not limited thereto.
[0028] The server 11 mainly includes a data processing module 111, a data storage module 112, a vectorization module 113, and a grouping/classifying module 114 which establish an information link with each other. The data processing module 111 is used to run the server 11 and to drive the modules connected thereto. The data processing module 111 fulfills functions such as logic operations, temporary storage of operation results, and storage of execution instruction positions. It can be, for example, a CPU, but is not limited thereto.
[0029] The data storage module 112 can store electronic data and can be, for example, a Solid State Disk or Solid State Drive (SSD), a Hard Disk Drive (HDD), a Static Random Access Memory (SRAM), or a Dynamic Random Access Memory (DRAM), etc. The data storage module 112 mainly stores the path vector learning data and the vector grouping learning data transmitted by the data provider device 12, the path data transmitted by the client device 13, and data calculated and processed by the server 11.
[0030] The vectorization module 113 mainly performs training and learning on the path vector learning data provided by the data provider device 12. After the training and learning are completed, the vectorization module 113 can convert the path data transmitted by the client device 13 into vectorized data. The training and learning of the vectorization module 113 mainly use machine learning such as supervised learning, semi-supervised learning, reinforcement learning, unsupervised learning, self-supervised learning, or heuristic algorithms, but are not limited thereto. The above-mentioned path vector learning data can be a plurality of past path data and a plurality of past vectorized data. The past path data and the path data can be any of a website trigger event, a website click event, a website operation behavior, a website stay time, or a combination thereof. Any data referring to visiting traces on the Internet are applicable. The past vectorized data mainly correspond to the past path data and are used for training and learning by the vectorization module 113. The vectorized data can be one of a two-dimensional matrix vector, a three-dimensional matrix vector, or a multi-dimensional matrix vector. The vectorization module 113 mainly stacks and converts each one-dimensional datum in the path data into the vectorized data. For example, a network user of the client device 13 stays on a website A for 5 minutes and 30 seconds (330 seconds), clicks on three products, each of which links to an external website corresponding to that product, and then returns to the website A. Meanwhile, the network user watches advertisements A, B, and C on the website A for 15 seconds each (45 seconds in total). In this case, a matrix of the client device 13 can be provided by the vectorization module 113 and defined as [0.33, 3, 0.45] ([total stay time, number of products clicked, total time to watch advertisements]), where the two time entries appear to be the 330-second and 45-second totals expressed as scaled values. The above-mentioned case is only an example, and the disclosure should not be limited thereto. After the vectorization module 113 converts the path data into the vectorized data, the vectorized data can be stored in the data storage module 112 or transmitted to the subsequent grouping/classifying module 114.
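As a minimal sketch of the conversion described in paragraph [0030], the following Python snippet aggregates a list of path events into the three-element vector [total stay time, number of products clicked, total time to watch advertisements]. The record layout (`PathEvent`, `kind`, `seconds`), the event labels, and the two scale divisors are illustrative assumptions that do not come from the disclosure; the divisors are chosen only so that the output reproduces the example values [0.33, 3, 0.45].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathEvent:
    """One raw, one-dimensional item of path data (hypothetical layout)."""
    kind: str             # "stay", "product_click", or "ad_view" (assumed labels)
    seconds: float = 0.0  # duration, where a duration applies


# Assumed divisors, picked only to reproduce the example values.
STAY_SCALE = 1000.0
AD_SCALE = 100.0


def vectorize_path(events: List[PathEvent]) -> List[float]:
    """Aggregate raw events into [stay time, products clicked, ad time]."""
    stay = sum(e.seconds for e in events if e.kind == "stay")
    clicks = sum(1 for e in events if e.kind == "product_click")
    ads = sum(e.seconds for e in events if e.kind == "ad_view")
    return [stay / STAY_SCALE, float(clicks), ads / AD_SCALE]


# The session from the example: 330 s on website A, three product
# clicks, and three advertisements watched for 15 s each.
session = (
    [PathEvent("stay", 330.0)]
    + [PathEvent("product_click") for _ in range(3)]
    + [PathEvent("ad_view", 15.0) for _ in range(3)]
)
vector = vectorize_path(session)

# Stacking the vectors of several sessions yields a two-dimensional matrix.
matrix = [vectorize_path(s) for s in (session, session[:2])]
```

Only aggregate counts and durations survive the conversion in this sketch; the individual events are not carried into the vector.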
[0031] The grouping/classifying module 114 can perform training and learning on the vector grouping learning data provided by the data provider device 12. After the training and learning are completed, the grouping/classifying module 114 can group and classify the vectorized data transmitted by the vectorization module 113 and assign a grouping result to the vectorized data. The training and learning of the grouping/classifying module 114 mainly use machine learning such as supervised learning, semi-supervised learning, reinforcement learning, unsupervised learning, self-supervised learning, or heuristic algorithms, but are not limited thereto. The vector grouping learning data mainly include a plurality of the past vectorized data and a plurality of past grouping data. The past grouping data can include a plurality of the past vectorized data of past network users for training and learning by the grouping/classifying module 114. Moreover, the grouping result can be a group or set containing a plurality of vectorized data representing network users.
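The disclosure leaves the grouping algorithm open. As one hedged illustration of a supervised variant, the sketch below uses nearest-centroid classification: it averages the past vectorized data of each past group and assigns a new vector to the group with the closest centroid. The group labels, sample values, and function names are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence


def train_centroids(
    past_vectors: Sequence[Sequence[float]],
    past_groups: Sequence[str],
) -> Dict[str, List[float]]:
    """Compute one mean vector (centroid) per past group."""
    buckets: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for vec, group in zip(past_vectors, past_groups):
        buckets[group].append(vec)
    return {
        group: [sum(column) / len(vecs) for column in zip(*vecs)]
        for group, vecs in buckets.items()
    }


def assign_group(centroids: Dict[str, List[float]], vector: Sequence[float]) -> str:
    """Return the group whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda group: dist(centroids[group], vector))


# Hypothetical vector grouping learning data: vectors with their past groups.
past_vectors = [
    [0.30, 3.0, 0.40],
    [0.35, 4.0, 0.50],
    [0.05, 0.0, 0.00],
    [0.02, 0.0, 0.05],
]
past_groups = ["shopper", "shopper", "bouncer", "bouncer"]

centroids = train_centroids(past_vectors, past_groups)
grouping_result = assign_group(centroids, [0.33, 3.0, 0.45])
```

In this sketch the features are compared on their raw scales, so the click count dominates the distance; a real system would likely normalize the features first.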
[0032] As illustrated in
[0033] (1) Step S1 of providing data by a data provider:
[0034] As shown in
[0035] (2) Step S2 of training a model:
[0036] After the vectorization module 113 receives the path vector learning data D1 and the grouping/classifying module 114 receives the vector grouping learning data D2, both transmitted by the data provider device 12, the vectorization module 113 uses the path vector learning data D1 as the past data to perform a first machine learning, and the grouping/classifying module 114 uses the vector grouping learning data D2 as the past data to perform a second machine learning. The first and the second machine learning mainly refer to machine learning such as supervised learning, semi-supervised learning, reinforcement learning, unsupervised learning, self-supervised learning, or heuristic algorithms, but are not limited thereto.
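Step S2 does not fix a learning algorithm. As one hedged illustration of what the first machine learning could look like, assume that D1 pairs raw per-feature totals (past path data) with their past vectorized values. A per-feature least-squares fit through the origin then recovers a scale factor for each feature. The sample values and the names `fit_scales` and `apply_scales` are hypothetical.

```python
from typing import List, Sequence


def fit_scales(
    raw_rows: Sequence[Sequence[float]],
    vec_rows: Sequence[Sequence[float]],
) -> List[float]:
    """Least-squares scale per feature: minimizes sum((s * raw - vec) ** 2)."""
    scales = []
    for j in range(len(raw_rows[0])):
        numerator = sum(raw[j] * vec[j] for raw, vec in zip(raw_rows, vec_rows))
        denominator = sum(raw[j] * raw[j] for raw in raw_rows)
        scales.append(numerator / denominator if denominator else 0.0)
    return scales


def apply_scales(scales: Sequence[float], raw: Sequence[float]) -> List[float]:
    """Vectorize new path data with the learned scales."""
    return [s * x for s, x in zip(scales, raw)]


# Hypothetical D1: [stay seconds, products clicked, ad seconds] per past
# session, paired with the corresponding past vectorized data.
past_path_data = [[330.0, 3.0, 45.0], [120.0, 1.0, 15.0], [600.0, 5.0, 90.0]]
past_vectorized = [[0.33, 3.0, 0.45], [0.12, 1.0, 0.15], [0.60, 5.0, 0.90]]

scales = fit_scales(past_path_data, past_vectorized)
new_vector = apply_scales(scales, [200.0, 2.0, 30.0])
```

The learned `scales` list is the "result of the first machine learning" in this toy setting; it is small enough to be handed to another device.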
[0037] (3) Step S3 of retrieving path data of the network users:
[0038] Following the above-mentioned steps and referring to
[0039] (4) Step S4 of vectorizing path data:
[0040] Referring to
[0041] (5) Step S5 of vectorizing and grouping:
[0042] Following the above-mentioned steps and referring to
[0043] Referring to
[0044] In the step S3 of retrieving path data of the network users and in the step S4 of vectorizing path data, the server 11 may further transmit the result of the first machine learning to the client device 13. After receiving the result of the first machine learning, the client device 13 can retrieve its own path data D3 in real time, convert the path data D3 into vectorized data D4, and then transmit the vectorized data D4 to the server 11.
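The client-side variant of paragraph [0044] can be sketched as follows, assuming the result of the first machine learning is a small set of per-feature scale factors that can be serialized as JSON. The message format, the field names, and both function names are assumptions made for illustration and are not specified by the disclosure.

```python
import json
from typing import Any, Dict


def server_export_model(scales: Dict[str, float]) -> str:
    """Server side: serialize the trained vectorization parameters."""
    return json.dumps({"scales": scales})


def client_vectorize(model_json: str, path_data: Dict[str, Any]) -> str:
    """Client side: convert local path data D3 into vectorized data D4.

    Only the resulting vector is placed in the upload payload; raw
    fields such as visited URLs stay on the client device.
    """
    scales = json.loads(model_json)["scales"]
    vector = [
        float(path_data.get(name, 0.0)) * scale
        for name, scale in scales.items()
    ]
    return json.dumps({"vector": vector})


# Hypothetical model parameters and raw path data.
model_json = server_export_model(
    {"stay_seconds": 0.001, "product_clicks": 1.0, "ad_seconds": 0.01}
)
path_data = {
    "stay_seconds": 330,
    "product_clicks": 3,
    "ad_seconds": 45,
    "urls": ["https://example.com/a"],  # never leaves the client in this sketch
}
upload = json.loads(client_vectorize(model_json, path_data))
```

Under these assumptions the server only ever sees the vector, which is consistent with the goal of grouping users without collecting their raw browsing records.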
[0045] Referring to
[0046] In summary, the present disclosure is mainly based on machine learning. Without the need to obtain the personal information of the network users, the paths of the network users on the Internet are vectorized and grouped. The network users are then identified according to the grouping results, which facilitates subsequent processing and use. The present invention can thus provide a behavior vectorization method that de-identifies information, converts the paths of network users into vectors, and then groups the de-identified information.
REFERENCE SIGN
[0047] 1 system for behavior vectorization of information de-identification
[0048] 11 server
[0049] 12 data provider device
[0050] 111 data processing module
[0051] 112 data storage module
[0052] 113 vectorization module
[0053] 114 grouping/classifying module
[0054] 13 client device
[0055] 14 edge server
[0056] D1 path vector learning data
[0057] D2 vector grouping learning data
[0058] D3 path data
[0059] D4 vectorized data
[0060] S1 step of providing data by a data provider
[0061] S2 step of training a model
[0062] S3 step of retrieving path data of the network users
[0063] S4 step of vectorizing path data
[0064] S5 step of vectorizing and grouping
[0065] S6 step of correcting the model