METHOD AND SYSTEM FOR BEHAVIOR VECTORIZATION OF INFORMATION DE-IDENTIFICATION

Abstract

A method for behavior vectorization of information de-identification, in which data concerning the browsing traces, link paths, trigger events, clicks, and operation behaviors of network users on the Internet are selected by a server, a client device, or an edge device for a conversion/integration process. The integrated data are then converted into a vector that represents the profile of the usage behavior of the network users. Because vectors can be quickly grouped and classified to find similar groups, the network users can be identified quickly. The server uses supervised learning as the base method and is trained on pre-defined network behaviors. Semi-supervised learning or unsupervised learning can also be employed to modify undefined network behaviors so that they better conform to the profile description of the network users.

Claims

1. A method for behavior vectorization of information de-identification, comprising following steps: providing data by a data provider, wherein a server is connected with a data provider device, and wherein the data provider device provides and transmits a path vector learning data and a vector grouping learning data to the server; training a model, wherein, after the server receives the path vector learning data and the vector grouping learning data, a vectorization module of the server uses the path vector learning data as past data for performing a first machine learning, and wherein a grouping/classifying module of the server uses the vector grouping learning data as past data for performing a second machine learning; retrieving path data of network users, wherein, after the first machine learning and the second machine learning are completed, the server retrieves a path data of a client device and transmits the path data to the vectorization module; vectorizing path data, wherein the vectorization module performs a data vectorization action on the path data based on a result of the first machine learning such that the path data are converted into vectorized data, and wherein the vectorization module transmits the vectorized data to the grouping/classifying module; and vectorizing and grouping, wherein the grouping/classifying module performs a grouping action on the vectorized data based on a result of the second machine learning, and assigns a grouping result to the vectorized data, and finally stores the grouping result to the server.

2. The method as claimed in claim 1, wherein the path vector learning data include a plurality of past path data and a plurality of past vectorized data, and wherein the past vectorized data are one of a website trigger event, a website click event, a website operation behavior, a website stay time of the past path data, or a combination thereof.

3. The method as claimed in claim 2, wherein the vector grouping learning data include a plurality of the past vectorized data and a plurality of past grouping data, and wherein the past grouping data corresponds to the plurality of past vectorized data.

4. The method as claimed in claim 1, wherein the first machine learning and the second machine learning are one of a group consisting of a supervised learning, a semi-supervised learning, a reinforcement learning, an unsupervised learning, a self-supervised learning, a heuristic algorithm, and a combination thereof.

5. The method as claimed in claim 1, wherein the path data are one of a group consisting of a website trigger event, a website click event, a website operation behavior, a website stay time, and a combination thereof.

6. The method as claimed in claim 1, wherein the data vectorization operation converts one-dimensional data into one of a two-dimensional vector matrix, a three-dimensional vector matrix, or a multi-dimensional vector matrix.

7. The method as claimed in claim 1, wherein, in the step of retrieving path data of the network users and the step of vectorizing the path data, the server first transmits the result of the first machine learning to the client device so that the client device converts the path data into the vectorized data, and then transmits the vectorized data to the server.

8. A system for behavior vectorization of information de-identification, comprising: a server having a data processing module, a data storage module, a vectorization module, and a grouping/classifying module which establish an information link with the server, respectively, the data processing module being provided for running the server, the data storage module being provided for storing data received and calculated by the server; a data provider device establishing an information link with the server, the data provider device providing a path vector learning data and a vector grouping learning data to the server; a client device establishing an information link with the server, the server retrieving a path data of the client device; wherein the vectorization module uses the path vector learning data as past data for performing a first machine learning, and wherein, after the first machine learning training is completed, a data vectorization action can be performed on the path data, and the path data can be converted into a vectorized data; and wherein the grouping/classifying module uses the vector grouping learning data as past data for performing a second machine learning, and wherein, after the second machine learning training is completed, a grouping action can be performed on the vectorized data, and a grouping result is given to the vectorized data, and finally the grouping result is stored in the data storage module.

9. The system as claimed in claim 8, wherein the path vector learning data include a plurality of past path data and a plurality of past vectorized data, and wherein the past vectorized data are one of a website trigger event, a website click event, a website operation behavior, a website stay time of the past path data, or a combination thereof.

10. The system as claimed in claim 9, wherein the vector grouping learning data include a plurality of the past vectorized data and a plurality of past grouping data, and wherein the past grouping data corresponds to the plurality of past vectorized data.

11. The system as claimed in claim 8, wherein the first machine learning and the second machine learning are one of a group consisting of a supervised learning, a semi-supervised learning, a reinforcement learning, an unsupervised learning, a self-supervised learning, a heuristic algorithm, and a combination thereof.

12. The system as claimed in claim 8, wherein the path data are one of a group consisting of a website trigger event, a website click event, a website operation behavior, a website stay time, and a combination thereof.

13. The system as claimed in claim 8, wherein the data vectorization operation converts one-dimensional data into one of a two-dimensional vector matrix, a three-dimensional vector matrix, or a multi-dimensional vector matrix.

14. The system as claimed in claim 8, wherein the server further establishes an information link with at least one edge server, and wherein the edge server assists the server and improves the computing function of the server with an edge computing function.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 is a schematic drawing of the composition of the present disclosure;

[0013] FIG. 2 is a flow chart of the present disclosure;

[0014] FIG. 3 is a schematic drawing I of the implementation of the present disclosure;

[0015] FIG. 4 is a schematic drawing II of the implementation of the present disclosure;

[0016] FIG. 5 is a schematic drawing III of the implementation of the present disclosure;

[0017] FIG. 6 is a schematic drawing IV of the implementation of the present disclosure;

[0018] FIG. 7 is a schematic drawing V of the implementation of the present disclosure;

[0019] FIG. 8 is a schematic drawing VI of the implementation of the present disclosure;

[0020] FIG. 9 is a schematic drawing VII of the implementation of the present disclosure;

[0021] FIG. 10 is a schematic drawing of another embodiment of the present disclosure; and

[0022] FIG. 11 is a schematic drawing of a further embodiment of the present disclosure.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0023] Referring to FIG. 1, a system 1 for behavior vectorization of information de-identification according to the present disclosure includes a server 11, a data provider device 12, and a client device 13.

[0024] The server 11 establishes an information link with the data provider device 12 and the client device 13. The server 11 can receive learning and training samples provided by the data provider device 12 and build a machine learning model based on them. The model mainly retrieves the network usage paths of the client device 13 for stacking and vectorization, and then groups and classifies the vectorized data.

[0025] The data provider device 12 can be a search engine database or a data database. Any device that enables the server 11 to obtain the required learning and training samples can be employed.

[0026] The client device 13 can be one of a mobile phone, a tablet computer, a personal computer, etc. Any device that enables the server 11 to obtain the required samples to be tested can be employed.

[0027] The client device 13 is operated by a client. The client can use the Internet through the client device 13, and the server 11 can retrieve the Internet path used by the client device 13. The client of the client device 13 mainly refers to a network user, but it is not limited thereto.

[0028] The server 11 mainly includes a data processing module 111, a data storage module 112, a vectorization module 113, and a grouping/classifying module 114 which establish an information link with each other. The data processing module 111 is used to run the server 11 and to drive the modules connected thereto. The data processing module 111 fulfills functions such as logic operations, temporary storage of operation results, and storage of execution instruction positions. It can be, for example, a CPU, but is not limited thereto.

[0029] The data storage module 112 can store electronic data, and can be, for example, a Solid State Disk or Solid State Drive (SSD), a Hard Disk Drive (HDD), a Static Random Access Memory (SRAM), or a Dynamic Random Access Memory (DRAM), etc. The data storage module 112 mainly stores the path vector learning data and the vector grouping learning data transmitted by the data provider device 12, the path data transmitted by the client device 13, and the data calculated and processed by the server 11.

[0030] The vectorization module 113 mainly performs training and learning on the path vector learning data provided by the data provider device 12. After the training and learning are completed, the vectorization module 113 can convert the path data transmitted by the client device 13 into vectorized data. The training and learning of the vectorization module 113 mainly use machine learning such as supervised learning, semi-supervised learning, reinforcement learning, unsupervised learning, self-supervised learning, or heuristic algorithms, but are not limited thereto. The above-mentioned path vector learning data can be a plurality of past path data and a plurality of past vectorized data. The past path data and the path data can be any data of a website trigger event, a website click event, a website operation behavior, a website stay time, or a combination thereof. Any data referring to visiting traces on the Internet is applicable. The past vectorized data mainly correspond to the past path data and are used for training and learning by the vectorization module 113. The vectorized data can be one of a two-dimensional matrix vector, a three-dimensional matrix vector, or a multi-dimensional matrix vector. The vectorization module 113 mainly stacks and converts each one-dimensional datum in the path data into the vectorized data. For example, a network user of the client device 13 stays on a website A for 5 minutes and 30 seconds, clicks on three products, each of which links to an external website corresponding to that product, and then returns to the website A. Meanwhile, the network user watches advertisements A, B, and C on the website A for 15 seconds each. In this case, a matrix of the client device 13 can be provided by the vectorization module 113 and defined to be: [0.33, 3, 0.45] ([total stay time, number of products clicked, total time to watch advertisements]). The above-mentioned case is only an example and should not be limited thereto. After the vectorization module 113 converts the path data into the vectorized data, the vectorized data can be stored in the data storage module 112 or transmitted to the subsequent grouping/classifying module 114.
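
The example matrix [0.33, 3, 0.45] can be reproduced with a minimal Python sketch. The function name `vectorize_visit` and the divisors 1000 and 100 are assumptions chosen only so that 330 seconds of stay time and 45 seconds of advertisement time yield the stated values; the disclosure does not prescribe a scaling, and in actual operation the mapping would come from the trained vectorization module 113.

```python
def vectorize_visit(stay_seconds, products_clicked, ad_watch_seconds):
    """Return [total stay time, products clicked, total ad time]."""
    # Assumed scaling: the divisors are picked to reproduce the example
    # values and are not taken from the disclosure.
    return [
        round(stay_seconds / 1000, 2),
        products_clicked,
        round(sum(ad_watch_seconds) / 100, 2),
    ]


# 5 min 30 s on website A, three products clicked, three ads of 15 s each.
vector = vectorize_visit(5 * 60 + 30, 3, [15, 15, 15])
print(vector)  # [0.33, 3, 0.45]
```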

[0031] The grouping/classifying module 114 can perform training and learning on the vector grouping learning data provided by the data provider device 12. After the training and learning are completed, the grouping/classifying module 114 can group and classify the vectorized data transmitted by the vectorization module 113 and assign a grouping result to the vectorized data. The training and learning of the grouping/classifying module 114 mainly use machine learning such as supervised learning, semi-supervised learning, reinforcement learning, unsupervised learning, self-supervised learning, or heuristic algorithms, but are not limited thereto. The vector grouping learning data mainly include a plurality of the past vectorized data and a plurality of past grouping data. The past grouping data can include a plurality of the past vectorized data of past network users for training and learning by the grouping/classifying module 114. Moreover, the grouping result can be a group or set containing a plurality of vectorized data representing network users.

[0032] As illustrated in FIG. 2 together with FIG. 1, steps of the present disclosure are shown as follows:

[0033] (1) Step S1 of providing data by a data provider:

[0034] As shown in FIG. 3, the server 11 receives a path vector learning data D1 and a vector grouping learning data D2 transmitted by a data provider device 12. The data processing module 111 respectively transmits the path vector learning data D1 to the vectorization module 113, and the vector grouping learning data D2 to the grouping/classifying module 114 for training and learning. The above-mentioned path vector learning data D1 can be a plurality of past path data and a plurality of past vectorized data. The past path data can be any data of a website trigger event, a website click event, a website operation behavior, a website stay time, or a combination thereof. Any data referring to the visiting traces left on the Internet is applicable. The vector grouping learning data D2 can include a plurality of the past vectorized data and a plurality of past grouping data. The past grouping data can include a plurality of the past vectorized data of the past network users, but not limited thereto.

[0035] (2) Step S2 of training a model:

[0036] After the vectorization module 113 receives the path vector learning data D1 and the grouping/classifying module 114 receives the vector grouping learning data D2, both transmitted by the data provider device 12, the vectorization module 113 uses the path vector learning data D1 as the past data to perform a first machine learning. The grouping/classifying module 114 uses the vector grouping learning data D2 as the past data to perform a second machine learning. The first and the second machine learning mainly refer to machine learning such as supervised learning, semi-supervised learning, reinforcement learning, unsupervised learning, self-supervised learning, or heuristic algorithms, but are not limited thereto.
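
As an illustration of the second machine learning only, the following Python sketch trains a nearest-centroid classifier on past vectorized data paired with past grouping data. The nearest-centroid technique is swapped in here as one simple supervised method; the disclosure does not name a specific algorithm. The function names and the sample vectors are hypothetical.

```python
import math


def train_centroids(past_vectors, past_groups):
    """Average the past vectorized data of each group into one centroid."""
    sums, counts = {}, {}
    for vec, grp in zip(past_vectors, past_groups):
        if grp not in sums:
            sums[grp] = [0.0] * len(vec)
            counts[grp] = 0
        sums[grp] = [s + x for s, x in zip(sums[grp], vec)]
        counts[grp] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def predict_group(centroids, vec):
    """Assign the group whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda g: math.dist(centroids[g], vec))


# Hypothetical vector grouping learning data D2 (values are illustrative).
past_vectors = [[0.6, 7, 0.6], [0.7, 6, 0.5], [0.1, 1, 0.1], [0.2, 2, 0.0]]
past_groups = [1, 1, 2, 2]

centroids = train_centroids(past_vectors, past_groups)
print(predict_group(centroids, [0.623, 7, 0.6]))  # 1
```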

[0037] (3) Step S3 of retrieving path data of the network users:

[0038] Following the above-mentioned steps and referring to FIG. 4, after the aforementioned first machine learning and the aforementioned second machine learning are completed, the data processing module 111 can retrieve a path data D3 of the client device 13. The path data D3 are then transmitted to the vectorization module 113 for subsequent operations. The path data D3 can be any data of a website trigger event, a website click event, a website operation behavior, a website stay time, or a combination thereof. Any data referring to the visiting traces left on the Internet by the client device 13 is applicable. For example, a network user of the client device 13 stays on the website A for 10 minutes and 23 seconds and clicks on five products, each of which links to an external website corresponding to that product, and then returns to the website A. Meanwhile, the network user watches advertisements A, B, and C on the website A for 20 seconds each. Finally, after two products are searched and the website A is closed, the server 11 retrieves the time spent by the client device 13 on the website A, the number of product clicks, the number of ads viewed, the time spent watching ads, the number of product searches, etc. The retrieved data, however, do not include the personal data stored in the client device 13. The server 11 then transmits the retrieved data to the vectorization module 113. The above-mentioned case is only an example and should not be limited thereto.
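
One way to keep personal data out of the retrieved path data is an allow-list of behavioral fields, sketched below in Python. The field names, the `extract_path_data` helper, and the sample record are hypothetical; the disclosure states only which kinds of data are retrieved and that personal data are excluded.

```python
# Hypothetical names for the behavioral fields listed in the example.
ALLOWED_FIELDS = {
    "stay_seconds",
    "product_clicks",
    "ads_viewed",
    "ad_watch_seconds",
    "product_searches",
}


def extract_path_data(raw_record):
    """Keep only allow-listed behavioral fields; drop everything else."""
    return {k: v for k, v in raw_record.items() if k in ALLOWED_FIELDS}


raw_record = {
    "stay_seconds": 623,
    "product_clicks": 5,
    "ads_viewed": 3,
    "ad_watch_seconds": 20,
    "product_searches": 2,
    "email": "user@example.com",  # personal data, must not be retrieved
}
path_data = extract_path_data(raw_record)
print(sorted(path_data))
```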

[0039] (4) Step S4 of vectorizing path data:

[0040] Referring to FIG. 5 and FIG. 6, after the vectorization module 113 receives the path data D3, it performs a data vectorization operation based on a result of the first machine learning to convert the path data D3 into a vectorized data D4. The data vectorization operation mainly converts one-dimensional data into one of a two-dimensional vector matrix, a three-dimensional vector matrix, or a multi-dimensional vector matrix. For example, continuing the example of step S3 of retrieving path data of the network users, the vectorization module 113 converts the 10 minutes and 23 seconds (623 seconds in total, represented by A) that the network user of the client device 13 stays on the website A into a part a of the vector matrix C1, and the part a is set to be 0.623. A part b of the vector matrix C1 is the number X of product clicks plus the number Y of product searches, and is set to be 7. A part c of the vector matrix C1 is the product of the number α of ads viewed and the time β spent watching ads, and is set to be 0.6. After the vector matrix C1 is created, its three-dimensional spatial distribution is illustrated in FIG. 6. C1 to C6 in FIG. 6 can each represent a different network user of the client device. The above-mentioned conversion process is only an example. In actual operation, the path data D3 is converted into the vectorized data D4 based on the results of machine learning. The conversion illustrated here is not provided for limitation. The vectorization module 113 finally stores the generated vectorized data D4 to the data storage module 112, or transmits it to the subsequent grouping/classifying module 114.
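
The arithmetic behind the example values of C1 can be written out as follows, with the number of ads viewed written as α. The divisors 1000 and 100 are assumptions inferred from the stated results 0.623 and 0.6; the disclosure does not prescribe them, and in actual operation the mapping comes from the first machine learning.

```latex
\begin{aligned}
C_1 &= (a,\; b,\; c) \\
a &= A / 1000 = 623 / 1000 = 0.623 \\
b &= X + Y = 5 + 2 = 7 \\
c &= \alpha \beta / 100 = (3 \times 20) / 100 = 0.6
\end{aligned}
```

Here A is the stay time in seconds, X the number of product clicks, Y the number of product searches, α the number of ads viewed, and β the time in seconds spent watching each ad.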

[0041] (5) Step S5 of vectorizing and grouping:

[0042] Following the above-mentioned steps and referring to FIG. 7 through FIG. 9, after receiving the vectorized data D4, the grouping/classifying module 114 performs a grouping action based on a result of the second machine learning and assigns a grouping result to the vectorized data D4. The grouping result is a group or a set that can contain a plurality of the vectorized data, such as C1, each representing a network user. For example, continuing the example of the step S4 of vectorizing path data, a tangent line t can represent how the grouping/classifying module 114 divides C1 to C6 into two groups under a certain grouping training topic. C1 to C3 can belong to group 1, and C4 to C6 can belong to group 2. Since C1 to C6 are all in the form of vectors, they can be classified quickly. For the same vectors, the slope and direction of the tangent line t differ under different training topics, which makes the grouping results different. The above-mentioned grouping process is only an example. In actual operation, the result of machine learning is used to assign the grouping result of the vectorized data, and the grouping illustrated here does not serve as a limitation. Finally, the grouping/classifying module 114 can store the grouping result to the data storage module 112.
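
The effect of the dividing line t can be sketched in Python as a linear boundary whose coefficients depend on the training topic. Only the C1 values come from the example above; the vectors for C2 to C6, the two sets of weights, and the names `assign_group`, `topic_a`, and `topic_b` are hypothetical, and in actual operation the boundary would be learned by the grouping/classifying module 114.

```python
def assign_group(vector, weights, bias):
    """Return group 1 or 2 depending on the side of the linear boundary."""
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1 if score >= 0 else 2


# C1 follows the example; C2 to C6 are illustrative values only.
users = {
    "C1": (0.623, 7, 0.6),
    "C2": (0.7, 6, 0.5),
    "C3": (0.5, 8, 0.7),
    "C4": (0.1, 1, 0.1),
    "C5": (0.2, 2, 0.0),
    "C6": (0.15, 0, 0.2),
}

# Two assumed training topics give two different boundaries.
topic_a = ((0.0, 1.0, 0.0), -4.0)  # splits on product interactions
topic_b = ((1.0, 0.0, 0.0), -0.6)  # splits on stay time

groups_a = {name: assign_group(v, *topic_a) for name, v in users.items()}
groups_b = {name: assign_group(v, *topic_b) for name, v in users.items()}
print(groups_a)  # C1 to C3 in group 1, C4 to C6 in group 2
print(groups_b)  # a different split for the same vectors
```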

[0043] Referring to FIG. 10, the step S4 of vectorizing path data can be followed by a step S6 of correcting the model. After receiving the path data D3, the vectorization module 113 performs a data vectorization operation based on the result of the first machine learning. However, if the path data D3 transmitted by the client device 13 is data that has never appeared or rarely appeared in the past path data, the vectorization module 113 can modify the result of the first machine learning based on the path data. In this way, the subsequent vectorized data D4 is more consistent with the client device 13.
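
A minimal Python sketch of such a correction is given below, under the strong assumption that the result of the first machine learning is simply a per-feature scaling range. The `RangeVectorizer` class and its methods are hypothetical and stand in for whatever model the vectorization module 113 actually learns.

```python
class RangeVectorizer:
    """Scales each feature by the largest value seen so far (assumed model)."""

    def __init__(self, past_path_data):
        # "Training": record the per-feature maximum of the past path data.
        self.max_seen = [max(col) for col in zip(*past_path_data)]

    def is_novel(self, path):
        # A record counts as unseen when any feature exceeds the known range.
        return any(x > m for x, m in zip(path, self.max_seen))

    def correct(self, path):
        # Analogue of step S6: widen the range to cover the new record.
        self.max_seen = [max(x, m) for x, m in zip(path, self.max_seen)]

    def vectorize(self, path):
        if self.is_novel(path):
            self.correct(path)
        return [x / m if m else 0.0 for x, m in zip(path, self.max_seen)]


# Past path data as [stay seconds, interactions, ad seconds].
model = RangeVectorizer([[330, 3, 45], [623, 7, 60]])
print(model.vectorize([1246, 7, 30]))  # [1.0, 1.0, 0.5]
```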

[0044] In the step S3 of retrieving path data of the network users and in the step S4 of vectorizing path data, the server 11 may further transmit the result of the first machine learning to the client device 13. After receiving the result of the first machine learning, the client device 13 can retrieve the path data D3 of the client device 13 in real time. Meanwhile, the path data D3 are converted into vectorized data D4, and then the vectorized data D4 are transmitted to the server 11.
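
This client-side variant can be sketched in Python as follows: the server serializes the learned parameters, the client applies them locally, and only the vector is sent back. Representing the result of the first machine learning as a dictionary of scale factors, and the names `export_model` and `client_vectorize`, are assumptions made for illustration.

```python
import json


def export_model(scales):
    """Server side: serialize the (assumed) learned scale factors."""
    return json.dumps({"scales": scales})


def client_vectorize(model_json, path_data):
    """Client side: convert local path data and return only the vector."""
    scales = json.loads(model_json)["scales"]
    vector = [round(path_data[name] / scale, 3) for name, scale in scales.items()]
    return json.dumps({"vector": vector})


model_json = export_model(
    {"stay_seconds": 1000, "interactions": 1, "ad_seconds": 100}
)
local_path_data = {"stay_seconds": 623, "interactions": 7, "ad_seconds": 60}

payload = client_vectorize(model_json, local_path_data)
print(payload)  # {"vector": [0.623, 7.0, 0.6]}
```

In this arrangement the raw path data stay on the client device, and the payload carries only the vectorized values.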

[0045] Referring to FIG. 11, the server 11 can establish an information link with at least one edge server 14. The edge server 14 mainly provides an edge computing function that assists the server 11. The edge server 14 can be a mobile phone, a tablet computer, a personal computer, a central processing computer, etc. Any device that can share the computing functions of the server 11 is applicable. Edge computing decomposes the large data originally processed by the central node into smaller, easier-to-manage pieces and distributes them to the edge nodes for processing. Because the edge nodes are closer to the client device 13, data processing and transmission can be accelerated and the delay can be reduced.
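
The decomposition and distribution described here can be sketched in Python with a simple chunking step followed by round-robin assignment. The helper names, the chunk size, and the node labels are hypothetical; the disclosure does not specify a distribution policy.

```python
def split_into_chunks(records, chunk_size):
    """Cut a large list of records into smaller pieces."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def distribute(chunks, edge_nodes):
    """Assign chunks to edge nodes in round-robin order (assumed policy)."""
    assignment = {node: [] for node in edge_nodes}
    for index, chunk in enumerate(chunks):
        assignment[edge_nodes[index % len(edge_nodes)]].append(chunk)
    return assignment


records = list(range(10))  # stand-in for path data records
chunks = split_into_chunks(records, 4)
plan = distribute(chunks, ["edge-a", "edge-b"])
print(plan)
# {'edge-a': [[0, 1, 2, 3], [8, 9]], 'edge-b': [[4, 5, 6, 7]]}
```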

[0046] In summary, the present disclosure is mainly based on machine learning. Without the need to obtain the personal information of the network user, the path of the network users on the Internet is vectorized and grouped. Meanwhile, the network users are identified according to the grouping results for facilitating the subsequent processing and use. The present invention can indeed provide a behavior vectorization method that de-identifies information, converts the path of network users in a vectorized way, and then de-identifies grouped information.

REFERENCE SIGN

[0047] 1 system for behavior vectorization of information de-identification

[0048] 11 server

[0049] 12 data provider device

[0050] 111 data processing module

[0051] 112 data storage module

[0052] 113 vectorization module

[0053] 114 grouping/classifying module

[0054] 13 client device

[0055] 14 edge server

[0056] D1 path vector learning data

[0057] D2 vector grouping learning data

[0058] D3 path data

[0059] D4 vectorized data

[0060] S1 step of providing data by a data provider

[0061] S2 step of training a model

[0062] S3 step of retrieving path data of the network users

[0063] S4 step of vectorizing path data

[0064] S5 step of vectorizing and grouping

[0065] S6 step of correcting the model

CPC classification (section G, PHYSICS): G06F18/24, G06N20/00, G06Q30/0242, G06F18/27, G06N5/01, G06F18/2413, G06Q30/0271, G06Q30/0272

International classification (section G, PHYSICS): G06N20/00, G06K9/62, G06N5/00
