Evaluation device, evaluation method, evaluation program, and evaluation system
11282040 · 2022-03-22
Assignee
Inventors
Cpc classification
G06Q10/107
PHYSICS
G06F13/00
PHYSICS
G06F16/9566
PHYSICS
H04L51/42
ELECTRICITY
H04L41/026
ELECTRICITY
International classification
H04L51/00
ELECTRICITY
H04L41/026
ELECTRICITY
G06F16/955
PHYSICS
Abstract
An evaluating method can be performed by a computer. The method includes acquiring two-dimensional data represented by a plurality of character types, converting the two-dimensional data to three-dimensional data by classifying the acquired two-dimensional data into a dimension of the plurality of character types, analyzing a feature of the three-dimensional data, and evaluating input data described in the plurality of character types based on the analyzed feature.
Claims
1. An evaluation device comprising: a non-transitory computer readable memory; a processor; an acquisition unit for acquiring two-dimensional data represented by a plurality of character types, wherein there are I character types, I being an integer; a conversion unit for converting the two-dimensional data to three-dimensional data by classifying the acquired two-dimensional data into a dimension of the plurality of character types, wherein the conversion unit is configured to compress the I character types to a dimension that has a lower number than the I character types to generate a compressed three-dimensional data; an analysis unit for analyzing a feature of the three-dimensional data; and an evaluation unit for evaluating input data described in the plurality of character types based on the feature analyzed by the analysis unit.
2. The evaluation device of claim 1 further comprising a converting unit for converting the input data to three-dimensional input data by classifying the input data into a dimension of the plurality of character types, wherein the evaluation unit is configured to evaluate the input data by comparing a feature of the three-dimensional input data with the feature analyzed by the analysis unit.
3. The evaluation device of claim 1, wherein the input data is electronic mail data and the evaluation unit is configured to evaluate whether the input data is or is not spam.
4. An evaluation system comprising: a non-transitory computer readable memory; a processor; a learning module for learning a feature of text data; and an evaluation module for evaluating the text data, wherein the learning module comprises: an acquisition unit for acquiring two-dimensional data represented by a plurality of character types from data base, wherein there are I character types, I being an integer; a conversion unit for converting the two-dimensional data to three-dimensional data by classifying the acquired two-dimensional data into a dimension of the plurality of character types, wherein the conversion unit is configured to compress the I character types to a dimension that has a lower number than the I character types to generate a compressed three-dimensional data; and a learning unit for analyzing and learning a feature of the three-dimensional data; and the evaluation module comprises: a converting unit for converting input data to three-dimensional input data by classifying the input data to be evaluated into a dimension of the plurality of character types; and an evaluation unit for evaluating the input data by comparing a feature of the three-dimensional input data converted by the conversion unit with the feature learned by the learning unit.
5. The evaluation system of claim 4, wherein the evaluation module is provided on a mail server.
6. The evaluation system of claim 5, wherein the input data is electronic mail data and the evaluation unit is configured to evaluate whether the input data is or is not spam.
7. An evaluating method performed by a computer, the method comprising: acquiring two-dimensional data represented by a plurality of character types, wherein acquiring the two-dimensional data comprises acquiring m rows×n columns of sample data, where m and n are integers; converting the two-dimensional data to three-dimensional data by classifying the acquired two-dimensional data into a dimension of the plurality of character types, wherein converting the two-dimensional data to three-dimensional data comprises extracting I character types from the sample data to convert the sample data to three-dimensional data with m rows×n columns×character types I, where I is an integer; analyzing a feature of the three-dimensional data; and evaluating input data described in the plurality of character types based on the analyzed feature.
8. The method of claim 7, wherein evaluating the input data comprises receiving electronic mail data and determining whether the electronic mail data includes spam.
9. A non-transitory storage medium storing an evaluation program that can be performed by a computer, the evaluation program causing the computer to perform the steps of: acquiring two-dimensional data represented by a plurality of character types, wherein acquiring the two-dimensional data comprises acquiring m rows×n columns of sample data, where m and n are integers; converting the two-dimensional data to three-dimensional data by classifying the acquired two-dimensional data into a dimension of the plurality of character types wherein converting the two-dimensional data to three-dimensional data comprises extracting I character types from the sample data to convert the sample data to three-dimensional data with m rows×n columns x character types I, where I is an integer; analyzing a feature of the three-dimensional data; and evaluating input data described in the plurality of character types based on the analyzed feature.
10. The evaluation device of claim 1, wherein the two-dimensional data is arranged as m rows×n columns of data, where m and n are integers; and wherein the three-dimensional data is arranged as m rows×n columns×I character types.
11. The evaluation device of claim 10, wherein the input data is text data with m rows×n columns.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(9) The following reference numerals can be used in conjunction with the drawings:
(10) 1: mail processing device
(11) 2: learning system
(12) 3: evaluation system
(13) 4: text input unit
(14) 5: data conversion unit
(15) 6: learning unit
(16) 7: learning model
(17) 8: text input unit
(18) 9: data conversion unit
(19) 10: evaluation unit
(20) 11: storage unit
(21) 12: mail
(22) 13: spam
(23) 14: normal mail
(24) 20: text data evaluation device
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
(25) Embodiments according to the present invention will now be described in detail with reference to the drawings. In a preferred embodiment, an evaluation device according to the present invention may be implemented as a text data evaluation device. In a further preferred embodiment, the evaluation device may be implemented as a mail processing device. The mail processing device may be connected to a network such as the Internet or an intranet. The mail processing device may receive electronic mail via the network and analyze the received electronic mail to determine whether or not it is spam. The mail processing device is provided with at least a mail processing function and may be provided with other functions implemented by hardware or software. The mail processing device may be an electronic device such as a server, a computer, a terminal device, or a mail delivery server.
(27) The mail processing device 1 may be composed of, for example, one or more computer devices or one or more servers. The functions provided in the mail processing device 1 may be implemented by separate computer devices or servers. In such a case, the computer devices and the servers may be connected via a network. For example, the evaluation system 3, which receives the mail 12, may be placed in a mail server while the learning system 2 is connected to the evaluation system 3 via the network. In this way, learning results from the learning model 7 may be provided to the evaluation system 3.
(28) The learning system 2 acquires text data used as samples from the data base 11 and performs a data conversion according to a certain rule to build the learning model 7. The data base 11 stores spam acquired by using honeypot technology, normal mail, and other text data to be learned by the learning system 2. The text input unit 4 acquires text data from the data base 11. The data conversion unit 5 converts the text data acquired by the text input unit 4.
(30) Thus, for each dimension of character type, the data conversion unit 5 sets data “1” at the row/column positions where a character of that type appears. If one piece of text data has a size of m rows×n columns and includes I character types, the data conversion unit 5 converts the text data to a three-dimensional data format with m rows×n columns and depth I, as shown in
(31) The data conversion unit 5 may compress the dimension I of character types. For example, the above-described 33 types of symbols may be treated as a single symbol type, or the 26 types a-z may be divided into 7 groups (a-d, e-h, i-l, m-p, q-t, u-x, and y-z) to compress the dimension I to 7 types. The compression need not group only contiguous character types. Discrete character types may be placed in one group, and the number of character types included in each group may differ. For example, character types that are rarely used may be grouped together and compressed, or the dimension I of character types may be compressed according to a predetermined rule or the degree of importance. The dimension I may also be compressed by a dimensionality reduction technique such as principal component analysis or an autoencoder.
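The grouping described above can be illustrated with a short sketch. It is not taken from the embodiment: the function name `compress_types`, the use of NumPy, and the rule that a group's cell is “1” when any member character type is “1” are assumptions made for illustration. The example mapping reproduces the a-d, e-h, ..., y-z grouping of the 26 letters into 7 groups.

```python
import numpy as np

def compress_types(tensor, groups):
    """Compress the character-type depth of a 0/1 array of shape (m, n, I).

    groups is a length-I sequence giving, for each character type, the index
    of the group it belongs to. Groups may be non-contiguous and of unequal
    size. Assumption: a group cell is 1 if any member type is 1 at that cell.
    """
    groups = np.asarray(groups)
    n_groups = int(groups.max()) + 1
    m, n, _ = tensor.shape
    out = np.zeros((m, n, n_groups), dtype=tensor.dtype)
    for g in range(n_groups):
        # Select the depth slices belonging to group g and OR them together.
        out[:, :, g] = tensor[:, :, groups == g].any(axis=2).astype(tensor.dtype)
    return out

# Hypothetical mapping for a-z: a-d -> 0, e-h -> 1, ..., y-z -> 6 (7 groups).
LETTER_GROUPS = [i // 4 for i in range(26)]

# One row, two columns: "a" at column 0 and "z" at column 1.
sample = np.zeros((1, 2, 26), dtype=np.uint8)
sample[0, 0, 0] = 1
sample[0, 1, 25] = 1
compressed = compress_types(sample, LETTER_GROUPS)
print(compressed.shape)
```

Because the mapping is an ordinary sequence, rarely used character types could be assigned to a shared group simply by giving them the same group index.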
(33) Then, the data conversion unit 5 performs a data conversion in which text data with m rows×n columns is scanned, for example, one character at a time. In this example, the characters in the m-th row (m=1, 2, 3, ..., up to the number of rows of the text) are acquired in column order. Once the characters in every n-th column (n=1, 2, 3, ..., up to the number of columns of the text) have been acquired, processing moves to the next row, and the loop in which characters are acquired in column order is performed again.
(34) The data conversion unit 5 first acquires the character at the m-th row and n-th column (S106) and converts the acquired character to data or a code recognizable by a computer (S108). Then, the dimension I of the character type of the character acquired in S106 is obtained according to the table shown in FIG. 5 (S110). A flag “1” is set at the m-th row and n-th column in the dimension I (S112). The processes from S106 to S112 are repeated by loop processing.
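The loop from S106 to S112 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment itself: the table `CHAR_TYPES` (lowercase letters followed by digits) is a hypothetical stand-in for the table of FIG. 5, and the function name `text_to_tensor`, the fixed column count, and the choice to leave unknown characters unflagged are all assumptions.

```python
import string
import numpy as np

# Hypothetical stand-in for the character type table of FIG. 5.
CHAR_TYPES = string.ascii_lowercase + string.digits
TYPE_INDEX = {ch: i for i, ch in enumerate(CHAR_TYPES)}

def text_to_tensor(lines, n_cols):
    """Convert m lines of text into a 0/1 array of shape (m, n_cols, I)."""
    tensor = np.zeros((len(lines), n_cols, len(CHAR_TYPES)), dtype=np.uint8)
    for row, line in enumerate(lines):                # row loop
        for col, ch in enumerate(line[:n_cols]):      # S106: character at (row, col)
            idx = TYPE_INDEX.get(ch)                  # S108/S110: look up its type dimension
            if idx is not None:                       # assumption: unknown characters are skipped
                tensor[row, col, idx] = 1             # S112: set flag "1"
    return tensor

converted = text_to_tensor(["ab", "c1"], n_cols=3)
print(converted.shape)
```

Rows shorter than `n_cols` leave the remaining cells at zero in every character-type dimension, so the row and column layout of the original text is preserved in the array.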
(36) The learning unit 6 retrieves the three-dimensional data converted by the data conversion unit 5 and learns from it. Learning here means conventionally used machine learning such as deep learning, in which any number of pieces of text data converted by the data conversion unit 5 may be acquired and features of the text data are extracted to analyze and classify spam and normal mail.
(37) The evaluation system 3 receives the mail 12 to be evaluated (text data), performs a data conversion of the mail by using the same algorithm as the learning system 2, and evaluates whether the input mail is spam or normal mail by using the learning model 7.
(38) The text input unit 8 inputs the mail 12 to be evaluated. The timing at which the text input unit 8 inputs mail to be evaluated is not specifically limited. For example, mail to be evaluated may be stored, and the text input unit 8 may input it once a certain number of mails have been stored. Alternatively, stored mail may be input at regular intervals, such as every day, every week, or every month. The input may also be performed at a timing instructed externally by a user of the mail processing device 1. The data conversion unit 9 converts text data by using the same algorithm as the above-described data conversion unit 5.
(39) The evaluation unit 10 evaluates the text data converted by the data conversion unit 9 based on the learning model 7 provided by the learning system 2. In this embodiment, the input mail 12 is evaluated as to whether it is spam or normal mail. The learning model 7 models the features used for determining spam, such as the features of spam learned by the learning system 2 and the differences from normal mail. The evaluation unit 10 compares the feature of the mail 12 to be evaluated, converted by the same algorithm as in the learning system 2, with the feature provided by the learning model 7, evaluates whether they match or are approximate, and classifies the mail 12 as spam 13 or normal mail 14 based on the evaluation.
(40) In the above-described embodiment, the evaluation of spam or normal mail is performed on the mail header. However, this is just an example, and text data other than the mail header may be evaluated. Further, the present invention may be applied to analyses of text with a high degree of randomness and freedom, such as data headers, communication commands, communication packets, or a program itself.
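The comparison between a learned feature and the feature of input data can be illustrated with a deliberately simple stand-in. The embodiment refers to machine learning such as deep learning and does not specify a model; the nearest-centroid classifier below is swapped in only to show the learn-then-compare flow. The function names `learn_centroids` and `evaluate` and the use of Euclidean distance as the measure of approximation are assumptions.

```python
import numpy as np

def learn_centroids(tensors, labels):
    """Learning stand-in: average the flattened 3-D arrays of each label."""
    feats = np.stack([t.reshape(-1).astype(float) for t in tensors])
    centroids = {}
    for lab in set(labels):
        mask = np.array([l == lab for l in labels])
        centroids[lab] = feats[mask].mean(axis=0)
    return centroids

def evaluate(tensor, centroids):
    """Evaluation stand-in: return the label whose centroid is closest."""
    x = tensor.reshape(-1).astype(float)
    return min(centroids, key=lambda lab: np.linalg.norm(x - centroids[lab]))

# Tiny hypothetical samples with shape (m=1, n=2, I=3).
spam_sample = np.zeros((1, 2, 3))
spam_sample[0, 0, 0] = 1
normal_sample = np.zeros((1, 2, 3))
normal_sample[0, 1, 2] = 1

model = learn_centroids([spam_sample, normal_sample], ["spam", "normal"])
print(evaluate(spam_sample, model))
```

In this sketch the input must be converted with the same row, column, and character-type layout as the samples used for learning, which mirrors the requirement that the data conversion unit 9 use the same algorithm as the data conversion unit 5.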
(42) The mail processing device 1 according to an embodiment of the present invention allows the use of character strings with a high degree of randomness, which are difficult to use in conventional spam removal methods. Thus, spam may be appropriately removed even if spam in which a portion has been modified is delivered. Further, the structure of one piece of text data may be retained as structure information as shown in
(43) The preferred embodiments of the present invention have been described above in detail. The present invention is not limited to the specific embodiments. Various modifications and alterations are possible within the scope of the invention described in the claims.