DATA EXPLORATION MANAGEMENT METHOD AND SYSTEM, ELECTRONIC DEVICE, AND STORAGE MEDIUM
20200167352 · 2020-05-28
Cpc classification
G06F16/2448
PHYSICS
Abstract
Disclosed is a data exploration management method, including: acquiring data input by a user, where the data includes data content and an exploration variable; acquiring a pre-stored flow selected by the user, where the pre-stored flow is used to perform data exploration on the data; acquiring an operation, a method, and flow program code of the pre-stored flow, and generating and storing output program code; running the output program code, and acquiring and storing a running result; and displaying the pre-stored flow, the output program code, and the running result. According to the foregoing method, scientific management on a data exploration process ensures that the data exploration process is repeatable, and that an operation and a method of the data exploration process can be invoked, shared, and reused in real time.
Claims
1. A data exploration management method, comprising the following steps: data acquisition: acquiring data input by a user, wherein the data comprises data content and an exploration variable; pre-stored flow selection: acquiring a pre-stored flow selected by the user, wherein the pre-stored flow is used to perform data exploration on the data; program code generation: acquiring an operation, a method, and flow program code of the pre-stored flow, and generating and storing output program code; and program code running: running the output program code, and acquiring and storing a running result.
2. The data exploration management method according to claim 1, further comprising the following step: result display: displaying the pre-stored flow, the output program code, and the running result.
3. The data exploration management method according to claim 1, wherein the data content comprises a database, a data table, and a data file.
4. The data exploration management method according to claim 1, wherein the pre-stored flow comprises a node, a path, the method, and the flow program code, the node and the path constitute the operation, the method comprises a pre-stored method, and the flow program code is used to execute the pre-stored flow.
5. The data exploration management method according to claim 4, wherein the pre-stored method comprises a statistical method and method program code, and the method program code is used to execute the pre-stored method.
6. The data exploration management method according to claim 5, wherein the flow program code invokes the method program code to generate the output program code.
7. A data exploration management system, comprising: a pre-stored method module, a pre-stored flow module, a data acquisition module, a flow selection module, a program code generation module, a program code running module, and a result display module, wherein the pre-stored method module is connected to the pre-stored flow module, the pre-stored flow module and the data acquisition module are connected to the flow selection module, the flow selection module is connected to the program code generation module, the program code generation module is connected to the program code running module, and the program code running module is connected to the result display module; and the data acquisition module acquires data input by a user, the flow selection module acquires a pre-stored flow in the pre-stored flow module based on the data, the pre-stored flow module acquires a pre-stored method and method program code in the pre-stored method module based on the pre-stored flow, the flow selection module acquires a pre-stored flow selected by the user, the program code generation module generates and stores output program code, the program code running module runs the output program code and acquires and stores a running result, and the result display module displays the pre-stored flow, the output program code, and the running result.
8. An electronic device, comprising: a processor, a memory, and a program, wherein the program is stored in the memory and is configured to be executed by the processor, and the program is used to execute the method according to claim 1.
9. A computer-readable storage medium with a computer program stored, wherein the computer program is executed by a processor to implement the method according to claim 1.
10. An electronic device, comprising: a processor, a memory, and a program, wherein the program is stored in the memory and is configured to be executed by the processor, and the program is used to execute the method according to claim 2.
11. An electronic device, comprising: a processor, a memory, and a program, wherein the program is stored in the memory and is configured to be executed by the processor, and the program is used to execute the method according to claim 3.
12. An electronic device, comprising: a processor, a memory, and a program, wherein the program is stored in the memory and is configured to be executed by the processor, and the program is used to execute the method according to claim 4.
13. An electronic device, comprising: a processor, a memory, and a program, wherein the program is stored in the memory and is configured to be executed by the processor, and the program is used to execute the method according to claim 5.
14. An electronic device, comprising: a processor, a memory, and a program, wherein the program is stored in the memory and is configured to be executed by the processor, and the program is used to execute the method according to claim 6.
15. A computer-readable storage medium with a computer program stored, wherein the computer program is executed by a processor to implement the method according to claim 2.
16. A computer-readable storage medium with a computer program stored, wherein the computer program is executed by a processor to implement the method according to claim 3.
17. A computer-readable storage medium with a computer program stored, wherein the computer program is executed by a processor to implement the method according to claim 4.
18. A computer-readable storage medium with a computer program stored, wherein the computer program is executed by a processor to implement the method according to claim 5.
19. A computer-readable storage medium with a computer program stored, wherein the computer program is executed by a processor to implement the method according to claim 6.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0031] The accompanying drawings described here are provided for further understanding of the present invention, and constitute a part of the application. The exemplary embodiments and illustrations thereof of the present invention are intended to explain the present invention, but do not constitute inappropriate limitations to the present invention. In the accompanying drawings:
[0032]
[0033]
[0034]
[0035]
[0036]
DETAILED DESCRIPTION
[0037] The present invention is further described below with reference to the accompanying drawings and specific implementations. It should be noted that the embodiments or technical features described below can be arbitrarily combined to form a new embodiment provided that no conflict occurs.
[0038] As shown in
[0039] Data acquisition: acquire data input by a user, where the data includes data content and an exploration variable;
[0040] pre-stored flow selection: acquire a pre-stored flow selected by the user, where the pre-stored flow is used to perform data exploration on the data;
[0041] program code generation: acquire an operation, a method, and flow program code of the pre-stored flow, and generate and store output program code;
[0042] program code running: run the output program code, and acquire and store a running result; and
[0043] result display: display the pre-stored flow, the output program code, and the running result.
[0044] The data content includes a database, a data table, and a data file. In addition, the data content is invoked by the program code.
[0045] The pre-stored flow selection is described as follows: A system provides a flow selection interface for the user to select a to-be-executed pre-stored flow; and in this way, the pre-stored flow selected by the user is acquired, where the pre-stored flow is used to perform data exploration on the data.
[0046] The pre-stored flow includes a node, a path, the method, and the flow program code. The node and the path constitute the operation. The method includes a pre-stored method. The flow program code is used to execute the pre-stored flow. The pre-stored method includes a statistical method and method program code. The method program code is used to execute the pre-stored method.
[0047] The flow program code invokes the method program code to generate the output program code.
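The relationship among the node, the path, the method, and the program code can be illustrated with a minimal sketch. The sketch is not taken from the disclosure: the class names `PreStoredMethod` and `PreStoredFlow`, the `{var}` template convention, and the two example methods are assumptions made for illustration, and the flow is assumed to be a simple chain in which each node has at most one outgoing path.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PreStoredMethod:
    name: str          # statistical method name (hypothetical examples below)
    program_code: str  # method program code, here a template with a {var} slot


@dataclass
class PreStoredFlow:
    nodes: List[str]                     # first entry is taken as the start node
    paths: List[Tuple[str, str]]         # directed edges (from_node, to_node)
    methods: Dict[str, PreStoredMethod]  # node name -> method applied at that node

    def generate_output_code(self, variable: str) -> str:
        """Stand-in for the flow program code: walk the paths from the start
        node and invoke each node's method program code for the variable."""
        successor = dict(self.paths)  # assumes an acyclic chain of nodes
        lines = []
        node = self.nodes[0]
        while node is not None:
            lines.append(self.methods[node].program_code.format(var=variable))
            node = successor.get(node)
        return "\n".join(lines)


# Hypothetical two-node flow: count the samples, then compute their mean.
flow = PreStoredFlow(
    nodes=["count", "mean"],
    paths=[("count", "mean")],
    methods={
        "count": PreStoredMethod("sample size", "n = len(data[{var!r}])"),
        "mean": PreStoredMethod("arithmetic mean", "mean = sum(data[{var!r}]) / n"),
    },
)
output_code = flow.generate_output_code("systolic")
```

In this sketch the output program code is plain text, so it can be stored and displayed next to the flow that produced it, and run later against the data content.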
[0048] The step of pre-stored flow selection and the step of program code generation constitute a standardized exploration process. One data exploration requires a plurality of flows, and a plurality of methods are used in each flow. As shown in
[0049] A first embodiment of the present invention is applied to scientific research data exploration. As shown in
[0050] Data acquisition: acquire systolic and diastolic pressure data content of a hypertensive patient group and a normal control group and an exploration variable that are input by a user;
[0051] pre-stored flow selection: acquire a two-sample mean comparison flow selected by the user, where the flow is used to perform data exploration on the data;
[0052] program code generation: acquire an operation, a method, and flow program code of the two-sample mean comparison flow, and generate and store corresponding output program code;
[0053] program code running: run the output program code of the two-sample mean comparison flow, and acquire and store a corresponding running result; and
[0054] result display: display the two-sample mean comparison flow, the corresponding output program code, and the corresponding running result.
[0055] As shown in
[0056] Normality test: determine whether a sample quantity of the data content is greater than 5000. If yes, perform a Kolmogorov-Smirnov test on the data content, and output a normality test result; otherwise, perform a Shapiro-Wilk test on the data content, and output a normality test result.
[0057] Homoscedasticity test: If the normality test result is that the data content conforms to a normal distribution, perform an F-test on the data content, and output a homoscedasticity result; otherwise, perform a Wilcoxon rank-sum test on the data content.
[0058] Mean comparison: If the homoscedasticity result is that the data content conforms to homoscedasticity, perform a t-test on the data content; otherwise, perform Welch's approximate t-test on the data content.
[0059] In conclusion, for the two-sample mean comparison flow, the t-test is used when the systolic and diastolic pressure data of the hypertensive patient group and the normal control group conforms to the normal distribution and homoscedasticity; Welch's approximate t-test is used when the data conforms to the normal distribution but does not conform to homoscedasticity; and the Wilcoxon rank-sum test is used when the data does not conform to the normal distribution.
[0060] Accordingly, in this embodiment, the pre-stored methods used in the two-sample mean comparison flow include: the Kolmogorov-Smirnov test, the Shapiro-Wilk test, the Wilcoxon rank-sum test, the F-test, Welch's approximate t-test, and the t-test.
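The branching of the two-sample mean comparison flow described above can be sketched as a small decision function. The sketch only selects which tests would be run; it does not compute any statistic. The function names and the boolean inputs `is_normal` and `equal_variance` are assumptions for illustration, standing in for the outcomes of the normality test and the F-test; how those outcomes are derived for the two groups (for example, the significance level) is not specified in the description.

```python
from typing import List, Optional

LARGE_SAMPLE_THRESHOLD = 5000  # sample quantity threshold stated in the description


def choose_normality_test(sample_size: int) -> str:
    """Kolmogorov-Smirnov above the threshold, Shapiro-Wilk otherwise."""
    if sample_size > LARGE_SAMPLE_THRESHOLD:
        return "Kolmogorov-Smirnov test"
    return "Shapiro-Wilk test"


def choose_comparison_test(is_normal: bool, equal_variance: Optional[bool]) -> str:
    """Pick the final comparison from the two earlier outcomes."""
    if not is_normal:
        return "Wilcoxon rank-sum test"
    if equal_variance:
        return "t-test"
    return "Welch's approximate t-test"


def plan_two_sample_mean_comparison(
    sample_size: int, is_normal: bool, equal_variance: Optional[bool] = None
) -> List[str]:
    """Return the ordered list of tests the flow would run."""
    steps = [choose_normality_test(sample_size)]
    if is_normal:
        steps.append("F-test")  # homoscedasticity is only checked for normal data
    steps.append(choose_comparison_test(is_normal, equal_variance))
    return steps


plan = plan_two_sample_mean_comparison(6000, is_normal=True, equal_variance=False)
```

Keeping the decision separate from the statistics means the same plan can drive whichever statistics library the method program code targets; with SciPy, for instance, Welch's test is available as `scipy.stats.ttest_ind` with `equal_var=False`.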
[0061] In this embodiment, for ease of description, only the two-sample mean comparison flow is described as an example. In an actual exploration process, many flows need to be performed for exploring the hypertension research data. For example, a two-sample rate comparison flow is selected; an operation, a method, and flow program code of the two-sample rate comparison flow are acquired, and corresponding output program code is generated and stored; the corresponding output program code is run, and a corresponding running result is acquired and stored; and the two-sample rate comparison flow, the corresponding output program code, and the corresponding running result are displayed for comparing whether there is a difference in rate between the hypertensive patient group and the normal control group.
[0062] A second embodiment of the present invention is applied to health data exploration. On the basis of the first embodiment, the flow, the method, and the like for exploring the hypertension research data in the first embodiment are changed to a corresponding flow and method for exploring health data, thereby facilitating a user in doing research on health data exploration.
[0063] A third embodiment of the present invention is applied to education data exploration. On the basis of the first embodiment, the flow, the method, and the like for exploring the hypertension research data in the first embodiment are changed to a corresponding flow and method for exploring education data, thereby facilitating a user in doing research on education data exploration.
[0064] A data exploration management system includes a pre-stored method module, a pre-stored flow module, a data acquisition module, a flow selection module, a program code generation module, a program code running module, and a result display module. The pre-stored method module is connected to the pre-stored flow module. The pre-stored flow module and the data acquisition module are connected to the flow selection module. The flow selection module is connected to the program code generation module. The program code generation module is connected to the program code running module. The program code running module is connected to the result display module.
[0065] The data acquisition module acquires data input by a user. The flow selection module acquires a pre-stored flow in the pre-stored flow module based on the data. The pre-stored flow module acquires a pre-stored method and method program code in the pre-stored method module based on the pre-stored flow. The flow selection module acquires a pre-stored flow selected by the user. The program code generation module generates and stores output program code. The program code running module runs the output program code, and acquires and stores a running result. The result display module displays the pre-stored flow, the output program code, and the running result.
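One possible way to wire these modules together is sketched below. It is an illustration under stated assumptions rather than the disclosed implementation: `ExplorationSystem`, `RunRecord`, and `mean_flow` are hypothetical names, each pre-stored flow is reduced to a function that returns output program code as text, and the generated code is assumed to leave its outcome in a variable named `result`. Python's built-in `exec` is used only to keep the sketch short; a real system would need to sandbox the code it runs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class RunRecord:
    flow_name: str    # which pre-stored flow was selected
    output_code: str  # stored output program code
    result: Any       # stored running result


def mean_flow(variable: str) -> str:
    """Hypothetical pre-stored flow: emits code computing a mean."""
    return f"result = sum(data[{variable!r}]) / len(data[{variable!r}])"


class ExplorationSystem:
    def __init__(self, flows: Dict[str, Callable[[str], str]]):
        self.flows = flows                  # pre-stored flow module
        self.records: List[RunRecord] = []  # stored code and results

    @staticmethod
    def _execute(output_code: str, data: dict) -> Any:
        namespace = {"data": data}
        exec(output_code, namespace)  # illustration only; not sandboxed
        return namespace["result"]

    def run(self, data: dict, variable: str, flow_name: str) -> RunRecord:
        generate = self.flows[flow_name]           # flow selection
        output_code = generate(variable)           # program code generation
        result = self._execute(output_code, data)  # program code running
        record = RunRecord(flow_name, output_code, result)
        self.records.append(record)
        return record

    def rerun(self, record: RunRecord, data: dict) -> Any:
        """Repeat an exploration from its stored output program code."""
        return self._execute(record.output_code, data)

    @staticmethod
    def display(record: RunRecord) -> str:
        return (f"flow: {record.flow_name}\n"
                f"code: {record.output_code}\n"
                f"result: {record.result}")


system = ExplorationSystem({"mean": mean_flow})
data = {"systolic": [120, 130, 140]}
record = system.run(data, "systolic", "mean")
```

Because each record keeps the generated code alongside the result, calling `system.rerun(record, data)` on the same data reproduces the stored result.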
[0066] An electronic device includes a processor, a memory, and a program. The program is stored in the memory and is configured to be executed by the processor. The program is used to execute the foregoing data exploration management method. A computer-readable storage medium stores a computer program. The computer program is executed by a processor to implement the foregoing data exploration management method.
[0067] According to the present invention, the pre-stored flow is selected; the operation, the method, and the flow program code of the pre-stored flow are acquired, and the output program code is generated and stored; the output program code is run, and the running result is acquired and stored; and the pre-stored flow, the output program code, and the running result are displayed. In this way, scientific management is implemented on the data exploration process, thereby ensuring that the data exploration process is repeatable, and that the operation and the method of the data exploration process can be invoked, shared, and reused in real time.
[0068] The foregoing implementations are merely preferred implementations of the present invention, and the protection scope of the present invention cannot be limited thereto. Any insubstantial changes and substitutions made by a person skilled in the art on the basis of the present invention fall within the protection scope claimed by the present invention.