SYSTEM AND METHOD FOR DATA PROCESS
20230052603 · 2023-02-16
Inventors
Cpc classification
G16H15/00
PHYSICS
International classification
G06F16/28
PHYSICS
Abstract
A system for data process comprises an operating platform for storing and reading a data unit, and a data processing module signally connected to the operating platform. The data unit is structured or unstructured. The data processing module labels and processes the data unit and generates a visualization diagram. The system for data process includes a graphical user interface, and can achieve one of the purposes of the present disclosure, namely improving the data visualization of structured data and unstructured data.
Claims
1. A system for data process, comprising: an operating platform for storing and reading a data unit; a data processing module, signally connected to the operating platform; and the data unit is structured or unstructured; wherein the data processing module labels and processes the data unit, and generates a visualization diagram.
2. The system of claim 1, further comprises: an unstructured database; a structured database; the operating platform signally connected to the unstructured database and the structured database; wherein at least one of the unstructured database and the structured database storing at least one of the data unit; and the data processing module labeling and processing at least one of the data unit, and generating the visualization diagram.
3. The system of claim 2, wherein the unstructured database stores at least one of the data unit, and the structured database stores at least one another data unit.
4. The system of claim 2, wherein the data processing module further used for automatically labeling the data unit; and when the data unit is unstructured, the data processing module automatically labeling and processing the data unit; wherein the data processing module determines whether to automatically label the at least one data unit according to which of the database the at least one data unit belongs: when the at least one data unit is stored in the unstructured database, the data processing module automatically labels and processes the at least one data unit, and generates the visualization diagram; and when the at least one data unit is stored in the structured database, the data processing module processes and generates the visualization diagram.
5. The system of claim 1, wherein the data processing module is further used to process the at least one data unit and generates a visualizable data set; wherein the data processing module generates the visualization diagram according to the visualizable data set.
6. The system of claim 5, wherein the operating platform further used to import the visualizable data set, which is generated by the data processing module pre-processed the at least one data unit; the data processing module generates the visualization diagram according to the visualizable data set being imported.
7. The system of claim 5, further comprising a visualizing module signally connect to the operating platform and the data processing module; wherein the visualizing module generates the visualization diagram according to the visualizable data set.
8. The system of claim 2, wherein the at least one data unit comprises unstructured data, structured data, semi-structured data, or a combination thereof.
9. The system of claim 8, wherein the structured data includes structured patient data, medication record, drug data, medicine data (pharmaceutical raw materials), doctor's advice data, doctor's data, equipment data, department data, hospitalization data, examination data, shift report, or a combination thereof.
10. The system of claim 8, wherein the unstructured data includes unstructured texts, case history, doctor's advice, patient data, medication record, medicine profile data, drug profile data, doctor's data, equipment data, department data, hospitalization data, examination data, shift report, or a combination thereof.
11. The system of claim 1, the operating platform comprises a visualization diagram editing interface used to configure the generating conditions of the visualization diagram and decide a diagram generating condition; wherein the data processing module processes the at least one data unit and generates the visualization diagram according to the diagram generating condition.
12. The system of claim 11, the diagram generating condition comprises displayed field condition, data screening condition, diagram type, or a combination thereof.
13. The system of claim 11, wherein the data unit comprises a data value, corresponding to a data attribute of the data unit; and the diagram generating condition decide the visualization diagram displays at least one of the data attribute.
14. The system of claim 11, wherein the diagram generating condition comprises: a data screening condition, having: data set condition, classification and clustering condition, data attribute condition, arithmetic unit condition, tagged value condition, or any combination thereof, wherein the data processing module screens the data value according to the diagram generating condition, and processes the data value after being screened to generate the visualization diagram.
15. The system of claim 11, wherein the diagram generating condition comprises: a diagram category, having: data list, data table, basic frequency table, percentage table, co-occurrence matrix, co-occurrence list, population distribution, bar graph, line graph, table, pie graph, histogram, statistical graph, scatter graph, bubble graph, surface graph, radar graph, horizontal bar graph, timeline, organ/body tissue chart, or a combination thereof.
16. The system of claim 11, further comprises: a historical data module signally connected to the data processing module and the operating platform for storing the diagram generating condition, the visualization diagram, or any combination thereof.
17. The system of claim 1, wherein the at least one data unit comprises at least one data value; wherein the operating platform is used to generate a project data set, including at least one project data attribute; wherein the operating platform determines that the at least one project data attribute corresponds to the data attribute, which the at least one data value belongs to; and the data processing module generates the visualization diagram according to the project data set.
18. The system of claim 17, wherein the project data set comprises: a plurality of project data attributes, and the at least one data unit including a plurality of data values; wherein the operating platform determines: one of the plurality of project data attributes corresponds to the data attribute one or more than one of the plurality data values belong to; or more than one of the plurality of data attributes correspond to the data attribute one or more than one of the plurality of data values belong to.
19. The system of claim 1, further comprises a plurality of data units, and any one of the plurality of the data units includes at least one data value; wherein the operating platform is further used to determine at least one relationship key; wherein the relationship key corresponds to one of the plurality of data units and another one of the plurality of data units; and wherein the data processing module further generates the visualization diagram according to the at least one relationship key.
20. The system of claim 19, wherein the at least one relationship key corresponds to the data attribute belonging to the at least one data value of one of the plurality of data units and the data attribute belonging to the at least one data value of the other of the plurality of data units; wherein data processing module processes the plurality of data values and generates the visualization diagram.
21. The system of claim 1, wherein the data unit comprises a plurality of data values, and the data processing module determines at least two of the plurality of data values at least one relevance determination, wherein the relevance determination comprises “greater than”, “equal to”, “less than”, “not equal to”, “greater than or equal to”, “including”, or “less than or equal to”; the data processing module further generates the visualization diagram according to the relevance determination.
22. The system of claim 21, wherein when the at least two of the plurality of data values are corpus data, the relevance determination includes a logical operation judgment, a similar semantic judgment, or a combination thereof.
23. The system of claim 1, wherein the data unit comprises: a patient data set, including a plurality of data values; the data process module processes the patient data set to generate the visualization diagram; wherein any one of the plurality of data values corresponds to at least one data attribute, the data processing module generates the visualization diagram according to the plurality of data values and the data attribute; wherein the visualization comprises representations of the at least one data attribute.
24. The system of claim 23, wherein the data attribute comprises a plurality of time data, a first data attribute, and a second data attribute; one of the plurality of data values corresponds to the first data attribute and one of the plurality of time data; another one of the plurality of data values corresponds to the second attribute and the one of the plurality of time data; wherein the data processing module generates a co-occurrence analysis result according to the one of the plurality of time data, the one of the plurality of data values, and the another one of the plurality of data values; the visualization diagram comprises the co-occurrence analysis result.
25. The system of claim 24, comprises an analysis module signally connected to the operating platform; the analysis module used to process the co-occurrence analysis result, predict the patient data set, and generate a predicted result.
26. The system of claim 24, the co-occurrence result further comprising statistical analysis result, probability analysis result, or a combination thereof.
27. The system of claim 26, the plurality of time data comprises a time record of patient's examination, a time record of patient's examination report being made, a time record of patient's visit a doctor, or a combination thereof.
28. The system of claim 1, further comprising: a labeling database and an unlabelling database; wherein the data processing module comprises a labelled data expansion module; the operating platform signally connected to the labeling database, the unlabeling database and the labelled data expansion module, and comprised a labeling pattern editing interface, and the labeling pattern editing interface used for inputting data units and performing editing operation to generates at least one confirmation labeling pattern; wherein the labelled data expansion module performs operation to generates at least one added labeling data unit according to the at least one confirmation labeling pattern and the unlabeling database, and restores the at least one added labeling data unit to the labeling database; wherein the data processing module used for processing the added labelled data unit, and generating a visualization diagram.
29. A method for data process, a data processing module is used to process one of unstructured data and structured data, to process at least one data unit, and perform the following steps: (A) a step for project creation: an operating platform generates a project data set for accessing the at least one data unit including a plurality of data values; the project data set includes at least one data attribute; (B) a step for semantic labeling decision and labeling process: the data processing module processes the at least one data unit, determines a semantic labeling decision, and outputs a labelled data unit, the semantic labeling decision makes one of the plurality of data values to correspond to the at least one data attribute; when the at least one data unit is unstructured data format, the data processing module performs a step for automatic semantic labeling to complete the semantic labeling decision makes one of the plurality of data values to correspond to the at least one data attribute; (C) a step for data unit storage: the data processing module stores the labelled data unit to the project data set; and (D) a step for process and output: the data processing module processes the project data set to generate a visualization diagram.
30. The method of claim 29, further comprises a plurality of data units, wherein before the step of (D) is executed, the method comprises: (E) a step for determining a relationship key: the data processing module determines a relationship key, and the relationship key determines one of the plurality of data values of one of the plurality of data units corresponding to one of the at least one data attribute of the project data set.
31. The method of claim 29, further comprises: (F) determining visualization diagram generating condition, an operating platform is connected to the data processing module, the operating platform determines a diagram generation condition, and after the data processing module screening the project data set according to the diagram generation condition, executes the step of (D).
32. The method of claim 29, further comprises: (F) a step for screening data units: the operating platform determines a data screening condition, the data processing module performs step (C) and step (D) after screening the data unit according to the data screening condition.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0028] The term “signally connected” used in the detailed description of the present disclosure (including the scope of the claims) can refer to any direct or indirect connection means. For example, if the text describes a data processing module as signally connected to an operating platform, this should be interpreted to mean that the data processing module can be connected to the operating platform directly, or indirectly through other devices or certain connection means. In addition, wherever possible, elements/components/steps with the same reference numerals in the drawings and embodiments represent the same or similar parts. For elements/components/steps that use the same reference numerals or the same terms in different embodiments, the related descriptions may be cross-referenced.
[0029] Refer to
[0038] As aforesaid, when the data units A and B stored (or temporarily stored) in the memory 1021 of the operating platform 101 are unstructured data units, the operating platform 101 allows the system user to choose between manually labeling the data units A and B through the GUI provided on the operating platform 101 and having the data processing module 102 label the data units A and B automatically. When the system user performs manual labeling, the operating platform 101 allows the system user to determine or edit a label of the data unit A and the data unit B.
[0039] This embodiment can be applied when the data unit A and the data unit B are both in a structured data format or both in an unstructured data format; it can also be applied when only one of the data unit A and the data unit B is in the unstructured data format. As a result, one of the effects achieved by this embodiment is that system users can easily operate on the data unit A and the data unit B, which may be in different data formats, through the operating platform 101 to visualize the data. This embodiment thereby provides a better user experience, is easy to use, and is readily introduced into various industries.
[0040] In another embodiment, the data unit A, the data unit B, or both may be a labelled data unit A′ or a labelled data unit B′ after performing “labeling” in advance. In this way, the previously labelled text content can be imported to the data processing module 102 by the operating platform 101 and enable the data processing module 102 to process and generate the visualization diagram 103. Thereby, the data processing module 102 can save labeling time.
[0041] Please refer to
[0042] The visualization module 106, which is signally connected to the data processing module 102, may include, for example, Matplotlib, Pyecharts, Plotly, Bokeh, Seaborn, Python-based visualization plug-ins, R language-based visualization plug-ins, or other database-based visualization tools.
[0043] The unstructured database 104 is used to store the data unit A in an unstructured data format. The structured database 1051 is used to store the data unit B in the structured data format. Specifically, each database can be a database management system similar to MySQL, MSSQL, Apache HBase, etc.
[0044] The operating platform 101 allows a system user to access the data unit A from the unstructured database 104 or the data unit B from the structured database 105.
[0045] The data processing module 102 is used to process the data unit A and the data unit B, and can output a visualizable data set V1. The system user can also import the visualizable data set V1 to the data processing module 102 through the operating platform 101. The visualization module 106 generates a visualization diagram 103 according to the visualizable data set V1.
[0046] The visualizable data set V1 includes one or more data units that can be visualized by the data processing module 102, for example, data units in a structured data format. In this embodiment, the visualizable data set V1 includes a labelled data unit C′, which is labelled by the data processing module 102 in advance. The data processing module 102 can copy or move the labelled data unit A′ or the labelled data unit B′ to the visualizable data set V1.
[0047] The system user can use the operating platform 101 to enable the data processing module 102 to process the labelled data unit A′, the labelled data unit B′, the labelled data unit C′, or a combination thereof in the visualizable data set V1. Thereby, when the system user operates the operating platform 101, the data processing module 102 can read one or more of the plurality of data units in the visualizable data set V1 and generate a visualization diagram 103. To achieve the aforesaid purpose, the visualizable data set V1 can be stored in the structured database 1052. The data processing module 102 processes the labelled data unit A′ and the labelled data unit C′ in the visualizable data set V1 to generate the visualization diagram 103.
[0048] The structured database 1052 of this second embodiment can be used to store and read the visualizable data set V1. In this way, the data units that have been labelled in advance or the data units that are labelled and processed LP can be stored in the structured database 1052 as part of the visualizable data set V1. The operating platform 101 can read the visualizable data set V1 from the structured database 1052 and import it to the data processing module 102 in order to output the visualization diagram 103. In this way, the operations of a system user can be simplified, and the processing time required by the system can be shortened. In another variation of the second embodiment, the data unit that has been labelled or the data unit that has been labelled and processed LP can be stored as a computer-readable file, such as JSON; it need not be stored in a structured database or an unstructured database.
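The file-based variation can be illustrated with a minimal sketch that serializes a labelled data unit to JSON and reads it back. The disclosure does not specify a schema, so the field names (`unit_id`, `values`, `attribute`, `value`), the function names, and the file name below are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical labelled data unit: each labelled value is paired with its data attribute.
labelled_unit = {
    "unit_id": "A'",
    "values": [
        {"attribute": "medical record number", "value": "17-L"},
        {"attribute": "findings/diseases", "value": "myoma"},
    ],
}

def save_visualizable_data_set(units, path):
    """Write a list of labelled data units to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"units": units}, f, ensure_ascii=False, indent=2)

def load_visualizable_data_set(path):
    """Read the labelled data units back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["units"]

# Assumed location; any writable path would do.
path = os.path.join(tempfile.gettempdir(), "visualizable_data_set_v1.json")
save_visualizable_data_set([labelled_unit], path)
restored = load_visualizable_data_set(path)
```

Because the round trip returns plain dictionaries and lists, the restored units could be handed to a processing step without a database connection.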
[0049] In another variation of the second embodiment, the structured database 1051 and the structured database 1052 may be the same one.
[0050] Next, please refer to
[0051] Data Value
[0052] In the third embodiment, the data unit A is in a structured data format, such as patient data. The data unit B is in an unstructured data format, such as an MRI report. The data unit A includes data values a1 to a3, and the data unit B includes data values b1 to b5.
[0053] The data values a1 to a3 can be stored in the data unit A in a structured data format. Specifically, the data value refers to a value stored in a storage field in a data unit, such as a character, a word, a value, a bit value, etc. For example, if the data unit is patient data in a structured data format, the data value can be a record or a tuple in the data unit, such as “name,” “Flora(name),” “ (Medical record number),” “17-L.” In this case, the data value may include multiple characters, numbers, bit values, or symbols. In other embodiments of the present disclosure, the data value after being labelled by the data processing module 102 may be the aforementioned record or tuple including multiple characters, numbers, bit values, or symbols.
[0054] The data values b1 to b5 refer to a character, a number, a bit value, or a symbol in the data unit B that can be recognized by the data processing module 102, for example, Arabic numerals, Chinese characters, English characters, or symbols. For further example, if the data unit is a medical record table in an unstructured data format, the data values can be, for example: “n,” “a,” “m,” “e,” “f,” “l,” “o,” “r,” “a,” “,” “ ,” “ ,” “ ,” “1,” “7,” “-,” “L,” “c,” “a,” “n,” “c,” “e,” “r.” The labelled value a1′ and the labelled value b1′ refer to the words, values, or data of the data unit A and the data unit B after being automatically or manually labelled by the data processing module 102, for example: “flora,” “17-L,” “cancer.”
[0055] Data Attributes
[0056] The data attributes DA1 to DA3 can be the “field name” and “label name” in a structured data format. More specifically, for example, “medical record number,” “name,” “gender,” “residence,” “time,” “the findings/diseases,” and “tumor invasion location” are, in order, the data attributes of the data values “17-L,” “Xiao-Ming(name),” “Male,” “Taoyuan City,” “Jul. 15, 2021 at 10:10 AM,” “Liver Cancer,” and “Liver.”
[0057] In some embodiments, the operating platform 101 can import the data value a1 or the data value b1 of the data unit A or the data unit B to the data processing module 102. The system user can manually label the data value a1 as corresponding to a data attribute DA1 through the operating platform 101, and then store it in the project data set as the labelled value a1′. For example, the data unit A in the structured data format includes the data values “job occupation” and “office worker”. A symbol such as “#”, “$”, “%”, or another character can be placed before the data value a1 as a label, for example: “#job occupation”. In this way, the data processing module 102 can identify “job occupation” as a data attribute.
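The symbol-prefix convention can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_attributes` is hypothetical, and the rule that every unmarked value is filed under the most recent marked attribute is an assumption that the disclosure does not state.

```python
LABEL_SYMBOLS = ("#", "$", "%")  # marker characters named in the text

def split_attributes(data_values):
    """Separate marked data attributes from plain data values.

    A value that starts with one of LABEL_SYMBOLS is treated as a data
    attribute name. Each following unmarked value is filed under that
    attribute (an assumed rule, for illustration only).
    """
    labelled = {}
    current = None
    for value in data_values:
        if value.startswith(LABEL_SYMBOLS):
            current = value[1:].strip()
            labelled.setdefault(current, [])
        elif current is not None:
            labelled[current].append(value)
    return labelled

result = split_attributes(["#job occupation", "office worker"])
```

Here `result` maps the attribute name "job occupation" to the list of values that follow it.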
[0058] Specifically, in some other embodiments, the operating platform 101 may further define a data attribute as a semantic labeling decision. That is, a data attribute tag is a “semantic tag” in natural language processing. When data in an unstructured data format is labelled, it can be labelled with semantic tags through the data processing module.
[0059] In some embodiments, the data attribute can further be used by the data processing module for statistical analysis involving clustering or classification. For example, the data attributes can be “residence”, “marital status”, “education”, “age”, etc., which are commonly used as data attributes for clustering or classification in statistical analysis. In this way, the visualization diagram can present the classifications or the clusterings.
[0060] In some embodiments, as shown in
[0061] In some embodiments, the data attributes may be used by the system for data processing as a basis for clustering or classification. For example, the data attribute may be “positive comment”, “symptoms”, “unpopular”, “region”, or other similar classifications or clusterings. In this way, the visualization diagram can present a diagram of classification or clustering. In other embodiments, a clustering or a classification may further include one or more clusterings or classifications. That is, the clustering or the classification can include one or more superordinate categories, and the one or more superordinate categories include one or more sub-categories. For example, the superordinate category may be “Northern Taiwan”; the sub-categories may be “Taipei City”, “New Taipei City”, etc. In other words, each data attribute can include one or more data attributes.
[0062] Project Data Set
[0063] The operating platform 101 can edit the project data set PR. Specifically, the project data set PR allows a system user to establish, through the operating platform 101, the correct data attribute corresponding to the data unit. The data attributes DA1 and DA2 can be created in the project data set PR in advance; or they can be created in the project data set PR automatically, based on the data attribute DA1 and the data attribute DA2 provided in the imported data unit A; alternatively, the corresponding data attribute DA2 can be created in the project data set PR according to the data attribute determined by the data processing module 102 after it labels the data unit B. For example:
[0064] The data attribute DA1 is “medical record number.”
[0065] The data attribute DA2 is “discovery.”
[0066] The data attribute DA3 is “age.”
[0067] The data value a1 is “17-L.”
[0068] The data value a2 is “cervix cancer.”
[0069] The data value a3 is “36”.
[0070] The data values b1-b5 are “m”, “y”, “o”, “m”, and “a” respectively. The data processing module 102 labels the data values b1-b5 as “the findings/diseases.”
[0071] The labelled value a1′ is “17-L”. The labelled value b1′ is “myoma”, and the data processing module 102 can store the labelled value b1′ of “the findings/diseases” in the project data set PR corresponding to the data attribute DA2.
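The example values above can be traced with a short sketch. It assumes, for illustration only, that the project data set PR is a mapping from data attribute names to lists of labelled values and that the label “the findings/diseases” is filed under the data attribute DA2 (“discovery”); the helper name `store_labelled_value` is hypothetical.

```python
# Data attributes DA1 to DA3 of the project data set PR (assumed representation).
project_data_set = {
    "medical record number": [],  # DA1
    "discovery": [],              # DA2
    "age": [],                    # DA3
}

def store_labelled_value(project, attribute, value):
    """Store a labelled value under an existing data attribute of the project data set."""
    if attribute not in project:
        raise KeyError(f"unknown data attribute: {attribute}")
    project[attribute].append(value)

# Structured data unit A: each data value already belongs to a field.
store_labelled_value(project_data_set, "medical record number", "17-L")
store_labelled_value(project_data_set, "discovery", "cervix cancer")
store_labelled_value(project_data_set, "age", "36")

# Unstructured data unit B: the characters b1 to b5 are merged into one labelled value b1'.
b_values = ["m", "y", "o", "m", "a"]
b1_labelled = "".join(b_values)
store_labelled_value(project_data_set, "discovery", b1_labelled)
```

Rejecting unknown attributes reflects the idea that data attributes are established in the project data set before values are stored against them.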
[0072] In some embodiments, the data processing module 102 can identify whether the data unit A or the data unit B is in an unstructured data format or a structured data format.
[0073] Then please refer to
[0095] The historical data module 107 is used to store the diagram generating condition 1013, the visualization diagram 103, or a combination thereof. In detail, the historical data module 107 can store the data screening condition 1015, the diagram category 1017, and the displayed field condition 1019 determined by the system user.
[0096] Next please refer to
[0097] In addition, the present embodiment differs from the others in that both the data unit A and the data unit B are in a structured data format; or the data unit B has been labelled by the data processing module 102 and can be identified as being in a structured data format. In other words, the data value b1 and the data value b2 of the data unit B correspond to fields or data attributes.
[0098] Next, please refer to
[0099] The patient data set P1 includes data values a1-a2. The data value a1 corresponds to the first data attribute D3, and the data value a2 corresponds to the second data attribute D4. The data value a1 and the data value a2 correspond to the time data T1 and the time data T2, respectively. The time data T1-T2 can refer to the times recorded for the data values a1-a2. For example, the data value a1 is “cervical cancer,” the time data T1 is “May 11, 2021,” and the first data attribute is “findings/diseases.”
[0100] In this variation, the patient data set P1 can be used to analyze the condition of a patient at different points in time. For example, “cervical cancer” continues to appear in a patient's “findings/diseases,” and on a certain day “Uterine Fibroids” is added to “findings/diseases.” The above purpose can be accomplished by this embodiment, and please refer to
[0101] The data value a1 corresponding to the first data attribute D3 is labelled as “cervical cancer.” The time data T1 corresponding to the first data attribute D3 is “May 11, 2020.” The data value a2 corresponding to the second data attribute D4 is labelled as “Uterine Fibroids,” and the time data T2 corresponding to the second data attribute D4 is “Dec. 11, 2020.” In this way, when the data processing module 102 labels and processes the data unit A, a visualization diagram 103 can be generated. Please refer to the visualization diagram 103 in
[0102] Another example is the following scenario: a patient's “findings/diseases” is “Uterine Fibroids,” and the change in “size” of the uterine fibroids is tracked over a period of continuous observation. As in the aforementioned embodiments, the first data attribute D3 and the second data attribute D4 of the patient data set P1 are “findings/diseases.” The data values a1-a2 are numerical data and correspond to the time data T1-T2, respectively. When the data processing module 102 labels a data value, the data value can be identified as numerical data.
[0103] In order for the data processing module 102 to identify numerical data, the data values a1-a2 may be manually labelled through the operating platform 101, or may be automatically labelled by the data processing module 102. To further explain, a label may be placed in front of the data value a1 by using symbols such as “#,” “$,” “%,” or other characters, for example: “#job occupation.” In this way, the data processing module 102 can identify “job occupation” as a data attribute.
[0104] Please refer to
[0105] Please refer to
[0106] The so-called co-occurrence analysis may include analyzing data units using Pearson Product-Moment Correlation Coefficient (PPMCC); in some embodiments, Mutual Information (MI) analysis may also be applied to the data unit.
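Both measures named above can be sketched in plain Python. The sketch assumes, for illustration only, that each finding is encoded as a 0/1 indicator per examination date; the disclosure does not specify an encoding, and the function and variable names are hypothetical. The Pearson function also assumes that neither sequence is constant, since the coefficient is undefined in that case.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mutual_information(xs, ys):
    """Mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        # p_xy / (p_x * p_y) simplifies to c * n / (count_x * count_y).
        mi += p_xy * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

# Hypothetical indicators: 1 if the finding appears in the report for a date, else 0.
cervical_cancer = [1, 1, 0, 0]
uterine_fibroids = [1, 1, 0, 0]
ppmcc = pearson(cervical_cancer, uterine_fibroids)
mi = mutual_information(cervical_cancer, uterine_fibroids)
```

The correlation coefficient captures linear association between numerical sequences, whereas mutual information also applies to categorical labels, which is one reason the two measures might be offered as alternatives.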
[0107] Please refer to
[0108] As a further supplementary explanation, a particular patient can be analyzed by establishing a patient data set or by establishing a relationship key. Specifically, by establishing the relationship key, one or more data values of multiple data units can be made to correspond to the same patient (that is, to the same primary key). For instance, if one or more data values are presented on the visualization diagram as a broken line graph, an end point of the broken line graph can represent a data value in a patient data set P1. By clicking on the end point, the system user can view the data values, or a summary or information of the data attributes, of that patient stored in other data units.
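The role of the relationship key can be sketched as a join between two data units. The sketch assumes that each data unit is a list of records keyed by data attribute and that “medical record number” serves as the relationship key; the record contents and the function name are hypothetical.

```python
def join_on_relationship_key(unit_a, unit_b, key):
    """Combine records of two data units that share the same value for `key`."""
    index_b = {}
    for record in unit_b:
        index_b.setdefault(record[key], []).append(record)
    joined = []
    for record_a in unit_a:
        for record_b in index_b.get(record_a[key], []):
            joined.append({**record_a, **record_b})
    return joined

# Hypothetical data units; "medical record number" plays the role of the relationship key.
patients = [{"medical record number": "17-L", "name": "Flora"}]
reports = [
    {"medical record number": "17-L", "findings/diseases": "cervical cancer"},
    {"medical record number": "18-M", "findings/diseases": "myoma"},
]
patient_view = join_on_relationship_key(patients, reports, "medical record number")
```

Only the report that shares the patient's key is merged, which is what would allow a point on a diagram to lead back to the other data stored for the same patient.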
[0109] Please refer to
[0110] Please refer to
[0111] Next please refer to
[0112] The correlation judgment may further include a logical operation. Specifically, for instance, when the labelled data values of the data unit A and the data unit B are all numerical values, the correlation judgment can determine whether the data values to be used to generate the visualization diagram are redundant or conflicting. For example, when a1=30 and b1=30, the correlation judgment R1 can be “intersection.” In this way, the data unit A and the data unit B can be combined and used by the data processing module 102 to generate the visualization diagram 103.
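The comparison-based judgments can be sketched with Python's standard `operator` module. The comparison names follow the relevance determinations listed in claim 21; reading “including” as a containment test and using equality to detect redundant values when combining data units are assumptions made for illustration, and the function names are hypothetical.

```python
import operator

# Comparison names mapped to Python callables.
# "including" is interpreted here as a containment test, which is an assumption.
RELEVANCE_OPERATORS = {
    "greater than": operator.gt,
    "equal to": operator.eq,
    "less than": operator.lt,
    "not equal to": operator.ne,
    "greater than or equal to": operator.ge,
    "less than or equal to": operator.le,
    "including": operator.contains,
}

def relevance(a, b, name):
    """Evaluate one relevance determination between two labelled data values."""
    return RELEVANCE_OPERATORS[name](a, b)

def merge_without_duplicates(values_a, values_b):
    """Combine two lists of values, keeping a value found in both only once."""
    merged = list(values_a)
    for b in values_b:
        if not any(relevance(a, b, "equal to") for a in merged):
            merged.append(b)
    return merged

combined = merge_without_duplicates([30, 42], [30, 50])
```

With a1=30 and b1=30 the equality judgment holds, so the shared value appears once in the combined list.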
[0113] In some embodiments, the data unit is processed into the project data set and then visualized by the data processing module. The data attributes of the project data set are defined through the operating platform, and are processed and generated by the data processing module. In this way, by using the correlation judgment R1 to judge the relationship between the data values, the data processing module can copy, move, or delete the data values of the data unit and the data values of the project data set.
[0114] Next please refer to
[0115] The labelled database 2025 is used to store labelled data units, and the unlabelled database 2026 is used to store unlabelled data units.
[0116] The operating platform 201 includes a labeling pattern editing interface 2011 and a data labeling prediction interface 2015, and the labeling pattern editing interface 2011 can be used to input data and perform editing operations to generate at least one confirmation labeling pattern 2013. The operating platform 201 of this embodiment is provided for system users to input data and perform editing, adding, and deleting. In addition, the operating platform 201 can input data and perform editing, adding, and deleting by an application program interface (API). The data labeling prediction interface 2015 can be used to input data and display forecast results. The automatic data labeling module 2021 of this embodiment can be configured to perform data labeling prediction.
[0117] The labelled data expansion module 2022 generates at least one added labeling data unit 2027 according to the at least one confirmation labeling pattern 2013 and the unlabelled database 2026, and stores the at least one added labeling data unit 2027 in the labelled database 2025 in order to expand the labelled database 2025. The data processing module 202 is used to process the added labeling data unit 2027 and generate a visualization diagram 203.
[0118] Please further refer to
[0119] After the at least one confirmation labeling pattern 2013 is obtained, step S23 is performed, in which the labelled data expansion module 2022 stores the at least one confirmation labeling pattern 2013 in a labeling pattern data set 2024. In this embodiment, the labelled data expansion module 2022 first tests the at least one confirmation labeling pattern 2013 against the labeling patterns in the labeling pattern data set 2024; then, after the test is confirmed, the at least one confirmation labeling pattern 2013 is stored in the labeling pattern data set 2024 and the labeling pattern data set 2024 is updated. Specifically, the labelled data expansion module 2022 tests whether there is a repetition or a conflict between the at least one confirmation labeling pattern 2013 and the data unit in the labeling pattern data set 2024. In this way, the repetition or the conflict between the labeling patterns can be removed. In other embodiments, the expansion unit 2023 may also perform the aforementioned test.
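The repetition and conflict test of step S23 can be sketched as follows. The sketch assumes that a labeling pattern is a pair of a text string and a label, and that the labeling pattern data set is a mapping from text to label; this representation, the function name, and the policy of keeping the stored pattern when a conflict is found are all assumptions, since the disclosure does not specify them.

```python
from typing import Dict, List, Tuple


def merge_patterns(
    pattern_set: Dict[str, str], confirmed: Dict[str, str]
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Test confirmed patterns against the stored pattern set, then merge.

    Returns the updated pattern set, the texts that were repetitions
    (same text, same label), and the texts that were conflicts
    (same text, different label). Conflicting patterns are not stored.
    """
    merged = dict(pattern_set)  # work on a copy; the caller's set is untouched
    repeated: List[str] = []
    conflicting: List[str] = []
    for text, label in confirmed.items():
        if text not in merged:
            merged[text] = label       # new pattern: store it
        elif merged[text] == label:
            repeated.append(text)      # repetition: nothing to add
        else:
            conflicting.append(text)   # conflict: keep the stored label
    return merged, repeated, conflicting
```

Reporting the conflicting texts rather than silently overwriting them leaves the final decision to the system user at the labeling pattern editing interface, which is one plausible reading of "after the test is confirmed."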
[0120] Then step S31 is performed, in which the expansion unit 2023 of the labelled data expansion module 2022 executes a labeling algorithm 2028 according to the labeling pattern data set 2024 and an unlabelled database 2026 to generate at least one added labeling data unit 2027. Step S4 is then performed to store the at least one added labeling data unit 2027 in a labelled database 2025. Specifically, in step S31 of this embodiment, the expansion unit 2023 executes the labeling algorithm 2028 on the data unit in the unlabelled database 2026, based on the at least one confirmation labeling pattern 2013 and the labeling patterns that have already been stored in the labeling pattern data set 2024, in order to label the data unit in the unlabelled database 2026 and generate the at least one added labeling data unit 2027. After performing step S31, the expansion unit 2023 stores the generated added labeling data unit 2027 in the labelled database 2025 to expand the labelled database 2025. The labeling algorithm 2028 of this embodiment may be a string searching algorithm or a maximum matching algorithm.
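As an illustration of the maximum matching option named above, the following is a minimal sketch of forward maximum matching. It assumes that the data unit is a text string and that the labeling pattern data set maps pattern text to a label; the function name, the span output format, and the example labels are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Tuple


def label_by_maximum_matching(text: str, patterns: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Label a text with forward maximum matching.

    At each position the longest pattern that matches is preferred.
    Returns (start, end, label) spans, with end exclusive.
    """
    longest = max((len(p) for p in patterns), default=0)
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in patterns:
                spans.append((i, i + size, patterns[candidate]))
                i += size
                break
        else:
            i += 1  # no pattern starts here; move on
    return spans


# Hypothetical patterns: the longer "blood pressure" wins over "blood".
patterns = {"blood pressure": "VITAL", "blood": "SPECIMEN"}
spans = label_by_maximum_matching("blood pressure high", patterns)
```

Preferring the longest match avoids labeling a fragment of a longer term, which is the usual reason to choose maximum matching over a plain string search for each pattern.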
[0121] In the case that there is no pre-existing data unit in the labeling pattern data set 2024, step S23 of this embodiment may optionally be omitted. In this case, in step S31, the expansion unit 2023 labels the data unit in the unlabelled database 2026 according to the at least one confirmation labeling pattern 2013 generated in step S22.
[0122] The method for labeling a data unit by the data processing module 202 in this embodiment can further include step S51, in which the data labeling prediction interface 2015 of the operating platform 201 receives an unlabelled data unit, and step S52, in which an automatic data labeling module 2021 performs data labeling prediction on the unlabelled data according to the labelled database 2025 and transmits the prediction result corresponding to the unlabelled data to the operating platform 201. The automatic data labeling module 2021 of this embodiment may execute algorithms such as a Recurrent Neural Network, a Conditional Random Field, or a Maximum-Entropy Markov Model. Then, the operating platform 201 displays the prediction result corresponding to the unlabelled data.
[0123] The system for data process 20 shown in
[0124] Next, please refer to
[0125] Step S102, semantic labeling decision and labeling processing, is performed. The data processing module 302 processes at least one data unit A, determines a semantic labeling decision 3021, and outputs a labelled data unit A′. The semantic labeling decision 3021 makes one of the plural data values a1-a3 correspond to at least one data attribute DA1. When the at least one data unit A is in an unstructured data format, the data processing module 302 performs an automatic semantic labeling step to complete the semantic labeling decision 3021 so that the data value a1 corresponds to the data attribute DA1. Step S103, data unit storage, is then performed, in which the data processing module 302 stores the labelled data unit A′ in the project data set PR. Step S104, processing and outputting, is then performed, in which the data processing module 302 processes the project data set PR to generate a visualization diagram 303.
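Steps S102 to S104 can be sketched as a small pipeline. The sketch assumes that an unstructured data unit is a text string, that a structured data unit is already a mapping from data attribute to data value, and that the automatic semantic labeling step is a regular-expression lookup; all of these, together with the function names and the example input, are assumptions made for illustration. The output is only the per-attribute series that a charting component could plot, not the visualization diagram itself.

```python
import re
from typing import Dict, List, Union

DataUnit = Union[str, Dict[str, float]]


def semantic_labeling(unit: DataUnit, attribute_patterns: Dict[str, str]) -> Dict[str, float]:
    """S102: make data values correspond to data attributes.

    Structured units are passed through; unstructured text is labelled
    automatically using one regular expression per attribute (an assumption).
    """
    if isinstance(unit, dict):
        return dict(unit)
    labelled: Dict[str, float] = {}
    for attribute, pattern in attribute_patterns.items():
        match = re.search(pattern, unit)
        if match:
            labelled[attribute] = float(match.group(1))
    return labelled


def build_chart_series(
    units: List[DataUnit], attribute_patterns: Dict[str, str]
) -> Dict[str, List[float]]:
    """S103 and S104: store labelled units in a project data set, then group by attribute."""
    project_data_set = [semantic_labeling(u, attribute_patterns) for u in units]  # S103
    series: Dict[str, List[float]] = {}
    for labelled_unit in project_data_set:                                        # S104
        for attribute, value in labelled_unit.items():
            series.setdefault(attribute, []).append(value)
    return series


# Hypothetical input: one unstructured and one structured data unit.
series = build_chart_series(
    ["systolic 120 mmHg", {"DA1": 118.0}],
    {"DA1": r"systolic (\d+)"},
)
```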
[0126] Please refer to
[0127] Please refer to
[0128] Please refer to
[0129] The above are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention; any equivalent changes and modifications made by a person familiar with this technique without departing from the spirit and scope of the present invention shall be covered by the claims of this disclosure.
[0130] To sum up, the present disclosure satisfies the patent requirements of “utility,” “novelty,” and “non-obviousness”; the applicant has accordingly filed an application for a patent with the Patent Office in accordance with the provisions of the Patent Law.