SYSTEM AND METHOD FOR DATA AND DATA PROCESSING MANAGEMENT
20220358135 · 2022-11-10
Abstract
Systems and methods described herein involve meta-graph management configured to link an external data source to another external data mart through a data management platform, which can involve managing characteristics of one or more tables of the data source and the data mart and of a temporary table based on columns; managing characteristics of one or more input data and output data of data processing from the data source to the data mart based on columns; managing relationships of characteristics between data and data processing for the data source and the data mart based on the columns; managing one or more data flows between the data source and the data mart that include data, data processing, and relationships; and providing data, data processing, and relationships between the data source and the data mart for each data flow.
Claims
1. A method for a meta-graph management configured to link an external data source to another external data mart through a data management platform, the method comprising: managing, by a processor, characteristics of one or more tables of the data source and the data mart and a temporary table based on columns; managing, by the processor, characteristics of one or more input data and output data of data processing from the data source to the data mart based on columns; managing, by the processor, relationships of characteristics between data and data processing for the data source and the data mart based on the columns; managing, by the processor, one or more data flows between the data source and the data mart that include data, data processing, and relationships; providing, by the processor, data, data processing, and relationships between the data source and the data mart for each data flow; managing, by the processor, for each component in the data management platform, usage information, cost, estimate, and statistics based on execution logs associated with the each component; and providing, by the processor, an interface configured to provide the usage information, the cost, the estimate, and the statistics for the each component.
2. The method of claim 1, further comprising, creating, by the processor, the one or more data flows based on a data search from the data mart to the data source and from the data source to the data mart; and providing, by the processor, the one or more data flows and usage records for the each component in the data management platform.
3. The method of claim 2, wherein the creating the one or more data flows based on the data search comprises searching execution logs of components on the data management platform to determine the one or more data flows.
4. The method of claim 3, wherein the searching of the execution logs comprises retrieving, from execution logs corresponding to target data associated with the data search, the one or more data flows related to the target data associated with the data search.
5. (canceled)
6. The method of claim 1, further comprising: creating, by the processor, isolated data spaces for each of the one or more data flows; and for execution of a data flow from the one or more data flows, executing, by the processor, the data flow using an associated one of the isolated data spaces.
7. The method of claim 1, further comprising, for the data processing being enabled for data processing duplication and for the each data flow being duplicable, duplicating, by the processor, the data processing.
8. The method of claim 1, further comprising, for the each data flow being incomplete, not executing the data flow.
9. The method of claim 1, further comprising adding, by the processor, event definitions based on an autorun property.
10. The method of claim 1, further comprising, for other data sources being similar to the data source, recommending, by the processor, the data processing used in the data flow between the data source and the data mart for the other data sources; and managing, by the processor, a plurality of properties for the recommended data processing for the other data sources.
11. A non-transitory computer readable medium, storing instructions for execution by one or more processors for a meta-graph management configured to link an external data source to another external data mart through a data management platform, the instructions comprising: managing, by a processor, characteristics of one or more tables of the data source and the data mart and a temporary table based on columns; managing, by the processor, characteristics of one or more input data and output data of data processing from the data source to the data mart based on columns; managing, by the processor, relationships of characteristics between data and data processing for the data source and the data mart based on the columns; managing, by the processor, one or more data flows between the data source and the data mart that include data, data processing, and relationships; providing, by the processor, data, data processing, and relationships between the data source and the data mart for each data flow; managing, by the processor, for each component in the data management platform, usage information, cost, estimate, and statistics based on execution logs associated with the each component; and providing, by the processor, an interface configured to provide the usage information, the cost, the estimate, and the statistics for the each component.
12. The non-transitory computer readable medium of claim 11, the instructions further comprising, creating, by the processor, the one or more data flows based on a data search from the data mart to the data source and from the data source to the data mart; and providing, by the processor, the one or more data flows and usage records for the each component in the data management platform.
13. The non-transitory computer readable medium of claim 12, wherein the creating the one or more data flows based on the data search comprises searching execution logs of components on the data management platform to determine the one or more data flows.
14. The non-transitory computer readable medium of claim 13, wherein the searching of the execution logs comprises retrieving, from execution logs corresponding to target data associated with the data search, the one or more data flows related to the target data associated with the data search.
15. (canceled)
16. The non-transitory computer readable medium of claim 11, the instructions further comprising: creating, by the processor, isolated data spaces for each of the one or more data flows; and for execution of a data flow from the one or more data flows, executing, by the processor, the data flow using an associated one of the isolated data spaces.
17. The non-transitory computer readable medium of claim 11, the instructions further comprising, for the data processing being enabled for data processing duplication and for the each data flow being duplicable, duplicating, by the processor, the data processing.
18. The non-transitory computer readable medium of claim 11, the instructions further comprising, for the each data flow being incomplete, not executing the data flow.
19. The non-transitory computer readable medium of claim 11, the instructions further comprising adding, by the processor, event definitions based on an autorun property.
20. The non-transitory computer readable medium of claim 11, the instructions further comprising, for other data sources being similar to the data source, recommending, by the processor, the data processing used in the data flow between the data source and the data mart for the other data sources; and managing, by the processor, a plurality of properties for the recommended data processing for the other data sources.
Description
DETAILED DESCRIPTION
[0043] The following detailed description provides details of the figures and embodiments of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity. Terms used throughout the description are provided as examples and are not intended to be limiting. For example, the use of the term “automatic” may involve fully automatic or semi-automatic implementations involving user or administrator control over certain aspects of the implementation, depending on the desired implementation of one of ordinary skill in the art practicing implementations of the present application. Selection can be conducted by a user through a user interface or other input means, or can be implemented through a desired algorithm. Embodiments as described herein can be utilized either singularly or in combination and the functionality of the embodiments can be implemented through any means according to the desired implementations.
[0045] The IoT insurer desires to scale the business and increase the number of customers, and therefore needs to reach potential customers. However, even if the IoT insurer wishes to search for potential customers, it does not have access to any relevant data for identifying them. If the IoT insurer wishes to derive insurance premium rates for potential customers from their data, it may not know which data processing techniques to apply to the new customers, even though it would prefer to reuse its present data processing. Similarly, a factory owner may wish to sign up for IoT insurance but may not know which IoT insurance applies to the factory's data, how to reach the IoT insurance services, what data processing is needed to obtain IoT insurance, or what the IoT insurance costs.
[0047] Meta-graph storage 220 can involve data processing 221, table 222, knowledge graph 223, search log 224, autorun configuration 225, various metadata such as data processing metadata 226, table metadata 227, relationship metadata 228, and public metadata 229, as well as execution configuration 230 and execution log 231. Further details of these elements are described with respect to the implementations herein.
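A minimal Python sketch of how meta-graph storage might hold table and data processing metadata on a column basis follows. It derives “input” and “output” relationships by matching column names and scores table similarity by column overlap. The classes `Table` and `Processing`, the functions `derive_relationships` and `column_similarity`, the subset-matching rule, and the Jaccard measure are all illustrative assumptions; the description does not state how relationship metadata 228 is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical table metadata: a name plus its column names."""
    name: str
    columns: frozenset


@dataclass(frozen=True)
class Processing:
    """Hypothetical data processing metadata: columns read and columns written."""
    name: str
    input_columns: frozenset
    output_columns: frozenset


def derive_relationships(tables, processings):
    """Return (source, relation, destination) triples matched on column names.

    Assumed rule: a table is an "input" of a processing when it holds every
    column the processing reads, and a processing has a table as "output"
    when the table holds every column the processing writes.
    """
    triples = []
    for proc in processings:
        for table in tables:
            if proc.input_columns <= table.columns:
                triples.append((table.name, "input", proc.name))
            if proc.output_columns <= table.columns:
                triples.append((proc.name, "output", table.name))
    return triples


def column_similarity(a, b):
    """Jaccard overlap of two tables' columns (an assumed similarity measure)."""
    union = a.columns | b.columns
    return len(a.columns & b.columns) / len(union) if union else 0.0


sensor = Table("sensor_raw", frozenset({"device_id", "ts", "temp"}))
mart = Table("premium_mart", frozenset({"device_id", "risk_score"}))
scoring = Processing("risk_scoring",
                     frozenset({"device_id", "temp"}),
                     frozenset({"device_id", "risk_score"}))

print(derive_relationships([sensor, mart], [scoring]))
# [('sensor_raw', 'input', 'risk_scoring'), ('risk_scoring', 'output', 'premium_mart')]
print(column_similarity(sensor, mart))  # 0.25
```

Subset matching is deliberately strict; a fuzzier rule could replace it without changing the triple format that the rest of the meta-graph consumes.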
[0050] In a second scenario 401, users create a data mart from data sources. In this scenario, the user defines the data flow and executes it to obtain a data mart as illustrated in the data flow 420 at
[0051] In the third and fourth scenarios 402, users discover data marts from a data source, clarify the missing relationships, and receive support for creating the missing node. In the third scenario as illustrated in
[0056] In the following examples from
[0060] At 1105, the data source search engine executes a search for a data flow. At 1106, the data source search engine searches for relationships of table or data processing “output” based on the table, or it searches for relationships of table or data processing “output” based on the data processing “input”. The data source search engine is not limited to extracting exact matches; it can also be modified to extract similar relationships through the use of machine learning (e.g., topic modeling, clustering, etc.) in accordance with the desired implementation.
[0061] At 1107, the data source search engine determines whether the data flow is an infinite loop, whether the data flow depth is over the limit, or whether the data flow execution time is over the limit. If not (No), the flow proceeds to 1108; otherwise (Yes), the flow proceeds to 1109. At 1108, the data source search engine selects the next component to process based on a depth-first search approach. If there is a component to process (Yes), the flow returns to 1106 to process the component; otherwise (No), the flow proceeds to 1109.
[0062] At 1109, if a data flow was found, the process proceeds to 1110 to save the data flow in the search log. At 1111, if there is an additional data flow to be found (Yes), the process repeats from 1106; otherwise (No), the process ends.
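The search loop at 1105 to 1111 can be sketched in Python as follows, assuming the meta-graph is available as (source, relation, destination) triples whose relation is “input” (table to data processing) or “output” (data processing to table), and that node names are unique across tables and data processing. The function name `search_data_sources`, the depth and time defaults, and the path-based loop guard are illustrative assumptions, and only exact matches are followed (no machine-learning similarity).

```python
import time


def search_data_sources(triples, data_mart, max_depth=10, time_limit_s=5.0):
    """Depth-first search from a data mart back to its data sources.

    Returns the search log: each found data flow, listed from data source
    to data mart.
    """
    producers = {}  # table -> data processing whose "output" is the table
    inputs_of = {}  # data processing -> tables that are its "input"
    for src, rel, dst in triples:
        if rel == "output":
            producers.setdefault(dst, []).append(src)
        elif rel == "input":
            inputs_of.setdefault(dst, []).append(src)

    search_log = []
    deadline = time.monotonic() + time_limit_s
    stack = [[data_mart]]  # partial data flows, most recent component last
    while stack:
        path = stack.pop()
        upstream = producers.get(path[-1], []) + inputs_of.get(path[-1], [])  # 1106
        if not upstream:
            # Nothing feeds this node, so the path reached a data source (1109-1110).
            if len(path) > 1:
                search_log.append(path[::-1])
            continue
        # 1107: stop extending this path past the depth or time limit.
        if len(path) > max_depth or time.monotonic() > deadline:
            continue
        # 1108: push upstream components so the first one is explored first.
        for nxt in reversed(upstream):
            if nxt not in path:  # skip components already on the path (loop guard)
                stack.append(path + [nxt])
    return search_log


triples = [
    ("sensor_raw", "input", "risk_scoring"),
    ("policy", "input", "risk_scoring"),
    ("risk_scoring", "output", "premium_mart"),
]
print(search_data_sources(triples, "premium_mart"))
# [['sensor_raw', 'risk_scoring', 'premium_mart'], ['policy', 'risk_scoring', 'premium_mart']]
```

Keeping the whole path on the stack makes the infinite-loop check a simple membership test, at the cost of copying the path on each push; for deep lineage graphs a visited set per path would be cheaper.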
[0064] In an example implementation, the estimated cost of data processing can be calculated automatically from execution logs once an execution target is selected. In this example, the user selects an execution target at 1202. Based on the selection, the cost is calculated and estimated at 1203, and the results are shown as a data fee and a processing fee.
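The cost estimation at 1202 and 1203, together with the per-component usage information, cost, and statistics recited in claim 1, could be sketched as below. The record type `ExecutionLogEntry`, its fields, the target names, and the rule that an estimate is the mean of past fees on the same target are hypothetical; the description does not specify the estimation formula.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ExecutionLogEntry:
    """One hypothetical execution log record for a platform component."""
    component: str
    target: str            # execution target used by the run (illustrative)
    data_fee: float
    processing_fee: float
    seconds: float


def usage_and_estimate(logs, component, target):
    """Summarize a component's logged runs and estimate fees on one target.

    Assumption: the estimate for the selected target is the mean of that
    component's past fees on the same target.
    """
    runs = [e for e in logs if e.component == component]
    on_target = [e for e in runs if e.target == target]
    summary = {
        "usage_count": len(runs),
        "total_cost": sum(e.data_fee + e.processing_fee for e in runs),
        "mean_seconds": mean(e.seconds for e in runs) if runs else None,
        "estimate": None,  # stays None when the target has no history
    }
    if on_target:
        data_fee = mean(e.data_fee for e in on_target)
        processing_fee = mean(e.processing_fee for e in on_target)
        summary["estimate"] = {
            "data_fee": data_fee,
            "processing_fee": processing_fee,
            "total": data_fee + processing_fee,
        }
    return summary


logs = [
    ExecutionLogEntry("risk_scoring", "cloud", 2.0, 4.0, 30.0),
    ExecutionLogEntry("risk_scoring", "cloud", 4.0, 6.0, 50.0),
    ExecutionLogEntry("risk_scoring", "on_prem", 1.0, 1.0, 70.0),
]
print(usage_and_estimate(logs, "risk_scoring", "cloud"))
```

Returning `None` rather than a guessed figure for a target without history keeps the interface honest; a real implementation might instead scale fees from another target.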
[0065] In the example of
[0069] In the example of
[0071] In the following explanations for
[0073] In the example of
[0074] Further, the viewer calculates a verified rate of data flow components and a reuse rate as illustrated in
[0076] At 1800, a determination is made as to whether “Enable Execution Log” is set to Yes. If so (Yes), the flow proceeds to 1801; otherwise (No), the flow proceeds to 1802. At 1801, the data flow execution engine creates a log directory for the data flow in the execution log. At 1802, the data flow execution engine creates new tables to store the execution results of the data flow. Because data conflicts can occur when several applications use the same table, creating new tables at 1802 avoids such conflicts. At 1803, a determination is made as to whether both “Data Processing Duplication” and “Duplicationable” in the Data Processing Property are set to Yes. If so (Yes), the data flow execution engine proceeds to 1804 and duplicates the data processing of the data flow to avoid data conflicts and security risks. Otherwise (No), the data flow uses the original data processing managed by another user.
[0077] At 1805, the data flow execution engine creates relationships between the tables and the data processing. The engine creates and saves the data flow in the Execution Config and executes the data flow. Further, if “Enable Execution Log” is Yes, the data flow execution engine archives the log for each component.
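Steps 1800 to 1805 can be approximated by a planning function that decides, per data flow, where logs go, which result tables are created fresh, and whether data processing is duplicated. `ExecutionSettings`, `prepare_data_flow`, the `execution_log/<flow id>` directory layout, the `<flow id>__<name>` naming scheme, and the choice to leave pure input tables untouched are hypothetical, and the sketch stops before actually executing the flow or archiving logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionSettings:
    """Hypothetical flags mirroring the properties named at 1800 and 1803."""
    enable_execution_log: bool
    data_processing_duplication: bool


def prepare_data_flow(flow_id, triples, duplicationable, settings):
    """Plan the isolated resources for one data flow (steps 1800-1805).

    triples: (source, relation, destination) with relation "input" or "output".
    duplicationable: data processing name -> its "Duplicationable" property.
    """
    plan = {"log_dir": None, "names": {}, "relationships": []}
    if settings.enable_execution_log:                       # 1800 -> 1801
        plan["log_dir"] = f"execution_log/{flow_id}"
    result_tables = {dst for _, rel, dst in triples if rel == "output"}
    names = plan["names"]
    for src, _, dst in triples:
        for node in (src, dst):
            if node in names:
                continue
            if node in duplicationable:                      # data processing
                duplicate = settings.data_processing_duplication and duplicationable[node]
                names[node] = f"{flow_id}__{node}" if duplicate else node  # 1803-1804
            elif node in result_tables:                      # 1802: new result table
                names[node] = f"{flow_id}__{node}"
            else:                                            # source table, read as-is
                names[node] = node
    plan["relationships"] = [(names[s], r, names[d]) for s, r, d in triples]  # 1805
    return plan


triples = [("sensor_raw", "input", "risk_scoring"), ("risk_scoring", "output", "premium_mart")]
plan = prepare_data_flow("flow1", triples, {"risk_scoring": True}, ExecutionSettings(True, True))
print(plan["relationships"])
# [('sensor_raw', 'input', 'flow1__risk_scoring'), ('flow1__risk_scoring', 'output', 'flow1__premium_mart')]
```

Prefixing names with the flow identifier is one simple way to give each data flow its own isolated data space; a schema or namespace per flow would serve the same purpose.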
[0083] In the following explanations for
[0086] First, a user defines a search condition to search for data marts at 2500. At 2501, a determination is made as to whether Execution Log is enabled in the search condition and the root table name is in the execution log. If so (Yes), the flow proceeds to 2502, in which the data mart search engine uses execution logs to search for relationships of table and data processing from data marts to the root table. Otherwise (No), the data mart search engine searches for relationships of table and data processing from the root table to data marts.
[0087] At 2504, the data mart search engine starts a loop to search for a data flow. At 2505, the data mart search engine searches for relationships of table or data processing “input” based on the table, or it searches for relationships of table or data processing “input” based on the data processing “output”.
[0088] At 2506, a determination is made as to whether the data flow is an infinite loop, whether the data flow depth is over the limit, or whether the data flow execution time is over the limit. If so (Yes), the flow proceeds to 2508; otherwise (No), the flow proceeds to 2507.
[0089] At 2507, a determination is made as to whether there is a next component to process. If so (Yes), then the flow proceeds to 2505, otherwise (No) the flow proceeds to 2508.
[0090] At 2508, a determination is made as to whether the data mart search engine has found a data flow. If so (Yes), then the flow proceeds to 2509 to save the data flow in the search log, otherwise (No) the flow proceeds to 2510.
[0091] At 2510, a determination is made as to whether the data mart search engine has a next data flow to process. If so (Yes), then the flow proceeds back to 2504, otherwise (No), the flow ends.
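A minimal sketch of the data mart search at 2500 to 2510 follows, under similar assumptions: relationships are (source, relation, destination) triples, node names are unique, and, purely for illustration, an execution log is modeled as a list of previously executed data flows ordered from source to data mart. The function `search_data_marts` and its parameters are hypothetical.

```python
import time


def search_data_marts(triples, root_table, execution_logs=(), use_execution_log=False,
                      max_depth=10, time_limit_s=5.0):
    """Find data flows from a root table to data marts (steps 2500-2510)."""
    # 2501-2502: reuse logged data flows when enabled and the root table appears in them.
    logged = [flow for flow in execution_logs if root_table in flow]
    if use_execution_log and logged:
        return [list(flow[flow.index(root_table):]) for flow in logged]

    consumers = {}  # table -> data processing that takes it as "input"
    produced = {}   # data processing -> tables that are its "output"
    for src, rel, dst in triples:
        if rel == "input":
            consumers.setdefault(src, []).append(dst)
        elif rel == "output":
            produced.setdefault(src, []).append(dst)

    search_log = []
    deadline = time.monotonic() + time_limit_s
    stack = [[root_table]]                                   # 2504
    while stack:
        path = stack.pop()
        downstream = consumers.get(path[-1], []) + produced.get(path[-1], [])  # 2505
        if not downstream:
            # Nothing consumes this node, so it is treated as a data mart (2508-2509).
            if len(path) > 1:
                search_log.append(path)
            continue
        if len(path) > max_depth or time.monotonic() > deadline:           # 2506
            continue
        for nxt in reversed(downstream):                      # 2507
            if nxt not in path:  # loop guard
                stack.append(path + [nxt])
    return search_log


triples = [
    ("sensor_raw", "input", "risk_scoring"),
    ("risk_scoring", "output", "premium_mart"),
    ("risk_scoring", "output", "claims_mart"),
]
print(search_data_marts(triples, "sensor_raw"))
# [['sensor_raw', 'risk_scoring', 'premium_mart'], ['sensor_raw', 'risk_scoring', 'claims_mart']]
```

Consulting the execution log first reflects the branch at 2501: when a root table has already been used, its recorded flows are cheaper to return than a fresh traversal.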
[0092] In the following example from
[0094] Specifically, the data flow recommendation engine recommends a data processing to connect tables. The data flow recommendation engine searches for a triangle relationship that contains the relationship “table A-similar→table B-input→data processing C-output→table D”. If such a relationship is detected, the data flow recommendation engine recommends a data processing to connect table A and table D, and indicates that the recommended data processing and data processing C are similar.
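The triangle-relationship detection can be sketched as a join over three kinds of edges stored as (source, relation, destination) triples. The function `recommend_processing`, its output format, and the rule that no recommendation is made when table A already reaches table D through existing data processing are assumptions made for illustration.

```python
def recommend_processing(triples):
    """Find "table A -similar-> table B -input-> processing C -output-> table D".

    For each match where A does not already reach D through some processing,
    suggest a processing that connects A to D and is similar to C.
    """
    similar, consumers, outputs = {}, {}, {}
    for src, rel, dst in triples:
        index = {"similar": similar, "input": consumers, "output": outputs}.get(rel)
        if index is not None:
            index.setdefault(src, set()).add(dst)

    recommendations = []
    for a in sorted(similar):
        # Tables that A already reaches through existing data processing.
        already = set()
        for proc in consumers.get(a, ()):
            already |= outputs.get(proc, set())
        for b in sorted(similar[a]):
            for c in sorted(consumers.get(b, ())):
                for d in sorted(outputs.get(c, ())):
                    if d != a and d not in already:
                        recommendations.append({"connect": (a, d), "similar_to": c, "via": b})
    return recommendations


triples = [
    ("factory_sensor", "similar", "sensor_raw"),
    ("sensor_raw", "input", "risk_scoring"),
    ("risk_scoring", "output", "premium_mart"),
]
print(recommend_processing(triples))
# [{'connect': ('factory_sensor', 'premium_mart'), 'similar_to': 'risk_scoring', 'via': 'sensor_raw'}]
```

Similarity is treated as directed here, exactly as written in the pattern; if the meta-graph stores each similarity edge only once, adding the reverse edge makes the search symmetric.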
[0095] In the example of
[0098] To execute the triangle relationships detection as illustrated in
[0100] In the example of
[0104] Computer device 3105 in computing environment 3100 can include one or more processing units, cores, or processors 3110, memory 3115 (e.g., RAM, ROM, and/or the like), internal storage 3120 (e.g., magnetic, optical, solid state storage, and/or organic), and/or I/O interface 3125, any of which can be coupled on a communication mechanism or bus 3130 for communicating information or embedded in the computer device 3105. I/O interface 3125 is also configured to receive images from cameras or provide images to projectors or displays, depending on the desired implementation.
[0105] Computer device 3105 can be communicatively coupled to input/user interface 3135 and output device/interface 3140. Either one or both of input/user interface 3135 and output device/interface 3140 can be a wired or wireless interface and can be detachable. Input/user interface 3135 may include any device, component, sensor, or interface, physical or virtual, that can be used to provide input (e.g., buttons, touch-screen interface, keyboard, a pointing/cursor control, microphone, camera, braille, motion sensor, optical reader, and/or the like). Output device/interface 3140 may include a display, television, monitor, printer, speaker, braille, or the like. In some example implementations, input/user interface 3135 and output device/interface 3140 can be embedded with or physically coupled to the computer device 3105. In other example implementations, other computer devices may function as or provide the functions of input/user interface 3135 and output device/interface 3140 for a computer device 3105.
[0106] Examples of computer device 3105 may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like).
[0107] Computer device 3105 can be communicatively coupled (e.g., via I/O interface 3125) to external storage 3145 and network 3150 for communicating with any number of networked components, devices, and systems, including one or more computer devices of the same or different configuration. Computer device 3105 or any connected computer device can be functioning as, providing services of, or referred to as a server, client, thin server, general machine, special-purpose machine, or another label.
[0108] I/O interface 3125 can include, but is not limited to, wired and/or wireless interfaces using any communication or I/O protocols or standards (e.g., Ethernet, 802.11x, Universal Serial Bus, WiMax, modem, a cellular network protocol, and the like) for communicating information to and/or from at least all the connected components, devices, and network in computing environment 3100. Network 3150 can be any network or combination of networks (e.g., the Internet, local area network, wide area network, a telephonic network, a cellular network, satellite network, and the like).
[0109] Computer device 3105 can use and/or communicate using computer-usable or computer-readable media, including transitory media and non-transitory media. Transitory media include transmission media (e.g., metal cables, fiber optics), signals, carrier waves, and the like. Non-transitory media include magnetic media (e.g., disks and tapes), optical media (e.g., CD ROM, digital video disks, Blu-ray disks), solid state media (e.g., RAM, ROM, flash memory, solid-state storage), and other non-volatile storage or memory.
[0110] Computer device 3105 can be used to implement techniques, methods, applications, processes, or computer-executable instructions in some example computing environments. Computer-executable instructions can be retrieved from transitory media, and stored on and retrieved from non-transitory media. The executable instructions can originate from one or more of any programming, scripting, and machine languages (e.g., C, C++, C#, Java, Visual Basic, Python, Perl, JavaScript, and others).
[0111] Processor(s) 3110 can execute under any operating system (OS) (not shown), in a native or virtual environment. One or more applications can be deployed that include logic unit 3160, application programming interface (API) unit 3165, input unit 3170, output unit 3175, and inter-unit communication mechanism 3195 for the different units to communicate with each other, with the OS, and with other applications (not shown). The described units and elements can be varied in design, function, configuration, or implementation and are not limited to the descriptions provided.
[0112] In some example implementations, when information or an execution instruction is received by API unit 3165, it may be communicated to one or more other units (e.g., logic unit 3160, input unit 3170, output unit 3175). In some instances, logic unit 3160 may be configured to control the information flow among the units and direct the services provided by API unit 3165, input unit 3170, output unit 3175, in some example implementations described above. For example, the flow of one or more processes or implementations may be controlled by logic unit 3160 alone or in conjunction with API unit 3165. The input unit 3170 may be configured to obtain input for the calculations described in the example implementations, and the output unit 3175 may be configured to provide output based on the calculations described in example implementations.
[0113] Processor(s) 3110 can be configured to facilitate a meta-graph management configured to link an external data source to another external data mart through a data management platform, which can involve managing characteristics of one or more tables of the data source and the data mart and a temporary table based on columns; managing characteristics of one or more input data and output data of data processing from the data source to the data mart based on columns; managing relationships of characteristics between data and data processing for the data source and the data mart based on the columns; managing one or more data flows between the data source and the data mart that include data, data processing, and relationships; and providing data, data processing, and relationships between the data source and the data mart for each data flow as illustrated from
[0114] Processor(s) 3110 can be configured to create the one or more data flows based on a data search from the data mart to the data source and from the data source to the data mart; and provide the one or more data flows and usage records for each component in the data management platform as illustrated in
[0115] Processor(s) 3110 can be configured to manage, for each component on the data management platform, usage information, total cost, estimated cost, and estimated execution statistics based on execution logs associated with the each component, and provide an interface configured to provide the usage information, total cost, estimated cost, and estimated execution statistics for the each component as illustrated in
[0116] Processor(s) 3110 can be configured to create isolated data spaces for each of the one or more data flows; and for execution of a data flow from the one or more data flows, execute the data flow using an associated one of the isolated data spaces as illustrated in
[0117] Processor(s) 3110 can be configured to, for the data processing being enabled for data processing duplication and for the each data flow being duplicable, duplicate the data processing as illustrated in
[0118] Processor(s) 3110 can be configured to, for the each data flow being incomplete, not execute the data flow as illustrated in
[0119] Processor(s) 3110 can be configured to add event definitions based on an autorun property as illustrated in
[0120] Processor(s) 3110 can be configured to, for other data sources being similar to the data source, recommend the data processing used in the data flow between data source and the data mart for the other data sources; and manage a plurality of properties for the recommended data processing for the other data sources as illustrated in
[0121] Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In embodiments, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result.
[0122] Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” “displaying,” or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission or display devices.
[0123] Embodiments may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer readable medium, such as a computer-readable storage medium or a computer-readable signal medium. A computer-readable storage medium may involve tangible mediums such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible or non-transitory media suitable for storing electronic information. A computer readable signal medium may include mediums such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation.
[0124] Various general-purpose systems may be used with programs and modules in accordance with the examples herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the embodiments as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
[0125] As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of the embodiments may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application. Further, some embodiments of the present application may be performed solely in hardware, whereas other embodiments may be performed solely in software. Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways. When performed by software, the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.
[0126] Moreover, other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the teachings of the present application. Various aspects and/or components of the described embodiments may be used singly or in any combination. It is intended that the specification and embodiments be considered as examples only, with the true scope and spirit of the present application being indicated by the following claims.