Cash flow forecasting using a bottoms-up machine learning approach
11526859 · 2022-12-13
CPC classification: G06Q20/389, G06Q20/02, G06Q10/04 (all Physics)
International classification: G06Q20/10, G06Q40/00, G06Q10/04, G06Q20/02 (all Physics)
Abstract
A method and apparatus for improving the management of cash and liquidity of an organization utilizing a plurality of ledger accounts and a plurality of currency accounts is described. One improvement in the accuracy of the forecasts comes from the use of individual ledger accounts. The improvements optimize the interest earnings for the cash balances in each currency account and minimize the expenses related to funding the currency accounts. Machine learning techniques are incorporated to forecast payments, receipts, interest rates, and currency exchange rates; cash is then transferred, borrowed, or loaned to fund the payments and utilize available cash.
Claims
1. An improved cash management apparatus comprising: one or more payment rails connected to one or more banks; a special purpose server connected to the one or more payment rails; and a plurality of data storage facilities connected to the special purpose server, wherein the special purpose server is configured to retrieve a set of payment and receipt transactions from the plurality of data storage facilities for a given past date range, configured to separate the set of payment and receipt transactions by ledger accounts, configured to sort the payment and receipt transactions with proximate time frames, configured to perform an ARIMA analysis on the payment and receipt transactions in each ledger account of each currency account to create a model of expected inflows and outflows for each period for each ledger account for each currency account, configured to forecast a receipts forecast and a payments forecast for a future period, where the special purpose server subtracts the payments forecast from the receipts forecast and adds in a previous period cash balance to create a forecast cash balance time series for each currency account, configured to retrieve historical banking rate information and perform the ARIMA analysis on the historical banking rate information to create a forecast banking rate information time series, configured to form a distributed machine learning model using a machine learning algorithm that calculates an F-score and rule for each feature set in each silo and then combines the F-scores and the rules to create the distributed machine learning model, configured to execute the distributed machine learning model on multiple data silos on the forecast cash balance time series for each currency account and on the forecast banking rate information time series to determine a set of optimal cash transfers between each currency account and one or more sweep accounts, and configured to execute instructions to make payments and cash transfers.
2. The improved cash management apparatus of claim 1 wherein the set of payment and receipt transactions further includes transactions from multiple tenants from a bank.
3. The improved cash management apparatus of claim 1 wherein the future period is user-configurable.
4. The improved cash management apparatus of claim 1 wherein the payments forecast is modified to incorporate actual planned payments.
5. The improved cash management apparatus of claim 1 wherein the receipts forecast is modified to incorporate actual incoming receipts.
6. The improved cash management apparatus of claim 5 wherein the machine learning algorithm is K-means.
7. The improved cash management apparatus of claim 5 wherein the historical banking rate information is retrieved from the one or more banks over the one or more payment rails.
8. The improved cash management apparatus of claim 5 wherein the historical banking rate information includes interest rates, foreign exchange rates, and money transfer costs.
9. A method for managing cash in an organization, the method comprising: retrieving a set of payment and receipt transactions from a plurality of data storage facilities for a given past date range for a plurality of currency accounts with software on a special-purpose server that is connected to the plurality of data storage facilities; separating, with the software, the set of the payment and receipt transactions by ledger accounts; sorting, with the software, the set of the payment and receipt transactions with proximate time frames; performing, with the software, an ARIMA analysis on the set of payment and receipt transactions in each ledger account of each currency account; creating, with the software, a model of expected inflows and outflows for each period for each ledger account for each currency account; forecasting, with the software, a receipts forecast and a payments forecast for a future period; subtracting, with the software, the payments forecast from the receipts forecast and adding in a previous day cash balance, creating a forecast cash balance time series for each currency account; retrieving, with the software, historical banking rate information; performing, with the software, the ARIMA analysis on the historical banking rate information to create a forecast banking rate information time series; forming a distributed machine learning model using a machine learning algorithm that calculates an F-score and rule for each feature set in each of a plurality of silos and then combines the F-scores and the rules to create the distributed machine learning model; executing, using the software, the distributed machine learning model utilizing multiple data silos on the forecast cash balance time series for each currency account and on the forecast banking rate information time series to determine a set of optimal cash transfers between each currency account and one or more sweep accounts; and executing, by the software, instructions to make payments and cash transfers.
10. The method of claim 9 wherein the set of payment and receipt transactions further includes transactions from multiple tenants from a bank.
11. The method of claim 9 wherein the given past date range is user-configurable.
12. The method of claim 9 further comprising modifying the payments forecast by incorporating actual planned payments.
13. The method of claim 9 further comprising modifying the receipts forecast by incorporating actual planned receipts.
14. The method of claim 9 wherein the machine learning algorithm is Random Forest.
15. The method of claim 9 wherein the historical banking rate information is retrieved from the one or more banks.
16. The method of claim 9 wherein the historical banking rate information includes interest rates, foreign exchange rates, and money transfer costs.
Description
DETAILED DESCRIPTION
(23) The present inventions are now described in detail with reference to the drawings. In the drawings, each element with a reference number is similar to other elements with the same reference number independent of any letter designation following the reference number. In the text, a reference number with a specific letter designation following the reference number refers to the specific element with the number and letter designation and a reference number without a specific letter designation refers to all elements with the same reference number independent of any letter designation following the reference number in the drawings.
(24) Beginning with the overall architecture, the corporate back-office systems, such as an ERP 101a, a TMS 101b, and a DBMS 101c, connect to the PCM 110 through a secure channel 107.
(25) The secure channel 107 interfaces with the integration layer 113 of the PCM 110. This channel 107 provides access between the corporate facilities, through the integration layer 113, to the payments manager 111, the statements manager 112, the audit and security 114 module, and the cash manager 115.
(26) On the other side of the PCM 110, the integration layer 113 interfaces with the universal aggregator network gateway 120. The universal aggregator 120 provides access to a number of banks 121-126 using a number of different protocols (SWIFT, ACH, RTP, BACS, etc.) and networks (payment rails 504).
(27) The messages between the universal aggregator 120 and the integration layer 113 are made up of bank statements 118, account balances, interest rates, costs for wire transfers, and foreign exchange rates coming into the PCM 110. The integration layer 113 sends payment messages 119, as well as messages to transfer funds, loan funds, and borrow funds. In addition, message acknowledgements and other housekeeping information are exchanged.
(28) The universal aggregator 120 sends and receives information from the banks 121-126 through the various payment rails 504. A payment rail 504 is a secure network that moves transactions between banks, where the transactions are related to the electronic movement of money from payer to payee.
(29) The payments manager module 111 provides full visibility into the payments lifecycle from input (import or manual creation) to confirmation by the bank. PCM 110 allows importing transaction files from ERP 101a, TMS 101b, DBMS 101c, or any other backend application through a secure channel 107. The payments manager 111 then validates the payment, gathers approvals through an approval workflow, and prepares the payment for sending to the bank rails. However, before the payment is released, the payments go to the cash manager 115 to assure that funding is available.
(30) The audit and security module 114 enables granular control of entitlements or access rights, allowing access to be controlled based on functionality and data. The user will only see the functionality and data to which he has been granted access.
(31) The statements manager module 112 gives clients the ability to control and monitor proper receipt of statements and link these to accounts and business units in static data. The PCM module 110 will import all data contained in the bank statement (End of Day and Intraday), which allows clients to control balances and transactions. Statements from partner banks are displayed in exactly the same way, regardless of bank, country, currency, or format.
(32) PCM 110 can interface with a Sanction Screening module 116. This module 116 will check transactions against sanction lists maintained by various government entities that prohibit transactions with sanctioned entities. Transaction sanction screening 116 is performed after authorization and before sending the transaction over a network. Similarly, the risk and fraud management module 117 checks transactions for fraud by performing various analytics to prevent fraudulent payments.
(33) The PCM 110 cash manager 115 incorporates functions for managing liquidity 301, cash forecasting 302, and cash pooling 303.
(34) The top of the screen shows the balances of each currency account 201. Each tile shows one currency, the account balance in that currency, and a chart showing the amount available for payment and the pending payments. A warning triangle is shown when the payments exceed the balance in that currency. The user is given the option to convert funds into the currency or to directly fund the account.
(35) The PCM 110 supports these cash management functions through the account structures and processes described below.
(36) Users can configure hierarchical groups of accounts 311 for cash management purposes. A typical hierarchy consists of a lead account, with subsidiary accounts in the same currency. Multiple account hierarchies may be defined, and these hierarchies can then be used by the Liquidity View 301 and Cash Forecasts 302. New account hierarchies can be defined for Cash Pooling 303 purposes.
(37) Transactions, either passing through the payments process or captured directly (in the case of expected receipts) from back-office systems, are brought into the consolidated cash ledger 312. Here they are matched against confirmation messages, interim statements, and bank statements in order to identify the current “open” transactions yet to be reported on a bank statement.
(38) For accounts that are being actively managed through the PCM User Interface, daily Bank Statements 321 are loaded automatically through the file interface as a batch. These bank statements 321 are used to provide a closing position for the previous day. Additional information is derived from the debit and credit confirmations 322, the outbound payments 323, and the adjusting transactions 324. The closing balance acts as the opening balance for the current position. The bank statement is also reconciled against the current “open” transactions for the day, and matching transactions are closed.
(39) By combining and matching your organization's transactions and your partner bank transactions, PCM can create an accurate representation of the short-term cash forecast. The integration, statement processing, and transaction matching processes are automated background tasks and require no user intervention. The short-term cash forecast module displays the current balance +/− pending payments, transfers, and expected receipts for the coming days.
(40) PCM Forecasting 302 features include: the creation and administration of the user's hierarchical cash forecasting structures that are bank independent; standard cash forecasts based on a 5-day liquidity forecast; drill-down capability to display forecasted transactions against individual accounts; the addition of single or repetitive manual adjustments to the cash forecast (for repetitive adjustments, a background process scheduled on the daily run list will automatically add the recurring adjustment to the cash forecast); conversion of account balances into the account hierarchy currency (using the exchange rates stored as static reference data to estimate the conversion between currencies); and visual indicators against each account to show whether the expected statements or confirmations have been received.
(41) Ledger 312 adjustments can be made directly in PCM 110 to reflect business scenarios outside of the system. Standard adjustment details can be added (account, amount, date, repeat option, and start day of the month). Users can also search on ledger adjustments, view details of individual ledger adjustments, and export the results in the same manner as for payments.
(42) In conjunction with the short-term liquidity forecast 302, the liquidity management functionality may be enabled to provide visibility of cash positions, as it displays the current and available funds balance across all accounts in the selected account hierarchy. The liquidity management features include the creation and administration of the user's own hierarchical cash reporting structures that are bank independent; drill-down capability to display statement lines against each account along with intraday transactions; closed balance and available funds displays; and conversion of account balances into the account hierarchy currency (using the exchange rates stored as static reference data to estimate the conversion between currencies).
(43) The cash manager 115 can be used to implement cash pooling 303, either across a group or locally as required. Cash pool views can be created from a cash account hierarchy 311. To control the sweeping and funding process, the following parameters can be defined for each account in the cash pool: transfer action (sweep and fund, fund only, report only), minimum transfer, rounding, minimum balance, and target balance. Cash pool controls 313 define the operating rules of the cash pooling 303.
(44) In addition, the cash manager 115 provides the functionality of automatically funding the cash positions in various currencies and in various accounts (or it could provide suggestions to a human cash manager). This feature interprets the current cash forecast and the parameters set up for the accounts in the cash pool, and implements transactions to sweep surplus cash into the lead account for the group or to fund subsidiary accounts from the lead account in the group. This operation is repeated for each level in the account hierarchy. The transfers will be executed automatically (or created for authorization). Cross-currency cash pools can be configured, and these will use the exchange rates, stored as static reference data, to estimate conversions between currencies.
(46) Next, an autoregressive integrated moving average (ARIMA) 403 is run against the data. An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). ARIMA models are applied in some cases where data show evidence of non-stationarity, where an initial differencing step (corresponding to the “integrated” part of the model) can be applied one or more times to eliminate the non-stationarity. The ARIMA model is considered a machine learning technique within the field of artificial intelligence technology.
(47) In some embodiments, the ARIMA model is run, all outliers beyond one or two standard deviations are removed from the dataset, and the analysis is run again. Particularly with receipts and payments, occasionally a very large payment arrives for a large job, or a past-due client suddenly pays up. Such outliers can throw off the predictive values and are best removed from the data unless there is reason to believe the event is periodic.
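As a minimal sketch of this trimming step (Python and pandas are used here for illustration; the patent does not specify an implementation):

```python
import pandas as pd

def trim_outliers(amounts: pd.Series, n_std: float = 2.0) -> pd.Series:
    """Drop observations more than n_std standard deviations from the mean
    before refitting the ARIMA model."""
    mu, sigma = amounts.mean(), amounts.std()
    return amounts[(amounts - mu).abs() <= n_std * sigma]
```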
(48) Given a time series of data $X_t$, where $t$ is an integer index and the $X_t$ are real numbers, an ARIMA(p, d, q) model is given by

(49) $\left(1-\sum_{i=1}^{p}\alpha_i L^i\right)(1-L)^d X_t = \left(1+\sum_{i=1}^{q}\theta_i L^i\right)\varepsilon_t$

(50) where $L$ is the lag operator, the $\alpha_i$ are the parameters of the autoregressive part of the model, the $\theta_i$ are the parameters of the moving average part, and the $\varepsilon_t$ are error terms. The error terms $\varepsilon_t$ are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean. The parameter p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model.
(51) The ARIMA model 403 is run on the receipts and payments separately for each currency and each account, with each predicting the receipts and payments for each currency-account group. The ARIMA model 403 provides the time series X.sub.t for receipts and payments for each currency-account for the next 15 or 30 days. Treasury function in-flows or out-flows (funding loans or transfers or investments) are not included in this calculation.
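A minimal sketch of this per-account forecasting step, with statsmodels' ARIMA implementation standing in for the ARIMA model 403 (the order and variable names are illustrative assumptions, not from the patent):

```python
from statsmodels.tsa.arima.model import ARIMA

def forecast_ledger(series, order=(1, 1, 1), horizon=15):
    """Fit an ARIMA(p, d, q) model to one currency-account group's daily
    totals and forecast the next `horizon` days."""
    fit = ARIMA(series, order=order).fit()
    return fit.forecast(steps=horizon)

# Run separately for receipts and payments in each currency-account group:
# receipts_fc = forecast_ledger(receipts["USD/operating"])
# payments_fc = forecast_ledger(payments["USD/operating"])
```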
(52) Payments are typically paid 30 or so days after receipt, and the approval process takes 7-14 days, so the PCM 110 knows perhaps 15 days' worth of actual payments (other periods could be used for other embodiments). These actual payments 404 are factored into the future payments time series. In some cases, several days' worth of receipts may also be known in advance, depending on the characteristics of the bank account and payment rails. If known in advance, the receipts are also switched to actual values.
(53) Once the time series X.sub.i for receipts and the time series X.sub.i for payments are determined, the running balance is calculated by taking the previous balance in the currency account and subtracting the payments X.sub.i for the day and adding the receipts X.sub.i for the day to obtain the account balance for the end of the day (cash forecast 405). A fifteen-day forecast should be fine, but other durations could be used without deviating from these inventions. In some embodiments, this number is configurable. The results of the calculations are assembled into a cash flow time series for the future X.sub.i.
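The running-balance calculation reduces to the recurrence balance_t = balance_{t-1} + receipts_t − payments_t; a sketch (names are illustrative):

```python
def cash_balance_series(opening_balance, receipts_fc, payments_fc):
    """End-of-day balances for one currency account: the previous balance
    plus the day's forecast receipts minus the day's forecast payments."""
    balances, balance = [], opening_balance
    for r, p in zip(receipts_fc, payments_fc):
        balance = balance + r - p
        balances.append(balance)
    return balances
```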
(54) Next, data associated with interest rates for loans and investments, both current and historical, for each currency and account location, are collected 406, along with the costs for wire transfers and other forms of money transfers. The spread between sell and buy of the various currencies is also collected, both present and historical values, to determine the foreign exchange rate costs.
(55) The time series for exchange rates and interest rates are run through an ARIMA model to predict the interest rates and exchange rates for the coming fifteen days (or whatever window is being used for payments and receipts).
(56) Each currency account is analyzed 407 to maximize the value of the entire universe of currency accounts. First, critical cash needs are identified by reviewing each currency account for predicted negative balances for the current day. These accounts need funding to cover the payments due in the current day. For example, in one embodiment, based on previously recorded user behavior (foreign exchange, swapping, and funding), a machine learning algorithm proposes a number of operations for final validation: converting currencies to maintain an appropriate balance in the main currencies, funding accounts that are in overdraft, and moving cash to interest-earning accounts. Following the previous behavior, the system will also allocate cash per counterparty to mitigate risk exposure. Where negative interest applies to accounts in a specific currency, the system will propose emptying those accounts and swapping the funds into an account in a currency that generates positive interest.
(57) Regarding payment rejection, the machine learning/artificial intelligence algorithm will propose predictive and prescriptive analytics. For example, a payment that has been rejected before will be flagged and a correction to the data will be proposed; a payment's channel and route will also be analyzed against the cut-off time and the cost per channel and bank, and an alternative channel and route will be proposed to reduce the cost of payments.
(58) Regarding cash exposure and foreign exchange exposure, the system will be able to predict variation and encourage risk mitigation actions; for example (but not exhaustively), it may predict that the US dollar is going up against sterling and propose swap and forward FX operations to mitigate loss exposure.
(59) The entire universe of currency accounts is optimized using multivariate forecasting techniques. In one embodiment, the algorithm to optimize the sum of the balances of all of the currency accounts is a mathematical formula. For each day of the future time series, the currency accounts are searched for negative balances. For each account with a negative balance, the accounts with positive balances are searched for the currency with the lowest interest rate and the lowest cost to transfer to the currency with the shortfall. The transfer instructions are then recorded and if necessary, additional funds are sought. Then the funding for the next account with a shortfall is sought until all of the accounts have funding. Then the accounts with a positive balance are defunded to a sweep account in that currency.
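A minimal sketch of this greedy funding pass for a single forecast day (the data structures and the transfer_cost helper are illustrative assumptions, not from the patent):

```python
def plan_transfers(balances, transfer_cost, day):
    """Greedy one-day funding plan.

    balances: dict mapping currency account -> forecast balance
    transfer_cost(src, dst): all-in cost estimate (interest foregone,
    wire fees, FX spread) of moving funds from src to dst
    """
    transfers = []
    for short in [c for c, b in balances.items() if b < 0]:
        while balances[short] < 0:
            donors = [c for c, b in balances.items() if b > 0]
            if not donors:
                break  # no internal cash left; seek external funding instead
            donor = min(donors, key=lambda c: transfer_cost(c, short))
            amount = min(balances[donor], -balances[short])
            balances[donor] -= amount
            balances[short] += amount
            transfers.append((day, donor, short, amount))
    # sweep any remaining surpluses into per-currency sweep accounts
    for c in [c for c, b in balances.items() if b > 0]:
        transfers.append((day, c, f"{c}-sweep", balances[c]))
        balances[c] = 0.0
    return transfers
```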
(60) In another embodiment, a machine learning algorithm such as K-means, Random Forest, or DensiCube (see U.S. Pat. No. 9,489,627, issued to Jerzy Bala on Nov. 8, 2016, said patent incorporated herein in its entirety, and U.S. patent application Ser. No. 16/355,985, filed by Jerzy Bala and Paul Green on Mar. 18, 2019, said patent application incorporated herein in its entirety) is used to try various scenarios for cash transfers, borrowing, and funding to optimize the balance at the end of the future forecast period. The fields could be the currency, the cash, the interest in, the interest out, the transfer cost, and the foreign exchange rate to various currencies. A second dimension of the fields could be each day of the forecast period. And the attributes could be the currency accounts. Each combination of cash movements is calculated to determine a sum for the scenario (across the entire forecast period and all currency accounts), and the cash movements related to the best scenario are stored until a better scenario is found. Rather than using an F-score as a quality metric, the cash sum could be used as the quality metric. The best scenario at the end of the analysis is then saved as the cash movement plan. The output of the machine learning algorithm is a rule set that dictates the cash movements. In some embodiments, this rules engine is used as the cash movement plan for a single day; in other cases, the rules could be used as the cash management plan for a short period, perhaps a week. See the discussion below for more detailed information on the distributed DensiCube algorithm.
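As a hedged illustration of this scenario search, with the end-of-period cash sum standing in for the F-score as the quality metric (the candidate-move representation and the simulate function are assumptions, not the patent's algorithm):

```python
import random

def best_scenario(candidate_moves, simulate, n_trials=1000):
    """Sample cash-movement scenarios and keep the one whose simulated
    end-of-period cash sum (the quality metric) is highest.
    simulate(scenario) returns the sum of all currency-account balances
    across the forecast period under that set of moves."""
    best, best_sum = None, float("-inf")
    for _ in range(n_trials):
        scenario = random.sample(candidate_moves,
                                 k=random.randint(1, len(candidate_moves)))
        total = simulate(scenario)
        if total > best_sum:
            best, best_sum = scenario, total
    return best
```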
(61) In the operations of the machine learning module over a window of a longer period (perhaps a week, 15 days, or a month), the goal is to find opportunities to invest the money in longer-term, higher interest rate notes, such as a 15-day certificate of deposit, rather than solely investing in overnight sweep accounts. The same is true for borrowing money or for transferring cash between accounts: by looking at the longer term, better rates and lower transaction costs can be achieved.
(62) Once the funding plan is calculated, the cash manager 115 moves the money 408 according to the plan to the appropriate account, borrowing or investing money as specified in the plan. Once funding is in place, the payments for the day are paid 409 from the appropriate currency account.
(63) Because of the complexities of machine learning algorithms, special purpose computing may be needed to build and execute the machine learning model described herein.
(64) Looking at the creation of the ledger account models, a set of bank payment and receipt transactions is first retrieved and filtered 602 so that it reflects the industry of the company.
(65) The transactions are then sorted 604 into specific ledger accounts such as utilities, rent, payroll, and materials (in some embodiments, this sorting uses individual vendor names and in other embodiments, the ledger accounts are grouped). For each ledger account, the sorted transactions are loaded into a time series that is plotted in memory 605. In some embodiments, the transactions are grouped in a time frame 605, and the amounts within the time frame are summed. For instance, the time frame could be daily, weekly, or monthly. In the case of weekly, all transactions for the ledger account within each week are summed, and the time series uses the summed amount. The plotted time series is then processed through the ARIMA algorithm 606 as the model for that ledger account. The model predicts the expected value of transactions for a point in time.
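A minimal sketch of this sort-and-sum step with pandas (the DataFrame layout and column names are illustrative assumptions):

```python
import pandas as pd

# txns: DataFrame with columns ["date", "ledger_account", "amount"]
txns["date"] = pd.to_datetime(txns["date"])
weekly = (
    txns.set_index("date")
        .groupby("ledger_account")["amount"]
        .resample("W")   # weekly time frame; "D" or "M" also work
        .sum()           # sum all transactions within each week
)
# `weekly` holds one time series per ledger account, ready for ARIMA fitting.
```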
(66) In some embodiments, the data is segregated across a plurality of bank computers, perhaps from a number of different banks. In order to preserve the privacy of the data, the filtering 602, sorting 604, combining 605, and ARIMA algorithm 606 are performed on the bank (or a third party) computers and only the model is transferred to the company for use. See U.S. patent application Ser. No. 16/355,985, filed by Jerzy Bala and Paul Green on Mar. 18, 2019, for a further discussion of distributing machine learning algorithms.
(67) In addition, the company's transactions 101a, 101b, 101c are similarly sorted 611 into the same ledger account categories as used in the sorting 604 of the bank transactions. The sorted company's transactions are then combined into time frames 612 and processed using the ARIMA 613 algorithm to form a company-specific model for each ledger account.
(68) The company-specific ledger account models are then compared 621, on an account-by-account basis, to the ledger account models based on the bank transactions. If the two models are comparable, then the bank transaction model is used going forward. If the ARIMA formulas are the same but offset, then the constants in the model are adjusted 622 to those in the company-specific model. If the models are not comparable, then the company-specific model is used, or the user is requested to adjust the filtering so that the set of bank transactions more accurately reflects the industry of the company. The final set of ledger account models is then returned 623.
(69) In some embodiments, the bank transactions are not used, and the company-specific transactions are the only data used to create the model.
(70) The methodology of the preceding paragraphs produces a model for each ledger account; running those models over the forecast window and summing the results yields a cash flow chart for the company.
(71) This cash flow chart can then be inputted to the machine learning 407 processes described above.
(72) The following description outlines several possible embodiments to create models using distributed data. The Distributed DensiCube modeler and scorer described below extend the predictive analytics algorithms that are described in U.S. Pat. No. 9,489,627 to execute in distributed data environments and to provide quality analytics. The rule learning algorithm for DensiCube is briefly described below. But the DensiCube machine learning algorithm is only one embodiment of the inventions herein. Other machine learning algorithms could also be used.
(73) Rule Learning Algorithm
(74) The rule learning algorithm induces a set of rules. A rule itself is a conjunction of conditions, each for one attribute. A condition is a relational expression in the form:
A=V,
(75) where A is an attribute and V is a nominal value for a symbolic attribute or an interval for a numeric attribute. The rule induction algorithm allows for two important learning parameters 802: minimum recall and minimum precision. More specifically, rules generated by the algorithm must satisfy the minimum recall and minimum precision requirements 805 as set by these parameters 802. The algorithm repeats the process of learning a rule 803 for the target class and removing all target class examples covered by the rule 804 until no rule can be generated to satisfy the minimum recall and minimum precision requirements 805, as sketched below.
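A minimal sketch of that covering loop (learn_one_rule, is_target, covers, recall, and precision are hypothetical names standing in for the patent's rule learner):

```python
def learn_rules(examples, min_recall, min_precision):
    """Sequential covering: learn one rule at a time, remove the target-class
    examples it covers, and stop when no rule meets both quality floors."""
    rules, remaining = [], list(examples)
    while True:
        rule = learn_one_rule(remaining)  # hypothetical beam-search learner
        if (rule is None or rule.recall < min_recall
                or rule.precision < min_precision):
            break
        rules.append(rule)
        # remove the target-class examples covered by the new rule
        remaining = [e for e in remaining
                     if not (e.is_target and rule.covers(e))]
    return rules
```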
(76) In learning a rule, the algorithm starts with the most general rule covering the entire feature space and repeatedly specializes it by splitting along each attribute, scoring every candidate split, as illustrated in the following paragraphs.
(77) Looking at 911, 912, the rule 912 covers all of the positive and negative values, and rule 911 is empty. This rule set is then scored and compared to the base rule 901. The best rule is stored.
(78) Next, the algorithm increments the x-axis split between the rules, creating rule 931 and 932. The rules are scored and compared to the previous best rule.
(79) The process is repeated until all but one increment on the x-axis is left. These rules 941, 942 are then scored, compared, and stored if the score is better.
(80) Once the x-axis has been searched, the best rules are then split on the y-axis (for example, 951, 952) to find the best overall rule. This process may be repeated for as many axes as found in the data.
(81) In the Distributed DensiCube algorithm, these rule-search functions are performed separately on each data silo.
(83) In the Distributed DensiCube algorithm, the entire rule-learning process described above is performed on each silo, and only the resulting rules and their quality measures are shared for combination.
(84) Rule evaluation is considered next.
(85) Every rule induction algorithm uses a metric to evaluate or rank the rules that it generates. Most rule induction algorithms use accuracy as the metric. However, accuracy is not a good metric for imbalanced data sets. The algorithm uses an F-measure as the evaluation metric. It selects the rule with the largest F-measure score. F-measure is widely used in information retrieval and in some machine learning algorithms. The two components of F-measure are recall and precision. The recall of a target class rule is the ratio of the number of target class examples covered by the rule to the total number of target class examples. The precision of a target class (i.e., misstatement class) rule is the ratio of the number of target class examples covered by the rule to the total number of examples (from both the target and non-target classes) covered by that rule. F-measure of a rule r is defined as:
(86) $F_\beta(r) = \dfrac{(1+\beta^2)\cdot\text{precision}(r)\cdot\text{recall}(r)}{\beta^2\cdot\text{precision}(r)+\text{recall}(r)}$
where β is the weight. When β is set to 1, recall and precision are weighted equally. F-measure favors recall with β>1 and favors precision with β<1. F-measure can be used to compare the performances of two different models/rules. A model/rule with a larger F-measure is better than a model/rule with a smaller F-measure.
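As a quick sketch, the measure above can be computed directly (a direct transcription of the formula; the function and argument names are ours):

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.
    beta > 1 favors recall; beta < 1 favors precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```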
Prototype Generation Algorithm for Ranking with Rules
(87) The algorithms incorporate a method, called prototype generation, to facilitate ranking with rules. For each rule generated by the rule learning algorithm, two prototypes are created. In generating prototypes, the software ignores symbolic conditions, because examples covered by a rule share the same symbolic values. Given a rule R with m numeric conditions A.sub.R1=V.sub.R1 ∧ A.sub.R2=V.sub.R2 ∧ . . . ∧ A.sub.Rm=V.sub.Rm, where A.sub.Ri is a numeric attribute and V.sub.Ri is a range of numeric values, the positive prototype of R is P(R)=(p.sub.R1, p.sub.R2, . . . , p.sub.Rm) and the negative prototype of R is N(R)=(n.sub.R1, n.sub.R2, . . . , n.sub.Rm), where both p.sub.Ri∈V.sub.Ri and n.sub.Ri∈V.sub.Ri. p.sub.Ri and n.sub.Ri are computed using the following formulas:
(88) $p_{Ri} = \dfrac{1}{|R(\text{POS})|}\sum_{e\in R(\text{POS})} e_{Ri}, \qquad n_{Ri} = \dfrac{1}{|R(\text{NEG})|}\sum_{e\in R(\text{NEG})} e_{Ri}$
(89) where R(POS) and R(NEG) are the sets of positive and negative examples covered by R respectively, e=(e.sub.R1, e.sub.R2, . . . , e.sub.Rm) is an example, and e.sub.Ri∈V.sub.Ri for i=1, m, because e is covered by R.
(90) Given a positive prototype P(R)=(p.sub.R1, p.sub.R2, . . . , p.sub.Rm) and a negative prototype N(R)=(n.sub.R1, n.sub.R2, . . . , n.sub.Rm) of rule R, the score of an example e=(e.sub.R1, e.sub.R2, . . . , e.sub.Rm) is 0 if e is not covered by R. Otherwise, e receives a score between 0 and 1 computed using the following formula:
(91) $\text{score}(e,R) = \dfrac{1}{2}\left(1 + \dfrac{\sum_{i=1}^{m} w_{Ri}\,d_{Ri}(e)}{\sum_{i=1}^{m} w_{Ri}}\right), \qquad d_{Ri}(e) = \dfrac{|e_{Ri}-n_{Ri}| - |e_{Ri}-p_{Ri}|}{|p_{Ri}-n_{Ri}|}$
(92) where w.sub.Ri is the weight of the Ri.sup.th attribute of R. The value of
(93) $d_{Ri}(e) = \dfrac{|e_{Ri}-n_{Ri}| - |e_{Ri}-p_{Ri}|}{|p_{Ri}-n_{Ri}|}$
is between −1 and 1. When e.sub.Ri>n.sub.Ri>p.sub.Ri or p.sub.Ri>n.sub.Ri>e.sub.Ri, it is −1. When e.sub.Ri>p.sub.Ri>n.sub.Ri or n.sub.Ri>p.sub.Ri>e.sub.Ri, it is 1. When e.sub.Ri is closer to n.sub.Ri than p.sub.Ri, it takes a value between −1 and 0. When e.sub.Ri is closer to p.sub.Ri than n.sub.Ri, it takes a value between 0 and 1. The value of score(e, R) is normalized to the range of 0 and 1. If p.sub.Ri=n.sub.Ri, then
(94) $d_{Ri}(e)$
is set to 0.
(95) w.sub.Ri is computed using the following formula.
(96) $w_{Ri} = \dfrac{|p_{Ri}-n_{Ri}|}{\max_{Ri}-\min_{Ri}}$
(97) where max.sub.Ri and min.sub.Ri are the maximum and minimum values of the Ri.sup.th attribute of R, respectively. A large difference between p.sub.Ri and n.sub.Ri implies that the values of positive examples are very different from the values of negative examples on the Ri.sup.th attribute, so the attribute should distinguish positive examples from negative ones well.
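A minimal sketch of prototype generation and scoring, following the reconstructed formulas above (the exact normalization in the original is not fully recoverable, so this is an assumption; array names are illustrative):

```python
import numpy as np

def prototypes(pos, neg):
    """Positive/negative prototypes: per-attribute means over the positive
    and negative examples covered by the rule (rows = examples)."""
    return pos.mean(axis=0), neg.mean(axis=0)

def score(e, p, n, attr_max, attr_min):
    """Score one covered example e against prototypes p and n."""
    w = np.abs(p - n) / (attr_max - attr_min)      # attribute weights
    with np.errstate(divide="ignore", invalid="ignore"):
        d = (np.abs(e - n) - np.abs(e - p)) / np.abs(p - n)
    d = np.where(p == n, 0.0, d)                   # convention when p == n
    return 0.5 * (1.0 + np.dot(w, d) / w.sum())    # normalized to [0, 1]
```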
(98) Scoring Using Rules
(99) A rule induction algorithm usually generates a set of overlapping rules. Two methods, Max and Probabilistic Sum, for combining example scores of multiple rules are used by the software. Both methods have been used in rule-based expert systems. The Max approach simply takes the largest score of all rules. Given an example e and a set of n rules R={R.sub.1, . . . , R.sub.n}, the combined score of e using Max is computed as follows:
score(e,R)=max.sub.i=1.sup.n{precision(R.sub.i)×score(e,R.sub.i)}
where precision(R.sub.i) is the precision of R.sub.i. There are two ways to determine score(e, R.sub.i) for a hybrid rule. The first way returns the score of e received from rule R.sub.i for all e's. The second way returns the score of e received from R.sub.i only if the score is larger than or equal to the threshold of R.sub.i; otherwise the score is 0. For a normal rule,
(100)
(101) For the probabilistic sum method, the formula can be defined recursively as follows.
score(e,{R.sub.1})=score(e,R.sub.1)
score(e,{R.sub.1,R.sub.2})=score(e,R.sub.1)+score(e,R.sub.2)−score(e,R.sub.1)×score(e,R.sub.2)
score(e,{R.sub.1, . . . ,R.sub.n})=score(e,{R.sub.1, . . . ,R.sub.n-1})+score(e,R.sub.n)−score(e,{R.sub.1, . . . ,R.sub.n-1})×score(e,R.sub.n)
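A minimal sketch of both combination methods (plain Python; the names are illustrative):

```python
def combine_max(scores, precisions):
    """Max combination: the largest precision-weighted score over all rules."""
    return max(p * s for p, s in zip(precisions, scores))

def combine_prob_sum(scores):
    """Probabilistic sum, folded pairwise: s1 + s2 - s1*s2, and so on."""
    total = 0.0
    for s in scores:
        total = total + s - total * s
    return total
```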
Hardware Architecture
(102) Turning to the hardware, the special purpose computing described herein runs on a set of servers 503, 505, 507, each analyzing local data held in its own database 509, 506, 508.
(103) Distributed DensiCube
(104) By allowing for distributed execution, the Distributed DensiCube algorithm provides a number of important benefits. First of all, privacy of the data assets in the model generation and prediction modes of operation is preserved by keeping the data in its original location and limiting access to the specific data. Second, the cost of implementing complex ETL processes and data warehousing in general is reduced by eliminating the costs of transmission to and storage in a central location. Third, these inventions increase performance by allowing parallel execution of the DensiCube algorithm (i.e., executing the predictive analytics algorithms on distributed computing platforms). In addition, this distributed algorithm provides the capability for the Distributed DensiCube algorithm to provide unsupervised learning (e.g., fraud detection from distributed data sources). Finally, it allows predictive analytics solutions to operate and react in real time on a low-level transactional streaming data representation without requiring data aggregation.
(105) The Distributed DensiCube approach represents a paradigm shift from the currently predominant data-centric approaches to predictive analytics, i.e., approaches that transform, integrate, and push data from distributed silos to predictive analytics agents, to future decision-centric (predictive analytics bot agent based) approaches, i.e., approaches that push predictive analytics agents to the data locations and, by collaborating, support decision-making in distributed data environments.
(106) Essentially, the distributed DensiCube algorithm runs the DensiCube algorithm on each server 503, 505, 507, analyzing the local data in its database 509, 506, 508. The best rule or best set of rules 1105 from each server 503, 505, 507 is then combined into the best overall rule. In some embodiments, several servers could work together to derive a best rule, which is then combined with that of another server.
(107) Collaborating predictive analytics bot agents can facilitate numerous opportunities for enterprise data warehousing to provide faster, more predictive, more prescriptive, and time and cost saving decision-making solutions for their customers.
(108) Distributed DensiCube Concept of Operation
(109) The following sections describe the concept behind the Distributed DensiCube approach. As mentioned in the previous section, the Distributed DensiCube solution continues to use the same modeling algorithms as the current non-distributed predictive analytics solution (with modifications to the scoring algorithms to support privacy by preserving the data assets in silos).
(110) 1.1 Distributed Modeling
(111) The Distributed DensiCube operates on distributed entities at different logical and/or physical locations.
(112) The distributed entity represents a unified virtual feature vector describing an event (e.g., financial transaction, customer campaign information). Feature subsets 2704, 2705 of this representation are registered/linked by a common identifier (e.g., transaction ID, Enrolment Code, Invoice ID, etc.) 2707. Thus, distributed data 2701 represents a virtual table 2706 of feature subsets 2704, 2705 joined by their common identifier 2707.
(114) As an example of the distributed DensiCube algorithm, consider three data silos: a bank database 802, a credit agency database 803, and a registry of deeds database 804.
(115) The credit agency database 803 contains three fields: the ID (SSN), the Credit Score, and the Total Debt fields. The registry of deeds database 804 also has three fields in this example: the ID (SSN), a home ownership field, and a home value field. In our example, there are a number of reasons that the data in the credit agency 803 needs to be kept separate from the registry data 804, and both of those datasets need to be kept separate from the bank data 802. As a result, the DensiCube algorithm is run three times, once on each of the databases 802, 803, 804. In another embodiment, two of the servers could be combined, with the algorithm running on one of the servers.
(116) As seen in the architecture, a modeler 1003 coordinates with feature managers 1004 that reside with the data silos 1002.
(117) All the above components collaborate to generate models and use them for scoring, and at the same time, preserve the privacy of the data silos 1002. There are three levels of privacy that are possible in this set of inventions. The first level could preserve the data in the silos, providing privacy only for the individual data records. A second embodiment preserves the attributes of the data in the silos, preventing the model from knowing the attributes. The second embodiment may also hide the features (names of attributes) by instead returning a pseudonym for the features. In the third embodiment, the features themselves are kept hidden in the silos. For example, in the first level, the fact that the range of the credit scores is between 575 and 829 is reported back to the modeler 1003, but the individual records are kept hidden. In the second embodiment, the modeler 1003 is told that credit scores are used, but the range is kept hidden on the data silo 1002. In the third embodiment, the credit score feature itself is kept hidden from the modeler 1003. In this third embodiment, the model itself is distributed on each data silo, and the core modeler 1003 has no knowledge of the rules used on each data silo 1002.
(118) The collaboration between distributed components results in a set of rules generated through a rule-based induction algorithm. The DensiCube induction algorithm, in an iterative fashion, determines the data partitions based on feature rules in a syntactic representation (e.g., if feature F>20 and F<25). It dichotomizes (splits) the data into partitions. Each partition is evaluated by computing statistical quality measures. Specifically, DensiCube uses an F-score measure to compute the predictive quality of a specific partition. In binary classification, the F-score measure is a measure of a test's accuracy and is defined as the weighted harmonic mean of the test's precision and recall. Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that have been retrieved over the total amount of instances.
(119) Specifically, the following steps are executed by Distributed DensiCube:
(120) 1) The modelers 1003 invoke feature managers 1004 that subsequently start data partitioning based on the local set of features at the data silo 1002. This process is called specialization.
(121) 2) Feature managers 1004 push their computed partitions (i.e., using the data identifier as the partition identifier) and their corresponding evaluation measures (e.g., F-score) to modelers 1003.
(122) 3) Each feature model manager 1008 compares evaluation measures of the sent partitions and selects the top N best partitions (i.e., it establishes the global beam search for the top-performing partitions and their combinations).
(123) 4) Subsequently, the modeler 1003 proceeds to the process of generating partition combinations. The first iteration of such combinations syntactically represents two-conditional rules (i.e., a partition is represented by a joint of lower and upper bounds of two features). Once this process is completed, the identifiers of the two-conditional rules are sent to the feature managers 1004. Once received, feature managers 1004 evaluate the new partitions identified by the identifiers by executing the next iteration of specialization.
(124) A data manager 1012 is a logical construct which is comprised of a data orchestrator 1005 and one or more feature data managers 1006, which cooperate to manage data sets. Data sets can be used to create models and/or to make predictions using models. A data orchestrator 1005 is a component which provides services to maintain Data Sets, is identified by its host domain and port, and has a name which is not necessarily unique. A feature data manager 1006 is a component which provides services to maintain Feature Data Sets 1203, is identified by its host domain and port, and has a name which is not necessarily unique. A data set lives in a data orchestrator 1005, has a unique ID within the data orchestrator 1005, consists of a junction of Feature Data Sets 1203, joins Feature Data Sets 1203 on specified unique features, and is virtual tabular data.
(125) A model manager 1013 is a logical construct which is comprised of a model orchestrator 1007 and one or more feature model managers 1008, which cooperate to generate models.
(126) A prediction manager 1014 is a logical construct which is comprised of a prediction orchestrator 1010 and one or more feature prediction managers 1011, which cooperate to create scores and statistics (a.k.a. predictions).
(127) 1.2 Distributed Scoring
(128) The distributed scoring process is accomplished in two steps. First, partial scores are calculated on each feature manager 1004 on each server. Then, complete scores are calculated from the partial scores.
(129) The combined score is the sum of the partial scores from each server, divided by the sum of the weights from each server and multiplied by two:

(130) $\text{score}(e) = \dfrac{2\left(\text{score}_A(e) + \text{score}_B(e)\right)}{w_A + w_B}$

(131) In this formula, the scores for servers A and B are computed as in the DensiCube scoring described above:

(132) $\text{score}_A(e) = \sum_{Ri \in A} w_{Ri}\, d_{Ri}(e)$

(133) The weights are also determined for each location, as above:

(134) $w_A = \sum_{Ri \in A} w_{Ri}$
(135) With the combined score, we have a metric to show the validity of the selected model.
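A minimal sketch of the combination step, following the reconstruction above (the exact per-silo quantities are an assumption, reconstructed from the prose):

```python
def combined_score(partial_scores, partial_weights):
    """Combine per-silo partial scores: twice the sum of the partial scores
    divided by the sum of the per-silo weights."""
    return 2.0 * sum(partial_scores) / sum(partial_weights)
```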
(136) 2.0 Initial Architectural Concept of Operation and Requirements
(137) 2.1 Feature Manager 1004
(138) At the initialization of the machine learning model generation process, each feature manager 1004 is set up on the local servers 1002. Each feature manager 1004 must be uniquely named (e.g., within the subnet where it lives). The port number where the feature manager 1004 can be reached needs to be defined. Access control needs to be configured, with a certificate for the feature manager 1004 installed and the public key for each modeler 1003 and feature prediction manager 1011 installed to allow access to this feature manager 1004. Each local feature manager 1004 needs to broadcast the name, host, port, and public key of the feature manager 1004. In some embodiments, the feature manager 1004 needs to listen to other broadcasts to verify uniqueness.
(139) Next, the data sources are defined.
(140) Each Data Source shall be described by a name for the data source and a plurality of columns, where each column has a name, a data type, and a uniqueness field. Data Sources can be used by feature model managers 1008 or feature prediction managers 1011 or both. Data Sources are probably defined by calls from a modeler 1003.
(141) The next step involves defining the Data Set Templates. A Data Set Template is a specification of how to join Data Sources defined within a feature data manager 1006. Each Data Set Template must be uniquely identified by name within a feature data manager 1006. A Data Set Template is a definition of Columns without regard to the Rows in each Data Source. For example, a Data Set Template could be represented by a SQL select statement with columns and join conditions, but without a where clause to limit rows. Data Set Templates can be used by feature model managers 1008 or feature prediction managers 1011 or both. Data Set Templates are probably defined by calls from a feature model manager 1008.
(142) Once the Data Set Templates are setup, the next step is to define the Data Sets. A Data Set is tabular data which is a subset of a data from the Data Sources defined within a feature data manager 1006. Each Data Set must be uniquely identified by name within a feature data manager 1006. A Data Set is defined by a Data Set Template to define the columns and a set of filters to define the rows. For example, the filter could be the where clause in a SQL statement. Data Sets can be used by modelers 1003 or feature prediction managers 1011 or both. Data Sets are probably defined by calls from a modeler 1003.
(143) 2.2 Modeler 1003
(144) In the modeler 1003, a model orchestrator 1007 works with one or more feature model managers 1008 to generate models.
(145) In the setup of the model orchestrator 1007, each modeler 1003 should be uniquely named, at least within the subnet where it lives. However, in some embodiments, the uniqueness may not be enforceable. Next the access control is configured by installing a certificate for the modeler 1003 and installing the public key for each feature manager 1004 containing pertinent data. The public key for each feature prediction manager 1011 is also installed, to which this modeler 1003 can publish.
(146) Once set up, the model orchestrator 1007 establishes a connection to each feature model manager 1008.
(147) Then the Model Data Set templates are defined. A Model Data Set Template is a conjunction of Data Set Templates from feature data managers 1006. Each Data Set Template must be uniquely named within the feature manager 1004. The Data Set Templates on feature data managers 1006 are defined, as are the join conditions. A join condition is an equality expression between unique columns on two Data Sets. For example <Feature Manager A>.<Data Set Template 1>.<Column a>==<Feature Manager B>.<Data Set Template 2>.<Column b>. Each data set participating in the model data set must be joined such that a singular virtual tabular data set is defined.
(148) After the templates are defined, the model data sets themselves are defined. A Model Data Set is a conjunction of Data Sets from feature data managers 1006. The Model Data Set is a row filter applied to a Model Data Set Template. Each Data Set must be uniquely named within a Model Data Set Template. Then the data sets on the feature data managers 1006 are defined. This filters the rows.
(149) Next, the Modeling Parameters are defined. Modeling Parameters define how a Model is created on any Model Data Set which is derived from a Model Data Set Template. Each Modeling Parameters definition must be unique within a Model Data Set Template.
(150) Then, a model is created and published. A model is created by applying Modeling Parameters to a Model Data Set. Each Model must be uniquely identified by name within a Model Data Set. A Model can be published to a feature prediction manager 1011. Publishing will persist the Model artifacts in the feature model managers 1008 and feature prediction managers 1011. Following are some of the artifacts which will be persisted to the feature model manager 1008 and/or feature prediction manager 1011: data set templates, model data set templates, and the model.
(151) 2.3 Prediction Orchestrator 1010
(152) The prediction orchestrator 1010 setup begins with the configuration of the access control. This is done by installing a certificate for the feature prediction manager 1011 and installing the public key for each modeler 1003 allowed to access this prediction orchestrator 1010. The public key for each feature manager 1004 containing pertinent data is also installed. Each prediction orchestrator 1010 should be uniquely named, but in some embodiments this may not be enforced.
(153) Next, a connection to each feature prediction manager 1011 is established and to a model orchestrator 1007. The model orchestrator 1007 will publish the Model Data Set Template and Model to the prediction orchestrator 1010.
(154) The scoring data sets are then defined. A Scoring Data Set is a conjunction of Data Sets from the feature data managers 1006. It is a row filter applied to a Model Data Set Template. Each Data Set must be uniquely named within a Model Data Set Template. The data sets on the feature data managers 1006 are defined (this filters the rows).
(155) Then the Scoring Parameters are defined. Scoring Parameters define how Scores are calculated on any Score Data Set which is derived from a Model Data Set Template. Each Scoring Parameters definition must be unique within a Model Data Set Template.
(156) Finally, a Scoring Data Set is defined. Partial Scores are calculated on each feature manager 1004 in the feature prediction manager 1011.
(157) Looking to the scoring flow, the prediction orchestrator 1010 first sends the model, the list of record IDs, and the features to each of the data silos 1002.
(158) The feature managers 1004 on each of the data silos 1002 then initialize the site 1311, 1321, 1331. The data on the silo 1002 is then sliced, using the list of IDs and the features 1312, 1322, 1332, into a data set of interest by the feature data manager 1006. The DensiCube algorithm 1313, 1323, 1333 is then run by the feature model manager 1008 on the data of interest.
(159) The rules, in some embodiments, are then returned to the prediction orchestrator 1010, where they are combined into an overall rule 1304.
(160) The scoring algorithms are modified to support privacy preservation in the data silos.
(161) The above description of the embodiments, alternative embodiments, and specific examples are given by way of illustration and should not be viewed as limiting. Further, many changes and modifications within the scope of the present embodiments may be made without departing from the spirit thereof, and the present inventions include such changes and modifications.