Data Pre-Processing and Searching Systems
20180012239 · 2018-01-11
Inventors
- Ari L. Studnitzer (Northbrook, IL, US)
- David John Geddes (Antrim, GB)
- Inderdeep Singh (Palatine, IL, US)
- Steven Hutt (Sutton, GB)
- Bernard Pieter Hosman (Amsterdam, NL)
Cpc classification
G06Q30/0201
PHYSICS
International classification
Abstract
Systems and methods for pre-processing data to facilitate efficient and accurate machine learning are provided. The data may include market data. The pre-processing may include partitioning the data into windows assigning categories to windows generate a series of vectors.
Claims
1. A computer system comprising: a processor; a tangible computer-readable medium containing computer-executable instructions that when executed by the processor cause the computer system to process market data for use by a machine learning computer by performing the steps comprising: (a) receiving a collection of market data that includes time stamps, price levels and order quantities; (b) determining for each time stamp a difference in order quantity at each price level when compared to order quantity at the same price level at the previous time stamp; and (c) partitioning the collection of market data into a sequence of time period windows.
2. The computer system of claim 1, wherein (c) comprises: selecting a length of the time period windows to reveal patterns in market data.
3. The computer system of claim 1, wherein (c) comprises: selecting a length of the time period windows to reveal structures in market data.
4. The computer system of claim 1, where the tangible computer-readable medium further contains computer-executable instructions that when executed by the processor cause the computer system to perform the step comprising: (d) determining quantiles for changes in order quantities;
5. The computer system of claim 4, wherein (d) comprises; comparing order quantities prior to a windows to windows to order quantities within windows.
6. The computer system of claim 5, wherein (d) comprises: classifying changes in order quantities that are large and small increases and decreases.
7. The computer system of claim 6, wherein (d) comprises: analyzing order quantity changes over multiple windows.
8. The computer system of claim 4, where the tangible computer-readable medium further contains computer-executable instructions that when executed by the processor cause the computer system to perform the step comprising: (e) assigning a category for each time period window in accordance with the quantiles determined in (d).
9. The computer system of claim 8, (e) further comprises: assigning a category for each time stamp within a time period window in accordance with the quantiles determined in (d).
10. The computer system of claim 9, wherein the categories comprises: large increase in ask order quantity, small increase/decrease in ask order quantity, large decrease in ask order quantity, no order quantity, large decrease in bid order quantity, small increase/decrease in bid order quantity and large increase in bid order quantity.
11. A computer implemented method of processing data for training a machine learning system, the method comprising: (a) receiving a collection of market data that includes time stamps, price levels and order quantities; (b) determining for each time stamp a difference in order quantity at each price level when compared to order quantity at the same price level at the previous time stamp; and (c) partitioning the collection of market data into a sequence of time period windows; and (d) determining quantiles for changes in order quantities.
12. The computer implemented method of claim 11, wherein (d) comprises: comparing order quantities prior to a windows to windows to order quantities within windows.
13. The computer implemented method of claim 12, wherein (d) comprises: classifying changes in order quantities that are large and small increases and decreases.
14. The computer implemented method of claim 12, further including: (e) assigning a category for each time period window in accordance with the quantiles determined in (d).
15. The computer implemented method of claim 14, wherein (e) further comprises: assigning a category for each time stamp within a time period window in accordance with the quantiles determined in (d).
16. A computer implemented method of searching market data, the method comprising: (a) receiving a collection of market data that includes time stamps, price levels and order quantities; (b) extracting features from windows of market data that include start and end times; (c) receiving a search request that identifies a search window of market data; (d) comparing extracted features from the search window to extracted features from other windows; and (e) returning search results that include result windows that are similar to the search window.
17. The computer implemented method of claim 16, wherein (b) comprises: (i) processing market data to generate preprocessed data for use by a machine learning computer; and (ii) processing the preprocessed data with a machine learning computer to extract features.
18. The computer implemented method of claim 17, wherein (i) comprises: (1) determining for each time stamp a difference in order quantity at each price level when compared to order quantity at the same price level at the previous time stamp; and (2) partitioning the collection of market data into a sequence of time period windows.
19. The computer implemented method of claim 18, wherein (i) further comprises: (3) determining quantiles for changes in order quantities; and (4) assigning a category for each time period window in accordance with the quantiles determined in (3).
20. The computer implemented method of claim 19, wherein (3) comprises: classifying changes in order quantities that are large and small increases and decreases.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The present invention may take physical form in certain parts and steps, embodiments of which will be described in detail in the following description and illustrated in the accompanying drawings that form a part hereof, wherein:
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
DETAILED DESCRIPTION OF THE INVENTION
[0022] Aspects of the present invention are preferably implemented with computer devices and computer networks that allow users to exchange trading information. An exemplary trading network environment for implementing trading systems and methods is shown in
[0023] An exchange computer system 100 receives orders and transmits market data related to orders and trades to users. Exchange computer system 100 may be implemented with one or more mainframe, desktop or other computers. A user database 102 includes information identifying traders and other users of exchange computer system 100. Data may include user names and passwords. An account data module 104 may process account information that may be used during trades. A match engine module 106 is included to match bid and offer prices. Match engine module 106 may be implemented with software that executes one or more algorithms for matching bids and offers. A trade database 108 may be included to store information identifying trades and descriptions of trades. In particular, a trade database may store information identifying the time that a trade took place and the contract price. An order book module 110 may be included to compute or otherwise determine current bid and offer prices. A market data module 112 may be included to collect market data and prepare the data for transmission to users. A risk management module 134 may be included to compute and determine a user's risk utilization in relation to the user's defined risk thresholds. An order processing module 136 may be included to decompose delta based and bulk order types for processing by order book module 110 and match engine module 106.
[0024] The trading network environment shown in
[0025] Computer device 114 is shown directly connected to exchange computer system 100. Exchange computer system 100 and computer device 114 may be connected via a T1 line, a common local area network (LAN) or other mechanism for connecting computer devices. Computer device 114 is shown connected to a radio 132. The user of radio 132 may be a trader or exchange employee. The radio user may transmit orders or other information to a user of computer device 114. The user of computer device 114 may then transmit the trade or other information to exchange computer system 100.
[0026] Computer devices 116 and 118 are coupled to a LAN 124. LAN 124 may have one or more of the well-known LAN topologies and may use a variety of different protocols, such as Ethernet. Computers 116 and 118 may communicate with each other and other computers and devices connected to LAN 124. Computers and other devices may be connected to LAN 124 via twisted pair wires, coaxial cable, fiber optics or other media. Alternatively, a wireless personal digital assistant device (PDA) 122 may communicate with LAN 124 or the Internet 126 via radio waves. PDA 122 may also communicate with exchange computer system 100 via a conventional wireless hub 128. As used herein, a PDA includes mobile telephones and other wireless devices that communicate with a network via radio waves.
[0027]
[0028] One or more market makers 130 may maintain a market by providing constant bid and offer prices for a derivative or security to exchange computer system 100. Exchange computer system 100 may also exchange information with other trade engines, such as trade engine 138. One skilled in the art will appreciate that numerous additional computers and systems may be coupled to exchange computer system 100. Such computers and systems may include clearing, regulatory and fee systems.
[0029] The operations of computer devices and systems shown in
[0030] Of course, numerous additional servers, computers, handheld devices, personal digital assistants, telephones and other devices may also be connected to exchange computer system 100. Moreover, one skilled in the art will appreciate that the topology shown in
Pre-Processing of Data
[0031] Machine learning is a methodology that may be used to identify structure in data. For example, sequences of related events (i.e. contiguous in time and price) in a limit order book are often of interest, whereas small changes in a limit order book may be regarded as noise. Machine learning can require a lot of processing resources, particularly when large amounts of data are analyzed. The accuracy of the learning process can also be reduced as the size of the data increases.
[0032] Some embodiments of the invention include a pre-processing process prior to machine learning. The disclosed pre-processing processes reduce processing requirements during the machine learning process. The disclosed pre-processing processes also allow machine training algorithms to generate accurate results.
[0033] Pre-processing balances filtering irrelevant data (noise) with retaining relevant data (that could potentially contribute to a signal). For example, the analysis of patterns in order books requires decisions on which order book changes are key and how to represent those changes. Without pre-processing of data, the machine learning machine may waste computational time and resources learning details which are not of interest. Proper pre-processing increases the efficiency of the operation of a machine learning computer or machine.
[0034] An exemplary process for pre-processing data is shown in
[0035] Next, in step 406 the collection of market data is portioned into a sequence of time period windows. Each window being a fixed number of consecutive rows. The size of the window can be adjusted, and may be set to a size that can encompass a pattern or structure within the market data. After the data has been partitioned, quantiles for changes in limit order quantities are determined in step 408. The quantity quantiles may be computed for a period prior to the beginning of a window. These quantiles may be used to determine quantity change categories. For example, the categories may be “large increase,” “large decrease,” and “small increase or decrease.”
[0036] Finally, in step 410 a category may be assigned for each time period window in accordance with the quantiles determined in step 408. An exemplary set of categories includes: [0037] a. large increase in ask order quantity [0038] b. small increase/decrease in ask order quantity [0039] c. large decrease in ask order quantity [0040] d. no order quantity [0041] e. large decrease in bid order quantity [0042] f. small increase/decrease in bid order quantity [0043] g. large increase in bid order quantity
[0044] In accordance with some embodiments of the invention, these seven categories are represented as a 7-dimensional, one-hot binary vector. This final form of the data used as input to training the machine learning machine follows: [0045] 1. N windows consisting of: [0046] 2. P price levels×T timestamps, each of which is a: [0047] 3. 7-dimensional one-hot binary vector.
[0048] For the purposes of visualization, a single window may be represented as shown in
[0049] In some embodiments of the invention, the pre-processing results are used by a computer system that executes a machine learning algorithm. The machine learning process may involve training a neural network, such as a recurrent neural network (RNN), as needed.
Market Data Searching
[0050] Financial market data may be viewed as closer to a video than an image. Financial exchanges receive incoming order flow which may be FIFO processed by a matching engine. The matching engine reports each change in the Limit Order Book with a timestamp. Hence market data can be represented as a time series or a sequence of events. Each event updates the state of the Limit Order Book.
[0051] Some embodiments of the invention allow a user to specify a historical period of market data defined by a contract, a start time and an end time. This ‘request’ period is a ‘snapshot’ of the market data that occurred in the past. The user will then request a search for other historical periods which exhibit similar patterns of order flow, not necessarily on the same contract. The search will return a selection of historical periods, so called ‘matching’ periods. Both ‘request’ and ‘matching’ periods are presented to the user in a visual representation of the data. Request periods may be ordered according to their machine defined similarity.
[0052] Embodiments of the invention include a system for searching market data based on historic market data patterns.
[0053] Next, in step 504 features are extracted from windows of market data that include start and end times. Step 504 may include one or more of the pre-processing steps described above. In one example, feature extraction can be done by a computer executing computer-executable instructions and that uses a neural network specifically adapted for the statistical structure of market data. Once trained, the system may provide a feature mapping from sequences of market snapshots to a so-called feature space. The feature space may be a lossy encoded compression of the sequence. In other words, compression of sequences of market data snapshots removes “noise” in a market data sequence and retains the “signal”, i.e., the unique features of market data behavior that make up the feature space. A sequence of market snapshots may be mapped to a point in the feature space. The feature space allows for a distance metric to be calculated between any two points in the feature space.
[0054] A search request that identifies a search window of market data is received in step 506. The search request may be manually created by a trader or exchange employee. In some embodiments the search request may be created by a computer system executing an algorithm.
[0055] A search is performed in step 508 by comparing the extracted features from the search window to extracted features from other windows. The search function may be implemented as follows in some embodiments of the invention: [0056] 1. The system uses feature mapping to map a ‘request period’ to a ‘request point’ in feature space. This is done by first pre-processing the raw data, as described above, and then compressing the processed data. Compression of this data results in a representation of features that are unique or display some level of structure or pattern. The more structure the data has, the more it can be compressed. The search compares the compressed search query features to all other features in the feature search set. [0057] 2. The search algorithm ranks points in the feature search set according to their distance from the ‘request point’. Specifically, the search query is a point in n-dimensional space, and the other points in the feature search set representing historical features can also be represented as points. The distance between the request point and all the other points in the feature space can be computed. [0058] 3. The nearest ‘n’ points in the feature set are returned with ranking.
[0059] In some embodiments, the search process is as follows:
[0060] 1. User identifies a contract and period of interest known as the ‘request period’. This normally will be submitted as a time slice of “raw” market data, i.e., market data in the same format which is generally received from the exchange or other market data provider. [0061] 2. The ‘request period’ is passed to a software application. [0062] 3. The system searches for ‘matching periods’ with similar patterns to the ‘request period’. [0063] 4. The system returns a ranked list of ‘matching periods’. The shortest distance between the points is considered most similar (highest ranked)
[0064] After the search is performed, search results are returned that include result windows that are similar to the search window in step 510. For the purposes of user interaction, a sequence of market snapshots may be returned in a visual representation. An exemplary representation is shown in
[0065] Step 512 includes predicting a future change to the market based on at least one change that happened after at least one of the result windows. Step 512 may include predicting a liquidity event or any other condition of impacting price discovery. In some embodiments, step 512 may include predicting changes that are of interest to traders, such as changes in values of contracts or indexes. Step 514 preventative action may be taken to at least limit the impact of the predicted future change to the market. The preventative action may include pausing a market, suspending an account, halting trading and other actions taken by an exchange to limit or end an undesired market condition.
[0066] The present invention has been described herein with reference to specific exemplary embodiments thereof. It will be apparent to those skilled in the art that a person understanding this invention may conceive of changes or other embodiments or variations, which utilize the principles of this invention without departing from the broader spirit and scope of the invention. All are considered within the sphere, spirit, and scope of the invention.