On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Babbage, Charles (1864). Passages from the Life of a Philosopher.
The first step in any data-centric analysis is to understand the input data, and subsequently assess its quality. The common refrain, "Garbage In, Garbage Out" becomes more than a cliche if you have ever made the mistake of not understanding input data prior to use in whatever model you are building. Often, garbage data will yield results that show great promise, only to be revealed as noise upon further, time-expensive, emotionally draining analysis. Worse, the garbage data may mask the genuine validity of a model, leading to erroneous rejection.
TradingPhysics is a distributor of Nasdaq's TotalView-ITCH historical data. These data allow reconstruction of the full order book depth -- data needed for market microstructure research. Although there are many distributors, TradingPhysics provides single-day ITCH files in table, CSV, XML, or binary formats. Dollar per datum it's expensive, but in terms of cost of entry it is the least expensive source I am aware of. (Feel free to correct me!) Since this is a how-to style blog, cost of entry trumps dollar per datum.
The remainder of this post assumes you have downloaded an ITCH file from TradingPhysics. (Note: They occasionally gift 5 download credits for a signup -- chances are you can follow along without paying anything for one file.) I will be working with the ITCH data for SPY on July 9th, 2012. I have not provided the code used to produce this post, as it is embedded in a larger (proprietary) system I have built over time; however, none of the statistics discussed are difficult to run in your preferred language / environment.
A test was conducted to assert that no message refers to an order id prior to the display of the initiating AddBuyOrder or AddSellOrder. This test passed. There are no references to orders not present in the ITCH file.
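A minimal sketch of this check follows. The tuple layout `(timestamp_ms, msg_type, order_id)` and the message type names other than AddBuyOrder and AddSellOrder are assumptions made for illustration, not the TradingPhysics schema. It also assumes that non-displayed executions and bulk crosses carry no reference to a displayed order.

```python
# Hypothetical message layout: (timestamp_ms, msg_type, order_id).
ADD_TYPES = {"AddBuyOrder", "AddSellOrder"}
# Assumed: these message types carry no reference to a displayed order.
NO_REFERENCE_TYPES = {"ExecuteNonDisplayed", "BulkVolumeCross"}


def find_dangling_references(messages):
    """Return indexes of messages that refer to an order id not yet added."""
    seen_order_ids = set()
    dangling = []
    for index, (_timestamp, msg_type, order_id) in enumerate(messages):
        if msg_type in ADD_TYPES:
            seen_order_ids.add(order_id)
        elif msg_type not in NO_REFERENCE_TYPES and order_id not in seen_order_ids:
            dangling.append(index)
    return dangling


sample = [
    (1000, "AddBuyOrder", 10),
    (1500, "ExecutePart", 10),
    (1600, "DeleteFull", 11),  # order 11 was never added
    (1700, "ExecuteNonDisplayed", 0),
]
print(find_dangling_references(sample))  # [2]
```

An empty result corresponds to the test passing.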
A test was conducted to assert that the messages were in chronological order; that is, no message existed in the file that was timestamped before any message previously seen. This test passed.
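One way to sketch this check is to compare each timestamp against the latest one seen so far. Timestamps are assumed here to be integers (milliseconds since midnight); the function name is hypothetical.

```python
def find_time_regressions(timestamps):
    """Return indexes of timestamps earlier than some previously seen timestamp."""
    latest = None
    regressions = []
    for index, timestamp in enumerate(timestamps):
        if latest is not None and timestamp < latest:
            regressions.append(index)
        else:
            # timestamp >= latest here, so it becomes the new running maximum.
            latest = timestamp
    return regressions


print(find_time_regressions([1, 2, 2, 5, 3, 4, 6]))  # [4, 5]
```

Equal timestamps are allowed, since several messages can share the same millisecond.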
Ascending Order Ids
A test was conducted to assert that all AddBuyOrder and AddSellOrder messages had order ids in ascending order; that is, when an AddBuyOrder or AddSellOrder was printed, the order id was greater than any order id previously seen. This test failed.
Why? The order id is a unique reference within the context of a single day. Although I have found no text explicitly stating that order ids are incremental, the assumption seems reasonable. Looking more closely, there was only one offending order: an AddBuyOrder with a 09:30:00.135 timestamp, an order id of 527,679, and a price of $124.33. The recorded low for the day was $134.70, and simulation of the order book shows that this order would never have been hit. It was deleted at 16:00:01.815 without a fill. With this order id excluded, the minimum observed order id was 764,002, which occurred in the first message in the file.
I'm not sure why this order appeared, but I believe it can safely be ignored.
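The ascending-id test can be sketched as a running-maximum scan over the add messages. The tuple layout `(timestamp_ms, msg_type, order_id)` is an assumption for illustration. The sample reuses the two order ids quoted above; the other ids and all timestamps are made up.

```python
# Hypothetical message layout: (timestamp_ms, msg_type, order_id).
ADD_TYPES = {"AddBuyOrder", "AddSellOrder"}


def find_out_of_order_adds(messages):
    """Return (index, order_id) for adds whose id is not above every earlier add id."""
    highest = None
    offenders = []
    for index, (_timestamp, msg_type, order_id) in enumerate(messages):
        if msg_type not in ADD_TYPES:
            continue
        if highest is not None and order_id <= highest:
            offenders.append((index, order_id))
        else:
            highest = order_id
    return offenders


sample = [
    (1, "AddSellOrder", 764_002),
    (2, "AddBuyOrder", 764_010),   # made-up id
    (3, "AddBuyOrder", 527_679),   # the out-of-order id from the post
    (4, "AddSellOrder", 764_020),  # made-up id
]
print(find_out_of_order_adds(sample))  # [(2, 527679)]
```

A scan like this handles a single id that is too low. A single id that is too high would instead cause every later add to be flagged, so the output still needs a manual look.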
After observing the single out-of-order order id, I dug deeper into the distribution of order prices. The minimum quoted bid price was $0.01 and the maximum quoted ask price was $190,000.00. I'm not entirely sure why these quotes exist, but I suspect they are not errors. Instead, I suspect they represent a form of disinformation, targeting naive traders who rely on summary statistics computed over the quotes.
As a contrived example, assume a trader placed sell orders as a function of the size-weighted average bid price. The maximum size of bid orders existing at once on the book was 1,939,707 shares. If just 100 of those shares were quoting at $0.01 and the rest were quoting at $135.00, the average would drop by about $0.007 -- almost a full penny. Again, it's contrived, but almost a penny might evoke exploitable action.
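The roughly $0.007 figure can be checked with a short sketch. It assumes that 100 shares are quoted at $0.01 and the remaining shares of the 1,939,707 total are quoted at $135.00; the function name is hypothetical.

```python
def size_weighted_average(levels):
    """Size-weighted average price over (price, size) pairs."""
    total_size = sum(size for _price, size in levels)
    return sum(price * size for price, size in levels) / total_size


TOTAL_BID_SIZE = 1_939_707  # maximum simultaneous bid size quoted in the post

clean_book = [(135.00, TOTAL_BID_SIZE)]
polluted_book = [(135.00, TOTAL_BID_SIZE - 100), (0.01, 100)]

drop = size_weighted_average(clean_book) - size_weighted_average(polluted_book)
print(f"{drop:.4f}")  # 0.0070
```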
Frequency of Message Types
TradingPhysics ITCH files contain eight message types: Add buy order, Add sell order, Execute outstanding order in part, Cancel outstanding order in part, Execute outstanding order in full, Delete outstanding order in full, Bulk volume for the cross event, and Execute non-displayed order. This represents a subset of the message types present in Nasdaq-ITCH historical data. Exclusions include but are not limited to System Event, Stock Directory, Stock Trading Action, Market Participant Position, Net Order Imbalance Indicator.
On July 9th, the following message type frequencies were observed:
|Message Type||Absolute Frequency||Relative Frequency|
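A frequency table of this shape can be produced with a simple counter. The sketch below assumes the message types have already been read into a list of strings; the sample values are illustrative only and are not the July 9th figures.

```python
from collections import Counter


def message_type_frequencies(msg_types):
    """Return (type, absolute frequency, relative frequency), most common first."""
    counts = Counter(msg_types)
    total = sum(counts.values())
    return [(msg_type, n, n / total) for msg_type, n in counts.most_common()]


# Illustrative input only.
sample = ["DeleteFull"] * 5 + ["AddBuyOrder"] * 3 + ["AddSellOrder"] * 2
for msg_type, absolute, relative in message_type_frequencies(sample):
    print(f"{msg_type:<14}{absolute:>4}{relative:>8.1%}")
```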
Delete Full and Cancel Part
Investors who are unaccustomed to viewing order flow may look at the relative frequency of Delete Full messages with skepticism, or at least caution. Nearly fifty percent of all messages are deletes. While it may evoke skepticism, this is a reality in contemporary markets. Traders are continuously jockeying for position. To do so, they are constantly placing and canceling orders. (Although I plan on writing subsequent posts on why this happens, the No BS Trading eBook does an excellent job for the uninitiated.)
Of the buy orders, 215,433 (29.1%) are canceled within one second; 216,975 (27.1%) of the sell orders are canceled within one second.
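A statistic like this can be sketched by remembering each order's add time and checking the elapsed time when it is deleted. The sketch makes several assumptions: the tuple layout `(timestamp_ms, msg_type, order_id)`, that "canceled" means a full delete rather than a partial cancel, and that "within one second" is inclusive of exactly 1,000 ms.

```python
def deleted_within(messages, add_type, window_ms=1000):
    """Count orders of add_type fully deleted within window_ms of being added.

    Returns (count, fraction of all orders of that add_type).
    """
    add_times = {}
    n_adds = 0
    n_quick = 0
    for timestamp, msg_type, order_id in messages:
        if msg_type == add_type:
            add_times[order_id] = timestamp
            n_adds += 1
        elif msg_type == "DeleteFull" and order_id in add_times:
            if timestamp - add_times.pop(order_id) <= window_ms:
                n_quick += 1
    return n_quick, (n_quick / n_adds if n_adds else 0.0)


sample = [
    (0, "AddBuyOrder", 1),
    (100, "AddBuyOrder", 2),
    (200, "AddSellOrder", 3),
    (500, "DeleteFull", 1),   # 500 ms after the add: counted
    (2000, "DeleteFull", 2),  # 1,900 ms after the add: not counted
]
print(deleted_within(sample, "AddBuyOrder"))  # (1, 0.5)
```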
Add Sell Order and Add Buy Order
Buy orders and sell orders are roughly in balance, with sell orders dominating slightly.
Execute Full and Partial Execute
Again, in contradiction with an investor's instincts, only 3.31% of all posted messages are executions. Looking from a different perspective, only 6.73% of buy or sell orders result in any form of execution (7.00% with non-displayed executions). Even more impressive, only 0.61% of all AddBuyOrder and AddSellOrder size results in an execution. (The sum of all AddBuyOrder and AddSellOrder size was 1,092,134,379 shares; the number of shares traded was 6,615,069, excluding the opening and closing crosses.)
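Both perspectives can be sketched briefly. The order-level function assumes a hypothetical `(timestamp_ms, msg_type, order_id)` layout and assumed execution type names; the share-level ratio uses only the two totals quoted above.

```python
# Hypothetical message layout: (timestamp_ms, msg_type, order_id).
ADD_TYPES = {"AddBuyOrder", "AddSellOrder"}
EXECUTION_TYPES = {"ExecutePart", "ExecuteFull"}  # assumed type names


def fraction_of_orders_executed(messages):
    """Fraction of added orders that see at least one execution."""
    added = set()
    executed = set()
    for _timestamp, msg_type, order_id in messages:
        if msg_type in ADD_TYPES:
            added.add(order_id)
        elif msg_type in EXECUTION_TYPES and order_id in added:
            executed.add(order_id)
    return len(executed) / len(added) if added else 0.0


# Share-level check using the totals quoted in the post.
shares_added = 1_092_134_379
shares_traded = 6_615_069
print(f"{shares_traded / shares_added:.2%}")  # 0.61%
```

Using sets means an order with several partial executions is counted once, which matches "any form of execution".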
Also note that, looking at the total size of all executions including non-displayed and bulk crossings (6,621,094 shares), an SPY trader might notice that this is only a fraction of the daily SPY volume. SPY is listed on the NYSE Arca; the Nasdaq-ITCH order flow only represents SPY traded on Nasdaq.
Looking at frequency relative to all message types, ExecuteNonDisplayed messages are rare; however, the frequency of non-displayed executions relative to all executions is more impressive. By number of messages, roughly 3.24% of all executions are the result of non-displayed orders; however, by volume, roughly 20% of all executions are the result of non-displayed orders. (5,301,218 shares were executed on displayed orders; 1,313,851 were executed on non-displayed orders.)
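The by-volume figure follows directly from the two share counts quoted above, as this short check shows (the function name is hypothetical):

```python
def volume_share(part, whole):
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


displayed_shares = 5_301_218
non_displayed_shares = 1_313_851
total_shares = displayed_shares + non_displayed_shares

print(total_shares)  # 6615069
print(f"{volume_share(non_displayed_shares, total_shares):.1%}")  # 19.9%
```

The two counts sum to the 6,615,069 shares traded outside the opening and closing crosses.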
Non-displayed orders are interacted with indirectly -- that is, by executing against them. A message of the executed order's price and size is printed, but there is no complementary, preexisting AddBuyOrder or AddSellOrder. Additionally, the order may have reserves, obfuscating the underlying order's size.
See: Dark Pools.
Bulk Volume Cross
Two bulk volume crossings are present. One occurs at 09:00:00.135 and the other occurs at 16:00:00.540, representing opening and closing crosses, respectively.
Looking at the (CSV) file positions of the messages, the opening cross is at line 90,697 and the closing cross is at line 3,097,695; that is, the opening and closing crosses do not exist at the head and tail of the file, as might be naively expected. This expectation fails to consider that the TotalView-ITCH stream prints extended hour orders (7:00am to 8:00pm) as well as regular market hours (9:30am to 4:00pm). The first message in the file occurs at 07:00:03.069 (AddSellOrder); the last message in the file occurs at 20:00:00.159 (DeleteFull).
Notice: The only relationship I have with TradingPhysics is as a customer. I receive no benefit from them for this post.