2016-01-22, Jeroen Visser
Most people who trade (semi-)automatically via IB (Interactive Brokers) use the IB API via TWS or the IB Gateway. This API enables trading software to receive market data and send orders. In this article I suggest a few simple improvements to the IB API. I’ll start by describing some of the issues I’ve run into over the past 2 years, and along the way I’ll suggest a couple of simple but very powerful improvements.
When I started working with the IB API, I noticed the API uses doubles to model prices. Based on my past experience, I think this is a bad idea. Prices are decimals and not doubles. A decimal is specified by a mantissa M and an exponent E in the following formula: M*10^E. A double is specified by a mantissa M and an exponent E in the following formula: M*2^E. Many decimal values, such as 0.1, have no exact binary representation, so modelling prices as doubles can lead to unexpected rounding problems.
The funny thing is that the raw IB socket feed does deliver these prices as decimals and not as doubles. So the IB API converts from a decimal to a double when receiving feed, and from a double to a decimal when sending orders. These conversions seem unnecessary and can in themselves already lead to rounding problems.
Trading software often manipulates incoming prices before they are sent in an order message, e.g. by adding or subtracting a tick. These manipulations can also lead to rounding errors when using doubles.
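To make the rounding problem concrete, here is a minimal sketch. The helper names add_ticks_double and add_ticks_scaled are hypothetical and are not part of the IB API. The first keeps the price as a double; the second keeps it as an integer mantissa with an implied exponent of -2 (hundredths).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: move a double price up by n ticks.
double add_ticks_double(double price, double tick, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        price += tick;  // each addition rounds to the nearest binary double
    }
    return price;
}

// Hypothetical helper: the same operation on an integer mantissa
// (price = mantissa * 10^-2), which is exact as long as nothing overflows.
std::int64_t add_ticks_scaled(std::int64_t mantissa, std::int64_t tick_mantissa, int n) {
    assert(n >= 0);
    return mantissa + tick_mantissa * n;
}
```

With IEEE 754 doubles, add_ticks_double(0.1, 0.1, 2) does not compare equal to 0.3, while add_ticks_scaled(10, 10, 2) is exactly 30, i.e. 0.30.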
I have fixed this in C++ by directly processing the raw IB socket feed and creating a price model based on a new datatype: decimal. Note that in C# such a datatype is already available and is called a decimal as well. Another funny thing: the C# version of the IB API still uses doubles. Why is that?
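As an illustration of what such a datatype could look like, here is a minimal sketch of a decimal following the M*10^E model above. This is not the actual implementation; the names Decimal, parse_decimal, rescale and add are assumptions, and overflow handling is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical decimal price: value = mantissa * 10^exponent.
struct Decimal {
    std::int64_t mantissa;
    std::int8_t exponent;
};

// Parse ASCII text such as "123.45" or "-0.5" without going through a double.
// Assumes well-formed input: optional '-', digits, optional '.', digits.
Decimal parse_decimal(const std::string& text) {
    Decimal d{0, 0};
    bool negative = false;
    bool seen_point = false;
    for (char c : text) {
        if (c == '-') {
            negative = true;
        } else if (c == '.') {
            seen_point = true;
        } else {
            assert(c >= '0' && c <= '9');
            d.mantissa = d.mantissa * 10 + (c - '0');
            if (seen_point) --d.exponent;  // one more digit after the point
        }
    }
    if (negative) d.mantissa = -d.mantissa;
    return d;
}

// Bring a value to a smaller (more negative) exponent without changing it.
Decimal rescale(Decimal d, std::int8_t exponent) {
    assert(exponent <= d.exponent);
    while (d.exponent > exponent) {
        d.mantissa *= 10;
        --d.exponent;
    }
    return d;
}

// Exact addition: align both operands on the smaller exponent first.
Decimal add(Decimal a, Decimal b) {
    std::int8_t e = a.exponent < b.exponent ? a.exponent : b.exponent;
    a = rescale(a, e);
    b = rescale(b, e);
    return Decimal{a.mantissa + b.mantissa, e};
}
```

Adding a tick is then add(price, tick): for "0.1" plus "0.2" the result is mantissa 3 with exponent -1, with no rounding involved.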
Triggered by the decimal/double issue, I started to decode the IB messages directly from the socket, thereby creating my own version of the IB API. That turned out to be not that difficult. My previous company made client connection software for exchanges all over the world (~60 exchanges), most of them message based. Our recipe was simple but very effective: get the message spec, generate the data model and the encoding mechanisms, implement the communication protocols on top of the socket, and you are ready to go. I used the same recipe for the IB API.
The problem with the IB API is that an official message specification is not available. Luckily, the spec is embedded in the code of the IB API itself, and it is not hard to derive the specification of the messages from that code. In the end the data model for the messages is quite simple (with the exception of the order messages, but those are doable as well).
For example a TickPrice message consists of the following fields:
A Type, unsigned int specifying we are dealing with a TickPrice message.
A Version, unsigned int specifying the version of the message. This is done so that IB can change the structure of the message while still providing backward compatibility in their API at the same time.
A TickerId, unsigned int specifying to which subscription the message belongs. Just a handy tip: use the contractId.
A TickType, unsigned int specifying a LAST_PRICE, BID_PRICE, ASK_PRICE, etc.
A Price, decimal specifying the price.
A Size, unsigned int specifying the size.
A CanAutoExecute, bool specifying… I don’t know, I suspect it specifies whether the price is actually tradeable or so.
The encoding mechanism of the IB API is also very simple: the fields are encoded in ASCII and each field is separated by a special delimiter character ‘\0’, i.e. a character containing the binary value 0.
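The delimiter-based encoding can be decoded with very little code. The sketch below assumes the seven TickPrice fields in the order listed above. The names split_fields, TickPriceText and decode_tick_price are hypothetical, and the price is deliberately kept as text rather than converted to a double.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a raw buffer into fields; every field is terminated by a '\0' byte.
std::vector<std::string> split_fields(const std::string& raw) {
    std::vector<std::string> fields;
    std::string current;
    for (char c : raw) {
        if (c == '\0') {
            fields.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    return fields;  // bytes after the last delimiter (an incomplete field) are ignored
}

// Hypothetical in-memory form of a TickPrice message.
struct TickPriceText {
    unsigned type;
    unsigned version;
    unsigned tickerId;
    unsigned tickType;
    std::string price;  // decimal text, not converted to a double
    unsigned size;
    bool canAutoExecute;
};

// Decode one complete TickPrice message; assumes exactly seven fields.
TickPriceText decode_tick_price(const std::string& raw) {
    const std::vector<std::string> f = split_fields(raw);
    assert(f.size() == 7);
    TickPriceText m;
    m.type = static_cast<unsigned>(std::stoul(f[0]));
    m.version = static_cast<unsigned>(std::stoul(f[1]));
    m.tickerId = static_cast<unsigned>(std::stoul(f[2]));
    m.tickType = static_cast<unsigned>(std::stoul(f[3]));
    m.price = f[4];
    m.size = static_cast<unsigned>(std::stoul(f[5]));
    m.canAutoExecute = (f[6] == "1");
    return m;
}
```

Note that every numeric field costs a text-to-integer conversion, which is part of the decoding overhead of an ASCII format.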
The average size of a TickPrice message is ~35 bytes. This may differ per exchange, e.g. FOREX messages are much bigger because of the bytes needed to encode big-digit prices and sizes.
To summarize, the IB API is a simple TCP/IP message based API using a dynamic size ASCII encoding. This implies the following:
The total size of a message is dynamic in nature, e.g. the number ‘10’ needs 2 bytes and the number ‘100’ needs 3 bytes. When designing an efficient encoding mechanism it is a big advantage to know the size of the message up front.
Because the fields are dynamic in size, they need to be separated by a special delimiter character: ‘\0’. This means that even a 1-digit number requires 2 bytes. The delimiter character does not add any real information to the message. It is just a consequence of using a dynamic encoding.
Big numbers, i.e. with a lot of digits, require a lot of bytes.
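The three points above boil down to simple arithmetic: every field costs one byte per character plus one byte for the delimiter. A small sketch, where ascii_message_size is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Size in bytes of a message in the delimiter-based ASCII encoding:
// one byte per character plus one '\0' delimiter per field.
std::size_t ascii_message_size(const std::vector<std::string>& fields) {
    std::size_t total = 0;
    for (const std::string& field : fields) {
        total += field.size() + 1;
    }
    return total;
}
```

For example, ascii_message_size({"7"}) is 2 and ascii_message_size({"100"}) is 4. The size depends on the values themselves, so it is only known after the whole message has been built or scanned.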
For 2 years I have been recording and storing raw feed from the IB API for the purpose of backtesting automated trading algorithms. A size of ~35 bytes per message does not sound like a lot, but I receive millions of these messages per day, so in the end I need quite some storage to keep them all. I could zip them, but that would also require a CPU-expensive unzip, slowing down my backtests.
Basically, I need smaller messages which need less storage space and which can be decoded faster.
A suggested improvement is to change the dynamic-size ASCII encoding into a fixed-size binary encoding. Most fields can be encoded as binary numbers of fixed size. E.g. the TickPrice message:
A Type, unsigned int of 1 byte.
A Version, unsigned int of 1 byte.
A TickerId, unsigned int of 4 bytes.
A TickType, unsigned int of 2 bytes.
A Price, decimal of 5 bytes, 4 bytes for the mantissa and 1 byte for the exponent.
A Size, unsigned int of 4 bytes.
A CanAutoExecute, bool of 1 byte.
This means that a TickPrice message can be encoded in only 18 bytes! That is almost half the size of the current TickPrice message of ~35 bytes.
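The 18-byte layout above could be written out as follows. This is a sketch under assumptions, not my production code: the struct and function names are hypothetical, and the little-endian byte order is a choice made here for illustration, since the layout above does not prescribe one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory form of a TickPrice message.
struct TickPrice {
    std::uint8_t type;
    std::uint8_t version;
    std::uint32_t tickerId;
    std::uint16_t tickType;
    std::int32_t priceMantissa;  // price = mantissa * 10^exponent
    std::int8_t priceExponent;
    std::uint32_t size;
    bool canAutoExecute;
};

constexpr std::size_t kTickPriceSize = 18;  // 1 + 1 + 4 + 2 + (4 + 1) + 4 + 1
using TickPriceBytes = std::array<std::uint8_t, kTickPriceSize>;

// Write the n low-order bytes of value in little-endian order (assumed here).
void put_le(std::uint8_t* out, std::uint32_t value, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Read n bytes in little-endian order.
std::uint32_t get_le(const std::uint8_t* in, std::size_t n) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    }
    return value;
}

TickPriceBytes encode(const TickPrice& m) {
    TickPriceBytes b{};
    std::size_t pos = 0;
    b[pos++] = m.type;
    b[pos++] = m.version;
    put_le(&b[pos], m.tickerId, 4); pos += 4;
    put_le(&b[pos], m.tickType, 2); pos += 2;
    // Signed values are stored as their two's complement bit pattern.
    put_le(&b[pos], static_cast<std::uint32_t>(m.priceMantissa), 4); pos += 4;
    b[pos++] = static_cast<std::uint8_t>(m.priceExponent);
    put_le(&b[pos], m.size, 4); pos += 4;
    b[pos++] = m.canAutoExecute ? 1 : 0;
    assert(pos == kTickPriceSize);
    return b;
}

TickPrice decode(const TickPriceBytes& b) {
    TickPrice m{};
    m.type = b[0];
    m.version = b[1];
    m.tickerId = get_le(&b[2], 4);
    m.tickType = static_cast<std::uint16_t>(get_le(&b[6], 2));
    m.priceMantissa = static_cast<std::int32_t>(get_le(&b[8], 4));
    m.priceExponent = static_cast<std::int8_t>(b[12]);
    m.size = get_le(&b[13], 4);
    m.canAutoExecute = b[17] != 0;
    return m;
}
```

Because every field sits at a fixed offset, decoding needs no scanning for delimiters and no text-to-number conversion, and the message size is known before a single byte is read.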
I have applied this binary encoding scheme to all my recorded messages, resulting in an enormous size reduction. Because of the size reduction and the elimination of the CPU-expensive and error-prone decimal-double conversions, the backtests execute super fast.
Reducing the message size should also lead to faster order execution and faster feed delivery, since smaller messages take less time to transmit over the network. I do not know what kind of encoding is used between TWS and the IB backend, but if IB uses the same ASCII format there, the advantage of using a binary format may be even bigger.
To summarize, I only suggest 2 simple but very powerful improvements for the IB API:
Model prices as decimals. This not only avoids unexpected rounding problems, it also avoids the unnecessary CPU-expensive decimal-to-double conversions (feed) and double-to-decimal conversions (orders).
Use a simple fixed-size binary encoding for the number and price fields. It reduces the size of the messages by roughly a factor of 2, improving the overall network latency and throughput. Besides that, a binary format is more computer friendly: it requires fewer CPU cycles for message encoding and decoding.