Sanjay Arora <sanjay.k.arora@gmail.com> writes:
> Do you mean something like storing one month worth tick data in a blob
> type field and giving the contents to the CEP engine for further
> building of required data streams?
Either that or put the tick data in an external file and store that
file's pathname in the database row. As I said, you can find plenty
of argumentation on both sides of that in the PG archives.
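A minimal sketch of the second option (tick data in an external file, its pathname in a database row). It uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs without a server. The table name tick_files, its columns, the CSV format and the one-file-per-symbol-per-month layout are all hypothetical choices for illustration.

```python
import csv
import os
import sqlite3
import tempfile


def store_month(conn, data_dir, symbol, month, ticks):
    """Write one month of ticks to a flat file and record its path in the DB.
    File naming and CSV layout are hypothetical."""
    path = os.path.join(data_dir, f"{symbol}_{month}.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(ticks)
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT OR REPLACE INTO tick_files (symbol, month, path) VALUES (?, ?, ?)",
            (symbol, month, path),
        )
    return path


def load_month(conn, symbol, month):
    """Look up the file for (symbol, month) and read its ticks back."""
    row = conn.execute(
        "SELECT path FROM tick_files WHERE symbol = ? AND month = ?",
        (symbol, month),
    ).fetchone()
    if row is None:
        return []
    with open(row[0], newline="") as f:
        # Hypothetical tick layout: (timestamp, price, quantity)
        return [(int(ts), float(px), int(qty)) for ts, px, qty in csv.reader(f)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tick_files (symbol TEXT, month TEXT, path TEXT, "
             "PRIMARY KEY (symbol, month))")
data_dir = tempfile.mkdtemp()
ticks = [(1, 100.5, 200), (2, 100.75, 100)]
store_month(conn, data_dir, "IBM", "2009-06", ticks)
loaded = load_month(conn, "IBM", "2009-06")
```

The database row stays small and the bulk data is read sequentially from the filesystem; the price of this is that the file and the row can drift out of sync, since the DBMS cannot protect the file transactionally.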
When SSS were doing this, we had the raw tick data in one set of files
and pre-aggregated bar data in other files (I think we stored 5-min
and daily bars, but it was a long time ago). The analysis functions
would automatically build bars of whatever width they wanted from the
widest stored width that evenly divided the desired width, so as to
minimize what they had to read from disk.
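A rough sketch of that lookup-and-aggregate step, assuming OHLCV bars keyed by a start time in minutes since an arbitrary epoch and stored widths of 5 minutes and 1 day (1440 minutes). The Bar type and function names are hypothetical, not the original SSS code. The "evenly divides" condition is what makes this work: every source bar then falls entirely inside one output bar.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Bar:
    start: int  # bar start time in minutes since an arbitrary epoch (assumption)
    open: float
    high: float
    low: float
    close: float
    volume: int


def pick_source_width(desired: int, stored_widths: Sequence[int]) -> Optional[int]:
    """Return the widest stored bar width that evenly divides `desired`,
    or None, meaning the caller has to fall back to raw ticks."""
    candidates = [w for w in stored_widths if desired % w == 0]
    return max(candidates) if candidates else None


def aggregate(bars: Sequence[Bar], width: int) -> List[Bar]:
    """Merge finer bars (sorted by start time) into bars `width` minutes wide.
    Correct only when the source width divides `width` and source bars are
    aligned to the same epoch."""
    out: List[Bar] = []
    for b in bars:
        start = b.start - b.start % width  # align to an output-bar boundary
        if out and out[-1].start == start:
            prev = out[-1]
            out[-1] = Bar(start, prev.open, max(prev.high, b.high),
                          min(prev.low, b.low), b.close, prev.volume + b.volume)
        else:
            out.append(Bar(start, b.open, b.high, b.low, b.close, b.volume))
    return out


# Stored forms: 5-minute and daily (1440-minute) bars.
STORED = [5, 1440]
five_min = [
    Bar(0, 10.0, 11.0, 9.0, 10.5, 100),
    Bar(5, 10.5, 12.0, 10.0, 11.0, 50),
    Bar(10, 11.0, 11.5, 10.5, 11.2, 70),
    Bar(15, 11.2, 11.3, 10.8, 11.0, 30),
]
source = pick_source_width(15, STORED)  # daily bars don't divide 15 minutes
fifteen_min = aggregate(five_min, 15)
```

A weekly request (7 * 1440 minutes) would pick the daily bars, reading roughly 288 times fewer records than starting from 5-minute bars.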
It does sound like you are doing pretty much exactly what we were
doing. One thing to think about is that the real-time case is actually
much slower-paced, and easier to deal with, than back-testing. When you are
back-testing, you'd like to test a trading strategy over say a year's
worth of data, and you'd like that to require a bit less than a year
to run, no? So the path you have to optimize is the one feeding stored,
pre-aggregated data into your test engine. The people hollering about
real-time performance are just betraying that they've never built one of
these things. I'm not familiar with this CEP software, but it sounds to
me like you want that as part of the real-time data collection process
and nowhere near your back-testing data path.
Another little tip: if your trading strategies are like ours were,
they need to ramp up for actual trading by pre-scanning some amount of
historical data. So you're going to need a mechanism that starts by
reading the historical data at high speed and smoothly segues into
reading the real-time feed (at which point the passage of time in the model
suddenly slows to one-to-one with real time). Also consider what
happens when your machine crashes (it happens) and you need to not only
redo that startup process, but persuade the model that its actual
trading position is whatever you had open. Or you change the model a
bit and restart as above. The model might now wish it was in a different
position, but it has to cope with your actual position.
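One way to get that smooth hand-over is a single event stream that replays stored history as fast as it can be read and then switches to the live feed. This is a sketch under stated assumptions: ticks are hypothetical (timestamp, symbol, price) tuples with strictly increasing, unique timestamps, and the live feed is any iterator that blocks until the next tick arrives.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical tick layout: (timestamp, symbol, price). Timestamps are assumed
# strictly increasing and unique across both sources.
Tick = Tuple[int, str, float]


def warm_then_live(history: Iterable[Tick],
                   live: Iterable[Tick]) -> Iterator[Tuple[bool, Tick]]:
    """Yield (is_live, tick): stored history first, as fast as it can be read,
    then the real-time feed, whose iterator blocks until each tick arrives.
    Live ticks already covered by the history are skipped, which assumes the
    feed was subscribed to (and buffered) before the history read finished."""
    last_ts = None
    for tick in history:
        last_ts = tick[0]
        yield False, tick
    for tick in live:
        if last_ts is not None and tick[0] <= last_ts:
            continue  # overlap between the tail of history and the head of the feed
        yield True, tick


history = [(1, "IBM", 100.0), (2, "IBM", 100.5), (3, "IBM", 101.0)]
live = iter([(3, "IBM", 101.0), (4, "IBM", 101.25)])  # stands in for a real feed
events = list(warm_then_live(history, live))
```

If the model's clock is driven by tick timestamps rather than the wall clock, model time runs as fast as the disk allows during warm-up and then drops to one-to-one once live ticks start arriving; the is_live flag tells the model when it may place real orders.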
The crash reliability of a DBMS is a strong reason why you track your
live positions in one, btw ... your brokerage isn't going to forget you
were short IBM, even if your model crashes.
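A minimal sketch of keeping live positions in a database and reconciling after a restart. It again uses sqlite3 as a stand-in for PostgreSQL; the positions table, the function names and the "trade straight to the model's target" restart policy are hypothetical, and a real model might prefer to ease into its new position instead.

```python
import sqlite3
from typing import Dict


def init(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS positions ("
                 "symbol TEXT PRIMARY KEY, qty INTEGER NOT NULL)")


def record_fill(conn: sqlite3.Connection, symbol: str, qty_delta: int) -> None:
    """Apply a fill inside a transaction: it is committed in full or not at all.
    The ON CONFLICT upsert syntax needs SQLite 3.24 or later."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO positions (symbol, qty) VALUES (?, ?) "
            "ON CONFLICT(symbol) DO UPDATE SET qty = qty + excluded.qty",
            (symbol, qty_delta),
        )


def load_positions(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT symbol, qty FROM positions"))


def orders_after_restart(desired: Dict[str, int], actual: Dict[str, int]) -> Dict[str, int]:
    """Naive policy (an assumption): trade straight from the position actually
    held to whatever the restarted model wants. A symbol missing from either
    side counts as flat."""
    orders = {}
    for symbol in sorted(set(desired) | set(actual)):
        diff = desired.get(symbol, 0) - actual.get(symbol, 0)
        if diff:
            orders[symbol] = diff
    return orders


# ":memory:" keeps the example self-contained; a real setup needs a durable
# database (a file, or PostgreSQL) for the crash-safety argument to hold.
conn = sqlite3.connect(":memory:")
init(conn)
record_fill(conn, "IBM", -100)  # go short IBM
record_fill(conn, "IBM", -50)
actual = load_positions(conn)
orders = orders_after_restart({"IBM": -100, "MSFT": 200}, actual)
```

The model's own idea of its position is never trusted after a restart; the committed rows are, which is the point of keeping them in a crash-safe store.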
regards, tom lane