I have a system that currently inserts ~250 million rows per day (I
have about 10k more raw data than that, but I'm at the limit of my
ability to get useful insert performance out of Postgres).
Things I've already done that have made a big difference (rough
sketches below):
- modified the shared_buffers setting in postgresql.conf
- switched from individual INSERT statements to COPY
- changed BLCKSZ to 32768 (a compile-time setting)
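For reference, those changes look roughly like this; the table name is
made up and the shared_buffers value is just an example, not my exact
setting:

    # postgresql.conf (example value only)
    shared_buffers = 50000

    -- the loader streams each batch over stdin instead of issuing
    -- one INSERT per row
    COPY log_20070101 FROM STDIN;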
I currently get ~35k inserts/sec on a table with one index (~70k
inserts/sec if I don't have any indexes).
The indexed field is basically a time_t (seconds since the epoch), and
autovacuum is running (otherwise Postgres would stop choosing to use
the index). The other fields have relatively low cardinality.
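For concreteness, each day's table looks roughly like this (the names
and the non-indexed columns are made up for illustration, not my real
schema):

    CREATE TABLE log_20070101 (
        ts      integer NOT NULL,  -- time_t-style seconds since the epoch
        source  integer,           -- lower-cardinality fields
        code    integer,
        value   bigint
    );
    CREATE INDEX log_20070101_ts_idx ON log_20070101 (ts);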
Each day's worth of data gets inserted into its own table so that I
can expire old data without too much effort (since DROP TABLE is much
faster than running a DELETE and then a VACUUM).
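Rotation is then roughly just (again with made-up table names):

    -- set up the next day's table ahead of time...
    CREATE TABLE log_20070102 (LIKE log_20070101);
    CREATE INDEX log_20070102_ts_idx ON log_20070102 (ts);
    -- ...and expire the oldest day with a cheap DROP
    DROP TABLE log_20061202;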
I would really like to have 1 (or 2) more indexes on the table, since
a sequential scan of 250 million rows takes a while to complete, but
maintaining the extra indexes makes CPU time during the load go way up.
In fact, it looks like I'm not currently I/O-bound but CPU-bound. I
think some sort of lazy index generation (especially if it could be
parallelized to use the other processors/cores that currently sit
mostly idle) would be a solution. Is anyone working on something like
this? Any other ideas? Where should I look if I want to start thinking
about creating a new kind of index that would work this way (or am I
just crazy)?
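The closest thing I can approximate by hand today is deferring the
extra indexes until after a day's load has finished, along the lines
of (index and column names made up):

    -- load the day with only the ts index in place, then build the
    -- extra indexes once the COPYs for that day are done
    CREATE INDEX log_20070101_source_idx ON log_20070101 (source);
    CREATE INDEX log_20070101_code_idx   ON log_20070101 (code);

That doesn't help queries against the current day while it is still
loading, though, which is why something lazier (and parallelized) in
the backend itself would be much nicer.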
Thanks for any insight!
--
Daniel J. Luke
+========================================================+
| *---------------- dluke@geeklair.net ----------------* |
| *-------------- http://www.geeklair.net -------------* |
+========================================================+
| Opinions expressed are mine and do not necessarily |
| reflect the opinions of my employer. |
+========================================================+