Guidance Requested - Bulk Inserting + Queries - Mailing list pgsql-performance

From Benjamin Johnson
Subject Guidance Requested - Bulk Inserting + Queries
Date
Msg-id 4ED64B67.70401@getcarbonblack.com
Whole thread Raw
Responses Re: Guidance Requested - Bulk Inserting + Queries
List pgsql-performance
Experts,

Quick Summary: data can now be inserted very quickly via COPY + removing
indexes, but is there a design or some tricks to still allow someone to
query while the partition is still active and 'hot' ?

- Postgres 9.1
- Windows 7 (64-bit) , although this is just for the current test and
could vary depending on situation
- We have 4 main tables with daily partitions
- Each table/partition has multiple indexes on it
- Streaming logs from client machines into our server app which
processes the logs and tries to shove all that data into these daily
partitions as fast as it can.
- Using COPY and removed original primary key unique constraints to try
to get it to be as fast as possible (some duplicates are possible)
- Will remove duplicates in a later step (disregard for this post)

We now found (thanks Andres and Snow-Man in #postgresql) that in our
tests, after the indexes get too large performance drops signficantly
and our system limps forward due to  disk reads (presumably for the
indexes).  If we remove the indexes, performance for our entire sample
test is great and everything is written to postgresql very quickly.
This allows us to shove lots and lots of data in (for production
possibly 100 GB or a TB per day!)

My question is, what possible routes can I take where we can have both
fast inserts (with indexes removed until the end of the day), but still
allow a user to query against today's data? Is this even possible?  One
idea would be possibly have hourly tables for today and as soon as we
can try to re-add indexes.  Another possible solution might be to stream
the data to another "reader" postgres instance that has indexes,
although I'm not very versed in replication.


Any ideas would be greatly appreciated.

Thanks!

Ben


--
Benjamin Johnson
http://getcarbonblack.com/ | @getcarbonblack
cell: 312.933.3612 | @chicagoben


pgsql-performance by date:

Previous
From: MirrorX
Date:
Subject: vacuum internals and performance affect
Next
From: Leonardo Francalanci
Date:
Subject: Re: Guidance Requested - Bulk Inserting + Queries