Re: Guidance Requested - Bulk Inserting + Queries - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Guidance Requested - Bulk Inserting + Queries
Date
Msg-id CAMkU=1y0n1fShoW8CMgV+oPeMMM43W2vNirF3GrH_J2=3raLwg@mail.gmail.com
Whole thread Raw
In response to Guidance Requested - Bulk Inserting + Queries  (Benjamin Johnson <benjamin.johnson@getcarbonblack.com>)
Responses Re: Guidance Requested - Bulk Inserting + Queries
List pgsql-performance
On Wed, Nov 30, 2011 at 7:27 AM, Benjamin Johnson
<benjamin.johnson@getcarbonblack.com> wrote:
> Experts,
>
> Quick Summary: data can now be inserted very quickly via COPY + removing
> indexes, but is there a design or some tricks to still allow someone to
> query while the partition is still active and 'hot' ?
>
> - Postgres 9.1
> - Windows 7 (64-bit) , although this is just for the current test and
> could vary depending on situation
> - We have 4 main tables with daily partitions

How long are the daily partitions kept for?

> - Each table/partition has multiple indexes on it
> - Streaming logs from client machines into our server app which
> processes the logs and tries to shove all that data into these daily
> partitions as fast as it can.

Why shove it in as fast as you can?  If you want to both read and
write at the same time, then focusing first only on writing and
worrying about reading as an after thought seems like the wrong thing
to do.

> - Using COPY and removed original primary key unique constraints to try
> to get it to be as fast as possible (some duplicates are possible)
> - Will remove duplicates in a later step (disregard for this post)
>
> We now found (thanks Andres and Snow-Man in #postgresql) that in our
> tests, after the indexes get too large performance drops signficantly
> and our system limps forward due to  disk reads (presumably for the
> indexes).

How many hours worth of data can be loaded into the new partition
before the performance knee hits?

After the knee, how does the random disk read activity you see compare
to the maximum random disk reads your IO system can support?  How many
COPYs were you doing at the same time?

During this test, was there background select activity going on, or
was the system only used for COPY?

> If we remove the indexes, performance for our entire sample
> test is great and everything is written to postgresql very quickly.
> This allows us to shove lots and lots of data in (for production
> possibly 100 GB or a TB per day!)

How much do you need to shove in per day?  If you need to insert it,
and index it, and run queries, and deal with maintenance of the older
partitions, then you will need a lot of spare capacity, relative to
just inserting, to do all of those things.  Do you have windows where
there is less insert activity in which other things can get done?

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Jesper Krogh
Date:
Subject: Re: Problems with FTS
Next
From: Josh Berkus
Date:
Subject: Re: vacuum internals and performance affect