Re: Getting even more insert performance (250m+rows/day) - Mailing list pgsql-performance

From: Dave Dutcher
Subject: Re: Getting even more insert performance (250m+rows/day)
Date:
Msg-id: 01f701c67f6d$0578bf40$8300a8c0@tridecap.com
In response to: Getting even more insert performance (250m+rows/day) ("Daniel J. Luke" <dluke@geeklair.net>)
Responses: Re: Getting even more insert performance (250m+rows/day)
List: pgsql-performance
If you can live with the possibility of database corruption, you could try
turning fsync off.  For example, if you could just re-insert the data in the
unlikely event that a hardware failure corrupts the database, you might get a
decent improvement.
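
Roughly, that's a one-line change in postgresql.conf plus a reload (just a
sketch -- the data directory path below is a placeholder for yours):

    # postgresql.conf
    fsync = off        # WARNING: a crash or power loss can corrupt the cluster

    $ pg_ctl reload -D /path/to/data   # pick up the change without a restart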

Also, have you tried creating the index after you have inserted all your
data?  (Or does COPY already disable the indexes while inserting?)
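
Since each day's data goes into its own table anyway, the load could look
roughly like this (table and column names here are made up for the example):

    CREATE TABLE log_20060524 (ts integer, val text);       -- new day's table, no indexes yet
    COPY log_20060524 FROM '/path/to/days_data';             -- bulk load into the bare table
    CREATE INDEX log_20060524_ts_idx ON log_20060524 (ts);   -- build the index once, after the load

Building the index once over the finished table is usually a lot cheaper than
maintaining it row by row during the load.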



> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
> Daniel J. Luke
> Sent: Wednesday, May 24, 2006 2:45 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Getting even more insert performance
> (250m+rows/day)
>
>
> I have a system that currently inserts ~ 250 million rows per day (I
> have about 10k more raw data than that, but I'm at the limit of my
> ability to get useful insert performance out of postgres).
>
> Things I've already done that have made a big difference:
> - modified postgresql.conf shared_buffers value
> - converted to COPY from individual insert statements
> - changed BLCKSZ to 32768
>
> I currently get ~35k/sec inserts on a table with one index (~70k/sec
> inserts if I don't have any indexes).
>
> The indexed field is basically a time_t (seconds since the epoch),
> autovacuum is running (or postgres would stop choosing to use the
> index). The other fields have relatively lower cardinality.
>
> Each day's worth of data gets inserted into its own table so that I
> can expire the data without too much effort (since drop table is
> much faster than running a delete and then vacuum).
>
> I would really like to be able to have 1 (or 2) more indexes on the
> table since it takes a while for a sequential scan of 250 million
> rows to complete, but CPU time goes way up.
>
> In fact, it looks like I'm not currently I/O-bound, but CPU-bound. I
> think some sort of lazy-index generation (especially if it could be
> parallelized to use the other processors/cores that currently sit
> mostly idle) would be a solution. Is anyone working on something
> like this? Any other ideas? Where should I look if I want to start
> to think about creating a new index that would work this way (or am
> I just crazy)?
>
> Thanks for any insight!
>
> --
> Daniel J. Luke
> +========================================================+
> | *---------------- dluke@geeklair.net ----------------* |
> | *-------------- http://www.geeklair.net -------------* |
> +========================================================+
> |   Opinions expressed are mine and do not necessarily   |
> |          reflect the opinions of my employer.          |
> +========================================================+
>
>
>

