Re: tuning Postgres for large data import (using Copy from) - Mailing list pgsql-performance

From: John A Meinel
Subject: Re: tuning Postgres for large data import (using Copy from)
Msg-id: 42836DEB.30600@arbash-meinel.com
In response to: tuning Postgres for large data import (using Copy from) ("Marc Mamin" <m.mamin@gmx.net>)
List: pgsql-performance
Marc Mamin wrote:
> Hello,
>

I'm not an expert, but I'll give some suggestions.

>
> I'd like to tune Postgres for large data import (using Copy from).
>

I believe that COPY FROM <file> is supposed to be faster than COPY FROM
STDIN, but <file> must be readable by the backend process, i.e. it has
to live on the server. If you can arrange that, it is worth doing, since
it eliminates the communication between the client and the backend.
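
For example (the table name and file path are just placeholders; the
file has to be readable by the user the backend runs as):

COPY my_import_table FROM '/var/lib/pgsql/import/data.txt';

versus psql's \copy, which reads the file on the client side and streams
it over the connection:

\copy my_import_table from '/path/on/client/data.txt'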

>
> here are a few steps already done:
>
>
>
> 1) use 3 different disks for:
>
>     -1: source data
>     -2: index tablespaces
>     -3: data tablespaces
>

Make sure pg_xlog is on its own filesystem. It contains the
write-ahead log, and putting it by itself keeps the number of seeks
down. If you are constrained for disks, I think separating pg_xlog is
more important than moving the index tablespaces.

>
> 2) define all foreign keys as initially deferred
>
>
> 3) tune some parameters:
>
>
>
>     max_connections = 20
>     shared_buffers = 30000
>     work_mem = 8192
>     maintenance_work_mem = 32768
>     checkpoint_segments = 12
>
>     (I also modified the kernel accordingly)
>

Don't forget to increase your free space map if you are going to be
doing deletes frequently.
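
Something like this in postgresql.conf (the numbers are only a guess;
size max_fsm_pages to roughly the number of pages you expect to free
between vacuums):

    max_fsm_pages = 500000
    max_fsm_relations = 1000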

>
>
>
> 4) run VACUUM regularly
>
>
> The server runs RedHat and has 1GB RAM
>
> In the production (which may run on a better server), I plan to:
>
> - import a few million rows per day,
> - keep up to ca. 100 million rows in the db
> - delete older data
>
>
>
>
> I've seen a few postings on hash/btree indexes, which say that hash indexes do
> not work very well in Postgres;
> currently, I only use btree indexes. Could I gain performance while using
> hash indexes as well ?
>
I doubt it.

> How does Postgres handle concurrent copy from on: same table / different
> tables ?
>

I think it is better with different tables. If you load the same table and
there are indexes, each writer has to grab a lock to update the index, which
causes contention between the two processes writing to the same table.

>
> I'd be glad on any further suggestion on how to further increase my
> performances.
>

Since you are deleting data often, and copying often, I might recommend
using a partition scheme with a view to bind everything together. That
way you can just drop the old table rather than doing a delete. I don't
know how this would affect foreign key references.

But basically you can create a new table, and do a copy without having
any indexes, then build the indexes, analyze, update the view.

And when deleting you can update the view, and drop the old table.

Something like this:

CREATE TABLE table_2005_05_11 (LIKE table_2005_05_10);
COPY table_2005_05_11 FROM ... ;
CREATE INDEX blah ON table_2005_05_11(blah);
VACUUM ANALYZE table_2005_05_11;
CREATE OR REPLACE VIEW "table" AS
    SELECT * FROM table_2005_05_10
    UNION ALL SELECT * FROM table_2005_05_11;
...
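
And the delete side is just redefining the view to leave out the old day,
then dropping that table (same placeholder names as above):

CREATE OR REPLACE VIEW "table" AS
    SELECT * FROM table_2005_05_11;
DROP TABLE table_2005_05_10;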

John
=:->

>
>
>
> Marc
>
>
>
>

