Marc Mamin wrote:
> Hello,
>
I'm not an expert, but I'll give some suggestions.
>
> I'd like to tune Postgres for large data import (using Copy from).
>
I believe that COPY FROM <file> is supposed to be faster than COPY FROM
STDIN, but <file> must be available to the backend process. If you can
do it, you should think about it, as it eliminates the communication
between the client and the backend.
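As a sketch (table and file names are made up for illustration):

```sql
-- Server-side COPY: the file path is resolved on the server, and the
-- file must be readable by the backend (postgres) process.
COPY mytable FROM '/var/load/mytable.dat';
```

The psql client-side equivalent, `\copy mytable from '/var/load/mytable.dat'`, reads the file on the client and streams it over the connection, which is the extra hop the server-side form avoids.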
>
> here are a few steps already done:
>
>
>
> 1) use 3 different disks for:
>
> -1: source data
> -2: index tablespaces
> -3: data tablespaces
>
Make sure pg_xlog is on its own filesystem. It contains the
write-ahead-log, and putting it by itself keeps the number of seeks
down. If you are constrained, I think pg_xlog is more important than
moving the index tablespaces.
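The usual way to do this is to move the pg_xlog directory and leave a symlink behind. A sketch, with temp directories standing in for $PGDATA and the dedicated WAL disk so the commands can be shown end to end (on a real server, stop the postmaster first):

```shell
DATA=$(mktemp -d)                 # stands in for $PGDATA
WALDISK=$(mktemp -d)              # stands in for the dedicated WAL filesystem
mkdir "$DATA/pg_xlog"             # normally created by initdb

# With the server stopped: move the WAL directory, leave a symlink behind.
mv "$DATA/pg_xlog" "$WALDISK/pg_xlog"
ln -s "$WALDISK/pg_xlog" "$DATA/pg_xlog"
```

After this, the backend follows the symlink and all WAL writes land on the dedicated disk.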
>
> 2) define all foreign keys as initially deferred
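For anyone following along, that looks like this (table and constraint names here are hypothetical; the point is the DEFERRABLE INITIALLY DEFERRED clause, which postpones the FK checks until commit instead of firing per row during the load):

```sql
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (id)
    DEFERRABLE INITIALLY DEFERRED;
```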
>
>
> 3) tune some parameters:
>
>
>
> max_connections =20
> shared_buffers =30000
> work_mem = 8192
> maintenance_work_mem = 32768
> checkpoint_segments = 12
>
> (I also modified the kernel accordingly)
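The kernel change that matters most for shared_buffers is SHMMAX. A back-of-the-envelope check (assuming the default 8 kB block size):

```python
# shared_buffers is counted in 8 kB pages, so 30000 buffers need about
# 234 MB of shared memory; the kernel's SHMMAX must allow at least that
# (plus a little overhead for other shared structures).
BLOCK_SIZE = 8 * 1024          # bytes per shared buffer page
shared_buffers = 30000         # value from the settings above

bytes_needed = shared_buffers * BLOCK_SIZE
print(bytes_needed, "bytes =", bytes_needed // (1024 * 1024), "MB")
```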
>
Don't forget to increase your free space map if you are going to be
doing deletes frequently.
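That means raising the FSM settings in postgresql.conf; the numbers below are illustrative only and should be sized to the number of pages freed between vacuums:

```
max_fsm_pages = 500000        # must cover pages with free space between vacuums
max_fsm_relations = 1000
```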
>
>
>
> 4) run VACUUM regularly
>
>
> The server runs RedHat and has 1GB RAM
>
> In the production (which may run on a better server), I plan to:
>
> - import a few millions rows per day,
> - keep up to ca 100 millions rows in the db
> - delete older data
>
>
>
>
> I've seen a few postings on hash/btree indexes, which say that hash indexes do
> not work very well in Postgres;
> currently, I only use btree indexes. Could I gain performance by using
> hash indexes as well ?
>
I doubt it.
> How does Postgres handle concurrent copy from on: same table / different
> tables ?
>
I think it is better with different tables. If you use the same table, and
there are indexes on it, each writer has to grab a lock to update the index,
which causes contention between two processes writing to the same table.
>
> I'd be glad of any further suggestions on how to further increase my
> performance.
>
Since you are deleting data often, and copying often, I might recommend
using a partition scheme with a view to bind everything together. That
way you can just drop the old table rather than doing a delete. I don't
know how this would affect foreign key references.
But basically you can create a new table, and do a copy without having
any indexes, then build the indexes, analyze, update the view.
And when deleting you can update the view, and drop the old table.
Something like this:
CREATE TABLE table_2005_05_11 (blah);
COPY table_2005_05_11 FROM ... ;
CREATE INDEX blah ON table_2005_05_11 (blah);
VACUUM ANALYZE table_2005_05_11;
CREATE OR REPLACE VIEW table AS
    SELECT * FROM table_2005_05_10
    UNION ALL SELECT * FROM table_2005_05_11;
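And the deletion side of the same scheme, using the same placeholder names (point the view away from the oldest day, then drop it outright, which is far cheaper than DELETE followed by VACUUM):

```sql
CREATE OR REPLACE VIEW table AS
    SELECT * FROM table_2005_05_11;
DROP TABLE table_2005_05_10;
```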
...
John
=:->
> Marc