Re: update 600000 rows - Mailing list pgsql-performance

From H. Hall
Subject Re: update 600000 rows
Date
Msg-id 4766B7AC.8080303@reedyriver.com
In response to update 600000 rows  (okparanoid@free.fr)
List pgsql-performance
Note: I am resending this because the first copy never appeared after 40 hours.
HH

okparanoid@free.fr wrote:
> Hello
>
> I have a python script that updates 600000 rows in one table of my
> postgres database from a csv file, and the transaction takes 5 hours...
>
> I'm on Debian Etch with a postgres 8.1 server on a 64-bit quad
> dual-core Opteron machine.
>
> I have deactivated all indexes except the primary key, which is not
> updated since it is also the reference column of the update.
>
> When I run this script, no other user is on the server.
>
> First, when I run htop I see that memory usage never exceeds 150 MB.
> In that case I don't understand why setting the kernel's shmall and
> shmmax parameters to 16 GB (the server has 32 GB) speeds up the
> transaction a lot compared to shmall and shmmax of (only) 2 GB?!
>
> The script runs as a single transaction and pauses from time to time
> to give postgres time to write data to disk.
>
> Would performance be better if the data were written only at the end
> of the transaction? In production I want data written to disk
> regularly to prevent data loss, but is there any benefit to writing
> temporary data to disk in the middle of a transaction?
>
> I'm a complete noob at postgres and database configuration, and any
> help is welcome.
>
> Thanks

You will get a huge improvement in time if you use batch updates instead
of updating a row at a time. See:

    http://www.postgresql.org/docs/8.2/interactive/populate.html
        and
    http://www.postgresql.org/docs/8.2/interactive/sql-begin.html
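As a sketch of what batching can look like from Python: instead of issuing 600000 single-row UPDATEs, group the CSV rows and update many rows per statement. The table and column names here (`mytable`, `id`, `val`) are placeholders, not from the original post, and `UPDATE ... FROM (VALUES ...)` needs 8.2 or later; on 8.1 you can COPY the csv into a temporary table and update from that instead.

```python
# Sketch: build one multi-row UPDATE per batch instead of one UPDATE per row.
# 'mytable', 'id', and 'val' are hypothetical names for illustration.

def batch_update_sql(rows, batch_size=1000):
    """Yield one multi-row UPDATE statement per batch of (id, value) pairs.

    Each statement updates a whole batch in a single round trip:
        UPDATE mytable SET val = v.val
        FROM (VALUES ...) AS v(id, val)
        WHERE mytable.id = v.id;
    """
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # NOTE: real code should use the driver's parameter binding
        # (e.g. psycopg2) instead of string interpolation; the manual
        # quote-doubling here is only to keep the sketch self-contained.
        values = ", ".join("(%d, '%s')" % (rid, val.replace("'", "''"))
                           for rid, val in batch)
        yield ("UPDATE mytable SET val = v.val "
               "FROM (VALUES %s) AS v(id, val) "
               "WHERE mytable.id = v.id;" % values)
```

Running all the generated statements inside one explicit BEGIN/COMMIT (see the sql-begin link above) avoids paying a commit per batch.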

You will also get a big improvement if you can turn fsync off during the
update. See:
    http://www.postgresql.org/docs/8.2/interactive/runtime-config-wal.html
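Keep in mind that with fsync off, a crash during the load can leave the cluster unrecoverable, so turn it back on as soon as the update finishes. The change itself is a one-line edit (setting name from the runtime-config-wal page above):

```
# postgresql.conf -- for the duration of the bulk update only
fsync = off     # UNSAFE for normal operation: a crash can corrupt data
```

Reload the server (`pg_ctl reload`) for it to take effect, run the update, then set it back to `on` and reload again.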

You also need to vacuum the table after doing that many updates: since pg
implements each update as a delete plus an insert, the table will be left
with a lot of dead rows ("holes").
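For example (the table name is a placeholder; plain VACUUM reclaims the dead row versions, and ANALYZE refreshes the planner statistics after such a large change):

```
-- run once, after the bulk update has committed
VACUUM ANALYZE mytable;
```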

Cheers
HH




