Note: I am resending this because the first copy never appeared after 40 hours.
HH
okparanoid@free.fr wrote:
> Hello
>
> I have a Python script that updates 600,000 rows in one table of my Postgres
> database from a CSV file, and it takes 5 hours to do the transaction...
>
> I'm on Debian Etch with a PostgreSQL 8.1 server on a 64-bit quad bi-Opteron box.
>
> I have deactivated all indexes except the primary key, which is not updated
> since it is also the reference column for the update.
>
> When I run this script, the server is not being used by any other user.
>
> First, when I run htop I see that the memory used is never more than 150 MB.
> In that case, I don't understand why setting the shmall and shmmax kernel
> parameters to 16 GB of memory (the server has 32 GB) speeds up the
> transaction a lot compared to setting them to (only) 2 GB?!
>
> The script runs in a single transaction and pauses from time to time to give
> Postgres time to write the data to disk.
>
> Would performance be better if the data were only written at the end of the
> transaction? In production I want data written to disk regularly to prevent
> data loss, but is there any point in writing temporary data to disk in the
> middle of a transaction???
>
> I'm a complete noob at Postgres and database configuration, so any help is
> welcome.
>
> Thanks
You will get a huge improvement in run time if you use batch updates instead
of updating one row at a time. See:
http://www.postgresql.org/docs/8.2/interactive/populate.html
and
http://www.postgresql.org/docs/8.2/interactive/sql-begin.html
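For example, something along these lines (an untested sketch, assuming
psycopg2 as the driver; "mydb", "mytable", "id", "val" and "data.csv" are
made-up names, so adapt it to your schema): stage the CSV in a temp table
with COPY, then apply the whole update as one statement in one transaction.

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=me")   # made-up connection string
    cur = conn.cursor()

    # Stage the CSV rows in a temporary table with COPY (fast bulk load).
    cur.execute("CREATE TEMP TABLE staging (id integer, val text)")
    f = open("data.csv")
    cur.copy_expert("COPY staging (id, val) FROM STDIN WITH CSV", f)
    f.close()

    # Apply all 600,000 changes as one UPDATE instead of one per row.
    cur.execute("""
        UPDATE mytable
           SET val = staging.val
          FROM staging
         WHERE mytable.id = staging.id
    """)

    conn.commit()   # one transaction for the whole batch
    cur.close()
    conn.close()

That also answers your question about writing in the middle of the
transaction: there is no benefit to pausing so Postgres can flush; let the
commit at the end do the work.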
You will also get a big improvement if you can turn fsync off during the
update. See:
http://www.postgresql.org/docs/8.2/interactive/runtime-config-wal.html
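That is one line in postgresql.conf, followed by a server reload (or
restart); just be sure to turn it back on afterwards, because running with
fsync off risks losing data if the machine crashes during the load:

    # postgresql.conf (revert to 'on' once the bulk update is done)
    fsync = off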
You also need to vacuum the table after doing that many updates; since pg
does a delete and insert for each update, there will be a lot of dead rows
(holes).
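From the same script that could look roughly like this (again a sketch with
psycopg2 and the made-up table name "mytable"; VACUUM cannot run inside a
transaction block, so the connection has to be in autocommit mode):

    import psycopg2
    from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

    conn = psycopg2.connect("dbname=mydb user=me")
    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)  # VACUUM must run outside a transaction
    cur = conn.cursor()
    cur.execute("VACUUM ANALYZE mytable")  # reclaim dead rows and refresh planner statistics
    cur.close()
    conn.close()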
Cheers
HH