Re: Tuning massive UPDATES and GROUP BY's? - Mailing list pgsql-performance

From Marti Raudsepp
Subject Re: Tuning massive UPDATES and GROUP BY's?
Msg-id AANLkTiks9aTwVyN-JW1Ep2cVoJabBF_fZfkk_43GRcNv@mail.gmail.com
In response to Tuning massive UPDATES and GROUP BY's?  (fork <forkandwait@gmail.com>)
List pgsql-performance
On Thu, Mar 10, 2011 at 17:40, fork <forkandwait@gmail.com> wrote:
> The data is not particularly sensitive; if something happened and it rolled
> back, that wouldn't be the end of the world.  So I don't know if I can use
> "dangerous" settings for WAL checkpoints etc.   There also aren't a lot of
> concurrent hits on the DB, though a few.

If you don't mind long recovery times in case of a crash, set
checkpoint_segments to ~100 and checkpoint_completion_target=0.9; this
will improve write throughput significantly.

Also, if you don't mind risking a CORRUPT database after a crash,
setting fsync=off and full_page_writes=off gives another significant
boost.
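
For reference, the settings from the two paragraphs above would look
roughly like this in postgresql.conf. This is only a sketch using the
9.0-era parameter names; checkpoint_segments was removed in PostgreSQL
9.5 in favour of max_wal_size, so newer versions need a different
spelling. The values are the ones suggested above, not tuned figures.

```conf
# postgresql.conf (PostgreSQL 9.0-era parameter names)

# Fewer, more spread-out checkpoints: better write throughput,
# at the cost of longer crash recovery.
checkpoint_segments = 100
checkpoint_completion_target = 0.9

# DANGEROUS: a crash can leave the database corrupt.
# Only consider these for data you can reload from scratch.
fsync = off
full_page_writes = off
```

All four can be applied with a configuration reload; none of them
requires a server restart.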

> I am loath to create a new table from a select, since the indexes themselves
> take a really long time to build.

An UPDATE on a table with many indexes will probably be even slower than
rebuilding the table and its indexes. If you
want to speed up this part, use INSERT INTO x SELECT and take this
chance to partition your table, such that each individual partition
and most indexes will fit in your cache. Index builds from a warm
cache are very fast in PostgreSQL. You can create several indexes at
once in separate sessions, and the table will only be scanned once.
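
A minimal sketch of the INSERT INTO ... SELECT approach, using the
inheritance-based partitioning available in 9.0. All table and column
names (obs, obs_new, region_id, val) and the "val * 2" change are
made up for illustration; substitute your own schema and whatever
your UPDATE would have computed.

```sql
-- Hypothetical source table standing in for the real one.
CREATE TABLE obs (id bigint, region_id int, val numeric);

-- New parent table with the same columns, plus one child per partition.
CREATE TABLE obs_new (LIKE obs INCLUDING DEFAULTS);
CREATE TABLE obs_new_r1 (CHECK (region_id = 1)) INHERITS (obs_new);
CREATE TABLE obs_new_r2 (CHECK (region_id = 2)) INHERITS (obs_new);

-- Apply what would have been the UPDATE while copying the rows.
INSERT INTO obs_new_r1 (id, region_id, val)
SELECT id, region_id, val * 2 FROM obs WHERE region_id = 1;

INSERT INTO obs_new_r2 (id, region_id, val)
SELECT id, region_id, val * 2 FROM obs WHERE region_id = 2;
```

Create the indexes on each child only after it is loaded, so the bulk
insert does not have to maintain them row by row.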

Don't forget to bump up maintenance_work_mem for index builds; 256MB
might be a reasonable starting value.
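
Something along these lines, run in two separate sessions at roughly
the same time, illustrates the index builds described above. The table
obs_new_r1 and its columns id and val are hypothetical placeholders
and are assumed to exist already.

```sql
-- Session 1 (assumes a loaded table obs_new_r1 with columns id, val)
SET maintenance_work_mem = '256MB';
CREATE INDEX obs_new_r1_id_idx ON obs_new_r1 (id);

-- Session 2, started while session 1 is still building
SET maintenance_work_mem = '256MB';
CREATE INDEX obs_new_r1_val_idx ON obs_new_r1 (val);
```

SET only affects the current session, so each session doing an index
build needs its own setting. The memory is allocated per build, so
keep the total within what the machine can spare.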

The downside is that partitioning can interfere with your read queries
if they expect the data in a sorted order. But then, HashAggregate
tends to be faster than GroupAggregate in many cases, so this might
not matter for your queries. Alternatively you can experiment with
PostgreSQL 9.1 alpha, which has mostly fixed this shortcoming with the
"merge append" plan node.

> As the title alludes, I will also be doing GROUP BY's on the data, and would
> love to speed these up, mostly just for my own impatience...

I think regular tuning is the best you can do here.
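
As a starting point for that tuning, EXPLAIN shows which aggregate
strategy the planner picks for a given GROUP BY. The table and column
names below are hypothetical.

```sql
-- Hypothetical table and column names; substitute your own query.
EXPLAIN
SELECT region_id, sum(val)
FROM obs_new
GROUP BY region_id;
```

A HashAggregate node near the top of the plan means no sorted input is
needed. A GroupAggregate node will sit above a Sort or an ordered index
scan.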

Regards,
Marti
