Re: Tuning massive UPDATES and GROUP BY's? - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Tuning massive UPDATES and GROUP BY's?
Date
Msg-id AANLkTi=L9vaRA+n=i75-NkBfbcBM--JdrBXjL1UexNcY@mail.gmail.com
Whole thread Raw
In response to Tuning massive UPDATES and GROUP BY's?  (fork <forkandwait@gmail.com>)
List pgsql-performance
On Thu, Mar 10, 2011 at 9:40 AM, fork <forkandwait@gmail.com> wrote:
> Given that doing a massive UPDATE SET foo = bar || ' ' || baz; on a 12 million
> row table (with about 100 columns -- the US Census PUMS for the 2005-2009 ACS)
> is never going to be that fast, what should one do to make it faster?
>
> I set work_mem to 2048MB, but it currently is only using a little bit of memory
> and CPU. (3% and 15% according to top; on a SELECT DISTINCT ... LIMIT earlier,
> it was using 70% of the memory).
>
> The data is not particularly sensitive; if something happened and it rolled
> back, that wouldnt be the end of the world.  So I don't know if I can use
> "dangerous" setting for WAL checkpoints etc.   There are also aren't a lot of
> concurrent hits on the DB, though a few.
>
> I am loathe to create a new table from a select, since the indexes themselves
> take a really long time to build.

you are aware that updating the field for the entire table, especially
if there is an index on it (or any field being updated), will cause
all your indexes to be rebuilt anyways?  when you update a record, it
gets a new position in the table, and a new index entry with that
position.  insert/select to temp, + truncate + insert/select back is
usually going to be faster and will save you the reindex/cluster.
otoh, if you have foreign keys it can be a headache.

> As the title alludes, I will also be doing GROUP BY's on the data, and would
> love to speed these up, mostly just for my own impatience...

need to see the query here to see if you can make them go faster.

merlin

pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: NULLS LAST performance
Next
From: runner
Date:
Subject: Basic performance tuning on dedicated server