Re: Running update in chunks? - Mailing list pgsql-general

From Richard Huxton
Subject Re: Running update in chunks?
Date
Msg-id 50FD22A7.4070203@archonet.com
Whole thread Raw
In response to Re: Running update in chunks?  (Tim Uckun <timuckun@gmail.com>)
List pgsql-general
On 21/01/13 10:30, Tim Uckun wrote:
>> Can you try a couple of things just to check timings. Probably worth EXPLAIN
>> ANALYSE.
>>
>> SELECT count(*) FROM cars.imports i JOIN cars.models md ON i.model_id =
>> md.id;
>
> Takes about 300 ms
>
>> CREATE TEMP TABLE tt AS SELECT i.* FROM cars.imports i JOIN cars.models md
>> ON i.model_id = md.id;
> Takes about 300 ms
OK - so writing all the data takes very under one second but updating
the same amount takes 50 seconds.

The only differences I can think of are WAL logging (transaction log)
and index updates (the temp table has no indexes).

1. Try CREATE TABLE rather than CREATE TEMP TABLE - if that's still
quick then it's not the time taken to write WAL.
2. Run the update query against your new tt table and see how long that
takes.
3. Add indexes and repeat (in particular I'd be suspicious of the gin
index on "data")

My guess is that it's the time taken to update the "data" index - gin
indexes can be slow to rebuild (although 50 seconds seems *very* slow).
If so there are a few options:
1. Split the table and put whatever this "data" is into an import_data
table - assuming it doesn't change often.
2. Try a fill-factor of 50% or less - keeping the updates on the same
data page as the original might help
3. Drop the gin index before doing your bulk update and rebuild it at
the end. This is a common approach with bulk-loading / updates.

Oh - I'm assuming you're only updating those rows whose id has changed -
that seemed to be the suggestion in your first message. If not, simply
adding "AND make_id <> md.make_id" should help. Also (and you may well
have considered this) - for a normalised setup you'd just have the
model-id in "imports" and look up the make-id through the "models" table.

--
   Richard Huxton
   Archonet Ltd


pgsql-general by date:

Previous
From: Tim Uckun
Date:
Subject: Re: Running update in chunks?
Next
From: Albe Laurenz
Date:
Subject: Re: pg_Restore