Re: Optimizing bulk update performance - Mailing list pgsql-general

From: Tom Lane
Subject: Re: Optimizing bulk update performance
Date:
Msg-id: 28751.1367037697@sss.pgh.pa.us
In response to: Optimizing bulk update performance (Yang Zhang <yanghatespam@gmail.com>)
Responses: Re: Optimizing bulk update performance
List: pgsql-general
Yang Zhang <yanghatespam@gmail.com> writes:
> It currently takes up to 24h for us to run a large set of UPDATE
> statements on a database, which are of the form:

>     UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
>     id = constid

> (We're just overwriting fields of objects identified by ID.)

Forgive the obvious question, but you do have an index on "id", right?
Have you checked that it's being used (i.e., run EXPLAIN ANALYZE on one of these)?
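For example (untested; "tbl" and the literal values are made up here, since
I don't know your actual schema):

    EXPLAIN ANALYZE
    UPDATE tbl SET field1 = 'constant1', field2 = 'constant2'
    WHERE id = 12345;

If that shows a sequential scan on "tbl" rather than an index scan on "id",
that alone would account for a large chunk of the runtime.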

> The tables have handfuls of indices each and no foreign key constraints.

How much is a "handful"?

> It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
> baseline we should reasonably target.

Well, maybe.  You didn't say what percentage of the DB you're updating.

But the thing that comes to mind here is that you're probably incurring
a network round trip for each row, and maybe a query-planning round as
well, so you really can't expect that this is going to be anywhere near
as efficient as a bulk load operation.  You could presumably get rid of
the planner overhead by using a prepared statement.  Cutting the network
overhead is going to require a bit more ingenuity --- could you move
some logic into a stored procedure, perhaps, so that one command from
the client is sufficient to update multiple rows?
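Something along these lines, say (untested sketch; "tbl", the parameter
types, and the values are invented stand-ins for your schema):

    -- prepared statement: plan once, execute many times
    PREPARE upd(int, text, text) AS
        UPDATE tbl SET field1 = $2, field2 = $3 WHERE id = $1;
    EXECUTE upd(42, 'foo', 'bar');

    -- or batch many rows into one command with a VALUES list
    UPDATE tbl AS t
    SET field1 = v.field1, field2 = v.field2
    FROM (VALUES (42, 'foo', 'bar'),
                 (43, 'baz', 'quux')) AS v(id, field1, field2)
    WHERE t.id = v.id;

The second form lets the client ship a few hundred rows per statement, so
you pay the network round trip once per batch instead of once per row.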

            regards, tom lane

