Re: UPDATE on two large datasets is very slow - Mailing list pgsql-general

From Steve Gerhardt
Subject Re: UPDATE on two large datasets is very slow
Msg-id 4612DDDE.3060704@ocean.fraknet.org
In response to Re: UPDATE on two large datasets is very slow  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: UPDATE on two large datasets is very slow  (Listmail <lists@peufeu.com>)
List pgsql-general
Tom Lane wrote:

> You're focusing on the wrong thing --- there's nothing wrong with the plan.
> It's only taking 9 seconds to perform the merge join.  The other 183
> seconds are going somewhere else; you need to find out where.
>
> One thing that came to mind was triggers, which would be shown in the
> EXPLAIN results if you are using a sufficiently recent version of PG
> (but you didn't say what you're using) ... however if this is a straight
> port of MySQL code it's pretty unlikely to have either custom triggers
> or foreign keys, so that is most likely the wrong guess.  It may just be
> that it takes that long to update 26917 rows, which would suggest a
> configuration problem to me.

Any suggestions for finding out where all the time is being spent? I'm
running 8.2.0, by the way; a bit boneheaded of me not to mention that in
the original message, but I'm planning on upgrading to 8.2.3 soon. I
don't have any triggers or other procedures set up that would interfere
with this, which is why I'm really confused about the enormous runtime.
I agree that the merge join only takes 9 seconds, but it looks to me
like the other 183 seconds are spent sequentially scanning both tables
and then sorting the results, which I imagine would be necessary for the
merge join to take place. Sadly, I'm not familiar enough with the
internals to know whether that is the case.
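
The best idea I've come up with so far is to time the read side and the
write side separately, along these rough lines (the table and column
names below are stand-ins for the real schema, and I realize that
EXPLAIN ANALYZE on the UPDATE actually performs the update):

    -- how long does just producing the joined rows take?
    EXPLAIN ANALYZE
    SELECT count(*)
      FROM peers p
      JOIN stats_in s ON s.peer_id = p.peer_id;

    -- versus the full UPDATE: reads plus row/index updates and WAL
    EXPLAIN ANALYZE
    UPDATE peers p
       SET uploaded = s.uploaded
      FROM stats_in s
     WHERE p.peer_id = s.peer_id;

If the SELECT alone is fast, that would at least confirm the time is
going into the updates themselves rather than the scans and sorts.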

> shared_buffers, wal_buffers, and checkpoint_segments seem like things
> you might need to increase.

I'll try modifying those and report back with what kind of performance
increases I can get.
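
In case it helps to see numbers, what I'm planning to test is roughly
the following in postgresql.conf (the values are guesses for this box,
not recommendations, and I know shared_buffers needs a restart):

    shared_buffers = 256MB        # up from the default
    wal_buffers = 1MB             # cheap to raise for write-heavy batches
    checkpoint_segments = 32      # spread checkpoint I/O over big updates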

> Another problem with this approach is that it's not going to take long
> before the table is bloated beyond belief, if it's not vacuumed
> regularly.  Do you have autovacuum turned on?
>
> Does the tracker tend to send a lot of null updates (no real change to
> the rows)?  If so it'd be worth complicating the query to check for
> no-change and avoid the update for unchanged rows.

The tracker is set up to run a VACUUM ANALYZE after each commit; I
neglected to mention that. From the testing I've done, performance is
more or less the same whether or not the table has been vacuumed
recently. Also, the tracker specifically ignores null updates where no
data has changed, to cut down on the size of the data being sent.
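
Just to make sure I understand the suggestion, though: the query-side
version of that check would look something like this, right? (The names
here are made up; the real statement sets more columns.)

    UPDATE peers p
       SET uploaded   = s.uploaded,
           downloaded = s.downloaded
      FROM stats_in s
     WHERE p.peer_id = s.peer_id
       AND (p.uploaded   IS DISTINCT FROM s.uploaded
         OR p.downloaded IS DISTINCT FROM s.downloaded);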

Also, if you don't mind answering: I've been pretty puzzled as to why
the two stored procedures are substantially slower than the original
method, since conceptually they seem like they should be simpler. Am I
missing something fundamental about the way Postgres works?

>             regards, tom lane

Thanks for the help, Tom.

Steve Gerhardt
