Re: UPDATE on two large datasets is very slow - Mailing list pgsql-general

From Steve Gerhardt
Subject Re: UPDATE on two large datasets is very slow
Msg-id 4612DDDE.3060704@ocean.fraknet.org
In response to Re: UPDATE on two large datasets is very slow  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: UPDATE on two large datasets is very slow  (Listmail <lists@peufeu.com>)
List pgsql-general
Tom Lane wrote:

> You're focusing on the wrong thing --- there's nothing wrong with the plan.
> It's only taking 9 seconds to perform the merge join.  The other 183
> seconds are going somewhere else; you need to find out where.
>
> One thing that came to mind was triggers, which would be shown in the
> EXPLAIN results if you are using a sufficiently recent version of PG
> (but you didn't say what you're using) ... however if this is a straight
> port of MySQL code it's pretty unlikely to have either custom triggers
> or foreign keys, so that is most likely the wrong guess.  It may just be
> that it takes that long to update 26917 rows, which would suggest a
> configuration problem to me.

Any suggestions for finding out where all the time is being spent? I'm
running 8.2.0, by the way; a bit boneheaded of me not to mention that in
the original message, but I'm planning on upgrading to 8.2.3 soon. I
don't have any triggers or other procedures set up that would interfere
with this, which is why I'm really confused about the enormous runtime.
I agree that the merge join only takes 9 seconds, but it looks to me
like the other 183 seconds are spent sequentially scanning both tables
and then sorting the results, which I imagine would be necessary for the
merge join to take place. Sadly, I'm not familiar enough with the
internals to know whether that is the case.
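
The best idea I've come up with so far is to time the read side and the
write side separately, along these rough lines (the table and column
names below are stand-ins for the real schema, and I realize that
EXPLAIN ANALYZE on the UPDATE actually performs the update):

    -- how long does just producing the joined rows take?
    EXPLAIN ANALYZE
    SELECT count(*)
      FROM peers p
      JOIN stats_in s ON s.peer_id = p.peer_id;

    -- versus the full UPDATE: reads plus row/index updates and WAL
    EXPLAIN ANALYZE
    UPDATE peers p
       SET uploaded = s.uploaded
      FROM stats_in s
     WHERE p.peer_id = s.peer_id;

If the SELECT alone is fast, that would at least confirm the time is
going into the updates themselves rather than the scans and sorts.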

> shared_buffers, wal_buffers, and checkpoint_segments seem like things
> you might need to increase.

I'll try modifying those and report back with what kind of performance
increases I can get.
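
In case it helps to see numbers, what I'm planning to test is roughly
the following in postgresql.conf (the values are guesses for this box,
not recommendations, and I know shared_buffers needs a restart):

    shared_buffers = 256MB        # up from the default
    wal_buffers = 1MB             # cheap to raise for write-heavy batches
    checkpoint_segments = 32      # spread checkpoint I/O over big updates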

> Another problem with this approach is that it's not going to take long
> before the table is bloated beyond belief, if it's not vacuumed
> regularly.  Do you have autovacuum turned on?
>
> Does the tracker tend to send a lot of null updates (no real change to
> the rows)?  If so it'd be worth complicating the query to check for
> no-change and avoid the update for unchanged rows.

The tracker is set up to run a VACUUM ANALYZE after each commit; I
neglected to mention that. From the testing I've done, performance is
more or less the same whether or not the table has been vacuumed
recently. Also, the tracker specifically ignores null updates where no
data has changed, to cut down on the size of the data being sent.
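
Just to make sure I understand the suggestion, though: the query-side
version of that check would look something like this, right? (The names
here are made up; the real statement sets more columns.)

    UPDATE peers p
       SET uploaded   = s.uploaded,
           downloaded = s.downloaded
      FROM stats_in s
     WHERE p.peer_id = s.peer_id
       AND (p.uploaded   IS DISTINCT FROM s.uploaded
         OR p.downloaded IS DISTINCT FROM s.downloaded);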

Also, if you don't mind answering: I've been pretty puzzled as to why
the two stored procedures are substantially slower than the original
method, since conceptually they seem like they should be simpler. Am I
missing something fundamental about the way Postgres works?

>             regards, tom lane

Thanks for the help, Tom.

Steve Gerhardt
