Very slow update / hash join - Mailing list pgsql-general

From Kurt Roeckx
Subject Very slow update / hash join
Date
Msg-id 20160504222226.GA24085@roeckx.be
Whole thread Raw
Responses Re: Very slow update / hash join  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
Hi,

I have an update query that's been running for 48 hours now.
Since it started it used about 2.5% CPU, and is writing to the
disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
waiting for the disks.

The query plan looks like this:
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Update on certificates c  (cost=1224052.45..60710389.31 rows=19950420 width=1371)
   ->  Hash Join  (cost=1224052.45..60710389.31 rows=19950420 width=1371)
         Hash Cond: (c.id = cu.id)
         ->  Seq Scan on certificates c  (cost=0.00..8372137.31 rows=147868231 width=1258)
         ->  Hash  (cost=623981.20..623981.20 rows=19950420 width=117)
               ->  Seq Scan on certificates_update cu  (cost=0.00..623981.20 rows=19950420 width=117)

I've set the work_mem for this query to 6GB, which seem to be
enough to make pgsql_tmp empty, when it was only set to 1 GB it
did have files in it.  The process is using about 4GB of RAM, of
which 0.5 probably comes from the shared_buffers.

It did use 100% CPU at the start, but that was for about 80
seconds.  I'm guessing that's the time it needs to read and hash
the update table.  But from that point on, it gets really slow.

As you can see, the table is quite large and I want to update
about 20M rows of the 133M rows (not sure why the plan say 147M)

The table itself is 53GB, and the table it updates from is only
3.3 GB.  There are some index on some of the fields (like the id),
but none of them are being updated.  I tried removing those that
did get updated but that had little effect.  It does have foreign
keys to other tables, and other tables references it, but none of
the keys should get updated.

Reading or writing the whole table shouldn't take that long, and I
have no idea why it's this slow.  Does anybody have an idea why
it's this slow?

From what I understand, the hash join should be the one I want to
use, I tried to force the others but that doesn't seem to improve
anything.


Kurt



pgsql-general by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Function PostgreSQL 9.2
Next
From: Vincent Veyron
Date:
Subject: Re: Thoughts on "Love Your Database"