Thread: Very slow update / hash join

Very slow update / hash join

From: Kurt Roeckx
Hi,

I have an update query that's been running for 48 hours now.
Since it started it used about 2.5% CPU, and is writing to the
disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
waiting for the disks.

The query plan looks like this:
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Update on certificates c  (cost=1224052.45..60710389.31 rows=19950420 width=1371)
   ->  Hash Join  (cost=1224052.45..60710389.31 rows=19950420 width=1371)
         Hash Cond: (c.id = cu.id)
         ->  Seq Scan on certificates c  (cost=0.00..8372137.31 rows=147868231 width=1258)
         ->  Hash  (cost=623981.20..623981.20 rows=19950420 width=117)
               ->  Seq Scan on certificates_update cu  (cost=0.00..623981.20 rows=19950420 width=117)

I've set the work_mem for this query to 6 GB, which seems to be
enough to keep pgsql_tmp empty; when it was only set to 1 GB it
did have files in it.  The process is using about 4 GB of RAM, of
which 0.5 GB probably comes from the shared_buffers.
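
For reference, I'm only raising it for this session before running
the update, along the lines of:

  SET work_mem = '6GB';   -- per session; the hash table has to fit in
                          -- this to keep the join out of pgsql_tmp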

It did use 100% CPU at the start, but that was for about 80
seconds.  I'm guessing that's the time it needs to read and hash
the update table.  But from that point on, it gets really slow.

As you can see, the table is quite large, and I want to update
about 20M of its 133M rows (I'm not sure why the plan says 147M).

The table itself is 53 GB, and the table it updates from is only
3.3 GB.  There are indexes on some of the fields (like the id),
but none of them are being updated.  I tried removing the ones
that did get updated, but that had little effect.  It does have
foreign keys to other tables, and other tables reference it, but
none of the keys should get updated.
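
In case the exact numbers matter, those sizes are roughly what
queries like these report:

  SELECT pg_size_pretty(pg_table_size('certificates'));     -- the table, without indexes
  SELECT pg_size_pretty(pg_indexes_size('certificates'));   -- all of its indexes together
  SELECT pg_size_pretty(pg_total_relation_size('certificates_update'));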

Reading or writing the whole table shouldn't take that long, and I
have no idea why it's this slow.  Does anybody have an idea what's
going on?

From what I understand, the hash join should be the plan I want;
I tried forcing the other join methods, but that didn't seem to
improve anything.
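
By "forcing" I just mean disabling the hash join for the session
and seeing what the planner picks instead, along the lines of:

  SET enable_hashjoin = off;   -- planner then falls back to merge join / nested loop
  -- and likewise enable_mergejoin / enable_nestloop to steer it further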


Kurt



Re: Very slow update / hash join

From: Jeff Janes
On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx <kurt@roeckx.be> wrote:
> Hi,
>
> I have an update query that's been running for 48 hours now.
> Since it started it used about 2.5% CPU, and is writing to the
> disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
> waiting for the disks.

The easiest way to figure out what is going on is to identify the
process, and then trace it with something like:

strace -T -ttt -y -p <PID of process>

That should make it obvious which file it is waiting for IO on.  Then
you can look up that relfilenode in pg_class to see what table/index
it is.
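
For example, if the strace output shows it waiting on a file like
.../base/<dboid>/98765 (made-up number), something like this maps
the filenode back to a relation:

  SELECT relname, relkind
    FROM pg_class
   WHERE relfilenode = 98765;

  -- or, on 9.4 and later:
  SELECT pg_filenode_relation(0, 98765);   -- 0 = default tablespace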


> The table itself is 53 GB, and the table it updates from is only
> 3.3 GB.  There are indexes on some of the fields (like the id),
> but none of them are being updated.  I tried removing the ones
> that did get updated, but that had little effect.  It does have
> foreign keys to other tables, and other tables reference it, but
> none of the keys should get updated.

What version of PostgreSQL are you using?  Have you tried dropping the
foreign keys?

Cheers,

Jeff


Re: Very slow update / hash join

From: Kurt Roeckx
On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
> On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx <kurt@roeckx.be> wrote:
> > Hi,
> >
> > I have an update query that's been running for 48 hours now.
> > Since it started it used about 2.5% CPU, and is writing to the
> > disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
> > waiting for the disks.
>
> The easiest way to figure out what is going on is to identify the
> process, and then trace it with something like:
>
> strace -T -ttt -y -p <PID of process>
>
> That should make it obvious which file it is waiting for IO on.  Then
> you can look up that relfilenode in pg_class to see what table/index
> it is.

Thanks for the hint; I didn't think of that.

So it's busy reading all the index files including the primary
key, and only writing to the table I'm updating.

> What version of PostgreSQL are you using?  Have you tried dropping the
> foreign keys?

I'm using 9.5.2.

So I think the foreign keys are unrelated now.  They all obviously
point to the primary key, which isn't changing, and it's reading
all the indexes on the table itself, not those on the other tables.

It's kind of annoying that I would need to drop the indexes that
aren't modified just to run an update query.


Kurt



Re: Very slow update / hash join

From: Kurt Roeckx
On Fri, May 06, 2016 at 11:38:27AM +0200, Kurt Roeckx wrote:
> On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
> > On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx <kurt@roeckx.be> wrote:
> > > Hi,
> > >
> > > I have an update query that's been running for 48 hours now.
> > > Since it started it used about 2.5% CPU, and is writing to the
> > > disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
> > > waiting for the disks.
> >
> > The easiest way to figure out what is going on is to identify the
> > process, and then trace it with something like:
> >
> > strace -T -ttt -y -p <PID of process>
> >
> > That should make it obvious which file it is waiting for IO on.  Then
> > you can look up that relfilenode in pg_class to see what table/index
> > it is.
>
> Thanks for the hint; I didn't think of that.
>
> So it's busy reading all the index files including the primary
> key, and only writing to the table I'm updating.
>
> > What version of PostgreSQL are you using?  Have you tried dropping the
> > foreign keys?
>
> I'm using 9.5.2.
>
> So I think the foreign keys are unrelated now.  They all obviously
> point to the primary key, which isn't changing, and it's reading
> all the indexes on the table itself, not those on the other tables.
>
> It's kind of annoying that I would need to drop the indexes that
> aren't modified just to run an update query.

I dropped all the indexes except for the primary key.  It was still
as slow when it started, but then I forced the primary key into
the filesystem cache, and it seems to be much happier now, reading
at about 10 MB/s and writing at about 30 MB/s on average.


Kurt



Re: Very slow update / hash join

From: Steven Lembark
> > It's kind of annoying that I would need to drop the indexes that
> > aren't modified just to run an update query.
>
> I dropped all the indexes except for the primary key.  It was still
> as slow when it started, but then I forced the primary key into
> the filesystem cache, and it seems to be much happier now, reading
> at about 10 MB/s and writing at about 30 MB/s on average.

Look at the PG tuning docs.
It seems as if you have too little index cache assigned for the
size of your database (and its indexes).  Dropping indexes isn't
the real answer; tuning the database to accommodate them is a
better bet.

It would also be worth checking whether the I/O was entirely due
to sequential reads or whether there may have been swapping;
procinfo, vmstat, or just top can tell you about that.

--
Steven Lembark                                             3646 Flora Pl
Workhorse Computing                                   St Louis, MO 63110
lembark@wrkhors.com                                      +1 888 359 3508


Re: Very slow update / hash join

From: Kurt Roeckx
On Fri, May 06, 2016 at 09:13:09AM -0500, Steven Lembark wrote:
>
> > > It's kind of annoying that I would need to drop the indexes that
> > > aren't modified just to run an update query.
> >
> > I dropped all the indexes except for the primary key.  It was still
> > as slow when it started, but then I forced the primary key into
> > the filesystem cache, and it seems to be much happier now, reading
> > at about 10 MB/s and writing at about 30 MB/s on average.

It finished in 2 hours.

> Look at the PG tuning docs.
> It seems as if you have too little index cache assigned for the
> size of your database (and its indexes).  Dropping indexes isn't
> the real answer; tuning the database to accommodate them is a
> better bet.

As far as I know, the only way to improve this is to make the
shared_buffers larger.  But the database is several times the size
of the total RAM.  Even the table itself is, and so are all its
indexes combined.

And I do agree that dropping the indexes isn't a good idea, but it
seems to be the only way to get it done in a reasonable time.

> It would also be worth checking whether the I/O was entirely due
> to sequential reads or whether there may have been swapping;
> procinfo, vmstat, or just top can tell you about that.

There was no swapping.

The point is that it wasn't doing sequential reads but was
randomly accessing indexes for what looks like no good reason to
me.


Kurt



Re: Very slow update / hash join

From: Jeff Janes
On Fri, May 6, 2016 at 3:21 AM, Kurt Roeckx <kurt@roeckx.be> wrote:
> On Fri, May 06, 2016 at 11:38:27AM +0200, Kurt Roeckx wrote:
>> On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
>> > On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx <kurt@roeckx.be> wrote:
>> > > Hi,
>> > >
>> > > I have an update query that's been running for 48 hours now.
>> > > Since it started it used about 2.5% CPU, and is writing to the
>> > > disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
>> > > waiting for the disks.
>> >
>> > The easiest way to figure out what is going on is to identify the
>> > process, and then trace it with something like:
>> >
>> > strace -T -ttt -y -p <PID of process>
>> >
>> > That should make it obvious which file it is waiting for IO on.  Then
>> > you can look up that relfilenode in pg_class to see what table/index
>> > it is.
>>
>> Thanks for the hint; I didn't think of that.
>>
>> So it's busy reading all the index files including the primary
>> key, and only writing to the table I'm updating.

It is probably dirtying the indexes as well, but the writes are being
buffered in shared_buffers.  The dirty blocks will eventually get
written by the background writer or the checkpoint writer (which you
won't see if you aren't tracing them).
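
If you're curious where those writes eventually happen,
pg_stat_bgwriter gives a rough cumulative breakdown:

  SELECT buffers_checkpoint,   -- written by the checkpointer
         buffers_clean,        -- written by the background writer
         buffers_backend       -- written directly by backends
    FROM pg_stat_bgwriter;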


>> > What version of PostgreSQL are you using?  Have you tried dropping the
>> > foreign keys?
>>
>> I'm using 9.5.2.
>>
>> So I think the foreign keys are unrelated now.  They all obviously
>> point to the primary key, which isn't changing, and it's reading
>> all the indexes on the table itself, not those on the other tables.
>>
>> It's kind of annoying that I would need to drop the indexes that
>> aren't modified just to run an update query.
>
> I dropped all the indexes except for the primary key.  It was still
> as slow when it started, but then I forced the primary key into
> the filesystem cache, and it seems to be much happier now, reading
> at about 10 MB/s and writing at about 30 MB/s on average.


OK, so it sounds like what is happening is that your update cannot do
a "Heap-Only Tuple" (HOT) update, because there is not enough room in
each data page for the new copy of rows being updated.  So it is
forced to put the new copy on a different page, which means it has to
update all the indexes so they know where to find the new version.
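
You can see how much of the update is getting done as HOT with
something like:

  -- if n_tup_hot_upd stays near zero while n_tup_upd climbs, every
  -- updated row goes to a new page and all the indexes get touched
  SELECT n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
   WHERE relname = 'certificates';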

If this is not a one-time event, then one thing you could do is
lower the table's fillfactor, so that the table is more loosely
packed and future updates are more likely to be able to do HOT
updates.  If the rows being updated are randomly scattered, it
wouldn't take much lowering to make this happen (maybe 90).  But
if the rows being updated in a single transaction are co-located
with each other, then you might have to lower it to below 50
before it would solve the problem, which might be a solution worse
than the problem.  When you change the parameter, it won't take
full effect until the table has been completely rewritten, either
through natural churn or by running a VACUUM FULL or CLUSTER.
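
Roughly, for the numbers above:

  ALTER TABLE certificates SET (fillfactor = 90);   -- or lower, as discussed
  VACUUM FULL certificates;   -- optional immediate rewrite; takes a strong
                              -- lock and needs the disk space for a full copy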

If lowering the fillfactor isn't a good solution, then pre-warming
your indexes (using pg_prewarm or just doing it yourself from the
filesystem) might be your best bet.  If not all of the relevant
indexes can fit in cache at the same time, then you might still have
to drop some of them, or just be patient.
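
For example (I'm guessing at the name of your primary key index here):

  CREATE EXTENSION IF NOT EXISTS pg_prewarm;
  SELECT pg_prewarm('certificates_pkey');           -- into shared_buffers
  SELECT pg_prewarm('certificates_pkey', 'read');   -- or just into the OS cache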

If this giant update does not have to occur atomically in order for
your application to behave correctly, then I would probably break it
up into a series of smaller transactions.  Then you could even run
them in parallel, which would be a big help if you have a RAID (which
can efficiently work on multiple random IO read requests in parallel)
but not help so much if you have a single disk.
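
For example, something along these lines, advancing the id range
each time and committing each batch separately (the SET list is a
placeholder for whatever your real query assigns):

  UPDATE certificates c
     SET some_column = cu.some_column       -- placeholder assignment
    FROM certificates_update cu
   WHERE c.id = cu.id
     AND cu.id >= 0 AND cu.id < 1000000;    -- next batch: 1000000 to 2000000, etc.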

Cheers,

Jeff


Re: Very slow update / hash join

From: Kurt Roeckx
On Fri, May 06, 2016 at 10:25:34AM -0700, Jeff Janes wrote:
>
> OK, so it sounds like what is happening is that your update cannot do
> a "Heap-Only Tuple" (HOT) update, because there is not enough room in
> each data page for the new copy of rows being updated.  So it is
> forced to put the new copy on a different page, which means it has to
> update all the indexes so they know where to find the new version.

That makes total sense now.

> If this is not a one-time event, then one thing you could do is
> lower the table's fillfactor, so that the table is more loosely
> packed and future updates are more likely to be able to do HOT
> updates.  If the rows being updated are randomly scattered, it
> wouldn't take much lowering to make this happen (maybe 90).  But
> if the rows being updated in a single transaction are co-located
> with each other, then you might have to lower it to below 50
> before it would solve the problem, which might be a solution worse
> than the problem.  When you change the parameter, it won't take
> full effect until the table has been completely rewritten, either
> through natural churn or by running a VACUUM FULL or CLUSTER.

I will probably want to run this a few times.  The data being
updated comes from an external tool, and once I add new things or
fix bugs in it I would like to update the old rows.  It's normally
an insert/select-only table.

But there are only about 20M of the 133M current rows (about 15%)
that I'm really interested in.  So I guess something like an 85%
fillfactor might actually help.

> If this giant update does not have to occur atomically in order for
> your application to behave correctly, then I would probably break it
> up into a series of smaller transactions.  Then you could even run
> them in parallel, which would be a big help if you have a RAID (which
> can efficiently work on multiple random IO read requests in parallel)
> but not help so much if you have a single disk.

I don't care whether it's atomic or not.  I actually tried to do
it in smaller batches before, and I ended up calculating that it
would take 2 weeks to do the update.



Kurt