Re: UPDATEDs slowing SELECTs in a fully cached database - Mailing list pgsql-performance

From lars
Subject Re: UPDATEDs slowing SELECTs in a fully cached database
Date
Msg-id 4E1B810D.4070404@yahoo.com
Whole thread Raw
In response to Re: UPDATEDs slowing SELECTs in a fully cached database  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: UPDATEDs slowing SELECTs in a fully cached database  (lars <lhofhansl@yahoo.com>)
List pgsql-performance
On 07/11/2011 02:43 PM, Merlin Moncure wrote:
> On Mon, Jul 11, 2011 at 2:16 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov>  wrote:
>> lars<lhofhansl@yahoo.com>  wrote:
>>
>>> Stopping the UPDATEs, waiting for any CHECKPOINTs to finish,
>>> and then running the SELECTs indeed shows a similar slowdown.
>>>
>>> Interestingly I see very heavy WAL traffic while executing the
>>> SELECTs. (So I was confused as to what caused the WAL traffic).
>> Hint bit changes aren't logged, so if it was that you would be
>> seeing writes to the heap, but not to the WAL.  Clean-up of dead
>> tuples is logged -- this is probably the result of pruning dead
>> tuples.  You could probably reduce the impact on your SELECT
>> statements at least a little by making autovacuum more aggressive.
> yeah.  In fact, I'd like to disable autovacuum completely just to
> confirm this.  In particular I'd like to know if that removes wal
> traffic when only selects are going on.  Another way to check is to
> throw some queries to pg_stat_activity during your select period and
> see if any non-select activity (like autovacum vacuum).  Basically I'm
> suspicious there is more to this story.
>
> hint bit flusing causing i/o during SELECT is a typical complaint
> (especially on systems with underperformant i/o), but I'm suspicious
> if that's really the problem here. Since you are on a virtualized
> platform, I can't help but wonder if you are running into some
> bottleneck that you wouldn't see on native hardware.
>
> What's iowait during the slow period?
>
> merlin
Thanks Kevin and Merlin this is extremely helpful...

Ok, that makes much more sense (WALing hint bits did not make sense).

I disabled auto-vacuum and did four tests:
1. Run a bunch of updates, stop that process, wait until checkpointing
is finished, and run the selects (as before).
2. run VACUUM manually, then run the SELECTs
3. Have the UPDATEs and SELECTs touch a mutually exclusive, random sets
of row (still in sets of 10000).
So the SELECTs are guaranteed not to select rows that were updated.
4. Lastly, change the UPDATEs to update a non-indexed column. To rule
out Index maintenance. Still distinct set of rows.

In the first case I see the same slowdown (from ~14ms to ~400-500ms).
pg_stat_activity shows no other load during that
time. I also see write activity only on the WAL volume.

In the 2nd case after VACUUM is finished the time is back to 14ms. As an
aside: If I run the SELECTs while VACUUM is
running the slowdown is about the same as in the first case until
(apparently) VACUUM has cleaned up most of the table,
at which point the SELECTs become faster again (~50ms).

In the 3rd case I see exactly the same behavior, which is interesting.
Both before VACUUM is run and after.
There's no guarantee obviously that distinct rows do not share the same
page of course especially since the index is
updated as part of this (but see the 4th case).

In case 4 I still see the same issue. Again both before and after VACUUM.

In all cases I see from pg_stat_bgwriter that no backend writes buffers
directly (but I think that only pertains to dirty buffers, and not the WAL).

So I think I have a partial answer to my initial question.

However, that brings me to some other questions:
Why do SELECTs cause dead tuples to be pruned (as Kevin suggests)?
That even happens when the updates did not touch the selected rows(?)
And why does that slow down the SELECTs? (checkpointing activity on the
EBS volume holding the database for example
does not slow down SELECTs at all, only WAL activity does). Does the
selecting backend do that work itself?

Lastly, is this documented somewhere? (I apologize if it is and I missed
it). If not I'd be happy to write a wiki entry for this.


pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: UPDATEDs slowing SELECTs in a fully cached database
Next
From: lars
Date:
Subject: Re: UPDATEDs slowing SELECTs in a fully cached database