Re: UPDATEDs slowing SELECTs in a fully cached database - Mailing list pgsql-performance

From lars
Subject Re: UPDATEDs slowing SELECTs in a fully cached database
Date
Msg-id 4E1B90AF.1020408@yahoo.com
Whole thread Raw
In response to Re: UPDATEDs slowing SELECTs in a fully cached database  (lars <lhofhansl@yahoo.com>)
Responses Re: UPDATEDs slowing SELECTs in a fully cached database
List pgsql-performance
On 07/11/2011 04:02 PM, lars wrote:
> On 07/11/2011 02:43 PM, Merlin Moncure wrote:
>> On Mon, Jul 11, 2011 at 2:16 PM, Kevin Grittner
>> <Kevin.Grittner@wicourts.gov>  wrote:
>>> lars<lhofhansl@yahoo.com>  wrote:
>>>
>>>> Stopping the UPDATEs, waiting for any CHECKPOINTs to finish,
>>>> and then running the SELECTs indeed shows a similar slowdown.
>>>>
>>>> Interestingly I see very heavy WAL traffic while executing the
>>>> SELECTs. (So I was confused as to what caused the WAL traffic).
>>> Hint bit changes aren't logged, so if it was that you would be
>>> seeing writes to the heap, but not to the WAL.  Clean-up of dead
>>> tuples is logged -- this is probably the result of pruning dead
>>> tuples.  You could probably reduce the impact on your SELECT
>>> statements at least a little by making autovacuum more aggressive.
>> yeah.  In fact, I'd like to disable autovacuum completely just to
>> confirm this.  In particular I'd like to know if that removes wal
>> traffic when only selects are going on.  Another way to check is to
>> throw some queries to pg_stat_activity during your select period and
>> see if any non-select activity (like autovacum vacuum).  Basically I'm
>> suspicious there is more to this story.
>>
>> hint bit flusing causing i/o during SELECT is a typical complaint
>> (especially on systems with underperformant i/o), but I'm suspicious
>> if that's really the problem here. Since you are on a virtualized
>> platform, I can't help but wonder if you are running into some
>> bottleneck that you wouldn't see on native hardware.
>>
>> What's iowait during the slow period?
>>
>> merlin
> Thanks Kevin and Merlin this is extremely helpful...
>
> Ok, that makes much more sense (WALing hint bits did not make sense).
>
> I disabled auto-vacuum and did four tests:
> 1. Run a bunch of updates, stop that process, wait until checkpointing
> is finished, and run the selects (as before).
> 2. run VACUUM manually, then run the SELECTs
> 3. Have the UPDATEs and SELECTs touch a mutually exclusive, random
> sets of row (still in sets of 10000).
> So the SELECTs are guaranteed not to select rows that were updated.
> 4. Lastly, change the UPDATEs to update a non-indexed column. To rule
> out Index maintenance. Still distinct set of rows.
>
> In the first case I see the same slowdown (from ~14ms to ~400-500ms).
> pg_stat_activity shows no other load during that
> time. I also see write activity only on the WAL volume.
>
> In the 2nd case after VACUUM is finished the time is back to 14ms. As
> an aside: If I run the SELECTs while VACUUM is
> running the slowdown is about the same as in the first case until
> (apparently) VACUUM has cleaned up most of the table,
> at which point the SELECTs become faster again (~50ms).
>
> In the 3rd case I see exactly the same behavior, which is interesting.
> Both before VACUUM is run and after.
> There's no guarantee obviously that distinct rows do not share the
> same page of course especially since the index is
> updated as part of this (but see the 4th case).
>
> In case 4 I still see the same issue. Again both before and after VACUUM.
>
> In all cases I see from pg_stat_bgwriter that no backend writes
> buffers directly (but I think that only pertains to dirty buffers, and
> not the WAL).
>
> So I think I have a partial answer to my initial question.
>
> However, that brings me to some other questions:
> Why do SELECTs cause dead tuples to be pruned (as Kevin suggests)?
> That even happens when the updates did not touch the selected rows(?)
> And why does that slow down the SELECTs? (checkpointing activity on
> the EBS volume holding the database for example
> does not slow down SELECTs at all, only WAL activity does). Does the
> selecting backend do that work itself?
>
> Lastly, is this documented somewhere? (I apologize if it is and I
> missed it). If not I'd be happy to write a wiki entry for this.
>
>
Oh, and iowait hovers around 20% when SELECTs are slow:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            1.54    0.00    0.98   18.49    0.07   78.92

When SELECTs are fast it looks like this:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            8.72    0.00    0.26    0.00    0.00   91.01

Note that this is a 12 core VM. So one core at 100% would show as 8.33% CPU.


pgsql-performance by date:

Previous
From: lars
Date:
Subject: Re: UPDATEDs slowing SELECTs in a fully cached database
Next
From: lars
Date:
Subject: Re: UPDATEDs slowing SELECTs in a fully cached database