Re: Background vacuum - Mailing list pgsql-performance

From Ron Mayer
Subject Re: Background vacuum
Date
Msg-id 464D04F8.7020306@cheapcomplexdevices.com
In response to Re: Background vacuum  (Andrew Sullivan <ajs@crankycanuck.ca>)
List pgsql-performance
Andrew Sullivan wrote:
> On Thu, May 10, 2007 at 05:10:56PM -0700, Ron Mayer wrote:
>> One way is to write a stored procedure that sets its own priority.
>> An example is here:
>> http://weblog.bignerdranch.com/?p=11
>
> Do you have evidence to show this will actually work consistently?

The paper referenced below gives a better explanation than I can.

Their conclusion was that on many real-life workloads (including
TPC-C and TPC-H like workloads) on many databases (including DB2
and PostgreSQL) the benefits vastly outweighed the disadvantages.
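
For concreteness, the renicing being discussed boils down to something like
the sketch below. It is only an illustration, not the code from the weblog
post linked above: the helper name lower_priority and the default increment
of 10 are assumptions made for this example, and it only works on Unix-like
systems.

```python
import os


def lower_priority(pid: int, increment: int = 10) -> int:
    """Raise the nice value of `pid` by `increment`, i.e. lower its CPU priority.

    Hypothetical helper for illustration. Returns the nice value after the
    change. Unix only.
    """
    current = os.getpriority(os.PRIO_PROCESS, pid)
    # 19 is the weakest priority on Linux; clamp so we never ask for more.
    target = min(current + increment, 19)
    os.setpriority(os.PRIO_PROCESS, pid, target)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    # Demonstrate on this process itself. For a database session you would
    # pass the backend's PID instead (e.g. the value of pg_backend_pid()).
    print(lower_priority(os.getpid(), 1))
```

Changing the priority of a process owned by another user (such as a backend
running as the postgres user) normally requires running this as that user
or as root.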

> The problem with doing this is that if your process is holding a lock
> that prevents some other process from doing something, then your
> lowered priority actually causes that _other_ process to go slower
> too.  This is part of the reason people object to the suggestion that
> renicing a single back end will help anything.

Sure.  And in the paper they discussed that effect and found that
if you do have an OS scheduler that supports priority inheritance,
the benefits are even bigger than without it.  But even for
OS and scheduler combinations without it, the benefits
were very significant.
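
As a toy illustration of the inversion concern and of what priority
inheritance changes, here is a small simulation. It is not taken from the
paper: the single CPU, the strict-priority scheduler, the three process
names and the work amounts are all assumptions chosen to keep the model
tiny.

```python
def ticks_until_high_done(inherit: bool, low_work: int = 3,
                          medium_work: int = 10, high_work: int = 2) -> int:
    """Simulate one CPU with a strict-priority scheduler, one work unit per tick.

    'low' holds a lock for all of its work, 'high' needs that lock, and
    'medium' is unrelated CPU-bound work. Returns the tick at which 'high'
    finishes.
    """
    remaining = {"low": low_work, "medium": medium_work, "high": high_work}
    base = {"low": 0, "medium": 1, "high": 2}
    tick = 0
    while remaining["high"] > 0:
        lock_held = remaining["low"] > 0
        # 'high' cannot run while 'low' still holds the lock.
        runnable = [name for name, work in remaining.items()
                    if work > 0 and not (name == "high" and lock_held)]

        def effective(name: str) -> int:
            # With inheritance, the lock holder borrows the waiter's priority.
            if inherit and name == "low" and lock_held:
                return base["high"]
            return base[name]

        chosen = max(runnable, key=effective)
        remaining[chosen] -= 1
        tick += 1
    return tick


if __name__ == "__main__":
    print(ticks_until_high_done(inherit=False))  # 15: medium starves the lock holder
    print(ticks_until_high_done(inherit=True))   # 5: lock holder runs first
```

In this model the high-priority process finishes at tick 15 without
inheritance and at tick 5 with it, which is the kind of gap the lock-holding
objection is about. Real schedulers and real lock behaviour are far messier
than this.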

>
>> This paper studied both CPU and lock priorities on a variety
>> of databases including PostgreSQL.
>>
>> http://www.cs.cmu.edu/~bianca/icde04.pdf
>>
>> " By contrast, for PostgreSQL, lock scheduling is not as
>>   effective as CPU scheduling (see Figure 4(c)).
>
> It is likely that in _some_ cases, you can get this benefit, because
> you don't have contention issues.  The explanation for the good lock
> performance by Postgres on the TPC-C tests they were using is
> PostgreSQL's MVCC: Postgres locks less.  The problem comes when you
> have contention, and in that case, CPU scheduling will really hurt.
>
> This means that, to use CPU scheduling safely, you have to be really
> sure that you know what the other transactions are doing.

Not necessarily.  From the wide range of conditions the paper tested
I'd say it's more like quicksort - you need to be sure you avoid
theoretical pathological conditions that no one (that I can find)
has encountered in practice.

If you do know of such a workload, I (and, I imagine, the authors
of that paper) would be quite interested.

Since they showed that the benefits are very real for both
TPC-C and TPC-H like workloads I think the burden of proof
is now more on the people warning of the (so far theoretical)
drawbacks.
