From Florian Pflug
Subject Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Msg-id 2B38F631-C80E-4882-BBB5-4678891B21E9@phlo.org
In response to Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile  (Ants Aasma <ants@cybertec.at>)
List pgsql-hackers
On Jun 1, 2012, at 15:45, Tom Lane wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> A potential issue with this line of thinking is that your pin delay
>> queue could get highly pressured by outer portions of the query (as in
>> the OP's case)  that will get little or no benefit from the delayed
>> pin.  But choosing a sufficiently sized drain queue would work for
>> most reasonable cases assuming 32 isn't enough?  Why not something
>> much larger, for example the lesser of 1024, (NBuffers * .25) /
>> max_connections?  In other words, for you to get much benefit, you
>> have to pin the buffer sufficiently more than 1/N times among all
>> buffers.
>
> Allowing each backend to pin a large fraction of shared buffers sounds
> like a seriously bad idea to me.  That's just going to increase
> thrashing of what remains.

Right, that was one of the motivations for suggesting the small queue.
At least that way, the number of buffers optimistically pinned by each
backend is limited.

The other was that once the outer portions plough through more than
a few pages per iteration of the sub-plan, the cost of doing that should
dominate the cost of pinning and unpinning.

> More generally, I don't believe that we have any way to know which
> buffers would be good candidates to keep pinned for a long time.

I'd think that pinning a buffer which we've only recently unpinned
is a pretty good indication that the same thing will happen again.

My proposed algorithm could be made to use exactly that criterion
by tracking a little bit more state. We'd have to tag queue entries
with a flag indicating whether they are
 Unpinned (COLD)
 Pinned, and unpinning should be delayed (HOT)
 Waiting to be unpinned (LUKEWARM)

UnpinBuffer() would check if the buffer is HOT, and if so simply mark
its queue entry LUKEWARM, leaving the pin in place. Otherwise, it'd get
immediately unpinned and flagged as COLD (adding it to the queue if
necessary).
PinBuffer() would pin the buffer and mark it as HOT if it was COLD,
and just mark it as HOT if it was LUKEWARM. If the buffer isn't on
the queue already, PinBuffer() would simply pin it and be done.

This would give the following behaviour for a buffer that is pinned
repeatedly:

 PinBuffer():   <not on queue> -> <not on queue> (refcount incremented)
 UnpinBuffer(): <not on queue> -> COLD           (refcount decremented)
 ...
 PinBuffer():   COLD -> HOT                      (refcount incremented)
 UnpinBuffer(): HOT -> LUKEWARM                  (refcount *not* decremented)
 ...
 PinBuffer():   LUKEWARM -> HOT                  (refcount *not* incremented)
 UnpinBuffer(): HOT -> LUKEWARM                  (refcount *not* decremented)
 ...
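
To make the transitions concrete, here's a small standalone model of
that state machine. This is only an illustrative sketch, not bufmgr.c
code: buffers are plain ints, the refcount array stands in for the real
pin counts, all names (delay_queue, pin_delayed, ...) are invented, and
queue overflow / eviction are glossed over.

#include <stdio.h>

#define NBUFFERS         16
#define DELAY_QUEUE_SIZE 32     /* the small per-backend queue */

typedef enum { COLD, HOT, LUKEWARM } PinState;

typedef struct
{
    int      buf;               /* -1 marks a free slot */
    PinState state;
} QueueEntry;

static int        refcount[NBUFFERS];   /* stand-in for real pin counts */
static QueueEntry delay_queue[DELAY_QUEUE_SIZE];

static QueueEntry *
find_entry(int buf)
{
    for (int i = 0; i < DELAY_QUEUE_SIZE; i++)
        if (delay_queue[i].buf == buf)
            return &delay_queue[i];
    return NULL;
}

static void
pin_delayed(int buf)
{
    QueueEntry *e = find_entry(buf);

    if (e == NULL)
        refcount[buf]++;        /* not tracked: plain pin */
    else if (e->state == COLD)
    {
        refcount[buf]++;        /* COLD -> HOT: real pin */
        e->state = HOT;
    }
    else if (e->state == LUKEWARM)
        e->state = HOT;         /* LUKEWARM -> HOT: pin elided! */
    /* (nested pins of an already-HOT buffer aren't modelled here) */
    printf("pin   buf=%d refcount=%d\n", buf, refcount[buf]);
}

static void
unpin_delayed(int buf)
{
    QueueEntry *e = find_entry(buf);

    if (e != NULL && e->state == HOT)
        e->state = LUKEWARM;    /* HOT -> LUKEWARM: unpin delayed */
    else
    {
        refcount[buf]--;        /* real unpin */
        if (e != NULL)
            e->state = COLD;
        else
        {
            QueueEntry *slot = find_entry(-1);  /* grab a free slot */

            if (slot != NULL)
            {
                slot->buf = buf;    /* start tracking as COLD */
                slot->state = COLD;
            }
        }
    }
    printf("unpin buf=%d refcount=%d\n", buf, refcount[buf]);
}

int
main(void)
{
    for (int i = 0; i < DELAY_QUEUE_SIZE; i++)
        delay_queue[i].buf = -1;

    /* Pin/unpin one buffer repeatedly, as an inner-side scan would. */
    for (int i = 0; i < 4; i++)
    {
        pin_delayed(7);
        unpin_delayed(7);
    }
    /* The refcount settles at 1: a LUKEWARM entry keeps its delayed pin
     * until the entry is eventually evicted and really unpinned. */
    return 0;
}

Running this shows the refcount changing only during the first cycle;
afterwards the buffer just oscillates between HOT and LUKEWARM with no
pin-count traffic, which is the point of the exercise.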

> Typically, we don't drop the pin in the first place if we know we're
> likely to touch that buffer again soon. btree root pages might be an
> exception, but I'm not even convinced of that one.

But Sergey's use-case pretty convincingly shows that, more generally,
inner sides of a nested loop join are also an exception, no? At least
if the inner side is either an index scan, or a seqscan of a really
small table.

best regards,
Florian Pflug


