Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench) - Mailing list pgsql-hackers

Robert Haas <robertmhaas@gmail.com> wrote:
>On Thu, Jul 27, 2017 at 10:05 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> I really don't know if that would be worthwhile. It would certainly fix
>> the regression shown in my test case, but that might not go far enough.
>> I strongly suspect that there are more complicated workloads where
>> LP_DEAD cleanup from SELECT statements matters, which is prevented by
>> the LSN check thing, just because there are always other sessions that
>> modify the page concurrently. This might be true of Alik's Zipfian test
>> case, for example.
>
>I haven't studied the test case, but I think as a general principle it
>makes sense to be happy when someone comes up with an algorithm that
>holds a lock for a shorter period of time (and buffer pins are a type
>of lock).  There are a number of places (fast-path locking, for
>example, or vacuum skipping pinned heap pages) where we have
>fast-paths that get taken most of the time and slow paths that get
>used when concurrent activity happens; empirically, such things often
>work out to a win.  I think it's disturbing that this code seems to be
>taking the slow-path (which, in context, means skipping LP_DEAD
>cleanup) even there is no concurrent activity.  That's hard to
>justify.

That is hard to justify. I don't think that failing to set LP_DEAD hints
is the cost that must be paid to realize a benefit elsewhere, though. I
don't see much problem with having both benefits consistently. It's
actually very unlikely that VACUUM will run, and a TID will be recycled
at exactly the wrong time. We could probably come up with a more
discriminating way of detecting that that may have happened, at least
for Postgres 11. We'd continue to use LSN; the slow path would be taken
when the LSN changed, but we do not give up on setting LP_DEAD bits. I
think we can justify going to the heap again in this slow path, if
that's what it takes.

>But the fact that it is taking the slow-path when there *is*
>concurrent activity is harder to complain about.  That might win or it
>might lose; the non-concurrent case only loses.

Let's wait to see what difference it makes if Alik's zipfian
distribution pgbench test case uses unlogged tables. That may gives us a
good sense of the problem for cases with contention/concurrency.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Transactions involving multiple postgres foreign servers