Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
Msg-id 20170308132222.GQ9812@tamriel.snowman.net
In response to Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
* Pavan Deolasee (pavan.deolasee@gmail.com) wrote:
> On Wed, Mar 8, 2017 at 7:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> > On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > > Right, that's what I thought he was getting at and my general thinking
> > > was that we would need a way to discover if a CIC is ongoing on the
> > > relation and therefore heap_page_prune(), or anything else, would know
> > > that it can't twiddle the bits in the VM due to the ongoing CIC.
> > > Perhaps a lock isn't the right answer there, but it would have to be
> > > some kind of cross-process communication that operates at a relation
> > > level..
> >
> > Well, I guess that's one option.  I lean toward the position already
> > taken by Andres and Peter, namely, that it's probably not a great idea
> > to pursue this optimization.
>
> Fair point. I'm not going to "persist" with the idea too long. It seemed
> like a good, low-risk feature to me which can benefit certain use cases
> quite reasonably. It's not uncommon to create indexes (or reindex existing
> indexes to remove index bloats) on extremely large tables and avoiding a
> second heap scan can hugely benefit such cases. Holding up the patch for
> something for which we don't even have a proposal yet seemed a bit strange
> at first, but I see the point.

I'm not really sure that I do.  CICs will not be so frequent that not
allowing VM updates while they run should result in a huge drag on the
system, imv.  Of course, if the only way to detect that a CIC is running
on the table is some expensive operation that we have to perform every
time, then it's not going to be worth it, but I'm not sure that's really
the case here.

The issue here, as I understand it at least, is to come up with a way
to make sure we don't do anything that would screw up the CIC while it's
running, using a check cheap enough that it doesn't slow things down
during normal, non-CIC periods.  I suggested a new lock, partly because
anything that updates the VM has to take some kind of lock anyway, and
optimistically trying for a "heavier" lock that conflicts with the CIC
would keep the normal, non-CIC case essentially the same speed.  If a
CIC were running, the attempt to acquire that lock would fail and extra
time would be spent acquiring a lower-weight lock instead, of course, but
that should happen relatively rarely.
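
A minimal, single-process sketch of the optimistic scheme described above,
only to make the protocol concrete.  Everything in it is hypothetical:
RelationVMGuard, try_acquire_vm_setter, begin_cic and prune_page are invented
names, not PostgreSQL APIs, and a real version would be C using the lock
manager across backends.  It also assumes the "lower-weight lock" fallback
amounts to pruning the page without touching the visibility map.

```python
import threading


class RelationVMGuard:
    """Hypothetical per-relation guard (not a PostgreSQL API).

    VM updaters take it in a shared, non-blocking way; a CIC takes it
    exclusively and waits for in-flight VM updaters to finish.
    """

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._drained = threading.Condition(self._mutex)
        self._vm_setters = 0      # updaters currently allowed to set VM bits
        self._cic_active = False  # True while a CIC holds the guard

    def try_acquire_vm_setter(self) -> bool:
        """Optimistic fast path: never waits on a CIC, fails only while one is active."""
        with self._mutex:
            if self._cic_active:
                return False
            self._vm_setters += 1
            return True

    def release_vm_setter(self) -> None:
        with self._mutex:
            self._vm_setters -= 1
            if self._vm_setters == 0:
                self._drained.notify_all()

    def begin_cic(self) -> None:
        """Refuse new VM setters, then wait for existing ones to drain."""
        with self._mutex:
            self._cic_active = True
            while self._vm_setters > 0:
                self._drained.wait()

    def end_cic(self) -> None:
        with self._mutex:
            self._cic_active = False


def prune_page(guard: RelationVMGuard, page: dict) -> str:
    """Hypothetical pruning step: always prunes, but sets the VM bit only
    when the optimistic acquire succeeds (assumed fallback behaviour)."""
    page["pruned"] = True
    if guard.try_acquire_vm_setter():
        try:
            page["all_visible"] = True
            return "vm_set"
        finally:
            guard.release_vm_setter()
    # Slow path: a CIC is running, so leave the visibility map alone.
    return "vm_skipped"
```

In the common case the only extra cost is the conditional acquire; updaters
take the slow path only while a CIC is actually running on the relation.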

> Anyways, for a recently vacuumed table of twice the size of RAM and on a
> machine with SSDs, the patched CIC version runs about 25% faster. That's
> probably the best case scenario.

I agree, that certainly seems like a very nice performance improvement.

Thanks!

Stephen
