Thread: [HACKERS] Skip all-visible pages during second HeapScan of CIC

[HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Pavan Deolasee
Date:
Hello All,

During the second heap scan of CREATE INDEX CONCURRENTLY, we're only interested in the tuples which were inserted after the first scan was started. All such tuples can only exists in pages which have their VM bit unset. So I propose the attached patch which consults VM during second scan and skip all-visible pages. We do the same trick of skipping pages only if certain threshold of pages can be skipped to ensure OS's read-ahead is not disturbed.

The patch obviously shows significant reduction of time for building index concurrently for very large tables, which are not being updated frequently and which was vacuumed recently (so that VM bits are set). I can post performance numbers if there is interest. For tables that are being updated heavily, the threshold skipping was indeed useful and without that we saw a slight regression.

Since VM bits are only set during VACUUM which conflicts with CIC on the relation lock, I don't see any risk of incorrectly skipping pages that the second scan should have scanned.

Comments?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Masahiko Sawada
Date:
On Tue, Feb 28, 2017 at 10:42 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> Hello All,
>
> During the second heap scan of CREATE INDEX CONCURRENTLY, we're only
> interested in the tuples which were inserted after the first scan was
> started. All such tuples can only exists in pages which have their VM bit
> unset. So I propose the attached patch which consults VM during second scan
> and skip all-visible pages. We do the same trick of skipping pages only if
> certain threshold of pages can be skipped to ensure OS's read-ahead is not
> disturbed.

Great.

>
> The patch obviously shows significant reduction of time for building index
> concurrently for very large tables, which are not being updated frequently
> and which was vacuumed recently (so that VM bits are set). I can post
> performance numbers if there is interest.

Please share it. I'm curious.

> For tables that are being updated
> heavily, the threshold skipping was indeed useful and without that we saw a
> slight regression.
>
> Since VM bits are only set during VACUUM which conflicts with CIC on the
> relation lock, I don't see any risk of incorrectly skipping pages that the
> second scan should have scanned.

Agreed.

And the patch looks good to me.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Andres Freund
Date:
Hi,

On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote:
> Since VM bits are only set during VACUUM which conflicts with CIC on the
> relation lock, I don't see any risk of incorrectly skipping pages that the
> second scan should have scanned.

I think that's true currently, but it'd also prevent us from doing that
in additional places.  Which, in my opinion, we really should (and I
believe that's realistically achievable).  Thus I really don't want to
base the correctness of CIC - a relatively infrequent operation - on the
assumption that no VM bits can be set concurrenty due to the SUE lock.

Regards,

Andres



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Peter Geoghegan
Date:
On Fri, Mar 3, 2017 at 2:54 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote:
>> Since VM bits are only set during VACUUM which conflicts with CIC on the
>> relation lock, I don't see any risk of incorrectly skipping pages that the
>> second scan should have scanned.
>
> I think that's true currently, but it'd also prevent us from doing that
> in additional places.  Which, in my opinion, we really should (and I
> believe that's realistically achievable).  Thus I really don't want to
> base the correctness of CIC - a relatively infrequent operation - on the
> assumption that no VM bits can be set concurrenty due to the SUE lock.

I agree.

FWIW, the extra time that CIC takes over a plain CI is much reduced these days.

-- 
Peter Geoghegan



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Stephen Frost
Date:
Andres,

* Andres Freund (andres@anarazel.de) wrote:
> On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote:
> > Since VM bits are only set during VACUUM which conflicts with CIC on the
> > relation lock, I don't see any risk of incorrectly skipping pages that the
> > second scan should have scanned.
>
> I think that's true currently, but it'd also prevent us from doing that
> in additional places.  Which, in my opinion, we really should (and I
> believe that's realistically achievable).  Thus I really don't want to
> base the correctness of CIC - a relatively infrequent operation - on the
> assumption that no VM bits can be set concurrenty due to the SUE lock.

That sounds like we need a lock or similar mechanism to indicate that
CIC depends on the VM not being changed, rather than saying it shouldn't
depend on that because it might not always be true that the only other
operation (VACUUM) sets them and already acquires a conflicting lock.

Thanks!

Stephen

Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Peter Geoghegan
Date:
On Tue, Feb 28, 2017 at 5:42 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> During the second heap scan of CREATE INDEX CONCURRENTLY, we're only
> interested in the tuples which were inserted after the first scan was
> started. All such tuples can only exists in pages which have their VM bit
> unset. So I propose the attached patch which consults VM during second scan
> and skip all-visible pages. We do the same trick of skipping pages only if
> certain threshold of pages can be skipped to ensure OS's read-ahead is not
> disturbed.

BTW, is there any danger of VACUUM acquiring a lock on the heap
relation (i.e. vacuuming it) after the first CIC transaction ends, but
before the second CIC transaction begins?

-- 
Peter Geoghegan



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Andres Freund
Date:
On 2017-03-03 15:12:04 -0800, Peter Geoghegan wrote:
> On Tue, Feb 28, 2017 at 5:42 AM, Pavan Deolasee
> <pavan.deolasee@gmail.com> wrote:
> > During the second heap scan of CREATE INDEX CONCURRENTLY, we're only
> > interested in the tuples which were inserted after the first scan was
> > started. All such tuples can only exists in pages which have their VM bit
> > unset. So I propose the attached patch which consults VM during second scan
> > and skip all-visible pages. We do the same trick of skipping pages only if
> > certain threshold of pages can be skipped to ensure OS's read-ahead is not
> > disturbed.
> 
> BTW, is there any danger of VACUUM acquiring a lock on the heap
> relation (i.e. vacuuming it) after the first CIC transaction ends, but
> before the second CIC transaction begins?

We hold session level locks to prevent such danger.



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Robert Haas
Date:
On Fri, Mar 3, 2017 at 6:06 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Andres Freund (andres@anarazel.de) wrote:
>> On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote:
>> > Since VM bits are only set during VACUUM which conflicts with CIC on the
>> > relation lock, I don't see any risk of incorrectly skipping pages that the
>> > second scan should have scanned.
>>
>> I think that's true currently, but it'd also prevent us from doing that
>> in additional places.  Which, in my opinion, we really should (and I
>> believe that's realistically achievable).  Thus I really don't want to
>> base the correctness of CIC - a relatively infrequent operation - on the
>> assumption that no VM bits can be set concurrenty due to the SUE lock.
>
> That sounds like we need a lock or similar mechanism to indicate that
> CIC depends on the VM not being changed, rather than saying it shouldn't
> depend on that because it might not always be true that the only other
> operation (VACUUM) sets them and already acquires a conflicting lock.

I don't really think that would be a useful approach.  I think what
Andres is thinking about -- or at least what comes to mind for me --
is the possibility that heap_page_prune() might someday try to mark
pages all-visible.  Then it would become something that happens from
time to time during foreground processing, rather than (as now)
something that only happens from within a maintenance command.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Stephen Frost
Date:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Mar 3, 2017 at 6:06 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> >> On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote:
> >> > Since VM bits are only set during VACUUM which conflicts with CIC on the
> >> > relation lock, I don't see any risk of incorrectly skipping pages that the
> >> > second scan should have scanned.
> >>
> >> I think that's true currently, but it'd also prevent us from doing that
> >> in additional places.  Which, in my opinion, we really should (and I
> >> believe that's realistically achievable).  Thus I really don't want to
> >> base the correctness of CIC - a relatively infrequent operation - on the
> >> assumption that no VM bits can be set concurrenty due to the SUE lock.
> >
> > That sounds like we need a lock or similar mechanism to indicate that
> > CIC depends on the VM not being changed, rather than saying it shouldn't
> > depend on that because it might not always be true that the only other
> > operation (VACUUM) sets them and already acquires a conflicting lock.
>
> I don't really think that would be a useful approach.  I think what
> Andres is thinking about -- or at least what comes to mind for me --
> is the possibility that heap_page_prune() might someday try to mark
> pages all-visible.  Then it would become something that happens from
> time to time during foreground processing, rather than (as now)
> something that only happens from within a maintenance command.

Right, that's what I thought he was getting at and my general thinking
was that we would need a way to discover if a CIC is ongoing on the
relation and therefore heap_page_prune(), or anything else, would know
that it can't twiddle the bits in the VM due to the ongoing CIC.
Perhaps a lock isn't the right answer there, but it would have to be
some kind of cross-process communication that operates at a relation
level..

Just to be clear, I wasn't thinking that heap_page_prune() or a
foreground process would actually block on the lock, just that it would
try to acquire it if it decided that it could twiddle the VM and if it
wasn't able to immediately then it would just leave it alone and move
on.  I'll admit that I'm not super familiar with the CIC or VM
internals, frankly, so I may be all wet with this entire line of
thinking, but figured I'd at least explain what the idea was.

Thanks!

Stephen

Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Robert Haas
Date:
On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Right, that's what I thought he was getting at and my general thinking
> was that we would need a way to discover if a CIC is ongoing on the
> relation and therefore heap_page_prune(), or anything else, would know
> that it can't twiddle the bits in the VM due to the ongoing CIC.
> Perhaps a lock isn't the right answer there, but it would have to be
> some kind of cross-process communication that operates at a relation
> level..

Well, I guess that's one option.  I lean toward the position already
taken by Andres and Peter, namely, that it's probably not a great idea
to pursue this optimization.  I'm not totally dead-set on that
position, but it doesn't seem likely that we can add material
synchronization overhead to heap_page_prune(), and I'd rather maintain
the option to set visibility map bits in more places in the future
than give up that opportunity by optimizing CIC now.  It's nice for
CIC to be fast, but setting all visible bits more aggressively sounds
nicer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Andres Freund
Date:
On 2017-03-07 21:03:53 -0500, Robert Haas wrote:
> On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > Right, that's what I thought he was getting at and my general thinking
> > was that we would need a way to discover if a CIC is ongoing on the
> > relation and therefore heap_page_prune(), or anything else, would know
> > that it can't twiddle the bits in the VM due to the ongoing CIC.
> > Perhaps a lock isn't the right answer there, but it would have to be
> > some kind of cross-process communication that operates at a relation
> > level..
> 
> Well, I guess that's one option.  I lean toward the position already
> taken by Andres and Peter, namely, that it's probably not a great idea
> to pursue this optimization.  I'm not totally dead-set on that
> position, but it doesn't seem likely that we can add material
> synchronization overhead to heap_page_prune(), and I'd rather maintain
> the option to set visibility map bits in more places in the future
> than give up that opportunity by optimizing CIC now.  It's nice for
> CIC to be fast, but setting all visible bits more aggressively sounds
> nicer.

Indeed.

I wonder however, if careful snapshot managment couldn't solve this as
well.  I have *not* thought a lot about this, but afaics we can easily
prevent all-visible from being set in cases it'd bother us by having an
"appropriate" xmin / registered snapshot.

- Andres



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Robert Haas
Date:
On Tue, Mar 7, 2017 at 9:12 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-03-07 21:03:53 -0500, Robert Haas wrote:
>> On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > Right, that's what I thought he was getting at and my general thinking
>> > was that we would need a way to discover if a CIC is ongoing on the
>> > relation and therefore heap_page_prune(), or anything else, would know
>> > that it can't twiddle the bits in the VM due to the ongoing CIC.
>> > Perhaps a lock isn't the right answer there, but it would have to be
>> > some kind of cross-process communication that operates at a relation
>> > level..
>>
>> Well, I guess that's one option.  I lean toward the position already
>> taken by Andres and Peter, namely, that it's probably not a great idea
>> to pursue this optimization.  I'm not totally dead-set on that
>> position, but it doesn't seem likely that we can add material
>> synchronization overhead to heap_page_prune(), and I'd rather maintain
>> the option to set visibility map bits in more places in the future
>> than give up that opportunity by optimizing CIC now.  It's nice for
>> CIC to be fast, but setting all visible bits more aggressively sounds
>> nicer.
>
> Indeed.
>
> I wonder however, if careful snapshot managment couldn't solve this as
> well.  I have *not* thought a lot about this, but afaics we can easily
> prevent all-visible from being set in cases it'd bother us by having an
> "appropriate" xmin / registered snapshot.

Yeah, but that's a tax on the whole system.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Pavan Deolasee
Date:


On Wed, Mar 8, 2017 at 7:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Right, that's what I thought he was getting at and my general thinking
> was that we would need a way to discover if a CIC is ongoing on the
> relation and therefore heap_page_prune(), or anything else, would know
> that it can't twiddle the bits in the VM due to the ongoing CIC.
> Perhaps a lock isn't the right answer there, but it would have to be
> some kind of cross-process communication that operates at a relation
> level..

Well, I guess that's one option.  I lean toward the position already
taken by Andres and Peter, namely, that it's probably not a great idea
to pursue this optimization. 

Fair point. I'm not going to "persist" with the idea too long. It seemed like a good, low-risk feature to me which can benefit certain use cases quite reasonably. It's not uncommon to create indexes (or reindex existing indexes to remove index bloats) on extremely large tables and avoiding a second heap scan can hugely benefit such cases. Holding up the patch for something for which we don't even have a proposal yet seemed a bit strange at first, but I see the point. 

Anyways, for a recently vacuumed table of twice the size of RAM and on a machine with SSDs, the patched CIC version runs about 25% faster. That's probably the best case scenario.

I also realised that I'd attached a slightly older version of the patch. So new version attached, but I'll remove the patch from CF if I don't see any more votes in favour of the patch.

Thanks,
Pavan
Attachment

Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Pavan Deolasee
Date:


On Wed, Mar 8, 2017 at 8:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 7, 2017 at 9:12 PM, Andres Freund <andres@anarazel.de> wrote:

>
> I wonder however, if careful snapshot managment couldn't solve this as
> well.  I have *not* thought a lot about this, but afaics we can easily
> prevent all-visible from being set in cases it'd bother us by having an
> "appropriate" xmin / registered snapshot.

Yeah, but that's a tax on the whole system.


May be we can do that only when patched CIC (or any other operation which may not like concurrent VM set) is in progress. We could use what Stephen suggested upthread to find that state. But right now it's hard to think because there is nothing on either side so we don't know what gets impacted by aggressive VM set and how.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Andres Freund
Date:
On 2017-03-07 21:38:40 -0500, Robert Haas wrote:
> > I wonder however, if careful snapshot managment couldn't solve this as
> > well.  I have *not* thought a lot about this, but afaics we can easily
> > prevent all-visible from being set in cases it'd bother us by having an
> > "appropriate" xmin / registered snapshot.
> 
> Yeah, but that's a tax on the whole system.

I'm not sure I can buy that argument. CIC *already* holds a snapshot
during each of the two scans. By reducing the amount of time that's held
in the second scan, the systemwide impact is reduced, because it's held
for a shorter amount of time.  We need to hold a slightly "more
aggressive" snapshot, that's true, but if you have high xid throughput
those effects should roughly balance each other.

Greetings,

Andres Freund



Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Stephen Frost
Date:
* Pavan Deolasee (pavan.deolasee@gmail.com) wrote:
> On Wed, Mar 8, 2017 at 7:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> > On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > > Right, that's what I thought he was getting at and my general thinking
> > > was that we would need a way to discover if a CIC is ongoing on the
> > > relation and therefore heap_page_prune(), or anything else, would know
> > > that it can't twiddle the bits in the VM due to the ongoing CIC.
> > > Perhaps a lock isn't the right answer there, but it would have to be
> > > some kind of cross-process communication that operates at a relation
> > > level..
> >
> > Well, I guess that's one option.  I lean toward the position already
> > taken by Andres and Peter, namely, that it's probably not a great idea
> > to pursue this optimization.
>
> Fair point. I'm not going to "persist" with the idea too long. It seemed
> like a good, low-risk feature to me which can benefit certain use cases
> quite reasonably. It's not uncommon to create indexes (or reindex existing
> indexes to remove index bloats) on extremely large tables and avoiding a
> second heap scan can hugely benefit such cases. Holding up the patch for
> something for which we don't even have a proposal yet seemed a bit strange
> at first, but I see the point.

I'm not really sure that I do.  CIC's will not be so frequent that not
allowing VM updates during their operation should really result in a
huge drag on the system, imv.  Of course, if the only way to realize a
CIC is happening on the table is to perform some expensive operation and
we have to do that every time then it's not going to be worth it, but
I'm not sure that's really the case here.

The issue here, as I understand it at least, is to come up with a way
that we can make sure to not do anything which would screw up the CIC
while it's running, that we can detect very cheaply so we don't slow
things down during normal non-CIC-running periods.  I suggested a new
lock, in part because anything that's updating the VM is going to have
to have some kind of lock anyway, and perhaps going for the "heavier"
lock that would conflict with the CIC, in an optimistic manner, would
make the normal non-CIC-running case essentially the same speed.  If a
CIC was running then the attempt to acquire the lock would fail and
extra time would be spent acquiring a lower-weight lock, of course, but
that's going to happen relatively rarely.

> Anyways, for a recently vacuumed table of twice the size of RAM and on a
> machine with SSDs, the patched CIC version runs about 25% faster. That's
> probably the best case scenario.

I agree, that certainly seems like a very nice performance improvement.

Thanks!

Stephen

Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
Stephen Frost
Date:
* Andres Freund (andres@anarazel.de) wrote:
> On 2017-03-07 21:38:40 -0500, Robert Haas wrote:
> > > I wonder however, if careful snapshot managment couldn't solve this as
> > > well.  I have *not* thought a lot about this, but afaics we can easily
> > > prevent all-visible from being set in cases it'd bother us by having an
> > > "appropriate" xmin / registered snapshot.
> >
> > Yeah, but that's a tax on the whole system.
>
> I'm not sure I can buy that argument. CIC *already* holds a snapshot
> during each of the two scans. By reducing the amount of time that's held
> in the second scan, the systemwide impact is reduced, because it's held
> for a shorter amount of time.  We need to hold a slightly "more
> aggressive" snapshot, that's true, but if you have high xid throughput
> those effects should roughly balance each other.

For my 2c, at least, this certainly sounds like a potentially promising
approach too and I tend to agree with Andres on it not really being an
issue to have a more aggressive snapshot for the duration of the CIC.

Thanks!

Stephen

Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

From
David Steele
Date:
On 3/7/17 9:42 PM, Pavan Deolasee wrote:
>
> Fair point. I'm not going to "persist" with the idea too long. It seemed
> like a good, low-risk feature to me which can benefit certain use cases
> quite reasonably. It's not uncommon to create indexes (or reindex
> existing indexes to remove index bloats) on extremely large tables and
> avoiding a second heap scan can hugely benefit such cases. Holding up
> the patch for something for which we don't even have a proposal yet
> seemed a bit strange at first, but I see the point.
>
> Anyways, for a recently vacuumed table of twice the size of RAM and on a
> machine with SSDs, the patched CIC version runs about 25% faster. That's
> probably the best case scenario.
>
> I also realised that I'd attached a slightly older version of the patch.
> So new version attached, but I'll remove the patch from CF if I don't
> see any more votes in favour of the patch.

I have marked this submission "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest or open a discussion on hackers 
if you develop a new approach.

Thanks,
-- 
-David
david@pgmasters.net