Thread: heapam_index_build_range_scan's anyvisible

heapam_index_build_range_scan's anyvisible

From
Robert Haas
Date:
I spent some time today studying heapam_index_build_range_scan and
quickly reached the conclusion that it's kind of a mess.  At heart
it's pretty simple: loop over all the table, check each tuple against
any qual, and pass the visible ones to the callback.  However, in an
attempt to make it cater to various needs slightly outside of its
original design purpose, various warts have been added, and there are
enough of them now that I at least find it fairly difficult to
understand.  One of those warts is anyvisible, which I gather was
added in support of BRIN.

I first spent some time looking at how the 'anyvisible' flag affects
the behavior of the function. AFAICS, setting the flag to true results
in three behavior changes:

1. The elog(WARNING, ...) calls about a concurrent insert/delete calls
in progress can't be reached.
2. In some cases, reltuples += 1 might not occur where it would've
happened otherwise.
3. If we encounter a HOT-updated which was deleted by our own
transaction, we index it instead of skipping it.

Change #2 doesn't matter because the only caller that passes
anyvisible = true seems to be BRIN, and BRIN ignores the return value.
I initially thought that change #1 must not matter either, because
function has comments in several places saying that the caller must
hold ShareLock or better.  And I thought change #3 must also not
matter, because as the comments explain, this function is used to
build indexes, and if our CREATE INDEX command commits, then any
deletions that it has already performed will commit too, so the fact
that we haven't indexed the now-deleted tuples will be fine.  Then I
realized that brin_summarize_new_values() is calling this function
*without* ShareLock and for *not* for the purpose of creating a new
index but rather for the purpose of updating an existing index, which
means #1 and #3 do matter after all. But I think it's kind of
confusing because anyvisible doesn't change anything about which
tuples are visible.  SnapshotAny is already making "any" tuple
"visible." This flag really means "caller is holding a
lower-than-normal lock level and is not inserting into a brand new
relfilnode".

There may be more than just a cosmetic problem here, because the comments say:

         * It might look unsafe to use this information across buffer
         * lock/unlock.  However, we hold ShareLock on the table so no
         * ordinary insert/update/delete should occur; and we hold pin on the
         * buffer continuously while visiting the page, so no pruning
         * operation can occur either.

In the BRIN case that doesn't apply; I don't know whether this is safe
in that case for some other reason.

I also note that amcheck's bt_check_every_level can also call this
without ShareLock. It doesn't need to set anyvisible because passing a
snapshot bypasses the WARNINGs anyway, but it might have whatever
problem the above comment is thinking about.

Also, it's just cosmetic, but this comment definitely needs updating:

            /*
             * We could possibly get away with not locking the buffer here,
             * since caller should hold ShareLock on the relation, but let's
             * be conservative about it.  (This remark is still correct even
             * with HOT-pruning: our pin on the buffer prevents pruning.)
             */
            LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);

One more thing.  Assuming that there are no live bugs here, or that we
fix them, another possible simplification would be to remove the
anyvisible = true flag and have BRIN pass SnapshotNonVacuumable.
SnapshotNonVacuumable returns true when HeapTupleSatisfiesVacuum
doesn't return HEAPTUPLE_DEAD, so I think we'd get exactly the same
behavior (again, modulo reltuples, which doesn't matter).
heap_getnext() would perform functionally the same check as the
bespoke code internally, and just wouldn't return the dead tuples in
the first place.  There's an assertion that would trip, but we could
probably just change it.  BRIN's callback might also get a different
value for tupleIsAlive in some cases, but it ignores that value
anyway.

So to summarize:

1. Is this function really safe with < ShareLock?  Both BRIN and
amcheck think so, but the function itself isn't sure. If yes, we need
to adapt the comments.  If no, we need to think about some other fix.

2. anyvisible is a funny name given what the flag really does. Maybe
we can simplify by replacing it with SnapshotNonVacuumable().
Otherwise maybe we should rename the flag.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: heapam_index_build_range_scan's anyvisible

From
Alvaro Herrera
Date:
On 2019-Jun-07, Robert Haas wrote:

> I spent some time today studying heapam_index_build_range_scan and
> quickly reached the conclusion that it's kind of a mess.  At heart
> it's pretty simple: loop over all the table, check each tuple against
> any qual, and pass the visible ones to the callback.  However, in an
> attempt to make it cater to various needs slightly outside of its
> original design purpose, various warts have been added, and there are
> enough of them now that I at least find it fairly difficult to
> understand.  One of those warts is anyvisible, which I gather was
> added in support of BRIN.

Yes, commit 2834855cb9fd added that flag.  SnapshotNonVacuumable did not
exist back then.  It seems like maybe it would work to remove the flag
and replace with passing SnapshotNonVacuumable.  The case that caused
that flag to be added is tested by a dedicated isolation test, so if
BRIN becomes broken by the change at least it'd be obvious ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: heapam_index_build_range_scan's anyvisible

From
Robert Haas
Date:
On Fri, Jun 7, 2019 at 4:30 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Jun-07, Robert Haas wrote:
> > I spent some time today studying heapam_index_build_range_scan and
> > quickly reached the conclusion that it's kind of a mess.  At heart
> > it's pretty simple: loop over all the table, check each tuple against
> > any qual, and pass the visible ones to the callback.  However, in an
> > attempt to make it cater to various needs slightly outside of its
> > original design purpose, various warts have been added, and there are
> > enough of them now that I at least find it fairly difficult to
> > understand.  One of those warts is anyvisible, which I gather was
> > added in support of BRIN.
>
> Yes, commit 2834855cb9fd added that flag.  SnapshotNonVacuumable did not
> exist back then.  It seems like maybe it would work to remove the flag
> and replace with passing SnapshotNonVacuumable.  The case that caused
> that flag to be added is tested by a dedicated isolation test, so if
> BRIN becomes broken by the change at least it'd be obvious ...

Yeah, I wondered whether SnapshotNonVacuumable might've been added
later, but I was too lazy to check the commit log.  I'll try coding up
that approach and see how it looks.

But do you have any comment on the question of whether this function
is actually safe with < ShareLock, per the comments about caching
HOT-related state across buffer lock releases?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: heapam_index_build_range_scan's anyvisible

From
Alvaro Herrera
Date:
On 2019-Jun-07, Robert Haas wrote:

> Yeah, I wondered whether SnapshotNonVacuumable might've been added
> later, but I was too lazy to check the commit log.  I'll try coding up
> that approach and see how it looks.

Thanks.

> But do you have any comment on the question of whether this function
> is actually safe with < ShareLock, per the comments about caching
> HOT-related state across buffer lock releases?

Well, as far as I understand we do hold a buffer pin on the page the
whole time until we abandon it, which prevents HOT pruning, so the root
offset cache should be safe (since heap_page_prune requires cleanup
lock).  The thing we don't keep held is a buffer lock, so I/U/D could
occur, but those are not supposed to be hazards for the BRIN use, since
that's covered by the anyvisible / SnapshotNonVacuumable
hack^Wtechnique.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: heapam_index_build_range_scan's anyvisible

From
Ashwin Agrawal
Date:

On Fri, Jun 7, 2019 at 1:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
I spent some time today studying heapam_index_build_range_scan and
quickly reached the conclusion that it's kind of a mess.

While at it might be helpful and better to also decouple HeapTuple dependency for
IndexBuildCallback. Currently, all AM needs to build HeapTuple in index_build_range_scan function. I looked into all the callback functions and only htup->t_self is used from heaptuple in all the functions (unless I missed something). So, if seems fine will be happy to write patch to make that argument ItemPointer instead of HeapTuple?

Re: heapam_index_build_range_scan's anyvisible

From
Andres Freund
Date:
Hi,

On 2019-06-10 13:48:54 -0700, Ashwin Agrawal wrote:
> While at it might be helpful and better to also decouple HeapTuple
> dependency for IndexBuildCallback.

Indeed.


> Currently, all AM needs to build HeapTuple in
> index_build_range_scan function. I looked into all the callback functions
> and only htup->t_self is used from heaptuple in all the functions (unless I
> missed something). So, if seems fine will be happy to write patch to make
> that argument ItemPointer instead of HeapTuple?

I wonder if it should better be the slot? It's not inconceivable that
some AMs could benefit from that. Although it'd add some complication
to the heap HeapTupleIsHeapOnly case.

Greetings,

Andres Freund



Re: heapam_index_build_range_scan's anyvisible

From
Ashwin Agrawal
Date:

On Mon, Jun 10, 2019 at 2:56 PM Andres Freund <andres@anarazel.de> wrote:
> Currently, all AM needs to build HeapTuple in
> index_build_range_scan function. I looked into all the callback functions
> and only htup->t_self is used from heaptuple in all the functions (unless I
> missed something). So, if seems fine will be happy to write patch to make
> that argument ItemPointer instead of HeapTuple?

I wonder if it should better be the slot? It's not inconceivable that
some AMs could benefit from that. Although it'd add some complication
to the heap HeapTupleIsHeapOnly case.

I like the slot suggestion, only if can push FormIndexDatum() out of AM code as a result and pass slot to the callback. Not sure what else can it help. HeapTupleIsHeapOnly possibly can be made to work with slot similar to current hack of updating the t_self and slot's tid field, maybe.

Index callback using the slot can form the index datum. Though that would mean every Index AM callback function needs to do it, not sure which way is better. Plus, need to create ExecutorState for the same. With current setup every AM needs to do. Feels good if belongs to indexing code though instead of AM.

Currently, index build needing to create ExecutorState and all at AM layer seems not nice either. Maybe there is quite generic logic here and possible can be extracted into common code which either most of AM leverage. Or possibly the API itself can be simplified to get minimum input from AM and have rest of flow/machinery at upper layer. As Robert pointed at start of thread at heart its pretty simple flow and possibly will remain same for any AM.

Re: heapam_index_build_range_scan's anyvisible

From
Ashwin Agrawal
Date:

On Mon, Jun 10, 2019 at 5:38 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

On Mon, Jun 10, 2019 at 2:56 PM Andres Freund <andres@anarazel.de> wrote:
> Currently, all AM needs to build HeapTuple in
> index_build_range_scan function. I looked into all the callback functions
> and only htup->t_self is used from heaptuple in all the functions (unless I
> missed something). So, if seems fine will be happy to write patch to make
> that argument ItemPointer instead of HeapTuple?

I wonder if it should better be the slot? It's not inconceivable that
some AMs could benefit from that. Although it'd add some complication
to the heap HeapTupleIsHeapOnly case.

I like the slot suggestion, only if can push FormIndexDatum() out of AM code as a result and pass slot to the callback. Not sure what else can it help. HeapTupleIsHeapOnly possibly can be made to work with slot similar to current hack of updating the t_self and slot's tid field, maybe.

Index callback using the slot can form the index datum. Though that would mean every Index AM callback function needs to do it, not sure which way is better. Plus, need to create ExecutorState for the same. With current setup every AM needs to do. Feels good if belongs to indexing code though instead of AM.

Currently, index build needing to create ExecutorState and all at AM layer seems not nice either. Maybe there is quite generic logic here and possible can be extracted into common code which either most of AM leverage. Or possibly the API itself can be simplified to get minimum input from AM and have rest of flow/machinery at upper layer. As Robert pointed at start of thread at heart its pretty simple flow and possibly will remain same for any AM.


Please find attached the patch to remove IndexBuildCallback's dependency on HeapTuple, as discussed. Changed to have the argument as ItemPointer instead of HeapTuple. Other larger refactoring if feasible for index_build_range_scan API itself can be performed as follow-up changes.

Attachment

Re: heapam_index_build_range_scan's anyvisible

From
Andres Freund
Date:
Hi,

On 2019-07-11 17:27:46 -0700, Ashwin Agrawal wrote:
> Please find attached the patch to remove IndexBuildCallback's dependency on
> HeapTuple, as discussed. Changed to have the argument as ItemPointer
> instead of HeapTuple. Other larger refactoring if feasible for
> index_build_range_scan API itself can be performed as follow-up changes.

> From f73b0061795f0c320f96ecfed0c0602ae318d73e Mon Sep 17 00:00:00 2001
> From: Ashwin Agrawal <aagrawal@pivotal.io>
> Date: Thu, 11 Jul 2019 16:58:50 -0700
> Subject: [PATCH v1] Remove IndexBuildCallback's dependency on HeapTuple.
>
> With IndexBuildCallback taking input as HeapTuple, all table AMs
> irrespective of storing the tuples in HeapTuple form or not, are
> forced to construct HeapTuple, to insert the tuple in Index. Since,
> only thing required by the index callbacks is TID and not really the
> full tuple, modify callback to only take ItemPointer.

Looks good to me. Planning to apply this unless somebody wants to argue
against it soon?

- Andres



Re: heapam_index_build_range_scan's anyvisible

From
Ashwin Agrawal
Date:

On Tue, Jul 16, 2019 at 10:22 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2019-07-11 17:27:46 -0700, Ashwin Agrawal wrote:
> Please find attached the patch to remove IndexBuildCallback's dependency on
> HeapTuple, as discussed. Changed to have the argument as ItemPointer
> instead of HeapTuple. Other larger refactoring if feasible for
> index_build_range_scan API itself can be performed as follow-up changes.

> From f73b0061795f0c320f96ecfed0c0602ae318d73e Mon Sep 17 00:00:00 2001
> From: Ashwin Agrawal <aagrawal@pivotal.io>
> Date: Thu, 11 Jul 2019 16:58:50 -0700
> Subject: [PATCH v1] Remove IndexBuildCallback's dependency on HeapTuple.
>
> With IndexBuildCallback taking input as HeapTuple, all table AMs
> irrespective of storing the tuples in HeapTuple form or not, are
> forced to construct HeapTuple, to insert the tuple in Index. Since,
> only thing required by the index callbacks is TID and not really the
> full tuple, modify callback to only take ItemPointer.

Looks good to me. Planning to apply this unless somebody wants to argue
against it soon?

Andres, I didn't yet register this for next commitfest. If its going in soon anyways will not do it otherwise let me know and I will add it to the list.

Re: heapam_index_build_range_scan's anyvisible

From
Alvaro Herrera
Date:
On 2019-Jul-30, Ashwin Agrawal wrote:

> On Tue, Jul 16, 2019 at 10:22 AM Andres Freund <andres@anarazel.de> wrote:

> > Looks good to me. Planning to apply this unless somebody wants to argue
> > against it soon?
> 
> Andres, I didn't yet register this for next commitfest. If its going in
> soon anyways will not do it otherwise let me know and I will add it to the
> list.

Sounds OK ... except that Travis points out that Ashwin forgot to patch contrib:

make[4]: Entering directory '/home/travis/build/postgresql-cfbot/postgresql/contrib/amcheck'
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -Wall
-Werror-fPIC -I. -I. -I../../src/include -I/usr/include/x86_64-linux-gnu -D_GNU_SOURCE   -c -o verify_nbtree.o
verify_nbtree.c
verify_nbtree.c: In function ‘bt_check_every_level’:
verify_nbtree.c:614:11: error: passing argument 6 of ‘table_index_build_scan’ from incompatible pointer type
[-Werror=incompatible-pointer-types]
           bt_tuple_present_callback, (void *) state, scan);
           ^
In file included from verify_nbtree.c:29:0:
../../src/include/access/tableam.h:1499:1: note: expected ‘IndexBuildCallback {aka void (*)(struct RelationData *,
structItemPointerData *, long unsigned int *, _Bool *, _Bool,  void *)}’ but argument is of type ‘void (*)(struct
RelationData*, HeapTupleData *, Datum *, _Bool *, _Bool,  void *) {aka void (*)(struct RelationData *, struct
HeapTupleData*, long unsigned int *, _Bool *, _Bool,  void *)}’
 
 table_index_build_scan(Relation table_rel,
 ^
cc1: all warnings being treated as errors
<builtin>: recipe for target 'verify_nbtree.o' failed
make[4]: *** [verify_nbtree.o] Error 1

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: heapam_index_build_range_scan's anyvisible

From
Ashwin Agrawal
Date:

On Wed, Sep 25, 2019 at 1:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Sounds OK ... except that Travis points out that Ashwin forgot to patch contrib:

make[4]: Entering directory '/home/travis/build/postgresql-cfbot/postgresql/contrib/amcheck'
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -Wall -Werror -fPIC -I. -I. -I../../src/include -I/usr/include/x86_64-linux-gnu -D_GNU_SOURCE   -c -o verify_nbtree.o verify_nbtree.c
verify_nbtree.c: In function ‘bt_check_every_level’:
verify_nbtree.c:614:11: error: passing argument 6 of ‘table_index_build_scan’ from incompatible pointer type [-Werror=incompatible-pointer-types]
           bt_tuple_present_callback, (void *) state, scan);
           ^
In file included from verify_nbtree.c:29:0:
../../src/include/access/tableam.h:1499:1: note: expected ‘IndexBuildCallback {aka void (*)(struct RelationData *, struct ItemPointerData *, long unsigned int *, _Bool *, _Bool,  void *)}’ but argument is of type ‘void (*)(struct RelationData *, HeapTupleData *, Datum *, _Bool *, _Bool,  void *) {aka void (*)(struct RelationData *, struct HeapTupleData *, long unsigned int *, _Bool *, _Bool,  void *)}’
 table_index_build_scan(Relation table_rel,
 ^
cc1: all warnings being treated as errors
<builtin>: recipe for target 'verify_nbtree.o' failed
make[4]: *** [verify_nbtree.o] Error 1

Thanks for reporting, I did indeed missed out contrib. Please find attached the v2 of the patch which includes the change required in contrib as well.

Attachment

Re: heapam_index_build_range_scan's anyvisible

From
Michael Paquier
Date:
On Wed, Sep 25, 2019 at 10:24:05PM -0700, Ashwin Agrawal wrote:
> Thanks for reporting, I did indeed missed out contrib. Please find attached
> the v2 of the patch which includes the change required in contrib as well.

Okay, that makes sense.  The patch looks good to me so I have switched
it to ready for committer.  Andres, Robert, would you prefer
committing this one yourself?  If not, I'll take care of it tomorrow
after a second look.
--
Michael

Attachment

Re: heapam_index_build_range_scan's anyvisible

From
Andres Freund
Date:
Hi,

On 2019-11-07 17:02:36 +0900, Michael Paquier wrote:
> On Wed, Sep 25, 2019 at 10:24:05PM -0700, Ashwin Agrawal wrote:
> > Thanks for reporting, I did indeed missed out contrib. Please find attached
> > the v2 of the patch which includes the change required in contrib as well.
> 
> Okay, that makes sense.  The patch looks good to me so I have switched
> it to ready for committer.  Andres, Robert, would you prefer
> committing this one yourself?  If not, I'll take care of it tomorrow
> after a second look.

Let me take a look this afternoon. Swapped out of my brain right now
unfortunately.

Greetings,

Andres Freund



Re: heapam_index_build_range_scan's anyvisible

From
Michael Paquier
Date:
On Thu, Nov 07, 2019 at 09:25:40AM -0800, Andres Freund wrote:
> Let me take a look this afternoon. Swapped out of my brain right now
> unfortunately.

Thanks for the update.
--
Michael

Attachment

Re: heapam_index_build_range_scan's anyvisible

From
Andres Freund
Date:
Hi,

On 2019-11-07 09:25:40 -0800, Andres Freund wrote:
> On 2019-11-07 17:02:36 +0900, Michael Paquier wrote:
> > On Wed, Sep 25, 2019 at 10:24:05PM -0700, Ashwin Agrawal wrote:
> > > Thanks for reporting, I did indeed missed out contrib. Please find attached
> > > the v2 of the patch which includes the change required in contrib as well.
> > 
> > Okay, that makes sense.  The patch looks good to me so I have switched
> > it to ready for committer.  Andres, Robert, would you prefer
> > committing this one yourself?  If not, I'll take care of it tomorrow
> > after a second look.
> 
> Let me take a look this afternoon. Swapped out of my brain right now
> unfortunately.

Looks good to me (minus a formatting change in one or two places,
undoing linebreaks). I was about to push, but after trying to write a
sentence in the commit message like three times, I'instead push first
thing tomorrow..

Greetings,

Andres Freund



Re: heapam_index_build_range_scan's anyvisible

From
Andres Freund
Date:
On 2019-11-08 01:22:45 -0800, Andres Freund wrote:
> On 2019-11-07 09:25:40 -0800, Andres Freund wrote:
> > On 2019-11-07 17:02:36 +0900, Michael Paquier wrote:
> > > On Wed, Sep 25, 2019 at 10:24:05PM -0700, Ashwin Agrawal wrote:
> > > > Thanks for reporting, I did indeed missed out contrib. Please find attached
> > > > the v2 of the patch which includes the change required in contrib as well.
> > > 
> > > Okay, that makes sense.  The patch looks good to me so I have switched
> > > it to ready for committer.  Andres, Robert, would you prefer
> > > committing this one yourself?  If not, I'll take care of it tomorrow
> > > after a second look.
> > 
> > Let me take a look this afternoon. Swapped out of my brain right now
> > unfortunately.
> 
> Looks good to me (minus a formatting change in one or two places,
> undoing linebreaks). I was about to push, but after trying to write a
> sentence in the commit message like three times, I'instead push first
> thing tomorrow..

And pushed. Sorry that this took so long.


Greetings,

Andres Freund



Re: heapam_index_build_range_scan's anyvisible

From
Michael Paquier
Date:
On Fri, Nov 08, 2019 at 12:10:35PM -0800, Andres Freund wrote:
> And pushed. Sorry that this took so long.

Thanks Andres.  I have updated the status of the patch in the CF app
accordingly: https://commitfest.postgresql.org/25/2235/.
--
Michael

Attachment