Thread: gist microvacuum doesn't appear to care about hot standby?

From: Andres Freund

Hi,

Am I missing something or did

commit 013ebc0a7b7ea9c1b1ab7a3d4dd75ea121ea8ba7
Author: Teodor Sigaev <teodor@sigaev.ru>
Date:   2015-09-09 18:43:37 +0300

    Microvacuum for GIST

    Mark index tuple as dead if it's pointed by kill_prior_tuple during
    ordinary (search) scan and remove it during insert process if there is no
    enough space for new tuple to insert. This improves select performance
    because index will not return tuple marked as dead and improves insert
    performance because it reduces number of page split.

    Anastasia Lubennikova <a.lubennikova@postgrespro.ru> with
     minor editorialization by me

entirely disregard recovery conflict handling?  The index entries it
removes could very well be visible to a snapshot on the standby. That's
why the equivalent nbtree (and hash) code does:


    /*
     * If we have any conflict processing to do, it must happen before we
     * update the page.
     *
     * Btree delete records can conflict with standby queries.  You might
     * think that vacuum records would conflict as well, but we've handled
     * that already.  XLOG_HEAP2_CLEANUP_INFO records provide the highest xid
     * cleaned by the vacuum of the heap and so we can resolve any conflicts
     * just once when that arrives.  After that we know that no conflicts
     * exist from individual btree vacuum records on that index.
     */
    if (InHotStandby)
    {
        TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
        RelFileNode rnode;

        XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);

        ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
                                            xlrec->onCatalogTable, rnode);
    }

Is there any reason something like that isn't necessary for gist? If so,
where's that documented? If not, uh, isn't that a somewhat serious bug
in gist?

Greetings,

Andres Freund


Re: gist microvacuum doesn't appear to care about hot standby?

From: Alexander Korotkov

On Thu, Dec 13, 2018 at 1:45 AM Andres Freund <andres@anarazel.de> wrote:
> Is there any reason something like that isn't necessary for gist? If so,
> where's that documented? If not, uh, isn't that a somewhat serious bug
> in gist?

Thanks for pointing this out!  This looks like a bug to me too.  I'm going
to investigate it further and provide a fix in the next couple of days.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: gist microvacuum doesn't appear to care about hot standby?

From: Alexander Korotkov

On Thu, Dec 13, 2018 at 7:28 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Thu, Dec 13, 2018 at 1:45 AM Andres Freund <andres@anarazel.de> wrote:
> > Is there any reason something like that isn't necessary for gist? If so,
> > where's that documented? If not, uh, isn't that a somewhat serious bug
> > in gist?
>
> Thanks for pointing this out!  This looks like a bug to me too.  I'm going
> to investigate it further and provide a fix in the next couple of days.

Sorry for the delay.  The attached patch implements conflict handling for
GiST microvacuum, like btree and hash already do.  I'm going to push it if
there are no objections.

Note that it introduces a new WAL record type, so WAL generated by the new
version can't be replayed by an older minor release.  I'm not sure whether
we claim that this is usually possible.  Should we state something
explicitly for this case?

------
Alexander Korotkov

Re: gist microvacuum doesn't appear to care about hot standby?

From: Andres Freund

Hi,

On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> Sorry for the delay.  The attached patch implements conflict handling for
> GiST microvacuum, like btree and hash already do.  I'm going to push it if
> there are no objections.
> 
> Note that it introduces a new WAL record type, so WAL generated by the new
> version can't be replayed by an older minor release.  I'm not sure whether
> we claim that this is usually possible.  Should we state something
> explicitly for this case?

Please hold off committing for a bit. Adding new WAL records in a minor
release ought to be very well considered and a measure of last resort.

Couldn't we determine the xid horizon on the primary, and reuse an
existing WAL record to trigger the conflict?  Or something along those
lines?

Greetings,

Andres Freund


Re: gist microvacuum doesn't appear to care about hot standby?

From: Alexander Korotkov

On Mon, Dec 17, 2018 at 1:25 AM Andres Freund <andres@anarazel.de> wrote:
> On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> > Sorry for the delay.  The attached patch implements conflict handling for
> > GiST microvacuum, like btree and hash already do.  I'm going to push it if
> > there are no objections.
> >
> > Note that it introduces a new WAL record type, so WAL generated by the new
> > version can't be replayed by an older minor release.  I'm not sure whether
> > we claim that this is usually possible.  Should we state something
> > explicitly for this case?
>
> Please hold off committing for a bit. Adding new WAL records in a minor
> release ought to be very well considered and a measure of last resort.
>
> Couldn't we determine the xid horizon on the primary, and reuse an
> existing WAL record to trigger the conflict?  Or something along those
> lines?

I thought about that, but decided it's better to mimic B-tree and hash
behavior rather than invent new logic in a minor release.  But given
that a new WAL record in a minor release is a substantial problem, that
argument no longer holds.

Yes, that seems possible.  We can determine the xid horizon on the primary
in the same way you proposed for B-tree and hash [1], and use an
XLOG_HEAP2_CLEANUP_INFO record to trigger the conflict.  Would you like
me to make such a patch for GiST based on your patch?

1. https://www.postgresql.org/message-id/20181214014235.dal5ogljs3bmlq44%40alap3.anarazel.de

------
Alexander Korotkov


Re: gist microvacuum doesn't appear to care about hot standby?

From: Alexander Korotkov

On Mon, Dec 17, 2018 at 3:40 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> I thought about that, but decided it's better to mimic B-tree and hash
> behavior rather than invent new logic in a minor release.  But given
> that a new WAL record in a minor release is a substantial problem, that
> argument no longer holds.
>
> Yes, that seems possible.  We can determine the xid horizon on the primary
> in the same way you proposed for B-tree and hash [1], and use an
> XLOG_HEAP2_CLEANUP_INFO record to trigger the conflict.  Would you like
> me to make such a patch for GiST based on your patch?

I've got another tricky idea.  Currently, the deleted offset numbers are
written to the buffer data.  We can also append them to the main record
data.  Then, based on the record length, the standby can resolve
conflicts whenever the offsets are present in the record data.  An
unpatched version will just ignore the extra tail of the record data.
That would cost us somewhat bigger, partly redundant WAL records, but it
would solve the other problems.  Any thoughts?

------
Alexander Korotkov


Re: gist microvacuum doesn't appear to care about hot standby?

From: Peter Geoghegan

On Sun, Dec 16, 2018 at 4:41 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> I thought about that, but decided it's better to mimic B-tree and hash
> behavior rather than invent new logic in a minor release.  But given
> that a new WAL record in a minor release is a substantial problem, that
> argument no longer holds.

It might be better to mimic B-Tree and hash on the master branch.

-- 
Peter Geoghegan


Re: gist microvacuum doesn't appear to care about hot standby?

From: Alexander Korotkov

On Mon, Dec 17, 2018 at 3:35 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> I've got another tricky idea.  Currently, the deleted offset numbers are
> written to the buffer data.  We can also append them to the main record
> data.  Then, based on the record length, the standby can resolve
> conflicts whenever the offsets are present in the record data.  An
> unpatched version will just ignore the extra tail of the record data.
> That would cost us somewhat bigger, partly redundant WAL records, but it
> would solve the other problems.  Any thoughts?

Please find attached a backpatch version of the patch implementing this
approach.  I find it more attractive than moving the xid horizon
calculation to the primary, because computing the xid horizon on the
primary is substantially new behavior, which is undesirable for
backpatching.  I haven't tested this patch yet.

I'm going to test this patch, including WAL compatibility.  If
everything is OK, I'll commit it.

------
Alexander Korotkov

Re: gist microvacuum doesn't appear to care about hot standby?

From: Alexander Korotkov

On Tue, Dec 18, 2018 at 2:04 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Please find attached a backpatch version of the patch implementing this
> approach.  I find it more attractive than moving the xid horizon
> calculation to the primary, because computing the xid horizon on the
> primary is substantially new behavior, which is undesirable for
> backpatching.  I haven't tested this patch yet.
>
> I'm going to test this patch, including WAL compatibility.  If
> everything is OK, I'll commit it.

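I've managed to reproduce the problem and test my backpatch solution.
The setup was:

primary (patched)
    standby 1 (patched)
        standby 2 (unpatched)

In the session log below, indentation shows which node each command runs
on: no indentation for the primary, one level for standby 1, and two
levels for standby 2.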

drop table if exists test;
create table test (p point) with (fillfactor = 50, autovacuum_enabled = false);
insert into test (select point(i % 100, i / 100) from generate_series(0,9999) i);
vacuum test;
create index test_gist_idx on test using gist (p);
alter table test set (fillfactor = 100);

    begin isolation level repeatable read;
    select count(*) from test where p <@ box(point(0,0),point(99,99));
     count
    -------
     10000
    (1 row)

        begin isolation level repeatable read;
        select count(*) from test where p <@ box(point(0,0),point(99,99));
         count
        -------
         10000
        (1 row)

delete from test where p[0]::int % 10 = 0 and p[1]::int % 10 = 0;
set enable_seqscan = off;
set enable_bitmapscan = off;
set enable_indexonlyscan = off;
select count(*) from test where p <@ box(point(0,0),point(99,99));
insert into test (select point(i % 100, i / 100) from generate_series(0,9999) i);

    select count(*) from test where p <@ box(point(0,0),point(99,99));
     count
    -------
     10000
    (1 row)

        select count(*) from test where p <@ box(point(0,0),point(99,99));
         count
        -------
          9961
        (1 row)

    select count(*) from test where p <@ box(point(0,0),point(99,99));
    FATAL:  terminating connection due to conflict with recovery
    DETAIL:  User query might have needed to see row versions that must be removed.
    HINT:  In a moment you should be able to reconnect to the database and repeat your command.
    server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
    The connection to the server was lost. Attempting reset: Succeeded.

So, both standbys were replaying the same WAL generated by the patched
primary.  The patched standby detected the conflict: it returned the
correct query result, and then its session was terminated.  The unpatched
standby replayed the WAL stream without any conflict, so it returned a
wrong result, just as if it were replaying WAL from an unpatched primary.

When experimenting with an unpatched primary, both standbys return a
wrong result without any conflict.

Please find attached the two patches I'm going to commit: one for master
and one for backpatching.

------
Alexander Korotkov

Re: gist microvacuum doesn't appear to care about hot standby?

From: Alexander Korotkov

On Thu, Dec 20, 2018 at 1:41 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Please find attached the two patches I'm going to commit: one for master
> and one for backpatching.

So, pushed.

------
Alexander Korotkov


Re: gist microvacuum doesn't appear to care about hot standby?

From: Andres Freund

Hi,

On 2018-12-21 02:40:18 +0300, Alexander Korotkov wrote:
> On Thu, Dec 20, 2018 at 1:41 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Please find attached the two patches I'm going to commit: one for master
> > and one for backpatching.
> 
> So, pushed.

I noticed that I didn't adapt this in

commit 558a9165e081d1936573e5a7d576f5febd7fb55a
Author: Andres Freund <andres@anarazel.de>
Date:   2019-03-26 14:41:46 -0700

    Compute XID horizon for page level index vacuum on primary.


So, attached you can find the conversion of GiST to the new logic for
computing the horizon.  Any comments?

Greetings,

Andres Freund
