Re: gist microvacuum doesn't appear to care about hot standby? - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: gist microvacuum doesn't appear to care about hot standby?
Date
Msg-id CAPpHfdsKS0K8q1sJ-XyMrU=L+e6XSAOgS09NXp1bQDQts+qz+g@mail.gmail.com
In response to Re: gist microvacuum doesn't appear to care about hot standby?  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: gist microvacuum doesn't appear to care about hot standby?
List pgsql-hackers
On Tue, Dec 18, 2018 at 2:04 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Mon, Dec 17, 2018 at 3:35 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Mon, Dec 17, 2018 at 3:40 AM Alexander Korotkov
> > <a.korotkov@postgrespro.ru> wrote:
> > > On Mon, Dec 17, 2018 at 1:25 AM Andres Freund <andres@anarazel.de> wrote:
> > > > On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> > > > > Sorry for delay.  Attached patch implements conflict handling for gist
> > > > > microvacuum like btree and hash.  I'm going to push it if no
> > > > > objections.
> > > > >
> > > > > Note that it implements a new WAL record type, so the new WAL can't be
> > > > > replayed on an old minor release.  I'm not sure whether we claim that's
> > > > > usually possible.  Should we state something explicitly for this case?
> > > >
> > > > Please hold off committing for a bit. Adding new WAL records in a minor
> > > > release ought to be very well considered and a measure of last resort.
> > > >
> > > > Couldn't we determine the xid horizon on the primary, and reuse an
> > > > existing WAL record to trigger the conflict?  Or something along those
> > > > lines?
> > >
> > > I thought about that, but decided it's better to mimic the B-tree and hash
> > > behavior rather than invent new logic in a minor release.  But given
> > > that a new WAL record in a minor release is a substantial problem, that
> > > argument doesn't matter.
> > >
> > > Yes, it seems to be possible.  We can determine the xid horizon on the
> > > primary in the same way you proposed for B-tree and hash [1] and use an
> > > XLOG_HEAP2_CLEANUP_INFO record to trigger the conflict.  Would you like
> > > me to make such a patch for GiST based on your patch?
> >
> > Got another tricky idea.  Currently, the deleted offset numbers are written
> > only to the buffer data.  We can also append them to the record data.  Then,
> > based on the record length, the standby can resolve conflicts whenever the
> > offsets are provided in the record data, while an unpatched version will
> > just ignore the extra tail of the record data.  That would cost us somewhat
> > bigger WAL records, but it solves the other problems.  Any thoughts?
>
> Please find attached a backpatch version of the patch implementing this
> approach.  I found it more attractive than moving the xid horizon
> calculation to the primary, because calculating the xid horizon on the
> primary is substantially new behavior, which is unwanted in a backpatch.
> I've not yet tested this patch.
>
> I'm going to test this patch, including WAL compatibility.  If
> everything is OK, I'll commit it.

I've managed to reproduce the problem and test my backpatch solution.

primary (patched)
    standby 1 (patched)
        standby 2 (unpatched)

drop table if exists test;
create table test (p point) with (fillfactor = 50, autovacuum_enabled = false);
insert into test (select point(i % 100, i / 100) from
generate_series(0,9999) i);
vacuum test;
create index test_gist_idx on test using gist (p);
alter table test set (fillfactor = 100);

    begin isolation level repeatable read;
    select count(*) from test where p <@ box(point(0,0),point(99,99));
     count
    -------
     10000
    (1 row)

        begin isolation level repeatable read;
        select count(*) from test where p <@ box(point(0,0),point(99,99));
         count
        -------
         10000
        (1 row)

delete from test where p[0]::int % 10 = 0 and p[1]::int % 10 = 0;
set enable_seqscan = off;
set enable_bitmapscan = off;
set enable_indexonlyscan = off;
select count(*) from test where p <@ box(point(0,0),point(99,99));
insert into test (select point(i % 100, i / 100) from
generate_series(0,9999) i);

    select count(*) from test where p <@ box(point(0,0),point(99,99));
     count
    -------
     10000
    (1 row)

        select count(*) from test where p <@ box(point(0,0),point(99,99));
         count
        -------
          9961
        (1 row)

    select count(*) from test where p <@ box(point(0,0),point(99,99));
    FATAL:  terminating connection due to conflict with recovery
    DETAIL:  User query might have needed to see row versions that
must be removed.
    HINT:  In a moment you should be able to reconnect to the database
and repeat your command.
    server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
    The connection to the server was lost. Attempting reset: Succeeded.

So, the two standbys were reading the same WAL generated by the patched
primary.  The patched standby got a conflict: it gave the correct query
answer, and then its transaction was terminated.  The unpatched standby
replayed the WAL stream without a conflict, so it gave a wrong query
answer, as if it were reading WAL from an unpatched master.

When experimenting with an unpatched primary, both standbys give wrong
query answers without a conflict.

Please find attached the two patches I'm going to commit: one for master
and one for backpatching.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

