Thread: Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

From: Alvaro Herrera
Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
change.  Otherwise, what is going to happen to an unpatched standby (of
released versions) that receives the new WAL record from a patched
primary?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

From: Tom Lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
> change.  Otherwise, what is going to happen to an unpatched standby (of
> released versions) that receives the new WAL record from a patched
> primary?

We can't change XLOG_PAGE_MAGIC in released branches, surely.

I think the correct thing is just for the release notes to warn people
to upgrade standby servers first.

            regards, tom lane


Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

From: Tom Lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
>> change.  Otherwise, what is going to happen to an unpatched standby (of
>> released versions) that receives the new WAL record from a patched
>> primary?

Oh, and if the answer to your question is not "it fails with an
intelligible error about an unrecognized WAL record type", then we
need to adjust what is emitted so that that will be what happens.
Crashing, or worse silently misprocessing the record, will not do.
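
The requirement above can be illustrated with a small sketch. This is not PostgreSQL code: the record-type codes, the handler functions and the exception class are all hypothetical, chosen only to show a replay dispatcher that rejects an unknown record type with a clear message instead of crashing or misprocessing it.

```python
class UnrecognizedWalRecord(Exception):
    """Raised when replay meets a record type this build does not know."""


def redo_page_update(payload: bytes) -> str:
    # Stand-in for a real redo routine; it only reports what it was given.
    return f"page update ({len(payload)} bytes)"


def redo_page_split(payload: bytes) -> str:
    return f"page split ({len(payload)} bytes)"


# Hypothetical record-type codes known to this (unpatched) build.
REDO_HANDLERS = {
    0x00: redo_page_update,
    0x30: redo_page_split,
}


def redo(record_type: int, payload: bytes) -> str:
    """Dispatch one record; fail loudly on anything unrecognized."""
    handler = REDO_HANDLERS.get(record_type)
    if handler is None:
        raise UnrecognizedWalRecord(
            f"unrecognized WAL record type 0x{record_type:02X}"
        )
    return handler(payload)
```

In this sketch a standby that lacks a handler stops with a message naming the offending type, which is the failure mode asked for here.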

            regards, tom lane


Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

From: Alvaro Herrera
On 2018-Dec-21, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
> > change.  Otherwise, what is going to happen to an unpatched standby (of
> > released versions) that receives the new WAL record from a patched
> > primary?
> 
> We can't change XLOG_PAGE_MAGIC in released branches, surely.
> 
> I think the correct thing is just for the release notes to warn people
> to upgrade standby servers first.

You're right.  My memory is playing tricks on me.  I recalled that we
had done it to prevent WAL replay in nonpatched standbys in
some backpatched commit, but I can't find any evidence of this :-(
The commit message for 8e9a16ab8f7f (in 9.3 branch after it was
released) says:

    In replication scenarios using the 9.3 branch, standby servers must be
    upgraded before their master, so that they are prepared to deal with the
    new WAL record once the master is upgraded; failure to do so will cause
    WAL replay to die with a PANIC message.  Later upgrade of the standby
    will allow the process to continue where it left off, so there's no
    disruption of the data in the standby in any case.  Standbys know how to
    deal with the old WAL record, so it's okay to keep the master running
    the old code for a while.

Stupidly, I checked the 9.4 version of that commit (then the master
branch) and it does indeed contain the XLOG_PAGE_MAGIC change, but the
9.3 commit doesn't.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

From: Alexander Korotkov
On Fri, Dec 21, 2018 at 7:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
> >> change.  Otherwise, what is going to happen to an unpatched standby (of
> >> released versions) that receives the new WAL record from a patched
> >> primary?
>
> Oh, and if the answer to your question is not "it fails with an
> intelligible error about an unrecognized WAL record type", then we
> need to adjust what is emitted so that that will be what happens.
> Crashing, or worse silently misprocessing the record, will not do.

Please note that the backpatched version takes special care not to
introduce a new WAL record type.  An unpatched standby applies the WAL
stream of a patched primary without any errors, but ignores conflicts
(as before) [1].  A patched standby applies the same WAL stream with
conflict handling.  I briefly mentioned that in the commit message:

"On stable releases we've to be tricky to keep WAL compatibility.
Information required for conflict processing is just appended to data
of XLOG_GIST_PAGE_UPDATE record.  So, PostgreSQL version, which
doesn't know about conflict processing, will just ignore that."

What we can mention in the release notes is that both primary and
standby must be upgraded to get conflict handling.  If either one
is not upgraded, conflicts will still be missed.

Links:
1. https://www.postgresql.org/message-id/CAPpHfdsKS0K8q1sJ-XyMrU%3DL%2Be6XSAOgS09NXp1bQDQts%2Bqz%2Bg%40mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

