Thread: Invalid data read from synchronously replicated hot standby

Invalid data read from synchronously replicated hot standby

From
martin.kamp.jensen@schneider-electric.com
Date:
Hi,

We are getting invalid data when reading from a synchronously replicated hot standby node in a 2-node setup. To better understand the situation, we have created a document that provides an overview. We are hoping that someone might be able to confirm whether or not the setup makes sense, i.e., whether we are using PostgreSQL correctly and experiencing a bug, or if we are using PostgreSQL incorrectly.

Link to document that contains a step-by-step description of the situation: https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing

If the setup is sane (and expected to work), we will work on setting up a minimal reproduce that avoids our complete system. We are thinking that a scripted Ansible/Vagrant setup makes sense.

Best regards,
Martin

Re: Invalid data read from synchronously replicated hot standby

From
Sameer Kumar
Date:


On Thu, 21 Apr 2016 04:05, <martin.kamp.jensen@schneider-electric.com> wrote:
> Hi,
>
> We are getting invalid data when reading from a synchronously replicated hot standby node in a 2-node setup. To better understand the situation, we have created a document that provides an overview. We are hoping that someone might be able to confirm whether or not the setup makes sense, i.e., whether we are using PostgreSQL correctly and experiencing a bug, or if we are using PostgreSQL incorrectly.
>
> Link to document that contains a step-by-step description of the situation: https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing
>
> If the setup is sane (and expected to work), we will work on setting up a minimal reproduce that avoids our complete system. We are thinking that a scripted Ansible/Vagrant setup makes sense.

I am not sure if it is because of that, but you are on an old patch release. Upgrade to the latest (I guess 9.1.21).

Once you have upgraded, re-create the standby from scratch using a base backup and then see if the error is still there.

> Best regards,
> Martin

--
Best Regards
Sameer Kumar | DB Solution Architect 
ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com

Re: Invalid data read from synchronously replicated hot standby

From
Adrian Klaver
Date:
On 04/21/2016 01:05 AM, martin.kamp.jensen@schneider-electric.com wrote:
> Hi,
>
> We are getting invalid data when reading from a synchronously replicated
> hot standby node in a 2-node setup. To better understand the situation,
> we have created a document that provides an overview. We are hoping that
> someone might be able to confirm whether or not the setup makes sense,
> i.e., whether we are using PostgreSQL correctly and experiencing a bug,
> or if we are using PostgreSQL incorrectly.
>
> Link to document that contains a step-by-step description of the
> situation:
> https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing
>
>
> If the setup is sane (and expected to work), we will work on setting up
> a minimal reproduce that avoids our complete system. We are thinking
> that a scripted Ansible/Vagrant setup makes sense.

Questions:

What is wal_level set to?

Why do you have this in recovery.conf on Node A?
primary_conninfo = 'host=<Node A IP address>'

What exactly are you trying to do?

Looks to me you are trying to have multi-master, is that the case?

>
> Best regards,
> Martin


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Invalid data read from synchronously replicated hot standby

From
martin.kamp.jensen@schneider-electric.com
Date:

Adrian Klaver <adrian.klaver@aklaver.com> wrote on 04/21/2016 16:03:55:

> From: Adrian Klaver <adrian.klaver@aklaver.com>

> To: Martin Kamp Jensen/DK/Schneider@Europe, pgsql-general@postgresql.org
> Date: 04/21/2016 16:09
> Subject: Re: [GENERAL] Invalid data read from synchronously
> replicated hot standby

>
> On 04/21/2016 01:05 AM, martin.kamp.jensen@schneider-electric.com wrote:
> > Hi,
> >
> > We are getting invalid data when reading from a synchronously replicated
> > hot standby node in a 2-node setup. To better understand the situation,
> > we have created a document that provides an overview. We are hoping that
> > someone might be able to confirm whether or not the setup makes sense,
> > i.e., whether we are using PostgreSQL correctly and experiencing a bug,
> > or if we are using PostgreSQL incorrectly.
> >
> > Link to document that contains a step-by-step description of the
> > situation:
> > https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing
> >
> >
> > If the setup is sane (and expected to work), we will work on setting up
> > a minimal reproduce that avoids our complete system. We are thinking
> > that a scripted Ansible/Vagrant setup makes sense.
>
> Questions:
>
> What is wal_level set to?


wal_level = hot_standby

>
> Why on Node A do you have in recovery.conf?:
> primary_conninfo = 'host=<Node A IP address>'
>
> What exactly are you trying to do?
>
> Looks to me you are trying to have multi-master, is that the case?


Eh, that's a mistake in the document. Probably because of a leftover recovery.done file. We only have Node A as master. I have updated the document.

I have been trying to reproduce the issue with a simple setup but so far without any luck.

>
> >
> > Best regards,
> > Martin
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com

Re: Invalid data read from synchronously replicated hot standby

From
martin.kamp.jensen@schneider-electric.com
Date:

Sameer Kumar <sameer.kumar@ashnik.com> wrote on 04/21/2016 13:56:52:

> From: Sameer Kumar <sameer.kumar@ashnik.com>

> To: Martin Kamp Jensen/DK/Schneider@Europe, pgsql-general@postgresql.org
> Date: 04/21/2016 14:00
> Subject: Re: [GENERAL] Invalid data read from synchronously
> replicated hot standby

>
>

> On Thu, 21 Apr 2016 04:05 , <martin.kamp.jensen@schneider-electric.com> wrote:
> Hi,
>
> We are getting invalid data when reading from a synchronously
> replicated hot standby node in a 2-node setup. To better understand
> the situation, we have created a document that provides an overview.
> We are hoping that someone might be able to confirm whether or not
> the setup makes sense, i.e., whether we are using PostgreSQL
> correctly and experiencing a bug, or if we are using PostgreSQL incorrectly.
>
> Link to document that contains a step-by-step description of the situation:
> https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing
>
>
>
>
>
> If the setup is sane (and expected to work), we will work on setting
> up a minimal reproduce that avoids our complete system. We are
> thinking that a scripted Ansible/Vagrant setup makes sense.

>
> I am not sure if it is because of that but you are on an old patch.
> Upgrade to latest (I guess 9.1.21).


I have reproduced the issue on 9.1.20 which is the latest version for Debian 6 (yes, I know, old stuff).

In the meantime, we are preparing a new platform on 9.5.2, where I have not been able to reproduce the issue (however, we have introduced a lot of changes besides upgrading PostgreSQL). I would have liked to come up with a minimal reproduction so that we could reason about the issue, but I guess we will not pursue that for now.

>
> Once you have upgraded, re-create the stand by from scratch using a
> basebackup and then see if the error is still there.

>
>
> Best regards,
> Martin

> --
> --
> Best Regards
> Sameer Kumar | DB Solution Architect 
> ASHNIK PTE. LTD.
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
> T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com

Re: Invalid data read from synchronously replicated hot standby

From
Kevin Grittner
Date:
On Wed, May 11, 2016 at 5:44 AM,
<martin.kamp.jensen@schneider-electric.com> wrote:

>> We are getting invalid data when reading from a synchronously
>> replicated hot standby node in a 2-node setup. To better understand
>> the situation, we have created a document that provides an overview.
>> We are hoping that someone might be able to confirm whether or not
>> the setup makes sense, i.e., whether we are using PostgreSQL
>> correctly and experiencing a bug, or if we are using PostgreSQL
>> incorrectly.
>>
>> Link to document that contains a step-by-step description of the
>> situation:
>> https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing

Please include such information in your post or as an attachment.
Who knows whether that link will still be usable and unchanged 20
years from now?

>> If the setup is sane (and expected to work),

I didn't see anywhere that you correctly handled WAL in setting up
your standby.  I am not surprised by there being corruption,
including duplicate keys in a unique index.  You might try -x or -X
when you run pg_basebackup, or use archiving.  Whatever you do, do
NOT delete the backup_label file!
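
A minimal sketch of the pg_basebackup variant of that advice, assuming PostgreSQL 9.1 as discussed in this thread (where -x, i.e. --xlog, is the available flag; -X with a method argument is only available in later releases). The host name, replication role, and data directory below are hypothetical placeholders, not values taken from the thread. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: PRIMARY_HOST, REPL_USER and STANDBY_DATA_DIR are hypothetical
# placeholders chosen for illustration, not settings from the thread.
PRIMARY_HOST="${PRIMARY_HOST:-node-a.example.com}"
REPL_USER="${REPL_USER:-replicator}"
STANDBY_DATA_DIR="${STANDBY_DATA_DIR:-/var/lib/postgresql/9.1/main}"

# Print a pg_basebackup command line that includes the required WAL
# in the backup (-x / --xlog), so the standby starts from a consistent state.
basebackup_cmd() {
    printf 'pg_basebackup -h %s -U %s -D %s -x\n' \
        "$PRIMARY_HOST" "$REPL_USER" "$STANDBY_DATA_DIR"
}

# Dry run by default: show the command. Set RUN=1 to execute it.
# The target directory must be empty and the standby must be stopped.
if [ "${RUN:-0}" = "1" ]; then
    pg_basebackup -h "$PRIMARY_HOST" -U "$REPL_USER" -D "$STANDBY_DATA_DIR" -x
else
    basebackup_cmd
fi
```

After the backup completes, the backup_label file it leaves in the data directory stays in place; the standby is then started with its recovery.conf.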

> In the mean time, we are preparing a new platform on 9.5.2 where
> I have not been able to reproduce the issue (however, we have
> introduced a lot of changes besides upgrading PostgreSQL).

We would need a lot more detail to be able to even guess at whether
you have actually solved the flaws in your process or have just
been lucky so far.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company