Thread: Invalid data read from synchronously replicated hot standby
Invalid data read from synchronously replicated hot standby
From
martin.kamp.jensen@schneider-electric.com
Date:
Hi,
We are getting invalid data when reading from a synchronously replicated hot standby node in a 2-node setup. To better understand the situation, we have created a document that provides an overview. We are hoping that someone might be able to confirm whether or not the setup makes sense, i.e., whether we are using PostgreSQL correctly and experiencing a bug, or if we are using PostgreSQL incorrectly.
Link to document that contains a step-by-step description of the situation: https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing
If the setup is sane (and expected to work), we will work on setting up a minimal reproduce that avoids our complete system. We are thinking that a scripted Ansible/Vagrant setup makes sense.
Best regards,
Martin
On Thu, 21 Apr 2016 04:05, <martin.kamp.jensen@schneider-electric.com> wrote:
> Hi,
> We are getting invalid data when reading from a synchronously replicated hot standby node in a 2-node setup. To better understand the situation, we have created a document that provides an overview. We are hoping that someone might be able to confirm whether or not the setup makes sense, i.e., whether we are using PostgreSQL correctly and experiencing a bug, or if we are using PostgreSQL incorrectly.
> Link to document that contains a step-by-step description of the situation: https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing
> If the setup is sane (and expected to work), we will work on setting up a minimal reproduce that avoids our complete system. We are thinking that a scripted Ansible/Vagrant setup makes sense.
I am not sure whether that is the cause, but you are on an old patch release. Upgrade to the latest one (I guess 9.1.21).
Once you have upgraded, re-create the standby from scratch using a base backup and then see whether the error is still there.
> Best regards,
> Martin
--
Best Regards
Sameer Kumar | DB Solution Architect
ASHNIK PTE. LTD.
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com
On 04/21/2016 01:05 AM, martin.kamp.jensen@schneider-electric.com wrote:
> Hi,
>
> We are getting invalid data when reading from a synchronously replicated
> hot standby node in a 2-node setup. To better understand the situation,
> we have created a document that provides an overview. We are hoping that
> someone might be able to confirm whether or not the setup makes sense,
> i.e., whether we are using PostgreSQL correctly and experiencing a bug,
> or if we are using PostgreSQL incorrectly.
>
> Link to document that contains a step-by-step description of the
> situation:
> https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing
>
> If the setup is sane (and expected to work), we will work on setting up
> a minimal reproduce that avoids our complete system. We are thinking
> that a scripted Ansible/Vagrant setup makes sense.

Questions:

What is wal_level set to?

Why on Node A do you have in recovery.conf?:
primary_conninfo = 'host=<Node A IP address>'

What exactly are you trying to do?

Looks to me you are trying to have multi-master, is that the case?

>
> Best regards,
> Martin

--
Adrian Klaver
adrian.klaver@aklaver.com
Re: Invalid data read from synchronously replicated hot standby
From
martin.kamp.jensen@schneider-electric.com
Date:
Adrian Klaver <adrian.klaver@aklaver.com> wrote on 04/21/2016 16:03:55:
> From: Adrian Klaver <adrian.klaver@aklaver.com>
> To: Martin Kamp Jensen/DK/Schneider@Europe, pgsql-general@postgresql.org
> Date: 04/21/2016 16:09
> Subject: Re: [GENERAL] Invalid data read from synchronously
> replicated hot standby
>
> On 04/21/2016 01:05 AM, martin.kamp.jensen@schneider-electric.com wrote:
> > Hi,
> >
> > We are getting invalid data when reading from a synchronously replicated
> > hot standby node in a 2-node setup. To better understand the situation,
> > we have created a document that provides an overview. We are hoping that
> > someone might be able to confirm whether or not the setup makes sense,
> > i.e., whether we are using PostgreSQL correctly and experiencing a bug,
> > or if we are using PostgreSQL incorrectly.
> >
> > Link to document that contains a step-by-step description of the
> > situation:
> > https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing
> >
> >
> > If the setup is sane (and expected to work), we will work on setting up
> > a minimal reproduce that avoids our complete system. We are thinking
> > that a scripted Ansible/Vagrant setup makes sense.
>
> Questions:
>
> What is wal_level set to?
wal_level = hot_standby
>
> Why on Node A do you have in recovery.conf?:
> primary_conninfo = 'host=<Node A IP address>'
>
> What exactly are you trying to do?
>
> Looks to me you are trying to have multi-master, is that the case?
Eh, that's a mistake in the document. Probably because of a leftover recovery.done file. We only have Node A as master. I have updated the document.
I have been trying to reproduce the issue with a simple setup but so far without any luck.
>
> >
> > Best regards,
> > Martin
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
> ______________________________________________________________________
> This email has been scanned by the Symantec Email Security.cloud service.
> ______________________________________________________________________
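Editorial sketch: the recovery.conf question above (Node A listing its own address in primary_conninfo) lends itself to a small sanity check. The Python below is only an illustration under stated assumptions: the function names and the sample addresses are hypothetical, and the parser handles only simple `name = 'value'` lines, not the full configuration-file quoting rules.

```python
import re


def conf_settings(text):
    """Parse simple `name = 'value'` lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so 'host=...' inside the value survives.
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip("'")
    return settings


def points_at_self(recovery_conf_text, own_address):
    """True if primary_conninfo names this node's own address as the primary."""
    conninfo = conf_settings(recovery_conf_text).get("primary_conninfo", "")
    match = re.search(r"\bhost=(\S+)", conninfo)
    return bool(match) and match.group(1) == own_address


if __name__ == "__main__":
    # Hypothetical recovery.conf content; 10.0.0.1 stands in for Node A.
    sample = "standby_mode = 'on'\nprimary_conninfo = 'host=10.0.0.1 port=5432'\n"
    print(points_at_self(sample, "10.0.0.1"))
```

A check like this would flag the configuration as originally documented, where a leftover recovery file on Node A pointed back at Node A itself.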
Re: Invalid data read from synchronously replicated hot standby
From
martin.kamp.jensen@schneider-electric.com
Date:
Sameer Kumar <sameer.kumar@ashnik.com> wrote on 04/21/2016 13:56:52:
> From: Sameer Kumar <sameer.kumar@ashnik.com>
> To: Martin Kamp Jensen/DK/Schneider@Europe, pgsql-general@postgresql.org
> Date: 04/21/2016 14:00
> Subject: Re: [GENERAL] Invalid data read from synchronously
> replicated hot standby
>
>
> On Thu, 21 Apr 2016 04:05 , <martin.kamp.jensen@schneider-electric.com> wrote:
> Hi,
>
> We are getting invalid data when reading from a synchronously
> replicated hot standby node in a 2-node setup. To better understand
> the situation, we have created a document that provides an overview.
> We are hoping that someone might be able to confirm whether or not
> the setup makes sense, i.e., whether we are using PostgreSQL
> correctly and experiencing a bug, or if we are using PostgreSQL incorrectly.
>
> Link to document that contains a step-by-step description of the situation:
> https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing
>
> If the setup is sane (and expected to work), we will work on setting
> up a minimal reproduce that avoids our complete system. We are
> thinking that a scripted Ansible/Vagrant setup makes sense.
>
> I am not sure if it is because of that but you are on an old patch.
> Upgrade to latest (I guess 9.1.21).
I have reproduced the issue on 9.1.20, which is the latest version for Debian 6 (yes, I know, old stuff).
In the meantime, we are preparing a new platform on 9.5.2, where I have not been able to reproduce the issue (however, we have introduced a lot of changes besides upgrading PostgreSQL). I would have liked to come up with a minimal reproduce so that we could reason about the issue, but I guess we will not pursue that for now.
>
> Once you have upgraded, re-create the stand by from scratch using a
> basebackup and then see if the error is still there.
>
>
> Best regards,
> Martin
> --
> --
> Best Regards
> Sameer Kumar | DB Solution Architect
> ASHNIK PTE. LTD.
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
> T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com
>
On Wed, May 11, 2016 at 5:44 AM, <martin.kamp.jensen@schneider-electric.com> wrote:
>> We are getting invalid data when reading from a synchronously
>> replicated hot standby node in a 2-node setup. To better understand
>> the situation, we have created a document that provides an overview.
>> We are hoping that someone might be able to confirm whether or not
>> the setup makes sense, i.e., whether we are using PostgreSQL
>> correctly and experiencing a bug, or if we are using PostgreSQL
>> incorrectly.
>>
>> Link to document that contains a step-by-step description of the
>> situation:
>> https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing

Please include such information in your post or as an attachment.
Who knows whether that link will still be usable and unchanged 20
years from now?

>> If the setup is sane (and expected to work),

I didn't see anywhere that you correctly handled WAL in setting up
your standby. I am not surprised by there being corruption,
including duplicate keys in a unique index. You might try -x or -X
when you run pg_basebackup, or use archiving. Whatever you do, do
NOT delete the backup_label file!

> In the mean time, we are preparing a new platform on 9.5.2 where
> I have not been able to reproduce the issue (however, we have
> introduced a lot of changes besides upgrading PostgreSQL).

We would need a lot more detail to be able to even guess at whether
you have actually solved the flaws in your process or have just
been lucky so far.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
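Editorial sketch: the suggestion above to pass -x or -X to pg_basebackup depends on the release in use. As far as the documentation goes, 9.1 offers only -x (--xlog), while 9.2 and later also accept -X with a method such as stream. The Python below merely assembles the command line for illustration; the function name, host name, and target directory are hypothetical, and connection options such as a replication user are deliberately left out.

```python
def basebackup_command(primary_host, target_dir, version):
    """Return a pg_basebackup argv that bundles the needed WAL with the backup.

    `version` is a (major, minor) tuple for the pg_basebackup binary,
    for example (9, 1) or (9, 5).
    """
    cmd = ["pg_basebackup", "-h", primary_host, "-D", target_dir]
    if version < (9, 2):
        # 9.1 only has -x (--xlog): WAL files are collected at the end of the backup.
        cmd.append("-x")
    else:
        # 9.2 added -X; "stream" copies WAL while the backup is running.
        cmd += ["-X", "stream"]
    return cmd


if __name__ == "__main__":
    # Placeholder host and directory; substitute real values before running anything.
    for version in [(9, 1), (9, 5)]:
        print(" ".join(basebackup_command("node-a.example", "/tmp/standby-data", version)))
```

With -x on 9.1, the primary has to retain the WAL until the backup finishes, so wal_keep_segments would need to be set high enough. Neither variant touches the backup_label file, which should be left in place as advised above.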