Thread: Urgent :: Postgresql streaming replication issue - sync mode

Urgent :: Postgresql streaming replication issue - sync mode

From
Shital A
Date:
Hello,

We are seeing a strange issue with postgresql streaming application in sync mode. 

We are using postgresql 9.6. Old version because of some specific requirements.  We have setup cluster with master-standby using pacemaker. 

When we kill master using killall -9 postgres. The failed primary has few records extra than standby node. We have done setup with synchronous_commit = remote_apply and have set synchronous_standby_names=server_name. 

As the failed primary is having more data, How is it possible that primary is committing transaction before they were applied on standby with synchronous_commit=remote_apply?


Please share if you have any thoughts. Are we missing any config ? 

Thanks ! 

Re: Urgent :: Postgresql streaming replication issue - sync mode

From
Ravi Krishna
Date:
>
> As the failed primary is having more data, How is it possible that primary is committing transaction before they were
appliedon standby with synchronous_commit=remote_apply? 

If I am not mistaken remote_apply is only from ver 11.




Re: Urgent :: Postgresql streaming replication issue - sync mode

From
Shital A
Date:


On Thu, 3 Oct 2019, 00:08 Ravi Krishna, <srkrishna@vivaldi.net> wrote:
>
> As the failed primary is having more data, How is it possible that primary is committing transaction before they were applied on standby with synchronous_commit=remote_apply?

If I am not mistaken remote_apply is only from ver 11.

Hi Ravi,

Thanks for your reply.

This property/feature is available in 9.6.


Thanks!

Re: Urgent :: Postgresql streaming replication issue - sync mode

From
Shital A
Date:


On Thu, 3 Oct 2019, 03:10 Jason Wang, <jasonwang.public@gmail.com> wrote:
I think when you use kill -9 it wouldn't give any chance for postgres to do what it normally does. So in your case, the db was killed with no chance to apply to remote then it would be up to the recovery to decide how to handle the extra data at the master. I'm not sure what would happen but killall in general is a dangerous command.

On Thu, 3 Oct 2019, 7:00 am Shital A, <brightuser2019@gmail.com> wrote:


On Thu, 3 Oct 2019, 00:08 Ravi Krishna, <srkrishna@vivaldi.net> wrote:
>
> As the failed primary is having more data, How is it possible that primary is committing transaction before they were applied on standby with synchronous_commit=remote_apply?

If I am not mistaken remote_apply is only from ver 11.

Hi Ravi,

Thanks for your reply.

This property/feature is available in 9.6.


Thanks!

Thanks Jason.

Using killall -9 we are trying to simulate the situation where primary is stopped unexpectedly/crashed.

So in this case there is data loss because when the broken primary later comes in sync with new primary it copies data from new primary and the data records that were extra in old primary are lost. Can this data loss be prevented in anyway in postgres 9.6 ? Please suggest. 


Thanks! 

Re: Urgent :: Postgresql streaming replication issue - sync mode

From
Laurenz Albe
Date:
On Wed, 2019-10-02 at 23:58 +0530, Shital A wrote:
> We are seeing a strange issue with postgresql streaming application
> in sync mode. 
> 
> We are using postgresql 9.6. Old version because of some specific
> requirements.  We have setup cluster with master-standby using
> pacemaker. 
> 
> When we kill master using killall -9 postgres. The failed primary has
> few records extra than standby node. We have done setup with
> synchronous_commit = remote_apply and have set
> synchronous_standby_names=server_name. 
> 
> As the failed primary is having more data, How is it possible that
> primary is committing transaction before they were applied on standby
> with synchronous_commit=remote_apply?
> 
> 
> Please share if you have any thoughts. Are we missing any config ?

This is to be expected.

The transaction will be committed on the primary, then on the standby,
and COMMIT will only return once the standby reports success.

But the transacaction still has to be committed on the primary first.

If the standby sis promoted while COMMIT is waiting for the standby,
you can end up with the transaction committed on the primary,
but not yet committed on the standby.

You should use "pg_rewind" on the failed primary if you want to use
it as new standby for the promoted server.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Urgent :: Postgresql streaming replication issue - sync mode

From
Jason Wang
Date:
I read this https://www.2ndquadrant.com/en/blog/evolution-fault-tolerance-postgresql-synchronous-commit/

But don't see why your primary would have more records than the standby? 

If killall was issued before commit returned, that means the transaction wasn't completed so yes you would lose records after last commit but that's expected; if commit was returned both primary and standby should have the transaction.

Are you sure in your case you end up with primary and standby with different records from a single transaction?


On Thu, 3 Oct 2019, 9:41 pm Laurenz Albe, <laurenz.albe@cybertec.at> wrote:
On Wed, 2019-10-02 at 23:58 +0530, Shital A wrote:
> We are seeing a strange issue with postgresql streaming application
> in sync mode.
>
> We are using postgresql 9.6. Old version because of some specific
> requirements.  We have setup cluster with master-standby using
> pacemaker.
>
> When we kill master using killall -9 postgres. The failed primary has
> few records extra than standby node. We have done setup with
> synchronous_commit = remote_apply and have set
> synchronous_standby_names=server_name.
>
> As the failed primary is having more data, How is it possible that
> primary is committing transaction before they were applied on standby
> with synchronous_commit=remote_apply?
>
>
> Please share if you have any thoughts. Are we missing any config ?

This is to be expected.

The transaction will be committed on the primary, then on the standby,
and COMMIT will only return once the standby reports success.

But the transacaction still has to be committed on the primary first.

If the standby sis promoted while COMMIT is waiting for the standby,
you can end up with the transaction committed on the primary,
but not yet committed on the standby.

You should use "pg_rewind" on the failed primary if you want to use
it as new standby for the promoted server.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Re: Urgent :: Postgresql streaming replication issue - sync mode

From
Laurenz Albe
Date:
On Fri, 2019-10-04 at 00:34 +1000, Jason Wang wrote:
> I read this 
> https://www.2ndquadrant.com/en/blog/evolution-fault-tolerance-postgresql-synchronous-commit/
> 
> But don't see why your primary would have more records than the
> standby? 
> 
> If killall was issued before commit returned, that means the
> transaction wasn't completed so yes you would lose records after last
> commit but that's expected; if commit was returned both primary and
> standby should have the transaction.
> 
> Are you sure in your case you end up with primary and standby with
> different records from a single transaction?

PostgreSQL synchronous streaming replicatoin doesn't use anything like
two-phase commit.

1. It commits the transaction locally first, which generates WAL.
2. The WAL gets replicated.
3. As soon as the standby reports success, COMMIT returns.

If there is a failure after the first step completed, the
transaction will be committed locally, but not on the standby.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com