Commit to primary with unavailable sync standby - Mailing list pgsql-general

From Andrey Borodin
Subject Commit to primary with unavailable sync standby
Msg-id B70260F9-D0EC-438D-9A59-31CB996B320A@yandex-team.ru
List pgsql-general
Hi!

I cannot figure out the proper way to implement a safe HA upsert. I would be very grateful if someone could help me.

Imagine we have a primary server after a failover. It is network-partitioned. We are doing an INSERT ... ON CONFLICT DO NOTHING that eventually timed out.

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t(
        pk,
        v,
        dt
    )
    VALUES
    (
        5,
        'text',
        now()
    )
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk,
              v,
              dt)
   SELECT new_doc.pk from new_doc;
^CCancel request sent
WARNING:  01000: canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
LOCATION:  SyncRepWaitForLSN, syncrep.c:264
Time: 2173.770 ms (00:02.174)

Here our driver decided that something went wrong, and we retry the query.

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t(
        pk,
        v,
        dt
    )
    VALUES
    (
        5,
        'text',
        now()
    )
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk,
              v,
              dt)
   SELECT new_doc.pk from new_doc;
 pk
----
(0 rows)

Time: 4.785 ms

Now we have split-brain, because we acknowledged that row to the client.
How can I fix this?
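
To make the sequence above concrete, here is a toy Python model of it. This is only a sketch, not PostgreSQL code: `ToyPrimary`, `SyncWaitCanceled`, and `upsert_with_retry` are hypothetical names, and the model assumes only what the transcripts show (the first attempt commits locally before the wait is canceled, and the retry returns zero rows without waiting).

```python
class SyncWaitCanceled(Exception):
    """Raised when the wait for the synchronous standby is canceled."""


class ToyPrimary:
    """Hypothetical stand-in for a network-partitioned primary."""

    def __init__(self, standby_reachable):
        self.standby_reachable = standby_reachable
        self.local_rows = {}       # rows committed locally on this node
        self.replicated_rows = {}  # rows confirmed by the sync standby

    def insert_do_nothing(self, pk, v):
        """Model of INSERT ... ON CONFLICT (pk) DO NOTHING RETURNING pk."""
        if pk in self.local_rows:
            # Conflict: nothing is inserted, nothing is returned,
            # and there is nothing new to wait for.
            return []
        self.local_rows[pk] = v  # the commit happens locally first
        if not self.standby_reachable:
            # Models the canceled wait: the row is already committed locally.
            raise SyncWaitCanceled("committed locally, not replicated")
        self.replicated_rows[pk] = v
        return [pk]


def upsert_with_retry(primary, pk, v):
    """Naive driver: retry once on error, treat any clean return as success."""
    try:
        primary.insert_do_nothing(pk, v)
    except SyncWaitCanceled:
        # The retry sees the locally committed row and returns at once.
        primary.insert_do_nothing(pk, v)
    return "acknowledged"


primary = ToyPrimary(standby_reachable=False)
result = upsert_with_retry(primary, 5, "text")
print(result, 5 in primary.local_rows, 5 in primary.replicated_rows)
# prints: acknowledged True False
```

In this model the client receives an acknowledgement for a row that exists only on the partitioned node, which is exactly the problem.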

There must be some obvious trick, but I cannot see it... Or maybe canceling the wait for synchronous replication should be disallowed, and termination should be treated as a system failure?

Best regards, Andrey Borodin.

