Home > mailing lists

BUGFIX: standby disconnect can corrupt serialized reorder buffers - Mailing list pgsql-hackers

From	Craig Ringer
Subject	BUGFIX: standby disconnect can corrupt serialized reorder buffers
Date	December 5, 2017 08:58:23
Msg-id	CAMsr+YHdX=XECbZshDZ2CZNWGTyw-taYBnzqVfx4JzM4ExP5xg@mail.gmail.com Whole thread
Responses	Re: BUGFIX: standby disconnect can corrupt serialized reorder buffers Re: BUGFIX: standby disconnect can corrupt serialized reorder buffers
List	pgsql-hackers

Tree view

Hi all

TL;DR: we should delete pg_replslot/$SLOTNAME/* at the start of each decoding session or we can accidentally re-use stale reorder buffer contents from prior runs and $BADNESS happens. StartupReorderBuffer() is not sufficient.

Details:

Petr and I have found a bug in logical decoding where changes get appended multiple times to serialized reorder buffers. This causes duplicate changes sent to downstream (conflicts, ERRORs, etc).

Symptoms are:

* Oversized serialized reorder buffers in pg_replslot. In the case we found this problem in, there was a 147GB reorder buffer that should only have been 12GB.

* Downstream/receiver working hard but achieving nothing (pglogical/bdr with conflict resolution enabled), or failing with unique violations and other errors (built-in logical replication)

Debugging showed that logical decoding was calling the output plugin many times with the same set of ReorderBufferChange records. It'd advance normally, then go back to the start of a page address or similar and go through them all all over again.

Log analysis showed that the downstream had been repeatedly connecting to the upstream, then ERRORing out, so the upstream logs were full of

LOG: could not receive data from client: Connection reset by peer

LOG: unexpected EOF on standby connection

When the downstream error was fixed, the repeated changes were seen.

The cause appears to be that walsender.c's ProcessRepliesIfAny writes a LOG for unexpected EOF then calls proc_exit(0). But serialized txn cleanup is done by

ReorderBufferRestoreCleanup, as called by ReorderBufferCleanupTXN, which is invoked from the PG_CATCH() in ReorderBufferCommit() and on various normal exits. It won't get called if we proc_exit() without an ERROR, so we leave stale data lying around.

It's not a problem on crash restart because StartupReorderBuffer already does the required delete.

ReorderBufferSerializeTXN, which spills the txns to disk, doesn't appear to have any guard to ensure that the segment files don't already exist when it goes to serialize a snapshot. Adding one there would probably be expensive; we don't know the last lsn of the txn yet, so to be really safe we'd have to do a directory listing and scan for any txn-$OURXID-* entries.

So to fix, I suggest that we should do a slot-specific StartupReorderBuffer-style deletion when we start a new decoding session on a slot, per attached patch.

It might be nice to also add a hook on proc exit, so we don't have stale buffers lying around until next decoding session, but I didn't want to add new global state to reorderbuffer.c so I've left that for now.

BTW, while this bug looks similar to https://www.postgresql.org/message-id/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp "Failed to delete old ReorderBuffer spilled files" it's really quite a separate issue.

Both this bugfix and the above need backpatching to 9.4.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

v1-0001-Clean-up-reorder-buffer-files-when-starting-logic.patch

pgsql-hackers by date:

From: Julien Rouhaud
Date: 05 December 2017, 08:50:23
Subject: Re: Mention ordered datums in PartitionBoundInfoData comment

From: Ashutosh Bapat
Date: 05 December 2017, 09:03:01
Subject: Re: Mention ordered datums in PartitionBoundInfoData comment

BUGFIX: standby disconnect can corrupt serialized reorder buffers - Mailing list pgsql-hackers

Attachment

Previous

Next