Race conditions in logical decoding - Mailing list pgsql-hackers

From Antonin Houska
Subject Race conditions in logical decoding
Date
Msg-id 85833.1768840165@localhost
Responses Re: Race conditions in logical decoding
List pgsql-hackers
A stress test [1] for the REPACK patch [2] revealed data
corruption. Eventually I found out that the problem is in PostgreSQL core. In
particular, it can happen that a COMMIT record is decoded, but before the
commit could be recorded in CLOG, a snapshot that takes the commit into
account is created and even used. Visibility checks then work incorrectly
until the CLOG gets updated.

In logical replication, the consequences are not only wrong data on the
subscriber, but also a corrupted table on the publisher; this is due to
incorrectly set commit hint bits.
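
The sequence described above can be illustrated with a toy model. This is a
sketch only, not PostgreSQL code: HeapTuple, Snapshot, tuple_is_visible() and
the string hint values are hypothetical stand-ins, and the visibility rule is
reduced to "not running in the snapshot, then ask the CLOG and cache the
answer". It assumes the wrong verdict is cached on the tuple the way a commit
hint bit would be.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class XidStatus(Enum):
    IN_PROGRESS = "in progress"
    COMMITTED = "committed"


@dataclass
class HeapTuple:
    xmin: int                   # inserting transaction
    hint: Optional[str] = None  # cached verdict; stands in for commit hint bits


@dataclass
class Snapshot:
    # xids the snapshot still treats as running
    running: set = field(default_factory=set)


def tuple_is_visible(tup, snapshot, clog):
    """Toy check: consult the snapshot first, then the CLOG, caching the result."""
    if tup.xmin in snapshot.running:
        return False
    if tup.hint is None:
        committed = clog.get(tup.xmin) is XidStatus.COMMITTED
        # Not running per the snapshot and not committed per the CLOG:
        # the toy check concludes "aborted" and remembers that on the tuple.
        tup.hint = "xmin_committed" if committed else "xmin_invalid"
    return tup.hint == "xmin_committed"


def demonstrate_race():
    xid = 100
    clog = {xid: XidStatus.IN_PROGRESS}  # commit not yet recorded in the CLOG
    tup = HeapTuple(xmin=xid)

    # The decoder has already seen the COMMIT record, so the snapshot it
    # builds no longer lists xid as running.
    snapshot = Snapshot(running=set())

    before = tuple_is_visible(tup, snapshot, clog)  # CLOG is still stale

    clog[xid] = XidStatus.COMMITTED                 # CLOG finally updated
    after = tuple_is_visible(tup, snapshot, clog)   # cached verdict persists

    return before, after, tup.hint
```

In this model the tuple is reported invisible both before and after the CLOG
update, because the verdict cached during the race window outlives the window
itself.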

Attached is a spec file that demonstrates the issue. I did not add it to
the Makefile because I don't expect the current version to be merged (see the
commit message for details).

I'm not sure yet how to fix the problem. I tried to call XactLockTableWait()
from SnapBuildAddCommittedTxn() (as happens in SnapBuildWaitSnapshot()),
but that caused at least one regression test (subscription/t/010_truncate.pl)
to hang, probably due to a deadlock. I can spend more time on it, but maybe
someone can come up with a good idea sooner than me.

[1] https://www.postgresql.org/message-id/CADzfLwU78as45To9a%3D-Qkr5jEg3tMxc5rUtdKy2MTv4r_SDGng%40mail.gmail.com
[2] https://commitfest.postgresql.org/patch/5117/

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

