Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: long-standing data loss bug in initial sync of logical replication
Msg-id: 20231118015443.sfxfbr4eqolxbmjc@awork3.anarazel.de
In response to: long-standing data loss bug in initial sync of logical replication (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
Hi,

On 2023-11-17 15:36:25 +0100, Tomas Vondra wrote:
> It seems there's a long-standing data loss issue related to the initial
> sync of tables in the built-in logical replication (publications etc.).

:(


> Overall, this looks, walks and quacks like a cache invalidation issue,
> likely a missing invalidation somewhere in the ALTER PUBLICATION code.

It could also be that pgoutput doesn't have sufficient invalidation
handling.
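
For reference, the invalidation hooks at issue look roughly like this - just a
sketch, not the actual pgoutput code; the callback bodies and the function
names are made up, only the registration APIs and callback signatures from
utils/inval.h are real:

    #include "postgres.h"

    #include "utils/inval.h"
    #include "utils/syscache.h"

    /* a pg_publication row changed - mark cached publication info stale */
    static void
    publication_cb(Datum arg, int cacheid, uint32 hashvalue)
    {
        /* invalidate whatever per-publication state the plugin caches */
    }

    /* relcache inval for relid (or all rels, if InvalidOid) - drop cached state */
    static void
    relation_cb(Datum arg, Oid relid)
    {
        /* invalidate the per-relation output state */
    }

    static void
    register_invalidation_callbacks(void)
    {
        /* syscache callback: fires when a pg_publication row is invalidated */
        CacheRegisterSyscacheCallback(PUBLICATIONOID, publication_cb, (Datum) 0);

        /* relcache callback: fires when a relation's relcache entry is invalidated */
        CacheRegisterRelcacheCallback(relation_cb, (Datum) 0);
    }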


One thing that looks bogus on the DDL side is how the invalidation handling
interacts with locking.


For tables etc the invalidation handling works because we hold a lock on the
relation before modifying the catalog and don't release that lock until
transaction end. That part is crucial: We queue shared invalidations at
transaction commit, *after* the transaction is marked as visible, but *before*
locks are released. That guarantees that any backend processing invalidations
will see the new contents.  However, if the lock on the modified object is
released before transaction commit, other backends can build and use a cache
entry that hasn't processed invalidations (invalidations are processed when
acquiring locks).
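
Spelled out as code, the pattern that does work looks roughly like this (a
sketch; alter_thing() and update_thing_catalog_row() are invented stand-ins,
only LockRelationOid() and the commit-time ordering in the comments come from
the description above):

    #include "postgres.h"

    #include "storage/lmgr.h"

    /* stand-in for the actual catalog update; queues an invalidation message */
    extern void update_thing_catalog_row(Oid relid);

    static void
    alter_thing(Oid relid)
    {
        /*
         * 1. Acquire a self-conflicting lock on the object *before* reading or
         *    modifying its catalog row. Acquiring the lock also processes any
         *    pending invalidations, so we start from up-to-date cache contents.
         */
        LockRelationOid(relid, ShareUpdateExclusiveLock);

        /* 2. Update the catalog row; this queues an invalidation message. */
        update_thing_catalog_row(relid);

        /*
         * 3. Keep the lock until transaction end. At commit the order is:
         *    transaction becomes visible, queued shared invalidations are sent,
         *    locks are released. A backend that then acquires a conflicting
         *    lock therefore processes the invalidation before it can build or
         *    reuse a cache entry.
         */
    }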

While there is such a lock for publications, it seems to be acquired too late
to actually do much good in a number of paths, and not at all in others.

E.g.:

    pubform = (Form_pg_publication) GETSTRUCT(tup);

    /*
     * If the publication doesn't publish changes via the root partitioned
     * table, the partition's row filter and column list will be used. So
     * disallow using WHERE clause and column lists on partitioned table in
     * this case.
     */
    if (!pubform->puballtables && publish_via_partition_root_given &&
        !publish_via_partition_root)
    {
        /*
         * Lock the publication so nobody else can do anything with it. This
         * prevents concurrent alter to add partitioned table(s) with WHERE
         * clause(s) and/or column lists which we don't allow when not
         * publishing via root.
         */
        LockDatabaseObject(PublicationRelationId, pubform->oid, 0,
                           AccessShareLock);

a) Another session could have modified the publication and made puballtables out-of-date
b) The LockDatabaseObject() uses AccessShareLock, so others can get past this
   point as well

b) seems like a copy-paste bug or such?


I don't see any locking of the publication around RemovePublicationRelById(),
for example.
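
For comparison, the ordering I would have expected in these paths is roughly
the following - just a sketch, not a proposed patch; check_publication_something()
is a made-up name, the functions it calls are the real ones:

    #include "postgres.h"

    #include "access/htup_details.h"
    #include "catalog/pg_publication.h"
    #include "storage/lmgr.h"
    #include "utils/syscache.h"

    static void
    check_publication_something(Oid pubid)
    {
        HeapTuple   tup;
        Form_pg_publication pubform;

        /*
         * Self-conflicting lock taken *before* looking at the catalog row, so a
         * concurrent ALTER PUBLICATION can neither change the row under us nor
         * get past this same check at the same time.
         */
        LockDatabaseObject(PublicationRelationId, pubid, 0,
                           ShareUpdateExclusiveLock);

        tup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid));
        if (!HeapTupleIsValid(tup))
            elog(ERROR, "cache lookup failed for publication %u", pubid);

        pubform = (Form_pg_publication) GETSTRUCT(tup);

        /* ... the actual puballtables / publish_via_partition_root checks ... */

        ReleaseSysCache(tup);
        /* the lock is held until end of transaction */
    }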

I might just be misunderstanding the way publication locking is intended to
work.


> However, while randomly poking at different things, I realized that if I
> change the lock obtained on the relation in OpenTableList() from
> ShareUpdateExclusiveLock to ShareRowExclusiveLock, the issue goes away.

That's odd. There are cases where changing the lock level can cause
invalidation processing to happen because there is no pre-existing lock for
the "new" lock level, but there was for the old one. But OpenTableList() is
used when altering the publication, so I don't see how that connects.
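
To illustrate the effect I mean (a simplified sketch; relock_example() is a
made-up name, LockRelationOid() is the real entry point, which runs
AcceptInvalidationMessages() after acquiring a lock it didn't already hold):

    #include "postgres.h"

    #include "storage/lmgr.h"

    static void
    relock_example(Oid relid)
    {
        /*
         * First acquisition in this transaction: the lock is new, so
         * LockRelationOid() processes pending invalidations and stale
         * relcache/syscache entries are flushed before we use them.
         */
        LockRelationOid(relid, ShareUpdateExclusiveLock);

        /*
         * Re-requesting the same, already-held lock later can be satisfied
         * from the local lock table without forcing another round of
         * invalidation processing.
         */
        LockRelationOid(relid, ShareUpdateExclusiveLock);

        /*
         * Requesting a level we do not hold yet is a real acquisition again,
         * so invalidations are processed once more - which is how changing a
         * lock level can make a missing-invalidation symptom disappear.
         */
        LockRelationOid(relid, ShareRowExclusiveLock);
    }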

Greetings,

Andres Freund


