Some pgq table rewrite incompatibility with logical decoding? - Mailing list pgsql-hackers

From: Jeremy Finzel
Subject: Some pgq table rewrite incompatibility with logical decoding?
Msg-id: CAMa1XUjkTsrtmJxdwJBw9UBdxqYYz2pTxbwyaK0HwjQ9iLjefA@mail.gmail.com
List: pgsql-hackers

I am hoping someone here can shed some light on this issue. I apologize if this isn't the right place to ask, but I'm almost sure some of you were involved in pgq's development and might be able to answer this.

We are currently running two replication technologies on a few of our databases: skytools and pglogical.  Although we are moving towards only using logical decoding-based replication, right now we have both for different purposes.

A table rewrite seems to have happened twice on table pgq.event_58_1, and it ends up in the decoding stream, resulting in the following error:

ERROR,XX000,"could not map filenode ""base/16418/1173394526"" to relation OID"

In retracing what happened, we discovered that this relfilenode was rewritten.  But somehow it is ending up in the logical decoding stream and is "undecodable".  This is pretty disastrous because the only real way to fix it is to advance the replication slot and lose data.
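
For anyone retracing a similar error, here is a minimal sketch of how the path in the message can be split into a database OID and a relfilenode, and turned into a lookup query. The helper names parse_filenode_error and lookup_query are hypothetical, and the pattern assumes the file lives in the default tablespace (a "base/" path). pg_filenode_relation is a built-in PostgreSQL function; it returns NULL when no relation in the current database uses that filenode, which is what one would expect to see after a rewrite.

```python
import re

# Matches the path printed in the error. Assumption: default tablespace,
# so the path has the form base/<database oid>/<relfilenode>.
FILENODE_RE = re.compile(r'base/(\d+)/(\d+)')


def parse_filenode_error(message):
    """Return (database_oid, relfilenode) from the error text, or None."""
    match = FILENODE_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def lookup_query(relfilenode):
    """Build a query to run while connected to the database owning the file.

    A tablespace argument of 0 means the database's default tablespace.
    """
    return f"SELECT pg_filenode_relation(0, {relfilenode});"


error = 'ERROR,XX000,"could not map filenode ""base/16418/1173394526"" to relation OID"'
db_oid, relfilenode = parse_filenode_error(error)
print(db_oid, relfilenode)
print(lookup_query(relfilenode))
```

The query has to be run in the database whose OID matches the first number (pg_database.oid), since relfilenodes are only meaningful per database.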

The only obvious table rewrite I can find in the pgq codebase is a truncate in pgq.maint_rotate_tables.sql.  But there isn't anything surprising there.  If anyone has any ideas as to what might cause this so that we could somehow mitigate the possibility of this happening again until we move off pgq, that would be much appreciated.

Thanks,
Jeremy

