Re: invalid non-zero objectSubId for object class - Mailing list pgsql-general

From: Michel Pelletier
Subject: Re: invalid non-zero objectSubId for object class
Msg-id: CACxu=vKvqpEti11owAKEX8n4NVeoBnP+c=pkgTd7i8+Uoongww@mail.gmail.com
In response to: Re: invalid non-zero objectSubId for object class (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: invalid non-zero objectSubId for object class
List: pgsql-general


On Thu, Jul 9, 2020 at 4:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Michel Pelletier <pelletier.michel@gmail.com> writes:
> > On a 12.3 AWS RDS instance, I get the following error when trying to drop
> > either of two tables:
> >
> > dev=> drop table current_flight;
> > ERROR:  invalid non-zero objectSubId for object class 297108
> > dev=> drop table flight;
> > ERROR:  invalid non-zero objectSubId for object class 297108
>
> This looks like corrupt data in pg_depend, specifically an entry or
> entries with classid or refclassid = 297108, which should not happen
> (the classid should always be the OID of one of a short list of system
> catalogs).  You could try poking around in pg_depend to see if you
> can identify any obviously-bogus rows.

Hi Tom, thanks for getting back so quickly.

I don't seem to have either:

dev=> select * from pg_depend where classid = 297108 or refclassid = 297108;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
(0 rows)

I'm not sure what a bogus row would look like.
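
Going by Tom's description, one possible reading of "bogus" is a row whose classid or refclassid is not the OID of a system catalog, or a row with a non-zero sub-id on something other than pg_class. The query below is only a sketch under those assumptions, not a definitive corruption check. The column aliases are made up for readability, and the pin-row exclusion assumes the pre-14 layout where pinned entries carry deptype 'p' with a zero classid.

```sql
-- Sketch only: list pg_depend rows that look structurally suspicious.
-- Assumption 1: classid/refclassid should name a relation in pg_catalog.
-- Assumption 2: a non-zero sub-id is only expected when the class is pg_class
--               (it is then a column number).
-- Pin rows (deptype 'p') have classid = 0 by design, so they are skipped
-- in the classid check.
SELECT d.classid::regclass    AS class_rel,
       d.objid,
       d.objsubid,
       d.refclassid::regclass AS ref_class_rel,
       d.refobjid,
       d.refobjsubid,
       d.deptype
FROM pg_depend AS d
WHERE (d.deptype <> 'p'
       AND d.classid NOT IN (SELECT c.oid FROM pg_class AS c
                             WHERE c.relnamespace = 'pg_catalog'::regnamespace))
   OR d.refclassid NOT IN (SELECT c.oid FROM pg_class AS c
                           WHERE c.relnamespace = 'pg_catalog'::regnamespace)
   OR (d.objsubid <> 0 AND d.classid <> 'pg_class'::regclass)
   OR (d.refobjsubid <> 0 AND d.refclassid <> 'pg_class'::regclass);
```

An empty result would only mean that no row fails these particular checks. It would not rule out other kinds of damage.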
 
> No idea how it got that way.  Have you had any database crashes or the
> like?

No crashes, but a restart and one upgrade. On Sunday and Monday, at exactly midnight UTC, we run a cron job to create a new partition for an unrelated table and attach it to a pglogical replication set. I updated the procedure on Saturday to create two new partitions for two unrelated tables, and that somehow triggered an error, but not a crash, on 12.2 / pglogical 2.3.0. What's puzzling is that both partition creations still worked and replicated to all downstream consumers, but from that point on replication ceased and consumers logged the error in the link below:


This spooled up changes on the RDS primary until it filled up the storage. On Sunday we resized the instance, restarted it, and reinitialized the pglogical setup, which restarted replication. On Monday the error happened again at midnight, so we restarted replication and, on Tuesday, upgraded to 12.3 / pglogical 2.3.1 as recommended in the issue. It has run without error since then and has been replicating nicely, so I have assumed that issue is fixed.

Neither of these two tables is involved in the midnight job; they're no longer used and I was hoping to clean them up. I guess my concern should be: is there additional possible corruption I can check for? And if that's OK, is there some manual intervention I can do to drop the tables?

Thanks,

-Michel
 

>                         regards, tom lane
