Re: making relfilenodes 56 bits - Mailing list pgsql-hackers
| From | Amit Kapila |
|---|---|
| Subject | Re: making relfilenodes 56 bits |
| Date | |
| Msg-id | CAA4eK1La-ZYvCjZAmShmGFvvhHNo2wbtSp80kXedS3d8A8R=aA@mail.gmail.com |
| In response to | Re: making relfilenodes 56 bits (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: making relfilenodes 56 bits, Re: making relfilenodes 56 bits |
| List | pgsql-hackers |
On Sat, Jul 30, 2022 at 1:59 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jul 20, 2022 at 7:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > There was also an issue where a user table's relfilenode from the old
> > cluster could conflict with a system table of the new cluster. As a
> > solution, currently for system table objects (while creating their
> > storage for the first time) we keep to the low range of relfilenumbers,
> > basically using the same relfilenumber as the OID, so that during an
> > upgrade a normal user table from the old cluster will not conflict
> > with the system tables in the new cluster. But with this solution
> > Robert told me (in an off-list chat) about a problem: in the future,
> > if we want to make relfilenumbers completely unique within a cluster
> > by implementing CREATEDB differently, we cannot do that because we
> > have created fixed relfilenodes for the system tables.
> >
> > I am not sure what exactly we can do to avoid that, because even if we
> > do something to avoid it in the new cluster, the old cluster might
> > already be using non-unique relfilenodes, so after the upgrade the new
> > cluster will also have those non-unique relfilenodes.
>
> I think this aspect of the patch could use some more discussion.
>
> To recap, the problem is that pg_upgrade mustn't discover that a
> relfilenode being migrated from the old cluster is already in use for
> some other table in the new cluster. Since the new cluster should
> only contain system tables that we assume have never been rewritten,
> they'll all have relfilenodes equal to their OIDs, and thus less than
> 16384. On the other hand, all the user tables from the old cluster will
> have relfilenodes greater than 16384, so we're fine. pg_largeobject,
> which also gets migrated, is a special case. Since we don't change OID
> assignments from version to version, it should have either the same
> relfilenode value in the old and new clusters, if never rewritten, or
> else the value in the old cluster will be greater than 16384, in which
> case no conflict is possible.
>
> But if we just assign all relfilenode values from a central counter,
> then we have got trouble. If the new version has more system catalog
> tables than the old version, then some value that got used for a user
> table in the old version might get used for a system table in the new
> version, which is a problem. One idea for fixing this is to have two
> RelFileNumber ranges: a system range (small values) and a user range.
> System tables get values in the system range initially, and in the
> user range when first rewritten. User tables always get values in the
> user range. Everything works fine in this scenario except maybe for
> pg_largeobject: what if it gets one value from the system range in the
> old cluster, and a different value from the system range in the new
> cluster, but some other system table in the new cluster gets the value
> that pg_largeobject had in the old cluster? Then we've got trouble.
>

To solve that problem, how about rewriting whichever system table in the
new cluster has a conflicting relfilenode? I think we can probably do
this conflict checking before processing the tables from the old cluster.

--
With Regards,
Amit Kapila.
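The conflict check proposed above could be sketched roughly as follows. This is a hypothetical illustration, not code from the patch: it assumes libpq against a single database in each cluster, the connection strings and the relkind filter are placeholders, and only FirstNormalObjectId (the real 16384 boundary from access/transam.h) and the pg_class columns are taken as given.

```c
/*
 * Hypothetical sketch: before transferring anything from the old cluster,
 * find new-cluster system catalogs whose relfilenode collides with a
 * relfilenode already used by a user relation in the old cluster, so that
 * those catalogs could be rewritten first.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

#define FirstNormalObjectId 16384	/* same boundary as access/transam.h */

static PGresult *
run_query(PGconn *conn, const char *sql)
{
	PGresult   *res = PQexec(conn, sql);

	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
		PQclear(res);
		exit(1);
	}
	return res;
}

int
main(void)
{
	/* Placeholder connection strings for the old and new clusters. */
	PGconn	   *old_conn = PQconnectdb("port=5432 dbname=postgres");
	PGconn	   *new_conn = PQconnectdb("port=5433 dbname=postgres");

	if (PQstatus(old_conn) != CONNECTION_OK || PQstatus(new_conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed\n");
		return 1;
	}

	/*
	 * Relfilenodes of user relations in the old cluster; these values are
	 * preserved by pg_upgrade, so nothing in the new cluster may already be
	 * using them.  The relkind list is simplified for illustration.
	 */
	PGresult   *old_res = run_query(old_conn,
		"SELECT relfilenode FROM pg_class "
		"WHERE relfilenode >= 16384 AND relkind IN ('r', 'i', 't', 'm')");

	/* For each such value, see whether a new-cluster catalog still uses it. */
	for (int i = 0; i < PQntuples(old_res); i++)
	{
		char		sql[256];
		const char *filenode = PQgetvalue(old_res, i, 0);
		PGresult   *new_res;

		snprintf(sql, sizeof(sql),
				 "SELECT relname FROM pg_class "
				 "WHERE relfilenode = %s AND oid < %d",
				 filenode, FirstNormalObjectId);
		new_res = run_query(new_conn, sql);

		if (PQntuples(new_res) > 0)
			printf("conflict: new-cluster catalog \"%s\" uses relfilenode %s "
				   "and would need a rewrite before the upgrade\n",
				   PQgetvalue(new_res, 0, 0), filenode);
		PQclear(new_res);
	}

	PQclear(old_res);
	PQfinish(old_conn);
	PQfinish(new_conn);
	return 0;
}
```

In a real implementation the rewrite itself, not just the report, would have to happen in the new cluster before any files are transferred, matching the ordering suggested in the reply above.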