Re: making relfilenodes 56 bits - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: making relfilenodes 56 bits |
Date | |
Msg-id | CA+Tgmob-J_70e47imyLV3Wr5Q8h21ijh=+QMsjx_hA2LMcC=gg@mail.gmail.com |
In response to | Re: making relfilenodes 56 bits (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses | Re: making relfilenodes 56 bits |
List | pgsql-hackers |
On Fri, Aug 5, 2022 at 3:25 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I think even if we start the range from the 4 billion we can not avoid
> keeping two separate ranges for system and user tables otherwise the
> next upgrade where old and new clusters both have 56 bits
> relfilenumber will get conflicting files. And, for the same reason we
> still have to call SetNextRelFileNumber() during upgrade.

Well, my proposal to move everything from the new cluster up to higher numbers would address this without requiring two ranges.

> So the idea is, we will be having 2 ranges for relfilenumbers, system
> range will start from 4 billion and user range maybe something around
> 4.1 (I think we can keep it very small though, just reserve 50k
> relfilenumber for system for future expansion and start user range
> from there).

A disadvantage of this is that it basically means all the file names in new clusters are going to be 10 characters long. That's not a big disadvantage, but it's not wonderful. File names that are only 5-7 characters long are common today, and easier to remember.

> So now system tables have no issues and also the user tables from the
> old cluster have no issues. But pg_largeobject might get conflict
> when both old and new cluster are using 56 bits relfilenumber, because
> it is possible that in the new cluster some other system table gets
> that relfilenumber which is used by pg_largeobject in the old cluster.
>
> This could be resolved if we allocate pg_largeobject's relfilenumber
> from the user range, that means this relfilenumber will always be the
> first value from the user range. So now if the old and new cluster
> both are using 56bits relfilenumber then pg_largeobject in both
> cluster would have got the same relfilenumber and if the old cluster
> is using the current 32 bits relfilenode system then the whole range
> of the new cluster is completely different than that of the old
> cluster.
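To make the two-range idea and the file name length concern above concrete, here is a rough sketch in Python. The constants and function names are hypothetical, picked from the numbers mentioned in the quoted text; in particular, reading "4 billion" as 2^32 is an assumption, and none of this reflects the actual patch.

```python
# Hypothetical constants taken from the numbers discussed in this thread.
# They are illustrative assumptions, not values from any actual patch.
MAX_RELFILENUMBER = 2**56 - 1       # largest value that fits in 56 bits
SYSTEM_RANGE_START = 2**32          # assumed meaning of "4 billion"
SYSTEM_RESERVED = 50_000            # slots reserved for system relations
USER_RANGE_START = SYSTEM_RANGE_START + SYSTEM_RESERVED


def classify(relfilenumber: int) -> str:
    """Report which of the discussed ranges a relfilenumber falls into."""
    if not 0 <= relfilenumber <= MAX_RELFILENUMBER:
        raise ValueError("relfilenumber does not fit in 56 bits")
    if relfilenumber < SYSTEM_RANGE_START:
        return "legacy-32-bit"      # could only come from an old-style cluster
    if relfilenumber < USER_RANGE_START:
        return "system"
    return "user"


def filename_length(relfilenumber: int) -> int:
    """Number of characters in the decimal file name for a relfilenumber."""
    return len(str(relfilenumber))
```

Under these assumptions, every number handed out in a new cluster is at least 2^32, so its decimal name has at least 10 digits, whereas a typical number from a 32-bit cluster such as 16384 has 5.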
I think this can work, but it does rely to some extent on the fact that there are no other tables which need to be treated like pg_largeobject. If there were others, they'd need fixed starting RelFileNumber assignments, or some other trick, like renumbering them twice in the cluster, first to a known-unused value and then back to the proper value. You'd have trouble if in the other cluster pg_largeobject was 4bn+1 and pg_largeobject2 was 4bn+2 and in the new cluster the reverse, without some hackery.

I do feel like your idea here has some advantages - my proposal requires rewriting all the catalogs in the new cluster before we do anything else, and that's going to take some time even though they should be small. But I also feel like it has some disadvantages: it seems to rely on complicated reasoning and special cases more than I'd like.

What do other people think?

-- 
Robert Haas
EDB: http://www.enterprisedb.com
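The "renumber twice" trick mentioned above can be sketched as a small simulation. This is only an illustration in Python: `plan_two_step_renumber`, `apply_moves`, and the scratch starting value are hypothetical names and choices, and pg_largeobject2 is the made-up second table from the example in the message, not a real catalog.

```python
def plan_two_step_renumber(current, target, scratch_start):
    """Plan moves that park each relation on a known-unused number first.

    current and target map relation name -> relfilenumber. The result is a
    list of (relation, from_number, to_number) moves.
    """
    to_move = [rel for rel in current if current[rel] != target[rel]]
    in_use = set(current.values()) | set(target.values())
    moves, parked = [], {}
    scratch = scratch_start
    # Pass 1: move every affected relation to a number nobody uses.
    for rel in to_move:
        while scratch in in_use:
            scratch += 1
        moves.append((rel, current[rel], scratch))
        parked[rel] = scratch
        in_use.add(scratch)
    # Pass 2: move each parked relation to its proper final number.
    for rel in to_move:
        moves.append((rel, parked[rel], target[rel]))
    return moves


def apply_moves(assignment, moves):
    """Apply moves one at a time, failing if a destination is occupied."""
    state = dict(assignment)
    for rel, src, dst in moves:
        if state[rel] != src:
            raise ValueError(f"{rel} is not at {src}")
        if dst in state.values():
            raise RuntimeError(f"{dst} is already in use")
        state[rel] = dst
    return state


FOUR_BN = 4_000_000_000  # stand-in for "4bn" in the example above
# The new cluster has the two numbers reversed relative to the other cluster.
current = {"pg_largeobject": FOUR_BN + 2, "pg_largeobject2": FOUR_BN + 1}
target = {"pg_largeobject": FOUR_BN + 1, "pg_largeobject2": FOUR_BN + 2}
moves = plan_two_step_renumber(current, target, scratch_start=2**56 - 1000)
final = apply_moves(current, moves)
```

A direct single-step move of pg_largeobject to 4bn+1 would hit the number still held by pg_largeobject2, which is the collision the intermediate scratch values avoid.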