
From: Dilip Kumar
Subject: Re: making relfilenodes 56 bits
Msg-id: CAFiTN-uFRNufBmJTeM6cbm8EPe8BM8TteGBa5Spn9S+CDAj89w@mail.gmail.com
In response to: Re: making relfilenodes 56 bits (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Aug 23, 2022 at 1:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Aug 22, 2022 at 3:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > To solve that problem, how about rewriting the system table in the new
> > cluster which has a conflicting relfilenode? I think we can probably
> > do this conflict checking before processing the tables from the old
> > cluster.
>
> Thanks for chiming in.
>
> Right now, there are two parts to the relfilenumber preservation
> system, and this scheme doesn't quite fit into either of them. First,
> the dump includes commands to set pg_class.relfilenode in the new
> cluster to the same value that it had in the old cluster. The dump
> can't include any SQL commands that depend on what's happening in the
> new cluster because pg_dump(all) only connects to a single cluster,
> which in this case is the old cluster. Second, pg_upgrade itself
> copies the files from the old cluster to the new cluster. This doesn't
> involve a database connection at all. So there's no part of the
> current relfilenode preservation mechanism that can look at the old
> AND the new database and decide on some SQL to execute against the new
> database.
>
> I thought for a while that we could use the information that's already
> gathered by get_rel_infos() to do what you're suggesting here, but it
> doesn't quite work, because that function excludes system tables, and
> we can't afford to do that here. We'd either need to modify that query
> to include system tables - at least for the new cluster - or run a
> separate one to gather information about system tables in the new
> cluster. Then, we could put all the pg_class.relfilenode values we
> found in the new cluster into a hash table, loop over the list of rels
> this function found in the old cluster, and for each one, probe into
> the hash table. If we find a match, that's a system table that needs
> to be moved out of the way before calling create_new_objects(), or
> maybe inside that function but before it runs pg_restore.
>
> That doesn't seem too crazy, I think. It's a little bit of new
> mechanism, but it doesn't sound horrific. It's got the advantage of
> being significantly cheaper than my proposal of moving everything out
> of the way unconditionally, and at the same time it retains one of the
> key advantages of that proposal - IMV, anyway - which is that we don't
> need separate relfilenode ranges for user and system objects any more.
> So I guess on balance I kind of like it, but maybe I'm missing
> something.

Okay, so this seems to be essentially the same as your previous
proposal, except that instead of unconditionally rewriting all of the
system tables we would rewrite only those whose relfilenodes conflict
with a user table or with pg_largeobject from the old cluster.
Although it adds some implementation complexity on the pg_upgrade
side, it seems cheaper than rewriting everything.
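
To make that concrete, here is a minimal standalone sketch of the
conflict check, with a toy open-addressing hash set and hard-coded
relfilenumber values standing in for the real inputs (which would come
from get_rel_infos()-style queries run against both clusters); this is
illustrative only, not pg_upgrade code:

/*
 * Illustrative sketch: hash the new cluster's relfilenumbers, then
 * probe with the old cluster's to find system tables that must be
 * moved out of the way before pg_restore runs.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define HASH_SIZE 1024            /* power of two, larger than the input */

typedef struct HashEntry
{
    uint64_t    relfilenumber;
    bool        used;
} HashEntry;

static HashEntry hash_table[HASH_SIZE];

/* multiplicative hash; open addressing with linear probing */
static uint32_t
hash_slot(uint64_t key)
{
    return (uint32_t) ((key * UINT64_C(0x9E3779B97F4A7C15)) >> 32) &
        (HASH_SIZE - 1);
}

static void
hash_insert(uint64_t key)
{
    uint32_t    slot = hash_slot(key);

    while (hash_table[slot].used)
        slot = (slot + 1) & (HASH_SIZE - 1);
    hash_table[slot].relfilenumber = key;
    hash_table[slot].used = true;
}

static bool
hash_probe(uint64_t key)
{
    uint32_t    slot = hash_slot(key);

    while (hash_table[slot].used)
    {
        if (hash_table[slot].relfilenumber == key)
            return true;
        slot = (slot + 1) & (HASH_SIZE - 1);
    }
    return false;
}

int
main(void)
{
    /* relfilenumbers of system tables in the new cluster (made up) */
    uint64_t    new_cluster[] = {1259, 1247, 2608, 2674};
    /* relfilenumbers preserved from the old cluster (made up) */
    uint64_t    old_cluster[] = {1247, 16384, 16390};
    size_t      i;

    for (i = 0; i < sizeof(new_cluster) / sizeof(new_cluster[0]); i++)
        hash_insert(new_cluster[i]);

    /*
     * Any old-cluster relfilenumber already present in the new cluster
     * identifies a system table that has to be rewritten before the
     * user objects are restored.
     */
    for (i = 0; i < sizeof(old_cluster) / sizeof(old_cluster[0]); i++)
    {
        if (hash_probe(old_cluster[i]))
            printf("conflict: relfilenumber %llu must be moved aside\n",
                   (unsigned long long) old_cluster[i]);
    }
    return 0;
}

In the real thing the probe loop would run over the rels gathered from
the old cluster, and a hit would schedule a rewrite of that system
table before create_new_objects() invokes pg_restore.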

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


