Re: making relfilenodes 56 bits - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: making relfilenodes 56 bits
Date
Msg-id CAFiTN-tstcK1NUAsMaxQBc9GdHZh3=FEa2z-PubJmMU2xKn_pA@mail.gmail.com
In response to Re: making relfilenodes 56 bits  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Tue, Aug 23, 2022 at 8:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Aug 23, 2022 at 1:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Mon, Aug 22, 2022 at 3:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > To solve that problem, how about rewriting the system table in the new
> > > cluster which has a conflicting relfilenode? I think we can probably
> > > do this conflict checking before processing the tables from the old
> > > cluster.
> >
> > Thanks for chiming in.
> >
> > Right now, there are two parts to the relfilenumber preservation
> > system, and this scheme doesn't quite fit into either of them. First,
> > the dump includes commands to set pg_class.relfilenode in the new
> > cluster to the same value that it had in the old cluster. The dump
> > can't include any SQL commands that depend on what's happening in the
> > new cluster because pg_dump(all) only connects to a single cluster,
> > which in this case is the old cluster. Second, pg_upgrade itself
> > copies the files from the old cluster to the new cluster. This doesn't
> > involve a database connection at all. So there's no part of the
> > current relfilenode preservation mechanism that can look at the old
> > AND the new database and decide on some SQL to execute against the new
> > database.
> >
> > I thought for a while that we could use the information that's already
> > gathered by get_rel_infos() to do what you're suggesting here, but it
> > doesn't quite work, because that function excludes system tables, and
> > we can't afford to do that here. We'd either need to modify that query
> > to include system tables - at least for the new cluster - or run a
> > separate one to gather information about system tables in the new
> > cluster. Then, we could put all the pg_class.relfilenode values we
> > found in the new cluster into a hash table, loop over the list of rels
> > this function found in the old cluster, and for each one, probe into
> > the hash table. If we find a match, that's a system table that needs
> > to be moved out of the way before calling create_new_objects(), or
> > maybe inside that function but before it runs pg_restore.
> >
> > That doesn't seem too crazy, I think. It's a little bit of new
> > mechanism, but it doesn't sound horrific. It's got the advantage of
> > being significantly cheaper than my proposal of moving everything out
> > of the way unconditionally, and at the same time it retains one of the
> > key advantages of that proposal - IMV, anyway - which is that we don't
> > need separate relfilenode ranges for user and system objects any more.
> > So I guess on balance I kind of like it, but maybe I'm missing
> > something.
>
> Okay, so this seems exactly the same as your previous proposal, but
> instead of unconditionally rewriting all the system tables we will
> rewrite only those that conflict with a user table or pg_largeobject
> from the previous cluster.  Although it might add some implementation
> complexity on the pg_upgrade side, it seems cheaper than rewriting
> everything.

OTOH, if we keep two separate ranges for user and system tables, then
we don't need all this complex conflict-checking logic.  From the old
cluster we can just remember the relfilenumber of pg_largeobject, and
in the new cluster, before trying to restore, we can query pg_class to
find out whether that relfilenumber is used by any system table; if so,
we can just rewrite that system table.  And I think using two ranges
might not be that complicated, because as soon as we are done with
initdb we can just set NextRelFileNumber to the first relfilenumber of
the user range, so this could be the simplest solution.  And I think
what Amit is suggesting is something along these lines?
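
A loose sketch of that two-range scheme, to make the invariant
concrete.  The boundary constant and all names below are invented for
illustration; the actual patch would choose its own split of the
56-bit space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical split of the 56-bit relfilenumber space: everything below
 * FIRST_USER_RELFILENUMBER is reserved for system tables created by
 * initdb; user relations are allocated at or above it.
 */
#define FIRST_USER_RELFILENUMBER    ((uint64_t) 100000)
#define MAX_RELFILENUMBER           (((uint64_t) 1 << 56) - 1)

static uint64_t NextRelFileNumber = 0;

/* Called once initdb has finished creating the system catalogs. */
static void
start_user_range(void)
{
    NextRelFileNumber = FIRST_USER_RELFILENUMBER;
}

/* All post-initdb allocations come from the user range. */
static uint64_t
assign_relfilenumber(void)
{
    assert(NextRelFileNumber >= FIRST_USER_RELFILENUMBER);
    assert(NextRelFileNumber < MAX_RELFILENUMBER);
    return NextRelFileNumber++;
}

/*
 * With separate ranges, an old-cluster user relfilenumber can never
 * collide with a new-cluster system table, so pg_upgrade only has to
 * worry about the one system table it preserves: pg_largeobject.
 */
static bool
is_system_relfilenumber(uint64_t rfn)
{
    return rfn < FIRST_USER_RELFILENUMBER;
}
```

Under this assumption the only remaining pg_upgrade check is the
single pg_class lookup for pg_largeobject's old relfilenumber
described above.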

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


