Thread: Pro et contra of preserving pg_proc oids during pg_upgrade

Pro et contra of preserving pg_proc oids during pg_upgrade

From
Nikita Malakhov
Date:
Hi hackers!

Please advise on the idea of preserving pg_proc oids during pg_upgrade, in a way like relfilenodes, type id and so on. What are possible downsides of such a solution?

Thanks!

--

Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Tom Lane
Date:
Nikita Malakhov <hukutoc@gmail.com> writes:
> Please advise on the idea of preserving pg_proc oids during pg_upgrade, in
> a way like relfilenodes, type id and so on. What are possible downsides of
> such a solution?

You have the burden of proof backwards.  That would add a great deal
of new mechanism, and you haven't provided even one reason why it'd
be worth doing.

            regards, tom lane



Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Nikita Malakhov
Date:
Hi!

Say, we have data processed by some user function and we want to keep reference to this function
in our data. In this case we have two ways - first - store string output of regprocedure, which is not
very convenient, and the second - store its OID, which requires slight modification of pg_upgrade
(pg_dump and func/procedure creation function).

I've read previous threads about using regproc, and agree that this is not a very good case anyway,
but I haven't found any serious obstacles that forbid modifying pg_upgrade this way.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
"David G. Johnston"
Date:
On Thu, Oct 12, 2023 at 9:57 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Say, we have data processed by some user function and we want to keep reference to this function
in our data.

Then you need to keep the user-visible identifier of said function (schema+name+input argument types - you'd probably want to incorporate version into the name) in your user-space code.  Exposing runtime generated oids to user-space is not something I can imagine the system supporting.  It goes against the very definition of "implementation detail" that user-space code is not supposed to depend upon.

David J.

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
"David G. Johnston"
Date:
On Thu, Oct 12, 2023 at 7:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nikita Malakhov <hukutoc@gmail.com> writes:
> Please advise on the idea of preserving pg_proc oids during pg_upgrade, in
> a way like relfilenodes, type id and so on. What are possible downsides of
> such a solution?

You have the burden of proof backwards.  That would add a great deal
of new mechanism, and you haven't provided even one reason why it'd
be worth doing.


I was curious about the comment regarding type oids being copied over and I found the commentary in pg_upgrade.c that describes which oids are copied over and why, but the IMPLEMENTATION seems to be out-of-sync with the actual implementation.

"""
It preserves the relfilenode numbers so TOAST and other references
to relfilenodes in user data is preserved.  (See binary-upgrade usage
in pg_dump). We choose to preserve tablespace and database OIDs as well.
"""

David J.

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Robert Haas
Date:
On Thu, Oct 12, 2023 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You have the burden of proof backwards.  That would add a great deal
> of new mechanism, and you haven't provided even one reason why it'd
> be worth doing.

"A great deal of new mechanism" seems like a slight exaggeration. We
preserve a bunch of kinds of OIDs already, and it wouldn't be any
harder to preserve this one than the ones we preserve already, or so I
think. So it would be some additional mechanism, but maybe not a great
deal.

As to whether it's a good idea, it isn't necessary for the system to
operate properly, so we didn't, but it's a judgement call whether it's
better for other reasons, like being able to have regprocedure columns
survive an upgrade, or making users being less confused, or allowing
people supporting PostgreSQL having an easier time debugging issues.
Personally, I've never been quite sure we made the right decision
there. I admit that I'm not particularly keen to try to add the amount
of mechanism that would be required to preserve every single OID
everywhere, but I also somehow feel like the fact that we don't is
pretty weird.

The pg_upgrade experience right now is a bit as if you woke up in the
morning and found that city officials came by during the night and
renumbered your house, thus changing your address. Then, they sent
change of address forms to everyone who ever mails you anything, plus
updated your address with your doctor's office and your children's
school. In a way, there's no problem: nothing has really changed for
you in any way that matters. Yet, I think that would feel pretty
uncomfortable if it actually happened to you, and I think the
pg_upgrade experience is uncomfortable in the same way.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
"David G. Johnston"
Date:
On Thu, Oct 12, 2023, 11:21 Robert Haas <robertmhaas@gmail.com> wrote:

The pg_upgrade experience right now is a bit as if you woke up in the
morning and found that city officials came by during the night and
renumbered your house, thus changing your address. Then, they sent
change of address forms to everyone who ever mails you anything, plus
updated your address with your doctor's office and your children's
school. In a way, there's no problem: nothing has really changed for
you in any way that matters. Yet, I think that would feel pretty
uncomfortable if it actually happened to you, and I think the
pg_upgrade experience is uncomfortable in the same way.

It's more like a lot number or surveying tract than an postal address.  Useful for a single party, the builder or the government, but not something you give out to other people so they can find you.

Whether or not we copy over oids should be done based upon our internal needs, not end users.  Which is why the fee that do get copied exists, because we store them in internal files that we want to copy as part of the upgrade.  It also isn't like pg_dump/restore is going to retain them and the less divergence between that and pg_upgrade arguably the better.

David J.

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Robert Haas
Date:
On Thu, Oct 12, 2023 at 2:38 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> It's more like a lot number or surveying tract than an postal address.  Useful for a single party, the builder or the
government,but not something you give out to other people so they can find you. 
>
> Whether or not we copy over oids should be done based upon our internal needs, not end users.  Which is why the fee
thatdo get copied exists, because we store them in internal files that we want to copy as part of the upgrade.  It also
isn'tlike pg_dump/restore is going to retain them and the less divergence between that and pg_upgrade arguably the
better.

We build the product for the end users. Their desires and needs are
relevant. And if they're telling us we did it wrong, we need to listen
to that. We don't have to do everything that everybody wants, but
treating developer needs as strictly more important than end-user
needs is self-defeating.

I agree that there's a trade-off here. Preserving more OIDs requires
more code and makes pg_dump and other things more complicated, which
is not great. But, at least to me, arguing that there are no downsides
of not preserving these OIDs is simply not a believable argument.

Well, maybe somebody believes it. But I don't.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
"David G. Johnston"
Date:
On Thu, Oct 12, 2023 at 11:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 12, 2023 at 2:38 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> It's more like a lot number or surveying tract than an postal address.  Useful for a single party, the builder or the government, but not something you give out to other people so they can find you.
>
> Whether or not we copy over oids should be done based upon our internal needs, not end users.  Which is why the fee that do get copied exists, because we store them in internal files that we want to copy as part of the upgrade.  It also isn't like pg_dump/restore is going to retain them and the less divergence between that and pg_upgrade arguably the better.

We build the product for the end users. Their desires and needs are
relevant. And if they're telling us we did it wrong, we need to listen
to that. We don't have to do everything that everybody wants, but
treating developer needs as strictly more important than end-user
needs is self-defeating.

Every catalog has both a natural and a surrogate key.  Developers get to use the surrogate key while end-users get to use the natural one (i.e., the one they provided).  I see no reason to change that specification.  And I do believe there are no compelling reasons for an end-user to need to use the surrogate key instead of the natural one.  The example provided by the OP isn't one, IMO, the overall goal can be accomplished via the natural key (if it cannot, maybe we need to make retrieving the natural key for a pg_proc record given an OID easier).  The fact that OIDs are not even accessible via SQL further reinforces this belief.  The only reason to need OIDs as a DBA is to perform joins among the catalogs and all such joins are local to the database and even session executing them - the specific values are immaterial.

The behavior of pg_upgrade only preserving OIDs that are necessary due to the physical copying of data files from the old server to the new one seems sufficient both in terms of effort and the principle of doing the minimum amount to solve the problem at hand.

David J.

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Robert Haas
Date:
On Thu, Oct 12, 2023 at 3:36 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Every catalog has both a natural and a surrogate key.  Developers get to use the surrogate key while end-users get to
usethe natural one (i.e., the one they provided).  I see no reason to change that specification. 

I agree with this.

> And I do believe there are no compelling reasons for an end-user to need to use the surrogate key instead of the
naturalone. 

But I disagree with this.

> The example provided by the OP isn't one, IMO, the overall goal can be accomplished via the natural key (if it
cannot,maybe we need to make retrieving the natural key for a pg_proc record given an OID easier).  The fact that OIDs
arenot even accessible via SQL further reinforces this belief.  The only reason to need OIDs as a DBA is to perform
joinsamong the catalogs and all such joins are local to the database and even session executing them - the specific
valuesare immaterial. 

This just all seems very simplistic to me. In theory it's true, but in
practice it isn't.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Nikita Malakhov
Date:
Hi,

I've already implemented preserving PG_PROC oids during pg_upgrade
in a way like relfilenodes, etc, actually, it is quite simple, and on the first
look there are no any problems.

About using surrogate key - this feature is more for data generated by
the DBMS itself, i.e. data processed by some extension and saved
and re-processed automatically or by user's request, but without bothering
user with these internal keys.

The main question - maybe, are there pitfalls of which I am not aware of?

Thanks for your replies!

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
"David G. Johnston"
Date:
On Thu, Oct 12, 2023 at 1:31 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
About using surrogate key - this feature is more for data generated by
the DBMS itself, i.e. data processed by some extension and saved
and re-processed automatically or by user's request, but without bothering
user with these internal keys.

Then what does it matter whether you spell it:

12345
or
my_ext.do_something(int)
?

Why do you require us to redefine the scope for which pg_proc.oid is useful in order to implement this behavior?

Your extension breaks if your user uses logical backups or we otherwise get into a position where pg_upgrade cannot be used to migrate in the future.  Is avoiding the textual representation so necessary that you need to add another dependency to the system?  That just seems unwise regardless of how easy it may be to accomplish.

David J.

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Nikita Malakhov
Date:
Hi,

Textual representation requires a long text field because it could contain schema,
arguments, it is difficult and not effective to be saved as part of the data, and must
be parsed to retrieve function oid. By using direct oid (actually, a value
of the regprocedure field) we avoid it and function could be retrieved by pk.

Why pg_upgrade cannot be used? OID preservation logic is already implemented
for several OIDs in catalog tables, like pg_class, type, relfilenode, enum...
I've mentioned twice that this logic is already implemented and I haven't encountered
any problems with pg_upgrade.

Actually, I've asked here because there are several references to PG_PROC oids
from other tables in the system catalog, so I was worried if this logic could break
something I do not know about.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
"David G. Johnston"
Date:
On Thu, Oct 12, 2023 at 2:58 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Why pg_upgrade cannot be used?

We document both a pg_dump/pg_restore migration and a pg_upgrade one (not to mention that logical backup and restore would cause the oids to change).  It seems odd to have a feature that requires pg_upgrade to be the chosen one.  pg_upgrade is an option, not a requirement.  Same goes for pg_basebackup.

pg_upgrade itself warns that should the on-disk file format change then it would be unusable - though I suspect that we'd end up with some kind of hybrid approach in that case.
 
OID preservation logic is already implemented
for several OIDs in catalog tables, like pg_class, type, relfilenode, enum...


We are allowed to preserve oids if we wish but that doesn't mean we must, nor does doing so constitute a declaration that such oids are part of the public API.  And I don't see us making OIDs part of the public API unless we modify pg_dump to include them in its output.


Actually, I've asked here because there are several references to PG_PROC oids
from other tables in the system catalog

Of course there are, e.g., views depending on functions would result is those.  But pg_upgrade et al. recomputes the views so the changing of oids isn't a problem.

Long text fields are common in databases; and if there are concerns with parsing/interpretation we can add functions to make doing that simpler.

David J.

Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Laurenz Albe
Date:
On Thu, 2023-10-12 at 19:56 +0300, Nikita Malakhov wrote:
> Say, we have data processed by some user function and we want to keep reference to this function
> in our data. In this case we have two ways - first - store string output of regprocedure, which is not
> very convenient, and the second - store its OID, which requires slight modification of pg_upgrade
> (pg_dump and func/procedure creation function).

So far, we have lived quite well with the rule "don't store any system OIDs in the database
if you want to pg_upgrade" (views on system objects, reg* data types, ...).

What is inconvenient about storing the output of regprocedure?

Yours,
Laurenz Albe



Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Alvaro Herrera
Date:
On 2023-Oct-13, Nikita Malakhov wrote:

> Textual representation requires a long text field because it could
> contain schema, arguments, it is difficult and not effective to be
> saved as part of the data, and must be parsed to retrieve function
> oid.

It is worse than that: the regproc textual representation depends on
search_path.  If you store the text now, the meaning could change later,
depending on the search_path that applies at read time.

Of course, the storage for OID is much shorter and not subject to this
problem; but it is subject to the problem that it breaks if you drop and
replace the function, which could happen for instance in an extensions
upgrade script.

I think a better way to store a function's identity is to store the
'identity' column from pg_identify_object().  It is fully qualified and
you can cast to regprocedure with no ambiguity (which gives you an OID,
if you need one).  And it should upgrade cleanly.

If you have a regproc column that you want to upgrade, maybe it would
work to do 'ALTER TABLE .. SET TYPE TEXT USING' and turn the value into
pg_identify_object().identity.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Pro et contra of preserving pg_proc oids during pg_upgrade

From
Nikita Malakhov
Date:
Hi,

Thank you very much, I'll check it out. It looks like the getObjectIdentity() used in
pg_identify_object() could do.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company