Thread: Pro et contra of preserving pg_proc oids during pg_upgrade
Hi hackers!
Please advise on the idea of preserving pg_proc oids during pg_upgrade, in a way like relfilenodes, type id and so on. What are possible downsides of such a solution?
--
Thanks!
Nikita Malakhov <hukutoc@gmail.com> writes: > Please advise on the idea of preserving pg_proc oids during pg_upgrade, in > a way like relfilenodes, type id and so on. What are possible downsides of > such a solution? You have the burden of proof backwards. That would add a great deal of new mechanism, and you haven't provided even one reason why it'd be worth doing. regards, tom lane
Hi!
Say, we have data processed by some user function and we want to keep reference to this function
in our data. In this case we have two ways - first - store string output of regprocedure, which is not
very convenient, and the second - store its OID, which requires slight modification of pg_upgrade
(pg_dump and func/procedure creation function).
I've read previous threads about using regproc, and agree that this is not a very good case anyway,
but I haven't found any serious obstacles that forbid modifying pg_upgrade this way.
--
On Thu, Oct 12, 2023 at 9:57 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Say, we have data processed by some user function and we want to keep reference to this functionin our data.
Then you need to keep the user-visible identifier of said function (schema+name+input argument types - you'd probably want to incorporate version into the name) in your user-space code. Exposing runtime generated oids to user-space is not something I can imagine the system supporting. It goes against the very definition of "implementation detail" that user-space code is not supposed to depend upon.
David J.
On Thu, Oct 12, 2023 at 7:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nikita Malakhov <hukutoc@gmail.com> writes:
> Please advise on the idea of preserving pg_proc oids during pg_upgrade, in
> a way like relfilenodes, type id and so on. What are possible downsides of
> such a solution?
You have the burden of proof backwards. That would add a great deal
of new mechanism, and you haven't provided even one reason why it'd
be worth doing.
I was curious about the comment regarding type oids being copied over and I found the commentary in pg_upgrade.c that describes which oids are copied over and why, but the IMPLEMENTATION seems to be out-of-sync with the actual implementation.
"""
It preserves the relfilenode numbers so TOAST and other references
to relfilenodes in user data is preserved. (See binary-upgrade usage
in pg_dump). We choose to preserve tablespace and database OIDs as well.
to relfilenodes in user data is preserved. (See binary-upgrade usage
in pg_dump). We choose to preserve tablespace and database OIDs as well.
"""
David J.
On Thu, Oct 12, 2023 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > You have the burden of proof backwards. That would add a great deal > of new mechanism, and you haven't provided even one reason why it'd > be worth doing. "A great deal of new mechanism" seems like a slight exaggeration. We preserve a bunch of kinds of OIDs already, and it wouldn't be any harder to preserve this one than the ones we preserve already, or so I think. So it would be some additional mechanism, but maybe not a great deal. As to whether it's a good idea, it isn't necessary for the system to operate properly, so we didn't, but it's a judgement call whether it's better for other reasons, like being able to have regprocedure columns survive an upgrade, or making users being less confused, or allowing people supporting PostgreSQL having an easier time debugging issues. Personally, I've never been quite sure we made the right decision there. I admit that I'm not particularly keen to try to add the amount of mechanism that would be required to preserve every single OID everywhere, but I also somehow feel like the fact that we don't is pretty weird. The pg_upgrade experience right now is a bit as if you woke up in the morning and found that city officials came by during the night and renumbered your house, thus changing your address. Then, they sent change of address forms to everyone who ever mails you anything, plus updated your address with your doctor's office and your children's school. In a way, there's no problem: nothing has really changed for you in any way that matters. Yet, I think that would feel pretty uncomfortable if it actually happened to you, and I think the pg_upgrade experience is uncomfortable in the same way. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Oct 12, 2023, 11:21 Robert Haas <robertmhaas@gmail.com> wrote:
The pg_upgrade experience right now is a bit as if you woke up in the
morning and found that city officials came by during the night and
renumbered your house, thus changing your address. Then, they sent
change of address forms to everyone who ever mails you anything, plus
updated your address with your doctor's office and your children's
school. In a way, there's no problem: nothing has really changed for
you in any way that matters. Yet, I think that would feel pretty
uncomfortable if it actually happened to you, and I think the
pg_upgrade experience is uncomfortable in the same way.
It's more like a lot number or surveying tract than an postal address. Useful for a single party, the builder or the government, but not something you give out to other people so they can find you.
Whether or not we copy over oids should be done based upon our internal needs, not end users. Which is why the fee that do get copied exists, because we store them in internal files that we want to copy as part of the upgrade. It also isn't like pg_dump/restore is going to retain them and the less divergence between that and pg_upgrade arguably the better.
David J.
On Thu, Oct 12, 2023 at 2:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > It's more like a lot number or surveying tract than an postal address. Useful for a single party, the builder or the government,but not something you give out to other people so they can find you. > > Whether or not we copy over oids should be done based upon our internal needs, not end users. Which is why the fee thatdo get copied exists, because we store them in internal files that we want to copy as part of the upgrade. It also isn'tlike pg_dump/restore is going to retain them and the less divergence between that and pg_upgrade arguably the better. We build the product for the end users. Their desires and needs are relevant. And if they're telling us we did it wrong, we need to listen to that. We don't have to do everything that everybody wants, but treating developer needs as strictly more important than end-user needs is self-defeating. I agree that there's a trade-off here. Preserving more OIDs requires more code and makes pg_dump and other things more complicated, which is not great. But, at least to me, arguing that there are no downsides of not preserving these OIDs is simply not a believable argument. Well, maybe somebody believes it. But I don't. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Oct 12, 2023 at 11:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 12, 2023 at 2:38 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> It's more like a lot number or surveying tract than an postal address. Useful for a single party, the builder or the government, but not something you give out to other people so they can find you.
>
> Whether or not we copy over oids should be done based upon our internal needs, not end users. Which is why the fee that do get copied exists, because we store them in internal files that we want to copy as part of the upgrade. It also isn't like pg_dump/restore is going to retain them and the less divergence between that and pg_upgrade arguably the better.
We build the product for the end users. Their desires and needs are
relevant. And if they're telling us we did it wrong, we need to listen
to that. We don't have to do everything that everybody wants, but
treating developer needs as strictly more important than end-user
needs is self-defeating.
Every catalog has both a natural and a surrogate key. Developers get to use the surrogate key while end-users get to use the natural one (i.e., the one they provided). I see no reason to change that specification. And I do believe there are no compelling reasons for an end-user to need to use the surrogate key instead of the natural one. The example provided by the OP isn't one, IMO, the overall goal can be accomplished via the natural key (if it cannot, maybe we need to make retrieving the natural key for a pg_proc record given an OID easier). The fact that OIDs are not even accessible via SQL further reinforces this belief. The only reason to need OIDs as a DBA is to perform joins among the catalogs and all such joins are local to the database and even session executing them - the specific values are immaterial.
The behavior of pg_upgrade only preserving OIDs that are necessary due to the physical copying of data files from the old server to the new one seems sufficient both in terms of effort and the principle of doing the minimum amount to solve the problem at hand.
David J.
On Thu, Oct 12, 2023 at 3:36 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > Every catalog has both a natural and a surrogate key. Developers get to use the surrogate key while end-users get to usethe natural one (i.e., the one they provided). I see no reason to change that specification. I agree with this. > And I do believe there are no compelling reasons for an end-user to need to use the surrogate key instead of the naturalone. But I disagree with this. > The example provided by the OP isn't one, IMO, the overall goal can be accomplished via the natural key (if it cannot,maybe we need to make retrieving the natural key for a pg_proc record given an OID easier). The fact that OIDs arenot even accessible via SQL further reinforces this belief. The only reason to need OIDs as a DBA is to perform joinsamong the catalogs and all such joins are local to the database and even session executing them - the specific valuesare immaterial. This just all seems very simplistic to me. In theory it's true, but in practice it isn't. -- Robert Haas EDB: http://www.enterprisedb.com
Hi,
I've already implemented preserving PG_PROC oids during pg_upgrade
in a way like relfilenodes, etc, actually, it is quite simple, and on the first
look there are no any problems.
About using surrogate key - this feature is more for data generated by
the DBMS itself, i.e. data processed by some extension and saved
and re-processed automatically or by user's request, but without bothering
user with these internal keys.
The main question - maybe, are there pitfalls of which I am not aware of?
Thanks for your replies!
On Thu, Oct 12, 2023 at 1:31 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
About using surrogate key - this feature is more for data generated bythe DBMS itself, i.e. data processed by some extension and savedand re-processed automatically or by user's request, but without botheringuser with these internal keys.
Then what does it matter whether you spell it:
12345
or
my_ext.do_something(int)
?
Why do you require us to redefine the scope for which pg_proc.oid is useful in order to implement this behavior?
Your extension breaks if your user uses logical backups or we otherwise get into a position where pg_upgrade cannot be used to migrate in the future. Is avoiding the textual representation so necessary that you need to add another dependency to the system? That just seems unwise regardless of how easy it may be to accomplish.
David J.
Hi,
Textual representation requires a long text field because it could contain schema,
arguments, it is difficult and not effective to be saved as part of the data, and must
be parsed to retrieve function oid. By using direct oid (actually, a value
of the regprocedure field) we avoid it and function could be retrieved by pk.
Why pg_upgrade cannot be used? OID preservation logic is already implemented
for several OIDs in catalog tables, like pg_class, type, relfilenode, enum...
I've mentioned twice that this logic is already implemented and I haven't encountered
any problems with pg_upgrade.
Actually, I've asked here because there are several references to PG_PROC oids
from other tables in the system catalog, so I was worried if this logic could break
something I do not know about.
--
On Thu, Oct 12, 2023 at 2:58 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Why pg_upgrade cannot be used?
We document both a pg_dump/pg_restore migration and a pg_upgrade one (not to mention that logical backup and restore would cause the oids to change). It seems odd to have a feature that requires pg_upgrade to be the chosen one. pg_upgrade is an option, not a requirement. Same goes for pg_basebackup.
pg_upgrade itself warns that should the on-disk file format change then it would be unusable - though I suspect that we'd end up with some kind of hybrid approach in that case.
OID preservation logic is already implementedfor several OIDs in catalog tables, like pg_class, type, relfilenode, enum...
We are allowed to preserve oids if we wish but that doesn't mean we must, nor does doing so constitute a declaration that such oids are part of the public API. And I don't see us making OIDs part of the public API unless we modify pg_dump to include them in its output.
Actually, I've asked here because there are several references to PG_PROC oidsfrom other tables in the system catalog
Of course there are, e.g., views depending on functions would result is those. But pg_upgrade et al. recomputes the views so the changing of oids isn't a problem.
Long text fields are common in databases; and if there are concerns with parsing/interpretation we can add functions to make doing that simpler.
David J.
On Thu, 2023-10-12 at 19:56 +0300, Nikita Malakhov wrote: > Say, we have data processed by some user function and we want to keep reference to this function > in our data. In this case we have two ways - first - store string output of regprocedure, which is not > very convenient, and the second - store its OID, which requires slight modification of pg_upgrade > (pg_dump and func/procedure creation function). So far, we have lived quite well with the rule "don't store any system OIDs in the database if you want to pg_upgrade" (views on system objects, reg* data types, ...). What is inconvenient about storing the output of regprocedure? Yours, Laurenz Albe
On 2023-Oct-13, Nikita Malakhov wrote: > Textual representation requires a long text field because it could > contain schema, arguments, it is difficult and not effective to be > saved as part of the data, and must be parsed to retrieve function > oid. It is worse than that: the regproc textual representation depends on search_path. If you store the text now, the meaning could change later, depending on the search_path that applies at read time. Of course, the storage for OID is much shorter and not subject to this problem; but it is subject to the problem that it breaks if you drop and replace the function, which could happen for instance in an extensions upgrade script. I think a better way to store a function's identity is to store the 'identity' column from pg_identify_object(). It is fully qualified and you can cast to regprocedure with no ambiguity (which gives you an OID, if you need one). And it should upgrade cleanly. If you have a regproc column that you want to upgrade, maybe it would work to do 'ALTER TABLE .. SET TYPE TEXT USING' and turn the value into pg_identify_object().identity. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi,
Thank you very much, I'll check it out. It looks like the getObjectIdentity() used in
pg_identify_object() could do.
--