Re: pg15b2: large objects lost on upgrade - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: pg15b2: large objects lost on upgrade |
Date | |
Msg-id | CA+TgmoYwaXh_wRRa2CqL4XpM4r6YEbq1+ec=+8b7Dr7-T_tT+Q@mail.gmail.com |
In response to | Re: pg15b2: large objects lost on upgrade (Justin Pryzby <pryzby@telsasoft.com>) |
Responses | Re: pg15b2: large objects lost on upgrade |
List | pgsql-hackers |
On Thu, Jul 7, 2022 at 2:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> We're referring to the new cluster which should have been initdb'd more
> or less immediately before running pg_upgrade [0].
>
> It'd be weird to me if someone were to initdb a new cluster, then create
> some large objects, and then maybe delete them, and then run pg_upgrade.
> Why wouldn't they do their work on the old cluster before upgrading, or
> on the new cluster afterwards?
>
> Does anybody actually do anything significant between initdb and
> pg_upgrade?  Is that considered to be supported?  It seems like
> pg_upgrade could itself run initdb, with the correct options for locale,
> checksum, block size, etc. (although it probably has to support the
> existing way to handle "compatible encodings").
>
> Actually, I think check_new_cluster_is_empty() ought to prohibit doing
> work between initdb and pg_upgrade by checking that no objects have
> *ever* been created in the new cluster, by checking that NextOid ==
> 16384.  But I have a separate thread about "pg_upgrade allows itself to
> be re-run", and this has more to do with that than with whether to check
> that the file is empty before removing it.

I think you're getting at a really good point here, which is also my
point: we assume that nothing significant has happened between when the
cluster was created and when pg_upgrade is run, but we don't check it.
Either we shouldn't assume it, or we should check it.

So, is such activity ever legitimate? I think there are people doing it.
The motivation is that maybe you have a dump from the old database that
doesn't quite restore on the new version, but by doing something to the
new cluster, you can make it restore. For instance, maybe there are some
functions that used to be part of core and are now only available in an
extension. That's going to make pg_upgrade's dump-and-restore workflow
fail, but if you install that extension onto the new cluster, perhaps you
can work around the problem. It doesn't have to be an extension, even.
Maybe some function in core just got an extra argument, and you're using
it, so the calls to that function cause dump-and-restore to fail. You
might try overloading it in the new database with the old calling
sequence to get things working.

Now, are these kinds of things considered to be supported? Well, I don't
know that we've made any statement about that one way or the other.
Perhaps they are not. But I can see why people want to use workarounds
like this. The alternative is having to dump and restore instead of doing
an in-place upgrade, and that's painfully slow by comparison. pg_upgrade
itself doesn't give you any tools to deal with this kind of situation,
but the process is just loose enough to allow people to insert their own
workarounds, so they do. I'm sure I'd do the same, in their shoes.

My view on this is that, while we probably don't want to make such things
officially supported, I don't think we should ban them outright, either.
We probably can't support an upgrade after the new cluster has been
subjected to arbitrary amounts of tinkering, but we're making a mistake
if we write code that relies on fragile assumptions for no really good
reason.
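For concreteness, a check along the lines Justin describes might look
roughly like this. It's only a sketch, not something from the patch: the
function name is invented, and it assumes pg_upgrade's existing
ControlData field chkpnt_nxtoid plus the FirstNormalObjectId constant
(16384) from access/transam.h.

#include "pg_upgrade.h"

#include "access/transam.h"		/* FirstNormalObjectId (16384) */

/*
 * Hypothetical check: refuse to upgrade if the new cluster's OID counter
 * has ever advanced past the initdb default, which would mean objects
 * were created (and possibly dropped again) after initdb.
 */
static void
check_new_cluster_never_assigned_user_oids(void)
{
	prep_status("Checking that the new cluster has assigned no user OIDs");

	if (new_cluster.controldata.chkpnt_nxtoid != FirstNormalObjectId)
		pg_fatal("new cluster is not freshly initdb'd: NextOID is %u, expected %u",
				 new_cluster.controldata.chkpnt_nxtoid,
				 FirstNormalObjectId);

	check_ok();
}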
I think we can do better than this excerpt from your patch, for example:

+       /* Keep track of whether a filenode matches the OID */
+       if (maps[mapnum].relfilenumber == LargeObjectRelationId)
+           *has_lotable = true;
+       if (maps[mapnum].relfilenumber == LargeObjectLOidPNIndexId)
+           *has_loindex = true;

I spent a while struggling to understand this, because it seems to me
that every database has an LO table and an LO index, so what's up with
these names? I think what these names are really tracking is whether the
relfilenumber of pg_largeobject and its index in the old database had
their default values. But this makes the assumption that the LO table and
LO index in the new database have never been subjected to VACUUM FULL or
CLUSTER and, while there's no real reason to do that, I can't quite see
what the point of such an unnecessary and fragile assumption might be. If
Dilip's patch to make relfilenodes 56 bits gets committed, this is going
to break anyway, because with that patch the relfilenode for a table or
index doesn't start out being equal to its OID.

Perhaps a better solution to this particular problem is to remove the
backing files for the large object table and index *before* restoring the
dump, deciding what files to remove by asking the running server for the
file path. It might seem funny to allow for dangling pg_class entries,
but we're going to create that situation for all other user rels anyway,
and pg_upgrade treats pg_largeobject as a user rel.
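A rough sketch of that approach, for a single database, might look like
the following. It is illustrative rather than a worked-out patch: it
assumes pg_upgrade's existing connectToServer() and executeQueryOrDie()
helpers, the function name is invented, and real code would also have to
deal with the LO index, any extra segment files, and every database in
the cluster.

#include <errno.h>
#include <string.h>
#include <unistd.h>

#include "pg_upgrade.h"

/*
 * Hypothetical helper: before restoring the schema, remove the backing
 * file of pg_largeobject in one database of the new cluster, using the
 * path the running server reports, so that the relfilenode coming from
 * the old cluster can be moved into place later.
 */
static void
remove_new_cluster_lo_file(const char *dbname)
{
	PGconn	   *conn = connectToServer(&new_cluster, dbname);
	PGresult   *res;
	char		path[MAXPGPATH];

	res = executeQueryOrDie(conn,
							"SELECT pg_relation_filepath('pg_largeobject'::regclass)");
	snprintf(path, sizeof(path), "%s/%s",
			 new_cluster.pgdata, PQgetvalue(res, 0, 0));
	PQclear(res);
	PQfinish(conn);

	if (unlink(path) != 0 && errno != ENOENT)
		pg_fatal("could not remove file \"%s\": %s", path, strerror(errno));
}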
--
Robert Haas
EDB: http://www.enterprisedb.com