On Tue, Apr 08, 2025 at 01:07:09PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> I do think it's worth considering going back to copying
>> pg_largobject_metadata's files for upgrades from v16 and newer.
>
> (If we do this) I don't see why we'd need to stop at v16. I'm
> envisioning that we'd use COPY, which will be dealing in the
> text representation of aclitems, and I don't think that's changed
> in a long time. The sort of thing that would break it is changes
> in the set of available/default privilege bits for large objects.
>
> That is, where the dump currently contains something like
>
> SELECT pg_catalog.lo_create('2121');
> ALTER LARGE OBJECT 2121 OWNER TO postgres;
> GRANT ALL ON LARGE OBJECT 2121 TO joe;
>
> we'd have
>
> COPY pg_largeobject_metadata FROM STDIN;
> ...
> 2121 10 {postgres=rw/postgres,joe=rw/postgres}
> ...
>
> and some appropriate COPY data for pg_shdepend too.
Attached is a proof-of-concept grade patch for using COPY for
pg_largeobject_metadata and the relevant pg_shdepend entries. On my
laptop, pg_upgrade with 10M LOs (each with a non-bootstrap-superuser owner
and another role with SELECT rights) goes from ~8.5 minutes to ~1 minute
with this patch.
I originally set out to invent a new lo_create_with_owner() function and
teach pg_dump to batch those together in large groups, but as I started the
required pg_dump surgery, I was quickly scared away by the complexity.
Next, I gave COPY a try. The improvements from using COPY will likely be
limited to the pg_upgrade case, but that's the only case I regularly hear
complaints about for zillions of large objects, so maybe it's good enough
for now.
For the COPY approach, I modified pg_dump to dump the contents of
pg_largeobject_metadata. This is easy enough, but I ran into problems with
the dependent comments and security labels. It turns out that even before
v12, we run all the lo_create() commands just so that creating the comments
and security labels works (AFAICT). So I suspect upgrading with many large
objects has always been slow. The comment/security label dependency issue
can be fixed (at least well enough for the tests) by moving
PRIO_LARGE_OBJECT below PRIO_TABLE_DATA. There might be an existing issue
here, because dbObjectTypePriorities has the following comment:
* NOTE: object-type priorities must match the section assignments made in
* pg_dump.c; that is, PRE_DATA objects must sort before DO_PRE_DATA_BOUNDARY,
* POST_DATA objects must sort after DO_POST_DATA_BOUNDARY, and DATA objects
* must sort between them.
But dumpLO() puts large objects in SECTION_DATA, and PRIO_LARGE_OBJECT is
before PRIO_PRE_DATA_BOUNDARY. I admittedly haven't spent too much time
investigating this, though. In any case, it might be a good idea to also
make sure we explicitly mark the large objects and their comments/seclabels
as dependent on the pg_largeobject_metadata data.
That leaves pg_shdepend. For now, I've just instructed pg_upgrade to COPY
the relevant pg_shdepend rows as an independent step, but perhaps there's a
reasonably straightforward way to put that in pg_dump, too.
--
nathan