Re: pg_upgrade: transfer pg_largeobject_metadata's files when possible - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg_upgrade: transfer pg_largeobject_metadata's files when possible
Date
Msg-id t3zu4hyzwqws4puaqd2lhdlw7yp2ckgswuaaizmfi6seto3x4w@5exxg3kazkos
In response to Re: pg_upgrade: transfer pg_largeobject_metadata's files when possible  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: pg_upgrade: transfer pg_largeobject_metadata's files when possible
Re: pg_upgrade: transfer pg_largeobject_metadata's files when possible
List pgsql-hackers
Hi,

On 2026-02-05 13:31:23 -0600, Nathan Bossart wrote:
> On Thu, Feb 05, 2026 at 01:02:17PM -0500, Andres Freund wrote:
> > Upthread I was also wondering why we do all the work in getLOs() if we don't
> > actually need most of it (only if there are comments or labels).  Right now
> > that's a very slow and very memory intensive part of doing an upgrade of a
> > system with a lot of large objects.  Do we need *any* of that if we go the
> > path you suggest?
>
> AFAICT we only need it for the comments and security labels later on.

And in binary upgrade mode we wouldn't even need it for that, if we do the thing
we talked about re not checking references in binary upgrade mode? Right?


> Commit a45c78e3 did batch 1000 large objects into each ArchiveEntry, but of
> course there can still be a ton of entries.

Entries and then also the PGresult (almost as large as all the entries). The
peak memory usage is pretty bad. We could address the PGresult memory usage
with PQsetChunkedRowsMode() or explicit cursor use, but it doesn't seem
entirely trivial to combine with the batching.
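For the explicit-cursor variant, something like this (untested sketch, chunk
size arbitrary) would keep each PGresult bounded instead of materializing all
rows at once:

    -- Sketch only: fetch pg_largeobject_metadata in bounded chunks via a cursor
    -- so no single PGresult has to hold all 5M rows; 1000 is an arbitrary size.
    BEGIN;
    DECLARE lo_cur CURSOR FOR
        SELECT oid, lomowner, lomacl FROM pg_largeobject_metadata;
    FETCH FORWARD 1000 FROM lo_cur;   -- repeat until no rows are returned
    CLOSE lo_cur;
    COMMIT;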


> In theory, we could update the pg_largeobject_metadata query to only
> retrieve LOs with comments and security labels.  I'm not sure it's worth
> trying to optimize further than that; we've long operated under the
> assumption that comments/seclabels on LOs are pretty rare.

I think that'd be a huge improvement. Right now it's not hard to end up with so
many LOs that there isn't enough memory to do a pg_upgrade.

Memory usage aside, it's also slow and expensive from the query execution and
data transfer side. Because of the ORDER BY that the batching requires, the
server needs to sort all of pg_largeobject_metadata before any rows can be
returned.
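Roughly this shape of query, judging by the sort key in the plan below (the
real target list has more in it, per the SubPlan):

    -- Sketch of the batching query shape; the ORDER BY forces a full sort of
    -- pg_largeobject_metadata before the first row can be returned.
    SELECT oid, lomowner, lomacl
    FROM pg_largeobject_metadata
    ORDER BY lomowner, lomacl::text, oid;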

For 5M LOs:

                                                           QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Sort  (cost=715978.34..728478.39 rows=5000020 width=72) (actual time=10292.252..10652.950 rows=5000020.00 loops=1)
   Sort Key: pg_largeobject_metadata.lomowner, ((pg_largeobject_metadata.lomacl)::text), pg_largeobject_metadata.oid
   Sort Method: quicksort  Memory: 509110kB
   ->  Seq Scan on pg_largeobject_metadata  (cost=0.00..159638.55 rows=5000020 width=72) (actual time=0.034..2284.442 rows=5000020.00 loops=1)
         SubPlan expr_1
           ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.000 rows=1.00 loops=5000020)
 Planning Time: 0.117 ms
 Serialization: time=3961.343 ms  output=218686kB  format=text
 Execution Time: 14930.747 ms
(9 rows)

/usr/bin/time -v pg_dump --binary-upgrade --no-data --quote-all-identifiers --no-statistics --format=custom -f /tmp/dumplo_5m

    Command being timed: "pg_dump --binary-upgrade --no-data --quote-all-identifiers --no-statistics --format=custom -f /tmp/dumplo_5m"
    User time (seconds): 1.85
    System time (seconds): 0.54
    Percent of CPU this job got: 16%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.55
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 935084
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 133379
    Voluntary context switches: 30486
    Involuntary context switches: 6
    Swaps: 0
    File system inputs: 0
    File system outputs: 16
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Peak memory usage ~1GB.


And unfortunately 5M LOs is not a whole lot.


Filtering the LOs in the query so that it only returns the ones that have a
comment or security label would typically make the query much faster and make
pg_dump consume a lot less memory.
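E.g. something like this (untested sketch; comments and security labels on LOs
live in pg_description / pg_seclabel with classoid = 'pg_largeobject'::regclass):

    -- Sketch: only return large objects that actually have a comment or a
    -- security label attached, instead of every row in pg_largeobject_metadata.
    SELECT l.oid, l.lomowner, l.lomacl
    FROM pg_largeobject_metadata l
    WHERE EXISTS (SELECT 1 FROM pg_description d
                  WHERE d.classoid = 'pg_largeobject'::regclass
                    AND d.objoid = l.oid)
       OR EXISTS (SELECT 1 FROM pg_seclabel s
                  WHERE s.classoid = 'pg_largeobject'::regclass
                    AND s.objoid = l.oid);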


Greetings,

Andres Freund


