Hi,
On Thursday, 2019-02-28 at 12:30 -0500, Tom Lane wrote:
> Yeah. We've heard about this repeatedly since large objects were
> promoted to be full-fledged objects with individual owners and ACL
> info. It was argued at the time that large objects would be, well,
> large, and therefore there wouldn't be *that* many of them, and so
> it'd be okay for pg_dump to treat them as full-fledged objects with
> their own dump-table-of-contents entries. Evidently, not everybody
> uses them that way :-(.
To give another data point, we also encountered such a problem at a
customer recently.
They have around 70 tables and a database size of 24 GB, but over 20
million rows in pg_largeobject. The directory dump they are using on an
XFS file system contains 14.6 million files, resulting in a directory
inode size of over 450 MB:
|drwx------+ 2 postgres postgres 454M 14. Sep 07:10 xxxxxxxxxtst-20220913
The TOC file is larger than 800 MB and has about 14.6 million lines, all
but around 700 of which are BLOB entries.
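In case it is useful for others hitting the same issue, the number of
large objects (and hence the number of per-object TOC entries and, for a
directory-format dump, of files) can be gauged with a quick sketch like
this (database name taken from the ps output below, adjust as needed):

# One row per large object; each of these currently becomes its own
# TOC entry (and its own file in a directory-format dump).
psql -d XXXXXXXXXTST -Atc "SELECT count(*) FROM pg_largeobject_metadata;"

# pg_largeobject holds the actual data in 2 kB chunks, so its row count
# (the 20 million above) is higher than the number of objects.
psql -d XXXXXXXXXTST -Atc "SELECT count(*) FROM pg_largeobject;"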
Dumping with pg_dump -Fd -j 2 leads to frequent out-of-memory situations
on this node due to the large amount of memory pg_dump needs (and also
because their TSM backup client apparently cannot cope with that many
files and takes up 20% of the RAM as well):
|postgres 43844 73.9 8.8 11711000 11680472 ? R 00:27 297:00 /srv/[...]/bin/pg_dump -Fd -b -f [...]/xxxxxxxxxtst-20220913 -j 2 -Z 1 XXXXXXXXXTST
|postgres 43832 0.0 8.8 11711000 11680236 ? S 00:27 0:03 /srv/[...]/bin/pg_dump -Fd -b -f [...]/xxxxxxxxxtst-20220913 -j 2 -Z 1 XXXXXXXXXTST
|postgres 38251 0.3 8.8 11710680 11683100 ? S 00:21 1:22 /srv/[...]/bin/pg_dump -Fd -b -f [...]/xxxxxxxxxtst-20220913 -j 2 -Z 1 XXXXXXXXXTST
> At some point I think we're going to have to revisit that decision in
> pg_dump. Maybe we could merge things so that we have one TOC entry
> for all blobs that share the same owner and ACL? But it comes up
> seldom enough that nobody's gotten motivated to do the work.
Yeah, it would be good if something could be done about this.
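For what it's worth, how many distinct owner/ACL combinations the large
objects actually fall into (roughly the number of TOC entries such a
merged scheme would need) can be checked with a query along these lines
(just a sketch):

psql -d XXXXXXXXXTST -c "
SELECT lomowner::regrole AS owner,
       lomacl            AS acl,
       count(*)          AS large_objects
FROM   pg_largeobject_metadata
GROUP  BY lomowner, lomacl
ORDER  BY count(*) DESC;"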
> In the short run, the only answer is to run pg_dump on beefier iron,
> or else split up the dump as Haribabu suggests (although TBH I'm
> afraid that might not work either --- if memory serves, pg_dump tends
> to pull in all the per-object info even if it's not going to dump it
> all ...)
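For reference, one way such a split could look (not necessarily what
Haribabu had in mind, and with the caveat quoted above that pg_dump may
collect the per-object info regardless; paths are placeholders):

# Everything except large objects; -B/--no-blobs exists since PostgreSQL 10.
pg_dump -Fd -j 2 -Z 1 -B -f /path/to/maindump XXXXXXXXXTST

# The large objects themselves could then be exported with the
# server-side lo_export() function (restricted to superusers by
# default), although this again produces one file per object:
psql -d XXXXXXXXXTST -Atc "SELECT lo_export(oid, '/path/on/server/lobs/' || oid)
                           FROM pg_largeobject_metadata;"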
In this case, we will try to advise the customer not to use large
objects if possible (so far this is only a test instance).
Michael
--
Michael Banck
Team Lead, PostgreSQL Team
Project Manager