From: Andrew Dunstan
Subject: Re: pg_dump: largeobject behavior issues (possible bug)
Msg-id: 553BAB5A.9010707@dunslane.net
In response to: Re: pg_dump: largeobject behavior issues (possible bug) (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 04/24/2015 06:41 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 04/23/2015 04:04 PM, Andrew Gierth wrote:
>>> The relevant code is getBlobs in pg_dump.c, which queries the whole of
>>> pg_largeobject_metadata without using a cursor (so the PGresult is
>>> already huge thanks to having >100 million rows), and then mallocs a
>>> BlobInfo array and populates it from the PGresult, also using pg_strdup
>>> for the oid string, owner name, and ACL if any.
>> I'm surprised this hasn't come up before. I have a client that I
>> persuaded to convert all their LOs to bytea fields because of problems
>> with pg_dump handling millions of LOs, and kept them on an older
>> postgres version until they made that change.
> Yeah, this was brought up when we added per-large-object metadata; it was
> obvious that that patch would cause pg_dump to choke on large numbers of
> large objects.  The (perhaps rather lame) argument was that you wouldn't
> have that many of them.
>
> Given that large objects don't have any individual dependencies,
> one could envision fixing this by replacing the individual large-object
> DumpableObjects by a single placeholder to participate in the sort phase,
> and then when it's time to dump that, scan the large objects using a
> cursor and create/print/delete the information separately for each one.
> This would likely involve some rather painful refactoring in pg_dump
> however.
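
In rough outline, that cursor-based scan might look something like the 
sketch below. This is purely illustrative, not pg_dump's actual code: it 
assumes libpq, a batch size of 1000, and the pg_largeobject_metadata 
columns oid, lomowner, and lomacl, and it elides all the real 
create/print/delete bookkeeping.

#include <stdio.h>
#include <libpq-fe.h>

static void
scan_blob_metadata(PGconn *conn)
{
    PGresult   *res;

    /* A cursor needs a transaction block. */
    res = PQexec(conn, "BEGIN");
    PQclear(res);

    /* Declare a cursor so we never materialize all rows at once. */
    res = PQexec(conn,
                 "DECLARE bloboid CURSOR FOR "
                 "SELECT oid, lomowner, lomacl "
                 "FROM pg_largeobject_metadata");
    PQclear(res);

    for (;;)
    {
        int         ntups,
                    i;

        res = PQexec(conn, "FETCH 1000 FROM bloboid");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }

        ntups = PQntuples(res);
        if (ntups == 0)
        {
            PQclear(res);
            break;              /* cursor exhausted */
        }

        for (i = 0; i < ntups; i++)
        {
            /*
             * Here pg_dump would create, dump, and immediately discard
             * the per-LO information, instead of holding a BlobInfo
             * array for every large object at once.
             */
            printf("large object %s (owner oid %s)\n",
                   PQgetvalue(res, i, 0),
                   PQgetvalue(res, i, 1));
        }
        PQclear(res);
    }

    res = PQexec(conn, "CLOSE bloboid");
    PQclear(res);
    res = PQexec(conn, "COMMIT");
    PQclear(res);
}

int
main(void)
{
    PGconn     *conn = PQconnectdb("");  /* params from environment */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }
    scan_blob_metadata(conn);
    PQfinish(conn);
    return 0;
}

The point being that memory use is then bounded by the fetch batch size 
rather than by the total number of LOs.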


I think we need to think about this some more. TBH, I'm not convinced 
that the changes made back in 9.0 were well conceived. Having separate 
TOC entries for each LO seems wrong in principle, although I understand 
why it was done. For now, my advice would be to avoid pg_dump/pg_restore 
if you have large numbers of LOs. The good news is that these days there 
are alternative methods of doing backup / restore, albeit none of them 
100% equivalent to pg_dump / pg_restore.

One useful thing might be to provide pg_dump with 
--no-blobs/--blobs-only switches so you could at least easily segregate 
the blobs into their own dump file. That would be in addition to dealing 
with the memory problems pg_dump has with millions of LOs, of course.
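
If we had those switches (hypothetical for now -- the option names here 
are just the proposal above), splitting a dump might look like:

    pg_dump -Fc --no-blobs -f schema_and_data.dump mydb
    pg_dump -Fc --blobs-only -f blobs.dump mydb

and the two files could then be restored independently with pg_restore.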


cheers

andrew


