TODO item: teach pg_dump about sparsely-stored large objects - Mailing list pgsql-hackers

From Tom Lane
Subject TODO item: teach pg_dump about sparsely-stored large objects
Date
Msg-id 18789.1349750451@sss.pgh.pa.us
Whole thread Raw
Responses Re: TODO item: teach pg_dump about sparsely-stored large objects  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
The backend code for large objects goes to some lengths to be
intelligent about sparsely-written blobs: if you seek out to the middle
of nowhere and write a few bytes, you don't end up allocating space in
pg_largeobject for all the byte positions you skipped over.  However,
pg_dump knows nothing of this.  If you pg_dump a sparsely-stored large
object, it will tediously transfer all those nonexistent zeros from
server to client, and write them into the resulting archive.  And then
when you restore, the blob isn't sparse anymore ... those zeroes become
non-virtual on the database side too.

Admittedly, this is no different than what happens when you try to back
up a sparsely-stored Unix file, at least with simpler backup tools.
But it seems to me we should try a bit harder.

There are a couple of stumbling blocks to making that happen:

* How should pg_dump find out where there are holes?  It would be easy
if it were to look into pg_largeobject, but that would destroy the
ability to use pg_dump as non-superuser.  I think we'd really have to
provide some API to read from a blob in a sparse-storage-aware manner.
The first idea that comes to mind is some way to tell lo_read to stop
reading when it hits a gap (instead of manufacturing zeroes) and then
a new "whence" option for lo_lseek that tells it to seek to the next
non-dummy data in the blob.

* How do we get pg_dump to make use of the knowledge once it's got it?
The current code in that area is a masterpiece of ugly unreadability;
it's near impossible to tell what connects to what else, and there are
assorted magic switches that completely change the behavior of major
interface functions.  I'd kind of want to rewrite the whole mess before
trying to change its behavior.  I'm also pretty certain that we'd need
an archive format change, though we've certainly done those before so
that isn't a fatal objection.

I'm not planning to do anything about this myself, but if someone is
looking for a project, here's one.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Joachim Wieland
Date:
Subject: Re: Add FET to Default and Europe.txt
Next
From: Noah Misch
Date:
Subject: Re: Visual Studio 2012 RC