On Thu, 30 Oct 2008, Joshua D. Drake wrote:
>> This reminds me yet again that pg_clearxlogtail should probably get added
>> to the next commitfest for inclusion into 8.4; it's really essential for a
>> WAN-based PITR setup and it would be nice to include it with the
>> distribution.
>
> What is to be gained over just using rsync with -z?

When a new XLOG segment is created, it gets zeroed out first, so that
there's no chance it can accidentally look like a valid segment. But when
an existing segment is recycled, it gets a new header and that's it--the
rest of the 16MB is still left behind from whatever was in that segment
before. That means that even if you write only, say, 1MB of new data to a
recycled segment before a timeout forces you to ship it somewhere else,
there will still be a full 15MB of junk from its previous life, which may
or may not compress well.
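To put rough numbers on the rsync -z question, here is a small Python sketch. It is not anything from PostgreSQL. It assumes the leftover data from the segment's previous life is random bytes, which is the worst case (real old WAL usually compresses somewhat, hence "may or may not" above). zlib's deflate stands in for the compression rsync -z applies.

```python
import random
import zlib

SEG_SIZE = 16 * 1024 * 1024  # one 16MB WAL segment
USED = 1 * 1024 * 1024       # 1MB of new data written before the segment is shipped

rng = random.Random(42)
# ASSUMPTION: model WAL contents as random bytes, the hardest case to compress.
new_data = rng.randbytes(USED)

# Recycled segment: the remaining 15MB still holds data from its previous life.
recycled = new_data + rng.randbytes(SEG_SIZE - USED)
# The same segment after its unused tail has been cleared to zeros.
cleared = new_data + bytes(SEG_SIZE - USED)

recycled_z = len(zlib.compress(recycled))
cleared_z = len(zlib.compress(cleared))
print(f"recycled: {recycled_z:,} bytes, cleared: {cleared_z:,} bytes")
```

Under these assumptions the recycled segment stays at roughly its full 16MB after compression, while the cleared one shrinks to a little over the 1MB of real data.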
I just noticed that this project was recently pushed into pgfoundry;
it's at
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/clearxlogtail/clearxlogtail/
What clearxlogtail does is look inside the WAL segment and clear the
"tail" behind the portion of the file that is actually in use. So our
example file would end up with just the 1MB of useful data, followed by
15MB of zeros that will compress massively. Because it needs to know how
XLogPageHeader is formatted, and a mistake there will silently corrupt
your archive history, it's kind of a scary utility to just download and use.
That's why I'd like to see it turn into a more official contrib module, so
that it never falls out of sync with the page header format and is
available to anyone using PITR.
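A minimal sketch of the tail-clearing idea, in Python. It does NOT use the real XLogPageHeader layout: the 8-byte page-address header and the clear_tail() name are hypothetical, chosen only to show the logic. It assumes 8kB pages (PostgreSQL's default WAL page size) and that a page left over from the segment's previous life can be recognized because its header still records the address it had back then, so the first such page marks where the tail starts. If that check is wrong on real data, valid WAL gets zeroed with no error, which is the silent-corruption risk described above.

```python
import struct

PAGE_SIZE = 8192  # assumed WAL page size (PostgreSQL's default)
# HYPOTHETICAL page header: only an 8-byte little-endian page address.
# This is NOT the real XLogPageHeader format.
HDR = struct.Struct("<Q")


def clear_tail(segment: bytes, seg_start: int) -> bytes:
    """Zero every page from the first stale page to the end of the segment.

    A page counts as current if its (hypothetical) header records the
    address it actually sits at; a recycled page still carries the address
    from its previous life, so the first mismatch marks the start of the tail.
    """
    if len(segment) % PAGE_SIZE:
        raise ValueError("segment length must be a multiple of PAGE_SIZE")
    out = bytearray(segment)
    for off in range(0, len(out), PAGE_SIZE):
        (page_addr,) = HDR.unpack_from(out, off)
        if page_addr != seg_start + off:
            # Everything from here on is leftover junk: replace it with zeros.
            out[off:] = bytes(len(out) - off)
            break
    return bytes(out)
```

The sketch works at page granularity only; any finer-grained handling of a partially used last page would depend on details of the real header format that it does not model.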
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD