Re: Improving compressibility of WAL files - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Improving compressibility of WAL files
Msg-id Pine.GSO.4.64.0901081850420.2578@westnet.com
In response to Re: Improving compressibility of WAL files  (Hannu Krosing <hannu@krosing.net>)
Responses Re: Improving compressibility of WAL files
Re: Improving compressibility of WAL files
List pgsql-hackers
On Fri, 9 Jan 2009, Hannu Krosing wrote:

> won't it still be easier/less intrusive on inline core functionality and
> more flexible to just record end-of-valid-wal somewhere and then let the
> compressor discard the invalid part when compressing and recreate it
> with zeros on decompression ?

I thought at one point the direction this was heading was to provide the
size of the WAL file as a parameter you can use in archive_command: %p
provides the path, %f the file name, and a new %l would provide the
length.  That makes an example archive command something like:

head -c "%l" "%p" | gzip > /mnt/server/archivedir/"%f"

Expanding it back to a full 16MB on the other side might require a
trivial script.  I can't think of a standard UNIX tool suited to that,
but it's easy enough to write.  I'm assuming I'm just remembering someone
else's suggestion here; maybe I just invented the above.  You don't want
to just modify pg_standby to accept small files, because then it becomes
harder to be absolutely sure a file is ready to be processed when it is
being copied non-atomically.  It may also make sense to provide some
simple C implementations of the clear/expand tools in contrib even with
the %l addition, mainly to help out Windows users.
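A minimal sketch of the "trivial script" for both sides, assuming a POSIX
sh plus gzip and `head -c` (the same tools the one-liner above uses).  The
helper names archive_wal and restore_wal are hypothetical, and since %l
does not exist today the valid length is passed in explicitly.  Both sides
write to a temporary name and rename into place, so whatever polls for the
final file name never sees a partially written segment.

```sh
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not PostgreSQL tools.
# Assumes head -c (GNU/BSD) and gzip, as in the archive_command one-liner.
WAL_SEG_SIZE=16777216   # default 16MB WAL segment size

# archive_wal LEN SRC DEST: keep only the first LEN bytes of the segment
# (the value a %l parameter would supply), gzip them, and publish the
# result under DEST with a rename, so a reader never sees a partial file.
archive_wal() {
    len=$1 src=$2 dest=$3
    head -c "$len" "$src" > "$dest.part" &&
        gzip -c "$dest.part" > "$dest.tmp" &&
        mv "$dest.tmp" "$dest"
    status=$?
    rm -f "$dest.part" "$dest.tmp"   # clean up leftovers on any failure
    return "$status"
}

# restore_wal SRC DEST: decompress, then pad with zeros back to a full
# segment, again renaming into place only once the file is complete.
restore_wal() {
    src=$1 dest=$2
    gzip -dc "$src" > "$dest.tmp" || { rm -f "$dest.tmp"; return 1; }
    # $(( )) strips the leading blanks some wc implementations print
    have=$(( $(wc -c < "$dest.tmp") ))
    if [ "$have" -gt "$WAL_SEG_SIZE" ]; then
        echo "restore_wal: $src expands past one WAL segment" >&2
        rm -f "$dest.tmp"
        return 1
    fi
    head -c $((WAL_SEG_SIZE - have)) /dev/zero >> "$dest.tmp" &&
        mv "$dest.tmp" "$dest"
}
```

Dropped into a script file, the restore side would be called from
restore_command with %f and %p supplying the archive file name and target
path; the rename only gives atomicity when the temporary file and the
destination sit on the same filesystem.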

To recap the options I remember coming up over the several rounds of this
discussion, implementations that would meet this general requirement
include:

1) Provide the length as part of the archive command
2) Add a more explicit end-of-WAL delimiter
3) Write zeros to the unused portion in the server
4) pglesslog
5) pg_clearxlogtail

(6) "use sync rep" is not quite a complete answer: there are certainly
WAN-based use cases where you don't want full sync rep but do want the WAL
to compress as much as possible.
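The reason the zero-filling approaches, (3) and (5), help is easy to check
with a quick experiment.  This is a sketch under stated assumptions: a 16MB
segment of which only the first 1MB is taken to be valid WAL, with random
bytes standing in for both the live records and the stale tail left over
from a recycled segment.  The sizes and helper names are illustrative, not
measurements of real WAL.

```sh
#!/bin/sh
# Compare gzip output for a segment whose unused tail is leftover data
# versus one whose tail has been zeroed.  All sizes are assumptions.
SEG=16777216    # 16MB segment
USED=1048576    # pretend only the first 1MB holds valid WAL

# make_segment TAILSOURCE OUTFILE: "valid" head followed by the tail
make_segment() {
    head -c "$USED" /dev/urandom > "$2"
    head -c $((SEG - USED)) "$1" >> "$2"
}

# compressed_size FILE: bytes gzip produces for FILE
compressed_size() {
    gzip -c "$1" | wc -c
}

tmp=$(mktemp -d)
make_segment /dev/urandom "$tmp/garbage_tail"
make_segment /dev/zero    "$tmp/zero_tail"
garbage=$(( $(compressed_size "$tmp/garbage_tail") ))
zeroed=$(( $(compressed_size "$tmp/zero_tail") ))
echo "garbage tail: $garbage bytes, zeroed tail: $zeroed bytes"
rm -rf "$tmp"
```

With random stand-in data the first case compresses to slightly more than
the full 16MB, while the zeroed case comes out only a little above the 1MB
of "valid" data.  Real WAL compresses better than random bytes, so the
absolute numbers are pessimistic, but the gap between the two cases is what
options (1) through (5) are all trying to capture.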

I think (1) is a better solution than most of these in the context of an 
improvement to core, with (4) pglesslog being the main other contender 
because of how it provides additional full-page write improvements.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

