Improving compressibility of WAL files - Mailing list pgsql-hackers

From Bruce Momjian
Subject Improving compressibility of WAL files
Date
Msg-id 200901082139.n08LdeM17528@momjian.us
Responses Re: Improving compressibility of WAL files  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Improving compressibility of WAL files  (Aidan Van Dyk <aidan@highrise.ca>)
List pgsql-hackers
The attached patch from Aidan Van Dyk zeros out the end of WAL files to
improve their compressibility. (The patch was originally sent to
'general' which explains why it was lost until now.)

Would someone please eyeball it?  It is useful for compressing PITR
logs even if we find a better solution for replication streaming.

As for why this patch is useful:

> > The real reason not to put that functionality into core (or even
> > contrib) is that it's a stopgap kluge.  What the people who want this
> > functionality *really* want is continuous (streaming) log-shipping, not
> > WAL-segment-at-a-time shipping.  Putting functionality like that into
> > core is infinitely more interesting than putting band-aids on a
> > segmented approach.
>
> Well, I realize we want streaming archive logs, but there are still
> going to be people who are archiving for point-in-time recovery, and I
> assume a good number of them are going to compress their WAL files to
> save space, because they have to store a lot of them.  Wouldn't zeroing
> out the trailing bytes of WAL files still help those people?

---------------------------------------------------------------------------

Aidan Van Dyk wrote:
-- Start of PGP signed section.
> * Aidan Van Dyk <aidan@highrise.ca> [081031 15:11]:
> > How about something like the attached.  It's been spun quickly, passed
> > regression tests, and some simple hand tests on REL8_3_STABLE.  It seems like
> > HEAD can't initdb on my machine (quad opteron with SW raid1); I tried a few
> > revisions in the last few days, and initdb dies on them all...
>
> OK, HEAD does work, I don't know what was going on previously... Attached is my
> patch against head.
>
> I'll try and pull out some machines on Monday to really thrash/crash this but
> I'm running out of time today to set that up.
>
> But in running head, I've come across this:
>     regression=# SELECT pg_stop_backup();
>     WARNING:  pg_stop_backup still waiting for archive to complete (60 seconds elapsed)
>     WARNING:  pg_stop_backup still waiting for archive to complete (120 seconds elapsed)
>     WARNING:  pg_stop_backup still waiting for archive to complete (240 seconds elapsed)
>
> My archive script is *not* running, it ran and exited:
>     mountie@pumpkin:~/projects/postgresql/PostgreSQL/src/test/regress$ ps -ewf | grep post
>     mountie   2904     1  0 16:31 pts/14   00:00:00 /home/mountie/projects/postgresql/PostgreSQL/src/test/regress/tmp_check/install/usr/local/pgsql
>     mountie   2906  2904  0 16:31 ?        00:00:01 postgres: writer process
>     mountie   2907  2904  0 16:31 ?        00:00:00 postgres: wal writer process
>     mountie   2908  2904  0 16:31 ?        00:00:00 postgres: archiver process   last was 00000001000000000000001F
>     mountie   2909  2904  0 16:31 ?        00:00:01 postgres: stats collector process
>     mountie   2921  2904  1 16:31 ?        00:00:18 postgres: mountie regression 127.0.0.1(56455) idle
>
> Those all match up:
>     mountie@pumpkin:~/projects/postgresql/PostgreSQL/src/test/regress$ pstree -acp 2904
>     postgres,2904 -D/home/mountie/projects/postgres
>       ├─postgres,2906
>       ├─postgres,2907
>       ├─postgres,2908
>       ├─postgres,2909
>       └─postgres,2921
>
> strace on the "archiver process" postgres:
>     select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>     getppid()                               = 2904
>     select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>     getppid()                               = 2904
>     select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>     getppid()                               = 2904
>     select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>     getppid()                               = 2904
>     select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>     getppid()                               = 2904
>
> It *does* finally finish, postmaster log looks like ("Archiving ..." is what my
> archive script prints, bytes is the gzip'ed size):
>     Archiving 000000010000000000000016 [16397 bytes]
>     Archiving 000000010000000000000017 [4405457 bytes]
>     Archiving 000000010000000000000018 [3349243 bytes]
>     Archiving 000000010000000000000019 [3349505 bytes]
>     LOG:  ZEROING xlog file 0 segment 27 from 7954432 - 16777216 [8822784 bytes]
>     Archiving 00000001000000000000001A [3349590 bytes]
>     Archiving 00000001000000000000001B [1596676 bytes]
>     LOG:  ZEROING xlog file 0 segment 28 from 8192 - 16777216 [16769024 bytes]
>     Archiving 00000001000000000000001C [16398 bytes]
>     LOG:  ZEROING xlog file 0 segment 29 from 8192 - 16777216 [16769024 bytes]
>     Archiving 00000001000000000000001D [16397 bytes]
>     LOG:  ZEROING xlog file 0 segment 30 from 8192 - 16777216 [16769024 bytes]
>     Archiving 00000001000000000000001E [16393 bytes]
>     Archiving 00000001000000000000001E.00000020.backup [146 bytes]
>     WARNING:  pg_stop_backup still waiting for archive to complete (60 seconds elapsed)
>     WARNING:  pg_stop_backup still waiting for archive to complete (120 seconds elapsed)
>     WARNING:  pg_stop_backup still waiting for archive to complete (240 seconds elapsed)
>     LOG:  ZEROING xlog file 0 segment 31 from 8192 - 16777216 [16769024 bytes]
>     Archiving 00000001000000000000001F [16395 bytes]
>
>
> So what's this "pg_stop_backup still waiting for archive to complete" state
> that lasts for 5 minutes?  I've not seen that before (running 8.2 and 8.3).
>
> a.
> --
> Aidan Van Dyk                                             Create like a god,
> aidan@highrise.ca                                       command like a king,
> http://www.highrise.ca/                                   work like a slave.

[ Attachment, skipping... ]
-- End of PGP section, PGP failed!
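
For anyone checking the numbers in the quoted log: the byte counts in the
ZEROING lines are just the segment size minus the offset where the switch
happened, and the gzip'ed sizes drop from roughly 3.3 MB to about 16 kB once
the tail is zeroed.  Below is a minimal sketch of that arithmetic in C.  The
function names are hypothetical (they are not from the patch), the 16 MB
segment size is taken from the 16777216 in the log, and the file name layout
(timeline, log id, segment, each as 8 hex digits) is assumed from the names
shown in the archive output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 16 MB segments, matching the 16777216 shown in the log. */
#define XLOG_SEG_SIZE ((uint32_t) 16 * 1024 * 1024)

/* Hypothetical helper: bytes from `offset` to the end of the segment. */
uint32_t
xlog_tail_bytes(uint32_t offset)
{
    assert(offset <= XLOG_SEG_SIZE);
    return XLOG_SEG_SIZE - offset;
}

/*
 * Hypothetical helper: build the 24-hex-digit segment file name from
 * timeline, log id and segment number.  `dst` needs room for 25 bytes.
 */
void
xlog_file_name(char *dst, size_t dstlen,
               uint32_t tli, uint32_t log, uint32_t seg)
{
    snprintf(dst, dstlen, "%08X%08X%08X",
             (unsigned) tli, (unsigned) log, (unsigned) seg);
}
```

With these, xlog_tail_bytes(7954432) gives the 8822784 bytes reported for
segment 27, and xlog_file_name() with timeline 1, log 0, segment 27 gives
00000001000000000000001B, the file archived at 1596676 bytes.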

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
commit fba38257e52564276bb106d55aef14d0de481169
Author: Aidan Van Dyk <aidan@highrise.ca>
Date:   Fri Oct 31 12:35:24 2008 -0400

    WIP: Zero xlog tail on a forced switch

    If XLogWrite is called with xlog_switch, an XLog switch has been forced, either
    by a timeout-based switch (archive_timeout), or an interactive forced xlog
    switch (pg_switch_xlog/pg_stop_backup).  In those cases, we assume we can
    afford a little extra IO bandwidth to make xlogs so much more compressible.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 003098f..c6f9c79 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1600,6 +1600,30 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
              */
             if (finishing_seg || (xlog_switch && last_iteration))
             {
+                /*
+                 * If we've had an xlog switch forced, then we want to zero
+                 * out the rest of the segment.  We zero it out here because at the
+                 * force switch time, IO bandwidth isn't a problem.
+                 *   -- AIDAN
+                 */
+                if (xlog_switch)
+                {
+                    char buf[1024];
+                    uint32 left = (XLogSegSize - openLogOff);
+                    ereport(LOG,
+                        (errmsg("ZEROING xlog file %u segment %u from %u - %u [%u bytes]",
+                                openLogId, openLogSeg,
+                                openLogOff, XLogSegSize, left)
+                         ));
+                    memset(buf, 0, sizeof(buf));
+                    while (left > 0)
+                    {
+                        size_t len = (left > sizeof(buf)) ? sizeof(buf) : left;
+                        write(openLogFile, buf, len);
+                        left -= len;
+                    }
+                }
+
                 issue_xlog_fsync();
                 LogwrtResult.Flush = LogwrtResult.Write;        /* end of page */
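
One thing a reviewer may want to look at: the loop in the WIP patch ignores
the return value of write(), so a short write or an error would go unnoticed.
The following is a minimal standalone sketch of the same zero-fill loop with
that checked.  It is not the patch and not backend code: zero_fill_tail is a
hypothetical name, the 8 kB buffer is an arbitrary choice, and it assumes the
descriptor is already positioned at the start of the tail to be zeroed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): write `left` zero bytes to
 * `fd` at its current file position.  Returns 0 on success, -1 on failure
 * with errno set.
 */
int
zero_fill_tail(int fd, size_t left)
{
    char        buf[8192];      /* assumption: 8 kB per write() call */

    memset(buf, 0, sizeof(buf));
    while (left > 0)
    {
        size_t      len = (left > sizeof(buf)) ? sizeof(buf) : left;
        ssize_t     written = write(fd, buf, len);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before any data; retry */
            return -1;          /* real error, errno set by write() */
        }
        if (written == 0)
        {
            /* Treat a zero-length write as out of space; avoids looping forever. */
            errno = ENOSPC;
            return -1;
        }
        left -= (size_t) written;   /* short writes just shrink the remainder */
    }
    return 0;
}
```

In the backend the failure path would presumably be an ereport() rather than
a return code; the sketch only shows the bookkeeping for short writes and
errors.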

