Re: Archive log compression keeping physical log available in the crash recovery - Mailing list pgsql-hackers
From | Koichi Suzuki |
---|---|
Subject | Re: Archive log compression keeping physical log available in the crash recovery |
Date | |
Msg-id | 45CC0342.7020200@oss.ntt.co.jp |
In response to | Re: Archive log compression keeping physical log available in the crash recovery (Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp>) |
List | pgsql-hackers |
Further information about the following evaluation:

Pgbench throughput was as follows:
Full WAL archiving (full_page_writes=on), 48.3GB archive: 123TPS
Gzip WAL compress, 8.8GB archive: 145TPS
Physical log removal, 2.36GB archive: 148TPS
full_page_writes=off, 2.42GB archive: 161TPS

Koichi Suzuki wrote:
> Sorry for the late response;
>
> Gzip can reduce the archive log size to about one fourth.  My point is
> that it can still be large.  Removing physical log records (by
> replacing them with logical log records) from the archive log will
> shrink the size of the archive log to one twentieth, in the case of a
> pgbench test of about ten hours (3,600,000 transactions) with a
> database size of about 2GB.  In the case of gzip, maybe because of
> higher CPU load, total throughput for gzip is less than just copying
> WAL to archive.  In our case, throughput seems to be slightly higher
> than just copying (preserving physical log) or gzip.  I'll gather the
> measurement results and try to post them.
>
> The size of the archive log seems not to be affected by the size of the
> database, but just by the number of transactions.  In the case of
> full_page_writes=on and full_page_compress=on, compressed archive log
> size seems to be dependent only on the number of transactions and
> transaction characteristics.
>
> Our evaluation result is as follows:
> Database size: 2GB
> WAL size (after 10hours pgbench run): 48.3GB
> gzipped size: 8.8GB
> removal of the physical log: 2.36GB
> full_page_writes=off log size: 2.42GB
>
> The reason why the archive log size in our case is slightly smaller than
> with full_page_writes=off is that we remove not only the physical logs
> but also each page header and the dummy part at the tail of each log
> segment.
>
> Further, we can apply gzip to this archive (2.36GB).  Final size is
> 0.75GB, less than one sixtieth of the original WAL.
>
> Overall duration to gzip from WAL (48.3GB to 8.8GB) was about 4000sec,
> and our compression to 2.36GB needed about 1010sec, slightly less than
> just the cat command (1386sec).  When gzip is combined with our
> compression (48.3GB to 0.75GB), total duration was about 1330sec.
>
> This shows that physical log removal is a good selection for the
> following case:
>
> 1) Need the same crash recovery possibility as full_page_writes=on, and
> 2) Need to shrink the size of the archive log to store it for a longer
>    period.
>
> Of course, if we care about crash recovery in a PITR slave, we still
> need physical log records in the archive log.  In this case, because
> the archive log is not intended to be kept long, its size will not be
> an issue.
>
> I'm planning to do archive log size evaluation with other benchmarks
> such as DBT-2 as well.
>
> Materials for this have already been posted to HACKERS and PATCHES.  I
> hope you try this.
>
>
> Jim Nasby wrote:
>> I thought the drive behind full_page_writes = off was to reduce the
>> amount of data being written to pg_xlog, not to shrink the size of a
>> PITR log archive.
>>
>> ISTM that if you want to shrink a PITR log archive you'd be able to
>> get good results by (b|g)zip'ing the WAL files in the archive.  A quick
>> test on my laptop shows over a 4x reduction in size.  Presumably that'd
>> be even larger if you increased the size of WAL segments.
>>
>> On Jan 29, 2007, at 2:15 AM, Koichi Suzuki wrote:
>>
>>> This is a proposal for archive log compression keeping physical log
>>> in WAL.
>>>
>>> In PostgreSQL 8.2, the full_page_writes option came back to cut out
>>> physical log both from WAL and archive log.  To deal with partial
>>> writes during the online backup, physical log is written only during
>>> the online backup.
>>>
>>> Although this dramatically reduces the log size, it can risk the crash
>>> recovery.
>>> If any page is inconsistent because of the fault, crash
>>> recovery doesn't work because full page images are necessary to recover
>>> the page in such a case.  For critical use, especially in commercial
>>> use, we don't like to risk the crash recovery chance, while reducing
>>> the archive log size will be crucial too for larger databases.  WAL
>>> size itself may be less critical, because the segments are reused
>>> cyclically.
>>>
>>> Here, I have a simple idea to reduce archive log size while keeping
>>> physical log in xlog:
>>>
>>> 1. Create new GUC: full_page_compress,
>>>
>>> 2. Turn on both full_page_writes and full_page_compress: physical
>>> log will be written to WAL at the first write to a page after the
>>> checkpoint, just as with conventional full_page_writes ON.
>>>
>>> 3. Unless physical log is written during the online backup, this can be
>>> removed from the archive log.  One bit in XLR_BKP_BLOCK_MASK
>>> (XLR_BKP_REMOVABLE) is available to indicate this (out of four, only
>>> three of them are in use) and this mark can be set in XLogInsert().
>>> With both full_page_writes and full_page_compress on, both logical
>>> log and physical log will also be written to WAL with the
>>> XLR_BKP_REMOVABLE flag on.  Having both physical and logical log in the
>>> same WAL is not harmful in the crash recovery.  In the crash recovery,
>>> physical log is used if it's available.  Logical log is used in the
>>> archive recovery, as the corresponding physical log will be removed.
>>>
>>> 4. The archive command (separate binary) removes physical logs if the
>>> XLR_BKP_REMOVABLE flag is on.  Physical logs will be replaced by
>>> minimum information of very small size, which is used to restore the
>>> physical log to keep other log records' LSN consistent.
>>>
>>> 5. The restore command (separate binary) restores removed physical log
>>> using the dummy record and restores LSN of other log records.
>>>
>>> 6. We need to rewrite redo functions so that they ignore the dummy
>>> record inserted in 5.  The amount of code modification will be very
>>> small.
>>>
>>> As a result, size of the archive log becomes as small as the case with
>>> full_page_writes off, while the physical log is still available in the
>>> crash recovery, maintaining the crash recovery chance.
>>>
>>> Comments, questions and any input are welcome.
>>>
>>> -----
>>> Koichi Suzuki, NTT Open Source Center
>>>
>>> --Koichi Suzuki
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 6: explain analyze is your friend
>>>
>>
>> --
>> Jim Nasby                                            jim@nasby.net
>> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
>>
>>
>>
>
> --
Koichi Suzuki
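
For readers following steps 3 to 6 of the quoted proposal, the idea can be sketched as a toy model. This is only an illustration under stated assumptions, not the patch posted to HACKERS and PATCHES: the Record class, its fields, and the archive/restore/offsets helpers are hypothetical names, the "removable" field merely stands in for the proposed XLR_BKP_REMOVABLE bit, and a running byte offset stands in for the LSN. The real WAL record layout is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 8192  # PostgreSQL's default block size


@dataclass
class Record:
    """Toy stand-in for a WAL record (hypothetical, not the on-disk layout)."""
    logical: bytes                       # logical log payload, always kept
    page_image: Optional[bytes] = None   # physical log (full page image), if any
    removable: bool = False              # models the proposed XLR_BKP_REMOVABLE bit
    removed_len: int = 0                 # placeholder: length of a stripped image
    dummy: bool = False                  # True when page_image is restored filler

    def size(self) -> int:
        # The placeholder's own few bytes are ignored in this toy model.
        return len(self.logical) + (len(self.page_image) if self.page_image else 0)


def archive(records: List[Record]) -> List[Record]:
    """Step 4: drop removable page images, remembering only their length."""
    out = []
    for r in records:
        if r.page_image is not None and r.removable:
            out.append(Record(r.logical, None, True, len(r.page_image)))
        else:
            out.append(r)  # e.g. images written during online backup are kept
    return out


def restore(records: List[Record]) -> List[Record]:
    """Step 5: re-insert filler so that later records keep their offsets."""
    out = []
    for r in records:
        if r.page_image is None and r.removed_len > 0:
            filler = bytes(r.removed_len)  # zero bytes; redo must skip it (step 6)
            out.append(Record(r.logical, filler, True, 0, dummy=True))
        else:
            out.append(r)
    return out


def offsets(records: List[Record]) -> List[int]:
    """Start offset of each record, standing in for its LSN."""
    pos, result = 0, []
    for r in records:
        result.append(pos)
        pos += r.size()
    return result


wal = [
    Record(b"x" * 100, b"\x01" * PAGE_SIZE, removable=True),
    Record(b"y" * 50),
    Record(b"z" * 80, b"\x02" * PAGE_SIZE, removable=False),  # must survive
]
archived = archive(wal)
restored = restore(archived)
print(sum(r.size() for r in wal), sum(r.size() for r in archived))
print(offsets(wal) == offsets(restored))
```

In this model the archived copy shrinks by exactly the removable page images, and the restored stream places every record at the same offset as the original WAL, which is the property the dummy record in step 5 is meant to preserve.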