Re: Archive log compression keeping physical log available in the crash recovery - Mailing list pgsql-hackers

From Koichi Suzuki
Subject Re: Archive log compression keeping physical log available in the crash recovery
Date
Msg-id 45CC0342.7020200@oss.ntt.co.jp
In response to Re: Archive log compression keeping physical log available in the crash recovery  (Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp>)
List pgsql-hackers
Further information about the following evaluation:

Pgbench throughput was as follows:
Full WAL archiving (full_page_writes=on), 48.3GB archive: 123TPS
Gzip WAL compress, 8.8GB archive: 145TPS
Physical log removal, 2.36GB archive: 148TPS
full_page_writes=off, 2.42GB archive: 161TPS
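
To put the four configurations side by side, the figures above can be normalized against the full WAL archiving baseline. This is only an arithmetic sketch in Python over the numbers listed above; the function and variable names are illustrative and are not part of the patch.

```python
# Figures copied from the list above: (archive size in GB, pgbench TPS).
results = {
    "full WAL archiving": (48.3, 123),
    "gzip": (8.8, 145),
    "physical log removal": (2.36, 148),
    "full_page_writes=off": (2.42, 161),
}


def compare(results, baseline="full WAL archiving"):
    """Return {name: (archive shrink factor, relative throughput)} vs. the baseline."""
    base_size, base_tps = results[baseline]
    return {
        name: (round(base_size / size, 1), round(tps / base_tps, 2))
        for name, (size, tps) in results.items()
    }


summary = compare(results)
for name, (shrink, speedup) in summary.items():
    print(f"{name:22s} archive {shrink:5.1f}x smaller, throughput {speedup:.2f}x")
```

With these inputs, physical log removal shrinks the archive by roughly a factor of 20 relative to the baseline, close to what full_page_writes=off gives.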

Koichi Suzuki wrote:
> Sorry for the late response.
> 
> Gzip can reduce the archive log size to about one fourth.   My point is 
> that it can still be quite large.    Removing physical log records (by 
> replacing them with logical log records) from the archive log shrinks 
> the archive log to about one twentieth, in the case of a pgbench test 
> of about ten hours (3,600,000 transactions) with a database size of 
> about 2GB.   In the case of gzip, maybe because of the higher CPU load, 
> total throughput is lower than when just copying WAL to the archive.  In 
> our case, throughput seems to be slightly higher than with just copying 
> (preserving the physical log) or gzip.   I'll gather the measurement 
> results and try to post them.
> 
> The size of the archive log seems to be affected not by the size of the 
> database but only by the number of transactions.  In the case of 
> full_page_writes=on and full_page_compress=on, the compressed archive log 
> size seems to depend only on the number of transactions and the 
> transaction characteristics.
> 
> Our evaluation result is as follows:
> Database size: 2GB
> WAL size (after a 10-hour pgbench run): 48.3GB
> gzipped size: 8.8GB
> removal of the physical log: 2.36GB
> full_page_writes=off log size: 2.42GB
> 
> The archive log in our case is slightly smaller than with 
> full_page_writes=off because we remove not only the physical logs but 
> also each page header and the dummy part at the tail of each log segment.
> 
> Further, we can apply gzip to this archive (2.36GB).   Final size is 
> 0.75GB, less than one sixtieth of the original WAL.
> 
> The overall duration to gzip the WAL (48.3GB to 8.8GB) was about 4000sec, 
> and our compression to 2.36GB needed about 1010sec, slightly less than 
> a plain cat command (1386sec).   When gzip is combined with our compression 
> (48.3GB to 0.75GB), the total duration was about 1330sec.
> 
> This shows that physical log removal is a good choice in the following 
> case:
> 
> 1) You need the same crash recovery capability as full_page_writes=on, and
> 2) You need to shrink the archive log in order to store it for a longer period.
> 
> Of course, if we care about crash recovery on a PITR slave, we still need 
> physical log records in the archive log.   In this case, because the archive 
> log is not intended to be kept long, its size will not be an issue.
> 
> I'm planning to do archive log size evaluation with other benchmarks 
> such as DBT-2 as well.
> 
> Materials for this have already been posted to HACKERS and PATCHES.   I 
> hope you will try them.
> 
> 
> Jim Nasby wrote:
>> I thought the drive behind full_page_writes = off was to reduce the 
>> amount of data being written to pg_xlog, not to shrink the size of a 
>> PITR log archive.
>>
>> ISTM that if you want to shrink a PITR log archive you'd be able to 
>> get good results by (b|g)zip'ing the WAL files in the archive. A quick 
>> test on my laptop shows over a 4x reduction in size. Presumably that'd 
>> be even larger if you increased the size of WAL segments.
>>
>> On Jan 29, 2007, at 2:15 AM, Koichi Suzuki wrote:
>>
>>> This is a proposal for archive log compression keeping physical log 
>>> in WAL.
>>>
>>> In PostgreSQL 8.2, the full_page_writes option came back to cut out
>>> physical log both from WAL and from the archive log.   To deal with
>>> partial writes during the online backup, physical log is written only
>>> during the online backup.
>>>
>>> Although this dramatically reduces the log size, it can put crash
>>> recovery at risk.   If any page is inconsistent because of a fault, crash
>>> recovery doesn't work because full page images are necessary to recover
>>> the page in such a case.  For critical use, especially commercial use,
>>> we don't like to risk the chance of crash recovery, while reducing the
>>> archive log size will be crucial too for larger databases.    WAL size
>>> itself may be less critical, because WAL segments are reused cyclically.
>>>
>>> Here, I have a simple idea to reduce archive log size while keeping
>>> physical log in xlog:
>>>
>>> 1. Create a new GUC: full_page_compress.
>>>
>>> 2. Turn on both the full_page_writes and full_page_compress: physical
>>> log will be written to WAL at the first write to a page after the
>>> checkpoint, just as conventional full_page_writes ON.
>>>
>>> 3. Unless physical log is written during the online backup, it can be
>>> removed from the archive log.   One bit in XLR_BKP_BLOCK_MASK
>>> (XLR_BKP_REMOVABLE) is available to indicate this (out of four, only
>>> three of them are in use) and this mark can be set in XLogInsert().
>>> With both full_page_writes and full_page_compress on, both logical
>>> log and physical log will be written to WAL with the XLR_BKP_REMOVABLE
>>> flag on.  Having both physical and logical log in the same WAL is not
>>> harmful in crash recovery.  In crash recovery, physical log is
>>> used if it's available.  Logical log is used in archive recovery, as
>>> the corresponding physical log will have been removed.
>>>
>>> 4. The archive command (separate binary) removes physical logs if the
>>> XLR_BKP_REMOVABLE flag is on.   Physical logs will be replaced by
>>> minimal information of very small size, which is used to restore the
>>> physical log to keep other log records' LSNs consistent.
>>>
>>> 5. The restore command (separate binary) restores removed physical log
>>> using the dummy record and restores LSN of other log records.
>>>
>>> 6. We need to rewrite redo functions so that they ignore the dummy
>>> record inserted in 5.  The amount of code modification will be very 
>>> small.
>>>
>>> As a result, size of the archive log becomes as small as the case with
>>> full_page_writes off, while the physical log is still available in the
>>> crash recovery, maintaining the crash recovery chance.
>>>
>>> Comments, questions and any input are welcome.
>>>
>>> -----
>>> Koichi Suzuki, NTT Open Source Center
>>>
>>> --Koichi Suzuki
>>>
>>>
>>
>> -- 
>> Jim Nasby                                            jim@nasby.net
>> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
>>
>>
>>
> 
> 
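
The archive and restore steps (3 to 5) of the quoted proposal can be illustrated with a toy model. This is a sketch under stated assumptions, not the patch itself: the record layout, the bit value given to XLR_BKP_REMOVABLE, the 4-byte placeholder, and the function names are all made up for illustration and do not match PostgreSQL's actual WAL format. Record start offsets stand in for LSNs.

```python
from dataclasses import dataclass, replace

XLR_BKP_REMOVABLE = 0x1   # illustrative bit value, not PostgreSQL's actual one
PLACEHOLDER_BYTES = 4     # assumed cost of remembering the removed length


@dataclass(frozen=True)
class Record:
    flags: int
    logical: bytes            # logical log payload
    page_image: bytes = b""   # physical log (full-page image)
    removed_len: int = 0      # archive only: length of the image that was cut
    dummy: bytes = b""        # restore only: filler that redo would ignore

    def size(self) -> int:
        placeholder = PLACEHOLDER_BYTES if self.removed_len else 0
        return len(self.logical) + len(self.page_image) + len(self.dummy) + placeholder


def start_offsets(records):
    """Start offset of each record in the stream, standing in for its LSN."""
    offsets, pos = [], 0
    for r in records:
        offsets.append(pos)
        pos += r.size()
    return offsets


def archive(records):
    """Step 4: cut page images flagged as removable, keeping only their length."""
    out = []
    for r in records:
        if r.flags & XLR_BKP_REMOVABLE and r.page_image:
            r = replace(r, page_image=b"", removed_len=len(r.page_image))
        out.append(r)
    return out


def restore(records):
    """Step 5: re-insert zero-filled dummies so record offsets match the WAL again."""
    return [
        replace(r, removed_len=0, dummy=bytes(r.removed_len)) if r.removed_len else r
        for r in records
    ]


wal = [
    Record(XLR_BKP_REMOVABLE, b"update t1", page_image=bytes(8192)),
    Record(0, b"commit"),
    Record(0, b"update t2", page_image=bytes(8192)),  # unflagged, e.g. online backup
    Record(XLR_BKP_REMOVABLE, b"insert t3", page_image=bytes(8192)),
]
archived = archive(wal)
restored = restore(archived)
print("WAL bytes:", sum(r.size() for r in wal))
print("archived bytes:", sum(r.size() for r in archived))
print("offsets preserved:", start_offsets(restored) == start_offsets(wal))
```

Only the flagged page images are dropped; the unflagged one survives archiving, and after restore every record starts at the same offset it had in the original stream, which is the property step 4 describes as keeping the other records' LSNs consistent.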


-- 
Koichi Suzuki

