Re: [HACKERS] Full page writes improvement, code update - Mailing list pgsql-patches

From Koichi Suzuki
Subject Re: [HACKERS] Full page writes improvement, code update
Date
Msg-id 463148C5.6060502@oss.ntt.co.jp
In response to Re: [HACKERS] Full page writes improvement, code update  (Josh Berkus <josh@agliodbs.com>)
List pgsql-patches
Josh,

Josh Berkus wrote:
> Koichi, Andreas,
>
>> 1) To deal with partial/inconsistent writes to the data files at crash
>> recovery, we need full page writes at the first modification to pages
>> after each checkpoint.   It consumes much of WAL space.
>
> We need to find a way around this someday.  Other DBs don't do this; it may be
> because they're less durable, or because they fixed the problem.

Maybe both.   Fixing the problem would need some means of detecting
partial/inconsistent writes to the data files, which may need
additional CPU resources.
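
As a rough illustration of what "detecting partial writes" could involve, here is a minimal sketch that stores a CRC in each page and checks it on read. It is not part of the patch and not how PostgreSQL handles pages; the 4-byte checksum header, the 512-byte atomic sector size, and the function names are all assumptions made for the example. The CRC computed over every page is where the extra CPU cost would come from.

```python
import struct
import zlib

PAGE_SIZE = 8192     # PostgreSQL's default block size
SECTOR_SIZE = 512    # assumed unit the disk writes atomically
CRC_BYTES = 4        # hypothetical checksum header at the start of a page


def seal_page(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 to form a full page."""
    if len(payload) != PAGE_SIZE - CRC_BYTES:
        raise ValueError("payload must fill the page exactly")
    return struct.pack("<I", zlib.crc32(payload)) + payload


def page_is_consistent(page: bytes) -> bool:
    """Return True if the stored CRC matches the page contents."""
    (stored,) = struct.unpack("<I", page[:CRC_BYTES])
    return stored == zlib.crc32(page[CRC_BYTES:])


old_page = seal_page(b"A" * (PAGE_SIZE - CRC_BYTES))
new_page = seal_page(b"B" * (PAGE_SIZE - CRC_BYTES))

# Simulate a crash after only the first 3 sectors of the new page hit disk.
cut = 3 * SECTOR_SIZE
torn_page = new_page[:cut] + old_page[cut:]

print(page_is_consistent(old_page))   # True
print(page_is_consistent(new_page))   # True
print(page_is_consistent(torn_page))  # False
```

Detection alone does not repair the torn page; something like the full page image in WAL would still be needed to restore it.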

>
>> I don't think there should be only one setting.   It depends on how
>> the database is operated.   Leaving wal_add_optimization_info = off as
>> the default does not bring any change in WAL and archive log handling.   I
>> understand some people may not be happy with an additional 3% or so
>> increase in WAL size, especially people who don't need archive logs at
>> all.   So I prefer to leave the default off.
>
> Except that, is there any reason to turn this off if we are archiving?  Maybe
> it should just be slaved to archive_command ... if we're not using PITR, it's
> off, if we are, it's on.

Hmm, this sounds workable.  On the other hand, existing users who are
happy with their current archiving would either have to change their
archive command to pg_compresslog or accept a slight increase in
archive log size.  I'd like to hear more opinions on this.
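
To make the proposal concrete, the rule "on if we are archiving, off otherwise" could be sketched as below. This is only an illustration of the suggested behaviour, not code from the patch: the function name is hypothetical, and it assumes that archiving counts as enabled whenever archive_command is a non-empty string.

```python
def wal_optimization_enabled(archive_command: str) -> bool:
    """Hypothetical rule: add the optimization info to WAL only when
    archiving is configured, so no separate setting is needed."""
    return bool(archive_command.strip())


# No archiving configured: WAL is written as it is today.
print(wal_optimization_enabled(""))                         # False

# Archiving configured (here with plain cp): the extra info is added,
# so the archive log grows slightly unless the command is changed.
print(wal_optimization_enabled("cp %p /mnt/archive/%f"))    # True
```

The second call shows the case raised above: a site that keeps its existing cp-based command would get the extra WAL information without the compression that offsets it.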

>
>>> 1) is there any throughput benefit for platforms with fast CPU but
>>> constrained I/O (e.g. 2-drive webservers)?  Any penalty for servers with
>>> plentiful I/O?
>> I've only run benchmarks with the archive process running, because
>> wal_add_optimization_info=on does not make sense if we don't archive
>> WAL.   In this situation, total I/O decreases because writes to the archive
>> log decrease.   Because of the 3% or so increase in WAL size, there will be
>> an increase in WAL writes, but the decrease in archive writes makes up for it.
>
> Yeah, I was just looking for a way to make this a performance feature.  I see
> now that it can't be.  ;-)

As for performance, I tested the patch against 8.3 HEAD with pgbench.
The two test cases were as follows:
Case1. Archiver: cp command, wal_add_optimization_info = off,
        full_page_writes=on
Case2. Archiver: pg_compresslog, wal_add_optimization_info = on,
        full_page_writes=on
DB Size: 1.65GB, Total transaction:1,000,000

Throughput was:
Case1: 632.69TPS
Case2: 653.10TPS ... 3% gain.

Archive Log Size:
Case1: 1.92GB
Case2: 0.57GB (about 30% of Case1).  Before compression, the WAL size
was 1.92GB, the same as in Case1.  Because these sizes are counted in
whole WAL segment files, the measurement error is at most 16MB.  Taking
this into account, the increase in WAL I/O is less than 1%.
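
The percentages above can be rechecked with a few lines of arithmetic. The sketch below uses only the figures reported in this message; the one assumption is that 1GB is taken as 1024MB when comparing the 16MB segment size against the 1.92GB WAL total.

```python
# Figures reported in the benchmark above.
TPS_CASE1 = 632.69          # cp archiver, wal_add_optimization_info = off
TPS_CASE2 = 653.10          # pg_compresslog, wal_add_optimization_info = on
ARCHIVE_GB_CASE1 = 1.92
ARCHIVE_GB_CASE2 = 0.57
WAL_GB_BEFORE_COMPRESSION = 1.92
SEGMENT_MB = 16             # size of one WAL segment file

tps_gain = (TPS_CASE2 - TPS_CASE1) / TPS_CASE1
archive_ratio = ARCHIVE_GB_CASE2 / ARCHIVE_GB_CASE1
# Sizes are counted in whole segments, so one segment bounds the error.
max_error = SEGMENT_MB / (WAL_GB_BEFORE_COMPRESSION * 1024)

print(f"throughput gain:       {tps_gain:.1%}")       # 3.2%
print(f"archive size ratio:    {archive_ratio:.1%}")  # 29.7%
print(f"max measurement error: {max_error:.1%}")      # 0.8%
```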


>
>>> 3) How is this better than command-line compression for log-shipping?
>>> e.g. why do we need it in the database?
>> I don't fully understand what command-line compression means.   Simon
>> suggested that this patch can be used with log-shipping and I agree.
>> If we compare it with gzip or other general-purpose
>> compression, the compression ratio, CPU usage and I/O of pg_compresslog are
>> all considerably better than those of gzip.
>
> OK, that answered my question.
>
>> This is why I don't like Josh's suggested name of wal_compressable
>> either.
>> WAL is compressible either way, only pg_compresslog would need to be
>> more complex if you don't turn off the full page optimization. I think a
>> good name would tell that you are turning off an optimization.
>> (thus my wal_fullpage_optimization on/off)
>
> Well, as a PG hacker I find the name wal_fullpage_optimization quite baffling
> and I think our general user base will find it even more so.  Now that I have
> Koichi's explanation of the problem, I vote for simply slaving this to the
> PITR settings and not having a separate option at all.

Could I have a more specific suggestion on this?

Regards;


--
-------------
Koichi Suzuki
