Thread: "full_page_writes" makes no difference?

"full_page_writes" makes no difference?

From
Tian Luo
Hi guys,

No matter whether I turn "full_page_writes" on or off, I always
observe 8192-byte writes of log data for simple write operations
(write/update).

But according to the documentation, turning this off can speed up
operations but may cause problems during recovery. So, I guess this is
because it writes less when the option is turned off. However, this
contradicts my observations ....

If I am not missing anything, the writes of log data go
through the function "XLogWrite" in the source file
"backend/access/transam/xlog.c".

In this file, log data are written with the following code:

from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;   /* a whole number of WAL pages */
if (write(openLogFile, from, nbytes) != nbytes)
{
    ...
}

So "nbytes" should always be a multiple of XLOG_BLCKSZ, which by
default is 8192.
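A small sketch of the arithmetic behind this observation (not PostgreSQL source): since XLogWrite writes whole XLOG_BLCKSZ pages from the WAL buffers, a flush that ends partway through a page still writes that entire page. The helper name `wal_flush_write_size` and the flat byte-offset model are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192u  /* default WAL page size, as noted above */

/*
 * Hypothetical helper: bytes written when WAL is flushed from byte
 * position `start` up to (but not including) `end`.  Because writes are
 * issued in whole WAL pages, both ends are widened to page boundaries.
 */
static uint64_t wal_flush_write_size(uint64_t start, uint64_t end)
{
    assert(end > start);
    uint64_t first_page = start / XLOG_BLCKSZ;
    uint64_t last_page = (end - 1) / XLOG_BLCKSZ;  /* page holding the last byte */
    uint64_t npages = last_page - first_page + 1;
    return npages * (uint64_t) XLOG_BLCKSZ;       /* mirrors nbytes = npages * XLOG_BLCKSZ */
}
```

Under this model, `wal_flush_write_size(0, 100)` is 8192: a 100-byte record still produces an 8192-byte write, so the write size alone cannot show what full_page_writes changes.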

My question is: if it always writes full pages regardless of whether
"full_page_writes" is on or off, what is the difference?

Thanks!

Regards,
- Tian


Re: "full_page_writes" makes no difference?

From
Markus Wanner
Hi,

On 05/04/2011 03:46 AM, Tian Luo wrote:
> No matter whether I turn "full_page_writes" on or off, I always
> observe 8192-byte writes of log data for simple write operations
> (write/update).

How did you measure that?  A single transaction doing a single write, I
guess.  Have you tried multiple transactions with a simple write operation
each, and checked how much WAL that produces per transaction?

As I understand it, dirty blocks are written to disk as soon as
feasible.  After all, that helps crash recovery.  With a basically idle
system, "as soon as feasible" might be pretty soon.  However, put your
(disk sub-) system under load and "as soon as feasible" might take a while.

> But according to the documentation, turning this off can speed up
> operations but may cause problems during recovery. So, I guess this is
> because it writes less when the option is turned off. However, this
> contradicts my observations ....

I think you didn't trigger the savings.  It's about writing full pages
on the first write to a block after a checkpoint.  Did you monitor
checkpoint times of Postgres in your tests?
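The rule described here can be sketched as follows. Roughly, the server decides by comparing the page's LSN with the redo pointer of the latest checkpoint: a page whose LSN is not newer than that pointer has not been WAL-logged since the checkpoint, so the current change is its first one. The type `Lsn` and the function `needs_full_page_image` are hypothetical simplifications, not PostgreSQL's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;  /* hypothetical: WAL position as one 64-bit value */

/*
 * Hypothetical sketch: should a WAL record that modifies a page carry a
 * copy of the whole page?  A page whose LSN is at or before the redo
 * point of the last checkpoint has not been WAL-logged since then, so
 * this is its first modification after the checkpoint.
 */
static bool needs_full_page_image(bool full_page_writes, Lsn page_lsn, Lsn redo_ptr)
{
    if (!full_page_writes)
        return false;             /* setting off: never attach page images */
    return page_lsn <= redo_ptr;  /* first change since the checkpoint */
}
```

Later changes to the same page carry a newer LSN, so they log only the change itself until the next checkpoint moves the redo pointer forward.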

> If I am not missing anything, the writes of log data go
> through the function "XLogWrite" in the source file
> "backend/access/transam/xlog.c".
> 
> In this file, log data are written with the following code:
> 
> from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
> nbytes = npages * (Size) XLOG_BLCKSZ;
> if (write(openLogFile, from, nbytes) != nbytes)
> {
>  ...
> }
> 
> So "nbytes" should always be a multiple of XLOG_BLCKSZ, which by
> default is 8192.

That observation seems correct.

Regards

Markus Wanner


Re: "full_page_writes" makes no difference?

From
Pavan Deolasee


On Wed, May 4, 2011 at 7:16 AM, Tian Luo <jackrobin@gmail.com> wrote:
> Hi guys,
>
> No matter whether I turn "full_page_writes" on or off, I always
> observe 8192-byte writes of log data for simple write operations
> (write/update).


Not sure how you measured it, but it seems to me that the correct GUC to play with is "fsync". If that's turned off, the WAL buffers won't be fsynced to disk at every commit. But that would mean reduced reliability in case of a database crash.
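A minimal POSIX sketch of what the fsync setting controls: WAL is always handed to the kernel with write(), but only fsync() waits for it to reach stable storage. The function `flush_wal_buffer` and its `do_fsync` flag are hypothetical stand-ins for the server's logic, not PostgreSQL's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: write a buffer of WAL to an open file descriptor
 * and, only if do_fsync is true, wait for it to reach stable storage.
 * Returns 0 on success, -1 on failure.
 */
static int flush_wal_buffer(int fd, const char *buf, size_t len, bool do_fsync)
{
    assert(buf != NULL);
    ssize_t written = write(fd, buf, len);
    if (written < 0 || (size_t) written != len)
        return -1;          /* failed or short write */
    if (do_fsync && fsync(fd) != 0)
        return -1;          /* durability could not be confirmed */
    return 0;               /* if do_fsync was false, an OS crash or power loss can still lose this data */
}
```

Note that write() alone only copies the data into the operating system's cache; that copy survives a crash of the database process but not of the machine.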

 
> But according to the documentation, turning this off can speed up
> operations but may cause problems during recovery. So, I guess this is
> because it writes less when the option is turned off. However, this
> contradicts my observations ....


When full_page_writes is turned off, the full page image is not included in the WAL record for the first modification of a page after a checkpoint. So yes, it can reduce the amount of WAL written to disk.
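To see how much this can matter, the sketch below counts WAL bytes for a sequence of page modifications following one checkpoint. It is a simplified model under stated assumptions: every change record has a fixed hypothetical size (`RECORD_BYTES`), and the first change to each page after the checkpoint additionally carries a whole 8192-byte page image when full_page_writes is on. The function name and constants other than the 8192-byte page size are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ       8192u  /* data page size (PostgreSQL default) */
#define RECORD_BYTES 100u   /* hypothetical size of one change record */
#define MAX_PAGES    1024u  /* size of this toy model's page table */

/*
 * Model: pages[i] is the page number touched by the i-th modification
 * after a checkpoint.  Returns the total WAL bytes generated.
 */
static size_t wal_bytes_after_checkpoint(const unsigned *pages, size_t n,
                                         bool full_page_writes)
{
    bool touched[MAX_PAGES] = { false };  /* no page changed since the checkpoint yet */
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        assert(pages[i] < MAX_PAGES);
        total += RECORD_BYTES;            /* the change itself is always logged */
        if (!touched[pages[i]]) {
            touched[pages[i]] = true;
            if (full_page_writes)
                total += BLCKSZ;          /* full page image on first change */
        }
    }
    return total;
}
```

For four updates touching pages 1, 1, 1 and 2, this model gives 400 bytes with the setting off and 16784 bytes with it on; the whole gap comes from the two first-touch page images.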
 
Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: "full_page_writes" makes no difference?

From
Pavan Deolasee


On Wed, May 4, 2011 at 5:46 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:


> On Wed, May 4, 2011 at 7:16 AM, Tian Luo <jackrobin@gmail.com> wrote:
> > Hi guys,
> >
> > No matter whether I turn "full_page_writes" on or off, I always
> > observe 8192-byte writes of log data for simple write operations
> > (write/update).
>
> Not sure how you measured it, but it seems to me that the correct GUC to play with is "fsync". If that's turned off, the WAL buffers won't be fsynced to disk at every commit. But that would mean reduced reliability in case of a database crash.



And I should have added that since 8.3 we also have a user-settable parameter called synchronous_commit. Normally, the database must write WAL up to the commit record to stable storage when a transaction commits, to ensure that there is no data loss in case of a database crash. But if synchronous_commit is turned off, the database may delay writing the WAL buffers to disk, reducing write activity at an increased risk of data loss.
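A toy model of this trade-off: with synchronous_commit on, each commit waits for its own WAL flush; with it off, commits return immediately and a background writer flushes accumulated WAL periodically. The interval loosely mirrors the wal_writer_delay setting, but the model itself, including the function name `count_wal_flushes`, is hypothetical and ignores other flush triggers such as full WAL buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count WAL flushes for commits at the given times (ms, ascending).
 * synchronous_commit on: one flush per commit.
 * synchronous_commit off: a periodic writer wakes every interval_ms, so
 * all commits that fall in the same interval share a single flush.
 */
static size_t count_wal_flushes(const unsigned *commit_ms, size_t n,
                                bool synchronous_commit, unsigned interval_ms)
{
    if (synchronous_commit)
        return n;

    assert(interval_ms > 0);
    size_t flushes = 0;
    long last_interval = -1;                 /* interval of the last counted flush */
    for (size_t i = 0; i < n; i++) {
        long interval = (long) (commit_ms[i] / interval_ms);
        if (interval != last_interval) {     /* first commit in a new interval */
            flushes++;
            last_interval = interval;
        }
    }
    return flushes;
}
```

For commits at 0, 10, 50, 199, 200 and 450 ms with a 200 ms interval, this gives 6 flushes with synchronous_commit on and 3 with it off; the commits waiting for the next flush are the ones at risk if the server crashes.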


Thanks,
Pavan 

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: "full_page_writes" makes no difference?

From
Jeff Janes
On Tue, May 3, 2011 at 6:46 PM, Tian Luo <jackrobin@gmail.com> wrote:
> Hi guys,
>
> No matter whether I turn "full_page_writes" on or off, I always
> observe 8192-byte writes of log data for simple write operations
> (write/update).
>
> But according to the documentation, turning this off can speed up
> operations but may cause problems during recovery. So, I guess this is
> because it writes less when the option is turned off. However, this
> contradicts my observations ....
>
> If I am not missing anything, the writes of log data go
> through the function "XLogWrite" in the source file
> "backend/access/transam/xlog.c".
>
> In this file, log data are written with the following code:
>
> from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
> nbytes = npages * (Size) XLOG_BLCKSZ;
> if (write(openLogFile, from, nbytes) != nbytes)
> {
>  ...
> }
>
> So "nbytes" should always be a multiple of XLOG_BLCKSZ, which by
> default is 8192.
>
> My question is: if it always writes full pages regardless of whether
> "full_page_writes" is on or off, what is the difference?

The "full pages" refers to the shared_buffers pages, not the xlog pages.

The question it answers is: does the full shared_buffers page get
injected into the xlog, or just a diff of it?

If you look at the offset of the xlog write, you would see that it is
writing 8192 bytes to the same offset over and over again.

In my hands, using pgbench -T 300 -c 1, I get about 16 transactions,
each with an 8192-byte xlog write to the same offset, before moving to
the next xlog block.

But immediately after a checkpoint, I get only 1 or 2 writes to the
same offset before moving to the next one, due to full page writes
taking up so much more room in the xlog stream.
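These numbers fit the page-granular writes from the original post: about 16 commits per 8192-byte block implies roughly 8192 / 16 = 512 bytes of WAL per pgbench transaction (an inference from the figures above, not a measurement). The sketch below replays that arithmetic under simplifying assumptions: every transaction appends a fixed number of WAL bytes and flushes at commit, page headers are ignored, and each flush rewrites the WAL page holding the first unflushed byte. The function name `commits_writing_offset_zero` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define XLOG_BLCKSZ 8192u  /* default WAL page size */

/*
 * Hypothetical model: how many consecutive commits issue their write at
 * WAL offset 0 before the writes move on to the next block, when each
 * commit appends txn_bytes of WAL and flushes.
 */
static size_t commits_writing_offset_zero(size_t txn_bytes)
{
    assert(txn_bytes > 0);
    size_t count = 0;
    size_t flushed = 0;                        /* WAL bytes flushed so far */
    for (;;) {
        /* A flush restarts at the page holding the first unflushed byte. */
        size_t write_offset = (flushed / XLOG_BLCKSZ) * XLOG_BLCKSZ;
        if (write_offset != 0)
            break;                             /* moved on to the next block */
        count++;
        flushed += txn_bytes;                  /* this commit's WAL is now on disk */
    }
    return count;
}
```

In this model `commits_writing_offset_zero(512)` returns 16, while a transaction that also carries one full-page image, `commits_writing_offset_zero(512 + 8192)`, returns 1, in line with the 1 or 2 writes per offset seen right after a checkpoint.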

Cheers,

Jeff