Thread: "full_page_writes" makes no difference?

From
Tian Luo
Date:
Hi guys,

Whether I turn "full_page_writes" on or off, I always
observe 8192-byte writes of log data for simple write operations
(write/update).

According to the documentation, turning this option off can speed up
operations but may cause problems during recovery. So I assumed that
less data is written when the option is off. However, this
contradicts my observations.

If I am not missing anything, I find that the writes of log data go
through function "XLogWrite" in source file
"backend/access/transam/xlog.c".

In this file, log data are written with the following code:

from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
if (write(openLogFile, from, nbytes) != nbytes)
{
  ...
}

So, "nbytes" should always be a multiple of XLOG_BLCKSZ, which is 8192
by default.

My question is: if it always writes full pages regardless of whether
"full_page_writes" is on or off, what is the difference?

Thanks!

Regards,
- Tian


Re: "full_page_writes" makes no difference?

From
Jeff Davis
Date:
On Wed, 2011-05-04 at 00:17 -0400, Tian Luo wrote:
> So, "nbytes" should always be multiples of XLOG_BLCKSZ, which in the
> default case, is 8192.
>
> My question is, if it always writes full pages no matter
> "full_page_writes" is on or off, what is the difference?

Most I/O systems and filesystems can end up writing only part of a page
(in this case, 8192 bytes) in the event of a power failure, which is
called a "torn page". That can cause problems for PostgreSQL, because the
page will be a mix of old and new data, and is therefore corrupt.

The solution is "full page writes", which means that when a data page is
modified for the first time after a checkpoint, PostgreSQL logs the entire
contents of the page (except the free space) to WAL, and can use that as
a starting point during recovery. This results in extra WAL data for
safety, but it's unnecessary if your filesystem + I/O system guarantee
that there will be no torn pages (and that's the only safe time to turn
it off).
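
The rule described above can be sketched as a toy model in C. This is
only an illustration, not the code in xlog.c: ToyPage, toy_wal_bytes and
toy_apply are hypothetical names, the "redo pointer" stands for the WAL
position where the latest checkpoint started, and the model logs the
whole page rather than skipping the free space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192            /* default data page size */

/* Hypothetical, simplified stand-in for a data page header. */
typedef struct {
    uint64_t lsn;              /* WAL position of the last change to this page */
} ToyPage;

/*
 * Return how many WAL bytes a change of record_len bytes would add.
 * A page whose LSN is at or before redo_ptr has not been modified since
 * the latest checkpoint, so (with the option on) its image is logged too.
 */
size_t toy_wal_bytes(const ToyPage *page, uint64_t redo_ptr,
                     bool full_page_writes, size_t record_len)
{
    assert(page != NULL);
    if (full_page_writes && page->lsn <= redo_ptr)
        return record_len + BLCKSZ;   /* record plus full page image */
    return record_len;                /* record only */
}

/* Apply the change: stamp the page with the new WAL position. */
void toy_apply(ToyPage *page, uint64_t new_lsn)
{
    assert(page != NULL);
    page->lsn = new_lsn;
}
```

In this model, the first change to a page after a checkpoint costs the
record plus 8192 bytes; a second change to the same page before the next
checkpoint costs only the record, and with the option off every change
costs only the record.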

So, to answer your question, the difference is that full_page_writes=off
means less total WAL data, which means fewer 8192-byte writes in the
long run (you have to test long enough to go through a checkpoint to see
this difference, however). PostgreSQL will never issue write() calls
with 17 bytes, or some other odd number, regardless of the
full_page_writes setting.
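
To make the last point concrete, here is a minimal sketch of the
rounding, assuming the default XLOG_BLCKSZ of 8192. The function names
are made up for illustration and do not exist in PostgreSQL; the sketch
only mirrors the "npages * XLOG_BLCKSZ" arithmetic quoted from
XLogWrite.

```c
#include <assert.h>
#include <stddef.h>

#define XLOG_BLCKSZ 8192       /* default WAL block size */

/* Number of whole WAL pages needed to hold wal_bytes of log data. */
size_t toy_wal_pages(size_t wal_bytes)
{
    return (wal_bytes + XLOG_BLCKSZ - 1) / XLOG_BLCKSZ;
}

/* Size of the write() for that much log data: always a page multiple. */
size_t toy_write_size(size_t wal_bytes)
{
    size_t nbytes = toy_wal_pages(wal_bytes) * (size_t) XLOG_BLCKSZ;

    assert(nbytes % XLOG_BLCKSZ == 0);
    return nbytes;
}
```

Under these assumptions, 17 bytes of log data still produce one
8192-byte write, so the setting changes how many pages get written, not
the size of each write.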

I can see how the name is slightly misleading, but it has to do with
whether to write this extra information to WAL (where "extra
information" happens to be "full data pages" in this case); not whether
to write the WAL itself in full pages.

Regards,
    Jeff Davis


Re: "full_page_writes" makes no difference?

From
Tian Luo
Date:
Thanks, Jeff. It makes sense now. I ran a test with DBT2, turning
"full_page_writes" on and off.

The arguments were set to "-d 200 -w 1 -c 10" for a short test. There is
a roughly 7x difference in the number of pages written:

When the option is on, 1066 pages are written.
When the option is off, 158 pages are written.
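
For reference, the arithmetic behind these two numbers can be checked
with a few lines of C. This assumes the pages counted are WAL pages of
the default XLOG_BLCKSZ (8192 bytes); the helper names are illustrative
only.

```c
#include <assert.h>
#include <stddef.h>

#define XLOG_BLCKSZ 8192       /* assumed default WAL block size */

/* Bytes of WAL written for a given number of pages. */
size_t pages_to_bytes(size_t npages)
{
    return npages * (size_t) XLOG_BLCKSZ;
}

/* Ratio of pages written with the option on versus off. */
double page_ratio(size_t pages_on, size_t pages_off)
{
    assert(pages_off > 0);
    return (double) pages_on / (double) pages_off;
}
```

With 1066 and 158 pages this gives 8,732,672 and 1,294,336 bytes, and a
ratio of about 6.75.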

I agree with you that the name "full_page_writes" is a little bit misleading.

- Tian

On Wed, May 25, 2011 at 5:59 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2011-05-04 at 00:17 -0400, Tian Luo wrote:
>> So, "nbytes" should always be multiples of XLOG_BLCKSZ, which in the
>> default case, is 8192.
>>
>> My question is, if it always writes full pages no matter
>> "full_page_writes" is on or off, what is the difference?