Thread: question regarding full_page_write

question regarding full_page_write

From

AI Rumman

Date:

16 February 2011, 02:36:45

I can't clearly understand what FULL_PAGE_WRITE parameter is stand for.
Documentation suggest that If I make it OFF, then I have the chance for DB crash.
Can anyone please tell me how it could be happened?

Re: question regarding full_page_write

From

Greg Smith

Date:

17 February 2011, 00:31:20

AI Rumman wrote:
> I can't clearly understand what FULL_PAGE_WRITE parameter is stand for.
> Documentation suggest that If I make it OFF, then I have the chance
> for DB crash.
> Can anyone please tell me how it could be happened?

The database writes to disk in 8K blocks.  If you can be sure that your
disk drives and operating system will always write in 8K blocks, you can
get a performance improvement from turning full_page_writes off.  But if
you do that, and it turns out that when the power is interrupted your
disk setup will actually do partial writes of less than 8K, your
database can get corrupted.  Your system needs to ensure that when a
write happens, either the whole thing goes to disk, or none of it does.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Re: question regarding full_page_write

From

Martín Marqués

Date:

22 August 2011, 18:07:58

2011/2/17 Greg Smith <greg@2ndquadrant.com>:
> AI Rumman wrote:
>>
>> I can't clearly understand what FULL_PAGE_WRITE parameter is stand for.
>> Documentation suggest that If I make it OFF, then I have the chance for DB
>> crash.
>> Can anyone please tell me how it could be happened?
>
> The database writes to disk in 8K blocks.  If you can be sure that your disk
> drives and operating system will always write in 8K blocks, you can get a
> performance improvement from turning full_page_writes off.  But if you do
> that, and it turns out that when the power is interrupted your disk setup
> will actually do partial writes of less than 8K, your database can get
> corrupted.  Your system needs to ensure that when a write happens, either
> the whole thing goes to disk, or none of it does.

Sorry for the late reply, but I was investigating this option in
postgresql.conf and saw this mail.

My question regarding your answer is, why is it important for the
first page after a checkpoint and not on other page writes?


--
Martín Marqués
select 'martin.marques' || '@' || 'gmail.com'
DBA, Programador, Administrador

Re: question regarding full_page_write

From

Greg Smith

Date:

22 August 2011, 18:40:04

On 08/22/2011 05:07 PM, Martín Marqués wrote:
> My question regarding your answer is, why is it important for the
> first page after a checkpoint and not on other page writes?
>

The first time a page is written after a checkpoint, when
full_page_writes is on, the entire 8K page is written out to disk at
that point.  The idea is that if the page is corrupted in any way by a
partial write, you can restore it to a known good state again by using
this version.  After that copy, though, additional modifications to the
page only need to save the delta of what changed, at the row level.  If
there's a crash, during recovery the full page image will be written,
then the series of deltas, ending up with the same data as was intended.

This whole mechanism resets again each time a checkpoint finishes, and
the full page writes start all over again.  One of the main purposes of
checkpoints are to move forward the pointer of how far back crash
recovery needs to replay from.  Starting each new checkpoint over again,
with a full copy of all the data modified going into the WAL, it is part
of that logic.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

Re: question regarding full_page_write

From

Martín Marqués

Date:

24 August 2011, 12:12:08

El día 22 de agosto de 2011 18:39, Greg Smith <greg@2ndquadrant.com> escribió:
> On 08/22/2011 05:07 PM, Martín Marqués wrote:
>>
>> My question regarding your answer is, why is it important for the
>> first page after a checkpoint and not on other page writes?
>>
>
> The first time a page is written after a checkpoint, when full_page_writes
> is on, the entire 8K page is written out to disk at that point.  The idea is
> that if the page is corrupted in any way by a partial write, you can restore
> it to a known good state again by using this version.  After that copy,
> though, additional modifications to the page only need to save the delta of
> what changed, at the row level.  If there's a crash, during recovery the
> full page image will be written, then the series of deltas, ending up with
> the same data as was intended.
>
> This whole mechanism resets again each time a checkpoint finishes, and the
> full page writes start all over again.  One of the main purposes of
> checkpoints are to move forward the pointer of how far back crash recovery
> needs to replay from.  Starting each new checkpoint over again, with a full
> copy of all the data modified going into the WAL, it is part of that logic.

Still something missing (for me :-)):

Checkpoint happens, logs get flushed to disk, WAL segments get
archived if archive is on, and then the are recycled.

Now a new transaction get commited, and at least 1 page has to go to
WAL (those are 8Kb). full_page_writes garantees all 8Kb are written or
nothing. After this thansaction comes another, but here you don't need
the garantee of all 8Kb in WAL (you say because deltas are enough).

Why aren't deltas good enough for the first 8Kb? Is there other
information in the first 8Kb that make those more important?

--
Martín Marqués
select 'martin.marques' || '@' || 'gmail.com'
DBA, Programador, Administrador

Re: question regarding full_page_write

From

Greg Smith

Date:

24 August 2011, 13:22:09

On 08/24/2011 11:12 AM, Martín Marqués wrote:
> Why aren't deltas good enough for the first 8Kb? Is there other
> information in the first 8Kb that make those more important?
>

The fundamental problem is what's called a torn page.  You write out a
8K page; only part of it actually makes it to disk; the server crashes.
What you now have on disk is completely unpredictable.  It's a mix of
the old new page, but what portion of each you got, that's almost
random.  You cannot fix it with any delta.  The only way to make sure
it's back into a good state is to write an entire 8K page, a known
correct copy, from some point in time.

The way this is done in PostgreSQL, one of those is written out to the
WAL the first time any page is touched after a checkpoint.  Those become
the known copy good to recover from any torn page problem.  Then
database recovery only has to replay activity since that checkpoint
marker to fix all torn pages.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us