Thread: Is full_page_writes=off safe in conjunction with PITR?

Is full_page_writes=off safe in conjunction with PITR?

From
Tom Lane
Date:
While thinking about the patch I just made to allow full_page_writes to
be turned off again, it struck me that this patch only fixes the problem
for post-crash XLOG replay.  There is still a hazard if the variable is
turned off in a PITR master system.  The reason is that while a base
backup is being taken, the backup-taker might read an inconsistent state
of a page and include that in the backup.  This is not a problem if
full_page_writes is ON --- it's logically equivalent to a torn page
write and will be fixed on the slave by XLOG replay.  But it *is* a
problem if full_page_writes is OFF, for exactly the same reason that
torn page writes are a problem.

I think we had originally argued that there was no problem anyway
because the kernel should cause the page write to appear atomic to other
processes (since we issue it in a single write() command).  But that's
only true if the backup-taker reads in units that are multiples of
BLCKSZ.  If the backup-taker reads, say, 4K at a time then it's
certainly possible that it gets a later version of the second half of a
page than it got of the first half.  I don't know about you, but I sure
don't feel comfortable making assumptions at that level about the
behavior of tar or cpio.

I fear we still have to disable full_page_writes (force it ON) if
XLogArchivingActive is on.  Comments?
        regards, tom lane
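
To make the hazard described above concrete, here is a minimal Python sketch. It is a toy model, not PostgreSQL or kernel code: `copy_page` and its parameters are hypothetical names, and it simply assumes that the writer's single write() becomes visible, whole, between two of the backup program's reads.

```python
BLCKSZ = 8192  # PostgreSQL's default page size


def copy_page(read_size, old_page, new_page, write_after_reads):
    """Copy one page in read_size chunks.

    The writer replaces the whole page at once, after
    `write_after_reads` reads have already completed.
    """
    assert len(old_page) == len(new_page) == BLCKSZ
    on_disk = old_page
    copied = bytearray()
    reads_done = 0
    for offset in range(0, BLCKSZ, read_size):
        if reads_done == write_after_reads:
            on_disk = new_page  # the single write() lands here
        copied += on_disk[offset:offset + read_size]
        reads_done += 1
    return bytes(copied)


old = b"A" * BLCKSZ
new = b"B" * BLCKSZ

# A 4K reader sees the old first half and the new second half.
torn = copy_page(4096, old, new, write_after_reads=1)
# A reader using one BLCKSZ-sized read sees a single version.
whole = copy_page(8192, old, new, write_after_reads=1)
```

In this model `torn` matches neither version of the page, which is the torn-page case; `whole` is consistent only because the model assumes a BLCKSZ-sized read is atomic with respect to the write.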


Re: Is full_page_writes=off safe in conjunction with

From
Hannu Krosing
Date:
One fine day, Fri, 2006-04-14 at 16:40, Tom Lane wrote:

> I think we had originally argued that there was no problem anyway
> because the kernel should cause the page write to appear atomic to other
> processes (since we issue it in a single write() command).  But that's
> only true if the backup-taker reads in units that are multiples of
> BLCKSZ.  If the backup-taker reads, say, 4K at a time then it's
> certainly possible that it gets a later version of the second half of a
> page than it got of the first half.  I don't know about you, but I sure
> don't feel comfortable making assumptions at that level about the
> behavior of tar or cpio.
> 
> I fear we still have to disable full_page_writes (force it ON) if
> XLogArchivingActive is on.  Comments?

Why not just tell the backup-taker to take backups using 8K pages ? 

---------------
Hannu




Re: Is full_page_writes=off safe in conjunction with PITR?

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> One fine day, Fri, 2006-04-14 at 16:40, Tom Lane wrote:
>> If the backup-taker reads, say, 4K at a time then it's
>> certainly possible that it gets a later version of the second half of a
>> page than it got of the first half.  I don't know about you, but I sure
>> don't feel comfortable making assumptions at that level about the
>> behavior of tar or cpio.
>> 
>> I fear we still have to disable full_page_writes (force it ON) if
>> XLogArchivingActive is on.  Comments?

> Why not just tell the backup-taker to take backups using 8K pages ? 

How?  (No, I don't think tar's blocksize options control this
necessarily --- those indicate the blocking factor on the *tape*.
And not everyone uses tar anyway.)

Even if this would work for all popular backup programs, it seems
far too fragile: the consequence of forgetting the switch would be
silent data corruption, which you might not notice until the slave
had been in live operation for some time.
        regards, tom lane


Re: Is full_page_writes=off safe in conjunction with PITR?

From
markir@paradise.net.nz
Date:
Quoting Tom Lane <tgl@sss.pgh.pa.us>:
> I fear we still have to disable full_page_writes (force it ON) if
> XLogArchivingActive is on. Comments?

Yeah - if you are enabling PITR, then you care about safety and integrity, so it
makes sense (well, to me anyway).

Cheers

Mark


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Florian Weimer
Date:
* Tom Lane:

> I think we had originally argued that there was no problem anyway
> because the kernel should cause the page write to appear atomic to other
> processes (since we issue it in a single write() command).

I doubt Linux makes any such guarantees.  See this recent thread on
linux-kernel: <http://marc.theaimsgroup.com/?t=114489284200003>


Re: Is full_page_writes=off safe in conjunction with

From
Hannu Krosing
Date:
One fine day, Fri, 2006-04-14 at 17:31, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > One fine day, Fri, 2006-04-14 at 16:40, Tom Lane wrote:
> >> If the backup-taker reads, say, 4K at a time then it's
> >> certainly possible that it gets a later version of the second half of a
> >> page than it got of the first half.  I don't know about you, but I sure
> >> don't feel comfortable making assumptions at that level about the
> >> behavior of tar or cpio.
> >> 
> >> I fear we still have to disable full_page_writes (force it ON) if
> >> XLogArchivingActive is on.  Comments?
> 
> > Why not just tell the backup-taker to take backups using 8K pages ? 
> 
> How? 

Use find + dd, or whatever. I just don't want it to be made universally
unavailable just because some users *might* use a file/disk-level
backup solution which is incompatible.

> (No, I don't think tar's blocksize options control this
> necessarily --- those indicate the blocking factor on the *tape*.
> And not everyone uses tar anyway.)

If I'm desperate enough to get the 2x reduction of WAL writes, I may
even write my own backup solution.

> Even if this would work for all popular backup programs, it seems
> far too fragile: the consequence of forgetting the switch would be
> silent data corruption, which you might not notice until the slave
> had been in live operation for some time.

We may declare only one solution to be supported by us with
XLogArchivingActive, say a gnu tar modified to read in Nx8K blocks
( pg_tar :p ).

I guess that even if we can control what the operating system does, it is
still possible to get a torn page using some SAN solution, where you can
freeze the image for backup independently of the OS.

----------------
Hannu








Re: Is full_page_writes=off safe in conjunction with PITR?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > One fine day, Fri, 2006-04-14 at 16:40, Tom Lane wrote:
> >> If the backup-taker reads, say, 4K at a time then it's
> >> certainly possible that it gets a later version of the second half of a
> >> page than it got of the first half.  I don't know about you, but I sure
> >> don't feel comfortable making assumptions at that level about the
> >> behavior of tar or cpio.
> >> 
> >> I fear we still have to disable full_page_writes (force it ON) if
> >> XLogArchivingActive is on.  Comments?
> 
> > Why not just tell the backup-taker to take backups using 8K pages ? 
> 
> How?  (No, I don't think tar's blocksize options control this
> necessarily --- those indicate the blocking factor on the *tape*.
> And not everyone uses tar anyway.)
> 
> Even if this would work for all popular backup programs, it seems
> far too fragile: the consequence of forgetting the switch would be
> silent data corruption, which you might not notice until the slave
> had been in live operation for some time.

Yea, it is a problem.  Even a 10k read is going to read 2k into the next
page.

I am thinking we should throw an error on pg_start_backup() and
pg_stop_backup if full_page_writes is off.  Seems archive_command and
full_page_writes can still be used if we are not in the process of doing
a file system backup.

In fact, could we have pg_start_backup() turn on full_page_writes and
have pg_stop_backup() turn it off again, if postgresql.conf has it off?

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am thinking we should throw an error on pg_start_backup() and
> pg_stop_backup if full_page_writes is off.

No, we'll just change the test in xlog.c so that fullPageWrites is
ignored if XLogArchivingActive.

> Seems archive_command and
> full_page_writes can still be used if we are not in the process of doing
> a file system backup.

Think harder: we are only safe if the first write to a given page after
it's mis-copied by the archiver is a full page write.  The requirement
therefore continues after pg_stop_backup.  Unless you want to add
infrastructure to keep track for *every page* in the DB of whether it's
been fully written since the last backup?
        regards, tom lane


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> If I'm desperate enough to get the 2x reduction of WAL writes, I may
> even write my own backup solution.

Given Florian's concern, sounds like you might have to write your own
kernel too.  In which case, generating a variant build of Postgres
that allows full_page_writes to be disabled is certainly not beyond
your powers.  But for the ordinary mortal DBA, I think this combination
is just too unsafe to even consider.
        regards, tom lane


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am thinking we should throw an error on pg_start_backup() and
> > pg_stop_backup if full_page_writes is off.
> 
> No, we'll just change the test in xlog.c so that fullPageWrites is
> ignored if XLogArchivingActive.

We should probably throw a LOG message too.

> > Seems archive_command and
> > full_page_writes can still be used if we are not in the process of doing
> > a file system backup.
> 
> Think harder: we are only safe if the first write to a given page after
> it's mis-copied by the archiver is a full page write.  The requirement
> therefore continues after pg_stop_backup.  Unless you want to add
> infrastructure to keep track for *every page* in the DB of whether it's
> been fully written since the last backup?

Ah, yea.



Re: Is full_page_writes=off safe in conjunction with

From
Hannu Krosing
Date:
One fine day, Sat, 2006-04-15 at 11:49, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > If I'm desperate enough to get the 2x reduction of WAL writes, I may
> > even write my own backup solution.
> 
> Given Florian's concern, sounds like you might have to write your own
> kernel too.  In which case, generating a variant build of Postgres
> that allows full_page_writes to be disabled is certainly not beyond
> your powers.  But for the ordinary mortal DBA, I think this combination
> is just too unsafe to even consider.

I guess that writing our own pg_tar, which cooperates with postgres
backends to get full pages, is still in the realm of possible things,
even on kernels which don't guarantee atomic visibility of write() calls.

But until such is included in the distribution it is a good idea indeed
to disable full_page_writes=off when doing PITR.

--------------
Hannu




Re: Is full_page_writes=off safe in conjunction with

From
Bruce Momjian
Date:
Hannu Krosing wrote:
> One fine day, Sat, 2006-04-15 at 11:49, Tom Lane wrote:
> > Hannu Krosing <hannu@skype.net> writes:
> > > If I'm desperate enough to get the 2x reduction of WAL writes, I may
> > > even write my own backup solution.
> > 
> > Given Florian's concern, sounds like you might have to write your own
> > kernel too.  In which case, generating a variant build of Postgres
> > that allows full_page_writes to be disabled is certainly not beyond
> > your powers.  But for the ordinary mortal DBA, I think this combination
> > is just too unsafe to even consider.
> 
> I guess that writing our own pg_tar, which cooperates with postgres
> backends to get full pages, is still in the realm of possible things,
> even on kernels which don't guarantee atomic visibility of write() calls.
> 
> But until such is included in the distribution it is a good idea indeed
> to disable full_page_writes=off when doing PITR.

The cost/benefit of that seems very discouraging.  Most backup
applications allow for a block size to be specified, so it isn't
unreasonable to assume that people who really want PITR and
full_page_writes can easily set the block size to 8k.  However, I don't
think we are going to allow that to be configured --- you would have to
hack up our backend code to allow the combination.



Re: Is full_page_writes=off safe in conjunction with

From
"Marko Kreen"
Date:
On 4/16/06, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Hannu Krosing wrote:
> > I guess that writing our own pg_tar, which cooperates with postgres
> > backends to get full pages, is still in the realm of possible things,
> > even on kernels which don't guarantee atomic visibility of write() calls.
> >
> > But until such is included in the distribution it is a good idea indeed
> > to disable full_page_writes=off when doing PITR.
>
> The cost/benefit of that seems very discouraging.  Most backup
> applications allow for a block size to be specified, so it isn't
> unreasonable to assume that people who really want PITR and
> full_page_writes can easily set the block size to 8k.  However, I don't
> think we are going to allow that to be configured --- you would have to
> hack up our backend code to allow the combination.

The problem is that they allow configuring the _target_ block size,
not the read block size.  I did some tests with strace:

* GNU cpio version 2.5

Only the output block size can be changed; the input block size is 512
bytes.  Maybe it uses the device's block size?

* tar (GNU tar) 1.15.1

The '-b' and '--record-size' options also change the input block
size, but to get an 8192-byte output block, the first read is 7680
bytes to make room for the tar header.  The remaining reads are indeed
8192 bytes, but by then they are no longer aligned to page boundaries,
so that doesn't help us.

* cp (coreutils) 5.2.1

Fixed block size of 4096 bytes.

* rsync  version 2.6.5

It does not have a way to change the input block size, but it seems
to read in 32k blocks, or the full file if its length is < 32k.

So we should probably document that rsync is the only working solution.
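
The effect of these read sizes can be checked with a little arithmetic. The sketch below is illustrative only: `unaligned_reads` is a hypothetical helper, and it assumes a file whose size is a multiple of the 8192-byte page (BLCKSZ) and a reader that issues one first read followed by fixed-size reads, as in the strace observations above.

```python
BLCKSZ = 8192  # PostgreSQL's default page size


def unaligned_reads(file_size, first_read, read_size):
    """Return (offset, length) for every read that does not cover
    whole BLCKSZ-sized pages, i.e. one that could split a page."""
    partial = []
    offset = 0
    length = first_read
    while offset < file_size:
        length = min(length, file_size - offset)
        aligned = offset % BLCKSZ == 0 and length % BLCKSZ == 0
        if not aligned:
            partial.append((offset, length))
        offset += length
        length = read_size
    return partial


four_pages = 4 * BLCKSZ

# tar-like pattern: 7680-byte first read, then 8192-byte reads
tar_like = unaligned_reads(four_pages, 7680, 8192)
# cp-like pattern: fixed 4096-byte reads
cp_like = unaligned_reads(four_pages, 4096, 4096)
# rsync-like pattern: 32k reads
rsync_like = unaligned_reads(four_pages, 32768, 32768)
```

An empty result only says the reads line up with page boundaries. It does not show that the kernel makes each read atomic with respect to a concurrent write, which is a separate concern raised in this thread.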

--
marko


Re: Is full_page_writes=off safe in conjunction with

From
Tom Lane
Date:
"Marko Kreen" <markokr@gmail.com> writes:
> So we should probably document that rsync is the only working solution.

No, we're just turning off the variable.  One experiment on one version
of rsync doesn't prove it's "safe", even if there weren't the kernel-
behavior issue to consider.
        regards, tom lane


Re: Is full_page_writes=off safe in conjunction with

From
Simon Riggs
Date:
On Sat, 2006-04-15 at 11:45 -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am thinking we should throw an error on pg_start_backup() and
> > pg_stop_backup if full_page_writes is off.
> 
> No, we'll just change the test in xlog.c so that fullPageWrites is
> ignored if XLogArchivingActive.

I can see the danger of which you speak, but does it necessarily apply
to all forms of backup?

> > Seems archive_command and
> > full_page_writes can still be used if we are not in the process of doing
> > a file system backup.
> 
> Think harder: we are only safe if the first write to a given page after
> it's mis-copied by the archiver is a full page write.  The requirement
> therefore continues after pg_stop_backup.  Unless you want to add
> infrastructure to keep track for *every page* in the DB of whether it's
> been fully written since the last backup?

It seems that we should write an API to allow a backup device to ask for
blocks from the database.

--  Simon Riggs EnterpriseDB          http://www.enterprisedb.com/



Re: Is full_page_writes=off safe in conjunction with

From
Martijn van Oosterhout
Date:
On Sat, Apr 15, 2006 at 01:31:58PM +0300, Hannu Krosing wrote:
> > (No, I don't think tar's blocksize options control this
> > necessarily --- those indicate the blocking factor on the *tape*.
> > And not everyone uses tar anyway.)
>
> If I'm desperate enough to get the 2x reduction of WAL writes, I may
> even write my own backup solution.

I must be missing something obvious, but why don't we compress the
xlogs? They appear to be quite compressible (>75%) with standard gzip...
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Is full_page_writes=off safe in conjunction with

From
Hannu Krosing
Date:
One fine day, Sun, 2006-04-16 at 11:31, Tom Lane wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > So we should probably document that rsync is the only working solution.
> 
> No, we're just turning off the variable.  One experiment on one version
> of rsync doesn't prove it's "safe", even if there weren't the kernel-
> behavior issue to consider.

But if we do need to consider the kernel-level behaviour mentioned, then
the whole PITR thing becomes an impossibility. Consider the case when we
get a torn page during the initial copy with tar/cpio/rsync/whatever,
and no WAL record updates it.

In that case we will just have a torn page in backup with no way to fix
it.

-------------
Hannu




Re: Is full_page_writes=off safe in conjunction with

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> But if we do need to consider the kernel-level behaviour mentioned, then
> the whole PITR thing becomes an impossibility. Consider the case when we
> get a torn page during the initial copy with tar/cpio/rsync/whatever,
> and no WAL record updates it.

The only way the backup program could read a torn page is if the
database is writing that page concurrently, in which case there must
be a WAL record for the action.

This was all thought through carefully when the PITR mechanism was
designed, and it is solid -- as long as we are doing full-page writes.
Unfortunately, certain people forced that feature into 8.1 without
adequate review of the system's assumptions ...
        regards, tom lane


Re: Is full_page_writes=off safe in conjunction with

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Sat, 2006-04-15 at 11:45 -0400, Tom Lane wrote:
>> No, we'll just change the test in xlog.c so that fullPageWrites is
>> ignored if XLogArchivingActive.

> I can see the danger of which you speak, but does it necessarily apply
> to all forms of backup?

No, but the problem is we're not sure which forms are safe; it appears
to depend on poorly-documented details of behavior of both the kernel
and the backup program --- details that might well vary from one version
to the next even of the "same" program.  Given the variety of platforms
PG runs on, I can't see us expending the effort to try to monitor which
combinations it might be safe to not use full_page_writes with.

> It seems that we should write an API to allow a backup device to ask for
> blocks from the database.

I don't think we have the manpower or interest to develop and maintain
our own backup tool --- or tools, actually, as you'd at least want a tar
replacement and an rsync replacement.  Oracle might be able to afford
to throw programmers at that sort of thing, but where are you going to
get volunteers for tasks as mind-numbing as maintaining a PG-specific
tar replacement?
        regards, tom lane


Re: Is full_page_writes=off safe in conjunction with

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> I must be missing something obvious, but why don't we compress the
> xlogs? They appear to be quite compressible (>75%) with standard gzip...

Might be worth experimenting with, but I'm a bit dubious.  We've seen
several tests showing that XLogInsert's calculation of a CRC for each
WAL record is a bottleneck (that's why we backed off from 64-bit CRC
to 32-bit recently).  I'd think that any nontrivial compression
algorithm would be vastly slower than CRC ...
        regards, tom lane


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Bruce Momjian
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am thinking we should throw an error on pg_start_backup() and
> > pg_stop_backup if full_page_writes is off.
> 
> No, we'll just change the test in xlog.c so that fullPageWrites is
> ignored if XLogArchivingActive.
> 
> > Seems archive_command and
> > full_page_writes can still be used if we are not in the process of doing
> > a file system backup.
> 
> Think harder: we are only safe if the first write to a given page after
> it's mis-copied by the archiver is a full page write.  The requirement
> therefore continues after pg_stop_backup.  Unless you want to add
> infrastructure to keep track for *every page* in the DB of whether it's
> been fully written since the last backup?

I am confused.  Since we checkpoint during pg_start_backup(), isn't any
write to a file while the tar backup is going on going to be a full page
write?  And once we pg_stop_backup(), do we need full page writes?



Re: Is full_page_writes=off safe in conjunction with

From
"Jim C. Nasby"
Date:
On Sun, Apr 16, 2006 at 04:44:50PM -0400, Tom Lane wrote:
> > It seems that we should write an API to allow a backup device to ask for
> > blocks from the database.
> 
> I don't think we have the manpower or interest to develop and maintain
> our own backup tool --- or tools, actually, as you'd at least want a tar
> replacement and an rsync replacement.  Oracle might be able to afford
> to throw programmers at that sort of thing, but where are you going to
> get volunteers for tasks as mind-numbing as maintaining a PG-specific
> tar replacement?

Why would it have to replicate the functionality of tar or rsync? AFAICT
we'd only need the ability to produce something that could be consumed
by either a postgres backend or some other utility of our own creation.
I also think it'd be fine to forgo the rsync capabilities, at least in
an initial version.

Come to think of it, someone not too long ago was proposing an API to
allow a 'PITR slave' to subscribe to a master for WAL segments/changes;
it seems logical to me for that API to also provide the ability to send
relation data as well.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Think harder: we are only safe if the first write to a given page after
>> it's mis-copied by the archiver is a full page write.  The requirement
>> therefore continues after pg_stop_backup.  Unless you want to add
>> infrastructure to keep track for *every page* in the DB of whether it's
>> been fully written since the last backup?

> I am confused.  Since we checkpoint during pg_start_backup(), isn't any
> write to a file while the tar backup is going on going to be a full page
> write?  And once we pg_stop_backup(), do we need full page writes?

Hm.  The case I was concerned about was where a page is never written
to while the backup occurs (thus not triggering any full-page WAL
entry), and then the first post-backup write is partial.  However, if
the backup is guaranteed to have captured a non-torn copy of such a page
then there shouldn't be any problem.  So if we consider the initial
checkpoint to be a *required part* of pg_start_backup (right now it is
not) then maybe we can get away with this.  It needs more eyeballs on it
though ... after having been burnt once by full_page_writes, I'm pretty
shy ...
        regards, tom lane


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> Think harder: we are only safe if the first write to a given page after
> >> it's mis-copied by the archiver is a full page write.  The requirement
> >> therefore continues after pg_stop_backup.  Unless you want to add
> >> infrastructure to keep track for *every page* in the DB of whether it's
> >> been fully written since the last backup?
> 
> > I am confused.  Since we checkpoint during pg_start_backup(), isn't any
> > write to a file while the tar backup is going on going to be a full page
> > write?  And once we pg_stop_backup(), do we need full page writes?
> 
> Hm.  The case I was concerned about was where a page is never written
> to while the backup occurs (thus not triggering any full-page WAL
> entry), and then the first post-backup write is partial.  However, if
> the backup is guaranteed to have captured a non-torn copy of such a page
> then there shouldn't be any problem.  So if we consider the initial
> checkpoint to be a *required part* of pg_start_backup (right now it is
> not) then maybe we can get away with this.  It needs more eyeballs on it
> though ... after having been burnt once by full_page_writes, I'm pretty
> shy ...

Right.  The comment in pg_start_backup() has to be updated:

    /*
     * Force a CHECKPOINT.  This is not strictly necessary, but it seems like
     * a good idea to minimize the amount of past WAL needed to use the
     * backup.  Also, this guarantees that two successive backup runs will
     * have different checkpoint positions and hence different history file
     * names, even if nothing happened in between.
     */
    RequestCheckpoint(true, false);

This is a much simpler fix than people talking about writing their own
backup programs.



Re: Is full_page_writes=off safe in conjunction with

From
"Joshua D. Drake"
Date:
> Come to think of it, someone not too long ago was proposing an API to
> allow a 'PITR slave' to subscribe to a master for WAL segments/changes;
> it seems logical to me for that API to also provide the ability to send
> relation data as well.

Is that what replication is for?

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/






Re: Is full_page_writes=off safe in conjunction with PITR?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> This is a much simpler fix than people talking about writing their own
> backup programs.

Well, it's still not exactly trivial.  The hack that was being proposed
involved having the admin manually do

    full_page_writes = ON (ie, edit config file and SIGHUP)
    pg_start_backup
    take backup dump
    pg_stop_backup
    full_page_writes = OFF (ie, edit config file and SIGHUP)

with some additions to pg_start_backup/pg_stop_backup to complain if 
full_page_writes isn't ON.  Aside from being a PITA, this isn't at
all secure, first for the obvious reason that we're only checking
full_page_writes at start/stop and not whether it was on for the whole
interval, and second because SIGHUP is asynchronous.  Backends respond
to the signal when they feel like it (in practice, upon starting a new
interactive command) and so it'd be quite possible for a long-running
query to still be executing with full_page_writes off long after the
pg_start_backup has occurred.

If we were to do this, I'd want some more-bulletproof mechanism for
forcing full_page_writes on during the backup.  We could probably
keep a "backup in progress" flag in shared memory, and examine that
along with the GUC variable before deciding to omit a full-page write.

I seem to recall that there were previous proposals for such a flag,
which I resisted because I didn't want any macroscopic user-visible
change in behavior during a backup.  But forcing full-page WAL writes
is something I could live with as a "backup mode" behavior.
        regards, tom lane
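
The "backup in progress" idea above might look roughly like the following. This is a hedged sketch in Python, not the backend's C code: `SharedState`, `must_log_full_page` and the field names are invented for illustration. It also assumes that a full-page image is only needed for the first change to a page after a checkpoint, which it models by comparing the page's last-change LSN with the checkpoint's redo position.

```python
from dataclasses import dataclass


@dataclass
class SharedState:
    """Stand-in for state kept in shared memory (hypothetical names)."""
    backup_in_progress: bool = False
    redo_ptr: int = 0  # WAL position of the latest checkpoint


def must_log_full_page(page_lsn, full_page_writes, shared):
    """Decide whether a WAL record should carry a full-page image."""
    # Assumption: only the first change to a page after a checkpoint
    # needs the full image; later changes can be logged as deltas.
    first_touch_since_checkpoint = page_lsn <= shared.redo_ptr
    # The config setting may say "off", but an active backup overrides it.
    do_page_writes = full_page_writes or shared.backup_in_progress
    return do_page_writes and first_touch_since_checkpoint


shared = SharedState(backup_in_progress=False, redo_ptr=100)
outside_backup = must_log_full_page(50, False, shared)

shared.backup_in_progress = True  # set by a pg_start_backup-like step
during_backup = must_log_full_page(50, False, shared)
```

Because every backend would consult the same shared flag at the moment it builds a WAL record, the override would not depend on when each backend gets around to processing a SIGHUP.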


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > This is a much simpler fix than people talking about writing their own
> > backup programs.
> 
> Well, it's still not exactly trivial.  The hack that was being proposed
> involved having the admin manually do
> 
>     full_page_writes = ON (ie, edit config file and SIGHUP)
>     pg_start_backup
>     take backup dump
>     pg_stop_backup
>     full_page_writes = OFF (ie, edit config file and SIGHUP)
> 
> with some additions to pg_start_backup/pg_stop_backup to complain if 
> full_page_writes isn't ON.  Aside from being a PITA, this isn't at
> all secure, first for the obvious reason that we're only checking
> full_page_writes at start/stop and not whether it was on for the whole
> interval, and second because SIGHUP is asynchronous.  Backends respond
> to the signal when they feel like it (in practice, upon starting a new
> interactive command) and so it'd be quite possible for a long-running
> query to still be executing with full_page_writes off long after the
> pg_start_backup has occurred.
> 
> If we were to do this, I'd want some more-bulletproof mechanism for
> forcing full_page_writes on during the backup.  We could probably
> keep a "backup in progress" flag in shared memory, and examine that
> along with the GUC variable before deciding to omit a full-page write.
> 
> I seem to recall that there were previous proposals for such a flag,
> which I resisted because I didn't want any macroscopic user-visible
> change in behavior during a backup.  But forcing full-page WAL writes
> is something I could live with as a "backup mode" behavior.

Yes, good point.  The setting has to be seen by all backends at the same
time, so yea, a shared memory variable seems required.

The manual method is clearly a loser.



Re: Is full_page_writes=off safe in conjunction with PITR?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> If we were to do this, I'd want some more-bulletproof mechanism for
>> forcing full_page_writes on during the backup.  We could probably
>> keep a "backup in progress" flag in shared memory, and examine that
>> along with the GUC variable before deciding to omit a full-page write.

> Yes, good point.  The setting has to be seen by all backends at the same
> time, so yea, a shared memory variable seems required.

I've applied a patch for this.  On reflection, the CHECKPOINT during
pg_start_backup was actually necessary for torn-page safety even without
full_page_writes off.  The reason is that the torn-page risk occurs when
we write a page from shared memory, not when we modify it in memory.
Without a CHECKPOINT, a page modified just before pg_start_backup could
be dumped during the backup and then be saved in a torn state, even
though no WAL record for it is emitted anytime during the backup
procedure.  So that comment's been wrong all along.
        regards, tom lane


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> If we were to do this, I'd want some more-bulletproof mechanism for
> >> forcing full_page_writes on during the backup.  We could probably
> >> keep a "backup in progress" flag in shared memory, and examine that
> >> along with the GUC variable before deciding to omit a full-page write.
> 
> > Yes, good point.  The setting has to be seen by all backends at the same
> > time, so yea, a shared memory variable seems required.
> 
> I've applied a patch for this.  On reflection, the CHECKPOINT during
> pg_start_backup was actually necessary for torn-page safety even without
> full_page_writes off.  The reason is that the torn-page risk occurs when
> we write a page from shared memory, not when we modify it in memory.
> Without a CHECKPOINT, a page modified just before pg_start_backup could
> be dumped during the backup and then be saved in a torn state, even
> though no WAL record for it is emitted anytime during the backup
> procedure.  So that comment's been wrong all along.

Great, yea, a checkpoint syncs up the dirty buffers with the file system,
and it is true we need that to happen before the backup begins.

The idea of creating functions to mark start/stop of backup has clearly
been a win here.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Is full_page_writes=off safe in conjunction with PITR?

From
"Jim C. Nasby"
Date:
On Mon, Apr 17, 2006 at 03:00:58PM -0400, Tom Lane wrote:
> I've applied a patch for this.  On reflection, the CHECKPOINT during
> pg_start_backup was actually necessary for torn-page safety even without
> full_page_writes off.  The reason is that the torn-page risk occurs when
> we write a page from shared memory, not when we modify it in memory.
> Without a CHECKPOINT, a page modified just before pg_start_backup could
> be dumped during the backup and then be saved in a torn state, even
> though no WAL record for it is emitted anytime during the backup
> procedure.  So that comment's been wrong all along.

Are you going to back-patch this? If I understand correctly, current
behavior could mean people using PITR may have invalid backups. In the
meantime, perhaps we should send an email to -announce recommending that
folks issue a CHECKPOINT; after pg_start_backup and before initiating
the filesystem copy.
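
The ordering suggested above could be written down as follows. This is a hedged sketch: `backup_steps` is a hypothetical helper that only builds the list of commands, and the label and copy command are placeholders. `pg_start_backup(label)` and `pg_stop_backup()` are the server functions of that era.

```python
def backup_steps(label: str, copy_command: str) -> list:
    """Return the ordered steps of the workaround as (kind, text) tuples."""
    safe_label = label.replace("'", "''")   # escape quotes for the SQL literal
    return [
        ("sql", f"SELECT pg_start_backup('{safe_label}');"),
        ("sql", "CHECKPOINT;"),             # the extra step suggested above
        ("shell", copy_command),            # placeholder for the filesystem copy
        ("sql", "SELECT pg_stop_backup();"),
    ]


# Placeholder label and copy command, for illustration only.
for kind, text in backup_steps("nightly", "tar -cf base.tar data_directory"):
    print(f"{kind:5} {text}")
```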
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> On Mon, Apr 17, 2006 at 03:00:58PM -0400, Tom Lane wrote:
> > I've applied a patch for this.  On reflection, the CHECKPOINT during
> > pg_start_backup was actually necessary for torn-page safety even without
> > full_page_writes off.  The reason is that the torn-page risk occurs when
> > we write a page from shared memory, not when we modify it in memory.
> > Without a CHECKPOINT, a page modified just before pg_start_backup could
> > be dumped during the backup and then be saved in a torn state, even
> > though no WAL record for it is emitted anytime during the backup
> > procedure.  So that comment's been wrong all along.
> 
> Are you going to back-patch this? If I understand correctly, current
> behavior could mean people using PITR may have invalid backups. In the
> meantime, perhaps we should send an email to -announce recommending that
> folks issue a CHECKPOINT; after pg_start_backup and before initiating
> the filesystem copy.

We are disabling full_page_writes for 8.1.4, so they should be fine.


--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Is full_page_writes=off safe in conjunction with PITR?

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Mon, Apr 17, 2006 at 03:00:58PM -0400, Tom Lane wrote:
> > > I've applied a patch for this.  On reflection, the CHECKPOINT during
> > > pg_start_backup was actually necessary for torn-page safety even without
> > > full_page_writes off.  The reason is that the torn-page risk occurs when
> > > we write a page from shared memory, not when we modify it in memory.
> > > Without a CHECKPOINT, a page modified just before pg_start_backup could
> > > be dumped during the backup and then be saved in a torn state, even
> > > though no WAL record for it is emitted anytime during the backup
> > > procedure.  So that comment's been wrong all along.
> > 
> > Are you going to back-patch this? If I understand correctly, current
> > behavior could mean people using PITR may have invalid backups. In the
> > meantime, perhaps we should send an email to -announce recommending that
> > folks issue a CHECKPOINT; after pg_start_backup and before initiating
> > the filesystem copy.
> 
> We are disabling full_page_writes for 8.1.4, so they should be fine.

Just to clarify, 8.1.4 will remove the ability to turn off
full_page_writes, but 8.2 will allow such control, and it can be
used with PITR because we will automatically turn full-page writes on
during a file system backup.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Is full_page_writes=off safe in conjunction with

From
Simon Riggs
Date:
On Sun, 2006-04-16 at 16:44 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:

> > It seems that we should write an API to allow a backup device to ask for
> > blocks from the database.
> 
> I don't think we have the manpower or interest to develop and maintain
> our own backup tool --- or tools, actually, as you'd at least want a tar
> replacement and an rsync replacement.  Oracle might be able to afford
> to throw programmers at that sort of thing, but where are you going to
> get volunteers for tasks as mind-numbing as maintaining a PG-specific
> tar replacement?

Agreed. The only reason to do that would be to combine it with an
incremental backup solution also, so that some positive benefit also
came from the work.

I think an easier answer must be to make pg_start_backup() throw a
checkpoint, then hold any database writes until pg_stop_backup() is
called. (In the case of full_page_writes = off and fsync = on only).
That way all the data is fsynced to disk and the physical backup is
guaranteed to see whole blocks always, as we need it to. 

--  Simon Riggs EnterpriseDB          http://www.enterprisedb.com/



Re: Is full_page_writes=off safe in conjunction with

From
Hannu Krosing
Date:
One fine day, Mon, 2006-04-17 at 17:14, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > Are you going to back-patch this? If I understand correctly, current
> > behavior could mean people using PITR may have invalid backups. In the
> > meantime, perhaps we should send an email to -announce recommending that
> > folks issue a CHECKPOINT; after pg_start_backup and before initiating
> > the filesystem copy.
> 
> We are disabling full_page_writes for 8.1.4, so they should be fine.

Except that people currently using full_page_writes=off on 8.1 may see a sudden 
drop in performance after upgrading.

Do you have an estimate of how big the impact is?

-----------
Hannu




Re: Is full_page_writes=off safe in conjunction with

From
Bruce Momjian
Date:
Hannu Krosing wrote:
> One fine day, Mon, 2006-04-17 at 17:14, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > Are you going to back-patch this? If I understand correctly, current
> > > behavior could mean people using PITR may have invalid backups. In the
> > > meantime, perhaps we should send an email to -announce recommending that
> > > folks issue a CHECKPOINT; after pg_start_backup and before initiating
> > > the filesystem copy.
> > 
> > We are disabling full_page_writes for 8.1.4, so they should be fine.
> 
> Except that people currently using full_page_writes=off on 8.1 may see a sudden 
> drop in performance after upgrading.

Yea, but if it can cause corruption, we have no choice.  It will be
mentioned in the release notes.

> Do you have an estimate of how big the impact is?

Nope.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Is full_page_writes=off safe in conjunction with

From
"Joshua D. Drake"
Date:
On Tue, 2006-04-18 at 08:44 -0400, Bruce Momjian wrote:
> Hannu Krosing wrote:
> > One fine day, Mon, 2006-04-17 at 17:14, Bruce Momjian wrote:
> > > Jim C. Nasby wrote:
> > > > Are you going to back-patch this? If I understand correctly, current
> > > > behavior could mean people using PITR may have invalid backups. In the
> > > > meantime, perhaps we should send an email to -announce recommending that
> > > > folks issue a CHECKPOINT; after pg_start_backup and before initiating
> > > > the filesystem copy.
> > > 
> > > We are disabling full_page_writes for 8.1.4, so they should be fine.
> > 
> > Except that people currently using full_page_writes=off on 8.1 may see a sudden 
> > drop in performance after upgrading.
> 
> Yea, but if it can cause corruption, we have no choice.  It will be
> mentioned in the release notes.

Perhaps we should make it more visible than that? The postgresql.org
website has said "PostgreSQL 8.1 released" since it was... perhaps it is
time to make it say:

PostgreSQL 8.1.4 Critical Patch released?

Joshua D. Drake


> 
> > Do you have an estimate of how big the impact is?
> 
> Nope.
> 
-- 
           === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/