Thread: Is full_page_writes=off safe in conjunction with PITR?
While thinking about the patch I just made to allow full_page_writes to be turned off again, it struck me that this patch only fixes the problem for post-crash XLOG replay. There is still a hazard if the variable is turned off in a PITR master system. The reason is that while a base backup is being taken, the backup-taker might read an inconsistent state of a page and include that in the backup. This is not a problem if full_page_writes is ON --- it's logically equivalent to a torn page write and will be fixed on the slave by XLOG replay. But it *is* a problem if full_page_writes is OFF, for exactly the same reason that torn page writes are a problem. I think we had originally argued that there was no problem anyway because the kernel should cause the page write to appear atomic to other processes (since we issue it in a single write() command). But that's only true if the backup-taker reads in units that are multiples of BLCKSZ. If the backup-taker reads, say, 4K at a time then it's certainly possible that it gets a later version of the second half of a page than it got of the first half. I don't know about you, but I sure don't feel comfortable making assumptions at that level about the behavior of tar or cpio. I fear we still have to disable full_page_writes (force it ON) if XLogArchivingActive is on. Comments? regards, tom lane
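To make the read-size hazard concrete, here is a minimal C sketch (not part of the thread; the 4 KB read size, file handling, and program name are assumptions for illustration only):

/*
 * Illustration only: a copier that reads a relation file in 4 KB units, the
 * way tar/cpio-like tools may.  Each 8 KB page is fetched by two separate
 * read() calls, so a concurrent single 8 KB write() by the backend can land
 * between them and the copy ends up with the old first half and the new
 * second half -- a torn page that only a full-page WAL image can repair
 * during replay on the slave.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define READSZ 4096             /* smaller than BLCKSZ (8192) */

int
main(int argc, char **argv)
{
    char    buf[READSZ];
    ssize_t n;
    int     fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
    {
        fprintf(stderr, "usage: torn_copy relation-file > copy\n");
        return 1;
    }

    /* hazard window: the backend's 8 KB write() may occur between two reads */
    while ((n = read(fd, buf, READSZ)) > 0)
    {
        if (write(STDOUT_FILENO, buf, n) != n)
        {
            perror("write");
            return 1;
        }
    }
    close(fd);
    return 0;
}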
Ühel kenal päeval, R, 2006-04-14 kell 16:40, kirjutas Tom Lane: > I think we had originally argued that there was no problem anyway > because the kernel should cause the page write to appear atomic to other > processes (since we issue it in a single write() command). But that's > only true if the backup-taker reads in units that are multiples of > BLCKSZ. If the backup-taker reads, say, 4K at a time then it's > certainly possible that it gets a later version of the second half of a > page than it got of the first half. I don't know about you, but I sure > don't feel comfortable making assumptions at that level about the > behavior of tar or cpio. > > I fear we still have to disable full_page_writes (force it ON) if > XLogArchivingActive is on. Comments? Why not just tell the backup-taker to take backups using 8K pages ? --------------- Hannu
Hannu Krosing <hannu@skype.net> writes: > Ühel kenal päeval, R, 2006-04-14 kell 16:40, kirjutas Tom Lane: >> If the backup-taker reads, say, 4K at a time then it's >> certainly possible that it gets a later version of the second half of a >> page than it got of the first half. I don't know about you, but I sure >> don't feel comfortable making assumptions at that level about the >> behavior of tar or cpio. >> >> I fear we still have to disable full_page_writes (force it ON) if >> XLogArchivingActive is on. Comments? > Why not just tell the backup-taker to take backups using 8K pages ? How? (No, I don't think tar's blocksize options control this necessarily --- those indicate the blocking factor on the *tape*. And not everyone uses tar anyway.) Even if this would work for all popular backup programs, it seems far too fragile: the consequence of forgetting the switch would be silent data corruption, which you might not notice until the slave had been in live operation for some time. regards, tom lane
Quoting Tom Lane <tgl@sss.pgh.pa.us>: > I fear we still have to disable full_page_writes (force it ON) if > XLogArchivingActive is on. Comments? Yeah - if you are enabling PITR, then you care about safety and integrity, so it makes sense (well, to me anyway). Cheers Mark
* Tom Lane: > I think we had originally argued that there was no problem anyway > because the kernel should cause the page write to appear atomic to other > processes (since we issue it in a single write() command). I doubt Linux makes any such guarantees. See this recent thread on linux-kernel: <http://marc.theaimsgroup.com/?t=114489284200003>
Ühel kenal päeval, R, 2006-04-14 kell 17:31, kirjutas Tom Lane: > Hannu Krosing <hannu@skype.net> writes: > > Ühel kenal päeval, R, 2006-04-14 kell 16:40, kirjutas Tom Lane: > >> If the backup-taker reads, say, 4K at a time then it's > >> certainly possible that it gets a later version of the second half of a > >> page than it got of the first half. I don't know about you, but I sure > >> don't feel comfortable making assumptions at that level about the > >> behavior of tar or cpio. > >> > >> I fear we still have to disable full_page_writes (force it ON) if > >> XLogArchivingActive is on. Comments? > > > Why not just tell the backup-taker to take backups using 8K pages ? > > How? use find + dd, or whatever. I just don't want it to be made universally unavailable just because some users *might* use a file/disk-level backup solution which is incompatible. > (No, I don't think tar's blocksize options control this > necessarily --- those indicate the blocking factor on the *tape*. > And not everyone uses tar anyway.) If I'm desperate enough to get the 2x reduction of WAL writes, I may even write my own backup solution. > Even if this would work for all popular backup programs, it seems > far too fragile: the consequence of forgetting the switch would be > silent data corruption, which you might not notice until the slave > had been in live operation for some time. We may declare only one solution to be supported by us with XLogArchivingActive, say a gnu tar modified to read in Nx8K blocks ( pg_tar :p ). I guess that even if we can control what the operating system does, it is still possible to get a torn page using some SAN solution, where you can freeze the image for backup independent of OS. ---------------- Hannu
Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: > > Ühel kenal päeval, R, 2006-04-14 kell 16:40, kirjutas Tom Lane: > >> If the backup-taker reads, say, 4K at a time then it's > >> certainly possible that it gets a later version of the second half of a > >> page than it got of the first half. I don't know about you, but I sure > >> don't feel comfortable making assumptions at that level about the > >> behavior of tar or cpio. > >> > >> I fear we still have to disable full_page_writes (force it ON) if > >> XLogArchivingActive is on. Comments? > > > Why not just tell the backup-taker to take backups using 8K pages ? > > How? (No, I don't think tar's blocksize options control this > necessarily --- those indicate the blocking factor on the *tape*. > And not everyone uses tar anyway.) > > Even if this would work for all popular backup programs, it seems > far too fragile: the consequence of forgetting the switch would be > silent data corruption, which you might not notice until the slave > had been in live operation for some time. Yea, it is a problem. Even a 10k read is going to read 2k into the next page. I am thinking we should throw an error on pg_start_backup() and pg_stop_backup if full_page_writes is off. Seems archive_command and full_page_writes can still be used if we are not in the process of doing a file system backup. In fact, could we have pg_start_backup() turn on full_page_writes and have pg_stop_backup turn it off, if postgresql.conf has it off. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I am thinking we should throw an error on pg_start_backup() and > pg_stop_backup if full_page_writes is off. No, we'll just change the test in xlog.c so that fullPageWrites is ignored if XLogArchivingActive. > Seems archive_command and > full_page_writes can still be used if we are not in the process of doing > a file system backup. Think harder: we are only safe if the first write to a given page after it's mis-copied by the archiver is a full page write. The requirement therefore continues after pg_stop_backup. Unless you want to add infrastructure to keep track for *every page* in the DB of whether it's been fully written since the last backup? regards, tom lane
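Roughly, the rule proposed here can be pictured with a toy sketch like the following; the variable names are illustrative stand-ins for the real xlog.c globals, not the actual patch:

/*
 * Toy model of the proposed rule: once WAL archiving is configured, the
 * full_page_writes GUC is simply ignored and full-page images are always
 * written.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool fullPageWrites = false;          /* the full_page_writes GUC */
static char XLogArchiveCommand[1024] = "";   /* the archive_command GUC */

/* archiving is "active" when an archive_command has been set */
static bool
archiving_active(void)
{
    return XLogArchiveCommand[0] != '\0';
}

/* effective decision applied when a WAL record is about to be emitted */
static bool
do_full_page_writes(void)
{
    return fullPageWrites || archiving_active();
}

int
main(void)
{
    printf("no archiving, GUC off: %d\n", do_full_page_writes());
    strcpy(XLogArchiveCommand, "cp %p /mnt/server/archivedir/%f");
    printf("archiving on, GUC off: %d\n", do_full_page_writes());
    return 0;
}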
Hannu Krosing <hannu@skype.net> writes: > If I'm desperate enough to get the 2x reduction of WAL writes, I may > even write my own backup solution. Given Florian's concern, sounds like you might have to write your own kernel too. In which case, generating a variant build of Postgres that allows full_page_writes to be disabled is certainly not beyond your powers. But for the ordinary mortal DBA, I think this combination is just too unsafe to even consider. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I am thinking we should throw an error on pg_start_backup() and > > pg_stop_backup if full_page_writes is off. > > No, we'll just change the test in xlog.c so that fullPageWrites is > ignored if XLogArchivingActive. We should probably throw a LOG message too. > > Seems archive_command and > > full_page_writes can still be used if we are not in the process of doing > > a file system backup. > > Think harder: we are only safe if the first write to a given page after > it's mis-copied by the archiver is a full page write. The requirement > therefore continues after pg_stop_backup. Unless you want to add > infrastructure to keep track for *every page* in the DB of whether it's > been fully written since the last backup? Ah, yea. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Ühel kenal päeval, L, 2006-04-15 kell 11:49, kirjutas Tom Lane: > Hannu Krosing <hannu@skype.net> writes: > > If I'm desperate enough to get the 2x reduction of WAL writes, I may > > even write my own backup solution. > > Given Florian's concern, sounds like you might have to write your own > kernel too. In which case, generating a variant build of Postgres > that allows full_page_writes to be disabled is certainly not beyond > your powers. But for the ordinary mortal DBA, I think this combination > is just too unsafe to even consider. I guess that writing our own pg_tar, which cooperates with postgres backends to get full pages, is still in the realm of possible things, even on kernels which don't guarantee atomic visibility of write() calls. But until such a tool is included in the distribution it is a good idea indeed to disallow full_page_writes=off when doing PITR. -------------- Hannu
Hannu Krosing wrote: > Ühel kenal päeval, L, 2006-04-15 kell 11:49, kirjutas Tom Lane: > > Hannu Krosing <hannu@skype.net> writes: > > > If I'm desperate enough to get the 2x reduction of WAL writes, I may > > > even write my own backup solution. > > > > Given Florian's concern, sounds like you might have to write your own > > kernel too. In which case, generating a variant build of Postgres > > that allows full_page_writes to be disabled is certainly not beyond > > your powers. But for the ordinary mortal DBA, I think this combination > > is just too unsafe to even consider. > > I guess that writing our own pg_tar, which cooperates with postgres > backends to get full pages, is still in the realm of possible things, > even on kernels which don't guarantee atomic visibility of write() calls. > > But until such a tool is included in the distribution it is a good idea indeed > to disallow full_page_writes=off when doing PITR. The cost/benefit of that seems very discouraging. Most backup applications allow for a block size to be specified, so it isn't unreasonable to assume that people who really want PITR and full_page_writes=off can easily set the block size to 8k. However, I don't think we are going to allow that to be configured --- you would have to hack up our backend code to allow the combination. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On 4/16/06, Bruce Momjian <pgman@candle.pha.pa.us> wrote: > Hannu Krosing wrote: > > I guess that writing our own pg_tar, which cooperates with postgres > > backends to get full pages, is still in the realm of possible things, > > even on kernels which don't guarantee atomic visibility of write() calls. > > > > But until such a tool is included in the distribution it is a good idea indeed > > to disallow full_page_writes=off when doing PITR. > > The cost/benefit of that seems very discouraging. Most backup > applications allow for a block size to be specified, so it isn't > unreasonable to assume that people who really want PITR and > full_page_writes=off can easily set the block size to 8k. However, I don't > think we are going to allow that to be configured --- you would have to > hack up our backend code to allow the combination. The problem is that they allow configuring the _target_ block size, not the read block size. I did some tests with strace:

* GNU cpio version 2.5: it only allows changing the output block size; the input block is 512 bytes. Maybe it uses the device's block size?

* tar (GNU tar) 1.15.1: the '-b' and '--record-size' options also change the input block size, but to get 8192 bytes for the output block, the first read is 7680 bytes to make room for the tar header. The rest of the reads are indeed 8192 bytes, but that won't help us anymore.

* cp (coreutils) 5.2.1: fixed block size of 4096 bytes.

* rsync version 2.6.5: it does not have a way to change the input block size, but it seems that it reads in 32k blocks, or the full file if its length is < 32k.

So we should probably document that rsync is the only working solution. -- marko
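For comparison, a purpose-built copier along the lines Hannu suggested would itself have to read in exact BLCKSZ units, looping because read() may legally return a short count; and even then it would still be leaning on the kernel's write-atomicity behavior that Florian questioned. A hedged sketch, with invented names and no claim of being a supported tool:

/*
 * Sketch only: a copier that reads in exact BLCKSZ units.  read() may return
 * a short count, so even "read 8K at a time" needs a retry loop -- and even
 * then this approach relies on the kernel making the backend's single 8 KB
 * write() appear atomic to readers, which is the assumption in doubt.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192

/* read up to len bytes, retrying short reads and EINTR; returns bytes read */
static ssize_t
read_block(int fd, char *buf, size_t len)
{
    size_t      got = 0;

    while (got < len)
    {
        ssize_t     n = read(fd, buf + got, len - got);

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;              /* EOF or error */
        got += (size_t) n;
    }
    return (ssize_t) got;
}

int
main(int argc, char **argv)
{
    char        page[BLCKSZ];
    ssize_t     n;
    int         fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
    {
        fprintf(stderr, "usage: pg_copyblocks relation-file > copy\n");
        return 1;
    }
    while ((n = read_block(fd, page, BLCKSZ)) > 0)
    {
        if (write(STDOUT_FILENO, page, n) != n)
        {
            perror("write");
            return 1;
        }
    }
    close(fd);
    return 0;
}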
"Marko Kreen" <markokr@gmail.com> writes: > So we should probably document that rsync is only working solution. No, we're just turning off the variable. One experiment on one version of rsync doesn't prove it's "safe", even if there weren't the kernel- behavior issue to consider. regards, tom lane
On Sat, 2006-04-15 at 11:45 -0400, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I am thinking we should throw an error on pg_start_backup() and > > pg_stop_backup if full_page_writes is off. > > No, we'll just change the test in xlog.c so that fullPageWrites is > ignored if XLogArchivingActive. I can see the danger of which you speak, but does it necessarily apply to all forms of backup? > > Seems archive_command and > > full_page_writes can still be used if we are not in the process of doing > > a file system backup. > > Think harder: we are only safe if the first write to a given page after > it's mis-copied by the archiver is a full page write. The requirement > therefore continues after pg_stop_backup. Unless you want to add > infrastructure to keep track for *every page* in the DB of whether it's > been fully written since the last backup? It seems that we should write an API to allow a backup device to ask for blocks from the database. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com/
On Sat, Apr 15, 2006 at 01:31:58PM +0300, Hannu Krosing wrote: > > (No, I don't think tar's blocksize options control this > > necessarily --- those indicate the blocking factor on the *tape*. > > And not everyone uses tar anyway.) > > If I'm desperate enough to get the 2x reduction of WAL writes, I may > even write my own backup solution. I must be missing something obvious, but why don't we compress the xlogs? They appear to be quite compressible (>75%) with standard gzip... -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Ühel kenal päeval, P, 2006-04-16 kell 11:31, kirjutas Tom Lane: > "Marko Kreen" <markokr@gmail.com> writes: > > So we should probably document that rsync is the only working solution. > > No, we're just turning off the variable. One experiment on one version > of rsync doesn't prove it's "safe", even if there weren't the kernel-behavior > issue to consider. But if we do need to consider the kernel-level behaviour mentioned, then the whole PITR thing becomes an impossibility. Consider the case when we get a torn page during the initial copy with tar/cpio/rsync/whatever, and no WAL record updates it. In that case we will just have a torn page in the backup with no way to fix it. ------------- Hannu
Hannu Krosing <hannu@skype.net> writes: > But if we do need to consider the kernel-level behaviour mentioned, then > the whole PITR thing becomes an impossibility. Consider the case when we > get a torn page during the initial copy with tar/cpio/rsync/whatever, > and no WAL record updates it. The only way the backup program could read a torn page is if the database is writing that page concurrently, in which case there must be a WAL record for the action. This was all thought through carefully when the PITR mechanism was designed, and it is solid -- as long as we are doing full-page writes. Unfortunately, certain people forced that feature into 8.1 without adequate review of the system's assumptions ... regards, tom lane
Simon Riggs <simon@2ndquadrant.com> writes: > On Sat, 2006-04-15 at 11:45 -0400, Tom Lane wrote: >> No, we'll just change the test in xlog.c so that fullPageWrites is >> ignored if XLogArchivingActive. > I can see the danger of which you speak, but does it necessarily apply > to all forms of backup? No, but the problem is we're not sure which forms are safe; it appears to depend on poorly-documented details of behavior of both the kernel and the backup program --- details that might well vary from one version to the next even of the "same" program. Given the variety of platforms PG runs on, I can't see us expending the effort to try to monitor which combinations it might be safe to not use full_page_writes with. > It seems that we should write an API to allow a backup device to ask for > blocks from the database. I don't think we have the manpower or interest to develop and maintain our own backup tool --- or tools, actually, as you'd at least want a tar replacement and an rsync replacement. Oracle might be able to afford to throw programmers at that sort of thing, but where are you going to get volunteers for tasks as mind-numbing as maintaining a PG-specific tar replacement? regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > I must be missing something obvious, but why don't we compress the > xlogs? They appear to be quite compressible (>75%) with standard gzip... Might be worth experimenting with, but I'm a bit dubious. We've seen several tests showing that XLogInsert's calculation of a CRC for each WAL record is a bottleneck (that's why we backed off from 64-bit CRC to 32-bit recently). I'd think that any nontrivial compression algorithm would be vastly slower than CRC ... regards, tom lane
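One rough way to check that intuition (a sketch, not a measurement anyone reported in the thread) is to time zlib's crc32() against compress2() on an 8 KB block; the data pattern, loop count, and use of zlib rather than the backend's own CRC code are all assumptions, and the numbers will vary by platform:

/*
 * Rough micro-benchmark sketch: compare the cost of CRC-ing an 8 KB block
 * against deflating it with zlib, to illustrate why per-record compression
 * inside XLogInsert would cost far more than the CRC.
 * Build with:  cc crc_vs_gzip.c -lz
 */
#include <stdio.h>
#include <time.h>
#include <zlib.h>

#define BLCKSZ 8192
#define LOOPS  20000

int
main(void)
{
    static unsigned char page[BLCKSZ];
    static unsigned char out[BLCKSZ * 2];   /* larger than compressBound(BLCKSZ) */
    uLong       crc = 0;
    clock_t     t0, t1, t2;
    int         i;

    /* fill the page with something mildly compressible */
    for (i = 0; i < BLCKSZ; i++)
        page[i] = (unsigned char) (i % 251);

    t0 = clock();
    for (i = 0; i < LOOPS; i++)
        crc = crc32(crc32(0L, Z_NULL, 0), page, BLCKSZ);
    t1 = clock();
    for (i = 0; i < LOOPS; i++)
    {
        uLongf      outlen = sizeof(out);

        compress2(out, &outlen, page, BLCKSZ, Z_DEFAULT_COMPRESSION);
    }
    t2 = clock();

    printf("crc32:     %.2f s  (crc=%lx)\n", (double) (t1 - t0) / CLOCKS_PER_SEC, crc);
    printf("compress2: %.2f s\n", (double) (t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}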
Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I am thinking we should throw an error on pg_start_backup() and > > pg_stop_backup if full_page_writes is off. > > No, we'll just change the test in xlog.c so that fullPageWrites is > ignored if XLogArchivingActive. > > > Seems archive_command and > > full_page_writes can still be used if we are not in the process of doing > > a file system backup. > > Think harder: we are only safe if the first write to a given page after > it's mis-copied by the archiver is a full page write. The requirement > therefore continues after pg_stop_backup. Unless you want to add > infrastructure to keep track for *every page* in the DB of whether it's > been fully written since the last backup? I am confused. Since we checkpoint during pg_start_backup(), isn't any write to a file while the tar backup is going on going to be a full page write? And once we pg_stop_backup(), do we need full page writes? -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Sun, Apr 16, 2006 at 04:44:50PM -0400, Tom Lane wrote: > > It seems that we should write an API to allow a backup device to ask for > > blocks from the database. > > I don't think we have the manpower or interest to develop and maintain > our own backup tool --- or tools, actually, as you'd at least want a tar > replacement and an rsync replacement. Oracle might be able to afford > to throw programmers at that sort of thing, but where are you going to > get volunteers for tasks as mind-numbing as maintaining a PG-specific > tar replacement? Why would it have to replicate the functionality of tar or rsync? AFAICT we'd only need the ability to produce something that could be consumed by either a postgres backend or some other utility of our own creation. I also think it'd be fine to forgo the rsync capabilities, at least in an initial version. Come to think of it, someone not too long ago was proposing an API to allow a 'PITR slave' to subscribe to a master for WAL segments/changes; it seems logical to me for that API to also provide the ability to send relation data as well. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Think harder: we are only safe if the first write to a given page after >> it's mis-copied by the archiver is a full page write. The requirement >> therefore continues after pg_stop_backup. Unless you want to add >> infrastructure to keep track for *every page* in the DB of whether it's >> been fully written since the last backup? > I am confused. Since we checkpoint during pg_start_backup(), isn't any > write to a file while the tar backup is going on going to be a full page > write? And once we pg_stop_backup(), do we need full page writes? Hm. The case I was concerned about was where a page is never written to while the backup occurs (thus not triggering any full-page WAL entry), and then the first post-backup write is partial. However, if the backup is guaranteed to have captured a non-torn copy of such a page then there shouldn't be any problem. So if we consider the initial checkpoint to be a *required part* of pg_start_backup (right now it is not) then maybe we can get away with this. It needs more eyeballs on it though ... after having been burnt once by full_page_writes, I'm pretty shy ... regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> Think harder: we are only safe if the first write to a given page after > >> it's mis-copied by the archiver is a full page write. The requirement > >> therefore continues after pg_stop_backup. Unless you want to add > >> infrastructure to keep track for *every page* in the DB of whether it's > >> been fully written since the last backup? > > > I am confused. Since we checkpoint during pg_start_backup(), isn't any > > write to a file while the tar backup is going on going to be a full page > > write? And once we pg_stop_backup(), do we need full page writes? > > Hm. The case I was concerned about was where a page is never written > to while the backup occurs (thus not triggering any full-page WAL > entry), and then the first post-backup write is partial. However, if > the backup is guaranteed to have captured a non-torn copy of such a page > then there shouldn't be any problem. So if we consider the initial > checkpoint to be a *required part* of pg_start_backup (right now it is > not) then maybe we can get away with this. It needs more eyeballs on it > though ... after having been burnt once by full_page_writes, I'm pretty > shy ... Right. The comment in pg_start_backup() has to be updated:

    /*
     * Force a CHECKPOINT.  This is not strictly necessary, but it seems like
     * a good idea to minimize the amount of past WAL needed to use the
     * backup.  Also, this guarantees that two successive backup runs will
     * have different checkpoint positions and hence different history file
     * names, even if nothing happened in between.
     */
    RequestCheckpoint(true, false);

This is a much simpler fix than people talking about writing their own backup programs. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
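One possible rewording of that comment, sketched here only to capture the reasoning above and not the text that was actually committed:

    /*
     * Force a CHECKPOINT.  This is no longer a mere nicety: it flushes every
     * page dirtied before this point, so any page written out of shared
     * buffers during the backup (and thus possibly copied torn) must have
     * been modified after the checkpoint that recovery will start from, and
     * -- with full-page writes forced on during the backup -- its full-page
     * WAL image will repair it at replay.  The checkpoint also minimizes the
     * amount of past WAL needed to use the backup, and guarantees that two
     * successive backup runs have different checkpoint positions and hence
     * different history file names, even if nothing happened in between.
     */
    RequestCheckpoint(true, false);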
> Come to think of it, someone not too long ago was proposing an API to > allow a 'PITR slave' to subscribe to a master for WAL segments/changes; > it seems logical to me for that API to also provide the ability to send > relation data as well. Is that what replication is for? Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Bruce Momjian <pgman@candle.pha.pa.us> writes: > This is a much simpler fix than people talking about writing their own > backup programs. Well, it's still not exactly trivial. The hack that was being proposed involved having the admin manually do

    full_page_writes = ON     (ie, edit config file and SIGHUP)
    pg_start_backup
    take backup dump
    pg_stop_backup
    full_page_writes = OFF    (ie, edit config file and SIGHUP)

with some additions to pg_start_backup/pg_stop_backup to complain if full_page_writes isn't ON. Aside from being a PITA, this isn't at all secure, first for the obvious reason that we're only checking full_page_writes at start/stop and not whether it was on for the whole interval, and second because SIGHUP is asynchronous. Backends respond to the signal when they feel like it (in practice, upon starting a new interactive command) and so it'd be quite possible for a long-running query to still be executing with full_page_writes off long after the pg_start_backup has occurred. If we were to do this, I'd want some more-bulletproof mechanism for forcing full_page_writes on during the backup. We could probably keep a "backup in progress" flag in shared memory, and examine that along with the GUC variable before deciding to omit a full-page write. I seem to recall that there were previous proposals for such a flag, which I resisted because I didn't want any macroscopic user-visible change in behavior during a backup. But forcing full-page WAL writes is something I could live with as a "backup mode" behavior. regards, tom lane
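The shared-memory flag idea can be pictured with a small self-contained model; every name below is illustrative and the committed patch may differ:

/*
 * Toy model (not backend code) of the "backup in progress" flag: the GUC
 * alone no longer decides whether a full-page image is written; a flag in
 * shared state, set by pg_start_backup and cleared by pg_stop_backup,
 * overrides it regardless of when individual backends notice a SIGHUP.
 */
#include <stdbool.h>
#include <stdio.h>

static bool fullPageWrites = false;     /* what postgresql.conf says */

typedef struct
{
    bool        forcePageWrites;        /* true while an online backup runs */
} XLogInsertState;

static XLogInsertState Insert;          /* would live in shared memory */

/* the test applied for each WAL record */
static bool
need_full_page_image(void)
{
    return fullPageWrites || Insert.forcePageWrites;
}

static void start_backup(void) { Insert.forcePageWrites = true;  }
static void stop_backup(void)  { Insert.forcePageWrites = false; }

int
main(void)
{
    printf("before backup: full-page image? %d\n", need_full_page_image());
    start_backup();
    printf("during backup: full-page image? %d\n", need_full_page_image());
    stop_backup();
    printf("after backup:  full-page image? %d\n", need_full_page_image());
    return 0;
}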
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > This is a much simpler fix than people talking about writing their own > > backup programs. > > Well, it's still not exactly trivial. The hack that was being proposed > involved having the admin manually do > > full_page_writes = ON (ie, edit config file and SIGHUP) > pg_start_backup > take backup dump > pg_stop_backup > full_page_writes = OFF (ie, edit config file and SIGHUP) > > with some additions to pg_start_backup/pg_stop_backup to complain if > full_page_writes isn't ON. Aside from being a PITA, this isn't at > all secure, first for the obvious reason that we're only checking > full_page_writes at start/stop and not whether it was on for the whole > interval, and second because SIGHUP is asynchronous. Backends respond > to the signal when they feel like it (in practice, upon starting a new > interactive command) and so it'd be quite possible for a long-running > query to still be executing with full_page_writes off long after the > pg_start_backup has occurred. > > If we were to do this, I'd want some more-bulletproof mechanism for > forcing full_page_writes on during the backup. We could probably > keep a "backup in progress" flag in shared memory, and examine that > along with the GUC variable before deciding to omit a full-page write. > > I seem to recall that there were previous proposals for such a flag, > which I resisted because I didn't want any macroscopic user-visible > change in behavior during a backup. But forcing full-page WAL writes > is something I could live with as a "backup mode" behavior. Yes, good point. The setting has to be seen by all backends at the same time, so yea, a shared memory variable seems required. The manual method is clearly a loser. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> If we were to do this, I'd want some more-bulletproof mechanism for >> forcing full_page_writes on during the backup. We could probably >> keep a "backup in progress" flag in shared memory, and examine that >> along with the GUC variable before deciding to omit a full-page write. > Yes, good point. The setting has to be seen by all backends at the same > time, so yea, a shared memory variable seems required. I've applied a patch for this. On reflection, the CHECKPOINT during pg_start_backup was actually necessary for torn-page safety even without full_page_writes off. The reason is that the torn-page risk occurs when we write a page from shared memory, not when we modify it in memory. Without a CHECKPOINT, a page modified just before pg_start_backup could be dumped during the backup and then be saved in a torn state, even though no WAL record for it is emitted anytime during the backup procedure. So that comment's been wrong all along. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> If we were to do this, I'd want some more-bulletproof mechanism for > >> forcing full_page_writes on during the backup. We could probably > >> keep a "backup in progress" flag in shared memory, and examine that > >> along with the GUC variable before deciding to omit a full-page write. > > > Yes, good point. The setting has to be seen by all backends at the same > > time, so yea, a shared memory variable seems required. > > I've applied a patch for this. On reflection, the CHECKPOINT during > pg_start_backup was actually necessary for torn-page safety even without > full_page_writes off. The reason is that the torn-page risk occurs when > we write a page from shared memory, not when we modify it in memory. > Without a CHECKPOINT, a page modified just before pg_start_backup could > be dumped during the backup and then be saved in a torn state, even > though no WAL record for it is emitted anytime during the backup > procedure. So that comment's been wrong all along. Great, yea, a checkpoint syncs up the dirty buffers with the file system, and it is true we need that to happen before the backup begins. The idea of creating functions to mark start/stop of backup has clearly been a win here. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, Apr 17, 2006 at 03:00:58PM -0400, Tom Lane wrote: > I've applied a patch for this. On reflection, the CHECKPOINT during > pg_start_backup was actually necessary for torn-page safety even without > full_page_writes off. The reason is that the torn-page risk occurs when > we write a page from shared memory, not when we modify it in memory. > Without a CHECKPOINT, a page modified just before pg_start_backup could > be dumped during the backup and then be saved in a torn state, even > though no WAL record for it is emitted anytime during the backup > procedure. So that comment's been wrong all along. Are you going to back-patch this? If I understand correctly current behavior could mean people using PITR may have invalid backups. In the meantime, perhaps we should send an email to -announce recommending that folks issue a CHECKPOINT; after pg_start_backup and before initiating the filesystem copy. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: > On Mon, Apr 17, 2006 at 03:00:58PM -0400, Tom Lane wrote: > > I've applied a patch for this. On reflection, the CHECKPOINT during > > pg_start_backup was actually necessary for torn-page safety even without > > full_page_writes off. The reason is that the torn-page risk occurs when > > we write a page from shared memory, not when we modify it in memory. > > Without a CHECKPOINT, a page modified just before pg_start_backup could > > be dumped during the backup and then be saved in a torn state, even > > though no WAL record for it is emitted anytime during the backup > > procedure. So that comment's been wrong all along. > > Are you going to back-patch this? If I understand correctly current > behavior could mean people using PITR may have invalid backups. In the > meantime, perhaps we should send an email to -announce recommending that > folks issue a CHECKPOINT; after pg_start_backup and before initiating > the filesystem copy. We are disabling full_page_writes for 8.1.4, so they should be fine. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Jim C. Nasby wrote: > > On Mon, Apr 17, 2006 at 03:00:58PM -0400, Tom Lane wrote: > > > I've applied a patch for this. On reflection, the CHECKPOINT during > > > pg_start_backup was actually necessary for torn-page safety even without > > > full_page_writes off. The reason is that the torn-page risk occurs when > > > we write a page from shared memory, not when we modify it in memory. > > > Without a CHECKPOINT, a page modified just before pg_start_backup could > > > be dumped during the backup and then be saved in a torn state, even > > > though no WAL record for it is emitted anytime during the backup > > > procedure. So that comment's been wrong all along. > > > > Are you going to back-patch this? If I understand correctly current > > behavior could mean people using PITR may have invalid backups. In the > > meantime, perhaps we should send an email to -announce recommending that > > folks issue a CHECKPOINT; after pg_start_backup and before initiating > > the filesystem copy. > > We are disabling full_page_writes for 8.1.4, so they should be fine. Just to clarify, 8.1.4 will remove control for turning off full_page_writes, but 8.2 will allow such control, and allow it to be used with PITR because we will automatically turn it on during file system backup. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Sun, 2006-04-16 at 16:44 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > It seems that we should write an API to allow a backup device to ask for > > blocks from the database. > > I don't think we have the manpower or interest to develop and maintain > our own backup tool --- or tools, actually, as you'd at least want a tar > replacement and an rsync replacement. Oracle might be able to afford > to throw programmers at that sort of thing, but where are you going to > get volunteers for tasks as mind-numbing as maintaining a PG-specific > tar replacement? Agreed. The only reason to do that would be to combine it with an incremental backup solution also, so that some positive benefit also came from the work. I think an easier answer must be to make pg_start_backup() throw a checkpoint, then hold any database writes until pg_stop_backup() is called. (In the case of full_page_writes = off and fsync = on only). That way all the data is fsynced to disk and the physical backup is guaranteed to see whole blocks always, as we need it to. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com/
Ühel kenal päeval, E, 2006-04-17 kell 17:14, kirjutas Bruce Momjian: > Jim C. Nasby wrote: > > Are you going to back-patch this? If I understand correctly current > > behavior could mean people using PITR may have invalid backups. In the > > meantime, perhaps we should send an email to -announce recommending that > > folks issue a CHECKPOINT; after pg_start_backup and before initiating > > the filesystem copy. > > We are disabling full_page_writes for 8.1.4, so they should be fine. Except that people currently using full_page_writes=off on 8.1 may see a sudden drop in performance after upgrading. Do you have an estimate of how big the impact is? ----------- Hannu
Hannu Krosing wrote: > Ühel kenal päeval, E, 2006-04-17 kell 17:14, kirjutas Bruce Momjian: > > Jim C. Nasby wrote: > > > Are you going to back-patch this? If I understand correctly current > > > behavior could mean people using PITR may have invalid backups. In the > > > meantime, perhaps we should send an email to -announce recommending that > > > folks issue a CHECKPOINT; after pg_start_backup and before initiating > > > the filesystem copy. > > > > We are disabling full_page_writes for 8.1.4, so they should be fine. > > Except that people currently using full_page_writes=off on 8.1 may see a sudden > drop in performance after upgrading. Yea, but if it can cause corruption, we have no choice. It will be mentioned in the release notes. > Do you have an estimate of how big the impact is? Nope. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tue, 2006-04-18 at 08:44 -0400, Bruce Momjian wrote: > Hannu Krosing wrote: > > Ühel kenal päeval, E, 2006-04-17 kell 17:14, kirjutas Bruce Momjian: > > > Jim C. Nasby wrote: > > > > Are you going to back-patch this? If I understand correctly current > > > > behavior could mean people using PITR may have invalid backups. In the > > > > meantime, perhaps we should send an email to -announce recommending that > > > > folks issue a CHECKPOINT; after pg_start_backup and before initiating > > > > the filesystem copy. > > > > > > We are disabling full_page_writes for 8.1.4, so they should be fine. > > > > Except that people currently using full_page_writes=off on 8.1 may see a sudden > > drop in performance after upgrading. > > Yea, but if it can cause corruption, we have no choice. It will be > mentioned in the release notes. Perhaps we should make it more visible than that? The postgresql.org website has said "PostgreSQL 8.1 released" since it was... perhaps it is time to make it say "PostgreSQL 8.1.4 Critical Patch released"? Joshua D. Drake > > > Do you have an estimate of how big the impact is? > > Nope. > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/