Thread: PITR Backups

PITR Backups

From
Dan Gorman
Date:
Hi -
   I'm looking at ways to do clean PITR backups. Currently we're
pg_dumping our data, which in some cases is about 100GB compressed.
Needless to say, it's slow and I/O intensive on both the host and the
backup server.

   All of our databases are on NetApp storage and I have been looking
at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
replica) for backing up our databases. The problem is because there
is no write-suspend or even a 'hot backup mode' for postgres it's
very plausible that the database has data in RAM that hasn't been
written and will corrupt the data. NetApp suggested that if we do a
SnapMirror, we do a couple in succession ( < 1s) so should one be
corrupt, we try the next one. They said oracle does something similar.

   Is there a better way to quiesce the database without shutting it
down? Some of our databases are doing about 250,000 commits/min.

Best Regards,
Dan Gorman


Re: PITR Backups

From
Tom Lane
Date:
Dan Gorman <dgorman@hi5.com> writes:
>    All of our databases are on NetApp storage and I have been looking
> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
> replica) for backing up our databases. The problem is because there
> is no write-suspend or even a 'hot backup mode' for postgres it's
> very plausible that the database has data in RAM that hasn't been
> written and will corrupt the data.

I think you need to read the fine manual a bit more closely:
http://www.postgresql.org/docs/8.2/static/backup-file.html
If the NetApp does provide an instantaneous-snapshot operation then
it will work fine; you just have to be sure the snap covers both
data and WAL files.

Alternatively, you can use a PITR base backup as suggested here:
http://www.postgresql.org/docs/8.2/static/continuous-archiving.html

In either case, the key point is that you need both the data files
and matching WAL files.

            regards, tom lane
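
For reference, a minimal sketch of the continuous-archiving setup those docs
describe; the paths, archive location, and backup label below are placeholders
rather than anything from this thread:

  # postgresql.conf -- archive each completed WAL segment somewhere safe
  archive_command = 'cp %p /backup/wal_archive/%f'

  # Base backup: tell the server, copy the cluster, tell the server again
  psql -c "SELECT pg_start_backup('base_backup');" postgres
  rsync -a /db/pgdata/ /backup/base/
  psql -c "SELECT pg_stop_backup();" postgres

  # recovery.conf in the restored copy -- where to find the archived WAL
  restore_command = 'cp /backup/wal_archive/%f %p'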

Re: PITR Backups

From
Toru SHIMOGAKI
Date:
Tom Lane wrote:
> Dan Gorman <dgorman@hi5.com> writes:
>>    All of our databases are on NetApp storage and I have been looking
>> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
>> replica) for backing up our databases. The problem is because there
>> is no write-suspend or even a 'hot backup mode' for postgres it's
>> very plausible that the database has data in RAM that hasn't been
>> written and will corrupt the data.

> Alternatively, you can use a PITR base backup as suggested here:
> http://www.postgresql.org/docs/8.2/static/continuous-archiving.html

I think Dan's problem is important when PostgreSQL is used for a large database:

- When we take a PITR base backup with a hardware-level snapshot operation
  (not filesystem level), which a lot of storage vendors provide, the backup
  data can be corrupted as Dan said. During recovery we can't even read it,
  especially if metadata was corrupted.

- If we don't use a hardware-level snapshot operation, it takes a long time to
  take a large backup, and a lot of full-page-write WAL files are generated.

So, I think users need a new feature that avoids writing out heap pages while a
backup is being taken.

Any comments?

Best regards,

--
Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
NTT Open Source Software Center


Re: PITR Backups

From
"Joshua D. Drake"
Date:
Toru SHIMOGAKI wrote:
> Tom Lane wrote:

> - When we take a PITR base backup with hardware level snapshot operation
>   (not filesystem level) which a lot of storage vender provide, the backup data
>   can be corrupted as Dan said. During recovery we can't even read it,
>   especially if meta-data was corrupted.
>
> - If we don't use hardware level snapshot operation, it takes long time to take
>   a large backup data, and a lot of full-page-written WAL files are made.

Does it? I have done it with fairly large databases without issue.

Joshua D. Drake


>
> So, I think users need a new feature not to write out heap pages during taking a
> backup.
>
> Any comments?
>
> Best regards,
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: PITR Backups

From
Steve Atkins
Date:
On Jun 21, 2007, at 7:30 PM, Toru SHIMOGAKI wrote:

>
> Tom Lane wrote:
>> Dan Gorman <dgorman@hi5.com> writes:
>>>    All of our databases are on NetApp storage and I have been
>>> looking
>>> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
>>> replica) for backing up our databases. The problem is because there
>>> is no write-suspend or even a 'hot backup mode' for postgres it's
>>> very plausible that the database has data in RAM that hasn't been
>>> written and will corrupt the data.
>
>> Alternatively, you can use a PITR base backup as suggested here:
>> http://www.postgresql.org/docs/8.2/static/continuous-archiving.html
>
> I think Dan's problem is important if we use PostgreSQL to a large
> size database:
>
> - When we take a PITR base backup with hardware level snapshot
> operation
>   (not filesystem level) which a lot of storage vender provide, the
> backup data
>   can be corrupted as Dan said. During recovery we can't even read it,
>   especially if meta-data was corrupted.

I can't see any explanation for how this could happen, other
than that your hardware vendor is lying about its snapshot ability.

What problems have you actually seen?

Cheers,
   Steve





Re: PITR Backups

From
Toru SHIMOGAKI
Date:
Steve Atkins wrote:

>> - When we take a PITR base backup with hardware level snapshot operation
>>   (not filesystem level) which a lot of storage vender provide, the
>> backup data
>>   can be corrupted as Dan said. During recovery we can't even read it,
>>   especially if meta-data was corrupted.
>
> I can't see any explanation for how this could happen, other
> than your hardware vendor is lying about snapshot ability.

All of the hardware vendors I asked said the same thing:

  "The hardware-level snapshot has nothing to do with the state of the
filesystem, and of course nothing to do with what data has been written from
the operating system cache to the hard disk platter. It just copies the bytes
on the storage to the other volume.

So, if any data is written while the snapshot is being taken, we can't
*strictly* guarantee data correctness.

In Oracle, no table data is written between BEGIN BACKUP and END BACKUP, and it
is not a problem that REDO is written..."

I'd like to know the correct information if this explanation has any mistakes,
or a good way to avoid the problem.

I think there are users who want to migrate from Oracle to PostgreSQL but can't
because of this problem.


Best regards,

--
Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
NTT Open Source Software Center


Re: PITR Backups

From
Toru SHIMOGAKI
Date:
Joshua D. Drake wrote:

>> - If we don't use hardware level snapshot operation, it takes long time to take
>>   a large backup data, and a lot of full-page-written WAL files are made.
>
> Does it? I have done it with fairly large databases without issue.

You mean a hardware snapshot? I know that taking a backup using rsync (or tar,
cp?) as an online backup method is not a big problem, as documented. But it
just takes a long time if we handle a terabyte database. We have to run VACUUM
and other batch processes against the large database as well, so we don't want
the backup itself to take a long time...

Regards,

--
Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
NTT Open Source Software Center


Re: PITR Backups

From
Dan Gorman
Date:
Here is an example. Most of the snapshots worked fine, but I did get
this once:

Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21
00:39:43 PDTLOG:  redo done at 71/99870670
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21
00:39:43 PDTWARNING:  page 28905 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [11-1] 2007-06-21
00:39:43 PDTWARNING:  page 13626 of relation 1663/16384/76716 did not
exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [12-1] 2007-06-21
00:39:43 PDTWARNING:  page 28904 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [13-1] 2007-06-21
00:39:43 PDTWARNING:  page 26711 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [14-1] 2007-06-21
00:39:43 PDTWARNING:  page 28900 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [15-1] 2007-06-21
00:39:43 PDTWARNING:  page 3535208 of relation 1663/16384/33190 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [16-1] 2007-06-21
00:39:43 PDTWARNING:  page 28917 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [17-1] 2007-06-21
00:39:43 PDTWARNING:  page 3535207 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [18-1] 2007-06-21
00:39:43 PDTWARNING:  page 28916 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [19-1] 2007-06-21
00:39:43 PDTWARNING:  page 28911 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [20-1] 2007-06-21
00:39:43 PDTWARNING:  page 26708 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [21-1] 2007-06-21
00:39:43 PDTWARNING:  page 28914 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [22-1] 2007-06-21
00:39:43 PDTWARNING:  page 28909 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [23-1] 2007-06-21
00:39:43 PDTWARNING:  page 28908 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [24-1] 2007-06-21
00:39:43 PDTWARNING:  page 28913 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [25-1] 2007-06-21
00:39:43 PDTWARNING:  page 26712 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [26-1] 2007-06-21
00:39:43 PDTWARNING:  page 28918 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [27-1] 2007-06-21
00:39:43 PDTWARNING:  page 28912 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [28-1] 2007-06-21
00:39:43 PDTWARNING:  page 3535209 of relation 1663/16384/33190 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [29-1] 2007-06-21
00:39:43 PDTWARNING:  page 28907 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [30-1] 2007-06-21
00:39:43 PDTWARNING:  page 28906 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [31-1] 2007-06-21
00:39:43 PDTWARNING:  page 26713 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [32-1] 2007-06-21
00:39:43 PDTWARNING:  page 17306 of relation 1663/16384/76710 did not
exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [33-1] 2007-06-21
00:39:43 PDTWARNING:  page 26706 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [34-1] 2007-06-21
00:39:43 PDTWARNING:  page 800226 of relation 1663/16384/33204 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [35-1] 2007-06-21
00:39:43 PDTWARNING:  page 28915 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [36-1] 2007-06-21
00:39:43 PDTWARNING:  page 26710 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [37-1] 2007-06-21
00:39:43 PDTWARNING:  page 28903 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [38-1] 2007-06-21
00:39:43 PDTWARNING:  page 28902 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [39-1] 2007-06-21
00:39:43 PDTWARNING:  page 28910 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21
00:39:43 PDTPANIC:  WAL contains references to invalid pages
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [1-1] 2007-06-21
00:39:43 PDTLOG:  startup process (PID 3506) was terminated by signal 6
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [2-1] 2007-06-21
00:39:43 PDTLOG:  aborting startup due to startup process failure
Jun 21 00:39:43 sfmedstorageha001 postgres[3505]: [1-1] 2007-06-21
00:39:43 PDTLOG:  logger shutting down
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-1] 2007-06-21
00:40:39 PDTLOG:  database system was interrupted while in recovery
at 2007-06-21 00:36:40 PDT
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-2] 2007-06-21
00:40:39 PDTHINT:  This probably means that some data is corrupted
and you will have to use the last backup for
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-3]  recovery.
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [2-1] 2007-06-21
00:40:39 PDTLOG:  checkpoint record is at 71/9881E928
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [3-1] 2007-06-21
00:40:39 PDTLOG:  redo record is at 71/986BF148; undo record is at
0/0; shutdown FALSE
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [4-1] 2007-06-21
00:40:39 PDTLOG:  next transaction ID: 0/2871389429; next OID: 83795
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [5-1] 2007-06-21
00:40:39 PDTLOG:  next MultiXactId: 1; next MultiXactOffset: 0
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [6-1] 2007-06-21
00:40:39 PDTLOG:  database system was not properly shut down;
automatic recovery in progress
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [7-1] 2007-06-21
00:40:39 PDTLOG:  redo starts at 71/986BF148
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [8-1] 2007-06-21
00:40:39 PDTLOG:  record with zero length at 71/998706A8
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [9-1] 2007-06-21
00:40:39 PDTLOG:  redo done at 71/99870670
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [10-1] 2007-06-21
00:40:39 PDTWARNING:  page 28905 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [11-1] 2007-06-21
00:40:39 PDTWARNING:  page 13626 of relation 1663/16384/76716 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [12-1] 2007-06-21
00:40:39 PDTWARNING:  page 28904 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [13-1] 2007-06-21
00:40:39 PDTWARNING:  page 26711 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [14-1] 2007-06-21
00:40:39 PDTWARNING:  page 28900 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [15-1] 2007-06-21
00:40:39 PDTWARNING:  page 3535208 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [16-1] 2007-06-21
00:40:39 PDTWARNING:  page 28917 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [17-1] 2007-06-21
00:40:39 PDTWARNING:  page 3535207 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [18-1] 2007-06-21
00:40:39 PDTWARNING:  page 28916 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [19-1] 2007-06-21
00:40:39 PDTWARNING:  page 28911 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [20-1] 2007-06-21
00:40:39 PDTWARNING:  page 26708 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [21-1] 2007-06-21
00:40:39 PDTWARNING:  page 28914 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [22-1] 2007-06-21
00:40:39 PDTWARNING:  page 28909 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [23-1] 2007-06-21
00:40:39 PDTWARNING:  page 28908 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [24-1] 2007-06-21
00:40:39 PDTWARNING:  page 28913 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [25-1] 2007-06-21
00:40:39 PDTWARNING:  page 26712 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [26-1] 2007-06-21
00:40:39 PDTWARNING:  page 28918 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [27-1] 2007-06-21
00:40:39 PDTWARNING:  page 28912 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [28-1] 2007-06-21
00:40:39 PDTWARNING:  page 3535209 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [29-1] 2007-06-21
00:40:39 PDTWARNING:  page 28907 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [30-1] 2007-06-21
00:40:39 PDTWARNING:  page 28906 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [31-1] 2007-06-21
00:40:39 PDTWARNING:  page 26713 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [32-1] 2007-06-21
00:40:39 PDTWARNING:  page 17306 of relation 1663/16384/76710 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [33-1] 2007-06-21
00:40:39 PDTWARNING:  page 26706 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [34-1] 2007-06-21
00:40:39 PDTWARNING:  page 800226 of relation 1663/16384/33204 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [35-1] 2007-06-21
00:40:39 PDTWARNING:  page 28915 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [36-1] 2007-06-21
00:40:39 PDTWARNING:  page 26710 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [37-1] 2007-06-21
00:40:39 PDTWARNING:  page 28903 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [38-1] 2007-06-21
00:40:39 PDTWARNING:  page 28902 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [39-1] 2007-06-21
00:40:39 PDTWARNING:  page 28910 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [40-1] 2007-06-21
00:40:39 PDTPANIC:  WAL contains references to invalid pages
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [1-1] 2007-06-21
00:40:39 PDTLOG:  startup process (PID 3757) was terminated by signal 6
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [2-1] 2007-06-21
00:40:39 PDTLOG:  aborting startup due to startup process failure
Jun 21 00:40:39 sfmedstorageha001 postgres[3756]: [1-1] 2007-06-21
00:40:39 PDTLOG:  logger shutting down


On Jun 22, 2007, at 12:30 AM, Toru SHIMOGAKI wrote:

>
> Steve Atkins wrote:
>
>>> - When we take a PITR base backup with hardware level snapshot
>>> operation
>>>   (not filesystem level) which a lot of storage vender provide,
>>> the backup data
>>>   can be corrupted as Dan said. During recovery we can't even
>>> read it,
>>>   especially if meta-data was corrupted.
>> I can't see any explanation for how this could happen, other
>> than your hardware vendor is lying about snapshot ability.
>
> All of the hardware vendors I asked always said:
>
>  "The hardware level snapshot has nothing to do with filesystem
> condition and of course with what data has been written from
> operating system chache to the hard disk platter. It just copies
> byte data on storage to the other volume.
>
> So, if any data is written during taking snapshot, we can't
> assurance data correctness *strictly* .
>
> In Oracle, no table data is written between BEGIN BACKUP and END
> BACKUP, and it is not a problem REDO is written..."
>
> I'd like to know the correct information if the explanation has any
> mistakes, or a good way to avoid the probrem.
>
> I think there are users who want to migrate Oracle to PostgreSQL
> but can't because of the problem as above.
>
>
> Best regards,
>
> --
> Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
> NTT Open Source Software Center
>
>



Re: PITR Backups

From
Toru SHIMOGAKI
Date:
Dan Gorman wrote:
> Here is an example. Most of the snap shots worked fine, but I did get
> this once:

Thank you for your example. I'd appreciate any responses on whether we should
tackle this problem for 8.4.

Regards,

--
Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
NTT Open Source Software Center


Re: PITR Backups

From
"Simon Riggs"
Date:
On Fri, 2007-06-22 at 11:30 +0900, Toru SHIMOGAKI wrote:
> Tom Lane wrote:
> > Dan Gorman <dgorman@hi5.com> writes:
> >>    All of our databases are on NetApp storage and I have been looking
> >> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
> >> replica) for backing up our databases. The problem is because there
> >> is no write-suspend or even a 'hot backup mode' for postgres it's
> >> very plausible that the database has data in RAM that hasn't been
> >> written and will corrupt the data.
>
> > Alternatively, you can use a PITR base backup as suggested here:
> > http://www.postgresql.org/docs/8.2/static/continuous-archiving.html
>
> I think Dan's problem is important if we use PostgreSQL to a large size database:
>
> - When we take a PITR base backup with hardware level snapshot operation
>   (not filesystem level) which a lot of storage vender provide, the backup data
>   can be corrupted as Dan said. During recovery we can't even read it,
>   especially if meta-data was corrupted.
>
> - If we don't use hardware level snapshot operation, it takes long time to take
>   a large backup data, and a lot of full-page-written WAL files are made.
>
> So, I think users need a new feature not to write out heap pages during taking a
> backup.

Your worries are unwarranted, IMHO. It appears Dan was taking a snapshot
without having read the procedure as clearly outlined in the manual.

pg_start_backup() flushes all currently dirty blocks to disk as part of
a checkpoint. If you snapshot after that point, then you will have all
the data blocks required from which to correctly roll forward. On its
own, the snapshot is an inconsistent backup and will give errors as Dan
shows. It is only when the snapshot is used as the base backup in a full
continuous recovery that the inconsistencies are removed and the
database is fully and correctly restored.
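
In practice that sequence might look roughly like the sketch below; the
snapshot trigger is whatever the storage vendor provides (shown only as a
placeholder comment), and WAL archiving is assumed to already be configured:

  # checkpoint, then mark the start of the base backup
  psql -c "SELECT pg_start_backup('san_snapshot');" postgres

  # trigger the storage-level snapshot of the volume(s) holding the cluster
  # here, using the filer's own tooling (placeholder step)

  # mark the end of the base backup
  psql -c "SELECT pg_stop_backup();" postgres

  # On its own the snapshot is inconsistent; restored together with the WAL
  # archived between start and stop, it rolls forward to a consistent state.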

pg_start_backup() is the direct analogue of Oracle's ALTER DATABASE
BEGIN BACKUP. Snapshots work with Oracle too, in much the same way.

After reviewing the manual, if you honestly think there is a problem,
please let me know and I'll work with you to investigate.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: PITR Backups

From
Dan Gorman
Date:
This snapshot is done at the LUN (filer) level; Postgres is unaware that
we're creating a backup, so I'm not sure how pg_start_backup() plays
into this ...

Regards,
Dan Gorman

On Jun 22, 2007, at 3:55 AM, Simon Riggs wrote:

> On Fri, 2007-06-22 at 11:30 +0900, Toru SHIMOGAKI wrote:
>> Tom Lane wrote:
>>> Dan Gorman <dgorman@hi5.com> writes:
>>>>    All of our databases are on NetApp storage and I have been
>>>> looking
>>>> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
>>>> replica) for backing up our databases. The problem is because there
>>>> is no write-suspend or even a 'hot backup mode' for postgres it's
>>>> very plausible that the database has data in RAM that hasn't been
>>>> written and will corrupt the data.
>>
>>> Alternatively, you can use a PITR base backup as suggested here:
>>> http://www.postgresql.org/docs/8.2/static/continuous-archiving.html
>>
>> I think Dan's problem is important if we use PostgreSQL to a large
>> size database:
>>
>> - When we take a PITR base backup with hardware level snapshot
>> operation
>>   (not filesystem level) which a lot of storage vender provide,
>> the backup data
>>   can be corrupted as Dan said. During recovery we can't even read
>> it,
>>   especially if meta-data was corrupted.
>>
>> - If we don't use hardware level snapshot operation, it takes long
>> time to take
>>   a large backup data, and a lot of full-page-written WAL files
>> are made.
>>
>> So, I think users need a new feature not to write out heap pages
>> during taking a
>> backup.
>
> Your worries are unwarranted, IMHO. It appears Dan was taking a
> snapshot
> without having read the procedure as clearly outlined in the manual.
>
> pg_start_backup() flushes all currently dirty blocks to disk as
> part of
> a checkpoint. If you snapshot after that point, then you will have all
> the data blocks required from which to correctly roll forward. On its
> own, the snapshot is an inconsistent backup and will give errors as
> Dan
> shows. It is only when the snapshot is used as the base backup in a
> full
> continuous recovery that the inconsistencies are removed and the
> database is fully and correctly restored.
>
> pg_start_backup() is the direct analogue of Oracle's ALTER DATABASE
> BEGIN BACKUP. Snapshots work with Oracle too, in much the same way.
>
> After reviewing the manual, if you honestly think there is a problem,
> please let me know and I'll work with you to investigate.
>
> --
>   Simon Riggs
>   EnterpriseDB   http://www.enterprisedb.com
>
>
>



Re: PITR Backups

From
"Simon Riggs"
Date:
On Fri, 2007-06-22 at 04:10 -0700, Dan Gorman wrote:
> This snapshot is done at the LUN (filer) level, postgres is un-aware
> we're creating a backup, so I'm not sure how pg_start_backup() plays
> into this ...

Postgres *is* completely unaware that you intend to take a backup; that
is *exactly* why you must tell the server you intend to make a backup,
using pg_start_backup() and pg_stop_backup(). That way Postgres will
flush its buffers, so that they are present on storage when you make the
backup.

Is the procedure for Oracle or any other transactional RDBMS any
different?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: PITR Backups

From
Dan Gorman
Date:
Ah okay, I understand now. So how can I signal Postgres that I'm about to
take a backup? (Read the doc from the previous email?)

Regards,
Dan Gorman

On Jun 22, 2007, at 4:38 AM, Simon Riggs wrote:

> On Fri, 2007-06-22 at 04:10 -0700, Dan Gorman wrote:
>> This snapshot is done at the LUN (filer) level, postgres is un-aware
>> we're creating a backup, so I'm not sure how pg_start_backup() plays
>> into this ...
>
> Postgres *is* completely unaware that you intend to take a backup,
> that
> is *exactly* why you must tell the server you intend to make a backup,
> using pg_start_backup() and pg_stop_backup(). That way Postgres will
> flush its buffers, so that they are present on storage when you
> make the
> backup.
>
> Is the procedure for Oracle or any other transactional RDBMS any
> different?
>
> --
>   Simon Riggs
>   EnterpriseDB   http://www.enterprisedb.com
>
>
>



Re: PITR Backups

From
Kurt Overberg
Date:
You can use the psql command line to run:

"select pg_start_backup();"

...then when you're done,

"select pg_stop_backup();"

if you want an example from the unix command line:

psql -c "select pg_start_backup();" database_name

then

psql -c "select pg_stop_backup();" database_name

/kurt


On Jun 22, 2007, at 7:51 AM, Dan Gorman wrote:

> Ah okay. I understand now. So how can I signal postgres I'm about
> to take a backup ? (read doc from previous email ? )
>
> Regards,
> Dan Gorman
>
> On Jun 22, 2007, at 4:38 AM, Simon Riggs wrote:
>
>> On Fri, 2007-06-22 at 04:10 -0700, Dan Gorman wrote:
>>> This snapshot is done at the LUN (filer) level, postgres is un-aware
>>> we're creating a backup, so I'm not sure how pg_start_backup() plays
>>> into this ...
>>
>> Postgres *is* completely unaware that you intend to take a backup,
>> that
>> is *exactly* why you must tell the server you intend to make a
>> backup,
>> using pg_start_backup() and pg_stop_backup(). That way Postgres will
>> flush its buffers, so that they are present on storage when you
>> make the
>> backup.
>>
>> Is the procedure for Oracle or any other transactional RDBMS any
>> different?
>>
>> --
>>   Simon Riggs
>>   EnterpriseDB   http://www.enterprisedb.com
>>
>>
>>
>
>
>


Re: PITR Backups

From
Andreas Kostyrka
Date:

Wasn't it select pg_start_backup('backuplabel');?

Andreas
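
For reference, with the required label argument the command-line version of
Kurt's example would look like this (label and database name are placeholders):

  psql -c "SELECT pg_start_backup('backuplabel');" database_name
  # ... take the snapshot or copy the data directory here ...
  psql -c "SELECT pg_stop_backup();" database_name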

Kurt Overberg wrote:
> You can use the psql command line to run:
>
> "select pg_start_backup();"
>
> ...then when you're done,
>
> "select pg_stop_backup();"
>
> if you want an example from the unix command line:
>
> psql -c "select pg_start_backup();" database_name
>
> then
>
> psql -c "select pg_stop_backup();" database_name
>
> /kurt
>
>
> On Jun 22, 2007, at 7:51 AM, Dan Gorman wrote:
>
>> Ah okay. I understand now. So how can I signal postgres I'm about to
>> take a backup ? (read doc from previous email ? )
>>
>> Regards,
>> Dan Gorman
>>
>> On Jun 22, 2007, at 4:38 AM, Simon Riggs wrote:
>>
>>> On Fri, 2007-06-22 at 04:10 -0700, Dan Gorman wrote:
>>>> This snapshot is done at the LUN (filer) level, postgres is un-aware
>>>> we're creating a backup, so I'm not sure how pg_start_backup() plays
>>>> into this ...
>>>
>>> Postgres *is* completely unaware that you intend to take a backup, that
>>> is *exactly* why you must tell the server you intend to make a backup,
>>> using pg_start_backup() and pg_stop_backup(). That way Postgres will
>>> flush its buffers, so that they are present on storage when you make the
>>> backup.
>>>
>>> Is the procedure for Oracle or any other transactional RDBMS any
>>> different?
>>>
>>> --  Simon Riggs
>>>   EnterpriseDB   http://www.enterprisedb.com
>>>
>>>
>>>
>>
>>
>>
>
>

Re: PITR Backups

From
"Simon Riggs"
Date:
On Fri, 2007-06-22 at 17:23 +0900, Toru SHIMOGAKI wrote:
> Dan Gorman wrote:
> > Here is an example. Most of the snap shots worked fine, but I did get
> > this once:
>
> Thank you for your example. I'd appreciate it if I'd get any responses; whether
> we should tackle the problem for 8.4?

If you see a problem, please explain what it is, after careful review of
the manual.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: PITR Backups

From
"Joshua D. Drake"
Date:
Toru SHIMOGAKI wrote:
> Joshua D. Drake wrote:
>
>>> - If we don't use hardware level snapshot operation, it takes long time to take
>>>   a large backup data, and a lot of full-page-written WAL files are made.
>> Does it? I have done it with fairly large databases without issue.
>
> You mean hardware snapshot?

Oh goodness no. :)

> I know taking a backup using rsync(or tar, cp?) as a
> n online backup method is not so a big problem as documented. But it just take a

I use rsync with pg_start/stop_backup and it works very well. Even on
databases that are TB in size.

> long time if we handle a terabyte database. We have to VACUUM and other batch
> processes  to the large database as well, so we don't want to take a long time
> to take a backup...

Ahh, o.k., that makes sense. The difference here is probably how often we
take the snapshot. We take them very often to ensure we don't have a ton
of logs we have to pull over.
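
A rough sketch of that kind of rsync-based base backup, with placeholder paths;
pg_xlog can be left out of the copy because the archived WAL is what gets
replayed on restore:

  psql -c "SELECT pg_start_backup('rsync_base');" postgres
  rsync -a --delete --exclude=pg_xlog /db/pgdata/ backuphost:/backup/base/
  psql -c "SELECT pg_stop_backup();" postgres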

Joshua D. Drake

>
> Regards,
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: PITR Backups

From
"Joshua D. Drake"
Date:
Toru SHIMOGAKI wrote:
>
> Steve Atkins wrote:
>
>>> - When we take a PITR base backup with hardware level snapshot operation
>>>   (not filesystem level) which a lot of storage vender provide, the
>>> backup data
>>>   can be corrupted as Dan said. During recovery we can't even read it,
>>>   especially if meta-data was corrupted.
>>
>> I can't see any explanation for how this could happen, other
>> than your hardware vendor is lying about snapshot ability.
>
> All of the hardware vendors I asked always said:
>
>  "The hardware level snapshot has nothing to do with filesystem
> condition and of course with what data has been written from operating
> system chache to the hard disk platter. It just copies byte data on
> storage to the other volume.

Right, that has been my understanding as well.

Joshua D. Drake

>
> So, if any data is written during taking snapshot, we can't assurance
> data correctness *strictly* .
>
> In Oracle, no table data is written between BEGIN BACKUP and END BACKUP,
> and it is not a problem REDO is written..."
>
> I'd like to know the correct information if the explanation has any
> mistakes, or a good way to avoid the probrem.
>
> I think there are users who want to migrate Oracle to PostgreSQL but
> can't because of the problem as above.
>
>
> Best regards,
>


--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: PITR Backups

From
Gregory Stark
Date:
>> So, if any data is written during taking snapshot, we can't assurance data
>> correctness *strictly* .

That sounds nothing like what I've heard called a "snapshot" before. Some
"filesystems" that aren't really filesystems but rather storage-layer
drivers, like Veritas (and ZFS?), allow you to take a snapshot which they
guarantee is atomic. You can take them while you have concurrent i/o and be
sure of getting a single consistent view of the filesystem.

If you're just copying blocks from a device without any atomic snapshot
guarantee then you're going to get garbage. Even if Postgres wasn't writing
anything, the OS might still choose to flush blocks during that time, possibly
not even Postgres data blocks but filesystem meta-information blocks.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: PITR Backups

From
Tom Lane
Date:
Dan Gorman <dgorman@hi5.com> writes:
> This snapshot is done at the LUN (filer) level, postgres is un-aware
> we're creating a backup, so I'm not sure how pg_start_backup() plays
> into this ...

That method works too, as long as you snapshot both the data files and
WAL files --- when you start PG from the backup, it will think it
crashed and recover by replaying WAL.  So, assuming that the snapshot
technology really works, it should be exactly as reliable as crash
recovery is.  If you saw a problem I'd be inclined to question whether
there is some upstream component (OS or disk controller) that's
reordering writes.

            regards, tom lane

Re: PITR Backups

From
"Simon Riggs"
Date:
On Fri, 2007-06-22 at 13:12 -0400, Tom Lane wrote:
> Dan Gorman <dgorman@hi5.com> writes:
> > This snapshot is done at the LUN (filer) level, postgres is un-aware
> > we're creating a backup, so I'm not sure how pg_start_backup() plays
> > into this ...
>
> That method works too, as long as you snapshot both the data files and
> WAL files --- when you start PG from the backup, it will think it
> crashed and recover by replaying WAL.  So, assuming that the snapshot
> technology really works, it should be exactly as reliable as crash
> recovery is.

> If you saw a problem I'd be inclined to question whether
> there is some upstream component (OS or disk controller) that's
> reordering writes.

Given that's exactly what they do, constantly, I don't think it's safe to
say that it works, since we cannot verify whether that has happened or
not.

At the very least, you should issue a CHECKPOINT prior to taking the
snapshot, to ensure that the write barriers have gone through.
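
As a sketch, that precaution is a single extra statement issued just before the
LUN snapshot is triggered (the snapshot step itself is filer-specific and only
a placeholder here):

  # flush dirty buffers and fsync them before the storage-level snapshot
  psql -c "CHECKPOINT;" postgres
  # ... trigger the LUN-level snapshot on the filer ...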

But that being said, I'm not quite sure why following the Continuous
Archiving procedures is a problem, since they don't add any significant
overhead, over and above the checkpoint command.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: PITR Backups

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Fri, 2007-06-22 at 13:12 -0400, Tom Lane wrote:
>> If you saw a problem I'd be inclined to question whether
>> there is some upstream component (OS or disk controller) that's
>> reordering writes.

> Given thats exactly what they do, constantly, I don't think its safe to
> say that it works since we cannot verify whether that has happened or
> not.

If he's trying to snapshot at a level of hardware that's behind a
write-caching disk controller, I agree that that's untrustworthy.

If not, ie if he's snapshotting the actual durable state of the storage
system, then any problems in the snapshot indicate a problem with the
database's ability to recover from a crash.  So I don't think you should
tell him to not worry.

            regards, tom lane

Re: PITR Backups

From
Koichi Suzuki
Date:
Hi,

Yeah, I agree we should carefully follow how Dan really did the backup.

My point is that PostgreSQL may have to extend a file during the hot backup to
write a new block. It is slightly different from Oracle's case: Oracle
allocates all the database space in advance, so there is no risk of modifying
the metadata on the fly. In our case, because a SAN-based storage snapshot is
taken at the device level, not the filesystem level, even the filesystem does
not know that the snapshot is being taken, and we might encounter a case where
metadata and/or user data are not consistent. Such a snapshot (of the whole
filesystem) might be corrupted and cause filesystem-level errors.

I'm interested in this. Any further comments/opinions are welcome.

Regards;

Simon Riggs Wrote:
> On Fri, 2007-06-22 at 11:30 +0900, Toru SHIMOGAKI wrote:
>> Tom Lane wrote:
>>> Dan Gorman <dgorman@hi5.com> writes:
>>>>    All of our databases are on NetApp storage and I have been looking
>>>> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
>>>> replica) for backing up our databases. The problem is because there
>>>> is no write-suspend or even a 'hot backup mode' for postgres it's
>>>> very plausible that the database has data in RAM that hasn't been
>>>> written and will corrupt the data.
>>> Alternatively, you can use a PITR base backup as suggested here:
>>> http://www.postgresql.org/docs/8.2/static/continuous-archiving.html
>> I think Dan's problem is important if we use PostgreSQL to a large size database:
>>
>> - When we take a PITR base backup with hardware level snapshot operation
>>   (not filesystem level) which a lot of storage vender provide, the backup data
>>   can be corrupted as Dan said. During recovery we can't even read it,
>>   especially if meta-data was corrupted.
>>
>> - If we don't use hardware level snapshot operation, it takes long time to take
>>   a large backup data, and a lot of full-page-written WAL files are made.
>>
>> So, I think users need a new feature not to write out heap pages during taking a
>> backup.
>
> Your worries are unwarranted, IMHO. It appears Dan was taking a snapshot
> without having read the procedure as clearly outlined in the manual.
>
> pg_start_backup() flushes all currently dirty blocks to disk as part of
> a checkpoint. If you snapshot after that point, then you will have all
> the data blocks required from which to correctly roll forward. On its
> own, the snapshot is an inconsistent backup and will give errors as Dan
> shows. It is only when the snapshot is used as the base backup in a full
> continuous recovery that the inconsistencies are removed and the
> database is fully and correctly restored.
>
> pg_start_backup() is the direct analogue of Oracle's ALTER DATABASE
> BEGIN BACKUP. Snapshots work with Oracle too, in much the same way.
>
> After reviewing the manual, if you honestly think there is a problem,
> please let me know and I'll work with you to investigate.
>


--
-------------
Koichi Suzuki

Re: PITR Backups

From
"Simon Riggs"
Date:
On Mon, 2007-06-25 at 19:06 +0900, Koichi Suzuki wrote:

> Yeah, I agree we should carefully follow how Dan really did the backup.

> My point is PostgreSQL may have to extend the file during the hot backup
> to write to the new block.

If the snapshot is a consistent, point-in-time copy then I don't see how
any I/O at all makes a difference. To my knowledge, both EMC and NetApp
produce snapshots like this. IIRC, EMC calls these instant snapshots,
NetApp calls them frozen snapshots.

>  It is slightly different from Oracle's case.
>   Oracle allocates all the database space in advance so that there could
> be no risk to modify the metadata on the fly.

Not really sure it's different.

Oracle allows dynamic file extensions and I've got no evidence that file
extension is prevented from occurring during backup simply as a result
of issuing the start hot backup command.

Oracle and DB2 both support a stop-I/O-to-the-database mode. My
understanding is that isn't required any more if you do an instant
snapshot, so if people are using instant snapshots it should certainly
be the case that they are safe to do this with PostgreSQL also.

Oracle is certainly more picky about snapshotted files than PostgreSQL
is. In Oracle, each file has a header with the LSN of the last
checkpoint in it. This is used at recovery time to ensure the backup is
consistent by having exactly equal LSNs across all files. PostgreSQL
doesn't use file headers and we don't store the LSN on a per-file basis,
though we do store the LSN in the control file for the whole server.

>  In our case, because SAN
> based storage snapshot is device level, not file system level, even a
> file system does not know that the snapshot is being taken and we might
> encounter the case where metadata and/or user data are not consistent.
> Such snapshot (whole filesystem) might be corrupted and cause file
> system level error.
>
> I'm interested in this.   Any further comment/openion is welcome.

If you can show me either

i) an error that occurs after the full and correct PostgreSQL hot backup
procedures have been executed, or

ii) a conjecture that explains in detail how a device-level error might occur,

then I will look into this further.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: PITR Backups

From
Tom Lane
Date:
Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:
> Yeah, I agree we should carefully follow how Dan really did the backup.
> My point is PostgreSQL may have to extend the file during the hot backup
> to write to the new block.  It is slightly different from Oracle's case.
>   Oracle allocates all the database space in advance so that there could
> be no risk to modify the metadata on the fly.  In our case, because SAN
> based storage snapshot is device level, not file system level, even a
> file system does not know that the snapshot is being taken and we might
> encounter the case where metadata and/or user data are not consistent.
> Such snapshot (whole filesystem) might be corrupted and cause file
> system level error.

Surely a hot-backup technique that cannot even produce a consistent
state of filesystem metadata is too broken to be considered a backup
technique at all.

AFAIK, actually workable methods of this type depend on filesystem
cooperation, and are able to produce coherent snapshots of the logical
(not necessarily physical) filesystem content at a specific instant.

            regards, tom lane

Re: PITR Backups

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> AFAIK, actually workable methods of this type depend on filesystem
> cooperation, and are able to produce coherent snapshots of the logical
> (not necessarily physical) filesystem content at a specific instant.

I think you need filesystem cooperation in order to provide access to the
snapshot somewhere. But the actual snapshotting is done at a very low level,
either by intercepting block writes and stashing away the old version before
writing, or alternately by noting the new version and redirecting any reads to
the new version.

I concur that anything that doesn't allow concurrent i/o while the
snapshotting is happening is worthless. It sounds like you're just dd'ing from
the device, which is pretty much guaranteed not to work.

Even if Postgres didn't do any i/o, there's nothing stopping the OS and
filesystem from issuing i/o.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: PITR Backups

From
Dan Gorman
Date:
It's the latter: a snapshot of the durable state of the storage
system (i.e. it will never be corrupted).

Regards,
Dan Gorman

On Jun 22, 2007, at 11:02 AM, Tom Lane wrote:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> On Fri, 2007-06-22 at 13:12 -0400, Tom Lane wrote:
>>> If you saw a problem I'd be inclined to question whether
>>> there is some upstream component (OS or disk controller) that's
>>> reordering writes.
>
>> Given thats exactly what they do, constantly, I don't think its
>> safe to
>> say that it works since we cannot verify whether that has happened or
>> not.
>
> If he's trying to snapshot at a level of hardware that's behind a
> write-caching disk controller, I agree that that's untrustworthy.
>
> If not, ie if he's snapshotting the actual durable state of the
> storage
> system, then any problems in the snapshot indicate a problem with the
> database's ability to recover from a crash.  So I don't think you
> should
> tell him to not worry.
>
>             regards, tom lane



Re: PITR Backups

From
Dan Gorman
Date:
I took several snapshots. In all cases the FS was fine. In one case, on
recovery, the db looked like it thought there were outstanding pages still to
be written to disk, as seen below, and the db wouldn't start.

Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21
00:39:43 PDTLOG:  redo done at 71/99870670
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21
00:39:43 PDTWARNING:  page 28905 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [11-1] 2007-06-21
00:39:43 PDTWARNING:  page 13626 of relation 1663/16384/76716 did not
exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [12-1] 2007-06-21
00:39:43 PDTWARNING:  page 28904 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [13-1] 2007-06-21
00:39:43 PDTWARNING:  page 26711 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [14-1] 2007-06-21
00:39:43 PDTWARNING:  page 28900 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [15-1] 2007-06-21
00:39:43 PDTWARNING:  page 3535208 of relation 1663/16384/33190 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [16-1] 2007-06-21
00:39:43 PDTWARNING:  page 28917 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [17-1] 2007-06-21
00:39:43 PDTWARNING:  page 3535207 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [18-1] 2007-06-21
00:39:43 PDTWARNING:  page 28916 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [19-1] 2007-06-21
00:39:43 PDTWARNING:  page 28911 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [20-1] 2007-06-21
00:39:43 PDTWARNING:  page 26708 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [21-1] 2007-06-21
00:39:43 PDTWARNING:  page 28914 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [22-1] 2007-06-21
00:39:43 PDTWARNING:  page 28909 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [23-1] 2007-06-21
00:39:43 PDTWARNING:  page 28908 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [24-1] 2007-06-21
00:39:43 PDTWARNING:  page 28913 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [25-1] 2007-06-21
00:39:43 PDTWARNING:  page 26712 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [26-1] 2007-06-21
00:39:43 PDTWARNING:  page 28918 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [27-1] 2007-06-21
00:39:43 PDTWARNING:  page 28912 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [28-1] 2007-06-21
00:39:43 PDTWARNING:  page 3535209 of relation 1663/16384/33190 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [29-1] 2007-06-21
00:39:43 PDTWARNING:  page 28907 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [30-1] 2007-06-21
00:39:43 PDTWARNING:  page 28906 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [31-1] 2007-06-21
00:39:43 PDTWARNING:  page 26713 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [32-1] 2007-06-21
00:39:43 PDTWARNING:  page 17306 of relation 1663/16384/76710 did not
exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [33-1] 2007-06-21
00:39:43 PDTWARNING:  page 26706 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [34-1] 2007-06-21
00:39:43 PDTWARNING:  page 800226 of relation 1663/16384/33204 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [35-1] 2007-06-21
00:39:43 PDTWARNING:  page 28915 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [36-1] 2007-06-21
00:39:43 PDTWARNING:  page 26710 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [37-1] 2007-06-21
00:39:43 PDTWARNING:  page 28903 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [38-1] 2007-06-21
00:39:43 PDTWARNING:  page 28902 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [39-1] 2007-06-21
00:39:43 PDTWARNING:  page 28910 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21
00:39:43 PDTPANIC:  WAL contains references to invalid pages
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [1-1] 2007-06-21
00:39:43 PDTLOG:  startup process (PID 3506) was terminated by signal 6
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [2-1] 2007-06-21
00:39:43 PDTLOG:  aborting startup due to startup process failure
Jun 21 00:39:43 sfmedstorageha001 postgres[3505]: [1-1] 2007-06-21
00:39:43 PDTLOG:  logger shutting down
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-1] 2007-06-21
00:40:39 PDTLOG:  database system was interrupted while in recovery
at 2007-06-21 00:36:40 PDT
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-2] 2007-06-21
00:40:39 PDTHINT:  This probably means that some data is corrupted
and you will have to use the last backup for
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-3]  recovery.
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [2-1] 2007-06-21
00:40:39 PDTLOG:  checkpoint record is at 71/9881E928
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [3-1] 2007-06-21
00:40:39 PDTLOG:  redo record is at 71/986BF148; undo record is at
0/0; shutdown FALSE
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [4-1] 2007-06-21
00:40:39 PDTLOG:  next transaction ID: 0/2871389429; next OID: 83795
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [5-1] 2007-06-21
00:40:39 PDTLOG:  next MultiXactId: 1; next MultiXactOffset: 0
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [6-1] 2007-06-21
00:40:39 PDTLOG:  database system was not properly shut down;
automatic recovery in progress
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [7-1] 2007-06-21
00:40:39 PDTLOG:  redo starts at 71/986BF148
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [8-1] 2007-06-21
00:40:39 PDTLOG:  record with zero length at 71/998706A8
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [9-1] 2007-06-21
00:40:39 PDTLOG:  redo done at 71/99870670
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [10-1] 2007-06-21
00:40:39 PDTWARNING:  page 28905 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [11-1] 2007-06-21
00:40:39 PDTWARNING:  page 13626 of relation 1663/16384/76716 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [12-1] 2007-06-21
00:40:39 PDTWARNING:  page 28904 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [13-1] 2007-06-21
00:40:39 PDTWARNING:  page 26711 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [14-1] 2007-06-21
00:40:39 PDTWARNING:  page 28900 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [15-1] 2007-06-21
00:40:39 PDTWARNING:  page 3535208 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [16-1] 2007-06-21
00:40:39 PDTWARNING:  page 28917 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [17-1] 2007-06-21
00:40:39 PDTWARNING:  page 3535207 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [18-1] 2007-06-21
00:40:39 PDTWARNING:  page 28916 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [19-1] 2007-06-21
00:40:39 PDTWARNING:  page 28911 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [20-1] 2007-06-21
00:40:39 PDTWARNING:  page 26708 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [21-1] 2007-06-21
00:40:39 PDTWARNING:  page 28914 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [22-1] 2007-06-21
00:40:39 PDTWARNING:  page 28909 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [23-1] 2007-06-21
00:40:39 PDTWARNING:  page 28908 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [24-1] 2007-06-21
00:40:39 PDTWARNING:  page 28913 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [25-1] 2007-06-21
00:40:39 PDTWARNING:  page 26712 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [26-1] 2007-06-21
00:40:39 PDTWARNING:  page 28918 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [27-1] 2007-06-21
00:40:39 PDTWARNING:  page 28912 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [28-1] 2007-06-21
00:40:39 PDTWARNING:  page 3535209 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [29-1] 2007-06-21
00:40:39 PDTWARNING:  page 28907 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [30-1] 2007-06-21
00:40:39 PDTWARNING:  page 28906 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [31-1] 2007-06-21
00:40:39 PDTWARNING:  page 26713 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [32-1] 2007-06-21
00:40:39 PDTWARNING:  page 17306 of relation 1663/16384/76710 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [33-1] 2007-06-21
00:40:39 PDTWARNING:  page 26706 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [34-1] 2007-06-21
00:40:39 PDTWARNING:  page 800226 of relation 1663/16384/33204 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [35-1] 2007-06-21
00:40:39 PDTWARNING:  page 28915 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [36-1] 2007-06-21
00:40:39 PDTWARNING:  page 26710 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [37-1] 2007-06-21
00:40:39 PDTWARNING:  page 28903 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [38-1] 2007-06-21
00:40:39 PDTWARNING:  page 28902 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [39-1] 2007-06-21
00:40:39 PDTWARNING:  page 28910 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [40-1] 2007-06-21
00:40:39 PDTPANIC:  WAL contains references to invalid pages
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [1-1] 2007-06-21
00:40:39 PDTLOG:  startup process (PID 3757) was terminated by signal 6
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [2-1] 2007-06-21
00:40:39 PDTLOG:  aborting startup due to startup process failure
Jun 21 00:40:39 sfmedstorageha001 postgres[3756]: [1-1] 2007-06-21
00:40:39 PDTLOG:  logger shutting down




On Jun 25, 2007, at 6:26 AM, Simon Riggs wrote:

> On Mon, 2007-06-25 at 19:06 +0900, Koichi Suzuki wrote:
>
>> Yeah, I agree we should carefully follow how Dan really did a
>> backup.
>
>> My point is that PostgreSQL may have to extend a file during the hot
>> backup in order to write a new block.
>
> If the snapshot is a consistent, point-in-time copy then I don't
> see how
> any I/O at all makes a difference. To my knowledge, both EMC and
> NetApp
> produce snapshots like this. IIRC, EMC calls these instant snapshots,
> NetApp calls them frozen snapshots.
>
>>  It is slightly different from Oracle's case.
>>   Oracle allocates all of the database space in advance, so there is
>> no risk of modifying the metadata on the fly.
>
> Not really sure it's different.
>
> Oracle allows dynamic file extensions and I've got no evidence that
> file
> extension is prevented from occurring during backup simply as a result
> of issuing the start hot backup command.
>
> Oracle and DB2 both support a stop-I/O-to-the-database mode. My
> understanding is that isn't required any more if you do an instant
> snapshot, so if people are using instant snapshots it should certainly
> be the case that they are safe to do this with PostgreSQL also.
>
> Oracle is certainly more picky about snapshotted files than PostgreSQL
> is. In Oracle, each file has a header with the LSN of the last
> checkpoint in it. This is used at recovery time to ensure the
> backup is
> consistent by having exactly equal LSNs across all files. PostgreSQL
> doesn't use file headers and we don't store the LSN on a per-file
> basis,
> though we do store the LSN in the control file for the whole server.
>
>>  In our case, because a SAN-based storage snapshot is taken at the
>> device level, not the filesystem level, the filesystem does not even
>> know that the snapshot is being taken, and we might hit a case where
>> metadata and/or user data are not consistent. Such a snapshot (of the
>> whole filesystem) might be corrupted and cause filesystem-level errors.
>>
>> I'm interested in this. Any further comment/opinion is welcome.
>
> If you can show me either
>
> i) an error that occurs after the full and correct PostgreSQL hot
> backup procedures have been executed, or
>
> ii) a conjecture that explains in detail how a device-level error
> might occur
>
> then I will look into this further.
>
> --
>   Simon Riggs
>   EnterpriseDB   http://www.enterprisedb.com
>
>



Re: PITR Backups

From
Gregory Stark
Date:
"Dan Gorman" <dgorman@hi5.com> writes:

> I took several snapshots. In all cases the FS was fine. In one case the db
> looked like on recovery it thought there were outstanding  pages to be written
> to disk as seen below and the db wouldn't start.
>
> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21 00:39:43
> PDTLOG:  redo done at 71/99870670
> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21 00:39:43
> PDTWARNING:  page 28905 of relation 1663/16384/76718 was  uninitialized

What version of Postgres did you say this was?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: PITR Backups

From
Dan Gorman
Date:
Greg,

PG 8.2.4

Regards,
Dan Gorman

On Jun 25, 2007, at 9:02 AM, Gregory Stark wrote:

> "Dan Gorman" <dgorman@hi5.com> writes:
>
>> I took several snapshots. In all cases the FS was fine. In one
>> case the db
>> looked like on recovery it thought there were outstanding  pages
>> to be written
>> to disk as seen below and the db wouldn't start.
>>
>> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21
>> 00:39:43
>> PDTLOG:  redo done at 71/99870670
>> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1]
>> 2007-06-21 00:39:43
>> PDTWARNING:  page 28905 of relation 1663/16384/76718 was
>> uninitialized
>
> What version of Postgres did you say this was?
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>



Re: PITR Backups

From
"Simon Riggs"
Date:
On Mon, 2007-06-25 at 08:28 -0700, Dan Gorman wrote:
> I took several snapshots. In all cases the FS was fine. In one case
> the db looked like on recovery it thought there were outstanding
> pages to be written to disk as seen below and the db wouldn't start.
>
> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21
> 00:39:43 PDTLOG:  redo done at 71/99870670
> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21
> 00:39:43 PDTWARNING:  page 28905 of relation 1663/16384/76718 was
> uninitialized

OK, please set log_min_messages = DEBUG2 and re-run the recovery.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: PITR Backups

From
Tom Lane
Date:
Dan Gorman <dgorman@hi5.com> writes:
> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21
> 00:39:43 PDTLOG:  redo done at 71/99870670
> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21
> 00:39:43 PDTWARNING:  page 28905 of relation 1663/16384/76718 was
> uninitialized
> ... lots of these ...
> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21
> 00:39:43 PDTPANIC:  WAL contains references to invalid pages

(BTW, you'll find putting a space at the end of log_line_prefix
does wonders for log readability.)

Reformatting and sorting, we have

WARNING: page 3535207 of relation 1663/16384/33190 was uninitialized
WARNING: page 3535208 of relation 1663/16384/33190 did not exist
WARNING: page 3535209 of relation 1663/16384/33190 did not exist

WARNING: page 800226 of relation 1663/16384/33204 did not exist

WARNING: page 17306 of relation 1663/16384/76710 did not exist

WARNING: page 13626 of relation 1663/16384/76716 did not exist

WARNING: page 28900 of relation 1663/16384/76718 was uninitialized
WARNING: page 28902 of relation 1663/16384/76718 was uninitialized
WARNING: page 28903 of relation 1663/16384/76718 was uninitialized
WARNING: page 28904 of relation 1663/16384/76718 was uninitialized
WARNING: page 28905 of relation 1663/16384/76718 was uninitialized
WARNING: page 28906 of relation 1663/16384/76718 was uninitialized
WARNING: page 28907 of relation 1663/16384/76718 was uninitialized
WARNING: page 28908 of relation 1663/16384/76718 was uninitialized
WARNING: page 28909 of relation 1663/16384/76718 was uninitialized
WARNING: page 28910 of relation 1663/16384/76718 was uninitialized
WARNING: page 28911 of relation 1663/16384/76718 was uninitialized
WARNING: page 28912 of relation 1663/16384/76718 was uninitialized
WARNING: page 28913 of relation 1663/16384/76718 was uninitialized
WARNING: page 28914 of relation 1663/16384/76718 was uninitialized
WARNING: page 28915 of relation 1663/16384/76718 was uninitialized
WARNING: page 28916 of relation 1663/16384/76718 was uninitialized
WARNING: page 28917 of relation 1663/16384/76718 was uninitialized
WARNING: page 28918 of relation 1663/16384/76718 was uninitialized

WARNING: page 26706 of relation 1663/16384/76719 was uninitialized
WARNING: page 26708 of relation 1663/16384/76719 was uninitialized
WARNING: page 26710 of relation 1663/16384/76719 was uninitialized
WARNING: page 26711 of relation 1663/16384/76719 was uninitialized
WARNING: page 26712 of relation 1663/16384/76719 was uninitialized
WARNING: page 26713 of relation 1663/16384/76719 was uninitialized

So the problems were pretty localized, probably at the ends of these
files.  Can you go back to the source database and check which
tables these are --- match the last number cited in each line
against pg_class.relfilenode?  Are they tables or indexes, and
about how big are they?
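
For instance, a lookup along these lines (just a sketch; run it in the
database whose OID is 16384, the middle number in those warnings) would
map each relfilenode to its relation and size:

    SELECT c.oid, n.nspname, c.relname, c.relkind,
           pg_size_pretty(pg_relation_size(c.oid::regclass)) AS size
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relfilenode IN (33190, 33204, 76710, 76716, 76718, 76719);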

A possible explanation is we stopped scanning WAL before reaching
records that truncated or dropped these tables.  But it's not clear why.
Could we see the last few log lines before the "redo done" one?

            regards, tom lane

Re: PITR Backups

From
"Simon Riggs"
Date:
On Mon, 2007-06-25 at 12:34 -0400, Tom Lane wrote:
> Dan Gorman <dgorman@hi5.com> writes:
> > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21
> > 00:39:43 PDTLOG:  redo done at 71/99870670

This is mid-way through an xlog file.
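
As a rough check (assuming the default 16 MB segment size), the byte
offset within the segment is just the low 24 bits of that position:

    -- 0x99870670 = 2575763056; one segment is 16777216 bytes
    SELECT 2575763056 % 16777216 AS bytes_into_segment;
    -- => 8849008, i.e. about 8.4 MB into the 16 MB segment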

> > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21
> > 00:39:43 PDTWARNING:  page 28905 of relation 1663/16384/76718 was
> > uninitialized
> > ... lots of these ...
> > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21
> > 00:39:43 PDTPANIC:  WAL contains references to invalid pages
>
> (BTW, you'll find putting a space at the end of log_line_prefix
> does wonders for log readability.)
>
> Reformatting and sorting, we have
>
> WARNING: page 28900 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28902 of relation 1663/16384/76718 was uninitialized

> WARNING: page 26706 of relation 1663/16384/76719 was uninitialized
> WARNING: page 26708 of relation 1663/16384/76719 was uninitialized

Those two are interesting because we appear to have two valid pages in
the middle of some uninitialized ones. That implies we're not looking at
an unapplied truncation.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: PITR Backups

From
Dan Gorman
Date:
Thanks for the pointers on a) making the log readable and b) log_min_messages.

I didn't, however, keep the snapshots around. I could try to set this
scenario up again. I was in the middle of doing some data migration
with NetApp and wanted to just 'test' it to make sure it was sane.

If you guys would like me to try to 'break' it again and keep the db
around for further testing let me know.
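
For the re-run, the bracketing from the 8.2 continuous-archiving docs
could be wrapped around the NetApp snapshot, something like this (the
label is only an example, and archive_command has to be configured for
pg_start_backup to be allowed):

    SELECT pg_start_backup('netapp-snap-test');
    -- take the NetApp snapshot of both the data and WAL volumes here
    SELECT pg_stop_backup();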

Regards,
Dan Gorman


On Jun 25, 2007, at 9:34 AM, Tom Lane wrote:

> Dan Gorman <dgorman@hi5.com> writes:
>> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21
>> 00:39:43 PDTLOG:  redo done at 71/99870670
>> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21
>> 00:39:43 PDTWARNING:  page 28905 of relation 1663/16384/76718 was
>> uninitialized
>> ... lots of these ...
>> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21
>> 00:39:43 PDTPANIC:  WAL contains references to invalid pages
>
> (BTW, you'll find putting a space at the end of log_line_prefix
> does wonders for log readability.)
>
> Reformatting and sorting, we have
>
> WARNING: page 3535207 of relation 1663/16384/33190 was uninitialized
> WARNING: page 3535208 of relation 1663/16384/33190 did not exist
> WARNING: page 3535209 of relation 1663/16384/33190 did not exist
>
> WARNING: page 800226 of relation 1663/16384/33204 did not exist
>
> WARNING: page 17306 of relation 1663/16384/76710 did not exist
>
> WARNING: page 13626 of relation 1663/16384/76716 did not exist
>
> WARNING: page 28900 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28902 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28903 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28904 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28905 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28906 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28907 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28908 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28909 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28910 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28911 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28912 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28913 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28914 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28915 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28916 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28917 of relation 1663/16384/76718 was uninitialized
> WARNING: page 28918 of relation 1663/16384/76718 was uninitialized
>
> WARNING: page 26706 of relation 1663/16384/76719 was uninitialized
> WARNING: page 26708 of relation 1663/16384/76719 was uninitialized
> WARNING: page 26710 of relation 1663/16384/76719 was uninitialized
> WARNING: page 26711 of relation 1663/16384/76719 was uninitialized
> WARNING: page 26712 of relation 1663/16384/76719 was uninitialized
> WARNING: page 26713 of relation 1663/16384/76719 was uninitialized
>
> So the problems were pretty localized, probably at the ends of these
> files.  Can you go back to the source database and check which
> tables these are --- match the last number cited in each line
> against pg_class.relfilenode?  Are they tables or indexes, and
> about how big are they?
>
> A possible explanation is we stopped scanning WAL before reaching
> records that truncated or dropped these tables.  But it's not clear
> why.
> Could we see the last few log lines before the "redo done" one?
>
>             regards, tom lane
>



Re: PITR Backups

From
Gregory Stark
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:

>> WARNING: page 28900 of relation 1663/16384/76718 was uninitialized
>> WARNING: page 28902 of relation 1663/16384/76718 was uninitialized
>
>> WARNING: page 26706 of relation 1663/16384/76719 was uninitialized
>> WARNING: page 26708 of relation 1663/16384/76719 was uninitialized
>
> Those two are interesting because we appear to have two valid pages in
> the middle of some uninitialized ones. That implies we're not looking at
> an unapplied truncation.

You don't have fsync off do you? That could explain missing pages at the end
of a file like this too. And it would explain how you could have two written
in the midst of others that are missing.
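
A quick way to check, if it helps (just a convenience query):

    SELECT name, setting FROM pg_settings
    WHERE name IN ('fsync', 'full_page_writes');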

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: PITR Backups

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
>> Reformatting and sorting, we have
>>
>> WARNING: page 28900 of relation 1663/16384/76718 was uninitialized
>> WARNING: page 28902 of relation 1663/16384/76718 was uninitialized

>> WARNING: page 26706 of relation 1663/16384/76719 was uninitialized
>> WARNING: page 26708 of relation 1663/16384/76719 was uninitialized

> Those two are interesting because we appear to have two valid pages in
> the middle of some uninitialized ones. That implies were not looking at
> an unapplied truncation.

Not necessarily --- it's possible the WAL sequence simply didn't touch
those pages.

Your suggestion to rerun the recovery with higher log_min_messages
is a good one, because that way we'd get some detail about what the
WAL records that touched the pages were.  I think DEBUG1 would be
sufficient for that, though, and DEBUG2 might be pretty durn verbose.
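
Something along these lines in postgresql.conf would do it (the values
are only a suggestion):

    log_min_messages = debug1        # per-record detail during WAL replay
    log_line_prefix = '%t [%p]: '    # note the trailing space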

            regards, tom lane

Re: PITR Backups

From
Dan Gorman
Date:
No. However, I will attach the postgresql.conf so everyone can look at
the other settings, just in case.
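
The non-default settings can also be pulled straight from the server,
for example (a convenience query only):

    SELECT name, setting, source FROM pg_settings
    WHERE source <> 'default';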

Regards,
Dan Gorman

On Jun 25, 2007, at 10:07 AM, Gregory Stark wrote:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>
>>> WARNING: page 28900 of relation 1663/16384/76718 was uninitialized
>>> WARNING: page 28902 of relation 1663/16384/76718 was uninitialized
>>
>>> WARNING: page 26706 of relation 1663/16384/76719 was uninitialized
>>> WARNING: page 26708 of relation 1663/16384/76719 was uninitialized
>>
>> Those two are interesting because we appear to have two valid
>> pages in
>> the middle of some uninitialized ones. That implies we're not
>> looking at
>> an unapplied truncation.
>
> You don't have fsync off do you? That could explain missing pages
> at the end
> of a file like this too. And it would explain how you could have
> two written
> in the midst of others that are missing.
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>


Attachment

Re: PITR Backups

From
Tom Lane
Date:
Dan Gorman <dgorman@hi5.com> writes:
> I didn't, however, keep the snapshots around. I could try to set this
> scenario up again. I was in the middle of doing some data migration
> with NetApp and wanted to just 'test' it to make sure it was sane.

> If you guys would like me to try to 'break' it again and keep the db
> around for further testing let me know.

Yeah, please do.  It's not entirely clear whether you've found a bug
or not, and it'd be good to determine that ...

            regards, tom lane