Thread: Snapshot backups

Snapshot backups

From
James Sewell
Date:
Hey All,

This is a message to confirm my thoughts / validate a possible approach.

In a situation where PGDATA and {XLOG, ARCHIVELOG} are on different SAN/NAS volumes and a backup is to be initiated, do pg_start_backup and pg_stop_backup need to be used?

I am using snapshots of each volume for backup.

My thinking is that they are not needed (although I realise it is good practice).

As far as I can tell all they are doing is something like:

pg_start_backup:
  - create backup label
  - trigger checkpoint

pg_stop_backup
  - remove backup label file
  - creates backup history file
  - trigger log switch

There is nothing in here that is *required* from a backup point of view. Am I missing anything?

James Sewell
Solutions Architect


Re: Snapshot backups

From
Magnus Hagander
Date:

On Thu, Jun 20, 2013 at 8:45 AM, James Sewell <james.sewell@lisasoft.com> wrote:
Hey All,

This is a message to confirm my thoughts / validate a possible approach.

In a situation where PGDATA and {XLOG, ARCHIVELOG} are on different SAN/NAS volumes and a backup is to be initiated do pg_start_backup and pg_stop_backup need to be used?

I am using snapshots of each volume for backup.

My thinking is that they are not needed (although I realise it is good practice).

As far as I can tell all they are doing is something like:

pg_start_backup:
  - create backup label
  - trigger checkpoint

pg_stop_backup
  - remove backup label file
  - creates backup history file
  - trigger log switch

There is nothing in here that is *required* from a backup point of view. Am I missing anything?

The backup functions also set internal state in the database, so you can't just replace it with doing those operations manually.  You do need to call those functions.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Snapshot backups

From
James Sewell
Date:
Thanks Magnus,

Could you elaborate a bit more on this? 

I've been having a look at do_pg_start_backup() and I can't really see anything apart from enabling full page writes and running a checkpoint to avoid getting a torn page. I could be missing something easily though, as I'm not familiar with the codebase.

do_pg_stop_backup() isn't really of consequence, as the backup is taken before this - so any restore is to a point in time before this as well.

I was under the impression a restore was (more or less) the same as a crash recovery, and logically it seems like PGDATA snapshot is equivalent to a crash/restart (disk at a discrete point in time). 

I can understand if log replay might take longer, but I am struggling to see how it could result in an inconsistent state? 

As I said I know this isn't best practice, but just want to understand how it works.

Cheers,



James Sewell
Solutions Architect




On Thu, Jun 20, 2013 at 6:34 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Jun 20, 2013 at 8:45 AM, James Sewell <james.sewell@lisasoft.com> wrote:
Hey All,

This is a message to confirm my thoughts / validate a possible approach.

In a situation where PGDATA and {XLOG, ARCHIVELOG} are on different SAN/NAS volumes and a backup is to be initiated do pg_start_backup and pg_stop_backup need to be used?

I am using snapshots of each volume for backup.

My thinking is that they are not needed (although I realise it is good practice).

As far as I can tell all they are doing is something like:

pg_start_backup:
  - create backup label
  - trigger checkpoint

pg_stop_backup
  - remove backup label file
  - creates backup history file
  - trigger log switch

There is nothing in here that is *required* from a backup point of view. Am I missing anything?

The backup functions also set internal state in the database, so you can't just replace it with doing those operations manually.  You do need to call those functions.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: Snapshot backups

From
James Sewell
Date:
Hey all,

I understand that I have already been given an answer here, but I am still curious as to why this is the case (perhaps I should ask this on the hackers list though, if so let me know).

More importantly, I'd like to understand why I would need to use the start/stop backup commands to ensure a valid backup when using filesystem snapshots (assuming I get the order correct) - in the worst case, wouldn't it be the same as a crash and cause an automatic roll-forward?

Cheers,
James

James Sewell
PostgreSQL Team Lead / Solutions Architect




On Fri, Jun 21, 2013 at 10:17 AM, James Sewell <james.sewell@lisasoft.com> wrote:
Thanks Magnus,

Could you elaborate a bit more on this? 

I've been having a look at do_pg_start_backup() and I can't really see anything apart from enabling full page writes and running a checkpoint to avoid getting a torn page. I could be missing something easily though, as I'm not familiar with the codebase.

do_pg_stop_backup() isn't really of consequence, as the backup is taken before this - so any restore is to a point in time before this as well.

I was under the impression a restore was (more or less) the same as a crash recovery, and logically it seems like PGDATA snapshot is equivalent to a crash/restart (disk at a discrete point in time). 

I can understand if log replay might take longer, but I am struggling to see how it could result in an inconsistent state? 

As I said I know this isn't best practice, but just want to understand how it works.

Cheers,



James Sewell
Solutions Architect




On Thu, Jun 20, 2013 at 6:34 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Jun 20, 2013 at 8:45 AM, James Sewell <james.sewell@lisasoft.com> wrote:
Hey All,

This is a message to confirm my thoughts / validate a possible approach.

In a situation where PGDATA and {XLOG, ARCHIVELOG} are on different SAN/NAS volumes and a backup is to be initiated do pg_start_backup and pg_stop_backup need to be used?

I am using snapshots of each volume for backup.

My thinking is that they are not needed (although I realise it is good practice).

As far as I can tell all they are doing is something like:

pg_start_backup:
  - create backup label
  - trigger checkpoint

pg_stop_backup
  - remove backup label file
  - creates backup history file
  - trigger log switch

There is nothing in here that is *required* from a backup point of view. Am I missing anything?

The backup functions also set internal state in the database, so you can't just replace it with doing those operations manually.  You do need to call those functions.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/





Re: Snapshot backups

From
Amit Langote
Date:

On Mon, Jul 29, 2013 at 3:32 PM, James Sewell <james.sewell@lisasoft.com> wrote:
Hey all,

I understand that I have already been given an answer here, but I am still curious as to why this is the case (perhaps I should ask this on the hackers list though, if so let me know).

More importantly I'd like to understand why I would need to use the start/stop backup commands to ensure a valid backup when using filesystem snapshots (assuming I get the order correct)- worst case scenario wouldn't it be the same as a crash and cause an automatic roll-forward?



pg_start_backup('backup_label') and pg_stop_backup(), if I understand it correctly, write to the 'backup_label' file the information necessary to recover "consistently" from that backup. For example, the backup_label file contains the checkpoint location and its REDO location (identified by the "START WAL LOCATION:" field in the backup_label file). While you are reading the code, you can also read the comment above the function read_backup_label() in src/backend/access/transam/xlog.c.

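For reference, the backup_label that pg_start_backup('backup_label') leaves in the data directory looks roughly like this (an illustrative example only; the WAL positions are made up and the exact set of fields varies a little between versions):

    START WAL LOCATION: 11/42000028 (file 000000010000001100000042)
    CHECKPOINT LOCATION: 11/42000060
    BACKUP METHOD: pg_start_backup
    BACKUP FROM: master
    START TIME: 2013-07-29 16:54:23 AEST
    LABEL: backup_label

Recovery uses the START WAL LOCATION from this file, instead of the checkpoint recorded in pg_control, as the point from which it must begin replaying WAL.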

--
Amit Langote

Re: Snapshot backups

From
James Sewell
Date:
I understand what you are saying, and I understand how the backup_label works - but I still don't understand why the pg_start and pg_stop commands are REQUIRED when doing a snapshot backup to ensure data integrity. 

Surely not using them and restoring a snapshot is the same as starting after a crash, and will result in log replay to get to the latest possible consistent state?

I thought PostgreSQL guaranteed a consistent state after a crash, am I mistaken about this?

James



James Sewell
PostgreSQL Team Lead / Solutions Architect




On Mon, Jul 29, 2013 at 4:54 PM, Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Jul 29, 2013 at 3:32 PM, James Sewell <james.sewell@lisasoft.com> wrote:
Hey all,

I understand that I have already been given an answer here, but I am still curious as to why this is the case (perhaps I should ask this on the hackers list though, if so let me know).

More importantly I'd like to understand why I would need to use the start/stop backup commands to ensure a valid backup when using filesystem snapshots (assuming I get the order correct)- worst case scenario wouldn't it be the same as a crash and cause an automatic roll-forward?



pg_start_backup('backup_label') and pg_stop_backup(), if I understand it correctly, write to the 'backup_label' file the information necessary to recover "consistently" from that backup. For example, backup_label file contains the checkpoint location and its REDO location (identified as "START WAL LOCATION:" field in the backup_label file.) While you are reading the code, you can read the comment above the function read_backup_label() in src/backend/access/transam/xlog.c


--
Amit Langote




Re: Snapshot backups

From
Jeff Janes
Date:
On Tuesday, July 30, 2013, James Sewell wrote:
I understand what you are saying, and I understand how the backup_label works - but I still don't understand why the pg_start and pg_stop commands are REQUIRED when doing a snapshot backup to ensure data integrity. 

Surely not using them and restoring a snapshot is the same as starting after a crash, and will result in log replay to get to the latest possible consistent state?

That's true provided that all of your data is on a single volume, and you trust your "snapshot" to be free of bugs.

Cheers,

Jeff

Re: Snapshot backups

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Tuesday, July 30, 2013, James Sewell wrote:
>> I understand what you are saying, and I understand how the backup_label
>> works - but I still don't understand why the pg_start and pg_stop commands
>> are REQUIRED when doing a snapshot backup to ensure data integrity.
>>
>> Surely not using them and restoring a snapshot is the same as starting
>> after a crash, and will result in log replay to get to the latest possible
>> consistent state?

> That's true provided that all of your data is on a single volume, and you
> trust your "snapshot" to be free of bugs.

James stated to begin with that his data was spread across multiple
volumes, so it's definitely not safe for him to omit
pg_start_backup/pg_stop_backup.

Perhaps it would help to consider what can happen when you're not using
filesystem snapshots at all, but just an ordinary userspace backup program
such as "tar".  The difference between recovering from a tar backup and
crash recovery is that, because the backup is taken over an extended
period of time, it may contain a set of data that does not match any
possible instantaneous state of the on-disk data --- and crash recovery
only promises to deal with the latter.  Here is a concrete example of what
can happen:

1. The tar process copies the file for table foo.  There are changes to
foo in Postgres' shared buffers that haven't made it to disk yet (although
those changes are committed and recorded in on-disk WAL), so the copy made
by tar isn't entirely up to date.

2. PG's checkpoint process starts a checkpoint run.  Along the way,
it flushes out the changes to table foo.  When done, it updates the
last-checkpoint pointer in pg_control, which tells where crash recovery
would need to start replaying WAL.

3. The tar process archives pg_control.

Now, if you restore the tar backup onto a new system and start up
Postgres, you will have an obsolete copy of table foo --- and WAL replay
will not apply the needed updates to foo, because it will start from the
point in WAL that pg_control says it should start from, and that's after
the WAL records that describe the missing changes.  Note that this will
fail even if you assume you've got perfectly good and complete copies
of the WAL files; there's a whole 'nother set of hazards if you don't.

For recovery from a tar backup to work, the archived copy of pg_control
must point to a spot in the WAL sequence that is before any changes that
could possibly not yet appear in any archived data files.  The purpose of
pg_start_backup/pg_stop_backup is to provide the synchronization needed to
meet this requirement.

The comparable case isn't possible for crash recovery, assuming that the
OS and storage hardware implement fsync() correctly, because we'll have
fsync'd the changes to foo down to disk before updating pg_control.

Now, if you instead take a filesystem snapshot (representing some
instantaneous state of the disk contents) and run "tar" to copy that,
you have a good backup, because you must have a copy of pg_control that
will tell you to re-apply any changes that are missing from the data
files, as well as WAL files that contain the needed records.  However,
this is only certain if all that data is on *one* filesystem, because
otherwise you can't be sure you have mutually consistent snapshots.
And you're vulnerable to any bugs in the filesystem's snapshot
implementation that might give you inconsistent copies of different
files.  (Such bugs would probably be closely related to bugs in fsync
... but that doesn't mean they're necessarily exactly the same.)

So that's the long form of Jeff's comment above.  Any clearer now?

            regards, tom lane


Re: Snapshot backups

From
Alban Hertroys
Date:
On Jul 31, 2013, at 7:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Tuesday, July 30, 2013, James Sewell wrote:
>>> I understand what you are saying, and I understand how the backup_label
>>> works - but I still don't understand why the pg_start and pg_stop commands
>>> are REQUIRED when doing a snapshot backup to ensure data integrity.
>>>
>>> Surely not using them and restoring a snapshot is the same as starting
>>> after a crash, and will result in log replay to get to the latest possible
>>> consistent state?
>
>> That's true provided that all of your data is on a single volume, and you
>> trust your "snapshot" to be free of bugs.
>
> James stated to begin with that his data was spread across multiple
> volumes, so it's definitely not safe for him to omit
> pg_start_backup/pg_stop_backup.

(…)

> For recovery from a tar backup to work, the archived copy of pg_control
> must point to a spot in the WAL sequence that is before any changes that
> could possibly not yet appear in any archived data files.  The purpose of
> pg_start_backup/pg_stop_backup is to provide the synchronization needed to
> meet this requirement.
>
> The comparable case isn't possible for crash recovery, assuming that the
> OS and storage hardware implement fsync() correctly, because we'll have
> fsync'd the changes to foo down to disk before updating pg_control.
>
> Now, if you instead take a filesystem snapshot (representing some
> instantaneous state of the disk contents) and run "tar" to copy that,
> you have a good backup, because you must have a copy of pg_control that
> will tell you to re-apply any changes that are missing from the data
> files, as well as WAL files that contain the needed records.  However,
> this is only certain if all that data is on *one* filesystem, because
> otherwise you can't be sure you have mutually consistent snapshots.
> And you're vulnerable to any bugs in the filesystem's snapshot
> implementation that might give you inconsistent copies of different
> files.  (Such bugs would probably be closely related to bugs in fsync
> ... but that doesn't mean they're necessarily exactly the same.)


That begs the question what happens in case of a crash or (worse) a partial crash when multiple file systems are involved.

Say, one tablespace is on a ZFS volume and one is on a UFS volume and the ZFS code crashes. That should result in a kernel panic, of course, in which case every file system is stopped at the same moment and there should be a consistent "snapshot". But what if it doesn't trigger a panic?

And how would file-system recovery play into this? Would a journaled file system roll back every segment involved in the same interrupted transaction, or could this create an inconsistent snapshot where the original situation (sans file-system recovery) would actually have been a preferable state to start from?

And what happens if one of these volumes would, for example, get ejected from a RAID controller (because all disks in it have triggered alarms, whether genuine or not) and the other volume would not get ejected?
Does the database abort work or does that result in an inconsistent state?
I suppose this case usually doesn't matter much, as part of the database is gone completely anyway, but what if the RAID controller was wrong and the disks are actually just fine and come back once re-inserted? That has actually happened several times here, caused by SATA wires vibrating loose over time (it took me a while to discover the cause); my databases have always been on a single volume though, so I've never had the opportunity to run into this.

I realise these are typical "what if" scenarios, so I suppose answering these doesn't have a high priority. It's just that I've been wondering/worrying about the seeming increase of people reporting database corruption - I was just wondering whether issues like these might play a part in that (I'm sure a large part of it is just more people using PG these days).

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: Snapshot backups

From
Tom Lane
Date:
Alban Hertroys <haramrae@gmail.com> writes:
> That begs the question what happens in case of a crash or (worse) a partial crash when multiple file systems are involved.

As long as the OS+hardware honors the contract of fsync(), everything's
fine.  If the storage system loses data that it claims to have fsync'd to
stable storage, there's not much we can do about that, except recommend
that you have a backup plan.

In practice, the more complicated your storage infrastructure is, the more
likely it is to have bugs ...

            regards, tom lane


Re: Snapshot backups

From
James Sewell
Date:
Thank you Tom! This is what I was after!

So, to get this straight in my head (a rough sketch of the sequence follows the list below):

  • pg_start_backup forces a checkpoint and writes the information from this checkpoint to the backup_label file
  • <snapshot>
  • pg_stop_backup removes the backup_label file
  • <disaster!>
  • <restore snapshot>
  • Database starts and determines where to start WAL replay from the backup_label NOT from pg_control (as usual)

Cheers,
James



James Sewell
PostgreSQL Team Lead / Solutions Architect




On Wed, Jul 31, 2013 at 10:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alban Hertroys <haramrae@gmail.com> writes:
> That begs the question what happens in case of a crash or (worse) a partial crash when multiple file systems are involved.

As long as the OS+hardware honors the contract of fsync(), everything's
fine.  If the storage system loses data that it claims to have fsync'd to
stable storage, there's not much we can do about that, except recommend
that you have a backup plan.

In practice, the more complicated your storage infrastructure is, the more
likely it is to have bugs ...

                        regards, tom lane




pg_stat_replication became empty suddenly

From
"ascot.moss@gmail.com"
Date:
Hi,

I am running some stress tests against a pair of PG servers while monitoring pg_stat_replication. During the test, pg_stat_replication suddenly became empty.

PG version: 9.2.4
O/S: Ubuntu: 12.04

Since I need to monitor the replication lag from time to time, the lag calculation on the slave will be wrong whenever pg_stat_replication becomes empty.
Please advise if this is a bug.

regards




How to reproduce:

1) session 1: Master server - try to insert a large number of records into a test table 
postgres=# drop table test;CREATE TABLE test (id INTEGER PRIMARY KEY); INSERT INTO test VALUES (generate_series(1,100000000));  EXPLAIN ANALYZE SELECT COUNT(*) FROM test;

2) session 2: Master server - check the byte_lag from time to time
postgres=# SELECT
    sent_offset - (
        replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
FROM (
    SELECT
        client_addr,
        ('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
        ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
        ('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
        ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
    FROM pg_stat_replication
) AS s;
byte_lag 
----------
2097216
(1 row)

postgres=# SELECT

    sent_offset - (
        replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
FROM (
    SELECT
        client_addr,
        ('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
        ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
        ('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
        ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
    FROM pg_stat_replication
) AS s;
byte_lag 
----------
(0 rows)


3) session 3: Slave server - 
postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()  THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
 log_delay 
-----------
         0
(1 row)

postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()  THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
 log_delay 
-----------
  4.873282
(1 row)

.
.
.
postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()  THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
  log_delay  
-------------
 4070.325329
(1 row)


Re: pg_stat_replication became empty suddenly

From
"ascot.moss@gmail.com"
Date:
Hi,

I found that the problem was probably because I tried to clean the RAM cache on the slave by running "sync; echo 3 > /proc/sys/vm/drop_caches", which caused the slave's WAL "receiver" process to go away.

ps -ef | grep receiver
postgres  6182  6178  0 12:11 ?        00:00:06 postgres: wal receiver process   streaming D/FB8DA000    
sync; echo 3 > /proc/sys/vm/drop_caches
ps -ef | grep receiver
root      8804 30447  0 12:29 pts/2    00:00:00 grep --color=auto receiver

regards

On 6 Aug 2013, at 10:44 AM, ascot.moss@gmail.com wrote:

Hi,

I am doing some stress tests to a pair of PG servers to monitor the pg_stat_replication, during the test, the pg_stat_replication suddenly became empty. 

PG version: 9.2.4
O/S: Ubuntu: 12.04

Since I need to monitor the replication lag from time to time, if the pg_stat_replication becomes empty, the lag calculation in the slave will be wrong.
Please advise if this is a bug.

regards




How to reproduce:

session 1: Master server - try to insert a large number of records into a test table 
postgres=# drop table test;CREATE TABLE test (id INTEGER PRIMARY KEY); INSERT INTO test VALUES (generate_series(1,100000000));  EXPLAIN ANALYZE SELECT COUNT(*) FROM test;

2) session 2: Master server - check the byte_lag from time to time
postgres=# SELECT
    sent_offset - (
        replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
FROM (
    SELECT
        client_addr,
        ('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
        ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
        ('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
        ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
    FROM pg_stat_replication
) AS s;
byte_lag 
----------
2097216
(1 row)

postgres=# SELECT

    sent_offset - (
        replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
FROM (
    SELECT
        client_addr,
        ('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
        ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
        ('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
        ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
    FROM pg_stat_replication
) AS s;
byte_lag 
----------
(0 rows)


3) session 3: Slave server - 
postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()  THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
 log_delay 
-----------
         0
(1 row)

postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()  THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
 log_delay 
-----------
  4.873282
(1 row)

.
.
.
postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()  THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
  log_delay  
-------------
 4070.325329
(1 row)



psql: FATAL: the database system is starting up

From
"ascot.moss@gmail.com"
Date:
Hi,

I just set up replication on the slave again. When trying to use psql, I could not get the psql command prompt but got "psql: FATAL:  the database system is starting up" instead.

PG: 9.2.4

Below is the log from the slave:
LOG:  database system was shut down in recovery at 2013-08-06 18:34:44
LOG:  entering standby mode
LOG:  consistent recovery state reached at 1C/9A0F9CF0
LOG:  record with zero length at 1C/9A0F9CF0
LOG:  streaming replication successfully connected to primary
FATAL:  the database system is starting up

I am new to PG replication, please help.

regards



Re: psql: FATAL: the database system is starting up

From
Haribabu kommi
Date:
On 06 August 2013 16:13 ascot.moss wrote
>Hi,

>I just setup the replication in the slave again, when trying to use psql, I could not get the psql command prompt but
>got "psql: FATAL:  the database system is starting up" from it.

>PG: 9.2.4

>Below is the log from the the slave:
>LOG:  database system was shut down in recovery at 2013-08-06 18:34:44
>LOG:  entering standby mode
>LOG:  consistent recovery state reached at 1C/9A0F9CF0
>LOG:  record with zero length at 1C/9A0F9CF0
>LOG:  streaming replication successfully connected to primary
>FATAL:  the database system is starting up

>I am new to PG replication, please help.

Is the configuration parameter "hot_standby" set to on?
This parameter allows queries during recovery.
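
For example, a minimal 9.2 standby setup (the hostname and user below are placeholders) would have:

    # postgresql.conf on the standby
    hot_standby = on

    # recovery.conf on the standby
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com port=5432 user=replicator'

Without hot_standby = on the standby still replays WAL, but client connections are refused with exactly the "the database system is starting up" message shown above.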

Regards,
Hari babu.



Re: psql: FATAL: the database system is starting up

From
"ascot.moss@gmail.com"
Date:
Thanks, there was a typo in the "hot_standby" line; it works now. Thanks again.

On 6 Aug 2013, at 6:52 PM, Haribabu kommi wrote:

>
> On 06 August 2013 16:13 ascot.moss wrote
>> Hi,
>
>> I just setup the replication in the slave again, when trying to use psql, I could not get the psql command prompt
>> but got "psql: FATAL:  the database system is starting up" from it.
>
>> PG: 9.2.4
>
>> Below is the log from the the slave:
>> LOG:  database system was shut down in recovery at 2013-08-06 18:34:44
>> LOG:  entering standby mode
>> LOG:  consistent recovery state reached at 1C/9A0F9CF0
>> LOG:  record with zero length at 1C/9A0F9CF0
>> LOG:  streaming replication successfully connected to primary
>> FATAL:  the database system is starting up
>
>> I am new to PG replication, please help.
>
> There is a configuration parameter "hot_standby" is set as on or not?
> This allows the queries during recovery.
>
> Regards,
> Hari babu.
>



Re: pg_stat_replication became empty suddenly

From
"ascot.moss@gmail.com"
Date:
Hi,

I just tried another round of tests without running "sync; echo 3 > /proc/sys/vm/drop_caches" and still got the same error. The following FATAL errors are found in pg_log on the slave; can anyone please advise how to resolve this error?

regards


LOG:  entering standby mode
LOG:  consistent recovery state reached at 11/42000318
LOG:  redo starts at 11/42000280
LOG:  invalid record length at 11/42000318
LOG:  database system is ready to accept read only connections
LOG:  streaming replication successfully connected to primary
FATAL:  could not send data to WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
LOG:  unexpected pageaddr 10/D2EC0000 in log file 18, segment 5, offset 15466496
LOG:  streaming replication successfully connected to primary
FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000010000001200000005 has already been removed
[... the last two lines repeat continuously as the standby keeps reconnecting ...]





On 6 Aug 2013, at 12:39 PM, ascot.moss@gmail.com wrote:

Hi,

I found the problem should be because I tried to clean RAM cache in the slave by running "sync; echo 3 > /proc/sys/vm/drop_caches' 
that caused the "receiver" of slave gone away.

ps -ef | grep receiver
postgres  6182  6178  0 12:11 ?        00:00:06 postgres: wal receiver process   streaming D/FB8DA000    
sync; echo 3 > /proc/sys/vm/drop_caches
ps -ef | grep receiver
root      8804 30447  0 12:29 pts/2    00:00:00 grep --color=auto receiver

regards

On 6 Aug 2013, at 10:44 AM, ascot.moss@gmail.com wrote:

Hi,

I am doing some stress tests to a pair of PG servers to monitor the pg_stat_replication, during the test, the pg_stat_replication suddenly became empty. 

PG version: 9.2.4
O/S: Ubuntu: 12.04

Since I need to monitor the replication lag from time to time, if the pg_stat_replication becomes empty, the lag calculation in the slave will be wrong.
Please advise if this is a bug.

regards




How to reproduce:

session 1: Master server - try to insert a large number of records into a test table 
postgres=# drop table test;CREATE TABLE test (id INTEGER PRIMARY KEY); INSERT INTO test VALUES (generate_series(1,100000000));  EXPLAIN ANALYZE SELECT COUNT(*) FROM test;

2) session 2: Master server - check the byte_lag from time to time
postgres=# SELECT
    sent_offset - (
        replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
FROM (
    SELECT
        client_addr,
        ('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
        ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
        ('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
        ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
    FROM pg_stat_replication
) AS s;
byte_lag 
----------
2097216
(1 row)

postgres=# SELECT

    sent_offset - (
        replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
FROM (
    SELECT
        client_addr,
        ('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
        ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
        ('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
        ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
    FROM pg_stat_replication
) AS s;
byte_lag 
----------
(0 rows)


3) session 3: Slave server - 
postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()  THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
 log_delay 
-----------
         0
(1 row)

postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()  THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
 log_delay 
-----------
  4.873282
(1 row)

.
.
.
postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()  THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
  log_delay  
-------------
 4070.325329
(1 row)




Re: pg_stat_replication became empty suddenly

From
Jerry Sievers
Date:
"ascot.moss@gmail.com" <ascot.moss@gmail.com> writes:

> Hi,
>
> I just tried another round of tests, without running  "sync; echo 3 > /proc/sys/vm/drop_caches',
> still got the same error,  following FATAL errors are found in pg_log (slave), can anyone please advise how to
> resolve this error?
>
> regards
>
> LOG:  entering standby mode
> LOG:  consistent recovery state reached at 11/42000318
> LOG:  redo starts at 11/42000280
> LOG:  invalid record length at 11/42000318
> LOG:  database system is ready to accept read only connections
> LOG:  streaming replication successfully connected to primary
> FATAL:  could not send data to WAL stream: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> LOG:  unexpected pageaddr 10/D2EC0000 in log file 18, segment 5, offset 15466496
> LOG:  streaming replication successfully connected to primary
> FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000010000001200000005 has already
> been removed
> LOG:  streaming replication successfully connected to primary
> FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000010000001200000005 has already

Raise wal_keep_segments on your master configs, and HUP and/or start
your standby a lot sooner after it's reloaded.
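
For example (the number is only an illustration; size it to how far behind the standby can realistically fall):

    # postgresql.conf on the master
    wal_keep_segments = 256        # keep ~256 x 16MB = 4GB of recent WAL around

    # pick up the change without a restart
    pg_ctl reload -D $PGDATA

wal_keep_segments only needs a reload, not a restart, so it can be raised on a running master.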

<snip>
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: pg_stat_replication became empty suddenly

From
"ascot.moss@gmail.com"
Date:
Thanks. I increased wal_keep_segments and it works well now.

On 7 Aug 2013, at 12:43 AM, Jerry Sievers wrote:

> "ascot.moss@gmail.com" <ascot.moss@gmail.com> writes:
>
>> Hi,
>>
>> I just tried another round of tests, without running  "sync; echo 3 > /proc/sys/vm/drop_caches',
>> still got the same error,  following FATAL errors are found in pg_log (slave), can anyone please advise how to
>> resolve this error?
>>
>> regards
>>
>> LOG:  entering standby mode
>> LOG:  consistent recovery state reached at 11/42000318
>> LOG:  redo starts at 11/42000280
>> LOG:  invalid record length at 11/42000318
>> LOG:  database system is ready to accept read only connections
>> LOG:  streaming replication successfully connected to primary
>> FATAL:  could not send data to WAL stream: server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> LOG:  unexpected pageaddr 10/D2EC0000 in log file 18, segment 5, offset 15466496
>> LOG:  streaming replication successfully connected to primary
>> FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000010000001200000005 has already
>> been removed
>> LOG:  streaming replication successfully connected to primary
>> FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000010000001200000005 has already
>
> Raise wal_keep_segments on your master configs ,and HUP and/or start
> your standby a lot sooner after it's reloaded.
>
> <snip>
>>
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consulting@comcast.net
> p: 312.241.7800



Re: Snapshot backups

From
Bruce Momjian
Date:
On Wed, Jul 31, 2013 at 08:24:46AM -0400, Tom Lane wrote:
> Alban Hertroys <haramrae@gmail.com> writes:
> > That begs the question what happens in case of a crash or (worse) a partial crash when multiple file systems are involved.
>
> As long as the OS+hardware honors the contract of fsync(), everything's
> fine.  If the storage system loses data that it claims to have fsync'd to
> stable storage, there's not much we can do about that, except recommend
> that you have a backup plan.
>
> In practice, the more complicated your storage infrastructure is, the more
> likely it is to have bugs ...

Just to give the 10k mile answer, the WAL contains all database changes
that _might_ be lost due to file system changes during a base backup.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +