Thread: Snapshot backups
Hey All,
This is a message to confirm my thoughts / validate a possible approach.
In a situation where PGDATA and {XLOG, ARCHIVELOG} are on different SAN/NAS volumes and a backup is to be initiated, do pg_start_backup and pg_stop_backup need to be used?
I am using snapshots of each volume for backup.
My thinking is that they are not needed (although I realise it is good practice).
As far as I can tell all they are doing is something like:
pg_start_backup:
- create backup label
- trigger checkpoint
pg_stop_backup:
- remove backup label file
- create backup history file
- trigger log switch
There is nothing in here that is *required* from a backup point of view. Am I missing anything?
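For context, the "good practice" version of the procedure I have in mind looks roughly like this (the label, the snapshot commands and the volume names are placeholders for whatever the SAN/NAS tooling actually provides), and my question is really whether the two psql calls can be dropped:

psql -c "SELECT pg_start_backup('nightly_snapshot');"
# snapshot each volume; the exact commands depend on the storage platform
storage_snapshot_create pgdata_vol      # hypothetical SAN/NAS command
storage_snapshot_create pgxlog_vol      # hypothetical SAN/NAS command
psql -c "SELECT pg_stop_backup();"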
James Sewell
Solutions Architect
_____________________________________
On Thu, Jun 20, 2013 at 8:45 AM, James Sewell <james.sewell@lisasoft.com> wrote:
The backup functions also set internal state in the database, so you can't just replace it with doing those operations manually. You do need to call those functions.
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Thanks Magnus,
Could you elaborate a bit more on this?
I've been having a look at do_pg_start_backup() and I can't really see anything apart from enabling full page writes and running a checkpoint to avoid getting a torn page. I could be missing something easily though, as I'm not familiar with the codebase.
do_pg_stop_backup() isn't really of consequence, as the backup is taken before this - so any restore is to a point in time before this as well.
I was under the impression a restore was (more or less) the same as a crash recovery, and logically it seems like PGDATA snapshot is equivalent to a crash/restart (disk at a discrete point in time).
I can understand if log replay might take longer, but I am struggling to see how it could result in an inconsistent state?
As I said I know this isn't best practice, but just want to understand how it works.
Cheers,
James Sewell
Solutions Architect
_____________________________________
Hey all,
I understand that I have already been given an answer here, but I am still curious as to why this is the case (perhaps I should ask this on the hackers list though, if so let me know).
More importantly I'd like to understand why I would need to use the start/stop backup commands to ensure a valid backup when using filesystem snapshots (assuming I get the order correct) - worst case, wouldn't it be the same as a crash and cause an automatic roll-forward?
Cheers,
James
James Sewell
PostgreSQL Team Lead / Solutions Architect
_____________________________________
On Mon, Jul 29, 2013 at 3:32 PM, James Sewell <james.sewell@lisasoft.com> wrote:
pg_start_backup('backup_label') and pg_stop_backup(), if I understand it correctly, write to the 'backup_label' file the information necessary to recover "consistently" from that backup. For example, the backup_label file contains the checkpoint location and its REDO location (identified by the "START WAL LOCATION:" field in the backup_label file). While you are reading the code, you can also read the comment above the function read_backup_label() in src/backend/access/transam/xlog.c.
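For reference, the backup_label written by pg_start_backup looks roughly like the following (the locations, timestamp and label are purely illustrative, and the exact set of fields varies a little between versions):

START WAL LOCATION: 11/42000280 (file 000000010000001100000042)
CHECKPOINT LOCATION: 11/420002B8
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2013-07-29 16:54:00 AEST
LABEL: backup_label

On restore, recovery starts WAL replay from the START WAL LOCATION recorded here rather than from whatever pg_control happens to say.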
Amit Langote
I understand what you are saying, and I understand how the backup_label works - but I still don't understand why the pg_start and pg_stop commands are REQUIRED when doing a snapshot backup to ensure data integrity.
Surely not using them and restoring a snapshot is the same as starting after a crash, and will result in log replay to get to the latest possible consistent state?
I thought PostgreSQL guaranteed a consistent state after a crash, am I mistaken about this?
James
James Sewell
PostgreSQL Team Lead / Solutions Architect
_____________________________________
On Tuesday, July 30, 2013, James Sewell wrote:
That's true provided that all of your data is on a single volume, and you trust your "snapshot" to be free of bugs.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> On Tuesday, July 30, 2013, James Sewell wrote:
>> I understand what you are saying, and I understand how the backup_label
>> works - but I still don't understand why the pg_start and pg_stop commands
>> are REQUIRED when doing a snapshot backup to ensure data integrity.
>>
>> Surely not using them and restoring a snapshot is the same as starting
>> after a crash, and will result in log replay to get to the latest possible
>> consistent state?

> That's true provided that all of your data is on a single volume, and you
> trust your "snapshot" to be free of bugs.

James stated to begin with that his data was spread across multiple volumes, so it's definitely not safe for him to omit pg_start_backup/pg_stop_backup.

Perhaps it would help to consider what can happen when you're not using filesystem snapshots at all, but just an ordinary userspace backup program such as "tar". The difference between recovering from a tar backup and crash recovery is that, because the backup is taken over an extended period of time, it may contain a set of data that does not match any possible instantaneous state of the on-disk data --- and crash recovery only promises to deal with the latter. Here is a concrete example of what can happen:

1. The tar process copies the file for table foo. There are changes to foo in Postgres' shared buffers that haven't made it to disk yet (although those changes are committed and recorded in on-disk WAL), so the copy made by tar isn't entirely up to date.

2. PG's checkpoint process starts a checkpoint run. Along the way, it flushes out the changes to table foo. When done, it updates the last-checkpoint pointer in pg_control, which tells where crash recovery would need to start replaying WAL.

3. The tar process archives pg_control.

Now, if you restore the tar backup onto a new system and start up Postgres, you will have an obsolete copy of table foo --- and WAL replay will not apply the needed updates to foo, because it will start from the point in WAL that pg_control says it should start from, and that's after the WAL records that describe the missing changes. Note that this will fail even if you assume you've got perfectly good and complete copies of the WAL files; there's a whole 'nother set of hazards if you don't.

For recovery from a tar backup to work, the archived copy of pg_control must point to a spot in the WAL sequence that is before any changes that could possibly not yet appear in any archived data files. The purpose of pg_start_backup/pg_stop_backup is to provide the synchronization needed to meet this requirement.

The comparable case isn't possible for crash recovery, assuming that the OS and storage hardware implement fsync() correctly, because we'll have fsync'd the changes to foo down to disk before updating pg_control.

Now, if you instead take a filesystem snapshot (representing some instantaneous state of the disk contents) and run "tar" to copy that, you have a good backup, because you must have a copy of pg_control that will tell you to re-apply any changes that are missing from the data files, as well as WAL files that contain the needed records. However, this is only certain if all that data is on *one* filesystem, because otherwise you can't be sure you have mutually consistent snapshots. And you're vulnerable to any bugs in the filesystem's snapshot implementation that might give you inconsistent copies of different files. (Such bugs would probably be closely related to bugs in fsync ... but that doesn't mean they're necessarily exactly the same.)

So that's the long form of Jeff's comment above. Any clearer now?

			regards, tom lane
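A quick way to see the pg_control pointer being described here is pg_controldata, which prints the checkpoint fields stored in pg_control; the locations below are illustrative only:

$ pg_controldata $PGDATA | grep -i 'checkpoint location'
Latest checkpoint location:           11/420002B8
Prior checkpoint location:            11/41000060

A tar-style backup that copies pg_control after a later checkpoint carries a "Latest checkpoint location" that is too new for data files it copied earlier, which is the race described above; the backup_label written by pg_start_backup pins the replay start point instead.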
On Jul 31, 2013, at 7:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> (…)
> Now, if you instead take a filesystem snapshot (representing some
> instantaneous state of the disk contents) and run "tar" to copy that,
> you have a good backup [...] However, this is only certain if all that
> data is on *one* filesystem, because otherwise you can't be sure you
> have mutually consistent snapshots. And you're vulnerable to any bugs
> in the filesystem's snapshot implementation that might give you
> inconsistent copies of different files.

That begs the question what happens in case of a crash or (worse) a partial crash when multiple file systems are involved.

Say, one tablespace is on a ZFS volume and one is on a UFS volume and the ZFS code crashes. That should result in a kernel panic, of course, in which case every file system is stopped at the same moment and there should be a consistent "snapshot". But what if it doesn't trigger a panic?

And how would file-system recovery play into this? Would a journaled file-system roll back every segment involved in the same interrupted transaction, or could this create an inconsistent snapshot where the original situation (sans file-system recovery) would actually have been a preferable state to start from?

And what happens if one of these volumes would, for example, get ejected from a RAID controller (because all disks in it have triggered alarms, whether genuine or not) and the other volume would not get ejected? Does the database abort work or does that result in an inconsistent state? I suppose this case usually doesn't matter much, part of the database is gone completely anyway, but what if the RAID controller was wrong and the disks are actually just fine and come back once re-inserted? That has actually happened several times here, caused by SATA wires vibrating loose over time (took me a while to discover the cause); my databases have always been on a single volume though, so I've never had the opportunity to run into this.

I realise these are typical "what if" scenarios, so I suppose answering these doesn't have a high priority. It's just that I've been wondering/worrying about the seeming increase of people reporting database corruption - I was just wondering whether issues like these might play a part in that (I'm sure a large part of it is just more people using PG these days).

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Alban Hertroys <haramrae@gmail.com> writes:
> That begs the question what happens in case of a crash or (worse) a partial crash when multiple file systems are involved.

As long as the OS+hardware honors the contract of fsync(), everything's fine. If the storage system loses data that it claims to have fsync'd to stable storage, there's not much we can do about that, except recommend that you have a backup plan.

In practice, the more complicated your storage infrastructure is, the more likely it is to have bugs ...

			regards, tom lane
Thank you Tom! This is what I was after!
So, to get this straight in my head:
- pg_start_backup forces a checkpoint and writes the information from this checkpoint to the backup_label file
- <snapshot>
- pg_stop_backup removes the backup_label file
- <disaster!>
- <restore snapshot>
- Database starts and determines where to start WAL replay from backup_label, NOT from pg_control (as it normally would)
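If I've got that right, a dry run on a scratch cluster should look something like the following (the label and the returned locations are just illustrative):

postgres=# SELECT pg_start_backup('snapshot_test');
 pg_start_backup
-----------------
 11/42000280
(1 row)

-- at this point $PGDATA/backup_label exists and records the START WAL
-- LOCATION / CHECKPOINT LOCATION for the backup; take the snapshots here

postgres=# SELECT pg_stop_backup();
 pg_stop_backup
----------------
 11/420003E8
(1 row)

-- backup_label is gone and a backup history file has appeared in pg_xlog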
Cheers,
James
James Sewell
PostgreSQL Team Lead / Solutions Architect
_____________________________________
Hi,
I am doing some stress tests on a pair of PG servers to monitor pg_stat_replication; during the test, pg_stat_replication suddenly became empty.
PG version: 9.2.4
O/S: Ubuntu 12.04
Since I need to monitor the replication lag from time to time, if pg_stat_replication becomes empty the lag calculation on the slave will be wrong.
Please advise if this is a bug.
regards
How to reproduce:
1) session 1: Master server - try to insert a large number of records into a test table
postgres=# drop table test; CREATE TABLE test (id INTEGER PRIMARY KEY); INSERT INTO test VALUES (generate_series(1,100000000)); EXPLAIN ANALYZE SELECT COUNT(*) FROM test;
2) session 2: Master server - check the byte_lag from time to time
postgres=# SELECT
sent_offset - (
replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
FROM (
SELECT
client_addr,
('x' || lpad(split_part(sent_location, '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
('x' || lpad(split_part(sent_location, '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
FROM pg_stat_replication
) AS s;
byte_lag
----------
2097216
(1 row)
postgres=# SELECT
sent_offset - (
replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
FROM (
SELECT
client_addr,
('x' || lpad(split_part(sent_location, '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
('x' || lpad(split_part(sent_location, '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
FROM pg_stat_replication
) AS s;
byte_lag
----------
(0 rows)
3) session 3: Slave server -
postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
log_delay
-----------
0
(1 row)
postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
log_delay
-----------
4.873282
(1 row)
.
.
.
postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
log_delay
-------------
4070.325329
(1 row)
Hi,
I found that the problem was probably caused by my clearing the RAM cache on the slave by running "sync; echo 3 > /proc/sys/vm/drop_caches", which made the WAL receiver process on the slave go away.
ps -ef | grep receiver
postgres 6182 6178 0 12:11 ? 00:00:06 postgres: wal receiver process streaming D/FB8DA000
sync; echo 3 > /proc/sys/vm/drop_caches
ps -ef | grep receiver
root 8804 30447 0 12:29 pts/2 00:00:00 grep --color=auto receiver
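In case it is useful to anyone else, this is roughly how I am checking the standby after the receiver disappears (the process name is just what I see here; adjust as needed):

ps -ef | grep 'wal receiver'
psql -c "SELECT pg_is_in_recovery(), pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"

If the receiver process is gone but pg_is_in_recovery() is still true, the standby keeps replaying whatever WAL it already has and will retry the connection to the primary on its own.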
regards
Hi,

I just set up the replication on the slave again. When trying to use psql, I could not get the psql command prompt but got "psql: FATAL: the database system is starting up" from it.

PG: 9.2.4

Below is the log from the slave:

LOG: database system was shut down in recovery at 2013-08-06 18:34:44
LOG: entering standby mode
LOG: consistent recovery state reached at 1C/9A0F9CF0
LOG: record with zero length at 1C/9A0F9CF0
LOG: streaming replication successfully connected to primary
FATAL: the database system is starting up

I am new to PG replication, please help.

regards
On 06 August 2013 16:13 ascot.moss wrote:
> I just set up the replication on the slave again. When trying to use psql, I could not get the psql
> command prompt but got "psql: FATAL: the database system is starting up" from it.

Is the configuration parameter "hot_standby" set to on? This allows queries during recovery.

Regards,
Hari babu.
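For reference, a minimal 9.2 standby configuration looks something like the following (host, port and user are placeholders for your own setup):

# postgresql.conf on the standby
hot_standby = on

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=master_host port=5432 user=replicator'

With hot_standby left off, the standby still replays WAL but refuses client connections with exactly the "the database system is starting up" message shown above.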
Thanks, there was a typo in the line "hot_standby", it works now. Thanks again.

On 6 Aug 2013, at 6:52 PM, Haribabu kommi wrote:
> Is the configuration parameter "hot_standby" set to on?
> This allows queries during recovery.
Hi,
I just tried another round of tests, without running "sync; echo 3 > /proc/sys/vm/drop_caches", and
still got the same error. The following FATAL errors are found in pg_log on the slave - can anyone please advise how to resolve this error?
regards
LOG: entering standby mode
LOG: consistent recovery state reached at 11/42000318
LOG: redo starts at 11/42000280
LOG: invalid record length at 11/42000318
LOG: database system is ready to accept read only connections
LOG: streaming replication successfully connected to primary
FATAL: could not send data to WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
LOG: unexpected pageaddr 10/D2EC0000 in log file 18, segment 5, offset 15466496
LOG: streaming replication successfully connected to primary
FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000010000001200000005 has already been removed
[the last two lines repeat dozens of times]
"ascot.moss@gmail.com" <ascot.moss@gmail.com> writes: > Hi, > > I just tried another round of tests, without running "sync; echo 3 > /proc/sys/vm/drop_caches', > still got the same error, following FATAL errors are found in pg_log (slave), can anyone please advise how to resolve > this error? > > regards > > LOG: entering standby mode > LOG: consistent recovery state reached at 11/42000318 > LOG: redo starts at 11/42000280 > LOG: invalid record length at 11/42000318 > LOG: database system is ready to accept read only connections > LOG: streaming replication successfully connected to primary > FATAL: could not send data to WAL stream: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > LOG: unexpected pageaddr 10/D2EC0000 in log file 18, segment 5, offset 15466496 > LOG: streaming replication successfully connected to primary > FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000010000001200000005 has already > been removed > LOG: streaming replication successfully connected to primary > FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000010000001200000005 has already Raise wal_keep_segments on your master configs ,and HUP and/or start your standby a lot sooner after it's reloaded. <snip> > -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@comcast.net p: 312.241.7800
Thanks. I increased the wal_keep_segments and it works well now.

On 7 Aug 2013, at 12:43 AM, Jerry Sievers wrote:
> Raise wal_keep_segments on your master configs, and HUP and/or start
> your standby a lot sooner after it's reloaded.
On Wed, Jul 31, 2013 at 08:24:46AM -0400, Tom Lane wrote:
> Alban Hertroys <haramrae@gmail.com> writes:
> > That begs the question what happens in case of a crash or (worse) a partial crash when multiple file systems are involved.
>
> As long as the OS+hardware honors the contract of fsync(), everything's
> fine. If the storage system loses data that it claims to have fsync'd to
> stable storage, there's not much we can do about that, except recommend
> that you have a backup plan.
>
> In practice, the more complicated your storage infrastructure is, the more
> likely it is to have bugs ...

Just to give the 10k mile answer, the WAL contains all database changes that _might_ be lost due to file system changes during a base backup.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ It's impossible for everything to be true. +