Thread: pg_basebackup, requested WAL has already been removed
Hi, I've recently started to use pg_basebackup --xlog-method=stream to backup my multi-Tb database. Before I did the backup when there was not much activity in the DB and it went perfectly fine, but today, I've started the backup and it failed twice almost at the same time as the CREATE INDEX (and another time CLUSTER) commands were finished. Here: postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup --xlog-method=stream --progress --verbose --pg transaction log start point: 23AE/BD003E70 pg_basebackup: starting background WAL receiver pg_basebackup: unexpected termination of replication stream: FATAL: requested WAL segment 00000001000023B1000000FE has already been removed 4819820/16816887078 kB (4%), 0/1 tablespace (/mnt/backup/wsdb_130510/base/1) And the logs from around that time contained: some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms statement: CREATE INDEX usno_cle an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec)); ::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too frequently (8 seconds apart) ::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the configuration parameter "checkpoint_segmen ts". ::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too frequently (8 seconds apart) ::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the configuration parameter "checkpoint_segmen ts". postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL segment 00000001000023B1000000FE has already been removed some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms statement: CLUSTER usno_clean_q3c_idx ON usno_clean; One the previous occasion when it happened the CREATE INDEX() was being executed: some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms statement: SELECT name FROM (SELECT pg_catalog.lower(name)AS name FROM pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION ALL SELECT 'all')ss WHERE substring(name,1,4)='rand' LIMIT 1000 ::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too frequently (18 seconds apart) ::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the configuration parameter "checkpoint_segments". postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL segment 000000010000239900000040 has already been removed some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms statement: CREATE INDEX xmatch_temp_usnoid_idxON xmatch_temp (usno_id); The .configuration PG 9.2.4, Debian 7.0, amd64 shared_buffers = 10GB work_mem = 1GB maintenance_work_mem = 1GB effective_io_concurrency = 5 synchronous_commit = off checkpoint_segments = 32 max_wal_senders = 2 effective_cache_size = 30GB autovacuum_max_workers = 3 wal_level=archive archive_mode = off Does it look like a bug or am I missing something ? Thanks, Sergey
Its definitely not a bug. You need to set/increase wal_keep_segments to a value that ensures that they aren't recycled faster than the time required to complete the base backup (plus some buffer). On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote: > Hi, > > I've recently started to use pg_basebackup --xlog-method=stream to backup my > multi-Tb database. > Before I did the backup when there was not much activity in the DB and it > went perfectly fine, but today, I've started the backup and it failed twice > almost at the same time as the > CREATE INDEX (and another time CLUSTER) commands were finished. > > Here: > > postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup > --xlog-method=stream --progress --verbose --pg > transaction log start point: 23AE/BD003E70 > pg_basebackup: starting background WAL receiver > pg_basebackup: unexpected termination of replication stream: FATAL: > requested WAL segment 00000001000023B1000000FE has already been removed > 4819820/16816887078 kB (4%), 0/1 tablespace > (/mnt/backup/wsdb_130510/base/1) > > And the logs from around that time contained: > > some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms > statement: CREATE INDEX usno_cle > an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec)); > ::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too > frequently (8 seconds apart) > ::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the configuration > parameter "checkpoint_segmen > ts". > ::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too > frequently (8 seconds apart) > ::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the configuration > parameter "checkpoint_segmen > ts". > postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL segment > 00000001000023B1000000FE has already been removed > some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms > statement: CLUSTER usno_clean_q3c_idx ON usno_clean; > > One the previous occasion when it happened the CREATE INDEX() was being > executed: > > some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms > statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM > pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION > ALL SELECT 'all') ss WHERE substring(name,1,4)='rand' > LIMIT 1000 > ::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too > frequently (18 seconds apart) > ::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the configuration > parameter "checkpoint_segments". > postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL > segment 000000010000239900000040 has already been removed > some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms > statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id); > > The .configuration > PG 9.2.4, Debian 7.0, amd64 > > shared_buffers = 10GB > work_mem = 1GB > maintenance_work_mem = 1GB > effective_io_concurrency = 5 > synchronous_commit = off > checkpoint_segments = 32 > max_wal_senders = 2 > effective_cache_size = 30GB > autovacuum_max_workers = 3 > wal_level=archive > archive_mode = off > > Does it look like a bug or am I missing something ? > > Thanks, > Sergey > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ L. Friedman netllama@gmail.com LlamaLand https://netllama.linux-sxs.org
On Fri, 10 May 2013, Lonni J Friedman wrote: > Its definitely not a bug. You need to set/increase wal_keep_segments > to a value that ensures that they aren't recycled faster than the time > required to complete the base backup (plus some buffer). But I thought that wal_keep_segments is not needed for the streaming regime ( "--xlog-method=stream") And the documentation http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html only mentions wal_keep_segments when talking about --xlog-method=fetch. > > On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote: >> Hi, >> >> I've recently started to use pg_basebackup --xlog-method=stream to backup my >> multi-Tb database. >> Before I did the backup when there was not much activity in the DB and it >> went perfectly fine, but today, I've started the backup and it failed twice >> almost at the same time as the >> CREATE INDEX (and another time CLUSTER) commands were finished. >> >> Here: >> >> postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup >> --xlog-method=stream --progress --verbose --pg >> transaction log start point: 23AE/BD003E70 >> pg_basebackup: starting background WAL receiver >> pg_basebackup: unexpected termination of replication stream: FATAL: >> requested WAL segment 00000001000023B1000000FE has already been removed >> 4819820/16816887078 kB (4%), 0/1 tablespace >> (/mnt/backup/wsdb_130510/base/1) >> >> And the logs from around that time contained: >> >> some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms >> statement: CREATE INDEX usno_cle >> an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec)); >> ::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too >> frequently (8 seconds apart) >> ::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the configuration >> parameter "checkpoint_segmen >> ts". >> ::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too >> frequently (8 seconds apart) >> ::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the configuration >> parameter "checkpoint_segmen >> ts". >> postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL segment >> 00000001000023B1000000FE has already been removed >> some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms >> statement: CLUSTER usno_clean_q3c_idx ON usno_clean; >> >> One the previous occasion when it happened the CREATE INDEX() was being >> executed: >> >> some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms >> statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM >> pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION >> ALL SELECT 'all') ss WHERE substring(name,1,4)='rand' >> LIMIT 1000 >> ::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too >> frequently (18 seconds apart) >> ::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the configuration >> parameter "checkpoint_segments". >> postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL >> segment 000000010000239900000040 has already been removed >> some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms >> statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id); >> >> The .configuration >> PG 9.2.4, Debian 7.0, amd64 >> >> shared_buffers = 10GB >> work_mem = 1GB >> maintenance_work_mem = 1GB >> effective_io_concurrency = 5 >> synchronous_commit = off >> checkpoint_segments = 32 >> max_wal_senders = 2 >> effective_cache_size = 30GB >> autovacuum_max_workers = 3 >> wal_level=archive >> archive_mode = off >> >> Does it look like a bug or am I missing something ? >> >> Thanks, >> Sergey >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general > > > > -- > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > L. Friedman netllama@gmail.com > LlamaLand https://netllama.linux-sxs.org >
That's a good point. Then i dunno, perhaps it is a bug, but I'd be surprised if this wasn't working, as its not really a corner case that could be missed in testing, as long as all the options were exercised. Hopefully someone else can weigh in. On Fri, May 10, 2013 at 10:00 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote: > > On Fri, 10 May 2013, Lonni J Friedman wrote: > >> Its definitely not a bug. You need to set/increase wal_keep_segments >> to a value that ensures that they aren't recycled faster than the time >> required to complete the base backup (plus some buffer). > > > But I thought that wal_keep_segments is not needed for the streaming regime > ( "--xlog-method=stream") And the documentation > http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html > only mentions wal_keep_segments when talking about --xlog-method=fetch. > > >> >> On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov > > <koposov@ast.cam.ac.uk> wrote: >> Hi, >>> >>> >>> I've recently started to use pg_basebackup --xlog-method=stream to backup >>> my >>> multi-Tb database. >>> Before I did the backup when there was not much activity in the DB and it >>> went perfectly fine, but today, I've started the backup and it failed >>> twice >>> almost at the same time as the >>> CREATE INDEX (and another time CLUSTER) commands were finished. >>> >>> Here: >>> >>> postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup >>> --xlog-method=stream --progress --verbose --pg >>> transaction log start point: 23AE/BD003E70 >>> pg_basebackup: starting background WAL receiver >>> pg_basebackup: unexpected termination of replication stream: FATAL: >>> requested WAL segment 00000001000023B1000000FE has already been removed >>> 4819820/16816887078 kB (4%), 0/1 tablespace >>> (/mnt/backup/wsdb_130510/base/1) >>> >>> And the logs from around that time contained: >>> >>> some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms >>> statement: CREATE INDEX usno_cle >>> an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec)); >>> ::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too >>> frequently (8 seconds apart) >>> ::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the >>> configuration >>> parameter "checkpoint_segmen >>> ts". >>> ::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too >>> frequently (8 seconds apart) >>> ::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the >>> configuration >>> parameter "checkpoint_segmen >>> ts". >>> postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL >>> segment >>> 00000001000023B1000000FE has already been removed >>> some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms >>> statement: CLUSTER usno_clean_q3c_idx ON usno_clean; >>> >>> One the previous occasion when it happened the CREATE INDEX() was being >>> executed: >>> >>> some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms >>> statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM >>> pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION >>> ALL SELECT 'all') ss WHERE substring(name,1,4)='rand' >>> LIMIT 1000 >>> ::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too >>> frequently (18 seconds apart) >>> ::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the >>> configuration >>> parameter "checkpoint_segments". >>> postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL >>> segment 000000010000239900000040 has already been removed >>> some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms >>> statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id); >>> >>> The .configuration >>> PG 9.2.4, Debian 7.0, amd64 >>> >>> shared_buffers = 10GB >>> work_mem = 1GB >>> maintenance_work_mem = 1GB >>> effective_io_concurrency = 5 >>> synchronous_commit = off >>> checkpoint_segments = 32 >>> max_wal_senders = 2 >>> effective_cache_size = 30GB >>> autovacuum_max_workers = 3 >>> wal_level=archive >>> archive_mode = off >>> >>> Does it look like a bug or am I missing something ? >>> >>> Thanks, >>> Sergey >>> >>> >>> -- >>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-general >> >> >> >> >> -- >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> L. Friedman netllama@gmail.com >> LlamaLand https://netllama.linux-sxs.org >> > -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ L. Friedman netllama@gmail.com LlamaLand https://netllama.linux-sxs.org
On Fri, May 10, 2013 at 7:00 PM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote: > > On Fri, 10 May 2013, Lonni J Friedman wrote: > >> Its definitely not a bug. You need to set/increase wal_keep_segments >> to a value that ensures that they aren't recycled faster than the time >> required to complete the base backup (plus some buffer). > > > But I thought that wal_keep_segments is not needed for the streaming regime > ( "--xlog-method=stream") And the documentation > http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html > only mentions wal_keep_segments when talking about --xlog-method=fetch. It's not a bug in the software - this will happen if the background stream in pg_basebackup cannot keep up, and that's normal. It works the same way as a regular standby in that if it's unable to keep up it will eventually fall so far behind that it can't recover. It may definitely be something that needs to be cleared up in the documentation though. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/