Thread: pg_basebackup, requested WAL has already been removed

pg_basebackup, requested WAL has already been removed

From
Sergey Koposov
Date:
Hi,

I've recently started to use pg_basebackup --xlog-method=stream to backup
my multi-Tb database.
Before I did the backup when there was not much activity in the DB and it
went perfectly fine, but today, I've started the backup and it failed
twice  almost at the same time as the
CREATE INDEX  (and another time CLUSTER) commands were finished.

Here:

postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup --xlog-method=stream --progress --verbose --pg
transaction log start point: 23AE/BD003E70
pg_basebackup: starting background WAL receiver
pg_basebackup: unexpected termination of replication stream: FATAL:
requested WAL segment 00000001000023B1000000FE has already been removed
4819820/16816887078 kB (4%), 0/1 tablespace
(/mnt/backup/wsdb_130510/base/1)

And the logs from around that time contained:

some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG:  duration: 40128.163 ms  statement: CREATE INDEX usno_cle
an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
::2013-05-10 14:35:43 BST:25529LOG:  checkpoints are occurring too frequently (8 seconds apart)
::2013-05-10 14:35:43 BST:25529HINT:  Consider increasing the configuration parameter "checkpoint_segmen
ts".
::2013-05-10 14:35:51 BST:25529LOG:  checkpoints are occurring too frequently (8 seconds apart)
::2013-05-10 14:35:51 BST:25529HINT:  Consider increasing the configuration parameter "checkpoint_segmen
ts".
postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL:  requested WAL segment 00000001000023B1000000FE has already been
removed
some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG:  duration: 78378.194 ms  statement: CLUSTER usno_clean_q3c_idx ON
usno_clean;

One the previous occasion when it happened the CREATE INDEX() was being executed:

some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG:  duration: 67.680 ms  statement: SELECT name FROM  (SELECT
pg_catalog.lower(name)AS name FROM pg_catalog.pg_settings   UNION ALL SELECT 'session authorization'   UNION ALL SELECT
'all')ss  WHERE substring(name,1,4)='rand' 
         LIMIT 1000
::2013-05-10 09:22:47 BST:25529LOG:  checkpoints are occurring too frequently (18 seconds apart)
::2013-05-10 09:22:47 BST:25529HINT:  Consider increasing the configuration parameter "checkpoint_segments".
postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL:  requested WAL segment 000000010000239900000040 has already been
removed
some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG:  duration: 542955.262 ms  statement: CREATE INDEX
xmatch_temp_usnoid_idxON xmatch_temp (usno_id); 

The .configuration
PG 9.2.4, Debian 7.0, amd64

shared_buffers = 10GB
work_mem = 1GB
maintenance_work_mem = 1GB
effective_io_concurrency = 5
synchronous_commit = off
checkpoint_segments = 32
max_wal_senders = 2
effective_cache_size = 30GB
autovacuum_max_workers = 3
wal_level=archive
archive_mode = off

Does it look like a bug or am I missing something ?

Thanks,
     Sergey


Re: pg_basebackup, requested WAL has already been removed

From
Lonni J Friedman
Date:
Its definitely not a bug.  You need to set/increase wal_keep_segments
to a value that ensures that they aren't recycled faster than the time
required to complete the base backup (plus some buffer).

On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
> Hi,
>
> I've recently started to use pg_basebackup --xlog-method=stream to backup my
> multi-Tb database.
> Before I did the backup when there was not much activity in the DB and it
> went perfectly fine, but today, I've started the backup and it failed twice
> almost at the same time as the
> CREATE INDEX  (and another time CLUSTER) commands were finished.
>
> Here:
>
> postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup
> --xlog-method=stream --progress --verbose --pg
> transaction log start point: 23AE/BD003E70
> pg_basebackup: starting background WAL receiver
> pg_basebackup: unexpected termination of replication stream: FATAL:
> requested WAL segment 00000001000023B1000000FE has already been removed
> 4819820/16816887078 kB (4%), 0/1 tablespace
> (/mnt/backup/wsdb_130510/base/1)
>
> And the logs from around that time contained:
>
> some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG:  duration: 40128.163 ms
> statement: CREATE INDEX usno_cle
> an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
> ::2013-05-10 14:35:43 BST:25529LOG:  checkpoints are occurring too
> frequently (8 seconds apart)
> ::2013-05-10 14:35:43 BST:25529HINT:  Consider increasing the configuration
> parameter "checkpoint_segmen
> ts".
> ::2013-05-10 14:35:51 BST:25529LOG:  checkpoints are occurring too
> frequently (8 seconds apart)
> ::2013-05-10 14:35:51 BST:25529HINT:  Consider increasing the configuration
> parameter "checkpoint_segmen
> ts".
> postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL:  requested WAL segment
> 00000001000023B1000000FE has already been removed
> some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG:  duration: 78378.194 ms
> statement: CLUSTER usno_clean_q3c_idx ON usno_clean;
>
> One the previous occasion when it happened the CREATE INDEX() was being
> executed:
>
> some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG:  duration: 67.680 ms
> statement: SELECT name FROM  (SELECT pg_catalog.lower(name) AS name FROM
> pg_catalog.pg_settings   UNION ALL SELECT 'session authorization'   UNION
> ALL SELECT 'all') ss  WHERE substring(name,1,4)='rand'
>         LIMIT 1000
> ::2013-05-10 09:22:47 BST:25529LOG:  checkpoints are occurring too
> frequently (18 seconds apart)
> ::2013-05-10 09:22:47 BST:25529HINT:  Consider increasing the configuration
> parameter "checkpoint_segments".
> postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL:  requested WAL
> segment 000000010000239900000040 has already been removed
> some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG:  duration: 542955.262 ms
> statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id);
>
> The .configuration
> PG 9.2.4, Debian 7.0, amd64
>
> shared_buffers = 10GB
> work_mem = 1GB
> maintenance_work_mem = 1GB
> effective_io_concurrency = 5
> synchronous_commit = off
> checkpoint_segments = 32
> max_wal_senders = 2
> effective_cache_size = 30GB
> autovacuum_max_workers = 3
> wal_level=archive
> archive_mode = off
>
> Does it look like a bug or am I missing something ?
>
> Thanks,
>         Sergey
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       https://netllama.linux-sxs.org


Re: pg_basebackup, requested WAL has already been removed

From
Sergey Koposov
Date:
On Fri, 10 May 2013, Lonni J Friedman wrote:

> Its definitely not a bug.  You need to set/increase wal_keep_segments
> to a value that ensures that they aren't recycled faster than the time
> required to complete the base backup (plus some buffer).

But I thought that wal_keep_segments is not needed for the streaming
regime ( "--xlog-method=stream")  And the documentation
http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
only mentions wal_keep_segments when talking about --xlog-method=fetch.


>
> On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov
<koposov@ast.cam.ac.uk>
wrote: >> Hi,
>>
>> I've recently started to use pg_basebackup --xlog-method=stream to backup my
>> multi-Tb database.
>> Before I did the backup when there was not much activity in the DB and it
>> went perfectly fine, but today, I've started the backup and it failed twice
>> almost at the same time as the
>> CREATE INDEX  (and another time CLUSTER) commands were finished.
>>
>> Here:
>>
>> postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup
>> --xlog-method=stream --progress --verbose --pg
>> transaction log start point: 23AE/BD003E70
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: unexpected termination of replication stream: FATAL:
>> requested WAL segment 00000001000023B1000000FE has already been removed
>> 4819820/16816887078 kB (4%), 0/1 tablespace
>> (/mnt/backup/wsdb_130510/base/1)
>>
>> And the logs from around that time contained:
>>
>> some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG:  duration: 40128.163 ms
>> statement: CREATE INDEX usno_cle
>> an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
>> ::2013-05-10 14:35:43 BST:25529LOG:  checkpoints are occurring too
>> frequently (8 seconds apart)
>> ::2013-05-10 14:35:43 BST:25529HINT:  Consider increasing the configuration
>> parameter "checkpoint_segmen
>> ts".
>> ::2013-05-10 14:35:51 BST:25529LOG:  checkpoints are occurring too
>> frequently (8 seconds apart)
>> ::2013-05-10 14:35:51 BST:25529HINT:  Consider increasing the configuration
>> parameter "checkpoint_segmen
>> ts".
>> postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL:  requested WAL segment
>> 00000001000023B1000000FE has already been removed
>> some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG:  duration: 78378.194 ms
>> statement: CLUSTER usno_clean_q3c_idx ON usno_clean;
>>
>> One the previous occasion when it happened the CREATE INDEX() was being
>> executed:
>>
>> some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG:  duration: 67.680 ms
>> statement: SELECT name FROM  (SELECT pg_catalog.lower(name) AS name FROM
>> pg_catalog.pg_settings   UNION ALL SELECT 'session authorization'   UNION
>> ALL SELECT 'all') ss  WHERE substring(name,1,4)='rand'
>>         LIMIT 1000
>> ::2013-05-10 09:22:47 BST:25529LOG:  checkpoints are occurring too
>> frequently (18 seconds apart)
>> ::2013-05-10 09:22:47 BST:25529HINT:  Consider increasing the configuration
>> parameter "checkpoint_segments".
>> postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL:  requested WAL
>> segment 000000010000239900000040 has already been removed
>> some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG:  duration: 542955.262 ms
>> statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id);
>>
>> The .configuration
>> PG 9.2.4, Debian 7.0, amd64
>>
>> shared_buffers = 10GB
>> work_mem = 1GB
>> maintenance_work_mem = 1GB
>> effective_io_concurrency = 5
>> synchronous_commit = off
>> checkpoint_segments = 32
>> max_wal_senders = 2
>> effective_cache_size = 30GB
>> autovacuum_max_workers = 3
>> wal_level=archive
>> archive_mode = off
>>
>> Does it look like a bug or am I missing something ?
>>
>> Thanks,
>>         Sergey
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> L. Friedman                                    netllama@gmail.com
> LlamaLand                       https://netllama.linux-sxs.org
>



Re: pg_basebackup, requested WAL has already been removed

From
Lonni J Friedman
Date:
That's a good point.  Then i dunno, perhaps it is a bug, but I'd be
surprised if this wasn't working, as its not really a corner case that
could be missed in testing, as long as all the options were exercised.
 Hopefully someone else can weigh in.

On Fri, May 10, 2013 at 10:00 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
>
> On Fri, 10 May 2013, Lonni J Friedman wrote:
>
>> Its definitely not a bug.  You need to set/increase wal_keep_segments
>> to a value that ensures that they aren't recycled faster than the time
>> required to complete the base backup (plus some buffer).
>
>
> But I thought that wal_keep_segments is not needed for the streaming regime
> ( "--xlog-method=stream")  And the documentation
> http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
> only mentions wal_keep_segments when talking about --xlog-method=fetch.
>
>
>>
>> On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov
>
> <koposov@ast.cam.ac.uk> wrote: >> Hi,
>>>
>>>
>>> I've recently started to use pg_basebackup --xlog-method=stream to backup
>>> my
>>> multi-Tb database.
>>> Before I did the backup when there was not much activity in the DB and it
>>> went perfectly fine, but today, I've started the backup and it failed
>>> twice
>>> almost at the same time as the
>>> CREATE INDEX  (and another time CLUSTER) commands were finished.
>>>
>>> Here:
>>>
>>> postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup
>>> --xlog-method=stream --progress --verbose --pg
>>> transaction log start point: 23AE/BD003E70
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: unexpected termination of replication stream: FATAL:
>>> requested WAL segment 00000001000023B1000000FE has already been removed
>>> 4819820/16816887078 kB (4%), 0/1 tablespace
>>> (/mnt/backup/wsdb_130510/base/1)
>>>
>>> And the logs from around that time contained:
>>>
>>> some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG:  duration: 40128.163 ms
>>> statement: CREATE INDEX usno_cle
>>> an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
>>> ::2013-05-10 14:35:43 BST:25529LOG:  checkpoints are occurring too
>>> frequently (8 seconds apart)
>>> ::2013-05-10 14:35:43 BST:25529HINT:  Consider increasing the
>>> configuration
>>> parameter "checkpoint_segmen
>>> ts".
>>> ::2013-05-10 14:35:51 BST:25529LOG:  checkpoints are occurring too
>>> frequently (8 seconds apart)
>>> ::2013-05-10 14:35:51 BST:25529HINT:  Consider increasing the
>>> configuration
>>> parameter "checkpoint_segmen
>>> ts".
>>> postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL:  requested WAL
>>> segment
>>> 00000001000023B1000000FE has already been removed
>>> some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG:  duration: 78378.194 ms
>>> statement: CLUSTER usno_clean_q3c_idx ON usno_clean;
>>>
>>> One the previous occasion when it happened the CREATE INDEX() was being
>>> executed:
>>>
>>> some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG:  duration: 67.680 ms
>>> statement: SELECT name FROM  (SELECT pg_catalog.lower(name) AS name FROM
>>> pg_catalog.pg_settings   UNION ALL SELECT 'session authorization'   UNION
>>> ALL SELECT 'all') ss  WHERE substring(name,1,4)='rand'
>>>         LIMIT 1000
>>> ::2013-05-10 09:22:47 BST:25529LOG:  checkpoints are occurring too
>>> frequently (18 seconds apart)
>>> ::2013-05-10 09:22:47 BST:25529HINT:  Consider increasing the
>>> configuration
>>> parameter "checkpoint_segments".
>>> postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL:  requested WAL
>>> segment 000000010000239900000040 has already been removed
>>> some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG:  duration: 542955.262 ms
>>> statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id);
>>>
>>> The .configuration
>>> PG 9.2.4, Debian 7.0, amd64
>>>
>>> shared_buffers = 10GB
>>> work_mem = 1GB
>>> maintenance_work_mem = 1GB
>>> effective_io_concurrency = 5
>>> synchronous_commit = off
>>> checkpoint_segments = 32
>>> max_wal_senders = 2
>>> effective_cache_size = 30GB
>>> autovacuum_max_workers = 3
>>> wal_level=archive
>>> archive_mode = off
>>>
>>> Does it look like a bug or am I missing something ?
>>>
>>> Thanks,
>>>         Sergey
>>>
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>
>>
>>
>>
>> --
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> L. Friedman                                    netllama@gmail.com
>> LlamaLand                       https://netllama.linux-sxs.org
>>
>



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       https://netllama.linux-sxs.org


Re: pg_basebackup, requested WAL has already been removed

From
Magnus Hagander
Date:
On Fri, May 10, 2013 at 7:00 PM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
>
> On Fri, 10 May 2013, Lonni J Friedman wrote:
>
>> Its definitely not a bug.  You need to set/increase wal_keep_segments
>> to a value that ensures that they aren't recycled faster than the time
>> required to complete the base backup (plus some buffer).
>
>
> But I thought that wal_keep_segments is not needed for the streaming regime
> ( "--xlog-method=stream")  And the documentation
> http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
> only mentions wal_keep_segments when talking about --xlog-method=fetch.

It's not a bug in the software - this will happen if the background
stream in pg_basebackup cannot keep up, and that's normal. It works
the same way as a regular standby in that if it's unable to keep up it
will eventually fall so far behind that it can't recover.

It may definitely be something that needs to be cleared up in the
documentation though.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/