Standby is not removing restored WAL segments - Mailing list pgsql-admin

From Alexey Klyukin
Subject Standby is not removing restored WAL segments
Msg-id CAAS3tyLnXaYDZ0+zhXLPdVtOvHQOvR+jSPhp30o8kvWqQs0Tqw@mail.gmail.com
List pgsql-admin
Greetings,

We've got a 9.3.5 standby for a fairly large DB (500GB) with busy WAL
traffic (a couple of GB per hour), and it occasionally 'forgets' to
remove the WAL segments it has restored.

checkpoint_segments is set to 128, and we usually observe around 270
accumulated segments, but when the problem occurs our check triggers
at around 2K segments. A manual CHECKPOINT takes ages to complete
there, a fast shutdown is very slow (around 10 minutes, where it
usually takes less than 1 minute), and the WAL receiver process is
also unable to run for some reason.
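
A minimal sketch of the kind of check described above, in Python. It
relies on the rule of thumb from the 9.3 documentation that the number
of WAL segment files normally stays below
(2 + checkpoint_completion_target) * checkpoint_segments + 1. Two
things are assumptions, not details from this report:
checkpoint_completion_target is taken at its default of 0.5, and
wal_keep_segments is ignored. The function names are hypothetical, and
the directory passed in would be the standby's pg_xlog.

```python
import os
import re

# WAL segment file names are exactly 24 uppercase hex digits.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def expected_max_segments(checkpoint_segments, completion_target=0.5):
    """Documented rule-of-thumb ceiling on the number of WAL segment files.

    completion_target=0.5 is an assumed default; the report does not state it.
    """
    return int((2 + completion_target) * checkpoint_segments) + 1


def count_wal_segments(wal_dir):
    """Count entries in wal_dir whose names look like WAL segments."""
    return sum(
        1 for name in os.listdir(wal_dir) if WAL_SEGMENT_RE.fullmatch(name)
    )


def segments_over_limit(wal_dir, checkpoint_segments, completion_target=0.5):
    """Return how many segments exceed the expected ceiling (0 if none)."""
    limit = expected_max_segments(checkpoint_segments, completion_target)
    return max(0, count_wal_segments(wal_dir) - limit)
```

Under these assumptions, checkpoint_segments = 128 gives a ceiling of
321 files, which is consistent with the roughly 270 segments seen in
normal operation and far below the 2K seen when the problem occurs.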

The only way to make this host delete WAL files is to restart it. The
restartpoint right after the shutdown is particularly notable: it
shows quite a number of removed files and buffers written
(shared_buffers is set to 8GB on this system):

2014-09-04 14:39:33.376 CEST,,,22354,,537a4553.5752,88217,,2014-05-19 19:54:27 CEST,,0,LOG,00000,"restartpoint complete: wrote 332473 buffers (31.7%); 0 transaction log file(s) added, 1237 removed, 6 recycled; write=9.745 s, sync=680.314 s, total=694.447 s; sync files=499, longest=37.774 s, average=1.363 s",,,,,,,,,""
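
To make the numbers in that log entry easier to compare, here is a
rough Python sketch that parses the "restartpoint complete" message and
derives two figures from it: the share of the total time spent in the
sync phase, and the approximate volume of WAL removed. The regular
expression is written against the message exactly as quoted above, the
function names are hypothetical, and the 16 MiB segment size is an
assumption (the default), not something stated in this report.

```python
import re

# Pattern matching the message text quoted in the log entry above.
RESTARTPOINT_RE = re.compile(
    r"restartpoint complete: wrote (?P<buffers>\d+) buffers "
    r"\((?P<buffers_pct>[\d.]+)%\); "
    r"(?P<added>\d+) transaction log file\(s\) added, "
    r"(?P<removed>\d+) removed, (?P<recycled>\d+) recycled; "
    r"write=(?P<write_s>[\d.]+) s, sync=(?P<sync_s>[\d.]+) s, "
    r"total=(?P<total_s>[\d.]+) s; sync files=(?P<sync_files>\d+), "
    r"longest=(?P<longest_s>[\d.]+) s, average=(?P<average_s>[\d.]+) s"
)

SEGMENT_MIB = 16  # assumption: default WAL segment size

MESSAGE = (
    "restartpoint complete: wrote 332473 buffers (31.7%); "
    "0 transaction log file(s) added, 1237 removed, 6 recycled; "
    "write=9.745 s, sync=680.314 s, total=694.447 s; "
    "sync files=499, longest=37.774 s, average=1.363 s"
)


def parse_restartpoint(message):
    """Return the numeric fields of the message as a dict, or None."""
    match = RESTARTPOINT_RE.search(message)
    if match is None:
        return None
    fields = {}
    for key, value in match.groupdict().items():
        # Durations and percentages are floats; everything else is a count.
        is_float = key.endswith("_s") or key.endswith("_pct")
        fields[key] = float(value) if is_float else int(value)
    return fields


def summarize(fields):
    """Derive the sync-phase share and the removed WAL volume in MiB."""
    return {
        "sync_share": fields["sync_s"] / fields["total_s"],
        "removed_mib": fields["removed"] * SEGMENT_MIB,
    }


if __name__ == "__main__":
    print(summarize(parse_restartpoint(MESSAGE)))
```

For this entry the sync phase accounts for roughly 98% of the total
time, and 1237 removed segments at the assumed 16 MiB each come to
about 19 GiB of WAL.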

If we leave the host running, this restartpoint never happens.

The only difference I can find from our other databases that do not
show this behavior is that this host is running with
max_standby_streaming_delay and max_standby_archive_delay set to -1,
but at the time we observed the problem no queries were running on it
at all.

The problem occurs rarely but regularly, around once every 3 months.
During this time PostgreSQL has been upgraded from 9.0 to 9.3, which
did not solve the issue.

Any clues on how we can debug and diagnose the problem further to come
up with a proper bug report, if it is a bug? Or are we missing
something in the configuration that causes this?


Regards,
--
Alexey Klyukin

