Thread: BUG #13962: transaction logs growing on standby

BUG #13962: transaction logs growing on standby

From
nick.bales@rackspace.com
Date:
The following bug has been logged on the website:

Bug reference:      13962
Logged by:          Nick Bales
Email address:      nick.bales@rackspace.com
PostgreSQL version: 9.3.9
Operating system:   CentOS 6.6
Description:

On several of my postgres 9.3.x clusters with streaming replication, the
number of transaction logs(files in pg_xlog, not the archived logs) on the
standby nodes are growing past the theoretical maximum(wal_keep_segments + 2
* checkpoint_segments + 1).  Most are appropriately getting recycled at each
restartpoint, as can be seed with the log entry like:

  restartpoint complete: wrote 128 buffers (0.0%); 0 transaction log file(s)
added, 0 removed, 1 recycled

However, at semi-regular intervals, no xlogs are added, removed, or
recycled:

  restartpoint complete: wrote 135 buffers (0.0%); 0 transaction log file(s)
added, 0 removed, 0 recycled

The number of orphaned files for a period directly matches the number of log
occurrences where no files are recycled during a restartpoint.  There is
also no log that a new transaction log file is added, even though the number
of xlogs in the directory is growing.  Furthermore, once a file is orphaned,
it sticks around for the life of the system, ultimately leading to
exhausting all disk space on long running systems.

Relevant settings:

 wal_level = hot_standby
 wal_keep_segments = 200
 checkpoint_segments = 128
 checkpoint_timeout = 5min
 checkpoint_completion_target = 0.7
 checkpoint_warning = 30s
 log_checkpoints = on

Re: BUG #13962: transaction logs growing on standby

From
Venkata Balaji N
Date:
On Thu, Feb 18, 2016 at 5:42 AM, Nick Bales <nick.bales@rackspace.com>
wrote:
>
> I am not seeing any instances of where the standby nodes are falling
> behind the primary.  However, I do see several instances each day where the
> standby is having to re-establish the streaming connection to the primary:
>
>   2015-12-15 07:25:10 GMT 12969 XX000 566f9ae1.32a9 0 FATAL: could not
> send data to WAL stream: server closed the connection unexpectedly
>   2015-12-15 07:25:10 GMT 33302 00000 566fc056.8216 0 LOG: started
> streaming WAL from primary at 374/F1000000 on timeline 1
>   2015-12-15 10:05:13 GMT 33302 XX000 566fc056.8216 0 FATAL: could not
> send data to WAL stream: server closed the connection unexpectedly
>   2015-12-15 10:05:13 GMT 5001 00000 566fe5d9.1389 0 LOG: started
> streaming WAL from primary at 375/11000000 on timeline 1
>
> Upon closer inspection, the number of times this is occurring per day does
> correspond to the number of orphaned xlogs for the day(by file's last
> updated timestamp), so there does seem to be some relevance.  Will need to
> investigate further as to what is causing the streaming connection to drop.
>

Thats the issue, there is an interruption in streaming probably due to
network related issues. That needs to be fixed first.
So, this is not a BUG. Please note that this is not an appropriate mailing
list to send these technical queries. Please consider sending across such
technical queries to pgsql-general mailing list.

Regards,
Venkata B N

Fujitsu Australia

Re: BUG #13962: transaction logs growing on standby

From
Nick Bales
Date:
Understood.  Really thought I exhausted all potential issues before
submitting this.  However, does still seem like the orphaned xlogs
should be cleaned up if they predate the latest checkpoint.

On 02/17/2016 04:21 PM, Venkata Balaji N wrote:
>
> On Thu, Feb 18, 2016 at 5:42 AM, Nick Bales <nick.bales@rackspace.com
> <mailto:nick.bales@rackspace.com>> wrote:
>
>     I am not seeing any instances of where the standby nodes are
>     falling behind the primary.  However, I do see several instances
>     each day where the standby is having to re-establish the streaming
>     connection to the primary:
>
>       2015-12-15 07:25:10 GMT 12969 XX000 566f9ae1.32a9 0 FATAL: could
>     not send data to WAL stream: server closed the connection
>     unexpectedly
>       2015-12-15 07:25:10 GMT 33302 00000 566fc056.8216 0 LOG: started
>     streaming WAL from primary at 374/F1000000 on timeline 1
>       2015-12-15 10:05:13 GMT 33302 XX000 566fc056.8216 0 FATAL: could
>     not send data to WAL stream: server closed the connection
>     unexpectedly
>       2015-12-15 10:05:13 GMT 5001 00000 566fe5d9.1389 0 LOG: started
>     streaming WAL from primary at 375/11000000 on timeline 1
>
>     Upon closer inspection, the number of times this is occurring per
>     day does correspond to the number of orphaned xlogs for the day(by
>     file's last updated timestamp), so there does seem to be some
>     relevance. Will need to investigate further as to what is causing
>     the streaming connection to drop.
>
>
> Thats the issue, there is an interruption in streaming probably due to
> network related issues. That needs to be fixed first.
> So, this is not a BUG. Please note that this is not an appropriate
> mailing list to send these technical queries. Please consider sending
> across such technical queries to pgsql-general mailing list.
>
> Regards,
> Venkata B N
>
> Fujitsu Australia
>

Re: BUG #13962: transaction logs growing on standby

From
Alvaro Herrera
Date:
Just curious -- did communication happen in private email here?  There's
at least one email we do not have on the archives

http://www.postgresql.org/message-id/flat/CAEyp7J_sYm9Ru-uD4kCSX7_BnC9vq8vDq9Sw9vkj3m2+jK18mw@mail.gmail.com#CAEyp7J_sYm9Ru-uD4kCSX7_BnC9vq8vDq9Sw9vkj3m2+jK18mw@mail.gmail.com
which appears as quoted material from Nick in Ventaka's response, but is
not present in the first email in the thread.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #13962: transaction logs growing on standby

From
Venkata Balaji N
Date:
On Thu, Feb 18, 2016 at 9:50 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

> Just curious -- did communication happen in private email here?  There's
> at least one email we do not have on the archives
>
>
http://www.postgresql.org/message-id/flat/CAEyp7J_sYm9Ru-uD4kCSX7_BnC9vq8vDq9Sw9vkj3m2+jK18mw@mail.gmail.com#CAEyp7J_sYm9Ru-uD4kCSX7_BnC9vq8vDq9Sw9vkj3m2+jK18mw@mail.gmail.com
> which appears as quoted material from Nick in Ventaka's response, but is
> not present in the first email in the thread.
>

Yes, there is one email which i received in private. I replied including
the community email address.

Regards,
Venkata B N

Fujitsu Australia