Thread: pg_xlog piling

pg_xlog piling

From

chris

Date:

30 January 2018, 22:04:17

Hi,

I have a standby that is in replication only mode but the pg_xlog 
directory is currently at 577GB utilized and is growing daily and I 
can't figure out why. This was a problem before when we were on 
postgresl 9.2 but changed some postgresql.conf parameters and then 
rebuilt it but still running into the same problem.


Thanks in advance,

Chris


Here is some more info about the standby.


# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

wal_level = hot_standby            # minimal, archive, or hot_standby

                     # (change requires restart)
#fsync = off                # turns forced synchronization on or off
#synchronous_commit = off        # synchronization level;
                     # off, local, remote_write, or on
#wal_sync_method = fsync        # the default is the first option
                     # supported by the operating system:
                     #   open_datasync
                     #   fdatasync (default on Linux)
                     #   fsync
                     #   fsync_writethrough
                     #   open_sync
#full_page_writes = on            # recover from partial page writes
#wal_buffers = -1            # min 32kB, -1 sets based on shared_buffers
                     # (change requires restart)
#wal_writer_delay = 200ms        # 1-10000 milliseconds

#commit_delay = 0            # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Checkpoints -

checkpoint_segments = 128        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30min
#checkpoint_timeout = 5min        # range 30s-1h
#checkpoint_completion_target = 0.5    # checkpoint target duration, 0.0 
- 1.0
checkpoint_warning = 30min
#checkpoint_warning = 30s        # 0 disables


# REPLICATION
#------------------------------------------------------------------------------

# - Sending Server(s) -

# Set these on the master and on any standby that will send replication 
data.

max_wal_senders = 3        # max number of walsender processes
                 # (change requires restart)
wal_keep_segments = 256        # in logfile segments, 16MB each; 0 disables
#wal_keep_segments = 500        # in logfile segments, 16MB each; 0 disables
#replication_timeout = 60s    # in milliseconds; 0 disables

# - Master Server -

# These settings are ignored on a standby server.

#synchronous_standby_names = ''    # standby servers that provide sync rep
                 # comma-separated list of application_name
                 # from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0    # number of xacts by which cleanup is 
delayed

# - Standby Servers -

# These settings are ignored on a master server.

hot_standby = on            # "on" allows queries during recovery
                     # (change requires restart)
#max_standby_archive_delay = 30s    # max delay before canceling queries
                     # when reading WAL from archive;
                     # -1 allows indefinite delay
#max_standby_streaming_delay = 30s    # max delay before canceling queries
                     # when reading streaming WAL;
                     # -1 allows indefinite delay
#wal_receiver_status_interval = 10s    # send replies at least this often
                     # 0 disables
#hot_standby_feedback = off        # send info from standby to prevent
                     # query conflicts


postgres=# show checkpoint_segments;
  checkpoint_segments
---------------------
  128
(1 row)

postgres=# show wal_keep_segments;
  wal_keep_segments
-------------------
  256
(1 row)

Re: pg_xlog piling

From

"David G. Johnston"

Date:

30 January 2018, 22:09:19

On Tuesday, January 30, 2018, chris <chrisk@pgsqlrocket.com> wrote:

This was a problem before when we were on postgresl 9.2

What version are you on now?

David J.

Re: pg_xlog piling

From

Jorge Torralba

Date:

30 January 2018, 22:59:25

what is your archive command and is your replica up to date? Any messages in the pg_log that would indicate the replica is not receiving update ?

On Tue, Jan 30, 2018 at 11:09 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tuesday, January 30, 2018, chris <chrisk@pgsqlrocket.com> wrote:
This was a problem before when we were on postgresl 9.2

What version are you on now?

David J.

Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: pg_xlog piling

From

chris

Date:

30 January 2018, 23:00:45

still on version 9.2

On 01/30/2018 12:09 PM, David G. Johnston wrote:

On Tuesday, January 30, 2018, chris <chrisk@pgsqlrocket.com> wrote:
This was a problem before when we were on postgresl 9.2

What version are you on now?

David J.

Re: pg_xlog piling

From

chris

Date:

31 January 2018, 00:31:59

postgres=# select * from pg_stat_replication;pid  | usesysid | usename  | application_name |  client_addr   |
client_hostname | client_port |         backend_start         |   state   |
sent_location | write_location | flush_location | replay_locatio
n | sync_priority | sync_state 
------+----------+----------+------------------+----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+---------------
--+---------------+------------5303 |       10 | postgres | walreceiver      | 198.161.184.74 |               |       57049 | 2017-12-14 11:13:16.339948-07 | streaming | 2FD3/41A38140 |
2FD3/41A38140  | 2FD3/41A38140  | 2F47/7015D490  |             0 | async
(1 row)

postgres=#

# - Archiving -

archive_mode = on        # allows archiving to be done

archive_command = 'cp %p /ComplianceDB/pitr/walstage/%f; mv /ComplianceDB/pitr/walstage/%f /ComplianceDB/pitr/wal/%f'                # (change requires restart)
#archive_command = ''        # command to use to archive a logfile segment
                # placeholders: %p = path of file to archive
                #               %f = file name only
                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables
Thanks

On 01/30/2018 12:59 PM, Jorge Torralba wrote:

what is your archive command and is your replica up to date? Any messages in the pg_log that would indicate the replica is not receiving update ?

On Tue, Jan 30, 2018 at 11:09 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, January 30, 2018, chris <chrisk@pgsqlrocket.com> wrote:
This was a problem before when we were on postgresl 9.2

What version are you on now?

David J.

--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: pg_xlog piling

From

Venkata B Nagothi

Date:

31 January 2018, 01:24:47

On Wed, 31 Jan 2018 at 8:32 am, chris <chrisk@pgsqlrocket.com> wrote:

postgres=# select * from pg_stat_replication;pid  | usesysid | usename  | application_name |  client_addr   |
client_hostname | client_port |         backend_start         |   state   |
sent_location | write_location | flush_location | replay_locatio
n | sync_priority | sync_state 
------+----------+----------+------------------+----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+---------------
--+---------------+------------5303 |       10 | postgres | walreceiver      | 198.161.184.74 |               |       57049 | 2017-12-14 11:13:16.339948-07 | streaming | 2FD3/41A38140 |
2FD3/41A38140  | 2FD3/41A38140  | 2F47/7015D490  |             0 | async
(1 row)

postgres=# 
# - Archiving -

archive_mode = on        # allows archiving to be done

archive_command = 'cp %p /ComplianceDB/pitr/walstage/%f; mv /ComplianceDB/pitr/walstage/%f /ComplianceDB/pitr/wal/%f'                # (change requires restart)
#archive_command = ''        # command to use to archive a logfile segment
                # placeholders: %p = path of file to archive
                #               %f = file name only
                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables
Thanks

Interesting. Do you see any files called *.ready in pg_xlog/archive_status directory ? Normally pg_xlog fills up when archive_command is not successful.

Do you have parameter wal_keep_segments configured to higher value ?

Regards,

Venkata B

Regards,

Venkata B N

Database Consultant

RE: pg_xlog piling

From

Paula Price

Date:

31 January 2018, 03:17:59

We had a similar issue with 9.2. Managed to track it enough to figure out it was happening when streaming replication falls behind.

We log ship to a dir, log_archive, and have an archive_command.sh file in that dir.

Added a line to the archive_command.sh to call pg_archivecleanup for the xlog partition in addition to the existing pg_archivecleanup for log_archive dir. For some reason (have not had time to research) we have to fully path the location of the pg_archivecleanup.

If this is helpful, I can show the lines of the archive_command.sh file.

Paula Price

Senior Database Administrator

From: Venkata B Nagothi [mailto:nag1010@gmail.com]
Sent: Tuesday, January 30, 2018 3:25 PM
To: chris <chrisk@pgsqlrocket.com>
Cc: David G. Johnston <david.g.johnston@gmail.com>; Jorge Torralba <jorge.torralba@gmail.com>; pgsql-admin@lists.postgresql.org
Subject: Re: pg_xlog piling

On Wed, 31 Jan 2018 at 8:32 am, chris <chrisk@pgsqlrocket.com> wrote:

postgres=# select * from pg_stat_replication;
 pid  | usesysid | usename  | application_name |  client_addr   |
client_hostname | client_port |         backend_start         |   state   |
sent_location | write_location | flush_location | replay_locatio
n | sync_priority | sync_state 
------+----------+----------+------------------+----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+---------------
--+---------------+------------
 5303 |       10 | postgres | walreceiver      | 198.161.184.74 |               
 |       57049 | 2017-12-14 11:13:16.339948-07 | streaming | 2FD3/41A38140 |
2FD3/41A38140  | 2FD3/41A38140  | 2F47/7015D490 
  |             0 | async
(1 row)
 
postgres=# 
 
 
# - Archiving -

archive_mode = on        # allows archiving to be done

archive_command = 'cp %p /ComplianceDB/pitr/walstage/%f; mv /ComplianceDB/pitr/walstage/%f /ComplianceDB/pitr/wal/%f'                # (change requires restart)
#archive_command = ''        # command to use to archive a logfile segment
                # placeholders: %p = path of file to archive
                #               %f = file name only
                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables
Thanks

Interesting. Do you see any files called *.ready in pg_xlog/archive_status directory ? Normally pg_xlog fills up when archive_command is not successful.

Do you have parameter wal_keep_segments configured to higher value ?

Regards,

Venkata B

Regards,

Venkata B N

Database Consultant

Re: pg_xlog piling

From

Laurenz Albe

Date:

31 January 2018, 11:29:12

chris wrote:
> I have a standby that is in replication only mode but the pg_xlog 
> directory is currently at 577GB utilized and is growing daily and I 
> can't figure out why. This was a problem before when we were on 
> postgresl 9.2 but changed some postgresql.conf parameters and then 
> rebuilt it but still running into the same problem.

What do you get on the primary for

   SELECT state, sent_location, write_location, flush_location, replay_location
   FRPM pg_stat_replication;

Yours,
Laurenz Albe