Thread: pg_xlog piling

pg_xlog piling

From
chris
Date:
Hi,

I have a standby that is in replication only mode but the pg_xlog 
directory is currently at 577GB utilized and is growing daily and I 
can't figure out why. This was a problem before when we were on 
postgresl 9.2 but changed some postgresql.conf parameters and then 
rebuilt it but still running into the same problem.


Thanks in advance,

Chris


Here is some more info about the standby.


# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

wal_level = hot_standby            # minimal, archive, or hot_standby

                     # (change requires restart)
#fsync = off                # turns forced synchronization on or off
#synchronous_commit = off        # synchronization level;
                     # off, local, remote_write, or on
#wal_sync_method = fsync        # the default is the first option
                     # supported by the operating system:
                     #   open_datasync
                     #   fdatasync (default on Linux)
                     #   fsync
                     #   fsync_writethrough
                     #   open_sync
#full_page_writes = on            # recover from partial page writes
#wal_buffers = -1            # min 32kB, -1 sets based on shared_buffers
                     # (change requires restart)
#wal_writer_delay = 200ms        # 1-10000 milliseconds

#commit_delay = 0            # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Checkpoints -

checkpoint_segments = 128        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30min
#checkpoint_timeout = 5min        # range 30s-1h
#checkpoint_completion_target = 0.5    # checkpoint target duration, 0.0 
- 1.0
checkpoint_warning = 30min
#checkpoint_warning = 30s        # 0 disables


# REPLICATION
#------------------------------------------------------------------------------

# - Sending Server(s) -

# Set these on the master and on any standby that will send replication 
data.

max_wal_senders = 3        # max number of walsender processes
                 # (change requires restart)
wal_keep_segments = 256        # in logfile segments, 16MB each; 0 disables
#wal_keep_segments = 500        # in logfile segments, 16MB each; 0 disables
#replication_timeout = 60s    # in milliseconds; 0 disables

# - Master Server -

# These settings are ignored on a standby server.

#synchronous_standby_names = ''    # standby servers that provide sync rep
                 # comma-separated list of application_name
                 # from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0    # number of xacts by which cleanup is 
delayed

# - Standby Servers -

# These settings are ignored on a master server.

hot_standby = on            # "on" allows queries during recovery
                     # (change requires restart)
#max_standby_archive_delay = 30s    # max delay before canceling queries
                     # when reading WAL from archive;
                     # -1 allows indefinite delay
#max_standby_streaming_delay = 30s    # max delay before canceling queries
                     # when reading streaming WAL;
                     # -1 allows indefinite delay
#wal_receiver_status_interval = 10s    # send replies at least this often
                     # 0 disables
#hot_standby_feedback = off        # send info from standby to prevent
                     # query conflicts


postgres=# show checkpoint_segments;
  checkpoint_segments
---------------------
  128
(1 row)

postgres=# show wal_keep_segments;
  wal_keep_segments
-------------------
  256
(1 row)




Re: pg_xlog piling

From
"David G. Johnston"
Date:
On Tuesday, January 30, 2018, chris <chrisk@pgsqlrocket.com> wrote:
This was a problem before when we were on postgresl 9.2 

What version are you on now?

David J. 

Re: pg_xlog piling

From
Jorge Torralba
Date:
what is your archive command and is your replica up to date?  Any messages in the pg_log that would indicate the replica is not receiving update ?

On Tue, Jan 30, 2018 at 11:09 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, January 30, 2018, chris <chrisk@pgsqlrocket.com> wrote:
This was a problem before when we were on postgresl 9.2 

What version are you on now?

David J. 



--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: pg_xlog piling

From
chris
Date:

still on version 9.2

On 01/30/2018 12:09 PM, David G. Johnston wrote:
On Tuesday, January 30, 2018, chris <chrisk@pgsqlrocket.com> wrote:
This was a problem before when we were on postgresl 9.2 

What version are you on now?

David J. 

Re: pg_xlog piling

From
chris
Date:
postgres=# select * from pg_stat_replication;pid  | usesysid | usename  | application_name |  client_addr   |
client_hostname | client_port |         backend_start         |   state   |
sent_location | write_location | flush_location | replay_locatio
n | sync_priority | sync_state 
------+----------+----------+------------------+----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+---------------
--+---------------+------------5303 |       10 | postgres | walreceiver      | 198.161.184.74 |               |       57049 | 2017-12-14 11:13:16.339948-07 | streaming | 2FD3/41A38140 |
2FD3/41A38140  | 2FD3/41A38140  | 2F47/7015D490  |             0 | async
(1 row)

postgres=# 


# - Archiving -

archive_mode = on        # allows archiving to be done

archive_command = 'cp %p /ComplianceDB/pitr/walstage/%f; mv /ComplianceDB/pitr/walstage/%f /ComplianceDB/pitr/wal/%f'                # (change requires restart)
#archive_command = ''        # command to use to archive a logfile segment
                # placeholders: %p = path of file to archive
                #               %f = file name only
                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables
Thanks


On 01/30/2018 12:59 PM, Jorge Torralba wrote:
what is your archive command and is your replica up to date?  Any messages in the pg_log that would indicate the replica is not receiving update ?

On Tue, Jan 30, 2018 at 11:09 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, January 30, 2018, chris <chrisk@pgsqlrocket.com> wrote:
This was a problem before when we were on postgresl 9.2 

What version are you on now?

David J. 



--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: pg_xlog piling

From
Venkata B Nagothi
Date:

On Wed, 31 Jan 2018 at 8:32 am, chris <chrisk@pgsqlrocket.com> wrote:
postgres=# select * from pg_stat_replication;pid  | usesysid | usename  | application_name |  client_addr   |
client_hostname | client_port |         backend_start         |   state   |
sent_location | write_location | flush_location | replay_locatio
n | sync_priority | sync_state 
------+----------+----------+------------------+----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+---------------
--+---------------+------------5303 |       10 | postgres | walreceiver      | 198.161.184.74 |               |       57049 | 2017-12-14 11:13:16.339948-07 | streaming | 2FD3/41A38140 |
2FD3/41A38140  | 2FD3/41A38140  | 2F47/7015D490  |             0 | async
(1 row)

postgres=# 


# - Archiving -

archive_mode = on        # allows archiving to be done

archive_command = 'cp %p /ComplianceDB/pitr/walstage/%f; mv /ComplianceDB/pitr/walstage/%f /ComplianceDB/pitr/wal/%f'                # (change requires restart)
#archive_command = ''        # command to use to archive a logfile segment
                # placeholders: %p = path of file to archive
                #               %f = file name only
                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables
Thanks

 
Interesting. Do you see any files called *.ready in pg_xlog/archive_status directory ? Normally pg_xlog fills up when archive_command is not successful.

Do you have parameter wal_keep_segments configured to higher value ? 

Regards,
Venkata B
--

Regards,

Venkata B N
Database Consultant
 

RE: pg_xlog piling

From
Paula Price
Date:

We had a similar issue with 9.2. Managed to track it enough to figure out it was happening when streaming replication falls behind.

 

We log ship to a dir, log_archive, and have an archive_command.sh file in that dir.

 

Added a line to the archive_command.sh to call pg_archivecleanup for the xlog partition in addition to the existing pg_archivecleanup for log_archive dir.  For some reason (have not had time to research) we have to fully path the location of the pg_archivecleanup.

 

If this is helpful, I can show the lines of the archive_command.sh file.

 

Paula Price

Senior Database Administrator

 

From: Venkata B Nagothi [mailto:nag1010@gmail.com]
Sent: Tuesday, January 30, 2018 3:25 PM
To: chris <chrisk@pgsqlrocket.com>
Cc: David G. Johnston <david.g.johnston@gmail.com>; Jorge Torralba <jorge.torralba@gmail.com>; pgsql-admin@lists.postgresql.org
Subject: Re: pg_xlog piling

 

 

On Wed, 31 Jan 2018 at 8:32 am, chris <chrisk@pgsqlrocket.com> wrote:

postgres=# select * from pg_stat_replication;
 pid  | usesysid | usename  | application_name |  client_addr   |
client_hostname | client_port |         backend_start         |   state   |
sent_location | write_location | flush_location | replay_locatio
n | sync_priority | sync_state 
------+----------+----------+------------------+----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+---------------
--+---------------+------------
 5303 |       10 | postgres | walreceiver      | 198.161.184.74 |               
 |       57049 | 2017-12-14 11:13:16.339948-07 | streaming | 2FD3/41A38140 |
2FD3/41A38140  | 2FD3/41A38140  | 2F47/7015D490 
  |             0 | async
(1 row)
 
postgres=# 
 
 

# - Archiving -

archive_mode = on        # allows archiving to be done

archive_command = 'cp %p /ComplianceDB/pitr/walstage/%f; mv /ComplianceDB/pitr/walstage/%f /ComplianceDB/pitr/wal/%f'                # (change requires restart)
#archive_command = ''        # command to use to archive a logfile segment
                # placeholders: %p = path of file to archive
                #               %f = file name only
                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables
Thanks

 

 

Interesting. Do you see any files called *.ready in pg_xlog/archive_status directory ? Normally pg_xlog fills up when archive_command is not successful.

 

Do you have parameter wal_keep_segments configured to higher value ? 

 

Regards,

Venkata B

--

 

Regards,

 

Venkata B N

Database Consultant

 

Re: pg_xlog piling

From
Laurenz Albe
Date:
chris wrote:
> I have a standby that is in replication only mode but the pg_xlog 
> directory is currently at 577GB utilized and is growing daily and I 
> can't figure out why. This was a problem before when we were on 
> postgresl 9.2 but changed some postgresql.conf parameters and then 
> rebuilt it but still running into the same problem.

What do you get on the primary for

   SELECT state, sent_location, write_location, flush_location, replay_location
   FRPM pg_stat_replication;

Yours,
Laurenz Albe