Thread: Replication: How to query current segments allocation relative to "Wal keep segments"?

Replication: How to query current segments allocation relative to "Wal keep segments"?

From

Edson Carlos Ericksson Richter

Date:

03 January 2015, 19:48:36

I'm maintaining async replication (streaming) between four database
servers arranged on 2 x 2.
How to  query current segments allocation relative to "Wal keep
segments" in each master server?
I want to add this query to Postbix in order to monitor if the "wal keep
segments" parameter is too short, like having an alert at 70%, for example.

Thanks,

Edson

Re: Replication: How to query current segments allocation relative to "Wal keep segments"?

From

Michael Paquier

Date:

05 January 2015, 07:08:27

On Sun, Jan 4, 2015 at 1:48 AM, Edson Carlos Ericksson Richter
<edsonrichter@hotmail.com> wrote:
> How to  query current segments allocation relative to "Wal keep segments" in
> each master server?
What is your server version? You can have a look at
pg_stat_replication on the master which contains information about the
WAL segments written, flushed and replayed on each slave:
http://www.postgresql.org/docs/devel/static/monitoring-stats.html#PG-STAT-REPLICATION-VIEW
--
Michael

Re: Replication: How to query current segments allocation relative to "Wal keep segments"?

From

Edson Carlos Ericksson Richter

Date:

05 January 2015, 12:52:46

On 05-01-2015 02:08, Michael Paquier wrote:
> On Sun, Jan 4, 2015 at 1:48 AM, Edson Carlos Ericksson Richter
> <edsonrichter@hotmail.com> wrote:
>> How to  query current segments allocation relative to "Wal keep segments" in
>> each master server?
> What is your server version? You can have a look at
> pg_stat_replication on the master which contains information about the
> WAL segments written, flushed and replayed on each slave:
> http://www.postgresql.org/docs/devel/static/monitoring-stats.html#PG-STAT-REPLICATION-VIEW

In production servers, 9.3.5.
In development servers, 9.4.0.

I had a quick look at this view before:

Example (on 9.3.5):
postgres=# select * from pg_stat_replication;
  pid  | usesysid | usename  | application_name | client_addr |
client_hostname | client_port |        backend_start         | state   |
sent_location | write_location
  | flush_location | replay_location | sync_priority | sync_state

------+----------+----------+------------------+-------------+-----------------+-------------+------------------------------+-----------+---------------+---------------
-+----------------+-----------------+---------------+------------
  2808 |       10 | postgres | walreceiver      | 10.68.73.1
|                 |       36075 | 2015-01-03 20:17:48.53706-02 |
streaming | 22/F94D1A90   | 22/F94D1A90
  | 22/F94D1A90    | 22/F94D1A90     |             0 | async
(1 registro)

but in either case (9.3.5, 9.4.0), I get lots of info, but not the count
of wal_segments consumed.
Would this kind of count being recorded somewhere else?
How does the server knows that the wal_segments have been exhausted?

Thanks,

Edson

Re: Replication: How to query current segments allocation relative to "Wal keep segments"?

From

Michael Paquier

Date:

05 January 2015, 15:02:30

On Mon, Jan 5, 2015 at 6:51 PM, Edson Carlos Ericksson Richter
<edsonrichter@hotmail.com> wrote:
> Would this kind of count being recorded somewhere else?
> How does the server knows that the wal_segments have been exhausted?
You should evaluate the amount of wal_keep_segments necessary using
the replication lag in terms of WAL position differences (LSN) between
the master and its slaves. pg_stat_replication gives you the WAL
position (LSN) up to where each slave has received WAL information.
Combine it with pg_current_xlog_location() to determine what is the
current location master is writing WAL and you can evaluate the number
of WAL files that need to be retained on master. Knowing that each WAL
file is normally 16MB, simply use pg_xlog_location_diff to calculate
the WAL lag as a difference of bytes (for 9.4 a simple difference
operation is possible with the data type pg_lsn), and then guess from
it the number of WAL files that are actually necessary.

If you care that much about WAL retention btw, consider using
replication slots with 9.4, just be careful to monitor the partition
where pg_xlog sits in.
--
Michael

Re: Replication: How to query current segments allocation relative to "Wal keep segments"?

From

Edson Richter

Date:

05 January 2015, 19:05:22

On 05-01-2015 10:02, Michael Paquier wrote:
> On Mon, Jan 5, 2015 at 6:51 PM, Edson Carlos Ericksson Richter
> <edsonrichter@hotmail.com> wrote:
>> Would this kind of count being recorded somewhere else?
>> How does the server knows that the wal_segments have been exhausted?
> You should evaluate the amount of wal_keep_segments necessary using
> the replication lag in terms of WAL position differences (LSN) between
> the master and its slaves. pg_stat_replication gives you the WAL
> position (LSN) up to where each slave has received WAL information.
> Combine it with pg_current_xlog_location() to determine what is the
> current location master is writing WAL and you can evaluate the number
> of WAL files that need to be retained on master. Knowing that each WAL
> file is normally 16MB, simply use pg_xlog_location_diff to calculate
> the WAL lag as a difference of bytes (for 9.4 a simple difference
> operation is possible with the data type pg_lsn), and then guess from
> it the number of WAL files that are actually necessary.
>
> If you care that much about WAL retention btw, consider using
> replication slots with 9.4, just be careful to monitor the partition
> where pg_xlog sits in.
Despite being a completely valid statement, I've two contrary thoughts
about it:

1) I cannot migrate production servers at my free will. It requires long
planning, and probably will happen only in one or two years from now
(year end 2015 or 2016)
2) I do prefer to monitor how much wal segments I'm really using (and in
need), and then fix them up to 25% above this limit, than giving a
chance to blow my disk space (I mean, is preferable to stop replication
than put in risk whole database because of disk space)

Thanks,

Edson