Re: Streaming replication and a disk full in primary - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Streaming replication and a disk full in primary
Date
Msg-id g2m3f0b79eb1004120539s59fd98a7q2c3176ae797f2fe1@mail.gmail.com
In response to Re: Streaming replication and a disk full in primary  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Mon, Apr 12, 2010 at 7:41 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> We should remove the document "25.2.5.2. Monitoring"?
>
> I updated it to no longer claim that the primary can run out of disk
> space because of a hung WAL sender. The information about calculating
> the lag between primary and standby still seems valuable, so I didn't
> remove the whole section.

Yes.

> !      An important health indicator of streaming replication is the amount
> !      of WAL records generated in the primary, but not yet applied in the
> !      standby.

Since pg_last_xlog_receive_location doesn't tell us which WAL location has
not yet been applied, we should use pg_last_xlog_replay_location instead.
How about?:

----------------
     An important health indicator of streaming replication is the amount
     of WAL records generated in the primary, but not yet applied in the
     standby. You can calculate this lag by comparing the current WAL write
-     location on the primary with the last WAL location received by the
+     location on the primary with the last WAL location replayed by the
     standby. They can be retrieved using
     <function>pg_current_xlog_location</> on the primary and the
-     <function>pg_last_xlog_receive_location</> on the standby,
+     <function>pg_last_xlog_replay_location</> on the standby,
     respectively (see <xref linkend="functions-admin-backup-table"> and
     <xref linkend="functions-recovery-info-table"> for details).
-     The last WAL receive location in the standby is also displayed in the
-     process status of the WAL receiver process, displayed using the
-     <command>ps</> command (see <xref linkend="monitoring-ps"> for details).
    </para>
   </sect3>
----------------
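
To make the comparison concrete, here is a minimal sketch of the arithmetic.
It assumes each function returns a location string of the form "hi/lo" in
hexadecimal, and it treats the two halves as the high and low 32 bits of a
single position. That linear interpretation is an assumption; the exact byte
accounting depends on the server version, so read the result as an
approximate lag. The helper names are hypothetical, not part of any
PostgreSQL API.

```python
def parse_wal_location(text):
    """Convert a WAL location string such as '0/3000148' to an integer.

    Assumption: the part before the slash is the high 32 bits and the
    part after it is the low 32 bits of one linear position.
    """
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


def replay_lag_bytes(primary_location, standby_location):
    """Approximate amount of WAL generated on the primary but not yet replayed.

    primary_location: output of pg_current_xlog_location() on the primary
    standby_location: output of pg_last_xlog_replay_location() on the standby
    """
    return parse_wal_location(primary_location) - parse_wal_location(standby_location)


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(replay_lag_bytes("0/3000148", "0/3000000"))
```

The two input strings would have to be fetched separately, one query on each
server, so the result also includes whatever WAL was written between the two
queries.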

>> Why is standby_keep_segments used even if max_wal_senders is zero?
>> In that case, ISTM we don't need to keep any WAL files in pg_xlog
>> for the standby.
>
> True. I don't think we should second guess the admin on that, though.
> Perhaps he only set max_wal_senders=0 temporarily, and will be
> disappointed if the logs are no longer there when he sets it back to
> non-zero and restarts the server.

OK. Since the behavior is not intuitive to me, I'd like to add a note to
the end of the description of "standby_keep_segments". How about?:

----------------
This setting has an effect even if max_wal_senders is zero.
----------------

>> When walreceiver has gotten stuck for some reason, walsender would be
>> unable to pass through the send() system call, and also get stuck.
>> In the patch, such a walsender cannot exit forever because it cannot
>> call XLogRead(). So I think that the bgwriter needs to send the
>> exit-signal to such a badly lagging walsender. Thoughts?
>
> Any backend can get stuck like that.

OK.

> +     },
> +
> +     {
> +         {"standby_keep_segments", PGC_SIGHUP, WAL_CHECKPOINTS,
> +             gettext_noop("Sets the number of WAL files held for standby servers"),
> +             NULL
> +         },
> +         &StandbySegments,
> +         0, 0, INT_MAX, NULL, NULL

Should we s/WAL_CHECKPOINTS/WAL_REPLICATION/ ?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

