Re: Review for GetWALAvailability() - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Review for GetWALAvailability() |
Date | |
Msg-id | 20200616.120236.1809496990963386593.horikyota.ntt@gmail.com |
In response to | Re: Review for GetWALAvailability() (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Review for GetWALAvailability()
Re: Review for GetWALAvailability() |
List | pgsql-hackers |
At Mon, 15 Jun 2020 18:59:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
> > It was a kind of hard to decide. Even when max_slot_wal_keep_size is
> > smaller than max_wal_size, the segments more than
> > max_slot_wal_keep_size are not guaranteed to be kept. In that case
> > the state transits as NORMAL->LOST skipping the "RESERVED" state.
> > Putting aside whether the setting is useful or not, I thought that the
> > state transition is somewhat abrupt.
>
> IMO the direct transition of the state from normal to lost is ok to me
> if each state is clearly defined.
>
> >> Or, if that condition is really necessary, the document should be
> >> updated so that the note about the condition is added.
> > Does the following make sense?
> > https://www.postgresql.org/docs/13/view-pg-replication-slots.html
> > normal means that the claimed files are within max_wal_size.
> > + If max_slot_wal_keep_size is smaller than max_wal_size, this state
> > + will not appear.
>
> I don't think this change is enough. For example, when
> max_slot_wal_keep_size is smaller than max_wal_size and the amount of
> WAL files claimed by the slot is smaller than max_slot_wal_keep_size,
> "reserved" is reported. But which is inconsistent with the meaning of
> "reserved" in the docs.

You're right.

> To consider what should be reported in wal_status, could you tell me
> what purpose and how the users is expected to use this information?

I regarded "reserved" as the state where slots are working to retain segments, and "normal" as the state indicating that "WAL segments are within max_wal_size", which is orthogonal to the notion of "reserved". So it seemed useless to me when the retained WAL segments cannot exceed max_wal_size.

With longer descriptions, they would be:

  "reserved under max_wal_size"
  "reserved over max_wal_size"
  "lost some segments"

Thinking about it that way, I realized that my trouble was just the wording. Do the following wordings make sense to you?
  "reserved" - retained within max_wal_size
  "extended" - retained over max_wal_size
  "lost"     - lost some segments

With these wordings I can live with "not extended"=>"lost". Of course, more appropriate wordings are welcome.

> Even if walsender is terminated during the state "lost", unless
> checkpointer removes the required WAL files, the state can go back to
> "reserved" after new replication connection is established. This is
> the same as what you're explaining at the above?

GetWALAvailability checks restart_lsn against lastRemovedSegNo, so "lost" cannot be seen unless the checkpointer has actually removed the segment at restart_lsn (and restart_lsn has not been invalidated). However, walsenders are killed before those segments are actually removed, so there are cases where a physical walreceiver reconnects before RemoveOldXlogFiles removes all the segments, and the segments are then removed after the reconnection. "lost" can go back to "reserved" in that case. (A physical walreceiver can connect to a slot whose restart_lsn is invalid.)

I noticed another issue. If some required WAL segments are removed, the slot is "invalidated", that is, its restart_lsn is set to an invalid value. As a result, we hardly ever see the "lost" state.

It can be "fixed" by remembering the validity of a slot separately from restart_lsn. Is that worth doing?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center