Thread: min_safe_lsn column in pg_replication_slots view
Hi, Per the docs, pg_replication_slots.min_safe_lsn inedicates "the minimum LSN currently available for walsenders". When I executed pg_walfile_name() with min_safe_lsn, the function returned the name of the last removed WAL file instead of minimum available WAL file name. This happens because min_safe_lsn actually indicates the ending position (the boundary byte) of the last removed WAL file. I guess that some users would want to calculate the minimum available WAL file name from min_safe_lsn by using pg_walfile_name(), but the result would be incorrect. Isn't this confusing? min_safe_lsn should indicate the bondary byte + 1, instead? BTW, I just wonder why each row in pg_replication_slots needs to have min_safe_lsn column? Basically min_safe_lsn should be the same between every replication slots. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote: > BTW, I just wonder why each row in pg_replication_slots needs to have > min_safe_lsn column? Basically min_safe_lsn should be the same between > every replication slots. Indeed, that's confusing in its current shape. I would buy putting this value into pg_replication_slots if there were some consistency of the data to worry about because of locking issues, but here this data is controlled within info_lck, which is out of the the repslot LW lock. So I think that it is incorrect to put this data in this view and that we should remove it, and that instead we had better push for a system view that maps with the contents of XLogCtl. -- Michael
Attachment
At Mon, 15 Jun 2020 13:44:31 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote: > > BTW, I just wonder why each row in pg_replication_slots needs to have > > min_safe_lsn column? Basically min_safe_lsn should be the same between > > every replication slots. > > Indeed, that's confusing in its current shape. I would buy putting > this value into pg_replication_slots if there were some consistency of > the data to worry about because of locking issues, but here this data > is controlled within info_lck, which is out of the the repslot LW > lock. So I think that it is incorrect to put this data in this view > and that we should remove it, and that instead we had better push for > a system view that maps with the contents of XLogCtl. It was once the difference from the safe_lsn to restart_lsn which is distinct among slots. Then it was changed to the safe_lsn. I agree to the discussion above, but it is needed anywhere since no one can know the margin until the slot goes to the "lost" state without it. (Note that currently even wal_status and min_safe_lsn can be inconsistent in a line.) Just for the need for table-consistency and in-line consistency, we could just remember the result of XLogGetLastRemovedSegno() around taking info_lock in the function. That doesn't make a practical difference but makes the view look consistent. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index d6fe205eb4..f57d67a900 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9498,7 +9498,7 @@ CreateRestartPoint(int flags) * * WALAVAIL_INVALID_LSN means the slot hasn't been set to reserve WAL. */ WALAvailability -GetWALAvailability(XLogRecPtr targetLSN) +GetWALAvailability(XLogRecPtr targetLSN, XLogSegNo last_removed_seg) { XLogRecPtr currpos; /* current write LSN */ XLogSegNo currSeg; /* segid of currpos */ @@ -9524,7 +9524,7 @@ GetWALAvailability(XLogRecPtr targetLSN) * the first WAL segment file since startup, which causes the status being * wrong under certain abnormal conditions but that doesn't actually harm. */ - oldestSeg = XLogGetLastRemovedSegno() + 1; + oldestSeg = last_removed_seg + 1; /* calculate oldest segment by max_wal_size and wal_keep_segments */ XLByteToSeg(currpos, currSeg, wal_segment_size); diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 1b929a603e..bd2e3e84ed 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -243,6 +243,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) MemoryContext per_query_ctx; MemoryContext oldcontext; int slotno; + XLogSegNo last_removed_seg; /* check to see if caller supports us returning a tuplestore */ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo)) @@ -272,6 +273,14 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) rsinfo->setResult = tupstore; rsinfo->setDesc = tupdesc; + /* + * Remember the last removed segment at this point for the consistency in + * this table. Since there's no interlock between slot data and + * checkpointer, the segment can be removed in-between, but that doesn't + * make any practical difference. + */ + last_removed_seg = XLogGetLastRemovedSegno(); + MemoryContextSwitchTo(oldcontext); LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); @@ -282,7 +291,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) Datum values[PG_GET_REPLICATION_SLOTS_COLS]; bool nulls[PG_GET_REPLICATION_SLOTS_COLS]; WALAvailability walstate; - XLogSegNo last_removed_seg; int i; if (!slot->in_use) @@ -342,7 +350,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) else nulls[i++] = true; - walstate = GetWALAvailability(slot_contents.data.restart_lsn); + walstate = GetWALAvailability(slot_contents.data.restart_lsn, + last_removed_seg); switch (walstate) { @@ -368,7 +377,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) if (max_slot_wal_keep_size_mb >= 0 && (walstate == WALAVAIL_NORMAL || walstate == WALAVAIL_RESERVED) && - ((last_removed_seg = XLogGetLastRemovedSegno()) != 0)) + (last_removed_seg != 0)) { XLogRecPtr min_safe_lsn; diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index e917dfe92d..c05a18148d 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -326,7 +326,8 @@ extern void ShutdownXLOG(int code, Datum arg); extern void InitXLOGAccess(void); extern void CreateCheckPoint(int flags); extern bool CreateRestartPoint(int flags); -extern WALAvailability GetWALAvailability(XLogRecPtr restart_lsn); +extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN, + XLogSegNo last_removed_seg); extern XLogRecPtr CalculateMaxmumSafeLSN(void); extern void XLogPutNextOid(Oid nextOid); extern XLogRecPtr XLogRestorePoint(const char *rpName);
On 2020/06/15 16:35, Kyotaro Horiguchi wrote: > At Mon, 15 Jun 2020 13:44:31 +0900, Michael Paquier <michael@paquier.xyz> wrote in >> On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote: >>> BTW, I just wonder why each row in pg_replication_slots needs to have >>> min_safe_lsn column? Basically min_safe_lsn should be the same between >>> every replication slots. >> >> Indeed, that's confusing in its current shape. I would buy putting >> this value into pg_replication_slots if there were some consistency of >> the data to worry about because of locking issues, but here this data >> is controlled within info_lck, which is out of the the repslot LW >> lock. So I think that it is incorrect to put this data in this view >> and that we should remove it, and that instead we had better push for >> a system view that maps with the contents of XLogCtl. Agreed. But as you know it's too late to do that for v13... So firstly I'd like to fix the issues in pg_replication_slots view, and then we can improve the things later for v14 if necessary. > It was once the difference from the safe_lsn to restart_lsn which is > distinct among slots. Then it was changed to the safe_lsn. I agree to > the discussion above, but it is needed anywhere since no one can know > the margin until the slot goes to the "lost" state without it. (Note > that currently even wal_status and min_safe_lsn can be inconsistent in > a line.) > > Just for the need for table-consistency and in-line consistency, we > could just remember the result of XLogGetLastRemovedSegno() around > taking info_lock in the function. That doesn't make a practical > difference but makes the view look consistent. Agreed. Thanks for the patch. Here are the review comments: Not only the last removed segment but also current write position should be obtain at the beginning of pg_get_replication_slots() and should be given to GetWALAvailability(), for the consistency? Even after applying your patch, min_safe_lsn is calculated for each slot even though the calculated result is always the same. Which is a bit waste of cycle. We should calculate min_safe_lsn at the beginning and use it for each slot? XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0, Isn't it better to use 1 as the second argument of the above, in order to address the issue that I reported upthread? Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn) returns would be confusing. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Thanks for the comments. At Wed, 17 Jun 2020 21:37:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > On 2020/06/15 16:35, Kyotaro Horiguchi wrote: > > At Mon, 15 Jun 2020 13:44:31 +0900, Michael Paquier > > <michael@paquier.xyz> wrote in > >> On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote: > >>> BTW, I just wonder why each row in pg_replication_slots needs to have > >>> min_safe_lsn column? Basically min_safe_lsn should be the same between > >>> every replication slots. > >> > >> Indeed, that's confusing in its current shape. I would buy putting > >> this value into pg_replication_slots if there were some consistency of > >> the data to worry about because of locking issues, but here this data > >> is controlled within info_lck, which is out of the the repslot LW > >> lock. So I think that it is incorrect to put this data in this view > >> and that we should remove it, and that instead we had better push for > >> a system view that maps with the contents of XLogCtl. > > Agreed. But as you know it's too late to do that for v13... > So firstly I'd like to fix the issues in pg_replication_slots view, > and then we can improve the things later for v14 if necessary. > > > > It was once the difference from the safe_lsn to restart_lsn which is > > distinct among slots. Then it was changed to the safe_lsn. I agree to > > the discussion above, but it is needed anywhere since no one can know > > the margin until the slot goes to the "lost" state without it. (Note > > that currently even wal_status and min_safe_lsn can be inconsistent in > > a line.) > > Just for the need for table-consistency and in-line consistency, we > > could just remember the result of XLogGetLastRemovedSegno() around > > taking info_lock in the function. That doesn't make a practical > > difference but makes the view look consistent. > > Agreed. Thanks for the patch. Here are the review comments: > > > Not only the last removed segment but also current write position > should be obtain at the beginning of pg_get_replication_slots() > and should be given to GetWALAvailability(), for the consistency? You are right. Though I faintly thought that I didn't need that since WriteRecPtr doesn't move by so wide steps as removed_segment, actually it moves. > Even after applying your patch, min_safe_lsn is calculated for > each slot even though the calculated result is always the same. > Which is a bit waste of cycle. We should calculate min_safe_lsn > at the beginning and use it for each slot? Agreed. That may results in a wastful calculation but it's better than repeated wasteful calculations. > XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0, > > Isn't it better to use 1 as the second argument of the above, > in order to address the issue that I reported upthread? > Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn) > returns > would be confusing. Mmm. pg_walfile_name seems too specialize to pg_stop_backup(). (pg_walfile_name_offset() returns wrong result for segment boundaries.) I'm not willing to do that only to follow such suspicious(?) specification, but surely it would practically be better doing that. Please find the attached first patch. I found that there's no reason to hide min_safe_lsn when wal_status has certain values. So I changed it to be shown always. By the way, I noticed that when a replication slot reserves all existing WAL segments, checkpoint cannot remove a file and lastRemovedSegment is left being 0. The second attached forces RemoveOldXlogFiles to initialize the variable even when no WAL segments are removed. It puts no additional loads on file system since the directory is scanned anyway. My old proposal to unconditionally initialize it separately from checkpoint was rejected, but I think this is acceptable. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 6742e3b31ac53ef54458d64e34feeac7d4c710d2 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com> Date: Thu, 18 Jun 2020 13:36:23 +0900 Subject: [PATCH v2 1/2] Make pg_stat_replication view consistent min_safe_lsn is not consistent within the result of a query. Make it consistent. On the way fixing that min_safe_lsn is changed to show the second byte of a segment instead of first byte in order to users can get the intended segment file name from pg_walfile_name. --- src/backend/access/transam/xlog.c | 15 +++++++------ src/backend/replication/slotfuncs.c | 35 ++++++++++++++++++++--------- src/include/access/xlog.h | 4 +++- 3 files changed, 35 insertions(+), 19 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 55cac186dc..940f5fcb18 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9484,6 +9484,9 @@ CreateRestartPoint(int flags) * Report availability of WAL for the given target LSN * (typically a slot's restart_lsn) * + * currWriteLSN and lastRemovedSeg is the results of XLogGetLastRemovedSegno() + * and GetXLogWriteRecPtr() respectively at the referring time. + * * Returns one of the following enum values: * * WALAVAIL_NORMAL means targetLSN is available because it is in the range * of max_wal_size. @@ -9498,9 +9501,9 @@ CreateRestartPoint(int flags) * * WALAVAIL_INVALID_LSN means the slot hasn't been set to reserve WAL. */ WALAvailability -GetWALAvailability(XLogRecPtr targetLSN) +GetWALAvailability(XLogRecPtr targetLSN, XLogRecPtr currWriteLSN, + XLogSegNo lastRemovedSeg) { - XLogRecPtr currpos; /* current write LSN */ XLogSegNo currSeg; /* segid of currpos */ XLogSegNo targetSeg; /* segid of targetLSN */ XLogSegNo oldestSeg; /* actual oldest segid */ @@ -9513,21 +9516,19 @@ GetWALAvailability(XLogRecPtr targetLSN) if (XLogRecPtrIsInvalid(targetLSN)) return WALAVAIL_INVALID_LSN; - currpos = GetXLogWriteRecPtr(); - /* calculate oldest segment currently needed by slots */ XLByteToSeg(targetLSN, targetSeg, wal_segment_size); - KeepLogSeg(currpos, &oldestSlotSeg); + KeepLogSeg(currWriteLSN, &oldestSlotSeg); /* * Find the oldest extant segment file. We get 1 until checkpoint removes * the first WAL segment file since startup, which causes the status being * wrong under certain abnormal conditions but that doesn't actually harm. */ - oldestSeg = XLogGetLastRemovedSegno() + 1; + oldestSeg = lastRemovedSeg + 1; /* calculate oldest segment by max_wal_size and wal_keep_segments */ - XLByteToSeg(currpos, currSeg, wal_segment_size); + XLByteToSeg(currWriteLSN, currSeg, wal_segment_size); keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments), wal_segment_size) + 1; diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 1b929a603e..063c6dd435 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -243,6 +243,9 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) MemoryContext per_query_ctx; MemoryContext oldcontext; int slotno; + XLogSegNo last_removed_seg; + XLogRecPtr curr_write_lsn; + XLogRecPtr min_safe_lsn = 0; /* check to see if caller supports us returning a tuplestore */ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo)) @@ -272,6 +275,24 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) rsinfo->setResult = tupstore; rsinfo->setDesc = tupdesc; + /* + * Remember the last removed segment and current WAL write LSN at this + * point for the consistency in this table. Since there's no interlock + * between slot data and checkpointer, the segment can be removed + * in-between, but that doesn't make any practical difference. Also we + * calculate safe_min_lsn here as the same value is shown for all slots. + */ + last_removed_seg = XLogGetLastRemovedSegno(); + curr_write_lsn = GetXLogWriteRecPtr(); + + /* + * Show the next byte after segment boundary as min_safe_lsn so that + * pg_walfile_name() returns the correct file name for the value. + */ + if (max_slot_wal_keep_size_mb >= 0 && last_removed_seg != 0) + XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 1, + wal_segment_size, min_safe_lsn); + MemoryContextSwitchTo(oldcontext); LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); @@ -282,7 +303,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) Datum values[PG_GET_REPLICATION_SLOTS_COLS]; bool nulls[PG_GET_REPLICATION_SLOTS_COLS]; WALAvailability walstate; - XLogSegNo last_removed_seg; int i; if (!slot->in_use) @@ -342,7 +362,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) else nulls[i++] = true; - walstate = GetWALAvailability(slot_contents.data.restart_lsn); + walstate = GetWALAvailability(slot_contents.data.restart_lsn, + curr_write_lsn, last_removed_seg); switch (walstate) { @@ -366,16 +387,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) elog(ERROR, "invalid walstate: %d", (int) walstate); } - if (max_slot_wal_keep_size_mb >= 0 && - (walstate == WALAVAIL_NORMAL || walstate == WALAVAIL_RESERVED) && - ((last_removed_seg = XLogGetLastRemovedSegno()) != 0)) - { - XLogRecPtr min_safe_lsn; - - XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0, - wal_segment_size, min_safe_lsn); + if (min_safe_lsn != 0) values[i++] = Int64GetDatum(min_safe_lsn); - } else nulls[i++] = true; diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index e917dfe92d..bbe00e7195 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -326,7 +326,9 @@ extern void ShutdownXLOG(int code, Datum arg); extern void InitXLOGAccess(void); extern void CreateCheckPoint(int flags); extern bool CreateRestartPoint(int flags); -extern WALAvailability GetWALAvailability(XLogRecPtr restart_lsn); +extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN, + XLogRecPtr currWriteLSN, + XLogSegNo last_removed_seg); extern XLogRecPtr CalculateMaxmumSafeLSN(void); extern void XLogPutNextOid(Oid nextOid); extern XLogRecPtr XLogRestorePoint(const char *rpName); -- 2.18.4 From 10425b3c062ed3972c0ad067adb7c14c5dca43c8 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com> Date: Thu, 18 Jun 2020 14:49:48 +0900 Subject: [PATCH v2 2/2] Forcibly initialize lastRemovedSegNo at the first checkpoint If we have a replication slot retaining all existing WAL segments, a checkpoint doesn't initialize lastRemovedSegNo. That means pg_replication_slots can not show min_safe_lsn until any slot loses a segment. Forcibly initialize it even if no WAL segments are removed at the first checkpoint. --- src/backend/access/transam/xlog.c | 33 +++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 940f5fcb18..282bac32ab 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -4025,6 +4025,16 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr) DIR *xldir; struct dirent *xlde; char lastoff[MAXFNAMELEN]; + char oldest[MAXFNAMELEN]; + bool remember_oldest = false; + + /* + * If we haven't set the initial last removed segno, force to set it even + * if we wouldn't have removed any segments. + */ + oldest[0] = 0; + if (XLogGetLastRemovedSegno() == 0) + remember_oldest = true; /* * Construct a filename of the last segment to be kept. The timeline ID @@ -4062,10 +4072,33 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr) { /* Update the last removed location in shared memory first */ UpdateLastRemovedPtr(xlde->d_name); + remember_oldest = false; RemoveXlogFile(xlde->d_name, lastredoptr, endptr); } } + /* Remember the oldest file left in the directotry */ + else if(remember_oldest && + (oldest[0] == 0 || strcmp(xlde->d_name + 8, oldest + 8) < 0)) + strncpy(oldest, xlde->d_name, MAXFNAMELEN); + } + + /* + * We haven't initialize the pointer, initialize it. + * The last removed pointer is the oldest existing segment minus 1. + */ + if (remember_oldest && oldest[0] != 0) + { + uint32 tli; + XLogSegNo segno; + + XLogFromFileName(oldest, &tli, &segno, wal_segment_size); + + if (segno > 1) + { + XLogFileName(oldest, tli, segno - 1, wal_segment_size); + UpdateLastRemovedPtr(oldest); + } } FreeDir(xldir); -- 2.18.4
On Thu, Jun 18, 2020 at 11:52 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Wed, 17 Jun 2020 21:37:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > On 2020/06/15 16:35, Kyotaro Horiguchi wrote: > > Isn't it better to use 1 as the second argument of the above, > > in order to address the issue that I reported upthread? > > Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn) > > returns > > would be confusing. > > Mmm. pg_walfile_name seems too specialize to > pg_stop_backup(). (pg_walfile_name_offset() returns wrong result for > segment boundaries.) I'm not willing to do that only to follow such > suspicious(?) specification, but surely it would practically be better > doing that. Please find the attached first patch. > It is a little unclear to me how this or any proposed patch will solve the original problem reported by Fujii-San? Basically, the problem arises because we don't have an interlock between when the checkpoint removes the WAL segment and the view tries to acquire the same. Am, I missing something? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > On Thu, Jun 18, 2020 at 11:52 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > At Wed, 17 Jun 2020 21:37:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2020/06/15 16:35, Kyotaro Horiguchi wrote: > > > Isn't it better to use 1 as the second argument of the above, > > > in order to address the issue that I reported upthread? > > > Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn) > > > returns > > > would be confusing. > > > > Mmm. pg_walfile_name seems too specialize to > > pg_stop_backup(). (pg_walfile_name_offset() returns wrong result for > > segment boundaries.) I'm not willing to do that only to follow such > > suspicious(?) specification, but surely it would practically be better > > doing that. Please find the attached first patch. > > > > It is a little unclear to me how this or any proposed patch will solve > the original problem reported by Fujii-San? Basically, the problem > arises because we don't have an interlock between when the checkpoint > removes the WAL segment and the view tries to acquire the same. Am, I > missing something? I'm not sure, but I don't get the point of blocking WAL segment removal until the view is completed. The said columns of the view are just for monitoring, which needs an information snapshot seemingly taken at a certain time. And InvalidateObsoleteReplicationSlots kills walsenders using lastRemovedSegNo of a different time. The two are independent each other. Also the patch changes min_safe_lsn to show an LSN at segment boundary + 1. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Jun 19, 2020 at 10:02:54AM +0900, Kyotaro Horiguchi wrote: > At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in >> It is a little unclear to me how this or any proposed patch will solve >> the original problem reported by Fujii-San? Basically, the problem >> arises because we don't have an interlock between when the checkpoint >> removes the WAL segment and the view tries to acquire the same. Am, I >> missing something? The proposed patch fetches the computation of the minimum LSN across all slots before taking ReplicationSlotControlLock so its value gets more lossy, and potentially older than what the slots actually include. So it is an attempt to take the safest spot possible. Honestly, I find a bit silly the design to compute and use the same minimum LSN value for all the tuples returned by pg_get_replication_slots, and you can actually get a pretty good estimate of that by emulating ReplicationSlotsComputeRequiredLSN() directly with what pg_replication_slot provides as we have a min() aggregate for pg_lsn. For these reasons, I think that we should remove for now this information from the view, and reconsider this part more carefully for 14~ with a clear definition of how much lossiness we are ready to accept for the information provided here, if necessary. We could for example just have a separate SQL function that just grabs this value (or a more global SQL view for XLogCtl data that includes this data). > I'm not sure, but I don't get the point of blocking WAL segment > removal until the view is completed. We should really not do that anyway for a monitoring view. -- Michael
Attachment
At Fri, 19 Jun 2020 10:39:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Fri, Jun 19, 2020 at 10:02:54AM +0900, Kyotaro Horiguchi wrote: > > At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > >> It is a little unclear to me how this or any proposed patch will solve > >> the original problem reported by Fujii-San? Basically, the problem > >> arises because we don't have an interlock between when the checkpoint > >> removes the WAL segment and the view tries to acquire the same. Am, I > >> missing something? > > The proposed patch fetches the computation of the minimum LSN across > all slots before taking ReplicationSlotControlLock so its value gets > more lossy, and potentially older than what the slots actually > include. So it is an attempt to take the safest spot possible. Minimum LSN (lastRemovedSegNo) is not protected by the lock. That makes no defference. > Honestly, I find a bit silly the design to compute and use the same > minimum LSN value for all the tuples returned by > pg_get_replication_slots, and you can actually get a pretty good I see it as silly. I think I said upthread that it was the distance to the point where the slot loses a segment, and it was rejected but just removing it makes us unable to estimate the distance so it is there. > estimate of that by emulating ReplicationSlotsComputeRequiredLSN() > directly with what pg_replication_slot provides as we have a min() > aggregate for pg_lsn. min(lastRemovedSegNo) is the earliest value. It is enough to read it at the first then use it in all slots. > For these reasons, I think that we should remove for now this > information from the view, and reconsider this part more carefully for > 14~ with a clear definition of how much lossiness we are ready to > accept for the information provided here, if necessary. We could for > example just have a separate SQL function that just grabs this value > (or a more global SQL view for XLogCtl data that includes this data). I think, we need at least one of the "distance" above or min_safe_lsn in anywhere reachable from users. > > I'm not sure, but I don't get the point of blocking WAL segment > > removal until the view is completed. > > We should really not do that anyway for a monitoring view. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Jun 19, 2020 at 6:32 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > > On Thu, Jun 18, 2020 at 11:52 AM Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote: > > > > > > At Wed, 17 Jun 2020 21:37:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > > On 2020/06/15 16:35, Kyotaro Horiguchi wrote: > > > > Isn't it better to use 1 as the second argument of the above, > > > > in order to address the issue that I reported upthread? > > > > Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn) > > > > returns > > > > would be confusing. > > > > > > Mmm. pg_walfile_name seems too specialize to > > > pg_stop_backup(). (pg_walfile_name_offset() returns wrong result for > > > segment boundaries.) I'm not willing to do that only to follow such > > > suspicious(?) specification, but surely it would practically be better > > > doing that. Please find the attached first patch. > > > > > > > It is a little unclear to me how this or any proposed patch will solve > > the original problem reported by Fujii-San? Basically, the problem > > arises because we don't have an interlock between when the checkpoint > > removes the WAL segment and the view tries to acquire the same. Am, I > > missing something? > > I'm not sure, but I don't get the point of blocking WAL segment > removal until the view is completed. > I am not suggesting to do that. > The said columns of the view are > just for monitoring, which needs an information snapshot seemingly > taken at a certain time. And InvalidateObsoleteReplicationSlots kills > walsenders using lastRemovedSegNo of a different time. The two are > independent each other. > > Also the patch changes min_safe_lsn to show an LSN at segment boundary > + 1. > But aren't we doing last_removed_seg+1 even without the patch? See code below - { - XLogRecPtr min_safe_lsn; - - XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0, - wal_segment_size, min_safe_lsn); With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jun 19, 2020 at 8:44 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Fri, 19 Jun 2020 10:39:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in > > Honestly, I find a bit silly the design to compute and use the same > > minimum LSN value for all the tuples returned by > > pg_get_replication_slots, and you can actually get a pretty good > > I see it as silly. I think I said upthread that it was the distance > to the point where the slot loses a segment, and it was rejected but > just removing it makes us unable to estimate the distance so it is > there. > IIUC, the value of min_safe_lsn will lesser than restart_lsn, so one can compute the difference of those to see how much ahead the replication slot's restart_lsn is from min_safe_lsn but still it is not clear how user will make any use of it. Can you please explain how the distance you are talking about is useful to users or anyone? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
At Fri, 19 Jun 2020 09:09:03 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > On Fri, Jun 19, 2020 at 8:44 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > At Fri, 19 Jun 2020 10:39:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in > > > Honestly, I find a bit silly the design to compute and use the same > > > minimum LSN value for all the tuples returned by > > > pg_get_replication_slots, and you can actually get a pretty good > > > > I see it as silly. I think I said upthread that it was the distance > > to the point where the slot loses a segment, and it was rejected but > > just removing it makes us unable to estimate the distance so it is > > there. > > > > IIUC, the value of min_safe_lsn will lesser than restart_lsn, so one > can compute the difference of those to see how much ahead the > replication slot's restart_lsn is from min_safe_lsn but still it is > not clear how user will make any use of it. Can you please explain > how the distance you are talking about is useful to users or anyone? When max_slot_wal_keep_size is set, the slot may retain up to as many as that amount of old WAL segments then suddenly loses the oldest segments. *I* thought that I would use it in an HA cluster tool to inform users about the remaining time (not literally, of course) a disconnected standy is allowed diconnected. Of course even if some segments have been lost, they could be copied from the primary's archive so that's not critical in theory. -- Kyotaro Horiguchi NTT Open Source Software Center
At Fri, 19 Jun 2020 08:59:48 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > > > > Mmm. pg_walfile_name seems too specialize to > > > > pg_stop_backup(). (pg_walfile_name_offset() returns wrong result for > > > > segment boundaries.) I'm not willing to do that only to follow such > > > > suspicious(?) specification, but surely it would practically be better > > > > doing that. Please find the attached first patch. > > > > > > > > > > It is a little unclear to me how this or any proposed patch will solve > > > the original problem reported by Fujii-San? Basically, the problem > > > arises because we don't have an interlock between when the checkpoint > > > removes the WAL segment and the view tries to acquire the same. Am, I > > > missing something? > > > > I'm not sure, but I don't get the point of blocking WAL segment > > removal until the view is completed. > > > > I am not suggesting to do that. > > > The said columns of the view are > > just for monitoring, which needs an information snapshot seemingly > > taken at a certain time. And InvalidateObsoleteReplicationSlots kills > > walsenders using lastRemovedSegNo of a different time. The two are > > independent each other. > > > > Also the patch changes min_safe_lsn to show an LSN at segment boundary > > + 1. > > > > But aren't we doing last_removed_seg+1 even without the patch? See code below > > - { > - XLogRecPtr min_safe_lsn; > - > - XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0, > - wal_segment_size, min_safe_lsn); It is at the beginning byte of the *next* segment. Fujii-san told that it should be the next byte of it, namely "XLogSegNoOffsetToRecPtr(last_removed_seg + 1, *1*,", and the patch calculates as that. It adds the follows instead. + if (max_slot_wal_keep_size_mb >= 0 && last_removed_seg != 0) + XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 1, + wal_segment_size, min_safe_lsn); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/06/19 10:39, Michael Paquier wrote: > On Fri, Jun 19, 2020 at 10:02:54AM +0900, Kyotaro Horiguchi wrote: >> At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in >>> It is a little unclear to me how this or any proposed patch will solve >>> the original problem reported by Fujii-San? Basically, the problem >>> arises because we don't have an interlock between when the checkpoint >>> removes the WAL segment and the view tries to acquire the same. Am, I >>> missing something? > > The proposed patch fetches the computation of the minimum LSN across > all slots before taking ReplicationSlotControlLock so its value gets > more lossy, and potentially older than what the slots actually > include. So it is an attempt to take the safest spot possible. > > Honestly, I find a bit silly the design to compute and use the same > minimum LSN value for all the tuples returned by > pg_get_replication_slots, and you can actually get a pretty good > estimate of that by emulating ReplicationSlotsComputeRequiredLSN() > directly with what pg_replication_slot provides as we have a min() > aggregate for pg_lsn. > > For these reasons, I think that we should remove for now this > information from the view, and reconsider this part more carefully for > 14~ with a clear definition of how much lossiness we are ready to > accept for the information provided here, if necessary. Agreed. But isn't it too late to remove the columns (i.e., change the catalog) for v13? Because v13 beta1 was already released. IIUC the catalog should not be changed since beta1 release so that users can upgrade PostgreSQL without initdb. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Fri, Jun 19, 2020 at 04:13:27PM +0900, Fujii Masao wrote: > Agreed. But isn't it too late to remove the columns (i.e., change > the catalog) for v13? Because v13 beta1 was already released. > IIUC the catalog should not be changed since beta1 release so that > users can upgrade PostgreSQL without initdb. Catalog bumps have happened in the past between beta versions: git log -p REL_12_BETA1..REL_12_BETA2 src/include/catalog/catversion.h git log -p REL_11_BETA1..REL_11_BETA2 src/include/catalog/catversion.h git log -p REL_10_BETA1..REL_10_BETA2 src/include/catalog/catversion.h So we usually avoid to do that between betas, but my take here is that a catalog bump is better than regretting a change we may have to live with after the release is sealed. -- Michael
Attachment
At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Fri, Jun 19, 2020 at 04:13:27PM +0900, Fujii Masao wrote: > > Agreed. But isn't it too late to remove the columns (i.e., change > > the catalog) for v13? Because v13 beta1 was already released. > > IIUC the catalog should not be changed since beta1 release so that > > users can upgrade PostgreSQL without initdb. > > Catalog bumps have happened in the past between beta versions: > git log -p REL_12_BETA1..REL_12_BETA2 src/include/catalog/catversion.h > git log -p REL_11_BETA1..REL_11_BETA2 src/include/catalog/catversion.h > git log -p REL_10_BETA1..REL_10_BETA2 src/include/catalog/catversion.h > > So we usually avoid to do that between betas, but my take here is that > a catalog bump is better than regretting a change we may have to live > with after the release is sealed. FWIW if we decide that it is really useless, I agree to remove it now. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/06/19 16:43, Kyotaro Horiguchi wrote: > At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in >> On Fri, Jun 19, 2020 at 04:13:27PM +0900, Fujii Masao wrote: >>> Agreed. But isn't it too late to remove the columns (i.e., change >>> the catalog) for v13? Because v13 beta1 was already released. >>> IIUC the catalog should not be changed since beta1 release so that >>> users can upgrade PostgreSQL without initdb. >> >> Catalog bumps have happened in the past between beta versions: >> git log -p REL_12_BETA1..REL_12_BETA2 src/include/catalog/catversion.h >> git log -p REL_11_BETA1..REL_11_BETA2 src/include/catalog/catversion.h >> git log -p REL_10_BETA1..REL_10_BETA2 src/include/catalog/catversion.h >> >> So we usually avoid to do that between betas, but my take here is that >> a catalog bump is better than regretting a change we may have to live >> with after the release is sealed. Sounds reasonable. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Fri, Jun 19, 2020 at 05:34:01PM +0900, Fujii Masao wrote: > On 2020/06/19 16:43, Kyotaro Horiguchi wrote: >> At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in >>> So we usually avoid to do that between betas, but my take here is that >>> a catalog bump is better than regretting a change we may have to live >>> with after the release is sealed. > > Sounds reasonable. If we want to make this happen, I am afraid that the time is short as beta2 is planned for next week. As the version will be likely tagged by Monday US time, it would be good to get this addressed within 48 hours to give some room to the buildfarm to react. Attached is a straight-forward proposal of patch. Any thoughts? (The change in catversion.h is a self-reminder.) -- Michael
Attachment
At Fri, 19 Jun 2020 21:15:52 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Fri, Jun 19, 2020 at 05:34:01PM +0900, Fujii Masao wrote: > > On 2020/06/19 16:43, Kyotaro Horiguchi wrote: > >> At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in > >>> So we usually avoid to do that between betas, but my take here is that > >>> a catalog bump is better than regretting a change we may have to live > >>> with after the release is sealed. > > > > Sounds reasonable. > > If we want to make this happen, I am afraid that the time is short as > beta2 is planned for next week. As the version will be likely tagged > by Monday US time, it would be good to get this addressed within 48 > hours to give some room to the buildfarm to react. Attached is a > straight-forward proposal of patch. Any thoughts? > > (The change in catversion.h is a self-reminder.) Thanks for the patch. As a whole it contains all needed for ripping off the min_safe_lsn. Some items in the TAP test gets coarse but none of them lose significance. Compiles almost cleanly and passes all tests including TAP test. The variable last_removed_seg in slotfuncs.c:285 is left alone but no longer used after applying this patch. It should be removed as well. Other than that the patch looks good to me. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/06/19 21:15, Michael Paquier wrote: > On Fri, Jun 19, 2020 at 05:34:01PM +0900, Fujii Masao wrote: >> On 2020/06/19 16:43, Kyotaro Horiguchi wrote: >>> At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in >>>> So we usually avoid to do that between betas, but my take here is that >>>> a catalog bump is better than regretting a change we may have to live >>>> with after the release is sealed. >> >> Sounds reasonable. > > If we want to make this happen, I am afraid that the time is short as > beta2 is planned for next week. As the version will be likely tagged > by Monday US time, it would be good to get this addressed within 48 > hours to give some room to the buildfarm to react. Attached is a > straight-forward proposal of patch. Any thoughts? It's better if we can do that. But I think that we should hear Alvaro's opinion about this before rushing to push the patch. Even if we miss beta2 as the result of that, I'm ok. We would be able to push something better into beta3. So, CC Alvaro. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-Jun-20, Fujii Masao wrote: > It's better if we can do that. But I think that we should hear Alvaro's opinion > about this before rushing to push the patch. Even if we miss beta2 as the result > of that, I'm ok. We would be able to push something better into beta3. > So, CC Alvaro. Uh, I was not aware of this thread. I'll go over it now and let you know. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Jun-19, Michael Paquier wrote: > If we want to make this happen, I am afraid that the time is short as > beta2 is planned for next week. As the version will be likely tagged > by Monday US time, it would be good to get this addressed within 48 > hours to give some room to the buildfarm to react. Attached is a > straight-forward proposal of patch. Any thoughts? I don't disagree with removing the LSN column, but at the same time we need to provide *some* way for users to monitor this, so let's add a function to extract the value they need for that. It seems simple enough. I cannot implement it myself now, though. I've reached the end of my week and I'm not sure I'll be able to work on it during the weekend. I agree with Kyotaro's opinion that the pg_walfile_name() function seems too single-minded ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jun 20, 2020 at 7:12 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2020-Jun-19, Michael Paquier wrote: > > > If we want to make this happen, I am afraid that the time is short as > > beta2 is planned for next week. As the version will be likely tagged > > by Monday US time, it would be good to get this addressed within 48 > > hours to give some room to the buildfarm to react. Attached is a > > straight-forward proposal of patch. Any thoughts? > > I don't disagree with removing the LSN column, but at the same time we > need to provide *some* way for users to monitor this, so let's add a > function to extract the value they need for that. It seems simple > enough. > Isn't this information specific to checkpoints, so maybe better to display in view pg_stat_bgwriter? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote: > On Sat, Jun 20, 2020 at 7:12 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> I don't disagree with removing the LSN column, but at the same time we >> need to provide *some* way for users to monitor this, so let's add a >> function to extract the value they need for that. It seems simple >> enough. > > Isn't this information specific to checkpoints, so maybe better to > display in view pg_stat_bgwriter? Not sure that's a good match. If we decide to expose that, a separate function returning a LSN based on the segment number from XLogGetLastRemovedSegno() sounds fine to me, like pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in mind? -- Michael
Attachment
On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote: > On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote: >> Isn't this information specific to checkpoints, so maybe better to >> display in view pg_stat_bgwriter? > > Not sure that's a good match. If we decide to expose that, a separate > function returning a LSN based on the segment number from > XLogGetLastRemovedSegno() sounds fine to me, like > pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in > mind? I was thinking on this one for the last couple of days, and came up with the name pg_wal_oldest_lsn(), as per the attached, traking the oldest WAL location still available. That's unfortunately too late for beta2, but let's continue the discussion. -- Michael
Attachment
On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote: > > On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote: > >> Isn't this information specific to checkpoints, so maybe better to > >> display in view pg_stat_bgwriter? > > > > Not sure that's a good match. If we decide to expose that, a separate > > function returning a LSN based on the segment number from > > XLogGetLastRemovedSegno() sounds fine to me, like > > pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in > > mind? > > I was thinking on this one for the last couple of days, and came up > with the name pg_wal_oldest_lsn(), as per the attached, traking the > oldest WAL location still available. > I feel such a function is good to have but I am not sure if there is a need to tie it with the removal of min_safe_lsn column. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020/06/22 21:01, Amit Kapila wrote: > On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote: >> >> On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote: >>> On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote: >>>> Isn't this information specific to checkpoints, so maybe better to >>>> display in view pg_stat_bgwriter? >>> >>> Not sure that's a good match. If we decide to expose that, a separate >>> function returning a LSN based on the segment number from >>> XLogGetLastRemovedSegno() sounds fine to me, like >>> pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in >>> mind? >> >> I was thinking on this one for the last couple of days, and came up >> with the name pg_wal_oldest_lsn(), as per the attached, traking the >> oldest WAL location still available. Thanks for the patch! + <literal>NULL</literal> if no WAL segments have been removed since + startup. Isn't this confusing? I think that we should store the last removed WAL segment to somewhere (e.g., pg_control) and restore it at the startup, so that we can see the actual value even after the startup. Or we should scan pg_wal directory and find the "minimal" WAL segment and return its LSN. > I feel such a function is good to have but I am not sure if there is a > need to tie it with the removal of min_safe_lsn column. We should expose the LSN calculated from "the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"? This indicates the minimum LSN of WAL files that are guaraneed to be currently retained by wal_keep_segments and max_slot_wal_keep_size. That is, if checkpoint occurs when restart_lsn of replication slot is smaller than that minimum LSN, some required WAL files may be removed. So DBAs can periodically monitor and compare restart_lsn and that minimum LSN. If they see frequently that difference of those LSN is very small, they can decide to increase wal_keep_segments or max_slot_wal_keep_size, to prevent required WAL files from being removed. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2020/06/22 21:01, Amit Kapila wrote: > > On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> > > wrote: > >> > >> On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote: > >>> On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote: > >>>> Isn't this information specific to checkpoints, so maybe better to > >>>> display in view pg_stat_bgwriter? > >>> > >>> Not sure that's a good match. If we decide to expose that, a separate > >>> function returning a LSN based on the segment number from > >>> XLogGetLastRemovedSegno() sounds fine to me, like > >>> pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in > >>> mind? > >> > >> I was thinking on this one for the last couple of days, and came up > >> with the name pg_wal_oldest_lsn(), as per the attached, traking the > >> oldest WAL location still available. > > Thanks for the patch! > > + <literal>NULL</literal> if no WAL segments have been removed since > + startup. > > Isn't this confusing? I think that we should store the last removed > WAL segment to somewhere (e.g., pg_control) and restore it at > the startup, so that we can see the actual value even after the > startup. > Or we should scan pg_wal directory and find the "minimal" WAL segment > and return its LSN. Running a separate scan on pg_wal at startup or first time the oldest WAL segno is referenced is something that was rejected before. But with the current behavior we don't find the last removed segment until any slot loses a segment if all WAL files are retained by a slot. FWIW I recently proposed a patch to find the oldest WAL file while trying removing old WAL files. > > I feel such a function is good to have but I am not sure if there is a > > need to tie it with the removal of min_safe_lsn column. > > We should expose the LSN calculated from > "the current WAL LSN - max(wal_keep_segments * 16MB, > max_slot_wal_keep_size)"? > This indicates the minimum LSN of WAL files that are guaraneed to be > currently retained by wal_keep_segments and max_slot_wal_keep_size. > That is, if checkpoint occurs when restart_lsn of replication slot is > smaller than that minimum LSN, some required WAL files may be removed. > So DBAs can periodically monitor and compare restart_lsn and that > minimum > LSN. If they see frequently that difference of those LSN is very > small, > they can decide to increase wal_keep_segments or > max_slot_wal_keep_size, > to prevent required WAL files from being removed. Thought? I'm not sure about the consensus here about showing that number in the view. It is similar to "remain" in the earlier versions of this patch but a bit simpler. It would be usable in a similar way. I can live with either numbers. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Jun 23, 2020 at 10:10:37AM +0900, Kyotaro Horiguchi wrote: > At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >> Isn't this confusing? I think that we should store the last removed >> WAL segment to somewhere (e.g., pg_control) and restore it at >> the startup, so that we can see the actual value even after the >> startup. >> Or we should scan pg_wal directory and find the "minimal" WAL segment >> and return its LSN. > > Running a separate scan on pg_wal at startup or first time the oldest > WAL segno is referenced is something that was rejected before. But > with the current behavior we don't find the last removed segment until > any slot loses a segment if all WAL files are retained by a slot. FWIW > I recently proposed a patch to find the oldest WAL file while trying > removing old WAL files. Hmm. I agree that the approach I previously sent may be kind of confusing without a clear initialization point, which would actually be (checkPointCopy.redo + checkPointCopy.ThisTimeLineID) from the control file with an extra computation depending on any replication slot data present on disk? So one could do the maths cleanly after StartupReplicationSlots() is called in the startup process. My point is: it does not seem really obvious to me that we need to change the control file to track that. > I'm not sure about the consensus here about showing that number in the > view. It is similar to "remain" in the earlier versions of this patch > but a bit simpler. It would be usable in a similar way. I can live > with either numbers. Anyway, here is my take. We are discussing a design issue here, we are moving the discussion into having a different design, and discussing new designs is never a good sign post-beta (some open items tend to move towards this direction every year). So I'd like to think that the best thing we can do here is just to drop min_safe_lsn from pg_replication_slots, and just reconsider this part for 14~ with something we think is better. By the way, I have added a separate open item for this thread. -- Michael
Attachment
On 2020/06/23 10:10, Kyotaro Horiguchi wrote: > At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >> >> >> On 2020/06/22 21:01, Amit Kapila wrote: >>> On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> >>> wrote: >>>> >>>> On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote: >>>>> On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote: >>>>>> Isn't this information specific to checkpoints, so maybe better to >>>>>> display in view pg_stat_bgwriter? >>>>> >>>>> Not sure that's a good match. If we decide to expose that, a separate >>>>> function returning a LSN based on the segment number from >>>>> XLogGetLastRemovedSegno() sounds fine to me, like >>>>> pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in >>>>> mind? >>>> >>>> I was thinking on this one for the last couple of days, and came up >>>> with the name pg_wal_oldest_lsn(), as per the attached, traking the >>>> oldest WAL location still available. >> >> Thanks for the patch! >> >> + <literal>NULL</literal> if no WAL segments have been removed since >> + startup. >> >> Isn't this confusing? I think that we should store the last removed >> WAL segment to somewhere (e.g., pg_control) and restore it at >> the startup, so that we can see the actual value even after the >> startup. >> Or we should scan pg_wal directory and find the "minimal" WAL segment >> and return its LSN. > > Running a separate scan on pg_wal at startup or first time the oldest > WAL segno is referenced is something that was rejected before. But > with the current behavior we don't find the last removed segment until > any slot loses a segment if all WAL files are retained by a slot. Because scanning pg_wal can be heavy operation especially when max_wal_size is high and there are lots of WAL files? If so, it might be better to save the value in pg_control as I told upthread. However I'm not sure the use case of this function yet... > FWIW > I recently proposed a patch to find the oldest WAL file while trying > removing old WAL files. > >>> I feel such a function is good to have but I am not sure if there is a >>> need to tie it with the removal of min_safe_lsn column. >> >> We should expose the LSN calculated from >> "the current WAL LSN - max(wal_keep_segments * 16MB, >> max_slot_wal_keep_size)"? >> This indicates the minimum LSN of WAL files that are guaraneed to be >> currently retained by wal_keep_segments and max_slot_wal_keep_size. >> That is, if checkpoint occurs when restart_lsn of replication slot is >> smaller than that minimum LSN, some required WAL files may be removed. >> So DBAs can periodically monitor and compare restart_lsn and that >> minimum >> LSN. If they see frequently that difference of those LSN is very >> small, >> they can decide to increase wal_keep_segments or >> max_slot_wal_keep_size, >> to prevent required WAL files from being removed. Thought? > > I'm not sure about the consensus here about showing that number in the > view. It is similar to "remain" in the earlier versions of this patch > but a bit simpler. It would be usable in a similar way. I can live > with either numbers. It's useless to display this value in each replication slot in the view. I'm thinking to expose it as a function. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Mon, Jun 22, 2020 at 6:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/06/22 21:01, Amit Kapila wrote: > > On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote: > >> > >> On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote: > >>> On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote: > >>>> Isn't this information specific to checkpoints, so maybe better to > >>>> display in view pg_stat_bgwriter? > >>> > >>> Not sure that's a good match. If we decide to expose that, a separate > >>> function returning a LSN based on the segment number from > >>> XLogGetLastRemovedSegno() sounds fine to me, like > >>> pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in > >>> mind? > >> > >> I was thinking on this one for the last couple of days, and came up > >> with the name pg_wal_oldest_lsn(), as per the attached, traking the > >> oldest WAL location still available. > > Thanks for the patch! > > + <literal>NULL</literal> if no WAL segments have been removed since > + startup. > > Isn't this confusing? I think that we should store the last removed > WAL segment to somewhere (e.g., pg_control) and restore it at > the startup, so that we can see the actual value even after the startup. > Or we should scan pg_wal directory and find the "minimal" WAL segment > and return its LSN. > > > > I feel such a function is good to have but I am not sure if there is a > > need to tie it with the removal of min_safe_lsn column. > > We should expose the LSN calculated from > "the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"? > This indicates the minimum LSN of WAL files that are guaraneed to be > currently retained by wal_keep_segments and max_slot_wal_keep_size. > That is, if checkpoint occurs when restart_lsn of replication slot is > smaller than that minimum LSN, some required WAL files may be removed. > > So DBAs can periodically monitor and compare restart_lsn and that minimum > LSN. If they see frequently that difference of those LSN is very small, > they can decide to increase wal_keep_segments or max_slot_wal_keep_size, > to prevent required WAL files from being removed. Thought? > +1. This sounds like a good and useful stat for users. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 23, 2020 at 7:47 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/06/23 10:10, Kyotaro Horiguchi wrote: > > At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > >> > >>> I feel such a function is good to have but I am not sure if there is a > >>> need to tie it with the removal of min_safe_lsn column. > >> > >> We should expose the LSN calculated from > >> "the current WAL LSN - max(wal_keep_segments * 16MB, > >> max_slot_wal_keep_size)"? > >> This indicates the minimum LSN of WAL files that are guaraneed to be > >> currently retained by wal_keep_segments and max_slot_wal_keep_size. > >> That is, if checkpoint occurs when restart_lsn of replication slot is > >> smaller than that minimum LSN, some required WAL files may be removed. > >> So DBAs can periodically monitor and compare restart_lsn and that > >> minimum > >> LSN. If they see frequently that difference of those LSN is very > >> small, > >> they can decide to increase wal_keep_segments or > >> max_slot_wal_keep_size, > >> to prevent required WAL files from being removed. Thought? > > > > I'm not sure about the consensus here about showing that number in the > > view. It is similar to "remain" in the earlier versions of this patch > > but a bit simpler. It would be usable in a similar way. I can live > > with either numbers. > > It's useless to display this value in each replication slot in the view. > I'm thinking to expose it as a function. > Having a separate function for this seems like a good idea but can we consider displaying it in a view like pg_stat_replication_slots as we are discussing a nearby thread to have such a view for other things. I think ultimately this information is required to check whether some slot can be invalidated or not, so having it displayed along with other slot information might not be a bad idea. [1] - https://www.postgresql.org/message-id/CAA4eK1Jyh4qgdnxzV4fYuk9GiXLb%3DUz-6o19E2RfiN8MPmUu3A%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
At Tue, 23 Jun 2020 11:50:34 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > On Mon, Jun 22, 2020 at 6:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/06/22 21:01, Amit Kapila wrote: > > > On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote: > > >> > > >> On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote: > > >>> On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote: > > >>>> Isn't this information specific to checkpoints, so maybe better to > > >>>> display in view pg_stat_bgwriter? > > >>> > > >>> Not sure that's a good match. If we decide to expose that, a separate > > >>> function returning a LSN based on the segment number from > > >>> XLogGetLastRemovedSegno() sounds fine to me, like > > >>> pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in > > >>> mind? > > >> > > >> I was thinking on this one for the last couple of days, and came up > > >> with the name pg_wal_oldest_lsn(), as per the attached, traking the > > >> oldest WAL location still available. > > > > Thanks for the patch! > > > > + <literal>NULL</literal> if no WAL segments have been removed since > > + startup. > > > > Isn't this confusing? I think that we should store the last removed > > WAL segment to somewhere (e.g., pg_control) and restore it at > > the startup, so that we can see the actual value even after the startup. > > Or we should scan pg_wal directory and find the "minimal" WAL segment > > and return its LSN. > > > > > > > I feel such a function is good to have but I am not sure if there is a > > > need to tie it with the removal of min_safe_lsn column. > > > > We should expose the LSN calculated from > > "the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"? > > This indicates the minimum LSN of WAL files that are guaraneed to be > > currently retained by wal_keep_segments and max_slot_wal_keep_size. > > That is, if checkpoint occurs when restart_lsn of replication slot is > > smaller than that minimum LSN, some required WAL files may be removed. > > > > So DBAs can periodically monitor and compare restart_lsn and that minimum > > LSN. If they see frequently that difference of those LSN is very small, > > they can decide to increase wal_keep_segments or max_slot_wal_keep_size, > > to prevent required WAL files from being removed. Thought? > > > > +1. This sounds like a good and useful stat for users. +1 for showing a number that is not involving lastRemovedSegNo. It is like returning to the initial version of this patch. It showed a number like ((the suggested above) minus restart_lsn). The number is different for each slot so they fit in the view. The number is usable for the same purpose so I'm ok with it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020-Jun-23, Kyotaro Horiguchi wrote: > At Tue, 23 Jun 2020 11:50:34 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > > On Mon, Jun 22, 2020 at 6:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > We should expose the LSN calculated from > > > "the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"? > > > This indicates the minimum LSN of WAL files that are guaraneed to be > > > currently retained by wal_keep_segments and max_slot_wal_keep_size. > > > That is, if checkpoint occurs when restart_lsn of replication slot is > > > smaller than that minimum LSN, some required WAL files may be removed. > > > > > > So DBAs can periodically monitor and compare restart_lsn and that minimum > > > LSN. If they see frequently that difference of those LSN is very small, > > > they can decide to increase wal_keep_segments or max_slot_wal_keep_size, > > > to prevent required WAL files from being removed. Thought? > > > > +1. This sounds like a good and useful stat for users. > > +1 for showing a number that is not involving lastRemovedSegNo. It is > like returning to the initial version of this patch. It showed a > number like ((the suggested above) minus restart_lsn). The number is > different for each slot so they fit in the view. > > The number is usable for the same purpose so I'm ok with it. I think we should publish the value from wal_keep_segments separately from max_slot_wal_keep_size. ISTM that the user might decide to change or remove wal_keep_segments and be suddenly at risk of losing slots because of overlooking that it was wal_keep_segments, not max_slot_wal_keep_size, that was protecting them. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/06/23 15:27, Amit Kapila wrote: > On Tue, Jun 23, 2020 at 7:47 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> On 2020/06/23 10:10, Kyotaro Horiguchi wrote: >>> At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >>>> >>>>> I feel such a function is good to have but I am not sure if there is a >>>>> need to tie it with the removal of min_safe_lsn column. >>>> >>>> We should expose the LSN calculated from >>>> "the current WAL LSN - max(wal_keep_segments * 16MB, >>>> max_slot_wal_keep_size)"? >>>> This indicates the minimum LSN of WAL files that are guaraneed to be >>>> currently retained by wal_keep_segments and max_slot_wal_keep_size. >>>> That is, if checkpoint occurs when restart_lsn of replication slot is >>>> smaller than that minimum LSN, some required WAL files may be removed. >>>> So DBAs can periodically monitor and compare restart_lsn and that >>>> minimum >>>> LSN. If they see frequently that difference of those LSN is very >>>> small, >>>> they can decide to increase wal_keep_segments or >>>> max_slot_wal_keep_size, >>>> to prevent required WAL files from being removed. Thought? >>> >>> I'm not sure about the consensus here about showing that number in the >>> view. It is similar to "remain" in the earlier versions of this patch >>> but a bit simpler. It would be usable in a similar way. I can live >>> with either numbers. >> >> It's useless to display this value in each replication slot in the view. >> I'm thinking to expose it as a function. >> > > Having a separate function for this seems like a good idea but can we > consider displaying it in a view like pg_stat_replication_slots as we > are discussing a nearby thread to have such a view for other things. > I think ultimately this information is required to check whether some > slot can be invalidated or not, so having it displayed along with > other slot information might not be a bad idea. "the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)" is the same value between all the replication slots. But you think it's better to display that same value for every slots in the view? Or you're thinking to display the difference of that LSN value and restart_lsn as Horiguchi-san suggested? That diff varies each replication slot, so it seems ok to display it for every rows. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/06/24 8:39, Alvaro Herrera wrote: > On 2020-Jun-23, Kyotaro Horiguchi wrote: > >> At Tue, 23 Jun 2020 11:50:34 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in >>> On Mon, Jun 22, 2020 at 6:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>> We should expose the LSN calculated from >>>> "the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"? >>>> This indicates the minimum LSN of WAL files that are guaraneed to be >>>> currently retained by wal_keep_segments and max_slot_wal_keep_size. >>>> That is, if checkpoint occurs when restart_lsn of replication slot is >>>> smaller than that minimum LSN, some required WAL files may be removed. >>>> >>>> So DBAs can periodically monitor and compare restart_lsn and that minimum >>>> LSN. If they see frequently that difference of those LSN is very small, >>>> they can decide to increase wal_keep_segments or max_slot_wal_keep_size, >>>> to prevent required WAL files from being removed. Thought? >>> >>> +1. This sounds like a good and useful stat for users. >> >> +1 for showing a number that is not involving lastRemovedSegNo. It is >> like returning to the initial version of this patch. It showed a >> number like ((the suggested above) minus restart_lsn). The number is >> different for each slot so they fit in the view. >> >> The number is usable for the same purpose so I'm ok with it. > > I think we should publish the value from wal_keep_segments separately > from max_slot_wal_keep_size. ISTM that the user might decide to change > or remove wal_keep_segments and be suddenly at risk of losing slots > because of overlooking that it was wal_keep_segments, not > max_slot_wal_keep_size, that was protecting them. You mean to have two functions that returns 1. "current WAL LSN - wal_keep_segments * 16MB" 2. "current WAL LSN - max_slot_wal_keep_size" Right? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Wed, Jun 24, 2020 at 2:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/06/23 15:27, Amit Kapila wrote: > > > > Having a separate function for this seems like a good idea but can we > > consider displaying it in a view like pg_stat_replication_slots as we > > are discussing a nearby thread to have such a view for other things. > > I think ultimately this information is required to check whether some > > slot can be invalidated or not, so having it displayed along with > > other slot information might not be a bad idea. > > "the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)" > is the same value between all the replication slots. But you think it's better > to display that same value for every slots in the view? > > Or you're thinking to display the difference of that LSN value and > restart_lsn as Horiguchi-san suggested? > I see value in Horiguchi-San's proposal. IIUC, it will tell help DBAs/Users to know if any particular slot will get invalidated soon. > That diff varies each replication slot, > so it seems ok to display it for every rows. > Yes. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020-Jun-24, Fujii Masao wrote: > On 2020/06/24 8:39, Alvaro Herrera wrote: > > I think we should publish the value from wal_keep_segments separately > > from max_slot_wal_keep_size. ISTM that the user might decide to change > > or remove wal_keep_segments and be suddenly at risk of losing slots > > because of overlooking that it was wal_keep_segments, not > > max_slot_wal_keep_size, that was protecting them. > > You mean to have two functions that returns > > 1. "current WAL LSN - wal_keep_segments * 16MB" > 2. "current WAL LSN - max_slot_wal_keep_size" Hmm, but all the values there are easily findable. What would be the point in repeating it? Maybe we should disregard this line of thinking and go back to Horiguchi-san's original proposal, to wit use the "distance to breakage", as also supported now by Amit Kapila[1] (unless I misunderstand him). [1] https://postgr.es/m/CAA4eK1L2oJ7T1cESdc5w4J9L3Q_hhvWqTigdAXKfnsJy4=v13w@mail.gmail.com -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 24, 2020 at 8:45 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2020-Jun-24, Fujii Masao wrote: > > > On 2020/06/24 8:39, Alvaro Herrera wrote: > > > > I think we should publish the value from wal_keep_segments separately > > > from max_slot_wal_keep_size. ISTM that the user might decide to change > > > or remove wal_keep_segments and be suddenly at risk of losing slots > > > because of overlooking that it was wal_keep_segments, not > > > max_slot_wal_keep_size, that was protecting them. > > > > You mean to have two functions that returns > > > > 1. "current WAL LSN - wal_keep_segments * 16MB" > > 2. "current WAL LSN - max_slot_wal_keep_size" > > Hmm, but all the values there are easily findable. What would be the > point in repeating it? > > Maybe we should disregard this line of thinking and go back to > Horiguchi-san's original proposal, to wit use the "distance to > breakage", as also supported now by Amit Kapila[1] (unless I > misunderstand him). > +1. I also think let's drop the idea of exposing a function for this value and revert the min_safe_lsn part of the work as proposed by Michael above [1] excluding the function pg_wal_oldest_lsn() in that patch. Then, we can expose this as a new stat for PG14. I feel it would be better to display this stat in a new view (something like pg_stat_replication_slots) as discussed in another thread [2]. Does that make sense? [1] - https://www.postgresql.org/message-id/20200622054950.GC50978%40paquier.xyz [2] - https://www.postgresql.org/message-id/CA%2Bfd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020-Jun-25, Amit Kapila wrote: > +1. I also think let's drop the idea of exposing a function for this > value and revert the min_safe_lsn part of the work as proposed by > Michael above [1] excluding the function pg_wal_oldest_lsn() in that > patch. Then, we can expose this as a new stat for PG14. I feel it > would be better to display this stat in a new view (something like > pg_stat_replication_slots) as discussed in another thread [2]. Does > that make sense? I don't understand the proposal. Michael posted a patch that adds pg_wal_oldest_lsn(), and you say we should apply the patch except the part that adds that function -- so what part would be applying? If the proposal is to apply just the hunk in pg_get_replication_slots that removes min_safe_lsn, and do nothing else in pg13, then I don't like it. The feature exposes a way to monitor slots w.r.t. the maximum slot size; I'm okay if you prefer to express that in a different way, but I don't like the idea of shipping pg13 without any way to monitor it. As reported by Masao-san, the current min_safe_lsn has a definitional problem when used with pg_walfile_name(), but we've established that that's because pg_walfile_name() has a special-case definition, not because min_safe_lsn itself is bogus. If we're looking for a minimal change that can fix this problem, let's increment one byte, which should fix that issue, no? I also see that some people complain that all slots return the same value and therefore this column is redundant. To that argument I say that it's not unreasonable that we'll add a slot-specific size limit; and if we do, we'll be happy we had slot-specific min safe LSN; see e.g. https://postgr.es/m/20170301160610.wc7ez3vihmialntd@alap3.anarazel.de -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote: > I don't understand the proposal. Michael posted a patch that adds > pg_wal_oldest_lsn(), and you say we should apply the patch except the > part that adds that function -- so what part would be applying? I have sent last week a patch about only the removal of min_safe_lsn: https://www.postgresql.org/message-id/20200619121552.GH453547@paquier.xyz So this applies to this part. > If the proposal is to apply just the hunk in pg_get_replication_slots > that removes min_safe_lsn, and do nothing else in pg13, then I don't like > it. The feature exposes a way to monitor slots w.r.t. the maximum slot > size; I'm okay if you prefer to express that in a different way, but I > don't like the idea of shipping pg13 without any way to monitor it. From what I can see, it seems to me that we have a lot of views of how to tackle the matter. That gives an idea that we are not really happy with the current state of things, and usually a sign that we may want to redesign it, going back to this issue for v14. My 2c. -- Michael
Attachment
On 2020-Jun-26, Michael Paquier wrote: > On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote: > > I don't understand the proposal. Michael posted a patch that adds > > pg_wal_oldest_lsn(), and you say we should apply the patch except the > > part that adds that function -- so what part would be applying? > > I have sent last week a patch about only the removal of min_safe_lsn: > https://www.postgresql.org/message-id/20200619121552.GH453547@paquier.xyz > So this applies to this part. Well, I oppose that because it leaves us with no way to monitor slot limits. In his opening email, Masao-san proposed to simply change the value by adding 1. How you go from adding 1 to a column to removing the column completely with no recourse, is beyond me. Let me summarize the situation and possible ways forward as I see them. If I'm mistaken, please correct me. Problems: i) pg_replication_slot.min_safe_lsn has a weird definition in that all replication slots show the same value ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns the name of the previous segment. Proposed solutions: a) Do nothing -- keep the min_safe_lsn column as is. Warn users that pg_walfile_name should not be used with this column. b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used and return a useful value. c) Remove min_safe_lsn; add functions that expose the same value d) Remove min_safe_lsn; add a new view that exposes the same value and possibly others e) Replace min_safe_lsn with a "distance" column, which reports restart_lsn - oldest valid LSN (Note that you no longer have an LSN in this scenario, so you can't call pg_walfile_name.) The original patch implemented (e); it was changed to its current definition because of this[1] comment. My proposal is to put it back. [1] https://postgr.es/m/20171106132050.6apzynxrqrzghb4r@alap3.anarazel.de -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 26, 2020 at 4:54 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2020-Jun-26, Michael Paquier wrote: > > > On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote: > > > I don't understand the proposal. Michael posted a patch that adds > > > pg_wal_oldest_lsn(), and you say we should apply the patch except the > > > part that adds that function -- so what part would be applying? > > > > I have sent last week a patch about only the removal of min_safe_lsn: > > https://www.postgresql.org/message-id/20200619121552.GH453547@paquier.xyz > > So this applies to this part. > > Well, I oppose that because it leaves us with no way to monitor slot > limits. In his opening email, Masao-san proposed to simply change the > value by adding 1. How you go from adding 1 to a column to removing > the column completely with no recourse, is beyond me. > > Let me summarize the situation and possible ways forward as I see them. > If I'm mistaken, please correct me. > > Problems: > i) pg_replication_slot.min_safe_lsn has a weird definition in that all > replication slots show the same value > It is also not clear how the user can make use of that value? > ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns > the name of the previous segment. > > Proposed solutions: > > a) Do nothing -- keep the min_safe_lsn column as is. Warn users that > pg_walfile_name should not be used with this column. > b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used > and return a useful value. > c) Remove min_safe_lsn; add functions that expose the same value > d) Remove min_safe_lsn; add a new view that exposes the same value and > possibly others > > e) Replace min_safe_lsn with a "distance" column, which reports > restart_lsn - oldest valid LSN > (Note that you no longer have an LSN in this scenario, so you can't > call pg_walfile_name.) > Can we consider an option to "Remove min_safe_lsn; document how a user can monitor the distance"? We have a function to get current WAL insert location and other things required are available either via view or as guc variable values. The reason I am thinking of this option is that it might be better to get some more feedback on what is the most appropriate value to display. However, I am okay if we can reach a consensus on one of the above options. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020/06/26 13:45, Amit Kapila wrote: > On Fri, Jun 26, 2020 at 4:54 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> >> On 2020-Jun-26, Michael Paquier wrote: >> >>> On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote: >>>> I don't understand the proposal. Michael posted a patch that adds >>>> pg_wal_oldest_lsn(), and you say we should apply the patch except the >>>> part that adds that function -- so what part would be applying? >>> >>> I have sent last week a patch about only the removal of min_safe_lsn: >>> https://www.postgresql.org/message-id/20200619121552.GH453547@paquier.xyz >>> So this applies to this part. >> >> Well, I oppose that because it leaves us with no way to monitor slot >> limits. In his opening email, Masao-san proposed to simply change the >> value by adding 1. How you go from adding 1 to a column to removing >> the column completely with no recourse, is beyond me. >> >> Let me summarize the situation and possible ways forward as I see them. >> If I'm mistaken, please correct me. >> >> Problems: >> i) pg_replication_slot.min_safe_lsn has a weird definition in that all >> replication slots show the same value >> > > It is also not clear how the user can make use of that value? > >> ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns >> the name of the previous segment. >> >> Proposed solutions: >> >> a) Do nothing -- keep the min_safe_lsn column as is. Warn users that >> pg_walfile_name should not be used with this column. >> b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used >> and return a useful value. >> c) Remove min_safe_lsn; add functions that expose the same value >> d) Remove min_safe_lsn; add a new view that exposes the same value and >> possibly others >> >> e) Replace min_safe_lsn with a "distance" column, which reports >> restart_lsn - oldest valid LSN >> (Note that you no longer have an LSN in this scenario, so you can't >> call pg_walfile_name.) I like (e). > > Can we consider an option to "Remove min_safe_lsn; document how a user > can monitor the distance"? We have a function to get current WAL > insert location and other things required are available either via > view or as guc variable values. The reason I am thinking of this > option is that it might be better to get some more feedback on what is > the most appropriate value to display. However, I am okay if we can > reach a consensus on one of the above options. Yes, that's an idea. But it might not be easy to calculate that distance manually by subtracting max_slot_wal_keep_size from the current LSN. Because we've not supported -(pg_lsn, numeric) operator yet. I'm proposing that operator, but it's for v14. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/06/30 17:07, Fujii Masao wrote: > > > On 2020/06/26 13:45, Amit Kapila wrote: >> On Fri, Jun 26, 2020 at 4:54 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> >>> On 2020-Jun-26, Michael Paquier wrote: >>> >>>> On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote: >>>>> I don't understand the proposal. Michael posted a patch that adds >>>>> pg_wal_oldest_lsn(), and you say we should apply the patch except the >>>>> part that adds that function -- so what part would be applying? >>>> >>>> I have sent last week a patch about only the removal of min_safe_lsn: >>>> https://www.postgresql.org/message-id/20200619121552.GH453547@paquier.xyz >>>> So this applies to this part. >>> >>> Well, I oppose that because it leaves us with no way to monitor slot >>> limits. In his opening email, Masao-san proposed to simply change the >>> value by adding 1. How you go from adding 1 to a column to removing >>> the column completely with no recourse, is beyond me. >>> >>> Let me summarize the situation and possible ways forward as I see them. >>> If I'm mistaken, please correct me. >>> >>> Problems: >>> i) pg_replication_slot.min_safe_lsn has a weird definition in that all >>> replication slots show the same value >>> >> >> It is also not clear how the user can make use of that value? >> >>> ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns >>> the name of the previous segment. >>> >>> Proposed solutions: >>> >>> a) Do nothing -- keep the min_safe_lsn column as is. Warn users that >>> pg_walfile_name should not be used with this column. >>> b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used >>> and return a useful value. >>> c) Remove min_safe_lsn; add functions that expose the same value >>> d) Remove min_safe_lsn; add a new view that exposes the same value and >>> possibly others >>> >>> e) Replace min_safe_lsn with a "distance" column, which reports >>> restart_lsn - oldest valid LSN >>> (Note that you no longer have an LSN in this scenario, so you can't >>> call pg_walfile_name.) > > I like (e). > >> >> Can we consider an option to "Remove min_safe_lsn; document how a user >> can monitor the distance"? We have a function to get current WAL >> insert location and other things required are available either via >> view or as guc variable values. The reason I am thinking of this >> option is that it might be better to get some more feedback on what is >> the most appropriate value to display. However, I am okay if we can >> reach a consensus on one of the above options. > > Yes, that's an idea. But it might not be easy to calculate that distance > manually by subtracting max_slot_wal_keep_size from the current LSN. > Because we've not supported -(pg_lsn, numeric) operator yet. I'm > proposing that operator, but it's for v14. Sorry this is not true. That distance can be calculated without those operators. For example, SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric * 1024 * 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size')distance FROM pg_replication_slots; If the calculated distance is small or negative value, which means that we may lose some required WAL files. So in this case it's worth considering to increase max_slot_wal_keep_size. I still think it's better and more helpful to display something like that distance in pg_replication_slots rather than making each user calculate it... Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-Jun-30, Fujii Masao wrote: > Sorry this is not true. That distance can be calculated without those operators. > For example, > > SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric * 1024 * 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size')distance FROM pg_replication_slots; > > If the calculated distance is small or negative value, which means that > we may lose some required WAL files. So in this case it's worth considering > to increase max_slot_wal_keep_size. ... OK, but you're forgetting wal_keep_segments. > I still think it's better and more helpful to display something like > that distance in pg_replication_slots rather than making each user > calculate it... Agreed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Tue, 30 Jun 2020 23:23:30 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > >> Can we consider an option to "Remove min_safe_lsn; document how a user > >> can monitor the distance"? We have a function to get current WAL > >> insert location and other things required are available either via > >> view or as guc variable values. The reason I am thinking of this > >> option is that it might be better to get some more feedback on what is > >> the most appropriate value to display. However, I am okay if we can > >> reach a consensus on one of the above options. > > Yes, that's an idea. But it might not be easy to calculate that > > distance > > manually by subtracting max_slot_wal_keep_size from the current LSN. > > Because we've not supported -(pg_lsn, numeric) operator yet. I'm > > proposing that operator, but it's for v14. > > Sorry this is not true. That distance can be calculated without those > operators. > For example, > > SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric * > 1024 * 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size') > distance FROM pg_replication_slots; It's an approximation with accuracy of segment size. The calculation would be not that simple because of the unit of the calculation. The formula for the exact calculateion (ignoring wal_keep_segments) is: distance = (seg_floor(restart_lsn) + seg_floor(max_slot_wal_keep_size) + 1) * wal_segment_size - current_lsn where seg_floor is floor() by wal_segment_size. regards. > If the calculated distance is small or negative value, which means > that > we may lose some required WAL files. So in this case it's worth > considering > to increase max_slot_wal_keep_size. > > I still think it's better and more helpful to display something like > that distance in pg_replication_slots rather than making each user > calculate it... Agreed. The attached replaces min_safe_lsn with "distance". regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 7e078c9ff8e0594ce8f4e95c7db84ea0a31e9775 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com> Date: Tue, 30 Jun 2020 21:09:19 +0900 Subject: [PATCH v1] Replace min_safe_lsn with "distance" in pg_replication_slots pg_replication_slot.min_safe_lsn, which shows the oldest LSN kept in pg_wal, is doubtful in usability for monitoring. Change it to distance, which shows how many bytes the server can advance before the slot loses required segments. --- src/backend/access/transam/xlog.c | 39 +++++++++++++++++++++++ src/backend/catalog/system_views.sql | 2 +- src/backend/replication/slotfuncs.c | 19 +++++------ src/include/access/xlog.h | 1 + src/include/catalog/pg_proc.dat | 4 +-- src/test/recovery/t/019_replslot_limit.pl | 20 ++++++------ src/test/regress/expected/rules.out | 4 +-- 7 files changed, 62 insertions(+), 27 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index fd93bcfaeb..1f27639912 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9570,6 +9570,45 @@ GetWALAvailability(XLogRecPtr targetLSN) return WALAVAIL_REMOVED; } +/* + * Calculate how many bytes we can advance from currptr until the targetLSN is + * removed. + * + * Returns 0 if the distance is invalid. + */ +uint64 +DistanceToWALHorizon(XLogRecPtr targetLSN, XLogRecPtr currptr) +{ + XLogSegNo targetSeg; + XLogSegNo keepSegs; + XLogSegNo failSeg; + XLogRecPtr horizon; + + XLByteToSeg(targetLSN, targetSeg, wal_segment_size); + keepSegs = 0; + + /* no limit if max_slot_wal_keep_size is invalid */ + if (max_slot_wal_keep_size_mb < 0) + return 0; + + /* How many segments slots can keep? */ + keepSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size); + + /* override by wal_keep_segments if needed */ + if (wal_keep_segments > keepSegs) + keepSegs = wal_keep_segments; + + /* calculate the LSN where targetLSN is lost when currpos reaches */ + failSeg = targetSeg + keepSegs + 1; + XLogSegNoOffsetToRecPtr(failSeg, 0, wal_segment_size, horizon); + + /* If currptr already beyond the horizon, return zero. */ + if (currptr > horizon) + return 0; + + /* return the distance from currptr to the horizon */ + return horizon - currptr; +} /* * Retreat *logSegNo to the last segment that we need to retain because of diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 5314e9348f..b9847a9f92 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -879,7 +879,7 @@ CREATE VIEW pg_replication_slots AS L.restart_lsn, L.confirmed_flush_lsn, L.wal_status, - L.min_safe_lsn + L.distance FROM pg_get_replication_slots() AS L LEFT JOIN pg_database D ON (L.datoid = D.oid); diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 88033a79b2..532b3c5826 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -242,6 +242,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) Tuplestorestate *tupstore; MemoryContext per_query_ctx; MemoryContext oldcontext; + XLogRecPtr currlsn; int slotno; /* check to see if caller supports us returning a tuplestore */ @@ -274,6 +275,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) MemoryContextSwitchTo(oldcontext); + currlsn = GetXLogWriteRecPtr(); + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); for (slotno = 0; slotno < max_replication_slots; slotno++) { @@ -282,7 +285,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) Datum values[PG_GET_REPLICATION_SLOTS_COLS]; bool nulls[PG_GET_REPLICATION_SLOTS_COLS]; WALAvailability walstate; - XLogSegNo last_removed_seg; + uint32 distance; int i; if (!slot->in_use) @@ -398,16 +401,10 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) break; } - if (max_slot_wal_keep_size_mb >= 0 && - (walstate == WALAVAIL_RESERVED || walstate == WALAVAIL_EXTENDED) && - ((last_removed_seg = XLogGetLastRemovedSegno()) != 0)) - { - XLogRecPtr min_safe_lsn; - - XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0, - wal_segment_size, min_safe_lsn); - values[i++] = Int64GetDatum(min_safe_lsn); - } + distance = + DistanceToWALHorizon(slot_contents.data.restart_lsn, currlsn); + if (distance > 0) + values[i++] = Int64GetDatum(distance); else nulls[i++] = true; diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 77ac4e785f..1ec448c5d5 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -329,6 +329,7 @@ extern void InitXLOGAccess(void); extern void CreateCheckPoint(int flags); extern bool CreateRestartPoint(int flags); extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN); +extern uint64 DistanceToWALHorizon(XLogRecPtr targetLSN, XLogRecPtr currptr); extern XLogRecPtr CalculateMaxmumSafeLSN(void); extern void XLogPutNextOid(Oid nextOid); extern XLogRecPtr XLogRestorePoint(const char *rpName); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 61f2c2f5b4..199fd994bd 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -10063,9 +10063,9 @@ proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f', proretset => 't', provolatile => 's', prorettype => 'record', proargtypes => '', - proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,pg_lsn}', + proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}', proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}', - proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,min_safe_lsn}', + proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,distance}', prosrc => 'pg_get_replication_slots' }, { oid => '3786', descr => 'set up a logical replication slot', proname => 'pg_create_logical_replication_slot', provolatile => 'v', diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 7d22ae5720..1c76d2d9e9 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -28,7 +28,7 @@ $node_master->safe_psql('postgres', # The slot state and remain should be null before the first connection my $result = $node_master->safe_psql('postgres', - "SELECT restart_lsn IS NULL, wal_status is NULL, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" + "SELECT restart_lsn IS NULL, wal_status is NULL, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" ); is($result, "t|t|t", 'check the state of non-reserved slot is "unknown"'); @@ -52,9 +52,9 @@ $node_master->wait_for_catchup($node_standby, 'replay', $start_lsn); # Stop standby $node_standby->stop; -# Preparation done, the slot is the state "normal" now +# Preparation done, the slot is the state "reserved" now $result = $node_master->safe_psql('postgres', - "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" + "SELECT wal_status, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" ); is($result, "reserved|t", 'check the catching-up state'); @@ -64,7 +64,7 @@ $node_master->safe_psql('postgres', "CHECKPOINT;"); # The slot is always "safe" when fitting max_wal_size $result = $node_master->safe_psql('postgres', - "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" + "SELECT wal_status, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" ); is($result, "reserved|t", 'check that it is safe if WAL fits in max_wal_size'); @@ -74,7 +74,7 @@ $node_master->safe_psql('postgres', "CHECKPOINT;"); # The slot is always "safe" when max_slot_wal_keep_size is not set $result = $node_master->safe_psql('postgres', - "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" + "SELECT wal_status, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" ); is($result, "reserved|t", 'check that slot is working'); @@ -94,9 +94,7 @@ max_slot_wal_keep_size = ${max_slot_wal_keep_size_mb}MB )); $node_master->reload; -# The slot is in safe state. The distance from the min_safe_lsn should -# be as almost (max_slot_wal_keep_size - 1) times large as the segment -# size +# The slot is in safe state. $result = $node_master->safe_psql('postgres', "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"); @@ -110,7 +108,7 @@ $node_master->safe_psql('postgres', "CHECKPOINT;"); $result = $node_master->safe_psql('postgres', "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"); is($result, "reserved", - 'check that min_safe_lsn gets close to the current LSN'); + 'check that distance gets close to the current LSN'); # The standby can reconnect to master $node_standby->start; @@ -154,7 +152,7 @@ advance_wal($node_master, 1); # Slot gets into 'unreserved' state $result = $node_master->safe_psql('postgres', - "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" + "SELECT wal_status, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'" ); is($result, "unreserved|t", 'check that the slot state changes to "unreserved"'); @@ -186,7 +184,7 @@ ok( find_in_log( # This slot should be broken $result = $node_master->safe_psql('postgres', - "SELECT slot_name, active, restart_lsn IS NULL, wal_status, min_safe_lsn FROM pg_replication_slots WHERE slot_name ='rep1'" + "SELECT slot_name, active, restart_lsn IS NULL, wal_status, distance FROM pg_replication_slots WHERE slot_name = 'rep1'" ); is($result, "rep1|f|t|lost|", 'check that the slot became inactive and the state "lost" persists'); diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index b813e32215..392eab12dd 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1464,8 +1464,8 @@ pg_replication_slots| SELECT l.slot_name, l.restart_lsn, l.confirmed_flush_lsn, l.wal_status, - l.min_safe_lsn - FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin,restart_lsn, confirmed_flush_lsn, wal_status, min_safe_lsn) + l.distance + FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin,restart_lsn, confirmed_flush_lsn, wal_status, distance) LEFT JOIN pg_database d ON ((l.datoid = d.oid))); pg_roles| SELECT pg_authid.rolname, pg_authid.rolsuper, -- 2.18.4
On Tue, Jun 30, 2020 at 7:53 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/06/30 17:07, Fujii Masao wrote: > > > > > > On 2020/06/26 13:45, Amit Kapila wrote: > >> > >> Can we consider an option to "Remove min_safe_lsn; document how a user > >> can monitor the distance"? We have a function to get current WAL > >> insert location and other things required are available either via > >> view or as guc variable values. The reason I am thinking of this > >> option is that it might be better to get some more feedback on what is > >> the most appropriate value to display. However, I am okay if we can > >> reach a consensus on one of the above options. > > > > Yes, that's an idea. But it might not be easy to calculate that distance > > manually by subtracting max_slot_wal_keep_size from the current LSN. > > Because we've not supported -(pg_lsn, numeric) operator yet. I'm > > proposing that operator, but it's for v14. > > Sorry this is not true. That distance can be calculated without those operators. > For example, > > SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric * 1024 * 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size')distance FROM pg_replication_slots; > > If the calculated distance is small or negative value, which means that > we may lose some required WAL files. So in this case it's worth considering > to increase max_slot_wal_keep_size. > > I still think it's better and more helpful to display something like > that distance in pg_replication_slots rather than making each user > calculate it... > Okay, but do we think it is better to display this in pg_replication_slots or some new view like pg_stat_*_slots as being discussed in [1]? It should not happen that we later decide to move this or similar stats to that view. [1] - https://www.postgresql.org/message-id/CA%2Bfd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020-Jul-01, Amit Kapila wrote: > Okay, but do we think it is better to display this in > pg_replication_slots or some new view like pg_stat_*_slots as being > discussed in [1]? It should not happen that we later decide to move > this or similar stats to that view. It seems that the main motivation for having some counters in another view is the ability to reset them; and resetting this distance value makes no sense, so I think it's better to have it in pg_replication_slots. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jul 01, 2020 at 11:14:21AM -0400, Alvaro Herrera wrote: > On 2020-Jul-01, Amit Kapila wrote: >> Okay, but do we think it is better to display this in >> pg_replication_slots or some new view like pg_stat_*_slots as being >> discussed in [1]? It should not happen that we later decide to move >> this or similar stats to that view. > > It seems that the main motivation for having some counters in another > view is the ability to reset them; and resetting this distance value > makes no sense, so I think it's better to have it in > pg_replication_slots. pg_replication_slots would make sense to me than a stat view for a distance column. Now, I have to admit that I am worried when seeing design discussions on this thread for 13 after beta2 has been shipped, so my vote would still be to remove for now the column in 13, document an equivalent query to do this work (I actually just do that in a bgworker monitoring repslot bloat now in some stuff I maintain internally), and resend a patch in v14 to give the occasion for this feature to go through one extra round of review. My 2c. -- Michael
Attachment
On 2020-Jul-02, michael@paquier.xyz wrote: > pg_replication_slots would make sense to me than a stat view for a > distance column. Now, I have to admit that I am worried when seeing > design discussions on this thread for 13 after beta2 has been shipped, We already had this discussion and one of the things we said before beta2 was "we're still in beta2, there's time". I see no need to panic. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jul 1, 2020 at 8:44 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2020-Jul-01, Amit Kapila wrote: > > > Okay, but do we think it is better to display this in > > pg_replication_slots or some new view like pg_stat_*_slots as being > > discussed in [1]? It should not happen that we later decide to move > > this or similar stats to that view. > > It seems that the main motivation for having some counters in another > view is the ability to reset them; and resetting this distance value > makes no sense, so I think it's better to have it in > pg_replication_slots. > Fair enough. It would be good if we can come up with something better than 'distance' for this column. Some ideas safe_wal_limit, safe_wal_size? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020-Jul-04, Amit Kapila wrote: > Fair enough. It would be good if we can come up with something better > than 'distance' for this column. Some ideas safe_wal_limit, > safe_wal_size? Hmm, I like safe_wal_size. I've been looking at this intermittently since late last week and I intend to get it done in the next couple of days. Thanks! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Jul-06, Alvaro Herrera wrote: > Hmm, I like safe_wal_size. > > I've been looking at this intermittently since late last week and I > intend to get it done in the next couple of days. I propose the attached. This is pretty much what was proposed by Kyotaro, but I made a couple of changes. Most notably, I moved the calculation to the view code itself rather than creating a function in xlog.c, mostly because it seemed to me that the new function was creating an abstraction leakage without adding any value; also, if we add per-slot size limits later, it would get worse. The other change was to report negative values when the slot becomes unreserved, rather than zero. It shows how much beyond safety your slots are getting, so it seems useful. Clamping at zero seems to serve no purpose. I also made it report null immediately when slots are in state lost. But beware of slots that appear lost but fall in the unreserved category because they advanced before checkpointer signalled them. (This case requires a debugger to hit ...) One thing that got my attention while going over this is that the error message we throw when making a slot invalid is not very helpful; it doesn't say what the current insertion LSN was at that point. Maybe we should add that? (As a separate patch, of couse.) Any more thoughts? If not, I'll get this pushed tomorrow finally. Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Thanks! At Mon, 6 Jul 2020 20:54:36 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > On 2020-Jul-06, Alvaro Herrera wrote: > > > Hmm, I like safe_wal_size. I agree to the name, too. > > I've been looking at this intermittently since late last week and I > > intend to get it done in the next couple of days. > > I propose the attached. This is pretty much what was proposed by > Kyotaro, but I made a couple of changes. Most notably, I moved the > calculation to the view code itself rather than creating a function in > xlog.c, mostly because it seemed to me that the new function was > creating an abstraction leakage without adding any value; also, if we > add per-slot size limits later, it would get worse. I'm not sure that detailed WAL segment calculation fits slotfuncs.c but I don't object to the change. However if we do that: + /* determine how many segments slots can be kept by slots ... */ + keepSegs = max_slot_wal_keep_size_mb / (wal_segment_size / (1024 * 1024)); Couldn't we move ConvertToXSegs from xlog.c to xlog_ingernals.h and use it intead of the bare expression? > The other change was to report negative values when the slot becomes > unreserved, rather than zero. It shows how much beyond safety your > slots are getting, so it seems useful. Clamping at zero seems to serve > no purpose. The reason for the clamping is the signedness of the values, or integral promotion. However, I believe the calculation cannot go beyond the range of signed long so the signedness conversion in the patch looks fine. > I also made it report null immediately when slots are in state lost. > But beware of slots that appear lost but fall in the unreserved category > because they advanced before checkpointer signalled them. (This case > requires a debugger to hit ...) Oh! Okay, that change seems right to me. > One thing that got my attention while going over this is that the error > message we throw when making a slot invalid is not very helpful; it > doesn't say what the current insertion LSN was at that point. Maybe we > should add that? (As a separate patch, of couse.) It sounds helpful to me. (I remember that I sometime want to see checkpoint LSNs in server log..) > Any more thoughts? If not, I'll get this pushed tomorrow finally. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020-Jul-07, Kyotaro Horiguchi wrote: > At Mon, 6 Jul 2020 20:54:36 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > > I propose the attached. This is pretty much what was proposed by > > Kyotaro, but I made a couple of changes. Most notably, I moved the > > calculation to the view code itself rather than creating a function in > > xlog.c, mostly because it seemed to me that the new function was > > creating an abstraction leakage without adding any value; also, if we > > add per-slot size limits later, it would get worse. > > I'm not sure that detailed WAL segment calculation fits slotfuncs.c > but I don't object to the change. However if we do that: > > + /* determine how many segments slots can be kept by slots ... */ > + keepSegs = max_slot_wal_keep_size_mb / (wal_segment_size / (1024 * 1024)); > > Couldn't we move ConvertToXSegs from xlog.c to xlog_ingernals.h and > use it intead of the bare expression? I was of two minds about that, and the only reason I didn't do it is that we'll need to give it a better name if we do it ... I'm open to suggestions. > > The other change was to report negative values when the slot becomes > > unreserved, rather than zero. It shows how much beyond safety your > > slots are getting, so it seems useful. Clamping at zero seems to serve > > no purpose. > > The reason for the clamping is the signedness of the values, or > integral promotion. However, I believe the calculation cannot go > beyond the range of signed long so the signedness conversion in the > patch looks fine. Yeah, I think the negative values are useful to see. I think if you ever get close to 2^62, you're in much more serious trouble anyway :-) But I don't deny that the math there could be subject of overflow issues. If you want to verify, please be my guest ... > > One thing that got my attention while going over this is that the error > > message we throw when making a slot invalid is not very helpful; it > > doesn't say what the current insertion LSN was at that point. Maybe we > > should add that? (As a separate patch, of couse.) > > It sounds helpful to me. (I remember that I sometime want to see > checkpoint LSNs in server log..) Hmm, ... let's do that for pg14! Thanks for looking, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Jul-06, Alvaro Herrera wrote: > On 2020-Jul-07, Kyotaro Horiguchi wrote: > > Couldn't we move ConvertToXSegs from xlog.c to xlog_ingernals.h and > > use it intead of the bare expression? > > I was of two minds about that, and the only reason I didn't do it is > that we'll need to give it a better name if we do it ... I'm open to > suggestions. In absence of other suggestions I gave this the name XLogMBVarToSegs, and redefined ConvertToXSegs to use that. Didn't touch callers in xlog.c to avoid pointless churn. Pushed to both master and 13. I hope this satisfies everyone ... Masao-san, thanks for reporting the problem, and thanks Horiguchi-san for providing the fix. (Also thanks to Amit and Michael for discussion.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > In absence of other suggestions I gave this the name XLogMBVarToSegs, > and redefined ConvertToXSegs to use that. Didn't touch callers in > xlog.c to avoid pointless churn. Pushed to both master and 13. The buildfarm's sparc64 members seem unhappy with this. regards, tom lane
On 2020-Jul-08, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > In absence of other suggestions I gave this the name XLogMBVarToSegs, > > and redefined ConvertToXSegs to use that. Didn't touch callers in > > xlog.c to avoid pointless churn. Pushed to both master and 13. > > The buildfarm's sparc64 members seem unhappy with this. Hmm. Some of them are, yeah, but it's not universal. For example mussurana and ibisbill are not showing failures. Anyway the error is pretty strange: only GetWALAvailability is showing a problem, but the size calculation in the view function def is returning a negative number, as expected. So looking at the code in GetWALAvailability, what happens is that targetSeg >= oldestSlotSeg, but we expect the opposite. I'd bet for targetSeg to be correct, since its input is just the slot LSN -- pretty easy. But for oldestSlotSeg, we have KeepLogSeg involved. No immediate ideas ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Jul-08, Tom Lane wrote: >> The buildfarm's sparc64 members seem unhappy with this. > Hmm. Some of them are, yeah, but it's not universal. For example > mussurana and ibisbill are not showing failures. Ah, right, I was thinking they hadn't run since this commit, but they have. > Anyway the error is pretty strange: only GetWALAvailability is showing a > problem, but the size calculation in the view function def is returning > a negative number, as expected. We've previously noted what seem to be compiler optimization bugs on both sparc32 and sparc64; the latest thread about that is https://www.postgresql.org/message-id/flat/f28f842d-e82b-4e30-a81a-2a1f9fa4a8e1%40www.fastmail.com This is looking uncomfortably like the same thing. Tom, could you experiment with different -O levels on those animals? regards, tom lane
On 2020-Jul-08, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Anyway the error is pretty strange: only GetWALAvailability is showing a > > problem, but the size calculation in the view function def is returning > > a negative number, as expected. > > We've previously noted what seem to be compiler optimization bugs on > both sparc32 and sparc64; the latest thread about that is > https://www.postgresql.org/message-id/flat/f28f842d-e82b-4e30-a81a-2a1f9fa4a8e1%40www.fastmail.com > > This is looking uncomfortably like the same thing. Ouch. So 12 builds with -O0 but 13 does not? Did we do something to sequence.c to work around this problem? I cannot find anything. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Jul-08, Tom Lane wrote: >> We've previously noted what seem to be compiler optimization bugs on >> both sparc32 and sparc64; the latest thread about that is >> https://www.postgresql.org/message-id/flat/f28f842d-e82b-4e30-a81a-2a1f9fa4a8e1%40www.fastmail.com >> This is looking uncomfortably like the same thing. > Ouch. So 12 builds with -O0 but 13 does not? Unless Tom's changed the animal's config since that thread, yes. > Did we do something to > sequence.c to work around this problem? I cannot find anything. We did not. If it's a compiler bug, and one as phase-of-the-moon- dependent as this seems to be, I'd have zero confidence that any specific source code change would fix it (barring someone confidently explaining exactly what the compiler bug is, anyway). The best we can do for now is hope that backing off the -O level avoids the bug. regards, tom lane
On 2020-Jul-08, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Ouch. So 12 builds with -O0 but 13 does not? > > Unless Tom's changed the animal's config since that thread, yes. I verified the configs in branches 12 and 13 in one of the failing animals, and yes that's still the case. > > Did we do something to > > sequence.c to work around this problem? I cannot find anything. > > We did not. If it's a compiler bug, and one as phase-of-the-moon- > dependent as this seems to be, I'd have zero confidence that any > specific source code change would fix it (barring someone confidently > explaining exactly what the compiler bug is, anyway). The best we > can do for now is hope that backing off the -O level avoids the bug. An easy workaround might be to add -O0 for that platform in that directory's Makefile. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Jul-08, Tom Lane wrote: >> We did not. If it's a compiler bug, and one as phase-of-the-moon- >> dependent as this seems to be, I'd have zero confidence that any >> specific source code change would fix it (barring someone confidently >> explaining exactly what the compiler bug is, anyway). The best we >> can do for now is hope that backing off the -O level avoids the bug. > An easy workaround might be to add -O0 for that platform in that > directory's Makefile. "Back off the -O level in one directory" seems about as misguided as "back off the -O level in one branch", if you ask me. There's no reason to suppose that the problem won't bite us somewhere else next week. The previous sparc32 bug that we'd made some effort to run to ground is described here: https://www.postgresql.org/message-id/15142.1498165769@sss.pgh.pa.us We really don't know what aspects of the source code trigger that. I'm slightly suspicious that we might be seeing the same bug in the sparc64 builds, but it's just a guess. regards, tom lane
... or on the other hand, maybe these animals are just showing more sensitivity than others to an actual code bug. skink is showing valgrind failures in this very area, on both HEAD and v13: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2020-07-08%2021%3A13%3A02 ==3166208== VALGRINDERROR-BEGIN ==3166208== Conditional jump or move depends on uninitialised value(s) ==3166208== at 0x28618D: KeepLogSeg (xlog.c:9627) ==3166208== by 0x296AC5: GetWALAvailability (xlog.c:9533) ==3166208== by 0x4FFECB: pg_get_replication_slots (slotfuncs.c:356) ==3166208== by 0x3C762F: ExecMakeTableFunctionResult (execSRF.c:234) ==3166208== by 0x3D9A9A: FunctionNext (nodeFunctionscan.c:95) ==3166208== by 0x3C81D6: ExecScanFetch (execScan.c:133) ==3166208== by 0x3C81D6: ExecScan (execScan.c:199) ==3166208== by 0x3D99A9: ExecFunctionScan (nodeFunctionscan.c:270) ==3166208== by 0x3C5072: ExecProcNodeFirst (execProcnode.c:450) ==3166208== by 0x3BD35E: ExecProcNode (executor.h:245) ==3166208== by 0x3BD35E: ExecutePlan (execMain.c:1646) ==3166208== by 0x3BE039: standard_ExecutorRun (execMain.c:364) ==3166208== by 0x3BE102: ExecutorRun (execMain.c:308) ==3166208== by 0x559669: PortalRunSelect (pquery.c:912) ==3166208== Uninitialised value was created by a stack allocation ==3166208== at 0x296A84: GetWALAvailability (xlog.c:9523) ==3166208== ==3166208== VALGRINDERROR-END and even the most cursory look at the code confirms that there's a real bug here. KeepLogSeg expects *logSegNo to be defined on entry, but GetWALAvailability hasn't bothered to initialize oldestSlotSeg. It is not clear to me which one is in the wrong; the comment for KeepLogSeg isn't particularly clear on this. regards, tom lane
On 2020-Jul-09, Tom Lane wrote: > and even the most cursory look at the code confirms that there's a > real bug here. KeepLogSeg expects *logSegNo to be defined on entry, > but GetWALAvailability hasn't bothered to initialize oldestSlotSeg. > It is not clear to me which one is in the wrong; the comment for > KeepLogSeg isn't particularly clear on this. Oh, so I introduced the bug when I removed the initialization in this fix. That one was using the wrong datatype, but evidently it achieved the right effect. And KeepLogSeg is using the wrong datatype Invalid macro also. I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a macro to test for that. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Jul-09, Alvaro Herrera wrote: > I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a > macro to test for that. That's overkill really. I just used zero. Running contrib/test_decoding under valgrind, this now passes. I think I'd rather do away with the compare to zero, and initialize to something else in GetWALAvailability, though. What we're doing seems unclean and unclear. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Jul-09, Alvaro Herrera wrote: >> I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a >> macro to test for that. > That's overkill really. I just used zero. Running > contrib/test_decoding under valgrind, this now passes. > I think I'd rather do away with the compare to zero, and initialize to > something else in GetWALAvailability, though. What we're doing seems > unclean and unclear. Is zero really not a valid segment number? regards, tom lane
On 2020-Jul-11, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > On 2020-Jul-09, Alvaro Herrera wrote: > >> I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a > >> macro to test for that. > > > That's overkill really. I just used zero. Running > > contrib/test_decoding under valgrind, this now passes. > > > I think I'd rather do away with the compare to zero, and initialize to > > something else in GetWALAvailability, though. What we're doing seems > > unclean and unclear. > > Is zero really not a valid segment number? No, but you cannot retreat from that ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
A much more sensible answer is to initialize the segno to the segment currently being written, as in the attached. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2020-Jul-13, Alvaro Herrera wrote: > A much more sensible answer is to initialize the segno to the segment > currently being written, as in the attached. Ran the valgrind test locally and it passes. Pushed it now. Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Mon, 13 Jul 2020 13:52:12 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > On 2020-Jul-13, Alvaro Herrera wrote: > > > A much more sensible answer is to initialize the segno to the segment > > currently being written, as in the attached. > > Ran the valgrind test locally and it passes. Pushed it now. - if (XLogRecPtrIsInvalid(*logSegNo) || segno < *logSegNo) + if (segno < *logSegNo) Oops! Thank you for fixing it! regards. -- Kyotaro Horiguchi NTT Open Source Software Center