Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: [HACKERS] Restricting maximum keep segments by repslots
Date
Msg-id 20200408.164605.1874250940847340108.horikyota.ntt@gmail.com
In response to Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: [HACKERS] Restricting maximum keep segments by repslots  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> I saw another issue, the following sequence on the primary freezes
> when invalidation happens.
> 
> =# create table tt(); drop table tt; select pg_switch_wal(); create table tt(); drop table tt; select pg_switch_wal(); create table tt(); drop table tt; select pg_switch_wal(); checkpoint;
> 
> The last checkpoint command is waiting for CV on
> CheckpointerShmem->start_cv in RequestCheckpoint(), while Checkpointer
> is waiting for the next latch at the end of
> CheckpointerMain. new_started doesn't move but it is the same value
> with old_started.
> 
> That freeze didn't happen when I removed
> ConditionVariableSleep(&s->active_cv) in
> InvalidateObsoleteReplicationSlots.
> 
> I continue investigating it.

I now understand how it happens.

The latch set by the checkpoint request from the CHECKPOINT command
was absorbed by ConditionVariableSleep() in
InvalidateObsoleteReplicationSlots.  The attached patch lets the
checkpointer use MyLatch for purposes other than checkpoint requests
while a checkpoint is running.
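
To illustrate the mechanism, here is a minimal single-process sketch of
how a nested wait can swallow the wakeup.  It is a toy model, not
PostgreSQL code: ToyLatch, toy_set_latch, toy_reset_latch, toy_cv_sleep
and request_survives_nested_sleep are all hypothetical names, and the
"sleep" only asserts that the latch is already set instead of blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a process latch: just a flag, not PostgreSQL's Latch. */
typedef struct ToyLatch
{
    bool        is_set;
} ToyLatch;

static void toy_set_latch(ToyLatch *l)   { l->is_set = true; }
static void toy_reset_latch(ToyLatch *l) { l->is_set = false; }

/*
 * Hypothetical model of a condition-variable sleep that shares the
 * process latch: once woken, it resets the latch regardless of who set it.
 */
static void
toy_cv_sleep(ToyLatch *l)
{
    /* A real sleep would block until the latch is set; the toy requires it. */
    assert(l->is_set);
    toy_reset_latch(l);
}

/*
 * Returns true if a main loop that looks only at the latch would still
 * see the checkpoint request after a nested sleep ran during a checkpoint.
 */
static bool
request_survives_nested_sleep(void)
{
    ToyLatch    latch = {false};

    toy_set_latch(&latch);      /* the CHECKPOINT command signals us */
    toy_cv_sleep(&latch);       /* nested wait while a checkpoint runs */

    return latch.is_set;        /* the only thing the main loop checks */
}
```

In this model request_survives_nested_sleep() returns false: the request
is gone by the time the main loop goes back to sleep, which matches the
freeze described above.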

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 6b877f11f557fc76f206e7a71ff7890952bf63d4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Wed, 8 Apr 2020 16:35:25 +0900
Subject: [PATCH] Allow MyLatch of checkpointer for other use.

MyLatch of the checkpointer process was used only to request a
checkpoint.  The checkpointer can miss a request if the latch is used
for other purposes during a checkpoint.  Allow MyLatch to be used for
other purposes, such as condition variables, by recording pending
checkpoint requests.
---
 src/backend/postmaster/checkpointer.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e354a78725..86c355f035 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -160,6 +160,12 @@ static double ckpt_cached_elapsed;
 static pg_time_t last_checkpoint_time;
 static pg_time_t last_xlog_switch_time;
 
+/*
+ * Record checkpoint requests.  Since MyLatch is also used outside
+ * CheckpointerMain, we need to record pending checkpoint requests here.
+ */
+static bool CheckpointRequestPending = false;
+
 /* Prototypes for private functions */
 
 static void HandleCheckpointerInterrupts(void);
@@ -335,6 +341,7 @@ CheckpointerMain(void)
 
         /* Clear any already-pending wakeups */
         ResetLatch(MyLatch);
+        CheckpointRequestPending = false;
 
         /*
          * Process any requests or signals received recently.
@@ -494,6 +501,10 @@ CheckpointerMain(void)
          */
         pgstat_send_bgwriter();
 
+        /* Don't sleep if a checkpoint request is pending */
+        if (CheckpointRequestPending)
+            continue;
+
         /*
          * Sleep until we are signaled or it's time for another checkpoint or
          * xlog file switch.
@@ -817,6 +828,7 @@ ReqCheckpointHandler(SIGNAL_ARGS)
      */
     SetLatch(MyLatch);
 
+    CheckpointRequestPending = true;
     errno = save_errno;
 }
 
-- 
2.18.2
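
For readers who want the idea in isolation, the following is a minimal
sketch of the "record the request separately from the latch" pattern the
patch uses.  It is a toy model under stated assumptions, not the patch
itself: toy_latch_set, toy_request_pending, toy_request_checkpoint,
toy_nested_wait and toy_loop_iteration_skips_sleep are hypothetical
names.  Unlike the patch, the sketch declares the flag as volatile
sig_atomic_t (the conventional type for a variable written from a signal
handler) and sets it before setting the latch; both are choices of this
sketch.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Toy latch and pending flag; illustrative names, not PostgreSQL symbols. */
static volatile sig_atomic_t toy_latch_set = 0;
static volatile sig_atomic_t toy_request_pending = 0;

/* Models the request handler: record the request, then wake the process. */
static void
toy_request_checkpoint(void)
{
    toy_request_pending = 1;
    toy_latch_set = 1;
}

/* Models a nested wait that consumes the latch during a checkpoint. */
static void
toy_nested_wait(void)
{
    toy_latch_set = 0;
}

/*
 * Runs one pass of a simplified main loop.  If request_mid_checkpoint is
 * true, a checkpoint request arrives while the checkpoint is running.
 * Returns true if the loop would skip its sleep and start another pass.
 */
static bool
toy_loop_iteration_skips_sleep(bool request_mid_checkpoint)
{
    /* Top of loop: clear already-pending wakeups and the recorded request. */
    toy_latch_set = 0;
    toy_request_pending = 0;

    /* ... a checkpoint is running ... */
    if (request_mid_checkpoint)
        toy_request_checkpoint();
    toy_nested_wait();

    /* The latch no longer carries the request after the nested wait. */
    assert(toy_latch_set == 0);

    /* Bottom of loop: don't sleep if a request was recorded meanwhile. */
    return toy_request_pending != 0;
}
```

In this model toy_loop_iteration_skips_sleep(true) returns true even
though the nested wait cleared the latch, while
toy_loop_iteration_skips_sleep(false) returns false and the loop sleeps
as usual.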

