On Fri, Aug 22, 2025 at 11:21:22AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > I got a repeatable case today. It is breaking on its own every
> > ~ 5 minutes.
>
> Interesting. That futex call is presumably caused by interaction
> with some other process within the standby server, and the only
> plausible candidate really is the startup process (which is replaying
> WAL received from the primary). There are cases where WAL replay
> will take locks that can block queries on the standby. Can you
> correlate the delays on the standby server with any DDL events
> occurring on the primary?
Nope. Plus there is a certain repetition to these cases, so even if I
missed *some* create table/alter, DDL just isn't going to be happening
every 4-5 minutes.
For example, looking at logs for the last ~2h, and just checking for
situations where there are 20 or more messages in the same millisecond,
I can see (message count, then timestamp):
108 14:02:03.149
25 14:04:01.619
110 14:05:36.924
77 14:05:36.925
108 14:09:28.155
38 14:13:52.481
63 14:13:52.482
73 14:13:52.484
146 14:18:19.338
39 14:18:19.339
24 14:20:01.694
82 14:23:07.352
55 14:23:07.353
37 14:23:07.353
45 14:27:44.125
132 14:27:44.126
109 14:31:41.593
70 14:31:41.594
24 14:32:01.205
21 14:34:01.477
79 14:35:36.761
104 14:35:36.762
22 14:39:49.541
151 14:39:49.542
22 14:39:49.543
112 14:44:15.607
73 14:44:15.608
28 14:48:01.256
50 14:48:25.588
131 14:48:25.589
139 14:52:44.391
74 14:57:02.369
117 14:57:02.370
20 15:00:02.008
137 15:00:43.982
34 15:00:43.983
20 15:01:01.110
22 15:04:21.037
153 15:04:21.038
20 15:08:01.136
31 15:08:55.798
126 15:08:55.799
76 15:13:46.654
83 15:13:46.655
20 15:17:01.700
107 15:18:42.112
72 15:18:42.113
124 15:23:48.689
32 15:23:48.690
25 15:23:48.690
28 15:24:01.397
So, while there are outliers, I'd say that most of the problems happen every
3-5 minutes.
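
For anyone who wants to repeat this kind of check on their own logs, here is
a rough sketch of the counting in Python. It is not the exact command I ran.
It assumes every log line carries a timestamp in HH:MM:SS.mmm form (that
depends on log_line_prefix), and the names bursts() and gaps_between_bursts()
are made up for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: each log line contains a millisecond timestamp such as
# "14:02:03.149"; adjust the pattern to match the actual log_line_prefix.
TS_RE = re.compile(r"\b(\d{2}:\d{2}:\d{2}\.\d{3})\b")


def bursts(lines, min_count=20):
    """Return (count, timestamp) pairs for milliseconds with >= min_count lines."""
    counts = Counter()
    for line in lines:
        m = TS_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    # Sorting by the timestamp string is only valid within a single day.
    return [(n, ts) for ts, n in sorted(counts.items()) if n >= min_count]


def gaps_between_bursts(timestamps, merge_within=1.0):
    """Return the gaps, in seconds, between successive burst events.

    Timestamps less than merge_within seconds after the start of the
    current event are treated as part of that event, so 14:05:36.924 and
    14:05:36.925 count as one burst.
    """
    times = sorted(datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps)
    events = []
    for t in times:
        if not events or (t - events[-1]).total_seconds() > merge_within:
            events.append(t)
    return [(b - a).total_seconds() for a, b in zip(events, events[1:])]
```

Feeding the lines of a log file to bursts() gives a list shaped like the one
above, and passing the timestamps from that list to gaps_between_bursts()
gives the spacing between incidents in seconds, which makes the periodicity
easier to see than eyeballing it.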
depesz