Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Time delayed LR (WAS Re: logical replication restrictions)
Date
Msg-id CAHut+PsCF_a4Mpcdo5bcE-4YuYqzR2Kzj_v=3CFROTP2vnPECA@mail.gmail.com
Whole thread Raw
In response to Re: Time delayed LR (WAS Re: logical replication restrictions)  ("Takamichi Osumi (Fujitsu)" <osumi.takamichi@fujitsu.com>)
Responses Re: Time delayed LR (WAS Re: logical replication restrictions)
RE: Time delayed LR (WAS Re: logical replication restrictions)
List pgsql-hackers
Here are some review comments for patch v9-0001:

======

GENERAL

1. min_ prefix?

What's the significance of the "min_" prefix for this parameter? I'm
guessing the background is that at one time it was considered to be a
GUC so took a name similar to GUC recovery_min_apply_delay (??)

But in practice, I think it is meaningless and/or misleading. For
example, suppose the user wants to defer replication by 1hr. IMO it is
more natural to just say "defer replication by 1 hr" (aka
apply_delay='1hr') Clearly it means replication will take place about
1 hr into the future. OTHO saying "defer replication by a MINIMUM of 1
hr" (aka min_apply_delay='1hr')  is quite vague because then it is
equally valid if the replication gets delayed by 1 hr or 2 hrs or 5
days or 3 weeks since all of those satisfy the minimum delay. The
implementation could hardwire a delay of INT_MAX ms but clearly,
that's not really what the user would expect.

~

So, I think this parameter should be renamed just as 'apply_delay'.

But, if you still decide to keep it as 'min_apply_delay' then there is
a lot of other code that ought to be changed to be consistent with
that name.
e.g.
- subapplydelay in catalogs.sgml --> subminapplydelay
- subapplydelay in system_views.sql --> subminapplydelay
- subapplydelay in pg_subscription.h --> subminapplydelay
- subapplydelay in dump.h --> subminapplydelay
- i_subapplydelay in pg_dump.c --> i_subminapplydelay
- applydelay member name of Form_pg_subscription --> minapplydelay
- "Apply Delay" for the column name displayed by describe.c --> "Min
apply delay"
- more...

(IMO the fact that so much code does not currently say 'min' at all is
just evidence that the 'min' prefix really didn't really mean much in
the first place)


======

doc/src/sgml/catalogs.sgml

2. Section 31.2 Subscription

+  <para>
+   Time delayed replica of subscription is available by indicating
+   <literal>min_apply_delay</literal>. See
+   <xref linkend="sql-createsubscription"/> for details.
+  </para>

How about saying like:

SUGGESTION
The subscriber replication can be instructed to lag behind the
publisher side changes by specifying the
<literal>min_apply_delay</literal> subscription parameter. See XXX for
details.

======

doc/src/sgml/ref/create_subscription.sgml

3. min_apply_delay

+         <para>
+          By default, subscriber applies changes as soon as possible. As with
+          the physical replication feature
+          (<xref linkend="guc-recovery-min-apply-delay"/>), it can be useful to
+          have a time-delayed logical replica. This parameter allows you to
+          delay the application of changes by a specified amount of time. If
+          this value is specified without units, it is taken as milliseconds.
+          The default is zero, adding no delay.
+         </para>

"subscriber applies" -> "the subscriber applies"

"allows you" -> "lets the user"

"The default is zero, adding no delay." -> "The default is zero (no delay)."

~

4.

+          larger than the time deviations between servers. Note that
+          in the case when this parameter is set to a long value, the
+          replication may not continue if the replication slot falls behind the
+          current LSN by more than <literal>max_slot_wal_keep_size</literal>.
+          See more details in <xref linkend="guc-max-slot-wal-keep-size"/>.
+         </para>

4a.
SUGGESTION
Note that if this parameter is set to a long delay, the replication
will stop if the replication slot falls behind the current LSN by more
than <literal>max_slot_wal_keep_size</literal>.

~

4b.
When it is rendered (like below) it looks a bit repetitive:
... if the replication slot falls behind the current LSN by more than
max_slot_wal_keep_size. See more details in max_slot_wal_keep_size.

~

IMO the previous sentence should include the link.

SUGGESTION
if the replication slot falls behind the current LSN by more than
<link linkend =
"guc-max-slot-wal-keep-size"><literal>max_slot_wal_keep_size</literal></link>.

~

5.

+           <para>
+            Synchronous replication is affected by this setting when
+            <varname>synchronous_commit</varname> is set to
+            <literal>remote_write</literal>; every <literal>COMMIT</literal>
+            will need to wait to be applied.
+           </para>

Yes, this deserves a big warning -- but I am just not quite sure of
the details. I think this impacts more than just "remote_rewrite" --
e.g. the same problem would happen if "synchronous_standby_names" is
non-empty.

I think this warning needs to be more generic to cover everything.
Maybe something like below

SUGGESTION:
Delaying the replication can mean there is a much longer time between
making a change on the publisher, and that change being committed on
the subscriber. This can have a big impact on synchronous replication.
See https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT


======

src/backend/commands/subscriptioncmds.c

6. parse_subscription_options

+ ms = interval_to_ms(interval);
+ if (ms < 0 || ms > INT_MAX)
+ ereport(ERROR,
+ errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("%lld ms is outside the valid range for option \"%s\"",
+    (long long) ms, "min_apply_delay"));

"for option" -> "for parameter"

======

src/backend/replication/logical/worker.c

7. apply_delay

+static void
+apply_delay(TimestampTz ts)

IMO having a delay is not the usual case. So, would a better name for
this function be 'maybe_delay'?

~

8.

+ * high value for the delay. This design is different from the physical
+ * replication (that applies the delay at commit time) mainly because write
+ * operations may allow some issues (such as bloat and locks) that can be
+ * minimized if it does not keep the transaction open for such a long time.

Something seems not quite right with this wording -- is there a better
way of describing this?

~

9.

+ /*
+ * Delay apply until all tablesync workers have reached READY state. If we
+ * allow the delay during the catchup phase, once we reach the limit of
+ * tablesync workers, it will impose a delay for each subsequent worker.
+ * It means it will take a long time to finish the initial table
+ * synchronization.
+ */
+ if (!AllTablesyncsReady())
+ return;

"Delay apply until..." -> "The min_apply_delay parameter is ignored until..."

~

10.

+ /*
+ * The worker may be waken because of the ALTER SUBSCRIPTION ...
+ * DISABLE, so the catalog pg_subscription should be read again.
+ */
+ if (!in_remote_transaction && !in_streamed_transaction)
+ {
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+ }

"waken" -> "woken"

======

src/bin/psql/describe.c

11. describeSubscriptions

+ /* Origin and min_apply_delay are only supported in v16 and higher */
  if (pset.sversion >= 160000)
  appendPQExpBuffer(&buf,
-   ", suborigin AS \"%s\"\n",
-   gettext_noop("Origin"));
+   ", suborigin AS \"%s\"\n"
+   ", subapplydelay AS \"%s\"\n",
+   gettext_noop("Origin"),
+   gettext_noop("Apply delay"));

IIUC the psql command is supposed to display useful information to the
user, so I wondered if it is worthwhile to put the units in this
column header -- "Apply delay (ms)" instead of just "Apply delay"
because that would make it far easier to understand the meaning
without having to check the documentation to discover the units.

======

src/include/utils/timestamp.h

12.

+extern int64 interval_to_ms(const Interval *interval);
+

For consistency with the other interval conversion functions exposed
here maybe this one should have been called 'interval2ms'

======

src/test/subscription/t/032_apply_delay.pl

13.

IIUC this test is checking if a delay has occurred by inspecting the
debug logs to see if a certain code path including "logical
replication apply delay" is logged. I guess that is OK, but another
way might be to compare the actual timing values of the published and
replicated rows.

The publisher table can have a column with default now() and the
subscriber side table can have an *additional* column also with
default now(). After replication, those two timestamp values can be
compared to check if the difference exceeds the min_time_delay
parameter specified.

------

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Next
From: Michael Paquier
Date:
Subject: Re: Generate pg_stat_get_* functions with Macros