Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Time delayed LR (WAS Re: logical replication restrictions) |
Date | |
Msg-id | CAHut+PsCF_a4Mpcdo5bcE-4YuYqzR2Kzj_v=3CFROTP2vnPECA@mail.gmail.com Whole thread Raw |
In response to | Re: Time delayed LR (WAS Re: logical replication restrictions) ("Takamichi Osumi (Fujitsu)" <osumi.takamichi@fujitsu.com>) |
Responses |
Re: Time delayed LR (WAS Re: logical replication restrictions)
RE: Time delayed LR (WAS Re: logical replication restrictions) |
List | pgsql-hackers |
Here are some review comments for patch v9-0001: ====== GENERAL 1. min_ prefix? What's the significance of the "min_" prefix for this parameter? I'm guessing the background is that at one time it was considered to be a GUC so took a name similar to GUC recovery_min_apply_delay (??) But in practice, I think it is meaningless and/or misleading. For example, suppose the user wants to defer replication by 1hr. IMO it is more natural to just say "defer replication by 1 hr" (aka apply_delay='1hr') Clearly it means replication will take place about 1 hr into the future. OTHO saying "defer replication by a MINIMUM of 1 hr" (aka min_apply_delay='1hr') is quite vague because then it is equally valid if the replication gets delayed by 1 hr or 2 hrs or 5 days or 3 weeks since all of those satisfy the minimum delay. The implementation could hardwire a delay of INT_MAX ms but clearly, that's not really what the user would expect. ~ So, I think this parameter should be renamed just as 'apply_delay'. But, if you still decide to keep it as 'min_apply_delay' then there is a lot of other code that ought to be changed to be consistent with that name. e.g. - subapplydelay in catalogs.sgml --> subminapplydelay - subapplydelay in system_views.sql --> subminapplydelay - subapplydelay in pg_subscription.h --> subminapplydelay - subapplydelay in dump.h --> subminapplydelay - i_subapplydelay in pg_dump.c --> i_subminapplydelay - applydelay member name of Form_pg_subscription --> minapplydelay - "Apply Delay" for the column name displayed by describe.c --> "Min apply delay" - more... (IMO the fact that so much code does not currently say 'min' at all is just evidence that the 'min' prefix really didn't really mean much in the first place) ====== doc/src/sgml/catalogs.sgml 2. Section 31.2 Subscription + <para> + Time delayed replica of subscription is available by indicating + <literal>min_apply_delay</literal>. See + <xref linkend="sql-createsubscription"/> for details. + </para> How about saying like: SUGGESTION The subscriber replication can be instructed to lag behind the publisher side changes by specifying the <literal>min_apply_delay</literal> subscription parameter. See XXX for details. ====== doc/src/sgml/ref/create_subscription.sgml 3. min_apply_delay + <para> + By default, subscriber applies changes as soon as possible. As with + the physical replication feature + (<xref linkend="guc-recovery-min-apply-delay"/>), it can be useful to + have a time-delayed logical replica. This parameter allows you to + delay the application of changes by a specified amount of time. If + this value is specified without units, it is taken as milliseconds. + The default is zero, adding no delay. + </para> "subscriber applies" -> "the subscriber applies" "allows you" -> "lets the user" "The default is zero, adding no delay." -> "The default is zero (no delay)." ~ 4. + larger than the time deviations between servers. Note that + in the case when this parameter is set to a long value, the + replication may not continue if the replication slot falls behind the + current LSN by more than <literal>max_slot_wal_keep_size</literal>. + See more details in <xref linkend="guc-max-slot-wal-keep-size"/>. + </para> 4a. SUGGESTION Note that if this parameter is set to a long delay, the replication will stop if the replication slot falls behind the current LSN by more than <literal>max_slot_wal_keep_size</literal>. ~ 4b. When it is rendered (like below) it looks a bit repetitive: ... if the replication slot falls behind the current LSN by more than max_slot_wal_keep_size. See more details in max_slot_wal_keep_size. ~ IMO the previous sentence should include the link. SUGGESTION if the replication slot falls behind the current LSN by more than <link linkend = "guc-max-slot-wal-keep-size"><literal>max_slot_wal_keep_size</literal></link>. ~ 5. + <para> + Synchronous replication is affected by this setting when + <varname>synchronous_commit</varname> is set to + <literal>remote_write</literal>; every <literal>COMMIT</literal> + will need to wait to be applied. + </para> Yes, this deserves a big warning -- but I am just not quite sure of the details. I think this impacts more than just "remote_rewrite" -- e.g. the same problem would happen if "synchronous_standby_names" is non-empty. I think this warning needs to be more generic to cover everything. Maybe something like below SUGGESTION: Delaying the replication can mean there is a much longer time between making a change on the publisher, and that change being committed on the subscriber. This can have a big impact on synchronous replication. See https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT ====== src/backend/commands/subscriptioncmds.c 6. parse_subscription_options + ms = interval_to_ms(interval); + if (ms < 0 || ms > INT_MAX) + ereport(ERROR, + errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("%lld ms is outside the valid range for option \"%s\"", + (long long) ms, "min_apply_delay")); "for option" -> "for parameter" ====== src/backend/replication/logical/worker.c 7. apply_delay +static void +apply_delay(TimestampTz ts) IMO having a delay is not the usual case. So, would a better name for this function be 'maybe_delay'? ~ 8. + * high value for the delay. This design is different from the physical + * replication (that applies the delay at commit time) mainly because write + * operations may allow some issues (such as bloat and locks) that can be + * minimized if it does not keep the transaction open for such a long time. Something seems not quite right with this wording -- is there a better way of describing this? ~ 9. + /* + * Delay apply until all tablesync workers have reached READY state. If we + * allow the delay during the catchup phase, once we reach the limit of + * tablesync workers, it will impose a delay for each subsequent worker. + * It means it will take a long time to finish the initial table + * synchronization. + */ + if (!AllTablesyncsReady()) + return; "Delay apply until..." -> "The min_apply_delay parameter is ignored until..." ~ 10. + /* + * The worker may be waken because of the ALTER SUBSCRIPTION ... + * DISABLE, so the catalog pg_subscription should be read again. + */ + if (!in_remote_transaction && !in_streamed_transaction) + { + AcceptInvalidationMessages(); + maybe_reread_subscription(); + } + } "waken" -> "woken" ====== src/bin/psql/describe.c 11. describeSubscriptions + /* Origin and min_apply_delay are only supported in v16 and higher */ if (pset.sversion >= 160000) appendPQExpBuffer(&buf, - ", suborigin AS \"%s\"\n", - gettext_noop("Origin")); + ", suborigin AS \"%s\"\n" + ", subapplydelay AS \"%s\"\n", + gettext_noop("Origin"), + gettext_noop("Apply delay")); IIUC the psql command is supposed to display useful information to the user, so I wondered if it is worthwhile to put the units in this column header -- "Apply delay (ms)" instead of just "Apply delay" because that would make it far easier to understand the meaning without having to check the documentation to discover the units. ====== src/include/utils/timestamp.h 12. +extern int64 interval_to_ms(const Interval *interval); + For consistency with the other interval conversion functions exposed here maybe this one should have been called 'interval2ms' ====== src/test/subscription/t/032_apply_delay.pl 13. IIUC this test is checking if a delay has occurred by inspecting the debug logs to see if a certain code path including "logical replication apply delay" is logged. I guess that is OK, but another way might be to compare the actual timing values of the published and replicated rows. The publisher table can have a column with default now() and the subscriber side table can have an *additional* column also with default now(). After replication, those two timestamp values can be compared to check if the difference exceeds the min_time_delay parameter specified. ------ Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: