Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers

From Peter Smith
Subject Re: [HACKERS] logical decoding of two-phase transactions
Date
Msg-id CAHut+PsCTC7ebm_MaNNhGFpep=Q7VxAdPSZb0ZOUCtfG-+Jy_Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] logical decoding of two-phase transactions  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, May 31, 2021 at 9:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, May 28, 2021 at 11:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > One minor comment for 0001.
> > * Special case: if when tables were specified but copy_data is
> > + * false then it is safe to enable two_phase up-front because
> > + * those tables are already initially READY state. Note, if
> > + * the subscription has no tables then enablement cannot be
> > + * done here - we must leave the twophase state as PENDING, to
> > + * allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to work.
> >
> > Can we slightly modify this comment as: "Note that if tables were
> > specified but copy_data is false then it is safe to enable two_phase
> > up-front because those tables are already initially READY state. When
> > the subscription has no tables, we leave the twophase state as
> > PENDING, to allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to work."
> >
> > Also, I don't see any test after you enable this special case. Is it
> > covered by existing tests, if not then let's try to add a test for
> > this?
> >
>
> I see that Ajin's latest patch has addressed the other comments except
> for the above test case suggestion.

Yes, this is a known pending task.

> I have again reviewed the first
> patch and have some comments.
>
> Comments on v81-0001-Add-support-for-prepared-transactions-to-built-i
> ============================================================================
> 1.
> <para>
>         The logical replication solution that builds distributed two
> phase commit
>         using this feature can deadlock if the prepared transaction has locked
> -       [user] catalog tables exclusively. They need to inform users to not have
> -       locks on catalog tables (via explicit <command>LOCK</command>
> command) in
> -       such transactions.
> +       [user] catalog tables exclusively. To avoid this users must refrain from
> +       having locks on catalog tables (via explicit
> <command>LOCK</command> command)
> +       in such transactions.
>        </para>
>
> This change doesn't belong to this patch. I see the proposed text
> could be considered as an improvement but still we can do this
> separately. We are already trying to improve things in this regard in
> the thread [1], so you can propose this change there.
>

OK. This change has been removed in v82, and a patch posted to other
thread here [1]

> 2.
> +<varlistentry>
> +<term>Byte1('K')</term>
> +<listitem><para>
> +                Identifies the message as the commit of a two-phase
> transaction message.
> +</para></listitem>
> +</varlistentry>
> +
> +<varlistentry>
> +<term>Int8</term>
> +<listitem><para>
> +                Flags; currently unused (must be 0).
> +</para></listitem>
> +</varlistentry>
> +
> +<varlistentry>
> +<term>Int64</term>
> +<listitem><para>
> +                The LSN of the commit.
> +</para></listitem>
> +</varlistentry>
> +
> +<varlistentry>
> +<term>Int64</term>
> +<listitem><para>
> +                The end LSN of the commit transaction.
> +</para></listitem>
> +</varlistentry>
>
> Can we change the description of LSN's as "The LSN of the commit
> prepared." and "The end LSN of the commit prepared transaction."
> respectively? This will make their description different from regular
> commit and I think that defines them better.
>
> 3.
> +<varlistentry>
> +<term>Int64</term>
> +<listitem><para>
> +                The end LSN of the rollback transaction.
> +</para></listitem>
> +</varlistentry>
>
> Similar to above, can we change the description here as: "The end LSN
> of the rollback prepared transaction."?
>
> 4.
> + * The exception to this restriction is when copy_data =
> + * false, because when copy_data is false the tablesync will
> + * start already in READY state and will exit directly without
> + * doing anything which could interfere with the apply
> + * worker's message handling.
> + *
> + * For more details see comments atop worker.c.
> + */
> + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && copy_data)
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("ALTER SUBSCRIPTION ... REFRESH with copy_data is not allowed
> when two_phase is enabled"),
> + errhint("Use ALTER SUBSCRIPTION ... REFRESH with copy_data = false"
> + ", or use DROP/CREATE SUBSCRIPTION.")));
>
> The above comment is a bit unclear because it seems you are saying
> there is some problem even when copy_data is false. Are you missing
> 'not' after 'could' in the comment?
>
> 5.
>  XXX Now, this can even lead to a deadlock if the prepare
>   * transaction is waiting to get it logically replicated for
> - * distributed 2PC. Currently, we don't have an in-core
> - * implementation of prepares for distributed 2PC but some
> - * out-of-core logical replication solution can have such an
> - * implementation. They need to inform users to not have locks
> - * on catalog tables in such transactions.
> + * distributed 2PC. This can be avoided by disallowing to
> + * prepare transactions that have locked [user] catalog tables
> + * exclusively.
>
> Can we slightly modify this part of the comment as: "This can be
> avoided by disallowing to prepare transactions that have locked [user]
> catalog tables exclusively but as of now we ask users not to do such
> operation"?
>
> 6.
> +AllTablesyncsReady(void)
> +{
> + bool found_busy = false;
> + bool started_tx = false;
> + bool has_subrels = false;
> +
> + /* We need up-to-date sync state info for subscription tables here. */
> + has_subrels = FetchTableStates(&started_tx);
> +
> + found_busy = list_length(table_states_not_ready) > 0;
> +
> + if (started_tx)
> + {
> + CommitTransactionCommand();
> + pgstat_report_stat(false);
> + }
> +
> + /*
> + * When there are no tables, then return false.
> + * When no tablesyncs are busy, then all are READY
> + */
> + return has_subrels && !found_busy;
> +}
>
> Do we really need found_busy variable in above function. Can't we
> change the return as (has_subrels) && (table_states_not_ready != NIL)?
> If so, then change the comments above return.
>
> 7.
> +/*
> + * Common code to fetch the up-to-date sync state info into the static lists.
> + *
> + * Returns true if subscription has 1 or more tables, else false.
> + */
> +static bool
> +FetchTableStates(bool *started_tx)
>
> Can we update comments indicating that if this function starts the
> transaction then the caller is responsible to commit it?
>
> 8.
> (errmsg("logical replication apply worker for subscription \"%s\" will
> restart so two_phase can be enabled",
> + MySubscription->name)));
>
> Can we slightly change the message as: "logical replication apply
> worker for subscription \"%s\" will restart so that two_phase can be
> enabled"?
>
> 9.
> +void
> +UpdateTwoPhaseState(Oid suboid, char new_state)
> {
> ..
> + /* And update/set two_phase ENABLED */
> + values[Anum_pg_subscription_subtwophasestate - 1] = CharGetDatum(new_state);
> + replaces[Anum_pg_subscription_subtwophasestate - 1] = true;
> ..
> }
>
> The above comment seems wrong to me as we are updating the state as
> passed by the caller.
>

All the above reported issues 2-9 are addressed in the latest 2PC patch set v82

------
[1] https://www.postgresql.org/message-id/CAHut%2BPuTjTp_WERO%3D3Ybp8snTgDpiZeNaxzZhN8ky8XMo4KFVQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: CALL versus procedures with output-only arguments
Next
From: Tom Lane
Date:
Subject: Re: CALL versus procedures with output-only arguments