Re: New docs chapter on Transaction Management and related changes - Mailing list pgsql-hackers

On Sat, 2022-11-05 at 10:08 +0000, Simon Riggs wrote:
> Agreed; new compilation patch attached, including mine and then
> Robert's suggested rewordings.

Thanks.  There is clearly a lot of usefule information in this.

Some comments:

> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -24673,7 +24673,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
>         <para>
>          Returns the current transaction's ID.  It will assign a new one if the
>          current transaction does not have one already (because it has not
> -        performed any database updates).
> +        performed any database updates);  see <xref
> +        linkend="transaction-id"/> for details.  If executed in a
> +        subtransaction this will return the top-level xid;  see <xref
> +        linkend="subxacts"/> for details.
>         </para></entry>
>        </row>

I would use a comma after "subtransaction", and I think it would be better to write
"transaction ID" instead of "xid".

> @@ -24690,6 +24693,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
>          ID is assigned yet.  (It's best to use this variant if the transaction
>          might otherwise be read-only, to avoid unnecessary consumption of an
>          XID.)
> +        If executed in a subtransaction this will return the top-level xid.
>         </para></entry>
>        </row>

Same as above.

> @@ -24733,6 +24737,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
>         <para>
>          Returns a current <firstterm>snapshot</firstterm>, a data structure
>          showing which transaction IDs are now in-progress.
> +        Only top-level xids are included in the snapshot; subxids are not
> +        shown;  see <xref linkend="subxacts"/> for details.
>         </para></entry>
>        </row>

Again, I would avoid "xid" and "subxid", or at least use "transaction ID (xid)"
and similar.

> --- a/doc/src/sgml/ref/release_savepoint.sgml
> +++ b/doc/src/sgml/ref/release_savepoint.sgml
> @@ -34,23 +34,16 @@ RELEASE [ SAVEPOINT ] <replaceable>savepoint_name</replaceable>
>    <title>Description</title>
>  
>    <para>
> -   <command>RELEASE SAVEPOINT</command> destroys a savepoint previously defined
> -   in the current transaction.
> +   <command>RELEASE SAVEPOINT</command> will subcommit the subtransaction
> +   established by the named savepoint, if one exists. This will release
> +   any resources held by the subtransaction. If there were any
> +   subtransactions of the named savepoint, these will also be subcommitted.
>    </para>
> 
>    <para>

"Subtransactions of the named savepoint" is somewhat confusing; how about
"subtransactions of the subtransaction established by the named savepoint"?

If that is too long and explicit, perhaps "subtransactions of that subtransaction".

> @@ -78,7 +71,7 @@ RELEASE [ SAVEPOINT ] <replaceable>savepoint_name</replaceable>
> 
>    <para>
>     It is not possible to release a savepoint when the transaction is in
> -   an aborted state.
> +   an aborted state, to do that use <xref linkend="sql-rollback-to"/>.
>    </para>
> 
>    <para>

I think the following is more English:
"It is not possible ... state; to do that, use <xref .../>."

> --- a/doc/src/sgml/ref/rollback.sgml
> +++ b/doc/src/sgml/ref/rollback.sgml
> @@ -56,11 +56,14 @@ ROLLBACK [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
>      <term><literal>AND CHAIN</literal></term>
>      <listitem>
>       <para>
> -      If <literal>AND CHAIN</literal> is specified, a new transaction is
> +      If <literal>AND CHAIN</literal> is specified, a new unaborted transaction is
>        immediately started with the same transaction characteristics (see <xref
>        linkend="sql-set-transaction"/>) as the just finished one.  Otherwise,
>        no new transaction is started.

I don't think that is an improvement.  "Unaborted" is an un-word.  A new transaction
is always "unaborted", isn't it?

> --- a/doc/src/sgml/wal.sgml
> +++ b/doc/src/sgml/wal.sgml
> @@ -909,4 +910,36 @@
>     seem to be a problem in practice.
>    </para>
>   </sect1>
> +
> + <sect1 id="two-phase">
> +
> +  <title>Two-Phase Transactions</title>
> +
> +  <para>
> +   <productname>PostgreSQL</productname> supports a two-phase commit (2PC)
[...]
> +   <filename>pg_twophase</filename> directory. Currently-prepared
> +   transactions can be inspected using <link
> +   linkend="view-pg-prepared-xacts"><structname>pg_prepared_xacts</structname></link>.
> +  </para>
> + </sect1>
> +
>  </chapter>

I don't like "currently-prepared".  How about:
"Transaction that are currently prepared can be inspected..."

This is clearly interesting information, but I don't think the WAL chapter is the right
place for this.  "pg_twophase" is already mentioned in "storage.sgml", and details about
when exactly a prepared transaction is persisted may exceed the details level needed by
the end user.

I'd look for that information in the reference page for PREPARE TRANSACTION; perhaps
that would be a better place.  Or, even better, the new "xact.sgml" chapter.

> --- /dev/null
> +++ b/doc/src/sgml/xact.sgml

+  <title>Transaction Management</title>

+   The word transaction is often abbreviated as "xact".

Should use <quote> here.

> +   <title>Transactions and Identifiers</title>

> +   <para>
> +    Once a transaction writes to the database, it is assigned a
> +    non-virtual <literal>TransactionId</literal> (or <type>xid</type>),
> +    e.g., <literal>278394</literal>. Xids are assigned sequentially
> +    using a global counter used by all databases within the
> +    <productname>PostgreSQL</productname> cluster. This property is used by
> +    the transaction system to order transactions by their first database
> +    write, i.e., lower-numbered xids started writing before higher-numbered
> +    xids.  Of course, transactions might start in a different order.
> +   </para>

"This property"?  How about:
"Because transaction IDs are assigned sequentially, the transaction system can
use them to order transactions by their first database write"

I would want some additional information here: why does the transaction system have
to order transactions by their first database write?

"Of course, transactions might start in a different order."

Now that confuses me.  Are you saying that BEGIN could be in a different order
than the first database write?  Perhaps like this:

"Note that the order in which transactions perform their first database write
might be different from the order in which the transactions started."

> +    The internal transaction ID type <type>xid</type> is 32-bits wide

There should be no hyphen in "32 bits wide", just as in "3 years old".

> +                            A 32-bit epoch is incremented during each
> +    wrap around.

We usually call this "wraparound" without a space.

> +                 There is also a 64-bit type <type>xid8</type> which
> +    includes this epoch and therefore does not wrap around during the
> +    life of an installation and can be converted to xid by casting.

Running "and"s.  Better:

"There is also ... and does not wrap ... life of an installation.
 <type>xid8</type> can be converted to <type>xid</type> by casting."

> +                                      Xids are used as the
> +    basis for <productname>PostgreSQL</productname>'s <link
> +    linkend="mvcc">MVCC</link> concurrency mechanism, <link
> +    linkend="hot-standby">Hot Standby</link>, and Read Replica servers.

What is the difference between a hot standby and a read replica?  I think
one of these terms is sufficient.

> +    In addition to <literal>vxid</literal> and <type>xid</type>,
> +    when a transaction is prepared for two-phase commit it
> +    is also identified by a Global Transaction Identifier
> +    (<acronym>GID</acronym>).

Better:

"In addition to <literal>vxid</literal> and <type>xid</type>,
 prepared transactions also have a Global Transaction Identifier
 (<acronym>GID</acronym>) that is assigned when the transaction is
 prepared for two-phase commit."

> +  <sect1 id="xact-locking">
> +
> +   <title>Transactions and Locking</title>
> +
> +   <para>
> +    Currently-executing transactions are shown in <link
> +    linkend="view-pg-locks"><structname>pg_locks</structname></link>
> +    in columns <structfield>virtualxid</structfield> and
> +    <structfield>transactionid</structfield>.

Better:

"The transaction IDs of currently executing transactions are shown in <link
 linkend="view-pg-locks"><structname>pg_locks</structname></link>
 in the columns <structfield>virtualxid</structfield> and
 <structfield>transactionid</structfield>."

> +    Lock waits on table-level locks are shown waiting for
> +    <structfield>virtualxid</structfield>, while lock waits on row-level
> +    locks are shown waiting for <structfield>transactionid</structfield>.

That's not true.  Transactions waiting for table-level locks are shown
waiting for a "relation" lock in both "pg_stat_activity" and "pg_locks".

> +    Row-level read and write locks are recorded directly in locked
> +    rows and can be inspected using the <xref linkend="pgrowlocks"/>
> +    extension.  Row-level read locks might also require the assignment
> +    of multixact IDs (<literal>mxid</literal>). Mxids are recorded in
> +    the <filename>pg_multixact</filename> directory.

"are recorded directly in *the* locked rows"

I think the mention of multixacts should link to
<xref linkend="vacuum-for-multixact-wraparound"/>.  Again, I would not
specifically mention the directory, since it is already described in
"storage.sgml", but I have no strong optinion there.

> +  <sect1 id="subxacts">
> +
> +   <title>Subtransactions</title>

> +    The word subtransaction is often abbreviated as
> +    <literal>subxact</literal>.

I'd use <quote>, not <literal>.

> +    If a subtransaction is assigned a non-virtual transaction ID,
> +    its transaction ID is referred to as a <literal>subxid</literal>.

Again, I would use <quote>, since we don't <literal> "subxid"
elsewhere.

+                                                                   Up to
+    64 open subxids are cached in shared memory for each backend; after
+    that point, the overhead increases significantly since we must look
+    up subxid entries in <filename>pg_subtrans</filename>.

Comma before "since".  Perhaps you should mention that this means disk I/O.

Yours,
Laurenz Albe



pgsql-hackers by date:

Previous
From: Jan Wieck
Date:
Subject: Re: PL/pgSQL cursors should get generated portal names by default
Next
From: Thomas Munro
Date:
Subject: Re: Improve logging when using Huge Pages