Re: Document efficient self-joins / UPDATE LIMIT techniques. - Mailing list pgsql-hackers

From Laurenz Albe
Subject Re: Document efficient self-joins / UPDATE LIMIT techniques.
Date
Msg-id 35b96eb9034f9ab336b273b54e8818ed334e47b0.camel@cybertec.at
Whole thread Raw
In response to Re: Document efficient self-joins / UPDATE LIMIT techniques.  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: Document efficient self-joins / UPDATE LIMIT techniques.
List pgsql-hackers
On Sat, 2024-02-03 at 15:27 -0500, Corey Huinker wrote:
>
> Here's another attempt, applying Laurenz's feedback:

I like this patch much better.

Some comments:

> --- a/doc/src/sgml/ref/delete.sgml
> +++ b/doc/src/sgml/ref/delete.sgml
> @@ -234,6 +234,24 @@ DELETE FROM films
>     In some cases the join style is easier to write or faster to
>     execute than the sub-select style.
>    </para>
> +  <para id="delete-limit">
> +   While there is no <literal>LIMIT</literal> clause for
> +   <command>DELETE</command>, it is possible to get a similar effect
> +   using the method for <command>UPDATE</command> operations described
> +   <link linkend="update-limit">in greater detail here</link>.
> +<programlisting>
> +WITH delete_batch AS (
> +  SELECT l.ctid
> +  FROM user_logs AS l
> +  WHERE l.status = 'archived'
> +  ORDER BY l.creation_date
> +  LIMIT 10000
> +  FOR UPDATE
> +)
> +DELETE FROM user_logs AS ul
> +USING delete_branch AS del
> +WHERE ul.ctid = del.ctid;
> +</programlisting></para>
>   </refsect1>
>
>   <refsect1>

- About the style: there is usually an empty line between an ending </para>
  and the next starting <para>.  It does not matter for correctness, but I
  think it makes the source easier to read.

- I would rather have only "here" as link text rather than "in greater details
  here".  Even better would be something that gives the reader a clue where
  the link will take her, like
  <link linkend="update-limit">the documentation of <command>UPDATE</command></link>.

- I am not sure if it is necessary to have the <programlisting> at all.
  I'd say that it is just a trivial variation of the UPDATE example.
  On the other hand, a beginner might find the example useful.
  Not sure.

If I had my way, I'd just keep the first paragraph, something like

  <para id="delete-limit">
   While there is no <literal>LIMIT</literal> clause for
   <command>DELETE</command>, it is possible to get a similar effect
   using a self-join with a common table expression as described in the
   <link linkend="update-limit"><command>UPDATE</command> examples</link>.
  </para>


> diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
> index 2ab24b0523..49e0dc29de 100644
> --- a/doc/src/sgml/ref/update.sgml
> +++ b/doc/src/sgml/ref/update.sgml
> @@ -434,7 +434,6 @@ UPDATE wines SET stock = stock + 24 WHERE winename = 'Chateau Lafite 2003';
>  COMMIT;
>  </programlisting>
>    </para>
> -
>    <para>
>     Change the <structfield>kind</structfield> column of the table
>     <structname>films</structname> in the row on which the cursor

Please don't.


I'm mostly fine with the UPDATE example.

> +   it can make sense to perform the operation in smaller batches. Performing a
> +   <command>VACUUM</command> operation on the table in between batches can help
> +   reduce table bloat. The

I think the "in" before between is unnecessary and had better be removed, but
I'll defer to the native speaker.

Yours,
Laurenz Albe



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: cfbot is failing all tests on FreeBSD/Meson builds
Next
From: Nathan Bossart
Date:
Subject: Re: glibc qsort() vulnerability