Re: [PATCH] More docs on what to do and not do in extension code - Mailing list pgsql-hackers

From Laurenz Albe
Subject Re: [PATCH] More docs on what to do and not do in extension code
Date
Msg-id 07de8fed255f31aa2dfda8efbe839df463f143c6.camel@cybertec.at
Whole thread Raw
In response to [PATCH] More docs on what to do and not do in extension code  (Craig Ringer <craig.ringer@enterprisedb.com>)
Responses Re: [PATCH] More docs on what to do and not do in extension code
List pgsql-hackers
On Mon, 2021-01-18 at 15:56 +0800, Craig Ringer wrote:
> The attached patch expands the xfunc docs and bgworker docs a little, providing a starting point for developers
>  to learn how to do some common tasks the Postgres Way.

I like these changes!

Here is a review:

+      <para>
+       See <xref linkend="xfunc-shared-addin"/> for information on how to
+       request extension shared memory allocations, <literal>LWLock</literal>s,
+       etc.
+      </para>

This doesn't sound very English to me, and I don't see the point in
repeating parts of the enumeration.  How about

  See ... for detailed information how to allocate these resources.

+  <para>
+   If a background worker needs to sleep or wait for activity it should

Missing comma after "activity".

+   always use <link linkend="xfunc-sleeping-interrupts-blocking">PostgreSQL's
+   interupt-aware APIs</link> for the purpose. Do not <function>usleep()</function>,
+   <function>system()</function>, make blocking system calls, etc.
+  </para>

"system" is not a verb.  Suggestion:

  Do not use <function>usleep()</function>, <function>system()</function>
  or other blocking system calls.

+  <para>
+   The <filename>src/test/modules/worker_spi</filename> and
+   <filename>src/test/modules/test_shm_mq</filename> contain working examples
+   that demonstrates some useful techniques.
   </para>

That is missing a noun in my opinion, I would prefer:

  The modules ... contain working examples ...

+  <sect1 id="bgworker-signals" xreflabel="Signal Handling in Background Workers">
+   <title>Signal Handling in Background Workers</title>

It is not a good idea to start a section in the middle of a documentation page.
That will lead to a weird TOC entry at the top of the page.

The better way to do this is to write a short introductory header and convert
most of the first half of the page into another section, so that we end up
with a page the has the introductory material and two TOC entries for the details.

+    The default signal handlers installed for background workers <emphasis>do
+    not interrupt queries or blocking calls into other postgres code</emphasis>

<productname>PostgreSQL</productname>, not "postgres".
Also, there is a comma missing at the end of the line.

+    so they are only suitable for simple background workers that frequently and
+    predictably return control to their main loop. If your worker uses the
+    default background worker signal handling it should call

Another missing comma after "handling".

+    <function>HandleMainLoopInterrupts()</function> on each pass through its
+    main loop.
+   </para>
+
+   <para>
+    Background workers that may make blocking calls into core PostgreSQL code
+    and/or run user-supplied queries should generally replace the default bgworker

Please stick with "background worker", "bgworker" is too sloppy IMO.

+    signal handlers with the handlers used for normal user backends. This will
+    ensure that the worker will respond in a timely manner to a termination
+    request, query cancel request, recovery conflict interrupt, deadlock detector
+    request, etc. To install these handlers, before unblocking interrupts
+    run:

The following would be more grammatical:

  To install these handlers, run the following before unblocking interrupts:

+    Then ensure that your main loop and any other code that could run for some
+    time contains <function>CHECK_FOR_INTERRUPTS();</function> calls. A
+    background worker using these signal handlers must use <link
+    linkend="xfunc-resource-management">PostgreSQL's resource management APIs
+    and callbacks</link> to handle cleanup rather than relying on control
+    returning to the main loop because the signal handlers may call

There should be a comma before "because".

+    <function>proc_exit()</function> directly. This is recommended practice
+    for all types of extension code anyway.
+   </para>
+
+   <para>
+    See the comments in <filename>src/include/miscadmin.h</filename> in the
+    postgres headers for more details on signal handling.
+   </para>

"in the postgres headers" is redundant - at any rate, it should be "PostgreSQL".

+        Do not attempt to use C++ exceptions or Windows Structured Exception
+        Handling, and never call <function>exit()</function> directly.

I am alright with this addition, but I think it would be good to link to
<xref linkend="extend-cpp"/> from it.

+      <listitem id="xfunc-threading">
+       <para>
+        Individual PostgreSQL backends are single-threaded.
+        Never call any PostgreSQL function or access any PostgreSQL-managed data
+        structure from a thread other than the main

"PostgreSQL" should always have the <productname> tag.
This applies to a lot of places in this patch.

+        thread. If at all possible your extension should not start any threads

Comma after "possible".

+        and should only use the main thread. PostgreSQL generally uses subprocesses

Hm.  If the extension does not start threads, it automatically uses the main thread.
I think that should be removed for clarity.

+        that coordinate over shared memory instead of threads - see
+        <xref linkend="bgworker"/>.

It also uses signals and light-weight locks - but I think that you don't need to
describe the coordination mechanisms here, which are explained in the link you added.

+        primitives like <function>WaitEventSetWait</function> where necessary. Any
+        potentially long-running loop should periodically call <function>
+        CHECK_FOR_INTERRUPTS()</function> to give PostgreSQL a chance to interrupt
+        the function in case of a shutdown request, query cancel, etc. This means

Are there other causes than shutdown or query cancellation?
At any rate, I am not fond of enumerations ending with "etc".

+        you should <function>sleep()</function> or <function>usleep()</function>

You mean: "you should *not* use sleep()"

+        for any nontrivial amount of time - use <function>WaitLatch()</function>

"—" would be better than "-".

+        or its variants instead. For details on interrupt handling see

Comma after "handling".

[...]
+        based cleanup. Your extension function could be terminated mid-execution

... could be terminated *in* mid-execution ...

+        by PostgreSQL if any function that it calls makes a
+        <function>CHECK_FOR_INTERRUPTS()</function> check. It may not

"makes" sound kind of clumsy in my ears.

+        Spinlocks, latches, condition variables, and more. Details on all of these
+        is far outside the scope of this document, and the best reference is
+        the relevant source code.

I don't think we have to add that last sentence.  That holds for pretty much
everything in this documentation.

+       <para>
+        Sometimes relation-based state management for extensions is not
+        sufficient to meet their needs. In that case the extension may need to

Better:
  Sometimes, relation-based state management is not sufficient to meet the
  needs of an extension.

+        use PostgreSQL's shared-memory based inter-process communication
+        features, and should certainly do so instead of inventing its own or
+        trying to use platform level features. An extension may use
+        <link linkend="xfunc-shared-addin">"raw" shared memory requested from
+        the postmaster at startup</link> or higher level features like dynamic
+        shared memory segments (<acronym>DSM</acronym>),
+        dynamic shared areas (<acronym>DSA</acronym>),
+        or shared memory message queues (<acronym>shm_mq</acronym>). Examples
+        of the use of some these features can be found in the
+        <filename>src/test/modules/test_shm_mq/</filename> example extension. Others
+        can be found in various main PostgreSQL backend code.
+       </para>

Instead of the last sentence, I'd prefer
... or other parts of the source code.

+      <listitem id="xfunc-relcache-syscache">
+       <para>
+        Look up system catalogs and table information using the relcache and syscache

How about "table metadata" rather than "table information"?

+        APIs (<function>SearchSysCache...</function>,
+        <function>relation_open()</function>, etc) rather than attempting to run
+        SQL queries to fetch the information. Ensure that your function holds
+        any necessary locks on the target objects. Take care not to make any calls

... holds *the* necessary locks ...

+        that could trigger cache invalidations while still accessing any
+        syscache cache entries, as this can cause subtle bugs. See

Subtle?  Perhaps "bugs that are hard to find".

+        <filename>src/backend/utils/cache/syscache.c</filename>,
+        <filename>src/backend/utils/cache/relcache.c</filename>,
+        <filename>src/backend/access/common/relation.c</filename> and their
+        headers for details.
+       </para>
+      </listitem>


Attached is a new version that has my suggested changes, plus a few from
Bharath Rupireddy (I do not agree with many of his suggestions).

Yours,
Laurenz Albe

Attachment

pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: psql - factor out echo code
Next
From: Asif Rehman
Date:
Subject: Re: [PATCH] expand the units that pg_size_pretty supports on output