Thread: bgworker / SPI docs patches

bgworker / SPI docs patches

From
Craig Ringer
Date:
Hi all

After spending a while working on the BDR extension I've found that the current documentation on the SPI and bgworkers could use some more detail.

In the process I've put cross reference labels in for most chapter headings, as I got sick of seeing "Chapter 33" (or whatever) everywhere in the docs. I'd like to progressively add them to every section, refentry, chapter, etc that has an id.

I hope this is useful.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

Re: bgworker / SPI docs patches

From
Peter Eisentraut
Date:
On 2/17/15 3:29 AM, Craig Ringer wrote:
> In the process I've put cross reference labels in for most chapter
> headings, as I got sick of seeing "Chapter 33" (or whatever) everywhere
> in the docs. I'd like to progressively add them to every section,
> refentry, chapter, etc that has an id.

This is not the correct way to do it.  You should change the style sheet
so that it takes the chapter title instead of the chapter number as link
label.

Although I will point out that the current setup is intentional, because
the alternative looks quite ugly in a lot of cases.



Re: bgworker / SPI docs patches

From
Robert Haas
Date:
On Tue, Feb 17, 2015 at 3:29 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> After spending a while working on the BDR extension I've found that the
> current documentation on the SPI and bgworkers could use some more detail.
>
> In the process I've put cross reference labels in for most chapter headings,
> as I got sick of seeing "Chapter 33" (or whatever) everywhere in the docs.
> I'd like to progressively add them to every section, refentry, chapter, etc
> that has an id.
>
> I hope this is useful.

Per Peter, 0001 is not wanted.  I tend to agree with him on that point
of style.  Patch 0002 makes the docs build fail unless 0001 is applied
first, so I can't pursue that unless you want to revise that
accordingly.  I've committed 0003, and will look at 0004 next.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: bgworker / SPI docs patches

From
Alvaro Herrera
Date:
I wrote this a very long time ago but apparently it was eaten by a spam
filter before it reached the lists.  Resending.

Craig Ringer wrote:

> From c73d1303cb3474ccc799eeda252eeef1fc3d9026 Mon Sep 17 00:00:00 2001
> From: Craig Ringer <craig@2ndquadrant.com>
> Date: Mon, 16 Feb 2015 18:17:48 +1300
> Subject: [PATCH 1/4] Add xreflabel to most chapter entries

So I think this is pretty neat as a general idea, but there was
opposition in that printed materials would no longer have a useful
pointer.  If we could tweak the output so that it says something like
"Chapter 33, Herding Elephants" instead of just "Chapter 33" (current)
or just "Herding Elephants" (your proposal), I think the whole patch
would be great.  Now I don't really know how to do this; at the very
least we will need to edit stylesheet.dsl and stylesheet.xsl, I think.

So dbl1en.dsl defines en-xref-strings for chapters like

    (list (normalize "chapter")     (if %chapter-autolabel%
                        "&Chapter; %n"
                        "the &chapter; called %t"))

if we could change that to, say,
    "&Chapter; %n (“%t”)"
which I think would render as
Chapter 33 (“Herding Elephants”)

we'd be done in the DSSSL side.  No idea how to actually do it, though
... Peter, any thoughts?

In XSL, there's xref-number which is presumably what we use today, but
there's also xref-number-and-title.  I think it's just a matter of
changing xrefstyle.

> From 6e2103b211aee44780c6106e307ea4a561d11fde Mon Sep 17 00:00:00 2001
> From: Craig Ringer <craig@2ndquadrant.com>
> Date: Mon, 16 Feb 2015 18:17:59 +1300
> Subject: [PATCH 2/4] Document how to build with the website style

+1 for this.


> diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
> index 6eada92..6437ab1 100644
> --- a/doc/src/sgml/xfunc.sgml
> +++ b/doc/src/sgml/xfunc.sgml
> @@ -3307,6 +3307,15 @@ CREATE FUNCTION make_array(anyelement) RETURNS anyarray
>     <sect2 id="xfunc-shmem-lwlocks" xreflabel="shared memory and LWLocks in user-defined C functions">
>      <title>Shared Memory and LWLocks</title>
>
> +    <indexterm zone="xfunc-shmem-lwlocks">
> +     <primary>Shared memory in extensions</primary>
> +    </indexterm>

I would do this as
  <primary>shared memory</primary><secondary>in extensions</secondary>
instead.

> +    <indexterm zone="xfunc-shmem-lwlocks">
> +     <primary>LWLocks (extensions)</primary>
> +     <secondary>Lightweight locks (extensions)</secondary>
> +    </indexterm>

I think this is two entries,
  <indexterm>
    <primary>lightweight locks</primary>
    <secondary>in extensions</secondary>
  </indexterm>

  <indexterm>
    <primary>lwlocks</primary>
    <see>lightweight locks</see>
  </indexterm>

> From 04e1cbe49268e22886e2b30fe412fef0cb9dcc65 Mon Sep 17 00:00:00 2001
> From: Craig Ringer <craig@2ndquadrant.com>
> Date: Mon, 16 Feb 2015 18:37:19 +1300
> Subject: [PATCH 4/4] Document SPI use from bgworkers in greater detail

This is the meat of the submission, isn't it.

> diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
> index ef28f72..d0e7dc0 100644
> --- a/doc/src/sgml/bgworker.sgml
> +++ b/doc/src/sgml/bgworker.sgml
> @@ -34,18 +34,21 @@
>    <productname>PostgreSQL</> is started by including the module name in
>    <varname>shared_preload_libraries</>.  A module wishing to run a background
>    worker can register it by calling
> +  <indexterm><primary>RegisterBackgroundWorker</primary></indexterm>
>    <function>RegisterBackgroundWorker(<type>BackgroundWorker *worker</type>)</function>
>    from its <function>_PG_init()</>.  Background workers can also be started
>    after the system is up and running by calling the function
> +  <indexterm><primary>RegisterDynamicBackgroundWorker</primary></indexterm>
>    <function>RegisterDynamicBackgroundWorker(<type>BackgroundWorker
>    *worker, BackgroundWorkerHandle **handle</type>)</function>.  Unlike
>    <function>RegisterBackgroundWorker</>, which can only be called from within
>    the postmaster, <function>RegisterDynamicBackgroundWorker</function> must be
> -  called from a regular backend.
> +  called from a regular backend (which might be another background worker).
>   </para>

I don't think we typically attach <indexterm> tags within paragraphs;
most commonly they are outside <para> or other block items.  I imagine
this works fine, but it's probably best to continue tradition unless
there's a specific reason to do otherwise.

>    <para>
>     <structfield>bgw_flags</> is a bitwise-or'd bit mask indicating the
> -   capabilities that the module wants.  Possible values are
> -   <literal>BGWORKER_SHMEM_ACCESS</literal> (requesting shared memory access)
> -   and <literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal> (requesting the
> -   ability to establish a database connection, through which it can later run
> -   transactions and queries). A background worker using
> -   <literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal> to connect to
> -   a database must also attach shared memory using
> -   <literal>BGWORKER_SHMEM_ACCESS</literal>, or worker start-up will fail.
> +   capabilities that the module wants.  Possible values are:
> +   <variablelist>

Using a <variablelist> rather than prose seems much better to me.


>    <para>
> -   <structfield>bgw_main</structfield> is a pointer to the function to run when
> -   the process is started.  This function must take a single argument of type
> -   <type>Datum</> and return <type>void</>.
> -   <structfield>bgw_main_arg</structfield> will be passed to it as its only
> -   argument.  Note that the global variable <literal>MyBgworkerEntry</literal>
> -   points to a copy of the <structname>BackgroundWorker</structname> structure
> -   passed at registration time. <structfield>bgw_main</structfield> may be
> -   NULL; in that case, <structfield>bgw_library_name</structfield> and
> -   <structfield>bgw_function_name</structfield> will be used to determine
> -   the entry point.  This is useful for background workers launched after
> -   postmaster startup, where the postmaster does not have the requisite
> -   library loaded.
> +   <structfield>bgw_main</structfield> specifies the function to run when the
> +   background worker is started. If non-NULL,
> +   <structfield>bgw_main</structfield> will take precedence over
> +   <structfield>bgw_library_name</structfield> and
> +   <structfield>bgw_function_name</structfield>.
> +   <warning>
> +    <para>
> +     Use of this field is deprecated. It should be set to
> +     <literal>NULL</literal> then <structfield>bgw_library_name</structfield>
> +     and <structfield>bgw_function_name</structfield> should be used instead.
> +    </para>
> +    <para>
> +     Specifying the background worker entry point using a function pointer will
> +     not work correctly on Windows (or with <literal>EXEC_BACKEND</literal>
> +     defined). It may also malfunction when used to point to a function in a
> +     dynamically loaded shared library used by a dynamic background worker.
> +    </para>
> +   </warning>
>    </para>

Interesting.  Given the drawbacks of the old mechanism, I am okay with this change.
In fact, why don't we remove bgw_main completely?

> +  <para>
>     <structfield>bgw_notify_pid</structfield> is the PID of a PostgreSQL
>     backend process to which the postmaster should send <literal>SIGUSR1</>
> -   when the process is started or exits.  It should be 0 for workers registered
> -   at postmaster startup time, or when the backend registering the worker does
> +   when the process is started or exits, causing the notified process's latch
> +   to be set.  It should be 0 for workers registered
> +   at postmaster startup time, when the backend registering the worker does
>     not wish to wait for the worker to start up.  Otherwise, it should be
>     initialized to <literal>MyProcPid</>.
>    </para>

I'm not sure about documenting that SIGUSR1 will set a process' latch.
That can be changed in that process' signal handler.  Maybe this is fine
if we add the word "typically" somewhere.

> diff --git a/doc/src/sgml/example-worker-spi.sgml b/doc/src/sgml/example-worker-spi.sgml
> new file mode 100644
> index 0000000..63d8f3d
> --- /dev/null
> +++ b/doc/src/sgml/example-worker-spi.sgml
> @@ -0,0 +1,26 @@
> +<!-- doc/src/sgml/example-worker-spi.sgml -->
> +
> +<sect1 id="example-worker-spi" xreflabel="worker-spi">
> + <title>worker-spi example</title>
> +
> + <indexterm zone="example-worker-spi">
> +  <primary>Background worker with SPI</primary>
> +  <secondary>SPI in background worker</secondary>
> + </indexterm>

I don't think this module really needs a doc page, but okay.  In any
case, I think these <indexterm> aren't good; one of them should be split
in <primary><secondary> and the other one should probably be a <see>.

> @@ -4092,6 +4097,143 @@ INSERT INTO a SELECT * FROM a;
>    </para>
>   </sect1>
>
> + <sect1 id="spi-bgworker" xreflabel="SPI in background workers">
> +  <title>Using the SPI from background workers</title>
> +
> +  <para>
> +   Using the SPI from <link linkend="bgworker">background workers</>
> +   requires some additional steps that are not required from SPI-using
> +   procedures invoked via SQL from normal user backends. When using the SPI
> +   from an SQL-callable function it is safe to assume that a transaction is
> +   already open, there is already a snapshot, and that PostgreSQL's statistics
> +   have been updated. None of these things may be assumed for use of the SPI
> +   from a bgworker.
> +  </para>
> +
> +  <para>
> +   To safely use the SPI from a bgworker you must ensure that:
> +  <itemizedlist>
> +   <listitem><para>A transaction is open (in progress)</para></listitem>
> +   <listitem><para>The SPI is connected</para></listitem>
> +   <listitem><para>There is an active snapshot or <parameter>read_only</parameter> is set to
<literal>false</literal>in SPI calls</para></listitem> 
> +  </itemizedlist>
> +   You should also update the statement start and end timestamps for use in
> +   <literal>pg_stat_activity</> using <function>pgstat_report_activity</>.
> +  </para>
> +
> +  <para>
> +   Attempting to use the SPI without an open transaction or using the SPI
> +   in <parameter>read_only</parameter> mode without setting a snapshot will
> +   generally <emphasis>not</emphasis> cause obvious failures or, in a
> +   <literal>--enable-cassert</literal> build, assertions. Instead it will
> +   usually appear to work for simple cases but may produce incorrect
> +   results or be unstable. Users of background workers should generally
> +   prefix SPI use with
> +<programlisting>
> +Assert(IsTransactionState());
> +</programlisting>
> +   if the function does not its self establish a transaction.
> +  </para>

+1 to the general idea of adding this section.  "its self"?

> +  <para>
> +   Typical SPI use from a background worker involves setting up a transaction,
> +   setting the statement start timestamp, connecting to the SPI, updating
> +   pg_stat_activity, establishing a snapshot, running the query/queries,
> +   then performing matching cleanup actions. For example:
> +  </para>

Can't we just refer to the source code of worker_spi?  That's the whole
point of that module.  Maybe we can add lots more comment to that, to
explain the necessity of each line there.

> +  <para>
> +   In more complex cases the function may need to check if a transaction is
> +   already open with <function>IsTransactionState()</function> and only perform
> +   transaction management if there is no current transaction. If the SPI may
> +   already connected by the caller the function should use
> +   <function>SPI_push_conditional</function> before <function>SPI_connect</function> and
> +   <function>SPI_pop_conditional</function> after <function>SPI_finish</function>. Snapshot
> +   state may also be conditionally handled. The resulting procedure will
> +   look more like:

If we need a more complex example in worker_spi, let's add one.

> diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
> index 6437ab1..39e6a15 100644
> --- a/doc/src/sgml/xfunc.sgml
> +++ b/doc/src/sgml/xfunc.sgml
> @@ -3357,6 +3357,60 @@ if (!ptr)
>  }
>  </programlisting>
>      </para>
> +
> +    <para>
> +     Extensions registered to run at startup
> +     using <xref linkend="guc-shared-preload-libraries"> should instead install a
> +     <literal>shmem_startup_hook</> from <function>_PG_init</>.  This will be
> +     invoked once shared memory is ready but before user backends or background
> +     workers get launched. The hook function invoked should use a pattern like
> +     that shown for shared memory initialization above then call the next hook
> +     function, e.g. declare:
> +<programlisting>
> +static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
> +</programlisting>
> +     then from <function>_PG_init</>, after any
> +     <function>RequestAddinShmemSpace</> and <function>RequestAddinLWLocks</>
> +     install the hook:
> +<programlisting>
> +     prev_shmem_startup_hook = shmem_startup_hook;
> +     shmem_startup_hook = my_worker_shmem_startup;
> +</programlisting>
> +     where <function>my_worker_shmem_startup</function> is a <type>void</> function
> +     with <type>void</> arguments that calls the next hook (if any):
> +<programlisting>
> +     if (prev_shmem_startup_hook != NULL)
> +         prev_shmem_startup_hook();
> +</programlisting>
> +     then calls <function>ShmemInitStruct</> per the example shown above. An
> +     example of this usage can be found in the initialisation of the <xref
> +     linkend="pgstatstatements"> extension.
> +    </para>

No problem with this.


> +     <caution>
> +      <title>Shared memory is zeroed on postmaster restart</title>
> +      <para>
> +       If the postmaster restarts - usually due to the crash of a backend
> +       process like a user backend or a background worker - then it zeroes
> +       shared memory on restart then re-invokes any shared memory setup hooks
> +       that have been installed by extensions. Extensions using shared memory
> +       must be prepared for this.
> +      </para>
> +      <para>
> +       Background workers are not unregistered if the postmaster restarts
> +       due to the crash of a background worker or user backend. This can lead
> +       to background workers accessing shared memory that has been reinitialized.
> +       Care must be taken to ensure that initialisation of shared memory produces
> +       stable results when re-run or workers must detect the postmaster restart
> +       using a generation counter and exit. In particular if parameters are being
> +       passed to a bgworker via shared memory the background worker
> +       <structfield>bgw_main_arg</structfield> should always be an index into an
> +       array allocated in shared memory, rather than a pointer, and repeated
> +       initialisations of the shared memory segment must produce the same range
> +       of valid indexes in the same order.
> +      </para>
> +     </caution>
> +
>     </sect2>

Hm.  Aren't all bgworkers also restarted if postmaster power-cycles
everything due to a crash in any process?  ISTM that they should be, and
if not, that's a bug.

Please use — for long dashes, but I think the whole <caution>
block should go away.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: bgworker / SPI docs patches

From
Robert Haas
Date:
On Wed, Jul 29, 2015 at 11:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Feb 17, 2015 at 3:29 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> After spending a while working on the BDR extension I've found that the
>> current documentation on the SPI and bgworkers could use some more detail.
>>
>> In the process I've put cross reference labels in for most chapter headings,
>> as I got sick of seeing "Chapter 33" (or whatever) everywhere in the docs.
>> I'd like to progressively add them to every section, refentry, chapter, etc
>> that has an id.
>>
>> I hope this is useful.
>
> Per Peter, 0001 is not wanted.  I tend to agree with him on that point
> of style.  Patch 0002 makes the docs build fail unless 0001 is applied
> first, so I can't pursue that unless you want to revise that
> accordingly.  I've committed 0003, and will look at 0004 next.

+   <warning>
+    <para>
+     Use of this field is deprecated. It should be set to
+     <literal>NULL</literal> then <structfield>bgw_library_name</structfield>
+     and <structfield>bgw_function_name</structfield> should be used instead.
+    </para>

I don't think bgw_main is exactly deprecated.  It's fine to use it if
the function is in the core code; it just can't be safely used for
functions in dynamically loaded shared libraries.  Maybe that's close
enough to "deprecated" that we should just call it deprecated, but I'm
slightly reluctant to use that word.

   <para>
    <structfield>bgw_notify_pid</structfield> is the PID of a PostgreSQL
    backend process to which the postmaster should send <literal>SIGUSR1</>
-   when the process is started or exits.  It should be 0 for workers registered
-   at postmaster startup time, or when the backend registering the worker does
+   when the process is started or exits, causing the notified process's latch
+   to be set.  It should be 0 for workers registered
+   at postmaster startup time, when the backend registering the worker does
    not wish to wait for the worker to start up.  Otherwise, it should be
    initialized to <literal>MyProcPid</>.
   </para>

This isn't actually right.  You can use set_latch_on_sigusr1 to get
that behavior, but it's not the default behavior.

    <varname>dboid</> is <literal>InvalidOid</>, the session is not connected
-   to any particular database, but shared catalogs can be accessed.
+   to any particular database, but shared catalogs (tables where
+   <varname>relisshared</varname> is <literal>true</literal> in
+   <varname>pg_class</varname>) can be accessed.
    If <varname>username</> is NULL or <varname>useroid</> is

I don't think we should be defining the term "shared catalogs" in the
section on background workers.  If it needs to be explained, let's do
that elsewhere (or find where it's already done) and link to it.

    A background worker can only call one of these two functions, and only
-   once.  It is not possible to switch databases.
+   once.  It is not possible to switch databases without exiting and restarting
+   the background worker.

That's not really switching databases, though, is it?

I've committed some bits of this that seem useful and controversial
with rather extensive wordsmithing; let me know if it doesn't look
good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: bgworker / SPI docs patches

From
Heikki Linnakangas
Date:
On 07/29/2015 09:42 PM, Robert Haas wrote:
> +   <warning>
> +    <para>
> +     Use of this field is deprecated. It should be set to
> +     <literal>NULL</literal> then <structfield>bgw_library_name</structfield>
> +     and <structfield>bgw_function_name</structfield> should be used instead.
> +    </para>
>
> I don't think bgw_main is exactly deprecated.  It's fine to use it if
> the function is in the core code; it just can't be safely used for
> functions in dynamically loaded shared libraries.  Maybe that's close
> enough to "deprecated" that we should just call it deprecated, but I'm
> slightly reluctant to use that word.

Hmm. worker_spi module uses bgw_main. Is that bad? Given that work_spi
is supposedly an example or template that you copy-paste from when
writing your own bgworker, we should make sure it follows the best
practice. Also, I note that worker_spi doesn't memset(0) its
BackgroundWorker struct, so any uninitialized fields will contain
garbage. Including bgw_library_name and bgw_function_name. That seems bad.

> I've committed some bits of this that seem useful and controversial
> with rather extensive wordsmithing; let me know if it doesn't look
> good.

I've marked this as committed in the commitfest. If we're waiting for a
followup patch for the remaining bits, please change it back to Waiting
on Author, or post the followup patch to the next commitfest if it can't
be done quickly.

- Heikki


Re: bgworker / SPI docs patches

From
Robert Haas
Date:
On Thu, Jul 30, 2015 at 3:47 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Hmm. worker_spi module uses bgw_main. Is that bad? Given that work_spi is
> supposedly an example or template that you copy-paste from when writing your
> own bgworker, we should make sure it follows the best practice. Also, I note
> that worker_spi doesn't memset(0) its BackgroundWorker struct, so any
> uninitialized fields will contain garbage. Including bgw_library_name and
> bgw_function_name. That seems bad.

Yeah, that stuff is bad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company