Thread: Support worker_spi to execute the function dynamically.

Support worker_spi to execute the function dynamically.

From
Masahiro Ikeda
Date:
Hi,

While working on the thread [1], I found that the function of the
worker_spi module fails if worker_spi is not listed in
'shared_preload_libraries'.

The reason is that the database name stays NULL, because it is
initialized only when process_shared_preload_libraries_in_progress
is true.

```
psql=# SELECT worker_spi_launch(1) ;
2023-07-20 11:00:56.491 JST [1179891] LOG:  worker_spi worker 1 initialized with schema1.counted
2023-07-20 11:00:56.491 JST [1179891] FATAL:  cannot read pg_class without having selected a database at character 22
2023-07-20 11:00:56.491 JST [1179891] QUERY:  select count(*) from pg_namespace where nspname = 'schema1'
2023-07-20 11:00:56.491 JST [1179891] STATEMENT:  select count(*) from pg_namespace where nspname = 'schema1'
2023-07-20 11:00:56.492 JST [1179095] LOG:  background worker "worker_spi" (PID 1179891) exited with exit code 1
```

In my understanding, the restriction is not required. So, I think it's
better to change the behavior.
(v1-0001-Support-worker_spi-to-execute-the-function-dynamical.patch)
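
As a rough illustration of the ordering problem described above (not the
actual worker_spi.c code), the following self-contained C sketch models it.
define_string_guc() is a hypothetical stand-in for
DefineCustomStringVariable(), the flag is a local stand-in for the server's
process_shared_preload_libraries_in_progress, and the "postgres" default is
an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the server flag of the same name. */
bool process_shared_preload_libraries_in_progress = false;

/* Stand-in for the module's GUC-backed variable; NULL until "defined". */
const char *worker_spi_database = NULL;

/*
 * Hypothetical stub for DefineCustomStringVariable(): it only applies
 * the boot value to the variable.
 */
void
define_string_guc(const char **var, const char *boot_value)
{
    *var = boot_value;
}

/* Old ordering: the database GUC is defined only after the early return. */
void
pg_init_before_fix(void)
{
    if (!process_shared_preload_libraries_in_progress)
        return;
    define_string_guc(&worker_spi_database, "postgres");    /* assumed default */
}

/* Proposed ordering: the GUC is defined before the early return. */
void
pg_init_after_fix(void)
{
    define_string_guc(&worker_spi_database, "postgres");    /* assumed default */
    if (!process_shared_preload_libraries_in_progress)
        return;
    /* registration of the static workers would follow here */
}
```

With the flag false, pg_init_before_fix() leaves worker_spi_database NULL,
which corresponds to the "without having selected a database" failure in the
log, while pg_init_after_fix() sets it in both cases.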

What do you think?

[1] Support to define custom wait events for extensions
https://www.postgresql.org/message-id/flat/b9f5411acda0cf15c8fbb767702ff43e%40oss.nttdata.com

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Thu, Jul 20, 2023 at 11:15:51AM +0900, Masahiro Ikeda wrote:
> While I'm working on the thread[1], I found that the function of
> worker_spi module fails if 'shared_preload_libraries' doesn't have
> worker_spi.

I guess that you were patching worker_spi to register dynamically a
wait event and embed that in a TAP test or similar without loading it
in shared_preload_libraries?  FWIW, you could use a trick like what I
am attaching here to load a wait event dynamically with the custom
wait event API.  You would need to make worker_spi_init_shmem() a bit
more aggressive with an extra hook to reserve a shmem area size, but
that's enough to show the custom wait event in the same backend as the
one that launches a worker_spi dynamically, while demonstrating how
the API can be used in this case.

> In my understanding, the restriction is not required. So, I think it's
> better to change the behavior.
> (v1-0001-Support-worker_spi-to-execute-the-function-dynamical.patch)
>
> What do you think?

+1.  I'm OK with lifting this restriction, using a SIGHUP GUC for the
database name; the current restriction is not a pattern to encourage
in a template module.  Will do so, if there are no objections.
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Thu, Jul 20, 2023 at 9:25 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> > In my understanding, the restriction is not required. So, I think it's
> > better to change the behavior.
> > (v1-0001-Support-worker_spi-to-execute-the-function-dynamical.patch)
> >
> > What do you think?
>
> +1.  I'm OK to lift this restriction with a SIGHUP GUC for the
> database name and that's not a pattern to encourage in a template
> module.  Will do so, if there are no objections.

+1. However, a code comment would help one understand why some GUCs are
defined before if (!process_shared_preload_libraries_in_progress). As
this is an example extension, it will help readers understand the
reasoning better. I know we will have it in the commit message, but a
direct comment helps:

    /*
     * Note that this GUC is defined irrespective of worker_spi shared library
     * presence in shared_preload_libraries. It's possible to create the
     * worker_spi extension and use functions without it being specified in
     * shared_preload_libraries. If we return from here without defining this
     * GUC, the dynamic workers launched by worker_spi_launch() will keep
     * crashing and restarting.
     */

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Thu, Jul 20, 2023 at 09:43:37AM +0530, Bharath Rupireddy wrote:
> +1. However, a comment above helps one to understand why some GUCs are
> defined before if (!process_shared_preload_libraries_in_progress). As
> this is an example extension, it will help understand the reasoning
> better. I know we will have it in the commit message, but a direct comment
> helps:
>
>     /*
>      * Note that this GUC is defined irrespective of worker_spi shared library
>      * presence in shared_preload_libraries. It's possible to create the
>      * worker_spi extension and use functions without it being specified in
>      * shared_preload_libraries. If we return from here without defining this
>      * GUC, the dynamic workers launched by worker_spi_launch() will keep
>      * crashing and restarting.
>      */

WFM to be more talkative here and document things, but I don't think
that's it.  How about a simple "These GUCs are defined even if this
library is not loaded with shared_preload_libraries, for
worker_spi_launch()."
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Thu, Jul 20, 2023 at 10:09 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 20, 2023 at 09:43:37AM +0530, Bharath Rupireddy wrote:
> > +1. However, a comment above helps one to understand why some GUCs are
> > defined before if (!process_shared_preload_libraries_in_progress). As
> > this is an example extension, it will help understand the reasoning
> > better. I know we will have it in the commit message, but a direct comment
> > helps:
> >
> >     /*
> >      * Note that this GUC is defined irrespective of worker_spi shared library
> >      * presence in shared_preload_libraries. It's possible to create the
> >      * worker_spi extension and use functions without it being specified in
> >      * shared_preload_libraries. If we return from here without defining this
> >      * GUC, the dynamic workers launched by worker_spi_launch() will keep
> >      * crashing and restarting.
> >      */
>
> WFM to be more talkative here and document things, but I don't think
> that's it.  How about a simple "These GUCs are defined even if this
> library is not loaded with shared_preload_libraries, for
> worker_spi_launch()."

LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Support worker_spi to execute the function dynamically.

From
Masahiro Ikeda
Date:
On 2023-07-20 12:55, Michael Paquier wrote:
> On Thu, Jul 20, 2023 at 11:15:51AM +0900, Masahiro Ikeda wrote:
>> While I'm working on the thread[1], I found that the function of
>> worker_spi module fails if 'shared_preload_libraries' doesn't have
>> worker_spi.
> 
> I guess that you were patching worker_spi to register dynamically a
> wait event and embed that in a TAP test or similar without loading it
> in shared_preload_libraries?  FWIW, you could use a trick like what I
> am attaching here to load a wait event dynamically with the custom
> wait event API.  You would need to make worker_spi_init_shmem() a bit
> more aggressive with an extra hook to reserve a shmem area size, but
> that's enough to show the custom wait event in the same backend as the
> one that launches a worker_spi dynamically, while demonstrating how
> the API can be used in this case.

Yes, you're right. When I tried using worker_spi to test wait events,
I found this behavior. Thanks a lot for your patch; I wasn't aware
of that approach.  I'll merge your patch into the tests for wait events.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Support worker_spi to execute the function dynamically.

From
Masahiro Ikeda
Date:
Hi,

On 2023-07-20 13:50, Bharath Rupireddy wrote:
> On Thu, Jul 20, 2023 at 10:09 AM Michael Paquier <michael@paquier.xyz> 
> wrote:
>> 
>> On Thu, Jul 20, 2023 at 09:43:37AM +0530, Bharath Rupireddy wrote:
>> > +1. However, a comment above helps one to understand why some GUCs are
>> > defined before if (!process_shared_preload_libraries_in_progress). As
>> > this is an example extension, it will help understand the reasoning
>> > better. I know we will have it in the commit message, but a direct comment
>> > helps:
>> >
>> >     /*
>> >      * Note that this GUC is defined irrespective of worker_spi shared library
>> >      * presence in shared_preload_libraries. It's possible to create the
>> >      * worker_spi extension and use functions without it being specified in
>> >      * shared_preload_libraries. If we return from here without defining this
>> >      * GUC, the dynamic workers launched by worker_spi_launch() will keep
>> >      * crashing and restarting.
>> >      */
>> 
>> WFM to be more talkative here and document things, but I don't think
>> that's it.  How about a simple "These GUCs are defined even if this
>> library is not loaded with shared_preload_libraries, for
>> worker_spi_launch()."
> 
> LGTM.

Thanks for discussing the patch. I updated it based on your comments:
* v2-0001-Support-worker_spi-to-execute-the-function-dynamical.patch

I found another thing that could be improved. Although the tests assume
"shared_preload_libraries = worker_spi", the background workers fail to
launch in the initialization phase because the database has not been
created yet.

```
# make check    # in src/test/modules/worker_spi
# cat log/postmaster.log # in src/test/modules/worker_spi/
2023-07-20 17:58:47.958 JST worker_spi[853620] FATAL:  database "contrib_regression" does not exist
2023-07-20 17:58:47.958 JST worker_spi[853621] FATAL:  database "contrib_regression" does not exist
2023-07-20 17:58:47.959 JST postmaster[853612] LOG:  background worker "worker_spi" (PID 853620) exited with exit code 1
2023-07-20 17:58:47.959 JST postmaster[853612] LOG:  background worker "worker_spi" (PID 853621) exited with exit code 1
```

It's better to remove "shared_preload_libraries = worker_spi" from the
test configuration. I had mistakenly assumed that two background workers
would be launched and waiting at the start of the test.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Thu, Jul 20, 2023 at 05:54:55PM +0900, Masahiro Ikeda wrote:
> Yes, you're right. When I tried using worker_spi to test wait event,
> I found the behavior. And thanks a lot for your patch. I wasn't aware
> of the way.  I'll merge your patch to the tests for wait events.

Be careful when using that.  I have not spent more than a few minutes
to show my point, but what I sent lacks a shmem_request_hook in
_PG_init(), for example, to request an amount of shared memory equal
to the size of the state structure.
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Thu, Jul 20, 2023 at 2:59 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 20, 2023 at 05:54:55PM +0900, Masahiro Ikeda wrote:
> > Yes, you're right. When I tried using worker_spi to test wait event,
> > I found the behavior. And thanks a lot for your patch. I wasn't aware
> > of the way.  I'll merge your patch to the tests for wait events.
>
> Be careful when using that.  I have not spent more than a few minutes
> to show my point, but what I sent lacks a shmem_request_hook in
> _PG_init(), for example, to request an amount of shared memory equal
> to the size of the state structure.

I think the preferred way to grab a chunk of shared memory for an
external module is by using shmem_request_hook and shmem_startup_hook.
Shared memory for wait events can use them too.
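
The request side of that hook pattern can be sketched in plain C as below.
This is only a model under stated assumptions: the hook variable and
RequestAddinShmemSpace() are local stubs that mimic the server symbols of
the same names, WorkerSpiSharedState is a hypothetical state structure, and
module_init() stands in for _PG_init(). The startup side (allocating the
structure from shmem_startup_hook) is not shown.

```c
#include <stddef.h>

/* Local stand-ins for declarations the server headers would provide. */
typedef void (*shmem_request_hook_type) (void);
shmem_request_hook_type shmem_request_hook = NULL;

/* Stub mimicking RequestAddinShmemSpace(): it only accumulates requests. */
size_t      total_addin_request = 0;

void
RequestAddinShmemSpace(size_t size)
{
    total_addin_request += size;
}

/* Hypothetical shared state for the module. */
typedef struct WorkerSpiSharedState
{
    unsigned int wait_event_id;
} WorkerSpiSharedState;

/* Previously installed hook, saved so that it can be chained. */
static shmem_request_hook_type prev_shmem_request_hook = NULL;

/* Our hook: call the previous hook first, then request our own space. */
static void
worker_spi_shmem_request(void)
{
    if (prev_shmem_request_hook)
        prev_shmem_request_hook();
    RequestAddinShmemSpace(sizeof(WorkerSpiSharedState));
}

/* Stand-in for _PG_init(): save the old hook and install ours. */
void
module_init(void)
{
    prev_shmem_request_hook = shmem_request_hook;
    shmem_request_hook = worker_spi_shmem_request;
}
```

Saving and calling the previous hook keeps other preloaded modules working
when several of them install a shmem_request_hook.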

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Thu, Jul 20, 2023 at 2:38 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> Thanks for discussing about the patch. I updated the patch from your
> comments
> * v2-0001-Support-worker_spi-to-execute-the-function-dynamical.patch
>
> I found another thing to be changed better. Though the tests was assumed
> "shared_preload_libraries = worker_spi", the background workers failed
> to
> be launched in initialized phase because the database is not created
> yet.
>
> ```
> # make check    # in src/test/modules/worker_spi
> # cat log/postmaster.log # in src/test/modules/worker_spi/
> 2023-07-20 17:58:47.958 JST worker_spi[853620] FATAL:  database "contrib_regression" does not exist
> 2023-07-20 17:58:47.958 JST worker_spi[853621] FATAL:  database "contrib_regression" does not exist
> 2023-07-20 17:58:47.959 JST postmaster[853612] LOG:  background worker "worker_spi" (PID 853620) exited with exit code 1
> 2023-07-20 17:58:47.959 JST postmaster[853612] LOG:  background worker "worker_spi" (PID 853621) exited with exit code 1
> ```
>
> It's better to remove "shared_preload_libraries = worker_spi" from the
> test configuration. I misunderstood that two background workers would
> be launched and waiting at the start of the test.

I don't think that change is correct. The worker_spi essentially shows
how to start bg workers with RegisterBackgroundWorker and dynamic bg
workers with RegisterDynamicBackgroundWorker. If
shared_preload_libraries = worker_spi is not specified there, the
workers registered with RegisterBackgroundWorker will not be started.
Would giving worker_spi.database the name of a database that exists at
initdb time work there? If the database for bg workers doesn't exist,
changing bgw_restart_time from BGW_NEVER_RESTART to, say, 1 will help
to see bg workers coming up eventually.

I think it's worth adding test cases for the expected number of bg
workers (after creating worker_spi extension) and dynamic bg workers
(after calling worker_spi_launch()). Also, to distinguish bg workers
and dynamic bg workers, you can change
bgw_type in worker_spi_launch to "worker_spi dynamic worker".

-    /* get the configuration */
+    /* Get the configuration */

-    /* set up common data for all our workers */
+    /* Set up common data for all our workers */

These unrelated changes are better left out, keeping the comments
as-is, because the postgres code has both commenting styles,
/* Get .... */ and /* get ....*/; in other words, single-line comments
start with both uppercase and lowercase.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Thu, Jul 20, 2023 at 03:44:12PM +0530, Bharath Rupireddy wrote:
> I don't think that change is correct. The worker_spi essentially shows
> how to start bg workers with RegisterBackgroundWorker and dynamic bg
> workers with RegisterDynamicBackgroundWorker. If
> shared_preload_libraries = worker_spi is not specified there, the
> workers registered with RegisterBackgroundWorker will not be started.
> Would giving worker_spi.database the name of a database that exists at
> initdb time work there? If the database for
> bg workers doesn't exist, changing bgw_restart_time from
> BGW_NEVER_RESTART to say 1 will help to see bg workers coming up
> eventually.

Yeah, it does not move the needle by much.  I think that we are
looking at switching this module to use a TAP test in the long term,
instead, where it would be possible to test the scenarios we want to
look at *with* and *without* shared_preload_libraries especially with
the custom wait events for extensions in mind if we add our tests in
this module.

It does not change the fact that Ikeda-san is right about the launch
of dynamic workers with this module being broken, so I have applied v1
with the comment I have suggested.  This will ease a bit the
implementation of any follow-up test scenarios, while avoiding an
incorrect pattern in this template module.
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Fri, Jul 21, 2023 at 8:38 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 20, 2023 at 03:44:12PM +0530, Bharath Rupireddy wrote:
> > I don't think that change is correct. The worker_spi essentially shows
> > how to start bg workers with RegisterBackgroundWorker and dynamic bg
> > workers with RegisterDynamicBackgroundWorker. If
> > shared_preload_libraries = worker_spi is not specified there, the
> > workers registered with RegisterBackgroundWorker will not be started.
> > Would giving worker_spi.database the name of a database that exists at
> > initdb time work there? If the database for
> > bg workers doesn't exist, changing bgw_restart_time from
> > BGW_NEVER_RESTART to say 1 will help to see bg workers coming up
> > eventually.
>
> Yeah, it does not move the needle by much.  I think that we are
> looking at switching this module to use a TAP test in the long term,
> instead, where it would be possible to test the scenarios we want to
> look at *with* and *without* shared_preload_libraries especially with
> the custom wait events for extensions in mind if we add our tests in
> this module.

Okay. Here's a quick patch for adding TAP tests to the worker_spi
module. We can change it to taste.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Fri, Jul 21, 2023 at 11:24:08AM +0530, Bharath Rupireddy wrote:
> Okay. Here's a quick patch for adding TAP tests to the worker_spi
> module. We can change it to taste.

What do you think about removing the sql/ test completely and moving it
to TAP, so that we have only one cluster set up when running make check?
worker_spi.sql only does two waits (one for the initialization and one
to check that the tuple has been processed), so these could be
replaced by some poll_query_until()?

As we have a dynamic.conf, installcheck is not supported so we don't
use anything with this switch.  Besides, updating
shared_preload_libraries and restarting the node in TAP is cheaper
than a second initdb.

-       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
-       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
+       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi static worker %d", i);
+       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi static worker");
[..]
-   snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
-   snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
+   snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi dynamic worker %d", i);
+   snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi dynamic worker");

Good idea to split that.
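
For readers following the diff, the naming split can be sketched as a small
helper. WorkerLabels and label_worker() are hypothetical names used only for
illustration, and the BGW_MAXLEN value here is a local stand-in for the
definition in postmaster/bgworker.h:

```c
#include <stdbool.h>
#include <stdio.h>

#define BGW_MAXLEN 96           /* stand-in; the real one is in bgworker.h */

/* Hypothetical holder for the two label fields of a BackgroundWorker. */
typedef struct WorkerLabels
{
    char        bgw_name[BGW_MAXLEN];
    char        bgw_type[BGW_MAXLEN];
} WorkerLabels;

/*
 * Fill in distinct labels for static and dynamic workers, following the
 * strings used in the diff above.
 */
void
label_worker(WorkerLabels *w, bool dynamic, int i)
{
    const char *kind = dynamic ? "dynamic" : "static";

    snprintf(w->bgw_name, BGW_MAXLEN, "worker_spi %s worker %d", kind, i);
    snprintf(w->bgw_type, BGW_MAXLEN, "worker_spi %s worker", kind);
}
```

Distinct bgw_type values let a test tell the two kinds of workers apart
when it inspects the running backends.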
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Fri, Jul 21, 2023 at 11:54 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jul 21, 2023 at 11:24:08AM +0530, Bharath Rupireddy wrote:
> > Okay. Here's a quick patch for adding TAP tests to the worker_spi
> > module. We can change it to taste.
>
> What do you think if we removed completely the sql/ test, moving it to
> TAP so as we have only one cluster set up when running a make check?
> worker_spi.sql only does two waits (one for the initialization and one
> to check that the tuple has been processed), so these could be
> replaced by some poll_query_until()?

I think we can keep the SQL tests around, as they will help demonstrate
how someone can quickly write their own SQL tests.

> As we have a dynamic.conf, installcheck is not supported so we don't
> use anything with this switch.  Besides, updating
> shared_preload_libraries and restarting the node in TAP is cheaper
> than a second initdb.

In SQL tests, I ensured worker_spi doesn't start static bg workers by
setting worker_spi.total_workers = 0. Again, all of this is not
necessary, but it will be a very good example for someone writing
extensions and playing around with custom config files, SQL and TAP
tests, etc.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Support worker_spi to execute the function dynamically.

From
Masahiro Ikeda
Date:
Hi,

On 2023-07-20 18:39, Bharath Rupireddy wrote:
> On Thu, Jul 20, 2023 at 2:59 PM Michael Paquier <michael@paquier.xyz> 
> wrote:
>> 
>> On Thu, Jul 20, 2023 at 05:54:55PM +0900, Masahiro Ikeda wrote:
>> > Yes, you're right. When I tried using worker_spi to test wait event,
>> > I found the behavior. And thanks a lot for your patch. I wasn't aware
>> > of the way.  I'll merge your patch to the tests for wait events.
>> 
>> Be careful when using that.  I have not spent more than a few minutes
>> to show my point, but what I sent lacks a shmem_request_hook in
>> _PG_init(), for example, to request an amount of shared memory equal
>> to the size of the state structure.
> 
> I think the preferred way to grab a chunk of shared memory for an
> external module is by using shmem_request_hook and shmem_startup_hook.
> Wait events shared memory too can use them.

OK, I'll add the hooks in worker_spi for the test of wait events.


On 2023-07-21 12:08, Michael Paquier wrote:
> On Thu, Jul 20, 2023 at 03:44:12PM +0530, Bharath Rupireddy wrote:
>> I don't think that change is correct. The worker_spi essentially shows
>> how to start bg workers with RegisterBackgroundWorker and dynamic bg
>> workers with RegisterDynamicBackgroundWorker. If
>> shared_preload_libraries = worker_spi is not specified there, the
>> workers registered with RegisterBackgroundWorker will not be started.
>> Would giving worker_spi.database the name of a database that exists at
>> initdb time work there? If the database for
>> bg workers doesn't exist, changing bgw_restart_time from
>> BGW_NEVER_RESTART to say 1 will help to see bg workers coming up
>> eventually.
> 
> Yeah, it does not move the needle by much.  I think that we are
> looking at switching this module to use a TAP test in the long term,
> instead, where it would be possible to test the scenarios we want to
> look at *with* and *without* shared_preload_libraries especially with
> the custom wait events for extensions in mind if we add our tests in
> this module.
> 
> It does not change the fact that Ikeda-san is right about the launch
> of dynamic workers with this module being broken, so I have applied v1
> with the comment I have suggested.  This will ease a bit the
> implementation of any follow-up test scenarios, while avoiding an
> incorrect pattern in this template module.

Thanks for the commits. As Bharath-san said, I forgot that worker_spi
also serves as a demonstration, and I agree with introducing two types of
tests, with and without "shared_preload_libraries = worker_spi".



On 2023-07-21 15:51, Bharath Rupireddy wrote:
> On Fri, Jul 21, 2023 at 11:54 AM Michael Paquier <michael@paquier.xyz> 
> wrote:
>> 
>> On Fri, Jul 21, 2023 at 11:24:08AM +0530, Bharath Rupireddy wrote:
>> As we have a dynamic.conf, installcheck is not supported so we don't
>> use anything with this switch.  Besides, updating
>> shared_preload_libraries and restarting the node in TAP is cheaper
>> than a second initdb.
> 
> In SQL tests, I ensured worker_spi doesn't start static bg workers by
> setting worker_spi.total_workers = 0. Again, all of this is not
> necessary, but it will be a very good example for someone writing
> extensions and play around with custom config files, SQL and TAP tests
> etc.

Thanks for making the patch. I confirmed it works in my environments.

> -       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
> -       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
> +       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi static worker %d", i);
> +       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi static worker");
> [..]
> -   snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
> -   snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
> +   snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi dynamic worker %d", i);
> +   snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi dynamic worker");
> 
> Good idea to split that.

I agree. It is very useful. I'll refer to its implementation for the
wait event tests.

I have some questions about the patch. Feel free to ignore the following
comments, since your patch is a PoC.

(1)

Do we need to change the minValue from 1 to 0 to support
worker_spi.total_workers = 0?

    DefineCustomIntVariable("worker_spi.total_workers",
                            "Number of workers.",
                            NULL,
                            &worker_spi_total_workers,
                            2,
                            1,
                            100,
                            PGC_POSTMASTER,
                            0,
                            NULL,
                            NULL,
                            NULL);

(2)

Do we need "worker_spi.total_workers = 0" and
"shared_preload_libraries = worker_spi" in dynamic.conf?

Currently, the static bg workers will not be launched because
"shared_preload_libraries = worker_spi" is removed. So
"worker_spi.total_workers = 0" is meaningless.

(3)

We need to change or remove these lines.

> # Copyright (c) 2021-2023, PostgreSQL Global Development Group
> 
> # Test replication statistics data in pg_stat_replication_slots is sane after
> # drop replication slot and restart.
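
On question (1): integer GUCs are range-checked against their declared
minimum and maximum, so with a minimum of 1 a setting of 0 falls outside
the accepted range once the GUC is defined. A minimal sketch of that check
follows. IntGucBounds and int_guc_accepts() are hypothetical names that only
mirror the three numeric arguments of the call quoted above; this is not the
server's GUC code.

```c
#include <stdbool.h>

/* Hypothetical mirror of the numeric arguments of DefineCustomIntVariable(). */
typedef struct IntGucBounds
{
    int         boot_value;
    int         min_value;
    int         max_value;
} IntGucBounds;

/* Values copied from the worker_spi.total_workers definition quoted above. */
const IntGucBounds total_workers_bounds = {2, 1, 100};

/* Returns true when a candidate setting lies inside the declared range. */
bool
int_guc_accepts(const IntGucBounds *b, int value)
{
    return value >= b->min_value && value <= b->max_value;
}
```

Under these bounds, int_guc_accepts(&total_workers_bounds, 0) is false,
which is why supporting "worker_spi.total_workers = 0" would require
lowering the minimum.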

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Fri, Jul 21, 2023 at 4:05 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> > In SQL tests, I ensured worker_spi doesn't start static bg workers by
> > setting worker_spi.total_workers = 0. Again, all of this is not
> > necessary, but it will be a very good example for someone writing
> > extensions and play around with custom config files, SQL and TAP tests
> > etc.
>
> Thanks for making the patch. I confirmed it works in my environments.

Thanks for verifying.

> I have some questions about the patch.
>
> (1)
>
> Do we need to change the minValue from 1 to 0 to support
> worker_spi.total_workers = 0?
>
>         DefineCustomIntVariable("worker_spi.total_workers",
>                                                         "Number of workers.",
>                                                         NULL,
>                                                         &worker_spi_total_workers,
>                                                         2,
>                                                         1,
>                                                         100,
>                                                         PGC_POSTMASTER,
>                                                         0,
>                                                         NULL,
>                                                         NULL,
>                                                         NULL);

No, let's keep it that way.

> (2)
>
> Do we need "worker_spi.total_workers = 0" and
> "shared_preload_libraries = worker_spi" in dynamic.conf.
>
> Currently, the static bg workers will not be launched because
> "shared_preload_libraries = worker_spi" is removed. So
> "worker_spi.total_workers = 0" is meaningless.

You're right. worker_spi.total_workers = 0 in custom.conf has no
effect without shared_preload_libraries = worker_spi. Removed that.

> (3)
>
> We need change and remove them.
>
> > # Copyright (c) 2021-2023, PostgreSQL Global Development Group
> >
> > # Test replication statistics data in pg_stat_replication_slots is sane
> > after
> > # drop replication slot and restart.

Modified.

I'm attaching the v2 patch. Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Masahiro Ikeda
Date:
On 2023-07-22 01:05, Bharath Rupireddy wrote:
> On Fri, Jul 21, 2023 at 4:05 PM Masahiro Ikeda 
> <ikedamsh@oss.nttdata.com> wrote:
>> (2)
>> 
>> Do we need "worker_spi.total_workers = 0" and
>> "shared_preload_libraries = worker_spi" in dynamic.conf.
>> 
>> Currently, the static bg workers will not be launched because
>> "shared_preload_libraries = worker_spi" is removed. So
>> "worker_spi.total_workers = 0" is meaningless.
> 
> You're right. worker_spi.total_workers = 0 in custom.conf has no
> effect without shared_preload_libraries = worker_spi. Removed that.

OK. If so, we need to remove the following comment in Makefile.

> # enable our module in shared_preload_libraries for dynamic bgworkers

I also confirmed that the TAP tests work with meson and make.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Mon, Jul 24, 2023 at 6:34 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> OK. If so, we need to remove the following comment in Makefile.
>
> > # enable our module in shared_preload_libraries for dynamic bgworkers

Done.

> I also confirmed that the tap tests work with meson and make.

Thanks for verifying.

I also added a note atop worker_spi.c that the module also
demonstrates how to write core (SQL) tests and extended (TAP) tests.

I'm attaching the v3 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Masahiro Ikeda
Date:
On 2023-07-24 12:01, Bharath Rupireddy wrote:
> I'm attaching the v3 patch.

I verified it works and it looks good to me.
Thanks to your work, I will be able to implement tests for
custom wait events.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Mon, Jul 24, 2023 at 08:31:01AM +0530, Bharath Rupireddy wrote:
> I also added a note atop worker_spi.c that the module also
> demonstrates how to write core (SQL) tests and extended (TAP) tests.

The value of the SQL tests comes down to the DO blocks that emulate
what the TAP tests are equally able to do.  While we already have
some places that do something similar (slot.sql or postgres_fdw.sql),
the SQL tests of worker_spi count for a total of five queries, which
is not much for one initialized cluster:
- One pg_reload_conf() to force a loop to happen in the worker.
- Two sanity checks.
- Two wait emulations.

Anyway, most people that do serious hacking on this list care about
the runtime of the tests all the time, and I am not on board with making
things slower for the sake of showing a test example here,
particularly if there are ways to make them faster (long-term, we
should be able to do the init step only once for most cases), and
because we *have to* switch to TAP to have more advanced scenarios for
the custom wait events or just dynamic worker launches based on what we
set in shared_preload_libraries.  On top of that, we have other
examples in the tree that emulate waits for plain SQL tests to satisfy
assumptions with some follow-up query.

So, I don't really agree with the value gained here compared to the
execution cost of initializing two clusters for this module.  I have
taken the time to check how the runtime changes when switching to TAP
for all the scenarios discussed here, and from my laptop, I can see
that:
- HEAD takes 4.4s, for only the sql/ test.
- Your latest patch is at 5.6s.
- My version attached to this message is at 3.7s.

In terms of runtime the benefits are here for me.  Note that with the
first part of the test (previously in sql/), we don't lose coverage
with the loop of the workers so I agree that only checking that these
are launched is OK once worker_spi is in shared_preload_libraries.
However, I think that we should make sure that they are connected to
the correct database 'mydb'.  I have updated the test to do that.

So, what do you think about the attached?
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Mon, Jul 24, 2023 at 1:10 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Jul 24, 2023 at 08:31:01AM +0530, Bharath Rupireddy wrote:
> > I also added a note atop worker_spi.c that the module also
> > demonstrates how to write core (SQL) tests and extended (TAP) tests.
>
> In terms of runtime the benefits are here for me.  Note that with the
> first part of the test (previously in sql/), we don't lose coverage
> with the loop of the workers so I agree that only checking that these
> are launched is OK once worker_spi is in shared_preload_libraries.
> However, I think that we should make sure that they are connected to
> the correct database 'mydb'.  I have updated the test to do that.
>
> So, what do you think about the attached?

I disagree with removing SQL tests from the worker_spi module. As said
upthread, it makes the worker_spi a fully demonstrable
extension/module - one can just take it, start adding required
functionality and test-cases (both SQL and TAP) for a new module. I
agree that moving to TAP tests will reduce test run time by 1.9
seconds, but to me personally this is not an optimization we must be
doing at the expense of demonstrability.

Having said that, others might have a different opinion here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Mon, Jul 24, 2023 at 01:50:45PM +0530, Bharath Rupireddy wrote:
> I disagree with removing SQL tests from the worker_spi module. As said
> upthread, it makes the worker_spi a fully demonstrable
> extension/module - one can just take it, start adding required
> functionality and test-cases (both SQL and TAP) for a new module.

Which is basically the same thing with TAP except that these are
grouped now?  The value of a few raw SQL queries with a
NO_INSTALLCHECK does not strike me as enough on top of having to
maintain two different sets of tests.  I'd still choose the cheap and
extensible path here.

> I agree that moving to TAP tests will reduce test run time by 1.9
> seconds, but to me personally this is not an optimization we must be
> doing at the expense of demonstrability.

In a large parallel run, the difference can be felt.
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Mon, Jul 24, 2023 at 05:38:45PM +0900, Michael Paquier wrote:
> Which is basically the same thing with TAP except that these are
> grouped now?  The value of a few raw SQL queries with a
> NO_INSTALLCHECK does not strike me as enough on top of having to
> maintain two different sets of tests.  I'd still choose the cheap and
> extensible path here.

I've been sleeping on that a bit more, and I'd still go with the
refactoring where we initialize one cluster and have all the tests
done by TAP, for the sake of being much cheaper without changing the
coverage, while being more extensible when it comes to introducing tests
for the follow-up patch on custom wait events.
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Wed, Jul 26, 2023 at 09:02:54AM +0900, Michael Paquier wrote:
> I've been sleeping on that a bit more, and I'd still go with the
> refactoring where we initialize one cluster and have all the tests
> done by TAP, for the sake of being much cheaper without changing the
> coverage, while being more extensible when it comes to introducing tests
> for the follow-up patch on custom wait events.

For now, please note that I have applied your idea to add "dynamic" to
the names of the bgworkers registered on a worker_spi_launch() as this
is useful on its own.  I have given up on the "static" part, because
that felt inconsistent with the API names, and we don't use this term
in the docs for bgworkers either.
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Andres Freund
Date:
Hi,

The new test fails with my AIO branch occasionally. But I'm fairly certain
that's just due to timing differences.

Excerpt from the log:

2023-07-27 21:43:00.385 UTC [42339] LOG:  worker_spi worker 3 initialized with schema3.counted
2023-07-27 21:43:00.399 UTC [42344] 001_worker_spi.pl LOG:  statement: SELECT datname, count(datname) FROM pg_stat_activity
                WHERE backend_type = 'worker_spi' GROUP BY datname;
2023-07-27 21:43:00.403 UTC [42340] LOG:  worker_spi worker 2 initialized with schema2.counted
2023-07-27 21:43:00.407 UTC [42341] LOG:  worker_spi worker 1 initialized with schema1.counted
2023-07-27 21:43:00.420 UTC [42346] 001_worker_spi.pl LOG:  statement: SELECT worker_spi_launch(1);
2023-07-27 21:43:00.423 UTC [42347] LOG:  worker_spi dynamic worker 1 initialized with schema1.counted
2023-07-27 21:43:00.432 UTC [42349] 001_worker_spi.pl LOG:  statement: SELECT worker_spi_launch(2);
2023-07-27 21:43:00.437 UTC [42350] LOG:  worker_spi dynamic worker 2 initialized with schema2.counted
2023-07-27 21:43:00.443 UTC [42347] ERROR:  duplicate key value violates unique constraint "pg_namespace_nspname_index"
2023-07-27 21:43:00.443 UTC [42347] DETAIL:  Key (nspname)=(schema1) already exists.
2023-07-27 21:43:00.443 UTC [42347] CONTEXT:  SQL statement "CREATE SCHEMA "schema1" CREATE TABLE "counted" ( type text CHECK (type IN ('total', 'delta')),         value    integer)CREATE UNIQUE INDEX "counted_unique_total" ON "counted"(type) WHERE type = 'total'"
 


As written, dynamic and static workers race each other. It doesn't make a lot
of sense to me to use the same ids for either?

The attached patch reproduces the problem on master.

Note that without the sleep(3) in the test the workers don't actually finish
starting, the test shuts down the cluster before that happens...

Greetings,

Andres Freund

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Thu, Jul 27, 2023 at 07:23:32PM -0700, Andres Freund wrote:
> As written, dynamic and static workers race each other. It doesn't make a lot
> of sense to me to use the same ids for either?
>
> The attached patch reproduces the problem on master.
>
> Note that without the sleep(3) in the test the workers don't actually finish
> starting, the test shuts down the cluster before that happens...

So you have faced a race condition where the commit of the transaction
doing the schema creation for the static workers is delayed long
enough that the dynamic workers don't see it, and hit a catalog
conflict when they try to create the same schemas.

Having each bgworker on its own schema would be enough to prevent
conflicts, but I'd like to add a second thing: a check on
pg_stat_activity.wait_event after starting the workers.  I have added
something like that in the patch I have posted today for the custom
wait events at [1] and it enforces the startup sequences of the
workers in a stricter way.

Does the attached take care of your issue?

[1]: https://www.postgresql.org/message-id/ZMMUiR7kvzPWenhF@paquier.xyz
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Fri, Jul 28, 2023 at 10:15 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Having each bgworker on its own schema would be enough to prevent
> conflicts, but I'd like to add a second thing: a check on
> pg_stat_activity.wait_event after starting the workers.  I have added
> something like that in the patch I have posted today for the custom
> wait events at [1] and it enforces the startup sequences of the
> workers in a stricter way.
>
> Does the attached take care of your issue?

+# check their existence.  Use IDs that do not overlap with the schemas created
+# by the previous workers.

While using different IDs in tests is a simple fix, -1 for it. I'd
prefer if worker_spi uses different schema prefixes for static and
dynamic bg workers to avoid conflicts. We can either look at
MyBgworkerEntry->bgw_type in worker_spi_main and have schema name as
'{static, dynamic}_worker_schema_%d', id or pass schema name in
bgw_extra.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Fri, Jul 28, 2023 at 10:47:39AM +0530, Bharath Rupireddy wrote:
> +# check their existence.  Use IDs that do not overlap with the schemas created
> +# by the previous workers.
>
> While using different IDs in tests is a simple fix, -1 for it. I'd
> prefer if worker_spi uses different schema prefixes for static and
> dynamic bg workers to avoid conflicts. We can either look at
> MyBgworkerEntry->bgw_type in worker_spi_main and have schema name as
> '{static, dynamic}_worker_schema_%d', id or pass schema name in
> bgw_extra.

For the sake of a test module, I am not really convinced that there is
any need to go down to such complexity with the names of the schemas
created.
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Bharath Rupireddy
Date:
On Fri, Jul 28, 2023 at 1:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jul 28, 2023 at 10:47:39AM +0530, Bharath Rupireddy wrote:
> > +# check their existence.  Use IDs that do not overlap with the schemas created
> > +# by the previous workers.
> >
> > While using different IDs in tests is a simple fix, -1 for it. I'd
> > prefer if worker_spi uses different schema prefixes for static and
> > dynamic bg workers to avoid conflicts. We can either look at
> > MyBgworkerEntry->bgw_type in worker_spi_main and have schema name as
> > '{static, dynamic}_worker_schema_%d', id or pass schema name in
> > bgw_extra.
>
> For the sake of a test module, I am not really convinced that there is
> any need to go down to such complexity with the names of the schemas
> created.

I don't think something like [1] is complex. It makes worker_spi
foolproof. Rather, the other approach proposed, that is to provide
non-conflicting worker IDs to worker_spi_launch in the TAP test file,
looks complicated to me. And it's easy for someone to come, add a test
case with conflicting IDs input to worker_spi_launch and end up in the
same state that we're in now.

[1]
diff --git a/src/test/modules/worker_spi/t/001_worker_spi.pl b/src/test/modules/worker_spi/t/001_worker_spi.pl
index c293871313..700530afc7 100644
--- a/src/test/modules/worker_spi/t/001_worker_spi.pl
+++ b/src/test/modules/worker_spi/t/001_worker_spi.pl
@@ -27,16 +27,16 @@ is($result, 't', "dynamic bgworker launched");
 $node->poll_query_until(
        'postgres',
        qq[SELECT count(*) > 0 FROM information_schema.tables
-           WHERE table_schema = 'schema4' AND table_name = 'counted';]);
+           WHERE table_schema = 'dynamic_worker_schema4' AND table_name = 'counted';]);
 $node->safe_psql('postgres',
-       "INSERT INTO schema4.counted VALUES ('total', 0), ('delta', 1);");
+       "INSERT INTO dynamic_worker_schema4.counted VALUES ('total', 0), ('delta', 1);");
 # Issue a SIGHUP on the node to force the worker to loop once, accelerating
 # this test.
 $node->reload;
 # Wait until the worker has processed the tuple that has just been inserted.
 $node->poll_query_until('postgres',
-       qq[SELECT count(*) FROM schema4.counted WHERE type = 'delta';], '0');
-$result = $node->safe_psql('postgres', 'SELECT * FROM schema4.counted;');
+       qq[SELECT count(*) FROM dynamic_worker_schema4.counted WHERE type = 'delta';], '0');
+$result = $node->safe_psql('postgres', 'SELECT * FROM dynamic_worker_schema4.counted;');
 is($result, qq(total|1), 'dynamic bgworker correctly consumed tuple data');

 note "testing bgworkers loaded with shared_preload_libraries";
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 903dcddef9..02b4204aa2 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -135,10 +135,19 @@ worker_spi_main(Datum main_arg)
        int                     index = DatumGetInt32(main_arg);
        worktable  *table;
        StringInfoData buf;
-       char            name[20];
+       char            name[NAMEDATALEN];

        table = palloc(sizeof(worktable));
-       sprintf(name, "schema%d", index);
+
+       /*
+        * Use different schema names for static and dynamic bg workers to avoid
+        * name conflicts.
+        */
+       if (strcmp(MyBgworkerEntry->bgw_type, "worker_spi") == 0)
+               sprintf(name, "worker_schema%d", index);
+       else if (strcmp(MyBgworkerEntry->bgw_type, "worker_spi dynamic") == 0)
+               sprintf(name, "dynamic_worker_schema%d", index);
+
        table->schema = pstrdup(name);
        table->name = pstrdup("counted");
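
A more defensive variant of the name selection above can be sketched in
standalone C.  This is only an illustration, not part of the posted patch:
build_schema_name() is a hypothetical helper, the two bgw_type strings are
taken from the diff, and the fallback to the plain "schema%d" name is an
assumption.  It uses snprintf() so the buffer cannot overflow, and it always
initializes the name, which the if / else if chain in the diff does not
guarantee when bgw_type matches neither string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Same value as PostgreSQL's default NAMEDATALEN; redefined here so the
 * sketch compiles without the server headers.
 */
#define SKETCH_NAMEDATALEN 64

/*
 * Hypothetical helper: fill "name" with a schema name derived from the
 * worker type and ID.  Returns the length snprintf() wanted to write, so a
 * caller can detect truncation by comparing it with "size".
 */
static int
build_schema_name(char *name, size_t size, const char *bgw_type, int index)
{
	const char *prefix;

	assert(name != NULL && bgw_type != NULL && size > 0);

	if (strcmp(bgw_type, "worker_spi") == 0)
		prefix = "worker_schema";
	else if (strcmp(bgw_type, "worker_spi dynamic") == 0)
		prefix = "dynamic_worker_schema";
	else
		prefix = "schema";		/* assumed fallback: the pre-patch name */

	return snprintf(name, size, "%s%d", prefix, index);
}
```

A caller would pass a buffer of SKETCH_NAMEDATALEN bytes and treat a return
value greater than or equal to the buffer size as a truncated name.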

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Fri, Jul 28, 2023 at 02:11:48PM +0530, Bharath Rupireddy wrote:
> I don't think something like [1] is complex. It makes worker_spi
> foolproof. Rather, the other approach proposed, that is to provide
> non-conflicting worker IDs to worker_spi_launch in the TAP test file,
> looks complicated to me. And it's easy for someone to come, add a test
> case with conflicting IDs input to worker_spi_launch and end up in the
> same state that we're in now.

Sure, but that's not really something that worries me for a template
such as this one, for the sake of these tests.  So I'd leave things to
be as they are, slightly simpler.  That's a minor point, for sure :)
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Alvaro Herrera
Date:
On 2023-Jul-28, Michael Paquier wrote:

> So you have faced a race condition where the commit of the transaction
> doing the schema creation for the static workers is delayed long
> enough that the dynamic workers don't see it, and hit a catalog
> conflict when they try to create the same schemas.
>
> Having each bgworker on its own schema would be enough to prevent
> conflicts, but I'd like to add a second thing: a check on
> pg_stat_activity.wait_event after starting the workers.  I have added
> something like that in the patch I have posted today for the custom
> wait events at [1] and it enforces the startup sequences of the
> workers in a stricter way.

Hmm, I think having all the workers doing their work in the same table is
better -- if nothing else, because it gives us the opportunity to show
how to use some other coding technique (but also because we are forced
to write the SQL code in a way that's correct for potentially multiple
concurrent workers, which sounds useful to demonstrate).  Can't we
instead solve the race condition by having some shared resource that
blocks the other workers from proceeding until the schema has been
created?  Perhaps an LWLock, or a condition variable, or an advisory
lock.
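
The shared-resource idea can be illustrated outside of PostgreSQL with a
plain POSIX mutex.  The following is only an analogy under stated
assumptions: SchemaGate and ensure_schema() are hypothetical names, the
mutex stands in for whatever LWLock or advisory lock the workers would
share, and the counter stands in for the CREATE SCHEMA step.  Whichever
worker takes the lock first does the creation; the others block on the lock
and then see that the work is already done.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state; in a bgworker this would live in shared memory. */
typedef struct SchemaGate
{
	pthread_mutex_t lock;		/* stand-in for an LWLock or advisory lock */
	bool		created;		/* has the schema been created yet? */
	int			create_calls;	/* how many times creation actually ran */
} SchemaGate;

static void
schema_gate_init(SchemaGate *gate)
{
	pthread_mutex_init(&gate->lock, NULL);
	gate->created = false;
	gate->create_calls = 0;
}

/*
 * Called by every worker before touching the schema.  Returns true if this
 * caller performed the creation, false if another caller had already done it.
 */
static bool
ensure_schema(SchemaGate *gate)
{
	bool		did_create = false;

	assert(gate != NULL);

	pthread_mutex_lock(&gate->lock);
	if (!gate->created)
	{
		/* A real worker would run its CREATE SCHEMA step here. */
		gate->create_calls++;
		gate->created = true;
		did_create = true;
	}
	pthread_mutex_unlock(&gate->lock);

	return did_create;
}
```

Because the creation happens while the lock is held, a second caller cannot
observe a half-created schema: it either creates it or finds it complete.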

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Support worker_spi to execute the function dynamically.

From
Andres Freund
Date:
Hi,

On 2023-07-28 13:45:29 +0900, Michael Paquier wrote:
> Having each bgworker on its own schema would be enough to prevent
> conflicts, but I'd like to add a second thing: a check on
> pg_stat_activity.wait_event after starting the workers.  I have added
> something like that in the patch I have posted today for the custom
> wait events at [1] and it enforces the startup sequences of the
> workers in a stricter way.

Is that very meaningful? ISTM the interesting thing to check for would be that
the state is idle?

Greetings,

Andres Freund



Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Fri, Jul 28, 2023 at 01:34:15PM -0700, Andres Freund wrote:
> On 2023-07-28 13:45:29 +0900, Michael Paquier wrote:
>> Having each bgworker on its own schema would be enough to prevent
>> conflicts, but I'd like to add a second thing: a check on
>> pg_stat_activity.wait_event after starting the workers.  I have added
>> something like that in the patch I have posted today for the custom
>> wait events at [1] and it enforces the startup sequences of the
>> workers in a stricter way.
>
> Is that very meaningful? ISTM the interesting thing to check for would be that
> the state is idle?

That's interesting for the sake of the other patch to check that the
custom events are reported.  Anyway, I am a bit short on time, so I
have applied the simplest fix where the dynamic workers just use a
different base ID to get out of your way.
--
Michael

Attachment

Re: Support worker_spi to execute the function dynamically.

From
Michael Paquier
Date:
On Fri, Jul 28, 2023 at 12:06:33PM +0200, Alvaro Herrera wrote:
> Hmm, I think having all the workers doing their work in the same table is
> better -- if nothing else, because it gives us the opportunity to show
> how to use some other coding technique (but also because we are forced
> to write the SQL code in a way that's correct for potentially multiple
> concurrent workers, which sounds useful to demonstrate).  Can't we
> instead solve the race condition by having some shared resource that
> blocks the other workers from proceeding until the schema has been
> created?  Perhaps an LWLock, or a condition variable, or an advisory
> lock.

That's an interesting idea that you have here.  So basically, you
would have all the workers use the same schema to do their counting work
on the same base table?  Or should each worker use the same schema,
perhaps defined by a GUC, but different tables?  One thing that has
been itching me a bit with this module was to be able to pass down to
the main worker routine more arguments than just an int ID, but I
could not bring myself to do that just for the wait event patch, like:
- The database to connect to.
- The table to create.
- The schema to use.
If any of these are NULL, just use as default what we have now, with
perhaps the bgworker PID as ID instead of a user-specified one.
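
One way to picture that argument passing is a small struct whose NULL
members fall back to defaults.  This is a hypothetical sketch, not the
module's code: WorkerSpiArgs, WorkerSpiConfig, resolve_worker_args() and the
default values "postgres" and "counted" are assumptions chosen for
illustration, and the process ID is used as the default schema suffix as
suggested above.  The resolved settings are stored by value because pointers
from the launching backend would not be valid in the worker process.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same value as PostgreSQL's default NAMEDATALEN, redefined for the sketch. */
#define SKETCH_NAMEDATALEN 64

/* Hypothetical caller-supplied arguments; any member may be NULL. */
typedef struct WorkerSpiArgs
{
	const char *dbname;
	const char *schema;
	const char *table;
} WorkerSpiArgs;

/* Fully resolved settings, stored by value so they can be copied around. */
typedef struct WorkerSpiConfig
{
	char		dbname[SKETCH_NAMEDATALEN];
	char		schema[SKETCH_NAMEDATALEN];
	char		table[SKETCH_NAMEDATALEN];
} WorkerSpiConfig;

/* Copy each argument into "config", substituting a default for NULL. */
static void
resolve_worker_args(const WorkerSpiArgs *args, int worker_pid,
					WorkerSpiConfig *config)
{
	assert(args != NULL && config != NULL);

	/* Assumed defaults; the real module may choose differently. */
	snprintf(config->dbname, sizeof(config->dbname), "%s",
			 args->dbname ? args->dbname : "postgres");
	snprintf(config->table, sizeof(config->table), "%s",
			 args->table ? args->table : "counted");

	if (args->schema)
		snprintf(config->schema, sizeof(config->schema), "%s", args->schema);
	else
		snprintf(config->schema, sizeof(config->schema), "schema%d",
				 worker_pid);
}
```

A fixed-size struct like WorkerSpiConfig is the kind of payload that could be
copied into bgw_extra, provided it fits within that field's size limit.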

Having a shared memory state is a second thing I was planning to add,
and that can be useful as a point of reference in a template.  The other
patch about custom wait events introduces that, FWIW, to track the
custom wait events added.
--
Michael

Attachment