Thread: Support worker_spi to execute the function dynamically.
Hi,

While I'm working on the thread[1], I found that the function of worker_spi module fails if 'shared_preload_libraries' doesn't have worker_spi.

The reason is that the database name is NULL because the database name is initialized only when process_shared_preload_libraries_in_progress is true.

```
psql=# SELECT worker_spi_launch(1) ;
2023-07-20 11:00:56.491 JST [1179891] LOG: worker_spi worker 1 initialized with schema1.counted
2023-07-20 11:00:56.491 JST [1179891] FATAL: cannot read pg_class without having selected a database at character 22
2023-07-20 11:00:56.491 JST [1179891] QUERY: select count(*) from pg_namespace where nspname = 'schema1'
2023-07-20 11:00:56.491 JST [1179891] STATEMENT: select count(*) from pg_namespace where nspname = 'schema1'
2023-07-20 11:00:56.492 JST [1179095] LOG: background worker "worker_spi" (PID 1179891) exited with exit code 1
```

In my understanding, the restriction is not required. So, I think it's better to change the behavior.
(v1-0001-Support-worker_spi-to-execute-the-function-dynamical.patch)

What do you think?

[1] Support to define custom wait events for extensions
https://www.postgresql.org/message-id/flat/b9f5411acda0cf15c8fbb767702ff43e%40oss.nttdata.com

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachment
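The failure described in the first message can be modeled outside of PostgreSQL. The following is only a minimal sketch in plain C, not the worker_spi source: `preload_in_progress`, `worker_database`, `init_old()`, `init_new()` and `dynamic_worker_can_connect()` are hypothetical stand-ins, and the "postgres" default is an assumption made for illustration. It shows why an early return placed before the setting is defined leaves a dynamically launched worker without a database name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for process_shared_preload_libraries_in_progress. */
static bool preload_in_progress = false;

/* Hypothetical stand-in for the database-name setting; NULL until defined. */
static const char *worker_database = NULL;

/* Reported behavior: the early return skips the definition of the setting. */
static void
init_old(void)
{
    if (!preload_in_progress)
        return;
    worker_database = "postgres";   /* assumed default, for illustration */
}

/* Proposed behavior: define the setting first, then do preload-only work. */
static void
init_new(void)
{
    worker_database = "postgres";   /* assumed default, for illustration */
    if (!preload_in_progress)
        return;
    /* registration of workers at server start would happen here */
}

/* A dynamically launched worker needs a database name to connect. */
static bool
dynamic_worker_can_connect(void)
{
    return worker_database != NULL;
}
```

With `preload_in_progress` left false, `init_old()` leaves the name unset while `init_new()` sets it, which mirrors the "cannot read pg_class without having selected a database" failure and the fix proposed for it.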
On Thu, Jul 20, 2023 at 11:15:51AM +0900, Masahiro Ikeda wrote:
> While I'm working on the thread[1], I found that the function of
> worker_spi module fails if 'shared_preload_libraries' doesn't have
> worker_spi.

I guess that you were patching worker_spi to register dynamically a wait event and embed that in a TAP test or similar without loading it in shared_preload_libraries? FWIW, you could use a trick like what I am attaching here to load a wait event dynamically with the custom wait event API. You would need to make worker_spi_init_shmem() a bit more aggressive with an extra hook to reserve a shmem area size, but that's enough to show the custom wait event in the same backend as the one that launches a worker_spi dynamically, while demonstrating how the API can be used in this case.

> In my understanding, the restriction is not required. So, I think it's
> better to change the behavior.
> (v1-0001-Support-worker_spi-to-execute-the-function-dynamical.patch)
>
> What do you think?

+1. I'm OK to lift this restriction with a SIGHUP GUC for the database name and that's not a pattern to encourage in a template module. Will do so, if there are no objections.
--
Michael

Attachment
On Thu, Jul 20, 2023 at 9:25 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> > In my understanding, the restriction is not required. So, I think it's
> > better to change the behavior.
> > (v1-0001-Support-worker_spi-to-execute-the-function-dynamical.patch)
> >
> > What do you think?
>
> +1. I'm OK to lift this restriction with a SIGHUP GUC for the
> database name and that's not a pattern to encourage in a template
> module. Will do so, if there are no objections.

+1. However, a comment above helps one to understand why some GUCs are defined before if (!process_shared_preload_libraries_in_progress). As this is an example extension, it will help understand the reasoning better. I know we will have it in the commit message, but a direct comment helps:

/*
 * Note that this GUC is defined irrespective of worker_spi shared library
 * presence in shared_preload_libraries. It's possible to create the
 * worker_spi extension and use functions without it being specified in
 * shared_preload_libraries. If we return from here without defining this
 * GUC, the dynamic workers launched by worker_spi_launch() will keep
 * crashing and restarting.
 */

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

On Thu, Jul 20, 2023 at 09:43:37AM +0530, Bharath Rupireddy wrote:
> +1. However, a comment above helps one to understand why some GUCs are
> defined before if (!process_shared_preload_libraries_in_progress). As
> this is an example extension, it will help understand the reasoning
> better. I know we will have it in the commit message, but a direct
> comment helps:
>
> /*
>  * Note that this GUC is defined irrespective of worker_spi shared library
>  * presence in shared_preload_libraries. It's possible to create the
>  * worker_spi extension and use functions without it being specified in
>  * shared_preload_libraries. If we return from here without defining this
>  * GUC, the dynamic workers launched by worker_spi_launch() will keep
>  * crashing and restarting.
>  */

WFM to be more talkative here and document things, but I don't think that's it. How about a simple "These GUCs are defined even if this library is not loaded with shared_preload_libraries, for worker_spi_launch()."
--
Michael

Attachment
On Thu, Jul 20, 2023 at 10:09 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 20, 2023 at 09:43:37AM +0530, Bharath Rupireddy wrote:
> > +1. However, a comment above helps one to understand why some GUCs are
> > defined before if (!process_shared_preload_libraries_in_progress). As
> > this is an example extension, it will help understand the reasoning
> > better. I know we will have it in the commit message, but a direct
> > comment helps:
> >
> > /*
> >  * Note that this GUC is defined irrespective of worker_spi shared library
> >  * presence in shared_preload_libraries. It's possible to create the
> >  * worker_spi extension and use functions without it being specified in
> >  * shared_preload_libraries. If we return from here without defining this
> >  * GUC, the dynamic workers launched by worker_spi_launch() will keep
> >  * crashing and restarting.
> >  */
>
> WFM to be more talkative here and document things, but I don't think
> that's it. How about a simple "These GUCs are defined even if this
> library is not loaded with shared_preload_libraries, for
> worker_spi_launch()."

LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

On 2023-07-20 12:55, Michael Paquier wrote:
> On Thu, Jul 20, 2023 at 11:15:51AM +0900, Masahiro Ikeda wrote:
>> While I'm working on the thread[1], I found that the function of
>> worker_spi module fails if 'shared_preload_libraries' doesn't have
>> worker_spi.
>
> I guess that you were patching worker_spi to register dynamically a
> wait event and embed that in a TAP test or similar without loading it
> in shared_preload_libraries? FWIW, you could use a trick like what I
> am attaching here to load a wait event dynamically with the custom
> wait event API. You would need to make worker_spi_init_shmem() a bit
> more aggressive with an extra hook to reserve a shmem area size, but
> that's enough to show the custom wait event in the same backend as the
> one that launches a worker_spi dynamically, while demonstrating how
> the API can be used in this case.

Yes, you're right. When I tried using worker_spi to test wait event, I found the behavior. And thanks a lot for your patch. I wasn't aware of the way. I'll merge your patch to the tests for wait events.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Hi,

On 2023-07-20 13:50, Bharath Rupireddy wrote:
> On Thu, Jul 20, 2023 at 10:09 AM Michael Paquier <michael@paquier.xyz>
> wrote:
>>
>> On Thu, Jul 20, 2023 at 09:43:37AM +0530, Bharath Rupireddy wrote:
>> > +1. However, a comment above helps one to understand why some GUCs are
>> > defined before if (!process_shared_preload_libraries_in_progress). As
>> > this is an example extension, it will help understand the reasoning
>> > better. I know we will have it in the commit message, but a direct
>> > comment helps:
>> >
>> > /*
>> >  * Note that this GUC is defined irrespective of worker_spi shared library
>> >  * presence in shared_preload_libraries. It's possible to create the
>> >  * worker_spi extension and use functions without it being specified in
>> >  * shared_preload_libraries. If we return from here without defining this
>> >  * GUC, the dynamic workers launched by worker_spi_launch() will keep
>> >  * crashing and restarting.
>> >  */
>>
>> WFM to be more talkative here and document things, but I don't think
>> that's it. How about a simple "These GUCs are defined even if this
>> library is not loaded with shared_preload_libraries, for
>> worker_spi_launch()."
>
> LGTM.

Thanks for discussing the patch. I updated the patch based on your comments.
* v2-0001-Support-worker_spi-to-execute-the-function-dynamical.patch

I found another thing that should be changed. Though the tests assumed "shared_preload_libraries = worker_spi", the background workers failed to be launched in the initialization phase because the database is not created yet.

```
# make check # in src/test/modules/worker_spi
# cat log/postmaster.log # in src/test/modules/worker_spi/
2023-07-20 17:58:47.958 JST worker_spi[853620] FATAL: database "contrib_regression" does not exist
2023-07-20 17:58:47.958 JST worker_spi[853621] FATAL: database "contrib_regression" does not exist
2023-07-20 17:58:47.959 JST postmaster[853612] LOG: background worker "worker_spi" (PID 853620) exited with exit code 1
2023-07-20 17:58:47.959 JST postmaster[853612] LOG: background worker "worker_spi" (PID 853621) exited with exit code 1
```

It's better to remove "shared_preload_libraries = worker_spi" from the test configuration. I misunderstood that two background workers would be launched and waiting at the start of the test.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachment
On Thu, Jul 20, 2023 at 05:54:55PM +0900, Masahiro Ikeda wrote:
> Yes, you're right. When I tried using worker_spi to test wait event,
> I found the behavior. And thanks a lot for your patch. I wasn't aware
> of the way. I'll merge your patch to the tests for wait events.

Be careful when using that. I have not spent more than a few minutes to show my point, but what I sent lacks a shmem_request_hook in _PG_init(), for example, to request an amount of shared memory equal to the size of the state structure.
--
Michael

Attachment
On Thu, Jul 20, 2023 at 2:59 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 20, 2023 at 05:54:55PM +0900, Masahiro Ikeda wrote:
> > Yes, you're right. When I tried using worker_spi to test wait event,
> > I found the behavior. And thanks a lot for your patch. I wasn't aware
> > of the way. I'll merge your patch to the tests for wait events.
>
> Be careful when using that. I have not spent more than a few minutes
> to show my point, but what I sent lacks a shmem_request_hook in
> _PG_init(), for example, to request an amount of shared memory equal
> to the size of the state structure.

I think the preferred way to grab a chunk of shared memory for an external module is by using shmem_request_hook and shmem_startup_hook. Wait events shared memory too can use them.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

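The hook-based approach mentioned above relies on chaining: a module saves whatever hook was installed before it, installs its own, and calls the saved one first. The sketch below models only that pattern with plain function pointers so that it compiles on its own. It does not use the PostgreSQL headers; `request_hook`, `requested_bytes`, `DemoState`, `demo_request()` and `demo_module_init()` are hypothetical names standing in for shmem_request_hook, the server's shared memory accounting, and a module's state structure.

```c
#include <stddef.h>

typedef void (*request_hook_type) (void);

/* Hypothetical stand-ins for the server-side hook and its accounting. */
static request_hook_type request_hook = NULL;
static size_t requested_bytes = 0;

/* Hypothetical state structure the module wants in shared memory. */
typedef struct DemoState
{
    int wait_event_id;
} DemoState;

/* Hook that was installed before this module, if any. */
static request_hook_type prev_request_hook = NULL;

/* The module's hook: run the previous hook, then request room for its state. */
static void
demo_request(void)
{
    if (prev_request_hook)
        prev_request_hook();
    requested_bytes += sizeof(DemoState);
}

/* What the module's init function would do: save the old hook, install ours. */
static void
demo_module_init(void)
{
    prev_request_hook = request_hook;
    request_hook = demo_request;
}
```

In a real module the request step would go through the server's own API from inside the hook, and a matching startup hook would then initialize the structure; both are left out here to keep the model small.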
On Thu, Jul 20, 2023 at 2:38 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> Thanks for discussing the patch. I updated the patch based on your
> comments.
> * v2-0001-Support-worker_spi-to-execute-the-function-dynamical.patch
>
> I found another thing that should be changed. Though the tests assumed
> "shared_preload_libraries = worker_spi", the background workers failed
> to be launched in the initialization phase because the database is not
> created yet.
>
> ```
> # make check # in src/test/modules/worker_spi
> # cat log/postmaster.log # in src/test/modules/worker_spi/
> 2023-07-20 17:58:47.958 JST worker_spi[853620] FATAL: database "contrib_regression" does not exist
> 2023-07-20 17:58:47.958 JST worker_spi[853621] FATAL: database "contrib_regression" does not exist
> 2023-07-20 17:58:47.959 JST postmaster[853612] LOG: background worker "worker_spi" (PID 853620) exited with exit code 1
> 2023-07-20 17:58:47.959 JST postmaster[853612] LOG: background worker "worker_spi" (PID 853621) exited with exit code 1
> ```
>
> It's better to remove "shared_preload_libraries = worker_spi" from the
> test configuration. I misunderstood that two background workers would
> be launched and waiting at the start of the test.

I don't think that change is correct. The worker_spi essentially shows how to start bg workers with RegisterBackgroundWorker and dynamic bg workers with RegisterDynamicBackgroundWorker. If shared_preload_libraries = worker_spi is not specified there, you will miss starting the workers registered with RegisterBackgroundWorker. Would giving an initdb-time database name to worker_spi.database work there? If the database for bg workers doesn't exist, changing bgw_restart_time from BGW_NEVER_RESTART to say 1 will help to see bg workers coming up eventually.

I think it's worth adding test cases for the expected number of bg workers (after creating worker_spi extension) and dynamic bg workers (after calling worker_spi_launch()).

Also, to distinguish bg workers and dynamic bg workers, you can change bgw_type in worker_spi_launch to "worker_spi dynamic worker".

- /* get the configuration */
+ /* Get the configuration */

- /* set up common data for all our workers */
+ /* Set up common data for all our workers */

These unrelated changes are better left as-is, because the postgres code has both commenting styles, /* Get .... */ and /* get ....*/. IOW, single line comments start with both uppercase and lowercase.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

On Thu, Jul 20, 2023 at 03:44:12PM +0530, Bharath Rupireddy wrote:
> I don't think that change is correct. The worker_spi essentially shows
> how to start bg workers with RegisterBackgroundWorker and dynamic bg
> workers with RegisterDynamicBackgroundWorker. If
> shared_preload_libraries = worker_spi is not specified there, you will
> miss starting the workers registered with RegisterBackgroundWorker.
> Would giving an initdb-time database name to worker_spi.database work
> there? If the database for bg workers doesn't exist, changing
> bgw_restart_time from BGW_NEVER_RESTART to say 1 will help to see bg
> workers coming up eventually.

Yeah, it does not move the needle by much. I think that we are looking at switching this module to use a TAP test in the long term, instead, where it would be possible to test the scenarios we want to look at *with* and *without* shared_preload_libraries especially with the custom wait events for extensions in mind if we add our tests in this module.

It does not change the fact that Ikeda-san is right about the launch of dynamic workers with this module being broken, so I have applied v1 with the comment I have suggested. This will ease a bit the implementation of any follow-up test scenarios, while avoiding an incorrect pattern in this template module.
--
Michael

Attachment
On Fri, Jul 21, 2023 at 8:38 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 20, 2023 at 03:44:12PM +0530, Bharath Rupireddy wrote:
> > I don't think that change is correct. The worker_spi essentially shows
> > how to start bg workers with RegisterBackgroundWorker and dynamic bg
> > workers with RegisterDynamicBackgroundWorker. If
> > shared_preload_libraries = worker_spi is not specified there, you will
> > miss starting the workers registered with RegisterBackgroundWorker.
> > Would giving an initdb-time database name to worker_spi.database work
> > there? If the database for bg workers doesn't exist, changing
> > bgw_restart_time from BGW_NEVER_RESTART to say 1 will help to see bg
> > workers coming up eventually.
>
> Yeah, it does not move the needle by much. I think that we are
> looking at switching this module to use a TAP test in the long term,
> instead, where it would be possible to test the scenarios we want to
> look at *with* and *without* shared_preload_libraries especially with
> the custom wait events for extensions in mind if we add our tests in
> this module.

Okay. Here's a quick patch for adding TAP tests to the worker_spi module. We can change it to taste.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Fri, Jul 21, 2023 at 11:24:08AM +0530, Bharath Rupireddy wrote:
> Okay. Here's a quick patch for adding TAP tests to the worker_spi
> module. We can change it to taste.

What do you think if we removed completely the sql/ test, moving it to TAP so as we have only one cluster set up when running a make check? worker_spi.sql only does two waits (one for the initialization and one to check that the tuple has been processed), so these could be replaced by some poll_query_until()?

As we have a dynamic.conf, installcheck is not supported so we don't use anything with this switch. Besides, updating shared_preload_libraries and restarting the node in TAP is cheaper than a second initdb.

-       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
-       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
+       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi static worker %d", i);
+       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi static worker");
[..]
-       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
-       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
+       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi dynamic worker %d", i);
+       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi dynamic worker");

Good idea to split that.
--
Michael

Attachment
On Fri, Jul 21, 2023 at 11:54 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jul 21, 2023 at 11:24:08AM +0530, Bharath Rupireddy wrote:
> > Okay. Here's a quick patch for adding TAP tests to the worker_spi
> > module. We can change it to taste.
>
> What do you think if we removed completely the sql/ test, moving it to
> TAP so as we have only one cluster set up when running a make check?
> worker_spi.sql only does two waits (one for the initialization and one
> to check that the tuple has been processed), so these could be
> replaced by some poll_query_until()?

I think we can keep SQL tests around as it will help demonstrate how someone can quickly write their own SQL tests.

> As we have a dynamic.conf, installcheck is not supported so we don't
> use anything with this switch. Besides, updating
> shared_preload_libraries and restarting the node in TAP is cheaper
> than a second initdb.

In SQL tests, I ensured worker_spi doesn't start static bg workers by setting worker_spi.total_workers = 0. Again, all of this is not necessary, but it will be a very good example for someone writing extensions and playing around with custom config files, SQL and TAP tests etc.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Hi,

On 2023-07-20 18:39, Bharath Rupireddy wrote:
> On Thu, Jul 20, 2023 at 2:59 PM Michael Paquier <michael@paquier.xyz>
> wrote:
>>
>> On Thu, Jul 20, 2023 at 05:54:55PM +0900, Masahiro Ikeda wrote:
>> > Yes, you're right. When I tried using worker_spi to test wait event,
>> > I found the behavior. And thanks a lot for your patch. I wasn't aware
>> > of the way. I'll merge your patch to the tests for wait events.
>>
>> Be careful when using that. I have not spent more than a few minutes
>> to show my point, but what I sent lacks a shmem_request_hook in
>> _PG_init(), for example, to request an amount of shared memory equal
>> to the size of the state structure.
>
> I think the preferred way to grab a chunk of shared memory for an
> external module is by using shmem_request_hook and shmem_startup_hook.
> Wait events shared memory too can use them.

OK, I'll add the hooks in worker_spi for the test of wait events.

On 2023-07-21 12:08, Michael Paquier wrote:
> On Thu, Jul 20, 2023 at 03:44:12PM +0530, Bharath Rupireddy wrote:
>> I don't think that change is correct. The worker_spi essentially shows
>> how to start bg workers with RegisterBackgroundWorker and dynamic bg
>> workers with RegisterDynamicBackgroundWorker. If
>> shared_preload_libraries = worker_spi is not specified there, you will
>> miss starting the workers registered with RegisterBackgroundWorker.
>> Would giving an initdb-time database name to worker_spi.database work
>> there? If the database for bg workers doesn't exist, changing
>> bgw_restart_time from BGW_NEVER_RESTART to say 1 will help to see bg
>> workers coming up eventually.
>
> Yeah, it does not move the needle by much. I think that we are
> looking at switching this module to use a TAP test in the long term,
> instead, where it would be possible to test the scenarios we want to
> look at *with* and *without* shared_preload_libraries especially with
> the custom wait events for extensions in mind if we add our tests in
> this module.
>
> It does not change the fact that Ikeda-san is right about the launch
> of dynamic workers with this module being broken, so I have applied v1
> with the comment I have suggested. This will ease a bit the
> implementation of any follow-up test scenarios, while avoiding an
> incorrect pattern in this template module.

Thanks for the commits. As Bharath-san said, I forgot that worker_spi has an aspect of demonstration and I agree to introduce two types of tests with and without "shared_preload_libraries = worker_spi".

On 2023-07-21 15:51, Bharath Rupireddy wrote:
> On Fri, Jul 21, 2023 at 11:54 AM Michael Paquier <michael@paquier.xyz>
> wrote:
>>
>> On Fri, Jul 21, 2023 at 11:24:08AM +0530, Bharath Rupireddy wrote:
>> As we have a dynamic.conf, installcheck is not supported so we don't
>> use anything with this switch. Besides, updating
>> shared_preload_libraries and restarting the node in TAP is cheaper
>> than a second initdb.
>
> In SQL tests, I ensured worker_spi doesn't start static bg workers by
> setting worker_spi.total_workers = 0. Again, all of this is not
> necessary, but it will be a very good example for someone writing
> extensions and playing around with custom config files, SQL and TAP
> tests etc.

Thanks for making the patch. I confirmed it works in my environments.

> -       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
> -       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
> +       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi static worker %d", i);
> +       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi static worker");
> [..]
> -       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
> -       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
> +       snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi dynamic worker %d", i);
> +       snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi dynamic worker");
>
> Good idea to split that.

I agree. It is very useful. I'll refer to its implementation for the wait event tests.

I have some questions about the patch. Feel free to ignore the following comments since your patch is a PoC.

(1)

Do we need to change the minValue from 1 to 0 to support worker_spi.total_workers = 0?

    DefineCustomIntVariable("worker_spi.total_workers",
                            "Number of workers.",
                            NULL,
                            &worker_spi_total_workers,
                            2,
                            1,
                            100,
                            PGC_POSTMASTER,
                            0,
                            NULL,
                            NULL,
                            NULL);

(2)

Do we need "worker_spi.total_workers = 0" and "shared_preload_libraries = worker_spi" in dynamic.conf?

Currently, the static bg workers will not be launched because "shared_preload_libraries = worker_spi" is removed. So "worker_spi.total_workers = 0" is meaningless.

(3)

We need to change or remove them.

> # Copyright (c) 2021-2023, PostgreSQL Global Development Group
>
> # Test replication statistics data in pg_stat_replication_slots is sane
> after
> # drop replication slot and restart.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

On Fri, Jul 21, 2023 at 4:05 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> > In SQL tests, I ensured worker_spi doesn't start static bg workers by
> > setting worker_spi.total_workers = 0. Again, all of this is not
> > necessary, but it will be a very good example for someone writing
> > extensions and playing around with custom config files, SQL and TAP
> > tests etc.
>
> Thanks for making the patch. I confirmed it works in my environments.

Thanks for verifying.

> I have some questions about the patch.
>
> (1)
>
> Do we need to change the minValue from 1 to 0 to support
> worker_spi.total_workers = 0?
>
>     DefineCustomIntVariable("worker_spi.total_workers",
>                             "Number of workers.",
>                             NULL,
>                             &worker_spi_total_workers,
>                             2,
>                             1,
>                             100,
>                             PGC_POSTMASTER,
>                             0,
>                             NULL,
>                             NULL,
>                             NULL);

No, let's keep it that way.

> (2)
>
> Do we need "worker_spi.total_workers = 0" and
> "shared_preload_libraries = worker_spi" in dynamic.conf?
>
> Currently, the static bg workers will not be launched because
> "shared_preload_libraries = worker_spi" is removed. So
> "worker_spi.total_workers = 0" is meaningless.

You're right. worker_spi.total_workers = 0 in custom.conf has no effect without shared_preload_libraries = worker_spi. Removed that.

> (3)
>
> We need to change or remove them.
>
> > # Copyright (c) 2021-2023, PostgreSQL Global Development Group
> >
> > # Test replication statistics data in pg_stat_replication_slots is sane
> > after
> > # drop replication slot and restart.

Modified.

I'm attaching the v2 patch. Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On 2023-07-22 01:05, Bharath Rupireddy wrote:
> On Fri, Jul 21, 2023 at 4:05 PM Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote:
>> (2)
>>
>> Do we need "worker_spi.total_workers = 0" and
>> "shared_preload_libraries = worker_spi" in dynamic.conf?
>>
>> Currently, the static bg workers will not be launched because
>> "shared_preload_libraries = worker_spi" is removed. So
>> "worker_spi.total_workers = 0" is meaningless.
>
> You're right. worker_spi.total_workers = 0 in custom.conf has no
> effect without shared_preload_libraries = worker_spi. Removed that.

OK. If so, we need to remove the following comment in Makefile.

> # enable our module in shared_preload_libraries for dynamic bgworkers

I also confirmed that the tap tests work with meson and make.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

On Mon, Jul 24, 2023 at 6:34 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> OK. If so, we need to remove the following comment in Makefile.
>
> > # enable our module in shared_preload_libraries for dynamic bgworkers

Done.

> I also confirmed that the tap tests work with meson and make.

Thanks for verifying.

I also added a note atop worker_spi.c that the module also demonstrates how to write core (SQL) tests and extended (TAP) tests.

I'm attaching the v3 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On 2023-07-24 12:01, Bharath Rupireddy wrote:
> I'm attaching the v3 patch.

I verified it works and it looks good to me. Thanks to your work, I will be able to implement tests for custom wait events.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

On Mon, Jul 24, 2023 at 08:31:01AM +0530, Bharath Rupireddy wrote:
> I also added a note atop worker_spi.c that the module also
> demonstrates how to write core (SQL) tests and extended (TAP) tests.

The value of the SQL tests comes down to the DO blocks that emulate what the TAP tests could equally be able to do. While we already have some places that do something similar (slot.sql or postgres_fdw.sql), the SQL tests of worker_spi count for a total of five queries, which is not much with one cluster initialized:
- One pg_reload_conf() to work a loop to happen in the worker.
- Two sanity checks.
- Two wait emulations.

Anyway, most people that do serious hacking on this list care about the runtime of the tests all the time, and I am not on board in making things slower for the sake of showing a test example here particularly if there are ways to make them faster (long-term, we should be able to do the init step only once for most cases), and because we *have to* switch to TAP to have more advanced scenarios for the custom wait events or just dynamic work launches based on what we set on shared_preload_libraries. On top of that, we have other examples in the tree that emulate waits for plain SQL tests to satisfy assumptions with some follow-up query.

So, I don't really agree with the value gained here compared to the execution cost of initializing two clusters for this module. I have taken the time to check how the runtime changes when switching to TAP for all the scenarios discussed here, and from my laptop, I can see that:
- HEAD takes 4.4s, for only the sql/ test.
- Your latest patch is at 5.6s.
- My version attached to this message is at 3.7s.

In terms of runtime the benefits are here for me. Note that with the first part of the test (previously in sql/), we don't lose coverage with the loop of the workers so I agree that only checking that these are launched is OK once worker_spi is in shared_preload_libraries. However, I think that we should make sure that they are connected to the correct database 'mydb'. I have updated the test to do that.

So, what do you think about the attached?
--
Michael

Attachment
On Mon, Jul 24, 2023 at 1:10 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Jul 24, 2023 at 08:31:01AM +0530, Bharath Rupireddy wrote:
> > I also added a note atop worker_spi.c that the module also
> > demonstrates how to write core (SQL) tests and extended (TAP) tests.
>
> In terms of runtime the benefits are here for me. Note that with the
> first part of the test (previously in sql/), we don't lose coverage
> with the loop of the workers so I agree that only checking that these
> are launched is OK once worker_spi is in shared_preload_libraries.
> However, I think that we should make sure that they are connected to
> the correct database 'mydb'. I have updated the test to do that.
>
> So, what do you think about the attached?

I disagree with removing SQL tests from the worker_spi module. As said upthread, it makes the worker_spi a fully demonstrable extension/module - one can just take it, start adding required functionality and test-cases (both SQL and TAP) for a new module. I agree that moving to TAP tests will reduce test run time by 1.9 seconds, but to me personally this is not an optimization we must be doing at the expense of demonstrability.

Having said that, others might have a different opinion here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

On Mon, Jul 24, 2023 at 01:50:45PM +0530, Bharath Rupireddy wrote:
> I disagree with removing SQL tests from the worker_spi module. As said
> upthread, it makes the worker_spi a fully demonstrable
> extension/module - one can just take it, start adding required
> functionality and test-cases (both SQL and TAP) for a new module.

Which is basically the same thing with TAP except that these are grouped now? The value of a few raw SQL queries with a NO_INSTALLCHECK does not strike me as enough on top of having to maintain two different sets of tests. I'd still choose the cheap and extensible path here.

> I agree that moving to TAP tests will reduce test run time by 1.9
> seconds, but to me personally this is not an optimization we must be
> doing at the expense of demonstrability.

In a large parallel run, the difference can be felt.
--
Michael

Attachment
On Mon, Jul 24, 2023 at 05:38:45PM +0900, Michael Paquier wrote:
> Which is basically the same thing with TAP except that these are
> grouped now? The value of a few raw SQL queries with a
> NO_INSTALLCHECK does not strike me as enough on top of having to
> maintain two different sets of tests. I'd still choose the cheap and
> extensible path here.

I've been sleeping on that a bit more, and I'd still go with the refactoring where we initialize one cluster and have all the tests done by TAP, for the sake of being much cheaper without changing the coverage, while being more extensible when it comes to introduce tests for the follow-up patch on custom wait events.
--
Michael

Attachment
On Wed, Jul 26, 2023 at 09:02:54AM +0900, Michael Paquier wrote:
> I've been sleeping on that a bit more, and I'd still go with the
> refactoring where we initialize one cluster and have all the tests
> done by TAP, for the sake of being much cheaper without changing the
> coverage, while being more extensible when it comes to introduce tests
> for the follow-up patch on custom wait events.

For now, please note that I have applied your idea to add "dynamic" to the names of the bgworkers registered on a worker_spi_launch() as this is useful on its own. I have given up on the "static" part, because that felt inconsistent with the API names, and we don't use this term in the docs for bgworkers, additionally.
--
Michael

Attachment
Hi,

The new test fails with my AIO branch occasionally. But I'm fairly certain that's just due to timing differences. Excerpt from the log:

2023-07-27 21:43:00.385 UTC [42339] LOG: worker_spi worker 3 initialized with schema3.counted
2023-07-27 21:43:00.399 UTC [42344] 001_worker_spi.pl LOG: statement: SELECT datname, count(datname) FROM pg_stat_activity WHERE backend_type = 'worker_spi' GROUP BY datname;
2023-07-27 21:43:00.403 UTC [42340] LOG: worker_spi worker 2 initialized with schema2.counted
2023-07-27 21:43:00.407 UTC [42341] LOG: worker_spi worker 1 initialized with schema1.counted
2023-07-27 21:43:00.420 UTC [42346] 001_worker_spi.pl LOG: statement: SELECT worker_spi_launch(1);
2023-07-27 21:43:00.423 UTC [42347] LOG: worker_spi dynamic worker 1 initialized with schema1.counted
2023-07-27 21:43:00.432 UTC [42349] 001_worker_spi.pl LOG: statement: SELECT worker_spi_launch(2);
2023-07-27 21:43:00.437 UTC [42350] LOG: worker_spi dynamic worker 2 initialized with schema2.counted
2023-07-27 21:43:00.443 UTC [42347] ERROR: duplicate key value violates unique constraint "pg_namespace_nspname_index"
2023-07-27 21:43:00.443 UTC [42347] DETAIL: Key (nspname)=(schema1) already exists.
2023-07-27 21:43:00.443 UTC [42347] CONTEXT: SQL statement "CREATE SCHEMA "schema1" CREATE TABLE "counted" ( type text CHECK (type IN ('total', 'delta')), value integer)CREATE UNIQUE INDEX "counted_unique_total" ON "counted"(type) WHERE type = 'total'"

As written, dynamic and static workers race each other. It doesn't make a lot of sense to me to use the same ids for either?

The attached patch reproduces the problem on master.

Note that without the sleep(3) in the test the workers don't actually finish starting, the test shuts down the cluster before that happens...

Greetings,

Andres Freund
Attachment
On Thu, Jul 27, 2023 at 07:23:32PM -0700, Andres Freund wrote: > As written, dynamic and static workers race each other. It doesn't make a lot > of sense to me to use the same ids for either? > > The attached patch reproduces the problem on master. > > Note that without the sleep(3) in the test the workers don't actually finish > starting, the test shuts down the cluster before that happens... So you have faced a race condition where the commit of the transaction doing the schema creation for the static workers is delayed long enough that the dynamic workers don't see it, and they bump into a catalog conflict when they try to create the same schemas. Having each bgworker on its own schema would be enough to prevent conflicts, but I'd like to add a second thing: a check on pg_stat_activity.wait_event after starting the workers. I have added something like that in the patch I have posted today for the custom wait events at [1] and it enforces the startup sequences of the workers in a stricter way. Does the attached take care of your issue? [1]: https://www.postgresql.org/message-id/ZMMUiR7kvzPWenhF@paquier.xyz -- Michael
Attachment
On Fri, Jul 28, 2023 at 10:15 AM Michael Paquier <michael@paquier.xyz> wrote: > > Having each bgworker on its own schema would be enough to prevent > conflicts, but I'd like to add a second thing: a check on > pg_stat_activity.wait_event after starting the workers. I have added > something like that in the patch I have posted today for the custom > wait events at [1] and it enforces the startup sequences of the > workers in a stricter way. > > Does the attached take care of your issue? +# check their existence. Use IDs that do not overlap with the schemas created +# by the previous workers. While using different IDs in tests is a simple fix, -1 for it. I'd prefer if worker_spi uses different schema prefixes for static and dynamic bg workers to avoid conflicts. We can either look at MyBgworkerEntry->bgw_type in worker_spi_main and have schema name as '{static, dynamic}_worker_schema_%d', id or pass schema name in bgw_extra. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Jul 28, 2023 at 10:47:39AM +0530, Bharath Rupireddy wrote: > +# check their existence. Use IDs that do not overlap with the schemas created > +# by the previous workers. > > While using different IDs in tests is a simple fix, -1 for it. I'd > prefer if worker_spi uses different schema prefixes for static and > dynamic bg workers to avoid conflicts. We can either look at > MyBgworkerEntry->bgw_type in worker_spi_main and have schema name as > '{static, dynamic}_worker_schema_%d', id or pass schema name in > bgw_extra. For the sake of a test module, I am not really convinced that there is any need to go down to such complexity with the names of the schemas created. -- Michael
Attachment
On Fri, Jul 28, 2023 at 1:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jul 28, 2023 at 10:47:39AM +0530, Bharath Rupireddy wrote:
> > +# check their existence. Use IDs that do not overlap with the schemas created
> > +# by the previous workers.
> >
> > While using different IDs in tests is a simple fix, -1 for it. I'd
> > prefer if worker_spi uses different schema prefixes for static and
> > dynamic bg workers to avoid conflicts. We can either look at
> > MyBgworkerEntry->bgw_type in worker_spi_main and have schema name as
> > '{static, dynamic}_worker_schema_%d', id or pass schema name in
> > bgw_extra.
>
> For the sake of a test module, I am not really convinced that there is
> any need to go down to such complexity with the names of the schemas
> created.

I don't think something like [1] is complex. It makes worker_spi foolproof. Rather, the other approach proposed, that is to provide non-conflicting worker IDs to worker_spi_launch in the TAP test file, looks complicated to me. And it's easy for someone to come, add a test case with conflicting IDs input to worker_spi_launch and end up in the same state that we're in now.

[1]
diff --git a/src/test/modules/worker_spi/t/001_worker_spi.pl b/src/test/modules/worker_spi/t/001_worker_spi.pl
index c293871313..700530afc7 100644
--- a/src/test/modules/worker_spi/t/001_worker_spi.pl
+++ b/src/test/modules/worker_spi/t/001_worker_spi.pl
@@ -27,16 +27,16 @@ is($result, 't', "dynamic bgworker launched");
 $node->poll_query_until(
 	'postgres',
 	qq[SELECT count(*) > 0 FROM information_schema.tables
-	    WHERE table_schema = 'schema4' AND table_name = 'counted';]);
+	    WHERE table_schema = 'dynamic_worker_schema4' AND table_name = 'counted';]);
 $node->safe_psql('postgres',
-	"INSERT INTO schema4.counted VALUES ('total', 0), ('delta', 1);");
+	"INSERT INTO dynamic_worker_schema4.counted VALUES ('total', 0), ('delta', 1);");
 # Issue a SIGHUP on the node to force the worker to loop once, accelerating
 # this test.
 $node->reload;
 # Wait until the worker has processed the tuple that has just been inserted.
 $node->poll_query_until('postgres',
-	qq[SELECT count(*) FROM schema4.counted WHERE type = 'delta';], '0');
-$result = $node->safe_psql('postgres', 'SELECT * FROM schema4.counted;');
+	qq[SELECT count(*) FROM dynamic_worker_schema4.counted WHERE type = 'delta';], '0');
+$result = $node->safe_psql('postgres', 'SELECT * FROM dynamic_worker_schema4.counted;');
 is($result, qq(total|1), 'dynamic bgworker correctly consumed tuple data');
 
 note "testing bgworkers loaded with shared_preload_libraries";
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 903dcddef9..02b4204aa2 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -135,10 +135,19 @@ worker_spi_main(Datum main_arg)
 	int			index = DatumGetInt32(main_arg);
 	worktable  *table;
 	StringInfoData buf;
-	char		name[20];
+	char		name[NAMEDATALEN];
 
 	table = palloc(sizeof(worktable));
-	sprintf(name, "schema%d", index);
+
+	/*
+	 * Use different schema names for static and dynamic bg workers to avoid
+	 * name conflicts.
+	 */
+	if (strcmp(MyBgworkerEntry->bgw_type, "worker_spi") == 0)
+		sprintf(name, "worker_schema%d", index);
+	else if (strcmp(MyBgworkerEntry->bgw_type, "worker_spi dynamic") == 0)
+		sprintf(name, "dynamic_worker_schema%d", index);
+
 	table->schema = pstrdup(name);
 	table->name = pstrdup("counted");
 

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
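The naming rule in the diff above can also be tried outside the backend. What follows is only a minimal standalone sketch, not the patch itself: build_schema_name() and SCHEMA_NAME_LEN are hypothetical names introduced for illustration, and the two bgw_type strings are the ones compared in the diff. As an assumption on my side, it rejects an unrecognized bgw_type rather than leaving the buffer unset, and it uses snprintf() so that a long prefix cannot overflow the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical constant; 64 is the default value of NAMEDATALEN. */
#define SCHEMA_NAME_LEN 64

/*
 * Build the schema name for a worker from its type and ID.
 *
 * Returns 0 on success, or -1 if the worker type is not recognized or
 * the resulting name does not fit into the buffer.
 */
int
build_schema_name(const char *bgw_type, int index, char *name, size_t len)
{
	const char *prefix;
	int			n;

	if (strcmp(bgw_type, "worker_spi") == 0)
		prefix = "worker_schema";			/* static worker */
	else if (strcmp(bgw_type, "worker_spi dynamic") == 0)
		prefix = "dynamic_worker_schema";	/* dynamic worker */
	else
		return -1;							/* unknown type, name untouched */

	n = snprintf(name, len, "%s%d", prefix, index);
	if (n < 0 || (size_t) n >= len)
		return -1;							/* encoding error or truncation */

	return 0;
}
```

With this rule, a static worker with ID 1 and a dynamic worker launched with the same ID end up in worker_schema1 and dynamic_worker_schema1 respectively, so the two CREATE SCHEMA statements no longer collide.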
On Fri, Jul 28, 2023 at 02:11:48PM +0530, Bharath Rupireddy wrote: > I don't think something like [1] is complex. It makes worker_spi > foolproof. Rather, the other approach proposed, that is to provide > non-conflicting worker IDs to worker_spi_launch in the TAP test file, > looks complicated to me. And it's easy for someone to come, add a test > case with conflicting IDs input to worker_spi_launch and end up in the > same state that we're in now. Sure, but that's not really something that worries me for a template such as this one, for the sake of these tests. So I'd leave things as they are, slightly simpler. That's a minor point, for sure :) -- Michael
Attachment
On 2023-Jul-28, Michael Paquier wrote: > So you have faced a race condition where the commit of the transaction > doing the schema creation for the static workers is delayed long > enough that the dynamic workers don't see it, and they bump into a catalog > conflict when they try to create the same schemas. > > Having each bgworker on its own schema would be enough to prevent > conflicts, but I'd like to add a second thing: a check on > pg_stat_activity.wait_event after starting the workers. I have added > something like that in the patch I have posted today for the custom > wait events at [1] and it enforces the startup sequences of the > workers in a stricter way. Hmm, I think having all the workers doing their work in the same table is better -- if nothing else, because it gives us the opportunity to show how to use some other coding technique (but also because we are forced to write the SQL code in a way that's correct for potentially multiple concurrent workers, which sounds useful to demonstrate). Can't we instead solve the race condition by having some shared resource that blocks the other workers from proceeding until the schema has been created? Perhaps an LWLock, or a condition variable, or an advisory lock. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi, On 2023-07-28 13:45:29 +0900, Michael Paquier wrote: > Having each bgworker on its own schema would be enough to prevent > conflicts, but I'd like to add a second thing: a check on > pg_stat_activity.wait_event after starting the workers. I have added > something like that in the patch I have posted today for the custom > wait events at [1] and it enforces the startup sequences of the > workers in a stricter way. Is that very meaningful? ISTM the interesting thing to check for would be that the state is idle? Greetings, Andres Freund
On Fri, Jul 28, 2023 at 01:34:15PM -0700, Andres Freund wrote: > On 2023-07-28 13:45:29 +0900, Michael Paquier wrote: >> Having each bgworker on its own schema would be enough to prevent >> conflicts, but I'd like to add a second thing: a check on >> pg_stat_activity.wait_event after starting the workers. I have added >> something like that in the patch I have posted today for the custom >> wait events at [1] and it enforces the startup sequences of the >> workers in a stricter way. > > Is that very meaningful? ISTM the interesting thing to check for would be that > the state is idle? That's interesting for the sake of the other patch to check that the custom events are reported. Anyway, I am a bit short on time, so I have applied the simplest fix where the dynamic workers just use a different base ID to get out of your way. -- Michael
Attachment
On Fri, Jul 28, 2023 at 12:06:33PM +0200, Alvaro Herrera wrote: > Hmm, I think having all the workers doing their work in the same table is > better -- if nothing else, because it gives us the opportunity to show > how to use some other coding technique (but also because we are forced > to write the SQL code in a way that's correct for potentially multiple > concurrent workers, which sounds useful to demonstrate). Can't we > instead solve the race condition by having some shared resource that > blocks the other workers from proceeding until the schema has been > created? Perhaps an LWLock, or a condition variable, or an advisory > lock. That's an interesting idea that you have here. So basically, you would have all the workers use the same schema and do their counting work for the same base table? Or should each worker use the same schema, perhaps defined by a GUC, but different tables? One thing that has been itching me a bit with this module was to be able to pass down to the main worker routine more arguments than just an int ID, but I could not bring myself to do that just for the wait event patch, like: - The database to connect to. - The table to create. - The schema to use. If any of these are NULL, just use as default what we have now, with perhaps the bgworker PID as ID instead of a user-specified one. Having a shared memory state is a second thing I was planning to add, and that can be useful as a point of reference in a template. The other patch about custom wait events introduces that, FWIW, to track the custom wait events added. -- Michael
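To make the list of optional arguments above a bit more concrete, here is one possible way to pack them into a fixed-size area such as bgw_extra, as a minimal standalone sketch under assumptions. WorkerArgs, fill_worker_args(), EXTRA_LEN and FIELD_LEN are hypothetical names; EXTRA_LEN stands in for BGW_EXTRALEN (128 bytes), and the defaults ("postgres", "schema<id>", "counted") are assumed to mirror what the module does today. Three full NAMEDATALEN-sized names would not fit into 128 bytes, hence the shorter fields.

```c
#include <stdio.h>
#include <string.h>

#define EXTRA_LEN 128	/* stands in for BGW_EXTRALEN */
#define FIELD_LEN 40	/* hypothetical per-field limit, 3 * 40 <= 128 */

/* Hypothetical argument block passed down to the worker main routine. */
typedef struct WorkerArgs
{
	char		dbname[FIELD_LEN];
	char		schema[FIELD_LEN];
	char		table[FIELD_LEN];
} WorkerArgs;

_Static_assert(sizeof(WorkerArgs) <= EXTRA_LEN,
			   "WorkerArgs must fit in the extra area");

/*
 * Fill "extra" (at least EXTRA_LEN bytes) from the given arguments, using
 * a default for every argument that is NULL.  Returns 0 on success, or -1
 * if a value does not fit into its field.
 */
int
fill_worker_args(char *extra, const char *dbname, const char *schema,
				 const char *table, int id)
{
	WorkerArgs	args;
	int			n;

	memset(&args, 0, sizeof(args));

	n = snprintf(args.dbname, FIELD_LEN, "%s", dbname ? dbname : "postgres");
	if (n < 0 || n >= FIELD_LEN)
		return -1;

	if (schema != NULL)
		n = snprintf(args.schema, FIELD_LEN, "%s", schema);
	else
		n = snprintf(args.schema, FIELD_LEN, "schema%d", id);
	if (n < 0 || n >= FIELD_LEN)
		return -1;

	n = snprintf(args.table, FIELD_LEN, "%s", table ? table : "counted");
	if (n < 0 || n >= FIELD_LEN)
		return -1;

	memcpy(extra, &args, sizeof(args));
	return 0;
}
```

The worker side would then copy the block back out of the extra area into a WorkerArgs before connecting. Whether the ID used for the default schema name is user-specified or the bgworker PID is left open here, as it is in the message above.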