Re: pg_background (and more parallelism infrastructure patches) - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: pg_background (and more parallelism infrastructure patches) |
Date | |
Msg-id | CAA4eK1J=X_Kuhm8XQD37+yQiqnYF04OsZy4KCRxWu4Wz1rYy5Q@mail.gmail.com Whole thread Raw |
In response to | pg_background (and more parallelism infrastructure patches) (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: pg_background (and more parallelism infrastructure patches)
|
List | pgsql-hackers |
On Fri, Jul 25, 2014 at 11:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Patch 4 adds infrastructure that allows one session to save all of its
> non-default GUC values and another session to reload those values.
> This was written by Amit Khandekar and Noah Misch. It allows
> pg_background to start up the background worker with the same GUC
> settings that the launching process is using. I intend this as a
> demonstration of how to synchronize any given piece of state between
> cooperating backends. For real parallelism, we'll need to synchronize
> snapshots, combo CIDs, transaction state, and so on, in addition to
> GUCs. But GUCs are ONE of the things that we'll need to synchronize
> in that context, and this patch shows the kind of API we're thinking
> about for these sorts of problems.
> Patch 6 is pg_background itself. I'm quite pleased with how easily
> this came together. The existing background worker, dsm, shm_toc, and
> shm_mq infrastructure handles most of the heavily lifting here -
> obviously with some exceptions addressed by the preceding patches.
> Again, this is the kind of set-up that I'm expecting will happen in a
> background worker used for actual parallelism - clearly, more state
> will need to be restored there than here, but nonetheless the general
> flow of the code here is about what I'm imagining, just with somewhat
> more different kinds of state. Most of the work of writing this patch
> was actually figuring out how to execute the query itself; what I
> ended up with is mostly copied form exec_simple_query, but with some
> difference here and there. I'm not sure if it would be
> possible/advisable to try to refactor to reduce duplication.
> Patch 4 adds infrastructure that allows one session to save all of its
> non-default GUC values and another session to reload those values.
> This was written by Amit Khandekar and Noah Misch. It allows
> pg_background to start up the background worker with the same GUC
> settings that the launching process is using. I intend this as a
> demonstration of how to synchronize any given piece of state between
> cooperating backends. For real parallelism, we'll need to synchronize
> snapshots, combo CIDs, transaction state, and so on, in addition to
> GUCs. But GUCs are ONE of the things that we'll need to synchronize
> in that context, and this patch shows the kind of API we're thinking
> about for these sorts of problems.
Don't we need some way to prohibit changing GUC by launching process,
once it has shared the existing GUC?
> Patch 6 is pg_background itself. I'm quite pleased with how easily
> this came together. The existing background worker, dsm, shm_toc, and
> shm_mq infrastructure handles most of the heavily lifting here -
> obviously with some exceptions addressed by the preceding patches.
> Again, this is the kind of set-up that I'm expecting will happen in a
> background worker used for actual parallelism - clearly, more state
> will need to be restored there than here, but nonetheless the general
> flow of the code here is about what I'm imagining, just with somewhat
> more different kinds of state. Most of the work of writing this patch
> was actually figuring out how to execute the query itself; what I
> ended up with is mostly copied form exec_simple_query, but with some
> difference here and there. I'm not sure if it would be
> possible/advisable to try to refactor to reduce duplication.
1. This patch generates warning on windows
1>pg_background.obj : error LNK2001: unresolved external symbol StatementTimeout
You need to add PGDLLIMPORT for StatementTimeout
2.
CREATE FUNCTION pg_background_launch(sql pg_catalog.text,
queue_size pg_catalog.int4 DEFAULT 65536)
Here shouldn't queue_size be pg_catalog.int8 as I could see
some related functions in test_shm_mq uses int8?
CREATE FUNCTION test_shm_mq(queue_size pg_catalog.int8,
CREATE FUNCTION test_shm_mq_pipelined(queue_size pg_catalog.int8,
Anyway I think corresponding C function doesn't use matching function
to extract the function args.
pg_background_launch(PG_FUNCTION_ARGS)
{
text *sql = PG_GETARG_TEXT_PP(0);
int32 queue_size = PG_GETARG_INT64(1);
Here it should _INT32 variant to match with current sql definition,
otherwise it leads to below error.
postgres=# select pg_background_launch('vacuum verbose foo');
ERROR: queue size must be at least 64 bytes
3.
Comparing execute_sql_string() and exec_simple_query(), I could see below
main differences:
a. Allocate a new memory context different from message context
b. Start/commit control of transaction is outside execute_sql_string
c. enable/disable statement timeout is done from outside incase of execute_sql_string()
d. execute_sql_string() prohibits Start/Commit/Abort transaction statements.
e. execute_sql_string() doesn't log statements
f. execute_sql_string() uses binary format for result whereas exec_simple_query()
uses TEXT as defult format
g. processed stat related info from caller incase of execute_sql_string().
Can't we handle all these or other changes inside exec_simple_query()
based on some parameter or may be a use a hook similar to what we
do in ProcessUtility?
Basically it looks bit odd to have a duplicate (mostly) copy of
exec_simple_query().
4.
Won't it be better if pg_background_worker_main() can look more
like PostgresMain() (in terms of handling different kind of messages),
so that it can be extended in future to handle parallel worker.
5.
pg_background_result()
{
..
/* Read and processes messages from the shared memory queue. */
}
Shouldn't the processing of messages be a separate function as
we do for pqParseInput3().
pgsql-hackers by date: