Re: assessing parallel-safety - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: assessing parallel-safety |
Date | |
Msg-id | CA+TgmoZQU3qN_v=xB_4=De-S=-Thy2VV54NKUf4Atxth=zbwFA@mail.gmail.com Whole thread Raw |
In response to | Re: assessing parallel-safety (Noah Misch <noah@leadboat.com>) |
Responses |
Re: assessing parallel-safety
(Noah Misch <noah@leadboat.com>)
Re: assessing parallel-safety (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Sun, Feb 15, 2015 at 6:24 PM, Noah Misch <noah@leadboat.com> wrote: > It's important for parallelism to work under the extended query protocol, and > that's nontrivial. exec_parse_message() sets cursorOptions. > exec_bind_message() runs the planner. We don't know if a parallel plan is > okay before seeing max_rows in exec_execute_message(). Yes, that's a problem. One could require users of the extended query protocol to indicate their willingness to accept a parallel query plan when sending the bind message by setting the appropriate cursor option; and one could, when that option is specified, refuse execute messages with max_rows != 0. However, that has the disadvantage of forcing all clients to be updated for the new world order. Or, one could assume a parallel plan is OK and re-plan if we choose one and then a non-zero max_rows shows up. In the worst case, though, that could require every query to be planned twice. > On Sun, Feb 15, 2015 at 01:29:06AM -0500, Robert Haas wrote: >> - For testing purposes, I set this up so that the executor activates >> parallel mode when running any query that seems to be parallel-safe. >> It seems nearly certain we'd only want to do this when a parallel plan >> was actually selected, but this behavior is awfully useful for >> testing, and revealed a number of bugs in the parallel-safety >> markings. > > Why wouldn't that be a good permanent behavior? The testing benefit extends > to users. If I wrongly mark a function something other than > PROPARALLEL_UNSAFE, I'd rather find out ASAP. It might be a good permanent behavior, or it might just annoy users unnecessarily. I am unable to guess what is best. >> - The patch includes a script, fixpgproc.pl, which I used to insert >> and update the parallel markings on the system catalogs. That is, of >> course, not intended for commit, but it's mighty useful for >> experimentation. > > The CREATE OR REPLACE FUNCTION calls in src/backend/catalog/system_views.sql > reset proparallel=u for some built-in functions, such as make_interval. I can fix that if I add a PARALLEL { SAFE | RESTRICTED | UNSAFE } clause to CREATE FUNCTION. Or are you saying that CORF shouldn't change the existing value of the flag? I'm not sure exactly what our policy is here, but I would expect that the CORF is doing the right thing and we need to add the necessary syntax to CREATE FUNCTION and then add that clause to those calls. > Some categories of PROPARALLEL_SAFE functions deserve more attention: > > - database_to_xml* need to match proparallel of schema_to_xml*, which are > PROPARALLEL_UNSAFE due to the heavyweight lock rule. cursor_to_xml* > probably need to do likewise. Thanks, fixed. > - Several functions, such as array_in and anytextcat, can call arbitrary I/O > functions. This implies a policy that I/O functions must be > PROPARALLEL_SAFE. I agree with that decision, but it should be documented. Where? > Several json-related functions can invoke X->json cast functions; the > marking implies that all such cast functions must be PROPARALLEL_SAFE. > That's probably okay, too. > > - All selectivity estimator functions are PROPARALLEL_SAFE. That implies, for > example, that any operator using eqjoinsel ought to be PROPARALLEL_SAFE or > define its own restriction selectivity estimator function. On the other > hand, the planner need not check the parallel safety of selectivity > estimator functions. If we're already in parallel mode at the moment we > enter the planner, we must be planning a query called within a > PROPARALLEL_SAFE function. If the query uses an unsafe operator, the > containing function was mismarked. This seems backwards to me. If some hypothetical selectivity estimator were PROPARALLEL_UNSAFE, then any operator that uses that function would also need to be PROPARALLEL_UNSAFE. The reverse is not true. The operator can be as unsafe as it likes and still use a safe estimator. > - inet_{client,server}_{addr,port} are PROPARALLEL_SAFE; will the information > they need indeed be available in workers? Nope, marked those PROPARALLEL_RESTRICTED. Thanks. > - Assuming you don't want to propagate XactLastRecEnd from the slave back to > the master, restrict XLogInsert() during parallel mode. Restrict functions > that call it, including pg_create_restore_point, pg_switch_xlog and > pg_stop_backup. Hmm. Altogether prohibiting XLogInsert() in parallel mode seems unwise, because it would foreclose heap_page_prune_opt() in workers. I realize there's separate conversation about whether pruning during SELECT queries is good policy, but in the interested of separating mechanism from policy, and in the sure knowledge that allowing at least some writes in parallel mode is certainly going to be something people will want, it seems better to look into propagating XactLastRecEnd. > - pg_extension_config_dump modifies the database, so it shall be > PROPARALLEL_UNSAFE. Assuming pg_stat_clear_snapshot won't reach back to > the master to clear the snapshot, it shall be PROPARALLEL_RESTRICTED. Right, makes sense. > If a mismarked function elicits "ERROR: cannot retain locks acquired while in > parallel mode", innocent queries later in the transaction get the same error: > > begin; > select 1; -- ok > savepoint q; select database_to_xml(true, true, 'x'); rollback to q; -- ERROR > select 1; -- ERROR > rollback; Hmm, I guess that's a bug in the parallel-mode patch. Will fix. >> --- a/src/backend/commands/typecmds.c >> +++ b/src/backend/commands/typecmds.c >> @@ -1610,6 +1610,7 @@ makeRangeConstructors(const char *name, Oid namespace, >> false, /* leakproof */ >> false, /* isStrict */ >> PROVOLATILE_IMMUTABLE, /* volatility */ >> + PROPARALLEL_UNSAFE, /* parallel safety*/ > > Range constructors are PROPARALLEL_SAFE. Changed. >> --- a/src/backend/executor/functions.c >> +++ b/src/backend/executor/functions.c >> @@ -496,7 +496,9 @@ init_execution_state(List *queryTree_list, >> if (queryTree->commandType == CMD_UTILITY) >> stmt = queryTree->utilityStmt; >> else >> - stmt = (Node *) pg_plan_query(queryTree, 0, NULL); >> + stmt = (Node *) pg_plan_query(queryTree, >> + fcache->readonly_func ? CURSOR_OPT_PARALLEL_OK : 0, >> + NULL); > > (This code is deciding whether to allow parallelism in one statement of an > SQL-language function.) Why does it care if the function is read-only? I don't know what I was thinking there. Interestingly, changing that causes several tests to fail with complaints about lock retention: SELECT p.name, name(p.hobbies), name(equipment(p.hobbies)) FROM ONLY person p; SELECT p.name, name(p.hobbies), name(equipment(p.hobbies)) FROM person* p; SELECT name(equipment(p.hobbies)), p.name, name(p.hobbies) FROM ONLY person p; SELECT (p.hobbies).equipment.name, p.name, name(p.hobbies) FROM person* p; SELECT (p.hobbies).equipment.name, name(p.hobbies), p.name FROM ONLY person p; SELECT name(equipment(p.hobbies)), name(p.hobbies), p.name FROM person* p; I don't immediately know why. > I see that PL/pgSQL never allows parallelism in queries it launches. Yeah, I mentioned to mention that limitation in my initial email. Fixed. >> --- a/src/include/utils/lsyscache.h >> +++ b/src/include/utils/lsyscache.h >> @@ -79,6 +79,7 @@ extern bool op_mergejoinable(Oid opno, Oid inputtype); >> extern bool op_hashjoinable(Oid opno, Oid inputtype); >> extern bool op_strict(Oid opno); >> extern char op_volatile(Oid opno); >> +extern char op_parallel(Oid opno); > > No such function defined. Removed. > In passing, I noticed a few cosmetic problems in the prerequisite > parallel-mode-v6.patch. Typos: initating Musn't paralell paralellism > paralllelism processe. Capitalization of "transactionIdPrecedes". When > applying the patch, this: > > $ git apply ../rev/parallel-mode-v6.patch > ../rev/parallel-mode-v6.patch:2897: indent with spaces. > (nLocksAcquiredInParallelMode - n)); > warning: 1 line adds whitespace errors. OK, will fix those. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: