Re: assessing parallel-safety - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: assessing parallel-safety |
Date | |
Msg-id | 20150215232447.GA3966927@tornado.leadboat.com Whole thread Raw |
In response to | Re: assessing parallel-safety (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: assessing parallel-safety
|
List | pgsql-hackers |
It's important for parallelism to work under the extended query protocol, and that's nontrivial. exec_parse_message() sets cursorOptions. exec_bind_message() runs the planner. We don't know if a parallel plan is okay before seeing max_rows in exec_execute_message(). On Sun, Feb 15, 2015 at 01:29:06AM -0500, Robert Haas wrote: > - For testing purposes, I set this up so that the executor activates > parallel mode when running any query that seems to be parallel-safe. > It seems nearly certain we'd only want to do this when a parallel plan > was actually selected, but this behavior is awfully useful for > testing, and revealed a number of bugs in the parallel-safety > markings. Why wouldn't that be a good permanent behavior? The testing benefit extends to users. If I wrongly mark a function something other than PROPARALLEL_UNSAFE, I'd rather find out ASAP. > - The patch includes a script, fixpgproc.pl, which I used to insert > and update the parallel markings on the system catalogs. That is, of > course, not intended for commit, but it's mighty useful for > experimentation. The CREATE OR REPLACE FUNCTION calls in src/backend/catalog/system_views.sql reset proparallel=u for some built-in functions, such as make_interval. Some categories of PROPARALLEL_SAFE functions deserve more attention: - database_to_xml* need to match proparallel of schema_to_xml*, which are PROPARALLEL_UNSAFE due to the heavyweight lockrule. cursor_to_xml* probably need to do likewise. - Several functions, such as array_in and anytextcat, can call arbitrary I/O functions. This implies a policy that I/O functionsmust be PROPARALLEL_SAFE. I agree with that decision, but it should be documented. Several json-related functionscan invoke X->json cast functions; the marking implies that all such cast functions must be PROPARALLEL_SAFE. That'sprobably okay, too. - All selectivity estimator functions are PROPARALLEL_SAFE. That implies, for example, that any operator using eqjoinselought to be PROPARALLEL_SAFE or define its own restriction selectivity estimator function. On the other hand, theplanner need not check the parallel safety of selectivity estimator functions. If we're already in parallel mode at themoment we enter the planner, we must be planning a query called within a PROPARALLEL_SAFE function. If the query usesan unsafe operator, the containing function was mismarked. - inet_{client,server}_{addr,port} are PROPARALLEL_SAFE; will the information they need indeed be available in workers? - Assuming you don't want to propagate XactLastRecEnd from the slave back to the master, restrict XLogInsert() during parallelmode. Restrict functions that call it, including pg_create_restore_point, pg_switch_xlog and pg_stop_backup. - pg_extension_config_dump modifies the database, so it shall be PROPARALLEL_UNSAFE. Assuming pg_stat_clear_snapshot won'treach back to the master to clear the snapshot, it shall be PROPARALLEL_RESTRICTED. If a mismarked function elicits "ERROR: cannot retain locks acquired while in parallel mode", innocent queries later in the transaction get the same error: begin; select 1; -- ok savepoint q; select database_to_xml(true, true, 'x'); rollback to q; -- ERROR select 1; -- ERROR rollback; > --- a/src/backend/commands/typecmds.c > +++ b/src/backend/commands/typecmds.c > @@ -1610,6 +1610,7 @@ makeRangeConstructors(const char *name, Oid namespace, > false, /* leakproof */ > false, /* isStrict */ > PROVOLATILE_IMMUTABLE, /* volatility */ > + PROPARALLEL_UNSAFE, /* parallel safety */ Range constructors are PROPARALLEL_SAFE. > --- a/src/backend/executor/functions.c > +++ b/src/backend/executor/functions.c > @@ -496,7 +496,9 @@ init_execution_state(List *queryTree_list, > if (queryTree->commandType == CMD_UTILITY) > stmt = queryTree->utilityStmt; > else > - stmt = (Node *) pg_plan_query(queryTree, 0, NULL); > + stmt = (Node *) pg_plan_query(queryTree, > + fcache->readonly_func ? CURSOR_OPT_PARALLEL_OK : 0, > + NULL); (This code is deciding whether to allow parallelism in one statement of an SQL-language function.) Why does it care if the function is read-only? I see that PL/pgSQL never allows parallelism in queries it launches. > --- a/src/include/utils/lsyscache.h > +++ b/src/include/utils/lsyscache.h > @@ -79,6 +79,7 @@ extern bool op_mergejoinable(Oid opno, Oid inputtype); > extern bool op_hashjoinable(Oid opno, Oid inputtype); > extern bool op_strict(Oid opno); > extern char op_volatile(Oid opno); > +extern char op_parallel(Oid opno); No such function defined. In passing, I noticed a few cosmetic problems in the prerequisite parallel-mode-v6.patch. Typos: initating Musn't paralell paralellism paralllelism processe. Capitalization of "transactionIdPrecedes". When applying the patch, this: $ git apply ../rev/parallel-mode-v6.patch ../rev/parallel-mode-v6.patch:2897: indent with spaces. (nLocksAcquiredInParallelMode - n)); warning: 1 line adds whitespace errors.
pgsql-hackers by date: