Re: assessing parallel-safety - Mailing list pgsql-hackers

From Noah Misch
Subject Re: assessing parallel-safety
Date
Msg-id 20150215232447.GA3966927@tornado.leadboat.com
Whole thread Raw
In response to Re: assessing parallel-safety  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: assessing parallel-safety
List pgsql-hackers
It's important for parallelism to work under the extended query protocol, and
that's nontrivial.  exec_parse_message() sets cursorOptions.
exec_bind_message() runs the planner.  We don't know if a parallel plan is
okay before seeing max_rows in exec_execute_message().

On Sun, Feb 15, 2015 at 01:29:06AM -0500, Robert Haas wrote:
> - For testing purposes, I set this up so that the executor activates
> parallel mode when running any query that seems to be parallel-safe.
> It seems nearly certain we'd only want to do this when a parallel plan
> was actually selected, but this behavior is awfully useful for
> testing, and revealed a number of bugs in the parallel-safety
> markings.

Why wouldn't that be a good permanent behavior?  The testing benefit extends
to users.  If I wrongly mark a function something other than
PROPARALLEL_UNSAFE, I'd rather find out ASAP.

> - The patch includes a script, fixpgproc.pl, which I used to insert
> and update the parallel markings on the system catalogs.  That is, of
> course, not intended for commit, but it's mighty useful for
> experimentation.

The CREATE OR REPLACE FUNCTION calls in src/backend/catalog/system_views.sql
reset proparallel=u for some built-in functions, such as make_interval.

Some categories of PROPARALLEL_SAFE functions deserve more attention:

- database_to_xml* need to match proparallel of schema_to_xml*, which are PROPARALLEL_UNSAFE due to the heavyweight
lockrule.  cursor_to_xml* probably need to do likewise.
 

- Several functions, such as array_in and anytextcat, can call arbitrary I/O functions.  This implies a policy that I/O
functionsmust be PROPARALLEL_SAFE.  I agree with that decision, but it should be documented. Several json-related
functionscan invoke X->json cast functions; the marking implies that all such cast functions must be PROPARALLEL_SAFE.
That'sprobably okay, too.
 

- All selectivity estimator functions are PROPARALLEL_SAFE.  That implies, for example, that any operator using
eqjoinselought to be PROPARALLEL_SAFE or define its own restriction selectivity estimator function.  On the other hand,
theplanner need not check the parallel safety of selectivity estimator functions.  If we're already in parallel mode at
themoment we enter the planner, we must be planning a query called within a PROPARALLEL_SAFE function.  If the query
usesan unsafe operator, the containing function was mismarked.
 

- inet_{client,server}_{addr,port} are PROPARALLEL_SAFE; will the information they need indeed be available in
workers?

- Assuming you don't want to propagate XactLastRecEnd from the slave back to the master, restrict XLogInsert() during
parallelmode.  Restrict functions that call it, including pg_create_restore_point, pg_switch_xlog and pg_stop_backup.
 

- pg_extension_config_dump modifies the database, so it shall be PROPARALLEL_UNSAFE.  Assuming pg_stat_clear_snapshot
won'treach back to the master to clear the snapshot, it shall be PROPARALLEL_RESTRICTED.
 

If a mismarked function elicits "ERROR:  cannot retain locks acquired while in
parallel mode", innocent queries later in the transaction get the same error:

begin;
select 1;            -- ok
savepoint q; select database_to_xml(true, true, 'x'); rollback to q; -- ERROR
select 1;            -- ERROR
rollback;

> --- a/src/backend/commands/typecmds.c
> +++ b/src/backend/commands/typecmds.c
> @@ -1610,6 +1610,7 @@ makeRangeConstructors(const char *name, Oid namespace,
>                                    false,        /* leakproof */
>                                    false,        /* isStrict */
>                                    PROVOLATILE_IMMUTABLE,        /* volatility */
> +                                  PROPARALLEL_UNSAFE,            /* parallel safety */

Range constructors are PROPARALLEL_SAFE.

> --- a/src/backend/executor/functions.c
> +++ b/src/backend/executor/functions.c
> @@ -496,7 +496,9 @@ init_execution_state(List *queryTree_list,
>              if (queryTree->commandType == CMD_UTILITY)
>                  stmt = queryTree->utilityStmt;
>              else
> -                stmt = (Node *) pg_plan_query(queryTree, 0, NULL);
> +                stmt = (Node *) pg_plan_query(queryTree,
> +                    fcache->readonly_func ? CURSOR_OPT_PARALLEL_OK : 0,
> +                    NULL);

(This code is deciding whether to allow parallelism in one statement of an
SQL-language function.)  Why does it care if the function is read-only?

I see that PL/pgSQL never allows parallelism in queries it launches.

> --- a/src/include/utils/lsyscache.h
> +++ b/src/include/utils/lsyscache.h
> @@ -79,6 +79,7 @@ extern bool op_mergejoinable(Oid opno, Oid inputtype);
>  extern bool op_hashjoinable(Oid opno, Oid inputtype);
>  extern bool op_strict(Oid opno);
>  extern char op_volatile(Oid opno);
> +extern char op_parallel(Oid opno);

No such function defined.


In passing, I noticed a few cosmetic problems in the prerequisite
parallel-mode-v6.patch.  Typos: initating Musn't paralell paralellism
paralllelism processe.  Capitalization of "transactionIdPrecedes".  When
applying the patch, this:

$ git apply ../rev/parallel-mode-v6.patch 
../rev/parallel-mode-v6.patch:2897: indent with spaces.                        (nLocksAcquiredInParallelMode - n));
warning: 1 line adds whitespace errors.



pgsql-hackers by date:

Previous
From: Christoph Berg
Date:
Subject: Re: gcc5: initdb produces gigabytes of _fsm files
Next
From: David Steele
Date:
Subject: Issue installing doc tools on OSX