assessing parallel-safety - Mailing list pgsql-hackers

From Robert Haas
Subject assessing parallel-safety
Date
Msg-id CA+TgmoarOjAY6v+WJEKObAQjGH5aU0ys-cytEdsW_E25csoVig@mail.gmail.com
Responses Re: assessing parallel-safety  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
Amit's parallel sequential scan assumes that we can enter parallel
mode when the parallel sequential scan is initialized and exit
parallel mode when the scan ends and all the code that runs in between
will be happy with that.  Unfortunately, that's not necessarily the
case.  There are two ways it can fail:

1. Some other part of the query can contain functions that are not
safe to run in parallel mode; e.g. a PL/pgSQL function that writes
data or uses subtransactions.
2. The user can partially execute the query and then, while
execution is suspended, go do something not parallel-safe with the
results before resuming query execution.
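
To make the second failure mode concrete, here is a toy model in C.  It
is only a sketch: SketchCursor, sketch_open(), sketch_fetch() and
sketch_try_write() are hypothetical names, not anything in the backend,
and "a write is refused while in parallel mode" is an assumption made
for the demo.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cursor over a parallel scan. */
typedef struct SketchCursor
{
    int  next_row;            /* next row to hand back */
    int  total_rows;          /* rows the scan will produce */
    bool holds_parallel_mode; /* did this cursor enter parallel mode? */
} SketchCursor;

/* Toy global standing in for "the backend is in parallel mode". */
bool sketch_parallel_mode = false;

/* Opening the cursor enters parallel mode, as the scan would at init. */
void sketch_open(SketchCursor *c, int total_rows)
{
    assert(total_rows >= 0);
    c->next_row = 0;
    c->total_rows = total_rows;
    c->holds_parallel_mode = true;
    sketch_parallel_mode = true;
}

/*
 * Fetch one row.  Returns false once the scan is exhausted; only at that
 * point does the cursor leave parallel mode.
 */
bool sketch_fetch(SketchCursor *c, int *row)
{
    if (c->next_row >= c->total_rows)
    {
        if (c->holds_parallel_mode)
        {
            sketch_parallel_mode = false;
            c->holds_parallel_mode = false;
        }
        return false;
    }
    *row = c->next_row++;
    return true;
}

/*
 * Stands for any not-parallel-safe action taken by the caller.
 * Assumption for this sketch: it is simply refused in parallel mode.
 */
bool sketch_try_write(void)
{
    return !sketch_parallel_mode;
}
```

A caller that fetches one row and then calls sketch_try_write() before
fetching the rest is refused, because the scan is suspended but parallel
mode is still in effect.  The same call succeeds once the scan has run
to completion.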

To properly assess whether a query is parallel-safe, we need to
inspect the entire query for non-parallel-safe functions.  We also
need the code that's going to execute the plan to tell us whether or
not it might want to do not-parallel-safe things between the time we
start running the query and the time we finish running it.  So I tried
writing some code to address this; a first cut is attached.  Here's
what it does:

1. As we parse each query, it sets a flag in the parse-state if we see
a non-immutable function.  For the time being, I'm assuming immutable
== parallel-safe, although that's probably not correct in detail.  It
also sets the flag if it sees a data-modifying operation, meaning an
insert, update, delete, or locking clause.  The point of this is to
avoid making an extra pass over the query just to assess
parallel-safety; we want to accumulate that information as we go
along.

2. When parsing is complete, the parse-state flag is copied into the
Query, similar to what we already do for flags like hasModifyingCTE.

3. When the query is planned, planner() sets a flag in the
PlannerGlobal called parallelModeOK if the Query is not marked as
parallel-mode unsafe.  There's also a new cursor option,
CURSOR_OPT_NO_PARALLEL, which forces parallelModeOK to false regardless
of what the Query says.  planner() also initializes a second flag,
parallelModeNeeded, to false.  The idea here is that before
generating a parallel path, the planner should examine parallelModeOK
and skip it if that's false.  If we end up creating a plan from a
parallel path, then the plan-generation function should set
parallelModeNeeded.

4. At the conclusion of planning, the parallelModeNeeded flag is
copied from the PlannerGlobal to the PlannedStmt.

5. ExecutorStart() calls EnterParallelMode() if parallelModeNeeded is
set and we're not already in parallel mode.  ExecutorEnd() calls
ExitParallelMode() if EnterParallelMode() was called in
ExecutorStart().
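
The flow of flags through steps 1-5 can be sketched as a standalone toy
model in C.  This is not the attached patch: every Toy* type and toy_*
function is a hypothetical stand-in, the field names sawParallelUnsafe
and parallelUnsafe are invented here, and the value given to
TOY_CURSOR_OPT_NO_PARALLEL is made up.  Only parallelModeOK and
parallelModeNeeded are names taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CURSOR_OPT_NO_PARALLEL 0x01 /* made-up value for the sketch */

typedef struct ToyParseState { bool sawParallelUnsafe; } ToyParseState;
typedef struct ToyQuery { bool parallelUnsafe; } ToyQuery;
typedef struct ToyPlannerGlobal
{
    bool parallelModeOK;
    bool parallelModeNeeded;
} ToyPlannerGlobal;
typedef struct ToyPlannedStmt { bool parallelModeNeeded; } ToyPlannedStmt;

static bool toy_parallel_mode = false;

bool toy_in_parallel_mode(void) { return toy_parallel_mode; }

/* Step 1: accumulate the flag during parsing; immutable == safe for now. */
void toy_note_function(ToyParseState *ps, bool is_immutable)
{
    if (!is_immutable)
        ps->sawParallelUnsafe = true;
}

/* Step 1, continued: insert, update, delete, or a locking clause. */
void toy_note_data_modification(ToyParseState *ps)
{
    ps->sawParallelUnsafe = true;
}

/* Step 2: copy the parse-state flag into the Query. */
ToyQuery toy_finish_parse(const ToyParseState *ps)
{
    ToyQuery q;

    q.parallelUnsafe = ps->sawParallelUnsafe;
    return q;
}

/* Steps 3 and 4: set the planner flags, then copy the result out. */
ToyPlannedStmt toy_planner(const ToyQuery *q, int cursorOptions,
                           bool parallel_path_is_cheapest)
{
    ToyPlannerGlobal glob;
    ToyPlannedStmt   stmt;

    glob.parallelModeOK = !q->parallelUnsafe &&
        (cursorOptions & TOY_CURSOR_OPT_NO_PARALLEL) == 0;
    glob.parallelModeNeeded = false;

    /* A parallel path is only considered when parallelModeOK is set. */
    if (glob.parallelModeOK && parallel_path_is_cheapest)
        glob.parallelModeNeeded = true;

    assert(!glob.parallelModeNeeded || glob.parallelModeOK);
    stmt.parallelModeNeeded = glob.parallelModeNeeded;
    return stmt;
}

/* Step 5: returns true if this call is the one that entered parallel mode. */
bool toy_executor_start(const ToyPlannedStmt *stmt)
{
    if (stmt->parallelModeNeeded && !toy_parallel_mode)
    {
        toy_parallel_mode = true;
        return true;
    }
    return false;
}

/* Step 5, continued: exit only if the matching start entered. */
void toy_executor_end(bool entered_at_start)
{
    if (entered_at_start)
        toy_parallel_mode = false;
}
```

In this model a query containing only immutable functions can get a
plan with parallelModeNeeded set, while the same query planned with
TOY_CURSOR_OPT_NO_PARALLEL, or one that called a non-immutable
function, never does.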

There are a few problems with this design that I don't immediately
know how to solve:

1. I'm concerned that the query-rewrite step could substitute a query
that is not parallel-safe for one that is.  The upper Query might
still be flagged as safe, and that's all that planner() looks at.

2. Interleaving the execution of two parallel queries by firing up two
copies of the executor simultaneously can result in leaving parallel
mode at the wrong time.

3. Any code using SPI has to think hard about whether to pass
CURSOR_OPT_NO_PARALLEL.  For example, PL/pgSQL doesn't need to pass
this flag when caching a plan for a query that will be run to
completion each time it's executed.  But it DOES need to pass the flag
for a FOR loop over an SQL statement, because the code inside the FOR
loop might do parallel-unsafe things while the query is suspended.
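
Problem 2 can be reproduced in a few lines with a toy model in C of the
enter/exit rule from step 5.  Again, this is a sketch under
assumptions: DemoExecutor and the demo_* functions are hypothetical
names, and parallel mode is modeled as a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-executor state for the sketch. */
typedef struct DemoExecutor
{
    bool needs_parallel;   /* the plan's parallelModeNeeded */
    bool entered_parallel; /* did this executor enter parallel mode? */
} DemoExecutor;

/* Toy global standing in for "the backend is in parallel mode". */
bool demo_parallel_mode = false;

/* Enter parallel mode only if needed and not already in it. */
void demo_start(DemoExecutor *e)
{
    e->entered_parallel = false;
    if (e->needs_parallel && !demo_parallel_mode)
    {
        demo_parallel_mode = true;
        e->entered_parallel = true;
    }
}

/* Exit parallel mode only if this executor was the one that entered. */
void demo_end(DemoExecutor *e)
{
    if (e->entered_parallel)
    {
        demo_parallel_mode = false;
        e->entered_parallel = false;
    }
}

/*
 * Start A, start B, end A, and then check B.  Returns true if B still
 * needs parallel mode but the backend has already left it.
 */
bool demo_interleaving_exits_too_early(void)
{
    DemoExecutor a = { true, false };
    DemoExecutor b = { true, false };
    bool         b_unprotected;

    demo_parallel_mode = false;
    demo_start(&a);            /* A enters parallel mode */
    demo_start(&b);            /* B sees it is already on; does nothing */
    assert(demo_parallel_mode);

    demo_end(&a);              /* A exits, although B is still running */
    b_unprotected = b.needs_parallel && !demo_parallel_mode;

    demo_end(&b);              /* B never entered, so this is a no-op */
    return b_unprotected;
}
```

With this rule the function returns true: the executor that entered
parallel mode is not necessarily the last one that needs it.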

Thoughts, either on the general approach or on what to do about the problems?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

