assessing parallel-safety - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | assessing parallel-safety |
Date | |
Msg-id | CA+TgmoarOjAY6v+WJEKObAQjGH5aU0ys-cytEdsW_E25csoVig@mail.gmail.com |
Responses | Re: assessing parallel-safety (Noah Misch <noah@leadboat.com>) |
List | pgsql-hackers |
Amit's parallel sequential scan assumes that we can enter parallel mode when the parallel sequential scan is initialized and exit parallel mode when the scan ends, and that all the code that runs in between will be happy with that. Unfortunately, that's not necessarily the case. There are two ways it can fail:

1. Some other part of the query can contain functions that are not safe to run in parallel mode; e.g. a PL/pgSQL function that writes data or uses subtransactions.

2. The user can partially execute the query and then, while execution is suspended, go do something not parallel-safe with the results before resuming query execution.

To properly assess whether a query is parallel-safe, we need to inspect the entire query for non-parallel-safe functions. We also need the code that's going to execute the plan to tell us whether or not it might want to do not-parallel-safe things between the time we start running the query and the time we finish running it.

So I tried writing some code to address this; a first cut is attached. Here's what it does:

1. As we parse each query, it sets a flag in the parse-state if we see a non-immutable function. For the time being, I'm assuming immutable == parallel-safe, although that's probably not correct in detail. It also sets the flag if it sees a data-modifying operation, meaning an insert, update, delete, or locking clause. The point of this is to avoid making an extra pass over the query just to assess parallel-safety; we want to accumulate that information as we go along.

2. When parsing is complete, the parse-state flag is copied into the Query, similar to what we already do for flags like hasModifyingCTE.

3. When the query is planned, planner() sets a flag in the PlannerGlobal called parallelModeOK if the Query is not marked as parallel-mode unsafe. There's also a new cursor option, CURSOR_OPT_NO_PARALLEL, which forces parallelModeOK to false regardless of what the Query says. planner() also initializes another flag, parallelModeNeeded, to false. The idea here is that before generating a parallel path, the planner should examine parallelModeOK and skip the parallel path if that's false. If we end up creating a plan from a parallel path, then the plan-generation function should set parallelModeNeeded.

4. At the conclusion of planning, the parallelModeNeeded flag is copied from the PlannerGlobal to the PlannedStmt.

5. ExecutorStart() calls EnterParallelMode() if parallelModeNeeded is set and we're not already in parallel mode. ExecutorEnd() calls ExitParallelMode() if EnterParallelMode() was called in ExecutorStart().

There are a few problems with this design that I don't immediately know how to solve:

1. I'm concerned that the query-rewrite step could substitute a query that is not parallel-safe for one that is. The upper Query might still be flagged as safe, and that's all that planner() looks at.

2. Interleaving the execution of two parallel queries by firing up two copies of the executor simultaneously can result in leaving parallel mode at the wrong time.

3. Any code using SPI has to think hard about whether to pass CURSOR_OPT_NO_PARALLEL. For example, PL/pgSQL doesn't need to pass this flag when caching a plan for a query that will be run to completion each time it's executed. But it DOES need to pass the flag for a FOR loop over an SQL statement, because the code inside the FOR loop might do parallel-unsafe things while the query is suspended.

Thoughts, either on the general approach or on what to do about the problems?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
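
To make the flag flow in steps 1-5 concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not the attached patch: the struct layouts, the parallelModeUnsafe field name, the parse_note_* and toy_* helpers, the depth counter, and the bit value chosen for CURSOR_OPT_NO_PARALLEL are all invented for illustration. Only parallelModeOK, parallelModeNeeded, CURSOR_OPT_NO_PARALLEL, EnterParallelMode() and ExitParallelMode() are names taken from the message, and the two parallel-mode functions are stubbed out here.

```c
/*
 * Toy model of the parallel-safety flag flow (steps 1-5).
 * All types below are simplified stand-ins, NOT the real PostgreSQL structs.
 */
#include <assert.h>
#include <stdbool.h>

#define CURSOR_OPT_NO_PARALLEL 0x0100   /* bit value is made up for this sketch */

typedef struct ParseState    { bool parallelModeUnsafe; } ParseState;  /* hypothetical field */
typedef struct Query         { bool parallelModeUnsafe; } Query;       /* hypothetical field */
typedef struct PlannerGlobal { bool parallelModeOK; bool parallelModeNeeded; } PlannerGlobal;
typedef struct PlannedStmt   { bool parallelModeNeeded; } PlannedStmt;

/* Hypothetical per-executor state: remembers whether *this* run entered parallel mode. */
typedef struct ToyQueryDesc {
    const PlannedStmt *stmt;
    bool enteredParallelMode;
} ToyQueryDesc;

/* Stub for backend-wide parallel-mode state; a plain counter stands in for it. */
static int parallel_mode_depth = 0;

static void EnterParallelMode(void) { parallel_mode_depth++; }
static void ExitParallelMode(void)  { assert(parallel_mode_depth > 0); parallel_mode_depth--; }
static bool in_parallel_mode(void)  { return parallel_mode_depth > 0; }

/* Step 1: the parser accumulates unsafety as it goes, with no extra pass. */
static void parse_note_function(ParseState *ps, bool is_immutable)
{
    if (!is_immutable)          /* assumes immutable == parallel-safe, as in the message */
        ps->parallelModeUnsafe = true;
}

static void parse_note_data_modification(ParseState *ps)
{
    ps->parallelModeUnsafe = true;  /* insert, update, delete, or locking clause */
}

/* Step 2: copy the parse-state flag into the Query. */
static Query finish_parse(const ParseState *ps)
{
    Query q = { ps->parallelModeUnsafe };
    return q;
}

/* Steps 3-4: derive parallelModeOK, maybe set parallelModeNeeded, copy it out. */
static PlannedStmt toy_planner(const Query *q, int cursorOptions, bool wants_parallel_path)
{
    PlannerGlobal glob;

    glob.parallelModeOK = !q->parallelModeUnsafe &&
                          (cursorOptions & CURSOR_OPT_NO_PARALLEL) == 0;
    glob.parallelModeNeeded = false;

    /* A parallel path is only considered when parallelModeOK is set. */
    if (glob.parallelModeOK && wants_parallel_path)
        glob.parallelModeNeeded = true;

    PlannedStmt stmt = { glob.parallelModeNeeded };
    return stmt;
}

/* Step 5: enter on start if needed and not already in parallel mode... */
static void toy_ExecutorStart(ToyQueryDesc *qd)
{
    qd->enteredParallelMode = false;
    if (qd->stmt->parallelModeNeeded && !in_parallel_mode())
    {
        EnterParallelMode();
        qd->enteredParallelMode = true;
    }
}

/* ...and exit on end only if this executor was the one that entered. */
static void toy_ExecutorEnd(ToyQueryDesc *qd)
{
    if (qd->enteredParallelMode)
    {
        ExitParallelMode();
        qd->enteredParallelMode = false;
    }
}
```

In this model, starting two toy executors on the same parallel plan and then ending the first one leaves in_parallel_mode() false while the second is still open. That is one way to see the interleaving hazard listed as problem 2: the executor that entered parallel mode is not necessarily the last one to finish.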