Re: assessing parallel-safety - Mailing list pgsql-hackers

From Robert Haas
Subject Re: assessing parallel-safety
Date
Msg-id CA+TgmoYqMtQ=aTwOpYJ_kD6bNF=QtSLgiz5RDu=_ddsZ0N1UoA@mail.gmail.com
In response to Re: assessing parallel-safety  (Noah Misch <noah@leadboat.com>)
Responses Re: assessing parallel-safety  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Feb 8, 2015 at 11:31 AM, Noah Misch <noah@leadboat.com> wrote:
> On Sat, Feb 07, 2015 at 08:18:55PM -0500, Robert Haas wrote:
>> There are a few problems with this design that I don't immediately
>> know how to solve:
>>
>> 1. I'm concerned that the query-rewrite step could substitute a query
>> that is not parallel-safe for one that is.  The upper Query might
>> still be flagged as safe, and that's all that planner() looks at.
>
> I would look at determining the query's parallel safety early in the planner
> instead; simplify_function() might be a cheap place to check.  Besides
> avoiding rewriter trouble, this allows one to alter parallel safety of a
> function without invalidating Query nodes serialized in the system catalogs.

Thanks, I'll investigate that approach.

>> 2. Interleaving the execution of two parallel queries by firing up two
>> copies of the executor simultaneously can result in leaving parallel
>> mode at the wrong time.
>
> Perhaps the parallel mode state should be a reference count, not a boolean.
> Alternately, as a first cut, just don't attempt parallelism when we're already
> in parallel mode.

I think changing it to a reference count makes sense.  I'll work on that.
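
To make the reference-count idea concrete, here is a rough sketch of what I have in mind. The names are illustrative only and are not taken from the patch; the point is just that each executor instance increments on entry and decrements on exit, so an inner query finishing does not drop us out of parallel mode while an outer one is still running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter replacing a boolean "in parallel mode" flag. */
static int parallel_mode_level = 0;

/* Each parallel query bumps the count when it starts. */
static void
enter_parallel_mode(void)
{
    parallel_mode_level++;
}

/* ...and drops it when it finishes; exits must pair with entries. */
static void
exit_parallel_mode(void)
{
    assert(parallel_mode_level > 0);
    parallel_mode_level--;
}

/* We remain in parallel mode until every entry has been matched. */
static bool
in_parallel_mode(void)
{
    return parallel_mode_level > 0;
}
```

With a boolean, the first exit_parallel_mode() call from an interleaved executor would clear the state for everyone; with the counter, only the last exit does.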

>> 3. Any code using SPI has to think hard about whether to pass
>> OPT_CURSOR_NO_PARALLEL.  For example, PL/pgsql doesn't need to pass
>> this flag when caching a plan for a query that will be run to
>> completion each time it's executed.  But it DOES need to pass the flag
>> for a FOR loop over an SQL statement, because the code inside the FOR
>> loop might do parallel-unsafe things while the query is suspended.
>
> That makes sense; the code entering SPI knows best which restrictions it can
> tolerate for the life of a given cursor.  (One can imagine finer-grained rules
> in the future.  If the current function is itself marked parallel-safe, it's
> safe in principle for a FOR-loop SQL statement to use parallelism.)  I do
> recommend inverting the sense of the flag, so unmodified non-core PLs will
> continue to behave as they do today.

Yeah, that's probably a good idea.  Sort of annoying, but playing with
the patch in the OP made it pretty clear that we cannot possibly just
assume parallelism is OK by default.  In the core code, parallelism is
OK in more places than not, but in the PLs it seems to be the reverse.
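
Roughly, inverting the flag would look like the sketch below. The bit name and the helper are made up for illustration and do not come from the patch; the idea is only that a caller has to opt in, so a PL that passes no new option bits never gets a parallel plan.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opt-in bit, the inverse of OPT_CURSOR_NO_PARALLEL. */
#define SKETCH_CURSOR_PARALLEL_OK 0x0001

/*
 * Parallelism is considered only when the caller explicitly allows it.
 * Callers that pass 0 (e.g. unmodified non-core PLs) keep today's behavior.
 */
static bool
may_use_parallelism(int cursor_options)
{
    return (cursor_options & SKETCH_CURSOR_PARALLEL_OK) != 0;
}
```

Under this scheme PL/pgsql would set the bit for a query that runs to completion on each execution, and leave it unset for a FOR loop over an SQL statement.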

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


