Re: [bug?] Missed parallel safety checks, and wrong parallel safety - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Date
Msg-id CA+TgmoZ_iJWWV0c+o7Hp-CByjfs13zQO_kxwNp13qEidS=PC2g@mail.gmail.com
Whole thread Raw
In response to RE: [bug?] Missed parallel safety checks, and wrong parallel safety  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: [bug?] Missed parallel safety checks, and wrong parallel safety  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Wed, May 5, 2021 at 10:54 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> (1) Is it better to get hardcoded function properties out of fmgr_builtins[]?
> It's little worth doing so or thinking about that.  It's no business for users to change system objects, in this case
systemfunctions. 

I don't entirely agree with this. Whether or not users have any
business changing system functions, it's better to have one source of
truth than two. Now that being said, this is not a super-important
problem for us to go solve, and hard-coding a certain amount of stuff
is probably necessary to allow the system to bootstrap itself. So for
me it's one of those things that is in.a grey area: if someone showed
up with a patch to make it better, I'd be happy. But I probably
wouldn't spend much time on writing such a patch unless it solved some
other problem that I cared about.

> (3) Why do we want to have parallel-safety in fmgr_builtins[]?
> As proposed in this thread and/or "Parallel INSERT SELECT take 2", we thought of detecting parallel unsafe function
executionduring SQL statement execution, instead of imposing much overhead to check parallel safety during query
planning. Specifically, we add parallel safety check in fmgr_info() and/or FunctionCallInvoke(). 

I haven't read that thread, but I don't understand how that can work.
The reason we need to detect it at plan time is because we might need
to use a different plan. At execution time it's too late for that.

Also, it seems potentially quite expensive. A query may be planned
once and executed many times. Also, a single query execution may call
the same SQL function many times. I think we don't want to incur the
overhead of an extra syscache lookup every time anyone calls any
function. A very simple expression like a+b+c+d+e involves four
function calls, and + need not be a built-in, if the data type is
user-defined. And that might be happening for every row in a table
with millions of rows.

> (Alternatively, I think we can conclude that we assume parallel unsafe built-in functions won't be used in parallel
DML. In that case, we don't change FmgrBuiltin and we just skip the parallel safety check for built-in functions when
thefunction is called.  Would you buy this?) 

I don't really understand this idea. There's no such thing as parallel
DML, is there? There's just DML, which we must to decide whether can
be done in parallel or not based on, among other things, the
parallel-safety markings of the functions it contains. Maybe I am not
understanding you correctly, but it seems like you're suggesting that
in some cases we can just assume that the user hasn't done something
parallel-unsafe without making any attempt to check it. I don't think
I could support that.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: decoupling table and index vacuum
Next
From: Robert Haas
Date:
Subject: Re: [bug?] Missed parallel safety checks, and wrong parallel safety