Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel Seq Scan
Date
Msg-id CA+TgmobR=EuLaNjKUS3gZQSZfpi15+hyGc9Vyweu_Ab9bWALgw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  (Noah Misch <noah@leadboat.com>)
Responses Re: Parallel Seq Scan  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Tue, Oct 13, 2015 at 9:08 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Oct 12, 2015 at 11:46:08AM -0400, Robert Haas wrote:
>> plpgsql_param_fetch() assumes that it can detect whether it's being
>> called from copyParamList() by checking whether params !=
>> estate->paramLI.  I don't know why this works, but I do know that this
>
> It works because PL/pgSQL creates an unshared list whenever copyParamList() is
> forthcoming.  (This in turn relies on intimate knowledge of how the rest of
> the system processes param lists.)  The comments at setup_param_list() and
> setup_unshared_param_list() are most pertinent.

Thanks for the pointer.

>> test fails to detect the case where it's being called from
>> SerializeParamList(), which causes failures in exec_eval_datum() as
>> predicted.  Calls from SerializeParamList() need the same treatment as
>> calls from copyParamList() because it, too, will try to evaluate every
>> parameter in the list.  Here, I've taken the approach of making that
>> check unconditional, which seems to work, but I'm not sure if some
>> other approach would be better, such as adding an additional Boolean
>> (or enum context?) argument to ParamFetchHook.  I *think* that
>> skipping this check is merely a performance optimization rather than
>> anything that affects correctness, and bms_is_member() is pretty
>> cheap, so perhaps the way I've done it is OK.
>
> Like you, I don't expect bms_is_member() to be expensive relative to the task
> at hand.  However, copyParamList() and SerializeParamList() copy non-dynamic
> params without calling plpgsql_param_fetch().  Given the shared param list,
> they will copy non-dynamic params the current query doesn't use.  That cost is
> best avoided, not being well-bounded; consider the case of an unrelated
> variable containing a TOAST pointer to a 1-GiB value.  One approach is to have
> setup_param_list() copy the paramnos pointer to a new ParamListInfoData field:
>
>         Bitmapset  *paramMask;  /* if non-NULL, ignore params lacking a 1-bit */
>
> Test it directly in copyParamList() and SerializeParamList().  As a bonus,
> that would allow use of the shared param list for more cases involving
> cursors.  Furthermore, plpgsql_param_fetch() would never need to test
> paramnos.  A more-general alternative is to have a distinct "paramIsUsed"
> callback, but I don't know how one would exploit the extra generality.

I'm anxious to minimize the number of things that must be fixed in
order for a stable version of parallel query to exist in our master
repository, and I fear that trying to improve ParamListInfo generally
could take me fairly far afield.  How about adding a paramListCopyHook
to ParamListInfoData?  SerializeParamList() would, if it found a
parameter with !OidIsValid(prm->prmtype) && param->paramFetch != NULL,
call this function, which would return a new ParamListInfo to be
serialized in place of the original?  This wouldn't require any
modification to the current plpgsql_param_fetch() at all, but the new
function would steal its bms_is_member() test.  Furthermore, no user
of ParamListInfo other than plpgsql needs to care at all -- which,
with your proposals, they would.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Euler Taveira
Date:
Subject: Re: pam auth - add rhost item
Next
From: Amit Langote
Date:
Subject: Fix unclear comments in tablecmds.c