Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays |
Date | |
Msg-id | CAAaqYe8zdhuwWdhdKcg9n2_SfLe2dHZfpyfymXgGGP9mHgt5uQ@mail.gmail.com Whole thread Raw |
In response to | Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays (James Coleman <jtc331@gmail.com>) |
Responses |
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
|
List | pgsql-hackers |
On Fri, Sep 11, 2020 at 5:11 PM James Coleman <jtc331@gmail.com> wrote: > > On Tue, Sep 8, 2020 at 4:37 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > > > On 08/09/2020 22:25, James Coleman wrote: > > > On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > >> > > >> You could also apply the optimization for non-Const expressions, as long > > >> as they're stable (i.e. !contain_volatile_functions(expr)). > > > > > > Is that true? Don't we also have to worry about whether or not the > > > value is stable (i.e., know when a param has changed)? There have been > > > discussions about being able to cache stable subexpressions, and my > > > understanding was that we'd need to have that infrastructure (along > > > with the necessarily invalidation when the param changes) to be able > > > to use this for non-Const expressions. > > > > Yeah, you're right, you'd have to also check for PARAM_EXEC Params. And > > Vars. I think the conditions are the same as those checked in > > match_clause_to_partition_key() in partprune.c (it's a long function, > > search for "if (!IsA(rightop, Const))"). Not sure it's worth the > > trouble, then. But it would be nice to handle queries like "WHERE column > > = ANY ($1)" > > If I'm understanding properly you're imagining something in the form of: > > with x as (select '{1,2,3,4,5,6,7,8,9,10}'::int[]) > select * from t where i = any ((select * from x)::int[]); > > I've been playing around with this and currently have these checks: > > contain_var_clause((Node *) arrayarg) > contain_volatile_functions((Node *) arrayarg) > contain_exec_param((Node *) arrayarg, NIL) > > (the last one I had to modify the function to handle empty lists) > > If any are true, then have to disable the optimization. But for > queries in the form above the test contain_exec_param((Node *) > arrayarg, NIL) evaluates to true, even though we know from looking at > the query that the array subexpression is stable for the length of the > query. > > Am I misunderstanding what you're going for? Or is there a way to > confirm the expr, although an exec param, won't change? > > Another interesting thing that this would imply is that we'd either have to: > > 1. Remove the array length check altogether, > 2. Always use the hash when have a non-Const, but when a Const only if > the array length check passes, or > 3. Make this new expr op more fully featured by teaching it how to use > either a straight loop through a deconstructed array or use the hash. > > That last option feeds into further discussion in the below: > > > >> Deconstructing the array Datum into a simple C array on first call would > > >> be a win even for very small arrays and for AND semantics, even if you > > >> don't use a hash table. > > > > > > Because you wouldn't have to repeatedly detoast it? Or some other > > > reason I'm not thinking of? My intuition would have been that (aside > > > from detoasting if necessary) there wouldn't be much real overhead in, > > > for example, an array storing integers. > > > > Dealing with NULLs and different element sizes in the array is pretty > > complicated. Looping through the array currently looks like this: > > > > /* Loop over the array elements */ > > s = (char *) ARR_DATA_PTR(arr); > > bitmap = ARR_NULLBITMAP(arr); > > bitmask = 1; > > > > for (int i = 0; i < nitems; i++) > > { > > Datum elt; > > Datum thisresult; > > > > /* Get array element, checking for NULL */ > > if (bitmap && (*bitmap & bitmask) == 0) > > { > > fcinfo->args[1].value = (Datum) 0; > > fcinfo->args[1].isnull = true; > > } > > else > > { > > elt = fetch_att(s, typbyval, typlen); > > s = att_addlength_pointer(s, typlen, s); > > s = (char *) att_align_nominal(s, typalign); > > fcinfo->args[1].value = elt; > > fcinfo->args[1].isnull = false; > > } > > > > [do stuff with Datum/isnull] > > > > /* advance bitmap pointer if any */ > > if (bitmap) > > { > > bitmask <<= 1; > > if (bitmask == 0x100) > > { > > bitmap++; > > bitmask = 1; > > } > > } > > } > > > > Compared with just: > > > > for (int i = 0; i < nitems; i++) > > { > > Datum elt = datums[i]; > > > > [do stuff with the Datum] > > } > > > > I'm not sure how much difference that makes, but I presume it's not > > zero, and it seems like an easy win when you have the code to deal with > > the Datum array representation anyway. > > Doing this would necessitate option 3 above: we'd have to have this > new expr op be able both to use a hash or alternatively do a normal > loop. > > Being able to use this in more cases than just a Const array expr is > certainly interesting, but I'm not sure yet about the feasibility or > desirability of that at this point given the above restrictions. > > One other point in favor of the additional complexity here is that > it's likely that the above described runtime switching between hash > and loop would be necessary (for this optimization to come into play) > if caching of stable subexpressions ever lands. I have some interest > in working on that...but it's also a large project. I've attached a cleaned up patch. Last CF it was listed in is https://commitfest.postgresql.org/29/2542/ -- what's the appropriate step to take here given it's an already existing patch, but not yet moved into recent CFs? Thanks, James
Attachment
pgsql-hackers by date: