Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays |
Date | |
Msg-id | CAAaqYe-7JEece1P8WOxN6Vjz-SPY_8p_nKhVxtN+9LqNdmMdrw@mail.gmail.com Whole thread Raw |
In response to | Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
|
List | pgsql-hackers |
On Tue, Sep 8, 2020 at 4:37 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 08/09/2020 22:25, James Coleman wrote: > > On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> > >> You could also apply the optimization for non-Const expressions, as long > >> as they're stable (i.e. !contain_volatile_functions(expr)). > > > > Is that true? Don't we also have to worry about whether or not the > > value is stable (i.e., know when a param has changed)? There have been > > discussions about being able to cache stable subexpressions, and my > > understanding was that we'd need to have that infrastructure (along > > with the necessarily invalidation when the param changes) to be able > > to use this for non-Const expressions. > > Yeah, you're right, you'd have to also check for PARAM_EXEC Params. And > Vars. I think the conditions are the same as those checked in > match_clause_to_partition_key() in partprune.c (it's a long function, > search for "if (!IsA(rightop, Const))"). Not sure it's worth the > trouble, then. But it would be nice to handle queries like "WHERE column > = ANY ($1)" If I'm understanding properly you're imagining something in the form of: with x as (select '{1,2,3,4,5,6,7,8,9,10}'::int[]) select * from t where i = any ((select * from x)::int[]); I've been playing around with this and currently have these checks: contain_var_clause((Node *) arrayarg) contain_volatile_functions((Node *) arrayarg) contain_exec_param((Node *) arrayarg, NIL) (the last one I had to modify the function to handle empty lists) If any are true, then have to disable the optimization. But for queries in the form above the test contain_exec_param((Node *) arrayarg, NIL) evaluates to true, even though we know from looking at the query that the array subexpression is stable for the length of the query. Am I misunderstanding what you're going for? Or is there a way to confirm the expr, although an exec param, won't change? Another interesting thing that this would imply is that we'd either have to: 1. Remove the array length check altogether, 2. Always use the hash when have a non-Const, but when a Const only if the array length check passes, or 3. Make this new expr op more fully featured by teaching it how to use either a straight loop through a deconstructed array or use the hash. That last option feeds into further discussion in the below: > >> Deconstructing the array Datum into a simple C array on first call would > >> be a win even for very small arrays and for AND semantics, even if you > >> don't use a hash table. > > > > Because you wouldn't have to repeatedly detoast it? Or some other > > reason I'm not thinking of? My intuition would have been that (aside > > from detoasting if necessary) there wouldn't be much real overhead in, > > for example, an array storing integers. > > Dealing with NULLs and different element sizes in the array is pretty > complicated. Looping through the array currently looks like this: > > /* Loop over the array elements */ > s = (char *) ARR_DATA_PTR(arr); > bitmap = ARR_NULLBITMAP(arr); > bitmask = 1; > > for (int i = 0; i < nitems; i++) > { > Datum elt; > Datum thisresult; > > /* Get array element, checking for NULL */ > if (bitmap && (*bitmap & bitmask) == 0) > { > fcinfo->args[1].value = (Datum) 0; > fcinfo->args[1].isnull = true; > } > else > { > elt = fetch_att(s, typbyval, typlen); > s = att_addlength_pointer(s, typlen, s); > s = (char *) att_align_nominal(s, typalign); > fcinfo->args[1].value = elt; > fcinfo->args[1].isnull = false; > } > > [do stuff with Datum/isnull] > > /* advance bitmap pointer if any */ > if (bitmap) > { > bitmask <<= 1; > if (bitmask == 0x100) > { > bitmap++; > bitmask = 1; > } > } > } > > Compared with just: > > for (int i = 0; i < nitems; i++) > { > Datum elt = datums[i]; > > [do stuff with the Datum] > } > > I'm not sure how much difference that makes, but I presume it's not > zero, and it seems like an easy win when you have the code to deal with > the Datum array representation anyway. Doing this would necessitate option 3 above: we'd have to have this new expr op be able both to use a hash or alternatively do a normal loop. Being able to use this in more cases than just a Const array expr is certainly interesting, but I'm not sure yet about the feasibility or desirability of that at this point given the above restrictions. One other point in favor of the additional complexity here is that it's likely that the above described runtime switching between hash and loop would be necessary (for this optimization to come into play) if caching of stable subexpressions ever lands. I have some interest in working on that...but it's also a large project. James
pgsql-hackers by date: