Re: execExprInterp() questions / How to improve scalar array op expreval? - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: execExprInterp() questions / How to improve scalar array op expreval? |
Date | |
Msg-id | 20200411180132.ccxokkyvdz7rgxll@alap3.anarazel.de Whole thread Raw |
In response to | execExprInterp() questions / How to improve scalar array op expr eval? (James Coleman <jtc331@gmail.com>) |
Responses |
Re: execExprInterp() questions / How to improve scalar array op expr eval?
Re: execExprInterp() questions / How to improve scalar array op expr eval? |
List | pgsql-hackers |
Hi, Tom, CCing you because of expanded datum question at bottom. On 2020-04-11 08:58:46 -0400, James Coleman wrote: > - Does the execExpr/execExprInterp framework allow a scalar array op > to get an already expanded array (unless I'm missing something we > can't easily lookup a given index in a flattened array)? Well, I'm not quite sure what you're thinking of. If the input is constant, then expression initialization can do just about everything it wants. Including preprocessing the array into whatever form it wants. But there's no logic for doing preprocessing whenever values change. > - If not, is there a way in that framework to know if the array expr > has stayed the same through multiple evaluations of the expression > tree (i.e., so you could expand and sort it just once)? No. > Long version/background: > > I've been looking at how a query like: > select * from t where i in (<long array>); > executes as a seq scan the execution time increases linearly with > respect to the length of the array. That's because in > execExprInterp.c's ExecEvalScalarArrayOp() we do a linear search > through the array. Is "<long array>" constant in the cases you're interested in? Because it's a heck of a lot easier to add an optimization for that case, than adding runtime tracking the array values by comparing the whole array for equality with the last - the comparisons of the whole array could easily end up adding more cost than what's being saved. > I've been considering approaches to improve the seq scan case. We might: > - At plan time rewrite as a hash join to a deduped array values > relation (gross, but included here since that's an approach we can > take rewriting the SQL itself as a proof of concept). > - At execution time build a hash and lookup. > - At execution time sort the array and binary search through it. > > I've been working other last approach to see what results I might get > (it seemed like the easiest to hack together). Putting that together > left me with the questions mentioned above in the "short version". > > Obviously if anyone has thoughts on the above approaches I'd be > interested in that too. If you're content with optimizing constant arrays, I'd go for detecting that case in the T_ScalarArrayOpExpr case in ExecInitExprRec(), and preprocessing the array into an optimized form. Probably with a separate opcode for execution. > Side question: when we do: > arr = DatumGetArrayTypeP(*op->resvalue); > in ExecEvalScalarArrayOp() is that going to be expensive each time > through a seq scan? Or is it (likely) going to resolve to an already > in-memory array and effectively be the cost of retrieving that > pointer? It Depends TM. For the constant case it's likely going to be cheap-ish, because it'll not be toasted. For the case where it's the return value from a subquery or something, you cannot assume it won't change between calls. I think the worst case here is something like a nestloop, where the inner side does foo IN (outer.column). If I recall the code correctly, we'll currently end up detoasting the array value every single iteration. I wonder if it would be a good idea to change ExecEvalParamExec and ExecEvalParamExtern to detoast the to-be-returned value. If we change the value that's stored in econtext->ecxt_param_exec_vals / econtext->ecxt_param_list_info, we'd avoid repeated detaosting. It'd be nice if we somehow could make the expanded datum machinery work here. I'm not quite seeing how though? Crazy idea: I have a patch to make execExprInterp largely use NullableDatum. Tom and I had theorized a while ago about adding additional fields in the padding that currently exists in it. I wonder if we could utilize a bit in there to allow to expand in-place? Greetings, Andres Freund
pgsql-hackers by date: