Home > mailing lists

execExprInterp() questions / How to improve scalar array op expr eval? - Mailing list pgsql-hackers

From	James Coleman
Subject	execExprInterp() questions / How to improve scalar array op expr eval?
Date	April 11, 2020 15:58:46
Msg-id	CAAaqYe-UQBba7sScrucDOyHb7cDoNbWf_rcLrOWeD4ikP3_qTQ@mail.gmail.com Whole thread Raw
Responses	Re: execExprInterp() questions / How to improve scalar array op expreval?
List	pgsql-hackers

Tree view

Short version:

In what I'm currently working on I had a few questions about arrays
and the execExpr/execExprInterp framework that didn't seem obviously
answered in the code or README.

- Does the execExpr/execExprInterp framework allow a scalar array op
to get an already expanded array (unless I'm missing something we
can't easily lookup a given index in a flattened array)?
- If not, is there a way in that framework to know if the array expr
has stayed the same through multiple evaluations of the expression
tree (i.e., so you could expand and sort it just once)?

Long version/background:

I've been looking at how a query like:
select * from t where i in (<long array>);
executes as a seq scan the execution time increases linearly with
respect to the length of the array. That's because in
execExprInterp.c's ExecEvalScalarArrayOp() we do a linear search
through the array.

In contrast, when setting up a btree scan with a similar saop, we
first sort the array, remove duplicates, and remove nulls. Of course
with btree scans this has other values, like allowing us to return
results in array order.

I've been considering approaches to improve the seq scan case. We might:
- At plan time rewrite as a hash join to a deduped array values
relation (gross, but included here since that's an approach we can
take rewriting the SQL itself as a proof of concept).
- At execution time build a hash and lookup.
- At execution time sort the array and binary search through it.

I've been working other last approach to see what results I might get
(it seemed like the easiest to hack together). Putting that together
left me with the questions mentioned above in the "short version".

Obviously if anyone has thoughts on the above approaches I'd be
interested in that too.

Side question: when we do:
arr = DatumGetArrayTypeP(*op->resvalue);
in ExecEvalScalarArrayOp() is that going to be expensive each time
through a seq scan? Or is it (likely) going to resolve to an already
in-memory array and effectively be the cost of retrieving that
pointer?

James

pgsql-hackers by date:

From: Jürgen Purtz
Date: 11 April 2020, 15:10:21
Subject: Re: Add A Glossary

From: Julien Rouhaud
Date: 11 April 2020, 16:24:47
Subject: Re: WAL usage calculation patch

execExprInterp() questions / How to improve scalar array op expr eval? - Mailing list pgsql-hackers

Previous

Next