On Mon, Jul 28, 2025 at 2:20 AM Sajith Prabhakar Shetty
<ssajith@blackduck.com> wrote:
> We are able to get you a self-contained reproducer, please find attached dump, sql script and read me files.
I find that your test case spends a great deal of time on nbtree
preprocessing, which happens once per execution of the inner index
scan on "zsf". According to "perf top", most cycles on spent on these:
32.02% postgres [.] FunctionCall2Coll
22.01% postgres [.] qsort_arg
18.64% postgres [.] _bt_compare_array_elements
8.20% postgres [.] btint8cmp
3.97% postgres [.] _bt_preprocess_keys
...
The query takes ~1550ms on my local workstation. If I just comment out
the relevant qsort, it'll take only ~190 ms. That qsort might not be
the only problem here, but it is the immediate problem. Note that
commenting out the qsort should produce the same answer, at least for
this one query, since the constants that appear in the query are
already sorted (the EXPLAIN row counts match what they show with the
qsort in place).
In principle, we could limit the use of the qsort to the first inner
index scan, and safely skip each subsequent qsort -- at least in cases
where the array was a constant (which includes this case). Obviously,
we *do* need a qsort (what if the constants aren't exactly in sorted
order?), but we generally don't need to do it once per inner index
scan.
There is a separate question as to whether or not the planner should
pick this plan in the first place. I find that I can get a faster plan
(without commenting out anything) by tricking the planner into using
the single column "zsf_pkey", rather than the multi-column
"zsf_id_fpi_cid_key". Even then, I can only get a merge join with the
"zsf_pkey" index on the inner side -- not a nested loop join with the
"zsf_pkey" index on the inner side, as expected. The merge join plan
brings the execution time down to ~90ms, which is better, but
certainly still less than ideal.
--
Peter Geoghegan