Hi hackers,
While trying to measure if there is any gain from the change as
discussed in [1], I happened to notice another place that is slowed down
by list_delete_first. I'm using the query as below:
(n=1000000;
printf "explain (summary on) select * from t where "
for ((i=1;i<$n;i++)); do printf "a = $i or "; done;
printf "a = $n;"
) | psql
And I notice that a large part of planning time is spent on the
list_delete_first calls inside simplify_or_arguments().
I think the issue here is clear and straightforward: list_delete_first
has an O(N) cost due to data movement. And I believe similar issue has
been discussed several times before.
I wonder if we can improve it by using list_delete_last instead, so I
tried the following change:
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -3612,9 +3612,9 @@ simplify_or_arguments(List *args,
unprocessed_args = list_copy(args);
while (unprocessed_args)
{
- Node *arg = (Node *) linitial(unprocessed_args);
+ Node *arg = (Node *) llast(unprocessed_args);
- unprocessed_args = list_delete_first(unprocessed_args);
+ unprocessed_args = list_delete_last(unprocessed_args);
With this change, in my box the planning time for the query above is
reduced from 64257.784 ms to 1411.666 ms, a big improvement. The side
effect is that it results in a lot of plan diffs in regression tests,
but they are all about different order of OR arguments.