Re: POC: converting Lists into arrays - Mailing list pgsql-hackers

From Tom Lane
Subject Re: POC: converting Lists into arrays
Date
Msg-id 21272.1563318411@sss.pgh.pa.us
Whole thread Raw
In response to Re: POC: converting Lists into arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: POC: converting Lists into arrays
Re: POC: converting Lists into arrays
Re: POC: converting Lists into arrays
List pgsql-hackers
I wrote:
> * Look at places using lcons/list_delete_first to maintain FIFO lists.
> The patch makes these O(N^2) for long lists.  If we can reverse the list
> order and use lappend/list_truncate instead, it'd be better.  Possibly in
> some places the list ordering is critical enough to make this impractical,
> but I suspect it's an easy win in most.

Attached are two patches that touch all the places where it seemed like
an easy win to stop using lcons and/or list_delete_first.

0001 adds list_delete_last() as a mirror image to list_delete_first(),
and changes all the places where it seemed 100% safe to do so (ie,
there's no behavioral change because the list order is demonstrably
immaterial).

0002 changes some additional places where it's maybe a bit less safe,
ie there's a potential for user-visible behavioral change because
processing will occur in a different order.  In particular, the proposed
change in execExpr.c causes aggregates and window functions that are in
the same plan node to be executed in a different order than before ---
but it seems to me that this order is saner.  (Note the change in the
expected regression results, in a test that's intentionally sensitive to
the execution order.)  And anyway when did we guarantee anything about
that?

I refrained from changing lcons to lappend in get_relation_info, because
that demonstrably causes the planner to change its choices when two
indexes look equally attractive, and probably people would complain
about that.  I think that the other changes proposed in 0002 are pretty
harmless --- for example, in get_tables_to_cluster the order depends
initially on the results of a seqscan of pg_index, so anybody who's
expecting stability is in for rude surprises anyhow.  Also, the proposed
changes in plancat.c, parse_agg.c, selfuncs.c almost certainly have no
user-visible effect, but maybe there could be changes at the
roundoff-error level due to processing estimates in a different order?

There are a bunch of places that are using list_delete_first to remove
the next-to-process entry from a List used as a queue.  In principle,
we could invert the order of those queues and then use list_delete_last,
but I thought this would probably be too confusing: it's natural to
think of the front of the list as being the head of the queue.  I doubt
that any of those queues get long enough for it to be a serious
performance problem to leave them as-is.

(Actually, I doubt that any of these changes will really move the
performance needle in the real world.  It's more a case of wanting
the code to present good examples not bad ones.)

Thoughts?  Anybody want to object to any of the changes in 0002?

            regards, tom lane

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index dfb51f6..169bf6f 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1323,8 +1323,6 @@ static void
 gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
                 GISTSTATE *giststate, List *splitinfo, bool unlockbuf)
 {
-    ListCell   *lc;
-    List       *reversed;
     GISTPageSplitInfo *right;
     GISTPageSplitInfo *left;
     IndexTuple    tuples[2];
@@ -1339,14 +1337,6 @@ gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
      * left. Finally insert the downlink for the last new page and update the
      * downlink for the original page as one operation.
      */
-
-    /* for convenience, create a copy of the list in reverse order */
-    reversed = NIL;
-    foreach(lc, splitinfo)
-    {
-        reversed = lcons(lfirst(lc), reversed);
-    }
-
     LockBuffer(stack->parent->buffer, GIST_EXCLUSIVE);
     gistFindCorrectParent(state->r, stack);

@@ -1354,10 +1344,10 @@ gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
      * insert downlinks for the siblings from right to left, until there are
      * only two siblings left.
      */
-    while (list_length(reversed) > 2)
+    for (int pos = list_length(splitinfo) - 1; pos > 1; pos--)
     {
-        right = (GISTPageSplitInfo *) linitial(reversed);
-        left = (GISTPageSplitInfo *) lsecond(reversed);
+        right = (GISTPageSplitInfo *) list_nth(splitinfo, pos);
+        left = (GISTPageSplitInfo *) list_nth(splitinfo, pos - 1);

         if (gistinserttuples(state, stack->parent, giststate,
                              &right->downlink, 1,
@@ -1371,11 +1361,10 @@ gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
             gistFindCorrectParent(state->r, stack);
         }
         /* gistinserttuples() released the lock on right->buf. */
-        reversed = list_delete_first(reversed);
     }

-    right = (GISTPageSplitInfo *) linitial(reversed);
-    left = (GISTPageSplitInfo *) lsecond(reversed);
+    right = (GISTPageSplitInfo *) lsecond(splitinfo);
+    left = (GISTPageSplitInfo *) linitial(splitinfo);

     /*
      * Finally insert downlink for the remaining right page and update the
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 6c3ff76..032fab9 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -633,7 +633,7 @@ CheckAttributeType(const char *attname,
                      errmsg("composite type %s cannot be made a member of itself",
                             format_type_be(atttypid))));

-        containing_rowtypes = lcons_oid(atttypid, containing_rowtypes);
+        containing_rowtypes = lappend_oid(containing_rowtypes, atttypid);

         relation = relation_open(get_typ_typrelid(atttypid), AccessShareLock);

@@ -653,7 +653,7 @@ CheckAttributeType(const char *attname,

         relation_close(relation, AccessShareLock);

-        containing_rowtypes = list_delete_first(containing_rowtypes);
+        containing_rowtypes = list_delete_last(containing_rowtypes);
     }
     else if (OidIsValid((att_typelem = get_element_type(atttypid))))
     {
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 417d595..bae3b38 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -281,11 +281,11 @@ LockViewRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, List *ancestor_views
     context.nowait = nowait;
     context.viewowner = view->rd_rel->relowner;
     context.viewoid = reloid;
-    context.ancestor_views = lcons_oid(reloid, ancestor_views);
+    context.ancestor_views = lappend_oid(ancestor_views, reloid);

     LockViewRecurse_walker((Node *) viewquery, &context);

-    ancestor_views = list_delete_oid(ancestor_views, reloid);
+    (void) list_delete_last(context.ancestor_views);

     table_close(view, NoLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0c0ddd5..fc1c4df 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14480,6 +14480,11 @@ register_on_commit_action(Oid relid, OnCommitAction action)
     oc->creating_subid = GetCurrentSubTransactionId();
     oc->deleting_subid = InvalidSubTransactionId;

+    /*
+     * We use lcons() here so that ON COMMIT actions are processed in reverse
+     * order of registration.  That might not be essential but it seems
+     * reasonable.
+     */
     on_commits = lcons(oc, on_commits);

     MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 5584fa8..9163464 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -827,6 +827,30 @@ list_delete_first(List *list)
 }

 /*
+ * Delete the last element of the list.
+ *
+ * This is the opposite of list_delete_first(), but is noticeably cheaper
+ * with a long list, since no data need be moved.
+ */
+List *
+list_delete_last(List *list)
+{
+    check_list_invariants(list);
+
+    if (list == NIL)
+        return NIL;                /* would an error be better? */
+
+    /* list_truncate won't free list if it goes to empty, but this should */
+    if (list_length(list) <= 1)
+    {
+        list_free(list);
+        return NIL;
+    }
+
+    return list_truncate(list, list_length(list) - 1);
+}
+
+/*
  * Generate the union of two lists. This is calculated by copying
  * list1 via list_copy(), then adding to it all the members of list2
  * that aren't already in list1.
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index f0e789f..99dbf8d 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -881,11 +881,9 @@ is_parallel_safe(PlannerInfo *root, Node *node)
         foreach(l, proot->init_plans)
         {
             SubPlan    *initsubplan = (SubPlan *) lfirst(l);
-            ListCell   *l2;

-            foreach(l2, initsubplan->setParam)
-                context.safe_param_ids = lcons_int(lfirst_int(l2),
-                                                   context.safe_param_ids);
+            context.safe_param_ids = list_concat(context.safe_param_ids,
+                                                 initsubplan->setParam);
         }
     }

@@ -1015,6 +1013,7 @@ max_parallel_hazard_walker(Node *node, max_parallel_hazard_context *context)
                                               context->safe_param_ids);
         if (max_parallel_hazard_walker(subplan->testexpr, context))
             return true;        /* no need to restore safe_param_ids */
+        list_free(context->safe_param_ids);
         context->safe_param_ids = save_safe_param_ids;
         /* we must also check args, but no special Param treatment there */
         if (max_parallel_hazard_walker((Node *) subplan->args, context))
@@ -4185,8 +4184,8 @@ add_function_defaults(List *args, HeapTuple func_tuple)
     ndelete = nargsprovided + list_length(defaults) - funcform->pronargs;
     if (ndelete < 0)
         elog(ERROR, "not enough default arguments");
-    while (ndelete-- > 0)
-        defaults = list_delete_first(defaults);
+    if (ndelete > 0)
+        defaults = list_copy_tail(defaults, ndelete);

     /* And form the combined argument list, not modifying the input list */
     return list_concat(list_copy(args), defaults);
@@ -4701,9 +4700,9 @@ inline_function(Oid funcid, Oid result_type, Oid result_collid,
      * Recursively try to simplify the modified expression.  Here we must add
      * the current function to the context list of active functions.
      */
-    context->active_fns = lcons_oid(funcid, context->active_fns);
+    context->active_fns = lappend_oid(context->active_fns, funcid);
     newexpr = eval_const_expressions_mutator(newexpr, context);
-    context->active_fns = list_delete_first(context->active_fns);
+    context->active_fns = list_delete_last(context->active_fns);

     error_context_stack = sqlerrcontext.previous;

diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 5b047d1..93b6784 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1973,7 +1973,7 @@ fireRIRrules(Query *parsetree, List *activeRIRs)
                             (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                              errmsg("infinite recursion detected in rules for relation \"%s\"",
                                     RelationGetRelationName(rel))));
-                activeRIRs = lcons_oid(RelationGetRelid(rel), activeRIRs);
+                activeRIRs = lappend_oid(activeRIRs, RelationGetRelid(rel));

                 foreach(l, locks)
                 {
@@ -1986,7 +1986,7 @@ fireRIRrules(Query *parsetree, List *activeRIRs)
                                                   activeRIRs);
                 }

-                activeRIRs = list_delete_first(activeRIRs);
+                activeRIRs = list_delete_last(activeRIRs);
             }
         }

@@ -2059,7 +2059,7 @@ fireRIRrules(Query *parsetree, List *activeRIRs)
                              errmsg("infinite recursion detected in policy for relation \"%s\"",
                                     RelationGetRelationName(rel))));

-                activeRIRs = lcons_oid(RelationGetRelid(rel), activeRIRs);
+                activeRIRs = lappend_oid(activeRIRs, RelationGetRelid(rel));

                 /*
                  * get_row_security_policies just passed back securityQuals
@@ -2084,7 +2084,7 @@ fireRIRrules(Query *parsetree, List *activeRIRs)
                 expression_tree_walker((Node *) withCheckOptions,
                                        fireRIRonSubLink, (void *) activeRIRs);

-                activeRIRs = list_delete_first(activeRIRs);
+                activeRIRs = list_delete_last(activeRIRs);
             }

             /*
@@ -3711,7 +3711,7 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
             rev = (rewrite_event *) palloc(sizeof(rewrite_event));
             rev->relation = RelationGetRelid(rt_entry_relation);
             rev->event = event;
-            rewrite_events = lcons(rev, rewrite_events);
+            rewrite_events = lappend(rewrite_events, rev);

             foreach(n, product_queries)
             {
@@ -3722,7 +3722,7 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
                 rewritten = list_concat(rewritten, newstuff);
             }

-            rewrite_events = list_delete_first(rewrite_events);
+            rewrite_events = list_delete_last(rewrite_events);
         }

         /*
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 71dc4dc..1463408 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -531,6 +531,7 @@ extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
 extern List *list_delete_oid(List *list, Oid datum);
 extern List *list_delete_first(List *list);
+extern List *list_delete_last(List *list);
 extern List *list_delete_nth_cell(List *list, int n);
 extern List *list_delete_cell(List *list, ListCell *cell);

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index ebaec4f..cedb4ee 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1566,7 +1566,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
         rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
         rvtc->tableOid = index->indrelid;
         rvtc->indexOid = index->indexrelid;
-        rvs = lcons(rvtc, rvs);
+        rvs = lappend(rvs, rvtc);

         MemoryContextSwitchTo(old_context);
     }
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index e9c8873..89887b8 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -2999,7 +2999,7 @@ get_rels_with_domain(Oid domainOid, LOCKMODE lockmode)
             rtc->rel = rel;
             rtc->natts = 0;
             rtc->atts = (int *) palloc(sizeof(int) * RelationGetNumberOfAttributes(rel));
-            result = lcons(rtc, result);
+            result = lappend(result, rtc);
         }

         /*
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index e4e0575..6d09f2a 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -786,7 +786,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
                 {
                     AggState   *aggstate = (AggState *) state->parent;

-                    aggstate->aggs = lcons(astate, aggstate->aggs);
+                    aggstate->aggs = lappend(aggstate->aggs, astate);
                     aggstate->numaggs++;
                 }
                 else
@@ -834,7 +834,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
                     WindowAggState *winstate = (WindowAggState *) state->parent;
                     int            nfuncs;

-                    winstate->funcs = lcons(wfstate, winstate->funcs);
+                    winstate->funcs = lappend(winstate->funcs, wfstate);
                     nfuncs = ++winstate->numfuncs;
                     if (wfunc->winagg)
                         winstate->numaggs++;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6ea625a..98e9948 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -419,6 +419,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,

             index_close(indexRelation, NoLock);

+            /*
+             * We've historically used lcons() here.  It'd make more sense to
+             * use lappend(), but that causes the planner to change behavior
+             * in cases where two indexes seem equally attractive.  For now,
+             * stick with lcons() --- few tables should have so many indexes
+             * that the O(N^2) behavior of lcons() is really a problem.
+             */
             indexinfos = lcons(info, indexinfos);
         }

@@ -1339,7 +1346,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
             info->kind = STATS_EXT_NDISTINCT;
             info->keys = bms_copy(keys);

-            stainfos = lcons(info, stainfos);
+            stainfos = lappend(stainfos, info);
         }

         if (statext_is_kind_built(dtup, STATS_EXT_DEPENDENCIES))
@@ -1351,7 +1358,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
             info->kind = STATS_EXT_DEPENDENCIES;
             info->keys = bms_copy(keys);

-            stainfos = lcons(info, stainfos);
+            stainfos = lappend(stainfos, info);
         }

         if (statext_is_kind_built(dtup, STATS_EXT_MCV))
@@ -1363,7 +1370,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
             info->kind = STATS_EXT_MCV;
             info->keys = bms_copy(keys);

-            stainfos = lcons(info, stainfos);
+            stainfos = lappend(stainfos, info);
         }

         ReleaseSysCache(htup);
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 19e3164..354030e 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -1132,7 +1132,7 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
         if (expr == NULL)
             continue;            /* probably cannot happen */

-        groupClauses = lcons(expr, groupClauses);
+        groupClauses = lappend(groupClauses, expr);
     }

     /*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 66449b8..7eba59e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3201,7 +3201,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
          * Split the list of varinfos in two - one for the current rel, one
          * for remaining Vars on other rels.
          */
-        relvarinfos = lcons(varinfo1, relvarinfos);
+        relvarinfos = lappend(relvarinfos, varinfo1);
         for_each_cell(l, varinfos, list_second_cell(varinfos))
         {
             GroupVarInfo *varinfo2 = (GroupVarInfo *) lfirst(l);
@@ -3209,12 +3209,12 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
             if (varinfo2->rel == varinfo1->rel)
             {
                 /* varinfos on current rel */
-                relvarinfos = lcons(varinfo2, relvarinfos);
+                relvarinfos = lappend(relvarinfos, varinfo2);
             }
             else
             {
                 /* not time to process varinfo2 yet */
-                newvarinfos = lcons(varinfo2, newvarinfos);
+                newvarinfos = lappend(newvarinfos, varinfo2);
             }
         }

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index ef8eec3..be4ddf8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2030,10 +2030,10 @@ NOTICE:  avg_transfn called with 3

 -- this should not share the state due to different input columns.
 select my_avg(one),my_sum(two) from (values(1,2),(3,4)) t(one,two);
-NOTICE:  avg_transfn called with 2
 NOTICE:  avg_transfn called with 1
-NOTICE:  avg_transfn called with 4
+NOTICE:  avg_transfn called with 2
 NOTICE:  avg_transfn called with 3
+NOTICE:  avg_transfn called with 4
  my_avg | my_sum
 --------+--------
       2 |      6

pgsql-hackers by date:

Previous
From: Jerry Sievers
Date:
Subject: Re: SegFault on 9.6.14
Next
From: Thomas Munro
Date:
Subject: Re: SegFault on 9.6.14