Thread: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

From
David Rowley
Date:
Over in [1] I complained to Tom that I thought he should use
list_nth() instead of linitial() and lsecond(). My reasoning was that
we already knew that the list contained 2 elements, and linitial() and
lsecond() do some additional checks and return NULL if there are fewer
than that amount of elements in the list, or so I thought.  As it
turns out, due to the lfirst() dereferencing the returned pointer,
we'd just segfault if the list being too short and we got NULL.

Since there appears to be no actual safety reason to use those macros
in case your list is too small, it seems better just to use
lfirst(list_nth_cell(..., ...)).  This saves doing the checks to see
if the list is NIL or too short.  It seems nice to get rid of that
additional branching.

We can't just return list_nth() as some code still do things like:

linitial(list) = something;

and we obviously can't assign "something" to the return value of a
function call.

I've attached a patch which improves the macros.

I see on gcc9.3 this reduces the size of the postgres binary by about
16KB: 9027296 to 9011320.

I'm a bit unsure about llast()'s new double evaluation of the list.
Perhaps I can add a new inline function named list_last_cell() to get
around that... Or maybe it doesn't matter. I'm not quite sure what's
best there.

David


[1] https://www.postgresql.org/message-id/CAApHDvqeh8JEqMjpCFTgHD_zu2S03nOVh2srejd+sNLza8M+mg@mail.gmail.com

Attachment
David Rowley <dgrowleyml@gmail.com> writes:
> I'm a bit unsure about llast()'s new double evaluation of the list.
> Perhaps I can add a new inline function named list_last_cell() to get
> around that... Or maybe it doesn't matter. I'm not quite sure what's
> best there.

Double evaluation bad, especially in a macro that has not had such a
hazard for the last twenty-plus years.

It might not be worth mucking with llast, as it's not used very heavily
I believe.  But if it is, then let's add another inline function.

            regards, tom lane



Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

From
David Rowley
Date:
On Mon, 28 Sep 2020 at 12:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > I'm a bit unsure about llast()'s new double evaluation of the list.
> > Perhaps I can add a new inline function named list_last_cell() to get
> > around that... Or maybe it doesn't matter. I'm not quite sure what's
> > best there.
>
> Double evaluation bad, especially in a macro that has not had such a
> hazard for the last twenty-plus years.
>
> It might not be worth mucking with llast, as it's not used very heavily
> I believe.  But if it is, then let's add another inline function.

Thanks for having a look at this.

I changed things around to make llast() and the int and oid variant
use a new inline function to get the last cell.

I also pushed the resulting code to master.

David



David Rowley <dgrowleyml@gmail.com> writes:
> I changed things around to make llast() and the int and oid variant
> use a new inline function to get the last cell.
> I also pushed the resulting code to master.

LGTM.

            regards, tom lane



Poking around to count remaining uses of those inline functions,
I found a few places that should be using the macros instead,
and fixed them.  After that, I notice that list_tail(),
list_third_cell(), and list_fourth_cell() are entirely unreferenced.
I'm hesitant to get rid of list_tail(), because it seems like it
could well be used by extensions.  But I'd bet quite a bit that
list_third_cell() and list_fourth_cell() are not used anywhere
anymore.  Should we get rid of them to shave a few microseconds
from compile times?

            regards, tom lane



Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

From
David Rowley
Date:
Thanks for 9d299a492.

On Mon, 28 Sep 2020 at 15:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Poking around to count remaining uses of those inline functions,
> I found a few places that should be using the macros instead,
> and fixed them.  After that, I notice that list_tail(),
> list_third_cell(), and list_fourth_cell() are entirely unreferenced.
> I'm hesitant to get rid of list_tail(), because it seems like it
> could well be used by extensions.  But I'd bet quite a bit that
> list_third_cell() and list_fourth_cell() are not used anywhere
> anymore.  Should we get rid of them to shave a few microseconds
> from compile times?

I wouldn't object to the removal of list_third_cell() and list_fourth_cell().

I agree to your reasoning with last_tail(). It does seem more likely
that someone would use it. Although, if you'd proposed to remove it
too, I wouldn't have objected.  It's not like it's hard to reimplement
within an extension for any extensions that use it. Though, perhaps it
would maybe be a shame if that was the sole thing we broke for them
when they try compiling their extension in a year's time on the newly
release PG14.

David



David Rowley <dgrowleyml@gmail.com> writes:
> On Mon, 28 Sep 2020 at 15:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Poking around to count remaining uses of those inline functions,
>> I found a few places that should be using the macros instead,
>> and fixed them.  After that, I notice that list_tail(),
>> list_third_cell(), and list_fourth_cell() are entirely unreferenced.
>> I'm hesitant to get rid of list_tail(), because it seems like it
>> could well be used by extensions.  But I'd bet quite a bit that
>> list_third_cell() and list_fourth_cell() are not used anywhere
>> anymore.  Should we get rid of them to shave a few microseconds
>> from compile times?

> I wouldn't object to the removal of list_third_cell() and list_fourth_cell().

> I agree to your reasoning with last_tail(). It does seem more likely
> that someone would use it. Although, if you'd proposed to remove it
> too, I wouldn't have objected.  It's not like it's hard to reimplement
> within an extension for any extensions that use it. Though, perhaps it
> would maybe be a shame if that was the sole thing we broke for them
> when they try compiling their extension in a year's time on the newly
> release PG14.

Looking a bit harder, I notice that list_third_cell() and
list_fourth_cell() were in fact introduced in v13, purely to support
lthird() and lfourth().  So it hardly seems likely that any extensions
would have grown direct dependencies on them already.  list_tail()
has been there a long time though.

Just to be sure, I checked codesearch.debian.net, and it failed
to find any outside uses either.  So I'll go ahead and remove
those two.

list_second_cell() does have uses, although I observe that they
are almost exclusively in locutions such as

    for_each_cell(lc, rollups, list_second_cell(rollups))

to iterate over all-but-the-first elements of a list.  I wonder if
we ought to come up with a better notation for that.  I'm imagining
something like

    for_each_from(lc, rollups, 1)

to start from list index 1.  It looks like this would be marginally
more efficient, and perhaps more readable.

            regards, tom lane



I wrote:
> list_second_cell() does have uses, although I observe that they
> are almost exclusively in locutions such as
>     for_each_cell(lc, rollups, list_second_cell(rollups))
> to iterate over all-but-the-first elements of a list.  I wonder if
> we ought to come up with a better notation for that.  I'm imagining
> something like
>     for_each_from(lc, rollups, 1)
> to start from list index 1.  It looks like this would be marginally
> more efficient, and perhaps more readable.

Concretely, I'm thinking about the attached.  This does seem simpler.
It reduces the size of the executable by about 560 bytes on
my machine, or 46 bytes per usage, which isn't bad.  (Note: this
is in an assert-enabled build, might be different otherwise.)
I didn't try to measure performance changes, but it should be for
the better.

Looking at the remaining instances of for_each_cell, I see several
where it seems like it'd be simpler and clearer to use for_each_from.
But for the moment I confined myself to changing just the instances
following the pattern above.

I noticed while messing with this that I'd neglected to const-ify
the support functions for for_each_cell() and for_both_cell(),
so this fixes that too.

I'm somewhat inclined to back-patch this into v13.  The missing
const decoration seems arguably a bug, which we've missed noticing
only because of our generally lamentable under-usage of const.
And I think it'll be helpful for future back-patching if
for_each_from is available in all versions with the new List API.

            regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 16285ad09f..e0ac4e05e5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5732,7 +5732,7 @@ ATCheckPartitionsNotInUse(Relation rel, LOCKMODE lockmode)

         inh = find_all_inheritors(RelationGetRelid(rel), lockmode, NULL);
         /* first element is the parent rel; must ignore it */
-        for_each_cell(cell, inh, list_second_cell(inh))
+        for_each_from(cell, inh, 1)
         {
             Relation    childrel;

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 9ce8f43385..1dc873ed25 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -441,7 +441,7 @@ exprTypmod(const Node *expr)
                 typmod = exprTypmod((Node *) linitial(cexpr->args));
                 if (typmod < 0)
                     return -1;    /* no point in trying harder */
-                for_each_cell(arg, cexpr->args, list_second_cell(cexpr->args))
+                for_each_from(arg, cexpr->args, 1)
                 {
                     Node       *e = (Node *) lfirst(arg);

@@ -469,7 +469,7 @@ exprTypmod(const Node *expr)
                 typmod = exprTypmod((Node *) linitial(mexpr->args));
                 if (typmod < 0)
                     return -1;    /* no point in trying harder */
-                for_each_cell(arg, mexpr->args, list_second_cell(mexpr->args))
+                for_each_from(arg, mexpr->args, 1)
                 {
                     Node       *e = (Node *) lfirst(arg);

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 99278eed93..3d7a4e373f 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2261,7 +2261,7 @@ create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path)
     {
         bool        is_first_sort = ((RollupData *) linitial(rollups))->is_hashed;

-        for_each_cell(lc, rollups, list_second_cell(rollups))
+        for_each_from(lc, rollups, 1)
         {
             RollupData *rollup = lfirst(lc);
             AttrNumber *new_grpColIdx;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e2b4965c4..f331f82a6c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4430,7 +4430,7 @@ consider_groupingsets_paths(PlannerInfo *root,
              * below, must use the same condition.
              */
             i = 0;
-            for_each_cell(lc, gd->rollups, list_second_cell(gd->rollups))
+            for_each_from(lc, gd->rollups, 1)
             {
                 RollupData *rollup = lfirst_node(RollupData, lc);

@@ -4464,7 +4464,7 @@ consider_groupingsets_paths(PlannerInfo *root,
                 rollups = list_make1(linitial(gd->rollups));

                 i = 0;
-                for_each_cell(lc, gd->rollups, list_second_cell(gd->rollups))
+                for_each_from(lc, gd->rollups, 1)
                 {
                     RollupData *rollup = lfirst_node(RollupData, lc);

diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index f813b587f1..783f3fe8f2 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -1083,7 +1083,7 @@ parseCheckAggregates(ParseState *pstate, Query *qry)

         if (gset_common)
         {
-            for_each_cell(l, gsets, list_second_cell(gsets))
+            for_each_from(l, gsets, 1)
             {
                 gset_common = list_intersection_int(gset_common, lfirst(l));
                 if (!gset_common)
@@ -1774,7 +1774,7 @@ expand_grouping_sets(List *groupingSets, int limit)
         result = lappend(result, list_union_int(NIL, (List *) lfirst(lc)));
     }

-    for_each_cell(lc, expanded_groups, list_second_cell(expanded_groups))
+    for_each_from(lc, expanded_groups, 1)
     {
         List       *p = lfirst(lc);
         List       *new_result = NIL;
diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 88ef9550e9..53f422260c 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -441,7 +441,7 @@ makeItemList(List *list)
     while (end->next)
         end = end->next;

-    for_each_cell(cell, list, list_second_cell(list))
+    for_each_from(cell, list, 1)
     {
         JsonPathParseItem *c = (JsonPathParseItem *) lfirst(cell);

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 03cf241996..62023c20b2 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -8113,7 +8113,7 @@ get_rule_expr(Node *node, deparse_context *context,
             {
                 BoolExpr   *expr = (BoolExpr *) node;
                 Node       *first_arg = linitial(expr->args);
-                ListCell   *arg = list_second_cell(expr->args);
+                ListCell   *arg;

                 switch (expr->boolop)
                 {
@@ -8122,12 +8122,11 @@ get_rule_expr(Node *node, deparse_context *context,
                             appendStringInfoChar(buf, '(');
                         get_rule_expr_paren(first_arg, context,
                                             false, node);
-                        while (arg)
+                        for_each_from(arg, expr->args, 1)
                         {
                             appendStringInfoString(buf, " AND ");
                             get_rule_expr_paren((Node *) lfirst(arg), context,
                                                 false, node);
-                            arg = lnext(expr->args, arg);
                         }
                         if (!PRETTY_PAREN(context))
                             appendStringInfoChar(buf, ')');
@@ -8138,12 +8137,11 @@ get_rule_expr(Node *node, deparse_context *context,
                             appendStringInfoChar(buf, '(');
                         get_rule_expr_paren(first_arg, context,
                                             false, node);
-                        while (arg)
+                        for_each_from(arg, expr->args, 1)
                         {
                             appendStringInfoString(buf, " OR ");
                             get_rule_expr_paren((Node *) lfirst(arg), context,
                                                 false, node);
-                            arg = lnext(expr->args, arg);
                         }
                         if (!PRETTY_PAREN(context))
                             appendStringInfoChar(buf, ')');
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 00c7afc66f..bec357fcef 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3519,7 +3519,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
          * for remaining Vars on other rels.
          */
         relvarinfos = lappend(relvarinfos, varinfo1);
-        for_each_cell(l, varinfos, list_second_cell(varinfos))
+        for_each_from(l, varinfos, 1)
         {
             GroupVarInfo *varinfo2 = (GroupVarInfo *) lfirst(l);

diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 104df4174a..eefe705c39 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -389,6 +389,32 @@ lnext(const List *l, const ListCell *c)
  */
 #define foreach_current_index(cell)  (cell##__state.i)

+/*
+ * for_each_from -
+ *      Like foreach(), but start from the N'th (zero-based) list element,
+ *      not necessarily the first one.
+ *
+ * It's okay for N to exceed the list length, but not for it to be negative.
+ *
+ * The caveats for foreach() apply equally here.
+ */
+#define for_each_from(cell, lst, N)    \
+    for (ForEachState cell##__state = for_each_from_setup(lst, N); \
+         (cell##__state.l != NIL && \
+          cell##__state.i < cell##__state.l->length) ? \
+         (cell = &cell##__state.l->elements[cell##__state.i], true) : \
+         (cell = NULL, false); \
+         cell##__state.i++)
+
+static inline ForEachState
+for_each_from_setup(const List *lst, int N)
+{
+    ForEachState r = {lst, N};
+
+    Assert(N >= 0);
+    return r;
+}
+
 /*
  * for_each_cell -
  *      a convenience macro which loops through a list starting from a
@@ -405,7 +431,7 @@ lnext(const List *l, const ListCell *c)
          cell##__state.i++)

 static inline ForEachState
-for_each_cell_setup(List *lst, ListCell *initcell)
+for_each_cell_setup(const List *lst, const ListCell *initcell)
 {
     ForEachState r = {lst,
     initcell ? list_cell_number(lst, initcell) : list_length(lst)};
@@ -456,8 +482,8 @@ for_each_cell_setup(List *lst, ListCell *initcell)
          cell1##__state.i1++, cell1##__state.i2++)

 static inline ForBothCellState
-for_both_cell_setup(List *list1, ListCell *initcell1,
-                    List *list2, ListCell *initcell2)
+for_both_cell_setup(const List *list1, const ListCell *initcell1,
+                    const List *list2, const ListCell *initcell2)
 {
     ForBothCellState r = {list1, list2,
         initcell1 ? list_cell_number(list1, initcell1) : list_length(list1),

Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

From
David Rowley
Date:
On Tue, 29 Sep 2020 at 11:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > list_second_cell() does have uses, although I observe that they
> > are almost exclusively in locutions such as
> >       for_each_cell(lc, rollups, list_second_cell(rollups))
> > to iterate over all-but-the-first elements of a list.  I wonder if
> > we ought to come up with a better notation for that.  I'm imagining
> > something like
> >       for_each_from(lc, rollups, 1)
> > to start from list index 1.  It looks like this would be marginally
> > more efficient, and perhaps more readable.
>
> Concretely, I'm thinking about the attached.

I had a look over this and I like it.  It seems good as it allows
consumers to choose N programmatically rather than be fixed into using
list_second_cell() or list_fortysecond_cell().

> I'm somewhat inclined to back-patch this into v13.  The missing
> const decoration seems arguably a bug, which we've missed noticing
> only because of our generally lamentable under-usage of const.
> And I think it'll be helpful for future back-patching if
> for_each_from is available in all versions with the new List API.

It does seem fairly low risk and having personally experienced
backpatching pain, I understand your motivation to backpatch.  I
certainly wouldn't object to backpacking but will defer to your better
judgement on whether you choose to or not.

David



David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 29 Sep 2020 at 11:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Concretely, I'm thinking about the attached.

> I had a look over this and I like it.  It seems good as it allows
> consumers to choose N programmatically rather than be fixed into using
> list_second_cell() or list_fortysecond_cell().

Thanks for looking!  Right now, if you want to start from a non-first
point in the list, you have to use for_each_cell (or else write out
a loop manually, but let's not).  I'm not proposing to remove that
alternative, but there are surely cases where it's simpler or clearer
to use a list index instead of a ListCell pointer -- especially so if
the list index is a constant.  So I think this is a pretty clear win
that simply failed to occur to me earlier.

>> I'm somewhat inclined to back-patch this into v13.  The missing
>> const decoration seems arguably a bug, which we've missed noticing
>> only because of our generally lamentable under-usage of const.
>> And I think it'll be helpful for future back-patching if
>> for_each_from is available in all versions with the new List API.

> It does seem fairly low risk and having personally experienced
> backpatching pain, I understand your motivation to backpatch.  I
> certainly wouldn't object to backpacking but will defer to your better
> judgement on whether you choose to or not.

A key point here is that everyplace I'm proposing to touch was already
changed in v13 (a fortiori, because list_second_cell wasn't there in v12).
So we can either have two different coding patterns in these areas, or
three.  Two's better from a backpatching standpoint.  The fact that
v13 is barely out the door also factors into this ... a year from now,
my judgment would probably be different.

            regards, tom lane



Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

From
David Rowley
Date:
On Tue, 29 Sep 2020 at 12:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > It does seem fairly low risk and having personally experienced
> > backpatching pain, I understand your motivation to backpatch.  I
> > certainly wouldn't object to backpacking but will defer to your better
> > judgement on whether you choose to or not.
>
> A key point here is that everyplace I'm proposing to touch was already
> changed in v13 (a fortiori, because list_second_cell wasn't there in v12).
> So we can either have two different coding patterns in these areas, or
> three.  Two's better from a backpatching standpoint.  The fact that
> v13 is barely out the door also factors into this ... a year from now,
> my judgment would probably be different.

Yeah, I understand that part.  The pain I was talking about was
writing a patch for master, trying to apply it to the previous version
only to find it does not apply. After fixing it up for that version
trying to apply that patch to the version before that and getting more
conflicts. Repeating that down to our earliest supported version is
something I'd really love to never have to do again. It's an
exhausting process. It's also risky having to custom write a version
of the patch for each release. So I understand and agree with your
reasoning to backpatch as it could reduce the number of versions of a
patch that must be written to fix a bug.  That could reduce the
chances of someone messing up a backpatch at some later date so might
be a safe option. aka. I'm not going to object to you backpatching
this. :)

David



David Rowley <dgrowleyml@gmail.com> writes:
> Yeah, I understand that part.  The pain I was talking about was
> writing a patch for master, trying to apply it to the previous version
> only to find it does not apply. After fixing it up for that version
> trying to apply that patch to the version before that and getting more
> conflicts. Repeating that down to our earliest supported version is
> something I'd really love to never have to do again.

Tell me about it :-(.  I've spent countless hours on that sort
of activity.  So I really dislike changing an API and then changing
it again in the next release.

            regards, tom lane