Thread: inefficient loop in StandbyReleaseLockList()

inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
Hi hackers,

I've seen a few cases now for v13 where the startup process on a
standby appears to be stuck on StandbyReleaseLockList().  It looks
like most of the time is spent on list_delete_first().  I believe this
is related to the recent list rewrite (1cff1b9), which has the
following note in the commit message:

        * Inserting or deleting a list element now takes time proportional to
        the distance to the end of the list, due to moving the array elements.
        (However, it typically *doesn't* require palloc or pfree, so except in
        long lists it's probably still faster than before.)  Notably, lcons()
        used to be about the same cost as lappend(), but that's no longer true
        if the list is long.  Code that uses lcons() and list_delete_first()
        to maintain a stack might usefully be rewritten to push and pop at the
        end of the list rather than the beginning.

The current form of StandbyReleaseLockList() is something like this:

        while (mylist != NIL)
        {
                int i = linitial_int(mylist);
                ...
                mylist = list_delete_first(mylist);
        }

For a long enough list, this is wildly inefficient.  The following
form is much faster for longer lists:

        foreach(lc, mylist)
        {
                int i = lfirst_int(lc);
                ...
        }
        list_free(mylist);

I wrote up a simple test function for each form.  For a list of
500,000 integers, the first form took about a minute, while the second
form took about 6 milliseconds.
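
A rough sketch of what such a test can look like (made-up function name,
the usual extension boilerplate, and one would time it e.g. with psql's
\timing; this is just an illustration, not the attached patch):

        #include "postgres.h"
        #include "fmgr.h"
        #include "nodes/pg_list.h"

        PG_MODULE_MAGIC;

        PG_FUNCTION_INFO_V1(test_list_delete_first);

        Datum
        test_list_delete_first(PG_FUNCTION_ARGS)
        {
                List       *mylist = NIL;

                /* build a list of 500,000 integers */
                for (int i = 0; i < 500000; i++)
                        mylist = lappend_int(mylist, i);

                /* first form: pop each element off the front of the list */
                while (mylist != NIL)
                {
                        (void) linitial_int(mylist);
                        mylist = list_delete_first(mylist);
                }

                PG_RETURN_VOID();
        }

The second test is the same except that the while loop is replaced by
the foreach()/list_free() form shown above.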

I've attached a patch that converts StandbyReleaseLockList() to the
second loop form.

Nathan


Attachment

Re: inefficient loop in StandbyReleaseLockList()

From
Amul Sul
Date:
On Thu, Oct 28, 2021 at 9:07 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> Hi hackers,
>
> I've seen a few cases now for v13 where the startup process on a
> standby appears to be stuck on StandbyReleaseLockList().  It looks
> like most of the time is spent on list_delete_first().  I believe this
> is related to the recent list rewrite (1cff1b9), which has the
> following note in the commit message:
>
>         * Inserting or deleting a list element now takes time proportional to
>         the distance to the end of the list, due to moving the array elements.
>         (However, it typically *doesn't* require palloc or pfree, so except in
>         long lists it's probably still faster than before.)  Notably, lcons()
>         used to be about the same cost as lappend(), but that's no longer true
>         if the list is long.  Code that uses lcons() and list_delete_first()
>         to maintain a stack might usefully be rewritten to push and pop at the
>         end of the list rather than the beginning.
>
> The current form of StandbyReleaseLockList() is something like this:
>
>         while (mylist != NIL)
>         {
>                 int i = linitial_int(mylist);
>                 ...
>                 mylist = list_delete_first(mylist);
>         }
>
> For a long enough list, this is wildly inefficient.  The following
> form is much faster for longer lists:
>
>         foreach(lc, mylist)
>         {
>                 int i = lfirst_int(lc);
>                 ...
>         }
>         list_free(mylist);
>
> I wrote up a simple test function for each form.  For a list of
> 500,000 integers, the first form took about a minute, while the second
> form took about 6 milliseconds.
>
> I've attached a patch that converts StandbyReleaseLockList() to the
> second loop form.
>

+1, deleting everything at once is much better.  Deleting elements one
by one with list_delete_first is a bit heavy: it does the element
shifting, which involves memory allocation and copy operations that are
unnecessary here.

Regards,
Amul



Re: inefficient loop in StandbyReleaseLockList()

From
Kyotaro Horiguchi
Date:
At Thu, 28 Oct 2021 09:54:42 +0530, Amul Sul <sulamul@gmail.com> wrote in 
> On Thu, Oct 28, 2021 at 9:07 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> >
> > Hi hackers,
> >
> > I've seen a few cases now for v13 where the startup process on a
> > standby appears to be stuck on StandbyReleaseLockList().  It looks
> > like most of the time is spent on list_delete_first().  I believe this
> > is related to the recent list rewrite (1cff1b9), which has the
> > following note in the commit message:
> >
> >         * Inserting or deleting a list element now takes time proportional to
> >         the distance to the end of the list, due to moving the array elements.
> >         (However, it typically *doesn't* require palloc or pfree, so except in
> >         long lists it's probably still faster than before.)  Notably, lcons()
> >         used to be about the same cost as lappend(), but that's no longer true
> >         if the list is long.  Code that uses lcons() and list_delete_first()
> >         to maintain a stack might usefully be rewritten to push and pop at the
> >         end of the list rather than the beginning.
> >
> > The current form of StandbyReleaseLockList() is something like this:
> >
> >         while (mylist != NIL)
> >         {
> >                 int i = linitial_int(mylist);
> >                 ...
> >                 mylist = list_delete_first(mylist);
> >         }
> >
> > For a long enough list, this is wildly inefficient.  The following
> > form is much faster for longer lists:
> >
> >         foreach(lc, mylist)
> >         {
> >                 int i = lfirst_int(lc);
> >                 ...
> >         }
> >         list_free(mylist);
> >
> > I wrote up a simple test function for each form.  For a list of
> > 500,000 integers, the first form took about a minute, while the second
> > form took about 6 milliseconds.

Nice finding!

> > I've attached a patch that converts StandbyReleaseLockList() to the
> > second loop form.
> >
> 
> +1, deleting everything at once is much better.  Deleting elements one
> by one with list_delete_first is a bit heavy: it does the element
> shifting, which involves memory allocation and copy operations that are
> unnecessary here.

+1.

I found several other instances of the pattern
"while(list){list_delete_first(); /*no-break*/}" in
llvm_release_context, gistProcessEmptyingQueue, AtEOXact_Namespace and
maybe transformGraph and processState in trgm_regexp.c.  We might want
to apply this technique to the first three, and maybe to the last two.

However, I'm fine with fixing only StandbyReleaseLockList(), which
actually suffers from list_delete_first().

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: inefficient loop in StandbyReleaseLockList()

From
Michael Paquier
Date:
On Thu, Oct 28, 2021 at 03:57:51PM +0900, Kyotaro Horiguchi wrote:
> I found several other instances of the pattern
> "while(list){list_delete_first(); /*no-break*/}" in
> llvm_release_context, gistProcessEmptyingQueue, AtEOXact_Namespace and
> maybe transformGraph and processState in trgm_regexp.c.  We might want
> to apply this technique to the first three, and maybe to the last two.
>
> However, I'm fine with fixing only StandbyReleaseLockList(), which
> actually suffers from list_delete_first().

I can also see a large gap between one technique and the other, so
this looks like a good catch to me coming from Nathan :)

As it could indeed badly hurt the time it takes to do a shutdown or to
end recovery, we had better back-patch that down to 13, in my opinion.

transformGraph and processState seem to be worth improving on
performance grounds as well, but they look less critical than this
one, so we could do something on HEAD only.  Skimming through the rest
of the code, we may be able to improve some areas related to namespaces,
but that does not seem worth it in terms of code complication.
--
Michael

Attachment

Re: inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
On 10/28/21, 12:58 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Thu, Oct 28, 2021 at 03:57:51PM +0900, Kyotaro Horiguchi wrote:
>> However, I'm fine with fixing only StandbyReleaseLockList(), which
>> actually suffers from list_delete_first().
>
> I can also see a large gap between one technique and the other, so
> this looks like a good catch to me coming from Nathan :)

:)

> As it could indeed badly hurt the time it takes to do a shutdown or to
> end recovery, we had better back-patch that down to 13, in my opinion.

+1

> transformGraph and processState seem to be worth improving on
> performance grounds as well, but they look less critical than this
> one, so we could do something on HEAD only.  Skimming through the rest
> of the code, we may be able to improve some areas related to namespaces,
> but that does not seem worth it in terms of code complication.

I just did my own scan through uses of list_delete_first(), and I only
found a couple that might be easily transitioned to the foreach()
approach.  I don't think I'm going to pick that up at the moment, but
I'd readily help review such patches if there is a demonstrable
improvement.

Nathan


Re: inefficient loop in StandbyReleaseLockList()

From
Andres Freund
Date:
Hi,

On 2021-10-28 15:57:51 +0900, Kyotaro Horiguchi wrote:
> I found several other instances of the pattern
> "while(list){list_delete_first(); /*no-break*/}" in
> llvm_release_context, gistProcessEmptyingQueue, AtEOXact_Namespace and
> maybe transformGraph and processState in trgm_regexp.c.  We might want
> to apply this technique to the first three, and maybe to the last two.

We should be careful with changes like this, because there's some advantages
in the while(!empty) pattern too. Iterating over the whole list doesn't work
if there's any other modifications to the list, or if there's a chance of
errors. For the latter there just needs to be a CHECK_FOR_INTERRUPTS()
somewhere...
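
To sketch what I mean (made-up names; process_item() stands in for
whatever work the loop body does, and CHECK_FOR_INTERRUPTS() comes from
miscadmin.h):

        while (pending != NIL)
        {
                void       *item = linitial(pending);

                /*
                 * Pop before processing: if process_item() throws an error
                 * or hits a CHECK_FOR_INTERRUPTS(), the list still
                 * describes exactly the remaining work, so nothing gets
                 * processed twice if we come back here later.
                 */
                pending = list_delete_first(pending);

                CHECK_FOR_INTERRUPTS();
                process_item(item);
        }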

Greetings,

Andres Freund



Re: inefficient loop in StandbyReleaseLockList()

From
Andres Freund
Date:
Hi,

On 2021-10-28 15:07:48 -0700, Andres Freund wrote:
> On 2021-10-28 15:57:51 +0900, Kyotaro Horiguchi wrote:
> > I found several other instances of the pattern
> > "while(list){list_delete_first(); /*no-break*/}" in
> > llvm_release_context, gistProcessEmptyingQueue, AtEOXact_Namespace and
> > maybe transformGraph and processState in trgm_regexp.c.  We might want
> > to apply this technique to the first three, and maybe to the last two.
> 
> We should be careful with changes like this, because there's some advantages
> in the while(!empty) pattern too. Iterating over the whole list doesn't work
> if there's any other modifications to the list, or if there's a chance of
> errors. For the latter there just needs to be a CHECK_FOR_INTERRUPTS()
> somewhere...

Which leads me to wonder whether the better fix would be to switch to deleting
the last element, but still use the while (!empty) style. That should convert
the O(n^2) due to 1cff1b9 back to O(n). It might or might not be faster/slower
than using foreach(), but it should be within the same ballpark.
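
Roughly this, i.e. treating the List as a stack with its top at the end
(just a sketch mirroring the example upthread; llast_int() and
list_delete_last() are existing primitives):

        while (mylist != NIL)
        {
                int i = llast_int(mylist);
                ...
                mylist = list_delete_last(mylist);
        }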

Greetings,

Andres Freund



Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Which leads me to wonder whether the better fix would be to switch to deleting
> the last element, but still use the while (!empty) style. That should convert
> the O(n^2) due to 1cff1b9 back to O(n). It might or might not be faster/slower
> than using foreach(), but it should be within the same ballpark.

Does it matter what order we're releasing the locks in?

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
On 10/28/21, 3:15 PM, "Andres Freund" <andres@anarazel.de> wrote:
> Which leads me to wonder whether the better fix would be to switch to deleting
> the last element, but still use the while (!empty) style. That should convert
> the O(n^2) due to 1cff1b9 back to O(n). It might or might not be faster/slower
> than using foreach(), but it should be within the same ballpark.

Yeah, deleting from the end of the list yields a similar improvement.
foreach() appears to be slightly faster, but the difference is
basically negligible.  For a list of a million integers, foreach()
consistently takes ~12ms, deleting from the end of the list takes
~15ms, and deleting from the beginning of the list takes ~4 minutes.

Nathan


Re: inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
On 10/28/21, 3:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Which leads me to wonder whether the better fix would be to switch to deleting
>> the last element, but still use the while (!empty) style. That should convert
>> the O(n^2) due to 1cff1b9 back to O(n). It might or might not be faster/slower
>> than using foreach(), but it should be within the same ballpark.
>
> Does it matter what order we're releasing the locks in?

I'm not seeing anything that indicates the ordering matters.  AFAICT
either approach would work in this case.  IMO changing the order is
scarier than switching to foreach(), though.

Nathan


Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
"Bossart, Nathan" <bossartn@amazon.com> writes:
> On 10/28/21, 3:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> Does it matter what order we're releasing the locks in?

> I'm not seeing anything that indicates the ordering matters.  AFAICT
> either approach would work in this case.  IMO changing the order is
> scarier than switching to foreach(), though.

Yeah, that was my feeling...

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
Andres Freund
Date:
Hi,

On 2021-10-28 19:24:08 -0400, Tom Lane wrote:
> "Bossart, Nathan" <bossartn@amazon.com> writes:
> > On 10/28/21, 3:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> >> Does it matter what order we're releasing the locks in?
> 
> > I'm not seeing anything that indicates the ordering matters.  AFAICT
> > either approach would work in this case.  IMO changing the order is
> > scarier than switching to foreach(), though.
> 
> Yeah, that was my feeling...

I suspect the reverse lock order release could be a tad faster.  But I probably
wouldn't change it either - I was thinking more of some of the other cases
that delete the first element, where it's a bit harder to know whether there's
a chance of a CFI() or such.

Greetings,

Andres Freund



Re: inefficient loop in StandbyReleaseLockList()

From
Michael Paquier
Date:
On Thu, Oct 28, 2021 at 04:52:48PM -0700, Andres Freund wrote:
> I suspect the reverse lock order release could be a tad faster.  But I probably
> wouldn't change it either - I was thinking more of some of the other cases
> that delete the first element, where it's a bit harder to know whether there's
> a chance of a CFI() or such.

Actually, as the list of recovery locks is saved in TopMemoryContext,
wouldn't it be better to keep the per-cell deletion of the list, which
would mean that we'd better do the operation in the reverse order to
make things faster with the new list implementation?  But perhaps that's
what Andres is pointing at with CFIs happening in the middle of
processing one list of the hash table?
--
Michael

Attachment

Re: inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
On 10/28/21, 11:53 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Thu, Oct 28, 2021 at 04:52:48PM -0700, Andres Freund wrote:
>> I suspect the reverse lock order release could be a tad faster.  But I probably
>> wouldn't change it either - I was thinking more of some of the other cases
>> that delete the first element, where it's a bit harder to know whether there's
>> a chance of a CFI() or such.
>
> Actually, as the list of recovery locks is saved in TopMemoryContext,
> wouldn't it be better to keep the per-cell deletion of the list, which
> would mean that we'd better do the operation in the reverse order to
> make things faster with the new list implementation?  But perhaps that's
> what Andres is pointing at with CFIs happening in the middle of
> processing one list of the hash table?

Hm.  IIUC anything bad enough to cause the startup process to break
out of the StandbyReleaseLockList() loop will also cause the entire
process to be restarted.  I'm not seeing any risk of reusing a half-
released lock list.  I might be misunderstanding the concern, though.

Nathan


Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
"Bossart, Nathan" <bossartn@amazon.com> writes:
> On 10/28/21, 11:53 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> Actually, as the list of recovery locks is saved in TopMemoryContext,
>> wouldn't it be better to keep the per-cell deletion of the list, which
>> would mean that we'd better do the operation in the reverse order to
>> make things faster with the new list implementation?  But perhaps that's
>> what Andres is pointing at with CFIs happening in the middle of
>> processing one list of the hash table?

> Hm.  IIUC anything bad enough to cause the startup process to break
> out of the StandbyReleaseLockList() loop will also cause the entire
> process to be restarted.  I'm not seeing any risk of reusing a half-
> released lock list.  I might be misunderstanding the concern, though.

Yeah, there's no expectation that this data structure needs to be kept
consistent after an error; and I'm not real sure that the existing
code could claim to satisfy such a requirement if we did need it.
(There's at least a short window where the caller's hash table entry
will point at an already-freed List.)

Pushed the patch as given.  I've not yet reviewed other list_delete_first
callers, but I'll take a look.  (I seem to remember that I did survey
them while writing 1cff1b95a, but I evidently missed that this code
could be dealing with a list long enough to be problematic.)

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
Andres Freund
Date:
Hi,

On 2021-10-31 15:38:15 -0400, Tom Lane wrote:
> Yeah, there's no expectation that this data structure needs to be kept
> consistent after an error; and I'm not real sure that the existing
> code could claim to satisfy such a requirement if we did need it.

To be clear, I was making that comment in response to the search for
other places doing the while(!empty) delete_first() style loops. Some of
them are reached via resowners etc, where reaching the same code
repeatedly is perhaps more realistic.

Greetings,

Andres Freund



Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
I wrote:
> Pushed the patch as given.  I've not yet reviewed other list_delete_first
> callers, but I'll take a look.  (I seem to remember that I did survey
> them while writing 1cff1b95a, but I evidently missed that this code
> could be dealing with a list long enough to be problematic.)

I looked at the remaining list_delete_first callers.

1. Attached is a proposed patch to get rid of the calls in trgm_regexp.c.
I'm not certain that those lists could get long enough to be a problem,
given the existing complexity limits in that file (MAX_EXPANDED_STATES
etc).  But I'm not certain they can't, either, and it's easy enough to
fix along the same lines as in StandbyReleaseLockList.

2. I think we almost certainly have a problem in SyncPostCheckpoint.

3. Is agg_refill_hash_table a problem?  Probably; we've seen cases
with lots of batches.

4. I'm a bit worried about the uses in access/gist/, but I don't know
that code well enough to want to mess with it.  It's possible the
list lengths are bounded by the index tree height, in which case it
likely doesn't matter.  The logic in gistFindPath looks like a mess
anyway since it's appending to both ends of the "fifo" list in different
places (is that really necessary?).

5. Not sure about CopyMultiInsertInfoFlush ... how many buffers
could we have there?

6. llvm_release_context may not have a long enough list to be a
problem, but on the other hand, it looks easy to fix.

7. The list lengths in the parser and dependency.c, ruleutils.c,
explain.c are bounded by subquery nesting depth or plan tree depth,
so I doubt it's worth worrying about.

8. The uses in namespace.c don't seem like an issue either -- for
instance, GetOverrideSearchPath can't iterate more than twice,
and the overrideStack list shouldn't get very deep.

            regards, tom lane

diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
index 64816dd370..1887a39161 100644
--- a/contrib/pg_trgm/trgm_regexp.c
+++ b/contrib/pg_trgm/trgm_regexp.c
@@ -907,6 +907,7 @@ transformGraph(TrgmNFA *trgmNFA)
     HASHCTL        hashCtl;
     TrgmStateKey initkey;
     TrgmState  *initstate;
+    ListCell   *lc;
 
     /* Initialize this stage's workspace in trgmNFA struct */
     trgmNFA->queue = NIL;
@@ -937,12 +938,13 @@ transformGraph(TrgmNFA *trgmNFA)
     /*
      * Recursively build the expanded graph by processing queue of states
      * (breadth-first search).  getState already put initstate in the queue.
+     * Note that getState will append new states to the queue within the loop,
+     * too; this works as long as we don't do repeat fetches using the "lc"
+     * pointer.
      */
-    while (trgmNFA->queue != NIL)
+    foreach(lc, trgmNFA->queue)
     {
-        TrgmState  *state = (TrgmState *) linitial(trgmNFA->queue);
-
-        trgmNFA->queue = list_delete_first(trgmNFA->queue);
+        TrgmState  *state = (TrgmState *) lfirst(lc);
 
         /*
          * If we overflowed then just mark state as final.  Otherwise do
@@ -966,22 +968,29 @@ transformGraph(TrgmNFA *trgmNFA)
 static void
 processState(TrgmNFA *trgmNFA, TrgmState *state)
 {
+    ListCell   *lc;
+
     /* keysQueue should be NIL already, but make sure */
     trgmNFA->keysQueue = NIL;
 
     /*
      * Add state's own key, and then process all keys added to keysQueue until
-     * queue is empty.  But we can quit if the state gets marked final.
+     * queue is finished.  But we can quit if the state gets marked final.
      */
     addKey(trgmNFA, state, &state->stateKey);
-    while (trgmNFA->keysQueue != NIL && !(state->flags & TSTATE_FIN))
+    foreach(lc, trgmNFA->keysQueue)
     {
-        TrgmStateKey *key = (TrgmStateKey *) linitial(trgmNFA->keysQueue);
+        TrgmStateKey *key = (TrgmStateKey *) lfirst(lc);
 
-        trgmNFA->keysQueue = list_delete_first(trgmNFA->keysQueue);
+        if (state->flags & TSTATE_FIN)
+            break;
         addKey(trgmNFA, state, key);
     }
 
+    /* Release keysQueue to clean up for next cycle */
+    list_free(trgmNFA->keysQueue);
+    trgmNFA->keysQueue = NIL;
+
     /*
      * Add outgoing arcs only if state isn't final (we have no interest in
      * outgoing arcs if we already match)

Re: inefficient loop in StandbyReleaseLockList()

From
Kyotaro Horiguchi
Date:
At Sun, 31 Oct 2021 16:55:01 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> I wrote:
> > Pushed the patch as given.  I've not yet reviewed other list_delete_first
> > callers, but I'll take a look.  (I seem to remember that I did survey
> > them while writing 1cff1b95a, but I evidently missed that this code
> > could be dealing with a list long enough to be problematic.)
> 
> I looked at the remaining list_delete_first callers.
> 
> 1. Attached is a proposed patch to get rid of the calls in trgm_regexp.c.
> I'm not certain that those lists could get long enough to be a problem,
> given the existing complexity limits in that file (MAX_EXPANDED_STATES
> etc).  But I'm not certain they can't, either, and it's easy enough to
> fix along the same lines as in StandbyReleaseLockList.

I may be missing something, but at the added list_free() there's a
case where keysQueue still has some elements.  I think the remaining
elements are useless, but AFAICS all the memory allocated there is
freed after createTrgmNFAInternal returns, before the "next cycle"
comes.  Do we need to add that list_free there?

> 2. I think we almost certainly have a problem in SyncPostCheckpoint.

Maybe we want list_delete_first_n() or such to remove the first n
elements in a list at once.
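
Something with roughly this shape, say (hypothetical signature and
"ntodelete" placeholder, just to sketch the idea):

        /* delete the first n cells of the list in one go */
        extern List *list_delete_first_n(List *list, int n);

        /*
         * e.g. in SyncPostCheckpoint, after unlinking the first
         * "ntodelete" entries:
         */
        pendingUnlinks = list_delete_first_n(pendingUnlinks, ntodelete);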

> 3. Is agg_refill_hash_table a problem?  Probably; we've seen cases
> with lots of batches.

I excluded it since, at a glance, I'm not sure it follows the pattern.  I
would want to leave it alone, since changing the logic there seems to
make things a bit complex and the gain from removing list_delete_first
doesn't look so large.

> 4. I'm a bit worried about the uses in access/gist/, but I don't know
> that code well enough to want to mess with it.  It's possible the
> list lengths are bounded by the index tree height, in which case it
> likely doesn't matter.  The logic in gistFindPath looks like a mess
> anyway since it's appending to both ends of the "fifo" list in different
> places (is that really necessary?).

From the other side, the elements are inserted by lcons, then removed
by list_delete_first.  It is the worst usage of the current list
implementation as a FIFO. Couldn't we construct and iterate over a
list in the reverse order?

> 5. Not sure about CopyMultiInsertInfoFlush ... how many buffers
> could we have there?

(I'm not sure..)

> 6. llvm_release_context may not have a long enough list to be a
> problem, but on the other hand, it looks easy to fix.

Agreed.

> 7. The list lengths in the parser and dependency.c, ruleutils.c,
> explain.c are bounded by subquery nesting depth or plan tree depth,
> so I doubt it's worth worrying about.

Agreed.

> 8. The uses in namespace.c don't seem like an issue either -- for
> instance, GetOverrideSearchPath can't iterate more than twice,
> and the overrideStack list shouldn't get very deep.

If we didn't need the resulting list I'd be for changing it, but it is
actually needed.  So I don't think we would gain much by changing the
function.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> At Sun, 31 Oct 2021 16:55:01 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
>> I looked at the remaining list_delete_first callers.
>> 
>> 1. Attached is a proposed patch to get rid of the calls in trgm_regexp.c.
>> I'm not certain that those lists could get long enough to be a problem,
>> given the existing complexity limits in that file (MAX_EXPANDED_STATES
>> etc).  But I'm not certain they can't, either, and it's easy enough to
>> fix along the same lines as in StandbyReleaseLockList.

> I may be missing something, but at the added list_free() there's a
> case where keysQueue still has some elements.  I think the remaining
> elements are useless, but AFAICS all the memory allocated there is
> freed after createTrgmNFAInternal returns, before the "next cycle"
> comes.  Do we need to add that list_free there?

I was mainly trying to preserve the memory allocation behavior of the
current code, which will free the List when its last element is removed.
I agree that it *probably* doesn't matter --- but processState is
invoked multiple times during one createTrgmNFAInternal call, so I'm
not quite sure that leaking all those lists couldn't amount to anything.
It seemed prudent to ensure that things couldn't be made worse by this
patch.

>> 3. Is agg_refill_hash_table a problem?  Probably; we've seen cases
>> with lots of batches.

> I excluded it since, at a glance, I'm not sure it follows the pattern.  I
> would want to leave it alone, since changing the logic there seems to
> make things a bit complex and the gain from removing list_delete_first
> doesn't look so large.

It looks to me like nodeAgg.c uses the hash_batches list as a stack:
it's pushing things on with lcons, and later popping them off with
list_delete_first.  This is exactly the pattern that 1cff1b95a
recommended reversing.  Now, it's possible that that stack never gets
deep enough for the O(N^2) cost to matter, but ...

>> The logic in gistFindPath looks like a mess
>> anyway since it's appending to both ends of the "fifo" list in different
>> places (is that really necessary?).

> From the other side, the elements are inserted by lcons, then removed
> by list_delete_first.  It is the worst usage of the current list
> implementation as a FIFO. Couldn't we construct and iterate over a
> list in the reverse order?

Yeah; at the very least, the use of both lcons and lappend falsifies
the variable name "fifo".  I wonder though if that was intentional
or just somebody's sloppy coding.

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
On 10/31/21, 12:39 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Yeah, there's no expectation that this data structure needs to be kept
> consistent after an error; and I'm not real sure that the existing
> code could claim to satisfy such a requirement if we did need it.
> (There's at least a short window where the caller's hash table entry
> will point at an already-freed List.)

Right.

> Pushed the patch as given.  I've not yet reviewed other list_delete_first
> callers, but I'll take a look.  (I seem to remember that I did survey
> them while writing 1cff1b95a, but I evidently missed that this code
> could be dealing with a list long enough to be problematic.)

Thanks!

Nathan


Re: inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
On 10/31/21, 1:55 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> 1. Attached is a proposed patch to get rid of the calls in trgm_regexp.c.
> I'm not certain that those lists could get long enough to be a problem,
> given the existing complexity limits in that file (MAX_EXPANDED_STATES
> etc).  But I'm not certain they can't, either, and it's easy enough to
> fix along the same lines as in StandbyReleaseLockList.

Should there be a list_free(trgmNFA->queue) at the end of
transformGraph()?

> 2. I think we almost certainly have a problem in SyncPostCheckpoint.

This one doesn't look as straightforward.  It looks like we might need
a list_delete_first_n() to delete the first N entries all at once to
improve this one.

> 3. Is agg_refill_hash_table a problem?  Probably; we've seen cases
> with lots of batches.

IIUC this one can be improved by pushing/popping from the end of the
list instead of the beginning.

> 4. I'm a bit worried about the uses in access/gist/, but I don't know
> that code well enough to want to mess with it.  It's possible the
> list lengths are bounded by the index tree height, in which case it
> likely doesn't matter.  The logic in gistFindPath looks like a mess
> anyway since it's appending to both ends of the "fifo" list in different
> places (is that really necessary?).
>
> 5. Not sure about CopyMultiInsertInfoFlush ... how many buffers
> could we have there?

I haven't looked too closely at these.

> 6. llvm_release_context may not have a long enough list to be a
> problem, but on the other hand, it looks easy to fix.

Yeah, I reviewed this one earlier.  I didn't see any reason this one
couldn't be changed to foreach().

> 7. The list lengths in the parser and dependency.c, ruleutils.c,
> explain.c are bounded by subquery nesting depth or plan tree depth,
> so I doubt it's worth worrying about.

+1

> 8. The uses in namespace.c don't seem like an issue either -- for
> instance, GetOverrideSearchPath can't iterate more than twice,
> and the overrideStack list shouldn't get very deep.

+1

Nathan


Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
"Bossart, Nathan" <bossartn@amazon.com> writes:
> On 10/31/21, 1:55 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> 1. Attached is a proposed patch to get rid of the calls in trgm_regexp.c.

> Should there be a list_free(trgmNFA->queue) at the end of
> transformGraph()?

There could be, but that's visibly invoked only once per 
createTrgmNFAInternal call, so I didn't think it was worthwhile
to do so (unlike the case for processState).  If we were concerned
about leakage in that function, the hash table would be a far
bigger issue.

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
On 11/1/21, 9:58 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> "Bossart, Nathan" <bossartn@amazon.com> writes:
>> Should there be a list_free(trgmNFA->queue) at the end of
>> transformGraph()?
>
> There could be, but that's visibly invoked only once per
> createTrgmNFAInternal call, so I didn't think it was worthwhile
> to do so (unlike the case for processState).  If we were concerned
> about leakage in that function, the hash table would be a far
> bigger issue.

Ah, I see it now.  The patch looks good to me, then.

Nathan


Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
"Bossart, Nathan" <bossartn@amazon.com> writes:
> Ah, I see it now.  The patch looks good to me, then.

Thanks for looking!  Here's an expanded patch that also takes care
of the other two easy-to-fix cases, nodeAgg.c and llvmjit.c.
AFAICS, llvm_release_context is like StandbyReleaseLockList
in that we don't need to worry about whether the data structure
is valid after an error partway through.  (Maybe we should be
worrying, but I think the callers would need work as well if
that's to be the standard.)

            regards, tom lane

diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
index 64816dd370..1887a39161 100644
--- a/contrib/pg_trgm/trgm_regexp.c
+++ b/contrib/pg_trgm/trgm_regexp.c
@@ -907,6 +907,7 @@ transformGraph(TrgmNFA *trgmNFA)
     HASHCTL        hashCtl;
     TrgmStateKey initkey;
     TrgmState  *initstate;
+    ListCell   *lc;

     /* Initialize this stage's workspace in trgmNFA struct */
     trgmNFA->queue = NIL;
@@ -937,12 +938,13 @@ transformGraph(TrgmNFA *trgmNFA)
     /*
      * Recursively build the expanded graph by processing queue of states
      * (breadth-first search).  getState already put initstate in the queue.
+     * Note that getState will append new states to the queue within the loop,
+     * too; this works as long as we don't do repeat fetches using the "lc"
+     * pointer.
      */
-    while (trgmNFA->queue != NIL)
+    foreach(lc, trgmNFA->queue)
     {
-        TrgmState  *state = (TrgmState *) linitial(trgmNFA->queue);
-
-        trgmNFA->queue = list_delete_first(trgmNFA->queue);
+        TrgmState  *state = (TrgmState *) lfirst(lc);

         /*
          * If we overflowed then just mark state as final.  Otherwise do
@@ -966,22 +968,29 @@ transformGraph(TrgmNFA *trgmNFA)
 static void
 processState(TrgmNFA *trgmNFA, TrgmState *state)
 {
+    ListCell   *lc;
+
     /* keysQueue should be NIL already, but make sure */
     trgmNFA->keysQueue = NIL;

     /*
      * Add state's own key, and then process all keys added to keysQueue until
-     * queue is empty.  But we can quit if the state gets marked final.
+     * queue is finished.  But we can quit if the state gets marked final.
      */
     addKey(trgmNFA, state, &state->stateKey);
-    while (trgmNFA->keysQueue != NIL && !(state->flags & TSTATE_FIN))
+    foreach(lc, trgmNFA->keysQueue)
     {
-        TrgmStateKey *key = (TrgmStateKey *) linitial(trgmNFA->keysQueue);
+        TrgmStateKey *key = (TrgmStateKey *) lfirst(lc);

-        trgmNFA->keysQueue = list_delete_first(trgmNFA->keysQueue);
+        if (state->flags & TSTATE_FIN)
+            break;
         addKey(trgmNFA, state, key);
     }

+    /* Release keysQueue to clean up for next cycle */
+    list_free(trgmNFA->keysQueue);
+    trgmNFA->keysQueue = NIL;
+
     /*
      * Add outgoing arcs only if state isn't final (we have no interest in
      * outgoing arcs if we already match)
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index c99a0de4dd..f5a187cae3 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2584,8 +2584,9 @@ agg_refill_hash_table(AggState *aggstate)
     if (aggstate->hash_batches == NIL)
         return false;

-    batch = linitial(aggstate->hash_batches);
-    aggstate->hash_batches = list_delete_first(aggstate->hash_batches);
+    /* hash_batches is a stack, with the top item at the end of the list */
+    batch = llast(aggstate->hash_batches);
+    aggstate->hash_batches = list_delete_last(aggstate->hash_batches);

     hash_agg_set_limits(aggstate->hashentrysize, batch->input_card,
                         batch->used_bits, &aggstate->hash_mem_limit,
@@ -3098,7 +3099,7 @@ hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
         new_batch = hashagg_batch_new(tape, setno,
                                       spill->ntuples[i], cardinality,
                                       used_bits);
-        aggstate->hash_batches = lcons(new_batch, aggstate->hash_batches);
+        aggstate->hash_batches = lappend(aggstate->hash_batches, new_batch);
         aggstate->hash_batches_used++;
     }

@@ -3113,8 +3114,6 @@ hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 static void
 hashagg_reset_spill_state(AggState *aggstate)
 {
-    ListCell   *lc;
-
     /* free spills from initial pass */
     if (aggstate->hash_spills != NULL)
     {
@@ -3132,13 +3131,7 @@ hashagg_reset_spill_state(AggState *aggstate)
     }

     /* free batches */
-    foreach(lc, aggstate->hash_batches)
-    {
-        HashAggBatch *batch = (HashAggBatch *) lfirst(lc);
-
-        pfree(batch);
-    }
-    list_free(aggstate->hash_batches);
+    list_free_deep(aggstate->hash_batches);
     aggstate->hash_batches = NIL;

     /* close tape set */
diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 169dad96d7..fb29449573 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -171,6 +171,7 @@ static void
 llvm_release_context(JitContext *context)
 {
     LLVMJitContext *llvm_context = (LLVMJitContext *) context;
+    ListCell   *lc;

     /*
      * When this backend is exiting, don't clean up LLVM. As an error might
@@ -188,12 +189,9 @@ llvm_release_context(JitContext *context)
         llvm_context->module = NULL;
     }

-    while (llvm_context->handles != NIL)
+    foreach(lc, llvm_context->handles)
     {
-        LLVMJitHandle *jit_handle;
-
-        jit_handle = (LLVMJitHandle *) linitial(llvm_context->handles);
-        llvm_context->handles = list_delete_first(llvm_context->handles);
+        LLVMJitHandle *jit_handle = (LLVMJitHandle *) lfirst(lc);

 #if LLVM_VERSION_MAJOR > 11
         {
@@ -221,6 +219,8 @@ llvm_release_context(JitContext *context)

         pfree(jit_handle);
     }
+    list_free(llvm_context->handles);
+    llvm_context->handles = NIL;
 }

 /*

Re: inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
On 11/1/21, 10:34 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Thanks for looking!  Here's an expanded patch that also takes care
> of the other two easy-to-fix cases, nodeAgg.c and llvmjit.c.
> AFAICS, llvm_release_context is like StandbyReleaseLockList
> in that we don't need to worry about whether the data structure
> is valid after an error partway through.  (Maybe we should be
> worrying, but I think the callers would need work as well if
> that's to be the standard.)

LGTM.

Nathan


Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
"Bossart, Nathan" <bossartn@amazon.com> writes:
> On 10/31/21, 1:55 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> 2. I think we almost certainly have a problem in SyncPostCheckpoint.

> This one doesn't look as straightforward.  It looks like we might need
> a list_delete_first_n() to delete the first N entries all at once to
> improve this one.

Yeah.  We don't absolutely need a new list primitive: we could use
list_copy_tail() and then free the old list.  But the extra palloc
traffic involved makes this sound like a bad idea.  It does have
the advantage that we could shorten the List's storage even when
it doesn't go to empty, but I'm not sure that's worth anything.
If the List isn't going to empty, that implies that we're getting
a steady stream of unlink requests, meaning we'd probably just
fill it up again.
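
For illustration, the list_copy_tail() route would look roughly like
this (with "ntodelete" being however many leading entries we just
processed), and it pays for a fresh cell array on every call:

        List       *remaining = list_copy_tail(pendingUnlinks, ntodelete);

        /* free the processed entries, then the old cell array */
        for (int i = 0; i < ntodelete; i++)
                pfree(list_nth(pendingUnlinks, i));
        list_free(pendingUnlinks);
        pendingUnlinks = remaining;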

The minimum-change patch would have us truncating the list before
each AbsorbSyncRequests call, so that the list state meets that
function's expectations.  However, as long as UNLINKS_PER_ABSORB
is only 10, I don't think that gets us out of the O(N^2) woods.
So what I did in the attached is add a "canceled" flag to
PendingUnlinkEntry, which lets us deal with canceled or finished
entries without having to delete them from the list right away.
Then we only need to physically clean up the list once per
SyncPostCheckpoint call.

            regards, tom lane

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 94fb236daf..12847e35ea 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -906,6 +906,72 @@ list_delete_last(List *list)
     return list_truncate(list, list_length(list) - 1);
 }

+/*
+ * Delete the first N cells of the list.
+ *
+ * The List is pfree'd if the request causes all cells to be deleted.
+ */
+List *
+list_delete_first_n(List *list, int n)
+{
+    check_list_invariants(list);
+
+    /* No-op request? */
+    if (n <= 0)
+        return list;
+
+    /* Delete whole list? */
+    if (n >= list_length(list))
+    {
+        list_free(list);
+        return NIL;
+    }
+
+    /*
+     * Otherwise, we normally just collapse out the removed elements.  But for
+     * debugging purposes, move the whole list contents someplace else.
+     *
+     * (Note that we *must* keep the contents in the same memory context.)
+     */
+#ifndef DEBUG_LIST_MEMORY_USAGE
+    memmove(&list->elements[0], &list->elements[n],
+            (list->length - n) * sizeof(ListCell));
+    list->length -= n;
+#else
+    {
+        ListCell   *newelems;
+        int            newmaxlen = list->length - n;
+
+        newelems = (ListCell *)
+            MemoryContextAlloc(GetMemoryChunkContext(list),
+                               newmaxlen * sizeof(ListCell));
+        memcpy(newelems, &list->elements[n], newmaxlen * sizeof(ListCell));
+        if (list->elements != list->initial_elements)
+            pfree(list->elements);
+        else
+        {
+            /*
+             * As in enlarge_list(), clear the initial_elements[] space and/or
+             * mark it inaccessible.
+             */
+#ifdef CLOBBER_FREED_MEMORY
+            wipe_mem(list->initial_elements,
+                     list->max_length * sizeof(ListCell));
+#else
+            VALGRIND_MAKE_MEM_NOACCESS(list->initial_elements,
+                                       list->max_length * sizeof(ListCell));
+#endif
+        }
+        list->elements = newelems;
+        list->max_length = newmaxlen;
+        list->length = newmaxlen;
+        check_list_invariants(list);
+    }
+#endif
+
+    return list;
+}
+
 /*
  * Generate the union of two lists. This is calculated by copying
  * list1 via list_copy(), then adding to it all the members of list2
diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c
index 4a2ed414b0..023d7413e2 100644
--- a/src/backend/storage/sync/sync.c
+++ b/src/backend/storage/sync/sync.c
@@ -69,6 +69,7 @@ typedef struct
 {
     FileTag        tag;            /* identifies handler and file */
     CycleCtr    cycle_ctr;        /* checkpoint_cycle_ctr when request was made */
+    bool        canceled;        /* true if request has been canceled */
 } PendingUnlinkEntry;

 static HTAB *pendingOps = NULL;
@@ -195,11 +196,12 @@ void
 SyncPostCheckpoint(void)
 {
     int            absorb_counter;
+    ListCell   *lc;

     absorb_counter = UNLINKS_PER_ABSORB;
-    while (pendingUnlinks != NIL)
+    foreach(lc, pendingUnlinks)
     {
-        PendingUnlinkEntry *entry = (PendingUnlinkEntry *) linitial(pendingUnlinks);
+        PendingUnlinkEntry *entry = (PendingUnlinkEntry *) lfirst(lc);
         char        path[MAXPGPATH];

         /*
@@ -214,6 +216,10 @@ SyncPostCheckpoint(void)
         if (entry->cycle_ctr == checkpoint_cycle_ctr)
             break;

+        /* Skip over any canceled entries */
+        if (entry->canceled)
+            continue;
+
         /* Unlink the file */
         if (syncsw[entry->tag.handler].sync_unlinkfiletag(&entry->tag,
                                                           path) < 0)
@@ -231,15 +237,13 @@ SyncPostCheckpoint(void)
                          errmsg("could not remove file \"%s\": %m", path)));
         }

-        /* And remove the list entry */
-        pendingUnlinks = list_delete_first(pendingUnlinks);
-        pfree(entry);
+        /* Mark the list entry as canceled, just in case */
+        entry->canceled = true;

         /*
          * As in ProcessSyncRequests, we don't want to stop absorbing fsync
          * requests for a long time when there are many deletions to be done.
-         * We can safely call AbsorbSyncRequests() at this point in the loop
-         * (note it might try to delete list entries).
+         * We can safely call AbsorbSyncRequests() at this point in the loop.
          */
         if (--absorb_counter <= 0)
         {
@@ -247,6 +251,26 @@ SyncPostCheckpoint(void)
             absorb_counter = UNLINKS_PER_ABSORB;
         }
     }
+
+    /*
+     * If we reached the end of the list, we can just remove the whole list
+     * (remembering to pfree all the PendingUnlinkEntry objects).  Otherwise,
+     * we must keep the entries at or after "lc".
+     */
+    if (lc == NULL)
+    {
+        list_free_deep(pendingUnlinks);
+        pendingUnlinks = NIL;
+    }
+    else
+    {
+        int            ntodelete = list_cell_number(pendingUnlinks, lc);
+
+        for (int i = 0; i < ntodelete; i++)
+            pfree(list_nth(pendingUnlinks, i));
+
+        pendingUnlinks = list_delete_first_n(pendingUnlinks, ntodelete);
+    }
 }

 /*
@@ -486,17 +510,14 @@ RememberSyncRequest(const FileTag *ftag, SyncRequestType type)
                 entry->canceled = true;
         }

-        /* Remove matching unlink requests */
+        /* Cancel matching unlink requests */
         foreach(cell, pendingUnlinks)
         {
             PendingUnlinkEntry *entry = (PendingUnlinkEntry *) lfirst(cell);

             if (entry->tag.handler == ftag->handler &&
                 syncsw[ftag->handler].sync_filetagmatches(ftag, &entry->tag))
-            {
-                pendingUnlinks = foreach_delete_current(pendingUnlinks, cell);
-                pfree(entry);
-            }
+                entry->canceled = true;
         }
     }
     else if (type == SYNC_UNLINK_REQUEST)
@@ -508,6 +529,7 @@ RememberSyncRequest(const FileTag *ftag, SyncRequestType type)
         entry = palloc(sizeof(PendingUnlinkEntry));
         entry->tag = *ftag;
         entry->cycle_ctr = checkpoint_cycle_ctr;
+        entry->canceled = false;

         pendingUnlinks = lappend(pendingUnlinks, entry);

diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 30f98c4595..c3f47db888 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -564,6 +564,7 @@ extern pg_nodiscard List *list_delete_int(List *list, int datum);
 extern pg_nodiscard List *list_delete_oid(List *list, Oid datum);
 extern pg_nodiscard List *list_delete_first(List *list);
 extern pg_nodiscard List *list_delete_last(List *list);
+extern pg_nodiscard List *list_delete_first_n(List *list, int n);
 extern pg_nodiscard List *list_delete_nth_cell(List *list, int n);
 extern pg_nodiscard List *list_delete_cell(List *list, ListCell *cell);


Re: inefficient loop in StandbyReleaseLockList()

From
Kyotaro Horiguchi
Date:
At Mon, 01 Nov 2021 11:58:35 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > At Sun, 31 Oct 2021 16:55:01 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> >> I looked at the remaining list_delete_first callers.
> >> 
> >> 1. Attached is a proposed patch to get rid of the calls in trgm_regexp.c.
> >> I'm not certain that those lists could get long enough to be a problem,
> >> given the existing complexity limits in that file (MAX_EXPANDED_STATES
> >> etc).  But I'm not certain they can't, either, and it's easy enough to
> >> fix along the same lines as in StandbyReleaseLockList.
> 
> > I may be missing something, but at the added list_free() there's a
> > case where keysQueue still has some elements.  I think the remaining
> > elements are useless, but AFAICS all the memory allocated there is
> > freed after createTrgmNFAInternal returns, before the "next cycle"
> > comes.  Do we need to add that list_free there?
> 
> I was mainly trying to preserve the memory allocation behavior of the
> current code, which will free the List when its last element is removed.
> I agree that it *probably* doesn't matter --- but processState is
> invoked multiple times during one createTrgmNFAInternal call, so I'm
> not quite sure that leaking all those lists couldn't amount to anything.
> It seemed prudent to ensure that things couldn't be made worse by this
> patch.

Hmm.  Actually, that's kind of convincing.  Taking that together with the
reason for not releasing the remaining elements, it's fine with me.

> >> 3. Is agg_refill_hash_table a problem?  Probably; we've seen cases
> >> with lots of batches.
> 
> > I excluded it since, at a glance, I'm not sure it follows the pattern.  I
> > would want to leave it alone, since changing the logic there seems to
> > make things a bit complex and the gain from removing list_delete_first
> > doesn't look so large.
> 
> It looks to me like nodeAgg.c uses the hash_batches list as a stack:
> it's pushing things on with lcons, and later popping them off with
> list_delete_first.  This is exactly the pattern that 1cff1b95a
> recommended reversing.  Now, it's possible that that stack never gets
> deep enough for the O(N^2) cost to matter, but ...

Right. And the patch in another branch looks good to me.

> >> The logic in gistFindPath looks like a mess
> >> anyway since it's appending to both ends of the "fifo" list in different
> >> places (is that really necessary?).
> 
> > From the other side, the elements are inserted by lcons, then removed
> > by list_delete_first.  It is the worst usage of the current list
> > implementation as a FIFO. Couldn't we construct and iterate over a
> > list in the reverse order?
> 
> Yeah; at the very least, the use of both lcons and lappend falsifies
> the variable name "fifo".  I wonder though if that was intentional
> or just somebody's sloppy coding.

It looks like it was intentional.

> /* Append this child to the list of pages to visit later */

So we would replace the lappend with lcons to get the same effect with
the reversed list.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: inefficient loop in StandbyReleaseLockList()

From
Kyotaro Horiguchi
Date:
At Mon, 1 Nov 2021 17:02:49 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 11/1/21, 9:58 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> > "Bossart, Nathan" <bossartn@amazon.com> writes:
> >> Should there be a list_free(trgmNFA->queue) at the end of
> >> transformGraph()?
> >
> > There could be, but that's visibly invoked only once per
> > createTrgmNFAInternal call, so I didn't think it was worthwhile
> > to do so (unlike the case for processState).  If we were concerned
> > about leakage in that function, the hash table would be a far
> > bigger issue.
> 
> Ah, I see it now.  The patch looks good to me, then.

+1

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: inefficient loop in StandbyReleaseLockList()

From
Kyotaro Horiguchi
Date:
At Mon, 01 Nov 2021 18:01:18 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> "Bossart, Nathan" <bossartn@amazon.com> writes:
> > On 10/31/21, 1:55 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> >> 2. I think we almost certainly have a problem in SyncPostCheckpoint.
> 
> > This one doesn't look as straightforward.  It looks like we might need
> > a list_delete_first_n() to delete the first N entries all at once to
> > improve this one.
> 
> Yeah.  We don't absolutely need a new list primitive: we could use
> list_copy_tail() and then free the old list.  But the extra palloc
> traffic involved makes this sound like a bad idea.  It does have
> the advantage that we could shorten the List's storage even when
> it doesn't go to empty, but I'm not sure that's worth anything.
> If the List isn't going to empty, that implies that we're getting
> a steady stream of unlink requests, meaning we'd probably just
> fill it up again.

I agree with that.  In general I think it's better not to resize storage
on truncation (or shortening) of a list, except for a few special cases
(for example, a list that temporarily grows very large but won't go
empty).

> The minimum-change patch would have us truncating the list before
> each AbsorbSyncRequests call, so that the list state meets that
> function's expectations.  However, as long as UNLINKS_PER_ABSORB
> is only 10, I don't think that gets us out of the O(N^2) woods.

Agreed.

> So what I did in the attached is add a "canceled" flag to
> PendingUnlinkEntry, which lets us deal with canceled or finished
> entries without having to delete them from the list right away.
> Then we only need to physically clean up the list once per
> SyncPostCheckpoint call.

We don't loop over so many canceled elements usually so I think it
works well. However, shouldn't we consider canceled before checking
cycle_ctr?  A canceled element should be invisible to later accesses
altogether.  I vaguely feel the name "cancel" might better be
"consumed" or such, but I don't object to "cancel".

I feel that we might need to wipe_mem for the memmove case as well
(together with list_delete_nth_cell) but that is another thing even if
that's correct.

Otherwise it looks good to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> At Mon, 01 Nov 2021 18:01:18 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
>> So what I did in the attached is add a "canceled" flag to
>> PendingUnlinkEntry, which lets us deal with canceled or finished
>> entries without having to delete them from the list right away.
>> Then we only need to physically clean up the list once per
>> SyncPostCheckpoint call.

> We don't loop over so many canceled elements usually so I think it
> works well. However, shouldn't we consider canceled before checking
> cycle_ctr?

Good point.  I was thinking that it's best to break out of the loop
at the first opportunity.  But if the first few entries with the next
cycle_ctr value are canceled, it's best to advance over them in the
current SyncPostCheckpoint call.  It saves nothing to postpone that
work to later, and indeed adds a few cycles by leaving more data to
be copied by list_delete_first_n.  Will change it.
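
That is, roughly swapping the two tests at the top of the loop:

        /* Skip over any canceled entries first */
        if (entry->canceled)
                continue;

        /* Stop when we reach entries queued during the current cycle */
        if (entry->cycle_ctr == checkpoint_cycle_ctr)
                break;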

> I feel that we might need to wipe_mem for the memmove case as well
> (together with list_delete_nth_cell) but that is another thing even if
> that's correct.

Hm.  I made this function by copying-and-modifying list_delete_nth_cell,
so if there's something wrong there then this code inherited it.  But
I don't think it's wrong.  The wipe_mem business is only intended to
be used when enabling expensive debug options.

> Otherwise it looks good to me.

Thanks for looking!

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
I've pushed the SyncPostCheckpoint change, and I think I'm going
to stop here.  It's not clear that the remaining list_delete_first
callers have any real problem; and changing them would be complex.
We can revisit the question if we find out there is an issue.
Or, if somebody else wants to pursue the issue, feel free.

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
"Bossart, Nathan"
Date:
On 11/2/21, 8:36 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> I've pushed the SyncPostCheckpoint change, and I think I'm going
> to stop here.  It's not clear that the remaining list_delete_first
> callers have any real problem; and changing them would be complex.
> We can revisit the question if we find out there is an issue.
> Or, if somebody else wants to pursue the issue, feel free.

Thanks!

Nathan


Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I wonder if it's worth adding a note to list_delete_first() mentioning its
> O(N) behaviour. It's not immediately visible from the code, and from the list
> name one could very well be excused to not be worried about O(N) costs.

Hm.  I think it's not the only list function with O(N) behavior;
in fact there used to be more such functions than there are now.
But I could get behind a patch that annotates all of them.

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Nov 04, 2021 at 08:21:56PM -0400, Tom Lane wrote:
>> Hm.  I think it's not the only list function with O(N) behavior;
>> in fact there used to be more such functions than there are now.
>> But I could get behind a patch that annotates all of them.

> Documenting that makes sense.  Shouldn't we be careful to do that in
> both pg_list.h and list.c, then?

We have seldom, if ever, put function API-definition comments into .h files.
I do not see a reason why this case deserves an exception.  (It's tough
enough to get people to maintain definition comments that are right beside
the code they describe --- I think putting them in .h files would be a
disaster.)

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
I wrote:
> Hm.  I think it's not the only list function with O(N) behavior;
> in fact there used to be more such functions than there are now.
> But I could get behind a patch that annotates all of them.

Here's a quick hack at that.  Having done it, I'm not sure if it's
really worth the trouble or not ... thoughts?

            regards, tom lane

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 12847e35ea..247032124d 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -410,6 +410,9 @@ insert_new_cell(List *list, int pos)
 /*
  * Insert the given datum at position 'pos' (measured from 0) in the list.
  * 'pos' must be valid, ie, 0 <= pos <= list's length.
+ *
+ * Note that this takes time proportional to the distance to the end of the
+ * list, since the following entries must be moved.
  */
 List *
 list_insert_nth(List *list, int pos, void *datum)
@@ -460,6 +463,9 @@ list_insert_nth_oid(List *list, int pos, Oid datum)
  * value, rather than continuing to use the pointer passed as the
  * second argument.
  *
+ * Note that this takes time proportional to the length of the list,
+ * since the existing entries must be moved.
+ *
  * Caution: before Postgres 8.0, the original List was unmodified and
  * could be considered to retain its separate identity.  This is no longer
  * the case.
@@ -525,6 +531,10 @@ lcons_oid(Oid datum, List *list)
  * Callers should be sure to use the return value as the new pointer to the
  * concatenated list: the 'list1' input pointer may or may not be the same
  * as the returned pointer.
+ *
+ * Note that this takes at least time proportional to the length of list2.
+ * It'd typically be the case that we have to enlarge list1's storage,
+ * probably adding time proportional to the length of list1.
  */
 List *
 list_concat(List *list1, const List *list2)
@@ -623,6 +633,8 @@ list_truncate(List *list, int new_size)
  * Return true iff 'datum' is a member of the list. Equality is
  * determined via equal(), so callers should ensure that they pass a
  * Node as 'datum'.
+ *
+ * This does a simple linear search --- avoid using it on long lists.
  */
 bool
 list_member(const List *list, const void *datum)
@@ -706,6 +718,9 @@ list_member_oid(const List *list, Oid datum)
  * Delete the n'th cell (counting from 0) in list.
  *
  * The List is pfree'd if this was the last member.
+ *
+ * Note that this takes time proportional to the distance to the end of the
+ * list, since the following entries must be moved.
  */
 List *
 list_delete_nth_cell(List *list, int n)
@@ -777,6 +792,9 @@ list_delete_nth_cell(List *list, int n)
  *
  * The List is pfree'd if this was the last member.  However, we do not
  * touch any data the cell might've been pointing to.
+ *
+ * Note that this takes time proportional to the distance to the end of the
+ * list, since the following entries must be moved.
  */
 List *
 list_delete_cell(List *list, ListCell *cell)
@@ -787,6 +805,8 @@ list_delete_cell(List *list, ListCell *cell)
 /*
  * Delete the first cell in list that matches datum, if any.
  * Equality is determined via equal().
+ *
+ * This does a simple linear search --- avoid using it on long lists.
  */
 List *
 list_delete(List *list, void *datum)
@@ -870,6 +890,9 @@ list_delete_oid(List *list, Oid datum)
  * where the intent is to alter the list rather than just traverse it.
  * Beware that the list is modified, whereas the Lisp-y coding leaves
  * the original list head intact in case there's another pointer to it.
+ *
+ * Note that this takes time proportional to the length of the list,
+ * since the remaining entries must be moved.
  */
 List *
 list_delete_first(List *list)
@@ -885,8 +908,8 @@ list_delete_first(List *list)
 /*
  * Delete the last element of the list.
  *
- * This is the opposite of list_delete_first(), but is noticeably cheaper
- * with a long list, since no data need be moved.
+ * This is the opposite of list_delete_first(), but is O(1),
+ * since no data need be moved.
  */
 List *
 list_delete_last(List *list)
@@ -910,6 +933,9 @@ list_delete_last(List *list)
  * Delete the first N cells of the list.
  *
  * The List is pfree'd if the request causes all cells to be deleted.
+ *
+ * Note that this takes time proportional to the distance to the end of the
+ * list, since the following entries must be moved.
  */
 List *
 list_delete_first_n(List *list, int n)
@@ -989,8 +1015,10 @@ list_delete_first_n(List *list, int n)
  * you probably want to use list_concat_unique() instead to avoid wasting
  * the storage of the old x list.
  *
- * This function could probably be implemented a lot faster if it is a
- * performance bottleneck.
+ * Note that this takes time proportional to the product of the list
+ * lengths, so beware of using it on long lists.  (We could probably
+ * improve that, but really you should be using some other data structure
+ * if this'd be a performance bottleneck.)
  */
 List *
 list_union(const List *list1, const List *list2)
@@ -1094,6 +1122,11 @@ list_union_oid(const List *list1, const List *list2)
  * This variant works on lists of pointers, and determines list
  * membership via equal().  Note that the list1 member will be pointed
  * to in the result.
+ *
+ * Note that this takes time proportional to the product of the list
+ * lengths, so beware of using it on long lists.  (We could probably
+ * improve that, but really you should be using some other data structure
+ * if this'd be a performance bottleneck.)
  */
 List *
 list_intersection(const List *list1, const List *list2)
@@ -1152,6 +1185,11 @@ list_intersection_int(const List *list1, const List *list2)
  *
  * This variant works on lists of pointers, and determines list
  * membership via equal()
+ *
+ * Note that this takes time proportional to the product of the list
+ * lengths, so beware of using it on long lists.  (We could probably
+ * improve that, but really you should be using some other data structure
+ * if this'd be a performance bottleneck.)
  */
 List *
 list_difference(const List *list1, const List *list2)
@@ -1256,6 +1294,8 @@ list_difference_oid(const List *list1, const List *list2)
  *
  * Whether an element is already a member of the list is determined
  * via equal().
+ *
+ * This does a simple linear search --- avoid using it on long lists.
  */
 List *
 list_append_unique(List *list, void *datum)
@@ -1313,6 +1353,11 @@ list_append_unique_oid(List *list, Oid datum)
  * modified in-place rather than being copied. However, callers of this
  * function may have strict ordering expectations -- i.e. that the relative
  * order of those list2 elements that are not duplicates is preserved.
+ *
+ * Note that this takes time proportional to the product of the list
+ * lengths, so beware of using it on long lists.  (We could probably
+ * improve that, but really you should be using some other data structure
+ * if this'd be a performance bottleneck.)
  */
 List *
 list_concat_unique(List *list1, const List *list2)
@@ -1401,6 +1446,8 @@ list_concat_unique_oid(List *list1, const List *list2)
  *
  * It is caller's responsibility to have sorted the list to bring duplicates
  * together, perhaps via list_sort(list, list_oid_cmp).
+ *
+ * Note that this takes time proportional to the length of the list.
  */
 void
 list_deduplicate_oid(List *list)
@@ -1557,6 +1604,8 @@ list_copy_deep(const List *oldlist)
  *
  * Like qsort(), this provides no guarantees about sort stability
  * for equal keys.
+ *
+ * This is based on qsort(), so it likewise has O(N log N) runtime.
  */
 void
 list_sort(List *list, list_sort_comparator cmp)
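
As a concrete illustration of the caveats annotated above, here is a
hypothetical example (assuming the pg_list.h API; it is not part of the
patch) contrasting repeated list_append_unique_oid(), which rescans the
list on every call, with appending freely and then taking the
list_sort()/list_deduplicate_oid() route mentioned in the last hunk:

#include "postgres.h"
#include "nodes/pg_list.h"

/* O(N^2): every list_append_unique_oid() call rescans the whole list. */
static List *
collect_unique_slow(const Oid *oids, int n)
{
    List       *result = NIL;

    for (int i = 0; i < n; i++)
        result = list_append_unique_oid(result, oids[i]);
    return result;
}

/* O(N log N): append freely, then sort and deduplicate once at the end. */
static List *
collect_unique_fast(const Oid *oids, int n)
{
    List       *result = NIL;

    for (int i = 0; i < n; i++)
        result = lappend_oid(result, oids[i]);
    list_sort(result, list_oid_cmp);
    list_deduplicate_oid(result);
    return result;
}

Note that the two versions differ in output ordering: the second ends up
sorted, so it is only a drop-in replacement where the caller doesn't care
about preserving input order.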

Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-11-06 14:06:12 -0400, Tom Lane wrote:
>> + * Note that this takes time proportional to the length of the list,
>> + * since the remaining entries must be moved.
>> */
>> List *
>> list_delete_first(List *list)

> Perhaps we could point to list_delete_last()? But it's an improvement without
> that too.

Good point.  The note at list_delete_last that it's O(1) isn't really
on point --- instead, the text for list_delete_first should be like

+ * Note that this takes time proportional to the length of the list,
+ * since the remaining entries must be moved.  Consider reversing the
+ * list order so that you can use list_delete_last() instead.  However,
+ * if that causes you to replace lappend() with lcons(), you haven't
+ * improved matters.
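
To make that concrete, here is a hypothetical sketch (not taken from any
PostgreSQL source file) of the two ways of maintaining a stack with the
pg_list.h API:

#include "postgres.h"
#include "nodes/pg_list.h"

/*
 * Head-based stack: lcons_int() and list_delete_first() both move every
 * remaining element, so N pushes plus N pops cost O(N^2) overall.
 */
static List *
push_head(List *stack, int item)
{
    return lcons_int(item, stack);
}

static List *
pop_head(List *stack, int *item)
{
    *item = linitial_int(stack);
    return list_delete_first(stack);
}

/*
 * Tail-based stack: lappend_int() and list_delete_last() touch only the
 * final slot, so the same workload is O(N) amortized.
 */
static List *
push_tail(List *stack, int item)
{
    return lappend_int(stack, item);
}

static List *
pop_tail(List *stack, int *item)
{
    *item = llast_int(stack);
    return list_delete_last(stack);
}

The caution in the last sentence is exactly this trap: switching the pop
side to list_delete_last() only helps if pushes can stay lappend_int();
if reversing the list order forces lcons_int() instead, the memmove cost
simply migrates from the pops to the pushes.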

            regards, tom lane



Re: inefficient loop in StandbyReleaseLockList()

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-11-06 18:32:54 -0400, Tom Lane wrote:
>> Good point.  The note at list_delete_last that it's O(1) isn't really
>> on point --- instead, the text for list_delete_first should be like
>> 
>> + * Note that this takes time proportional to the length of the list,
>> + * since the remaining entries must be moved.  Consider reversing the
>> + * list order so that you can use list_delete_last() instead.  However,
>> + * if that causes you to replace lappend() with lcons(), you haven't
>> + * improved matters.

> LGTM

Done that way, then.

            regards, tom lane