Thread: performance issue in remove_from_unowned_list()

performance issue in remove_from_unowned_list()

From
Tomas Vondra
Date:
Hi,

While working on benchmarks for the syscache patch (negative entries and
all of that), I've ran into a an issue in remove_from_unowned_list.

If you create a lot of relations in a single transaction (say, 100k) and
then abort the transaction (or if it fails for some reason, e.g. because
of hitting max_locks_per_transaction), smgrDoPendingDeletes ought to
clean relation files.

Unfortunately, the constructed srels array is ordered in exactly the
opposite way compared to "unowned list" (used by smgrclose). So what
happens is that we start with the first relation, walk the whole list
unowned list and remove the last item. Then we get the second rel and
again walk the whole unowned list until the last item. And so on.

For large number of objects that's pretty significant. With 100k rels
I've lost the patience after a couple of minutes, and just nuked the
whole cluster (interrupting ROLLBACK is a no go, and after a kill the
recovery just starts chewing on it again).

Of course, transactions creating 100k tables in a single transaction are
not that common, but it's not unheard of either (such applications
exist, are discussed in the syscache thread, and restoring them from a
pg_dump is likely to hit this issue). And the same issue applies to
cases that drop a bunch of tables in a single transaction (and I've seen
plenty of scripts doing that, although in smaller batches).

But it's a bit funnier, because there's also DropRelationFiles() which
does smgrclose on a batch of relations too, and it says this

    /*
     * Call smgrclose() in reverse order as when smgropen() is called.
     * This trick enables remove_from_unowned_list() in smgrclose()
     * to search the SMgrRelation from the unowned list,
     * with O(1) performance.
     */
    for (i = ndelrels - 1; i >= 0; i--)
    ...

but it's called from two places in xact.c only. And if you trigger the
issue with 100k x CREATE TABLE + ROLLBACK, and then force a recovery by
killing postmaster, you actually get the very same behavior with always
traversing the unowned list for some reason. I'm not quite sure why, but
it seems the optimization is exactly the wrong thing to do here.


Attached is a WIP patch removing the optimization from DropRelationFiles
and adding it to smgrDoPendingDeletes. This resolves the issue, at least
in the cases I've been able to reproduce. But maybe we should deal with
this issue earlier by ensuring the two lists are ordered the same way
from the very beginning, somehow.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

RE: performance issue in remove_from_unowned_list()

From
"Ideriha, Takeshi"
Date:
>From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
>But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on
>a batch of relations too, and it says this
>
>    /*
>     * Call smgrclose() in reverse order as when smgropen() is called.
>     * This trick enables remove_from_unowned_list() in smgrclose()
>     * to search the SMgrRelation from the unowned list,
>     * with O(1) performance.
>     */
>    for (i = ndelrels - 1; i >= 0; i--)
>    ...
>
>but it's called from two places in xact.c only. And if you trigger the issue with 100k x
>CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you
>actually get the very same behavior with always traversing the unowned list for some
>reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing
>to do here.

So when DropRelationFiles() is called, order of calling smgr_close() and order of unowed list is always same?

This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion. 
https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com

Regards,
Takeshi Ideriha

Re: performance issue in remove_from_unowned_list()

From
Tomas Vondra
Date:


On 2/8/19 12:32 PM, Ideriha, Takeshi wrote:
>> From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
>> But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on
>> a batch of relations too, and it says this
>>
>>    /*
>>     * Call smgrclose() in reverse order as when smgropen() is called.
>>     * This trick enables remove_from_unowned_list() in smgrclose()
>>     * to search the SMgrRelation from the unowned list,
>>     * with O(1) performance.
>>     */
>>    for (i = ndelrels - 1; i >= 0; i--)
>>    ...
>>
>> but it's called from two places in xact.c only. And if you trigger the issue with 100k x
>> CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you
>> actually get the very same behavior with always traversing the unowned list for some
>> reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing
>> to do here.
> 
> So when DropRelationFiles() is called, order of calling smgr_close()
and order of unowed list is always same?
> 

Well, let's say you create 10k tables in a single transaction, and then
kill the server with "kill -9" right after the commit. Then on restart,
during recovery this happens

2019-02-08 14:16:21.781 CET [12817] LOG:  remove_from_unowned_list 10015
2019-02-08 14:16:21.871 CET [12817] LOG:  remove_from_unowned_list 10005
2019-02-08 14:16:21.967 CET [12817] LOG:  remove_from_unowned_list 10004
2019-02-08 14:16:22.057 CET [12817] LOG:  remove_from_unowned_list 10001
2019-02-08 14:16:22.147 CET [12817] LOG:  remove_from_unowned_list 10000
2019-02-08 14:16:22.238 CET [12817] LOG:  remove_from_unowned_list 9999
2019-02-08 14:16:22.327 CET [12817] LOG:  remove_from_unowned_list 9998
2019-02-08 14:16:22.421 CET [12817] LOG:  remove_from_unowned_list 9996
2019-02-08 14:16:22.513 CET [12817] LOG:  remove_from_unowned_list 9995
2019-02-08 14:16:22.605 CET [12817] LOG:  remove_from_unowned_list 9994
2019-02-08 14:16:22.696 CET [12817] LOG:  remove_from_unowned_list 9993
...
2019-02-08 14:19:13.921 CET [12817] LOG:  remove_from_unowned_list 8396
2019-02-08 14:19:14.025 CET [12817] LOG:  remove_from_unowned_list 8395
2019-02-08 14:19:14.174 CET [12817] LOG:  remove_from_unowned_list 8394
2019-02-08 14:19:14.277 CET [12817] LOG:  remove_from_unowned_list 8393
2019-02-08 14:19:14.387 CET [12817] LOG:  remove_from_unowned_list 8392
2019-02-08 14:19:14.508 CET [12817] LOG:  remove_from_unowned_list 8391
2019-02-08 14:19:14.631 CET [12817] LOG:  remove_from_unowned_list 8390
2019-02-08 14:19:14.770 CET [12817] LOG:  remove_from_unowned_list 8389
...

This is with the attached debug patch, which simply prints index of the
unowned list index entry. And yes, it took ~3 minutes to get from 10k to
8k (at which point I interrupted the recovery and killed the cluster).

I see similar issue after creating a lot of tables in the same xact and
rolling it back, although in this case it's faster for some reason.

> This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion. 
> https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com
> 

That probably explains why we haven't seen the issue before.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: performance issue in remove_from_unowned_list()

From
Tomas Vondra
Date:
On 2/8/19 2:27 PM, Tomas Vondra wrote:
> 
> 
> 
> On 2/8/19 12:32 PM, Ideriha, Takeshi wrote:
>>> From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
>>> But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on
>>> a batch of relations too, and it says this
>>>
>>>    /*
>>>     * Call smgrclose() in reverse order as when smgropen() is called.
>>>     * This trick enables remove_from_unowned_list() in smgrclose()
>>>     * to search the SMgrRelation from the unowned list,
>>>     * with O(1) performance.
>>>     */
>>>    for (i = ndelrels - 1; i >= 0; i--)
>>>    ...
>>>
>>> but it's called from two places in xact.c only. And if you trigger the issue with 100k x
>>> CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you
>>> actually get the very same behavior with always traversing the unowned list for some
>>> reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing
>>> to do here.
>>
>> So when DropRelationFiles() is called, order of calling smgr_close()
> and order of unowed list is always same?
>>
> 
> Well, let's say you create 10k tables in a single transaction, and then
> kill the server with "kill -9" right after the commit. Then on restart,
> during recovery this happens
> 
> 2019-02-08 14:16:21.781 CET [12817] LOG:  remove_from_unowned_list 10015
> 2019-02-08 14:16:21.871 CET [12817] LOG:  remove_from_unowned_list 10005
> 2019-02-08 14:16:21.967 CET [12817] LOG:  remove_from_unowned_list 10004
> 2019-02-08 14:16:22.057 CET [12817] LOG:  remove_from_unowned_list 10001
> 2019-02-08 14:16:22.147 CET [12817] LOG:  remove_from_unowned_list 10000
> 2019-02-08 14:16:22.238 CET [12817] LOG:  remove_from_unowned_list 9999
> 2019-02-08 14:16:22.327 CET [12817] LOG:  remove_from_unowned_list 9998
> 2019-02-08 14:16:22.421 CET [12817] LOG:  remove_from_unowned_list 9996
> 2019-02-08 14:16:22.513 CET [12817] LOG:  remove_from_unowned_list 9995
> 2019-02-08 14:16:22.605 CET [12817] LOG:  remove_from_unowned_list 9994
> 2019-02-08 14:16:22.696 CET [12817] LOG:  remove_from_unowned_list 9993
> ...
> 2019-02-08 14:19:13.921 CET [12817] LOG:  remove_from_unowned_list 8396
> 2019-02-08 14:19:14.025 CET [12817] LOG:  remove_from_unowned_list 8395
> 2019-02-08 14:19:14.174 CET [12817] LOG:  remove_from_unowned_list 8394
> 2019-02-08 14:19:14.277 CET [12817] LOG:  remove_from_unowned_list 8393
> 2019-02-08 14:19:14.387 CET [12817] LOG:  remove_from_unowned_list 8392
> 2019-02-08 14:19:14.508 CET [12817] LOG:  remove_from_unowned_list 8391
> 2019-02-08 14:19:14.631 CET [12817] LOG:  remove_from_unowned_list 8390
> 2019-02-08 14:19:14.770 CET [12817] LOG:  remove_from_unowned_list 8389
> ...
> 
> This is with the attached debug patch, which simply prints index of the
> unowned list index entry. And yes, it took ~3 minutes to get from 10k to
> 8k (at which point I interrupted the recovery and killed the cluster).
> 
> I see similar issue after creating a lot of tables in the same xact and
> rolling it back, although in this case it's faster for some reason.
> 
>> This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion. 
>> https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com
>>
> 
> That probably explains why we haven't seen the issue before.
> 

FWIW I've just hit this issue (remove_from_unowned_list consuming 100%
CPU for long periods of time) in checkpointer, which calls smgrcloseall.
And that does this:

    hash_seq_init(&status, SMgrRelationHash);

    while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
        smgrclose(reln);

which means the order of closing relations is entirely random in this
case, and there's no hope of smartly ordering the close requests.

I'm wondering if we should just get rid of all such optimizations, and
make the unowned list doubly-linked (WIP patch attached, needs fixing
the comments etc.).

The comment before remove_from_unowned_list() claims it's not worth it,
but I guess we may need to revisit the assumptions - the fact that we've
accumulated ordering optimizations on a number of places suggests they
may not be entirely accurate.

The doubly-linked list seems way less fragile than the attempts to order
the requests in a particular way, and it works all the time. It requires
an extra pointer in the struct, but per pahole size changes from 88 to
96 bytes - so the number of cachelines does not change, and the allocset
chunk size remains the same. So no problem here, IMO. But some testing
is probably needed.

I'm going to use this for the syscache patch benchmarking, because it's
rather impossible without it.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: performance issue in remove_from_unowned_list()

From
Alvaro Herrera
Date:
On 2019-Feb-08, Tomas Vondra wrote:

> I'm wondering if we should just get rid of all such optimizations, and
> make the unowned list doubly-linked (WIP patch attached, needs fixing
> the comments etc.).

+1 for that approach.

Did you consider using a dlist?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: performance issue in remove_from_unowned_list()

From
Robert Haas
Date:
On Wed, Mar 6, 2019 at 1:53 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Feb-08, Tomas Vondra wrote:
> > I'm wondering if we should just get rid of all such optimizations, and
> > make the unowned list doubly-linked (WIP patch attached, needs fixing
> > the comments etc.).
>
> +1 for that approach.

+1 for me, too.

> Did you consider using a dlist?

Maybe that is worthwhile, but this is a smaller change, which I think
should count for quite a bit.  Nothing keeps somebody from doing the
dlist change as a separate patch, if desired.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: performance issue in remove_from_unowned_list()

From
Tomas Vondra
Date:

On 3/6/19 7:52 PM, Alvaro Herrera wrote:
> On 2019-Feb-08, Tomas Vondra wrote:
> 
>> I'm wondering if we should just get rid of all such optimizations, and
>> make the unowned list doubly-linked (WIP patch attached, needs fixing
>> the comments etc.).
> 
> +1 for that approach.
> 
> Did you consider using a dlist?
> 

I'm not sure. I might have considered it, but decided to go with a
simpler / less invasive fix demonstrating the effect. And maybe make it
more acceptable for backpatch, if we want that. Which we probably don't,
so I agree dlist might be a better choice.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: performance issue in remove_from_unowned_list()

From
Tomas Vondra
Date:
On 3/6/19 8:04 PM, Robert Haas wrote:
> On Wed, Mar 6, 2019 at 1:53 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> On 2019-Feb-08, Tomas Vondra wrote:
>>> I'm wondering if we should just get rid of all such optimizations, and
>>> make the unowned list doubly-linked (WIP patch attached, needs fixing
>>> the comments etc.).
>>
>> +1 for that approach.
> 
> +1 for me, too.
> 
>> Did you consider using a dlist?
> 
> Maybe that is worthwhile, but this is a smaller change, which I think
> should count for quite a bit.  Nothing keeps somebody from doing the
> dlist change as a separate patch, if desired.
> 

Yeah, although now that I think about it I wouldn't expect the dlist
version to be much more complicated. We access next_unowned_reln on two
or three places, IIRC, so switching to dlist would be trivial I think.

What worries me more is that I observe the opposite behavior than what's
described in comment for b4166911 (which is from 2018, so not that old)
and 279628a0a7 (from 2013). So what changed since then? Seems fishy ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: performance issue in remove_from_unowned_list()

From
Alvaro Herrera
Date:
On 2019-Feb-07, Tomas Vondra wrote:

> Attached is a WIP patch removing the optimization from DropRelationFiles
> and adding it to smgrDoPendingDeletes. This resolves the issue, at least
> in the cases I've been able to reproduce. But maybe we should deal with
> this issue earlier by ensuring the two lists are ordered the same way
> from the very beginning, somehow.

I noticed that this patch isn't in the commitfest.  Are you planning to
push it soon?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: performance issue in remove_from_unowned_list()

From
Tomas Vondra
Date:

On 3/10/19 9:09 PM, Alvaro Herrera wrote:
> On 2019-Feb-07, Tomas Vondra wrote:
> 
>> Attached is a WIP patch removing the optimization from DropRelationFiles
>> and adding it to smgrDoPendingDeletes. This resolves the issue, at least
>> in the cases I've been able to reproduce. But maybe we should deal with
>> this issue earlier by ensuring the two lists are ordered the same way
>> from the very beginning, somehow.
> 
> I noticed that this patch isn't in the commitfest.  Are you planning to
> push it soon?
> 

I wasn't planning to push anything particularly soon, for two reasons:
Firstly, the issue is not particularly pressing except with non-trivial
number of relations (and I only noticed that during benchmarking).
Secondly, I still have a feeling I'm missing something about b4166911
because for me that commit does not actually fix the issue - i.e. I can
create a lot of relations in a transaction, abort it, and observe that
the replica actually accesses the relations in exactly the wrong order.
So that commit does not seem to actually fix anything.

Attached is a patch adopting the dlist approach - it seems to be working
quite fine, and is a bit cleaner than just slapping another pointer into
the SMgrRelationData struct. So I'd say this is the way to go.

I see b4166911 was actually backpatched to all supported versions, on
the basis that it fixes oversight in 279628a0a7. So I suppose this fix
should also be backpatched.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: performance issue in remove_from_unowned_list()

From
Robert Haas
Date:
On Tue, Mar 12, 2019 at 6:54 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Attached is a patch adopting the dlist approach - it seems to be working
> quite fine, and is a bit cleaner than just slapping another pointer into
> the SMgrRelationData struct. So I'd say this is the way to go.

What about using a data structure that supports O(1) lookups for any element?

The current efforts all seem to revolve around correctly guessing from
which end of the list we are likely to delete stuff, but your research
suggests that we don't always make such guesses particularly well.
And it seems unnecessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: performance issue in remove_from_unowned_list()

From
Tomas Vondra
Date:
On 3/13/19 1:12 PM, Robert Haas wrote:
> On Tue, Mar 12, 2019 at 6:54 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Attached is a patch adopting the dlist approach - it seems to be working
>> quite fine, and is a bit cleaner than just slapping another pointer into
>> the SMgrRelationData struct. So I'd say this is the way to go.
> 
> What about using a data structure that supports O(1) lookups for any element?
> 
> The current efforts all seem to revolve around correctly guessing from
> which end of the list we are likely to delete stuff, but your research
> suggests that we don't always make such guesses particularly well.
> And it seems unnecessary.
> 

Isn't the the doubly-linked list is doing exactly that?

AFAICS we already maintain a hash table of the smgr relations, and we
look them up in this table. We don't need to look them up in the list of
unowned relations - the whole problem is that with the current
single-linked list, we need to iterate the list anyway to update pointer
in the preceding entry.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: performance issue in remove_from_unowned_list()

From
Robert Haas
Date:
On Wed, Mar 13, 2019 at 9:47 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> AFAICS we already maintain a hash table of the smgr relations, and we
> look them up in this table. We don't need to look them up in the list of
> unowned relations - the whole problem is that with the current
> single-linked list, we need to iterate the list anyway to update pointer
> in the preceding entry.

OK, I'm dumb.  Sorry for the noise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: performance issue in remove_from_unowned_list()

From
Tomas Vondra
Date:
On 3/12/19 11:54 PM, Tomas Vondra wrote:
> 
> 
> On 3/10/19 9:09 PM, Alvaro Herrera wrote:
>> On 2019-Feb-07, Tomas Vondra wrote:
>>
>>> Attached is a WIP patch removing the optimization from DropRelationFiles
>>> and adding it to smgrDoPendingDeletes. This resolves the issue, at least
>>> in the cases I've been able to reproduce. But maybe we should deal with
>>> this issue earlier by ensuring the two lists are ordered the same way
>>> from the very beginning, somehow.
>>
>> I noticed that this patch isn't in the commitfest.  Are you planning to
>> push it soon?
>>
> 
> I wasn't planning to push anything particularly soon, for two reasons:
> Firstly, the issue is not particularly pressing except with non-trivial
> number of relations (and I only noticed that during benchmarking).
> Secondly, I still have a feeling I'm missing something about b4166911
> because for me that commit does not actually fix the issue - i.e. I can
> create a lot of relations in a transaction, abort it, and observe that
> the replica actually accesses the relations in exactly the wrong order.
> So that commit does not seem to actually fix anything.
> 
> Attached is a patch adopting the dlist approach - it seems to be working
> quite fine, and is a bit cleaner than just slapping another pointer into
> the SMgrRelationData struct. So I'd say this is the way to go.
> 
> I see b4166911 was actually backpatched to all supported versions, on
> the basis that it fixes oversight in 279628a0a7. So I suppose this fix
> should also be backpatched.
> 

OK, so here is a bit more polished version of the dlist-based patch.

It's almost identical to what I posted before, except that it:

1) undoes the non-working optimization in DropRelationFiles()

2) removes add_to_unowned_list/remove_from_unowned_list entirely and
just replaces that with dlist_push_tail/dlist_delete

I've originally planned to keep those functions, mostly because the
remove_from_unowned_list comment says this:

  - * If the reln is not present in the list, nothing happens.
  - * Typically this would be caller error, but there seems no
  - * reason to throw an error.

I don't think dlist_delete allows that. But after further inspection of
all places calling those functions, don't think that can happen.

I plan to commit this soon-ish and backpatch it as mentioned before,
because I consider it pretty much a fix for b4166911.

I'm however still mildly puzzled that b4166911 apparently improved the
behavior in some cases, at least judging by [1]. OTOH there's not much
detail about how it was tested, so I can't quite run it again.


[1]
https://www.postgresql.org/message-id/flat/CAHGQGwHVQkdfDqtvGVkty%2B19cQakAydXn1etGND3X0PHbZ3%2B6w%40mail.gmail.com

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment