Re: performance issue in remove_from_unowned_list() - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: performance issue in remove_from_unowned_list()
Date
Msg-id 91454a24-f8dc-f806-f499-79580a4f5bb8@2ndquadrant.com
Whole thread Raw
In response to Re: performance issue in remove_from_unowned_list()  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: performance issue in remove_from_unowned_list()
List pgsql-hackers
On 2/8/19 2:27 PM, Tomas Vondra wrote:
> 
> 
> 
> On 2/8/19 12:32 PM, Ideriha, Takeshi wrote:
>>> From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
>>> But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on
>>> a batch of relations too, and it says this
>>>
>>>    /*
>>>     * Call smgrclose() in reverse order as when smgropen() is called.
>>>     * This trick enables remove_from_unowned_list() in smgrclose()
>>>     * to search the SMgrRelation from the unowned list,
>>>     * with O(1) performance.
>>>     */
>>>    for (i = ndelrels - 1; i >= 0; i--)
>>>    ...
>>>
>>> but it's called from two places in xact.c only. And if you trigger the issue with 100k x
>>> CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you
>>> actually get the very same behavior with always traversing the unowned list for some
>>> reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing
>>> to do here.
>>
>> So when DropRelationFiles() is called, order of calling smgr_close()
> and order of unowed list is always same?
>>
> 
> Well, let's say you create 10k tables in a single transaction, and then
> kill the server with "kill -9" right after the commit. Then on restart,
> during recovery this happens
> 
> 2019-02-08 14:16:21.781 CET [12817] LOG:  remove_from_unowned_list 10015
> 2019-02-08 14:16:21.871 CET [12817] LOG:  remove_from_unowned_list 10005
> 2019-02-08 14:16:21.967 CET [12817] LOG:  remove_from_unowned_list 10004
> 2019-02-08 14:16:22.057 CET [12817] LOG:  remove_from_unowned_list 10001
> 2019-02-08 14:16:22.147 CET [12817] LOG:  remove_from_unowned_list 10000
> 2019-02-08 14:16:22.238 CET [12817] LOG:  remove_from_unowned_list 9999
> 2019-02-08 14:16:22.327 CET [12817] LOG:  remove_from_unowned_list 9998
> 2019-02-08 14:16:22.421 CET [12817] LOG:  remove_from_unowned_list 9996
> 2019-02-08 14:16:22.513 CET [12817] LOG:  remove_from_unowned_list 9995
> 2019-02-08 14:16:22.605 CET [12817] LOG:  remove_from_unowned_list 9994
> 2019-02-08 14:16:22.696 CET [12817] LOG:  remove_from_unowned_list 9993
> ...
> 2019-02-08 14:19:13.921 CET [12817] LOG:  remove_from_unowned_list 8396
> 2019-02-08 14:19:14.025 CET [12817] LOG:  remove_from_unowned_list 8395
> 2019-02-08 14:19:14.174 CET [12817] LOG:  remove_from_unowned_list 8394
> 2019-02-08 14:19:14.277 CET [12817] LOG:  remove_from_unowned_list 8393
> 2019-02-08 14:19:14.387 CET [12817] LOG:  remove_from_unowned_list 8392
> 2019-02-08 14:19:14.508 CET [12817] LOG:  remove_from_unowned_list 8391
> 2019-02-08 14:19:14.631 CET [12817] LOG:  remove_from_unowned_list 8390
> 2019-02-08 14:19:14.770 CET [12817] LOG:  remove_from_unowned_list 8389
> ...
> 
> This is with the attached debug patch, which simply prints index of the
> unowned list index entry. And yes, it took ~3 minutes to get from 10k to
> 8k (at which point I interrupted the recovery and killed the cluster).
> 
> I see similar issue after creating a lot of tables in the same xact and
> rolling it back, although in this case it's faster for some reason.
> 
>> This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion. 
>> https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com
>>
> 
> That probably explains why we haven't seen the issue before.
> 

FWIW I've just hit this issue (remove_from_unowned_list consuming 100%
CPU for long periods of time) in checkpointer, which calls smgrcloseall.
And that does this:

    hash_seq_init(&status, SMgrRelationHash);

    while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
        smgrclose(reln);

which means the order of closing relations is entirely random in this
case, and there's no hope of smartly ordering the close requests.

I'm wondering if we should just get rid of all such optimizations, and
make the unowned list doubly-linked (WIP patch attached, needs fixing
the comments etc.).

The comment before remove_from_unowned_list() claims it's not worth it,
but I guess we may need to revisit the assumptions - the fact that we've
accumulated ordering optimizations on a number of places suggests they
may not be entirely accurate.

The doubly-linked list seems way less fragile than the attempts to order
the requests in a particular way, and it works all the time. It requires
an extra pointer in the struct, but per pahole size changes from 88 to
96 bytes - so the number of cachelines does not change, and the allocset
chunk size remains the same. So no problem here, IMO. But some testing
is probably needed.

I'm going to use this for the syscache patch benchmarking, because it's
rather impossible without it.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Fixing findDependentObjects()'s dependency on scan order(regressions in DROP diagnostic messages)
Next
From: Konstantin Knizhnik
Date:
Subject: Re: libpq compression