Thread: performance issue in remove_from_unowned_list()
Hi, While working on benchmarks for the syscache patch (negative entries and all of that), I've ran into a an issue in remove_from_unowned_list. If you create a lot of relations in a single transaction (say, 100k) and then abort the transaction (or if it fails for some reason, e.g. because of hitting max_locks_per_transaction), smgrDoPendingDeletes ought to clean relation files. Unfortunately, the constructed srels array is ordered in exactly the opposite way compared to "unowned list" (used by smgrclose). So what happens is that we start with the first relation, walk the whole list unowned list and remove the last item. Then we get the second rel and again walk the whole unowned list until the last item. And so on. For large number of objects that's pretty significant. With 100k rels I've lost the patience after a couple of minutes, and just nuked the whole cluster (interrupting ROLLBACK is a no go, and after a kill the recovery just starts chewing on it again). Of course, transactions creating 100k tables in a single transaction are not that common, but it's not unheard of either (such applications exist, are discussed in the syscache thread, and restoring them from a pg_dump is likely to hit this issue). And the same issue applies to cases that drop a bunch of tables in a single transaction (and I've seen plenty of scripts doing that, although in smaller batches). But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on a batch of relations too, and it says this /* * Call smgrclose() in reverse order as when smgropen() is called. * This trick enables remove_from_unowned_list() in smgrclose() * to search the SMgrRelation from the unowned list, * with O(1) performance. */ for (i = ndelrels - 1; i >= 0; i--) ... but it's called from two places in xact.c only. And if you trigger the issue with 100k x CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you actually get the very same behavior with always traversing the unowned list for some reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing to do here. Attached is a WIP patch removing the optimization from DropRelationFiles and adding it to smgrDoPendingDeletes. This resolves the issue, at least in the cases I've been able to reproduce. But maybe we should deal with this issue earlier by ensuring the two lists are ordered the same way from the very beginning, somehow. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
>From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com] >But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on >a batch of relations too, and it says this > > /* > * Call smgrclose() in reverse order as when smgropen() is called. > * This trick enables remove_from_unowned_list() in smgrclose() > * to search the SMgrRelation from the unowned list, > * with O(1) performance. > */ > for (i = ndelrels - 1; i >= 0; i--) > ... > >but it's called from two places in xact.c only. And if you trigger the issue with 100k x >CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you >actually get the very same behavior with always traversing the unowned list for some >reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing >to do here. So when DropRelationFiles() is called, order of calling smgr_close() and order of unowed list is always same? This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion. https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com Regards, Takeshi Ideriha
On 2/8/19 12:32 PM, Ideriha, Takeshi wrote: >> From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com] >> But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on >> a batch of relations too, and it says this >> >> /* >> * Call smgrclose() in reverse order as when smgropen() is called. >> * This trick enables remove_from_unowned_list() in smgrclose() >> * to search the SMgrRelation from the unowned list, >> * with O(1) performance. >> */ >> for (i = ndelrels - 1; i >= 0; i--) >> ... >> >> but it's called from two places in xact.c only. And if you trigger the issue with 100k x >> CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you >> actually get the very same behavior with always traversing the unowned list for some >> reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing >> to do here. > > So when DropRelationFiles() is called, order of calling smgr_close() and order of unowed list is always same? > Well, let's say you create 10k tables in a single transaction, and then kill the server with "kill -9" right after the commit. Then on restart, during recovery this happens 2019-02-08 14:16:21.781 CET [12817] LOG: remove_from_unowned_list 10015 2019-02-08 14:16:21.871 CET [12817] LOG: remove_from_unowned_list 10005 2019-02-08 14:16:21.967 CET [12817] LOG: remove_from_unowned_list 10004 2019-02-08 14:16:22.057 CET [12817] LOG: remove_from_unowned_list 10001 2019-02-08 14:16:22.147 CET [12817] LOG: remove_from_unowned_list 10000 2019-02-08 14:16:22.238 CET [12817] LOG: remove_from_unowned_list 9999 2019-02-08 14:16:22.327 CET [12817] LOG: remove_from_unowned_list 9998 2019-02-08 14:16:22.421 CET [12817] LOG: remove_from_unowned_list 9996 2019-02-08 14:16:22.513 CET [12817] LOG: remove_from_unowned_list 9995 2019-02-08 14:16:22.605 CET [12817] LOG: remove_from_unowned_list 9994 2019-02-08 14:16:22.696 CET [12817] LOG: remove_from_unowned_list 9993 ... 2019-02-08 14:19:13.921 CET [12817] LOG: remove_from_unowned_list 8396 2019-02-08 14:19:14.025 CET [12817] LOG: remove_from_unowned_list 8395 2019-02-08 14:19:14.174 CET [12817] LOG: remove_from_unowned_list 8394 2019-02-08 14:19:14.277 CET [12817] LOG: remove_from_unowned_list 8393 2019-02-08 14:19:14.387 CET [12817] LOG: remove_from_unowned_list 8392 2019-02-08 14:19:14.508 CET [12817] LOG: remove_from_unowned_list 8391 2019-02-08 14:19:14.631 CET [12817] LOG: remove_from_unowned_list 8390 2019-02-08 14:19:14.770 CET [12817] LOG: remove_from_unowned_list 8389 ... This is with the attached debug patch, which simply prints index of the unowned list index entry. And yes, it took ~3 minutes to get from 10k to 8k (at which point I interrupted the recovery and killed the cluster). I see similar issue after creating a lot of tables in the same xact and rolling it back, although in this case it's faster for some reason. > This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion. > https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com > That probably explains why we haven't seen the issue before. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2/8/19 2:27 PM, Tomas Vondra wrote: > > > > On 2/8/19 12:32 PM, Ideriha, Takeshi wrote: >>> From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com] >>> But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on >>> a batch of relations too, and it says this >>> >>> /* >>> * Call smgrclose() in reverse order as when smgropen() is called. >>> * This trick enables remove_from_unowned_list() in smgrclose() >>> * to search the SMgrRelation from the unowned list, >>> * with O(1) performance. >>> */ >>> for (i = ndelrels - 1; i >= 0; i--) >>> ... >>> >>> but it's called from two places in xact.c only. And if you trigger the issue with 100k x >>> CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you >>> actually get the very same behavior with always traversing the unowned list for some >>> reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing >>> to do here. >> >> So when DropRelationFiles() is called, order of calling smgr_close() > and order of unowed list is always same? >> > > Well, let's say you create 10k tables in a single transaction, and then > kill the server with "kill -9" right after the commit. Then on restart, > during recovery this happens > > 2019-02-08 14:16:21.781 CET [12817] LOG: remove_from_unowned_list 10015 > 2019-02-08 14:16:21.871 CET [12817] LOG: remove_from_unowned_list 10005 > 2019-02-08 14:16:21.967 CET [12817] LOG: remove_from_unowned_list 10004 > 2019-02-08 14:16:22.057 CET [12817] LOG: remove_from_unowned_list 10001 > 2019-02-08 14:16:22.147 CET [12817] LOG: remove_from_unowned_list 10000 > 2019-02-08 14:16:22.238 CET [12817] LOG: remove_from_unowned_list 9999 > 2019-02-08 14:16:22.327 CET [12817] LOG: remove_from_unowned_list 9998 > 2019-02-08 14:16:22.421 CET [12817] LOG: remove_from_unowned_list 9996 > 2019-02-08 14:16:22.513 CET [12817] LOG: remove_from_unowned_list 9995 > 2019-02-08 14:16:22.605 CET [12817] LOG: remove_from_unowned_list 9994 > 2019-02-08 14:16:22.696 CET [12817] LOG: remove_from_unowned_list 9993 > ... > 2019-02-08 14:19:13.921 CET [12817] LOG: remove_from_unowned_list 8396 > 2019-02-08 14:19:14.025 CET [12817] LOG: remove_from_unowned_list 8395 > 2019-02-08 14:19:14.174 CET [12817] LOG: remove_from_unowned_list 8394 > 2019-02-08 14:19:14.277 CET [12817] LOG: remove_from_unowned_list 8393 > 2019-02-08 14:19:14.387 CET [12817] LOG: remove_from_unowned_list 8392 > 2019-02-08 14:19:14.508 CET [12817] LOG: remove_from_unowned_list 8391 > 2019-02-08 14:19:14.631 CET [12817] LOG: remove_from_unowned_list 8390 > 2019-02-08 14:19:14.770 CET [12817] LOG: remove_from_unowned_list 8389 > ... > > This is with the attached debug patch, which simply prints index of the > unowned list index entry. And yes, it took ~3 minutes to get from 10k to > 8k (at which point I interrupted the recovery and killed the cluster). > > I see similar issue after creating a lot of tables in the same xact and > rolling it back, although in this case it's faster for some reason. > >> This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion. >> https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com >> > > That probably explains why we haven't seen the issue before. > FWIW I've just hit this issue (remove_from_unowned_list consuming 100% CPU for long periods of time) in checkpointer, which calls smgrcloseall. And that does this: hash_seq_init(&status, SMgrRelationHash); while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL) smgrclose(reln); which means the order of closing relations is entirely random in this case, and there's no hope of smartly ordering the close requests. I'm wondering if we should just get rid of all such optimizations, and make the unowned list doubly-linked (WIP patch attached, needs fixing the comments etc.). The comment before remove_from_unowned_list() claims it's not worth it, but I guess we may need to revisit the assumptions - the fact that we've accumulated ordering optimizations on a number of places suggests they may not be entirely accurate. The doubly-linked list seems way less fragile than the attempts to order the requests in a particular way, and it works all the time. It requires an extra pointer in the struct, but per pahole size changes from 88 to 96 bytes - so the number of cachelines does not change, and the allocset chunk size remains the same. So no problem here, IMO. But some testing is probably needed. I'm going to use this for the syscache patch benchmarking, because it's rather impossible without it. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2019-Feb-08, Tomas Vondra wrote: > I'm wondering if we should just get rid of all such optimizations, and > make the unowned list doubly-linked (WIP patch attached, needs fixing > the comments etc.). +1 for that approach. Did you consider using a dlist? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 6, 2019 at 1:53 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2019-Feb-08, Tomas Vondra wrote: > > I'm wondering if we should just get rid of all such optimizations, and > > make the unowned list doubly-linked (WIP patch attached, needs fixing > > the comments etc.). > > +1 for that approach. +1 for me, too. > Did you consider using a dlist? Maybe that is worthwhile, but this is a smaller change, which I think should count for quite a bit. Nothing keeps somebody from doing the dlist change as a separate patch, if desired. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/6/19 7:52 PM, Alvaro Herrera wrote: > On 2019-Feb-08, Tomas Vondra wrote: > >> I'm wondering if we should just get rid of all such optimizations, and >> make the unowned list doubly-linked (WIP patch attached, needs fixing >> the comments etc.). > > +1 for that approach. > > Did you consider using a dlist? > I'm not sure. I might have considered it, but decided to go with a simpler / less invasive fix demonstrating the effect. And maybe make it more acceptable for backpatch, if we want that. Which we probably don't, so I agree dlist might be a better choice. cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/6/19 8:04 PM, Robert Haas wrote: > On Wed, Mar 6, 2019 at 1:53 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> On 2019-Feb-08, Tomas Vondra wrote: >>> I'm wondering if we should just get rid of all such optimizations, and >>> make the unowned list doubly-linked (WIP patch attached, needs fixing >>> the comments etc.). >> >> +1 for that approach. > > +1 for me, too. > >> Did you consider using a dlist? > > Maybe that is worthwhile, but this is a smaller change, which I think > should count for quite a bit. Nothing keeps somebody from doing the > dlist change as a separate patch, if desired. > Yeah, although now that I think about it I wouldn't expect the dlist version to be much more complicated. We access next_unowned_reln on two or three places, IIRC, so switching to dlist would be trivial I think. What worries me more is that I observe the opposite behavior than what's described in comment for b4166911 (which is from 2018, so not that old) and 279628a0a7 (from 2013). So what changed since then? Seems fishy ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Feb-07, Tomas Vondra wrote: > Attached is a WIP patch removing the optimization from DropRelationFiles > and adding it to smgrDoPendingDeletes. This resolves the issue, at least > in the cases I've been able to reproduce. But maybe we should deal with > this issue earlier by ensuring the two lists are ordered the same way > from the very beginning, somehow. I noticed that this patch isn't in the commitfest. Are you planning to push it soon? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/10/19 9:09 PM, Alvaro Herrera wrote: > On 2019-Feb-07, Tomas Vondra wrote: > >> Attached is a WIP patch removing the optimization from DropRelationFiles >> and adding it to smgrDoPendingDeletes. This resolves the issue, at least >> in the cases I've been able to reproduce. But maybe we should deal with >> this issue earlier by ensuring the two lists are ordered the same way >> from the very beginning, somehow. > > I noticed that this patch isn't in the commitfest. Are you planning to > push it soon? > I wasn't planning to push anything particularly soon, for two reasons: Firstly, the issue is not particularly pressing except with non-trivial number of relations (and I only noticed that during benchmarking). Secondly, I still have a feeling I'm missing something about b4166911 because for me that commit does not actually fix the issue - i.e. I can create a lot of relations in a transaction, abort it, and observe that the replica actually accesses the relations in exactly the wrong order. So that commit does not seem to actually fix anything. Attached is a patch adopting the dlist approach - it seems to be working quite fine, and is a bit cleaner than just slapping another pointer into the SMgrRelationData struct. So I'd say this is the way to go. I see b4166911 was actually backpatched to all supported versions, on the basis that it fixes oversight in 279628a0a7. So I suppose this fix should also be backpatched. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Mar 12, 2019 at 6:54 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Attached is a patch adopting the dlist approach - it seems to be working > quite fine, and is a bit cleaner than just slapping another pointer into > the SMgrRelationData struct. So I'd say this is the way to go. What about using a data structure that supports O(1) lookups for any element? The current efforts all seem to revolve around correctly guessing from which end of the list we are likely to delete stuff, but your research suggests that we don't always make such guesses particularly well. And it seems unnecessary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/13/19 1:12 PM, Robert Haas wrote: > On Tue, Mar 12, 2019 at 6:54 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> Attached is a patch adopting the dlist approach - it seems to be working >> quite fine, and is a bit cleaner than just slapping another pointer into >> the SMgrRelationData struct. So I'd say this is the way to go. > > What about using a data structure that supports O(1) lookups for any element? > > The current efforts all seem to revolve around correctly guessing from > which end of the list we are likely to delete stuff, but your research > suggests that we don't always make such guesses particularly well. > And it seems unnecessary. > Isn't the the doubly-linked list is doing exactly that? AFAICS we already maintain a hash table of the smgr relations, and we look them up in this table. We don't need to look them up in the list of unowned relations - the whole problem is that with the current single-linked list, we need to iterate the list anyway to update pointer in the preceding entry. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 13, 2019 at 9:47 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > AFAICS we already maintain a hash table of the smgr relations, and we > look them up in this table. We don't need to look them up in the list of > unowned relations - the whole problem is that with the current > single-linked list, we need to iterate the list anyway to update pointer > in the preceding entry. OK, I'm dumb. Sorry for the noise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/12/19 11:54 PM, Tomas Vondra wrote: > > > On 3/10/19 9:09 PM, Alvaro Herrera wrote: >> On 2019-Feb-07, Tomas Vondra wrote: >> >>> Attached is a WIP patch removing the optimization from DropRelationFiles >>> and adding it to smgrDoPendingDeletes. This resolves the issue, at least >>> in the cases I've been able to reproduce. But maybe we should deal with >>> this issue earlier by ensuring the two lists are ordered the same way >>> from the very beginning, somehow. >> >> I noticed that this patch isn't in the commitfest. Are you planning to >> push it soon? >> > > I wasn't planning to push anything particularly soon, for two reasons: > Firstly, the issue is not particularly pressing except with non-trivial > number of relations (and I only noticed that during benchmarking). > Secondly, I still have a feeling I'm missing something about b4166911 > because for me that commit does not actually fix the issue - i.e. I can > create a lot of relations in a transaction, abort it, and observe that > the replica actually accesses the relations in exactly the wrong order. > So that commit does not seem to actually fix anything. > > Attached is a patch adopting the dlist approach - it seems to be working > quite fine, and is a bit cleaner than just slapping another pointer into > the SMgrRelationData struct. So I'd say this is the way to go. > > I see b4166911 was actually backpatched to all supported versions, on > the basis that it fixes oversight in 279628a0a7. So I suppose this fix > should also be backpatched. > OK, so here is a bit more polished version of the dlist-based patch. It's almost identical to what I posted before, except that it: 1) undoes the non-working optimization in DropRelationFiles() 2) removes add_to_unowned_list/remove_from_unowned_list entirely and just replaces that with dlist_push_tail/dlist_delete I've originally planned to keep those functions, mostly because the remove_from_unowned_list comment says this: - * If the reln is not present in the list, nothing happens. - * Typically this would be caller error, but there seems no - * reason to throw an error. I don't think dlist_delete allows that. But after further inspection of all places calling those functions, don't think that can happen. I plan to commit this soon-ish and backpatch it as mentioned before, because I consider it pretty much a fix for b4166911. I'm however still mildly puzzled that b4166911 apparently improved the behavior in some cases, at least judging by [1]. OTOH there's not much detail about how it was tested, so I can't quite run it again. [1] https://www.postgresql.org/message-id/flat/CAHGQGwHVQkdfDqtvGVkty%2B19cQakAydXn1etGND3X0PHbZ3%2B6w%40mail.gmail.com regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services