Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers
From | Peter Geoghegan
---|---
Subject | Re: Deleting older versions in unique indexes to avoid page splits
Date |
Msg-id | CAH2-Wz=2X=0eJkmfn1DTN6TMYbQOP10GCXCXV7U7=98akyfp5w@mail.gmail.com
In response to | Re: Deleting older versions in unique indexes to avoid page splits (Amit Kapila <amit.kapila16@gmail.com>)
Responses | Re: Deleting older versions in unique indexes to avoid page splits
List | pgsql-hackers
On Thu, Jan 21, 2021 at 9:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Slowing down non-HOT updaters in these extreme cases may actually be a
> > good thing, even when bottom-up deletion finally becomes ineffective.
> > It can be thought of as backpressure. I am not worried about slowing
> > down something that is already hopelessly inefficient and
> > unsustainable. I'd go even further than that, in fact -- I now wonder
> > if we should *deliberately* slow them down some more!
> >
>
> Do you have something specific in mind for this?

Maybe something a bit like the VACUUM cost delay stuff could be applied at the point that we realize that a given bottom-up deletion pass was entirely ineffective purely because of a long-running transaction -- a delay that the nbtree caller applies once it goes on to split the page.

This isn't something that I plan to work on anytime soon. My point was mostly that it really would make sense to deliberately throttle non-HOT updates at the point that they trigger page splits that are believed to be more or less caused by a long-running transaction. They're so incredibly harmful to the general responsiveness of the system that having a last line of defense like that (backpressure/throttling) really does make sense.

> I have briefly tried that but numbers were not consistent probably
> because at that time autovacuum was also 'on'. So, I tried switching
> off autovacuum and dropping/recreating the tables.

It's not at all surprising that they weren't consistent. Clearly bottom-up deletion wastes cycles on the first execution (it is wasted effort in at least one sense) -- you showed that already. Subsequent executions will actually manage to delete some tuples (probably a great many tuples), and so will have totally different performance profiles/characteristics. Isn't that obvious?

> > Besides, this test case is just ludicrous.
> >
>
> I think it might be okay to say that in such cases we can expect
> regression especially because we see benefits in many other cases so
> paying some cost in such cases is acceptable or such scenarios are
> less common or probably such cases are already not efficient so it
> doesn't matter much but I am not sure if we can say they are
> completely unreasonable. I think this test case depicts the behavior
> with bulk updates.

Sincere question: What do you want me to do about it?

You're asking me about two separate though very closely related issues at the same time here (this bulk update regression, plus the question of doing bottom-up passes when the incoming item isn't from a non-HOT updater). While your positions on these closely related issues are not incompatible, exactly, it's difficult for me to get to any underlying substantive point. In effect, you are pulling the patch in two separate directions at the same time. In practical terms, it's very likely that I cannot move the bottom-up deletion code closer to one of your ideals without simultaneously moving it further away from the other.

I will say the following about your bulk update example now, just in case you feel that I gave you the impression of never having taken it seriously:

1. I accept that the effect that you've described is real. It is a pretty narrow effect in practice, though, and will be of minimal concern to real applications (especially relative to the benefits they receive).
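For concreteness, the general shape of such a bulk update is roughly the following. This is purely an illustrative sketch with invented table and column names -- my reading of the kind of case being discussed, not the actual test script posted on this thread:

-- Illustrative only: a single-transaction bulk update that defeats HOT,
-- because an indexed column (val) changes -- so every row also gets a new,
-- logically unchanged entry in the primary key index.
CREATE TABLE bulk_demo (id int PRIMARY KEY, val int, pad text);
INSERT INTO bulk_demo
SELECT g, g, repeat('x', 100) FROM generate_series(1, 1000000) g;
CREATE INDEX bulk_demo_val_idx ON bulk_demo (val);
VACUUM ANALYZE bulk_demo;

-- On the first execution, bottom-up deletion passes are attempted in the
-- primary key index as its leaf pages fill, but the old versions all belong
-- to this same still-running transaction, so nothing can be deleted -- the
-- work is wasted and the statement is somewhat slower.  Repeat executions
-- can delete the garbage left behind by the previous run, and start to win.
UPDATE bulk_demo SET val = val + 1;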
I happen to believe that the kind of bulk update that you showed is naturally very rare, and will naturally cause horrible problems with any RDBMS -- and that's why I'm not too worried (about this specific sequence that you showed, or one like it that somehow artfully avoids receiving any performance benefit). I just cannot buy the idea that any real world user will do a bulk update like that exactly once, and be frustrated by the fact that it's ~12% slower in Postgres 14. If they do it more than once the story changes, of course (the technique starts to win). You have to do something more than once to notice a change in its performance in the first place -- so it just doesn't seem plausible to me. It's still possible to imagine a way that it could happen, but that is a risk that I am willing to live with, given the benefits.

2. If there was some simple way of avoiding the relative loss of performance without hurting other cases, I would certainly take it -- in general I prefer not to have to rely on anybody's opinion of what is or is not a reasonable price to pay for something. I strongly doubt that I can do anything about your first complaint (the bulk update) without causing much greater harm elsewhere, and so I won't be pursuing it.

I did not deny the existence of cases like this at any point. In fact, the entire discussion was ~50% me agonizing over regressions (though never this precise case). Discussion of possible regressions happened over many months and many dense emails. So I refuse to be lectured about my supposed indifference to regressions -- not by you, and not by anyone else. In general I consistently *bend over backwards* to avoid regressions, and never assume that I didn't miss something.

This came up recently, in fact:

https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

See the "Updates" section of that blog post. No regressions could be attributed to any of the nbtree projects I was involved with in the past few years. There was a tiny (IMV quite acceptable) regression attributed to insert-driven autovacuum in Postgres 13. Deduplication didn't lead to any appreciable loss of performance, even though most of the benchmarks were rather unsympathetic towards it (no low cardinality data). I specifically asked Mark Callaghan to isolate the small regression that he thought might be attributable to the deduplication feature (via the deduplicate_items storage parameter) -- even though at the time I imagined that doing so would *confirm* his original suspicion that nbtree deduplication was behind the issue. I don't claim to be objective when it comes to my own work, but I have at least been very conscientious.

> I am not denying that I could be missing your point but OTOH you are
> also completely refuting the points raised even though I have shown
> them by test and by sharing an example.

Actually, I quite specifically and directly said that I was *sympathetic* to your second point (the one about doing bottom-up deletion in more marginal cases not directly involving non-HOT updaters). I also said that I struggled with it myself for a long time. I just don't think that it is worth pursuing at this time -- but that shouldn't stop anyone else who is interested in it.

> 'Note' in the code (or README) suggesting something like we have
> considered the alternative for page-level-flag (to be aggressive in
> triggering this optimization) but not pursued with that for so-and-so
> reasons.
Good news, then: I pretty much *did* document it in the nbtree README! The following aside concerns this *exact* theoretical limitation, and appears in parentheses as part of commentary on the wider issue of how bottom-up passes are triggered:

"(in theory a small amount of version churn could make a page split occur earlier than strictly necessary, but that's pretty harmless)"

The "breadcrumbs" *are* there. You'd notice that if you actually looked for them.

> I think this can help future developers to carefully think
> about it even if they want to attempt something like that. You have
> considered it during the early development phase and then the same
> thing occurred to Victor and me as an interesting optimization to
> explore so the same can occur to someone else as well.

I would not document the idea in the README unless perhaps I had high confidence that it would work out. I have an open mind about it as a possibility, but that alone doesn't seem like a good enough reason to do it. I am now prepared to say that this does not seem like an amazing opportunity to make the feature much better. That's why it's not a priority for me right now.

There actually *are* things that I would describe that way (Sawada-san's complementary work on VACUUM), and so that's what I'll be focussing on in the weeks ahead.

--
Peter Geoghegan