Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
Date | |
Msg-id | CAH2-WzmTS0dW=vuRi34aasmE9uiS0ojwdquaECtDxjnh9z3+jQ@mail.gmail.com |
In response to | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Andres Freund <andres@anarazel.de>) |
Responses | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
List | pgsql-hackers |
On Fri, Feb 25, 2022 at 3:26 PM Andres Freund <andres@anarazel.de> wrote:
> freeze_required_limit, freeze_desired_limit? Or s/limit/cutoff/? Or
> s/limit/below/? I kind of like below because that answers < vs <= which I find
> hard to remember around freezing.

I like freeze_required_limit the most.

> That may be true, but I think working more incrementally is better in this
> area. I'd rather have a smaller improvement for a release, collect some data,
> get another improvement in the next, than see a bunch of reports of larger
> wins and large regressions.

I agree. There is an important practical way in which it makes sense to treat 0001 as separate to 0002. It is true that 0001 is independently quite useful. In practical terms, I'd be quite happy to just get 0001 into Postgres 15, without 0002. I think that that's what you meant here, in concrete terms, and we can agree on that now.

However, it is *also* true that there is an important practical sense in which they *are* related. I don't want to ignore that either -- it does matter. Most of the value to be had here comes from the synergy between 0001 and 0002 -- or what I've been calling a "virtuous cycle", the thing that makes it possible to advance relfrozenxid/relminmxid in almost every VACUUM. Having both 0001 and 0002 together (or something along the same lines) is way more valuable than having just one.

Perhaps we can even agree on this second point. I am encouraged by the fact that you at least recognize the general validity of the key ideas from 0002. If I am going to commit 0001 (and not 0002) ahead of feature freeze for 15, I better be pretty sure that I have at least roughly the right idea with 0002, too -- since that's the direction that 0001 is going in. It almost seems dishonest to pretend that I wasn't thinking of 0002 when I wrote 0001.

I'm glad that you seem to agree that this business of accumulating freezing debt without any natural limit is just not okay. That is really fundamental to me. I mean, vacuum_freeze_min_age kind of doesn't work as designed. This is a huge problem for us.

> > Under these conditions, we will have many more opportunities to
> > advance relminmxid for most of the tables (including the larger
> > tables) all the way up to current-oldestMxact with the patch series.
> > Without needing to freeze *any* MultiXacts early (just freezing some
> > XIDs early) to get that benefit. The patch series is not just about
> > spreading the burden of freezing, so that non-aggressive VACUUMs
> > freeze more -- it's also making relfrozenxid and relminmxid more
> > recent and therefore *reliable* indicators of which tables any
> > wraparound problems *really* are.
>
> My concern was explicitly about the case where we have to create new
> multixacts...

It was a mistake on my part to counter your point about that with this other point about eager relminmxid advancement. As I said in the last email, while that is very valuable, it's not something that needs to be brought into this.

> > Does that make sense to you?
>
> Yes.

Okay, great. The fact that you recognize the value in that comes as a relief.

> > You mean to change the signature of heap_tuple_needs_freeze, so it
> > doesn't return a bool anymore? It just has two bool pointers as
> > arguments, can_freeze and need_freeze?
>
> Something like that. Or return true if there's anything to do, and then rely
> on can_freeze and need_freeze for finer details. But it doesn't matter that much.

Got it.

> > The problem that all of these heuristics have is that they will tend
> > to make it impossible for future non-aggressive VACUUMs to be able to
> > advance relfrozenxid. All that it takes is one single all-visible page
> > to make that impossible. As I said upthread, I think that being able
> > to advance relfrozenxid (and especially relminmxid) by *some* amount
> > in every VACUUM has non-obvious value.
>
> I think that's a laudable goal. But I don't think we should go there unless we
> are quite confident we've mitigated the potential downsides.

True. But that works both ways. We also shouldn't err in the direction of adding these kinds of heuristics (which have real downsides) until the idea of mostly swallowing the cost of freezing whole pages (while making it possible to disable) has lost, fairly. Overall, it looks like the cost is acceptable in most cases.

I think that users will find it very reassuring to regularly and reliably see confirmation that wraparound is being kept at bay, by every VACUUM operation, with details that they can relate to their workload. That has real value IMV -- even when it's theoretically unnecessary for us to be so eager with advancing relfrozenxid.

I really don't like the idea of falling behind on freezing systematically. You always run the "risk" of freezing being wasted. But that way of looking at it can be penny wise, pound foolish -- maybe we should just accept that trying to predict what will happen in the future (whether or not freezing will be worth it) is mostly not helpful. Our users mostly complain about performance stability these days. Big shocks are really something we ought to avoid. That does have a cost. Why wouldn't it?

> > Maybe you can address that by changing the behavior of non-aggressive
> > VACUUMs, so that they are directly sensitive to this. Maybe they don't
> > skip any all-visible pages when there aren't too many, that kind of
> > thing. That needs to be in scope IMV.
>
> Yea. I still like my idea to have vacuum process some all-visible pages
> every time and to increase that percentage based on how old the relfrozenxid
> is.

You can quite easily construct cases where the patch does much better than that, though -- very believable cases. Any table like pgbench_history. And so I lean towards quantifying the cost of page-level freezing carefully, making sure there is nothing pathological, and then just accepting it (with a GUC to disable). The reality is that freezing is really a cost of storing data in Postgres, and will be for the foreseeable future.

> > Can you think of an adversarial workload, to get a sense of the extent
> > of the problem?
>
> I'll try to come up with something.

That would be very helpful. Thanks!

> It might make sense to separate the purposes of SKIP_PAGES_THRESHOLD. The
> relfrozenxid advancement doesn't benefit from visiting all-frozen pages, just
> because there are only 30 of them in a row.

Right. I imagine that SKIP_PAGES_THRESHOLD actually does help with this, but if we actually tried we'd find a much better way.

> I wish somebody would tackle merging heap_page_prune() with
> vacuuming. Primarily so we only do a single WAL record. But also because the
> separation has caused a *lot* of complexity. I've already more projects than
> I should, otherwise I'd start on it...

That has value, but it doesn't feel as urgent.

--
Peter Geoghegan