
From Andres Freund
Subject Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Msg-id 20220225232642.l6wegd3qtt4lmfot@alap3.anarazel.de
In response to Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
List pgsql-hackers
Hi,

On 2022-02-25 14:00:12 -0800, Peter Geoghegan wrote:
> On Thu, Feb 24, 2022 at 11:14 PM Andres Freund <andres@anarazel.de> wrote:
> > I am not a fan of the backstop terminology. It's still what forces us to do
> > freezing for correctness reasons.
> 
> Thanks for the review!
> 
> I'm not wedded to that particular terminology, but I think that we
> need something like it. Open to suggestions.
>
> How about limit-based? Something like that?

freeze_required_limit, freeze_desired_limit? Or s/limit/cutoff/? Or
s/limit/below/? I kind of like "below", because it answers the < vs <=
question, which I find hard to remember around freezing.
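
To illustrate with a hypothetical name (freeze_below_xid is invented here, and
TransactionIdPrecedes() is a strict "<"), "below" answers that question right
at the call site:

    /* hypothetical naming sketch, not from the patch */
    if (TransactionIdPrecedes(xid, freeze_below_xid))
    {
        /* xid < freeze_below_xid: this xid must be frozen */
    }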


> > I'm a tad concerned about replacing mxids that have some members that are
> > older than OldestXmin but not older than FreezeLimit. It's not too hard to
> > imagine that this could accelerate mxid consumption considerably.  But we
> > can probably special-case that, if it isn't already done.
> 
> Let's assume for a moment that this is a real problem. I'm not sure if
> it is or not myself (it's complicated), but let's say that it is. The
> problem may be more than offset by the positive impact on relminmxid
> advancement. I have placed a large emphasis on enabling
> relfrozenxid/relminmxid advancement in every non-aggressive VACUUM,
> for a number of reasons -- this is one of the reasons. Finding a way
> for every VACUUM operation to be "vacrel->scanned_pages +
> vacrel->frozenskipped_pages == orig_rel_pages" (i.e. making *some*
> amount of relfrozenxid/relminmxid advancement possible in every
> VACUUM) has a great deal of value.
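
For concreteness, that invariant amounts to something like this (just a
sketch, using the field names from your mail):

    /* sketch: relfrozenxid/relminmxid can only be advanced when no page was
     * skipped merely for being all-visible (all-frozen skips are fine) */
    bool        can_advance =
        (vacrel->scanned_pages + vacrel->frozenskipped_pages == orig_rel_pages);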

That may be true, but I think working more incrementally is better in this
area. I'd rather have a smaller improvement in one release, collect some data,
and get another improvement in the next, than see a bunch of reports of large
wins alongside large regressions.


> As I said recently on the "do only critical work during single-user
> vacuum?" thread, why should databases that consume too many MXIDs do so
> evenly, across all their tables? There
> are usually one or two large tables, and many more smaller tables. I
> think it's much more likely that the largest tables consume
> approximately zero MultiXactIds in these databases -- actual
> MultiXactId consumption is probably concentrated in just one or two
> smaller tables (even when we burn through MultiXacts very quickly).
> But we don't recognize these kinds of distinctions at all right now.

Recognizing those distinctions seems independent of freezing multixacts with
live members. I am happy with freezing them more aggressively if they don't
have live members. It's freezing mxids with live members that has me
concerned.  The limits you're proposing are quite aggressive and can advance
quickly.

I've seen large tables with plenty of multixacts. They are typically
concentrated over a value range (often one that changes over time).


> Under these conditions, we will have many more opportunities to
> advance relminmxid for most of the tables (including the larger
> tables) all the way up to current-oldestMxact with the patch series.
> Without needing to freeze *any* MultiXacts early (just freezing some
> XIDs early) to get that benefit. The patch series is not just about
> spreading the burden of freezing, so that non-aggressive VACUUMs
> freeze more -- it's also about making relfrozenxid and relminmxid more
> recent, and therefore more *reliable* indicators of which tables *really*
> have wraparound problems.

My concern was explicitly about the case where we have to create new
multixacts...


> Does that make sense to you?

Yes.


> > > On HEAD, FreezeMultiXactId() doesn't get passed down the VACUUM operation's
> > > OldestXmin at all (it actually just gets FreezeLimit passed as its
> > > cutoff_xid argument). It cannot possibly recognize any of this for itself.
> >
> > It does recognize something like OldestXmin in a more precise and expensive
> > way - MultiXactIdIsRunning() and TransactionIdIsCurrentTransactionId().
> 
> It doesn't look that way to me.
> 
> While it's true that FreezeMultiXactId() will call MultiXactIdIsRunning(),
> that's only a cross-check.

> This cross-check is made at a point where we've already determined that the
> MultiXact in question is < cutoff_multi. In other words, it catches cases
> where a "MultiXactId < cutoff_multi" Multi contains an XID *that's still
> running* -- a correctness issue. Nothing to do with being smart about
> avoiding allocating new MultiXacts during freezing, or exploiting the fact
> that "FreezeLimit < OldestXmin" (which is almost always true).

If there is <= 1 live member in a mxact, we replace it with a plain xid iff
that xid would also get frozen. With the current freezing logic I don't see
what passing down OldestXmin would change, or how it differs to a meaningful
degree from heap_prepare_freeze_tuple()'s logic.  I don't see how it would
avoid a single new mxact from being allocated.
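
To spell out what I mean, the shape on HEAD is roughly the following (a
simplified paraphrase of FreezeMultiXactId(), not verbatim heapam.c):

    if (MultiXactIdPrecedes(multi, cutoff_multi))
    {
        /* the cross-check discussed above: the multi already precedes the
         * cutoff, so a still-running member indicates corruption */
        if (MultiXactIdIsRunning(multi, HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)))
            elog(ERROR, "multixact %u from before cutoff found to be still running",
                 multi);

        if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
            *flags |= FRM_INVALIDATE_XMAX;  /* no update member, xmax goes away */
        else
            newxmax = MultiXactIdGetUpdateXid(multi, t_infomask);  /* plain xid */
    }
    else
    {
        /* multi >= cutoff_multi: if any member precedes cutoff_xid, a *new*
         * multi is built carrying only the members that must be kept -- the
         * path that can allocate additional MultiXactIds during freezing */
    }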



> > > Caller should make temp
> > > + * copies of global tracking variables before starting to process a page, so
> > > + * that we can only scribble on copies.  That way caller can just discard the
> > > + * temp copies if it isn't okay with that assumption.
> > > + *
> > > + * Only aggressive VACUUM callers are expected to really care when a tuple
> > > + * "needs freezing" according to us.  It follows that non-aggressive VACUUMs
> > > + * can use *relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out in all
> > > + * cases.
> >
> > Could it make sense to track can_freeze and need_freeze separately?
> 
> You mean to change the signature of heap_tuple_needs_freeze, so it
> doesn't return a bool anymore? It just has two bool pointers as
> arguments, can_freeze and need_freeze?

Something like that. Or return true if there's anything to do, and then rely
on can_freeze and need_freeze for finer details. But it doesn't matter that much.
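
E.g. something like this (a hypothetical signature, names invented):

    /* returns true iff there is anything to do at all; the out-parameters
     * separate "could be frozen opportunistically" from "must be frozen" */
    extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple,
                                        TransactionId cutoff_xid,
                                        MultiXactId cutoff_multi,
                                        bool *can_freeze,
                                        bool *need_freeze);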


> > I still suspect this will cause a very substantial increase in WAL traffic in
> realistic workloads. It's common to have workloads where tuples are inserted
> once and deleted once (or the partition is dropped).
> 
> I agree with the principle that this kind of use case should be
> accommodated in some way.
> 
> > I think we'll have to make this less aggressive or tunable. Random ideas for
> > heuristics:
> 
> The problem that all of these heuristics have is that they will tend
> to make it impossible for future non-aggressive VACUUMs to be able to
> advance relfrozenxid. All that it takes is one single all-visible page
> to make that impossible. As I said upthread, I think that being able
> to advance relfrozenxid (and especially relminmxid) by *some* amount
> in every VACUUM has non-obvious value.

I think that's a laudable goal. But I don't think we should go there unless we
are quite confident we've mitigated the potential downsides.

Using observed horizons just for "never vacuumed before" tables and for
aggressive vacuums alone would be a huge win.


> Maybe you can address that by changing the behavior of non-aggressive
> VACUUMs, so that they are directly sensitive to this. Maybe they don't
> skip any all-visible pages when there aren't too many, that kind of
> thing. That needs to be in scope IMV.

Yea. I still like my idea to have vacuum process some all-visible pages every
time, and to increase that percentage based on how old the relfrozenxid is.

We could slowly "refill" the number of all-visible pages VACUUM is allowed to
process whenever it dirties a page for other reasons.
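
Roughly like this (all names and constants below are invented, just to sketch
the shape):

    /* invented sketch: let a non-aggressive VACUUM visit a share of the
     * all-visible pages, scaled by how close relfrozenxid is to the
     * anti-wraparound cutoff */
    double      age_frac =
        (double) (ReadNextTransactionId() - rel->rd_rel->relfrozenxid) /
        autovacuum_freeze_max_age;
    BlockNumber allvisible_budget =
        (BlockNumber) (orig_rel_pages * 0.01 * (1.0 + 9.0 * Min(age_frac, 1.0)));

    /* "refill": earn back budget whenever a page is dirtied for other
     * reasons anyway, since also freezing it is then nearly free */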



> > I think this means my concern above about increasing mxid creation rate
> > substantially may be warranted.
> 
> Can you think of an adversarial workload, to get a sense of the extent
> of the problem?

I'll try to come up with something.


> > FWIW, I'd really like to get rid of SKIP_PAGES_THRESHOLD. It often ends up
> > spending a lot of time doing IO that we never need, completely trashing all
> > CPU caches, while not actually producing decent readahead IO from what I've
> > seen.
> 
> I am also suspicious of SKIP_PAGES_THRESHOLD. But if we want to get
> rid of it, we'll need to be sensitive to how that affects relfrozenxid
> advancement in non-aggressive VACUUMs IMV.

It might make sense to separate the purposes of SKIP_PAGES_THRESHOLD.
Relfrozenxid advancement doesn't benefit from visiting all-frozen pages just
because there happen to be fewer than SKIP_PAGES_THRESHOLD of them in a row.
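
For reference, the rule on HEAD is shaped like this (simplified from
lazy_scan_heap(); SKIP_PAGES_THRESHOLD is 32):

    /* only skip a run of all-visible pages when it is long enough that
     * skipping doesn't defeat the kernel's sequential read-ahead */
    if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
        skipping_blocks = true;
    else
        skipping_blocks = false;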


> Thanks again for the review!

NP, I think we need a lot of improvements in this area.

I wish somebody would tackle merging heap_page_prune() with vacuuming,
primarily so we only write a single WAL record, but also because the
separation has caused a *lot* of complexity.  I already have more projects
than I should, otherwise I'd start on it...

Greetings,

Andres Freund


