Re: The Free Space Map: Problems and Opportunities - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: The Free Space Map: Problems and Opportunities
Msg-id CAH2-WzmhPE_Awanhh+52vj9N84L1mD2hSrm7HzJTfDoTgZ5DMA@mail.gmail.com
In response to Re: The Free Space Map: Problems and Opportunities  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: The Free Space Map: Problems and Opportunities
List pgsql-hackers

On Wed, Sep 8, 2021 at 8:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > I didn't mean to suggest that it had to happen in perfect lockstep.
> > But I do think that it should be pretty close to perfect. What I
> > actually do right now is prune an open page when it *appears* to be
> > full inside the loop in RelationGetBufferForTuple().
>
> That seems like a good idea.

> I don't know, I'm not really convinced that "much larger patches" that
> change a lot of loosely related things all at once are good for the
> project. It seems to me that there's a reasonably good chance of
> replacing an annoying set of problems that existing PostgreSQL users
> have worked around to some degree, knowingly or unknowingly, with a
> different annoying set of problems that may cause fewer or more
> problems in practice. Sometimes there's no way to improve something
> short of a giant project that changes a lot of things at the same
> time, but a series of incremental changes is a lot less risky.

But these things are *highly* related.

The RelationGetBufferForTuple() prune mechanism I described (that
targets aborted xact tuples and sets hint bits) is fundamentally built
on top of the idea of ownership of heap pages by backends/transactions
-- that was what I meant before. We *need* to have context. This isn't an
ordinary heap prune -- it doesn't have any of the prechecks to avoid
useless pruning that you see at the top of heap_page_prune_opt(). It's
possible that we won't be able to get a super-exclusive lock in the
specialized prune code path, but that's considered a rare corner case.
There is no question of concurrent inserters senselessly blocking the
prune, which is not at all true with the current approach to free
space management. So there is no way I could extract a minimal "prune
inside RelationGetBufferForTuple()" patch that would actually work.
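
To make the dependency on ownership concrete, here is a toy sketch in
Python. It is not PostgreSQL code and not the patch itself: `Page`,
`owner`, `prune_owned_page()` and `get_page_for_tuple()` are all
hypothetical names, the hint bit is reduced to a boolean, and buffer
locking (including the super-exclusive lock case) is not modeled at
all. It only illustrates the shape of the idea: because one backend
owns the open page, a prune that runs when the page *appears* full can
assume that no other inserter is competing for it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class XactStatus(Enum):
    IN_PROGRESS = "in progress"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class HeapTuple:
    xid: int              # inserting transaction (hypothetical model)
    size: int             # bytes consumed on the page
    hinted: bool = False  # stand-in for a "known committed" hint bit


@dataclass
class Page:
    capacity: int
    owner: Optional[int] = None  # backend that owns this open page, if any
    tuples: List[HeapTuple] = field(default_factory=list)

    def free_space(self) -> int:
        return self.capacity - sum(t.size for t in self.tuples)


def prune_owned_page(page: Page, xact_status: Dict[int, XactStatus]) -> None:
    """Remove tuples from aborted xacts; set the hint on committed ones."""
    kept = []
    for tup in page.tuples:
        status = xact_status[tup.xid]
        if status is XactStatus.ABORTED:
            continue  # space from aborted inserts is reclaimed right away
        if status is XactStatus.COMMITTED:
            tup.hinted = True
        kept.append(tup)
    page.tuples = kept


def get_page_for_tuple(page: Page, backend: int, xid: int, size: int,
                       xact_status: Dict[int, XactStatus]) -> bool:
    """Try to place a tuple on `page`; return True on success."""
    if page.owner not in (None, backend):
        return False  # owned by another backend: caller must look elsewhere
    page.owner = backend
    if page.free_space() < size:
        # The page only *appears* full; prune before giving up on it.
        prune_owned_page(page, xact_status)
    if page.free_space() < size:
        return False  # genuinely full, even after pruning
    page.tuples.append(HeapTuple(xid=xid, size=size))
    return True
```

Without the `owner` check, the prune step would have to cope with
other inserters touching the same page, which is exactly the part that
cannot be separated out into a standalone patch.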

Systems that follow ARIES closely and have UNDO *must* treat free
space as a qualitative thing, something that is meaningful only with
associated information about a deleting or inserting transaction, and
its status. There is logical UNDO for the free space management
structure, and even getting free space from a page can involve
heavyweight locking. Postgres works differently, but there is no
reason why Postgres should not do a lightweight approximate version of
the same thing - the laws of physics favor carefully grouping
logically related data, and working to keep the physical database
representation as clean a representation of the logical database as
possible, right from the start.
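
As a rough illustration of what "qualitative" free space could mean,
here is another toy Python sketch. Again, everything in it is
hypothetical (`FreeSpaceEntry`, `freed_by`, `usable_free_space()`),
and the rule it applies is a simplifying assumption of mine rather
than a description of any particular system: space freed by a delete
only counts once the deleting transaction has committed. While that
transaction is in progress the space is reserved for a possible
rollback, and if it aborted the space was never really free.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class XactStatus(Enum):
    IN_PROGRESS = "in progress"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass(frozen=True)
class FreeSpaceEntry:
    nbytes: int    # how much space the delete released
    freed_by: int  # xid of the deleting transaction (hypothetical field)


def raw_free_space(entries: Iterable[FreeSpaceEntry]) -> int:
    """Purely quantitative view: add up bytes, ignore who freed them."""
    return sum(e.nbytes for e in entries)


def usable_free_space(entries: Iterable[FreeSpaceEntry],
                      xact_status: Dict[int, XactStatus]) -> int:
    """Qualitative view: count only space whose deleter has committed."""
    return sum(
        e.nbytes
        for e in entries
        if xact_status[e.freed_by] is XactStatus.COMMITTED
    )
```

The gap between the two functions is the information that a purely
quantitative free space structure throws away.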

> > It seems to me that this leaves one harder question unanswered: at
> > what point does a "medium sized" transaction become so large that it
> > just doesn't make sense to do either? What's the crossover point at
> > which background processing and foreground processing like this should
> > be assumed to be not worth it? That I won't speculate about just yet.
> > I suspect that at some point it really does make sense to leave it all
> > up to a true table-level batch operation, like a conventional VACUUM.
>
> I doubt it makes sense to define a limit here explicitly. At some
> point strategies will naturally start to fail, e.g. prune-before-evict
> won't work once the operation becomes large enough that pages have to
> be evicted while the transaction is still running.

Perhaps. As you know, I'm generally in favor of letting things fail
naturally, and then falling back on an alternative strategy.


--
Peter Geoghegan


