Re: The Free Space Map: Problems and Opportunities - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: The Free Space Map: Problems and Opportunities |
Date | |
Msg-id | CAH2-WzmhPE_Awanhh+52vj9N84L1mD2hSrm7HzJTfDoTgZ5DMA@mail.gmail.com |
In response to | Re: The Free Space Map: Problems and Opportunities (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: The Free Space Map: Problems and Opportunities |
List | pgsql-hackers |
On Wed, Sep 8, 2021 at 8:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > I didn't mean to suggest that it had to happen in perfect lockstep.
> > But I do think that it should be pretty close to perfect. What I
> > actually do right now is prune an open page when it *appears* to be
> > full inside the loop in RelationGetBufferForTuple().
>
> That seems like a good idea.

> I don't know, I'm not really convinced that "much larger patches" that
> change a lot of loosely related things all at once are good for the
> project. It seems to me that there's a reasonably good chance of
> replacing an annoying set of problems that existing PostgreSQL users
> have worked around to some degree, knowing or unknowingly, with a
> different annoying set of problems that may cause fewer or more
> problems in practice. Sometimes there's no way to improve something
> short of a giant project that changes a lot of things at the same
> time, but a series of incremental changes is a lot less risky.

But these things are *highly* related.

The RelationGetBufferForTuple() prune mechanism I described (that targets aborted xact tuples and sets hint bits) is fundamentally built on top of the idea of ownership of heap pages by backends/transactions -- that was what I meant before. We *need* to have context. This isn't an ordinary heap prune -- it doesn't have any of the prechecks to avoid useless pruning that you see at the top of heap_page_prune_opt(). It's possible that we won't be able to get a super-exclusive lock in the specialized prune code path, but that's considered a rare corner case. There is no question of concurrent inserters senselessly blocking the prune, which is not at all true with the current approach to free space management. So there is no way I could extract a minimal "prune inside RelationGetBufferForTuple()" patch that would actually work.
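
To make the dependency on page ownership concrete, here is a toy model of "prune an owned page when it appears full, then retry". It is only a sketch under loud assumptions: ToyPage, toy_insert() and every other name below are hypothetical, fixed slots stand in for real free space accounting, and none of this is PostgreSQL code or the patch under discussion.

```c
/* Toy model only: all names are hypothetical, nothing here is a PostgreSQL API. */

#define PAGE_SLOTS 4

typedef enum { SLOT_FREE, SLOT_LIVE, SLOT_ABORTED } SlotState;

typedef struct
{
    int       owner;              /* backend that owns the page, -1 if none */
    SlotState slots[PAGE_SLOTS];  /* one entry per tuple slot */
} ToyPage;

typedef enum
{
    INSERT_OK,              /* placed without any pruning */
    INSERT_OK_AFTER_PRUNE,  /* page looked full, prune made room */
    INSERT_NEED_NEW_PAGE    /* caller must find or extend to another page */
} InsertResult;

/* Number of slots currently free on the page. */
static int
toy_free_slots(const ToyPage *page)
{
    int nfree = 0;

    for (int i = 0; i < PAGE_SLOTS; i++)
        if (page->slots[i] == SLOT_FREE)
            nfree++;
    return nfree;
}

/* Put a tuple in the first free slot; return its index, or -1 if none. */
static int
toy_place(ToyPage *page)
{
    for (int i = 0; i < PAGE_SLOTS; i++)
    {
        if (page->slots[i] == SLOT_FREE)
        {
            page->slots[i] = SLOT_LIVE;
            return i;
        }
    }
    return -1;
}

/* Reclaim slots left behind by aborted transactions; return how many. */
static int
toy_prune_aborted(ToyPage *page)
{
    int reclaimed = 0;

    for (int i = 0; i < PAGE_SLOTS; i++)
    {
        if (page->slots[i] == SLOT_ABORTED)
        {
            page->slots[i] = SLOT_FREE;
            reclaimed++;
        }
    }
    return reclaimed;
}

/*
 * Insert into a page on behalf of "backend".  The prune step runs only
 * for the owning backend, and only when the page appears to be full.
 */
static InsertResult
toy_insert(ToyPage *page, int backend)
{
    if (page->owner != backend)
        return INSERT_NEED_NEW_PAGE;

    if (toy_place(page) >= 0)
        return INSERT_OK;

    /* Page appears full: prune once, then retry the placement. */
    if (toy_prune_aborted(page) > 0 && toy_place(page) >= 0)
        return INSERT_OK_AFTER_PRUNE;

    return INSERT_NEED_NEW_PAGE;
}
```

In this model the ownership check is what guarantees that no other inserter can be using the page while it is pruned. Take that check away and the prune step has nothing to stand on, which is the sense in which the two ideas do not separate cleanly.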

Systems that follow ARIES closely and have UNDO *must* treat free space as a qualitative thing, something that is meaningful only with associated information about a deleting or inserting transaction, and its status. There is logical UNDO for the free space management structure, and even getting free space from a page can involve heavyweight locking. Postgres works differently, but there is no reason why Postgres should not do a lightweight approximate version of the same thing - the laws of physics favor carefully grouping logically related data, and working to keep the physical database representation as clean a representation of the logical database as possible, right from the start.

> > It seems to me that this leaves one harder question unanswered: at
> > what point does a "medium sized" transaction become so large that it
> > just doesn't make sense to do either? What's the crossover point at
> > which background processing and foreground processing like this should
> > be assumed to be not worth it? That I won't speculate about just yet.
> > I suspect that at some point it really does make sense to leave it all
> > up to a true table-level batch operation, like a conventional VACUUM.
>
> I doubt it makes sense to define a limit here explicitly. At some
> point strategies will naturally start to fail, e.g. prune-before-evict
> won't work once the operation becomes large enough that pages have to
> be evicted while the transaction is still running.

Perhaps. As you know I'm generally in favor of letting things fail naturally, and then falling back on an alternative strategy.

--
Peter Geoghegan