Re: The Free Space Map: Problems and Opportunities - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: The Free Space Map: Problems and Opportunities |
Date | |
Msg-id | CAH2-Wz=zEV4y_wxh-A_EvKxeAoCMdquYMHABEh_kZO1rk3a-gw@mail.gmail.com Whole thread Raw |
In response to | Re: The Free Space Map: Problems and Opportunities (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: The Free Space Map: Problems and Opportunities
|
List | pgsql-hackers |
On Fri, Aug 20, 2021 at 1:30 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Aug 20, 2021 at 3:40 PM Peter Geoghegan <pg@bowt.ie> wrote: > > My concern here is really the data structure and its overall > > complexity. > I don't think I understand the data structure that you have in mind > well enough to comment intelligently. Right now my prototype has a centralized table in shared memory, with a hash table. One entry per relation, generally multiple freelists per relation. And with per-freelist metadata such as owner and original leader backend XID values. Plus of course the lists of free blocks themselves. The prototype already clearly fixes the worst problems with BenchmarkSQL, but that's only one of my goals. That's just the starting point. I appreciate your input on this. And not just on the implementation details -- the actual requirements themselves are still in flux. This isn't so much a project to replace the FSM as it is a project that adds a new rich abstraction layer that goes between access methods and smgr.c -- free space management is only one of the responsibilities. Access methods like heapam can talk to the new abstraction layer, sometimes exploiting semantics from the logical database -- owning transaction IDs for a given lease, things like that. The new abstraction may thereby help us when working on other problems, including problems in areas that might seem totally unrelated. That's the big idea. Here are some (admittedly quite speculative) examples of places that the infrastructure could help with: * Parallel INSERT/COPY. I imagine parallel DML is hard because of coordination problems. With the infrastructure I have in mind, a parallel DML implementation could request a large free space lease, and distribute blocks from that lease among parallel workers during execution, on demand. This naturally avoids unnecessary buffer lock contention among concurrent writes from the same transaction (but different workers). But that's not all. I'm also thinking of things like LWLock/buffer lock deadlock avoidance -- issues of fundamental correctness. That stuff becomes easier, too. Individual parallel workers from a parallel INSERT could safely assume that the blocks that the parallel leader tells them about really is only going to be accessed by them, for as long as the owning transaction is around (barring read-only access by VACUUM, things like that). There is simply no question of any other backend/parallel worker thinking that it has the right to use the same block before the transaction commits. The transaction-scoped ownership semantics must be truly robust for this to work at all, but that's a good idea anyway. * Truncate tables without a disruptive relation-level lock. AFAICT it isn't actually logically necessary to get a heavyweight relation extension lock to truncate a relation during VACUUM. The only reason we need that right now is because it's really hard to nail down which backends might have a usable BlockNumber that they still expect to not point past the end of the physical relfilenode (or much worse, point to new and logically unrelated data). Perhaps we can solve the underlying problem using something like Lanin & Shasha's "drain technique", which is used inside nbtree for page deletion today. The general idea is to defer the physical unlink/recycling until we know for sure that any backends that might have had unsafe references have gone away. This is a very broad technique. * Truncate-away empty heap table segments that happen to come before other segments that still have useful data. * Eager cleanup of aborted transactions by VACUUM (or some variation of VACUUM that just handles aborts). Again, we're using information about the logical database to solve problem with the physical database -- that information is what we seem to currently be missing in all these areas. That's really the central idea behind the new infrastructure. -- Peter Geoghegan
pgsql-hackers by date: