Re: The Free Space Map: Problems and Opportunities - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: The Free Space Map: Problems and Opportunities |
Date | |
Msg-id | CAH2-WzmH_MvhNGWebV9O08t+rikDJW=OZ+fdHE7pmxVsP8GnbQ@mail.gmail.com |
In response to | Re: The Free Space Map: Problems and Opportunities (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: The Free Space Map: Problems and Opportunities |
List | pgsql-hackers |
On Fri, Aug 20, 2021 at 7:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I very much doubt that you can get away without some sort of free
> space map. Even if in most cases most pages are closed to insertions,
> there will be important corner cases where lots of pages are open for
> insertions, like when you just deleted a ton of rows and then ran
> VACUUM. And we cannot lose track of even one of those open pages or,
> if the pre-8.4 state of the world is any indication, we will be super
> sad.

I agree with all that. The point I was making is that this new FSM design will have an on-disk size that is "a function of the workload". The new FSM can be required to store information about every single page, but that is the worst case, and a rather atypical one. I imagine that we'll find that the new FSM on-disk structure stores far less information than the current FSM in most cases, even though we're operating within the confines of what you've said.

I think of this whole area as making heap pages a bit like B-Tree leaf pages. In the other DB systems that I have referenced, TIDs are stable logical identifiers of rows (which happen to have a physical component, a block number), so the heap pages from these systems are intrinsically more like B-Tree leaf pages than those from Postgres heapam. ISTM that that's relevant to total space utilization. Users with sparse deletion patterns in their heap structure will get low space utilization, an issue that we're familiar with as a problem for B-Tree indexing.

I don't think that having a smaller on-disk FSM should be a goal of this project (though I suppose you could say that that aspect enables FSM WAL-logging, which would be nice). Smaller on-disk footprints seem like a natural consequence of this whole direction; that's all. I also don't think that you're going to see huge space utilization benefits. This project is mostly aimed at reducing fragmentation and the many problems that stem from it.
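To make the "function of the workload" point concrete, here is a minimal Python sketch under stated assumptions. It contrasts a dense map that spends roughly one category byte per heap page (approximately what the current FSM's leaf level does, ignoring its upper tree levels) with a hypothetical sparse structure that only tracks pages open for insertions. `SparseOpenPageMap`, its 5-byte entry encoding, and the 1% open-page ratio in the typical case are illustrative assumptions, not part of any proposed design.

```python
from dataclasses import dataclass, field

# Assumption: the current FSM spends roughly one category byte per heap page
# at its leaf level (upper tree levels are ignored here).
DENSE_BYTES_PER_PAGE = 1


@dataclass
class SparseOpenPageMap:
    """Hypothetical map that stores entries only for pages open to insertions."""

    entry_bytes: int = 5  # assumed encoding: 4-byte block number + 1-byte category
    open_pages: dict[int, int] = field(default_factory=dict)

    def open_page(self, blkno: int, category: int) -> None:
        self.open_pages[blkno] = category

    def close_page(self, blkno: int) -> None:
        # Closed pages are not tracked at all; closing an untracked page is a no-op.
        self.open_pages.pop(blkno, None)

    def size_bytes(self) -> int:
        return len(self.open_pages) * self.entry_bytes


def dense_size_bytes(nblocks: int) -> int:
    """Size of a map that records something for every heap page."""
    return nblocks * DENSE_BYTES_PER_PAGE


nblocks = 100_000

# Typical case (assumed): only 1% of heap pages are open for insertions.
typical = SparseOpenPageMap()
for blkno in range(0, nblocks, 100):
    typical.open_page(blkno, category=3)

# Worst case: every page is open, e.g. after a mass DELETE followed by VACUUM.
worst = SparseOpenPageMap()
for blkno in range(nblocks):
    worst.open_page(blkno, category=9)

print(typical.size_bytes(), dense_size_bytes(nblocks), worst.size_bytes())
# prints: 5000 100000 500000
```

The sparse map's size tracks the number of open pages rather than the size of the table; under this naive per-entry encoding, the atypical worst case even exceeds the dense map.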
> I don't know
> if those are exactly the right boundaries, and 10 categories might be
> worse than 8 or 16, but I think it's likely correct to suppose that
> (a) we don't really care at all how much space is present in closed
> pages, and (b) for open pages, exactitude is most important when the
> amount of available space is small.

I really don't have a detailed opinion on the appropriate number of categories just yet, except that it should be maybe 16 or 20 at the very most. Only real testing is likely to help me refine my thinking on that. Note that the paper "Towards Effective and Efficient Free Space Management" recommends logarithmic intervals (called "buckets"), with 15 buckets in total; details are in section 4, "Implementing Object Placement". I think that it's quite possible that the final scheme will not use a linear scale. We may also have to account for fill factor settings.

-- Peter Geoghegan
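As a rough illustration of the two properties in the quoted text, here is a minimal Python sketch, assuming 8 KB pages, of one possible logarithmic scheme with 15 buckets. Closed pages get no bucket at all, per (a), and bucket widths double as free space grows, so resolution is finest when little space is left, per (b). The power-of-two boundaries, the helper names, and the fill factor adjustment (simply subtracting the space that fillfactor reserves for updates) are assumptions for illustration, not the paper's exact scheme or a proposed patch. `linear_category` only roughly approximates the current FSM's linear scale (256 one-byte categories in 32-byte steps on 8 KB pages) for comparison.

```python
BLCKSZ = 8192         # default PostgreSQL block size
NUM_LOG_BUCKETS = 15  # bucket count from the paper; the boundaries below are assumed


def log_bucket(free_bytes: int) -> int:
    """Map free bytes on a page to a logarithmic bucket in 0..14.

    Bucket 0 means no free space; bucket k >= 1 covers [2**(k-1), 2**k),
    so intervals are narrowest when little space is available.
    """
    if not 0 <= free_bytes <= BLCKSZ:
        raise ValueError(f"free_bytes must be between 0 and {BLCKSZ}")
    return free_bytes.bit_length()  # 8192 == 2**13 lands alone in bucket 14


def usable_free_space(free_bytes: int, fillfactor: int = 100) -> int:
    """Hypothetical: discount the space that fillfactor reserves for updates."""
    reserved = BLCKSZ * (100 - fillfactor) // 100
    return max(free_bytes - reserved, 0)


def open_page_bucket(free_bytes: int, is_open: bool, fillfactor: int = 100):
    """Closed pages get no bucket: we don't care how much space they have."""
    if not is_open:
        return None
    return log_bucket(usable_free_space(free_bytes, fillfactor))


def linear_category(free_bytes: int) -> int:
    """For contrast: roughly like the current FSM's 32-byte steps, not its exact rules."""
    return min(free_bytes // (BLCKSZ // 256), 255)


samples = (0, 10, 30, 100, 1000, 8192)
print([log_bucket(n) for n in samples])       # [0, 4, 5, 7, 10, 14]
print([linear_category(n) for n in samples])  # [0, 0, 0, 3, 31, 255]
print(open_page_bucket(3000, is_open=True, fillfactor=90))  # 12
print(open_page_bucket(3000, is_open=False))                # None
```

With these assumed boundaries, the linear scale puts 10 and 30 bytes of free space in the same category while the logarithmic one tells them apart, at the cost of coarse buckets near an empty page.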