Re: The Free Space Map: Problems and Opportunities - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: The Free Space Map: Problems and Opportunities |
Date | |
Msg-id | CA+TgmoYiqJoDjhkdna+D6QcNpYkRUGVzZN8RbctmGDH7g2gGmw@mail.gmail.com Whole thread Raw |
In response to | Re: The Free Space Map: Problems and Opportunities (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: The Free Space Map: Problems and Opportunities
|
List | pgsql-hackers |
On Wed, Aug 18, 2021 at 1:05 PM Peter Geoghegan <pg@bowt.ie> wrote: > Maybe there isn't even a conventional FSM in this new world. Maybe > free space management works by focusing on recent events, and on > outcomes. This means that we store much less information about the > entire database, and much more information about very recent events. > In particular, I hope that free space management won't have to care > about how much free space most individual heap pages have. Perhaps > most heap pages are simply "closed", unavailable for reuse. I very much doubt that you can get away without some sort of free space map. Even if in most cases most pages are closed to insertions, there will be important corner cases where lots of pages are open for insertions, like when you just deleted a ton of rows and then ran VACUUM. And we cannot lose track of even one of those open pages or, if the pre-8.4 state of the world is any indication, we will be super sad. I suppose technically it doesn't need to be a map; it could be structured some other way. But it is going to have to be able to keep track of an arbitrary number of open pages without losing any, and a map is one good way of doing that. Regarding the FSM_CATEGORIES question, I think the issue is big tuples vs. small tuples. You don't want to keep on finding pages with some free space and then find out that they don't actually have enough for the tuple you want to insert. Now that does become less of an issue with this open/closed system because, at least initially, when a page is re-opened to inserts, more than a quarter of the page will be free (I presume) and so the details don't matter. But, as it gets closer to filling up, it might matter again. Perhaps we don't want to waste time looking through a bunch of pages with 1kB free when we're trying to insert a 2kB tuple. We could close all those pages as we try them, to prevent repeating the same process over and over again, but maybe the 2kB tuple we're now trying to insert is unusual, and most tuples are 40 bytes. Then we look silly. One idea might be to have a CLOSED state in the FSM and then a variety of OPEN states that are distinguished by amount of available free space. For example, imagine something like CLOSED, OPEN (>2kb), OPEN (1.5-2kb), OPEN (1-1.5kB), OPEN (768-1023 byte), OPEN (512-767 bytes), OPEN (256-511 bytes), OPEN (128-255 bytes), OPEN (64-127 bytes), OPEN (<64 bytes). I think that would give us enough information to pack open pages tightly full before having them become closed. I don't know if those are exactly the right boundaries, and 10 categories might be worse than 8 or 16, but I think it's likely correct to suppose that (a) we don't really care at all how much space is present in closed pages, and (b) for open pages, exactitude is most important when the amount of available space is small. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: