Re: The Free Space Map: Problems and Opportunities - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: The Free Space Map: Problems and Opportunities
Msg-id: CA+TgmoaGAD9DcDj9M374OfbXMwCurcyooAL201j041VgVJ09rw@mail.gmail.com
In response to: Re: The Free Space Map: Problems and Opportunities (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Mon, Sep 6, 2021 at 8:29 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Sep 6, 2021 at 4:33 PM Hannu Krosing <hannuk@google.com> wrote:
> > When I have been thinking of this type of problem it seems that the
> > latest -- and correct :) -- place which should do all kinds of
> > cleanup like removing aborted tuples, freezing committed tuples and
> > setting any needed hint bits would be background writer or CHECKPOINT.
> >
> > This would be more PostgreSQL-like, as it moves any work not
> > immediately needed from the critical path, as an extension of how MVCC
> > for PostgreSQL works in general.
>
> I think it depends. There is no need to do work in the background
> here, with TPC-C. With my patch series each backend can know that it
> just had an aborted transaction that affected a page that it more or
> less still owns. And has very close at hand, for further inserts. It's
> very easy to piggy-back the work once you have that sense of ownership
> of newly allocated heap pages by individual backends/transactions.

Doing work in the background has some advantages, though. In particular, it has the possibly-large advantage of not slowing down foreground work.

For me the key insight here is that HOT-pruning a heap page is a lot cheaper if you do it before you write the page. Once you've written the page, the eventual HOT-prune is going to need to dirty it, which will cause it to be written again. If you prune before writing it the first time, that cost is avoided.

I'm not sure that it really matters whether the space consumed by aborted tuples gets reused by "the very next transaction" or, say, 10 transactions after that, or even 1000 transactions after that. If you wait for a million transactions, you've quite possibly lost enough temporal locality to matter, but at 10 or 1000 that's much less likely. The exact threshold is fuzzy: every moment you wait makes it less likely that you have sufficient locality, but you can always find a workload where even a very long wait is acceptable, and another one where even a tiny delay is catastrophic, and it's hard to say what the "real world" looks like.

On the other hand, there's nothing fuzzy about the expense incurred by writing the page before it's HOT-pruned. That is essentially certain to incur an extra page write, except in the corner case where the relation gets dropped or truncated before then. So I think that if you found a way to ensure that HOT-pruning -- and entry into the FSM -- always happens for every heap page just before it's written, whenever it hasn't already happened sooner and might be needed, you might end up in a pretty good spot. It wouldn't even be ignoring temporal locality, since at minimum dirty pages are written once per checkpoint cycle.

--
Robert Haas
EDB: http://www.enterprisedb.com
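
To make the extra-write arithmetic in the message above concrete, here is a toy C model of the two orderings. The Page struct and the write_page/hot_prune functions are invented for illustration only; they are not PostgreSQL's actual buffer-manager API, just a sketch of the cost argument.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of a heap page's I/O lifecycle; illustrative only, not
   PostgreSQL's real data structures. */
typedef struct
{
    bool dirty;   /* page has unwritten changes */
    bool pruned;  /* page has already been HOT-pruned */
    int  writes;  /* number of times the page was written out */
} Page;

/* Flush the page if it's dirty (think: checkpoint or background writer). */
static void write_page(Page *p)
{
    if (p->dirty)
    {
        p->writes++;
        p->dirty = false;
    }
}

/* HOT-prune the page; pruning modifies it, so it becomes dirty. */
static void hot_prune(Page *p)
{
    if (!p->pruned)
    {
        p->pruned = true;
        p->dirty = true;
    }
}

int main(void)
{
    /* Ordering 1: prune just before the first write. */
    Page a = { .dirty = true, .pruned = false, .writes = 0 };
    hot_prune(&a);
    write_page(&a);   /* one write covers both the inserts and the prune */
    write_page(&a);   /* page is clean: no-op */

    /* Ordering 2: write first, prune later. */
    Page b = { .dirty = true, .pruned = false, .writes = 0 };
    write_page(&b);   /* first write */
    hot_prune(&b);    /* re-dirties the already-written page */
    write_page(&b);   /* second write: the avoidable one */

    printf("prune-then-write: %d write(s)\n", a.writes);   /* prints 1 */
    printf("write-then-prune: %d write(s)\n", b.writes);   /* prints 2 */
    return 0;
}

Unless the relation is dropped or truncated before the second flush, the write-then-prune ordering always pays the extra write; that is the "nothing fuzzy" part of the argument.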