Re: The Free Space Map: Problems and Opportunities - Mailing list pgsql-hackers

From Robert Haas
Subject Re: The Free Space Map: Problems and Opportunities
Date
Msg-id CA+TgmoaGAD9DcDj9M374OfbXMwCurcyooAL201j041VgVJ09rw@mail.gmail.com
In response to Re: The Free Space Map: Problems and Opportunities  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: The Free Space Map: Problems and Opportunities
List pgsql-hackers
On Mon, Sep 6, 2021 at 8:29 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Sep 6, 2021 at 4:33 PM Hannu Krosing <hannuk@google.com> wrote:
> > When I have been thinking of this type of problem it seems that the
> > latest -- and correct :) --  place which should do all kinds of
> > cleanup like removing aborted tuples, freezing committed tuples and
> > setting any needed hint bits would be background writer or CHECKPOINT.
> >
> > This would be more PostgreSQL-like, as it moves any work not
> > immediately needed from the critical path, as an extension of how MVCC
> > for PostgreSQL works in general.
>
> I think it depends. There is no need to do work in the background
> here, with TPC-C. With my patch series each backend can know that it
> just had an aborted transaction that affected a page that it more or
> less still owns, and that it has very close at hand for further
> inserts. It's very easy to piggy-back the work once you have that
> sense of ownership of newly allocated heap pages by individual
> backends/transactions.

Doing work in the background has some advantages, though. In
particular, it has the possibly-large advantage of not slowing down
foreground work.

For me the key insight here is that HOT-pruning a heap page is a lot
cheaper if you do it before you write the page. Once you've written
the page, the eventual HOT-prune is going to need to dirty it, which
will cause it to be written again. If you prune before writing it the
first time, that cost is avoided. I'm not sure that it really matters
whether the space consumed by aborted tuples gets reused by "the very
next transaction" or, say, 10 transactions after that, or even 1000
transactions after that. If you wait for a million transactions,
you've quite possibly lost enough temporality to matter, but at 10 or
1000 that's much less likely. The exact threshold is fuzzy: every
moment you wait makes it less likely that you have sufficient
locality, but you can always find a workload where even a very long
wait is acceptable, and another one where even a tiny delay is
catastrophic, and it's hard to say what the "real world" looks like.
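The write-amplification argument can be made concrete with a toy cost model. This is purely illustrative (not PostgreSQL code; the Page class and its methods are invented for the sketch): a page dirtied by inserts gets written out, then a later HOT-prune dirties it again and forces a second write, whereas pruning while the page is still dirty folds both changes into a single write.

```python
# Toy model of write amplification for a single heap page.
# All names here are hypothetical, for illustration only.

class Page:
    def __init__(self):
        self.dirty = False
        self.pruned = False
        self.writes = 0          # count of physical write-outs

    def insert_tuples(self):
        self.dirty = True        # inserts (some aborted) dirty the page

    def prune(self):
        if not self.pruned:
            self.dirty = True    # pruning modifies the page contents
            self.pruned = True

    def write_out(self):
        if self.dirty:           # only dirty pages need writing
            self.writes += 1
            self.dirty = False


def prune_after_write():
    """Page is written first, pruned later: two writes."""
    p = Page()
    p.insert_tuples()
    p.write_out()   # first write, dead tuples still on the page
    p.prune()       # eventual prune re-dirties the page
    p.write_out()   # second write
    return p.writes


def prune_before_write():
    """Prune piggy-backs on the already-dirty page: one write."""
    p = Page()
    p.insert_tuples()
    p.prune()       # page is already dirty, so this is nearly free
    p.write_out()   # single write covers inserts and prune
    return p.writes
```

Running both paths shows the extra write is essentially certain in the first case, which is the "nothing fuzzy" part of the cost, in contrast to the fuzzy temporal-locality threshold.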

On the other hand, there's nothing fuzzy about the expense incurred by
writing the page before it's HOT-pruned. That is essentially certain
to incur an extra page write, except in the corner case where the
relation gets dropped or truncated before then. So I think that if you
found a way to ensure that HOT-pruning -- and entry into the FSM --
always happens for every heap page just before it's written, whenever
it hasn't already happened sooner and might be needed, you might end
up in a pretty good spot. It wouldn't even be ignoring
temporal locality, since at minimum dirty pages are written once per
checkpoint cycle.
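The suggested policy -- prune and record free space just before a dirty page is written, if that hasn't happened already -- might be sketched like this. All names and the callback structure are invented for illustration; this is not the background writer's actual interface:

```python
# Hypothetical sketch: a flush loop that prunes each dirty page and
# records its free space in the FSM immediately before writing it out,
# so the prune never costs a separate write.

def flush_dirty_pages(pages, prune, record_free_space, write):
    """pages: list of dicts with 'dirty' and 'needs_prune' flags.
    prune/record_free_space/write: callbacks standing in for the
    corresponding storage-layer operations (names are assumptions)."""
    for page in pages:
        if page.get("dirty"):
            if page.get("needs_prune"):
                prune(page)              # reclaim dead/aborted tuples
                record_free_space(page)  # make the space FSM-visible
            write(page)                  # single write covers both
            page["dirty"] = False
```

In this sketch the FSM entry rides along with the write the checkpoint was going to do anyway, which is why the approach still preserves some temporal locality: dirty pages are written at least once per checkpoint cycle.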

-- 
Robert Haas
EDB: http://www.enterprisedb.com


