Re: Problems with the FSM, heap fillfactor, and temporal locality - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Problems with the FSM, heap fillfactor, and temporal locality
Date
Msg-id 20200825132148.GC29590@tamriel.snowman.net
Whole thread Raw
In response to Re: Problems with the FSM, heap fillfactor, and temporal locality  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Problems with the FSM, heap fillfactor, and temporal locality
List pgsql-hackers
Greetings,

* Peter Geoghegan (pg@bowt.ie) wrote:
> On Mon, Aug 24, 2020 at 6:38 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
> > Other ideas?
>
> I've been experimenting with changing the way that we enforce heap
> fill factor with calls to heap_insert() (and even heap_update()) that
> happen to occur at a "natural temporal boundary". This works by
> remembering an XID alongside the target block in the relcache when the
> target block is set. When we have an existing target block whose XID
> does not match our backend's current XID (i.e. it's an old XID for the
> backend), then that means we're at one of these boundaries. We require
> that the page has a little more free space before we'll insert on it
> when at a boundary. If we barely have enough space to insert the
> incoming heap tuple, and it's the first of a few tuples the
> transaction will ultimately insert, then we should start early on a
> new page instead of using the last little bit of space (note that the
> "last little bit" of space does not include the space left behind by
> fill factor). The overall effect is that groups of related tuples are
> much less likely to span a heap page boundary unless and until we have
> lots of updates -- though maybe not even then. I think that it's very
> common for transactions to insert a group of 2 - 15 logically related
> tuples into a table at a time.

This all definitely sounds quite interesting and the idea to look at the
XID to see if we're in the same transaction and therefore likely
inserting a related tuple certainly makes some sense.  While I get that
it might not specifically work with TPC-C, I'm wondering about if we
could figure out how to make a multi-tuple INSERT use
heap/table_multi_insert (which seems to only be used by COPY currently,
and internally thanks to the recent work to use it for some catalog
tables) and then consider the size of the entire set of tuples being
INSERT'd when working to find a page, or deciding if we should extend
the relation.

> Roughly speaking, you can think of this as the heapam equivalent of
> the nbtree page split choice logic added by commit fab25024. We ought
> to go to at least a little bit of effort to minimize the number of
> distinct XIDs that are present on each heap page (in the tuple
> headers). We can probably figure out heuristics that result in
> respecting heap fill factor on average, while giving inserts (and even
> non-HOT updates) a little wiggle room when it comes to heap page
> boundaries.

Agreed.

> By applying both of these techniques together (locality/page split
> thing and real atomic ops for fp_next_slot) the prototype patch I'm
> working on mostly restores the system's current ability to reuse space
> (as measured by the final size of relations when everything is done),
> while maintaining most of the performance benefits of not using the
> FSM at all. The code is still pretty rough, though.
>
> I haven't decided how far to pursue this. It's not as if there are
> that many ways to make TPC-C go 5%+ faster left; it's very
> write-heavy, and stresses many different parts of the system all at
> once. I'm sure that anything like my current prototype patch will be
> controversial, though. Maybe it will be acceptable if we only change
> the behavior for people that explicitly set heap fillfactor.

Getting a 5% improvement is pretty exciting, very cool and seems worth
spending effort on.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: [PATCH] Resource leaks (src/backend/libpq/hba.c)
Next
From: Heikki Linnakangas
Date:
Subject: Re: Refactor pg_rewind code and make it work against a standby