Re: Problems with the FSM, heap fillfactor, and temporal locality - Mailing list pgsql-hackers

From John Naylor
Subject Re: Problems with the FSM, heap fillfactor, and temporal locality
Date
Msg-id CACPNZCsg6SprM_aQ59gKBBswUXAq7NLFxkNTr870oUXdWDq01A@mail.gmail.com
Whole thread Raw
In response to Re: Problems with the FSM, heap fillfactor, and temporal locality  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Problems with the FSM, heap fillfactor, and temporal locality
List pgsql-hackers
On Fri, Aug 21, 2020 at 8:53 PM Peter Geoghegan <pg@bowt.ie> wrote:

> Note that there is a ~20% reduction in blks_hit here, even though the
> patch does ~1% more transactions (the rate limiting doesn't work
> perfectly). There is also a ~5.5% reduction in aggregate
> blk_read_time, and a ~9% reduction in blk_write_time. I'd say that
> that's pretty significant.

Indeed.

> Preferring close-by blocks in the FSM is something that there was
> discussion of when the current FSM implementation went in back in 8.4.

Right, I found a pretty long one here:

https://www.postgresql.org/message-id/flat/1253201179.9666.174.camel%40ebony.2ndQuadrant

> > I'm not sure how to distinguish blocks that have never reached
> > fillfactor vs. ones that did but haven't gained back enough free space
> > to be considered again. Naively, we could start by assuming the last
> > block can always be filled up to fillfactor, but earlier blocks must
> > use the stricter rule. That's easy since we already try the last block
> > anyway before extending the relation.
>
> I was thinking of explicitly marking blocks as "freeable", meaning
> that the FSM will advertise their free space. This isn't self-evident
> from the amount of free space on the page alone, since we need to
> distinguish between at least two cases: the case where a page has yet
> to apply fill factor for the first time (which may still be close to
> fillfactor% full) versus the case where the page did reach fillfactor,
> but then had a small amount of space freed. I think that the FSM ought
> to give out space in the former case, but not in the latter case. Even
> though an identical amount of free space might be present in either
> case.

I imagine you're suggesting to make this change in the FSM data? I'm
thinking we could change the category byte to a signed integer, and
reduce FSM_CATEGORIES to 128. (That gives us 64-byte granularity which
doesn't sound bad, especially if we're considering ignoring free space
for inserts until we get a couple kilobytes back.) The absolute value
represents the actual space. A negative number would always compare as
less, so use the negative range to mark the page as not usable. A
fresh page will have positive-numbered categories until fill_factor is
reached, at which point we flip the sign to negative. When the used
space gets below "min_fill_factor", flip the sign to mark it usable.
Updates would have to be taught to read the absolute value. Managing
the math might be a little tricky, but maybe that could be contained.

One possible disadvantage is that negative values would bubble up to
higher levels in the opposite way that positive ones do, but maybe we
don't care since the pages are marked unusable anyway. All we care is
that all negative numbers give the correct binary comparison when we
search for available space. We could also reserve the high bit as a
flag (1 = usable), and use the lower bits for the value, but I'm not
sure that's better.

We could also preserve the 255 categories as is, but add a second byte
for the flag. If we could imagine other uses for a new byte, this
might be good, but would make the FSM much bigger, which doesn't sound
attractive at all.

Any change of the FSM file would require pg_upgrade to rewrite the
FSM, but it still doesn't seem like a lot of code.

Other ideas?

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: [ASAN] Postgres14 (Windows 64 bits)
Next
From: Tom Lane
Date:
Subject: Re: stress test for parallel workers