
From: Peter Geoghegan
Subject: Problems with the FSM, heap fillfactor, and temporal locality
Msg-id: CAH2-WzkEP6wz2Eg7mgKbF-qTPi21+BWrJgNm0qfU5kf0iJBV2g@mail.gmail.com
I'm concerned about how the FSM gives out pages to heapam. Disabling
the FSM entirely helps TPC-C/BenchmarkSQL, which uses non-default heap
fillfactors for most tables [1]. Testing has shown that this actually
increases throughput for the benchmark (as measured in TPM) by 5% -
9%, even though my approach is totally naive and stupid. My approach
makes one or two small tables much bigger, but that doesn't have that
much downside for the workload in practice. My approach helps by
accidentally improving temporal locality -- related records are more
consistently located on the same block, which in practice reduces the
number of pages dirtied and the number of FPIs generated. TPC-C tends
to insert a set of related tuples together (e.g., the order lines from
a single order), and then to update all of those records together
later on.

Interestingly, the problems start before we even begin the benchmark
proper, and can be observed directly using simple ad-hoc queries (I
developed some simple SQL queries involving ctid for this -- an
example appears below). BenchmarkSQL's initial bulk loading is
performed by concurrent workers that insert related groups of tuples
into tables, so that we start out with a realistic number of old
orders to refer back to, etc. I can clearly observe that various
natural groupings (e.g., grouping order lines by order number,
grouping customers by district + warehouse) actually get initially
inserted in a way that leaves the tuples in a grouping spread around
an excessive number of heap blocks. For example, while most orders
have all of their order lines on one block, a significant minority of
orders have order lines that span two or more blocks on the master
branch. With the FSM artificially disabled, by contrast, the heap
relation looks more "pristine", in that related tuples are located on
the same block (or at least on neighboring blocks). One order's lines
might span two neighboring blocks there, but they never span three or
more blocks. Each order has 5 - 15 order lines, and so I was surprised
to see that a small minority of orders have order line tuples
occupying as many as 5 or 7 heap pages on the master branch (i.e. with
the current FSM intact during bulk loading).
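
Something like the following is enough to see the effect (the table
and column names here assume BenchmarkSQL's standard schema -- adjust
as needed):

    -- Distribution of how many distinct heap blocks each order's
    -- order lines occupy
    SELECT blocks, count(*) AS orders
    FROM (
        SELECT ol_w_id, ol_d_id, ol_o_id,
               count(DISTINCT (ctid::text::point)[0]) AS blocks
        FROM bmsql_order_line
        GROUP BY ol_w_id, ol_d_id, ol_o_id
    ) per_order
    GROUP BY blocks
    ORDER BY blocks;

On master the histogram has a long tail (orders whose lines span 5+
blocks); with the FSM disabled it is almost entirely 1s and 2s.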

The underlying cause of this "bulk inserts are surprisingly
indifferent to locality" issue seems to be that heapam likes to
remember small amounts of free space from earlier pages: when the
backend couldn't fit one last tuple on an earlier target page, the
leftover space gets recorded for later reuse (before a new page is
allocated and becomes the new relcache target page in turn). This is
penny-wise and pound-foolish, because we eagerly use up a little bit
of extra space in a case where we're deliberately leaving space behind
by applying a heap fill factor anyway. I think that we also have a
number of related problems.
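
You can watch the FSM collect these slivers of leftover space using
the pg_freespacemap contrib extension -- for example (again using the
BenchmarkSQL table as a stand-in):

    CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

    -- Heap blocks where the FSM has recorded only a small amount of
    -- leftover free space, which it will nevertheless hand out to
    -- unrelated inserts later on
    SELECT blkno, avail
    FROM pg_freespace('bmsql_order_line')
    WHERE avail BETWEEN 1 AND 500
    ORDER BY blkno
    LIMIT 20;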

It seems to me that we don't make enough effort to store related heap
tuples together -- both during bulk inserts like this, but also during
subsequent updates that cannot fit successor tuples on the same heap
page. The current design of the FSM seems to assume that it doesn't
matter where the free space comes from, as long as you get it from
somewhere and as long as fill factor isn't violated -- it cares about
the letter of the fill factor law without caring about its spirit or
intent.

If the FSM tried to get free space from a nearby block, then we might
at least see related updates that cannot fit a successor tuple on the
same block behave in a coordinated fashion: both updating backends
notice the same nearby free block, so both sets of successor tuples
end up going on the same mostly-empty block. The backends don't
actually coordinate, of course -- this just happens as a consequence
of looking for nearby free space.
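
"Nearby" could be as simple as preferring blocks within some window of
the block that the tuple being updated lives on. Conceptually (the
window size, block number, and space requirement below are all
placeholders):

    -- Conceptual locality-aware lookup: find free space within 16
    -- blocks of the updated tuple's block (block 1000 here)
    SELECT blkno, avail
    FROM pg_freespace('bmsql_order_line')
    WHERE blkno BETWEEN 1000 - 16 AND 1000 + 16
      AND avail >= 200   -- space needed for the successor tuple
    ORDER BY abs(blkno - 1000)
    LIMIT 1;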

The FSM should probably be taught to treat pages as free space
candidates (candidates to give out free space from) based on more
sophisticated, locality-aware criteria. The FSM should care about the
*rate of change* for a block over time. Suppose you have the heap fill
factor set to 90. Once a heap block reaches 90% full, it ought not to
be used to insert new tuples unless and until the used space on the
block shrinks *significantly* -- the free space is now supposed to be
reserved. It should not be enough for the space used on the page to
shrink by just 1% (to 89%). Maybe it should have to fall as low as 50%
or less before we flip the block back to "free to take space from for
new unrelated tuples". The idea is that fill factor space is truly
reserved for updates -- the reservation should be "sticky" for all of
this to work well.
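
A rough sketch of that rule in terms of pg_freespace (the 50%
threshold and the 8KB block size are placeholders; a real version
would need the FSM to remember per-block state):

    -- Hypothetical "sticky" rule: a block that has already filled up
    -- to the fill factor only becomes a target for new, unrelated
    -- tuples again once at least half of the 8192-byte page is free
    SELECT blkno, avail,
           avail >= 4096 AS open_for_new_tuples
    FROM pg_freespace('bmsql_order_line');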

What's the point in having the concept of a heap fill factor at all if
we don't make any real effort to enforce that the extra free space
left behind is used as intended, for updates of tuples located on the
same heap page?

Thoughts?

[1] https://github.com/petergeoghegan/benchmarksql/commit/3ef4fe71077b40f56b91286d4b874a15835c241e
--
Peter Geoghegan


