Problems with the FSM, heap fillfactor, and temporal locality - Mailing list pgsql-hackers
From | Peter Geoghegan
Subject | Problems with the FSM, heap fillfactor, and temporal locality
Date |
Msg-id | CAH2-WzkEP6wz2Eg7mgKbF-qTPi21+BWrJgNm0qfU5kf0iJBV2g@mail.gmail.com
Responses | Re: Problems with the FSM, heap fillfactor, and temporal locality
List | pgsql-hackers
I'm concerned about how the FSM gives out pages to heapam. Disabling the FSM entirely helps TPC-C/BenchmarkSQL, which uses non-default heap fillfactors for most tables [1]. Testing has shown that this actually increases throughput for the benchmark (as measured in TPM) by 5% - 9%, even though my approach is totally naive and stupid. My approach makes one or two small tables much bigger, but that doesn't have much downside for the workload in practice.

My approach helps by accidentally improving temporal locality -- related records are more consistently located on the same block, which in practice reduces the number of pages dirtied and the number of FPIs generated. TPC-C has a tendency to insert a set of related tuples together (e.g., the order lines from an order), while later updating all of those records together.

Interestingly, the problems start before we even begin the benchmark proper, and can be observed directly using simple ad-hoc queries (I developed some simple SQL queries involving ctid for this). BenchmarkSQL's initial bulk loading is performed by concurrent workers that insert related groups of tuples into tables, so that we start out with a realistic amount of old orders to refer back to, etc. I can clearly observe that various natural groupings (e.g., grouping order lines by order number, grouping customers by district + warehouse) get initially inserted in a way that leaves the tuples in a grouping spread across an excessive number of heap blocks. For example, while most orders have all of their order lines on one block, a significant minority of orders have order lines that span two or more blocks on the master branch. With the FSM artificially disabled, by contrast, the heap relation looks more "pristine", in that related tuples are located on the same block (or at least on neighboring blocks). It's possible that one order's lines will span two neighboring blocks there, but they will never span three or more blocks. Each order has 5 - 15 order lines, so I was surprised to see that a small minority of orders end up with order lines occupying as many as 5 or 7 heap pages on the master branch (i.e. with the current FSM intact during bulk loading).
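A query along the following lines shows this distribution directly (a sketch only -- not necessarily the exact query behind the numbers above; BenchmarkSQL's bmsql_order_line table and its composite order key are assumed, and the block number is pulled out of each tuple's ctid):

    -- Distribution of "how many distinct heap blocks does each
    -- order's group of order lines span?".  A well-clustered table
    -- puts almost every group on 1 block (2 at a block boundary).
    SELECT blocks_per_order, count(*) AS n_orders
    FROM (
        SELECT count(DISTINCT (ctid::text::point)[0]::bigint)
                   AS blocks_per_order
        FROM bmsql_order_line
        GROUP BY ol_w_id, ol_d_id, ol_o_id
    ) AS per_order
    GROUP BY blocks_per_order
    ORDER BY blocks_per_order;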
The underlying cause of this "bulk inserts are surprisingly indifferent to locality" issue seems to be that heapam likes to remember small amounts of free space on earlier pages, left over from when the backend couldn't fit one last tuple on an earlier target page (before allocating a new page that became the new relcache target page in turn). This is penny wise and pound foolish, because we're eagerly using up a little bit more space in a case where we are applying a heap fillfactor anyway.

I think that we also have a number of related problems. It seems to me that we don't make enough effort to store related heap tuples together -- both during bulk inserts like this, and during subsequent updates that cannot fit successor tuples on the same heap page. The current design of the FSM seems to assume that it doesn't matter where the free space comes from, as long as you get it from somewhere and as long as fillfactor isn't violated -- it cares about the letter of the fillfactor law without caring about its spirit or intent. If the FSM tried to get free space from a nearby block, then we might at least see related updates that cannot fit a successor tuple on the same block behave in a coordinated fashion. We might at least have both updates relocate their successor tuples to the same mostly-empty block -- both backends notice the same nearby free block, so both sets of successor tuples end up on the same mostly-empty block. The two updating backends don't actually coordinate, of course -- this just happens as a consequence of looking for nearby free space.

The FSM should probably be taught to treat pages as free space candidates (candidates to give out free space from) based on more sophisticated, locality-aware criteria. In particular, the FSM should care about the *rate of change* for a block over time. Suppose you have heap fillfactor set to 90. Once a heap block becomes 90% full, it ought not to be used for inserting new tuples unless and until the used space on the block shrinks *significantly* -- the free space is now supposed to be reserved. It should not be enough for the space used on the page to shrink by just 1% (to 89%). Maybe it should have to fall as low as 50% or less before we flip the block back to "free to take space from for new, unrelated tuples". The idea is that fillfactor space is truly reserved for updates -- the reservation should be "sticky" for all of this to work well.
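One rough way to watch this from SQL is the pg_freespacemap contrib module, which exposes the per-block free space the FSM has recorded. A sketch of how blocks would fall relative to the thresholds of the hysteresis rule described above (the table name, the 8192-byte block size, and the 90/50 thresholds are illustrative assumptions; the real rule is stateful -- whether a block is reserved depends on whether it previously reached fillfactor, which a point-in-time snapshot can't show -- and the FSM's avail figures are quantized, so this is approximate):

    -- Requires: CREATE EXTENSION pg_freespacemap;
    SELECT blkno,
           avail,  -- FSM-recorded free space, in bytes (quantized)
           CASE
               WHEN avail <= 8192 * 0.10
                   THEN 'reserved (block reached fillfactor)'
               WHEN avail >= 8192 * 0.50
                   THEN 'open (drained to 50% used or less)'
               ELSE 'sticky under the proposal (if previously full)'
           END AS proposed_state
    FROM pg_freespace('bmsql_order_line')
    ORDER BY blkno
    LIMIT 20;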
What's the point in having the concept of a heap fillfactor at all if we don't make any real effort to enforce that the extra free space left behind is used as intended, for updates of tuples located on the same heap page?

Thoughts?

[1] https://github.com/petergeoghegan/benchmarksql/commit/3ef4fe71077b40f56b91286d4b874a15835c241e

--
Peter Geoghegan