Thread: Maximize page freezing
Starting new thread with updated patch to avoid confusion, as
mentioned by David Steele on the original thread:

Original messageid: 20201118020418.GA13408@alvherre.pgsql

On Wed, 18 Nov 2020 at 02:04, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2020-Nov-17, Simon Riggs wrote:
>
> > As an additional optimization, if we do find a row that needs freezing
> > on a data block, we should simply freeze *all* row versions on the
> > page, not just the ones below the selected cutoff. This is justified
> > since writing the block is the biggest cost and it doesn't make much
> > sense to leave a few rows unfrozen on a block that we are dirtying.
>
> Yeah. We've had earlier proposals to use high and low watermarks: if any
> tuple is past the high watermark, then freeze all tuples that are past
> the low watermark. However this is ancient thinking (prior to
> HEAP_XMIN_FROZEN) and we don't need the low watermark to be different
> from zero, since the original xid is retained anyway.
>
> So +1 for this idea.

Updated patch attached.

--
Simon Riggs                http://www.EnterpriseDB.com/
Attachment
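[The heuristic being proposed can be sketched as a simplified model. This is not the patch's actual C code from src/backend/access/heap/ -- it is a toy in which XIDs are plain integers (wraparound ignored) and the function name and return shape are invented for illustration:]

```python
FROZEN_XID = 2  # stand-in for FrozenTransactionId


def freeze_page(tuple_xmins, cutoff_xid):
    """If any tuple on the page needs freezing (xmin past the cutoff),
    freeze *all* tuples on the page, since dirtying and writing the
    block is the dominant cost. Returns (new_xmins, page_dirtied)."""
    # High watermark test: does any tuple force freezing at all?
    if not any(xmin < cutoff_xid for xmin in tuple_xmins):
        return list(tuple_xmins), False
    # Per Alvaro's point, the low watermark is effectively zero:
    # freeze every tuple, not just the ones below the cutoff.
    return [FROZEN_XID] * len(tuple_xmins), True
```

[So with a cutoff of 100, a page holding xmins [50, 120, 130] is frozen in full, while a page holding only [120, 130] is left untouched and undirtied.]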
On Thu, 28 Jul 2022 at 15:36, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> Starting new thread with updated patch to avoid confusion, as
> mentioned by David Steele on the original thread:
>
> Original messageid: 20201118020418.GA13408@alvherre.pgsql
>
> On Wed, 18 Nov 2020 at 02:04, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > On 2020-Nov-17, Simon Riggs wrote:
> >
> > > As an additional optimization, if we do find a row that needs freezing
> > > on a data block, we should simply freeze *all* row versions on the
> > > page, not just the ones below the selected cutoff. This is justified
> > > since writing the block is the biggest cost and it doesn't make much
> > > sense to leave a few rows unfrozen on a block that we are dirtying.
> >
> > Yeah. We've had earlier proposals to use high and low watermarks: if any
> > tuple is past the high watermark, then freeze all tuples that are past
> > the low watermark. However this is ancient thinking (prior to
> > HEAP_XMIN_FROZEN) and we don't need the low watermark to be different
> > from zero, since the original xid is retained anyway.
> >
> > So +1 for this idea.
>
> Updated patch attached.

Great idea, yet this patch seems to only freeze those tuples that are
located after the first to-be-frozen tuple. It should probably
re-visit earlier live tuples to potentially freeze those as well.

Kind regards,

Matthias van de Meent
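[Matthias's objection, restated in the same toy Python model (integer XIDs, invented function names, not PostgreSQL's actual API): a single forward pass that only enables freezing once it meets the first old tuple leaves the live tuples *before* it unfrozen, whereas the intended whole-page behavior freezes everything:]

```python
FROZEN_XID = 2  # stand-in for FrozenTransactionId


def forward_only_freeze(xmins, cutoff):
    """Behavior the review describes: freezing kicks in only from
    the first to-be-frozen tuple onward."""
    out, freezing = [], False
    for xmin in xmins:
        if xmin < cutoff:
            freezing = True
        out.append(FROZEN_XID if freezing else xmin)
    return out


def whole_page_freeze(xmins, cutoff):
    """Intended behavior: decide first, then freeze every tuple."""
    if any(xmin < cutoff for xmin in xmins):
        return [FROZEN_XID] * len(xmins)
    return list(xmins)
```

[With cutoff 100 and page xmins [120, 50, 130], the forward-only pass yields [120, 2, 2] -- the earlier live tuple at xmin 120 is skipped -- while the whole-page variant yields [2, 2, 2].]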
On Thu, Jul 28, 2022 at 6:56 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Great idea, yet this patch seems to only freeze those tuples that are
> located after the first to-be-frozen tuple. It should probably
> re-visit earlier live tuples to potentially freeze those as well.

I have a big patch set pending that does this (which I dubbed
"page-level freezing"), plus a bunch of other things that control the
overhead. Although the basic idea of freezing all of the tuples on a
page together appears in earlier patches that were posted. These were
things that didn't make it into Postgres 15.

I should be able to post something in a couple of weeks.

--
Peter Geoghegan
On Thu, 28 Jul 2022 at 20:57, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Jul 28, 2022 at 6:56 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > Great idea, yet this patch seems to only freeze those tuples that are
> > located after the first to-be-frozen tuple. It should probably
> > re-visit earlier live tuples to potentially freeze those as well.
>
> I have a big patch set pending that does this (which I dubbed
> "page-level freezing"), plus a bunch of other things that control the
> overhead. Although the basic idea of freezing all of the tuples on a
> page together appears in earlier patches that were posted. These were
> things that didn't make it into Postgres 15.

Yes, my patch from 2020 was never reviewed, which is why I was
resubmitting here.

> I should be able to post something in a couple of weeks.

How do you see that affecting this thread?

--
Simon Riggs                http://www.EnterpriseDB.com/
On Thu, 28 Jul 2022 at 14:55, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Thu, 28 Jul 2022 at 15:36, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > Starting new thread with updated patch to avoid confusion, as
> > mentioned by David Steele on the original thread:
> >
> > Original messageid: 20201118020418.GA13408@alvherre.pgsql
> >
> > On Wed, 18 Nov 2020 at 02:04, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > On 2020-Nov-17, Simon Riggs wrote:
> > >
> > > > As an additional optimization, if we do find a row that needs freezing
> > > > on a data block, we should simply freeze *all* row versions on the
> > > > page, not just the ones below the selected cutoff. This is justified
> > > > since writing the block is the biggest cost and it doesn't make much
> > > > sense to leave a few rows unfrozen on a block that we are dirtying.
> > >
> > > Yeah. We've had earlier proposals to use high and low watermarks: if any
> > > tuple is past the high watermark, then freeze all tuples that are past
> > > the low watermark. However this is ancient thinking (prior to
> > > HEAP_XMIN_FROZEN) and we don't need the low watermark to be different
> > > from zero, since the original xid is retained anyway.
> > >
> > > So +1 for this idea.
> >
> > Updated patch attached.
>
> Great idea, yet this patch seems to only freeze those tuples that are
> located after the first to-be-frozen tuple. It should probably
> re-visit earlier live tuples to potentially freeze those as well.

Like this?

--
Simon Riggs                http://www.EnterpriseDB.com/
Attachment
On Fri, Jul 29, 2022 at 5:55 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> > I should be able to post something in a couple of weeks.
>
> How do you see that affecting this thread?

Well, it's clearly duplicative, at least in part. That in itself
doesn't mean much, but there are some general questions (that apply to
any variant of proactive/batched freezing), particularly around the
added overhead, and the question of whether or not we get to advance
relfrozenxid substantially in return for that cost. Those parts are
quite tricky.

I have every intention of addressing these thorny questions in my
upcoming patch set, which actually does far more than change the rules
about when and how we freeze -- changing the mechanism itself is very
much the easy part. I'm taking a holistic approach that involves
making an up-front decision about freezing strategy based on the
observed characteristics of the table, driven by what we see in the
visibility map at the start.

Similar questions will also apply to this patch, even though it isn't
as aggressive (your patch doesn't trigger freezing when a page is
about to be set all-visible in order to make sure that it can be set
all-frozen instead). You still want to give the user a clear benefit
for any added overhead. It needs a great deal of performance
validation, too.

--
Peter Geoghegan
On Fri, 29 Jul 2022 at 16:38, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 28 Jul 2022 at 14:55, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > Great idea, yet this patch seems to only freeze those tuples that are
> > located after the first to-be-frozen tuple. It should probably
> > re-visit earlier live tuples to potentially freeze those as well.
>
> Like this?

That wasn't quite what I imagined. In your patch, heap_page_prune is
disabled after the first frozen tuple, which makes the retry mechanism
with the HTSV check loop forever because it expects that tuple to be
vacuumed.

I was thinking more in the line of "do a backtrack in a specialized
code block when entering max_freeze_page mode" (without using
'retry'), though I'm not sure whether that's the best option
available.

Kind regards,

Matthias van de Meent
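[The backtracking shape Matthias describes might look roughly like the following, again in the toy Python model used above rather than the patch's actual C code -- the `freezing` flag stands in for entering "max_freeze_page mode", and the single pass rewrites the tuples it has already emitted at the moment that mode is entered:]

```python
FROZEN_XID = 2  # stand-in for FrozenTransactionId


def freeze_with_backtrack(xmins, cutoff):
    """Single pass over the page; on meeting the first tuple that
    needs freezing, backtrack and freeze the earlier tuples too."""
    out = []
    freezing = False
    for xmin in xmins:
        if not freezing and xmin < cutoff:
            freezing = True
            # Backtrack: freeze the tuples already processed.
            out = [FROZEN_XID] * len(out)
        out.append(FROZEN_XID if freezing else xmin)
    return out
```

[This yields the same result as a two-pass scan while visiting each tuple's xmin only once in the common case; the backtrack cost is paid at most once per page.]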