Hi,
On 2021-11-22 14:34:36 -0800, Andres Freund wrote:
> I think I'm actually just not sure what the better approach for the
> backbranches is. I'm worried about the size of your patch, I'm worried about
> the craziness of the current architecture (if you can call it that) of
> heap_page_prune() with my patch.
Given the lack of further discussion, let's go with the "minimal" fix and your
stuff later. Attached is further polished patch - no significant changes,
plenty copy-editing stuff.
0001 is the bugfix
0002 adds the additional assertions - I'm wondering about only committing that
in HEAD?
I think your patch should easily be able to use prstate->htsv?
0003 removes the redundant RelationGetNumberOfBlocks() calls. As 0001 does not
change the total number of RelationGetNumberOfBlocks() calls I'm inclined to
just apply this to HEAD?
0004 is a patch that tries to address your point about the GlobalVisTestFor()
placement, sort of
My earlier point that moving it to earlier would be a bad idea because it'd
make the horizon "older" was bogus - GlobalVisTestFor() doesn't itself do
any horizon determination. However moving the GlobalVisTestFor() earlier
still seems wrong - imo the vacuum_set_xid_limits() call should be moved to
later. The attached patch moves both, wrapped in a new function, to just
before the scan in lazy_scan_heap().
This clearly is HEAD only material - I'm only bringing it up here because
the issue of the GlobalVisTestFor() placement was raised in here...
> > > I think it's worth to clean up the regression test I wrote and use it
> > > regardless of which version of the fix we end up choosing. But I'm a bit bit
> > > on the fence - it's quite complicated.
> >
> > +1 to the idea of keeping it somewhere. Without necessarily running it
> > on the BF.
> >
> > > OTOH, I think with the framework in place it'd not be too hard to write a few
> > > more tests for odd pruning scenarios...
> >
> > Can we commit a test like this without running it by default, at all?
> > It's not like it has never been done before.
>
> Is there a reason not to run it if we commit it? IME tests not run by default
> tend to bitrot very quickly. The only issue that I can think of is that it
> might be hard to make the test useful and robust in the presence of
> debug_discard_caches != 0.
I tried pretty hard to get the test to be reliable enough to commit it. But in
the end I think using the index locking as a "control" mechanism just isn't
great - as evidenced by 0003 breaking the test entirely. I think unfortunately
there's not much hope for testing this unless we get wait points - which seems
not too close :(.
Greetings,
Andres Freund