Open issues for HOT patch

I have finished a first review pass over all of the HOT patch
(updated code is posted on -patches).  I haven't found any showstoppers,
but there still seem to be several areas that need discussion:

* The patch makes undocumented changes that cause autovacuum's decisions
to be driven by total estimated dead space rather than total number of
dead tuples.  Do we like this?  What should happen to the default
threshold parameters (they are not even in the same units as before...)?
Is there any value in even continuing to track dead tuple counts, per
se, in the pgstats machinery?  It seems redundant/expensive to track
both tuple counts and byte counts, especially since the size of the
stats file is already known to be a performance issue ...
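
For concreteness, here is the shape of the decision before and after,
loosely modeled on the reltuples-based test in autovacuum.c; the
space-side names (vac_base_space_thresh, vac_space_scale_factor,
dead_space) are made up for illustration, not the patch's actual
identifiers:

    /*
     * Current style: fire when pgstats' dead-tuple count exceeds a
     * threshold scaled by the table's estimated row count (reltuples
     * and relpages here are the pg_class estimates).
     */
    vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
    dovacuum = (tabentry->n_dead_tuples > vacthresh);

    /*
     * Space-driven style: fire when estimated dead space, in bytes,
     * exceeds a threshold scaled by the table's physical size.  Note
     * the units change: the base threshold is bytes, not tuples.
     */
    spacethresh = (float4) vac_base_space_thresh +
        vac_space_scale_factor * relpages * BLCKSZ;
    dovacuum = (tabentry->dead_space > spacethresh);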

* I'm still pretty unhappy about the patch's use of a relcache copy of
GetAvgFSMRequestSize()'s result.  The fact that there's no provision for
ever updating the value while the relcache entry lives is part of it,
but the bigger part is that I'd rather not have anything at all
depending on that number.  FSM in its current form desperately needs to
die; and once it's replaced by some form of distributed on-disk storage,
it's unlikely that we will have any simple means of getting an
equivalent number.  The average request size was never meant for
external use anyway, but only as a filter to keep useless entries
from getting into the limited shared-memory FSM space.  Perhaps we could
replace that heuristic with something that is page-local; seems like
dividing the total used space by the number of item pointers would give
at least a rough approximation of the page's average tuple size.
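
A minimal sketch of that page-local estimate, using the bufpage.h
accessors (untested, illustration only; PageGetAvgItemSize is a
made-up name):

    #include "postgres.h"
    #include "storage/bufpage.h"

    /*
     * Rough average tuple size for one heap page: tuple-data bytes
     * (pd_special - pd_upper) divided by the number of line pointers.
     * Dead and redirected pointers will skew it a bit, but it should
     * be close enough for a pruning heuristic.
     */
    static Size
    PageGetAvgItemSize(Page page)
    {
        PageHeader   phdr = (PageHeader) page;
        OffsetNumber nitems = PageGetMaxOffsetNumber(page);

        if (nitems == 0)
            return 0;           /* empty page: no basis for an estimate */

        return (phdr->pd_special - phdr->pd_upper) / nitems;
    }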

* We also need to think harder about when to invoke the page pruning
code.  As the patch stands, if you set a breakpoint at
heap_page_prune_opt it'll seem to be hit constantly (eg, once for every
system catalog probe), which seems uselessly often.  And yet it also
seems not often enough, because one thing I found out real fast is that
the "prune if free space < 1.2 average tuple size" heuristic fails badly
when you look at queries that execute multiple updates within the same
heap page.  We only prune when we first pin a particular target page,
and so the additional updates don't afford another chance to see if it's
time to prune.
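
For reference, the trigger amounts to something like this (a sketch,
not the patch's exact code; PageGetAvgItemSize is the hypothetical
helper sketched above):

    /*
     * Prune once the page's free space drops below ~1.2x its average
     * tuple size, i.e. one more typical update might not fit.  The
     * problem described above isn't this formula; it's that the test
     * is made only when the page is first pinned.
     */
    static bool
    page_prune_needed(Page page)
    {
        Size    avgsize = PageGetAvgItemSize(page);

        return avgsize > 0 &&
               PageGetFreeSpace(page) < avgsize + avgsize / 5;
    }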

I'd like to see if we can arrange to only do pruning when reading a page
that is known to be an update target (ie, never during plain SELECTs);
I suspect this would be relatively easy with some executor and perhaps
planner changes.  But that only fixes the first half of the gripe above;
I'm not at all sure what to do about the multiple-updates-per-page
issue.
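
One hypothetical shape for that (purely illustrative; rs_prune_ok is
an invented field, and the real executor hookup would be more
involved): have the executor mark scans on the query's result
relation, and let heapam consult the flag where it first pins a page:

    if (scan->rs_prune_ok)      /* invented flag: set only when the
                                 * executor knows this relation is an
                                 * UPDATE/DELETE target */
        heap_page_prune_opt(scan->rs_rd, buf);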

Comments?
        regards, tom lane

