Open issues for HOT patch - Mailing list pgsql-hackers

From:      Tom Lane
Subject:   Open issues for HOT patch
Date:
Msg-id:    24348.1190083765@sss.pgh.pa.us
Responses: Re: Open issues for HOT patch
List:      pgsql-hackers
I have finished a first review pass over all of the HOT patch (updated code is posted on -patches). I haven't found any showstoppers, but there still seem to be several areas that need discussion:

* The patch makes undocumented changes that cause autovacuum's decisions to be driven by total estimated dead space rather than total number of dead tuples. Do we like this? What should happen to the default threshold parameters (they are not even in the same units as before...)? Is there any value in even continuing to track dead tuple counts, per se, in the pgstats machinery? It seems redundant/expensive to track both tuple counts and byte counts, and the size of the stats file is already known to be a performance issue ...

* I'm still pretty unhappy about the patch's use of a relcache copy of GetAvgFSMRequestSize()'s result. The fact that there's no provision for ever updating the value while the relcache entry lives is part of it, but the bigger part is that I'd rather not have anything at all depending on that number. FSM in its current form desperately needs to die; and once it's replaced by some form of distributed on-disk storage, it's unlikely that we will have any simple means of getting an equivalent number. The average request size was never meant for external use anyway, but only as a filter to help reject useless entries from getting into the limited shared-memory FSM space. Perhaps we could replace that heuristic with something that is page-local; it seems like dividing the total used space by the number of item pointers would give at least a rough approximation of the page's average tuple size (a sketch of that idea is appended below).

* We also need to think harder about when to invoke the page pruning code. As the patch stands, if you set a breakpoint at heap_page_prune_opt it'll seem to be hit constantly (eg, once for every system catalog probe), which seems uselessly often. And yet it also seems not often enough, because one thing I found out quickly is that the "prune if free space < 1.2 * average tuple size" heuristic fails badly for queries that execute multiple updates within the same heap page. We only prune when we first pin a particular target page, so the additional updates don't get another chance to see whether it's time to prune. I'd like to see if we can arrange to do pruning only when reading a page that is known to be an update target (ie, never during plain SELECTs); I suspect this would be relatively easy with some executor and perhaps planner changes. But that only fixes the first half of the gripe above; I'm not at all sure what to do about the multiple-updates-per-page issue.

Comments?

			regards, tom lane
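As a concrete illustration of the page-local idea, here is a minimal sketch, not code from the patch: the function names page_avg_tuple_size and page_prune_needed are invented for this example. It estimates average tuple size directly from the page and plugs that into the "free space < 1.2 * average tuple size" test mentioned above:

```c
#include "postgres.h"
#include "storage/bufpage.h"

/*
 * Rough page-local estimate of average tuple size: space used by tuple
 * data (between pd_upper and pd_special) divided by the number of line
 * pointers on the page.  (Function name invented for this sketch.)
 */
static Size
page_avg_tuple_size(Page page)
{
	PageHeader	phdr = (PageHeader) page;
	OffsetNumber nline = PageGetMaxOffsetNumber(page);

	if (nline == 0)
		return 0;				/* empty page: no estimate available */

	return (phdr->pd_special - phdr->pd_upper) / nline;
}

/*
 * Would opportunistic pruning look worthwhile on this page?  Mirrors the
 * "prune if free space < 1.2 * average tuple size" heuristic discussed
 * above.  (Function name invented for this sketch.)
 */
static bool
page_prune_needed(Page page)
{
	Size		avgsize = page_avg_tuple_size(page);

	if (avgsize == 0)
		return false;

	/* avgsize + avgsize/5 ~= 1.2 * avgsize, avoiding float math */
	return PageGetFreeSpace(page) < avgsize + avgsize / 5;
}
```

PageGetFreeSpace() already discounts one line pointer's worth of space, which seems close enough for a threshold test; the real question of where such a check gets called from is the third point above and is not addressed here.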