Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic |
Date | |
Msg-id | CAEze2WgT63ggfP7KXdxC7d1xnxxWKFoeYs=1DeaGvc+XF=xyEw@mail.gmail.com |
In response to | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic |
List | pgsql-hackers |
On Tue, 8 Jun 2021 at 14:11, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Tue, Jun 08, 2021 at 01:54:41PM +0200, Matthias van de Meent wrote:
> > On Tue, 8 Jun 2021 at 13:03, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > On Sun, Jun 06, 2021 at 11:00:38AM -0700, Peter Geoghegan wrote:
> > > > On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > > I'll leave the instance running for a little bit before restarting (or kill-9)
> > > > > in case someone requests more info.
> > > >
> > > > How about dumping the page image out, and sharing it with the list?
> > > > This procedure should work fine from gdb:
> > > >
> > > > https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB
> > >
> > > > I suggest that you dump the "page" pointer inside lazy_scan_prune(). I
> > > > imagine that you have the instance already stuck in an infinite loop,
> > > > so what we'll probably see from the page image is the page after the
> > > > first prune and another no-progress prune.
> > >
> > > The cluster was again rejecting with "too many clients already".
> > >
> > > I was able to open a shell this time, but it immediately froze when I tried to
> > > tab complete "pg_stat_acti"...
> > >
> > > I was able to dump the page image, though - attached. I can send you its
> > > "data" privately, if desirable. I'll also try to step through this.
> >
> > Could you attach a dump of lazy_scan_prune's vacrel, all the global
> > visibility states (GlobalVisCatalogRels, and possibly
> > GlobalVisSharedRels, GlobalVisDataRels, and GlobalVisTempRels), and
> > heap_page_prune's PruneState?
>
> (gdb) p *vacrel
> $56 = {... OldestXmin = 926025113, ...}
>
> (gdb) p GlobalVisCatalogRels
> $57 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}

This maybe_needed is older than the OldestXmin, which indeed gives us
this problematic behaviour:

heap_prune_satisfies_vacuum considers 1 more transaction to be
unvacuumable, and thus indeed won't vacuum the tuple that
HeapTupleSatisfiesVacuum does want to be vacuumed.

The new open question is now: Why is
GlobalVisCatalogRels->maybe_needed < OldestXmin? IIRC
GlobalVisCatalogRels->maybe_needed is constructed from the same
ComputeXidHorizonsResult->catalog_oldest_nonremovable which later is
returned to be used in vacrel->OldestXmin.

> Maybe you need to know that this is also returning RECENTLY_DEAD.

I had expected that, but good to have confirmation.

Thanks for the information!

With regards,

Matthias van de Meent.
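To make the off-by-one above concrete, here is a minimal sketch in C of the two checks disagreeing. It is a simplified model, not PostgreSQL source: the type and function names (Xid, VisHorizons, prune_considers_removable, vacuum_considers_dead, tuple_forces_retry, check_reported_values) are hypothetical, xid wraparound is ignored, and it assumes that recomputing the horizons would not advance maybe_needed. The xid of the affected tuple is not given in this thread; 926025112 is used only because it is the single value that falls between the two reported horizons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for TransactionId; wraparound is ignored here. */
typedef uint32_t Xid;

/* Hypothetical, reduced model of the two horizons printed by gdb. */
typedef struct
{
    Xid definitely_needed;
    Xid maybe_needed;
} VisHorizons;

/*
 * Pruning side (model): a tuple is removable only if the xid after which
 * it is dead is strictly older than maybe_needed. Assumes a horizon
 * recomputation would not move maybe_needed forward.
 */
bool
prune_considers_removable(const VisHorizons *vis, Xid dead_after)
{
    return dead_after < vis->maybe_needed;
}

/* Vacuum side (model): DEAD if dead_after is strictly older than OldestXmin. */
bool
vacuum_considers_dead(Xid oldest_xmin, Xid dead_after)
{
    return dead_after < oldest_xmin;
}

/* Vacuum wants the tuple gone, but pruning keeps it: no progress is made. */
bool
tuple_forces_retry(const VisHorizons *vis, Xid oldest_xmin, Xid dead_after)
{
    return vacuum_considers_dead(oldest_xmin, dead_after) &&
        !prune_considers_removable(vis, dead_after);
}

/* Checks the model against the values reported in the gdb output. */
void
check_reported_values(void)
{
    const VisHorizons vis = {.definitely_needed = 926025113u,
                             .maybe_needed = 926025112u};
    const Xid oldest_xmin = 926025113u;

    /* The one xid in the gap between maybe_needed and OldestXmin. */
    assert(tuple_forces_retry(&vis, oldest_xmin, 926025112u));
    /* Older xid: both sides agree the tuple can go. */
    assert(!tuple_forces_retry(&vis, oldest_xmin, 926025111u));
    /* Newer xid: both sides agree the tuple must stay. */
    assert(!tuple_forces_retry(&vis, oldest_xmin, 926025113u));
}
```

Calling check_reported_values() from a small main() passes all three assertions. In this model the two checks disagree only for xids in the range [maybe_needed, OldestXmin), which is empty whenever maybe_needed >= OldestXmin.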