Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date
Msg-id 20210608192711.GA21312@telsasoft.com
Whole thread Raw
In response to Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
List pgsql-hackers
On Tue, Jun 08, 2021 at 12:34:04PM -0500, Justin Pryzby wrote:
> On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote:
> > On 2021-Jun-06, Justin Pryzby wrote:
> > 
> > > However, I also found an autovacuum chewing 100% CPU, and it appears the
> > > problem is actually because autovacuum has locked a page of pg-statistic, and
> > > every other process then gets stuck waiting in the planner.  I checked a few
> > > and found these:
> > 
> > Hmm ... I wonder if this could be related to commits d9d076222f5b,
> > c98763bf51bf, etc.  I don't have any connecting thoughts other than the
> > tuple visibility code being involved.  Do you see any procs with the
> > PROC_IN_SAFE_IC flag set?
> 
> I do have a reindex script which runs CIC, and it does looks suspicious.
> 
> <<Mon Jun  7 22:00:54 CDT 2021: starting db=ts
> ...
> Mon Jun  7 22:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_statio_all_tables_snap_t_idx(reindex
non-partitioned)...
> ERROR:  canceling statement due to statement timeout
> reindex: warning, dropping invalid/unswapped index: pg_statio_all_tables_snap_t_idx_ccnew
> Mon Jun  7 23:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_toast.pg_toast_33011_index(reindex system)...
> Mon Jun  7 23:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_statio_all_indexes_snap_t_idx(reindex
non-partitioned)...
> ERROR:  canceling statement due to statement timeout
> reindex: warning, dropping invalid/unswapped index: pg_statio_all_indexes_snap_t_idx_ccnew
> Tue Jun  8 00:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_toast.pg_toast_33020_index(reindex system)...
> Tue Jun  8 01:21:20 CDT 2021: ts: permissions: perm_group_idx(reindex non-partitioned)...
> ERROR:  canceling statement due to statement timeout
> reindex: warning, dropping invalid/unswapped index: perm_group_idx_ccnew
> Tue Jun  8 02:21:20 CDT 2021: ts: permissions: perm_user_idx(reindex non-partitioned)...
> ERROR:  canceling statement due to statement timeout
> reindex: warning, dropping invalid/unswapped index: perm_user_idx_ccnew
> Tue Jun  8 03:21:20 CDT 2021: ts: permissions: pg_toast.pg_toast_33577_index(reindex system)...
> Tue Jun  8 03:21:20 CDT 2021: ts: patchfiles: patchfiles_filename_idx(reindex non-partitioned)...
> ERROR:  canceling statement due to statement timeout
> reindex: warning, dropping invalid/unswapped index: patchfiles_filename_idx_ccnew
> Tue Jun  8 04:21:21 CDT 2021: ts: patchfiles: patchfiles_pkey(reindex non-partitioned)...
> ERROR:  canceling statement due to statement timeout
> reindex: warning, dropping invalid/unswapped index: patchfiles_pkey_ccnew
> 
> => It's strange that these timed out, since the statio tables are less than
> 15MB, and permissions and patchfiles are both under 100kB.
> 
> And it seems to say that it time out after less than 1sec.

Oops, no: it timed out after 3600sec, as requested.

> They're running this:
> | PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
> And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"
...
> $ date -d @1623121264
> Mon Jun  7 22:01:04 CDT 2021

Which is probably because the reindex was waiting for the vacuum process to
finish (or maybe waiting on the page that vacuum had locked?).

-- 
Justin



pgsql-hackers by date:

Previous
From: David Christensen
Date:
Subject: Re: DELETE CASCADE
Next
From: Jeff Davis
Date:
Subject: Re: Make unlogged table resets detectable