Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic |
Date | |
Msg-id | 20210608192711.GA21312@telsasoft.com Whole thread Raw |
In response to | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
|
List | pgsql-hackers |
On Tue, Jun 08, 2021 at 12:34:04PM -0500, Justin Pryzby wrote: > On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote: > > On 2021-Jun-06, Justin Pryzby wrote: > > > > > However, I also found an autovacuum chewing 100% CPU, and it appears the > > > problem is actually because autovacuum has locked a page of pg-statistic, and > > > every other process then gets stuck waiting in the planner. I checked a few > > > and found these: > > > > Hmm ... I wonder if this could be related to commits d9d076222f5b, > > c98763bf51bf, etc. I don't have any connecting thoughts other than the > > tuple visibility code being involved. Do you see any procs with the > > PROC_IN_SAFE_IC flag set? > > I do have a reindex script which runs CIC, and it does looks suspicious. > > <<Mon Jun 7 22:00:54 CDT 2021: starting db=ts > ... > Mon Jun 7 22:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_statio_all_tables_snap_t_idx(reindex non-partitioned)... > ERROR: canceling statement due to statement timeout > reindex: warning, dropping invalid/unswapped index: pg_statio_all_tables_snap_t_idx_ccnew > Mon Jun 7 23:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_toast.pg_toast_33011_index(reindex system)... > Mon Jun 7 23:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_statio_all_indexes_snap_t_idx(reindex non-partitioned)... > ERROR: canceling statement due to statement timeout > reindex: warning, dropping invalid/unswapped index: pg_statio_all_indexes_snap_t_idx_ccnew > Tue Jun 8 00:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_toast.pg_toast_33020_index(reindex system)... > Tue Jun 8 01:21:20 CDT 2021: ts: permissions: perm_group_idx(reindex non-partitioned)... > ERROR: canceling statement due to statement timeout > reindex: warning, dropping invalid/unswapped index: perm_group_idx_ccnew > Tue Jun 8 02:21:20 CDT 2021: ts: permissions: perm_user_idx(reindex non-partitioned)... > ERROR: canceling statement due to statement timeout > reindex: warning, dropping invalid/unswapped index: perm_user_idx_ccnew > Tue Jun 8 03:21:20 CDT 2021: ts: permissions: pg_toast.pg_toast_33577_index(reindex system)... > Tue Jun 8 03:21:20 CDT 2021: ts: patchfiles: patchfiles_filename_idx(reindex non-partitioned)... > ERROR: canceling statement due to statement timeout > reindex: warning, dropping invalid/unswapped index: patchfiles_filename_idx_ccnew > Tue Jun 8 04:21:21 CDT 2021: ts: patchfiles: patchfiles_pkey(reindex non-partitioned)... > ERROR: canceling statement due to statement timeout > reindex: warning, dropping invalid/unswapped index: patchfiles_pkey_ccnew > > => It's strange that these timed out, since the statio tables are less than > 15MB, and permissions and patchfiles are both under 100kB. > > And it seems to say that it time out after less than 1sec. Oops, no: it timed out after 3600sec, as requested. > They're running this: > | PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i" > And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad" ... > $ date -d @1623121264 > Mon Jun 7 22:01:04 CDT 2021 Which is probably because the reindex was waiting for the vacuum process to finish (or maybe waiting on the page that vacuum had locked?). -- Justin
pgsql-hackers by date: