Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: Berserk Autovacuum (let's save next Mandrill)
Date
Msg-id CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com
Whole thread Raw
In response to Re: Berserk Autovacuum (let's save next Mandrill)  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Berserk Autovacuum (let's save next Mandrill)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, 31 Mar 2020 at 04:39, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sat, 28 Mar 2020 at 22:22, David Rowley <dgrowleyml@gmail.com> wrote:
> > I'm unsure yet if this has caused an instability on lousyjack's run in
> > [1].
>
> pogona has just joined in on the fun [1], so, we're not out the woods
> on this yet. I'll start having a look at this in more detail.
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
>

I had a go at reproducing this. I wasn't able to produce the reported
failure, but I can reliably produce an Assert failure that may be
related by doing a VACUUM FULL simultaneously with an ANALYZE that is
generating extended stats, which produces:

#0  0x00007f28081c9520 in raise () from /lib64/libc.so.6
#1  0x00007f28081cab01 in abort () from /lib64/libc.so.6
#2  0x0000000000aad1ad in ExceptionalCondition (conditionName=0xb2f1a1
"ItemIdIsNormal(lp)", errorType=0xb2e7c9 "FailedAssertion",
fileName=0xb2e848 "heapam.c", lineNumber=3016) at assert.c:67
#3  0x00000000004fb79e in heap_update (relation=0x7f27feebeda8,
otid=0x2d881fc, newtup=0x2d881f8, cid=0, crosscheck=0x0, wait=true,
tmfd=0x7ffc568a5900, lockmode=0x7ffc568a58fc) at heapam.c:3016
#4  0x00000000004fdead in simple_heap_update (relation=0x7f27feebeda8,
otid=0x2d881fc, tup=0x2d881f8) at heapam.c:3902
#5  0x00000000005be860 in CatalogTupleUpdate (heapRel=0x7f27feebeda8,
otid=0x2d881fc, tup=0x2d881f8) at indexing.c:230
#6  0x00000000008df898 in statext_store (statOid=18964, ndistinct=0x0,
dependencies=0x2a85fe0, mcv=0x0, stats=0x2a86570) at
extended_stats.c:553
#7  0x00000000008deec0 in BuildRelationExtStatistics
(onerel=0x7f27feed9008, totalrows=5000, numrows=5000, rows=0x2ad5a30,
natts=7, vacattrstats=0x2a75f40) at extended_stats.c:187
#8  0x000000000065c1b7 in do_analyze_rel (onerel=0x7f27feed9008,
params=0x7ffc568a5fc0, va_cols=0x0, acquirefunc=0x65ce37
<acquire_sample_rows>, relpages=31, inh=false, in_outer_xact=false,
elevel=13) at analyze.c:606
#9  0x000000000065b532 in analyze_rel (relid=18956,
relation=0x29b0bc0, params=0x7ffc568a5fc0, va_cols=0x0,
in_outer_xact=false, bstrategy=0x2a7dfa0) at analyze.c:263
#10 0x00000000006fd768 in vacuum (relations=0x2a7e148,
params=0x7ffc568a5fc0, bstrategy=0x2a7dfa0, isTopLevel=true) at
vacuum.c:468
#11 0x00000000006fd22c in ExecVacuum (pstate=0x2a57a00,
vacstmt=0x29b0ca8, isTopLevel=true) at vacuum.c:251

It looks to me as though the problem is that statext_store() needs to
take its lock on pg_statistic_ext_data *before* searching for the
stats tuple to update.

It's late here, so I haven't worked up a patch yet, but it looks
pretty straightforward.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)
Next
From: Bruce Momjian
Date:
Subject: Re: Small docs bugfix: make it clear what can be used in UPDATE FROMand DELETE USING