Re: new heapcheck contrib module - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: new heapcheck contrib module
Date
Msg-id 54096A22-3C44-4746-BB38-3FADC2C5147E@enterprisedb.com
In response to Re: new heapcheck contrib module  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: new heapcheck contrib module  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

> On Jul 30, 2020, at 2:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jul 30, 2020 at 4:18 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>>> Maybe I'm just being dense here -- exactly what problem are you worried about?
>>
>> Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax committed via TransactionIdDidCommit.  I
>> am worried about concurrent truncation of clog entries causing I/O errors on SLRU lookup when performing that check.
>> The three strategies I had for dealing with that were taking the XactTruncationLock (formerly known as
>> CLogTruncationLock, for those reading this thread from the beginning), locking out vacuum, and the idea upthread from
>> Andres about setting PROC_IN_VACUUM and such.  Maybe I'm being dense and don't need to worry about this.  But I
>> haven't convinced myself of that, yet.
>
> I don't get it. If you've already checked that the XIDs are >=
> relfrozenxid and <= ReadNewFullTransactionId(), then this shouldn't be
> a problem. It could be, if CLOG is hosed, which is possible, because
> if the table is corrupted, why shouldn't CLOG also be corrupted? But
> I'm not sure that's what your concern is here.

No, that wasn't my concern.  I was thinking about CLOG entries disappearing during the scan as a consequence of
concurrent vacuums, and the effect that would have on the validity of the cached [relfrozenxid..next_valid_xid] range.
In the absence of corruption, I don't immediately see how this would cause any problems.  But for a corrupt table, I'm
less certain how it would play out.
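
To make that concrete, the range check in question looks roughly like the sketch below (illustrative only -- the helper
name and the cached variables are made up, not the actual verify_heapam code).  The problem is that an xid can pass this
check against the range cached at the start of the scan and still reach TransactionIdDidCommit() after a concurrent
vacuum has truncated away the clog page it lives on:

    #include "postgres.h"
    #include "access/transam.h"

    /* Illustrative sketch only; not the actual verify_heapam code. */
    static bool
    xid_in_cached_range(TransactionId xid,
                        TransactionId cached_relfrozenxid,
                        TransactionId cached_next_xid)
    {
        /*
         * The xid looks plausible if it falls inside the window cached at
         * the start of the scan.  But nothing here stops a concurrent
         * vacuum from advancing the frozen horizon and truncating clog
         * behind us, after which TransactionIdDidCommit() on a
         * "plausible" xid can fail with an SLRU I/O error.
         */
        return TransactionIdFollowsOrEquals(xid, cached_relfrozenxid) &&
               TransactionIdPrecedes(xid, cached_next_xid);
    }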

The kind of scenario I'm worried about may not be possible in practice.  I think it would depend on how vacuum behaves
when scanning a table that is corrupt in some way vacuum doesn't notice, and whether vacuum could finish scanning the
table with the false belief that it has frozen all tuples with xids less than some cutoff.

I thought it would be safer if that kind of thing were not happening during verify_heapam's scan of the table.  Even if
a careful analysis proved it was not an issue with the current coding of vacuum, I don't think there is any coding
convention requiring future versions of vacuum to be hardened against corruption, so I don't see how I can rely on
vacuum not causing such problems.

I don't think this is necessarily a too-rare-to-care-about type concern, either.  If corruption across multiple tables
prevents autovacuum from succeeding, and the DBA doesn't get involved in scanning tables for corruption until the lack
of successful vacuums impacts the production system, I imagine you could end up with vacuums repeatedly happening (or
trying to happen) around the time the DBA is trying to fix tables, or perhaps drop them, or whatever, using
verify_heapam for guidance on which tables are corrupted.

Anyway, that's what I was thinking.  I was imagining that calling TransactionIdDidCommit might keep crashing the
backend while the DBA is trying to find and fix corruption, and that could get really annoying.
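
For comparison, the XactTruncationLock strategy mentioned above would look roughly like this sketch (the helper name is
made up, and this is similar in spirit to what txid_status() does rather than being the actual patch):

    #include "postgres.h"
    #include "access/transam.h"
    #include "storage/lwlock.h"

    /* Sketch of the XactTruncationLock approach; helper name is made up. */
    static bool
    xid_did_commit_safely(TransactionId xid)
    {
        bool        committed;

        /*
         * Clog truncation takes XactTruncationLock (formerly
         * CLogTruncationLock) exclusively, so holding it in shared mode
         * here keeps the xid's clog page from disappearing between the
         * caller's range check and the lookup below.
         */
        LWLockAcquire(XactTruncationLock, LW_SHARED);
        committed = TransactionIdDidCommit(xid);
        LWLockRelease(XactTruncationLock);

        return committed;
    }

Whether taking that lock once per tuple is acceptable is a separate question, which is part of why the other two
strategies are still on the table.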

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

