On Tue, Aug 27, 2019 at 9:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that this code is absolutely horrid as it stands. However,
> it doesn't look to me like caching RelationHasUnloggedIndex is quite
> enough to fix it. The other problem is that the calls in question
> seem to be mostly in TestForOldSnapshot, which is called in places
> like heapgetpage:
>
> LockBuffer(buffer, BUFFER_LOCK_SHARE);
>
> dp = BufferGetPage(buffer);
> TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
> lines = PageGetMaxOffsetNumber(dp);
> ntup = 0;
>
> It is hard to express what a bad idea it is to be asking for complex
> catalog searches while holding a buffer lock. We could easily get
> into undetectable deadlocks that way, for example. We need to refactor
> these call sites to arrange that the catalog lookup happens outside
> the low-level page access.
Hmm. Right. Perhaps the theory was that it was OK because it's
shared (rather than exclusive), or perhaps the catalog lookup was
sufficiently well hidden and was forgotten. At first glance it seems
like we need to capture PageGetLSN(page) while we have the lock, and
then later pass that into TestForOldSnapshot() instead of the page.
I'll look into that and write a patch, probably in a day or two.
> Your 0001 patch looks reasonable for the purpose of caching the
> result, though.
Thanks for the review. I'll wait until we figure out what to do about
the other problem and what needs to be back-patched.
--
Thomas Munro
https://enterprisedb.com