subtransaction performance regression [kind of] due to snapshot caching - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | subtransaction performance regression [kind of] due to snapshot caching |
Date | |
Msg-id | 20210406043521.lopeo7bbigad3n6t@alap3.anarazel.de Whole thread Raw |
Responses |
Re: subtransaction performance regression [kind of] due to snapshot caching
Re: subtransaction performance regression [kind of] due to snapshot caching |
List | pgsql-hackers |
Hi, In a recent thread ([1]) I found a performance regression of the following statement DO $do$ BEGIN FOR i IN 1 .. 10000 LOOP BEGIN EXECUTE $cf$CREATE OR REPLACE FUNCTION foo() RETURNS VOID LANGUAGE plpgsql AS $f$BEGIN frakbar; END;$f$;$cf$; EXCEPTION WHEN others THEN END; END LOOP; END;$do$; 13: 1617.798 14-dev: 34088.505 The time in 14 is spent mostly below: - 94.58% 0.01% postgres postgres [.] CreateFunction - 94.57% CreateFunction - 94.49% ProcedureCreate - 90.95% record_object_address_dependencies - 90.93% recordMultipleDependencies - 89.65% isObjectPinned - 89.12% systable_getnext - 89.06% index_getnext_slot - 56.13% index_fetch_heap - 54.82% table_index_fetch_tuple + 53.79% heapam_index_fetch_tuple 0.07% heap_hot_search_buffer 0.01% ReleaseAndReadBuffer 0.01% LockBuffer 0.08% heapam_index_fetch_tuple After a bit of debugging I figured out that the direct failure lies with 623a9ba79b. The problem is that subtransaction abort does not increment ShmemVariableCache->xactCompletionCount. That's trivial to remedy (we already lock ProcArrayLock during XidCacheRemoveRunningXids). What happens is that heap_hot_search_buffer()-> correctly recognizes the aborted subtransaction's rows as dead, but they are not recognized as "surely dead". Which then leads to O(iterations^2) index->heap lookups, because the rows from previous iterations are never recognized as dead. I initially was a bit worried that this could be a correctness issue as well. The more I thought about it the more confused I got. A transaction's subtransaction abort should not actually change the contents of a snapshot, right? Snapshot GetSnapshotData(Snapshot snapshot) ... /* * We don't include our own XIDs (if any) in the snapshot. It * needs to be includeded in the xmin computation, but we did so * outside the loop. */ if (pgxactoff == mypgxactoff) continue; The sole reason for the behavioural difference is that the cached snapshot's xmax is *lower* than a new snapshot's would be, because GetSnapshotData() initializes xmax as ShmemVariableCache->latestCompletedXid - which XidCacheRemoveRunningXids() incremented, without incrementing ShmemVariableCache->xactCompletionCount. Which then causes HeapTupleSatisfiesMVCC to go down if (!HeapTupleHeaderXminCommitted(tuple)) ... else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot)) return false; else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple))) SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED, HeapTupleHeaderGetRawXmin(tuple)); else { /* it must have aborted or crashed */ SetHintBits(tuple, buffer, HEAP_XMIN_INVALID, InvalidTransactionId); return false; } the "return false" for XidInMVCCSnapshot() rather than the SetHintBits(HEAP_XMIN_INVALID) path. Which then in turn causes HeapTupleIsSurelyDead() to not recognize the rows as surely dead. bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot) { uint32 i; /* * Make a quick range check to eliminate most XIDs without looking at the * xip arrays. Note that this is OK even if we convert a subxact XID to * its parent below, because a subxact with XID < xmin has surely also got * a parent with XID < xmin, while one with XID >= xmax must belong to a * parent that was not yet committed at the time of this snapshot. */ /* Any xid < xmin is not in-progress */ if (TransactionIdPrecedes(xid, snapshot->xmin)) return false; /* Any xid >= xmax is in-progress */ if (TransactionIdFollowsOrEquals(xid, snapshot->xmax)) return true; I *think* this issue doesn't lead to actually wrong query results. For HeapTupleSatisfiesMVCC purposes there's not much of a difference between an aborted transaction and one that's "in progress" according to the snapshot (that's required - we don't check for aborts for xids in the snapshot). It is a bit disappointing that there - as far as I could find - are no tests for kill_prior_tuple actually working. I guess that lack, and that there's no difference in query results, explains why nobody noticed the issue in the last ~9 months. See the attached fix. I did include a test that verifies that the kill_prior_tuples optimization actually prevents the index from growing, when subtransactions are involved. I think it should be stable, even with concurrent activity. But I'd welcome a look. I don't think that's why the issue exists, but I very much hate the XidCache* name. Makes it sound much less important than it is... Greetings, Andres Freund [1] https://postgr.es/m/20210317055718.v6qs3ltzrformqoa%40alap3.anarazel.de
Attachment
pgsql-hackers by date: