Thread: old_snapshot_threshold vs indexes
Hello,

I ran into someone with a system where big queries scanning 8GB+ of
all-in-cache data took consistently ~2.5x longer on a primary server
than on a replica.  Both servers had concurrent activity on them but
plenty of spare capacity and similar specs.  After some investigation
it turned out that on the primary there were (1) some select()
syscalls waiting for 1ms, which might indicate contended
SpinLockAcquire() back-offs, and (2) a huge amount of time spent in:

+   93,31%     0,00%  postgres  postgres  [.] index_getnext
+   93,30%     0,00%  postgres  postgres  [.] index_fetch_heap
+   81,66%     0,01%  postgres  postgres  [.] heap_page_prune_opt
+   75,85%     0,00%  postgres  postgres  [.] TransactionIdLimitedForOldSnapshots
+   75,83%     0,01%  postgres  postgres  [.] RelationHasUnloggedIndex
+   75,79%     0,00%  postgres  postgres  [.] RelationGetIndexList
+   75,79%    75,78%  postgres  postgres  [.] list_copy

The large tables in question have around 30 indexes.

I see that heap_page_prune_opt()'s call to
TransactionIdLimitedForOldSnapshots() acquires a couple of system-wide
spinlocks, and also tests RelationAllowsEarlyPruning(), which calls
RelationHasUnloggedIndex(), which says:

 * Tells whether any index for the relation is unlogged.
 *
 * Note: There doesn't seem to be any way to have an unlogged index
 * attached to a permanent table, but it seems best to keep this general
 * so that it returns sensible results even when they seem obvious (like
 * for an unlogged table) and to handle possible future unlogged indexes
 * on permanent tables.

It calls RelationGetIndexList(), which conses up a new copy of the list
every time, so that we can spin through it looking for unlogged indexes
(and in this user's case there are none).

I didn't try to poke at this in lab conditions, but from a glance at
the code, I guess heap_page_prune_opt() is running for every index
tuple except those that reference the same heap page as the one before,
so I guess it happens a lot if the heap is not physically correlated
with the index keys.  Ouch.

--
Thomas Munro
https://enterprisedb.com
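For anyone following along, the function in question is shaped roughly
like this (paraphrased from memory of the sources, so details may
differ slightly between branches):

    bool
    RelationHasUnloggedIndex(Relation rel)
    {
        List       *indexoidlist;
        ListCell   *indexoidscan;
        bool        result = false;

        /* this is the list_copy() of rd_indexlist, on every call */
        indexoidlist = RelationGetIndexList(rel);

        foreach(indexoidscan, indexoidlist)
        {
            Oid         indexoid = lfirst_oid(indexoidscan);
            HeapTuple   tp;
            Form_pg_class reltup;

            /* one syscache probe per index -- ~30 of them here */
            tp = SearchSysCache1(RELOID, ObjectIdGetDatum(indexoid));
            if (!HeapTupleIsValid(tp))
                elog(ERROR, "cache lookup failed for relation %u",
                     indexoid);
            reltup = (Form_pg_class) GETSTRUCT(tp);

            if (reltup->relpersistence == RELPERSISTENCE_UNLOGGED)
                result = true;

            ReleaseSysCache(tp);

            if (result)
                break;
        }

        list_free(indexoidlist);
        return result;
    }

So every heap fetch that lands on a new page pays for a palloc'd list
copy plus a loop of syscache probes, to learn an answer that never
changes.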
On Fri, Jun 21, 2019 at 1:21 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> I ran into someone with a system where big queries scanning 8GB+ of
> all-in-cache data took consistently ~2.5x longer on a primary server
> than on a replica.  Both servers had concurrent activity on them but
> plenty of spare capacity and similar specs.  After some investigation
> it turned out that on the primary there were (1) some select()
> syscalls waiting for 1ms, which might indicate contended
> SpinLockAcquire() back-offs, and (2) a huge amount of time spent in:
>
> +   93,31%     0,00%  postgres  postgres  [.] index_getnext
> +   93,30%     0,00%  postgres  postgres  [.] index_fetch_heap
> +   81,66%     0,01%  postgres  postgres  [.] heap_page_prune_opt
> +   75,85%     0,00%  postgres  postgres  [.] TransactionIdLimitedForOldSnapshots
> +   75,83%     0,01%  postgres  postgres  [.] RelationHasUnloggedIndex
> +   75,79%     0,00%  postgres  postgres  [.] RelationGetIndexList
> +   75,79%    75,78%  postgres  postgres  [.] list_copy

On my laptop, all prewarmed, no concurrency, the mere existence of 10
brin indexes causes a sequential scan to take ~5% longer and an
uncorrelated index scan to take ~45% longer (correlated index scans
don't suffer).  Here's a draft patch for v13 that fixes that problem
by caching the result of RelationHasUnloggedIndex().

Reproducer scripts also attached.  I ran them with shared_buffers=8GB,
old_snapshot_threshold=10s and pg_prewarm installed.

I didn't try to look into the complaint about suspected spinlock
contention.

--
Thomas Munro
https://enterprisedb.com
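To give a feel for the caching idea (this is a sketch, not the attached
patch; the field name rd_unloggedindex and the helper are made up for
illustration): the relcache entry could carry a tri-state flag that is
computed once and thrown away along with rd_indexlist on invalidation.

    /* in RelationData: 't' = yes, 'f' = no, 'u' = not computed yet */
    char        rd_unloggedindex;

    bool
    RelationHasUnloggedIndex(Relation rel)
    {
        if (rel->rd_unloggedindex == 'u')
            rel->rd_unloggedindex =
                relation_has_unlogged_index(rel) ? 't' : 'f';

        return rel->rd_unloggedindex == 't';
    }

Relcache invalidation (e.g. after CREATE INDEX) would reset the flag
to 'u', the same way rd_indexvalid is reset today, so later calls
recompute it.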
Thomas Munro <thomas.munro@gmail.com> writes:
> On my laptop, all prewarmed, no concurrency, the mere existence of 10
> brin indexes causes a sequential scan to take ~5% longer and an
> uncorrelated index scan to take ~45% longer (correlated index scans
> don't suffer).  Here's a draft patch for v13 that fixes that problem
> by caching the result of RelationHasUnloggedIndex().

I agree that this code is absolutely horrid as it stands.  However,
it doesn't look to me like caching RelationHasUnloggedIndex is quite
enough to fix it.  The other problem is that the calls in question
seem to be mostly in TestForOldSnapshot, which is called in places
like heapgetpage:

    LockBuffer(buffer, BUFFER_LOCK_SHARE);

    dp = BufferGetPage(buffer);
    TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
    lines = PageGetMaxOffsetNumber(dp);
    ntup = 0;

It is hard to express what a bad idea it is to be asking for complex
catalog searches while holding a buffer lock.  We could easily get
into undetectable deadlocks that way, for example.  We need to refactor
these call sites to arrange that the catalog lookup happens outside
the low-level page access.

Your 0001 patch looks reasonable for the purpose of caching the
result, though.

			regards, tom lane
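To spell out the hazard with a hypothetical interleaving (not an
observed one): LWLock waits are invisible to the heavyweight-lock
deadlock detector, so a cycle through a buffer content lock is never
reported.

    /*
     * Backend A                        Backend B
     * ---------                        ---------
     * LockBuffer(user_page, SHARE)     LockBuffer(cat_page, EXCLUSIVE)
     * TestForOldSnapshot()             ... now blocks waiting for
     *   -> syscache miss                   user_page ...
     *   -> must read cat_page
     *   -> blocks waiting for cat_page
     *
     * Both waits are LWLock waits, so deadlock.c never sees them and
     * neither backend is ever cancelled.
     */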
On Tue, Aug 27, 2019 at 9:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that this code is absolutely horrid as it stands.  However,
> it doesn't look to me like caching RelationHasUnloggedIndex is quite
> enough to fix it.  The other problem is that the calls in question
> seem to be mostly in TestForOldSnapshot, which is called in places
> like heapgetpage:
>
>     LockBuffer(buffer, BUFFER_LOCK_SHARE);
>
>     dp = BufferGetPage(buffer);
>     TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
>     lines = PageGetMaxOffsetNumber(dp);
>     ntup = 0;
>
> It is hard to express what a bad idea it is to be asking for complex
> catalog searches while holding a buffer lock.  We could easily get
> into undetectable deadlocks that way, for example.  We need to refactor
> these call sites to arrange that the catalog lookup happens outside
> the low-level page access.

Hmm.  Right.  Perhaps the theory was that it was OK because it's
shared (rather than exclusive), or perhaps the catalog lookup was
sufficiently well hidden and was forgotten.  At first glance it seems
like we need to capture PageGetLSN(page) while we have the lock, and
then later pass that into TestForOldSnapshot() instead of the page.
I'll look into that and write a patch, probably in a day or two.

> Your 0001 patch looks reasonable for the purpose of caching the
> result, though.

Thanks for the review.  I'll wait until we figure out what to do about
the other problem and what needs to be back-patched.

--
Thomas Munro
https://enterprisedb.com
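Something like this, schematically (TestForOldSnapshotLSN is a made-up
name for a variant taking the captured LSN rather than the page):

    XLogRecPtr  lsn;

    LockBuffer(buffer, BUFFER_LOCK_SHARE);

    dp = BufferGetPage(buffer);
    lsn = PageGetLSN(dp);       /* capture while the lock is held */
    lines = PageGetMaxOffsetNumber(dp);
    ...
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

    /* any catalog access now happens with the buffer unlocked */
    TestForOldSnapshotLSN(snapshot, scan->rs_base.rs_rd, lsn);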
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Aug 27, 2019 at 9:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It is hard to express what a bad idea it is to be asking for complex
>> catalog searches while holding a buffer lock.  We could easily get
>> into undetectable deadlocks that way, for example.  We need to refactor
>> these call sites to arrange that the catalog lookup happens outside
>> the low-level page access.

> Hmm.  Right.  Perhaps the theory was that it was OK because it's
> shared (rather than exclusive), or perhaps the catalog lookup was
> sufficiently well hidden and was forgotten.

I strongly suspect the latter.  Also, it may well be that the
unlogged-index check was not in the original design but was added
later with insufficient thought about where it'd be called from.

> At first glance it seems
> like we need to capture PageGetLSN(page) while we have the lock, and
> then later pass that into TestForOldSnapshot() instead of the page.
> I'll look into that and write a patch, probably in a day or two.

Hm, but surely we need to do other things to the page besides
TestForOldSnapshot?  I was imagining that we'd collect the
RelationHasUnloggedIndex flag (or perhaps better, the
RelationAllowsEarlyPruning result) before attempting to lock
the page, and then pass it to TestForOldSnapshot.

			regards, tom lane
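In other words, roughly (a sketch only; the extra parameter to
TestForOldSnapshot is hypothetical):

    /* catalog access happens here, before any buffer lock is taken */
    bool    early_pruning =
        RelationAllowsEarlyPruning(scan->rs_base.rs_rd);

    LockBuffer(buffer, BUFFER_LOCK_SHARE);

    dp = BufferGetPage(buffer);
    TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp, early_pruning);
    lines = PageGetMaxOffsetNumber(dp);
    ntup = 0;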
On Tue, Aug 27, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > At first glance it seems
> > like we need to capture PageGetLSN(page) while we have the lock, and
> > then later pass that into TestForOldSnapshot() instead of the page.
> > I'll look into that and write a patch, probably in a day or two.
>
> Hm, but surely we need to do other things to the page besides
> TestForOldSnapshot?  I was imagining that we'd collect the
> RelationHasUnloggedIndex flag (or perhaps better, the
> RelationAllowsEarlyPruning result) before attempting to lock
> the page, and then pass it to TestForOldSnapshot.

OK I started writing a patch and realised there were a few ugly
problems that I was about to report here... but now I wonder if, based
on the comment for RelationHasUnloggedIndex(), we shouldn't just nuke
all this code.  We don't actually support unlogged indexes on
permanent tables (there is no syntax to create them, and
RelationHasUnloggedIndex() will never return true in practice because
RelationNeedsWAL() will always return false first).  This is a locking
protocol violation and a performance pessimisation in support of a
feature we don't have.  If we add support for that in some future
release, we can figure out how to do it properly then, no?

--
Thomas Munro
https://enterprisedb.com
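The short-circuit that makes it dead code, as best I can recall the
macro (approximate; see snapmgr.h for the real thing):

    #define RelationAllowsEarlyPruning(rel) \
    ( \
         RelationNeedsWAL(rel) \
      && !IsCatalogRelation(rel) \
      && !RelationIsAccessibleInLogicalDecoding(rel) \
      && !RelationHasUnloggedIndex(rel) \
    )

An unlogged table fails RelationNeedsWAL() immediately, so
RelationHasUnloggedIndex() is only ever reached for permanent tables,
where it can't return true.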
Thomas Munro <thomas.munro@gmail.com> writes:
> OK I started writing a patch and realised there were a few ugly
> problems that I was about to report here... but now I wonder if, based
> on the comment for RelationHasUnloggedIndex(), we shouldn't just nuke
> all this code.  We don't actually support unlogged indexes on
> permanent tables (there is no syntax to create them, and
> RelationHasUnloggedIndex() will never return true in practice because
> RelationNeedsWAL() will always return false first).

Oh!  That explains why the code coverage report shows clearly that
RelationHasUnloggedIndex never returns true ;-)

> This is a locking
> protocol violation and a performance pessimisation in support of a
> feature we don't have.  If we add support for that in some future
> release, we can figure out how to do it properly then, no?

+1.  That fix is also back-patchable, which adding fields to relcache
entries would not be.

It's not really apparent to me that unlogged indexes on logged tables
would ever be a useful combination, so I'm certainly willing to nuke
poorly-thought-out code that putatively supports it.  But perhaps
we should add some comments to remind us that this area would need
work if anyone ever wanted to support that.  Not sure where.

			regards, tom lane
On Tue, Aug 27, 2019 at 1:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > OK I started writing a patch and realised there were a few ugly
> > problems that I was about to report here... but now I wonder if, based
> > on the comment for RelationHasUnloggedIndex(), we shouldn't just nuke
> > all this code.  We don't actually support unlogged indexes on
> > permanent tables (there is no syntax to create them, and
> > RelationHasUnloggedIndex() will never return true in practice because
> > RelationNeedsWAL() will always return false first).
>
> Oh!  That explains why the code coverage report shows clearly that
> RelationHasUnloggedIndex never returns true ;-)
>
> > This is a locking
> > protocol violation and a performance pessimisation in support of a
> > feature we don't have.  If we add support for that in some future
> > release, we can figure out how to do it properly then, no?
>
> +1.  That fix is also back-patchable, which adding fields to relcache
> entries would not be.

There is a fly in the ointment: REL9_6_STABLE's copy of
RelationHasUnloggedIndex() is hardcoded to return true for hash
indexes (see commit 2cc41acd8).

However, I now see that there isn't a buffer content lock deadlock
risk here after all, because we don't reach RelationHasUnloggedIndex()
if IsCatalogRelation(rel).  That reminds me of commit 4fd05bb55b4.  It
still doesn't seem like a great idea to be doing catalog access while
holding the buffer content lock, though.

So I think we need to leave 9.6 as is, and discuss how far back to
back-patch the attached.  It could go back to 10, but perhaps we
should be cautious and push it to master only for now, if you agree
with my analysis of the deadlock thing.

> It's not really apparent to me that unlogged indexes on logged tables
> would ever be a useful combination, so I'm certainly willing to nuke
> poorly-thought-out code that putatively supports it.  But perhaps
> we should add some comments to remind us that this area would need
> work if anyone ever wanted to support that.  Not sure where.

It might make sense for some kind of in-memory index that is rebuilt
from the heap at startup, but then I doubt such a thing would have an
index relation with a relpersistence to check anyway.

--
Thomas Munro
https://enterprisedb.com
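(For the archives: the removal would presumably leave the macro looking
something like this -- my guess at the shape, not a quote from the
attached patch --

    #define RelationAllowsEarlyPruning(rel) \
    ( \
         RelationNeedsWAL(rel) \
      && !IsCatalogRelation(rel) \
      && !RelationIsAccessibleInLogicalDecoding(rel) \
    )

with RelationHasUnloggedIndex() itself deleted outright.)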
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Aug 27, 2019 at 1:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> +1.  That fix is also back-patchable, which adding fields to relcache
>> entries would not be.

> There is a fly in the ointment: REL9_6_STABLE's copy of
> RelationHasUnloggedIndex() is hardcoded to return true for hash
> indexes (see commit 2cc41acd8).

True, in 9.6 hash indexes *were* effectively unlogged, so that the
code actually did something in that branch.  Given the lack of bug
reports traceable to this, I wouldn't feel too bad about leaving it
alone in 9.6.

> However, I now see that there isn't a buffer content lock deadlock
> risk here after all, because we don't reach RelationHasUnloggedIndex()
> if IsCatalogRelation(rel).  That reminds me of commit 4fd05bb55b4.  It
> still doesn't seem like a great idea to be doing catalog access while
> holding the buffer content lock, though.

Yeah, I'm not convinced that that observation means the problem is
unreachable.  Probably does make it harder to hit a deadlock, but if
you mix a few VACUUMs and untimely cache flushes into the equation,
I feel like one could still happen.

> So I think we need to leave 9.6 as is, and discuss how far back to
> back-patch the attached.  It could go back to 10, but perhaps we
> should be cautious and push it to master only for now, if you agree
> with my analysis of the deadlock thing.

I'd vote for back-patching to 10.  Even if there is in fact no deadlock
hazard, you've clearly demonstrated a significant performance hit that
we're taking for basically no reason.

In the larger picture, the commit this reminds me of is b04aeb0a0.
I'm wondering if we could add some assertions to the effect of "don't
initiate relcache or syscache lookups while holding a buffer lock".
It would be relatively easy to do that if we could make the rule be
"... while holding any LWLock", but I suspect that that would break
some legitimate cases.

			regards, tom lane
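For the simple "any LWLock" form, something of this shape could go at
the top of the catcache/relcache entry points (sketch only;
AnyLWLockHeldByMe is a hypothetical helper, since lwlock.c's
num_held_lwlocks counter isn't currently exported):

    /*
     * Catalog lookups can block on I/O and recurse into buffer access,
     * so they must not be initiated while any LWLock is held.  A rule
     * scoped to buffer content locks only would need extra bookkeeping,
     * as there's no cheap "do I hold a content lock?" test today.
     */
    Assert(!AnyLWLockHeldByMe());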
On Wed, Aug 28, 2019 at 3:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd vote for back-patching to 10.  Even if there is in fact no deadlock
> hazard, you've clearly demonstrated a significant performance hit that
> we're taking for basically no reason.

Done.

The second symptom reported in my first email looked like evidence of
high levels of spinlock backoff, which I guess might have been coming
from TransactionIdLimitedForOldSnapshots()'s hammering of
oldSnapshotControl->mutex_threshold and
oldSnapshotControl->mutex_latest_xmin, when running
heap_page_prune_opt()-heavy workloads like the one generated by
test-indexscan.sql (from my earlier message) from many backends at the
same time on a large system.  That's just an observation I'm leaving
here; I'm not planning to chase it any further for now.

--
Thomas Munro
https://enterprisedb.com
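The pattern being hammered, approximately (from memory of snapmgr.c,
so take the details with a grain of salt):

    SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
    latest_xmin = oldSnapshotControl->latest_xmin;
    update_ts = oldSnapshotControl->next_map_update;
    SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);

Every heap_page_prune_opt() call from every backend funnels through
the same few cache lines, and contended SpinLockAcquire() eventually
falls back to pg_usleep()-based delays starting at 1ms, which is
implemented with select() on most platforms -- a plausible match for
symptom (1).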