old_snapshot_threshold vs indexes - Mailing list pgsql-hackers

From Thomas Munro
Subject old_snapshot_threshold vs indexes
Date
Msg-id CA+hUKGKT8oTkp5jw_U4p0S-7UG9zsvtw_M47Y285bER6a2gD+g@mail.gmail.com
Whole thread Raw
Responses Re: old_snapshot_threshold vs indexes  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hello,

I ran into someone with a system where big queries scanning 8GB+ of
all-in-cache data took consistently ~2.5x longer on a primary server
than on a replica.  Both servers had concurrent activity on them but
plenty of spare capacity and similar specs.  After some investigation
it turned out that on the primary there were (1) some select()
syscalls waiting for 1ms, which might indicate contended
SpinLockAcquire() back-offs, and (2) a huge amount of time spent in:

+ 93,31% 0,00% postgres postgres [.] index_getnext
+ 93,30% 0,00% postgres postgres [.] index_fetch_heap
+ 81,66% 0,01% postgres postgres [.] heap_page_prune_opt
+ 75,85% 0,00% postgres postgres [.] TransactionIdLimitedForOldSnapshots
+ 75,83% 0,01% postgres postgres [.] RelationHasUnloggedIndex
+ 75,79% 0,00% postgres postgres [.] RelationGetIndexList
+ 75,79% 75,78% postgres postgres [.] list_copy

The large tables in question have around 30 indexes.  I see that
heap_page_prune_opt()'s call to TransactionIdLimitedForOldSnapshots()
acquires a couple of system-wide spinlocks, and also tests
RelationAllowsEarlyPruning() which calls RelationHasUnloggedIndex()
which says:

 * Tells whether any index for the relation is unlogged.
 *
 * Note: There doesn't seem to be any way to have an unlogged index attached
 * to a permanent table, but it seems best to keep this general so that it
 * returns sensible results even when they seem obvious (like for an unlogged
 * table) and to handle possible future unlogged indexes on permanent tables.

It calls RelationGetIndexList() which conses up a new copy of the list
every time, so that we can spin through it looking for unlogged
indexes (and in this user's case there are none).  I didn't try to
poke at this in lab conditions, but from a glance a the code, I guess
heap_page_prune_opt() is running for every index tuple except those
that reference the same heap page as the one before, so I guess it
happens a lot if the heap is not physically correlated with the index
keys.  Ouch.

-- 
Thomas Munro
https://enterprisedb.com



pgsql-hackers by date:

Previous
From: Jesper Pedersen
Date:
Subject: Re: Index Skip Scan
Next
From: Peter Eisentraut
Date:
Subject: Re: check_recovery_target_lsn() does a PG_CATCH without a throw