On Fri, Jun 21, 2019 at 1:21 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> I ran into someone with a system where big queries scanning 8GB+ of
> all-in-cache data took consistently ~2.5x longer on a primary server
> than on a replica. Both servers had concurrent activity on them but
> plenty of spare capacity and similar specs. After some investigation
> it turned out that on the primary there were (1) some select()
> syscalls waiting for 1ms, which might indicate contended
> SpinLockAcquire() back-offs, and (2) a huge amount of time spent in:
>
> + 93,31% 0,00% postgres postgres [.] index_getnext
> + 93,30% 0,00% postgres postgres [.] index_fetch_heap
> + 81,66% 0,01% postgres postgres [.] heap_page_prune_opt
> + 75,85% 0,00% postgres postgres [.] TransactionIdLimitedForOldSnapshots
> + 75,83% 0,01% postgres postgres [.] RelationHasUnloggedIndex
> + 75,79% 0,00% postgres postgres [.] RelationGetIndexList
> + 75,79% 75,78% postgres postgres [.] list_copy
On my laptop, all prewarmed, no concurrency, the mere existence of 10
brin indexes causes a sequential scan to take ~5% longer and an
uncorrelated index scan to take ~45% longer (correlated index scans
don't suffer). Here's a draft patch for v13 that fixes that problem
by caching the result of RelationHasUnloggedIndex().
Reproducer scripts also attached. I ran them with shared_buffers=8GB,
old_snapshot_threshold=10s and pg_prewarm installed.
I didn't try to look into the complaint about suspected spinlock contention.
--
Thomas Munro
https://enterprisedb.com