Re: Slow standby snapshot - Mailing list pgsql-hackers

From Michail Nikolaev
Subject Re: Slow standby snapshot
Date
Msg-id CANtu0ojw4-vFQk4nscmGXYj2sPRNAuez39=1Oy1JEhO97Ekp9A@mail.gmail.com
Whole thread Raw
In response to Re: Slow standby snapshot  (Andres Freund <andres@anarazel.de>)
Responses Re: Slow standby snapshot  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hello, Andres.

Thanks for the feedback again.

> The problem is that we don't want to add a lot of work
> KnownAssignedXidsAdd/Remove, because very often nobody will build a snapshot
> for that moment and building a sorted, gap-free, linear array of xids isn't
> cheap. In my experience it's more common to be bottlenecked by replay CPU
> performance than on replica snapshot building.

Yes, but my patch adds almost the smallest possible amount for
KnownAssignedXidsAdd/Remove - a single write to the array by index.
It differs from the first version in this thread which is based on linked lists.
The "next valid offset" is just "optimistic optimization" - it means
"you could safely skip KnownAssignedXidsNext[i] while finding the next
valid".
But KnownAssignedXidsNext is not updated by Add/Remove.

> During recovery, where there's only one writer to the procarray / known xids,
> it might not be hard to opportunistically build a dense version of the known
> xids whenever a snapshot is built.

AFAIU the patch does exactly the same.
On the first snapshot building, offsets to the next valid entry are
set. So, a dense version is created on-demand.
And this version is reused (even partly if something was removed) on
the next snapshot building.

> I'm not entirely sure what datastructure would work best. I can see something
> like a radix tree work well, or a skiplist style approach. Or a hashtable:

We could try to use some other structure (for example - linked hash
map) - but the additional cost (memory management, links, hash
calculation) will probably significantly reduce performance.
And it is a much harder step to perform.

So, I think "next valid offset" optimization is a good trade-off for now:
* KnownAssignedXidsAdd/Remove are almost not affected in their complexity
* KnownAssignedXidsGetAndSetXmin is eliminated from the CPU top on
typical read scenario - so, more TPS, less ProcArrayLock contention
* it complements GetSnapshotData() scalability - now on standby
* changes are small

Thanks,
Michail.



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Asynchronous and "direct" IO support for PostgreSQL.
Next
From: Michael Meskes
Date:
Subject: Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE