Re: Slow standby snapshot - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Slow standby snapshot
Date
Msg-id 20210803173531.2ww6v35255wppsni@alap3.anarazel.de
Whole thread Raw
In response to Re: Slow standby snapshot  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: Slow standby snapshot  (Michail Nikolaev <michail.nikolaev@gmail.com>)
Re: Slow standby snapshot  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
Hi,

On 2021-08-03 10:33:50 +0500, Andrey Borodin wrote:
> > 3 авг. 2021 г., в 03:01, Andres Freund <andres@anarazel.de> написал(а):
> > On 2021-08-03 00:07:23 +0300, Michail Nikolaev wrote:
> >> The main idea is simple optimistic optimization - store offset to next
> >> valid entry. So, in most cases, we could just skip all the gaps.
> >> Of course, it adds some additional impact for workloads without long
> >> (few seconds) transactions but it is almost not detectable (because of
> >> CPU caches).
> > 
> > I'm doubtful that that's really the right direction. For workloads that
> > are replay heavy we already often can see the cost of maintaining the
> > known xids datastructures show up significantly - not surprising, given
> > the datastructure. And for standby workloads with active primaries the
> > cost of searching through the array in all backends is noticeable as
> > well.  I think this needs a bigger data structure redesign.
> 
> KnownAssignedXids implements simple membership test idea. What kind of
> redesign would you suggest? Proposed optimisation makes it close to optimal,
> but needs eventual compression.

Binary searches are very ineffecient on modern CPUs (unpredictable memory
accesses, unpredictable branches). We constantly need to do binary searches
during replay to remove xids from the array. I don't see how you can address
that with just the current datastructure.


> Maybe use a hashtable of running transactions? It will be slightly faster
> when adding\removing single transactions. But much worse when doing
> KnownAssignedXidsRemove().

Why would it be worse for KnownAssignedXidsRemove()? Were you intending to
write KnownAssignedXidsGet[AndSetXmin]()?


> Maybe use a tree? (AVL\RB or something like that) It will be slightly better, because it does not need eventual
compressionlike exiting array.
 

I'm not entirely sure what datastructure would work best. I can see something
like a radix tree work well, or a skiplist style approach. Or a hashtable:

I'm not sure that we need to care as much about the cost of
KnownAssignedXidsGetAndSetXmin() - for one, the caching we now have makes that
less frequent. But more importantly, it'd not be hard to maintain an
occasionally (or opportunistically) maintained denser version for
GetSnapshotData() - there's only a single writer, making the concurrency
issues a lot simpler.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Next
From: Andres Freund
Date:
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)