Re: MultiXact\SLRU buffers configuration - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: MultiXact\SLRU buffers configuration
Date
Msg-id 20201028233243.ygm6yqlynkqpzekr@development
In response to Re: MultiXact\SLRU buffers configuration  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: MultiXact\SLRU buffers configuration
List pgsql-hackers
Hi,

On Wed, Oct 28, 2020 at 12:34:58PM +0500, Andrey Borodin wrote:
>Tomas, thanks for looking into this!
>
>> On 28 Oct 2020, at 06:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>>
>> This thread started with a discussion about making the SLRU sizes
>> configurable, but this patch version only adds a local cache. Does this
>> achieve the same goal, or would we still gain something by having GUCs
>> for the SLRUs?
>>
>> If we're claiming this improves performance, it'd be good to have some
>> workload demonstrating that and measurements. I don't see anything like
>> that in this thread, so it's a bit hand-wavy. Can someone share details
>> of such workload (even synthetic one) and some basic measurements?
>
>All patches in this thread aim at the same goal: improving performance in
>the presence of MultiXact lock contention.
>I could not build a synthetic reproduction of the problem, however I did
>some MultiXact stress testing here [0]. It's a clumsy test program, because
>it is still not clear to me which workload parameters trigger MultiXact
>lock contention. In the generic case I was encountering other locks like
>*GenLock: XidGenLock, MultixactGenLock etc. Yet our production system has
>encountered this problem approximately once a month throughout this year.
>
>The test program locks different sets of tuples for share in the presence
>of concurrent full scans.
>To produce a set of locks we choose one of 14 bits. If a row number has
>this bit set to 0 we lock it.
>I've been measuring the time to lock all rows 3 times for each of the 14
>bits, observing the total time to set all locks.
>During the test I was observing locks in pg_stat_activity; if they did not
>contain enough MultiXact locks I was tuning parameters further (number of
>concurrent clients, number of bits, select queries etc).
>
>Why is it so complicated? It seems that other reproductions of a problem were encountering other locks.
>
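
For illustration only, the row-selection rule described in the quoted text
(pick one of 14 bits, lock every row whose number has that bit set to 0) can
be sketched as below. This is not the test program referenced as [0]; the
function names, the table name and the id column are hypothetical, and the
generated statement is just one plausible way to take the share locks.

```python
from typing import List

N_BITS = 14  # number of candidate bits, taken from the quoted description


def rows_to_lock(n_rows: int, bit: int) -> List[int]:
    """Return row numbers in [0, n_rows) whose given bit is 0."""
    if not 0 <= bit < N_BITS:
        raise ValueError(f"bit must be in [0, {N_BITS}), got {bit}")
    mask = 1 << bit
    return [row for row in range(n_rows) if (row & mask) == 0]


def lock_statement(table: str, id_column: str, bit: int) -> str:
    """Build a SELECT ... FOR SHARE for rows whose id has the bit cleared.

    The table and column names are placeholders, not taken from the thread.
    """
    if not 0 <= bit < N_BITS:
        raise ValueError(f"bit must be in [0, {N_BITS}), got {bit}")
    mask = 1 << bit
    return (
        f"SELECT {id_column} FROM {table} "
        f"WHERE ({id_column} & {mask}) = 0 FOR SHARE"
    )
```

Each bit selects half of the rows, and different bits select overlapping
halves, so concurrent clients using different bits end up share-locking
many of the same tuples.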

It's not my intention to be mean or anything like that, but to me this
means we don't really understand the problem we're trying to solve. If
we understood it, we should be able to construct a workload reproducing
the issue ...

I understand what the individual patches are doing, and maybe those
changes are desirable in general. But without any benchmarks from a
plausible workload I find it hard to convince myself that:

(a) it actually will help with the issue you're observing in production

and 

(b) it's actually worth the extra complexity (e.g. the lwlock changes)


I'm willing to invest some of my time into reviewing/testing this, but I
think we badly need better insight into the issue, so that we can build
a workload reproducing it. Perhaps collecting some perf profiles and a
sample of the queries might help, but I assume you already tried that.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


