Re: Database logins taking longer and longer, showing up as "authentication" in ps(1) - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
Date
Msg-id f85131fd-6d96-19de-196b-c1e262cd6aa6@aklaver.com
Whole thread Raw
In response to Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)  (Adam Sjøgren <asjo@koldfront.dk>)
Responses Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
List pgsql-general
On 9/30/20 1:22 PM, Adam Sjøgren wrote:
> Tom writes:
> 
>> =?utf-8?Q?Adam_Sj=C3=B8gren?= <asjo@koldfront.dk> writes:
>>> Tom writes:
>>>> ... which implies that the problem is unexpectedly high contention for the
>>>> ProcArrayLock.
>>
>>> One thing I should have mentioned, but forgot, is that the database is
>>> configured to do logical replication to another machine - could that
>>> have an effect on the lock contention?
>>> A colleague pointed out that in the pg_locks output, the replication
>>> processes are waiting on the ProcArray lock:
>>>   · https://koldfront.dk/misc/postgresql/pg_stat_activity_pg_locks-BAD.txt.gz
>>
>> Yeah, that is *mighty* interesting.  For the archives' sake, I'll point
>> out that this shows 16 walsender processes, of which 6 are in
>> WalSenderWaitForWAL waits (ie, basically idle) and all of the other 10
>> are waiting for the ProcArrayLock.
> 
> Sorry for not returning on this before now.
> 
> I have seen the slow login problem - ProcArrayLock contention - happen
> with replication turned off, so I think that can be ruled out as the
> cause.
> 
> I just observed something interesting right now, however.
> 
> Just to recap the scene: the basic setup is that we have a table with a
> queue of jobs to be processed, and we have a number of programs (say 10)
> on 6 other machines grabbing those jobs, doing calculations and
> reporting back.

I don't have an answer. Not even sure if this is relevant to the 
problem, but how are the jobs getting into the queue?

> 
> The number of jobs at any given time can fluctuate from none to
> millions. Typically millions of jobs take some days to a week to get
> through.
> 
> Now, I happened to have a psql prompt open when the problem of new
> logins being slow appeared - and I ran a query that counted the number
> of jobs by job-type.
> 
> Before the slow login problem, that query took around 70-80 ms, but when
> the slow login appeared, it took 20+ seconds to run the query.
> 
> EXPLAIN ANALYZE showed that it was running a parallel query.
> 
> So I tried turning parallel queries off - with SET
> max_parallel_workers_per_gather = 0; - and now the query took around 150
> ms consistenly, despite logins still being (variably) slow.
> 
> So that confirms the ProcArrayLock contention, I think.
> 
> My next thought is: I have 6*10 programs making connections and doing
> parallel queries over the job queue, to get new jobs. If the jobs happen
> to be very "easy", those programs will be doing a lot of queries for
> jobs. And when the queue is large enough (I'm guessing), those queries
> will be parallel, so they will be creating new processes that need to be
> added and removed from the ProcArray.
> 
> So am I shooting myself in the foot here, by creating lock contention
> when trying to get jobs quickly from a largeish table?
> 
> I think my next step will be to modify the programs that grab the jobs
> to avoid parallel queries when looking up the next job to run.
> 
> You anticipated this, it just took me until now to get a little closer:
> 
>> It's also evident that there's some parallel query activity going on,
>> as some sessions are in BgWorkerShutdown waits; so parallel worker
>> entry/exit might be contributing to the overall load on ProcArrayLock.
> 
> 
>    Best regards,
> 
>      Adam
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



pgsql-general by date:

Previous
From: Adam Sjøgren
Date:
Subject: Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
Next
From: Adam Sjøgren
Date:
Subject: Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)