Re: Database logins taking longer and longer, showing up as "authentication" in ps(1) - Mailing list pgsql-general

From Adam Sjøgren
Subject Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
Date
Msg-id 87zh57dnh2.fsf@tullinup.koldfront.dk
Whole thread Raw
In response to Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
List pgsql-general
Tom writes:

> =?utf-8?Q?Adam_Sj=C3=B8gren?= <asjo@koldfront.dk> writes:
>> Tom writes:
>>> ... which implies that the problem is unexpectedly high contention for the
>>> ProcArrayLock.
>
>> One thing I should have mentioned, but forgot, is that the database is
>> configured to do logical replication to another machine - could that
>> have an effect on the lock contention?
>> A colleague pointed out that in the pg_locks output, the replication
>> processes are waiting on the ProcArray lock:
>>  · https://koldfront.dk/misc/postgresql/pg_stat_activity_pg_locks-BAD.txt.gz
>
> Yeah, that is *mighty* interesting.  For the archives' sake, I'll point
> out that this shows 16 walsender processes, of which 6 are in
> WalSenderWaitForWAL waits (ie, basically idle) and all of the other 10
> are waiting for the ProcArrayLock.

Sorry for not returning on this before now.

I have seen the slow login problem - ProcArrayLock contention - happen
with replication turned off, so I think that can be ruled out as the
cause.

I just observed something interesting right now, however.

Just to recap the scene: the basic setup is that we have a table with a
queue of jobs to be processed, and we have a number of programs (say 10)
on 6 other machines grabbing those jobs, doing calculations and
reporting back.

The number of jobs at any given time can fluctuate from none to
millions. Typically millions of jobs take some days to a week to get
through.

Now, I happened to have a psql prompt open when the problem of new
logins being slow appeared - and I ran a query that counted the number
of jobs by job-type.

Before the slow login problem, that query took around 70-80 ms, but when
the slow login appeared, it took 20+ seconds to run the query.

EXPLAIN ANALYZE showed that it was running a parallel query.

So I tried turning parallel queries off - with SET
max_parallel_workers_per_gather = 0; - and now the query took around 150
ms consistenly, despite logins still being (variably) slow.

So that confirms the ProcArrayLock contention, I think.

My next thought is: I have 6*10 programs making connections and doing
parallel queries over the job queue, to get new jobs. If the jobs happen
to be very "easy", those programs will be doing a lot of queries for
jobs. And when the queue is large enough (I'm guessing), those queries
will be parallel, so they will be creating new processes that need to be
added and removed from the ProcArray.

So am I shooting myself in the foot here, by creating lock contention
when trying to get jobs quickly from a largeish table?

I think my next step will be to modify the programs that grab the jobs
to avoid parallel queries when looking up the next job to run.

You anticipated this, it just took me until now to get a little closer:

> It's also evident that there's some parallel query activity going on,
> as some sessions are in BgWorkerShutdown waits; so parallel worker
> entry/exit might be contributing to the overall load on ProcArrayLock.


  Best regards,

    Adam

--
 "That's one of the remarkable things about life.           Adam Sjøgren
  It's never so bad that it can't get worse."          asjo@koldfront.dk



pgsql-general by date:

Previous
From: Diego
Date:
Subject: Re: Procedure to install and configure pgadmin4 in desktop mode in Red Hat Linux 8 and other Linux distributions
Next
From: Adrian Klaver
Date:
Subject: Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)