Re: Database logins taking longer and longer, showing up as "authentication" in ps(1) - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: Database logins taking longer and longer, showing up as "authentication" in ps(1) |
Date | |
Msg-id | f85131fd-6d96-19de-196b-c1e262cd6aa6@aklaver.com Whole thread Raw |
In response to | Re: Database logins taking longer and longer, showing up as "authentication" in ps(1) (Adam Sjøgren <asjo@koldfront.dk>) |
Responses |
Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
|
List | pgsql-general |
On 9/30/20 1:22 PM, Adam Sjøgren wrote: > Tom writes: > >> =?utf-8?Q?Adam_Sj=C3=B8gren?= <asjo@koldfront.dk> writes: >>> Tom writes: >>>> ... which implies that the problem is unexpectedly high contention for the >>>> ProcArrayLock. >> >>> One thing I should have mentioned, but forgot, is that the database is >>> configured to do logical replication to another machine - could that >>> have an effect on the lock contention? >>> A colleague pointed out that in the pg_locks output, the replication >>> processes are waiting on the ProcArray lock: >>> · https://koldfront.dk/misc/postgresql/pg_stat_activity_pg_locks-BAD.txt.gz >> >> Yeah, that is *mighty* interesting. For the archives' sake, I'll point >> out that this shows 16 walsender processes, of which 6 are in >> WalSenderWaitForWAL waits (ie, basically idle) and all of the other 10 >> are waiting for the ProcArrayLock. > > Sorry for not returning on this before now. > > I have seen the slow login problem - ProcArrayLock contention - happen > with replication turned off, so I think that can be ruled out as the > cause. > > I just observed something interesting right now, however. > > Just to recap the scene: the basic setup is that we have a table with a > queue of jobs to be processed, and we have a number of programs (say 10) > on 6 other machines grabbing those jobs, doing calculations and > reporting back. I don't have an answer. Not even sure if this is relevant to the problem, but how are the jobs getting into the queue? > > The number of jobs at any given time can fluctuate from none to > millions. Typically millions of jobs take some days to a week to get > through. > > Now, I happened to have a psql prompt open when the problem of new > logins being slow appeared - and I ran a query that counted the number > of jobs by job-type. > > Before the slow login problem, that query took around 70-80 ms, but when > the slow login appeared, it took 20+ seconds to run the query. > > EXPLAIN ANALYZE showed that it was running a parallel query. > > So I tried turning parallel queries off - with SET > max_parallel_workers_per_gather = 0; - and now the query took around 150 > ms consistenly, despite logins still being (variably) slow. > > So that confirms the ProcArrayLock contention, I think. > > My next thought is: I have 6*10 programs making connections and doing > parallel queries over the job queue, to get new jobs. If the jobs happen > to be very "easy", those programs will be doing a lot of queries for > jobs. And when the queue is large enough (I'm guessing), those queries > will be parallel, so they will be creating new processes that need to be > added and removed from the ProcArray. > > So am I shooting myself in the foot here, by creating lock contention > when trying to get jobs quickly from a largeish table? > > I think my next step will be to modify the programs that grab the jobs > to avoid parallel queries when looking up the next job to run. > > You anticipated this, it just took me until now to get a little closer: > >> It's also evident that there's some parallel query activity going on, >> as some sessions are in BgWorkerShutdown waits; so parallel worker >> entry/exit might be contributing to the overall load on ProcArrayLock. > > > Best regards, > > Adam > -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: