Thread: Max connections reached without max connections reached

Max connections reached without max connections reached

From
James Sewell
Date:
Hi,

This is a bit of an odd one - I'm on PostgreSQL 11.9 with a single streaming replica.

The system handles a lot of connections - we have a max_connections of 600. Most are long lived JDBC, but there are a lot of ETL / ad-hoc jobs etc.

Connections normally sit at 300ish, with 70 active at the most. The machines have 32 CPU cores . PgBouncer is sadly not an option hereas we are using many long lived connections which make use of prepared statements.

Sometimes a strange condition occurs. The number of connections is well under 600 (and dropping), but new connections are not being allowed into the database, I can see this message in the logs:

   (0:53300)FATAL:  remaining connection slots are reserved for non-replication superuser connections

From ps I can see a lot of processes like this:

  postgres: accessor monitoring 10.10.7.54[41655] startup

The number of these grows, and no new connections are allowed in. These startup connections do not appear in pg_stat_activity so I can't find what they are waiting on.

Removing some long lived connections sometimes appears to help clear the startup processes and return the system to normal - but I have not been able to find a correlation. It's more blind luck at the moment.

Any pointers on where I could start prodding?

Cheers,
James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Tom Lane
Date:
James Sewell <james.sewell@jirotech.com> writes:
> The system handles a lot of connections - we have a max_connections of 600.
> Most are long lived JDBC, but there are a lot of ETL / ad-hoc jobs etc.

> Connections normally sit at 300ish, with 70 active at the most. The
> machines have 32 CPU cores . PgBouncer is sadly not an option hereas we are
> using many long lived connections which make use of prepared statements.

> Sometimes a strange condition occurs. The number of connections is well
> under 600 (and dropping), but new connections are not being allowed into
> the database, I can see this message in the logs:

>    (0:53300)FATAL:  remaining connection slots are reserved for
> non-replication superuser connections

What are you looking at to claim the number of connections is under 600?
Maybe there's some disconnect between what you're measuring and what the
database thinks.

A different line of thought is that ProcArray slots can be consumed by
things that aren't client connection processes, in particular
(1) parallel-query workers
(2) autovacuum workers
Looking into pg_stat_activity when you see this issue might help
clarify that.

            regards, tom lane



Re: Max connections reached without max connections reached

From
James Sewell
Date:
What are you looking at to claim the number of connections is under 600?
Maybe there's some disconnect between what you're measuring and what the
database thinks.

A different line of thought is that ProcArray slots can be consumed by
things that aren't client connection processes, in particular
(1) parallel-query workers
(2) autovacuum workers
Looking into pg_stat_activity when you see this issue might help
clarify that.

Sorry, I could have been clearer - pg_stat_activity is what I'm looking at - I'm recording connection info from here every 15 seconds (from a superuser account so I don't get locked out). It never peaks above 300 (in fact when the incident happens no new connections can come in so it falls to around 100) - yet I'm seeing the log lines claim that I'm still hitting 597 (600 - 3 reserved). 

If I measure from `ps  -ef | grep postgres` and look at the connections then I can see that with the startup connections I am hitting this limit.

So client processes which are listed to the OS as "startup" ARE counted towards the 597 connections, but are NOT reported in pg_stat_activity

James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Adrian Klaver
Date:
On 11/23/21 14:56, James Sewell wrote:

> Sorry, I could have been clearer - pg_stat_activity is what I'm looking 
> at - I'm recording connection info from here every 15 seconds (from a 
> superuser account so I don't get locked out). It never peaks above 300 
> (in fact when the incident happens no new connections can come in so it 
> falls to around 100) - yet I'm seeing the log lines claim that I'm still 
> hitting 597 (600 - 3 reserved).

What is the query you are using against pg_stat_activity?

> 
> James
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Max connections reached without max connections reached

From
Tom Lane
Date:
James Sewell <james.sewell@jirotech.com> writes:
> If I measure from `ps  -ef | grep postgres` and look at the connections
> then I can see that with the startup connections I am hitting this limit.
> So client processes which are listed to the OS as "startup" ARE counted
> towards the 597 connections, but are NOT reported in pg_stat_activity

So I guess the question becomes why are they spending so much time in
the startup state.  That should take mere milliseconds, unless the
clients are being slow to handle the authentication exchange?

I'm also wondering a bit about whether they're being blocked on a lock,
eg. due to something taking an exclusive lock on pg_authid or pg_database.
pg_locks might be interesting to check.

            regards, tom lane



Re: Max connections reached without max connections reached

From
James Sewell
Date:
> Sorry, I could have been clearer - pg_stat_activity is what I'm looking
> at - I'm recording connection info from here every 15 seconds (from a
> superuser account so I don't get locked out). It never peaks above 300
> (in fact when the incident happens no new connections can come in so it
> falls to around 100) - yet I'm seeing the log lines claim that I'm still
> hitting 597 (600 - 3 reserved).

What is the query you are using against pg_stat_activity?


For general counts ad-hoc we just use:

SELECT count(*) FROM pg_stat_activity

The monitoring system runs:

SELECT CASE WHEN usename IS NOT NULL THEN usename ELSE 'SYSTEM' END as role, 
        datname AS database,
        state,
        wait_event,
        count(*) AS connection
FROM pg_stat_activity
GROUP BY 1,2,3,4

 


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
James Sewell
Date:

So I guess the question becomes why are they spending so much time in
the startup state.  That should take mere milliseconds, unless the
clients are being slow to handle the authentication exchange?

I'm also wondering a bit about whether they're being blocked on a lock,
eg. due to something taking an exclusive lock on pg_authid or pg_database.
pg_locks might be interesting to check.

It's v odd as it happens *sometimes* - having said that it's happening right this moment:

[enterprisedb@oprpgs001 edb-as-11]$ ps -ef|grep postgres:  | wc -l
517

[enterprisedb@oprpgs001 ~]$ ps -ef|grep postgres: | grep -i start | wc -l
480

postgres=# select count(*) from pg_stat_activity;
 count
-------
   97
(1 row)

postgres=# select * from pg_locks where not granted;
 locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
(0 rows)

 
Logs are showing a stream of:

2021-11-24 10:50:58 AEDT [869]: [1-1] user=u_rundeck_rw,db=rundeck_cluster,client=10.154.19.23 10.154.19.23(57122) (0:53300)FATAL:  remaining connection slots are reserved for non-replication superuser connections
2021-11-24 10:50:58 AEDT [870]: [1-1] user=u_rundeck_rw,db=rundeck_cluster,client=10.154.19.23 10.154.19.23(57124) (0:53300)FATAL:  remaining connection slots are reserved for non-replication superuser connections
2021-11-24 10:50:58 AEDT [871]: [1-1] user=u_rundeck_rw,db=rundeck_cluster,client=10.154.19.23 10.154.19.23(57126) (0:53300)FATAL:  remaining connection slots are reserved for non-replication superuser connections


James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Adrian Klaver
Date:
On 11/23/21 15:53, James Sewell wrote:

> 
> It's v odd as it happens *sometimes* - having said that it's happening 
> right this moment:
> 
> [enterprisedb@oprpgs001 edb-as-11]$ ps -ef|grep postgres:  | wc -l
> 517
> 
> [enterprisedb@oprpgs001 ~]$ ps -ef|grep postgres: | grep -i start | wc -l
> 480

The enterprisedb is one of their customized versions?

rundeck_cluster below refers to https://digital.ai/technology/rundeck?

> 
> postgres=# select count(*) from pg_stat_activity;
>   count
> -------
>     97
> (1 row)
> 
> postgres=# select * from pg_locks where not granted;
>   locktype | database | relation | page | tuple | virtualxid | 
> transactionid | classid | objid | objsubid | virtualtransaction | pid | 
> mode | granted | fastpath
>
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
> (0 rows)
> Logs are showing a stream of:
> 
> 2021-11-24 10:50:58 AEDT [869]: [1-1] 
> user=u_rundeck_rw,db=rundeck_cluster,client=10.154.19.23 
> 10.154.19.23(57122) (0:53300)FATAL:  remaining connection slots are 
> reserved for non-replication superuser connections
> 2021-11-24 10:50:58 AEDT [870]: [1-1] 
> user=u_rundeck_rw,db=rundeck_cluster,client=10.154.19.23 
> 10.154.19.23(57124) (0:53300)FATAL:  remaining connection slots are 
> reserved for non-replication superuser connections
> 2021-11-24 10:50:58 AEDT [871]: [1-1] 
> user=u_rundeck_rw,db=rundeck_cluster,client=10.154.19.23 
> 10.154.19.23(57126) (0:53300)FATAL:  remaining connection slots are 
> reserved for non-replication superuser connections
> 
> James
> 



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Max connections reached without max connections reached

From
James Sewell
Date:

The enterprisedb is one of their customized versions?

rundeck_cluster below refers to https://digital.ai/technology/rundeck?


re: EnterpriseDB yes it is - I'm having this same discussion with them in parallel 
re: rundeck, yes - but this is only one of many things connecting. it's not doing anything special.



The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Adrian Klaver
Date:
On 11/23/21 16:23, James Sewell wrote:
> 
>     The enterprisedb is one of their customized versions?
> 
>     rundeck_cluster below refers to
>     https://digital.ai/technology/rundeck
>     <https://digital.ai/technology/rundeck>?
> 
> 
> re: EnterpriseDB yes it is - I'm having this same discussion with them 
> in parallel

What version of theirs?

> re: rundeck, yes - but this is only one of many things connecting. it's 
> not doing anything special.

Except sending a stream of connections to the server.

Any idea what they are doing?

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Max connections reached without max connections reached

From
James Sewell
Date:

> re: EnterpriseDB yes it is - I'm having this same discussion with them
> in parallel

What version of theirs?
 
PostgreSQL 11.9 (EnterpriseDB Advanced Server 11.9.18) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
 

> re: rundeck, yes - but this is only one of many things connecting. it's
> not doing anything special.

Except sending a stream of connections to the server.

Any idea what they are doing?

It's used to trigger ETL jobs. There are other bigger consumers of connections - the issue isn't the stream of jobs, that works fine under normal operation. The issue is that when jobs stay in "startup" and don't enter pg_stat_activity the system spirals downwards and no new connections are allowed.

James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Adrian Klaver
Date:
On 11/23/21 16:58, James Sewell wrote:
> 
>      > re: EnterpriseDB yes it is - I'm having this same discussion with
>     them
>      > in parallel
> 
>     What version of theirs?
> 
> PostgreSQL 11.9 (EnterpriseDB Advanced Server 11.9.18) on 
> x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 
> 4.8.5-36), 64-bit
> 
> 
>      > re: rundeck, yes - but this is only one of many things
>     connecting. it's
>      > not doing anything special.
> 
>     Except sending a stream of connections to the server.
> 
>     Any idea what they are doing?
> 
> 
> It's used to trigger ETL jobs. There are other bigger consumers of 
> connections - the issue isn't the stream of jobs, that works fine under 
> normal operation. The issue is that when jobs stay in "startup" and 
> don't enter pg_stat_activity the system spirals downwards and no new 
> connections are allowed.

Is there a correlation to these ETL jobs or the other consumers?

If so what are the consumers trying to do at that time?

> 
> James
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Max connections reached without max connections reached

From
James Sewell
Date:

> It's used to trigger ETL jobs. There are other bigger consumers of
> connections - the issue isn't the stream of jobs, that works fine under
> normal operation. The issue is that when jobs stay in "startup" and
> don't enter pg_stat_activity the system spirals downwards and no new
> connections are allowed.

Is there a correlation to these ETL jobs or the other consumers?

If so what are the consumers trying to do at that time?

Not really no - it seems to be pretty randomly timed.  


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Tom Lane
Date:
James Sewell <james.sewell@jirotech.com> writes:
>> I'm also wondering a bit about whether they're being blocked on a lock,
>> eg. due to something taking an exclusive lock on pg_authid or pg_database.
>> pg_locks might be interesting to check.

> postgres=# select * from pg_locks where not granted;

The hypothesis I'm thinking of is that incoming sessions are being blocked
somewhere before they can acquire a ProcArray entry; if so, they'd not
show up in either pg_stat_activity or pg_locks.  What we have to look for
then is evidence of somebody holding a strong lock on a shared relation.
Try "select * from pg_locks where locktype = 'relation' and database = 0".

            regards, tom lane



Re: Max connections reached without max connections reached

From
James Sewell
Date:
The hypothesis I'm thinking of is that incoming sessions are being blocked
somewhere before they can acquire a ProcArray entry; if so, they'd not
show up in either pg_stat_activity or pg_locks.  What we have to look for
then is evidence of somebody holding a strong lock on a shared relation.
Try "select * from pg_locks where locktype = 'relation' and database = 0".


That doesn't show anything when the issue is happening.

Strace shows the startup processes looping like so:

13:51:15.510850 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=733559666}) = 0
13:51:15.511254 futex(0x7ee891405238, FUTEX_WAKE, 1) = 1
13:51:15.511422 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=734037545}) = 0
13:51:15.511690 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.520470 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=743136450}) = 0
13:51:15.520772 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=743376746}) = 0
13:51:15.521005 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.539309 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=761920081}) = 0
13:51:15.539508 futex(0x7ee891406bb8, FUTEX_WAKE, 1) = 1
13:51:15.539690 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=762298276}) = 0
13:51:15.539882 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.547311 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=769929341}) = 0
13:51:15.547757 futex(0x7ee89140a738, FUTEX_WAKE, 1) = 1
13:51:15.547931 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=770524120}) = 0
13:51:15.548089 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.551482 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
13:51:15.551950 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.593631 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
13:51:15.593807 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.626167 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.627590 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=850204434}) = 0
13:51:15.627817 futex(0x7ee8914033b8, FUTEX_WAKE, 1) = 1
13:51:15.627987 open("pg_subtrans/A100", O_RDWR) = 16
13:51:15.628270 lseek(16, 131072, SEEK_SET) = 131072
13:51:15.628461 read(16, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
13:51:15.628677 close(16)               = 0
13:51:15.628911 open("pg_subtrans/A0E1", O_RDWR) = 16
13:51:15.629132 lseek(16, 122880, SEEK_SET) = 122880
13:51:15.629309 read(16, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
13:51:15.629503 close(16)               = 0
13:51:15.629685 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=852272645}) = 0
13:51:15.629863 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.656694 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=879314944}) = 0
13:51:15.656923 futex(0x7ee8914125b8, FUTEX_WAKE, 1) = 1
13:51:15.657129 open("pg_subtrans/A0EE", O_RDWR) = 16
13:51:15.657385 lseek(16, 73728, SEEK_SET) = 73728
13:51:15.657638 read(16, "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"..., 8192) = 8192
13:51:15.657907 close(16)               = 0
13:51:15.658141 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=880753952}) = 0
13:51:15.658346 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.689705 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=912331058}) = 0
13:51:15.689950 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1
13:51:15.690182 futex(0x7ee8914118b8, FUTEX_WAKE, 1) = 1
13:51:15.690420 open("pg_subtrans/A0EE", O_RDWR) = 16
13:51:15.690742 lseek(16, 73728, SEEK_SET) = 73728
13:51:15.691037 read(16, "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"..., 8192) = 8192
13:51:15.691303 close(16)               = 0
13:51:15.691582 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=914202777}) = 0
13:51:15.691837 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.713319 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.716423 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=939067321}) = 0
13:51:15.716737 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1
13:51:15.717010 futex(0x7ee89140f3b8, FUTEX_WAKE, 1) = 1
13:51:15.717286 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=939899218}) = 0
13:51:15.717578 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.722270 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.746391 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=969020657}) = 0
13:51:15.746664 futex(0x7ee89140fa38, FUTEX_WAKE, 1) = 1
13:51:15.746877 open("pg_subtrans/A0EE", O_RDWR) = 16

 

While gdb shows similar to:

   24 ^[[?1034h(gdb) bt
   23 #0  0x00007f18fb9deb3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
   22 #1  0x00007f18fb9debcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
   21 #2  0x00007f18fb9dec6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
   20 #3  0x000000000071c482 in PGSemaphoreLock ()
   19 #4  0x000000000079078c in LWLockAcquire ()
   18 #5  0x00000000004fa9ae in SimpleLruWaitIO.isra.1 ()
   17 #6  0x00000000004fb2a4 in SimpleLruReadPage ()
   16 #7  0x00000000004fbc07 in SubTransGetParent ()
   15 #8  0x00000000004fbc57 in SubTransGetTopmostTransaction ()
   14 #9  0x00000000008f65a7 in XidInMVCCSnapshot ()
   13 #10 0x00000000008f662e in HeapTupleSatisfiesMVCC ()
   12 #11 0x00000000004c436e in heapgetpage ()
   11 #12 0x00000000004c58e4 in heap_getnext ()
   10 #13 0x00000000008ab8b5 in ScanPgRelation ()
    9 #14 0x00000000008ae124 in RelationBuildDesc ()
    8 #15 0x00000000008b010e in load_critical_index ()
    7 #16 0x00000000008b39ac in RelationCacheInitializePhase3 ()
    6 #17 0x00000000008cbd56 in InitPostgres ()
    5 #18 0x00000000007a1cfe in PostgresMain ()
    4 #19 0x000000000048a9c0 in ServerLoop ()
    3 #20 0x000000000072f11e in PostmasterMain ()
    2 #21 0x000000000048b854 in main ()


Does that shed any light?

- James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Dilip Kumar
Date:
On Thu, Nov 25, 2021 at 8:58 AM James Sewell <james.sewell@jirotech.com> wrote:
>>
>> The hypothesis I'm thinking of is that incoming sessions are being blocked
>> somewhere before they can acquire a ProcArray entry; if so, they'd not
>> show up in either pg_stat_activity or pg_locks.  What we have to look for
>> then is evidence of somebody holding a strong lock on a shared relation.
>> Try "select * from pg_locks where locktype = 'relation' and database = 0".
>>
>
> That doesn't show anything when the issue is happening.
>
> Strace shows the startup processes looping like so:
>
> 13:51:15.510850 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=733559666}) = 0
> 13:51:15.511254 futex(0x7ee891405238, FUTEX_WAKE, 1) = 1
> 13:51:15.511422 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=734037545}) = 0
> 13:51:15.511690 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.520470 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=743136450}) = 0
> 13:51:15.520772 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=743376746}) = 0
> 13:51:15.521005 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.539309 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=761920081}) = 0
> 13:51:15.539508 futex(0x7ee891406bb8, FUTEX_WAKE, 1) = 1
> 13:51:15.539690 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=762298276}) = 0
> 13:51:15.539882 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.547311 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=769929341}) = 0
> 13:51:15.547757 futex(0x7ee89140a738, FUTEX_WAKE, 1) = 1
> 13:51:15.547931 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=770524120}) = 0
> 13:51:15.548089 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.551482 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> 13:51:15.551950 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.593631 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> 13:51:15.593807 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.626167 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.627590 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=850204434}) = 0
> 13:51:15.627817 futex(0x7ee8914033b8, FUTEX_WAKE, 1) = 1
> 13:51:15.627987 open("pg_subtrans/A100", O_RDWR) = 16
> 13:51:15.628270 lseek(16, 131072, SEEK_SET) = 131072
> 13:51:15.628461 read(16, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> 13:51:15.628677 close(16)               = 0
> 13:51:15.628911 open("pg_subtrans/A0E1", O_RDWR) = 16
> 13:51:15.629132 lseek(16, 122880, SEEK_SET) = 122880
> 13:51:15.629309 read(16, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> 13:51:15.629503 close(16)               = 0
> 13:51:15.629685 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=852272645}) = 0
> 13:51:15.629863 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.656694 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=879314944}) = 0
> 13:51:15.656923 futex(0x7ee8914125b8, FUTEX_WAKE, 1) = 1
> 13:51:15.657129 open("pg_subtrans/A0EE", O_RDWR) = 16
> 13:51:15.657385 lseek(16, 73728, SEEK_SET) = 73728
> 13:51:15.657638 read(16, "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"..., 8192) = 8192
> 13:51:15.657907 close(16)               = 0
> 13:51:15.658141 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=880753952}) = 0
> 13:51:15.658346 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.689705 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=912331058}) = 0
> 13:51:15.689950 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1
> 13:51:15.690182 futex(0x7ee8914118b8, FUTEX_WAKE, 1) = 1
> 13:51:15.690420 open("pg_subtrans/A0EE", O_RDWR) = 16
> 13:51:15.690742 lseek(16, 73728, SEEK_SET) = 73728
> 13:51:15.691037 read(16, "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"..., 8192) = 8192
> 13:51:15.691303 close(16)               = 0
> 13:51:15.691582 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=914202777}) = 0
> 13:51:15.691837 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.713319 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.716423 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=939067321}) = 0
> 13:51:15.716737 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1
> 13:51:15.717010 futex(0x7ee89140f3b8, FUTEX_WAKE, 1) = 1
> 13:51:15.717286 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=939899218}) = 0
> 13:51:15.717578 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.722270 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.746391 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=969020657}) = 0
> 13:51:15.746664 futex(0x7ee89140fa38, FUTEX_WAKE, 1) = 1
> 13:51:15.746877 open("pg_subtrans/A0EE", O_RDWR) = 16
>
>
> While gdb shows similar to:
>
>    24 ^[[?1034h(gdb) bt
>    23 #0  0x00007f18fb9deb3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
>    22 #1  0x00007f18fb9debcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
>    21 #2  0x00007f18fb9dec6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
>    20 #3  0x000000000071c482 in PGSemaphoreLock ()
>    19 #4  0x000000000079078c in LWLockAcquire ()
>    18 #5  0x00000000004fa9ae in SimpleLruWaitIO.isra.1 ()
>    17 #6  0x00000000004fb2a4 in SimpleLruReadPage ()
>    16 #7  0x00000000004fbc07 in SubTransGetParent ()
>    15 #8  0x00000000004fbc57 in SubTransGetTopmostTransaction ()
>    14 #9  0x00000000008f65a7 in XidInMVCCSnapshot ()
>    13 #10 0x00000000008f662e in HeapTupleSatisfiesMVCC ()
>    12 #11 0x00000000004c436e in heapgetpage ()
>    11 #12 0x00000000004c58e4 in heap_getnext ()
>    10 #13 0x00000000008ab8b5 in ScanPgRelation ()
>     9 #14 0x00000000008ae124 in RelationBuildDesc ()
>     8 #15 0x00000000008b010e in load_critical_index ()
>     7 #16 0x00000000008b39ac in RelationCacheInitializePhase3 ()
>     6 #17 0x00000000008cbd56 in InitPostgres ()
>     5 #18 0x00000000007a1cfe in PostgresMain ()
>     4 #19 0x000000000048a9c0 in ServerLoop ()
>     3 #20 0x000000000072f11e in PostmasterMain ()
>     2 #21 0x000000000048b854 in main ()
>
> Does that shed any light?

Seems like some of the processes are taking a long time or stuck while
reading/writing SLRU pages, and due to that while creating a new
connection the backend process is not able to check the transaction
status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
long time in a startup.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Max connections reached without max connections reached

From
Dilip Kumar
Date:
On Thu, Nov 25, 2021 at 9:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 25, 2021 at 8:58 AM James Sewell <james.sewell@jirotech.com> wrote:
> >>
> >> The hypothesis I'm thinking of is that incoming sessions are being blocked
> >> somewhere before they can acquire a ProcArray entry; if so, they'd not
> >> show up in either pg_stat_activity or pg_locks.  What we have to look for
> >> then is evidence of somebody holding a strong lock on a shared relation.
> >> Try "select * from pg_locks where locktype = 'relation' and database = 0".
> >>
> >
> > That doesn't show anything when the issue is happening.
> >
> > Strace shows the startup processes looping like so:
> >
> > 13:51:15.510850 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=733559666}) = 0
> > 13:51:15.511254 futex(0x7ee891405238, FUTEX_WAKE, 1) = 1
> > 13:51:15.511422 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=734037545}) = 0
> > 13:51:15.511690 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.520470 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=743136450}) = 0
> > 13:51:15.520772 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=743376746}) = 0
> > 13:51:15.521005 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.539309 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=761920081}) = 0
> > 13:51:15.539508 futex(0x7ee891406bb8, FUTEX_WAKE, 1) = 1
> > 13:51:15.539690 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=762298276}) = 0
> > 13:51:15.539882 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.547311 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=769929341}) = 0
> > 13:51:15.547757 futex(0x7ee89140a738, FUTEX_WAKE, 1) = 1
> > 13:51:15.547931 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=770524120}) = 0
> > 13:51:15.548089 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.551482 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > 13:51:15.551950 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.593631 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > 13:51:15.593807 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.626167 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.627590 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=850204434}) = 0
> > 13:51:15.627817 futex(0x7ee8914033b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.627987 open("pg_subtrans/A100", O_RDWR) = 16
> > 13:51:15.628270 lseek(16, 131072, SEEK_SET) = 131072
> > 13:51:15.628461 read(16, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> > 13:51:15.628677 close(16)               = 0
> > 13:51:15.628911 open("pg_subtrans/A0E1", O_RDWR) = 16
> > 13:51:15.629132 lseek(16, 122880, SEEK_SET) = 122880
> > 13:51:15.629309 read(16, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> > 13:51:15.629503 close(16)               = 0
> > 13:51:15.629685 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=852272645}) = 0
> > 13:51:15.629863 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.656694 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=879314944}) = 0
> > 13:51:15.656923 futex(0x7ee8914125b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.657129 open("pg_subtrans/A0EE", O_RDWR) = 16
> > 13:51:15.657385 lseek(16, 73728, SEEK_SET) = 73728
> > 13:51:15.657638 read(16, "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"..., 8192) = 8192
> > 13:51:15.657907 close(16)               = 0
> > 13:51:15.658141 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=880753952}) = 0
> > 13:51:15.658346 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.689705 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=912331058}) = 0
> > 13:51:15.689950 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.690182 futex(0x7ee8914118b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.690420 open("pg_subtrans/A0EE", O_RDWR) = 16
> > 13:51:15.690742 lseek(16, 73728, SEEK_SET) = 73728
> > 13:51:15.691037 read(16, "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"..., 8192) = 8192
> > 13:51:15.691303 close(16)               = 0
> > 13:51:15.691582 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=914202777}) = 0
> > 13:51:15.691837 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.713319 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.716423 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=939067321}) = 0
> > 13:51:15.716737 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.717010 futex(0x7ee89140f3b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.717286 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=939899218}) = 0
> > 13:51:15.717578 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.722270 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.746391 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=969020657}) = 0
> > 13:51:15.746664 futex(0x7ee89140fa38, FUTEX_WAKE, 1) = 1
> > 13:51:15.746877 open("pg_subtrans/A0EE", O_RDWR) = 16
> >
> >
> > While gdb shows similar to:
> >
> >    24 ^[[?1034h(gdb) bt
> >    23 #0  0x00007f18fb9deb3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
> >    22 #1  0x00007f18fb9debcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
> >    21 #2  0x00007f18fb9dec6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> >    20 #3  0x000000000071c482 in PGSemaphoreLock ()
> >    19 #4  0x000000000079078c in LWLockAcquire ()
> >    18 #5  0x00000000004fa9ae in SimpleLruWaitIO.isra.1 ()
> >    17 #6  0x00000000004fb2a4 in SimpleLruReadPage ()
> >    16 #7  0x00000000004fbc07 in SubTransGetParent ()
> >    15 #8  0x00000000004fbc57 in SubTransGetTopmostTransaction ()
> >    14 #9  0x00000000008f65a7 in XidInMVCCSnapshot ()
> >    13 #10 0x00000000008f662e in HeapTupleSatisfiesMVCC ()
> >    12 #11 0x00000000004c436e in heapgetpage ()
> >    11 #12 0x00000000004c58e4 in heap_getnext ()
> >    10 #13 0x00000000008ab8b5 in ScanPgRelation ()
> >     9 #14 0x00000000008ae124 in RelationBuildDesc ()
> >     8 #15 0x00000000008b010e in load_critical_index ()
> >     7 #16 0x00000000008b39ac in RelationCacheInitializePhase3 ()
> >     6 #17 0x00000000008cbd56 in InitPostgres ()
> >     5 #18 0x00000000007a1cfe in PostgresMain ()
> >     4 #19 0x000000000048a9c0 in ServerLoop ()
> >     3 #20 0x000000000072f11e in PostmasterMain ()
> >     2 #21 0x000000000048b854 in main ()
> >
> > Does that shed any light?
>
> Seems like some of the processes are taking a long time or stuck while
> reading/writing SLRU pages, and due to that while creating a new
> connection the backend process is not able to check the transaction
> status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
> long time in a startup.

My next question would be whether this particular process shown is
stack is stuck forever or finally, it is able to make a connection?  I
want to understand is this just due to slow I/O or something else?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Max connections reached without max connections reached

From
Dilip Kumar
Date:
On Thu, Nov 25, 2021 at 9:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

> > Does that shed any light?
>
> Seems like some of the processes are taking a long time or stuck while
> reading/writing SLRU pages, and due to that while creating a new
> connection the backend process is not able to check the transaction
> status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
> long time in a startup.

Correction, I mean pg_xact (pg_clog in older version) not pg_xlog :)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Max connections reached without max connections reached

From
James Sewell
Date:

> Seems like some of the processes are taking a long time or stuck while
> reading/writing SLRU pages, and due to that while creating a new
> connection the backend process is not able to check the transaction
> status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
> long time in a startup.

My next question would be whether this particular process shown is
stack is stuck forever or finally, it is able to make a connection?  I
want to understand is this just due to slow I/O or something else?

They don't ever seem to clear, (more and more show up)  and IO doesn't seem to be deadlocked at that time.

It's a really odd one! 


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Dilip Kumar
Date:
On Thu, Nov 25, 2021 at 10:53 AM James Sewell <james.sewell@jirotech.com> wrote:
>
>
>> > Seems like some of the processes are taking a long time or stuck while
>> > reading/writing SLRU pages, and due to that while creating a new
>> > connection the backend process is not able to check the transaction
>> > status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
>> > long time in a startup.
>>
>> My next question would be whether this particular process shown is
>> stack is stuck forever or finally, it is able to make a connection?  I
>> want to understand is this just due to slow I/O or something else?
>
>
> They don't ever seem to clear, (more and more show up)  and IO doesn't seem to be deadlocked at that time.

How did you verify that, maybe some process started IO and stuck
there? Can we check pg_stat_activity that is there some process that
shows in the wait event as SLRURead/SLRUWrite and not coming out of
that state?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Max connections reached without max connections reached

From
James Sewell
Date:
How did you verify that, maybe some process started IO and stuck
there? Can we check pg_stat_activity that is there some process that
shows in the wait event as SLRURead/SLRUWrite and not coming out of
that state?

There are no interesting entries in pg_stat_activity, and no wait events of those types DURING the problem periods (I poll this every 15s) .

It's an odd one.

- James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Dilip Kumar
Date:
On Tue, Nov 30, 2021 at 5:16 AM James Sewell <james.sewell@jirotech.com> wrote:
>>
>> How did you verify that, maybe some process started IO and stuck
>> there? Can we check pg_stat_activity that is there some process that
>> shows in the wait event as SLRURead/SLRUWrite and not coming out of
>> that state?

Looking again into the back trace[1], it appeared that the backend is
getting stuck while getting the subtransaction's parent information,
and that can only happen if the snapshot is getting marked as
overflowed.  So it seems that some of the scripts are using a lot of
sub-transaction (> 64) within a transaction and in such case, we can
not cache the sub-transaction information and the snapshot will be
marked overflowed.  After that, for checking the visibility of any
tuple we need to consult the SLRU to get TopMost transaction
information and if the subtransaction SLRU cache is full then it will
have to do the I/O.  So in this case it appeared that a lot of
parallel scans/new connections are trying to read the pg_subtrans and
due to limited cache size that is not even able to load the data in
the cache and getting frequently blocked.  So I assume that when you
say the startup is getting stuck forever, how long did you wait, maybe
it is moving but getting stuck again for the next tuple, so to confirm
that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
and check whether it is hitting that breakpoint again or not.  Also,
check whether you are seeing "SubtransControlLock" in
pg_stat_activity.  There is one nice blog[2] explaining this issue so
you can refer to that as well.


[1]
   21 #2  0x00007f18fb9dec6b in sem_wait@@GLIBC_2.2.5 () from
/lib64/libpthread.so.0
   20 #3  0x000000000071c482 in PGSemaphoreLock ()
   19 #4  0x000000000079078c in LWLockAcquire ()
   18 #5  0x00000000004fa9ae in SimpleLruWaitIO.isra.1 ()
   17 #6  0x00000000004fb2a4 in SimpleLruReadPage ()
   16 #7  0x00000000004fbc07 in SubTransGetParent ()
   15 #8  0x00000000004fbc57 in SubTransGetTopmostTransaction ()
   14 #9  0x00000000008f65a7 in XidInMVCCSnapshot ()
   13 #10 0x00000000008f662e in HeapTupleSatisfiesMVCC ()
   12 #11 0x00000000004c436e in heapgetpage ()

[2]https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Max connections reached without max connections reached

From
James Sewell
Date:

Looking again into the back trace[1], it appeared that the backend is
getting stuck while getting the subtransaction's parent information,
and that can only happen if the snapshot is getting marked as
overflowed.  So it seems that some of the scripts are using a lot of
sub-transaction (> 64) within a transaction and in such case, we can
not cache the sub-transaction information and the snapshot will be
marked overflowed.  After that, for checking the visibility of any
tuple we need to consult the SLRU to get TopMost transaction
information and if the subtransaction SLRU cache is full then it will
have to do the I/O.  So in this case it appeared that a lot of
parallel scans/new connections are trying to read the pg_subtrans and
due to limited cache size that is not even able to load the data in
the cache and getting frequently blocked.  So I assume that when you
say the startup is getting stuck forever, how long did you wait, maybe
it is moving but getting stuck again for the next tuple, so to confirm
that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
and check whether it is hitting that breakpoint again or not.  Also,
check whether you are seeing "SubtransControlLock" in
pg_stat_activity.  There is one nice blog[2] explaining this issue so
you can refer to that as well.

Ah. Snap. This makes a lot of sense. The developers have a nasty habit of using LOOP over rows which has never been stamped out. I bet they have applied this to something with a lot of rows.

Is there a way of seeing which statements / backends / processes have greater than 64 stxns? I'd like to monitor this.

- James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Rob Sargent
Date:
On 12/1/21 3:29 PM, James Sewell wrote:

Looking again into the back trace[1], it appeared that the backend is
getting stuck while getting the subtransaction's parent information,
and that can only happen if the snapshot is getting marked as
overflowed.  So it seems that some of the scripts are using a lot of
sub-transaction (> 64) within a transaction and in such case, we can
not cache the sub-transaction information and the snapshot will be
marked overflowed.  After that, for checking the visibility of any
tuple we need to consult the SLRU to get TopMost transaction
information and if the subtransaction SLRU cache is full then it will
have to do the I/O.  So in this case it appeared that a lot of
parallel scans/new connections are trying to read the pg_subtrans and
due to limited cache size that is not even able to load the data in
the cache and getting frequently blocked.  So I assume that when you
say the startup is getting stuck forever, how long did you wait, maybe
it is moving but getting stuck again for the next tuple, so to confirm
that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
and check whether it is hitting that breakpoint again or not.  Also,
check whether you are seeing "SubtransControlLock" in
pg_stat_activity.  There is one nice blog[2] explaining this issue so
you can refer to that as well.

Ah. Snap. This makes a lot of sense. The developers have a nasty habit of using LOOP over rows which has never been stamped out. I bet they have applied this to something with a lot of rows.

Is there a way of seeing which statements / backends / processes have greater than 64 stxns? I'd like to monitor this.

- James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Also be on the look-out for begin trans; begin trans; etc  I read Kumar's report as nested transactions.  If your gang is doing a transaction per row, they need a kick in the rear.  Anyone not closing those needs a pink slip. 

Re: Max connections reached without max connections reached

From
Michael Lewis
Date:
On Wed, Dec 1, 2021 at 3:52 PM Rob Sargent <robjsargent@gmail.com> wrote:
Also be on the look-out for begin trans; begin trans; etc  I read Kumar's report as nested transactions.  If your gang is doing a transaction per row, they need a kick in the rear.  Anyone not closing those needs a pink slip. 

I have seen the second attempt at calling begin seem to be ignored with "WARNING:  there is already a transaction in progress" similar to calling rollback without a transaction open. Are you stating or suggesting that calling begin a second time actually is more similar to creating a savepoint? That would be rather unexpected from the messaging, at least for me.

Re: Max connections reached without max connections reached

From
Rob Sargent
Date:
On 12/1/21 7:08 PM, Michael Lewis wrote:
On Wed, Dec 1, 2021 at 3:52 PM Rob Sargent <robjsargent@gmail.com> wrote:
Also be on the look-out for begin trans; begin trans; etc  I read Kumar's report as nested transactions.  If your gang is doing a transaction per row, they need a kick in the rear.  Anyone not closing those needs a pink slip. 

I have seen the second attempt at calling begin seem to be ignored with "WARNING:  there is already a transaction in progress" similar to calling rollback without a transaction open. Are you stating or suggesting that calling begin a second time actually is more similar to creating a savepoint? That would be rather unexpected from the messaging, at least for me.
No.  Hard to know what the devs are up to here (what stack they're using, etc), but I'm suggesting one transaction is getting started and not closed. Warnings have been know to be ignored.  I admit I am not familiar with Kumar's term of "sub-transaction" and how those might be created but it sounds to me like a transaction is begun early in the processing and more work happens unaware of the transaction.  And possible not committed.



Re: Max connections reached without max connections reached

From
Dilip Kumar
Date:
On Thu, Dec 2, 2021 at 3:59 AM James Sewell <james.sewell@jirotech.com> wrote:
>
>
>> Looking again into the back trace[1], it appeared that the backend is
>> getting stuck while getting the subtransaction's parent information,
>> and that can only happen if the snapshot is getting marked as
>> overflowed.  So it seems that some of the scripts are using a lot of
>> sub-transaction (> 64) within a transaction and in such case, we can
>> not cache the sub-transaction information and the snapshot will be
>> marked overflowed.  After that, for checking the visibility of any
>> tuple we need to consult the SLRU to get TopMost transaction
>> information and if the subtransaction SLRU cache is full then it will
>> have to do the I/O.  So in this case it appeared that a lot of
>> parallel scans/new connections are trying to read the pg_subtrans and
>> due to limited cache size that is not even able to load the data in
>> the cache and getting frequently blocked.  So I assume that when you
>> say the startup is getting stuck forever, how long did you wait, maybe
>> it is moving but getting stuck again for the next tuple, so to confirm
>> that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
>> and check whether it is hitting that breakpoint again or not.  Also,
>> check whether you are seeing "SubtransControlLock" in
>> pg_stat_activity.  There is one nice blog[2] explaining this issue so
>> you can refer to that as well.
>
>
> Ah. Snap. This makes a lot of sense. The developers have a nasty habit of using LOOP over rows which has never been
stampedout. I bet they have applied this to something with a lot of rows.
 
>
> Is there a way of seeing which statements / backends / processes have greater than 64 stxns? I'd like to monitor
this.

I think there is no such view or anything which tells about which
backend or transaction has more than 64 sub transaction.  But if we
are ready to modify the code then we can LOG that information in
GetNewTransactionId(), when first time we are marking it overflown.
if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
{
   MyProc->subxids.xids[nxids] = xid;
  pg_write_barrier();
  MyProc->subxidStatus.count = substat->count = nxids + 1;
}
else
{
  MyProc->subxidStatus.overflowed = substat->overflowed = true;
 <-- we can log or put breakpoint here and identify which statement is
creating oeverflow-->
}

IMHO, it is good to LOG such information if we are not already logging
this anywhere.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Max connections reached without max connections reached

From
Michael Lewis
Date:
On Wed, Dec 1, 2021 at 8:54 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 12/1/21 7:08 PM, Michael Lewis wrote:
On Wed, Dec 1, 2021 at 3:52 PM Rob Sargent <robjsargent@gmail.com> wrote:
Also be on the look-out for begin trans; begin trans; etc  I read Kumar's report as nested transactions.  If your gang is doing a transaction per row, they need a kick in the rear.  Anyone not closing those needs a pink slip. 

I have seen the second attempt at calling begin seem to be ignored with "WARNING:  there is already a transaction in progress" similar to calling rollback without a transaction open. Are you stating or suggesting that calling begin a second time actually is more similar to creating a savepoint? That would be rather unexpected from the messaging, at least for me.
No.  Hard to know what the devs are up to here (what stack they're using, etc), but I'm suggesting one transaction is getting started and not closed. Warnings have been know to be ignored.  I admit I am not familiar with Kumar's term of "sub-transaction" and how those might be created but it sounds to me like a transaction is begun early in the processing and more work happens unaware of the transaction.  And possible not committed.

Okay cool. Subtransactions are interesting. They are used when a savepoint is issued, and with exception blocks in PL/pgSQL functions. They allow the system to keep track of which "sub" process made changes to rows. So just like the main transaction gets a transaction ID and rows get marked with xmin or xmax as needed for MVCC, the subtransactions get their own transaction ID and the objects they modify get updated to set xmin and xmax with their transaction ID. There are some interesting performance bottlenecks when many subtransactions get involved, but I am still learning about that.

Disclaimer- I am no kind of PG contributor and very much still learning, the above explanation is my best understanding but I welcome any corrections or clarifications.

Re: Max connections reached without max connections reached

From
Michael Lewis
Date:
On Wed, Dec 1, 2021 at 9:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
IMHO, it is good to LOG such information if we are not already logging
this anywhere.

+1

I expect my 'vote' counts for naught, but I fully expect seeing these show up in the logs would have helped me much more quickly have insight into what was going on during times of very high concurrency and extreme slowness with many processes showing as waiting on LwLocks.

Re: Max connections reached without max connections reached

From
James Sewell
Date:
I expect my 'vote' counts for naught, but I fully expect seeing these show up in the logs would have helped me much more quickly have insight into what was going on during times of very high concurrency and extreme slowness with many processes showing as waiting on LwLocks.

Is there any mechanism I could use from a C extension and a hook to warn about >64 overflows? 

- James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Dilip Kumar
Date:
On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

> I think there is no such view or anything which tells about which
> backend or transaction has more than 64 sub transaction.  But if we
> are ready to modify the code then we can LOG that information in
> GetNewTransactionId(), when first time we are marking it overflown.
> if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
> {
>    MyProc->subxids.xids[nxids] = xid;
>   pg_write_barrier();
>   MyProc->subxidStatus.count = substat->count = nxids + 1;
> }
> else
> {
>   MyProc->subxidStatus.overflowed = substat->overflowed = true;
>  <-- we can log or put breakpoint here and identify which statement is
> creating oeverflow-->
> }
>
> IMHO, it is good to LOG such information if we are not already logging
> this anywhere.
>

I have prepared a small patch to log this information.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Max connections reached without max connections reached

From
Dilip Kumar
Date:
On Fri, Dec 3, 2021 at 9:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dilip Kumar <dilipbalaut@gmail.com> writes:
> > On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >> I think there is no such view or anything which tells about which
> >> backend or transaction has more than 64 sub transaction.  But if we
> >> are ready to modify the code then we can LOG that information in
> >> GetNewTransactionId(), when first time we are marking it overflown.
>
> > I have prepared a small patch to log this information.
>
> Putting an elog call into GetNewTransactionId seems like a completely
> horrid idea from a performance standpoint.  Especially if you put it
> inside the XidGenLock hold, where it can block the entire system not just
> the one process.  But even without that, this seems like a performance
> penalty with basically no real-world benefit.  People who have issues
> like this are not going to want to trawl the postmaster log for such
> messages.

Agreed with both points.  What about we add, subxid count and overflow
status in LocalPgBackendStatus and through that, we can show in
pg_stat_activity.  That way we don't have to report it ever and
whenever the user is running pg_stat_activity they can fetch it
directly from "proc->subxidStatus", along with fetching the proc.xid
and proc.xmin.  Does this make sense?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Max connections reached without max connections reached

From
Dilip Kumar
Date:
On Sun, Dec 5, 2021 at 10:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 3, 2021 at 9:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Dilip Kumar <dilipbalaut@gmail.com> writes:
> > > On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >> I think there is no such view or anything which tells about which
> > >> backend or transaction has more than 64 sub transaction.  But if we
> > >> are ready to modify the code then we can LOG that information in
> > >> GetNewTransactionId(), when first time we are marking it overflown.
> >
> > > I have prepared a small patch to log this information.
> >
> > Putting an elog call into GetNewTransactionId seems like a completely
> > horrid idea from a performance standpoint.  Especially if you put it
> > inside the XidGenLock hold, where it can block the entire system not just
> > the one process.  But even without that, this seems like a performance
> > penalty with basically no real-world benefit.  People who have issues
> > like this are not going to want to trawl the postmaster log for such
> > messages.
>
> Agreed with both points.  What about we add, subxid count and overflow
> status in LocalPgBackendStatus and through that, we can show in
> pg_stat_activity.  That way we don't have to report it ever and
> whenever the user is running pg_stat_activity they can fetch it
> directly from "proc->subxidStatus", along with fetching the proc.xid
> and proc.xmin.  Does this make sense?
>

The attached patch implements the idea I explained.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Max connections reached without max connections reached

From
James Sewell
Date:
Agreed with both points.  What about we add, subxid count and overflow
status in LocalPgBackendStatus and through that, we can show in
pg_stat_activity.  That way we don't have to report it ever and
whenever the user is running pg_stat_activity they can fetch it
directly from "proc->subxidStatus", along with fetching the proc.xid
and proc.xmin.  Does this make sense?

Not sure about the overhead and performance aspects, but I like this solution from a functional point of view. 


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
Amul Sul
Date:
On Mon, Dec 6, 2021 at 6:11 AM James Sewell <james.sewell@jirotech.com> wrote:
>>
>> Agreed with both points.  What about we add, subxid count and overflow
>> status in LocalPgBackendStatus and through that, we can show in
>> pg_stat_activity.  That way we don't have to report it ever and
>> whenever the user is running pg_stat_activity they can fetch it
>> directly from "proc->subxidStatus", along with fetching the proc.xid
>> and proc.xmin.  Does this make sense?
>
>
> Not sure about the overhead and performance aspects, but I like this solution from a functional point of view.
>

+1, I too like the idea. The patch doesn't seem to be doing any heavy
lifting, I think that much overhead should be acceptable.

Regards,
Amul



Re: Max connections reached without max connections reached

From
James Sewell
Date:


+1, I too like the idea. The patch doesn't seem to be doing any heavy
lifting, I think that much overhead should be acceptable.

I'm guessing this won't be back-patched? Is it possible to somehow read this information from a C function?

- James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
James Sewell
Date:

I'm guessing this won't be back-patched? Is it possible to somehow read this information from a C function?
 

Ok it is possible, I've got a C extension up and running which hooks ExecutorStart, then once for each TX ( I monitor the nesting depth like in pg_stat_statements, and only attach at the top level) attaches a sub transaction callback, tracking start subtransaction events and incrementing a counter / keeping track of the worst offenders in a hashmap.

This seems to work very well - but I've got a question. How many sub transactions would you expect the following anon block and function to create respectively? The first seems to report 9, and the second 10 - is this expected? It feels like it might be - I just want to make sure.

do $$
begin
   for counter in 1..10 loop
   begin
   INSERT INTO a VALUES (counter);
exception
when no_data_found then  raise exception 'gosh';
  end;
   end loop;
end; 
$$


CREATE OR REPLACE FUNCTION public.create_subtransactions()
 RETURNS void
 LANGUAGE plpgsql
AS $function$
begin
   for counter in 1..10 loop
     begin
       INSERT INTO a VALUES (counter);
     exception
       when no_data_found then  raise exception 'gosh';
     end;
  end loop;
end;
$function$


- James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Max connections reached without max connections reached

From
James Sewell
Date:

Ok it is possible, I've got a C extension up and running which hooks ExecutorStart, then once for each TX ( I monitor the nesting depth like in pg_stat_statements, and only attach at the top level) attaches a sub transaction callback, tracking start subtransaction events and incrementing a counter / keeping track of the worst offenders in a hashmap.

Sorry that should be ExecutorRun  - although I'm a little confused about which is the correct choice.

-James


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.