Re: Query on pg_stat_activity table got stuck - Mailing list pgsql-admin

From neeraj kumar
Subject Re: Query on pg_stat_activity table got stuck
Date
Msg-id CAPR3Wj4af3Lpu61DUdB8eri3TQhHLhcKY-q2bTRVLKAmj8h1Ug@mail.gmail.com
Whole thread Raw
In response to Re: Query on pg_stat_activity table got stuck  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Query on pg_stat_activity table got stuck
Re: Query on pg_stat_activity table got stuck
List pgsql-admin
We got more information about this issue. There is one backend process still present into beentry which has changecount as odd value. However this process is long gone/terminated. It means when this process was killed/terminated its entry was not cleaned from beentry list. There seems to be some shutdown hook which cleans beentry if process is killed/terminated that somehow was not kicked off ?

These are some of the fields of corrupted beentry entry which is still hanging :
st_changecount = 1407, st_procpid = 0, st_backendType = B_BACKEND, st_proc_start_timestamp = 610236763633421, st_xact_start_timestamp = 0, st_clienthostname = 0x9000023d480 "", st_ssl = 1 '\001', st_sslstatus = 0x90000c60f80, st_state = STATE_IDLEINTRANSACTION_ABORTED,




On Thu, May 9, 2019 at 1:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeremy Schneider <schnjere@amazon.com> writes:
> Seems to me that at a minimum, this loop shouldn't go on forever. Even
> having an arbitrary, crazy high, hard-coded number of attempts before
> failure (like a million) would be better than spinning on the CPU
> forever - which is what we are seeing.

I don't think it's the readers' fault.  The problem is that the
writer is violating the protocol.  If we put an upper limit on
the number of spin cycles on the reader side, we'll just be creating
a new failure mode when a writer gets swapped out at the wrong moment.

IMO we need to (a) get the failure-prone code out of the critical
section, and then (b) fix the pgstat_increment_changecount macros
so that the critical sections around these shmem changes really are
critical sections (ie bump CritSectionCount).  That way, if somebody
makes the same mistake again, at least there'll be a pretty obvious
failure rather than a lot of stuck readers.

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

pgsql-admin by date:

Previous
From: Tom Lane
Date:
Subject: Re: Query on pg_stat_activity table got stuck
Next
From: Tom Lane
Date:
Subject: Re: Query on pg_stat_activity table got stuck