Re: BUG #16007: Regarding patch for BUG #3995: pqSocketCheck doesn't return - Mailing list pgsql-bugs

From Kiran Khatke
Subject Re: BUG #16007: Regarding patch for BUG #3995: pqSocketCheck doesn't return
Date
Msg-id CAGKgC8Vv3MyFmyOFTC68_H-JTHcBfqCpSD4GhJrG9i9BOSXd7A@mail.gmail.com
In response to Re: BUG #16007: Regarding patch for BUG #3995: pqSocketCheck doesn't return  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs

Hello Tom, 

Thanks for the support.

Below are the backtraces of the two threads that use libpq; both are stuck in poll().

We had not enabled server logging earlier, so we are not sure what was happening on the server side.

The issue is rarely reproducible, so we have not yet been able to check it with server logging enabled.

Thread 1:

#0  0x2ea7f184 in *__GI___poll (fds=<value optimized out>, nfds=1, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87

#1  0x2b600238 in pqSocketCheck (conn=0x110999b8, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1043

#2  0x2b600404 in pqWaitTimed (forRead=<value optimized out>, forWrite=1, conn=0x110999b8, finish_time=0) at fe-misc.c:917

#3  0x2b5ff884 in PQgetResult (conn=0x110999b8) at fe-exec.c:1223

#4  0x100c2fa4 in dbConnObj::execStatement_nowait (this=0x110910e8,

    sqlStatement=0x313aae84 "INSERT INTO event (event_id,severity,flags,timestamp,managed_obj_id,managed_obj,groups,params) VALUES (184,5,0,'2019-06-26T10:26:38.133353-07:00',6,'ServicesNode.1025','TRClient','Name=\"031663-SCSN-FO"...) at src/dbmgr/dbConnObj.c:169

#5  0x10099c80 in dbConnectionMgr::insertSQL (this=0x11090640, objID=DBO_EVENT, type=DB_LOGGING,

    serialObj=0x1156273c "9,11,1171457,1,0,13,1171458,3,184,11,1171459,1,5,11,1171460,1,0,43,1171461,32,2019-06-26T10:26:38.133353-07:00,11,1171462,1,6,28,1171463,17,ServicesNode.1025,18,1171464,8,TRClient,84,1171465,73,Name=\""..., retSeqErr=true) at src/dbmgr/dbConnectionMgr.c:1489

 

Thread 2: (main processing thread)

#5  0x1006f4b8 in _pga_stop_db () at src/dbmgr/pg_admin.c:7643

#6  0x1006f618 in pga_stop () at src/dbmgr/pg_admin.c:168

#7  0x10f0c330 in _dbm_sigabrt (signo=6, si=0x7f766d58, context=0x7f766dd8) at src/dbmgr/dbm_main.c:1567

#8  <signal handler called>

#9  0x2ea7f184 in *__GI___poll (fds=<value optimized out>, nfds=1, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87

#10 0x2b600238 in pqSocketCheck (conn=0x11088158, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1043

#11 0x2b600404 in pqWaitTimed (forRead=<value optimized out>, forWrite=4, conn=0x11088158, finish_time=1) at fe-misc.c:917

#12 0x2b5ff884 in PQgetResult (conn=0x11088158) at fe-exec.c:1223

#13 0x2b5ffb48 in PQexecFinish (conn=0x11088158) at fe-exec.c:1452

#14 0x100c2930 in dbConnObj::execStatement (this=0x11091048, sqlStatement=0x3100bec4 "UPDATE MGMT_SERVER SET LAST_SUCCESSFUL_CONNECTION='1561569998' ", checkAlreadyExists=false, freeResult=true,

    retSeqErr=false) at src/dbmgr/dbConnObj.c:243

#15 0x10099498 in dbConnectionMgr::updateSQL (this=0x11090640, objID=DBO_MGMT_SERVER_TIME, type=DB_CONFIGURATION, serialObj=0x1156249c "1,19,16385,10,1561569998", consObj=@0x7f7673f8, cons=0x2e8957e4 "",

    consSerial=0x2e8957e4 "") at src/dbmgr/dbConnectionMgr.c:1647

Regards,
Kiran 

On Mon, Sep 16, 2019 at 6:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kiran Khatke <kirankhatke23may@gmail.com> writes:
> One of the threads of the DBMGR daemon is waiting for the result of the
> poll() function.
> poll() was called by pqSocketCheck(), so pqSocketCheck() didn't return;
> it hung in poll().
> Below is the backtrace.

Well, it's waiting for the query to finish, or so it thinks.  Did you
look at what the server thinks the session is doing?

Your reference to multiple threads is a red flag to me.  Very often
we see people whose programs try to use the same PGconn object from
multiple threads.  That doesn't work --- and libpq does not have any
internal mutexes that would prevent the object's state from getting
messed up by concurrent operations.  So a plausible theory is that
this PGconn was used concurrently, and now this particular thread
is stuck because the object's state is corrupt (ie, it shows the
query as busy but the server doesn't think so).
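
If a connection really has to be shared between threads, the application must serialize access itself. The following is a minimal sketch of one way to do that, assuming a per-connection mutex held for a whole request/response cycle. All names here (guarded_conn, guarded_conn_run, demo_concurrent_use) are hypothetical, and the opaque handle merely stands in for a PGconn pointer so the sketch compiles without libpq. Giving each thread its own connection is the simpler alternative.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper: one mutex per connection handle.
 * `handle` stands in for a PGconn * in a real libpq client. */
typedef struct {
    pthread_mutex_t lock;
    void *handle;
} guarded_conn;

/* An operation that uses the connection, e.g. send a query and
 * collect all of its results before returning. */
typedef int (*conn_op)(void *handle, void *arg);

int guarded_conn_init(guarded_conn *gc, void *handle)
{
    gc->handle = handle;
    return pthread_mutex_init(&gc->lock, NULL);
}

/* Run one complete operation while holding the connection's lock, so
 * no other thread can touch the handle in the middle of it. */
int guarded_conn_run(guarded_conn *gc, conn_op op, void *arg)
{
    int rc;

    pthread_mutex_lock(&gc->lock);
    rc = op(gc->handle, arg);
    pthread_mutex_unlock(&gc->lock);
    return rc;
}

/* Demo only: the "connection state" is a plain counter that several
 * threads update through the guard. */
static int bump(void *handle, void *arg)
{
    (void) arg;
    ++*(long *) handle;
    return 0;
}

struct demo_args { guarded_conn *gc; int iters; };

static void *demo_worker(void *p)
{
    struct demo_args *a = p;

    for (int i = 0; i < a->iters; i++)
        guarded_conn_run(a->gc, bump, NULL);
    return NULL;
}

/* Starts up to 16 threads and returns the final counter value. */
long demo_concurrent_use(int nthreads, int iters)
{
    long state = 0;
    guarded_conn gc;
    pthread_t tids[16];
    struct demo_args args = { &gc, iters };
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    guarded_conn_init(&gc, &state);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, demo_worker, &args) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&gc.lock);
    return state;
}
```

The lock has to cover the whole exchange, from sending the statement to reading the last result, because the state that gets corrupted spans both steps.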

It might be worth enabling log_statement = all on the server side
and then watching the server log to see what seems to be happening
from that end.

                        regards, tom lane
