Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Msg-id 00ad01cdc3ef$1ede2f10$5c9a8d30$@kapila@huawei.com
In response to Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Amit kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On Thursday, November 15, 2012 7:29 PM Amit kapila wrote:
> On Monday, November 12, 2012 8:23 PM Fujii Masao wrote:
> On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila <amit.kapila@huawei.com>
> wrote:
> > On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
> >> On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit.kapila@huawei.com>
> >> wrote:
> >> > On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
> >> >> On 19.10.2012 14:42, Amit kapila wrote:
> >> >> > On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
> 
> >>> Are you planning to introduce the timeout mechanism in pg_basebackup
> I feel that, apart from the above, the remaining problem is the call to
> PQgetResult():
> 1. Wherever a query is sent from BaseBackup, it calls PQgetResult() to
>    receive the result of the query.
>    As PQgetResult() is a blocking function (it calls pqWait, which can
>    hang), if the network goes down before the query itself is sent,
>    there will never be any result, so it will keep hanging in
>    PQgetResult().
> IMO, it can be solved in the following ways:
> a. Create a corresponding non-blocking function. But PQgetResult() is
>    also called from inside other libpq functions
>    (PQexec->PQexecFinish->PQgetResult), so it can be a little tricky to
>    solve it this way.
> b. Add a receive_timeout variable to the PGconn structure and use it in
>    pqWait as the timeout whenever it is set.
> c. Any other better way?
> 
> 
> >> BTW, IIRC the walsender has no timeout mechanism during sending
> >> backup data to pg_basebackup. So it's also useful to implement the
> >> timeout mechanism for the walsender during backup.
> >
> 
> >What about using pq_putmessage_noblock()?
> 
> I think some more functions may also need to be made noblock. I
> am still evaluating.
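
To make option (b) in the quoted text concrete, here is a minimal sketch of
a bounded wait on a socket. It is only an illustration, not libpq code:
wait_readable_timeout() is a hypothetical helper, and timeout_ms stands in
for the proposed receive_timeout setting. It uses plain poll() rather than
anything inside pqWait.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (not part of libpq): wait until fd has something
 * to read, but give up after timeout_ms milliseconds.
 *
 * Returns 1 if poll() reported an event on fd, 0 on timeout, -1 on error.
 */
static int
wait_readable_timeout(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int           rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /*
     * Retry if interrupted by a signal. For simplicity this restarts the
     * full timeout; real code would subtract the time already spent.
     */
    do
    {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;              /* poll() itself failed */
    if (rc == 0)
        return 0;               /* nothing arrived within timeout_ms */

    /* Data, EOF or a socket error: the caller's next read finds out which. */
    return 1;
}
```

A caller that gets 0 back can treat the connection as dead and report an
error instead of waiting forever for a result that will never arrive.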

I have finished the analysis. It seems that the APIs below also need
noblock equivalents; otherwise the same problem can occur, since they are
also used in this flow.
      a. pq_endmessage
      b. EndCommand
      c. pq_puttextmessage
      d. pq_putemptymessage
      e. ReadyForQuery - needed because walsender and normal backend are
         now the same.
      f. ReadCommand - needed because walsender and normal backend are
         now the same. A solution for it may be tricky, as pq_getbyte is
         not called directly from the first-level function.
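
To illustrate what "noblock" means for the send-side functions listed
above, here is a rough sketch of a write that returns instead of waiting
when the peer stops draining the socket. This is not how the backend's
pq_* functions are implemented; set_nonblocking() and write_noblock() are
hypothetical names used only for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 or -1. */
static int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: write as much of buf as the kernel accepts right
 * now. Returns the number of bytes written (possibly fewer than len, or 0,
 * if the kernel buffer is full), or -1 on a hard error.
 */
static ssize_t
write_noblock(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len)
    {
        ssize_t n = write(fd, buf + done, len - done);

        if (n > 0)
        {
            done += (size_t) n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;              /* would block: stop and report progress */
        return -1;              /* any other failure */
    }
    return (ssize_t) done;
}
```

With something along these lines, the sender can keep the unsent tail in
its own buffer and check a timeout between attempts, rather than being
stuck inside a blocking send when the network is down.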

Suggestions/Thoughts?


With Regards,
Amit Kapila.



