Thread: Queries that should be canceled will get stuck on secure_write function
Hi all,

Recently I hit a problem where the startup process of a standby gets stuck in a waiting state. Its backtrace shows that it is waiting for a backend process that conflicts with recovery to exit. The GUC parameters max_standby_streaming_delay and max_standby_archive_delay are both set to 30 seconds, but they don't take effect: the backend process stays alive, and its backtrace shows that it is waiting in secure_write() for the socket to become writable. Reading the code further, I found that ProcessClientWriteInterrupt() in secure_write() only processes interrupts when ProcDiePending is true; otherwise it does nothing. However, a snapshot conflict with recovery only sets QueryCancelPending to true, so the response to the signal can be delayed indefinitely if the corresponding client is stuck, thus blocking the recovery process.

I want to know why the interrupt is only handled when ProcDiePending is true. I think a query that is supposed to be canceled should also respond to the signal.

I also want to share a patch. It adds a GUC parameter, max_standby_client_write_delay: if a query is supposed to be canceled and the time it has been delayed by a client exceeds max_standby_client_write_delay, ProcDiePending is set to true so the query cannot be delayed indefinitely. What do you think of it? I hope to get your reply.

Thanks & Best Regards
Attachment
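To make the proposed escalation concrete, here is a minimal, self-contained sketch of the logic described in the original message. The struct fields stand in for the PostgreSQL globals QueryCancelPending and ProcDiePending, and max_delay stands in for the proposed max_standby_client_write_delay GUC; this is an illustrative model, not the actual backend code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the proposed escalation. The field names mirror the
 * PostgreSQL globals QueryCancelPending and ProcDiePending, but this
 * is a self-contained sketch, not backend code.
 */
typedef struct WriteState
{
    bool    query_cancel_pending;   /* stands in for QueryCancelPending */
    bool    proc_die_pending;       /* stands in for ProcDiePending */
    long    blocked_since;          /* when the write started blocking (seconds) */
} WriteState;

/*
 * If a cancel is pending and the client has kept the write blocked for
 * longer than max_delay seconds, promote the cancel to a terminate so
 * the backend can exit without needing to re-establish protocol sync.
 */
static void
maybe_promote_cancel(WriteState *st, long now, long max_delay)
{
    if (st->query_cancel_pending &&
        now - st->blocked_since > max_delay)
        st->proc_die_pending = true;
}
```

With a 30-second max_delay, a cancel pending for 31 seconds of blocked writing is promoted to a terminate, while one pending for 20 seconds is not.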
On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com> wrote:
> I want to know why the interrupt is only handled when ProcDiePending is true, I think query which is supposed to be canceled also should respond to the signal.

Well, if we're halfway through sending a message to the client and we abort the write, we have no way of re-establishing protocol sync, right? It's OK to die without sending any more data to the client, since then the connection is closed anyway and the loss of sync doesn't matter, but continuing the session doesn't seem workable.

Your proposed patch actually seems to dodge this problem and I think perhaps we could consider something along those lines. It would be interesting to hear what Andres thinks about this idea, since I think he was the last one to rewrite that code.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Queries that should be canceled will get stuck on secure_write function
From: Alvaro Herrera
On 2021-Aug-23, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com> wrote:
> > I want to know why the interrupt is only handled when ProcDiePending
> > is true, I think query which is supposed to be canceled also should
> > respond to the signal.

Yeah, I agree.

> Well, if we're halfway through sending a message to the client and we
> abort the write, we have no way of re-establishing protocol sync,
> right? It's OK to die without sending any more data to the client,
> since then the connection is closed anyway and the loss of sync
> doesn't matter, but continuing the session doesn't seem workable.
>
> Your proposed patch actually seems to dodge this problem and I think
> perhaps we could consider something along those lines.

Do we actually need new GUCs, though? I think we should never let an unresponsive client dictate what the server does, because that opens the door for uncooperative or malicious clients to wreak serious havoc. I think the implementation should wait until time now+X to cancel the query, but if by time now+2X (or whatever we deem reasonable -- maybe now+1.1X) we're still waiting, then it's okay to just close the connection. This suggests a completely different implementation, though.

I wonder if it's possible to write a test for this. We would have to send a query and then hang the client somehow. I recently added a TAP test that uses SIGSTOP to a walsender ... can we use SIGSTOP with a background psql that's running SELECT pg_sleep() perhaps? (Or maybe it's sufficient to start background psql and not pump() it?)

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"It takes less than 2 seconds to get to 78% complete; that's a good sign. A few seconds later it's at 90%, but it seems to have stuck there. Did somebody make percentages logarithmic while I wasn't looking?" http://smylers.hates-software.com/2005/09/08/1995c749.html
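Álvaro's two-stage idea above can be sketched as a small decision function. The 1.1 factor and all names here are illustrative assumptions taken from the message, not anything that exists in PostgreSQL: past the delay limit X we send a cancel, and past 1.1X we give up and close the connection.

```c
#include <assert.h>

/*
 * Sketch of the two-stage policy: X is the standby delay limit. Past X
 * we ask the backend to cancel its query; if it still has not responded
 * by 1.1 * X, we escalate to closing the connection. Illustrative only.
 */
typedef enum ConflictAction
{
    CONFLICT_WAIT,          /* still within the delay limit */
    CONFLICT_SEND_CANCEL,   /* past X: send a query cancel */
    CONFLICT_TERMINATE      /* past 1.1 * X: close the connection */
} ConflictAction;

static ConflictAction
conflict_action(double waited_secs, double x)
{
    if (waited_secs > 1.1 * x)
        return CONFLICT_TERMINATE;
    if (waited_secs > x)
        return CONFLICT_SEND_CANCEL;
    return CONFLICT_WAIT;
}
```

The point of the second threshold is that the server, not the client, bounds how long an unresponsive peer can hold up recovery.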
On Mon, Aug 23, 2021 at 10:45 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Do we actually need new GUCs, though? I think we should never let an
> unresponsive client dictate what the server does, because that opens the
> door for uncooperative or malicious clients to wreak serious havoc. I
> think the implementation should wait until time now+X to cancel the
> query, but if by time now+2X (or whatever we deem reasonable -- maybe
> now+1.1X) we're still waiting, then it's okay to just close the
> connection. This suggests a completely different implementation, though.

I don't quite understand what you mean by waiting until time now+X to cancel the query. We don't know a priori when query cancels are going to happen, but when they do happen we want to respond to them as quickly as we can. It seems reasonable to say that if we can't handle them within X amount of time we can resort to emergency measures, but that's basically what the patch does, except it uses a GUC instead of hardcoding X.

> I wonder if it's possible to write a test for this. We would have to
> send a query and then hang the client somehow. I recently added a TAP
> test that uses SIGSTOP to a walsender ... can we use SIGSTOP with a
> background psql that's running SELECT pg_sleep() perhaps?
> (Or maybe it's sufficient to start background psql and not pump() it?)

Starting a background process and not pumping it sounds promising, because it seems like it would be more likely to be portable. I think we'd want to be careful not to assume very much about how large the output buffer is, because I'm guessing that varies by platform and that it might in some cases be fairly large. Perhaps we could use pg_stat_activity to wait until we block in a ClientWrite state, although I wonder if we might find out that we sometimes block on a different wait state than what we expect to see.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Queries that should be canceled will get stuck on secure_write function
From: Alvaro Herrera
On 2021-Aug-23, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 10:45 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Do we actually need new GUCs, though? I think we should never let an
> > unresponsive client dictate what the server does, because that opens the
> > door for uncooperative or malicious clients to wreak serious havoc. I
> > think the implementation should wait until time now+X to cancel the
> > query, but if by time now+2X (or whatever we deem reasonable -- maybe
> > now+1.1X) we're still waiting, then it's okay to just close the
> > connection. This suggests a completely different implementation, though.
>
> I don't quite understand what you mean by waiting until time now+X to
> cancel the query. We don't know a priori when query cancels are going
> to happen, but when they do happen we want to respond to them as
> quickly as we can. It seems reasonable to say that if we can't handle
> them within X amount of time we can resort to emergency measures, but
> that's basically what the patch does, except it uses a GUC instead of
> hardcoding X.

Aren't we talking about query cancellations that occur in response to a standby delay limit? Those aren't in response to user action. What I mean is that if the standby delay limit is exceeded, then we send a query cancel; we expect the standby to cancel its query at that time and then the primary can move on. But if the standby doesn't react, then we can have it terminate its connection.

I'm looking at the problem from the primary's point of view rather than the standby's point of view.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Aren't we talking about query cancellations that occur in response to a
> standby delay limit? Those aren't in response to user action.

Oh, you're right. But I guess a similar problem could also occur in response to pg_terminate_backend(), no?

--
Robert Haas
EDB: http://www.enterprisedb.com
On 2021/08/24 0:26, Alvaro Herrera wrote:
> Aren't we talking about query cancellations that occur in response to a
> standby delay limit? Those aren't in response to user action. What I
> mean is that if the standby delay limit is exceeded, then we send a
> query cancel; we expect the standby to cancel its query at that time and
> then the primary can move on. But if the standby doesn't react, then we
> can have it terminate its connection.

+1

On 2021/08/24 3:45, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Aren't we talking about query cancellations that occur in response to a
>> standby delay limit? Those aren't in response to user action.
>
> Oh, you're right. But I guess a similar problem could also occur in
> response to pg_terminate_backend(), no?

There seems to be no problem in that case, because pg_terminate_backend() causes a backend to set ProcDiePending to true in the die() signal handler, and ProcessClientWriteInterrupt() called by secure_write() handles ProcDiePending. No?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Yes, pg_terminate_backend() can terminate the connection successfully in this case because ProcDiePending is set to true and ProcessClientWriteInterrupt() can handle it.
Queries that exceed the standby delay limit can be terminated in this way, but what about other queries that should be canceled but get stuck in secure_write()? After all, there is currently no way to list all possible situations and then terminate these queries one by one.
------------------------------------------------------------------
From: Fujii Masao <masao.fujii@oss.nttdata.com>
Sent: Tuesday, August 24, 2021 13:15
To: Robert Haas <robertmhaas@gmail.com>; Alvaro Herrera <alvherre@alvh.no-ip.org>
Cc: 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>; pgsql-hackers <pgsql-hackers@lists.postgresql.org>; Andres Freund <andres@anarazel.de>
Subject: Re: Queries that should be canceled will get stuck on secure_write function
On 2021/08/24 0:26, Alvaro Herrera wrote:
> Aren't we talking about query cancellations that occur in response to a
> standby delay limit? Those aren't in response to user action. What I
> mean is that if the standby delay limit is exceeded, then we send a
> query cancel; we expect the standby to cancel its query at that time and
> then the primary can move on. But if the standby doesn't react, then we
> can have it terminate its connection.
+1
On 2021/08/24 3:45, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Aren't we talking about query cancellations that occur in response to a
>> standby delay limit? Those aren't in response to user action.
>
> Oh, you're right. But I guess a similar problem could also occur in
> response to pg_terminate_backend(), no?
There seems no problem in that case because pg_terminate_backend() causes
a backend to set ProcDiePending to true in the die() signal handler and
ProcessClientWriteInterrupt() called by secure_write() handles ProcDiePending.
No?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Tue, Aug 24, 2021 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Oh, you're right. But I guess a similar problem could also occur in
> > response to pg_terminate_backend(), no?
>
> There seems no problem in that case because pg_terminate_backend() causes
> a backend to set ProcDiePending to true in the die() signal handler and
> ProcessClientWriteInterrupt() called by secure_write() handles ProcDiePending.
> No?

Hmm, maybe you're right. What about pg_cancel_backend()?

--
Robert Haas
EDB: http://www.enterprisedb.com
On 2021/08/25 2:30, Robert Haas wrote:
> Hmm, maybe you're right. What about pg_cancel_backend()?

I was thinking that it's valid even if secure_write() doesn't react to pg_cancel_backend(), because it's basically called outside a transaction block, i.e., there is no longer a transaction to cancel in that case. But there can be some cases where secure_write() is called inside a transaction block, for example, when the query generates a NOTICE message. In these cases, secure_write() might need to react to the cancel request.

BTW, when an error happens, I found that a backend calls EmitErrorReport() to report the error to the client, and then calls AbortCurrentTransaction() to abort the transaction. If secure_write() called by EmitErrorReport() gets stuck, the backend gets stuck inside the transaction block and its locks keep being held unnecessarily. Isn't this problematic? Can we change the order of them?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Tue, Aug 24, 2021 at 9:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> I was thinking that it's valid even if secure_write() doesn't react to
> pg_cancel_backend() because it's basically called outside transaction block,
> i.e., there is no longer transaction to cancel in that case. But there can be
> some cases where secure_write() is called inside transaction block,
> for example, when the query generates NOTICE message. In these cases,
> secure_write() might need to react to the cancel request.

Yeah. I think we could also be sending tuple data.

> BTW, when an error happens, I found that a backend calls EmitErrorReport()
> to report an error to a client, and then calls AbortCurrentTransaction()
> to abort the transaction. If secure_write() called by EmitErrorReport()
> gets stuck, a backend gets stuck inside transaction block and the locks
> keep being held unnecessarily. Isn't this problematic? Can we change
> the order of them?

I think there might be problems with that, like perhaps the ErrorData object can have pointers into the memory contexts that we'd be destroying in AbortCurrentTransaction().

More generally, I think it's hopeless to try to improve the situation very much by changing the place where secure_write() happens. It happens in so many different places, and it is clearly not going to be possible to make it so that in none of those places do we hold any important server resources. Therefore I think the only solution is to fix secure_write() itself ... and the trick is what to do there given that we have to be very careful not to do anything that might try to write another message while we are already in the middle of writing a message.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Queries that should be canceled will get stuck on secure_write function
From: Andres Freund
Hi,

On 2021-08-23 10:13:03 -0400, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com> wrote:
> > I want to know why the interrupt is only handled when ProcDiePending is true, I think query which is supposed to be canceled also should respond to the signal.
>
> Well, if we're halfway through sending a message to the client and we
> abort the write, we have no way of re-establishing protocol sync,
> right? It's OK to die without sending any more data to the client,
> since then the connection is closed anyway and the loss of sync
> doesn't matter, but continuing the session doesn't seem workable.

Right.

> Your proposed patch actually seems to dodge this problem and I think
> perhaps we could consider something along those lines. It would be
> interesting to hear what Andres thinks about this idea, since I think
> he was the last one to rewrite that code.

I think it's a reasonable idea. I have some quibbles with the implementation (new code should be in ProcessClientWriteInterrupt(), not secure_write()), and I suspect we should escalate more quickly to killing the connection, but those seem like details.

I think that if we want to tackle this, we need to do the same for secure_read() as well. secure_read() will process interrupts normally while DoingCommandRead, but if we're already in a command, we'd just as well be prevented from processing a !ProcDiePending interrupt.

Greetings,

Andres Freund
Re: Queries that should be canceled will get stuck on secure_write function
From: Andres Freund
Hi,

On 2021-08-27 08:27:38 -0400, Robert Haas wrote:
> On Tue, Aug 24, 2021 at 9:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > to report an error to a client, and then calls AbortCurrentTransaction()
> > to abort the transaction. If secure_write() called by EmitErrorReport()
> > gets stuck, a backend gets stuck inside transaction block and the locks
> > keep being held unnecessarily. Isn't this problematic? Can we change
> > the order of them?
> ...
> More generally, I think it's hopeless to try to improve the situation
> very much by changing the place where secure_write() happens. It
> happens in so many different places, and it is clearly not going to be
> possible to make it so that in none of those places do we hold any
> important server resources. Therefore I think the only solution is to
> fix secure_write() itself ... and the trick is what to do there given
> that we have to be very careful not to do anything that might try to
> write another message while we are already in the middle of writing a
> message.

I wonder if we could improve the situation somewhat by using the non-blocking pqcomm functions in a few select places. E.g. if elog.c's send_message_to_frontend() sent its message via a new pq_endmessage_noblock() (which'd use the existing pq_putmessage_noblock()) and used pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending to the client before AbortCurrentTransaction(), b) be able to queue further error messages safely.

I think this'd not violate the goal of putting pq_flush() into send_message_to_frontend():

    /*
     * This flush is normally not necessary, since postgres.c will flush out
     * waiting data when control returns to the main loop. But it seems best
     * to leave it here, so that the client has some clue what happened if the
     * backend dies before getting back to the main loop ... error/notice
     * messages should not be a performance-critical path anyway, so an extra
     * flush won't hurt much ...
     */
    pq_flush();

because the only situations where we'd not send the data out immediately would be when the socket buffer is already full. In which case the client wouldn't get the error immediately anyway?

Greetings,

Andres Freund
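Andres's point that a partial non-blocking flush is harmless rests on the send buffer keeping unsent bytes queued in order. Here is a toy model of such a buffer (the names are illustrative, not the real pqcomm API, which lives in src/backend/libpq/pqcomm.c): whatever the kernel does not accept stays in the buffer, so no protocol bytes are lost or reordered, and further messages can be queued behind the partially sent one.

```c
#include <assert.h>
#include <string.h>

#define SENDBUF_SIZE 64

/* Toy model of a libpq-style send buffer. */
typedef struct SendBuf
{
    char    data[SENDBUF_SIZE];
    int     len;    /* total bytes queued */
    int     sent;   /* bytes already handed to the kernel */
} SendBuf;

/* Queue a complete protocol message; never touches the socket. */
static void
buf_put(SendBuf *b, const char *msg, int n)
{
    memcpy(b->data + b->len, msg, n);
    b->len += n;
}

/*
 * Simulate a non-blocking flush against a socket that accepts at most
 * 'writable' bytes right now; the unsent tail stays in data[sent..len).
 */
static void
buf_flush_if_writable(SendBuf *b, int writable)
{
    int     pending = b->len - b->sent;
    int     n = pending < writable ? pending : writable;

    b->sent += n;
}
```

After a partial flush, queuing another message and flushing again delivers every byte exactly once and in order, which is why "all the data is buffered" answers the partial-send concern.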
On Fri, Aug 27, 2021 at 3:24 PM Andres Freund <andres@anarazel.de> wrote:
> I wonder if we could improve the situation somewhat by using the non-blocking
> pqcomm functions in a few select places. E.g. if elog.c's
> send_message_to_frontend() sent its message via a new pq_endmessage_noblock()
> (which'd use the existing pq_putmessage_noblock()) and used
> pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending to the
> client before AbortCurrentTransaction(), b) able to queue further error
> messages safely.

pq_flush_if_writable() could succeed in sending only part of the data, so I don't see how this works.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Queries that should be canceled will get stuck on secure_write function
From: Andres Freund
Hi,

On Fri, Aug 27, 2021, at 13:07, Robert Haas wrote:
> On Fri, Aug 27, 2021 at 3:24 PM Andres Freund <andres@anarazel.de> wrote:
> > I wonder if we could improve the situation somewhat by using the non-blocking
> > pqcomm functions in a few select places. E.g. if elog.c's
> > send_message_to_frontend() sent its message via a new pq_endmessage_noblock()
> > (which'd use the existing pq_putmessage_noblock()) and used
> > pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending to the
> > client before AbortCurrentTransaction(), b) able to queue further error
> > messages safely.
>
> pq_flush_if_writable() could succeed in sending only part of the data,
> so I don't see how this works.

All the data is buffered though, so I don't see what problem that causes?

Andres
I added a test to reproduce the problem; see the attachment for the details.
During the last sleep of the test, I used pstack to get the stack of the backend process, which is as follows:
#0 0x00007f6ebdd744d3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x00000000007738d2 in WaitEventSetWait ()
#2 0x0000000000675aae in secure_write ()
#3 0x000000000067bfab in internal_flush ()
#4 0x000000000067c13a in internal_putbytes ()
#5 0x000000000067c217 in socket_putmessage ()
#6 0x0000000000497f36 in printtup ()
#7 0x00000000006301e0 in standard_ExecutorRun ()
#8 0x00000000007985fb in PortalRunSelect ()
#9 0x0000000000799968 in PortalRun ()
#10 0x0000000000795866 in exec_simple_query ()
#11 0x0000000000796cff in PostgresMain ()
#12 0x0000000000488339 in ServerLoop ()
#13 0x0000000000717bbc in PostmasterMain ()
#14 0x0000000000488f26 in main ()
Attachment
I changed the implementation for this problem:
a) if the cancel-query interrupt comes from the database for some reason, such as a recovery conflict, then handle it immediately, and treat the cancel request as a terminate request;
b) if the cancel-query interrupt comes from the client, then ignore it as before.
In the attached patch, I also add a TAP test that generates a recovery conflict on a standby while the backend process is stuck on a client write, and checks whether the backend can handle the cancel-query request due to the recovery conflict.
What do you think of it? I hope to get your reply.
Thanks & Best Regards
Attachment
On Fri, Aug 27, 2021 at 4:10 PM Andres Freund <andres@anarazel.de> wrote:
> On Fri, Aug 27, 2021, at 13:07, Robert Haas wrote:
> > On Fri, Aug 27, 2021 at 3:24 PM Andres Freund <andres@anarazel.de> wrote:
> > > I wonder if we could improve the situation somewhat by using the non-blocking
> > > pqcomm functions in a few select places. E.g. if elog.c's
> > > send_message_to_frontend() sent its message via a new pq_endmessage_noblock()
> > > (which'd use the existing pq_putmessage_noblock()) and used
> > > pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending to the
> > > client before AbortCurrentTransaction(), b) able to queue further error
> > > messages safely.
> >
> > pq_flush_if_writable() could succeed in sending only part of the data,
> > so I don't see how this works.
>
> All the data is buffered though, so I don't see that problem that causes?

OK, I guess I'm confused here. If we're always buffering the data, then I suppose there's no risk of injecting a protocol message into the middle of some other protocol message, assuming that we don't have a non-local transfer of control halfway through putting a message in the buffer. But there's still the risk of getting out of step with the client. Suppose the client does SELECT 1/0 and we send an ErrorResponse complaining about the division by zero. But as we're trying to send that response, we block. Later, a query cancel occurs. We can't queue another ErrorResponse because the client will interpret that as the response to the next query, since the division by zero error is the response to the current one. I don't think that changing pq_flush() to pq_flush_if_writable() in elog.c or anywhere else can fix that problem. But that doesn't mean that it isn't a good idea.

Any place where we're doing a pq_flush() and know that another one will happen soon afterward, before we wait for data from the client, can be changed to pq_flush_if_writable() without harm, and it's beneficial to do so, because like you say, it avoids blocking in places that users may find inconvenient - e.g. while holding locks, as Fujii-san said.

The comment here claims that "postgres.c will flush out waiting data when control returns to the main loop" but the only pq_flush() call that's directly present in postgres.c is in response to receiving a Flush message, so I suppose this is actually talking about the pq_flush() inside ReadyForQuery. It's not 100% clear to me that we do that in all relevant cases, though. Suppose we hit an error while processing some protocol message that does not set send_ready_for_query = true, like for example Describe ('D'). I think in that case the flush in elog.c is the only one. Perhaps we ought to change postgres.c so that if we don't enter the block guarded by "if (send_ready_for_query)" we instead pq_flush().

--
Robert Haas
EDB: http://www.enterprisedb.com
Hi all, I want to know your opinion on this patch. Or in what way do you think we should solve this problem?
------------------------------------------------------------------
From: 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>
Sent: Thursday, September 9, 2021 17:38
To: Robert Haas <robertmhaas@gmail.com>; Andres Freund <andres@anarazel.de>; alvherre <alvherre@alvh.no-ip.org>; masao.fujii <masao.fujii@oss.nttdata.com>
Cc: pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Queries that should be canceled will get stuck on secure_write function

> I changed the implementation about this problem:
> a) if the cancel query interrupt is from db for some reason, such as recovery conflict, then handle it immediately, and cancel request is treated as terminate request;
> b) if the cancel query interrupt is from client, then ignore as original way
> In the attached patch, I also add a tap test to generate a recovery conflict on a standby during the backend process is stuck on client write, check whether it can handle the cancel query request due to recovery conflict.
> what do you think of it, hope to get your reply
> Thanks & Best Regards
Attachment
Re: Queries that should be canceled will get stuck on secure_write function
From: Fujii Masao
On 2021/09/22 1:14, 蔡梦娟(玊于) wrote:
> Hi all, I want to know your opinion on this patch, or in what way do you think we should solve this problem?

I agree that something like the patch (i.e., introduction of promotion from a cancel request to a terminate one) is necessary for the fix. One concern about the patch is that the cancel request can be promoted to a terminate one even when secure_write() doesn't actually get stuck. Is this acceptable? Maybe I'm tempted to set up a duration until the promotion happens.... Or should we introduce a dedicated timer for communication on the socket?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Yes, it is more appropriate to set a duration after which we decide that secure_write() is stuck, but it is difficult to define how long that duration should be.
In my first patch, I added a GUC to allow the user to set the time; or it could be hardcoded, if a time deemed reasonable is provided?
------------------------------------------------------------------
> I agree that something like the patch (i.e., introduction of promotion
> from cancel request to terminate one) is necessary for the fix. One concern
> on the patch is that the cancel request can be promoted to the terminate one
> even when secure_write() doesn't actually get stuck. Is this acceptable?
> Maybe I'm tempted to set up the duration until the promotion happens....
> Or we should introduce the dedicated timer for communication on the socket?