Thread: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
Hello, I'm investigating a mysterious hang problem on PostgreSQL 9.2.8. If many sessions use temporary tables whose rows are deleted on commit, the hang occurs. I'd like to show you the stack trace, but I'm trying to figure out how to reproduce the problem. IIRC, the stack trace was as follows. The standby server was running normally. ... SyncRepWaitForLSN CommitTransaction CommitTransactionCommand ProcessCatchupEvent HandleCatchupInterrupt procsignal_sigusr1_handler <SIGUSR1 received> recv ... ReadCommand PostgresMain ... Looking at smgrtruncate(), the sinval message is sent even when the truncated relation is a temporary relation. However, I think the sinval message is not necessary for temp relations, because each session doesn't see the temp relations of other sessions. So, the following code seems better. This avoids sinval queue overflow which leads to SIGUSR1. Is this correct? if (SmgrIsTemp(reln)) /* Do his on behalf of sinval message handler */ smgrclosenode(reln->smgr_rnode); else CacheInvalidateSmgr(reln->smgr_rnode); Regards MauMau
"MauMau" <maumau307@gmail.com> writes: > Looking at smgrtruncate(), the sinval message is sent even when the > truncated relation is a temporary relation. However, I think the sinval > message is not necessary for temp relations, because each session doesn't > see the temp relations of other sessions. This seems like a pretty unsafe suggestion, because the smgr level doesn't know or care whether relations are temp; files are files. In any case it would only paper over one specific instance of whatever problem you're worried about, because sinval messages definitely do need to be sent in general. regards, tom lane
From: "Tom Lane" <tgl@sss.pgh.pa.us> > This seems like a pretty unsafe suggestion, because the smgr level doesn't > know or care whether relations are temp; files are files. In any case it > would only paper over one specific instance of whatever problem you're > worried about, because sinval messages definitely do need to be sent in > general. I'm sorry I don't show the exact problem yet. Apart from that, I understood that you insist it's not appropriate for smgr to be aware of whether the file is a temporary relation, in terms of module layering. However, it doesn't seem so in the current implementation. md.c, which is a layer under or part of smgr, uses SmgrIsTemp(). In addition, as the name suggests, SmgrIsTemp() is a function of smgr, which is defined in smgr.h. So, it's not inappropriate for smgr to use it. What I wanted to ask is whether and why sinval messages are really necessary for session-private objects like temp relations. I thought shared inval is, as the name indicates, for objects "shared" among sessions. If so, sinval for session-private objects doesn't seem to match the concept. Regards MauMau
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Robert Haas
Date:
On Wed, Jul 23, 2014 at 12:17 PM, MauMau <maumau307@gmail.com> wrote: > From: "Tom Lane" <tgl@sss.pgh.pa.us> >> This seems like a pretty unsafe suggestion, because the smgr level doesn't >> know or care whether relations are temp; files are files. In any case it >> would only paper over one specific instance of whatever problem you're >> worried about, because sinval messages definitely do need to be sent in >> general. > > I'm sorry I don't show the exact problem yet. Apart from that, I understood > that you insist it's not appropriate for smgr to be aware of whether the > file is a temporary relation, in terms of module layering. However, it > doesn't seem so in the current implementation. md.c, which is a layer under > or part of smgr, uses SmgrIsTemp(). In addition, as the name suggests, > SmgrIsTemp() is a function of smgr, which is defined in smgr.h. So, it's > not inappropriate for smgr to use it. > > What I wanted to ask is whether and why sinval messages are really necessary > for session-private objects like temp relations. I thought shared inval is, > as the name indicates, for objects "shared" among sessions. If so, sinval > for session-private objects doesn't seem to match the concept. I think the problem here is that it actually is possible for one session to access the temporary objects of another session: rhaas=# create temp table fructivore (a int); CREATE TABLE rhaas=# select 'fructivore'::regclass::oid; oid -------24578 (1 row) Switch windows: rhaas=# select 24578::regclass; regclass ----------------------pg_temp_2.fructivore (1 row) rhaas=# alter table pg_temp_2.fructivore add column b int; ALTER TABLE Now, we could prohibit that specific thing. But at the very least, it has to be possible for one session to drop another session's temporary objects, because autovacuum does it eventually, and superusers will want to do it sooner to shut autovacuum up. So it's hard to reason about whether and to what extent it's safe to not send sinval messages for temporary objects. I think you might be approaching this problem from the wrong end, though. The question in my mind is: why does the StartTransactionCommand() / CommitTransactionCommand() pair in ProcessCatchupEvent() end up writing a commit record? The obvious possibility that occurs to me is that maybe rereading the invalidated catalog entries causes a HOT prune, and maybe there ought to be some way for a transaction that has only done HOT pruning to commit asynchronously, just as we already do for transactions that only modify temporary tables. Or, failing that, maybe there's a way to suppress synchronous commit for this particular transaction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
On 2014-07-24 11:17:15 -0400, Robert Haas wrote: > I think you might be approaching this problem from the wrong end, > though. Yep. > The question in my mind is: why does the > StartTransactionCommand() / CommitTransactionCommand() pair in > ProcessCatchupEvent() end up writing a commit record? The obvious > possibility that occurs to me is that maybe rereading the invalidated > catalog entries causes a HOT prune, and maybe there ought to be some > way for a transaction that has only done HOT pruning to commit > asynchronously, just as we already do for transactions that only > modify temporary tables. Or, failing that, maybe there's a way to > suppress synchronous commit for this particular transaction. I think we should do what the first paragraph in http://archives.postgresql.org/message-id/20140707155113.GB1136%40alap3.anarazel.de outlined. As Tom says somewhere downthread that requires some code review, but other than that it should get rid of a fair amount of problems. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
From: "Robert Haas" <robertmhaas@gmail.com> > I think the problem here is that it actually is possible for one > session to access the temporary objects of another session: > Now, we could prohibit that specific thing. But at the very least, it > has to be possible for one session to drop another session's temporary > objects, because autovacuum does it eventually, and superusers will > want to do it sooner to shut autovacuum up. So it's hard to reason > about whether and to what extent it's safe to not send sinval messages > for temporary objects. I was a bit surprised to know that one session can access the data of another session's temporary tables. That implenentation nay be complicating the situation -- extra sinval messages. > I think you might be approaching this problem from the wrong end, > though. The question in my mind is: why does the > StartTransactionCommand() / CommitTransactionCommand() pair in > ProcessCatchupEvent() end up writing a commit record? The obvious > possibility that occurs to me is that maybe rereading the invalidated > catalog entries causes a HOT prune, and maybe there ought to be some > way for a transaction that has only done HOT pruning to commit > asynchronously, just as we already do for transactions that only > modify temporary tables. Or, failing that, maybe there's a way to > suppress synchronous commit for this particular transaction. I could figure out what log record was output in the transaction started in ProcessCatchupEvent() by inserting elog() in XLogInsert(). The log record was (RM_STANDBY_ID, XLOG_STANDBY_LOCK). The cause of the hang turned out clear. It was caused as follows: 1. When a transaction commits which used a temporary table created with ON COMMIT DELETE ROWS, the sinval catchup signal (SIGUSR1) was issued from smgrtruncate(). This is because the temporary table is truncated at transaction end. 2. Another session, which is waiting for a client request, receives SIGUSR1. It calls ProcessCatchupEvent(). 3. ProcessCatchupEvent() calls StartTransactionCommand(), emits the XLOG_STANDBY_LOCK WAL record, and then calls CommitTransactionCommand(). 4. It then calls SyncRepWaitForLSN(), which in turn waits on the latch. 5. But the WaitLatch() never returns, because the session is already running inside the SIGUSR1 handler in step 2. WaitLatch() needs SIGUSR1 to complete. I think there is a problem with the latch and SIGUSR1 mechanism. How can we fix this problem? Regards MauMau
From: "Andres Freund" <andres@2ndquadrant.com> > I think we should do what the first paragraph in > http://archives.postgresql.org/message-id/20140707155113.GB1136%40alap3.anarazel.de > outlined. As Tom says somewhere downthread that requires some code > review, but other than that it should get rid of a fair amount of > problems. As mentioned in the mail I've just sent, there seems to be a problem around the latch and/or sinval catchup implementation. Or, is it bad that many things are done in SIGUSR1 handler? If some processing in SIGUSR1 handler requires waiting on a latch, it hangs at WaitLatch(). Currently, the only processing in the backend which requires a latch may be to wait for the sync standby. However, in the future, the latch may be used for more tasks. Another problem is, who knows WaitLatch() can return prematurely (before the actual waited-for event does SetLatch()) due to the SIGUSR1 issued for sinval catchup event? How should we tackle these problem? Regards MauMau
"MauMau" <maumau307@gmail.com> writes: > [ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ] Ugh. One line of thought is that it's pretty unsafe to be doing anything as complicated as transaction start/commit in a signal handler, even one that is sure it's not interrupting anything else. The only excuse ProcessCatchupEvent has for that is that it's "trying to ensure proper cleanup from an error". Perhaps with closer analysis we could convince ourselves that errors thrown from AcceptInvalidationMessages() wouldn't need transaction abort cleanup. For that matter, it's not exactly clear that an error thrown out of there would result in good things even with the transaction infrastructure. Presuming that we're waiting for client input, an error longjmp out of ProcessCatchupEvent could mean taking control away from OpenSSL, and I bet that won't end well. Maybe we should just be doing PG_TRY AcceptInvalidationMessages();PG_CATCH elog(FATAL, "curl up and die"); ProcessIncomingNotify is also using StartTransactionCommand()/CommitTransactionCommand(), so that would need some thought too. Or we could try to get rid of the need to do anything beyond setting a flag in the interrupt handler itself. But I'm afraid that's probably unworkable, especially now that we use SA_RESTART on all signals. Another line of thought is that we've been way too uncritical about shoving different kinds of events into the SIGUSR1 multiplexor. It might be a good idea to separate "high level" interrupts from "low level" ones, using say SIGUSR1 for the former and SIGUSR2 for the latter. However, that doesn't sound very back-patchable, even assuming that we can come up with a clean division. regards, tom lane
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
On 2014-07-26 11:32:24 -0400, Tom Lane wrote: > "MauMau" <maumau307@gmail.com> writes: > > [ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ] > > Ugh. > > One line of thought is that it's pretty unsafe to be doing anything > as complicated as transaction start/commit in a signal handler, even one > that is sure it's not interrupting anything else. Yea, that's really not nice. > The only excuse > ProcessCatchupEvent has for that is that it's "trying to ensure proper > cleanup from an error". Perhaps with closer analysis we could convince > ourselves that errors thrown from AcceptInvalidationMessages() wouldn't > need transaction abort cleanup. Hm. I'm not convinced that's going to be easy. Wouldn't it be better to move the catchup interrupt processing out of the signal handler? For normal backends we only enable when reading from the client and DoingCommandRead is set. How about setting a variable in the signal handler and doing the actual catchup processing after the recv() returned EINTR? That'd require either renegging on SA_RESTART or using WaitLatchOrSocket() and nonblocking send/recv. I think that'd be a much safer model and after researching it a bit for the idle in transaction timeout thing it doesn't look super hard. Even windows seems to already support everything necessary. > Or we could try to get rid of the need to do anything beyond setting > a flag in the interrupt handler itself. But I'm afraid that's probably > unworkable, especially now that we use SA_RESTART on all signals. Yea :(. Several platforms actually ignore SA_RESTART for send/recv/select/... under some circumstances (notably linux), but it'd probably be hard to get it right for all. I don't think we can continue to use the blocking calls for much longer unless we allow them to be interruptible. But I doubt that change would be backpatchable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > Wouldn't it be better to move the catchup interrupt processing out of > the signal handler? For normal backends we only enable when reading from > the client and DoingCommandRead is set. How about setting a variable in > the signal handler and doing the actual catchup processing after the > recv() returned EINTR? Only it won't. See SA_RESTART. I think turning that off is a nonstarter, as per previous discussions. > That'd require either renegging on SA_RESTART or > using WaitLatchOrSocket() and nonblocking send/recv. Yeah, I was wondering about using WaitLatchOrSocket for client I/O too. We already have a hook that lets us do the actual recv even when using OpenSSL, and in principle that function could do interrupt-service-like functions if it got kicked off the recv(). Anything in this line is going to be a bigger change than I'd want to back-patch, though. Are we OK with not fixing the problem in the back branches? Given the shortage of field complaints, that might be all right. regards, tom lane
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
On 2014-07-26 13:58:38 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > That'd require either renegging on SA_RESTART or > > using WaitLatchOrSocket() and nonblocking send/recv. > > Yeah, I was wondering about using WaitLatchOrSocket for client I/O too. > We already have a hook that lets us do the actual recv even when using > OpenSSL, and in principle that function could do interrupt-service-like > functions if it got kicked off the recv(). I've started playing with this. Looks clearly worthwile. I think if we do it right we pretty much can get rid of the whole prepare_for_client_read() machinery and handle everything via ProcessInterrupts(). EnableCatchupInterrupt() et al don't really fill me with joy. I'm not yet entirely sure where the interrupt processing should happen, but I guess that'll fall out of the work at some point. The important bit imo is to *not* not do anything but return with BIO_set_retry_*() from my_sock_read/write(). That then allows us to implement stuff like the idle transaction timeout with much fewer problems. I probably won't finish doing this before leaving on holidays, so nobody should hesitate to look themselves if interested. If not, I plan to pick this up again. I think it's a prerequisite to getting rid of the FATAL for recovery conflict interrupts which I really would like to do. > Anything in this line is going to be a bigger change than I'd want to > back-patch, though. Agreed. I don't think it will, but it very well could have performance implications. Besides the obvious risk of bugs... > Are we OK with not fixing the problem in the back > branches? Given the shortage of field complaints, that might be all > right. I'm not really comfortable with that. How about simply flagging a couple contexts to not do the SyncRepWaitForLsn() dance? Possibly just by doing something ugly like SetConfigOption('synchronous_commit', 'off', PGC_INTERNAL, PGC_S_OVERRIDE, GUC_ACTION_LOCAL, true, ERROR)? during startup, inval and similar transaction commands? Not pretty, but it looks simple enough to be backpatchable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
From: "Tom Lane" <tgl@sss.pgh.pa.us> >> [ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ] I must add one thing. After some client processes closed the connection without any hang, their server processes were stuck with a stack trace like this (I'll look for and show the exact stack trace tomorrow): WaitLatchOrSocket SyncRepWaitForLSN CommitTransaction CommitTransactionCommand ProcessCatchupEvent ... shmem_exit proc_exit_prepare proc_exit PostgresMain ... The process appears to be hanging during session termination. So, it's not the problem only during client request wait. > Another line of thought is that we've been way too uncritical about > shoving different kinds of events into the SIGUSR1 multiplexor. > It might be a good idea to separate "high level" interrupts from > "low level" ones, using say SIGUSR1 for the former and SIGUSR2 > for the latter. However, that doesn't sound very back-patchable, > even assuming that we can come up with a clean division. This seems to be one step in the right direction. There are two issues in the current implementation: [Issue 1] [ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ] This is (partly) because the latch wakeup and other processing use the same SIGUSR1 in normal backend, autovacuum launcher/worker, and the background worker with database access. On the other hand, other background daemon processes properly use SIGUSR1 only for latch wakeup, and SIGUSR2 for other tasks. [Issue 2] WaitLatch() returns prematurely due to the sinval catchup signal, even though the target event (e.g. reply from standby) hasn't occurred and called SetLatch() yet. This is because procsignal_sigusr1_handler() always calls latch_sigusr1_handler(). This is probably not related to the cause of the hang. So, as you suggest, I think it would be a good idea to separate latch_sigusr1_handler() call into a different function solely for it like other daemon processes, and leave the rest in procsignal_sigusr1_handler() and rename function to procsignal_sigusr2_handler() or procsignal_handler(). Originally, it's not natural that the current procsignal_sigusr1_handler() contains latch_sigusr1_handler() call, because latch processing is not based on the procsignal mechanism (SetLatch() doesn't call SendProcSignal()). I'll try the fix tomorrow if possible. What kind of problems do you hink of for back-patching? Regards MauMau
From: "MauMau" <maumau307@gmail.com> > I must add one thing. After some client processes closed the connection > without any hang, their server processes were stuck with a stack trace > like this (I'll look for and show the exact stack trace tomorrow): I found two kinds of stack traces: #0 0x0000003199ec488f in poll () from /lib64/libc.so.6 #1 0x0000000000609f24 in WaitLatchOrSocket () #2 0x000000000063ad92 in SyncRepWaitForLSN () #3 0x00000000004ad474 in CommitTransaction () #4 0x00000000004aef53 in CommitTransactionCommand () #5 0x000000000064b547 in shmem_exit () #6 0x000000000064b625 in proc_exit_prepare () #7 0x000000000064b6a8 in proc_exit () #8 0x0000000000668a94 in PostgresMain () #9 0x0000000000617f2c in ServerLoop () #10 0x000000000061ae96 in PostmasterMain () #11 0x00000000005b2ccf in main () #0 0x0000003f4badf258 in poll () from /lib64/libc.so.6 #1 0x0000000000619b94 in WaitLatchOrSocket () #2 0x0000000000640c4c in SyncRepWaitForLSN () #3 0x0000000000491c18 in RecordTransactionCommit () #4 0x0000000000491d98 in CommitTransaction () #5 0x0000000000493135 in CommitTransactionCommand () #6 0x0000000000653fc5 in ProcessCatchupEvent () #7 0x00000000006540ed in HandleCatchupInterrupt () #8 0x00000000006533e3 in procsignal_sigusr1_handler () #9 <signal handler called> #10 0x0000003f4bae96b0 in recv () from /lib64/libc.so.6 #11 0x00000000005b75f6 in secure_read () #12 0x00000000005c223b in pq_recvbuf () #13 0x00000000005c263b in pq_getbyte () #14 0x000000000066e081 in PostgresMain () #15 0x0000000000627d81 in PostmasterMain () #16 0x00000000005c4803 in main () > I'll try the fix tomorrow if possible. What kind of problems do you hink > of for back-patching? I could reproduce the problem with 9.2.8, but have not yet with 9.5dev. I'll try with 9.2.9, and create the fix. Regards MauMau
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
On 2014-07-26 20:20:05 +0200, Andres Freund wrote: > On 2014-07-26 13:58:38 -0400, Tom Lane wrote: > > > Andres Freund <andres@2ndquadrant.com> writes: > > > That'd require either renegging on SA_RESTART or > > > using WaitLatchOrSocket() and nonblocking send/recv. > > > > Yeah, I was wondering about using WaitLatchOrSocket for client I/O too. > > We already have a hook that lets us do the actual recv even when using > > OpenSSL, and in principle that function could do interrupt-service-like > > functions if it got kicked off the recv(). > > I've started playing with this. Looks clearly worthwile. > > I think if we do it right we pretty much can get rid of the whole > prepare_for_client_read() machinery and handle everything via > ProcessInterrupts(). EnableCatchupInterrupt() et al don't really fill me > with joy. One thing I am wondering about around this is: Why are we only processing catchup events when DoingCommandRead? There's other paths where we can wait for data from the client for a long time. Obviously we don't want to process async.c stuff from inside copy, but I don't see why that's the case for sinval.c. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > One thing I am wondering about around this is: Why are we only > processing catchup events when DoingCommandRead? There's other paths > where we can wait for data from the client for a long time. Obviously we > don't want to process async.c stuff from inside copy, but I don't see > why that's the case for sinval.c. It might be all right to do it during copy, but I'd just as soon treat that as a separate issue. If you merge it into the basic patch then it might be hard to get rid of if there are problems. regards, tom lane
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
On 2014-07-28 15:29:57 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > One thing I am wondering about around this is: Why are we only > > processing catchup events when DoingCommandRead? There's other paths > > where we can wait for data from the client for a long time. Obviously we > > don't want to process async.c stuff from inside copy, but I don't see > > why that's the case for sinval.c. > > It might be all right to do it during copy, but I'd just as soon treat > that as a separate issue. If you merge it into the basic patch then it > might be hard to get rid of if there are problems. Yea, not planning to merge it. Just wondering to make sure I understand all the implications. Another thing I'm wondering about - also not for the basic patch - is accepting termination while writing to the client. It's rather annoying that we currently don't allow to pg_terminate_backend() when writing to the client. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
On 2014-07-26 20:20:05 +0200, Andres Freund wrote: > On 2014-07-26 13:58:38 -0400, Tom Lane wrote: > > > Andres Freund <andres@2ndquadrant.com> writes: > > > That'd require either renegging on SA_RESTART or > > > using WaitLatchOrSocket() and nonblocking send/recv. > > > > Yeah, I was wondering about using WaitLatchOrSocket for client I/O too. > > We already have a hook that lets us do the actual recv even when using > > OpenSSL, and in principle that function could do interrupt-service-like > > functions if it got kicked off the recv(). > > I've started playing with this. Looks clearly worthwile. > > I think if we do it right we pretty much can get rid of the whole > prepare_for_client_read() machinery and handle everything via > ProcessInterrupts(). EnableCatchupInterrupt() et al don't really fill me > with joy. > > I'm not yet entirely sure where the interrupt processing should happen, > but I guess that'll fall out of the work at some point. The important > bit imo is to *not* not do anything but return with BIO_set_retry_*() > from my_sock_read/write(). That then allows us to implement stuff like > the idle transaction timeout with much fewer problems. > > I probably won't finish doing this before leaving on holidays, so nobody > should hesitate to look themselves if interested. If not, I plan to pick > this up again. I think it's a prerequisite to getting rid of the FATAL > for recovery conflict interrupts which I really would like to do. I tried to get something reviewable before leaving, but alas, there's far too many edgecases to consider. A good part of it has to do with my decision to always operate the underlying socket in nonblocking mode and do all the writing using latches. I think that's the right decision, because it allows reliable interruptions everywhere and is easier than duplicated codepaths. But I realize that's not guaranteed to be well liked. a) Latches aren't ready early enough. We need to read/write to the socket long before MyProc is initialized. To have sane behaviour during early startup we need to be canceleable there was well. Right now I'm just busy looping in that case, but that's obviously not acceptable. My best idea is to have another latch that we can use during early startup. Say *MyProcLatch. Initially that points to a process local latch and once initialized it points to MyProc->procLatch. b) Latches don't support WL_WRITEABLE without WL_READABLE. I've simply changed the code in both latch implementations to poll for errors in both cases and set both readable/writeable if an error occurs. That seems easy enough. Are there any bigger caveats for changing this? c) There's a couple more or less undocumented pgwin32_waitforsinglesocket() calls in be-secure.c. Afaics they're likely partially already not required and definitely not required after socket handling is baded on latches. d) prepare_for_client_read(), client_read_ended() are now really quite a misnomer. Because interrupts interrupt send/recv appropriately and because we do *not* want to process them inside my_sock_read|write (so we don't recursively enter openssl to send the FATAL to the client) they're not used from within ssl anymore. But on a higher level, just like: prepare_for_client_read(); CHECK_FOR_INTERRUPTS(); client_read_ended(); Not sure if that's the right abstraction for just a: EnableNotifyInterrupt(); EnableCatchupInterrupt(); CHECK_FOR_INTERRUPTS(); DisableNotifyInterrupt(); DisableCatchupInterrupt(); where the Enable* just set a variable so CHECK_FOR_INTERRUPTS can process the events if occuring. Doing it that way allows to throw away *large* chunks of code from sinval.c and async.c because they don't have to care about doing dangerous stuff from signal handlers anymore. But the above needs a new name. Even though it ends up being a bit more work than I anticipated, the result still seems like a significant improvement over the current code. Will get back to this once I'm back (~10 days). Unless somebody else wants to pick this up. I've at attached my *heavily WIP* patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
From: "MauMau" <maumau307@gmail.com> >> I'll try the fix tomorrow if possible. What kind of problems do you hink >> of for back-patching? > > I could reproduce the problem with 9.2.8, but have not yet with 9.5dev. > I'll try with 9.2.9, and create the fix. I could also reproduce the problem with 9.2.9, but I couldn't with 9.5devel. However, I could confirm that the attached patch solves the problem. The patch is based on 9.2.9. To adjust this patch for 9.3 and later, set the background worker's SIGUSR2 handler to procsignal_sigusr2_handler like normal backends. Could you review and commit this? We wish the fix for 9.1 and above. To reproduce the problem, you can do as follows with the attached files. Originally, test.sql had many columns the customer is actually using, but we cannot show the real DDL as it's the customer's asset. $ createdb test $ pgbench -c 20 -j 20 -T 600 -s -n -f test.pgbench Synchronous streaming replication and hot standby need to be used. When I ran this on a 4-core RHEL5 box, the problem arose within a few minutes. pgbench continues to run emitting a lot of messages like below. Over time, the number of normal backends will increases, and dozens of which remain stuck with the stack trace I showed before. CREATE TABLE psql:test.sql:1: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_table_pkey" for table "test_table" I have no idea why 9.5devel didn't show the problem. One difference I noticed is that pgbench didn't output the message of implicit index creation for the primary key. Regards MauMau
Attachment
Please find attached the revised patch. I failed to change SIGUSR1 to SIGUSR2 when sending sinval catchup signal. In addition, I changed wal sender to not receive SIGUSR1/2 from SendProcSignal(). Without this, wal sender will excessively wake up or terminate by the signals. This is an existing problem. Regards MauMau
Attachment
I've tracked down the real root cause. The fix is very simple. Please check the attached one-liner patch. The cause is that the temporary relations are truncated unconditionally regardless of whether they are accessed in the transaction or not. That is, the following sequence of steps result in the hang: 1. A session creates a temporary table with ON COMMIT DELETE ROWS. It adds the temporary table to the list of relations that should be truncated at transaction commit. 2. The session receives a sinval catchup signal (SIGUSR1) from another session. It starts a transaction and processes sinval messages in the SIGUSR1 signal handler. No WAL is output while processing the sinval messages. 3. When the transaction commits, the list of temporary relations are checked to see if they need to be truncated. 4. The temporary table created in step 1 is truncated. To truncate a relation, Access Exclusive lock is acquired on it. When hot standby is used, acquiring an Access Exclusive lock generates a WAL record (RM_STANDBY_ID, XLOG_STANDBY_LOCK). 5. The transaction waits on a latch for a reply from a synchronous standby, because it wrote some WAL. But the latch wait never returns, because the latch needs to receive SIGUSR1 but the SIGUSR1 handler is already in progress from step 2. The correct behavior is for the transaction not to truncate the temporary table in step 4, because the transaction didn't use the temporary table. I confirmed that the fix is already in 9.3 and 9.5devel, so I just copied the code fragment from 9.5devel to 9.2.9. The attached patch is for 9.2.9. I didn't check 9.4 and other versions. Why wasn't the fix applied to 9.2? Finally, I found a very easy way to reproduce the problem: 1. On terminal session 1, start psql and run: CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS; Leave the psql session open. 2. On terminal session 2, run: pgbench -c8 -t500 -s1 -n -f test.sql dbname [test.sql] CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS; DROP TABLE t; 3. On the psql session on terminal session 1, run any SQL statement. It doesn't reply. The backend is stuck at SyncRepWaitForLSN(). Regards MauMau
Attachment
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Robert Haas
Date:
On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote: > I've tracked down the real root cause. The fix is very simple. Please > check the attached one-liner patch. > > I confirmed that the fix is already in 9.3 and 9.5devel, so I just copied > the code fragment from 9.5devel to 9.2.9. The attached patch is for 9.2.9. > I didn't check 9.4 and other versions. Why wasn't the fix applied to 9.2? It was considered a performance improvement - i.e. a feature - rather than a bug fix, so it was only added to master. That was after the release of 9.2 and before the release of 9.3, so it's in newer branches but not older ones. Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> Branch: master Release: REL9_3_BR [c9d7dbacd] 2013-01-29 10:43:33 +0200 Skip truncating ON COMMIT DELETE ROWS temp tables, if the transaction hasn't touched any temporary tables. We could try harder, and keep track of whether we've inserted to any temp tables, rather than accessed them, and whichtemp tables have been inserted to. But this is dead simple, and already covers many interesting scenarios. I'd support back-porting that commit to 9.1 and 9.2 as a fix for this problem. As the commit message says, it's dead simple. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote: >> I've tracked down the real root cause. The fix is very simple. Please >> check the attached one-liner patch. > I'd support back-porting that commit to 9.1 and 9.2 as a fix for this > problem. As the commit message says, it's dead simple. While I have no great objection to back-porting Heikki's patch, it seems like a very large stretch to call this a root-cause fix. At best it's band-aiding one symptom in a rather fragile way. regards, tom lane
> Robert Haas <robertmhaas@gmail.com> writes: >> I'd support back-porting that commit to 9.1 and 9.2 as a fix for this >> problem. As the commit message says, it's dead simple. From: "Tom Lane" <tgl@sss.pgh.pa.us> > While I have no great objection to back-porting Heikki's patch, it seems > like a very large stretch to call this a root-cause fix. At best it's > band-aiding one symptom in a rather fragile way. Thank you, Robert san. I'll be waiting for it to be back-ported to the next 9.1/9.2 release. Yes, I think this failure is only one potential symptom caused by the implemnentation mistake -- handling both latch wakeup and other tasks that wait on a latch in the SIGUSR1 handler. Although there may be no such tasks now, I'd like to correct and clean up the implementation as follows to avoid similar problems in the future. I think it's enough to do this only for 9.5. Please correct me before I go deeper in the wrong direction. * The SIGUSR1 handler only does latch wakeup. Any other task is done in other signal handlers such as SIGUSR2. Many daemon postgres processes follow this style, but the normal backend, autovacuum daemons, and background workers don't now. * InitializeLatchSupport() in unix_latch.c calls pqsignal(SIGUSR1, latch_sigusr1_handler). Change the argument of latch_sigusr1_handler() accordingly. * Remove SIGUSR1 handler registration and process-specific SIGUSR1 handler functions from all processes. We can eliminate many SIGUSR1 handler functions which have the same contents. Regards MauMau
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Robert Haas
Date:
On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote: >>> I've tracked down the real root cause. The fix is very simple. Please >>> check the attached one-liner patch. > >> I'd support back-porting that commit to 9.1 and 9.2 as a fix for this >> problem. As the commit message says, it's dead simple. > > While I have no great objection to back-porting Heikki's patch, it seems > like a very large stretch to call this a root-cause fix. At best it's > band-aiding one symptom in a rather fragile way. Yeah, that's true, but I'm not clear that we have another back-patchable fix, so doing something almost-certainly-harmless to alleviate the immediate pain seems worthwhile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
On 2014-08-12 11:04:00 -0400, Robert Haas wrote: > On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > >> On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote: > >>> I've tracked down the real root cause. The fix is very simple. Please > >>> check the attached one-liner patch. > > > >> I'd support back-porting that commit to 9.1 and 9.2 as a fix for this > >> problem. As the commit message says, it's dead simple. > > > > While I have no great objection to back-porting Heikki's patch, it seems > > like a very large stretch to call this a root-cause fix. At best it's > > band-aiding one symptom in a rather fragile way. > > Yeah, that's true, but I'm not clear that we have another > back-patchable fix, so doing something almost-certainly-harmless to > alleviate the immediate pain seems worthwhile. Isn't that still leaving the very related issue of waits due to hot pruning open? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Robert Haas
Date:
On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-08-12 11:04:00 -0400, Robert Haas wrote: >> On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > Robert Haas <robertmhaas@gmail.com> writes: >> >> On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote: >> >>> I've tracked down the real root cause. The fix is very simple. Please >> >>> check the attached one-liner patch. >> > >> >> I'd support back-porting that commit to 9.1 and 9.2 as a fix for this >> >> problem. As the commit message says, it's dead simple. >> > >> > While I have no great objection to back-porting Heikki's patch, it seems >> > like a very large stretch to call this a root-cause fix. At best it's >> > band-aiding one symptom in a rather fragile way. >> >> Yeah, that's true, but I'm not clear that we have another >> back-patchable fix, so doing something almost-certainly-harmless to >> alleviate the immediate pain seems worthwhile. > > Isn't that still leaving the very related issue of waits due to hot > pruning open? Yes. Do you have a back-patchable solution for that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
On 2014-08-12 11:56:41 -0400, Robert Haas wrote: > On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2014-08-12 11:04:00 -0400, Robert Haas wrote: > >> On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> > Robert Haas <robertmhaas@gmail.com> writes: > >> >> On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote: > >> >>> I've tracked down the real root cause. The fix is very simple. Please > >> >>> check the attached one-liner patch. > >> > > >> >> I'd support back-porting that commit to 9.1 and 9.2 as a fix for this > >> >> problem. As the commit message says, it's dead simple. > >> > > >> > While I have no great objection to back-porting Heikki's patch, it seems > >> > like a very large stretch to call this a root-cause fix. At best it's > >> > band-aiding one symptom in a rather fragile way. > >> > >> Yeah, that's true, but I'm not clear that we have another > >> back-patchable fix, so doing something almost-certainly-harmless to > >> alleviate the immediate pain seems worthwhile. > > > > Isn't that still leaving the very related issue of waits due to hot > > pruning open? > > Yes. Do you have a back-patchable solution for that? The easiest thing I can think of is sprinkling a few SetConfigOption('synchronous_commit', 'off', PGC_INTERNAL, PGC_S_OVERRIDE, GUC_ACTION_LOCAL,true, ERROR); in some strategic places. From a quick look: * all of autovacuum * locally in ProcessCompletedNotifies * locally in ProcessIncomingNotify * locally in ProcessCatchupEvent * locally in InitPostgres Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-08-12 11:56:41 -0400, Robert Haas wrote: >> Yes. Do you have a back-patchable solution for that? > The easiest thing I can think of is sprinkling a few > SetConfigOption('synchronous_commit', 'off', > PGC_INTERNAL, PGC_S_OVERRIDE, > GUC_ACTION_LOCAL, true, ERROR); This still seems to me to be applying a band-aid that covers over some symptoms; it's not dealing with the root cause that we've overloaded the signal handling mechanism too much. What reason is there to think that there are no other symptoms of that? regards, tom lane
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
On 2014-08-12 13:11:55 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2014-08-12 11:56:41 -0400, Robert Haas wrote: > >> Yes. Do you have a back-patchable solution for that? > > > The easiest thing I can think of is sprinkling a few > > SetConfigOption('synchronous_commit', 'off', > > PGC_INTERNAL, PGC_S_OVERRIDE, > > GUC_ACTION_LOCAL, true, ERROR); > > This still seems to me to be applying a band-aid that covers over some > symptoms; it's not dealing with the root cause that we've overloaded > the signal handling mechanism too much. What reason is there to think > that there are no other symptoms of that? I'm not arguing against fixing that. I think we need to do both, although I am wary of fixing the signal handling in the back branches. Fixing the signal handling won't get rid of the problem that one e.g. might not be able to log in anymore if all synchronous standbys are down and login caused hot pruning. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From
Andres Freund
Date:
Hi, On 2014-07-26 18:16:01 +0200, Andres Freund wrote: > On 2014-07-26 11:32:24 -0400, Tom Lane wrote: > > "MauMau" <maumau307@gmail.com> writes: > > > [ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ] > > > > Ugh. > > > > One line of thought is that it's pretty unsafe to be doing anything > > as complicated as transaction start/commit in a signal handler, even one > > that is sure it's not interrupting anything else. > > Yea, that's really not nice. MauMau, we don't do this anymore. Could you verify that the issue is fixed for you? I'd completely forgotten that this thread made me work on moving everything complicated out of signal handlers... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services