Thread: Add PortalDrop in exec_execute_message
Hi, hackers. I've been playing with "autoprepared" patch, and have got isolation "freeze-the-dead" test stuck on first VACUUM FREEZE statement. After some research I found issue is reproduced with unmodified master branch if extended protocol used. I've prepared ruby script for demonstration (cause ruby-pg has simple interface to PQsendQueryParams). Further investigation showed it happens due to portal is not dropped inside of exec_execute_message, and it is kept in third session till COMMIT is called. Therefore heap page remains pinned, and VACUUM FREEZE became locked inside LockBufferForCleanup. It seems that it is usually invisible to common users since either: - command is called as standalone and then transaction is closed immediately, - next PQsendQueryParams will initiate another unnamed portal using CreatePortal("", true, true) and this action will drop previous one. But "freeze-the-dead" remains locked since third session could not send COMMIT until VACUUM FULL finished. I propose to add PortalDrop at the 'if (completed)' branch of exec_execute_message. --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2209,6 +2209,8 @@ exec_execute_message(const char *portal_name, long max_rows) if (completed) { + PortalDrop(portal, false); + if (is_xact_command) { With this change 'make check-world' runs without flaws (at least on empty configure with enable-cassert and enable-tap-tests). There is small chance applications exist which abuses seekable portals with 'execute' protocol message so not every completed portal can be safely dropped. But I believe there is some sane conditions that cover common case. For example, isn't empty name check is enough? Can client reset or seek portal with empty name? regards, Sokolov Yura aka funny_falcon
Attachment
Yura Sokolov <y.sokolov@postgrespro.ru> writes: > I propose to add PortalDrop at the 'if (completed)' branch of > exec_execute_message. This violates our wire protocol specification, which specifically says If successfully created, a named portal object lasts till the end of the current transaction, unless explicitly destroyed. An unnamed portal is destroyed at the end of the transaction, or as soon as the next Bind statement specifying the unnamed portal as destination is issued. (Note that a simple Query message also destroys the unnamed portal.) I'm inclined to think that your complaint would be better handled by having the client send a portal-close command, if it's not going to do something else immediately. regards, tom lane
Tom Lane писал 2021-05-21 21:23: > Yura Sokolov <y.sokolov@postgrespro.ru> writes: >> I propose to add PortalDrop at the 'if (completed)' branch of >> exec_execute_message. > > This violates our wire protocol specification, which > specifically says > > If successfully created, a named portal object lasts till the end > of > the current transaction, unless explicitly destroyed. An unnamed > portal is destroyed at the end of the transaction, or as soon as > the > next Bind statement specifying the unnamed portal as destination is > issued. (Note that a simple Query message also destroys the unnamed > portal.) > > I'm inclined to think that your complaint would be better handled > by having the client send a portal-close command, if it's not > going to do something else immediately. I thought about it as well. Then, if I understand correctly, PQsendQueryGuts and PQsendQueryInternal in pipeline mode should send "close portal" (CP) message after "execute" message, right? regards, Sokolov Yura
On 2021-May-25, Yura Sokolov wrote: > Tom Lane писал 2021-05-21 21:23: > > Yura Sokolov <y.sokolov@postgrespro.ru> writes: > > > I propose to add PortalDrop at the 'if (completed)' branch of > > > exec_execute_message. > > > > This violates our wire protocol specification, which > > specifically says > > > > If successfully created, a named portal object lasts till the end of > > the current transaction, unless explicitly destroyed. An unnamed > > portal is destroyed at the end of the transaction, or as soon as the > > next Bind statement specifying the unnamed portal as destination is > > issued. (Note that a simple Query message also destroys the unnamed > > portal.) > > > > I'm inclined to think that your complaint would be better handled > > by having the client send a portal-close command, if it's not > > going to do something else immediately. > > I thought about it as well. Then, if I understand correctly, > PQsendQueryGuts and PQsendQueryInternal in pipeline mode should send > "close portal" (CP) message after "execute" message, right? I don't think they should do that. The portal remains open, and the libpq interface does that. The portal gets closed at end of transaction without the need for any client message. I think if the client wanted to close the portal ahead of time, it would need a new libpq entry point to send the message to do that. (I didn't add a Close Portal message to PQsendQueryInternal in pipeline mode precisely because there is no such message in PQsendQueryGuts. I think it would be wrong to unconditionally add a Close Portal message to any of those places.) -- Álvaro Herrera 39°49'30"S 73°17'W
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > (I didn't add a Close Portal message to PQsendQueryInternal in pipeline > mode precisely because there is no such message in PQsendQueryGuts. > I think it would be wrong to unconditionally add a Close Portal message > to any of those places.) Yeah, I'm not very comfortable with having libpq take it on itself to do that, either. Looking back at the original complaint, it seems like it'd be fair to wonder why we're still holding a page pin in a supposedly completed executor run. Maybe the right fix is somewhere in the executor scan logic. regards, tom lane
Alvaro Herrera писал 2021-05-26 23:59: > On 2021-May-25, Yura Sokolov wrote: > >> Tom Lane писал 2021-05-21 21:23: >> > Yura Sokolov <y.sokolov@postgrespro.ru> writes: >> > > I propose to add PortalDrop at the 'if (completed)' branch of >> > > exec_execute_message. >> > >> > This violates our wire protocol specification, which >> > specifically says >> > >> > If successfully created, a named portal object lasts till the end of >> > the current transaction, unless explicitly destroyed. An unnamed >> > portal is destroyed at the end of the transaction, or as soon as the >> > next Bind statement specifying the unnamed portal as destination is >> > issued. (Note that a simple Query message also destroys the unnamed >> > portal.) >> > >> > I'm inclined to think that your complaint would be better handled >> > by having the client send a portal-close command, if it's not >> > going to do something else immediately. >> >> I thought about it as well. Then, if I understand correctly, >> PQsendQueryGuts and PQsendQueryInternal in pipeline mode should send >> "close portal" (CP) message after "execute" message, right? > > I don't think they should do that. The portal remains open, and the > libpq interface does that. The portal gets closed at end of > transaction > without the need for any client message. I think if the client wanted > to close the portal ahead of time, it would need a new libpq entry > point > to send the message to do that. - PQsendQuery issues Query message, and exec_simple_query closes its portal. - people doesn't expect PQsendQueryParams to be different from PQsendQuery aside of parameter sending. The fact that the portal remains open is a significant, unexpected and undesired difference. - PQsendQueryGuts is used in PQsendQueryParams and PQsendQueryPrepared. It is always sends empty portal name and always "send me all rows" limit (zero). Both usages are certainly to just perform query and certainly no one expects portal remains open. > (I didn't add a Close Portal message to PQsendQueryInternal in pipeline > mode precisely because there is no such message in PQsendQueryGuts. But PQsendQueryInternal should replicate behavior of PQsendQuery and not PQsendQueryParams. Despite it has to use new protocol, it should be indistinguishable to user, therefore portal should be closed. > I think it would be wrong to unconditionally add a Close Portal message > to any of those places.) Why? If you foresee problems, please share your mind. regards, Sokolov Yura aka funny_falcon
Tom Lane wrote 2021-05-27 00:19: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> (I didn't add a Close Portal message to PQsendQueryInternal in >> pipeline >> mode precisely because there is no such message in PQsendQueryGuts. >> I think it would be wrong to unconditionally add a Close Portal >> message >> to any of those places.) > > Yeah, I'm not very comfortable with having libpq take it on itself > to do that, either. But... Tom Lane wrote 2021-05-21 21:23: > I'm inclined to think that your complaint would be better handled > by having the client send a portal-close command, if it's not > going to do something else immediately. And given PQsendQueryParams should not be different from PQsendQuery (aside of parameters sending) why shouldn't it close its portal immediately, like it happens in exec_simple_query ? I really doubt user of PQsendQueryPrepared is aware of portal as well since it is also unnamed and also exhausted (because PQsendQueryGuts always sends "send me all rows" limit). And why PQsendQueryInternal should behave differently in pipelined and not pipelined mode? It closes portal in not pipelined mode, and will not close portal of last query in pipelined mode (inside of transaction). > Looking back at the original complaint, it seems like it'd be fair to > wonder why we're still holding a page pin in a supposedly completed > executor run. Maybe the right fix is somewhere in the executor > scan logic. Perhaps because query is simple and portal is created as seek-able? > > regards, tom lane regards Yura Sokolov
On 2021-May-27, Yura Sokolov wrote: > Alvaro Herrera писал 2021-05-26 23:59: > > I don't think they should do that. The portal remains open, and the > > libpq interface does that. The portal gets closed at end of transaction > > without the need for any client message. I think if the client wanted > > to close the portal ahead of time, it would need a new libpq entry point > > to send the message to do that. > > - PQsendQuery issues Query message, and exec_simple_query closes its > portal. > - people doesn't expect PQsendQueryParams to be different from > PQsendQuery aside of parameter sending. The fact that the portal > remains open is a significant, unexpected and undesired difference. > - PQsendQueryGuts is used in PQsendQueryParams and PQsendQueryPrepared. > It is always sends empty portal name and always "send me all rows" > limit (zero). Both usages are certainly to just perform query and > certainly no one expects portal remains open. Thinking about it some more, Yura's argument about PQsendQuery does make sense -- since what we're doing is replacing the use of a 'Q' message just because we can't use it when in pipeline mode, then it is reasonable to think that the replacement ought to have the same behavior. Upon receipt of a 'Q' message, the portal is closed automatically, and ISTM that that behavior should be preserved. That change would not solve the problem he complains about, because IIUC his framework is using PQsendQueryPrepared, which I'm not proposing to change. It just removes the other discrepancy that was discussed in the thread. The attached patch does it. Any opinions? -- Álvaro Herrera Valdivia, Chile "[PostgreSQL] is a great group; in my opinion it is THE best open source development communities in existence anywhere." (Lamar Owen)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > The attached patch does it. Any opinions? My opinion is there's no patch here. regards, tom lane
On 2021-Jun-07, Alvaro Herrera wrote: > The attached patch does it. Any opinions? Eh, really attached. -- Álvaro Herrera 39°49'30"S 73°17'W "No es bueno caminar con un hombre muerto"
Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2021-Jun-07, Alvaro Herrera wrote: >> The attached patch does it. Any opinions? > Eh, really attached. No particular objection. I'm not sure this will behave quite the same as simple-Query in error cases, but probably it's close enough. I'm still wondering though why Yura is observing resources remaining held by an executed-to-completion Portal. I think investigating that might be more useful than tinkering with pipeline mode. regards, tom lane
I wrote: > I'm still wondering though why Yura is observing resources remaining > held by an executed-to-completion Portal. I think investigating that > might be more useful than tinkering with pipeline mode. I got a chance to look into this finally. The lens I've been looking at this through is "why are we still holding any buffer pins when ExecutorRun finishes?". Normal table scan nodes won't do that. It turns out that the problem is specific to SELECT FOR UPDATE, and it happens because nodeLockRows is not careful to shut down the EvalPlanQual mechanism it uses before returning NULL at the end of a scan. If EPQ has been fired, it'll be holding a tuple slot referencing whatever tuple it was last asked about. The attached trivial patch seems to take care of the issue nicely, while adding little if any overhead. (A repeat call to EvalPlanQualEnd doesn't do much.) regards, tom lane diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c index b2e5c30079..7583973f4a 100644 --- a/src/backend/executor/nodeLockRows.c +++ b/src/backend/executor/nodeLockRows.c @@ -59,7 +59,11 @@ lnext: slot = ExecProcNode(outerPlan); if (TupIsNull(slot)) + { + /* Release any resources held by EPQ mechanism before exiting */ + EvalPlanQualEnd(&node->lr_epqstate); return NULL; + } /* We don't need EvalPlanQual unless we get updated tuple version(s) */ epq_needed = false; @@ -381,6 +385,7 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags) void ExecEndLockRows(LockRowsState *node) { + /* We may have shut down EPQ already, but no harm in another call */ EvalPlanQualEnd(&node->lr_epqstate); ExecEndNode(outerPlanState(node)); }
I wrote: > It turns out that the problem is specific to SELECT FOR UPDATE, and > it happens because nodeLockRows is not careful to shut down the > EvalPlanQual mechanism it uses before returning NULL at the end of > a scan. If EPQ has been fired, it'll be holding a tuple slot > referencing whatever tuple it was last asked about. The attached > trivial patch seems to take care of the issue nicely, while adding > little if any overhead. (A repeat call to EvalPlanQualEnd doesn't > do much.) BTW, to be clear: I think Alvaro's change is also necessary. If libpq is going to silently do something different in pipeline mode than it does in normal mode, it should strive to minimize the semantic difference. exec_simple_query includes a PortalDrop, so we'd best do the same in pipeline mode. But this LockRows misbehavior would be visible in other operating modes anyway. regards, tom lane
Alvaro Herrera wrote 2021-06-08 00:07: > On 2021-May-27, Yura Sokolov wrote: > >> Alvaro Herrera писал 2021-05-26 23:59: > >> > I don't think they should do that. The portal remains open, and the >> > libpq interface does that. The portal gets closed at end of transaction >> > without the need for any client message. I think if the client wanted >> > to close the portal ahead of time, it would need a new libpq entry point >> > to send the message to do that. >> >> - PQsendQuery issues Query message, and exec_simple_query closes its >> portal. >> - people doesn't expect PQsendQueryParams to be different from >> PQsendQuery aside of parameter sending. The fact that the portal >> remains open is a significant, unexpected and undesired difference. >> - PQsendQueryGuts is used in PQsendQueryParams and >> PQsendQueryPrepared. >> It is always sends empty portal name and always "send me all rows" >> limit (zero). Both usages are certainly to just perform query and >> certainly no one expects portal remains open. > > Thinking about it some more, Yura's argument about PQsendQuery does > make > sense -- since what we're doing is replacing the use of a 'Q' message > just because we can't use it when in pipeline mode, then it is > reasonable to think that the replacement ought to have the same > behavior. Upon receipt of a 'Q' message, the portal is closed > automatically, and ISTM that that behavior should be preserved. > > That change would not solve the problem he complains about, because > IIUC > his framework is using PQsendQueryPrepared, which I'm not proposing to > change. It just removes the other discrepancy that was discussed in > the > thread. > > The attached patch does it. Any opinions? I'm propose to change PQsendQueryParams and PQsendQueryPrepared (through change of PQsendQueryGuts) since they both has semantic "execute unnamed portal till the end and send me all rows". Extended protocol were introduced by Tom Lane on 2003-05-05 in 16503e6fa4a13051debe09698b6db9ce0d509af8 This commit already has Close ('C') message. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=16503e6fa4a13051debe09698b6db9ce0d509af8 libpq adoption of extended protocol were made by Tom month later on 2003-06-23 in efc3a25bb02ada63158fe7006673518b005261ba and there is already no Close message in PQsendQueryParams. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=efc3a25bb02ada63158fe7006673518b005261ba I didn't found any relevant discussion in pgsql-hackers on May and June 2003. This makes me think, Close message were intended to be used but simply forgotten when libpq patch were made. Tom, could I be right? regards, Yura.
Yura Sokolov <y.sokolov@postgrespro.ru> writes: > This makes me think, Close message were intended to be used > but simply forgotten when libpq patch were made. > Tom, could I be right? You could argue all day about what the intentions were nearly twenty years ago. But the facts on the ground are that we don't issue Close in those places, and changing it now would be a de facto protocol change for applications. So I'm a hard -1 on these proposals. (Alvaro's proposed change isn't a protocol break, since pipeline mode hasn't shipped yet. It's trying to make some brand new code act more like old code, which seems like a fine idea.) I think that the actual problem here has been resolved in commit bb4aed46a. Perhaps we should reconsider my decision not to back-patch that. Unlike a protocol change, that one could possibly be sane to back-patch. I didn't think it was worth the trouble and risk, but maybe there's a case for it. regards, tom lane