Thread: BUG #5066: plperl issues with perl_destruct() and END blocks
The following bug has been logged online:

Bug reference:      5066
Logged by:          Tim Bunce
Email address:      Tim.Bunce@pobox.com
PostgreSQL version: 8.4.1
Operating system:   darwin
Description:        plperl issues with perl_destruct() and END blocks
Details:

The plperl implementation doesn't call perl_destruct() during server
shutdown. So any resources held by references, in %_SHARED for example,
are not properly freed. The perl interpreter never gets a chance to clean
up; it's simply discarded.

Related to the above, plperl should also set
PL_exit_flags |= PERL_EXIT_DESTRUCT_END. Currently any END blocks defined
during initialization get executed at initialization (just before
perl_run() returns). Any END blocks defined later never get run.

Setting PL_exit_flags |= PERL_EXIT_DESTRUCT_END in plperl_init_interp()
and calling perl_destruct() will fix the issue. The timing of the
perl_destruct() call (i.e., early or late in the shutdown sequence)
doesn't matter much. You might want to make the spi_* functions return an
error if there's a shutdown in progress.
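For reference, the change the report asks for follows the standard perlembed
pattern. A minimal sketch, assuming a hypothetical interpreter variable and
shutdown hook (the function and variable names below are illustrative, not
the actual plperl source):

    #include "EXTERN.h"
    #include "perl.h"

    /* Illustrative only: plperl keeps its interpreter handle elsewhere. */
    static PerlInterpreter *plperl_interp;

    static void
    plperl_init_interp(void)
    {
        /* ... existing perl_alloc()/perl_construct()/perl_parse() setup ... */

        /*
         * Defer END blocks so they run when perl_destruct() is called,
         * not when perl_run() returns (see perlembed).
         */
        PL_exit_flags |= PERL_EXIT_DESTRUCT_END;

        /* perl_run(plperl_interp); ... */
    }

    /* Hypothetical hook invoked late in backend shutdown. */
    static void
    plperl_shutdown(void)
    {
        perl_destruct(plperl_interp);   /* pending END blocks run here */
        perl_free(plperl_interp);
    }

Per the report, the flag matters at initialization time, while the exact
point in the shutdown sequence at which perl_destruct() runs is not critical.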
"Tim Bunce" <Tim.Bunce@pobox.com> writes: > The plperl implementation doesn't call perl_destruct() during server > shutdown. Who cares? The process is going away anyway. regards, tom lane
On Sat, Sep 19, 2009 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Tim Bunce" <Tim.Bunce@pobox.com> writes:
>> The plperl implementation doesn't call perl_destruct() during server
>> shutdown.
>
> Who cares?  The process is going away anyway.

END {} blocks can execute arbitrary code.  Perl users will expect them
to be executed.

...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Sep 19, 2009 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Tim Bunce" <Tim.Bunce@pobox.com> writes: >>> The plperl implementation doesn't call perl_destruct() during server >>> shutdown. >> >> Who cares? The process is going away anyway. > END {} blocks can execute arbitrary code. Perl users will expect them > to be executed. [ shrug... ] As a database geek I find the lack of guarantees about that to be entirely unsatisfying. What exactly are you going to put in that block? If it's actually important, are you going to trust a mechanism that doesn't have any crash safeness? regards, tom lane
On Sat, Sep 19, 2009 at 11:43:26PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sat, Sep 19, 2009 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> "Tim Bunce" <Tim.Bunce@pobox.com> writes:
> >>> The plperl implementation doesn't call perl_destruct() during server
> >>> shutdown.
> >>
> >> Who cares?  The process is going away anyway.
>
> > END {} blocks can execute arbitrary code.  Perl users will expect them
> > to be executed.
>
> [ shrug... ]  As a database geek I find the lack of guarantees about
> that to be entirely unsatisfying.  What exactly are you going to put
> in that block?  If it's actually important, are you going to trust
> a mechanism that doesn't have any crash safeness?

Can you expand on what you mean by 'guarantees' and 'crash safeness'?

In my particular case I'm trying to enable Devel::NYTProf, the perl
source code performance profiler, to profile PL/Perl code. See
http://www.slideshare.net/Tim.Bunce/develnytprof-200907 (starting

NYTProf uses an END block to finish up the profile and write final
details to the data file.

One of the first problems I ran into was that Postgres was executing the
END block before the first plperl sub was even executed. Very
counterintuitive. Since NYTProf is implemented in C (XS) I've worked
around that problem by adding an option to enable
PL_exit_flags |= PERL_EXIT_DESTRUCT_END.

But because postgres doesn't call perl_destruct() the problem has just
moved from END blocks being called too early to END blocks not being
called at all. The effect is that the profile data file is unterminated
and so corrupt and unusable.

To finish the profiling users currently have to execute a SQL statement
to trigger a plperl sub that calls an NYTProf sub that finalizes the
profile. Ideally I'd like users to be able to finish the profiling
cleanly with a shutdown (to then restart with profiling disabled).
Calling perl_destruct() during shutdown would fix that.

Tim.

p.s. As a random data point, google code search finds about 7000 perl
modules containing an END block:
http://www.google.com/codesearch?as_q=%5EEND%5C+%7B&btnG=Search+Code&hl=en&as_lang=perlas_filename=%5C.pm%24&as_case=y
There's a definitional problem here however.  When should we call the
destructor?  My impression is that it should happen when the calling
query terminates, not when the backend shuts down.  I'm sure this will
cause other issues -- for example %_SHARED will be destroyed way too
early.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Sun, Sep 20, 2009 at 10:00:01PM -0400, Alvaro Herrera wrote:
> There's a definitional problem here however.  When should we call the
> destructor?  My impression is that it should happen when the calling
> query terminates, not when the backend shuts down.  I'm sure this will
> cause other issues -- for example %_SHARED will be destroyed way too
> early.

The perlmod man page says:

    An "END" code block is executed as late as possible, that is, after
    perl has finished running the program and just before the interpreter
    is being exited, even if it is exiting as a result of a die() function.
    [...]
    Note that "END" code blocks are not executed at the end of a string
    "eval()": if any "END" code blocks are created in a string "eval()",
    they will be executed just as any other "END" code block of that
    package in LIFO order just before the interpreter is being exited.

so executing at the end of a query, a transaction, or a session would be
wrong. They should execute "as late as possible", "just before the
interpreter is being exited".

Tim.
On Mon, Sep 21, 2009 at 11:05:43AM +0100, Tim Bunce wrote:
> On Sun, Sep 20, 2009 at 10:00:01PM -0400, Alvaro Herrera wrote:
> > There's a definitional problem here however.  When should we call the
> > destructor?  My impression is that it should happen when the calling
> > query terminates, not when the backend shuts down.  I'm sure this will
> > cause other issues -- for example %_SHARED will be destroyed way too
> > early.
>
> The perlmod man page says:
>
>     An "END" code block is executed as late as possible, that is, after
>     perl has finished running the program and just before the interpreter
>     is being exited, even if it is exiting as a result of a die() function.
>     [...]
>     Note that "END" code blocks are not executed at the end of a string
>     "eval()": if any "END" code blocks are created in a string "eval()",
>     they will be executed just as any other "END" code block of that
>     package in LIFO order just before the interpreter is being exited.
>
> so executing at the end of a query, a transaction, or a session would be
> wrong. They should execute "as late as possible", "just before the
> interpreter is being exited".

Taken literally, that would mean, "the last action before the backend
exits," but at least to me, that sounds troubling for the same reasons
that "end of transaction" triggers do.  What happens when there are
two different END blocks in a session?

With connection poolers, backends can last quite awhile.  Is it OK for
the END block to run hours after the rest of the code?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter escribió:

> Taken literally, that would mean, "the last action before the backend
> exits," but at least to me, that sounds troubling for the same reasons
> that "end of transaction" triggers do.  What happens when there are
> two different END blocks in a session?

The manual is clear that both are executed.

> With connection poolers, backends can last quite awhile.  Is it OK for
> the END block to run hours after the rest of the code?

This is an interesting point -- should END blocks be called on DISCARD ALL?

--
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
> David Fetter escribió:
>
> > Taken literally, that would mean, "the last action before the
> > backend exits," but at least to me, that sounds troubling for the
> > same reasons that "end of transaction" triggers do.  What happens
> > when there are two different END blocks in a session?
>
> The manual is clear that both are executed.

So it is, but does order matter, and if so, how would PostgreSQL know?

> > With connection poolers, backends can last quite awhile.  Is it OK
> > for the END block to run hours after the rest of the code?
>
> This is an interesting point -- should END blocks be called on
> DISCARD ALL?

ENOCLUE

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter escribió:
> On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
> > David Fetter escribió:
> >
> > > Taken literally, that would mean, "the last action before the
> > > backend exits," but at least to me, that sounds troubling for the
> > > same reasons that "end of transaction" triggers do.  What happens
> > > when there are two different END blocks in a session?
> >
> > The manual is clear that both are executed.
>
> So it is, but does order matter, and if so, how would PostgreSQL know?

The fine manual saith

    You may have multiple "END" blocks within a file--they will
    execute in reverse order of definition; that is: last in, first
    out (LIFO).

But then, why would we care?  We just call the destructor and Perl
ensures that the blocks are called in the right order.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Mon, Sep 21, 2009 at 12:06 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> David Fetter escribió:
>
>> Taken literally, that would mean, "the last action before the backend
>> exits," but at least to me, that sounds troubling for the same reasons
>> that "end of transaction" triggers do.  What happens when there are
>> two different END blocks in a session?
>
> The manual is clear that both are executed.
>
>> With connection poolers, backends can last quite awhile.  Is it OK for
>> the END block to run hours after the rest of the code?
>
> This is an interesting point -- should END blocks be called on DISCARD ALL?

It seems pretty reasonable that it would.  The intention of DISCARD ALL
is to completely reset the entire session.

...Robert
On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:
> David Fetter escribió:
> > On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
> > > David Fetter escribió:
> > >
> > > > Taken literally, that would mean, "the last action before the
> > > > backend exits," but at least to me, that sounds troubling for
> > > > the same reasons that "end of transaction" triggers do.  What
> > > > happens when there are two different END blocks in a session?
> > >
> > > The manual is clear that both are executed.
> >
> > So it is, but does order matter, and if so, how would PostgreSQL
> > know?
>
> The fine manual saith
>
>     You may have multiple "END" blocks within a file--they will
>     execute in reverse order of definition; that is: last in, first
>     out (LIFO).
>
> But then, why would we care?  We just call the destructor and Perl
> ensures that the blocks are called in the right order.

This is not quite what I meant.  Let's say we have two or more different
PL/Perl functions executed over the course of a backend.  Which one's
END block gets executed last?  Do we need to warn people about this?
Generate a WARNING, even?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, Sep 21, 2009 at 2:17 PM, David Fetter <david@fetter.org> wrote:
> On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:
>> David Fetter escribió:
>> > On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
>> > > David Fetter escribió:
>> > >
>> > > > Taken literally, that would mean, "the last action before the
>> > > > backend exits," but at least to me, that sounds troubling for
>> > > > the same reasons that "end of transaction" triggers do.  What
>> > > > happens when there are two different END blocks in a session?
>> > >
>> > > The manual is clear that both are executed.
>> >
>> > So it is, but does order matter, and if so, how would PostgreSQL
>> > know?
>>
>> The fine manual saith
>>
>>     You may have multiple "END" blocks within a file--they will
>>     execute in reverse order of definition; that is: last in, first
>>     out (LIFO).
>>
>> But then, why would we care?  We just call the destructor and Perl
>> ensures that the blocks are called in the right order.
>
> This is not quite what I meant.  Let's say we have two or more different
> PL/Perl functions executed over the course of a backend.  Which one's
> END block gets executed last?  Do we need to warn people about this?
> Generate a WARNING, even?

This is a feature of the Perl language.  I don't think it's our job to
second-guess the language design, however good or bad it may be.  As a
long-time Perl programmer, I would certainly say that if you are
counting on the execution ordering of your END blocks, you are probably
playing with fire and likely ought to rethink your application design,
because there are all kinds of ways this could fail spectacularly as a
result of apparently innocuous application changes (like, say,
alphabetizing the list of "use" declarations in some package).  But
that's true not only with PL/perl but with just plain old perl, and I
don't see that it's substantially more dangerous here than anywhere else.

...Robert
On Mon, Sep 21, 2009 at 02:28:11PM -0400, Robert Haas wrote:
> On Mon, Sep 21, 2009 at 2:17 PM, David Fetter <david@fetter.org> wrote:
> > On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:
> >> David Fetter escribió:
> >> > On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
> >> > > David Fetter escribió:
> >> > >
> >> > > > Taken literally, that would mean, "the last action before the
> >> > > > backend exits," but at least to me, that sounds troubling for
> >> > > > the same reasons that "end of transaction" triggers do.  What
> >> > > > happens when there are two different END blocks in a session?
> >> > >
> >> > > The manual is clear that both are executed.
> >> >
> >> > So it is, but does order matter, and if so, how would PostgreSQL
> >> > know?
> >>
> >> The fine manual saith
> >>
> >>     You may have multiple "END" blocks within a file--they will
> >>     execute in reverse order of definition; that is: last in, first
> >>     out (LIFO).
> >>
> >> But then, why would we care?  We just call the destructor and Perl
> >> ensures that the blocks are called in the right order.
> >
> > This is not quite what I meant.  Let's say we have two or more different
> > PL/Perl functions executed over the course of a backend.  Which one's
> > END block gets executed last?  Do we need to warn people about this?
> > Generate a WARNING, even?
>
> This is a feature of the Perl language.  I don't think it's our job to
> second-guess the language design, however good or bad it may be.  As a
> long-time Perl programmer, I would certainly say that if you are
> counting on the execution ordering of your END blocks, you are probably
> playing with fire and likely ought to rethink your application design,
> because there are all kinds of ways this could fail spectacularly as a
> result of apparently innocuous application changes (like, say,
> alphabetizing the list of "use" declarations in some package).  But
> that's true not only with PL/perl but with just plain old perl, and I
> don't see that it's substantially more dangerous here than anywhere else.

OK, we've considered it and decided it's people's own foot-gun :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter escribió:
> On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:

> > The fine manual saith
> >
> >     You may have multiple "END" blocks within a file--they will
> >     execute in reverse order of definition; that is: last in, first
> >     out (LIFO).
> >
> > But then, why would we care?  We just call the destructor and Perl
> > ensures that the blocks are called in the right order.
>
> This is not quite what I meant.  Let's say we have two or more different
> PL/Perl functions executed over the course of a backend.  Which one's
> END block gets executed last?

I think the manual is quite clear on this point.  It talks about "files",
which we don't have, but other than that it doesn't seem like there
should be any problem.

Now that I think about it, this only affects loaded modules, not the
plperl functions themselves, right?  I mean, you can't define an END
block inside a function.

> Do we need to warn people about this?

I don't see why not -- in the docs, of course.

> Generate a WARNING, even?

"Log spam" anyone?

--
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Sep 21, 2009 at 3:08 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> David Fetter escribió:
>> On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:
>
>> > The fine manual saith
>> >
>> >     You may have multiple "END" blocks within a file--they will
>> >     execute in reverse order of definition; that is: last in, first
>> >     out (LIFO).
>> >
>> > But then, why would we care?  We just call the destructor and Perl
>> > ensures that the blocks are called in the right order.
>>
>> This is not quite what I meant.  Let's say we have two or more different
>> PL/Perl functions executed over the course of a backend.  Which one's
>> END block gets executed last?
>
> I think the manual is quite clear on this point.  It talks about "files",
> which we don't have, but other than that it doesn't seem like there
> should be any problem.
>
> Now that I think about it, this only affects loaded modules, not the
> plperl functions themselves, right?  I mean, you can't define an END
> block inside a function.

You might think that, but it turns out the world of Perl is crazier
than the ordinary mind can fathom.

$ perl -e 'sub foo { END { print "hi\n" } }'
hi

...Robert
David Fetter <david@fetter.org> writes: > On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote: >>> With connection poolers, backends can last quite awhile. Is it OK >>> for the END block to run hours after the rest of the code? >> >> This is an interesting point -- should END blocks be called on >> DISCARD ALL? > ENOCLUE And in the same vein, should they be called inside a transaction, or not? What if they fail? I don't see any reason whatsoever that we couldn't just document this as a Perl feature not supported in plperl. If you do something like creating threads inside plperl, we're going to give you the raspberry when you complain about it breaking. END blocks can perfectly well go into the same category. regards, tom lane
On Mon, Sep 21, 2009 at 07:30:51PM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
> >>> With connection poolers, backends can last quite awhile.  Is it OK
> >>> for the END block to run hours after the rest of the code?

Yes.

> >> This is an interesting point -- should END blocks be called on
> >> DISCARD ALL?
>
> > ENOCLUE
>
> And in the same vein, should they be called inside a transaction,
> or not?  What if they fail?

As I said in the original ticket, I'd be quite happy for plperl END
blocks to have no access to postgres at all, other than warnings going
to the log. The spi_* functions could return an error if postgres is
being shut down (perhaps they already would if perl_destruct is called
late in the shutdown sequence). So transactions are moot.
Also, perl_destruct() will catch any exceptions from END blocks.

> I don't see any reason whatsoever that we couldn't just document this
> as a Perl feature not supported in plperl.  If you do something like
> creating threads inside plperl, we're going to give you the raspberry
> when you complain about it breaking.  END blocks can perfectly well
> go into the same category.

Returning to my original use case, the NYTProf profiler needs END
blocks to work, otherwise the generated profile data will be corrupt.

I don't see any reason not to add
    PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
to plperl_init_interp(), and for perl_destruct() to be called late in
the shutdown sequence.

Tim.
On Mon, Sep 21, 2009 at 7:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Fetter <david@fetter.org> writes:
>> On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
>>>> With connection poolers, backends can last quite awhile.  Is it OK
>>>> for the END block to run hours after the rest of the code?
>>>
>>> This is an interesting point -- should END blocks be called on
>>> DISCARD ALL?
>
>> ENOCLUE
>
> And in the same vein, should they be called inside a transaction,
> or not?  What if they fail?
>
> I don't see any reason whatsoever that we couldn't just document this
> as a Perl feature not supported in plperl.  If you do something like
> creating threads inside plperl, we're going to give you the raspberry
> when you complain about it breaking.  END blocks can perfectly well
> go into the same category.

If the changes are simple, as Tim seems to believe, exactly what do we
lose by doing this?

...Robert
Robert Haas <robertmhaas@gmail.com> writes: > If the changes are simple, as Tim seems to believe, exactly what do we > lose by doing this? It's not simple. There are any number of issues that Tim has not addressed. The most obvious: *his* use case might not require database access in an END block, but that doesn't mean the next complainant won't want it. Another point that comes to mind is shared_preload_libraries: if plperl is loaded once in the postmaster, it seems entirely likely that the same END block would be executed in multiple backends after having been loaded only once. Again, while that might be okay for his particular use-case, it seems horribly dangerous for anything else. (The shared_preload_libraries case also destroys the most obvious implementation path, ie having plperl add an on_proc_exit callback at _PG_init time...) But my basic objection is that a PL is a device for executing code in functions. It has no business being used to cause "action at a distance" outside of those functions. If we go down this path we are going to regret it. regards, tom lane
On Tue, Sep 22, 2009 at 10:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If the changes are simple, as Tim seems to believe, exactly what do we
>> lose by doing this?
>
> It's not simple.  There are any number of issues that Tim has not
> addressed.  The most obvious: *his* use case might not require database
> access in an END block, but that doesn't mean the next complainant won't
> want it.

Well, that's a reason why you might have to say no to the next
complainant, but not necessarily the current one.

> Another point that comes to mind is shared_preload_libraries: if plperl
> is loaded once in the postmaster, it seems entirely likely that the same
> END block would be executed in multiple backends after having been
> loaded only once.  Again, while that might be okay for his particular
> use-case, it seems horribly dangerous for anything else.

Regular perl has the same problem, so any perl modules that are counting
on any other behavior are already broken.

$ perl -e 'fork(); END { print "hi\n" }'
hi
hi

> (The shared_preload_libraries case also destroys the most obvious
> implementation path, ie having plperl add an on_proc_exit callback
> at _PG_init time...)

Can't comment on that.

> But my basic objection is that a PL is a device for executing code in
> functions.  It has no business being used to cause "action at a
> distance" outside of those functions.  If we go down this path we are
> going to regret it.

I agree with that concern, but if this were already implemented the way
that Tim wants it, we would be unlikely to change it to the way that it
is now on that basis.  In practice, I don't think anyone uses END {}
blocks for anything terribly elaborate, because it falls over.  They are
however useful for things like dumping debugs or profiling statistics,
and I don't see much reason to prevent those sorts of uses.

I also think that to some degree the horse has already left the barn on
this one: if the Perl interpreter isn't being destroyed until backend
exit, that implies that you can ALREADY do all kinds of horrible things,
like setting global variables and reading them back later.  What happens
if you call fork(), or pthread_create(), or modify %SIG?

...Robert
On Tue, Sep 22, 2009 at 10:13:24AM -0400, Tom Lane wrote:
>
> Another point that comes to mind is shared_preload_libraries: if plperl
> is loaded once in the postmaster, it seems entirely likely that the same
> END block would be executed in multiple backends after having been
> loaded only once.  Again, while that might be okay for his particular
> use-case, it seems horribly dangerous for anything else.

I agree with Robert that that would be expected behaviour for perl users
-- but it would only apply for cases where END blocks had been compiled
by the postmaster.

> (The shared_preload_libraries case also destroys the most obvious
> implementation path, ie having plperl add an on_proc_exit callback
> at _PG_init time...)

Could you explain the problem? (I don't see it, but I'm not familiar
with postgres internals.)

Tim.