Thread: pg_proc.probin should become text?
pg_proc.probin is currently declared as being type bytea. Perhaps that had some real utility once upon a time, but no currently defined procedural language stores binary data there. In fact, the only thing that gets stored there is the shared-library file name for C functions. To make things even more fun, none of the code touching probin is doing anything to honor the special escaping conventions for bytea. This was pretty broken already, and might perhaps explain some of the weirder error reports we've gotten. It is now obvious to me that no shlib file name containing non-ASCII characters will correctly survive a dump/reload, in any existing PG release. Backslashes accidentally fail to fail, at least for some values of standard_conforming_strings. However, with the hex bytea output patch in place, it's *completely* broken. I offer the following pg_dump output: CREATE FUNCTION interpt_pp(path, path) RETURNS point LANGUAGE c AS '\\x2f686f6d652f706f7374677265732f706773716c2f7372632f746573742f726567726573732f726567726573732e736c','interpt_pp'; which should of course have looked like this: CREATE FUNCTION interpt_pp(path, path) RETURNS point LANGUAGE c AS '/home/postgres/testversion/src/test/regress/regress.sl','interpt_pp'; I think that the least painful solution might be to change pg_proc.probin to be declared as text. Otherwise we're going to need version-specific klugery in pg_dump and who knows where else. Comments? regards, tom lane
On Mon, Aug 3, 2009 at 10:03 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > pg_proc.probin is currently declared as being type bytea. Perhaps that > had some real utility once upon a time, but no currently defined > procedural language stores binary data there. In fact, the only > thing that gets stored there is the shared-library file name for > C functions. To make things even more fun, none of the code touching > probin is doing anything to honor the special escaping conventions for > bytea. This was pretty broken already, and might perhaps explain some > of the weirder error reports we've gotten. It is now obvious to me > that no shlib file name containing non-ASCII characters will correctly > survive a dump/reload, in any existing PG release. Backslashes > accidentally fail to fail, at least for some values of > standard_conforming_strings. > > However, with the hex bytea output patch in place, it's *completely* > broken. I offer the following pg_dump output: > > CREATE FUNCTION interpt_pp(path, path) RETURNS point > LANGUAGE c > AS '\\x2f686f6d652f706f7374677265732f706773716c2f7372632f746573742f726567726573732f726567726573732e736c', 'interpt_pp'; > > which should of course have looked like this: > > CREATE FUNCTION interpt_pp(path, path) RETURNS point > LANGUAGE c > AS '/home/postgres/testversion/src/test/regress/regress.sl', 'interpt_pp'; > > I think that the least painful solution might be to change > pg_proc.probin to be declared as text. Otherwise we're going to need > version-specific klugery in pg_dump and who knows where else. > > Comments? Will that require a special hack in pg_migrator? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Aug 3, 2009 at 10:03 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> I think that the least painful solution might be to change >> pg_proc.probin to be declared as text. �Otherwise we're going to need >> version-specific klugery in pg_dump and who knows where else. > Will that require a special hack in pg_migrator? No, pg_dump (and hence pg_migrator) wouldn't care. If we thought we were going to try to back-patch a fix for the non-ASCII-char problem into older releases, then some other approach would be preferable. But given the fact that this hasn't gotten complained of (at least not enough to be identified) in all the years since Berkeley, I can't see doing the pushups that would be needed to back-patch a fix. It looks to me like everyone has been effectively assuming that probin stores text, and we should just make it really truly do that. regards, tom lane
On Mon, Aug 03, 2009 at 10:03:11PM -0400, Tom Lane wrote: > However, with the hex bytea output patch in place, it's *completely* > broken. I offer the following pg_dump output: > > CREATE FUNCTION interpt_pp(path, path) RETURNS point > LANGUAGE c > AS '\\x2f686f6d652f706f7374677265732f706773716c2f7372632f746573742f726567726573732f726567726573732e736c', 'interpt_pp'; > > which should of course have looked like this: > > CREATE FUNCTION interpt_pp(path, path) RETURNS point > LANGUAGE c > AS '/home/postgres/testversion/src/test/regress/regress.sl', 'interpt_pp'; Oh, of course! How could I have been so dense? ;) > I think that the least painful solution might be to change > pg_proc.probin to be declared as text. Otherwise we're going to need > version-specific klugery in pg_dump and who knows where else. > > Comments? +1 on changing it to text. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, Aug 3, 2009 at 10:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Aug 3, 2009 at 10:03 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> I think that the least painful solution might be to change >>> pg_proc.probin to be declared as text. Otherwise we're going to need >>> version-specific klugery in pg_dump and who knows where else. > >> Will that require a special hack in pg_migrator? > > No, pg_dump (and hence pg_migrator) wouldn't care. > > If we thought we were going to try to back-patch a fix for the > non-ASCII-char problem into older releases, then some other approach > would be preferable. But given the fact that this hasn't gotten > complained of (at least not enough to be identified) in all the years > since Berkeley, I can't see doing the pushups that would be needed to > back-patch a fix. It looks to me like everyone has been effectively > assuming that probin stores text, and we should just make it really > truly do that. Sounds good to me, then. ...Robert
2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>: > pg_proc.probin is currently declared as being type bytea. Perhaps that > had some real utility once upon a time, but no currently defined > procedural language stores binary data there. In fact, the only > thing that gets stored there is the shared-library file name for > C functions. To make things even more fun, none of the code touching > probin is doing anything to honor the special escaping conventions for > bytea. This was pretty broken already, and might perhaps explain some > of the weirder error reports we've gotten. It is now obvious to me > that no shlib file name containing non-ASCII characters will correctly > survive a dump/reload, in any existing PG release. Backslashes > accidentally fail to fail, at least for some values of > standard_conforming_strings. > > However, with the hex bytea output patch in place, it's *completely* > broken. I offer the following pg_dump output: > > CREATE FUNCTION interpt_pp(path, path) RETURNS point > LANGUAGE c > AS '\\x2f686f6d652f706f7374677265732f706773716c2f7372632f746573742f726567726573732f726567726573732e736c', 'interpt_pp'; > > which should of course have looked like this: > > CREATE FUNCTION interpt_pp(path, path) RETURNS point > LANGUAGE c > AS '/home/postgres/testversion/src/test/regress/regress.sl', 'interpt_pp'; > > I think that the least painful solution might be to change > pg_proc.probin to be declared as text. Otherwise we're going to need > version-specific klugery in pg_dump and who knows where else. > > Comments? > -1 correct solution is moving the path to prosrc col or create new colum "externalproc". I thing so probin should be useful for plpgsql - sometime we should to store parser result from plpgsql compilation stage in this column. So my option is don't change native sense of this column. If we actually have not using, the we should to drop it, not create some breaks in future. Pavel Stehule > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>: >> I think that the least painful solution might be to change >> pg_proc.probin to be declared as text. > correct solution is moving the path to prosrc col or create new colum > "externalproc". I thing so probin should be useful for plpgsql - > sometime we should to store parser result from plpgsql compilation > stage in this column. So my option is don't change native sense of > this column. If we actually have not using, the we should to drop it, > not create some breaks in future. [ shrug... ] Can't get excited about it. What you propose would be a significant amount of work and added complexity, because pg_dump or somebody would have to take care to translate existing function definitions properly. And the benefit is purely speculative. I seriously doubt that serializing plpgsql compilation trees into a bytea representation would be worth anyone's time. The original Berkeley authors probably had some idea of storing compiled machine code in probin, but nobody's done that either, and I'll bet long odds that nobody ever will. (The advantage compared to external C functions implemented as we currently know them seems negligible. The Berkeley folk probably did not foresee the prevalence of loadable-shared-library support, nor the general security-driven move away from allowing execution of writable memory.) regards, tom lane
2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>: >>> I think that the least painful solution might be to change >>> pg_proc.probin to be declared as text. > >> correct solution is moving the path to prosrc col or create new colum >> "externalproc". I thing so probin should be useful for plpgsql - >> sometime we should to store parser result from plpgsql compilation >> stage in this column. So my option is don't change native sense of >> this column. If we actually have not using, the we should to drop it, >> not create some breaks in future. > > [ shrug... ] Can't get excited about it. What you propose would be a > significant amount of work and added complexity, because pg_dump or > somebody would have to take care to translate existing function > definitions properly. And the benefit is purely speculative. > I seriously doubt that serializing plpgsql compilation trees into a > bytea representation would be worth anyone's time. The original > Berkeley authors probably had some idea of storing compiled machine code > in probin, but nobody's done that either, and I'll bet long odds that > nobody ever will. (The advantage compared to external C functions > implemented as we currently know them seems negligible. The Berkeley > folk probably did not foresee the prevalence of loadable-shared-library > support, nor the general security-driven move away from allowing > execution of writable memory.) > > regards, tom lane >
2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>: >>> I think that the least painful solution might be to change >>> pg_proc.probin to be declared as text. > >> correct solution is moving the path to prosrc col or create new colum >> "externalproc". I thing so probin should be useful for plpgsql - >> sometime we should to store parser result from plpgsql compilation >> stage in this column. So my option is don't change native sense of >> this column. If we actually have not using, the we should to drop it, >> not create some breaks in future. > > [ shrug... ] Can't get excited about it. What you propose would be a > significant amount of work and added complexity, because pg_dump or > somebody would have to take care to translate existing function > definitions properly. And the benefit is purely speculative. > I seriously doubt that serializing plpgsql compilation trees into a > bytea representation would be worth anyone's time. The original > Berkeley authors probably had some idea of storing compiled machine code > in probin, but nobody's done that either, and I'll bet long odds that > nobody ever will. (The advantage compared to external C functions > implemented as we currently know them seems negligible. The Berkeley > folk probably did not foresee the prevalence of loadable-shared-library > support, nor the general security-driven move away from allowing > execution of writable memory.) I agree, so information about patch would be store in text field. But I am not sure, if your fix isn't too simply. I haven't plan to compile plpgsql to C or to binary code. But could be interesting link postgres with some virtual machine like parrot or lua vm, and translate plpgsql to p code. It's maybe far future. Early future is integration main SQL parser to plpgsql. I am not sure, but maybe we will need some persistent cache for store parametrized sql queries. I though about store it in probin column. Pavel > > regards, tom lane >
Pavel Stehule <pavel.stehule@gmail.com> writes: > I agree, so information about patch would be store in text field. But > I am not sure, if your fix isn't too simply. I haven't plan to compile > plpgsql to C or to binary code. But could be interesting link postgres > with some virtual machine like parrot or lua vm, and translate plpgsql > to p code. It's maybe far future. > Early future is integration main SQL parser to plpgsql. I am not sure, > but maybe we will need some persistent cache for store parametrized > sql queries. I though about store it in probin column. Well, probin is the wrong place for any such thing anyhow. Regardless of datatype issues, the system is clearly built on the assumption that the value of probin is to be specified *by the user* in CREATE FUNCTION. If you want a cache for derived information it would need to be a new column. (I remain of the opinion that caching such stuff in pg_proc would be a bad design decision. Every data structure that goes to disk is another data structure that you cannot easily change. There just isn't enough gain there to justify the maintenance headaches.) regards, tom lane
Pavel Stehule escribió: > I agree, so information about patch would be store in text field. But > I am not sure, if your fix isn't too simply. I haven't plan to compile > plpgsql to C or to binary code. But could be interesting link postgres > with some virtual machine like parrot or lua vm, and translate plpgsql > to p code. It's maybe far future. In this case I think it would make more sense to compile the code and keep the p-code in backend local memory. Keeping compiled code around in more permanent or global storage would only make sense if you had large amounts of code, and in that case I think we'd want to have a more concrete and complete proposal. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> I agree, so information about patch would be store in text field. But >> I am not sure, if your fix isn't too simply. I haven't plan to compile >> plpgsql to C or to binary code. But could be interesting link postgres >> with some virtual machine like parrot or lua vm, and translate plpgsql >> to p code. It's maybe far future. > >> Early future is integration main SQL parser to plpgsql. I am not sure, >> but maybe we will need some persistent cache for store parametrized >> sql queries. I though about store it in probin column. > > Well, probin is the wrong place for any such thing anyhow. Regardless > of datatype issues, the system is clearly built on the assumption that > the value of probin is to be specified *by the user* in CREATE FUNCTION. > If you want a cache for derived information it would need to be a new > column. > > (I remain of the opinion that caching such stuff in pg_proc would be > a bad design decision. Every data structure that goes to disk is > another data structure that you cannot easily change. There just isn't > enough gain there to justify the maintenance headaches.) ook, I agree - but I am not sure, so some cache will be necessary. Pavel > > regards, tom lane >