Thread: plperl vs. bytea
I have been talking with Theo some more about his recent problem with bytea arguments and results (see recent discussion on -bugs and also recent docs patch). What he needs is a way to have bytea (and possibly other unknown types) passed as binary data to and from plperl. The conversion overhead is too big, both computationally and in increased memory usage. After discussing some possibilities, we decided that maybe the best approach would be to allow a custom GUC variable that would specify a list of types to be passed in binary form with no conversion, e.g. plperl.pass_as_binary = 'bytea, other-type' This would affect function args, trigger data, return results, and I think it should also apply to arguments for SPI prepared queries and to SPI returned results. If this seems like a good idea maybe it should go on the TODO list in whatever is the current incarnation. cheers andrew
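As a rough illustration of how a setting like plperl.pass_as_binary might be consulted, here is a minimal Python sketch. It is not PostgreSQL code: the helper names parse_type_list and transfer_mode are hypothetical, and the comma-separated, case-insensitive interpretation of the value is an assumption made for the example.

```python
def parse_type_list(setting):
    """Split a comma-separated setting value into normalized type names."""
    names = (part.strip().lower() for part in setting.split(","))
    return frozenset(name for name in names if name)


def transfer_mode(type_name, binary_types):
    """Pick how a value of the given type would cross the PL boundary."""
    # Assumption: anything not listed keeps the existing text conversion.
    return "binary" if type_name.lower() in binary_types else "text"


# The value from the proposal above.
binary_types = parse_type_list("bytea, other-type")

for name in ("bytea", "int4", "other-type"):
    print(name, "->", transfer_mode(name, binary_types))
```

Keeping text as the fallback for unlisted types is what would leave existing functions unaffected under this sketch.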
Andrew Dunstan <andrew@dunslane.net> writes: > After discussing some possibilities, we decided that maybe > the best approach would be to allow a custom GUC variable that would > specify a list of types to be passed in binary form with no conversion, e.g. > plperl.pass_as_binary = 'bytea, other-type' At minimum this GUC would have to be superuser-only, and even then the security risks seem a bit high. But the real problem with this thinking is the same one I already pointed out to Theo: why do you think this issue is plperl-specific? regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> After discussing some possibilities, we decided that maybe >> the best approach would be to allow a custom GUC variable that would >> specify a list of types to be passed in binary form with no conversion, e.g. >> > > >> plperl.pass_as_binary = 'bytea, other-type' >> > > At minimum this GUC would have to be superuser-only, and even then the > security risks seem a bit high. But the real problem with this thinking > is the same one I already pointed out to Theo: why do you think this > issue is plperl-specific? > > > It's not. If we really want to tackle this root and branch without upsetting legacy code, I think we'd need to have a way of marking data items as binary in the grammar, e.g. create function myfunc(myarg binary bytea) returns binary bytea language plperl as $$ ...$$; That's what I originally suggested to Theo. It would be a lot more work, though :-) cheers andrew
Andrew Dunstan wrote: > It's not. If we really want to tackle this root and branch without > upsetting legacy code, I think we'd need to have a way of marking > data items as binary in the grammar, e.g. > > create function myfunc(myarg binary bytea) returns binary bytea > language plperl as $$ ...$$; This ought to be a property of data type plus language, not a property of a function. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > Andrew Dunstan wrote: > >> It's not. If we really want to tackle this root and branch without >> upsetting legacy code, I think we'd need to have a way of marking >> data items as binary in the grammar, e.g. >> >> create function myfunc(myarg binary bytea) returns binary bytea >> language plperl as $$ ...$$; >> > > This ought to be a property of data type plus language, not a property > of a function. > > Why should it? And how would you do it in such a way that it didn't break legacy code? My GUC proposal would have made it language+type specific, but Tom didn't like that approach. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Peter Eisentraut wrote: >> This ought to be a property of data type plus language, not a property >> of a function. > Why should it? > And how would you do it in such a way that it didn't break legacy code? > My GUC proposal would have made it language+type specific, but Tom > didn't like that approach. It may indeed need to be language+type specific; what I was objecting to was the proposal of an ad-hoc plperl-specific solution without any consideration for other languages (or other data types for that matter). I think that's working at the wrong level of detail, at least for initial design. What we've basically got here is a complaint that the default textual-representation-based method for transmitting PL function parameters and results is awkward and inefficient for bytea. So the first question is whether this is really localized to only bytea, and if not which other types have got similar issues. (Even if you make the case that no other scalar types need help, what of bytea[] and composite types containing bytea or bytea[]?) After that we have to look at which PLs have the issue. I think this is largely driven by what the PL's internal type system is like, in particular does it have a datatype that is a natural conversion target for bytea, or other types with the same issue? (Tcl for instance once did not have 8-bit-clean strings, though I think it does today.) After we've got a handle on the scope of the problem we can start to think about solutions. regards, tom lane
> What we've basically got here is a complaint that the default > textual-representation-based method for transmitting PL function > parameters and results is awkward and inefficient for bytea. > So the first question is whether this is really localized to only > bytea, and if not which other types have got similar issues. > (Even if you make the case that no other scalar types need help, > what of bytea[] and composite types containing bytea or bytea[]?) > It could be a solution for known issues. The current textual representation is more of an ugly hack than anything else. Regards Pavel Stehule
On Sun, May 06, 2007 at 08:48:28PM -0400, Tom Lane wrote: > What we've basically got here is a complaint that the default > textual-representation-based method for transmitting PL function > parameters and results is awkward and inefficient for bytea. > So the first question is whether this is really localized to only > bytea, and if not which other types have got similar issues. > (Even if you make the case that no other scalar types need help, > what of bytea[] and composite types containing bytea or bytea[]?) I must say I was indeed surprised by the idea that bytea is passed by text, since Perl handles embedded nulls in strings without any problem at all. Does this mean integers are passed as text also? I would have expected an array argument to be passed as an array, but now I'm not so sure. So I'm with Tom on this one: there needs to be a serious discussion about how types are passed to Perl and the costs associated with it. I do have one problem though: for bytea/integers/floats Perl has appropriate internal representations. But what about other user-defined types? Say the user-defined UUID type, it should probably also be passed by a byte string, yet how could Perl know that? That would imply that user-defined types need to be able to specify how they are passed to PLs, to *any* PL. So fixing it for bytea is one thing, but there's a bigger issue here that needs discussion. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote: ... > I do have one problem though: for bytea/integers/floats Perl has > appropriate internal representations. But what about other user-defined > types? Say the user-defined UUID type, it should probably also be passed > by a byte string, yet how could Perl know that. That would imply that > user-defined types need to be able to specify how they are passed to > PLs, to *any* PL. > Yes exactly. One way could be to pass the type binary and provide a hull class for the PL/languages which then call the input/output routines on the string boundaries of the type unless overridden by user implementation. So default handling could be done in string representation of the type whatever that is and for a defined set of types every pl/language could implement special treatment like mapping to natural types. This handling can be done independently for every pl implementation since it would for most types just move the current type treatment a bit closer to the user code instead of doing all of it in the call handler. A second problem is the language interface for scripting outside of the database. Efficient and lossless type handling there would improve some situations - maybe a similar approach could be taken here. Regards Tino
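The "hull class" idea above (a wrapper that carries the binary value and falls back to the type's text output routine unless the PL has a special mapping) could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the names WrappedValue, NATIVE_CONVERTERS, int4_out and uuid_out are hypothetical, and the network-byte-order int4 encoding is chosen only to keep the example self-contained; it does not describe PostgreSQL's internal representation.

```python
import struct
import uuid

# Hypothetical per-language table of types mapped to native values.
NATIVE_CONVERTERS = {
    "int4": lambda raw: struct.unpack("!i", raw)[0],
    "bytea": bytes,
}


class WrappedValue:
    """Holds a value in binary form; text is produced only on demand."""

    def __init__(self, type_name, raw, output_func):
        self.type_name = type_name
        self.raw = raw
        self._output_func = output_func  # stand-in for the type's output routine
        self._text = None

    def __str__(self):
        # Default handling: call the output routine lazily and cache the result.
        if self._text is None:
            self._text = self._output_func(self.raw)
        return self._text

    def native(self):
        # Special treatment for a defined set of types, text for the rest.
        converter = NATIVE_CONVERTERS.get(self.type_name)
        return converter(self.raw) if converter else str(self)


def int4_out(raw):
    return str(struct.unpack("!i", raw)[0])


def uuid_out(raw):
    return str(uuid.UUID(bytes=raw))


answer = WrappedValue("int4", struct.pack("!i", 42), int4_out)
ident = WrappedValue("uuid", bytes(16), uuid_out)
print(answer.native(), ident.native())
```

In this sketch a type with no entry in the table, such as the UUID example, still works because it falls through to the text path; nothing is required of the type's author.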
Tom Lane wrote: > >> My GUC proposal would have made it language+type specific, but Tom >> didn't like that approach. >> > > It may indeed need to be language+type specific; what I was objecting to > was the proposal of an ad-hoc plperl-specific solution without any > consideration for other languages (or other data types for that matter). > I think that's working at the wrong level of detail, at least for > initial design. > > What we've basically got here is a complaint that the default > textual-representation-based method for transmitting PL function > parameters and results is awkward and inefficient for bytea. > So the first question is whether this is really localized to only > bytea, and if not which other types have got similar issues. > (Even if you make the case that no other scalar types need help, > what of bytea[] and composite types containing bytea or bytea[]?) > Well, the proposal would have allowed the user to specify the types to be passed binary, so it wouldn't have been bytea only. Array types are currently passed as text. This item used to be on the TODO list but it disappeared at some stage: . Pass arrays natively instead of as text between plperl and postgres (Perhaps it's naughty of me to observe that if we had a tracker we might know why it disappeared). Arrays can be returned as arrayrefs, and plperl has a little postprocessing magic that turns that into text which will in turn be parsed back into a postgres array. Not very efficient but it's a placeholder until we get better array support. Composites are in fact passed as hashrefs and can be returned as hashrefs. Unfortunately, this is not true recursively - a composite within a composite will be received as text. Another aspect of this is how we deal with SPI arguments and results. I need to look into that, but sufficient unto the day ... cheers andrew
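The "postprocessing magic" described above, where a returned array reference is turned into text that is then parsed back into a postgres array, can be sketched roughly like this. The sketch is in Python rather than Perl and is not the plperl implementation; the function names are hypothetical, and always double-quoting string elements is a simplification chosen for the example.

```python
def quote_element(text):
    """Double-quote a string element, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_array_literal(value):
    """Render a nested list as array-literal text such as {{1,2},{3,4}}."""
    if value is None:
        return "NULL"
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(to_array_literal(item) for item in value) + "}"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(value)
    return quote_element(str(value))


print(to_array_literal([[1, 2], [3, 4]]))
print(to_array_literal(["a b", None, 'say "hi"']))
```

The cost complained about in the thread is visible in the shape of the code: every element is formatted to text on the way out, and the server then has to parse the whole literal again.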
Tino Wildenhain wrote: > Martijn van Oosterhout wrote: > ... > > I do have one problem though: for bytea/integers/floats Perl has > > appropriate internal representations. But what about other user-defined > > types? Say the user-defined UUID type, it should probably also be passed > > by a byte string, yet how could Perl know that. That would imply that > > user-defined types need to be able to specify how they are passed to > > PLs, to *any* PL. > > > Yes exactly. One way could be to pass the type binary and provide > a hull class for the PL/languages which then call the input/output > routines on the string boundaries of the type unless overridden by > user implementation. So default handling could be done in string > representation of the type whatever that is and for a defined set > of types every pl/language could implement special treatment like > mapping to natural types. > > This handling can be done independently for every pl implementation > since it would for the most types just move the current type treatment > just a bit closer to the user code instead of doing all of it > in the call handler. > > 2nd problem is language interface for outside of the database scripting. > Efficient and lossless type handling there would improve some > situations - maybe a similar approach could be taken here. > > This seems like an elaborate piece of scaffolding for a relatively small problem. This does not need to be over-engineered, IMNSHO. cheers andrew
Martijn van Oosterhout <kleptog@svana.org> writes: > On Sun, May 06, 2007 at 08:48:28PM -0400, Tom Lane wrote: >> What we've basically got here is a complaint that the default >> textual-representation-based method for transmitting PL function >> parameters and results is awkward and inefficient for bytea. > I must say I was indeed surprised by the idea that bytea is passed by > text, since Perl handles embedded nulls in strings without any problem > at all. Does this mean integers are passed as text also? Pretty much everything is passed as text. This is a historical accident, in part: our first PL with an external interpreter was pltcl, and Tcl of the day had no other variable type besides "text string". (They've gotten smarter since then, but from a user's-eye point of view it's still true that every value in Tcl is a string.) So it was natural to decree that the value transmission protocol was just to convert to text and back with the SQL datatype I/O functions. Later PLs copied that decision without thinking hard about it. We've wedged a few bits of custom transmission protocol into plperl for arrays and records, but it's been pretty ad-hoc each time. Seems it's time to take a step back and question the assumptions. regards, tom lane
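To make the cost of "convert to text and back" concrete for bytea, the following Python sketch models an escape-style text form in which a backslash is doubled and every byte outside the printable ASCII range becomes a three-digit octal escape. It is an illustration modeled on that style of output, not the server's code, and the function names are hypothetical.

```python
def bytea_to_text(data):
    """Encode bytes as escape-style text: \\\\ for backslash, \\ooo for non-printables."""
    out = []
    for byte in data:
        if byte == 0x5C:
            out.append("\\\\")
        elif 32 <= byte <= 126:
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return "".join(out)


def text_to_bytea(text):
    """Decode the escape-style text produced by bytea_to_text."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1] == "\\":
            out.append(0x5C)
            i += 2
        else:
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


blob = bytes(range(256))
encoded = bytea_to_text(blob)
print(len(blob), "bytes ->", len(encoded), "characters")
```

For a blob containing each byte value once, the text form is 740 characters for 256 bytes, and both directions walk the data one byte at a time. That per-byte work and the larger intermediate copy are the computational and memory overheads the thread started from.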
Andrew Dunstan wrote: > > > Tino Wildenhain wrote: >> Martijn van Oosterhout wrote: >> ... >> > I do have one problem though: for bytea/integers/floats Perl has >> > appropriate internal representations. But what about other user-defined >> > types? Say the user-defined UUID type, it should probably also be passed >> > by a byte string, yet how could Perl know that. That would imply that >> > user-defined types need to be able to specify how they are passed to >> > PLs, to *any* PL. >> > >> Yes exactly. One way could be to pass the type binary and provide >> a hull class for the PL/languages which then call the input/output >> routines on the string boundaries of the type unless overridden by >> user implementation. So default handling could be done in string >> representation of the type whatever that is and for a defined set >> of types every pl/language could implement special treatment like >> mapping to natural types. >> >> This handling can be done independently for every pl implementation >> since it would for the most types just move the current type treatment >> just a bit closer to the user code instead of doing all of it >> in the call handler. >> >> 2nd problem is language interface for outside of the database scripting. >> Efficient and lossless type handling there would improve some >> situations - maybe a similar approach could be taken here. >> >> > > This seems like an elaborate piece of scaffolding for a relatively small > problem. > > This does not need to be over-engineered, IMNSHO. Well, could you explain where it would appear over-engineered? All I was proposing is to move the rather hard-coded type mapping to a softer approach where the language is able to support it. Is there any insufficiency in perl which makes it harder to do in a clean way? Regards Tino
Tino Wildenhain wrote: > Andrew Dunstan wrote: >> >> >> Tino Wildenhain wrote: >>> Martijn van Oosterhout wrote: >>> ... >>> > I do have one problem though: for bytea/integers/floats Perl has >>> > appropriate internal representations. But what about other >>> user-defined >>> > types? Say the user-defined UUID type, it should probably also be passed >>> > by a byte string, yet how could Perl know that. That would imply that >>> > user-defined types need to be able to specify how they are passed to >>> > PLs, to *any* PL. >>> > >>> Yes exactly. One way could be to pass the type binary and provide >>> a hull class for the PL/languages which then call the input/output >>> routines on the string boundaries of the type unless overridden by >>> user implementation. So default handling could be done in string >>> representation of the type whatever that is and for a defined set >>> of types every pl/language could implement special treatment like >>> mapping to natural types. >>> >>> This handling can be done independently for every pl implementation >>> since it would for the most types just move the current type treatment >>> just a bit closer to the user code instead of doing all of it >>> in the call handler. >>> >>> 2nd problem is language interface for outside of the database >>> scripting. >>> Efficient and lossless type handling there would improve some >>> situations - maybe a similar approach could be taken here. >>> >>> >> >> This seems like an elaborate piece of scaffolding for a relatively >> small problem. >> >> This does not need to be over-engineered, IMNSHO. > > Well, could you explain where it would appear over-engineered? > All I was proposing is to move the rather hard-coded > type mapping to a softer approach where the language > is able to support it. > > Is there any insufficiency in perl which makes it harder to > do in a clean way? > > Anything that imposes extra requirements on type creators seems undesirable. 
I'm not sure either that the UUID example is a very good one. This whole problem arose because of performance problems handling large gobs of data, not just anything that happens to be binary. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tino Wildenhain wrote: >> Andrew Dunstan wrote: >>> This does not need to be over-engineered, IMNSHO. >> >> Well could you explain where it would appear over-engineered? > Anything that imposes extra requirements on type creators seems undesirable. > I'm not sure either that the UUID example is a very good one. This whole > problem arose because of performance problems handling large gobs of > data, not just anything that happens to be binary. Well, we realize that bytea has got a performance problem, but are we so sure that nothing else does? I don't want to stick in a one-purpose wart only to find later that we need a few more warts of the same kind. An example of something else we ought to be considering is binary transmission of float values. The argument in favor of that is not so much performance (although text-and-back conversion is hardly cheap) as it is that the conversion is potentially lossy, since float8out doesn't by default generate enough digits to ensure a unique back-conversion. ISTM there are three reasons for considering non-text-based transmission: 1. Performance, as in the bytea case 2. Avoidance of information loss, as for float 3. Providing a natural/convenient mapping to the PL's internal data types, as we already do --- but incompletely --- for arrays and records It's clear that the details of #3 have to vary across PLs, but I'd like it not to vary capriciously. For instance plperl currently has special treatment for returning perl arrays as SQL arrays, but AFAICS from the manual not for going in the other direction; plpython and pltcl overlook arrays entirely, even though there are natural mappings they could and should be using. I don't know to what extent we should apply point #3 to situations other than arrays and records, but now is the time to think about it. 
An example: working with the geometric types in a PL function is probably going to be pretty painful for lack of simple access to the constituent float values (not to mention the lossiness problem). We should also be considering some non-core PLs such as PL/Ruby and PL/R; they might provide additional examples to influence our thinking. regards, tom lane
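The information-loss point for floats can be demonstrated with a small Python sketch. It assumes the default text output prints 15 significant digits; that figure is an assumption made for the example, and the helper names via_text and via_binary are hypothetical. A double needs up to 17 significant digits to survive a text round trip, while a binary copy is always exact.

```python
import struct


def via_text(value, digits):
    """Round-trip a double through text with the given number of significant digits."""
    return float("%.*g" % (digits, value))


def via_binary(value):
    """Round-trip a double through its 8-byte binary form."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


x = 0.1 + 0.2  # not exactly 0.3 in binary floating point

print("15 digits exact:", via_text(x, 15) == x)
print("17 digits exact:", via_text(x, 17) == x)
print("binary exact:   ", via_binary(x) == x)
```

With 15 digits the value comes back as a different double, which is the lossiness described above; 17 digits or a binary transfer preserves it.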
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> Tino Wildenhain wrote: >> >>> Andrew Dunstan wrote: >>> >>>> This does not need to be over-engineered, IMNSHO. >>>> >>> Well could you explain where it would appear over-engineered? >>> > > >> Anything that imposes extra requirements on type creators seems undesirable. >> > > >> I'm not sure either that the UUID example is a very good one. This whole >> problem arose because of performance problems handling large gobs of >> data, not just anything that happens to be binary. >> > > Well, we realize that bytea has got a performance problem, but are we so > sure that nothing else does? I don't want to stick in a one-purpose > wart only to find later that we need a few more warts of the same kind. > > An example of something else we ought to be considering is binary > transmission of float values. The argument in favor of that is not > so much performance (although text-and-back conversion is hardly cheap) > as it is that the conversion is potentially lossy, since float8out > doesn't by default generate enough digits to ensure a unique > back-conversion. > > ISTM there are three reasons for considering non-text-based > transmission: > > 1. Performance, as in the bytea case > 2. Avoidance of information loss, as for float > 3. Providing a natural/convenient mapping to the PL's internal data types, > as we already do --- but incompletely --- for arrays and records > > It's clear that the details of #3 have to vary across PLs, but I'd > like it not to vary capriciously. For instance plperl currently has > special treatment for returning perl arrays as SQL arrays, but AFAICS > from the manual not for going in the other direction; plpython and > pltcl overlook arrays entirely, even though there are natural mappings > they could and should be using. > > I don't know to what extent we should apply point #3 to situations other > than arrays and records, but now is the time to think about it. 
An > example: working with the geometric types in a PL function is probably > going to be pretty painful for lack of simple access to the constituent > float values (not to mention the lossiness problem). > > We should also be considering some non-core PLs such as PL/Ruby and > PL/R; they might provide additional examples to influence our thinking. > OK, we have a lot of work to do here, then. I can really only speak with any significant knowledge on the perl front. Fundamentally, it has 3 types of scalars: IV, NV and PV (integer, float, string). IV can accommodate at least the largest integer or pointer type on the platform, NV a double, and PV an arbitrary string of bytes. As for structured types, as I noted elsewhere we have some of the work done for plperl. My suggestion would be to complete it for plperl and get it fully orthogonal and then retrofit that to plpython/pltcl. I've actually been worried for some time that the conversion glue was probably imposing significant penalties on the non-native PLs, so I'm glad to see this getting some attention. cheers andrew
Added to TODO: o Allow data to be passed in native language formats, rather than only text http://archives.postgresql.org/pgsql-hackers/2007-05/msg00289$ -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, 2007-05-07 at 13:57, Andrew Dunstan wrote: > > Tom Lane wrote: > > Andrew Dunstan <andrew@dunslane.net> writes: > > > >> Tino Wildenhain wrote: > >> > >>> Andrew Dunstan wrote: > >>> > >>>> This does not need to be over-engineered, IMNSHO. > >>>> > >>> Well could you explain where it would appear over-engineered? > >>> > > > > > >> Anything that imposes extra requirements on type creators seems undesirable. > >> > > > > > >> I'm not sure either that the UUID example is a very good one. This whole > >> problem arose because of performance problems handling large gobs of > >> data, not just anything that happens to be binary. > >> > > > > Well, we realize that bytea has got a performance problem, but are we so > > sure that nothing else does? I don't want to stick in a one-purpose > > wart only to find later that we need a few more warts of the same kind. > > > > An example of something else we ought to be considering is binary > > transmission of float values. The argument in favor of that is not > > so much performance (although text-and-back conversion is hardly cheap) > > as it is that the conversion is potentially lossy, since float8out > > doesn't by default generate enough digits to ensure a unique > > back-conversion. > > > > ISTM there are three reasons for considering non-text-based > > transmission: > > > > 1. Performance, as in the bytea case > > 2. Avoidance of information loss, as for float > > 3. Providing a natural/convenient mapping to the PL's internal data types, > > as we already do --- but incompletely --- for arrays and records > > > > It's clear that the details of #3 have to vary across PLs, but I'd > > like it not to vary capriciously. 
For instance plperl currently has > > special treatment for returning perl arrays as SQL arrays, but AFAICS > > from the manual not for going in the other direction; plpython and > > pltcl overlook arrays entirely, even though there are natural mappings > > they could and should be using. plpy (from http://python.projects.postgresql.org/project/be.html ) goes to another extreme and exposes the whole postgresql type system to the embedded python interpreter. > > I don't know to what extent we should apply point #3 to situations other > > than arrays and records, but now is the time to think about it. If we can avoid copying/converting large(ish) values between postgresql and the embedded language, we should try to do it. The main problems seem to be the differences in alloc/free, palloc, and refcounting/GC between pg and embedded languages. > > An > > example: working with the geometric types in a PL function is probably > > going to be pretty painful for lack of simple access to the constituent > > float values (not to mention the lossiness problem). Of course we should provide access to subparts of pg types, either by writing some wrapper class/accessor functions or providing access through postgresql's existing functions. > > We should also be considering some non-core PLs such as PL/Ruby and > > PL/R; they might provide additional examples to influence our thinking. > > > > OK, we have a lot of work to do here, then. > > I can really only speak with any significant knowledge on the perl > front. Fundamentally, it has 3 types of scalars: IV, NV and PV (integer, > float, string). IV can accommodate at least the largest integer or > pointer type on the platform, NV a double, and PV an arbitrary string of > bytes. OTOH python has had an extensible type system from the start (i.e. anything is an object), and thus could be painlessly (just SMOP) extended to use postgresql's native types when there is no 1:1 match with existing types. 
> As for structured types, as I noted elsewhere we have some of the work > done for plperl. My suggestion would be to complete it for plperl and > get it fully orthogonal and then retrofit that to plpython/pltcl. > > I've actually been worried for some time that the conversion glue was > probably imposing significant penalties on the non-native PLs, so I'm > glad to see this getting some attention. > > > cheers > > andrew