Re: [PATCH] plpythonu datatype conversion improvements - Mailing list pgsql-hackers
From: Pierre Frédéric Caillaud
Subject: Re: [PATCH] plpythonu datatype conversion improvements
Date:
Msg-id: op.uyq89qqfcke6l8@soyouz
In response to: Re: [PATCH] plpythonu datatype conversion improvements (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-hackers
>> Primary motivation of the attached patch is to support handling bytea
>> conversion allowing for embedded nulls, which in turn allows for
>> supporting the marshal module.
>>
>> Secondary motivation is slightly improved performance for conversion
>> routines of basic datatypes that have simple mappings between
>> postgres/python.
>>
>> Primary design is to change the conversion routines from being based
>> on cstrings to datums, eg:
>> PLyBool_FromString(const char *) =>
>> PLyBool_FromBool(PLyDatumToOb, Datum);
>
> I have reworked this patch a bit and extended the plpython test suite
> around it. Current copy attached.
>
> The remaining problem is that the patch loses domain checking on the
> return types, because some paths no longer go through the data type's
> input function. I have marked these places as FIXME, and the regression
> tests also contain a failing test case for this.
>
> What's needed here, I think, is an API that takes a datum plus type
> information and checks whether the datum is valid within the domain. I
> haven't found one that is exported, but maybe someone could give a tip.

I see an intersection between this and the work I'm currently doing on
COPY BINARY.

Basically, if you have an INT, you aren't going to make lots of checks.
However, for a TEXT, postgres needs to reject it if it has a NULL byte
in it (which doesn't bother Python at all), or if it has chars which are
not valid in the current encoding, etc. Many other types like TIMESTAMP
have checks which are absolutely necessary for correctness...

> What's needed here, I think, is an API that takes a datum plus type
> information and checks whether the datum is valid within the domain. I
> haven't found one that is exported, but maybe someone could give a tip.

Problems:

- If the data you're trying to put into the Datum doesn't fit (for
example: out-of-range value, varchar too small, etc.), and you want a
datum-type-specific function to check your datum and reject it, how are
you going to build the datum in the first place? Perhaps you can't,
since your value doesn't fit. It's a chicken-and-egg problem: the check
function that you expect to reject your invalid datum will not know it's
invalid, since you've already trimmed the value at the edges to make it
fit the required Datum type...

- You are going to build a datum that is perhaps valid, perhaps not, and
send it to a function... having known-invalid datums moving around could
be not such a good idea...

Why not use the COPY BINARY format to communicate between Python and PG?
(a rough sketch follows below)

-> you write code to serialize Python objects to binary form
-> you call the recv function to get a postgres datum
-> the recv function throws an error if there is any problem
-> as a bonus, you release your Python object <-> postgres binary code
as a separate library, so people can use it to output data readable by
COPY BINARY and to parse COPY BINARY dumps.
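To make the recv idea a bit more concrete, here is a minimal sketch of
what the C side of such a conversion could look like.
getTypeBinaryInputInfo() and OidReceiveFunctionCall() are the existing
backend entry points; the wrapper name PLy_datum_from_binary and its
calling convention are made up here purely for illustration, not part of
the patch.

    #include "postgres.h"
    #include "fmgr.h"
    #include "lib/stringinfo.h"
    #include "utils/lsyscache.h"

    /*
     * Illustrative sketch: build a Datum of type 'typoid' from 'len'
     * bytes of binary (send/recv format) data produced on the Python
     * side.  The type's recv function does the validity checking
     * (encoding, embedded NUL bytes, range, ...) and raises an error on
     * bad input, so the caller never sees an invalid Datum.
     */
    static Datum
    PLy_datum_from_binary(Oid typoid, int32 typmod,
                          const char *data, int len)
    {
        Oid             typreceive;
        Oid             typioparam;
        StringInfoData  buf;

        /* look up the type's recv function; errors out if it has none */
        getTypeBinaryInputInfo(typoid, &typreceive, &typioparam);

        /* wrap the bytes in a StringInfo, the way COPY BINARY does */
        initStringInfo(&buf);
        appendBinaryStringInfo(&buf, data, len);

        /* recv validates the input and builds the Datum, or throws */
        return OidReceiveFunctionCall(typreceive, &buf, typioparam, typmod);
    }

Under this scheme the chicken-and-egg problem above goes away: the bytes
are checked by the type's own recv function before any Datum exists, and
bad input surfaces as an ordinary error instead of a maybe-invalid Datum
being passed around.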