Re: [PATCH] plpythonu datatype conversion improvements - Mailing list pgsql-hackers

From Pierre Frédéric Caillaud
Subject Re: [PATCH] plpythonu datatype conversion improvements
Date
Msg-id op.uyq89qqfcke6l8@soyouz
Whole thread Raw
In response to Re: [PATCH] plpythonu datatype conversion improvements  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
>> Primary motivation of the attached patch is to support handling bytea
>> conversion allowing for embedded nulls, which in turn allows for
>> supporting the marshal module.
>>
>> Secondary motivation is slightly improved performance for conversion
>> routines of basic datatypes that have simple mappings between
>> postgres/python.
>>
>> Primary design is to change the conversion routines from being based
>> on cstrings to datums, eg:
>>     PLyBool_FromString(const char *)  =>
>> PLyBool_FromBool(PLyDatumToOb, Datum);
>
> I have reworked this patch a bit and extended the plpython test suite
> around it.  Current copy attached.
>
> The remaining problem is that the patch loses domain checking on the
> return types, because some paths no longer go through the data type's
> input function.  I have marked these places as FIXME, and the regression
> tests also contain a failing test case for this.
>
> What's needed here, I think, is an API that takes a datum plus type
> information and checks whether the datum is valid within the domain.  I
> haven't found one that is exported, but maybe someone could give a tip.


I see an intersection between the work I'm currently doing on COPY BINARY  
and this.

Basically if you have an INT, you aren't going to make lots of checks.

However, for a TEXT, postgres needs to reject it if it has a NULL in it  
(which doesn't bother Python at all), or if it is has chars which are not  
valid in the current encoding, etc.
Many other types like TIMESTAMP have checks which are absolutely necessary  
for correctness...

> What's needed here, I think, is an API that takes a datum plus type
> information and checks whether the datum is valid within the domain.  I
> haven't found one that is exported, but maybe someone could give a tip.

Problems :

- If the data you're trying to put in the Datum doesn't fit (example : out  
of range error, varchar too small, etc), and you want a  
datum-type-specific function to check your datum and reject it, how are  
you going to build the datum ? perhaps you can't, since your value doesn't  
fit. It's a chicken and egg problem : the check function that you expect  
to reject your invalid datum will not know it's invalid, since you've  
trimmed it at the edges to make it fit in the required Datum type...

- you are going to build a datum that is perhaps valid, and perhaps not,  
and send this to a function... having known-invalid datums moving around  
could be not such a good idea...

Why not use the copy binary format to communicate between python and pg ?

-> you write code to serialize python objects to binary form
-> you call the recv function to get a postgres datum
-> recv function throws an error if there is any problem

-> as a bonus, you release your python object <-> postgres binary code as  
a separate library so people can use it to output data readable by COPY  
BINARY and parse COPY BINARY dumps.


pgsql-hackers by date:

Previous
From: Sam Mason
Date:
Subject: docs for random function
Next
From: Peter Eisentraut
Date:
Subject: Re: GRANT ON ALL IN schema