Thread: [PATCH] Add missing type conversion functions for PL/Python
Hi All,
PL/Python already has different type conversion functions to
convert PostgreSQL datum to Python object. However, the conversion
functions from Python object to PostgreSQL datum only has Boolean,
Bytea and String functions.
In this patch, we add rest of Integer and Float related datatype conversion functions
and can increase the performance of data conversion greatly especially
when returning a large array.
We did a quick test about the performance of returning array in PL/Python:
The following UDF is used for test:
```
CREATE OR REPLACE FUNCTION pyoutfloat8(num int) RETURNS float8[] AS $$ return [x/3.0 for x in range(num)] $$ LANGUAGE plpythonu;
--
```
The test command is
```
select count(pyoutfloat8(n));
```
The count is used for avoid large output, where n is the number of element in returned array, and the size is from 1.5k to15m.
Size of Array 1.5k | 15k | 150k | 1.5m | 15m |
Origin 2.324ms | 19.834ms | 194.991ms | 1927.28ms | 19982.1ms |
With this patch 1.168ms | 3.052ms | 21.888ms | 213.39ms | 2138.5ms |
All test for both PG and PL/Python are passed.
Thanks very much.
Regards,
Haozhou
Attachment
On Wed, 31 Jan 2018 11:57:12 +0800 Haozhou Wang <hawang@pivotal.io> wrote: > Hi All, > > PL/Python already has different type conversion functions to > convert PostgreSQL datum to Python object. However, the conversion > functions from Python object to PostgreSQL datum only has Boolean, > Bytea and String functions. > > In this patch, we add rest of Integer and Float related datatype > conversion functions > and can increase the performance of data conversion greatly especially > when returning a large array. > > We did a quick test about the performance of returning array in > PL/Python: > > The following UDF is used for test: > > ``` > CREATE OR REPLACE FUNCTION pyoutfloat8(num int) RETURNS float8[] AS $$ > return [x/3.0 for x in range(num)] > $$ LANGUAGE plpythonu; > ``` > > The test command is > > ``` > select count(pyoutfloat8(n)); > ``` > > The count is used for avoid large output, where n is the number of > element in returned array, and the size is from 1.5k to15m. > > Size of Array 1.5k | 15k | > 150k | 1.5m | 15m | > > Origin 2.324ms | 19.834ms | 194.991ms > | 1927.28ms | 19982.1ms | > > With this patch 1.168ms | 3.052ms | > 21.888ms | 213.39ms | 2138.5ms | > > All test for both PG and PL/Python are passed. > > Thanks very much. > > Hello, sounds like a really nice patch. I've started looking through the code and noticed a sort of a typos (or I just couldn't understand what did you mean) in comments. file "src/pl/plpython/plpy_typeio.c" the comment is * If can not convert if directly, fallback to PLyObject_ToDatum * to convert it Maybe it should be something like ("it" instead of second "if") * If can not convert it directly, fallback to PLyObject_ToDatum * to convert it And the same typo is repeated several times in comments. -- Anthony Bykov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Thank you very much for your review!
I attached a new patch with typo fixed.
Regards,
Haozhou
On Mon, Feb 19, 2018 at 2:37 PM, Anthony Bykov <a.bykov@postgrespro.ru> wrote:
Hello,On Wed, 31 Jan 2018 11:57:12 +0800
Haozhou Wang <hawang@pivotal.io> wrote:
> Hi All,
>
> PL/Python already has different type conversion functions to
> convert PostgreSQL datum to Python object. However, the conversion
> functions from Python object to PostgreSQL datum only has Boolean,
> Bytea and String functions.
>
> In this patch, we add rest of Integer and Float related datatype
> conversion functions
> and can increase the performance of data conversion greatly especially
> when returning a large array.
>
> We did a quick test about the performance of returning array in
> PL/Python:
>
> The following UDF is used for test:
>
> ```
> CREATE OR REPLACE FUNCTION pyoutfloat8(num int) RETURNS float8[] AS $$
> return [x/3.0 for x in range(num)]
> $$ LANGUAGE plpythonu;
> ```
>
> The test command is
>
> ```
> select count(pyoutfloat8(n));
> ```
>
> The count is used for avoid large output, where n is the number of
> element in returned array, and the size is from 1.5k to15m.
>
> Size of Array 1.5k | 15k |
> 150k | 1.5m | 15m |
>
> Origin 2.324ms | 19.834ms | 194.991ms
> | 1927.28ms | 19982.1ms |
>
> With this patch 1.168ms | 3.052ms |
> 21.888ms | 213.39ms | 2138.5ms |
>
> All test for both PG and PL/Python are passed.
>
> Thanks very much.
>
>
sounds like a really nice patch. I've started looking
through the code and noticed a sort of a typos (or I just couldn't
understand what did you mean) in comments.
file "src/pl/plpython/plpy_typeio.c"
the comment is
* If can not convert if directly, fallback to PLyObject_ToDatum
* to convert it
Maybe it should be something like ("it" instead of second "if")
* If can not convert it directly, fallback to PLyObject_ToDatum
* to convert it
And the same typo is repeated several times in comments.
--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On 2/20/18 10:14 AM, Haozhou Wang wrote: > Thank you very much for your review! > > I attached a new patch with typo fixed. I think it's a bit premature to mark this Ready for Committer after a review consisting of a few typos. Anthony only said that he started looking at it so I've marked it Needs Review. Anthony, were you planning to have a deeper look? Regards, -- -David david@pgmasters.net
On 26.03.2018 17:19, David Steele wrote: > On 2/20/18 10:14 AM, Haozhou Wang wrote: >> Thank you very much for your review! >> >> I attached a new patch with typo fixed. > I think it's a bit premature to mark this Ready for Committer after a > review consisting of a few typos. Anthony only said that he started > looking at it so I've marked it Needs Review. Hi. I also have looked at this patch and found some problems. Attached fixed 3th version of the patch: * initialization of arg->u.scalar was moved into PLy_output_setup_func() * added range checks for int16 and int32 types * added subroutine PLyInt_AsLong() for correct handling OverflowError that can be thrown from PyInt_AsLong() * casting from Python float to PostgreSQL numeric using PyFloat_AsDouble() was removed because it can return incorrect result for Python long and float8_numeric() uses float8 and numeric I/O functions * fixed whitespace -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Thanks Nikita!
On Tue, Mar 27, 2018 at 12:07 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
On 26.03.2018 17:19, David Steele wrote:On 2/20/18 10:14 AM, Haozhou Wang wrote:Thank you very much for your review!I think it's a bit premature to mark this Ready for Committer after a
I attached a new patch with typo fixed.
review consisting of a few typos. Anthony only said that he started
looking at it so I've marked it Needs Review.
Hi.
I also have looked at this patch and found some problems.
Attached fixed 3th version of the patch:
* initialization of arg->u.scalar was moved into PLy_output_setup_func()
* added range checks for int16 and int32 types
* added subroutine PLyInt_AsLong() for correct handling OverflowError that can
be thrown from PyInt_AsLong()
* casting from Python float to PostgreSQL numeric using PyFloat_AsDouble() was
removed because it can return incorrect result for Python long and
float8_numeric() uses float8 and numeric I/O functions
* fixed whitespace
--
Nikita Glukhov
Regards,
Haozhou
On 3/26/18 12:07 PM, Nikita Glukhov wrote: > On 26.03.2018 17:19, David Steele wrote: > >> On 2/20/18 10:14 AM, Haozhou Wang wrote: >>> Thank you very much for your review! >>> >>> I attached a new patch with typo fixed. >> I think it's a bit premature to mark this Ready for Committer after a >> review consisting of a few typos. Anthony only said that he started >> looking at it so I've marked it Needs Review. > > Hi. > > I also have looked at this patch and found some problems. > Attached fixed 3th version of the patch: > * initialization of arg->u.scalar was moved into PLy_output_setup_func() > * added range checks for int16 and int32 types > * added subroutine PLyInt_AsLong() for correct handling OverflowError > that can > be thrown from PyInt_AsLong() > * casting from Python float to PostgreSQL numeric using > PyFloat_AsDouble() was > removed because it can return incorrect result for Python long and > float8_numeric() uses float8 and numeric I/O functions > * fixed whitespace There's a new patch on this thread but since it was not submitted by the author I've moved this entry to the next CF in Waiting for Author state. Regards, -- -David david@pgmasters.net
Hi David,
Thanks for your email.
Just wondering will I need to re-submit this patch?
Thanks a lot!
Regards,
Haozhou
On Tue, Apr 10, 2018 at 9:35 PM, David Steele <david@pgmasters.net> wrote:
On 3/26/18 12:07 PM, Nikita Glukhov wrote:
> On 26.03.2018 17:19, David Steele wrote:
>
>> On 2/20/18 10:14 AM, Haozhou Wang wrote:
>>> Thank you very much for your review!
>>>
>>> I attached a new patch with typo fixed.
>> I think it's a bit premature to mark this Ready for Committer after a
>> review consisting of a few typos. Anthony only said that he started
>> looking at it so I've marked it Needs Review.
>
> Hi.
>
> I also have looked at this patch and found some problems.
> Attached fixed 3th version of the patch:
> * initialization of arg->u.scalar was moved into PLy_output_setup_func()
> * added range checks for int16 and int32 types
> * added subroutine PLyInt_AsLong() for correct handling OverflowError
> that can
> be thrown from PyInt_AsLong()
> * casting from Python float to PostgreSQL numeric using
> PyFloat_AsDouble() was
> removed because it can return incorrect result for Python long and
> float8_numeric() uses float8 and numeric I/O functions
> * fixed whitespace
There's a new patch on this thread but since it was not submitted by the
author I've moved this entry to the next CF in Waiting for Author state.
Regards,
--
-David
david@pgmasters.net
On 4/11/18 2:36 AM, Haozhou Wang wrote: > > Thanks for your email. > Just wondering will I need to re-submit this patch? No need to resubmit, the CF entry has been moved here: https://commitfest.postgresql.org/18/1499/ You should have a look at Nikita's patch, though. Regards, -- -David david@pgmasters.net
Hi David,
This new patch is look good for me. So I change the status to need review.
Thanks!
Regards,
Haozhou
On Wed, Apr 11, 2018 at 7:52 PM, David Steele <david@pgmasters.net> wrote:
On 4/11/18 2:36 AM, Haozhou Wang wrote:
>
> Thanks for your email.
> Just wondering will I need to re-submit this patch?
No need to resubmit, the CF entry has been moved here:
https://commitfest.postgresql.org/18/1499/
You should have a look at Nikita's patch, though.
Regards,
--
-David
david@pgmasters.net
On 26/03/18 19:07, Nikita Glukhov wrote: > Attached fixed 3th version of the patch: Thanks, I'm reviewing this now. Nice speedup! There is no test coverage for some of the added code. You can get a code coverage report with: ./configure --enable-coverage ... make make -C src/pl/plpython check make coverage-html That produces a code coverage report in coverage/index.html. Please look at the coverage of the new functions, and add tests to src/pl/plpython/sql/plpython_types.sql so that all the new code is covered. In some places, where you've already checked the object type e.g. with PyFloat_Check(), I think you could use the less safe variants, like PyFloat_AS_DOUBLE() instead of PyFloat_AsDouble(). Since this patch is all about performance, seems worth doing. Some of the conversions seem a bit questionable. For example: > /* > * Convert a Python object to a PostgreSQL float8 datum directly. > * If can not convert it directly, fallback to PLyObject_ToScalar > * to convert it. > */ > static Datum > PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv, > bool *isnull, bool inarray) > { > if (plrv == Py_None) > { > *isnull = true; > return (Datum) 0; > } > > if (PyFloat_Check(plrv) || PyInt_Check(plrv) || PyLong_Check(plrv)) > { > *isnull = false; > return Float8GetDatum((double)PyFloat_AsDouble(plrv)); > } > > return PLyObject_ToScalar(arg, plrv, isnull, inarray); > } The conversion from Python int to C double is performed by PyFloat_AsDouble(). But there's no error checking. And wouldn't PyLong_AsDouble() be more appropriate in that case, since we already checked the python type? - Heikki
Hi Heikki,
I will prepare a new patch and make it ready soon.
Regards,
Haozhou
On Thu, Jul 12, 2018 at 2:03 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 26/03/18 19:07, Nikita Glukhov wrote:
> Attached fixed 3th version of the patch:
Thanks, I'm reviewing this now. Nice speedup!
There is no test coverage for some of the added code. You can get a code
coverage report with:
./configure --enable-coverage ...
make
make -C src/pl/plpython check
make coverage-html
That produces a code coverage report in coverage/index.html. Please look
at the coverage of the new functions, and add tests to
src/pl/plpython/sql/plpython_types.sql so that all the new code is covered.
In some places, where you've already checked the object type e.g. with
PyFloat_Check(), I think you could use the less safe variants, like
PyFloat_AS_DOUBLE() instead of PyFloat_AsDouble(). Since this patch is
all about performance, seems worth doing.
Some of the conversions seem a bit questionable. For example:
> /*
> * Convert a Python object to a PostgreSQL float8 datum directly.
> * If can not convert it directly, fallback to PLyObject_ToScalar
> * to convert it.
> */
> static Datum
> PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv,
> bool *isnull, bool inarray)
> {
> if (plrv == Py_None)
> {
> *isnull = true;
> return (Datum) 0;
> }
>
> if (PyFloat_Check(plrv) || PyInt_Check(plrv) || PyLong_Check(plrv))
> {
> *isnull = false;
> return Float8GetDatum((double)PyFloat_AsDouble(plrv));
> }
>
> return PLyObject_ToScalar(arg, plrv, isnull, inarray);
> }
The conversion from Python int to C double is performed by
PyFloat_AsDouble(). But there's no error checking. And wouldn't
PyLong_AsDouble() be more appropriate in that case, since we already
checked the python type?
- Heikki
Regards,
Haozhou
On 11.07.2018 21:03, Heikki Linnakangas wrote: > On 26/03/18 19:07, Nikita Glukhov wrote: >> Attached fixed 3th version of the patch: > > Thanks, I'm reviewing this now. Nice speedup! Thank you for your review. > > There is no test coverage for some of the added code. You can get a > code coverage report with: > > ./configure --enable-coverage ... > make > make -C src/pl/plpython check > make coverage-html > > That produces a code coverage report in coverage/index.html. Please > look at the coverage of the new functions, and add tests to > src/pl/plpython/sql/plpython_types.sql so that all the new code is > covered. > I have added some cross-type test cases and now almost all new code is covered (excluding several error cases which can be triggered only by custom numeric type implementations). > In some places, where you've already checked the object type e.g. with > PyFloat_Check(), I think you could use the less safe variants, like > PyFloat_AS_DOUBLE() instead of PyFloat_AsDouble(). Since this patch is > all about performance, seems worth doing. Fixed. > Some of the conversions seem a bit questionable. For example: > >> /* >> * Convert a Python object to a PostgreSQL float8 datum directly. >> * If can not convert it directly, fallback to PLyObject_ToScalar >> * to convert it. >> */ >> static Datum >> PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv, >> bool *isnull, bool inarray) >> { >> if (plrv == Py_None) >> { >> *isnull = true; >> return (Datum) 0; >> } >> >> if (PyFloat_Check(plrv) || PyInt_Check(plrv) || PyLong_Check(plrv)) >> { >> *isnull = false; >> return Float8GetDatum((double)PyFloat_AsDouble(plrv)); >> } >> >> return PLyObject_ToScalar(arg, plrv, isnull, inarray); >> } > > The conversion from Python int to C double is performed by > PyFloat_AsDouble(). But there's no error checking. And wouldn't > PyLong_AsDouble() be more appropriate in that case, since we already > checked the python type? > Yes, this might be wrong, but PyFloat_AsDouble() internally tries first to convert number to float. Also, after gaining more experience in PL/Python during the implementation of jsonb transforms, I found a lot of similar problems in the code. All of them are fixed in the 4th version of the patch. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 12/07/18 18:06, Nikita Glukhov wrote: > I have added some cross-type test cases and now almost all new code is covered > (excluding several error cases which can be triggered only by custom numeric > type implementations). Thanks! Some of those new tests actually fail, if you run them against unpatched master. For example: SELECT * FROM test_type_conversion_float8_int2(100::float8); INFO: (100.0, <type 'float'>) - test_type_conversion_float8_int2 ----------------------------------- - 100 -(1 row) - +ERROR: invalid input syntax for integer: "100.0" +CONTEXT: while creating return value +PL/Python function "test_type_conversion_float8_int2" So this patch is making some subtle changes to behavior. I don't think we want that. - Heikki
+1, I also think that we may not change the previous behavior of plpython.
@Nikita Glukhov maybe we just check the whether pyobject is int or long only in related conversion functions, and fallback otherwise?
On Fri, Jul 13, 2018 at 12:09 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 12/07/18 18:06, Nikita Glukhov wrote:
> I have added some cross-type test cases and now almost all new code is covered
> (excluding several error cases which can be triggered only by custom numeric
> type implementations).
Thanks! Some of those new tests actually fail, if you run them against
unpatched master. For example:
SELECT * FROM test_type_conversion_float8_int2(100::float8);
INFO: (100.0, <type 'float'>)
- test_type_conversion_float8_int2
-----------------------------------
- 100
-(1 row)
-
+ERROR: invalid input syntax for integer: "100.0"
+CONTEXT: while creating return value
+PL/Python function "test_type_conversion_float8_int2"
So this patch is making some subtle changes to behavior. I don't think
we want that.
- Heikki
Regards,
Haozhou
On Mon, Jul 16, 2018 at 03:35:57PM +0800, Haozhou Wang wrote: > +1, I also think that we may not change the previous behavior of plpython. > @Nikita Glukhov <n.gluhov@postgrespro.ru> maybe we just check the > whether pyobject is int or long only in related conversion functions, and > fallback otherwise? This patch was around for some time, and the changes in behavior are not really nice, so this is marked as returned with feedback. -- Michael