Thread: [PATCH] Add missing type conversion functions for PL/Python

[PATCH] Add missing type conversion functions for PL/Python

From
Haozhou Wang
Date:
Hi All,

PL/Python already has different type conversion functions to
convert PostgreSQL datum to Python object. However, the conversion
functions from Python object to PostgreSQL datum only has Boolean,
Bytea and String functions. 

In this patch, we add rest of Integer and Float related datatype conversion functions
and can increase the performance of data conversion greatly especially
when returning a large array.

We did a quick test about the performance of returning array in PL/Python:

The following UDF is used for test:

```
CREATE OR REPLACE FUNCTION pyoutfloat8(num int) RETURNS float8[] AS $$ return [x/3.0 for x in range(num)] 
$$ LANGUAGE plpythonu;
```

The test command is 

```
select count(pyoutfloat8(n));
```

The count is used for avoid large output, where n is the number of element in returned array, and the size is from 1.5k to15m. 

Size of Array      1.5k           |            15k         |           150k        |           1.5m        |         15m       |

Origin                 2.324ms     |     19.834ms      |     194.991ms    |     1927.28ms    |   19982.1ms  |  

With this patch   1.168ms      |      3.052ms       |      21.888ms     |     213.39ms      |    2138.5ms   |

All test for both PG and PL/Python are passed.

Thanks very much.


--
Regards,
Haozhou
Attachment

Re: [PATCH] Add missing type conversion functions for PL/Python

From
Anthony Bykov
Date:
On Wed, 31 Jan 2018 11:57:12 +0800
Haozhou Wang <hawang@pivotal.io> wrote:

> Hi All,
> 
> PL/Python already has different type conversion functions to
> convert PostgreSQL datum to Python object. However, the conversion
> functions from Python object to PostgreSQL datum only has Boolean,
> Bytea and String functions.
> 
> In this patch, we add rest of Integer and Float related datatype
> conversion functions
> and can increase the performance of data conversion greatly especially
> when returning a large array.
> 
> We did a quick test about the performance of returning array in
> PL/Python:
> 
> The following UDF is used for test:
> 
> ```
> CREATE OR REPLACE FUNCTION pyoutfloat8(num int) RETURNS float8[] AS $$
> return [x/3.0 for x in range(num)]
> $$ LANGUAGE plpythonu;
> ```
> 
> The test command is
> 
> ```
> select count(pyoutfloat8(n));
> ```
> 
> The count is used for avoid large output, where n is the number of
> element in returned array, and the size is from 1.5k to15m.
> 
> Size of Array      1.5k           |            15k         |
>  150k        |           1.5m        |         15m       |
> 
> Origin                 2.324ms     |     19.834ms      |     194.991ms
> |     1927.28ms    |   19982.1ms  |
> 
> With this patch   1.168ms      |      3.052ms       |
> 21.888ms     | 213.39ms      |    2138.5ms   |
> 
> All test for both PG and PL/Python are passed.
> 
> Thanks very much.
> 
> 

Hello,
sounds like a really nice patch. I've started looking
through the code and noticed a sort of a typos (or I just couldn't
understand what did you mean) in comments.

file "src/pl/plpython/plpy_typeio.c"
the comment is 
* If can not convert if directly, fallback to PLyObject_ToDatum
* to convert it

Maybe it should be something like ("it" instead of second "if")
* If can not convert it directly, fallback to PLyObject_ToDatum
* to convert it

And the same typo is repeated several times in comments.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [PATCH] Add missing type conversion functions for PL/Python

From
Haozhou Wang
Date:
Thank you very much for your review!

I attached a new patch with typo fixed.

Regards,
Haozhou

On Mon, Feb 19, 2018 at 2:37 PM, Anthony Bykov <a.bykov@postgrespro.ru> wrote:
On Wed, 31 Jan 2018 11:57:12 +0800
Haozhou Wang <hawang@pivotal.io> wrote:

> Hi All,
>
> PL/Python already has different type conversion functions to
> convert PostgreSQL datum to Python object. However, the conversion
> functions from Python object to PostgreSQL datum only has Boolean,
> Bytea and String functions.
>
> In this patch, we add rest of Integer and Float related datatype
> conversion functions
> and can increase the performance of data conversion greatly especially
> when returning a large array.
>
> We did a quick test about the performance of returning array in
> PL/Python:
>
> The following UDF is used for test:
>
> ```
> CREATE OR REPLACE FUNCTION pyoutfloat8(num int) RETURNS float8[] AS $$
> return [x/3.0 for x in range(num)]
> $$ LANGUAGE plpythonu;
> ```
>
> The test command is
>
> ```
> select count(pyoutfloat8(n));
> ```
>
> The count is used for avoid large output, where n is the number of
> element in returned array, and the size is from 1.5k to15m.
>
> Size of Array      1.5k           |            15k         |
>  150k        |           1.5m        |         15m       |
>
> Origin                 2.324ms     |     19.834ms      |     194.991ms
> |     1927.28ms    |   19982.1ms  |
>
> With this patch   1.168ms      |      3.052ms       |
> 21.888ms     | 213.39ms      |    2138.5ms   |
>
> All test for both PG and PL/Python are passed.
>
> Thanks very much.
>
>

Hello,
sounds like a really nice patch. I've started looking
through the code and noticed a sort of a typos (or I just couldn't
understand what did you mean) in comments.

file "src/pl/plpython/plpy_typeio.c"
the comment is
* If can not convert if directly, fallback to PLyObject_ToDatum
* to convert it

Maybe it should be something like ("it" instead of second "if")
* If can not convert it directly, fallback to PLyObject_ToDatum
* to convert it

And the same typo is repeated several times in comments.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Re: [PATCH] Add missing type conversion functions for PL/Python

From
David Steele
Date:
On 2/20/18 10:14 AM, Haozhou Wang wrote:
> Thank you very much for your review!
> 
> I attached a new patch with typo fixed.

I think it's a bit premature to mark this Ready for Committer after a
review consisting of a few typos.  Anthony only said that he started
looking at it so I've marked it Needs Review.

Anthony, were you planning to have a deeper look?

Regards,
-- 
-David
david@pgmasters.net


Re: [PATCH] Add missing type conversion functions for PL/Python

From
Nikita Glukhov
Date:
On 26.03.2018 17:19, David Steele wrote:

> On 2/20/18 10:14 AM, Haozhou Wang wrote:
>> Thank you very much for your review!
>>
>> I attached a new patch with typo fixed.
> I think it's a bit premature to mark this Ready for Committer after a
> review consisting of a few typos.  Anthony only said that he started
> looking at it so I've marked it Needs Review.

Hi.

I also have looked at this patch and found some problems.
Attached fixed 3th version of the patch:
  * initialization of arg->u.scalar was moved into PLy_output_setup_func()
  * added range checks for int16 and int32 types
  * added subroutine PLyInt_AsLong() for correct handling OverflowError that can
    be thrown from PyInt_AsLong()
  * casting from Python float to PostgreSQL numeric using PyFloat_AsDouble() was
    removed because it can return incorrect result for Python long and
    float8_numeric() uses float8 and numeric I/O functions
  * fixed whitespace


-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [PATCH] Add missing type conversion functions for PL/Python

From
Haozhou Wang
Date:
Thanks Nikita!

On Tue, Mar 27, 2018 at 12:07 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
On 26.03.2018 17:19, David Steele wrote:

On 2/20/18 10:14 AM, Haozhou Wang wrote:
Thank you very much for your review!

I attached a new patch with typo fixed.
I think it's a bit premature to mark this Ready for Committer after a
review consisting of a few typos.  Anthony only said that he started
looking at it so I've marked it Needs Review.

Hi.

I also have looked at this patch and found some problems.
Attached fixed 3th version of the patch:
 * initialization of arg->u.scalar was moved into PLy_output_setup_func()
 * added range checks for int16 and int32 types
 * added subroutine PLyInt_AsLong() for correct handling OverflowError that can
   be thrown from PyInt_AsLong()
 * casting from Python float to PostgreSQL numeric using PyFloat_AsDouble() was
   removed because it can return incorrect result for Python long and
   float8_numeric() uses float8 and numeric I/O functions
 * fixed whitespace


--
Nikita Glukhov

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




--
Regards,
Haozhou

Re: [PATCH] Add missing type conversion functions for PL/Python

From
David Steele
Date:
On 3/26/18 12:07 PM, Nikita Glukhov wrote:
> On 26.03.2018 17:19, David Steele wrote:
> 
>> On 2/20/18 10:14 AM, Haozhou Wang wrote:
>>> Thank you very much for your review!
>>>
>>> I attached a new patch with typo fixed.
>> I think it's a bit premature to mark this Ready for Committer after a
>> review consisting of a few typos.  Anthony only said that he started
>> looking at it so I've marked it Needs Review.
> 
> Hi.
> 
> I also have looked at this patch and found some problems.
> Attached fixed 3th version of the patch:
>  * initialization of arg->u.scalar was moved into PLy_output_setup_func()
>  * added range checks for int16 and int32 types
>  * added subroutine PLyInt_AsLong() for correct handling OverflowError
> that can
>    be thrown from PyInt_AsLong()
>  * casting from Python float to PostgreSQL numeric using
> PyFloat_AsDouble() was
>    removed because it can return incorrect result for Python long and
>    float8_numeric() uses float8 and numeric I/O functions
>  * fixed whitespace

There's a new patch on this thread but since it was not submitted by the
author I've moved this entry to the next CF in Waiting for Author state.

Regards,
-- 
-David
david@pgmasters.net


Re: [PATCH] Add missing type conversion functions for PL/Python

From
Haozhou Wang
Date:
Hi David,

Thanks for your email.
Just wondering will I need to re-submit this patch?

Thanks a lot!

Regards,
Haozhou

On Tue, Apr 10, 2018 at 9:35 PM, David Steele <david@pgmasters.net> wrote:
On 3/26/18 12:07 PM, Nikita Glukhov wrote:
> On 26.03.2018 17:19, David Steele wrote:
>
>> On 2/20/18 10:14 AM, Haozhou Wang wrote:
>>> Thank you very much for your review!
>>>
>>> I attached a new patch with typo fixed.
>> I think it's a bit premature to mark this Ready for Committer after a
>> review consisting of a few typos.  Anthony only said that he started
>> looking at it so I've marked it Needs Review.
>
> Hi.
>
> I also have looked at this patch and found some problems.
> Attached fixed 3th version of the patch:
>  * initialization of arg->u.scalar was moved into PLy_output_setup_func()
>  * added range checks for int16 and int32 types
>  * added subroutine PLyInt_AsLong() for correct handling OverflowError
> that can
>    be thrown from PyInt_AsLong()
>  * casting from Python float to PostgreSQL numeric using
> PyFloat_AsDouble() was
>    removed because it can return incorrect result for Python long and
>    float8_numeric() uses float8 and numeric I/O functions
>  * fixed whitespace

There's a new patch on this thread but since it was not submitted by the
author I've moved this entry to the next CF in Waiting for Author state.

Regards,
--
-David
david@pgmasters.net

Re: [PATCH] Add missing type conversion functions for PL/Python

From
David Steele
Date:
On 4/11/18 2:36 AM, Haozhou Wang wrote:
> 
> Thanks for your email.
> Just wondering will I need to re-submit this patch?

No need to resubmit, the CF entry has been moved here:

https://commitfest.postgresql.org/18/1499/

You should have a look at Nikita's patch, though.

Regards,
-- 
-David
david@pgmasters.net


Re: [PATCH] Add missing type conversion functions for PL/Python

From
Haozhou Wang
Date:
Hi David, 

This new patch is look good for me. So I change the status to need review.
Thanks!

Regards,
Haozhou

On Wed, Apr 11, 2018 at 7:52 PM, David Steele <david@pgmasters.net> wrote:
On 4/11/18 2:36 AM, Haozhou Wang wrote:
>
> Thanks for your email.
> Just wondering will I need to re-submit this patch?

No need to resubmit, the CF entry has been moved here:

https://commitfest.postgresql.org/18/1499/

You should have a look at Nikita's patch, though.

Regards,
--
-David
david@pgmasters.net

Re: [PATCH] Add missing type conversion functions for PL/Python

From
Heikki Linnakangas
Date:
On 26/03/18 19:07, Nikita Glukhov wrote:
> Attached fixed 3th version of the patch:

Thanks, I'm reviewing this now. Nice speedup!

There is no test coverage for some of the added code. You can get a code 
coverage report with:

./configure --enable-coverage ...
make
make -C src/pl/plpython check
make coverage-html

That produces a code coverage report in coverage/index.html. Please look 
at the coverage of the new functions, and add tests to 
src/pl/plpython/sql/plpython_types.sql so that all the new code is covered.

In some places, where you've already checked the object type e.g. with 
PyFloat_Check(), I think you could use the less safe variants, like 
PyFloat_AS_DOUBLE() instead of PyFloat_AsDouble(). Since this patch is 
all about performance, seems worth doing.

Some of the conversions seem a bit questionable. For example:

> /*
>  * Convert a Python object to a PostgreSQL float8 datum directly.
>  * If can not convert it directly, fallback to PLyObject_ToScalar
>  * to convert it.
>  */
> static Datum
> PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv,
>                    bool *isnull, bool inarray)
> {
>     if (plrv == Py_None)
>     {
>         *isnull = true;
>         return (Datum) 0;
>     }
> 
>     if (PyFloat_Check(plrv) || PyInt_Check(plrv) || PyLong_Check(plrv))
>     {
>         *isnull = false;
>         return Float8GetDatum((double)PyFloat_AsDouble(plrv));
>     }
> 
>     return PLyObject_ToScalar(arg, plrv, isnull, inarray);
> }

The conversion from Python int to C double is performed by 
PyFloat_AsDouble(). But there's no error checking. And wouldn't 
PyLong_AsDouble() be more appropriate in that case, since we already 
checked the python type?

- Heikki


Re: [PATCH] Add missing type conversion functions for PL/Python

From
Haozhou Wang
Date:
Hi Heikki,

Thank you very much for your review! 
I will prepare a new patch and make it ready soon.

Regards,
Haozhou

On Thu, Jul 12, 2018 at 2:03 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 26/03/18 19:07, Nikita Glukhov wrote:
> Attached fixed 3th version of the patch:

Thanks, I'm reviewing this now. Nice speedup!

There is no test coverage for some of the added code. You can get a code
coverage report with:

./configure --enable-coverage ...
make
make -C src/pl/plpython check
make coverage-html

That produces a code coverage report in coverage/index.html. Please look
at the coverage of the new functions, and add tests to
src/pl/plpython/sql/plpython_types.sql so that all the new code is covered.

In some places, where you've already checked the object type e.g. with
PyFloat_Check(), I think you could use the less safe variants, like
PyFloat_AS_DOUBLE() instead of PyFloat_AsDouble(). Since this patch is
all about performance, seems worth doing.

Some of the conversions seem a bit questionable. For example:

> /*
>  * Convert a Python object to a PostgreSQL float8 datum directly.
>  * If can not convert it directly, fallback to PLyObject_ToScalar
>  * to convert it.
>  */
> static Datum
> PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv,
>                                  bool *isnull, bool inarray)
> {
>       if (plrv == Py_None)
>       {
>               *isnull = true;
>               return (Datum) 0;
>       }
>
>       if (PyFloat_Check(plrv) || PyInt_Check(plrv) || PyLong_Check(plrv))
>       {
>               *isnull = false;
>               return Float8GetDatum((double)PyFloat_AsDouble(plrv));
>       }
>
>       return PLyObject_ToScalar(arg, plrv, isnull, inarray);
> }

The conversion from Python int to C double is performed by
PyFloat_AsDouble(). But there's no error checking. And wouldn't
PyLong_AsDouble() be more appropriate in that case, since we already
checked the python type?

- Heikki


--
Regards,
Haozhou

Re: [PATCH] Add missing type conversion functions for PL/Python

From
Nikita Glukhov
Date:
On 11.07.2018 21:03, Heikki Linnakangas wrote:

> On 26/03/18 19:07, Nikita Glukhov wrote:
>> Attached fixed 3th version of the patch:
>
> Thanks, I'm reviewing this now. Nice speedup!

Thank you for your review.

>
> There is no test coverage for some of the added code. You can get a 
> code coverage report with:
>
> ./configure --enable-coverage ...
> make
> make -C src/pl/plpython check
> make coverage-html
>
> That produces a code coverage report in coverage/index.html. Please 
> look at the coverage of the new functions, and add tests to 
> src/pl/plpython/sql/plpython_types.sql so that all the new code is 
> covered.
>
I have added some cross-type test cases and now almost all new code is covered
(excluding several error cases which can be triggered only by custom numeric
type implementations).


> In some places, where you've already checked the object type e.g. with 
> PyFloat_Check(), I think you could use the less safe variants, like 
> PyFloat_AS_DOUBLE() instead of PyFloat_AsDouble(). Since this patch is 
> all about performance, seems worth doing.

Fixed.


> Some of the conversions seem a bit questionable. For example:
>
>> /*
>>  * Convert a Python object to a PostgreSQL float8 datum directly.
>>  * If can not convert it directly, fallback to PLyObject_ToScalar
>>  * to convert it.
>>  */
>> static Datum
>> PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv,
>>                    bool *isnull, bool inarray)
>> {
>>     if (plrv == Py_None)
>>     {
>>         *isnull = true;
>>         return (Datum) 0;
>>     }
>>
>>     if (PyFloat_Check(plrv) || PyInt_Check(plrv) || PyLong_Check(plrv))
>>     {
>>         *isnull = false;
>>         return Float8GetDatum((double)PyFloat_AsDouble(plrv));
>>     }
>>
>>     return PLyObject_ToScalar(arg, plrv, isnull, inarray);
>> }
>
> The conversion from Python int to C double is performed by 
> PyFloat_AsDouble(). But there's no error checking. And wouldn't 
> PyLong_AsDouble() be more appropriate in that case, since we already 
> checked the python type?
>

Yes, this might be wrong, but PyFloat_AsDouble() internally tries first to
convert number to float.  Also, after gaining more experience in PL/Python
during the implementation of jsonb transforms, I found a lot of similar
problems in the code.  All of them are fixed in the 4th version of the patch.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Attachment

Re: [PATCH] Add missing type conversion functions for PL/Python

From
Heikki Linnakangas
Date:
On 12/07/18 18:06, Nikita Glukhov wrote:
> I have added some cross-type test cases and now almost all new code is covered
> (excluding several error cases which can be triggered only by custom numeric
> type implementations).

Thanks! Some of those new tests actually fail, if you run them against 
unpatched master. For example:

  SELECT * FROM test_type_conversion_float8_int2(100::float8);
  INFO:  (100.0, <type 'float'>)
- test_type_conversion_float8_int2
-----------------------------------
-                              100
-(1 row)
-
+ERROR:  invalid input syntax for integer: "100.0"
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_float8_int2"

So this patch is making some subtle changes to behavior. I don't think 
we want that.

- Heikki


Re: [PATCH] Add missing type conversion functions for PL/Python

From
Haozhou Wang
Date:
+1, I also think that we may not change the previous behavior of plpython.
@Nikita Glukhov maybe we just check the whether pyobject is int or long only in related conversion functions, and fallback otherwise?

On Fri, Jul 13, 2018 at 12:09 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 12/07/18 18:06, Nikita Glukhov wrote:
> I have added some cross-type test cases and now almost all new code is covered
> (excluding several error cases which can be triggered only by custom numeric
> type implementations).

Thanks! Some of those new tests actually fail, if you run them against
unpatched master. For example:

  SELECT * FROM test_type_conversion_float8_int2(100::float8);
  INFO:  (100.0, <type 'float'>)
- test_type_conversion_float8_int2
-----------------------------------
-                              100
-(1 row)
-
+ERROR:  invalid input syntax for integer: "100.0"
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_float8_int2"

So this patch is making some subtle changes to behavior. I don't think
we want that.

- Heikki


--
Regards,
Haozhou

Re: [PATCH] Add missing type conversion functions for PL/Python

From
Michael Paquier
Date:
On Mon, Jul 16, 2018 at 03:35:57PM +0800, Haozhou Wang wrote:
> +1, I also think that we may not change the previous behavior of plpython.
> @Nikita Glukhov <n.gluhov@postgrespro.ru> maybe we just check the
> whether pyobject is int or long only in related conversion functions, and
> fallback otherwise?

This patch was around for some time, and the changes in behavior are not
really nice, so this is marked as returned with feedback.
--
Michael

Attachment