Home > mailing lists

Thread: [PATCH] Fix conversion for Decimal arguments in plpython functions

[PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

28 May 2013, 20:41:31

Hi,

I've got a patch.

This is for a plpython enhancement.

There is an item at the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages

"Fix loss of information during conversion of numeric type to Python float"

This patch uses a decimal.Decimal type from Python standard library for the plpthon function numeric argument instead of float.

Patch contains changes in code, documentation and tests.

Most probably there is something wrong, as this is my first Postgres patch :)

thanks,

Szymon

Attachment

plpython_decimal.patch

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Steve Singer

Date:

25 June 2013, 03:15:29

On 05/28/2013 04:41 PM, Szymon Guz wrote:
> Hi,
> I've got a patch.
>
> This is for a plpython enhancement.
>
> There is an item at the TODO list
> http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
> "Fix loss of information during conversion of numeric type to Python
> float"
>
> This patch uses a decimal.Decimal type from Python standard library
> for the plpthon function numeric argument instead of float.
>
> Patch contains changes in code, documentation and tests.
>
> Most probably there is something wrong, as this is my first Postgres
> patch :)
>

Thanks for contributing.

This patch applies cleanly against master and compiles with warnings

plpy_main.c: In function ‘PLy_init_interp’:
plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]

You can avoid this by moving the declaration of decimal and decimal_dict
to be at the top of the function where mainmod is declared.

Also in this function you've introduced places where it returns with an
error (the PLy_elog(ERROR...) calls before decrementing the reference to
mainmod. I think you can decrement the mainmod reference after the call
to SetItemString  before your changes that import the Decimal module.

The patch works as expected, I am able to write python functions that
take numerics as arguments and work with them.  I can adjust the decimal
context precision inside of  my function.

One concern I have is that this patch makes pl/python functions
involving numerics more than 3 times as slow as before.

create temp table b(a numeric);
insert into b select generate_series(1,10000);

create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:  return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);

test=# select sm(a) from b;    sm
---------- 50005000
(1 row)

Time: 565.650 ms

versus before the patch this was taking in the range of 80ms.

Would it be faster to call numeric_send instead of numeric_out and then
convert the sequence of Int16's to a tuple of digits that can be passed
into the Decimal constructor? I think this is worth trying and testing,

Documentation
=================
Your patched version of the docs say
  PostgreSQL <type>real</type>, <type>double</type>, and
<type>numeric</type> are converted to   Python <type>Decimal</type>. This type is imported
from<literal>decimal.Decimal</literal>.

I don't think this is correct, as far as I can tell your not changing
the behaviour for postgresql real and double types, they continue to use
floating point.

<listitem>
<para>       PostgreSQL <type>real</type> and <type>double</type>are converted to       Python <type>float</type>.
</para>
</listitem>

<listitem>
<para>       PostgreSQL <type>numeric</type> is converted to       Python <type>Decimal</type>. This type is imported
from
<literal>decimal.Decimal</literal>.
</para>
</listitem>

Maybe?

Steve

> thanks,
> Szymon
>
>
>

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

25 June 2013, 09:42:30

On 25 June 2013 05:16, Steve Singer <steve@ssinger.info> wrote:

One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before.

create temp table b(a numeric);
insert into b select generate_series(1,10000);

create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:
return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);

test=# select sm(a) from b;
sm
----------
50005000
(1 row)

Time: 565.650 ms

versus before the patch this was taking in the range of 80ms.

Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of Int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing,

Hi,

thanks for all the remarks.

I think I cannot do anything about speeding up the code. What I've found so far is:

I cannot use simple fields from NumericVar in my code, so to not waste time on something not sensible, I've tried to found out if using the tuple constructor for decimal.Decimal will be faster. I've changed the function to something like this:

static PyObject *

PLyDecimal_FromNumeric(PLyDatumToOb *arg, Datum d)

{

PyObject *digits = PyTuple_New(4);

PyTuple_SetItem(digits, 0, PyInt_FromLong(1));

PyTuple_SetItem(digits, 1, PyInt_FromLong(4));

PyTuple_SetItem(digits, 2, PyInt_FromLong(1));

PyTuple_SetItem(digits, 3, PyInt_FromLong(4));

PyObject *tuple = PyTuple_New(3);

PyTuple_SetItem(tuple, 0, PyInt_FromLong(1));

PyTuple_SetItem(tuple, 1, digits);

PyTuple_SetItem(tuple, 2, PyInt_FromLong(-3));

value = PyObject_CallFunctionObjArgs(PLy_decimal_ctor_global, tuple, NULL);

return value;

}

Yes, it returns the same value regardless the params. The idea is to call Python code like:

Decimal((0, (1, 4, 1, 4), -3))

which is simply:

Decimal('1.414')

Unfortunately this is not faster. It is as slow as it was with string constructor.

I've checked the speed of decimal.Decimal using pure python. For this I used a simple function, similar to yours:

def x(a, b):

if a is None:

return b

return a + b

I've run the tests using simple ints:

def test():

a = 0

for i in xrange(0, 10000):

a += x(a, i)

for a in xrange(1, 100):

test()

And later I've run the same function, but with converting the arguments to Decimals:

from decimal import Decimal

def x(a, b):

if a is None:

return b

return a + b

def test():

a = 0

for i in xrange(0, 10000):

a += x(Decimal(a), Decimal(i))

for a in xrange(1, 100):

test()

It was run 100 times for decreasing the impact of test initialization.

The results for both files are:

int: 0.697s

decimal: 38.859s

What gives average time for one function call of:

int: 69ms

decimal: 380ms

For me the problem is with slow code at Python's side, the Decimal constructors are pretty slow, and there is nothing I can do with that at the Postgres' side.

I will send patch with fixes later.

thanks,

Szymon

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

25 June 2013, 10:42:59

On 25 June 2013 05:16, Steve Singer <steve@ssinger.info> wrote:

On 05/28/2013 04:41 PM, Szymon Guz wrote:
Hi,
I've got a patch.

This is for a plpython enhancement.

There is an item at the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
"Fix loss of information during conversion of numeric type to Python float"

This patch uses a decimal.Decimal type from Python standard library for the plpthon function numeric argument instead of float.

Patch contains changes in code, documentation and tests.

Most probably there is something wrong, as this is my first Postgres patch :)

Thanks for contributing.

This patch applies cleanly against master and compiles with warnings

plpy_main.c: In function ‘PLy_init_interp’:
plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

You can avoid this by moving the declaration of decimal and decimal_dict to be at the top of the function where mainmod is declared.

Also in this function you've introduced places where it returns with an error (the PLy_elog(ERROR...) calls before decrementing the reference to mainmod. I think you can decrement the mainmod reference after the call to SetItemString before your changes that import the Decimal module.

The patch works as expected, I am able to write python functions that take numerics as arguments and work with them. I can adjust the decimal context precision inside of my function.

One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before.

create temp table b(a numeric);
insert into b select generate_series(1,10000);

create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:
return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);

test=# select sm(a) from b;
sm
----------
50005000
(1 row)

Time: 565.650 ms

versus before the patch this was taking in the range of 80ms.

Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of Int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing,

Documentation
=================
Your patched version of the docs say

PostgreSQL <type>real</type>, <type>double</type>, and <type>numeric</type> are converted to
Python <type>Decimal</type>. This type is imported from<literal>decimal.Decimal</literal>.

I don't think this is correct, as far as I can tell your not changing the behaviour for postgresql real and double types, they continue to use floating point.

<listitem>
<para>
PostgreSQL <type>real</type> and <type>double</type>are converted to
Python <type>float</type>.
</para>
</listitem>

<listitem>
<para>
PostgreSQL <type>numeric</type> is converted to
Python <type>Decimal</type>. This type is imported from
<literal>decimal.Decimal</literal>.
</para>
</listitem>

Hi,

I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.

thanks,

Szymon

Attachment

plpython_decimal_v2.patch

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Ronan Dunklau

Date:

25 June 2013, 12:23:51

Concerning the efficiency problem, it should be noted that the latest 3.3 release of cpython introduces an "accelerator" for decimal data types, as a C-module. This module was previously available from the Python package index at: https://pypi.python.org/pypi/cdecimal/2.2

It may be overkill to try to include such a dependency, but the performance overhead from using decimal is really mitigated by this implementation.

2013/6/25 Szymon Guz <mabewlun@gmail.com>

On 25 June 2013 05:16, Steve Singer <steve@ssinger.info> wrote:
On 05/28/2013 04:41 PM, Szymon Guz wrote:
Hi,
I've got a patch.

This is for a plpython enhancement.

There is an item at the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
"Fix loss of information during conversion of numeric type to Python float"

This patch uses a decimal.Decimal type from Python standard library for the plpthon function numeric argument instead of float.

Patch contains changes in code, documentation and tests.

Most probably there is something wrong, as this is my first Postgres patch :)

Thanks for contributing.

This patch applies cleanly against master and compiles with warnings

plpy_main.c: In function ‘PLy_init_interp’:
plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

You can avoid this by moving the declaration of decimal and decimal_dict to be at the top of the function where mainmod is declared.

Also in this function you've introduced places where it returns with an error (the PLy_elog(ERROR...) calls before decrementing the reference to mainmod. I think you can decrement the mainmod reference after the call to SetItemString before your changes that import the Decimal module.

The patch works as expected, I am able to write python functions that take numerics as arguments and work with them. I can adjust the decimal context precision inside of my function.

One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before.

create temp table b(a numeric);
insert into b select generate_series(1,10000);

create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:
return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);

test=# select sm(a) from b;
sm
----------
50005000
(1 row)

Time: 565.650 ms

versus before the patch this was taking in the range of 80ms.

Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of Int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing,

Documentation
=================
Your patched version of the docs say

PostgreSQL <type>real</type>, <type>double</type>, and <type>numeric</type> are converted to
Python <type>Decimal</type>. This type is imported from<literal>decimal.Decimal</literal>.

I don't think this is correct, as far as I can tell your not changing the behaviour for postgresql real and double types, they continue to use floating point.

<listitem>
<para>
PostgreSQL <type>real</type> and <type>double</type>are converted to
Python <type>float</type>.
</para>
</listitem>

<listitem>
<para>
PostgreSQL <type>numeric</type> is converted to
Python <type>Decimal</type>. This type is imported from
<literal>decimal.Decimal</literal>.
</para>
</listitem>

Hi,
I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.

thanks,
Szymon

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

25 June 2013, 12:27:56

Well, I really don't like the idea of such a dependency.

However it could be added as configuration option, so you could compile postgres with e.g. --with-cdecimal, and then it would be user dependent.

Maybe it is a good idea for another patch.

On 25 June 2013 14:23, Ronan Dunklau <rdunklau@gmail.com> wrote:

Concerning the efficiency problem, it should be noted that the latest 3.3 release of cpython introduces an "accelerator" for decimal data types, as a C-module. This module was previously available from the Python package index at: https://pypi.python.org/pypi/cdecimal/2.2

It may be overkill to try to include such a dependency, but the performance overhead from using decimal is really mitigated by this implementation.

2013/6/25 Szymon Guz <mabewlun@gmail.com>
On 25 June 2013 05:16, Steve Singer <steve@ssinger.info> wrote:
On 05/28/2013 04:41 PM, Szymon Guz wrote:
Hi,
I've got a patch.

This is for a plpython enhancement.

There is an item at the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
"Fix loss of information during conversion of numeric type to Python float"

This patch uses a decimal.Decimal type from Python standard library for the plpthon function numeric argument instead of float.

Patch contains changes in code, documentation and tests.

Most probably there is something wrong, as this is my first Postgres patch :)

Thanks for contributing.

This patch applies cleanly against master and compiles with warnings

plpy_main.c: In function ‘PLy_init_interp’:
plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

You can avoid this by moving the declaration of decimal and decimal_dict to be at the top of the function where mainmod is declared.

Also in this function you've introduced places where it returns with an error (the PLy_elog(ERROR...) calls before decrementing the reference to mainmod. I think you can decrement the mainmod reference after the call to SetItemString before your changes that import the Decimal module.

The patch works as expected, I am able to write python functions that take numerics as arguments and work with them. I can adjust the decimal context precision inside of my function.

One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before.

create temp table b(a numeric);
insert into b select generate_series(1,10000);

create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:
return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);

test=# select sm(a) from b;
sm
----------
50005000
(1 row)

Time: 565.650 ms

versus before the patch this was taking in the range of 80ms.

Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of Int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing,

Documentation
=================
Your patched version of the docs say

PostgreSQL <type>real</type>, <type>double</type>, and <type>numeric</type> are converted to
Python <type>Decimal</type>. This type is imported from<literal>decimal.Decimal</literal>.

I don't think this is correct, as far as I can tell your not changing the behaviour for postgresql real and double types, they continue to use floating point.

<listitem>
<para>
PostgreSQL <type>real</type> and <type>double</type>are converted to
Python <type>float</type>.
</para>
</listitem>

<listitem>
<para>
PostgreSQL <type>numeric</type> is converted to
Python <type>Decimal</type>. This type is imported from
<literal>decimal.Decimal</literal>.
</para>
</listitem>

Hi,
I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.

thanks,
Szymon

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Steve Singer

Date:

25 June 2013, 23:39:00

On 06/25/2013 06:42 AM, Szymon Guz wrote:

>
> Hi,
> I've attached a new patch. I've fixed all the problems you've found, 
> except for the efficiency problem, which has been described in 
> previous email.
>
> thanks,
> Szymon
>

This version of the patch addresses the issues I mentioned.  Thanks for 
looking into seeing if the performance issue is with our conversions to 
strings or inherit with the python decimal type.  I guess we 
(Postgresql) can't do much about it.   A runtime switch to use cdecimal 
if it is available is a good idea, but I agree with you that could be a 
different patch.

One minor thing I noticed in this round,
 PLy_elog(ERROR, "could not import module 'decimal'");

I think should have "decimal" in double-quotes.

I think this patch is ready for a committer to look at it.

Steve

>

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

26 June 2013, 07:17:57

On 26 June 2013 01:40, Steve Singer <steve@ssinger.info> wrote:

On 06/25/2013 06:42 AM, Szymon Guz wrote:

Hi,

I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.

thanks,
Szymon

This version of the patch addresses the issues I mentioned. Thanks for looking into seeing if the performance issue is with our conversions to strings or inherit with the python decimal type. I guess we (Postgresql) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch.

One minor thing I noticed in this round,

PLy_elog(ERROR, "could not import module 'decimal'");

I think should have "decimal" in double-quotes.

I think this patch is ready for a committer to look at it.

Steve

Hi Steve,

thanks for the review.

I was thinking about speeding up the Decimal conversion using the module you wrote about. What about trying to import it, if it fails, than trying to load decimal.Decimal? There will be no warning in logs, just additional information in documentation that it uses this module if it is available?

thanks,

Szymon

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Ronan Dunklau

Date:

26 June 2013, 08:12:10

The v2 patch does not work for me: regression tests for plpython fail on the plpython_types test: every numeric is converted to None.

It seems the global decimal ctor is not initialized.

Please find two patches, to be applied on top of the v2 patch: one initializes the decimal ctor, the other uses cdecimal when possible.

Using the performance test from steve, on my machine:

- with cdecimal installed: ~84ms
- without cdecimal installed (standard decimal module): ~511ms

2013/6/26 Szymon Guz <mabewlun@gmail.com>

On 26 June 2013 01:40, Steve Singer <steve@ssinger.info> wrote:
On 06/25/2013 06:42 AM, Szymon Guz wrote:

Hi,

I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.

thanks,
Szymon

This version of the patch addresses the issues I mentioned. Thanks for looking into seeing if the performance issue is with our conversions to strings or inherit with the python decimal type. I guess we (Postgresql) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch.

One minor thing I noticed in this round,

PLy_elog(ERROR, "could not import module 'decimal'");

I think should have "decimal" in double-quotes.

I think this patch is ready for a committer to look at it.

Steve

Hi Steve,
thanks for the review.

I was thinking about speeding up the Decimal conversion using the module you wrote about. What about trying to import it, if it fails, than trying to load decimal.Decimal? There will be no warning in logs, just additional information in documentation that it uses this module if it is available?

thanks,
Szymon

Attachment

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

26 June 2013, 08:28:18

Thanks Steve, that's exactly what I wanted to send when you sent your patches :)

I need to figure out why that patch v2 worked for me, I think I made mess somewhere in my git repo and didn't create the patch properly. Sorry for that.

Patch is attached, I've also added information about cdecimal to plpython documentation.

I'm just wondering how to make integration tests to check when cdecimal is installed and when it is not.

thanks,

Szymon

On 26 June 2013 10:12, Ronan Dunklau <rdunklau@gmail.com> wrote:

The v2 patch does not work for me: regression tests for plpython fail on the plpython_types test: every numeric is converted to None.

It seems the global decimal ctor is not initialized.

Please find two patches, to be applied on top of the v2 patch: one initializes the decimal ctor, the other uses cdecimal when possible.

Using the performance test from steve, on my machine:

- with cdecimal installed: ~84ms
- without cdecimal installed (standard decimal module): ~511ms

2013/6/26 Szymon Guz <mabewlun@gmail.com>
On 26 June 2013 01:40, Steve Singer <steve@ssinger.info> wrote:
On 06/25/2013 06:42 AM, Szymon Guz wrote:

Hi,

I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.

thanks,
Szymon

This version of the patch addresses the issues I mentioned. Thanks for looking into seeing if the performance issue is with our conversions to strings or inherit with the python decimal type. I guess we (Postgresql) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch.

One minor thing I noticed in this round,

PLy_elog(ERROR, "could not import module 'decimal'");

I think should have "decimal" in double-quotes.

I think this patch is ready for a committer to look at it.

Steve

Hi Steve,
thanks for the review.

I was thinking about speeding up the Decimal conversion using the module you wrote about. What about trying to import it, if it fails, than trying to load decimal.Decimal? There will be no warning in logs, just additional information in documentation that it uses this module if it is available?

thanks,
Szymon

Attachment

plpython_decimal_v3.patch

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

26 June 2013, 08:35:14

You had a great idea, the time with cdecimal is really great, the difference on my machine is 64 ms vs 430 ms.

Szymon

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Ronan Dunklau

Date:

26 June 2013, 10:05:02

It seems like you confused me with steve :)

The patch applies cleanly, and the regression tests pass on python2 when cdecimal is not installed. When it is, the type info returned for the converted numeric value is cdecimal.Decimal instead of decimal.Decimal.

The regression tests expected output have not been modified for python3, and as such they fail on the type conversions.

I am a bit confused with the use of PyModule_GetDict: shouldn't PyObj_GetAttrString be used directly instead ? Moreover, the reference count in the current implementation might be off: the reference count for the decimal module is never decreased, while the reference count to the module dict is, when the docs say it returns a borrowed reference.

Please find a patch that fixes both issues.

2013/6/26 Szymon Guz <mabewlun@gmail.com>

Thanks Steve, that's exactly what I wanted to send when you sent your patches :)

I need to figure out why that patch v2 worked for me, I think I made mess somewhere in my git repo and didn't create the patch properly. Sorry for that.

Patch is attached, I've also added information about cdecimal to plpython documentation.

I'm just wondering how to make integration tests to check when cdecimal is installed and when it is not.

thanks,
Szymon

On 26 June 2013 10:12, Ronan Dunklau <rdunklau@gmail.com> wrote:
The v2 patch does not work for me: regression tests for plpython fail on the plpython_types test: every numeric is converted to None.

It seems the global decimal ctor is not initialized.

Please find two patches, to be applied on top of the v2 patch: one initializes the decimal ctor, the other uses cdecimal when possible.

Using the performance test from steve, on my machine:

- with cdecimal installed: ~84ms
- without cdecimal installed (standard decimal module): ~511ms

2013/6/26 Szymon Guz <mabewlun@gmail.com>
On 26 June 2013 01:40, Steve Singer <steve@ssinger.info> wrote:
On 06/25/2013 06:42 AM, Szymon Guz wrote:

Hi,

I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.

thanks,
Szymon

This version of the patch addresses the issues I mentioned. Thanks for looking into seeing if the performance issue is with our conversions to strings or inherit with the python decimal type. I guess we (Postgresql) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch.

One minor thing I noticed in this round,

PLy_elog(ERROR, "could not import module 'decimal'");

I think should have "decimal" in double-quotes.

I think this patch is ready for a committer to look at it.

Steve

Hi Steve,
thanks for the review.

I was thinking about speeding up the Decimal conversion using the module you wrote about. What about trying to import it, if it fails, than trying to load decimal.Decimal? There will be no warning in logs, just additional information in documentation that it uses this module if it is available?

thanks,
Szymon

Attachment

plpython_decimal_fix_regression.patch

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

26 June 2013, 11:04:03

On 26 June 2013 12:04, Ronan Dunklau <rdunklau@gmail.com> wrote:

It seems like you confused me with steve :)

Hi Ronan,

Oh, yes. I'm sorry for that :)

The patch applies cleanly, and the regression tests pass on python2 when cdecimal is not installed. When it is, the type info returned for the converted numeric value is cdecimal.Decimal instead of decimal.Decimal.

The regression tests expected output have not been modified for python3, and as such they fail on the type conversions.

I am a bit confused with the use of PyModule_GetDict: shouldn't PyObj_GetAttrString be used directly instead ? Moreover, the reference count in the current implementation might be off: the reference count for the decimal module is never decreased, while the reference count to the module dict is, when the docs say it returns a borrowed reference.

Please find a patch that fixes both issues.

Thanks for the patch. I assume you generated that from clean trunk, and it includes all the changes (mine and yours) right?

I've checked the patch, everything looks great.

I've attached it to this email with changed name, just for consistent naming in commitfest app.

thanks,

Szymon

Attachment

plpython_decimal_v4.patch

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Peter Eisentraut

Date:

26 June 2013, 19:59:49

On 6/26/13 7:03 AM, Szymon Guz wrote:
> I've checked the patch, everything looks great.
> I've attached it to this email with changed name, just for consistent
> naming in commitfest app.

Could the setup of the decimal.Decimal constructor be moved into
PLyDecimal_FromNumeric() and kept in a static pointer?  I'd rather not
clutter up the main initialization routine.

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

26 June 2013, 20:08:17

<div dir="ltr">On 26 June 2013 21:59, Peter Eisentraut <span dir="ltr"><<a href="mailto:peter_e@gmx.net"
target="_blank">peter_e@gmx.net</a>></span>wrote:<br /><div class="gmail_extra"><div class="gmail_quote"><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 6/26/13
7:03AM, Szymon Guz wrote:<br /> > I've checked the patch, everything looks great.<br /> > I've attached it to
thisemail with changed name, just for consistent<br /> > naming in commitfest app.<br /><br /></div>Could the setup
ofthe decimal.Decimal constructor be moved into<br /> PLyDecimal_FromNumeric() and kept in a static pointer?  I'd
rathernot<br /> clutter up the main initialization routine.<br /><br /></blockquote></div><br /></div><div
class="gmail_extra"style="style">OK, I will.</div></div>

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

26 June 2013, 20:47:53

On 26 June 2013 22:08, Szymon Guz <mabewlun@gmail.com> wrote:

On 26 June 2013 21:59, Peter Eisentraut <peter_e@gmx.net> wrote:
On 6/26/13 7:03 AM, Szymon Guz wrote:
> I've checked the patch, everything looks great.
> I've attached it to this email with changed name, just for consistent
> naming in commitfest app.

Could the setup of the decimal.Decimal constructor be moved into
PLyDecimal_FromNumeric() and kept in a static pointer? I'd rather not
clutter up the main initialization routine.

Attached patch has all changes against trunk code.

There is added a function for conversion from Postgres numeric to Python Decimal. The Decimal type is taken from cdecimal.Decimal, if it is available. It is an external library, quite fast, but may be not available. If it is not available, then decimal.Decimal will be used. It is in standard Python library, however it is rather slow.

The initialization is done in the conversion function, the pointer to a proper Decimal constructor is stored as static variable inside the function and is lazy initialized.

The documentation is updated.

Tests for python 2 and 3 have been added. They work only with standard decimal.Decimal, as the type is printed in the *.out files. I think there is nothing we can do with that now.

regards,

Szymon

Attachment

plpython_decimal_v5.patch

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Steve Singer

Date:

27 June 2013, 03:20:32

On 06/26/2013 04:47 PM, Szymon Guz wrote:
>
>
>
>
> Attached patch has all changes against trunk code.
>
> There is added a function for conversion from Postgres numeric to
> Python Decimal. The Decimal type is taken from cdecimal.Decimal, if it
> is available. It is an external library, quite fast, but may be not
> available. If it is not available, then decimal.Decimal will be used.
> It is in standard Python library, however it is rather slow.
>
> The initialization is done in the conversion function, the pointer to
> a proper Decimal constructor is stored as static variable inside the
> function and is lazy initialized.
>
> The documentation is updated.
>

I've tested this version with python 2.7 with and without cdecimal and
also with 3.3 that has the faster decimal performance. It seems fine.

The v5 version of the patch makes only white-space changes to
plpy_main.c you should excluded that from the patch if your making a new
version (I have done this in the v6 version I'm attaching)

> Tests for python 2 and 3 have been added. They work only with standard
> decimal.Decimal, as the type is printed in the *.out files. I think
> there is nothing we can do with that now.
>
>

I think we should make  test_type_conversion_numeric to do something
that generates the same output in both cases.  ie
py.info(str(x)).  I downside of having the test fail on installs with
cdecimal installed is much greater than any benefit we get by ensuring
that the type is really decimal.
I've attached a v6 version of the patch that does this, do you agree
with my thinking?

Steve

> regards,
> Szymon
>
>

Attachment

plpython_decimal_v6.patch

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

27 June 2013, 09:04:57

On 27 June 2013 05:21, Steve Singer <steve@ssinger.info> wrote:

On 06/26/2013 04:47 PM, Szymon Guz wrote:

Attached patch has all changes against trunk code.

There is added a function for conversion from Postgres numeric to Python Decimal. The Decimal type is taken from cdecimal.Decimal, if it is available. It is an external library, quite fast, but may be not available. If it is not available, then decimal.Decimal will be used. It is in standard Python library, however it is rather slow.

The initialization is done in the conversion function, the pointer to a proper Decimal constructor is stored as static variable inside the function and is lazy initialized.

The documentation is updated.

I've tested this version with python 2.7 with and without cdecimal and also with 3.3 that has the faster decimal performance. It seems fine.

The v5 version of the patch makes only white-space changes to plpy_main.c you should excluded that from the patch if your making a new version (I have done this in the v6 version I'm attaching)

Tests for python 2 and 3 have been added. They work only with standard decimal.Decimal, as the type is printed in the *.out files. I think there is nothing we can do with that now.

I think we should make test_type_conversion_numeric to do something that generates the same output in both cases. ie
py.info(str(x)). I downside of having the test fail on installs with cdecimal installed is much greater than any benefit we get by ensuring that the type is really decimal.
I've attached a v6 version of the patch that does this, do you agree with my thinking?

Hi Steve,

thanks for the changes.

You're idea about common code for decimal and cdecimal is good, however not good enough. I like the idea of common code for decimal and cdecimal. But we need class name, not the value.

I've changed the code from str(x) to x.__class__.__name__ so the function prints class name (which is Decimal for both packages), not the value. We need to have the class name check. The value is returned by the function and is a couple of lines lower in the file.

patch is attached.

thanks,

Szymon

Attachment

plpython_decimal_v7.patch

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Steve Singer

Date:

28 June 2013, 20:13:04

On 06/27/2013 05:04 AM, Szymon Guz wrote:
> On 27 June 2013 05:21, Steve Singer <steve@ssinger.info
> <mailto:steve@ssinger.info>> wrote:
>
>     On 06/26/2013 04:47 PM, Szymon Guz wrote:
>
>
>
>
>
>
> Hi Steve,
> thanks for the changes.
>
> You're idea about common code for decimal and cdecimal is good,
> however not good enough. I like the idea of common code for decimal
> and cdecimal. But we need class name, not the value.
>
> I've changed the code from str(x) to x.__class__.__name__ so the
> function prints class name (which is Decimal for both packages), not
> the value. We need to have the class name check. The value is returned
> by the function and is a couple of lines lower in the file.
>
> patch is attached.
>

I think the value is more important than the name, I want to the tests
to make sure that the conversion is actually converting properly.  With
your method of getting the class name without the module we can have both.

The attached patch should print the value and the class name but not the
module name.

Steve


> thanks,
> Szymon
>
>
>
>
>
>

Attachment

plpython_decimal_v8.patch

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

28 June 2013, 20:28:14

On 28 June 2013 22:14, Steve Singer <steve@ssinger.info> wrote:

I think the value is more important than the name, I want to the tests to make sure that the conversion is actually converting properly. With your method of getting the class name without the module we can have both.

The attached patch should print the value and the class name but not the module name.

Steve

Hi Steve,

I agree, we can check both. This is quite a nice patch now, I've reviewed it, all tests pass, works as expected. I think it is ready for committing.

thanks,

Szymon

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Claudio Freire

Date:

28 June 2013, 20:29:23

On Fri, Jun 28, 2013 at 5:14 PM, Steve Singer <steve@ssinger.info> wrote:
> On 06/27/2013 05:04 AM, Szymon Guz wrote:
>>
>> On 27 June 2013 05:21, Steve Singer <steve@ssinger.info
>> <mailto:steve@ssinger.info>> wrote:
>>
>>     On 06/26/2013 04:47 PM, Szymon Guz wrote:
>>
>>
>>
>>
>>
>>
>> Hi Steve,
>> thanks for the changes.
>>
>> You're idea about common code for decimal and cdecimal is good, however
>> not good enough. I like the idea of common code for decimal and cdecimal.
>> But we need class name, not the value.
>>
>> I've changed the code from str(x) to x.__class__.__name__ so the function
>> prints class name (which is Decimal for both packages), not the value. We
>> need to have the class name check. The value is returned by the function and
>> is a couple of lines lower in the file.
>>
>> patch is attached.
>>
>
> I think the value is more important than the name, I want to the tests to
> make sure that the conversion is actually converting properly.  With your
> method of getting the class name without the module we can have both.
>
> The attached patch should print the value and the class name but not the
> module name.


Why not forego checking of the type, and instead check the interface?

plpy.info(x.as_tuple())

Should do.

>>> d  = decimal.Decimal((0,(3,1,4),-2))
>>> d.as_tuple()
DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2)
>>> d.as_tuple() == (0,(3,1,4),-2)
True
>>> d = decimal.Decimal("3.14")
>>> d.as_tuple()
DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2)
>>> d.as_tuple() == (0,(3,1,4),-2)
True
>>>

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Peter Eisentraut

Date:

06 July 2013, 02:46:55

On Fri, 2013-06-28 at 22:28 +0200, Szymon Guz wrote:
> I agree, we can check both. This is quite a nice patch now, I've reviewed
> it, all tests pass, works as expected. I think it is ready for committing.

Committed.

Very nice to get that finally fixed.

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Peter Eisentraut

Date:

06 July 2013, 02:48:09

On Fri, 2013-06-28 at 17:29 -0300, Claudio Freire wrote:
> Why not forego checking of the type, and instead check the interface?
> 
> plpy.info(x.as_tuple())
> 
> Should do.
> 
> >>> d  = decimal.Decimal((0,(3,1,4),-2))
> >>> d.as_tuple()
> DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2) 

I think that potentially exposes us to more version differences and
such, and it doesn't really add much value in the test output.

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Claudio Freire

Date:

06 July 2013, 04:23:53

On Fri, Jul 5, 2013 at 11:47 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Fri, 2013-06-28 at 17:29 -0300, Claudio Freire wrote:
>> Why not forego checking of the type, and instead check the interface?
>>
>> plpy.info(x.as_tuple())
>>
>> Should do.
>>
>> >>> d  = decimal.Decimal((0,(3,1,4),-2))
>> >>> d.as_tuple()
>> DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2)
>
> I think that potentially exposes us to more version differences and
> such, and it doesn't really add much value in the test output.

Why?

That interface is documented, and Python guys aren't likely to change
it, not even in alternative implementations.

Besides, this is the "zen" way of checking. Python uses duck typing,
so checking the actual class of stuff is frowned upon. The "Pythonic"
way to write tests is to check the expected behavior of returned
objects. In this case, that it has an as_tuple that returns the right
thing.

You could also check other interesting behavior if you want, including
its string representation, that it can be converted to float, that two
decimals can be operated upon and that they preserve the right amount
of precision, etc..

Re: [PATCH] Fix conversion for Decimal arguments in plpython functions

From

Szymon Guz

Date:

06 July 2013, 18:10:52

<div dir="ltr"><br /><div class="gmail_extra"><br /><br /><div class="gmail_quote">On 28 June 2013 22:29, Claudio
Freire<span dir="ltr"><<a href="mailto:klaussfreire@gmail.com" target="_blank">klaussfreire@gmail.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div
class=""><divclass="h5">On Fri, Jun 28, 2013 at 5:14 PM, Steve Singer <<a
href="mailto:steve@ssinger.info">steve@ssinger.info</a>>wrote:<br /> > On 06/27/2013 05:04 AM, Szymon Guz
wrote:<br/> >><br /> >> On 27 June 2013 05:21, Steve Singer <<a
href="mailto:steve@ssinger.info">steve@ssinger.info</a><br/> >> <mailto:<a
href="mailto:steve@ssinger.info">steve@ssinger.info</a>>>wrote:<br /> >><br /> >>     On 06/26/2013
04:47PM, Szymon Guz wrote:<br /> >><br /> >><br /> >><br /> >><br /> >><br /> >><br
/>>> Hi Steve,<br /> >> thanks for the changes.<br /> >><br /> >> You're idea about common code
fordecimal and cdecimal is good, however<br /> >> not good enough. I like the idea of common code for decimal and
cdecimal.<br/> >> But we need class name, not the value.<br /> >><br /> >> I've changed the code from
str(x)to x.__class__.__name__ so the function<br /> >> prints class name (which is Decimal for both packages),
notthe value. We<br /> >> need to have the class name check. The value is returned by the function and<br />
>>is a couple of lines lower in the file.<br /> >><br /> >> patch is attached.<br /> >><br />
><br/> > I think the value is more important than the name, I want to the tests to<br /> > make sure that the
conversionis actually converting properly.  With your<br /> > method of getting the class name without the module we
canhave both.<br /> ><br /> > The attached patch should print the value and the class name but not the<br /> >
modulename.<br /><br /><br /></div></div>Why not forego checking of the type, and instead check the interface?<br /><br
/><ahref="http://plpy.info" target="_blank">plpy.info</a>(x.as_tuple())<br /><br /> Should do.<br /><br /> >>>
d = decimal.Decimal((0,(3,1,4),-2))<br /> >>> d.as_tuple()<br /> DecimalTuple(sign=0, digits=(3, 1, 4),
exponent=-2)<br/> >>> d.as_tuple() == (0,(3,1,4),-2)<br /> True<br /> >>> d =
decimal.Decimal("3.14")<br/> >>> d.as_tuple()<br /> DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2)<br />
>>>d.as_tuple() == (0,(3,1,4),-2)<br /> True<br /> >>><br /></blockquote></div><br /></div><div
class="gmail_extra">Yea,however decimal and cdecimal have different outputs:</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra"><div class="gmail_extra">For decimal:</div><div class="gmail_extra"><br /></div><div
class="gmail_extra">!INFO:  DecimalTuple(sign=1, digits=(1, 0, 0), exponent=0)</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">for cdecimal:</div><div class="gmail_extra"><br /></div><div class="gmail_extra">!
INFO: DecimalTuple(sign=1, digits=(1, 0, 0), exponent=0L)</div></div></div>