Thread: [HACKERS] Jsonb transform for pl/python
Hi. I've implemented jsonb transform (https://www.postgresql.org/docs/9.5/static/sql-createtransform.html) for pl/python. 1. '{"1":1}'::jsonb is transformed into dict {"1"=>1}, while '["1",2]'::jsonb is transformed into list(not tuple!) ["1", 2] 2. If there is a numeric value appear in jsonb, it will be transformed to decimal through string (Numeric->String->Decimal). Not the best solution, but as far as I understand this is usual practise in postgresql to serialize Numerics and de-serialize them. 3. Decimal is transformed into jsonb through string (Decimal->String->Numeric). An example may also be helpful to understand extension. So, as an example, function "test" transforms incoming jsonb into python, transforms it back into jsonb and returns it. create extension jsonb_plpython2u cascade; create or replace function test(val jsonb) returns jsonb transform for type jsonb language plpython2u as $$ return (val); $$; select test('{"1":1,"example": null}'::jsonb); -- Anthony Bykov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Oct 25, 2017 at 02:51:00PM +0300, Anthony Bykov wrote: > Hi. > I've implemented jsonb transform > (https://www.postgresql.org/docs/9.5/static/sql-createtransform.html) > for pl/python. > > 1. '{"1":1}'::jsonb is transformed into dict {"1"=>1}, while > '["1",2]'::jsonb is transformed into list(not tuple!) ["1", 2] > > 2. If there is a numeric value appear in jsonb, it will be transformed > to decimal through string (Numeric->String->Decimal). Not the best > solution, but as far as I understand this is usual practise in > postgresql to serialize Numerics and de-serialize them. > > 3. Decimal is transformed into jsonb through string > (Decimal->String->Numeric). > > An example may also be helpful to understand extension. So, as an > example, function "test" transforms incoming jsonb into python, > transforms it back into jsonb and returns it. > > create extension jsonb_plpython2u cascade; Thanks for your hard work! Should there also be one for PL/Python3U? Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, 29 Oct 2017 19:11:02 +0100 David Fetter <david@fetter.org> wrote: > Thanks for your hard work! > > Should there also be one for PL/Python3U? > > Best, > David. Hi. Actually, there is one for PL/Python3U. This patch contains following extensions: jsonb_plpythonu jsonb_plpython2u jsonb_plpython3u "make install" checks which python major version was your postgresql configured with and installs corresponding extension. -- Anthony Bykov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 30, 2017 at 11:15:00AM +0300, Anthony Bykov wrote: > On Sun, 29 Oct 2017 19:11:02 +0100 > David Fetter <david@fetter.org> wrote: > > > Thanks for your hard work! > > > > Should there also be one for PL/Python3U? > > > > Best, > > David. > Hi. > Actually, there is one for PL/Python3U. This patch contains following > extensions: > jsonb_plpythonu > jsonb_plpython2u > jsonb_plpython3u > "make install" checks which python major version was your postgresql > configured with and installs corresponding extension. My mistake. Sorry about the noise. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed Hello Anthony, Great job! I decided to take a closer look on your patch. Here are some defects I discovered. > + Additional extensions are available that implement transforms for > + the <type>jsonb</type> type for the language PL/Python. The > + extensions for PL/Perl are called 1. The part regarding PL/Perl is obviously from another patch. 2. jsonb_plpython2u and jsonb_plpythonu are marked as relocatable, while jsonb_plpython3u is not. Is it a mistake? Anyway if an extension is relocatable there should be a test that checks this. 3. Not all json types are test-covered. Tests for 'true' :: jsonb, '3.14' :: jsonb and 'null' :: jsonb are missing. 4. jsonb_plpython.c:133 - "Iterate trhrough Jsonb object." Typo, it should be "through" or probably even "over". 5. It looks like you've implemented transform in two directions Python <-> JSONB, however I see tests only for Python <- JSONB case. 6. Tests passed on Python 2.7.14 but failed on 3.6.2: > CREATE EXTENSION jsonb_plpython3u CASCADE; > + ERROR: could not access file "$libdir/jsonb_plpython3": No such file or > directory module_pathname in jsonb_plpython3u.control should be $libdir/jsonb_plpython3u, not $libdir/jsonb_plpython3. Tested on Arch Linux x64, GCC 7.2.0. The new status of this patch is: Waiting on Author -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, 09 Nov 2017 12:26:46 +0000 Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: > The following review has been posted through the commitfest > application: make installcheck-world: tested, failed > Implements feature: tested, passed > Spec compliant: tested, passed > Documentation: tested, passed > > Hello Anthony, > > Great job! > > I decided to take a closer look on your patch. Here are some defects > I discovered. > > > + Additional extensions are available that implement transforms > > for > > + the <type>jsonb</type> type for the language PL/Python. The > > + extensions for PL/Perl are called > > 1. The part regarding PL/Perl is obviously from another patch. > > 2. jsonb_plpython2u and jsonb_plpythonu are marked as relocatable, > while jsonb_plpython3u is not. Is it a mistake? Anyway if an > extension is relocatable there should be a test that checks this. > > 3. Not all json types are test-covered. Tests for 'true' :: jsonb, > '3.14' :: jsonb and 'null' :: jsonb are missing. > > 4. jsonb_plpython.c:133 - "Iterate trhrough Jsonb object." Typo, it > should be "through" or probably even "over". > > 5. It looks like you've implemented transform in two directions > Python <-> JSONB, however I see tests only for Python <- JSONB case. > > 6. Tests passed on Python 2.7.14 but failed on 3.6.2: > > > CREATE EXTENSION jsonb_plpython3u CASCADE; > > + ERROR: could not access file "$libdir/jsonb_plpython3": No such > > file or directory > > module_pathname in jsonb_plpython3u.control should be > $libdir/jsonb_plpython3u, not $libdir/jsonb_plpython3. > > Tested on Arch Linux x64, GCC 7.2.0. > > The new status of this patch is: Waiting on Author > Hello, Aleksander. Thank you for your time. The defects you have noticed were fixed. Please, find in attachments new version of the patch (it is called 0001-jsonb_plpython-extension-v2.patch). Most of changes were made to fix defects(list of the defects may be found in citation in the beginning of this message), but the algorithm of iterating through incoming jsonb was changed so that it looks tidier. -- Anthony Bykov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed Hi Anthony, Thank you for the new version of the patch! Here is my code review. 1. In jsonb_plpython2.out: +CREATE FUNCTION back(val jsonb) RETURNS jsonb +LANGUAGE plpython2u +TRANSFORM FOR TYPE jsonb +as $$ +return val +$$; +SELECT back('null'::jsonb); + back +-------- + [null] +(1 row) + +SELECT back('1'::jsonb); + back +------ + [1] +(1 row) + +SELECT back('true'::jsonb); + back +-------- + [true] +(1 row) Maybe I'm missing something, but why exactly all JSONB values turn into arrays? 2. Could you please also add tests for some negative and real numbers? Also could you check that your code handles numbers like MAX_INT, MIN_INT, +/- infinity and NaN properly in both (Python <-> JSONB) directions? 3. Handling unicode strings properly is another thing that is worth checking. 4. I think we also need some tests that check the behavior of Python -> JSONB conversion when the object contains data that is not representable in JSON format, e.g. set(), some custom objects, etc. 5. PyObject_FromJsonbValue - I realize it's unlikely that the new jsonbValue->type will be introduced any time soon. Still I believe it's a good practice to add "it should never happen" default case that just does elog(ERROR, ...) in case it happens nevertheless. Otherwise in this scenario instead of reporting the error the code will silently do the wrong thing. 6. Well, you decided to make the extension non-relocatable. Could you at least describe what prevents it to be relocatable or why it's meaningless is a comment in .control file? Please note that almost all contrib/ extensions are relocatable. I believe your extension should be relocatable as well unless there is a good reason why it can't. The new status of this patch is: Waiting on Author
On Mon, 13 Nov 2017 15:08:16 +0000 Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: > The following review has been posted through the commitfest > application: make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: tested, passed > Documentation: tested, passed > > Hi Anthony, > > Thank you for the new version of the patch! Here is my code review. > > 1. In jsonb_plpython2.out: > > +CREATE FUNCTION back(val jsonb) RETURNS jsonb > +LANGUAGE plpython2u > +TRANSFORM FOR TYPE jsonb > +as $$ > +return val > +$$; > +SELECT back('null'::jsonb); > + back > +-------- > + [null] > +(1 row) > + > +SELECT back('1'::jsonb); > + back > +------ > + [1] > +(1 row) > + > +SELECT back('true'::jsonb); > + back > +-------- > + [true] > +(1 row) > > Maybe I'm missing something, but why exactly all JSONB values turn > into arrays? > > 2. Could you please also add tests for some negative and real > numbers? Also could you check that your code handles numbers like > MAX_INT, MIN_INT, +/- infinity and NaN properly in both (Python <-> > JSONB) directions? > > 3. Handling unicode strings properly is another thing that is worth > checking. > > 4. I think we also need some tests that check the behavior of Python > -> JSONB conversion when the object contains data that is not > representable in JSON format, e.g. set(), some custom objects, etc. > > 5. PyObject_FromJsonbValue - I realize it's unlikely that the new > jsonbValue->type will be introduced any time soon. Still I believe > it's a good practice to add "it should never happen" default case > that just does elog(ERROR, ...) in case it happens nevertheless. > Otherwise in this scenario instead of reporting the error the code > will silently do the wrong thing. > > 6. Well, you decided to make the extension non-relocatable. Could you > at least describe what prevents it to be relocatable or why it's > meaningless is a comment in .control file? Please note that almost > all contrib/ extensions are relocatable. I believe your extension > should be relocatable as well unless there is a good reason why it > can't. > > The new status of this patch is: Waiting on Author Hi, thank you for your review. I took your comments into account in the third version of the patch. In the new version, I've added all the tests you asked for. The interesting thing is that: 1. set or any other non-jsonb-transformable object is transformed into string and added to jsonb as a string. 2. couldn't find a solution of working with "inf": this extension troughs exception if python generates a number called "inf" and returns it, but if we pass a very large integer into a function, it works fine with the whole number. This situation can be seen in tests. I've added tests of large numerics which weights quite heavy. So, please find it in compressed format in attachments. -- Anthony Bykov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi Anthony, > thank you for your review. I took your comments into account in the > third version of the patch. In the new version, I've added all the > tests you asked for. The interesting thing is that: > 1. set or any other non-jsonb-transformable object is transformed into > string and added to jsonb as a string. Well frankly I very much doubt that this: ``` +-- set -> jsonb +CREATE FUNCTION test1set() RETURNS jsonb +LANGUAGE plpython2u +TRANSFORM FOR TYPE jsonb +AS $$ +x = set() +x.add(1) +x.add("string") +x.add(None) +return x +$$; +SELECT test1set(); + test1set +---------------------------- + "set([1, 'string', None])" +(1 row) ``` ... is an expected and valid behavior. If user tries to transform a set() to JSONB this is most likely a mistake since there is no standard representation of a set() in JSONB. I believe we should rise an error in this case instead of generating a string. Besides user can expect that such string can be transformed back to set() which doesn't sound like a good idea either. If necessary, user can just transform a set() to a list(): ``` >>> x = set([1,2,3,4]) >>> x {1, 2, 3, 4} >>> list(x) [1, 2, 3, 4] ``` BTW I just recalled that Python supports complex numbers out-of-the box and that range and xrange are a separate types too: ``` >>> 1 + 2j (1+2j) >>> range(3) range(0, 3) >>> type(range(3)) <class 'range'> ``` I think we should add all this to the tests as well. Naturally complex numbers can't be represented in JSON so we should rise an error if user tries to transform a complex number to JSON. I'm not that sure regarding ranges though. Probably the best solution will be to rise and error in this case as well just to keep things consistent. > +ERROR: jsonb doesn't support inf type yet I would say this error message is too informal. How about something more like "Infinity can't be represented in JSONB"? > 2. couldn't find a solution of working with "inf": this extension > troughs exception if python generates a number called "inf" and returns > it, but if we pass a very large integer into a function, it works fine > with the whole number. This situation can be seen in tests. > > I've added tests of large numerics which weights quite heavy. So, > please find it in compressed format in attachments. I'm afraid that tests fail on Python 3: ``` SELECT test1set(); test1set ----------------------- ! "{None, 1, 'string'}" (1 row) DROP EXTENSION plpython3u CASCADE; --- 296,302 ---- SELECT test1set(); test1set ----------------------- ! "{1, None, 'string'}" (1 row) DROP EXTENSION plpython3u CASCADE ``` -- Best regards, Aleksander Alekseev
On Mon, Nov 20, 2017 at 10:48 PM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: > Well frankly I very much doubt that this: > [snip] > I'm afraid that tests fail on Python 3: So this still needs more work.. I am marking it as returned with feedback as there has been no updates for more than 1 week. -- Michael
Hello, fixed the issues: 1. Rising errors when invalid object being transformed. 2. We don't rise the exception when transforming range(3) only in python 2. In third one it is an error. Please, find the 4-th version of the patch in attachments to this message. -- Anthony Bykov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 12/6/17 06:40, Anthony Bykov wrote: > Hello, > fixed the issues: > 1. Rising errors when invalid object being transformed. > 2. We don't rise the exception when transforming range(3) only in > python 2. In third one it is an error. > > Please, find the 4-th version of the patch in attachments to this > message. Why not make this part of the plpythonu extension? It doesn't have to be a separate contrib module. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 9 Dec 2017 16:57:05 -0500 Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 12/6/17 06:40, Anthony Bykov wrote: > > Hello, > > fixed the issues: > > 1. Rising errors when invalid object being transformed. > > 2. We don't rise the exception when transforming range(3) only in > > python 2. In third one it is an error. > > > > Please, find the 4-th version of the patch in attachments to this > > message. > > Why not make this part of the plpythonu extension? It doesn't have to > be a separate contrib module. > Hello, I thought about that, but the problem is that there will be no possibilities to create custom transform if we create this extension by default. For example, it is easy to check if we install this extension and try to create new transform: # create extension jsonb_plperl cascade; NOTICE: installing required extension "plperl" CREATE EXTENSION # CREATE TRANSFORM FOR jsonb LANGUAGE plperl ( # FROM SQL WITH FUNCTION jsonb_to_plperl(internal), # TO SQL WITH FUNCTION plperl_to_jsonb(internal) # ); 2017-12-11 10:23:07.507 MSK [19149] ERROR: transform for type jsonb language "plperl" already exists 2017-12-11 10:23:07.507 MSK [19149] STATEMENT: CREATE TRANSFORM FOR jsonb LANGUAGE plperl ( FROM SQL WITH FUNCTION jsonb_to_plperl(internal), TO SQL WITH FUNCTION plperl_to_jsonb(internal) ); ERROR: transform for type jsonb language "plperl" already exists Other types of transforms may be interesting for people when they want to transform the jsonb to certain structures. For example, what if the user wants the parameter to be always array inside the function, while this extension can return integers or strings in some cases. -- Anthony Bykov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 12/11/17 03:22, Anthony Bykov wrote: >> Why not make this part of the plpythonu extension? It doesn't have to >> be a separate contrib module. >> > Hello, > I thought about that, but the problem is that there will be no > possibilities to create custom transform if we create this extension by > default. OK, could it be a separate extension, but part of the same code directory? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 12/11/17 03:22, Anthony Bykov wrote: >>> Why not make this part of the plpythonu extension? It doesn't have to >>> be a separate contrib module. >> I thought about that, but the problem is that there will be no >> possibilities to create custom transform if we create this extension by >> default. > OK, could it be a separate extension, but part of the same code directory? I think our makefile infrastructure only allows for one shlib to be built per directory. Admittedly, you could have two extensions sharing the same shlib (and the same regression test suite), but on the whole it's not clear to me why we should do that. regards, tom lane
On Thu, Dec 7, 2017 at 12:40 AM, Anthony Bykov <a.bykov@postgrespro.ru> wrote: > Hello, > fixed the issues: > 1. Rising errors when invalid object being transformed. > 2. We don't rise the exception when transforming range(3) only in > python 2. In third one it is an error. > > Please, find the 4-th version of the patch in attachments to this > message. -- Hi Anthony, FYI make docs fails: json.sgml:584: parser error : Opening and ending tag mismatch: xref line 581 and para </para> ^ json.sgml:585: parser error : Opening and ending tag mismatch: para line 575 and sect2 </sect2> ^ json.sgml:588: parser error : Opening and ending tag mismatch: sect2 line 572 and sect1 </sect1> ^ json.sgml:589: parser error : Premature end of data in tag sect1 line 3 json.sgml:589: parser error : chunk is not well balanced datatype.sgml:4354: parser error : Failure to process entity json &json; ^ datatype.sgml:4354: parser error : Entity 'json' not defined &json; ^ datatype.sgml:4955: parser error : chunk is not well balanced postgres.sgml:104: parser error : Failure to process entity datatype &datatype; ^ postgres.sgml:104: parser error : Entity 'datatype' not defined &datatype; ^ -- Thomas Munro http://www.enterprisedb.com
On Fri, 12 Jan 2018 13:33:56 +1300 Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Dec 7, 2017 at 12:40 AM, Anthony Bykov > <a.bykov@postgrespro.ru> wrote: > > Hello, > > fixed the issues: > > 1. Rising errors when invalid object being transformed. > > 2. We don't rise the exception when transforming range(3) only in > > python 2. In third one it is an error. > > > > Please, find the 4-th version of the patch in attachments to this > > message. -- > > Hi Anthony, > > FYI make docs fails: > > json.sgml:584: parser error : Opening and ending tag mismatch: xref > line 581 and para > </para> > ^ > json.sgml:585: parser error : Opening and ending tag mismatch: para > line 575 and sect2 > </sect2> > ^ > json.sgml:588: parser error : Opening and ending tag mismatch: sect2 > line 572 and sect1 > </sect1> > ^ > json.sgml:589: parser error : Premature end of data in tag sect1 line > 3 json.sgml:589: parser error : chunk is not well balanced > datatype.sgml:4354: parser error : Failure to process entity json > &json; > ^ > datatype.sgml:4354: parser error : Entity 'json' not defined > &json; > ^ > datatype.sgml:4955: parser error : chunk is not well balanced > postgres.sgml:104: parser error : Failure to process entity datatype > &datatype; > ^ > postgres.sgml:104: parser error : Entity 'datatype' not defined > &datatype; > ^ > Hello, thank you for your message. Fixed it. Here is a new version of the patch. -- Anthony Bykov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed LGTM. The new status of this patch is: Ready for Committer
On 1/12/18 10:43, Aleksander Alekseev wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: tested, passed > Documentation: tested, passed > > LGTM. > > The new status of this patch is: Ready for Committer I've been working on polishing this a bit. I'll keep working on it. It should be ready to commit soon. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01.02.2018 19:06, Peter Eisentraut wrote: > On 1/12/18 10:43, Aleksander Alekseev wrote: >> The following review has been posted through the commitfest application: >> make installcheck-world: tested, passed >> Implements feature: tested, passed >> Spec compliant: tested, passed >> Documentation: tested, passed >> >> LGTM. >> >> The new status of this patch is: Ready for Committer > I've been working on polishing this a bit. I'll keep working on it. It > should be ready to commit soon. Hi. I have reviewed this patch. Attached new 6th version of the patch with v5-v6 delta-patch. * Added out of memory checks after the following function calls: - PyList_New() - PyDict_New() - PyString_FromStringAndSize() (added PyString_FromJsonbValue()) * Added Py_XDECREF() for key-value pairs and list elements after theirs appending because PyDict_SetItem() and PyList_Append() do not steal references (see also hstore_plpython code). * Removed unnecessary JsonbValue heap-allocations in PyObject_ToJsonbValue(). * Added iterating to the end of iterator in PyObject_FromJsonb() for correct freeing of JsonbIterators. * Passed JsonbParseState ** to PyXxx_ToJsonbValue() functions. * Added transformation of Python tuples into JSON arrays because standard Python JSONEncoder encoder does the same. (See https://docs.python.org/2/library/json.html#py-to-json-table) -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 3/12/18 11:26, Nikita Glukhov wrote: > I have reviewed this patch. Attached new 6th version of the patch with > v5-v6 delta-patch. Thanks for the update. I'm working on committing this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/21/18 18:50, Peter Eisentraut wrote: > On 3/12/18 11:26, Nikita Glukhov wrote: >> I have reviewed this patch. Attached new 6th version of the patch with >> v5-v6 delta-patch. > > Thanks for the update. I'm working on committing this. Committed. Everything seemed to work well. I just did a bit of cosmetic cleanups. I did a fair amount of tweaking on the naming of functions, the extensions and library names, etc., to make it look like the existing plpython transforms. I also changed it so that the transform would support mappings and sequences other than dict and list. I removed the non-ASCII test and the test with the huge numbers. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 28.03.2018 15:43, Peter Eisentraut wrote: > On 3/21/18 18:50, Peter Eisentraut wrote: >> On 3/12/18 11:26, Nikita Glukhov wrote: >>> I have reviewed this patch. Attached new 6th version of the patch with >>> v5-v6 delta-patch. >> Thanks for the update. I'm working on committing this. > Committed. > > Everything seemed to work well. I just did a bit of cosmetic cleanups. > I did a fair amount of tweaking on the naming of functions, the > extensions and library names, etc., to make it look like the existing > plpython transforms. I also changed it so that the transform would > support mappings and sequences other than dict and list. I removed the > non-ASCII test and the test with the huge numbers. I found a memory leak in PLySequence_ToJsonbValue(): PyObject returned from PySequence_GetItem() is not released. A bug can be easily reproduced using function roudtrip() from regression test: SELECT roundtrip('[1,2,3]'::jsonb) FROM generate_series(1, 1000000); Similar code in PLyMapping_ToJsonbValue() seems to be correct because PyList_GetItem() and PyTuple_GetItem() return a borrowed reference. Patch with fix is attached. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi! On Fri, Jun 15, 2018 at 2:11 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > > On 28.03.2018 15:43, Peter Eisentraut wrote: > > > On 3/21/18 18:50, Peter Eisentraut wrote: > >> On 3/12/18 11:26, Nikita Glukhov wrote: > >>> I have reviewed this patch. Attached new 6th version of the patch with > >>> v5-v6 delta-patch. > >> Thanks for the update. I'm working on committing this. > > Committed. > > > > Everything seemed to work well. I just did a bit of cosmetic cleanups. > > I did a fair amount of tweaking on the naming of functions, the > > extensions and library names, etc., to make it look like the existing > > plpython transforms. I also changed it so that the transform would > > support mappings and sequences other than dict and list. I removed the > > non-ASCII test and the test with the huge numbers. > > > I found a memory leak in PLySequence_ToJsonbValue(): > PyObject returned from PySequence_GetItem() is not released. > > A bug can be easily reproduced using function roudtrip() from regression test: > SELECT roundtrip('[1,2,3]'::jsonb) FROM generate_series(1, 1000000); > > Similar code in PLyMapping_ToJsonbValue() seems to be correct because > PyList_GetItem() and PyTuple_GetItem() return a borrowed reference. > > Patch with fix is attached. I'm going to check and commit this if everything is OK. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi! We have found another performance problem in this transform -- very slow conversion via I/O from PostgreSQL numerics (which are used for the representation of jsonb numbers) to Python Decimals. Attached patch with fix. We are simply trying first to convert numeric to int64 if is does not have digits after the decimal point, and then construct Python Int instead of Decimal. Standard Python json.loads() does the same for exact integers. A special function numeric_to_exact_int64() was added to numeric.c. Existing numeric_int8() can't be used here because it rounds input numeric. Performance results (see the second attached file jsonb_plplython_tests.sql for the function definitions): - calculating the length of the passed jsonb object (input transformation): py_jsonb_length_trans opt 2761,873 ms py_jsonb_length_trans 10419,230 ms py_jsonb_length_json 8691,201 ms - returning integer arrays (output transformation): py_jsonb_ret_int_array_trans opt 3284,810 ms py_jsonb_ret_int_array_trans 4540,063 ms py_jsonb_ret_int_array_raw 5100,793 ms py_jsonb_ret_int_array_json 9887,821 ms - returning float arrays (output transformation): py_jsonb_ret_float_array_trans opt 5699,360 ms py_jsonb_ret_float_array_trans 5735,854 ms py_jsonb_ret_float_array_raw 6516,514 ms py_jsonb_ret_float_array_json 10869,213 ms -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 6/23/18 01:44, Nikita Glukhov wrote: > We are simply trying first to convert numeric to int64 if is does not have > digits after the decimal point, and then construct Python Int instead of > Decimal. Standard Python json.loads() does the same for exact integers. We just had a very similar but not identical discussion in the thread about the PL/Perl transforms, where we rejected the proposal. Other comments on this one? In any case, this might well be a consideration for PG12, not a bug in the existing implementation. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 6/23/18 01:44, Nikita Glukhov wrote: >> We are simply trying first to convert numeric to int64 if is does not have >> digits after the decimal point, and then construct Python Int instead of >> Decimal. Standard Python json.loads() does the same for exact integers. > We just had a very similar but not identical discussion in the thread > about the PL/Perl transforms, where we rejected the proposal. Other > comments on this one? I can sympathize with the speed concern, but the proposed patch produces a functional change (ie, you get a different kind of Python object, with different behaviors, in some cases). That seems to me to be a good reason to reject it. The fact that it makes the behavior vary depending on the local width of "long" is also a problem, although I think that could be fixed. More generally, I'd be happier with a patch that sped things up for non-integers too. I don't know whether Python exposes enough internals of type Decimal to make that practical, though. One idea for doing it at arm's length is to compute the Decimal value using numeric-digit-at-a-time arithmetic, roughly x := 0; foreach(digit, numeric) x.fma(10000, digit); x.scaleb(appropriate-exponent); In principle that's probably faster than string conversion, but not sure by how much. regards, tom lane
On Wed, Jul 11, 2018 at 12:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > > On 6/23/18 01:44, Nikita Glukhov wrote: > >> We are simply trying first to convert numeric to int64 if is does not have > >> digits after the decimal point, and then construct Python Int instead of > >> Decimal. Standard Python json.loads() does the same for exact integers. > > > We just had a very similar but not identical discussion in the thread > > about the PL/Perl transforms, where we rejected the proposal. Other > > comments on this one? > > I can sympathize with the speed concern, but the proposed patch produces > a functional change (ie, you get a different kind of Python object, with > different behaviors, in some cases). That seems to me to be a good reason > to reject it. Nevertheless, as Nikita pointed, json.loads() performs the same in Python. See an example for python 2.7.10. >>> import json >>> type(json.loads('1')) <type 'int'> >>> type(json.loads('1.0')) <type 'float'> So, from postgres point of view this behavior is wrong. But on another hand why don't let python transform to behave like a python? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Working on the new lazy transform for jsonb I found another memory leak in PLyObject_ToJsonbValue(): palloc() for output boolean JsonbValue is unnecessary, 'out' variable is already initialized. Fix is attached. On 15.06.2018 14:42, Alexander Korotkov wrote: > Hi! > > On Fri, Jun 15, 2018 at 2:11 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > > I found a memory leak in PLySequence_ToJsonbValue(): > PyObject returned from PySequence_GetItem() is not released. > > A bug can be easily reproduced using function roudtrip() from regression test: > SELECT roundtrip('[1,2,3]'::jsonb) FROM generate_series(1, 1000000); > > Similar code in PLyMapping_ToJsonbValue() seems to be correct because > PyList_GetItem() and PyTuple_GetItem() return a borrowed reference. > > Patch with fix is attached. > I'm going to check and commit this if everything is OK. > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 27/09/2018 16:58, Nikita Glukhov wrote: > Working on the new lazy transform for jsonb I found another memory leak in > PLyObject_ToJsonbValue(): palloc() for output boolean JsonbValue is unnecessary, > 'out' variable is already initialized. > > Fix is attached. Committed, thanks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services