Thread: plpython transforms vs. arrays

From
Mark Teper
Date: 2019-05-02 19:51

Hi,

I'm trying to build some numerical processing algorithms on Postgres array data types.  Using PL/Python I can get access to the NumPy libraries, but the performance is not great.  My guess is that there is a lot of overhead going from Postgres -> Python list -> NumPy and back again.

I'd like to test whether that's the issue, and potentially fix it by creating a C extension that converts directly between Postgres array types and NumPy array types.  I _think_ I have the C side somewhat working, but I can't get Postgres to use the transform.
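For reference, the pure-PL/Python baseline I'm benchmarking against looks something like this (the function and names are just illustrative):

---

-- Baseline: plpythonu hands the array over as a nested Python list,
-- so every call pays for building that list, copying it into an
-- ndarray, and copying the result back out into a list.
CREATE FUNCTION scale_arr(a real[], factor real) RETURNS real[]
AS $$
    import numpy as np
    arr = np.asarray(a, dtype=np.float32)  # Python list -> ndarray copy
    return (arr * factor).tolist()         # ndarray -> Python list copy
$$ LANGUAGE plpythonu;

---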

What I have:

---

CREATE FUNCTION arr_to_np(val internal) RETURNS internal
    LANGUAGE C AS 'MODULE_PATHNAME', 'arr_to_np';

CREATE FUNCTION np_to_arr(val internal) RETURNS real[]
    LANGUAGE C AS 'MODULE_PATHNAME', 'np_to_arr';

CREATE TRANSFORM FOR real[] LANGUAGE plpythonu (
    FROM SQL WITH FUNCTION arr_to_np(internal),
    TO SQL WITH FUNCTION np_to_arr(internal)
);

CREATE FUNCTION fn (a real[]) RETURNS real[]
    TRANSFORM FOR TYPE real[]
    AS $$ return a $$ LANGUAGE plpythonu;

----

The problem is that this produces an error saying the transform for type "real" doesn't work.  Postgres doesn't seem to allow transforms on array types, as opposed to their underlying element types.  Is it possible to tell it to apply the transform to the array?

Thanks,

Mark

Re: plpython transforms vs. arrays

From
Tom Lane
Date:
Mark Teper <mark.teper@gmail.com> writes:
> The problem is that this produces an error saying the transform for
> type "real" doesn't work.  Postgres doesn't seem to allow transforms
> on array types, as opposed to their underlying element types.  Is it
> possible to tell it to apply the transform to the array?

Yeah, see PLy_input_setup_func and PLy_output_setup_func, which both
say

     * Choose conversion method.  Note that transform functions are checked
     * for composite and scalar types, but not for arrays or domains.  This is
     * somewhat historical, but we'd have a problem allowing them on domains,
     * since we drill down through all levels of a domain nest without looking
     * at the intermediate levels at all.

At least for arrays, it might be sufficient to switch the order of the
array-lookup and transform-lookup cases to fix this.  I don't think
anyone's felt motivated to look into that, up to now.
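
An untested thought that follows from that comment: since transforms
*are* checked for composite types, wrapping the array in a composite
might dodge the limitation today, at the cost of uglier signatures:

    CREATE TYPE realvec AS (v real[]);

    CREATE TRANSFORM FOR realvec LANGUAGE plpythonu (
        FROM SQL WITH FUNCTION arr_to_np(internal),
        TO SQL WITH FUNCTION np_to_arr(internal)
    );
    -- arr_to_np/np_to_arr would have to be adjusted to expect and
    -- produce a composite datum (np_to_arr returning realvec, not
    -- real[]) for this to be accepted.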

            regards, tom lane



Re: plpython transforms vs. arrays

From
Jiří Fejfar
Date:

Dear Mark,


I am also looking for a way to do computations efficiently (in parallel?) with potentially large and sometimes sparse matrices directly in PostgreSQL. Some time ago I found the experimental extension https://github.com/PandaPost/panda_post, which, according to its description, lets you "represent Python NumPy/Pandas objects in Postgres". There is also http://madlib.apache.org/, but it seems too heavyweight for my use case.

I do not have much time to spend on this right now, but I hope that will change by late summer. I am looking forward to hearing your conclusions.


Good luck, Jiří.



Re:

From
"Sonnenberg-Carstens, Stefan"
Date:

Hi,

 

is the Python code running inside the PostgreSQL instance?

If not, the performance might only improve marginally, because every call still has to copy the data to the client and back again.
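
A quick way to check is to compare the server-side execution time against the end-to-end time seen from the client (a sketch; the table and column names are made up):

    -- Server-side cost only, measured inside the backend:
    EXPLAIN ANALYZE SELECT fn(v.arr) FROM vectors v;

    -- End-to-end from psql, including transfer to the client:
    \timing on
    SELECT fn(v.arr) FROM vectors v;

If the two numbers are close, the time is going into the conversion inside the server rather than into the network round trip.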

Best regards,
Stefan Sonnenberg-Carstens




Re:

From
Steve Midgley
Date:


On Tue, Feb 1, 2022 at 12:42 AM Дмитрий Воронин <carriingfate92@yandex.ru> wrote:
Hi all,
 
I'm using PostgreSQL 13.
 
I have a table:
 
CREATE TABLE test(docid integer, attrs jsonb);
 
So, attrs contains data like
 
...
"dates": ["2019-10-02", "2018-02-03"]
...
 
So, I want to SELECT all docids whose dates fall in a range:
 
SELECT docid FROM test WHERE attrs @? '$.dates[*].datetime() ? (@ >= "2020-10-02".datetime())';
 
How can I create an index on the attrs field to speed up this query for arbitrary dates? Thanks.
 

Have you tried just putting a default index on that column? I think it should work fine.

CREATE INDEX attrs_idx ON test (attrs)

IIRC, jsonb can be indexed like any other column, and you get significant performance benefits when the index is used. Also IIRC, you can index "deeper" into jsonb if you only want to index part of the jsonb structure - which is more efficient, since you don't index a bunch of elements you never search.
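That said, for the jsonpath operators @? and @@ specifically, I believe the planner wants a GIN index rather than the default btree - something like this (untested):

    -- GIN index supporting the @? and @@ jsonpath operators:
    CREATE INDEX attrs_gin_idx ON test USING GIN (attrs jsonb_path_ops);

    -- Or, following the "deeper" idea, index only the part you search:
    CREATE INDEX attrs_dates_idx ON test USING GIN ((attrs -> 'dates'));

One caveat I'm not sure about: the datetime() comparisons in your filter are timezone-dependent, and that may keep the planner from using the index at all, in which case pulling the dates out into a regular indexed column is the safer route.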

Have you tried this approach? What problems are you experiencing?

Steve

Re:

From
"David G. Johnston"
Date:
On Tuesday, February 1, 2022, Дмитрий Воронин <carriingfate92@yandex.ru> wrote:
 
...
"dates": ["2019-10-02", "2018-02-03"]
...
 
So, I want to SELECT all docids whose dates fall in a range:
 


Searching on the dates field will happen often, and I want to speed it up with an index, but I don't know how.

Create a generated column of type daterange that is populated on insert/update.  Index that column.  Write queries against that column.  (Not tested, but in short: get rid of the JSON pseudo-daterange array implementation and use the real SQL daterange type.)
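
A sketch of that idea, also untested, assuming the dates are ISO-formatted strings (the wrapper function is needed because generated columns only accept immutable expressions):

    -- Collapse the JSON dates array into its min/max as a daterange.
    -- Declared IMMUTABLE on the assumption that the strings are always
    -- ISO format, so the text -> date cast cannot change meaning.
    CREATE FUNCTION attrs_daterange(attrs jsonb) RETURNS daterange
    LANGUAGE sql IMMUTABLE AS $$
        SELECT daterange(min(d::date), max(d::date), '[]')
        FROM jsonb_array_elements_text(attrs -> 'dates') AS t(d)
    $$;

    ALTER TABLE test ADD COLUMN dates_range daterange
        GENERATED ALWAYS AS (attrs_daterange(attrs)) STORED;

    CREATE INDEX test_dates_range_idx ON test USING GIST (dates_range);

    -- Range operators replace the jsonpath query:
    SELECT docid FROM test WHERE dates_range && daterange('2020-10-02', NULL);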

David J.