Thread:
Hi,
I'm trying to build some numerical processing algorithms on Postgres Array data types. Using PL/Python I can get access to the numpy libraries but the performance is not great. A guess is that there is a lot of overhead going from Postgres -> Python List -> Numpy and back again.
I'd like to test if that's the issue, and potentially fix by creating a C extension to convert directly from Postgres Array types to Numpy Array types. I _think_ I have the C side somewhat working, but I can't get Postgres to use the transform.
What I have:
---
CREATE FUNCTION arr_to_np(val internal) RETURNS internal LANGUAGE C AS 'MODULE_PATHNAME', 'arr_to_np';
CREATE FUNCTION np_to_arr(val internal) RETURNS real[] LANGUAGE C AS 'MODULE_PATHNAME', 'np_to_arr';
CREATE TRANSFORM FOR real[] LANGUAGE plpythonu (
FROM SQL WITH FUNCTION arr_to_np(internal),
TO SQL WITH FUNCTION np_to_arr(internal)
);
CREATE FUNCTION fn (a integer[]) RETURNS integer
TRANSFORM FOR TYPE real[]
AS $$ return a $$ LANGUAGE plpythonu;
----
The problem is this produces an error that transforms for type "real" doesn't work. It doesn't seem to allow for transforms on array's as opposed to underlying types. Is it possible to tell it to apply the transform to the array?
Thanks,
Mark
Mark Teper <mark.teper@gmail.com> writes: > The problem is this produces an error that transforms for type "real" > doesn't work. It doesn't seem to allow for transforms on array's as > opposed to underlying types. Is it possible to tell it to apply the > transform to the array? Yeah, see PLy_input_setup_func and PLy_output_setup_func, which both say * Choose conversion method. Note that transform functions are checked * for composite and scalar types, but not for arrays or domains. This is * somewhat historical, but we'd have a problem allowing them on domains, * since we drill down through all levels of a domain nest without looking * at the intermediate levels at all. At least for arrays, it might be sufficient to switch the order of the array-lookup and transform-lookup cases to fix this. I don't think anyone's felt motivated to look into that, up to now. regards, tom lane
Dear Mark,
I am also looking around to find way how to do effectively (in parallel?) computations (with potentially large and sometimes sparse) matrices directly in PostgreSQL. Some time ago I have found this experimental extension https://github.com/PandaPost/panda_post which "allow you to represent Python NumPy/Pandas objects in Postgres". There is also http://madlib.apache.org/ but is seems too much "heavyweight" for my use-case.
Now I do not have much time to spent with this, but I hope in late summer it will be better. I am looking forward what will be your conclusions.
Good luck, Jiří.
Hi,
I'm trying to build some numerical processing algorithms on Postgres Array data types. Using PL/Python I can get access to the numpy libraries but the performance is not great. A guess is that there is a lot of overhead going from Postgres -> Python List -> Numpy and back again.
I'd like to test if that's the issue, and potentially fix by creating a C extension to convert directly from Postgres Array types to Numpy Array types. I _think_ I have the C side somewhat working, but I can't get Postgres to use the transform.
What I have:
---
CREATE FUNCTION arr_to_np(val internal) RETURNS internal LANGUAGE C AS 'MODULE_PATHNAME', 'arr_to_np';
CREATE FUNCTION np_to_arr(val internal) RETURNS real[] LANGUAGE C AS 'MODULE_PATHNAME', 'np_to_arr';
CREATE TRANSFORM FOR real[] LANGUAGE plpythonu (
FROM SQL WITH FUNCTION arr_to_np(internal),
TO SQL WITH FUNCTION np_to_arr(internal)
);
CREATE FUNCTION fn (a integer[]) RETURNS integer
TRANSFORM FOR TYPE real[]
AS $$ return a $$ LANGUAGE plpythonu;
----
The problem is this produces an error that transforms for type "real" doesn't work. It doesn't seem to allow for transforms on array's as opposed to underlying types. Is it possible to tell it to apply the transform to the array?
Thanks,
Mark
Hi,
is the Python code running inside the PostgreSQL instance?
If not, the performance penalty might only get marginally better because it always involves to copy the data to the client and back again.
Mit freundlichen Grüßen
Stefan Sonnenberg-Carstens
Teamleiter Softwareentwicklung WECO / DevOps Engineer
B.Sc. Wirtschaftsinformatik
OPHARDT HYGIENE-TECHNIK GmbH + Co. KG
Lindenau 27
D - 47661 Issum
Tel: +49 2835 18-492
Fax: +49 2835 18-28492
Mail: ssonnenberg@ophardt.com
www.OPHARDT.com
Handelsregistereintrag beim Amtsgericht Kleve:
Kommanditgesellschaft HRA 1681, Komplementärin:
OPHARDT HYGIENE-TECHNIK VerwaltungsGmbH HRB 3910
Geschäftsführer: Thomas Houcken
USt.-Identifikations-Nr.: DE813067257
CONFIDENTIALITY NOTICE:
This e-mail communication and any attachments may contain confidential and privileged information for the use
of the designated recipients named above. Any unauthorized review, use, disclosure or distribution is prohibited.
If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies
of the original message.
Von: Mark Teper [mailto:mark.teper@gmail.com]
Gesendet: Donnerstag, 2. Mai 2019 19:51
An: pgsql-sql@lists.postgresql.org
Betreff:
Hi,
I'm trying to build some numerical processing algorithms on Postgres Array data types. Using PL/Python I can get access to the numpy libraries but the performance is not great. A guess is that there is a lot of overhead going from Postgres -> Python List -> Numpy and back again.
I'd like to test if that's the issue, and potentially fix by creating a C extension to convert directly from Postgres Array types to Numpy Array types. I _think_ I have the C side somewhat working, but I can't get Postgres to use the transform.
What I have:
---
CREATE FUNCTION arr_to_np(val internal) RETURNS internal LANGUAGE C AS 'MODULE_PATHNAME', 'arr_to_np';
CREATE FUNCTION np_to_arr(val internal) RETURNS real[] LANGUAGE C AS 'MODULE_PATHNAME', 'np_to_arr';
CREATE TRANSFORM FOR real[] LANGUAGE plpythonu (
FROM SQL WITH FUNCTION arr_to_np(internal),
TO SQL WITH FUNCTION np_to_arr(internal)
);
CREATE FUNCTION fn (a integer[]) RETURNS integer
TRANSFORM FOR TYPE real[]
AS $$ return a $$ LANGUAGE plpythonu;
----
The problem is this produces an error that transforms for type "real" doesn't work. It doesn't seem to allow for transforms on array's as opposed to underlying types. Is it possible to tell it to apply the transform to the array?
Thanks,
Mark
Attachment
Hi all,I'm using PostgreSQL 13.I have a table:CREATE TABLE test(docid integer, jsonb attrs);So, attrs contains data like..."dates": ["2019-10-02", "2018-02-03"]...So, I want to SELECT all docids, which dates in range:SELECT attrs FROM document_resinfo WHERE attrs @? '$.dates[*].datetime() ? (@ >= "2020-10-02".datetime())';How can I create index on attrs field to query docids with other date? Thanks.
..."dates": ["2019-10-02", "2018-02-03"]...So, I want to SELECT all docids, which dates in range:So, searching on dates field will be often and I want speed up by indexing but I don't known how.