Thread: python module pre-import to avoid importing each time
Hey List,
I use plpython with postgis and 2 python modules (numpy and shapely).
Sadly, importing such a module in a plpython function is very slow
(several hundred milliseconds).
I also don't know if this overhead is applied each time the function is
called in the same session.
Is there a way to pre-import those modules once and for all,
so that the python functions run faster?
Thanks,
Cheers,
Rémi-C
On Thu, Jun 19, 2014 at 7:50 AM, Rémi Cura <remi.cura@gmail.com> wrote:
> Hey List,
>
> I use plpython with postgis and 2 python modules (numpy and shapely).
> Sadly, importing such a module in a plpython function is very slow
> (several hundred milliseconds).

Is that mostly shapely (which I don't have)? numpy seems to be pretty
fast, like 16ms. But that is still slow for what you want, perhaps.

> I also don't know if this overhead is applied each time the function is
> called in the same session.

It is not. The overhead is once per connection, not once per call.
So using a connection pooler could really be a help here.

> Is there a way to pre-import those modules once and for all,
> so that the python functions run faster?

I don't think there is. With plperl you can do this by loading the
module in plperl.on_init and by putting plperl into
shared_preload_libraries, so that this happens just at server start up.
But I don't see a way to do something analogous for plpython, due to the
lack of a plpython.on_init. I think that is because the infrastructure
to do that is part of making a "trusted" version of the language, which
python doesn't have. (But it could just be that no one has ever gotten
around to adding it.)

Cheers,
Jeff
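For reference, the plperl mechanism Jeff describes amounts to two lines
in postgresql.conf. This is only a sketch for comparison (the preloaded
Perl module is a made-up example, and plpython has no equivalent
setting):

    # load the plperl library itself at server start
    shared_preload_libraries = 'plperl'
    # Perl code run once, when the plperl interpreter is initialized
    plperl.on_init = 'use Geo::WKT;'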
Hey,
thanks for your answer!
Yep, you are right: the functions I would like to test are going to be
called a lot (100k times), so even 15 ms per call matters.
Cheers,
Rémi-C
On 06/26/2014 02:14 AM, Rémi Cura wrote:
> Hey,
> thanks for your answer!
>
> Yep, you are right: the functions I would like to test are going to be
> called a lot (100k times), so even 15 ms per call matters.
>
> I'm still a bit confused by a topic I found here:
> http://stackoverflow.com/questions/15023080/how-are-import-statements-in-plpython-handled
>
> The answer gives a trick to avoid importing each time, so somehow it
> must be useful.

Peter's answer is based on using the global dictionary SD to store an
imported library. For more information see here:

http://www.postgresql.org/docs/9.3/interactive/plpython-sharing.html

> On another internet page (can't find it anymore) somebody mentioned this
> module loading at server startup, one way or another, but gave no
> precision. It seems that the "plpy" python module gets loaded by default;
> wouldn't it be possible to hack this module to add other imports inside it?

In a sense that is what is being suggested above.

> I also use PL/R (untrusted I guess) and you can create a special table
> to indicate which modules to load at startup.
>
> Cheers,
> Rémi-C

--
Adrian Klaver
adrian.klaver@aklaver.com
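To make the SD trick concrete, here is a minimal sketch following the
plpython-sharing pattern linked above. The function name and the
geometry operation are made up for illustration:

CREATE OR REPLACE FUNCTION buffered_area(geom_wkb bytea) RETURNS float8 AS $$
    # Fetch the cached module from the SD dictionary; fall back to a
    # real import only on the first call in this session.
    if 'wkb' in SD:
        wkb = SD['wkb']
    else:
        from shapely import wkb
        SD['wkb'] = wkb
    # Example operation: buffer the geometry and return its area.
    return wkb.loads(geom_wkb).buffer(1.0).area
$$ LANGUAGE plpythonu;

Called as usual from SQL, e.g.:

SELECT buffered_area(ST_AsBinary(geom)) FROM my_big_table;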
On Thu, Jun 26, 2014 at 2:14 AM, Rémi Cura <remi.cura@gmail.com> wrote:
> Hey,
> thanks for your answer!
>
> Yep, you are right: the functions I would like to test are going to be
> called a lot (100k times), so even 15 ms per call matters.
>
> I'm still a bit confused by a topic I found here:
> http://stackoverflow.com/questions/15023080/how-are-import-statements-in-plpython-handled
>
> The answer gives a trick to avoid importing each time, so somehow it
> must be useful.

I'd want to see the benchmark before deciding how useful it actually
is... Anyway, that seems to be about calling import over and over
within the same connection, not between different connections, as is
your issue.

Also, I think that that suggestion is targeted at removing what is
already a very minor overhead, which is importing the symbols from the
module into the importer's namespace (or however you translate that
into python speak). The slow part is loading the module in the first
place (finding the shared objects, parsing the module's source code,
gluing them together, etc.), not importing the python symbols. If you
arrange to re-use connections, you will probably find no further
optimization is needed.

> On another internet page (can't find it anymore) somebody mentioned this
> module loading at server startup, one way or another, but gave no
> precision. It seems that the "plpy" python module gets loaded by default;
> wouldn't it be possible to hack this module to add other imports inside it?

I just thought your question looked lonely and that I'd tell you what I
learned about plperl in case it helped. There may be a way to do about
the same thing in plpython, but if so it doesn't seem to be documented,
or analogous to the way plperl does it. I'm afraid that exhausts my
knowledge of plpython. I don't see any files that suggest there is a
user-editable plpy.py module. If you are willing to monkey around with
C and recompiling, you could probably make it happen somehow, though.

Cheers,
Jeff
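A quick way to see Jeff's distinction outside PostgreSQL is to time a
first and a repeated import in a plain Python session (timings are
illustrative and machine-dependent):

import time

t0 = time.time()
import numpy            # first import: find, parse and initialize the module
print("first import:  %.1f ms" % ((time.time() - t0) * 1000))

t0 = time.time()
import numpy            # repeat import: just a sys.modules lookup, near-free
print("repeat import: %.3f ms" % ((time.time() - t0) * 1000))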
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 06/26/2014 02:14 AM, Rémi Cura wrote:
>> On another internet page (can't find it anymore) somebody mentioned this
>> module loading at server startup, one way or another, but gave no
>> precision. It seems that the "plpy" python module gets loaded by default;
>> wouldn't it be possible to hack this module to add other imports inside it?

> In a sense that is what is being suggested above.

IIRC, plperl has a GUC you can set to tell it to do things at the time
it's loaded (which of course you use in combination with having listed
plperl in shared_preload_libraries). There's no reason except lack of
round tuits why plpython couldn't have a similar feature.

            regards, tom lane
On 06/26/2014 02:14 AM, Rémi Cura wrote:
> Hey,
> thanks for your answer!
>
> Yep, you are right: the functions I would like to test are going to be
> called a lot (100k times), so even 15 ms per call matters.

I got to thinking about this.

100K over what time frame?

How is it being called?

--
Adrian Klaver
adrian.klaver@aklaver.com
Hey,
thanks, now we have good information: unlike plperl or PL/R, there is no
easy way to preload packages in plpython.
There may be some solutions to make the import happen at connection
start, but it would involve C modifications (I found no trace of a
python file or hackable sql script in the postgres source and install
directories).
After that, further optimization is possible by avoiding the useless
'import' (because the module is already loaded) (see the trick here).
My use case is simple geometry manipulation functions. It is easier to
use plpython rather than plpgsql thanks to numpy for vector
manipulation. Usually the functions are called inside a complex query
with many CTEs and executed over 100k+ rows. Total execution time is on
the order of minutes. (Example query at the end.)
Thanks everybody,
Rémi
Example query:

CREATE TABLE holding_result AS
WITH the_geom AS (
    SELECT gid, geom
    FROM my_big_table -- 200k rows
)
SELECT gid, my_python_function(geom) AS result
FROM the_geom;
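For illustration, my_python_function could look something like this,
combining numpy with the SD import cache shown earlier. This is only a
sketch: the body is hypothetical (it assumes linestring input, and that
shapely can read the hex-encoded (E)WKB text that plpython receives for
a geometry argument):

CREATE OR REPLACE FUNCTION my_python_function(geom geometry) RETURNS float8 AS $$
    # Cache the modules in SD so their import cost is paid once per session.
    if 'modules' in SD:
        np, wkb = SD['modules']
    else:
        import numpy as np
        from shapely import wkb
        SD['modules'] = (np, wkb)
    # Hypothetical vector manipulation: mean segment length, via numpy.
    g = wkb.loads(geom, hex=True)
    coords = np.array(list(g.coords))
    segments = coords[1:] - coords[:-1]
    return float(np.sqrt((segments ** 2).sum(axis=1)).mean())
$$ LANGUAGE plpythonu;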