Thread: Re: psycopg3, prepared statements

Re: psycopg3, prepared statements

From
Adrian Klaver
Date:
On 12/21/20 6:24 AM, Daniele Varrazzo wrote:
> The one thing, the most requested thing in psycopg, is support for
> prepared statements.
> 
> In psycopg3 for the moment there is:
> 
> - very low level support for prepared statement, i.e. wrapping of
> libpq functions such as PQsendPrepare/PQsendQueryPrepared
> (https://www.postgresql.org/docs/current/libpq-async.html#LIBPQ-PQSENDPREPARE)
> - automatic use of prepared statements in `cursor.executemany()`,
> which might eventually stop sucking.
> 
> Gathering some ideas:
> 
> Prepared statements in the server are per session, so any form of
> cache is better connected to the connection than the cursor, although
> the cursors are the obvious interface to give commands.
> 
> In the past [1] I thought about exposing explicit prepare/deallocate
> on the cursor, and it was a single prepared query per cursor. A
> `cursor.prepare(query)` with no args doesn't have types information
> though: if any it should take an optional array of parameters to get
> types from.
> 
> What I'm thinking about is to prepare queries automatically with a schema such:
> 
> - decisions are made after the query is transformed to postgres format
> (i.e. it is reduced to bytes, all the client-side manipulations have
> been done, placeholders have been transformed to $ format). There is
> an object in psycopg3 that takes care of this transformation [2]
> - the number of times a query is seen is stored in a LRU cache on the connection
> - if a query is seen more than `connection.prepare_threshold` times
> (proposed default: 5) then it is prepared with the name
> f'pg3_{hash(query)}' and the following executions are prepared.
> - if more than `connection.prepared_number` queries are prepared, the
> one used least recently is deallocated and evicted from the cache
> (proposed default: 100).
> - Parameters may be fudged on the connection: prepared_threshold=0
> would prepare all queries, prepared_threshold=None would disable
> preparing.
> - For the control freak, cursor.execute(query, params, prepare=True)
> would prepare the query immediately, if it isn't already,
> prepare=False would avoid preparation. The default None would enable
> the automatic choice.

So your plan from [1] was like that in plpythonu where the process is 
broken down into two parts. What I'm not following is whether that is 
still the plan or whether prepare/execute is going to happen without the 
cursor.prepare() and just be automatic on cursor.execute()? With the 
provision to override per cursor.

> 
> [1] https://gist.github.com/dvarrazzo/3797445
> [2] https://github.com/psycopg/psycopg3/blob/c790a832/psycopg3/psycopg3/_queries.py#L27
> 
> What do you think?
> 
> Cheers
> 
> -- Daniele
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: psycopg3, prepared statements

From
Daniele Varrazzo
Date:
On Mon, 21 Dec 2020 at 16:02, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> So your plan from [1] was like that in plpythonu where the process is
> broken down into two parts.

Is there anything useful to learn from the plpythonu experience? I'm
not very familiar with it.


> What I'm not following is whether that is
> still the plan or whether prepare/execute is going to happen without the
> cursor.prepare() and just be automatic on cursor.execute()? With the
> provision to override per cursor.

I would do without `cursor.prepare(query)`. As a starter it should be
`cursor.prepare(query, args)` in order to sniff the types, so you
would also need to pass it a sample of data (or the types). So now 1)
as interface it's very similar to `cursor.execute(query, args)`, but
it would prepare and not run the query, and 2) it's a moment in your
program's lifetime where you know the query to run and the arguments:
most likely now you want to run the query too, not only to prepare it.
So the lack of a function "prepare the query, with these types, but
don't run it" doesn't seem a very likely one.

I am also assuming that, if the assumption is that "preparing queries
is good", it would be better to let it happen automatically instead of
asking the people to do it on their own.

Does it make sense?

----

I mistakenly replied Adrian privately. Following, his reply.

On Mon, 21 Dec 2020 at 16:45, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> Is there anything useful to learn from the plpythonu experience? I'm
> not very familiar with it.

The user visible portion can be found here:

https://www.postgresql.org/docs/current/plpython-database.html

The most notable part is that it requires type definition(s) for
parameters in the prepare portion. That and a prepared query can be
passed around to different execute and different function calls.


> I would do without `cursor.prepare(query)`. [...]

If I read the above correctly this boils down to cursor.execute(query,
args) will always be prepared once the
connection.prepare_threshold(default 5)is passed and evicted after
connection.prepared_number(default 100) other queries are run.

> Does it make sense?

I think so. Personally, I prefer the two step approach as I am becoming
less and less enthusiastic about hidden 'magic'. To that end a
global(maybe connection) setting that would disable prepare would be nice.

----

To which, 1) thank you very much, Adrian, for the plpython pointer,
I'll take a look at it.

2) About disabling the automatic prepare: the mechanism I have in mind
is to set prepare_threshold to None on the connection instance;
however we could make sure to have the default attribute defined on
the class: this way who really hates the idea of prepared statements
can be cheeky and set `psycopg3.Connection.prepare_threshold = None`
instead of `myconn.prepare_threshold`... More seriously, if there is a
large base of people who think that something can go wrong with
prepared statement we can either provide a better interface to control
it globally or to have the feature opt-in.


-- Daniele



Re: psycopg3, prepared statements

From
Adrian Klaver
Date:
On 12/21/20 12:26 PM, Daniele Varrazzo wrote:
> On Mon, 21 Dec 2020 at 16:02, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> 

> 
> I mistakenly replied Adrian privately. Following, his reply.
> 

> I think so. Personally, I prefer the two step approach as I am becoming
> less and less enthusiastic about hidden 'magic'. To that end a
> global(maybe connection) setting that would disable prepare would be nice.
> 
> ----
> 
> To which, 1) thank you very much, Adrian, for the plpython pointer,
> I'll take a look at it.
> 
> 2) About disabling the automatic prepare: the mechanism I have in mind
> is to set prepare_threshold to None on the connection instance;

So the above is something the user would have to do on each connection?

> however we could make sure to have the default attribute defined on
> the class: this way who really hates the idea of prepared statements
> can be cheeky and set `psycopg3.Connection.prepare_threshold = None`

I could get behind that. This all may be premature optimization on my 
part. As long as there is some way to turn it off at some level I could 
live with it.

> instead of `myconn.prepare_threshold`... More seriously, if there is a
> large base of people who think that something can go wrong with
> prepared statement we can either provide a better interface to control
> it globally or to have the feature opt-in.
> 
> 
> -- Daniele
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: psycopg3, prepared statements

From
Vladimir Ryabtsev
Date:
> it's a moment in your
> program's lifetime where you know the query to run and the arguments:
> most likely now you want to run the query too, not only to prepare it.

I am just thinking about type issues in some edge cases...
What if the data is such that in the first row it has some small integers,
but on subsequent rows it has larger numbers that do not fit into "integer"?
There are probably other cases similar to this, including "None" values
on the moment you want to capture types... Maybe it is more reliable to
oblige the user, who knows their data better, to explicitly supply the types
for preparation... Will any type inference work in case of types mismatch?
If yes, to what extent?

Generally I think that automatic preparation is a great idea, but if it is achieved
by sacrificing reliability to any extent, I would prefer to have such a feature
switched off by default.

> - if more than `connection.prepared_number` queries are prepared, the
> one used least recently is deallocated and evicted from the cache
> (proposed default: 100).

Why do you need such logic? Why not just keep some limited number of
prepared statements? Is it a problem if a PS is in cache but rarely used?

--
Vlad


On Mon, 21 Dec 2020 at 13:55, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/21/20 12:26 PM, Daniele Varrazzo wrote:
> On Mon, 21 Dec 2020 at 16:02, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>

>
> I mistakenly replied Adrian privately. Following, his reply.
>

> I think so. Personally, I prefer the two step approach as I am becoming
> less and less enthusiastic about hidden 'magic'. To that end a
> global(maybe connection) setting that would disable prepare would be nice.
>
> ----
>
> To which, 1) thank you very much, Adrian, for the plpython pointer,
> I'll take a look at it.
>
> 2) About disabling the automatic prepare: the mechanism I have in mind
> is to set prepare_threshold to None on the connection instance;

So the above is something the user would have to do on each connection?

> however we could make sure to have the default attribute defined on
> the class: this way who really hates the idea of prepared statements
> can be cheeky and set `psycopg3.Connection.prepare_threshold = None`

I could get behind that. This all may be premature optimization on my
part. As long as there is some way to turn it off at some level I could
live with it.

> instead of `myconn.prepare_threshold`... More seriously, if there is a
> large base of people who think that something can go wrong with
> prepared statement we can either provide a better interface to control
> it globally or to have the feature opt-in.
>
>
> -- Daniele
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Aw: Re: psycopg3, prepared statements

From
Karsten Hilbert
Date:
 
>> - if more than `connection.prepared_number` queries are prepared, the> one
>> used least recently is deallocated and evicted from the cache
>> (proposed default: 100).
 
>Why do you need such logic? Why not just keep some limited number of
>prepared statements? Is it a problem if a PS is in cache but rarely used?

On resource constrained environments one wants to be
able to balance use vs disuse.

Here I agree with Adrian: automagic, yes but only if one can
opt out by default.

Karsten
 



Re: psycopg3, prepared statements

From
Daniele Varrazzo
Date:
On Tue, 22 Dec 2020 at 05:39, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
>
> > it's a moment in your
> > program's lifetime where you know the query to run and the arguments:
> > most likely now you want to run the query too, not only to prepare it.
>
> I am just thinking about type issues in some edge cases...
> What if the data is such that in the first row it has some small integers,
> but on subsequent rows it has larger numbers that do not fit into "integer"?
> There are probably other cases similar to this, including "None" values
> on the moment you want to capture types... Maybe it is more reliable to
> oblige the user, who knows their data better, to explicitly supply the types
> for preparation... Will any type inference work in case of types mismatch?
> If yes, to what extent?

The only case I have thought where this can happen is with None vs.
not None, in which case the oid info would be missing. Oid selection
is based only on the Python type, not on the specific values (for
instance Python datetime is always passed as timestamptz, never
timestamp, even if the tezinfo is missing, and it relies on Postgres
cast): I have avoided so far to make a choice based on the values
exactly to avoid to find ourselves in this type of situations.

I don't have in mind a case where a missing value would make a
difference but that doesn't mean that they don't exist. What I think
is that if in a certain context a NULL is acceptable and Python can
pass either a None or e.g. a date, preparing witn unknown or date oid
shouldn't make a difference. Of course it is possible to trigger
ambiguities: trivially the query `SELECT %s` could be prepared with a
date and then you can throw it a string that will make it fail. But if
a placeholder is part of an expression or target for an insert I
believe (but will test too) that preparing with unknown types will be
equivalent.

I am interested in real cases in which a prepared statement is
definitely undesirable. My current idea is to make the feature
automatic but with the possibility to disable it at global,
connection, statement level. If there is evidence that the feature is
dangerous we can leave it disabled by default and leave the
possibility to enable it in the same contexts.

-- Daniele



>
> Generally I think that automatic preparation is a great idea, but if it is achieved
> by sacrificing reliability to any extent, I would prefer to have such a feature
> switched off by default.
>
> > - if more than `connection.prepared_number` queries are prepared, the
> > one used least recently is deallocated and evicted from the cache
> > (proposed default: 100).
>
> Why do you need such logic? Why not just keep some limited number of
> prepared statements? Is it a problem if a PS is in cache but rarely used?
>
> --
> Vlad
>
>
> On Mon, 21 Dec 2020 at 13:55, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> On 12/21/20 12:26 PM, Daniele Varrazzo wrote:
>> > On Mon, 21 Dec 2020 at 16:02, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> >
>>
>> >
>> > I mistakenly replied Adrian privately. Following, his reply.
>> >
>>
>> > I think so. Personally, I prefer the two step approach as I am becoming
>> > less and less enthusiastic about hidden 'magic'. To that end a
>> > global(maybe connection) setting that would disable prepare would be nice.
>> >
>> > ----
>> >
>> > To which, 1) thank you very much, Adrian, for the plpython pointer,
>> > I'll take a look at it.
>> >
>> > 2) About disabling the automatic prepare: the mechanism I have in mind
>> > is to set prepare_threshold to None on the connection instance;
>>
>> So the above is something the user would have to do on each connection?
>>
>> > however we could make sure to have the default attribute defined on
>> > the class: this way who really hates the idea of prepared statements
>> > can be cheeky and set `psycopg3.Connection.prepare_threshold = None`
>>
>> I could get behind that. This all may be premature optimization on my
>> part. As long as there is some way to turn it off at some level I could
>> live with it.
>>
>> > instead of `myconn.prepare_threshold`... More seriously, if there is a
>> > large base of people who think that something can go wrong with
>> > prepared statement we can either provide a better interface to control
>> > it globally or to have the feature opt-in.
>> >
>> >
>> > -- Daniele
>> >
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
>>
>>



Re: psycopg3, prepared statements

From
Daniele Varrazzo
Date:
On Tue, 22 Dec 2020 at 22:36, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
>
> On Tue, 22 Dec 2020 at 05:39, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:

> > I am just thinking about type issues in some edge cases...
> > What if the data is such that in the first row it has some small integers,
> > but on subsequent rows it has larger numbers that do not fit into "integer"?
> > There are probably other cases similar to this, including "None" values
> > on the moment you want to capture types... Maybe it is more reliable to
> > oblige the user, who knows their data better, to explicitly supply the types
> > for preparation... Will any type inference work in case of types mismatch?
> > If yes, to what extent?
>
> The only case I have thought where this can happen is with None vs.
> not None, in which case the oid info would be missing. Oid selection
> is based only on the Python type, not on the specific values (for
> instance Python datetime is always passed as timestamptz, never
> timestamp, even if the tezinfo is missing, and it relies on Postgres
> cast): I have avoided so far to make a choice based on the values
> exactly to avoid to find ourselves in this type of situations.
>
> I don't have in mind a case where a missing value would make a
> difference but that doesn't mean that they don't exist. What I think
> is that if in a certain context a NULL is acceptable and Python can
> pass either a None or e.g. a date, preparing witn unknown or date oid
> shouldn't make a difference. Of course it is possible to trigger
> ambiguities: trivially the query `SELECT %s` could be prepared with a
> date and then you can throw it a string that will make it fail. But if
> a placeholder is part of an expression or target for an insert I
> believe (but will test too) that preparing with unknown types will be
> equivalent.

Heads up about this: it's better than I thought!

I wrote a first implementation of the prepared statements cache using
the query as a key, but it's actually enough to use the (query, types)
tuple in order to tell apart statements that are executed with
different types. This way even the "SELECT %s" case won't be a
problem. Of course a statement executed with a mix of types will be
prepared later than `prepare_threshold`, but I think it's perfectly
acceptable: the case doesn't happen often and having the query
prepared after 10 times instead of 5 doesn't change much if it will be
executed hundreds of times or more.

What seems a feature-complete branch is available in [1]. The tests
[2] illustrate the main behaviour of the prepared statements system.

[1]: https://github.com/psycopg/psycopg3/tree/prepared-statements>.
[2]: https://github.com/psycopg/psycopg3/blob/prepared-statements/tests/test_prepared.py

Off to do some benchmarks now...

-- Daniele



Re: psycopg3, prepared statements

From
Adrian Klaver
Date:
On 12/23/20 2:53 PM, Daniele Varrazzo wrote:
> On Tue, 22 Dec 2020 at 22:36, Daniele Varrazzo
> <daniele.varrazzo@gmail.com> wrote:
>>
>> On Tue, 22 Dec 2020 at 05:39, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
> 

> 
> Heads up about this: it's better than I thought!
> 
> I wrote a first implementation of the prepared statements cache using
> the query as a key, but it's actually enough to use the (query, types)
> tuple in order to tell apart statements that are executed with
> different types. This way even the "SELECT %s" case won't be a
> problem. Of course a statement executed with a mix of types will be
> prepared later than `prepare_threshold`, but I think it's perfectly

Alright I was following you until you got to above. I'm not following 
why it would overshoot prepare_threshold?

> acceptable: the case doesn't happen often and having the query
> prepared after 10 times instead of 5 doesn't change much if it will be
> executed hundreds of times or more.
> 
> What seems a feature-complete branch is available in [1]. The tests
> [2] illustrate the main behaviour of the prepared statements system.
> 
> [1]: https://github.com/psycopg/psycopg3/tree/prepared-statements>.
> [2]: https://github.com/psycopg/psycopg3/blob/prepared-statements/tests/test_prepared.py
> 
> Off to do some benchmarks now...
> 
> -- Daniele
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: psycopg3, prepared statements

From
Vladimir Ryabtsev
Date:
Cause (query, types) can give more combinations than (query,)?


On Wed, 23 Dec 2020 at 15:12, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/23/20 2:53 PM, Daniele Varrazzo wrote:
> On Tue, 22 Dec 2020 at 22:36, Daniele Varrazzo
> <daniele.varrazzo@gmail.com> wrote:
>>
>> On Tue, 22 Dec 2020 at 05:39, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
>

>
> Heads up about this: it's better than I thought!
>
> I wrote a first implementation of the prepared statements cache using
> the query as a key, but it's actually enough to use the (query, types)
> tuple in order to tell apart statements that are executed with
> different types. This way even the "SELECT %s" case won't be a
> problem. Of course a statement executed with a mix of types will be
> prepared later than `prepare_threshold`, but I think it's perfectly

Alright I was following you until you got to above. I'm not following
why it would overshoot prepare_threshold?

> acceptable: the case doesn't happen often and having the query
> prepared after 10 times instead of 5 doesn't change much if it will be
> executed hundreds of times or more.
>
> What seems a feature-complete branch is available in [1]. The tests
> [2] illustrate the main behaviour of the prepared statements system.
>
> [1]: https://github.com/psycopg/psycopg3/tree/prepared-statements>.
> [2]: https://github.com/psycopg/psycopg3/blob/prepared-statements/tests/test_prepared.py
>
> Off to do some benchmarks now...
>
> -- Daniele
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: psycopg3, prepared statements

From
Daniele Varrazzo
Date:
On Wed, 23 Dec 2020 at 23:23, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
>
> Cause (query, types) can give more combinations than (query,)?

Yes, that's the reason

    In [1]: import psycopg3
    In [2]: cnn = psycopg3.connect()
    In [3]: cnn.prepare_threshold = 2

    In [4]: cnn.execute("select 1 + %s", [1]).fetchone()
    Out[4]: (2,)
    In [5]: cnn.execute("select 1 + %s", [None]).fetchone()
    Out[5]: (None,)
    In [7]: cnn.execute("select 1 + %s", [2]).fetchone()
    Out[7]: (3,)

After 3 times the expression should have been prepared, but the tally
has been spread in two values (0 is unknown oid, 20 is int oid).

    In [8]: cnn._prepared_statements
    Out[8]: OrderedDict([((b'select 1 + $1', (0,)), 1), ((b'select 1 +
$1', (20,)), 2)])

    In [9]: cnn.execute("select 1 + %s", [3]).fetchone()
    Out[9]: (4,)

However, when either key passes the threshold, eventually preparation happens.

    In [10]: cnn._prepared_statements
    Out[10]:
    OrderedDict([((b'select 1 + $1', (0,)), 1),
                ((b'select 1 + $1', (20,)), b'_pg3_0')])

_pg3_0 is the name under which that combination of query and types is
now prepared (it is local per session).

-- Daniele



Re: psycopg3, prepared statements

From
Adrian Klaver
Date:
On 12/23/20 4:14 PM, Daniele Varrazzo wrote:
> On Wed, 23 Dec 2020 at 23:23, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
>>
>> Cause (query, types) can give more combinations than (query,)?
> 
> Yes, that's the reason
> 
>      In [1]: import psycopg3
>      In [2]: cnn = psycopg3.connect()
>      In [3]: cnn.prepare_threshold = 2
> 
>      In [4]: cnn.execute("select 1 + %s", [1]).fetchone()
>      Out[4]: (2,)
>      In [5]: cnn.execute("select 1 + %s", [None]).fetchone()
>      Out[5]: (None,)
>      In [7]: cnn.execute("select 1 + %s", [2]).fetchone()
>      Out[7]: (3,)

Alright, I was misunderstanding. I thought you where referring to 
something like:

cur.execute("select * from some table where id > %s and user_name = %s 
and current = %s", [2, 'aklaver', True])

> 
> After 3 times the expression should have been prepared, but the tally
> has been spread in two values (0 is unknown oid, 20 is int oid).
> 
>      In [8]: cnn._prepared_statements
>      Out[8]: OrderedDict([((b'select 1 + $1', (0,)), 1), ((b'select 1 +
> $1', (20,)), 2)])
> 
>      In [9]: cnn.execute("select 1 + %s", [3]).fetchone()
>      Out[9]: (4,)
> 
> However, when either key passes the threshold, eventually preparation happens.
> 
>      In [10]: cnn._prepared_statements
>      Out[10]:
>      OrderedDict([((b'select 1 + $1', (0,)), 1),
>                  ((b'select 1 + $1', (20,)), b'_pg3_0')])
> 
> _pg3_0 is the name under which that combination of query and types is
> now prepared (it is local per session).
> 
> -- Daniele
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com