Thread: psycopg3 and adaptation choices

psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
Hello,

As you may know I'm implementing a new psycopg version using
server-side argument binding. As a real-time test I'm writing a Django
backend for it, and running Django test suite. This has uncovered
unexpected behaviours caused by the different ways parameters are
adapted by the backend as opposite as passed inline in the query.

I wrote down my findings at
https://www.varrazzo.com/blog/2020/11/07/psycopg3-adaptation/ and
would be happy to get some feedback. I'll copy the writing here too:
please forget the reST.

Cheers,

-- Daniele

---

One of the main behavioural differences between psycopg2 and 3, and a big
enought cause of incompatibility to require a "3", is the use of server-side
binding vs. client-side (psycopg2 predates the FE-BE protocol supporting it).
Passing Python objects to the server as separate arguments leads sometimes to
a behaviour of the server different to what people are used to using psql and
psycopg2. An example: what do you think::

    cursor.execute("select %s, %s", ["hello", 10])

should fetch? I assume it would be desirable to have a record ``("hello",
10)`` back - consistently to what you would return if you merged argument
client-side and queried for ``select 'hello', 10``. However keeping this
behaviour predictable proved already tricky. When passing arguments to merge
server-side, the client must pass their string (or binary) representation and,
optionally, the oid of their types. The most intuitive thing to do would be to
associate the text type oid (25) to Python strings. This works well for
queries like the above one, but it proves too strict a type definition in
context where a cast from text is not allowed. For instance this will fail::

    cursor.execute("create table testjson(data jsonb)")
    cursor.execute("insert into testjson (data) values (%s)", ["{}"])

The latter will raise the same exception you would get in psql if you execute
``insert into testjson (data) values ('{}'::text)``. The reason why a ``values
('{}')`` works as expected in psql is because the untyped literal is interpreted
as unknown, and the server can always cast from unknown to any other type. If
we wanted to insert the data above the query should be rewritten as::

    cursor.execute("insert into testjson (data) values (%s::jsonb)", ["{}"])

About a week ago I started the task to write `a Django backend for
psycopg3`__: running its test suite is proving a great way to observe the
behaviour of server-side binding in a plethora of different environments. The
problem of the non-implicit cast from text to jsonb made porting the backend
from psycopg2 to 3 `quite a tricky matter`__.

.. __: https://github.com/dvarrazzo/django/commits/psycopg3
.. __: https://github.com/dvarrazzo/django/commit/1ca8b71ba9bc6acfe239dd42f751037644e59e13

In order to provide an experience more similar to the use of psql and of
psycopg2, one possibility is to not pass an oid type for the strings, leaving
it unknown. When I tried with that approach, passing strings to jsonb fields
(which is not necessarily what you would do, but it's what Django does it,
probably to uniform JSON dumping across different adapter) became immediately
much simpler. So it seems a good choice, but not an overall win: our "hello
world" query::

    cursor.execute("select %s, %s", ["hello", 10])

fails with PostgreSQL 9.6 and older versions, returning an error such as
*could not determine data type of parameter $1*. PostgreSQL 10 and following
are more permissive and convert unknown to text on output.

It seems like there is a choice to make here: mine at the moment is to keep
the latter behaviour, both because that's what allows to write the simpler
code with the majority of the supported PostgreSQL versions. People using 9.6
and previous version would need to adjust to::

    cursor.execute("select %s::text, %s", ["hello", 10])

but the use case of inserting data into tables seems a more important use case
than parroting back a string after a database roundtrip.

What about the numbers? That's a problem thorny in a different way. Python
integers are of arbitrary size, so they map better to a subset of the
``numeric`` type than to ``int4`` or ``int8``. However there is a similar
problem, where seemingly familiar expression don't behave as expected. If this
works in psql::

    select data -> 'a' -> 2 from testjson;

a similar statement with 2 as a parameter will fail with an error like
*operator does not exist: jsonb -> numeric*. Passing unknown as oid would
work, but now our greeting would return instead ``('hello', '10')``, with the
number converted to string. This is probably too much of a departure from the
expected behaviour; at least, unlike the test-to-json case, there is an
automatic cast from ``numeric`` to integer, so an ``insert into
integer_field`` works as expected. Only operators and functions whose
arguments are declared ``integer`` and there is no ``numeric`` version trip
into an error: JSON's ``->``, binary operators such as ``>>`` and ``&``, and
few others::

    piro=# select current_date, current_date + 1;
     current_date |  ?column?
    --------------+------------
     2020-11-07   | 2020-11-08
    (1 row)

    piro=# select current_date + 1::numeric;
    ERROR:  operator does not exist: date + numeric
    LINE 1: select current_date + 1::numeric;
                                ^
    HINT:  No operator matches the given name and argument types. You might
    need to add explicit type casts.

Note that when Postgres says ``integer`` it means 4 bytes signed: defaulting
the Python ``int`` to the ``bigint`` oid solves no problem - the same
functions would fail the same way, and defaulting it to ``int4`` I feel it
would overflow too easily in a 64 bits world.

How to fix that? as the hint suggests, the user would have to either add a
cast to the query, which would look like::

    cur.execute("select current_date + %s::int", [offset])

or to specify an object that would get converted in Postgres to the oid of
``integer``: in psycopg3 I'm testing with adding subclasses of ``int`` called
``Int4``, ``Int8`` etc. to map more closely on the Postgres types::

    cur.execute("select current_date + %s", [Int4(offset)])

My choices so far are then:

- cast Python ``str`` to unknown by default (only alternative: ``text``)
- cast Python ``int`` to ``numeric`` by default (alternatives: ``int4``,
  ``int8``, ``unknown``).

Being the adaptation system flexible, people are able to override these
choices, but I don't expect many people to do it, and doing it process-wise
might cause interoperation problems across libraries. The behaviour
out-of-the-box is obviously important and I would like to get the tradeoffs
right.

What do you think?



Re: psycopg3 and adaptation choices

From
Adrian Klaver
Date:
On 11/7/20 8:32 AM, Daniele Varrazzo wrote:
> Hello,
> 
> As you may know I'm implementing a new psycopg version using
> server-side argument binding. As a real-time test I'm writing a Django
> backend for it, and running Django test suite. This has uncovered
> unexpected behaviours caused by the different ways parameters are
> adapted by the backend as opposite as passed inline in the query.
> 
> I wrote down my findings at
> https://www.varrazzo.com/blog/2020/11/07/psycopg3-adaptation/ and
> would be happy to get some feedback. I'll copy the writing here too:
> please forget the reST.
> 
> Cheers,
> 
> -- Daniele
> 
> ---
> 
> One of the main behavioural differences between psycopg2 and 3, and a big
> enought cause of incompatibility to require a "3", is the use of server-side
> binding vs. client-side (psycopg2 predates the FE-BE protocol supporting it).
> Passing Python objects to the server as separate arguments leads sometimes to
> a behaviour of the server different to what people are used to using psql and
> psycopg2. An example: what do you think::
> 
>      cursor.execute("select %s, %s", ["hello", 10])
> 
> should fetch? I assume it would be desirable to have a record ``("hello",
> 10)`` back - consistently to what you would return if you merged argument
> client-side and queried for ``select 'hello', 10``. However keeping this
> behaviour predictable proved already tricky. When passing arguments to merge
> server-side, the client must pass their string (or binary) representation and,
> optionally, the oid of their types. The most intuitive thing to do would be to
> associate the text type oid (25) to Python strings. This works well for
> queries like the above one, but it proves too strict a type definition in
> context where a cast from text is not allowed. For instance this will fail::
> 
>      cursor.execute("create table testjson(data jsonb)")
>      cursor.execute("insert into testjson (data) values (%s)", ["{}"])
> 
> The latter will raise the same exception you would get in psql if you execute
> ``insert into testjson (data) values ('{}'::text)``. The reason why a ``values
> ('{}')`` works as expected in psql is because the untyped literal is interpreted
> as unknown, and the server can always cast from unknown to any other type. If
> we wanted to insert the data above the query should be rewritten as::
> 
>      cursor.execute("insert into testjson (data) values (%s::jsonb)", ["{}"])
> 
> About a week ago I started the task to write `a Django backend for
> psycopg3`__: running its test suite is proving a great way to observe the
> behaviour of server-side binding in a plethora of different environments. The
> problem of the non-implicit cast from text to jsonb made porting the backend
> from psycopg2 to 3 `quite a tricky matter`__.
> 
> .. __: https://github.com/dvarrazzo/django/commits/psycopg3
> .. __: https://github.com/dvarrazzo/django/commit/1ca8b71ba9bc6acfe239dd42f751037644e59e13
> 
> In order to provide an experience more similar to the use of psql and of
> psycopg2, one possibility is to not pass an oid type for the strings, leaving
> it unknown. When I tried with that approach, passing strings to jsonb fields
> (which is not necessarily what you would do, but it's what Django does it,
> probably to uniform JSON dumping across different adapter) became immediately
> much simpler. So it seems a good choice, but not an overall win: our "hello
> world" query::
> 
>      cursor.execute("select %s, %s", ["hello", 10])
> 
> fails with PostgreSQL 9.6 and older versions, returning an error such as
> *could not determine data type of parameter $1*. PostgreSQL 10 and following
> are more permissive and convert unknown to text on output.
> 
> It seems like there is a choice to make here: mine at the moment is to keep
> the latter behaviour, both because that's what allows to write the simpler
> code with the majority of the supported PostgreSQL versions. People using 9.6
> and previous version would need to adjust to::
> 
>      cursor.execute("select %s::text, %s", ["hello", 10])
> 
> but the use case of inserting data into tables seems a more important use case
> than parroting back a string after a database roundtrip.
> 
> What about the numbers? That's a problem thorny in a different way. Python
> integers are of arbitrary size, so they map better to a subset of the
> ``numeric`` type than to ``int4`` or ``int8``. However there is a similar
> problem, where seemingly familiar expression don't behave as expected. If this
> works in psql::
> 
>      select data -> 'a' -> 2 from testjson;
> 
> a similar statement with 2 as a parameter will fail with an error like
> *operator does not exist: jsonb -> numeric*. Passing unknown as oid would
> work, but now our greeting would return instead ``('hello', '10')``, with the
> number converted to string. This is probably too much of a departure from the
> expected behaviour; at least, unlike the test-to-json case, there is an
> automatic cast from ``numeric`` to integer, so an ``insert into
> integer_field`` works as expected. Only operators and functions whose
> arguments are declared ``integer`` and there is no ``numeric`` version trip
> into an error: JSON's ``->``, binary operators such as ``>>`` and ``&``, and
> few others::
> 
>      piro=# select current_date, current_date + 1;
>       current_date |  ?column?
>      --------------+------------
>       2020-11-07   | 2020-11-08
>      (1 row)
> 
>      piro=# select current_date + 1::numeric;
>      ERROR:  operator does not exist: date + numeric
>      LINE 1: select current_date + 1::numeric;
>                                  ^
>      HINT:  No operator matches the given name and argument types. You might
>      need to add explicit type casts.
> 
> Note that when Postgres says ``integer`` it means 4 bytes signed: defaulting
> the Python ``int`` to the ``bigint`` oid solves no problem - the same
> functions would fail the same way, and defaulting it to ``int4`` I feel it
> would overflow too easily in a 64 bits world.
> 
> How to fix that? as the hint suggests, the user would have to either add a
> cast to the query, which would look like::
> 
>      cur.execute("select current_date + %s::int", [offset])
> 
> or to specify an object that would get converted in Postgres to the oid of
> ``integer``: in psycopg3 I'm testing with adding subclasses of ``int`` called
> ``Int4``, ``Int8`` etc. to map more closely on the Postgres types::
> 
>      cur.execute("select current_date + %s", [Int4(offset)])
> 
> My choices so far are then:
> 
> - cast Python ``str`` to unknown by default (only alternative: ``text``)
> - cast Python ``int`` to ``numeric`` by default (alternatives: ``int4``,
>    ``int8``, ``unknown``).
> 
> Being the adaptation system flexible, people are able to override these
> choices, but I don't expect many people to do it, and doing it process-wise
> might cause interoperation problems across libraries. The behaviour
> out-of-the-box is obviously important and I would like to get the tradeoffs
> right.
> 
> What do you think?

If I'm following correctly in psycopg2 the adapter does type adaption on 
the client side and passes that to server with oid for processing. In 
psycopg3 you are proposing to let the server do more of the type 
adaption and to that end you are looking for lowest common denominator 
type to pass to server and then let it do the final casting. Is that 
about right or am I way off base?


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
On Sun, 8 Nov 2020 at 18:43, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> If I'm following correctly in psycopg2 the adapter does type adaption on
> the client side and passes that to server with oid for processing. In
> psycopg3 you are proposing to let the server do more of the type
> adaption and to that end you are looking for lowest common denominator
> type to pass to server and then let it do the final casting. Is that
> about right or am I way off base?

No, not really. In psycopg2 we compose a query entirely client-side,
and we pass it to the server with no oid indication, only as a big
literal, like it was typed all in psql. In the example of
`cursor.execute("select %s, %s", ["hello", 10])`, the server receives
a literal `select 'hello', 10` and has no idea that there were two
parameters.

In psycopg3 the idea is to use a more advanced protocol, which
separates query and parameters. It brings several benefits: can use
prepared statements (send a query once, several parameters later),
passing large data doesn't bloat the parser (the params don't hit the
lexer/parser), can use binary format (useful to pass large binary
blobs without escaping them in a textual form), the format of the data
is more homogeneous (no need to quoting), so we can use Python objects
in COPY instead of limiting the interface for the copy functions to
file-like objects only.

Both in psycopg2 and 3 there is an adaptation from Python types to
Postgres string representation. In pg2 there is additional quoting,
because apart from numbers and bools you need to quote a literal
string to merge it to the query and make it syntactically valid.

-- Daniele



Re: psycopg3 and adaptation choices

From
Adrian Klaver
Date:
On 11/8/20 11:16 AM, Daniele Varrazzo wrote:
> On Sun, 8 Nov 2020 at 18:43, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> 
>> If I'm following correctly in psycopg2 the adapter does type adaption on
>> the client side and passes that to server with oid for processing. In
>> psycopg3 you are proposing to let the server do more of the type
>> adaption and to that end you are looking for lowest common denominator
>> type to pass to server and then let it do the final casting. Is that
>> about right or am I way off base?
> 
> No, not really. In psycopg2 we compose a query entirely client-side,
> and we pass it to the server with no oid indication, only as a big
> literal, like it was typed all in psql. In the example of
> `cursor.execute("select %s, %s", ["hello", 10])`, the server receives
> a literal `select 'hello', 10` and has no idea that there were two
> parameters.

Alright I understand now.
More below.

> 
> In psycopg3 the idea is to use a more advanced protocol, which
> separates query and parameters. It brings several benefits: can use
> prepared statements (send a query once, several parameters later),
> passing large data doesn't bloat the parser (the params don't hit the
> lexer/parser), can use binary format (useful to pass large binary
> blobs without escaping them in a textual form), the format of the data
> is more homogeneous (no need to quoting), so we can use Python objects
> in COPY instead of limiting the interface for the copy functions to
> file-like objects only.
> 
> Both in psycopg2 and 3 there is an adaptation from Python types to
> Postgres string representation. In pg2 there is additional quoting,
> because apart from numbers and bools you need to quote a literal
> string to merge it to the query and make it syntactically valid.

So the issue in the psycopg3 protocol is making the parameters that are 
passed in separately match up correctly in type to what the server is 
expecting(or can cast implicitly)?

> 
> -- Daniele
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
On Sun, 8 Nov 2020 at 20:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> Alright I understand now.
> More below.
>
> >
> > In psycopg3 the idea is to use a more advanced protocol, which
> > separates query and parameters. It brings several benefits: can use
> > prepared statements (send a query once, several parameters later),
> > passing large data doesn't bloat the parser (the params don't hit the
> > lexer/parser), can use binary format (useful to pass large binary
> > blobs without escaping them in a textual form), the format of the data
> > is more homogeneous (no need to quoting), so we can use Python objects
> > in COPY instead of limiting the interface for the copy functions to
> > file-like objects only.
> >
> > Both in psycopg2 and 3 there is an adaptation from Python types to
> > Postgres string representation. In pg2 there is additional quoting,
> > because apart from numbers and bools you need to quote a literal
> > string to merge it to the query and make it syntactically valid.
>
> So the issue in the psycopg3 protocol is making the parameters that are
> passed in separately match up correctly in type to what the server is
> expecting(or can cast implicitly)?

Yes, correct. What we have to choose is which Postgres oid to map to
each Python type.

Sometimes the mapping is trivial (e.g. `datetime.date` -> `date` in
Postgres, `uuid.UUID` -> `uuid`...)

Sometimes it might be ambiguous: is a `datetime.datetime` a
`timestamp` or a `timestamptz`? In some cases we don't care (here we
can say `timestamptz` no problem: if the Python datetime doesn't have
tzinfo, Postgres will use the `TimeZone` setting).

Sometimes it's messy: what Python type corresponds to a Postgres
`jsonb`? It might be a dict, or a list, or types that have other
representations too (numbers, strings, bools). In this case, as in
psycopg2, there can be a wrapper, e.g. `Json`, to tell psycopg that
this dict, or list, or whatever else, must be jsonified for the db.

When there are mismatches, sometimes the database cast rules help
(e.gi in the timestamp[tz] case). Sometimes not: if we say `text` to a
jsonb field, it will raise an error. Sometimes a cast is automatic on
inserting in a table but not on passing a function parameter.

Numbers are messy, as they usually are: Python has int, float,
Decimal, Postgres has int2, int4, int8, float4, float8, numeric. The
mappings float -> float8 and Decimal -> numeric are more or less
straightforward. `int` is not, as in Python it's unbounded. If you say
`select 10` in psql, the server understands "unknown type, but a
number", and can try if either int* or numeric fit the context. But we
don't have the help from the syntax that psql has: because 10 doesn't
have quotes, Postgres is sure that it is a number, and not a string,
but executing query/params separately we lose that expressivity: we
cannot quote the strings and not the number. So choices are:

1. If we specify `numeric` or `int8` as oid, inserting in an int field
in a table will work ok, but some functions/operators won't (e.g. "1
>> %s").
2. If we specify `int4` it would work for those few functions defined
as `integer`, but if we try to write a number that doesn't fit in 32
bits into a Postgres bigint field I assume something will overflow
along the way, even if both python and postgres can handle it.
3. If we specify `unknown` it might work more often, but
`cursor.execute("select %s", [10]) will return the string "10" instead
of a number.

So I wonder what's the best compromise to do here: the less bad seems
1. 3. might work in more contexts, but it's a very counterintuitive
behaviour, and roundtripping other objects (dates, uuid) works no
problem: they don't come back as strings.

-- Daniele



Re: psycopg3 and adaptation choices

From
Vladimir Ryabtsev
Date:
Hello,

From what I understood from your messages, I like passing 'unknown' for both strings and numbers.
Roundtripping parameters seems to be a less common case (with a possible fix if it's necessary).
Is there anything else that does not work or works counterintuitively with Python 'int' -> 'unknown'?

BTW, may I ask another question regarding parameters?
Don't you want to step away from '%s' syntax and use '$1, $2, ...' which seems to be more traditional in the database world?
'%s' feels like old-school string formatting, new server-side parameter binding may want to give some new impression.
Moreover, it appears more convenient when you have parameters numbered and can reuse them a few times in a query.

Vladimir

On Sun, 8 Nov 2020 at 14:22, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
On Sun, 8 Nov 2020 at 20:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> Alright I understand now.
> More below.
>
> >
> > In psycopg3 the idea is to use a more advanced protocol, which
> > separates query and parameters. It brings several benefits: can use
> > prepared statements (send a query once, several parameters later),
> > passing large data doesn't bloat the parser (the params don't hit the
> > lexer/parser), can use binary format (useful to pass large binary
> > blobs without escaping them in a textual form), the format of the data
> > is more homogeneous (no need to quoting), so we can use Python objects
> > in COPY instead of limiting the interface for the copy functions to
> > file-like objects only.
> >
> > Both in psycopg2 and 3 there is an adaptation from Python types to
> > Postgres string representation. In pg2 there is additional quoting,
> > because apart from numbers and bools you need to quote a literal
> > string to merge it to the query and make it syntactically valid.
>
> So the issue in the psycopg3 protocol is making the parameters that are
> passed in separately match up correctly in type to what the server is
> expecting(or can cast implicitly)?

Yes, correct. What we have to choose is which Postgres oid to map to
each Python type.

Sometimes the mapping is trivial (e.g. `datetime.date` -> `date` in
Postgres, `uuid.UUID` -> `uuid`...)

Sometimes it might be ambiguous: is a `datetime.datetime` a
`timestamp` or a `timestamptz`? In some cases we don't care (here we
can say `timestamptz` no problem: if the Python datetime doesn't have
tzinfo, Postgres will use the `TimeZone` setting).

Sometimes it's messy: what Python type corresponds to a Postgres
`jsonb`? It might be a dict, or a list, or types that have other
representations too (numbers, strings, bools). In this case, as in
psycopg2, there can be a wrapper, e.g. `Json`, to tell psycopg that
this dict, or list, or whatever else, must be jsonified for the db.

When there are mismatches, sometimes the database cast rules help
(e.gi in the timestamp[tz] case). Sometimes not: if we say `text` to a
jsonb field, it will raise an error. Sometimes a cast is automatic on
inserting in a table but not on passing a function parameter.

Numbers are messy, as they usually are: Python has int, float,
Decimal, Postgres has int2, int4, int8, float4, float8, numeric. The
mappings float -> float8 and Decimal -> numeric are more or less
straightforward. `int` is not, as in Python it's unbounded. If you say
`select 10` in psql, the server understands "unknown type, but a
number", and can try if either int* or numeric fit the context. But we
don't have the help from the syntax that psql has: because 10 doesn't
have quotes, Postgres is sure that it is a number, and not a string,
but executing query/params separately we lose that expressivity: we
cannot quote the strings and not the number. So choices are:

1. If we specify `numeric` or `int8` as oid, inserting in an int field
in a table will work ok, but some functions/operators won't (e.g. "1
>> %s").
2. If we specify `int4` it would work for those few functions defined
as `integer`, but if we try to write a number that doesn't fit in 32
bits into a Postgres bigint field I assume something will overflow
along the way, even if both python and postgres can handle it.
3. If we specify `unknown` it might work more often, but
`cursor.execute("select %s", [10]) will return the string "10" instead
of a number.

So I wonder what's the best compromise to do here: the less bad seems
1. 3. might work in more contexts, but it's a very counterintuitive
behaviour, and roundtripping other objects (dates, uuid) works no
problem: they don't come back as strings.

-- Daniele


Re: psycopg3 and adaptation choices

From
Vladimir Ryabtsev
Date:
A wild idea: support both client-side (like in psycopg2) and server-side binding. Keep old '%s' syntax and provide a separate method for client-side binding (not in 'cur.execute()'). This could alleviate cases like parameters roundtripping and other cases of safe query composition. At the same time use '$N' for true server-side binding.
Is it an overcomplication or there are valid use-cases of that?


On Sun, 8 Nov 2020 at 18:19, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
Hello,

From what I understood from your messages, I like passing 'unknown' for both strings and numbers.
Roundtripping parameters seems to be a less common case (with a possible fix if it's necessary).
Is there anything else that does not work or works counterintuitively with Python 'int' -> 'unknown'?

BTW, may I ask another question regarding parameters?
Don't you want to step away from '%s' syntax and use '$1, $2, ...' which seems to be more traditional in the database world?
'%s' feels like old-school string formatting, new server-side parameter binding may want to give some new impression.
Moreover, it appears more convenient when you have parameters numbered and can reuse them a few times in a query.

Vladimir

On Sun, 8 Nov 2020 at 14:22, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
On Sun, 8 Nov 2020 at 20:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> Alright I understand now.
> More below.
>
> >
> > In psycopg3 the idea is to use a more advanced protocol, which
> > separates query and parameters. It brings several benefits: can use
> > prepared statements (send a query once, several parameters later),
> > passing large data doesn't bloat the parser (the params don't hit the
> > lexer/parser), can use binary format (useful to pass large binary
> > blobs without escaping them in a textual form), the format of the data
> > is more homogeneous (no need to quoting), so we can use Python objects
> > in COPY instead of limiting the interface for the copy functions to
> > file-like objects only.
> >
> > Both in psycopg2 and 3 there is an adaptation from Python types to
> > Postgres string representation. In pg2 there is additional quoting,
> > because apart from numbers and bools you need to quote a literal
> > string to merge it to the query and make it syntactically valid.
>
> So the issue in the psycopg3 protocol is making the parameters that are
> passed in separately match up correctly in type to what the server is
> expecting(or can cast implicitly)?

Yes, correct. What we have to choose is which Postgres oid to map to
each Python type.

Sometimes the mapping is trivial (e.g. `datetime.date` -> `date` in
Postgres, `uuid.UUID` -> `uuid`...)

Sometimes it might be ambiguous: is a `datetime.datetime` a
`timestamp` or a `timestamptz`? In some cases we don't care (here we
can say `timestamptz` no problem: if the Python datetime doesn't have
tzinfo, Postgres will use the `TimeZone` setting).

Sometimes it's messy: what Python type corresponds to a Postgres
`jsonb`? It might be a dict, or a list, or types that have other
representations too (numbers, strings, bools). In this case, as in
psycopg2, there can be a wrapper, e.g. `Json`, to tell psycopg that
this dict, or list, or whatever else, must be jsonified for the db.

When there are mismatches, sometimes the database cast rules help
(e.gi in the timestamp[tz] case). Sometimes not: if we say `text` to a
jsonb field, it will raise an error. Sometimes a cast is automatic on
inserting in a table but not on passing a function parameter.

Numbers are messy, as they usually are: Python has int, float,
Decimal, Postgres has int2, int4, int8, float4, float8, numeric. The
mappings float -> float8 and Decimal -> numeric are more or less
straightforward. `int` is not, as in Python it's unbounded. If you say
`select 10` in psql, the server understands "unknown type, but a
number", and can try if either int* or numeric fit the context. But we
don't have the help from the syntax that psql has: because 10 doesn't
have quotes, Postgres is sure that it is a number, and not a string,
but executing query/params separately we lose that expressivity: we
cannot quote the strings and not the number. So choices are:

1. If we specify `numeric` or `int8` as oid, inserting in an int field
in a table will work ok, but some functions/operators won't (e.g. "1
>> %s").
2. If we specify `int4` it would work for those few functions defined
as `integer`, but if we try to write a number that doesn't fit in 32
bits into a Postgres bigint field I assume something will overflow
along the way, even if both python and postgres can handle it.
3. If we specify `unknown` it might work more often, but
`cursor.execute("select %s", [10]) will return the string "10" instead
of a number.

So I wonder what's the best compromise to do here: the less bad seems
1. 3. might work in more contexts, but it's a very counterintuitive
behaviour, and roundtripping other objects (dates, uuid) works no
problem: they don't come back as strings.

-- Daniele


Re: psycopg3 and adaptation choices

From
Federico Di Gregorio
Date:
On 08/11/20 23:21, Daniele Varrazzo wrote:
[snip]
> 1. If we specify `numeric` or `int8` as oid, inserting in an int field
> in a table will work ok, but some functions/operators won't (e.g. "1
>>> %s").
> 2. If we specify `int4` it would work for those few functions defined
> as `integer`, but if we try to write a number that doesn't fit in 32
> bits into a Postgres bigint field I assume something will overflow
> along the way, even if both python and postgres can handle it.
> 3. If we specify `unknown` it might work more often, but
> `cursor.execute("select %s", [10]) will return the string "10" instead
> of a number.
> 
> So I wonder what's the best compromise to do here: the less bad seems
> 1. 3. might work in more contexts, but it's a very counterintuitive
> behaviour, and roundtripping other objects (dates, uuid) works no
> problem: they don't come back as strings.

Looking at what the adapters in other languages/frameworks do the common 
solution is to choose the "best fitting" type and let the programmer add 
a cast when needed. This is easier in statically typed languages where 
we have an almost perfect overlap between PostgreSQL and platform types 
but a bit more difficult in dynamic typed languages like Python where 
the available types are abstracted over the platform ones (numbers are a 
good example).

In your example I'd just go for int8 (the largest possible int in 
PostgreSQL). Decimal would probably be better (largest range) but it is 
not what the majority of people would expect. IMHO, oid is a bad idea 
because it has a very specific semantic and the error messages generated 
by PostgreSQL will be more confusing.

federico





Re: psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
On Mon, 9 Nov 2020 at 02:19, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:

> BTW, may I ask another question regarding parameters?
> Don't you want to step away from '%s' syntax and use '$1, $2, ...' which seems to be more traditional in the database
world?
> '%s' feels like old-school string formatting, new server-side parameter binding may want to give some new
impression.
> Moreover, it appears more convenient when you have parameters numbered and can reuse them a few times in a query.

Hi Vladmir,

I wouldn't want to step away from the %s placeholder, because that
would mean that every query of every program written in psycopg2 would
need to be rewritten, and that would be an impossibly steep adoption
curve. Furthermore the %(named)s placeholders are a much better
convenience over $n: you couldn't pass {name: param} mapping
otherwise.

The $n parameters are also cumbersome in requiring an explicit
mapping, where an implicit one would have worked (so it has to be "$1
$2 $3" and counting, instead of uniform "%s, %s, %s") and slipping a
parameter in the middle of a sequence of parameters requires to
renumber all the following ones.

Another feature added to psycopg3 is support for binary parameters: I
added %b and %(name)b placeholders to mark the placeholders requiring
a binary param, so that you can `execute("insert into image (name,
data) values (%s, %b)", [filename, image_bytes])`: if you used $n
placeholders you would need a new parameter to specify, of the list
(or mapping) of parameters, which one do you want in text and in
binary format, or some other mechanism, such as a
`Binary(image_bytes)` wrapper.

Said that, because we are using the server-side binding, we are
actually passing $n parameters to the server: in psycopg3 there is a
parser to convert %s and %(name)s placeholders to $n style and to
reorder params mapping when needed. So the query above results in a
call like `pgconn.exec_params(b"select $1, $2", [filename.encode(...),
image_bytes], [TEXT, BINARY])`. If there is interest we can think
about how to make this querying layer more accessible (e.g. using a
`cur.execute(PgQuery("select $1, $2"), [...])` or some other wrapping
mechanism.

-- Daniele



Re: psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
On Mon, 9 Nov 2020 at 02:49, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
>
> A wild idea: support both client-side (like in psycopg2) and server-side binding. Keep old '%s' syntax and provide a
separatemethod for client-side binding (not in 'cur.execute()'). This could alleviate cases like parameters
roundtrippingand other cases of safe query composition. At the same time use '$N' for true server-side binding. 
> Is it an overcomplication or there are valid use-cases of that?

Such a thing exists already: there is a client-side binding library,
which is pretty much a direct porting of the psycopg2.sql module
(https://www.psycopg.org/docs/sql.html). Using that facility, you can
compose argument client-side with `query = sql.SQL("select {},
%s").format("hello")` and pass the above to `cursor.execute(query,
[42]). It uses `{}` and `{name}`-style parameters to bind client-side
so it leaves %s placeholders untouched for execute().

In psycopg3 I've made it slightly easier to use by letting
`SQL.format()` to accept any Python object, and automatically wrapping
it into a `Literal()` object, whereas in psycopg2 `format()` only
accepted "`Composible`" objects, and you should have written the above
as `sql.SQL("select {}, %s").format(Literal("hello"))`.

-- Daniele



Re: psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
On Mon, 9 Nov 2020 at 06:57, Federico Di Gregorio <fog@dndg.it> wrote:

> In your example I'd just go for int8 (the largest possible int in
> PostgreSQL). Decimal would probably be better (largest range) but it is
> not what the majority of people would expect.

The problem with int8 is that it fails in all the cases where decimal
fails, while not allowing to pass values larger than 64 bits:

    piro=# select '[10,20,30]'::jsonb -> 1::int8;
    ERROR:  operator does not exist: jsonb -> bigint
    LINE 1: select '[10,20,30]'::jsonb -> 1::int8;
                                      ^
    HINT:  No operator matches the given name and argument types. You
might need to add explicit type casts.

You are right though that, in terms of round-trippig, decimal might be
unexpected too:

    >>> cur.execute("select %s", [10]).fetchone()[0]
    Decimal('10')

So uhm... that's a +1 for that option too :D


> IMHO, oid is a bad idea
> because it has a very specific semantic and the error messages generated
> by PostgreSQL will be more confusing.

I'm not sure I understand this. At the moment, the oids are something
that don't really surface to the end-users, who are not required to
use them explicitly and shouldn't be seen in the error messages. For
instance the query above might results in a call:

    >>> from psycopg3.oids import builtins
    >>> builtins["numeric"].oid
    1700

    >>> res = conn.pgconn.exec_params(b"select '[]'::jsonb -> $1",
[b"1"], [1700])
    >>> res.status
    <ExecStatus.FATAL_ERROR: 7>

    >>> print(res.error_message.decode("utf8"))
    ERROR:  operator does not exist: jsonb -> numeric
    LINE 1: select '[]'::jsonb -> $1
                              ^
    HINT:  No operator matches the given name and argument types. You
might need to add explicit type casts.

So the oid is only used internally, in the mapping python type ->
exec_params() types array, the 1700 shouldn't surface anywhere.

Maybe I'm misunderstanding your concern: can you tell me better?

-- Daniele



Re: psycopg3 and adaptation choices

From
Federico Di Gregorio
Date:
On 09/11/20 13:00, Daniele Varrazzo wrote:
> On Mon, 9 Nov 2020 at 06:57, Federico Di Gregorio <fog@dndg.it> wrote:
[snip]
>> IMHO, oid is a bad idea
>> because it has a very specific semantic and the error messages generated
>> by PostgreSQL will be more confusing.
> 
> I'm not sure I understand this. At the moment, the oids are something
> that don't really surface to the end-users, who are not required to
> use them explicitly and shouldn't be seen in the error messages. For
> instance the query above might results in a call:
> 
>      >>> from psycopg3.oids import builtins
>      >>> builtins["numeric"].oid
>      1700
> 
>      >>> res = conn.pgconn.exec_params(b"select '[]'::jsonb -> $1",
> [b"1"], [1700])
>      >>> res.status
>      <ExecStatus.FATAL_ERROR: 7>
> 
>      >>> print(res.error_message.decode("utf8"))
>      ERROR:  operator does not exist: jsonb -> numeric
>      LINE 1: select '[]'::jsonb -> $1
>                                ^
>      HINT:  No operator matches the given name and argument types. You
> might need to add explicit type casts.
> 
> So the oid is only used internally, in the mapping python type ->
> exec_params() types array, the 1700 shouldn't surface anywhere.
> 
> Maybe I'm misunderstanding your concern: can you tell me better?

My fault. I misread and though you wanted to use OID as *the* type to 
pass to PostggreSQL for numbers.

federico




Re: psycopg3 and adaptation choices

From
Vladimir Ryabtsev
Date:
> I wouldn't want to step away from the %s placeholder ...

Thanks for the elaboration, I agree with your argumentation.

> If there is interest we can think about how to make this querying layer
> more accessible (e.g. using a `cur.execute(PgQuery("select $1, $2"), [...])`

It's always good if such customization is accessible. Having said that,
I can't think of a really good example that benefits from that rather than
from standard use, but basically the more possibilities the better.

> In psycopg3 I've made it slightly easier to use by letting
> `SQL.format()` to accept any Python object

Great, thanks!

May I ask you again about using 'unknown' for numbers? Could you recap
all the downsides of this approach?

Vladimir


On Mon, 9 Nov 2020 at 04:05, Federico Di Gregorio <fog@dndg.it> wrote:
On 09/11/20 13:00, Daniele Varrazzo wrote:
> On Mon, 9 Nov 2020 at 06:57, Federico Di Gregorio <fog@dndg.it> wrote:
[snip]
>> IMHO, oid is a bad idea
>> because it has a very specific semantic and the error messages generated
>> by PostgreSQL will be more confusing.
>
> I'm not sure I understand this. At the moment, the oids are something
> that don't really surface to the end-users, who are not required to
> use them explicitly and shouldn't be seen in the error messages. For
> instance the query above might results in a call:
>
>      >>> from psycopg3.oids import builtins
>      >>> builtins["numeric"].oid
>      1700
>
>      >>> res = conn.pgconn.exec_params(b"select '[]'::jsonb -> $1",
> [b"1"], [1700])
>      >>> res.status
>      <ExecStatus.FATAL_ERROR: 7>
>
>      >>> print(res.error_message.decode("utf8"))
>      ERROR:  operator does not exist: jsonb -> numeric
>      LINE 1: select '[]'::jsonb -> $1
>                                ^
>      HINT:  No operator matches the given name and argument types. You
> might need to add explicit type casts.
>
> So the oid is only used internally, in the mapping python type ->
> exec_params() types array, the 1700 shouldn't surface anywhere.
>
> Maybe I'm misunderstanding your concern: can you tell me better?

My fault. I misread and though you wanted to use OID as *the* type to
pass to PostggreSQL for numbers.

federico



Re: psycopg3 and adaptation choices

From
Adrian Klaver
Date:
On 11/8/20 2:21 PM, Daniele Varrazzo wrote:
> On Sun, 8 Nov 2020 at 20:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> 
>> Alright I understand now.
>> More below.
>>
>>>
>>> In psycopg3 the idea is to use a more advanced protocol, which
>>> separates query and parameters. It brings several benefits: can use
>>> prepared statements (send a query once, several parameters later),
>>> passing large data doesn't bloat the parser (the params don't hit the
>>> lexer/parser), can use binary format (useful to pass large binary
>>> blobs without escaping them in a textual form), the format of the data
>>> is more homogeneous (no need to quoting), so we can use Python objects
>>> in COPY instead of limiting the interface for the copy functions to
>>> file-like objects only.
>>>
>>> Both in psycopg2 and 3 there is an adaptation from Python types to
>>> Postgres string representation. In pg2 there is additional quoting,
>>> because apart from numbers and bools you need to quote a literal
>>> string to merge it to the query and make it syntactically valid.
>>
>> So the issue in the psycopg3 protocol is making the parameters that are
>> passed in separately match up correctly in type to what the server is
>> expecting(or can cast implicitly)?
> 
> Yes, correct. What we have to choose is which Postgres oid to map to
> each Python type.
> 
> Sometimes the mapping is trivial (e.g. `datetime.date` -> `date` in
> Postgres, `uuid.UUID` -> `uuid`...)
> 
> Sometimes it might be ambiguous: is a `datetime.datetime` a
> `timestamp` or a `timestamptz`? In some cases we don't care (here we
> can say `timestamptz` no problem: if the Python datetime doesn't have
> tzinfo, Postgres will use the `TimeZone` setting).
> 
> Sometimes it's messy: what Python type corresponds to a Postgres
> `jsonb`? It might be a dict, or a list, or types that have other
> representations too (numbers, strings, bools). In this case, as in
> psycopg2, there can be a wrapper, e.g. `Json`, to tell psycopg that
> this dict, or list, or whatever else, must be jsonified for the db.
> 
> When there are mismatches, sometimes the database cast rules help
> (e.gi in the timestamp[tz] case). Sometimes not: if we say `text` to a
> jsonb field, it will raise an error. Sometimes a cast is automatic on
> inserting in a table but not on passing a function parameter.
> 
> Numbers are messy, as they usually are: Python has int, float,
> Decimal, Postgres has int2, int4, int8, float4, float8, numeric. The
> mappings float -> float8 and Decimal -> numeric are more or less
> straightforward. `int` is not, as in Python it's unbounded. If you say
> `select 10` in psql, the server understands "unknown type, but a
> number", and can try if either int* or numeric fit the context. But we
> don't have the help from the syntax that psql has: because 10 doesn't
> have quotes, Postgres is sure that it is a number, and not a string,
> but executing query/params separately we lose that expressivity: we
> cannot quote the strings and not the number. So choices are:
> 
> 1. If we specify `numeric` or `int8` as oid, inserting in an int field
> in a table will work ok, but some functions/operators won't (e.g. "1

What is not working here?

>>> %s").
> 2. If we specify `int4` it would work for those few functions defined
> as `integer`, but if we try to write a number that doesn't fit in 32
> bits into a Postgres bigint field I assume something will overflow
> along the way, even if both python and postgres can handle it.
> 3. If we specify `unknown` it might work more often, but
> `cursor.execute("select %s", [10]) will return the string "10" instead
> of a number.
> 
> So I wonder what's the best compromise to do here: the less bad seems
> 1. 3. might work in more contexts, but it's a very counterintuitive
> behaviour, and roundtripping other objects (dates, uuid) works no
> problem: they don't come back as strings.

There is a lot to digest here. I'm going to have to do some thinking on 
this.

> 
> -- Daniele
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
On Tue, 10 Nov 2020 at 01:19, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 11/8/20 2:21 PM, Daniele Varrazzo wrote:

> > 1. If we specify `numeric` or `int8` as oid, inserting in an int field
> > in a table will work ok, but some functions/operators won't (e.g. "1
> > > %s").
>
> What is not working here?

Postgres has two different types of cast rules: "assignment" and
"implicit". https://www.postgresql.org/docs/current/sql-createcast.html
Assignment casts are only chosen by the parser on insert. Implicit
casts are chosen in other contexts too. It appears that the cast rules
from numeric/int8 to integer are of the first kind. So, while
inserting into a table with a mismatching type mostly works:

    piro=# create table mytable  (myint integer);
    CREATE TABLE
    piro=# insert into mytable (myint) values (42::decimal);
    INSERT 0 1
    piro=# insert into mytable (myint) values (84::int8);
    INSERT 0 1

using the wrong type in more generic expression may fail:

    piro=# select 1 << 4::int;
    ?column?
    ----------
          16

    piro=# select 1 << 4::int8;
    ERROR:  operator does not exist: integer << bigint
    LINE 1: select 1 << 4::bigint;
                    ^
    HINT:  No operator matches the given name and argument types. You
might need to add explicit type casts.

So a psycopg statement such as `cur.execute("select 1 << %s", [n])`,
which used to work if bound client-side, must be rewritten as "select
1 << %s::integer" for server-side binding, both if we choose int8 or
numeric as Postgres types to adapt a Python int.

-- Daniele



Re: psycopg3 and adaptation choices

From
Christophe Pettus
Date:

> On Nov 7, 2020, at 08:32, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> What do you think?

I don't want to derail this, but a question: Can a connection be set to use the simple-query protocol instead of the
extended-queryprotocol in 3? 

--
-- Christophe Pettus
   xof@thebuild.com




Re: psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
On Tue, 10 Nov 2020 at 01:06, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:

> May I ask you again about using 'unknown' for numbers? Could you recap
> all the downsides of this approach?

After this useful conversation, I've updated the article
(https://www.varrazzo.com/blog/2020/11/07/psycopg3-adaptation/) and
I've added comparison tables with the choices available. The integer
one is the following:

    Choices to cast Python ``int`` type:

    To ``unknown``:

    * +1: can express the full range of the unbounded Python ``int``
    * +2: can be passed without cast to most data types and functions
    * -2: it round-trips back to string, error on PostgreSQL < 10

    To ``numeric``:

    * +1: can express the full range of the unbounded Python ``int``
    * +1: can be passed without cast to most data types
    * -1: requires a cast for some functions
    * DELETED [-1: it round-trips to 'Decimal'`]
    * +0.5: **it can round-trip back to int**, with additional care - more to
      follow

    To ``int8``:

    * -0.5: can't express values not fitting in 64 bits (relatively limited
      use case: if the target column is ``numeric`` then it would be wise for
      the user to pass a ``Decimal``)
    * +1: can be passed without cast to most data types
    * -1: requires a cast for some functions
    * +1: it round-trips back to ``int``

    To ``int4``:

    * -1: limited range: couldn't be used to pass a value not fitting into 32
      bytes from Python to a ``bigint``.
    * +2: can be passed without cast to most data types and functions
    * +1: it round-trips back to ``int``

What about the DELETED entry on ``numeric``? If we dump ``int`` -> ``numeric``
to the db, and load back ``numeric`` -> ``Decimal`` from it, we end up with
integers round-tripping to ``Decimal``, which could easily create errors in
Python contexts which are not ready to deal with fixed-point arithmetic.
However, upon receiving a ``numeric`` from the database, we can check what
number it is: if it has no decimal digit it can be returned to Python as
``int``, if it has any decimal digit it must be returned as ``Decimal``.

This mechanism can be as trivial as `looking if there is a '.'`__ in the data
returned by the database; however in many cases the is job made simpler (or at
least more efficient) by the presence of the ``numeric`` modifiers:
``numeric`` comes in three flavours:

- ``numeric``: arbitrary precision (number of digits) and scale (number of
  digits after the decimal point),

- ``numeric(n)``: limited precision, no digit after the decimal
  point (equivalent to ``numeric(n, 0)``),

- ``numeric(n, m)``: limited precision, fixed number of
  digits after the decimal point.

.. __: https://github.com/psycopg/psycopg3/commit/5ced659f4838cf72c1981518ae2804942ebbd07b

The modifier information is returned in many contexts (among which the most
important: selecting data from tables): if we know the scale `we can decide
upfront`__ to load the entire column as ``int`` if the scale is 0, ``Decimal``
otherwise. If the modifier is unknown we can look at the presence of the dot.

.. __: https://github.com/psycopg/psycopg3/commit/a9444144f7d0581284ccab198ad0355436e6822a

Returning an ``int`` in a context where ``Decimal`` are expected doesn't seem
a big problem: throwing an integer to a fixed-point calculation doesn't wreak
havoc has it does throwing a ``Decimal`` in a context of ``flaot``
calculations. Running the entire Django test suite after this change caused no
test to fail, which gives me some comfort.



Re: psycopg3 and adaptation choices

From
Christophe Pettus
Date:

> On Nov 9, 2020, at 19:20, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
>
>    Choices to cast Python ``int`` type:

Is it absurd to make the choice at execution time, based on the actual value of the Python int?

--
-- Christophe Pettus
   xof@thebuild.com




Re: psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
On Tue, 10 Nov 2020 at 03:20, Christophe Pettus <xof@thebuild.com> wrote:
>
> I don't want to derail this, but a question: Can a connection be set to use the simple-query protocol instead of the
extended-queryprotocol in 3?
 

It won't be set explicitly to use simple/extended protocol, but if a
query is passed with no parameter (because it just has no variable
part, or because it's being composed client-side using functionalities
similar to psycopg2.sql) then the simple-query protocol is used [1].
This is mostly useful to pass several queries at once, because the
extended query protocol doesn't allow more than one query at time [2].

[1]
https://github.com/psycopg/psycopg3/blob/bc02ae1c9c42892966ff079250c24ff5eae28b48/psycopg3/psycopg3/cursor.py#L243-L254
[2] https://www.postgresql.org/docs/current/libpq-exec.html#LIBPQ-PQEXECPARAMS

-- Daniele



Re: psycopg3 and adaptation choices

From
Daniele Varrazzo
Date:
On Tue, 10 Nov 2020 at 03:22, Christophe Pettus <xof@thebuild.com> wrote:
>
> > On Nov 9, 2020, at 19:20, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> >
> >    Choices to cast Python ``int`` type:
>
> Is it absurd to make the choice at execution time, based on the actual value of the Python int?

I've been thinking a lot about it. I haven't completely ruled it out,
but there are a few cases in which having different oids for the same
query gets in the way. One that comes to mind is with prepared
statements, either explicit (which I haven't exposed yet, but it's
like the #1 request for a new feature), or implicit (currently using
them to implement 'executemany()'). However I might be overestimating
these issues, yes.

I guess I should give an overview of the whole adaptation system: I'll
try and write its documentation in the next few days. I have to start
with the documentation somewhere...

-- Daniele



Re: psycopg3 and adaptation choices

From
Vladimir Ryabtsev
Date:
> I've added comparison tables with the choices available.

Nice article!

I see you have assigned weights to the upsides and downsides
and now these weights are going to determine the winner.
My particular question is regarding '-2' score for mapping numbers
to 'unknown'.

Why do you consider it such an important use-case? In my opinion,
such usage is just waste of resources and traffic, and if presenting
some constants in the result set is really required, other means,
such as 'SQL.format()`, might be a better option.

> Is it absurd to make the choice at execution time, based on the actual value of the Python int?

If I got the idea correctly, it may lead to some inconsistent behaviour
in a series of executions of a prepared statement. Imagine you tested
you query with small numbers that fit into int4, but in production rarely
blogger values can come that must be sent as int8, which will lead to
occasional cast errors.

Vladimir

On Mon, 9 Nov 2020 at 19:43, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
On Tue, 10 Nov 2020 at 03:22, Christophe Pettus <xof@thebuild.com> wrote:
>
> > On Nov 9, 2020, at 19:20, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> >
> >    Choices to cast Python ``int`` type:
>
> Is it absurd to make the choice at execution time, based on the actual value of the Python int?

I've been thinking a lot about it. I haven't completely ruled it out,
but there are a few cases in which having different oids for the same
query gets in the way. One that comes to mind is with prepared
statements, either explicit (which I haven't exposed yet, but it's
like the #1 request for a new feature), or implicit (currently using
them to implement 'executemany()'). However I might be overestimating
these issues, yes.

I guess I should give an overview of the whole adaptation system: I'll
try and write its documentation in the next few days. I have to start
with the documentation somewhere...

-- Daniele

Aw: Re: psycopg3 and adaptation choices

From
Karsten Hilbert
Date:
> My particular question is regarding '-2' score for mapping numbers
> to 'unknown'.
>  
> Why do you consider it such an important use-case? In my opinion,
> such usage is just waste of resources and traffic,

I am sometimes round-tripping magic constants to tell which part of a, say,
UNION actually returned the result(s), based on which knowledge the EMR
UI will behave slightly differently (say, one is retrieving medications,
by matching either of product name or active ingredient, all in
one query).

I needn't roundtrip ints, sure, but...

Karsten



Re: psycopg3 and adaptation choices

From
Rory Campbell-Lange
Date:
On 10/11/20, Daniele Varrazzo (daniele.varrazzo@gmail.com) wrote:
> On Tue, 10 Nov 2020 at 03:22, Christophe Pettus <xof@thebuild.com> wrote:
> >
> > > On Nov 9, 2020, at 19:20, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> > >
> > >    Choices to cast Python ``int`` type:
> >
> > Is it absurd to make the choice at execution time, based on the actual value of the Python int?
> 
> I've been thinking a lot about it. I haven't completely ruled it out,
> but there are a few cases in which having different oids for the same
> query gets in the way. One that comes to mind is with prepared
> statements, either explicit (which I haven't exposed yet, but it's
> like the #1 request for a new feature), or implicit (currently using
> them to implement 'executemany()'). However I might be overestimating
> these issues, yes.
> 
> I guess I should give an overview of the whole adaptation system: I'll
> try and write its documentation in the next few days. I have to start
> with the documentation somewhere...

Apologies for a no-doubt naive suggestion, Daniele, but how about a
postgresql type 'shim' of some sort that only accepts python types on
input and translates output back to only python types.

If such a shim, perhaps a type + C function pair were used, I assume it
would not round-trip per-se, but could cycle through int types from most
restrictive to most lenient on the basis (I'm guessing) that postgresql
will coerce a postgres int4 to and int8 on insertion if necessary, on
the principle that the following works ok:

    test=> create table a (b int8);
    CREATE TABLE
    test=> insert into a values (1::int4);
    INSERT 0 1

    test=> create table b (c numeric);
    CREATE TABLE
    test=> insert into b values (4::int8);
    INSERT 0 1

I assume mapping native postgresql column types to output values would
pass back through such a 'sieve' quite naturally into native python
types.

Whether such a filtering layer should work directly in postgresql or as
a translation (or 'adaptation') layer in psycopg[3]*is perhaps a similar
debate -- although at a lower level -- about Django's ORM. Where should
the logic lie?

By the way I believe this is this 'layer' for the golang pgx module,
which may be of interest:
https://github.com/jackc/pgx/blob/93c6b60429e13e0016665214dca2c6382982cf99/values.go#L28
although golang is of course is more strongly typed than python. I
thought the type switch test for coercion, as Christophe suggests, might
be doable through the 'shim' layer I'm imagining.

Regards
Rory