Thread: psycopg3 and adaptation choices
Hello, As you may know I'm implementing a new psycopg version using server-side argument binding. As a real-time test I'm writing a Django backend for it, and running Django test suite. This has uncovered unexpected behaviours caused by the different ways parameters are adapted by the backend as opposite as passed inline in the query. I wrote down my findings at https://www.varrazzo.com/blog/2020/11/07/psycopg3-adaptation/ and would be happy to get some feedback. I'll copy the writing here too: please forget the reST. Cheers, -- Daniele --- One of the main behavioural differences between psycopg2 and 3, and a big enought cause of incompatibility to require a "3", is the use of server-side binding vs. client-side (psycopg2 predates the FE-BE protocol supporting it). Passing Python objects to the server as separate arguments leads sometimes to a behaviour of the server different to what people are used to using psql and psycopg2. An example: what do you think:: cursor.execute("select %s, %s", ["hello", 10]) should fetch? I assume it would be desirable to have a record ``("hello", 10)`` back - consistently to what you would return if you merged argument client-side and queried for ``select 'hello', 10``. However keeping this behaviour predictable proved already tricky. When passing arguments to merge server-side, the client must pass their string (or binary) representation and, optionally, the oid of their types. The most intuitive thing to do would be to associate the text type oid (25) to Python strings. This works well for queries like the above one, but it proves too strict a type definition in context where a cast from text is not allowed. For instance this will fail:: cursor.execute("create table testjson(data jsonb)") cursor.execute("insert into testjson (data) values (%s)", ["{}"]) The latter will raise the same exception you would get in psql if you execute ``insert into testjson (data) values ('{}'::text)``. The reason why a ``values ('{}')`` works as expected in psql is because the untyped literal is interpreted as unknown, and the server can always cast from unknown to any other type. If we wanted to insert the data above the query should be rewritten as:: cursor.execute("insert into testjson (data) values (%s::jsonb)", ["{}"]) About a week ago I started the task to write `a Django backend for psycopg3`__: running its test suite is proving a great way to observe the behaviour of server-side binding in a plethora of different environments. The problem of the non-implicit cast from text to jsonb made porting the backend from psycopg2 to 3 `quite a tricky matter`__. .. __: https://github.com/dvarrazzo/django/commits/psycopg3 .. __: https://github.com/dvarrazzo/django/commit/1ca8b71ba9bc6acfe239dd42f751037644e59e13 In order to provide an experience more similar to the use of psql and of psycopg2, one possibility is to not pass an oid type for the strings, leaving it unknown. When I tried with that approach, passing strings to jsonb fields (which is not necessarily what you would do, but it's what Django does it, probably to uniform JSON dumping across different adapter) became immediately much simpler. So it seems a good choice, but not an overall win: our "hello world" query:: cursor.execute("select %s, %s", ["hello", 10]) fails with PostgreSQL 9.6 and older versions, returning an error such as *could not determine data type of parameter $1*. PostgreSQL 10 and following are more permissive and convert unknown to text on output. It seems like there is a choice to make here: mine at the moment is to keep the latter behaviour, both because that's what allows to write the simpler code with the majority of the supported PostgreSQL versions. People using 9.6 and previous version would need to adjust to:: cursor.execute("select %s::text, %s", ["hello", 10]) but the use case of inserting data into tables seems a more important use case than parroting back a string after a database roundtrip. What about the numbers? That's a problem thorny in a different way. Python integers are of arbitrary size, so they map better to a subset of the ``numeric`` type than to ``int4`` or ``int8``. However there is a similar problem, where seemingly familiar expression don't behave as expected. If this works in psql:: select data -> 'a' -> 2 from testjson; a similar statement with 2 as a parameter will fail with an error like *operator does not exist: jsonb -> numeric*. Passing unknown as oid would work, but now our greeting would return instead ``('hello', '10')``, with the number converted to string. This is probably too much of a departure from the expected behaviour; at least, unlike the test-to-json case, there is an automatic cast from ``numeric`` to integer, so an ``insert into integer_field`` works as expected. Only operators and functions whose arguments are declared ``integer`` and there is no ``numeric`` version trip into an error: JSON's ``->``, binary operators such as ``>>`` and ``&``, and few others:: piro=# select current_date, current_date + 1; current_date | ?column? --------------+------------ 2020-11-07 | 2020-11-08 (1 row) piro=# select current_date + 1::numeric; ERROR: operator does not exist: date + numeric LINE 1: select current_date + 1::numeric; ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. Note that when Postgres says ``integer`` it means 4 bytes signed: defaulting the Python ``int`` to the ``bigint`` oid solves no problem - the same functions would fail the same way, and defaulting it to ``int4`` I feel it would overflow too easily in a 64 bits world. How to fix that? as the hint suggests, the user would have to either add a cast to the query, which would look like:: cur.execute("select current_date + %s::int", [offset]) or to specify an object that would get converted in Postgres to the oid of ``integer``: in psycopg3 I'm testing with adding subclasses of ``int`` called ``Int4``, ``Int8`` etc. to map more closely on the Postgres types:: cur.execute("select current_date + %s", [Int4(offset)]) My choices so far are then: - cast Python ``str`` to unknown by default (only alternative: ``text``) - cast Python ``int`` to ``numeric`` by default (alternatives: ``int4``, ``int8``, ``unknown``). Being the adaptation system flexible, people are able to override these choices, but I don't expect many people to do it, and doing it process-wise might cause interoperation problems across libraries. The behaviour out-of-the-box is obviously important and I would like to get the tradeoffs right. What do you think?
On 11/7/20 8:32 AM, Daniele Varrazzo wrote: > Hello, > > As you may know I'm implementing a new psycopg version using > server-side argument binding. As a real-time test I'm writing a Django > backend for it, and running Django test suite. This has uncovered > unexpected behaviours caused by the different ways parameters are > adapted by the backend as opposite as passed inline in the query. > > I wrote down my findings at > https://www.varrazzo.com/blog/2020/11/07/psycopg3-adaptation/ and > would be happy to get some feedback. I'll copy the writing here too: > please forget the reST. > > Cheers, > > -- Daniele > > --- > > One of the main behavioural differences between psycopg2 and 3, and a big > enought cause of incompatibility to require a "3", is the use of server-side > binding vs. client-side (psycopg2 predates the FE-BE protocol supporting it). > Passing Python objects to the server as separate arguments leads sometimes to > a behaviour of the server different to what people are used to using psql and > psycopg2. An example: what do you think:: > > cursor.execute("select %s, %s", ["hello", 10]) > > should fetch? I assume it would be desirable to have a record ``("hello", > 10)`` back - consistently to what you would return if you merged argument > client-side and queried for ``select 'hello', 10``. However keeping this > behaviour predictable proved already tricky. When passing arguments to merge > server-side, the client must pass their string (or binary) representation and, > optionally, the oid of their types. The most intuitive thing to do would be to > associate the text type oid (25) to Python strings. This works well for > queries like the above one, but it proves too strict a type definition in > context where a cast from text is not allowed. For instance this will fail:: > > cursor.execute("create table testjson(data jsonb)") > cursor.execute("insert into testjson (data) values (%s)", ["{}"]) > > The latter will raise the same exception you would get in psql if you execute > ``insert into testjson (data) values ('{}'::text)``. The reason why a ``values > ('{}')`` works as expected in psql is because the untyped literal is interpreted > as unknown, and the server can always cast from unknown to any other type. If > we wanted to insert the data above the query should be rewritten as:: > > cursor.execute("insert into testjson (data) values (%s::jsonb)", ["{}"]) > > About a week ago I started the task to write `a Django backend for > psycopg3`__: running its test suite is proving a great way to observe the > behaviour of server-side binding in a plethora of different environments. The > problem of the non-implicit cast from text to jsonb made porting the backend > from psycopg2 to 3 `quite a tricky matter`__. > > .. __: https://github.com/dvarrazzo/django/commits/psycopg3 > .. __: https://github.com/dvarrazzo/django/commit/1ca8b71ba9bc6acfe239dd42f751037644e59e13 > > In order to provide an experience more similar to the use of psql and of > psycopg2, one possibility is to not pass an oid type for the strings, leaving > it unknown. When I tried with that approach, passing strings to jsonb fields > (which is not necessarily what you would do, but it's what Django does it, > probably to uniform JSON dumping across different adapter) became immediately > much simpler. So it seems a good choice, but not an overall win: our "hello > world" query:: > > cursor.execute("select %s, %s", ["hello", 10]) > > fails with PostgreSQL 9.6 and older versions, returning an error such as > *could not determine data type of parameter $1*. PostgreSQL 10 and following > are more permissive and convert unknown to text on output. > > It seems like there is a choice to make here: mine at the moment is to keep > the latter behaviour, both because that's what allows to write the simpler > code with the majority of the supported PostgreSQL versions. People using 9.6 > and previous version would need to adjust to:: > > cursor.execute("select %s::text, %s", ["hello", 10]) > > but the use case of inserting data into tables seems a more important use case > than parroting back a string after a database roundtrip. > > What about the numbers? That's a problem thorny in a different way. Python > integers are of arbitrary size, so they map better to a subset of the > ``numeric`` type than to ``int4`` or ``int8``. However there is a similar > problem, where seemingly familiar expression don't behave as expected. If this > works in psql:: > > select data -> 'a' -> 2 from testjson; > > a similar statement with 2 as a parameter will fail with an error like > *operator does not exist: jsonb -> numeric*. Passing unknown as oid would > work, but now our greeting would return instead ``('hello', '10')``, with the > number converted to string. This is probably too much of a departure from the > expected behaviour; at least, unlike the test-to-json case, there is an > automatic cast from ``numeric`` to integer, so an ``insert into > integer_field`` works as expected. Only operators and functions whose > arguments are declared ``integer`` and there is no ``numeric`` version trip > into an error: JSON's ``->``, binary operators such as ``>>`` and ``&``, and > few others:: > > piro=# select current_date, current_date + 1; > current_date | ?column? > --------------+------------ > 2020-11-07 | 2020-11-08 > (1 row) > > piro=# select current_date + 1::numeric; > ERROR: operator does not exist: date + numeric > LINE 1: select current_date + 1::numeric; > ^ > HINT: No operator matches the given name and argument types. You might > need to add explicit type casts. > > Note that when Postgres says ``integer`` it means 4 bytes signed: defaulting > the Python ``int`` to the ``bigint`` oid solves no problem - the same > functions would fail the same way, and defaulting it to ``int4`` I feel it > would overflow too easily in a 64 bits world. > > How to fix that? as the hint suggests, the user would have to either add a > cast to the query, which would look like:: > > cur.execute("select current_date + %s::int", [offset]) > > or to specify an object that would get converted in Postgres to the oid of > ``integer``: in psycopg3 I'm testing with adding subclasses of ``int`` called > ``Int4``, ``Int8`` etc. to map more closely on the Postgres types:: > > cur.execute("select current_date + %s", [Int4(offset)]) > > My choices so far are then: > > - cast Python ``str`` to unknown by default (only alternative: ``text``) > - cast Python ``int`` to ``numeric`` by default (alternatives: ``int4``, > ``int8``, ``unknown``). > > Being the adaptation system flexible, people are able to override these > choices, but I don't expect many people to do it, and doing it process-wise > might cause interoperation problems across libraries. The behaviour > out-of-the-box is obviously important and I would like to get the tradeoffs > right. > > What do you think? If I'm following correctly in psycopg2 the adapter does type adaption on the client side and passes that to server with oid for processing. In psycopg3 you are proposing to let the server do more of the type adaption and to that end you are looking for lowest common denominator type to pass to server and then let it do the final casting. Is that about right or am I way off base? -- Adrian Klaver adrian.klaver@aklaver.com
On Sun, 8 Nov 2020 at 18:43, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > If I'm following correctly in psycopg2 the adapter does type adaption on > the client side and passes that to server with oid for processing. In > psycopg3 you are proposing to let the server do more of the type > adaption and to that end you are looking for lowest common denominator > type to pass to server and then let it do the final casting. Is that > about right or am I way off base? No, not really. In psycopg2 we compose a query entirely client-side, and we pass it to the server with no oid indication, only as a big literal, like it was typed all in psql. In the example of `cursor.execute("select %s, %s", ["hello", 10])`, the server receives a literal `select 'hello', 10` and has no idea that there were two parameters. In psycopg3 the idea is to use a more advanced protocol, which separates query and parameters. It brings several benefits: can use prepared statements (send a query once, several parameters later), passing large data doesn't bloat the parser (the params don't hit the lexer/parser), can use binary format (useful to pass large binary blobs without escaping them in a textual form), the format of the data is more homogeneous (no need to quoting), so we can use Python objects in COPY instead of limiting the interface for the copy functions to file-like objects only. Both in psycopg2 and 3 there is an adaptation from Python types to Postgres string representation. In pg2 there is additional quoting, because apart from numbers and bools you need to quote a literal string to merge it to the query and make it syntactically valid. -- Daniele
On 11/8/20 11:16 AM, Daniele Varrazzo wrote: > On Sun, 8 Nov 2020 at 18:43, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> If I'm following correctly in psycopg2 the adapter does type adaption on >> the client side and passes that to server with oid for processing. In >> psycopg3 you are proposing to let the server do more of the type >> adaption and to that end you are looking for lowest common denominator >> type to pass to server and then let it do the final casting. Is that >> about right or am I way off base? > > No, not really. In psycopg2 we compose a query entirely client-side, > and we pass it to the server with no oid indication, only as a big > literal, like it was typed all in psql. In the example of > `cursor.execute("select %s, %s", ["hello", 10])`, the server receives > a literal `select 'hello', 10` and has no idea that there were two > parameters. Alright I understand now. More below. > > In psycopg3 the idea is to use a more advanced protocol, which > separates query and parameters. It brings several benefits: can use > prepared statements (send a query once, several parameters later), > passing large data doesn't bloat the parser (the params don't hit the > lexer/parser), can use binary format (useful to pass large binary > blobs without escaping them in a textual form), the format of the data > is more homogeneous (no need to quoting), so we can use Python objects > in COPY instead of limiting the interface for the copy functions to > file-like objects only. > > Both in psycopg2 and 3 there is an adaptation from Python types to > Postgres string representation. In pg2 there is additional quoting, > because apart from numbers and bools you need to quote a literal > string to merge it to the query and make it syntactically valid. So the issue in the psycopg3 protocol is making the parameters that are passed in separately match up correctly in type to what the server is expecting(or can cast implicitly)? > > -- Daniele > -- Adrian Klaver adrian.klaver@aklaver.com
On Sun, 8 Nov 2020 at 20:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > Alright I understand now. > More below. > > > > > In psycopg3 the idea is to use a more advanced protocol, which > > separates query and parameters. It brings several benefits: can use > > prepared statements (send a query once, several parameters later), > > passing large data doesn't bloat the parser (the params don't hit the > > lexer/parser), can use binary format (useful to pass large binary > > blobs without escaping them in a textual form), the format of the data > > is more homogeneous (no need to quoting), so we can use Python objects > > in COPY instead of limiting the interface for the copy functions to > > file-like objects only. > > > > Both in psycopg2 and 3 there is an adaptation from Python types to > > Postgres string representation. In pg2 there is additional quoting, > > because apart from numbers and bools you need to quote a literal > > string to merge it to the query and make it syntactically valid. > > So the issue in the psycopg3 protocol is making the parameters that are > passed in separately match up correctly in type to what the server is > expecting(or can cast implicitly)? Yes, correct. What we have to choose is which Postgres oid to map to each Python type. Sometimes the mapping is trivial (e.g. `datetime.date` -> `date` in Postgres, `uuid.UUID` -> `uuid`...) Sometimes it might be ambiguous: is a `datetime.datetime` a `timestamp` or a `timestamptz`? In some cases we don't care (here we can say `timestamptz` no problem: if the Python datetime doesn't have tzinfo, Postgres will use the `TimeZone` setting). Sometimes it's messy: what Python type corresponds to a Postgres `jsonb`? It might be a dict, or a list, or types that have other representations too (numbers, strings, bools). In this case, as in psycopg2, there can be a wrapper, e.g. `Json`, to tell psycopg that this dict, or list, or whatever else, must be jsonified for the db. When there are mismatches, sometimes the database cast rules help (e.gi in the timestamp[tz] case). Sometimes not: if we say `text` to a jsonb field, it will raise an error. Sometimes a cast is automatic on inserting in a table but not on passing a function parameter. Numbers are messy, as they usually are: Python has int, float, Decimal, Postgres has int2, int4, int8, float4, float8, numeric. The mappings float -> float8 and Decimal -> numeric are more or less straightforward. `int` is not, as in Python it's unbounded. If you say `select 10` in psql, the server understands "unknown type, but a number", and can try if either int* or numeric fit the context. But we don't have the help from the syntax that psql has: because 10 doesn't have quotes, Postgres is sure that it is a number, and not a string, but executing query/params separately we lose that expressivity: we cannot quote the strings and not the number. So choices are: 1. If we specify `numeric` or `int8` as oid, inserting in an int field in a table will work ok, but some functions/operators won't (e.g. "1 >> %s"). 2. If we specify `int4` it would work for those few functions defined as `integer`, but if we try to write a number that doesn't fit in 32 bits into a Postgres bigint field I assume something will overflow along the way, even if both python and postgres can handle it. 3. If we specify `unknown` it might work more often, but `cursor.execute("select %s", [10]) will return the string "10" instead of a number. So I wonder what's the best compromise to do here: the less bad seems 1. 3. might work in more contexts, but it's a very counterintuitive behaviour, and roundtripping other objects (dates, uuid) works no problem: they don't come back as strings. -- Daniele
Hello,
From what I understood from your messages, I like passing 'unknown' for both strings and numbers.
Roundtripping parameters seems to be a less common case (with a possible fix if it's necessary).
Is there anything else that does not work or works counterintuitively with Python 'int' -> 'unknown'?
BTW, may I ask another question regarding parameters?
Don't you want to step away from '%s' syntax and use '$1, $2, ...' which seems to be more traditional in the database world?
'%s' feels like old-school string formatting, new server-side parameter binding may want to give some new impression.
Moreover, it appears more convenient when you have parameters numbered and can reuse them a few times in a query.
Vladimir
On Sun, 8 Nov 2020 at 14:22, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
On Sun, 8 Nov 2020 at 20:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Alright I understand now.
> More below.
>
> >
> > In psycopg3 the idea is to use a more advanced protocol, which
> > separates query and parameters. It brings several benefits: can use
> > prepared statements (send a query once, several parameters later),
> > passing large data doesn't bloat the parser (the params don't hit the
> > lexer/parser), can use binary format (useful to pass large binary
> > blobs without escaping them in a textual form), the format of the data
> > is more homogeneous (no need to quoting), so we can use Python objects
> > in COPY instead of limiting the interface for the copy functions to
> > file-like objects only.
> >
> > Both in psycopg2 and 3 there is an adaptation from Python types to
> > Postgres string representation. In pg2 there is additional quoting,
> > because apart from numbers and bools you need to quote a literal
> > string to merge it to the query and make it syntactically valid.
>
> So the issue in the psycopg3 protocol is making the parameters that are
> passed in separately match up correctly in type to what the server is
> expecting(or can cast implicitly)?
Yes, correct. What we have to choose is which Postgres oid to map to
each Python type.
Sometimes the mapping is trivial (e.g. `datetime.date` -> `date` in
Postgres, `uuid.UUID` -> `uuid`...)
Sometimes it might be ambiguous: is a `datetime.datetime` a
`timestamp` or a `timestamptz`? In some cases we don't care (here we
can say `timestamptz` no problem: if the Python datetime doesn't have
tzinfo, Postgres will use the `TimeZone` setting).
Sometimes it's messy: what Python type corresponds to a Postgres
`jsonb`? It might be a dict, or a list, or types that have other
representations too (numbers, strings, bools). In this case, as in
psycopg2, there can be a wrapper, e.g. `Json`, to tell psycopg that
this dict, or list, or whatever else, must be jsonified for the db.
When there are mismatches, sometimes the database cast rules help
(e.gi in the timestamp[tz] case). Sometimes not: if we say `text` to a
jsonb field, it will raise an error. Sometimes a cast is automatic on
inserting in a table but not on passing a function parameter.
Numbers are messy, as they usually are: Python has int, float,
Decimal, Postgres has int2, int4, int8, float4, float8, numeric. The
mappings float -> float8 and Decimal -> numeric are more or less
straightforward. `int` is not, as in Python it's unbounded. If you say
`select 10` in psql, the server understands "unknown type, but a
number", and can try if either int* or numeric fit the context. But we
don't have the help from the syntax that psql has: because 10 doesn't
have quotes, Postgres is sure that it is a number, and not a string,
but executing query/params separately we lose that expressivity: we
cannot quote the strings and not the number. So choices are:
1. If we specify `numeric` or `int8` as oid, inserting in an int field
in a table will work ok, but some functions/operators won't (e.g. "1
>> %s").
2. If we specify `int4` it would work for those few functions defined
as `integer`, but if we try to write a number that doesn't fit in 32
bits into a Postgres bigint field I assume something will overflow
along the way, even if both python and postgres can handle it.
3. If we specify `unknown` it might work more often, but
`cursor.execute("select %s", [10]) will return the string "10" instead
of a number.
So I wonder what's the best compromise to do here: the less bad seems
1. 3. might work in more contexts, but it's a very counterintuitive
behaviour, and roundtripping other objects (dates, uuid) works no
problem: they don't come back as strings.
-- Daniele
A wild idea: support both client-side (like in psycopg2) and server-side binding. Keep old '%s' syntax and provide a separate method for client-side binding (not in 'cur.execute()'). This could alleviate cases like parameters roundtripping and other cases of safe query composition. At the same time use '$N' for true server-side binding.
Is it an overcomplication or there are valid use-cases of that?
On Sun, 8 Nov 2020 at 18:19, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
Hello,From what I understood from your messages, I like passing 'unknown' for both strings and numbers.Roundtripping parameters seems to be a less common case (with a possible fix if it's necessary).Is there anything else that does not work or works counterintuitively with Python 'int' -> 'unknown'?BTW, may I ask another question regarding parameters?Don't you want to step away from '%s' syntax and use '$1, $2, ...' which seems to be more traditional in the database world?'%s' feels like old-school string formatting, new server-side parameter binding may want to give some new impression.Moreover, it appears more convenient when you have parameters numbered and can reuse them a few times in a query.VladimirOn Sun, 8 Nov 2020 at 14:22, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:On Sun, 8 Nov 2020 at 20:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Alright I understand now.
> More below.
>
> >
> > In psycopg3 the idea is to use a more advanced protocol, which
> > separates query and parameters. It brings several benefits: can use
> > prepared statements (send a query once, several parameters later),
> > passing large data doesn't bloat the parser (the params don't hit the
> > lexer/parser), can use binary format (useful to pass large binary
> > blobs without escaping them in a textual form), the format of the data
> > is more homogeneous (no need to quoting), so we can use Python objects
> > in COPY instead of limiting the interface for the copy functions to
> > file-like objects only.
> >
> > Both in psycopg2 and 3 there is an adaptation from Python types to
> > Postgres string representation. In pg2 there is additional quoting,
> > because apart from numbers and bools you need to quote a literal
> > string to merge it to the query and make it syntactically valid.
>
> So the issue in the psycopg3 protocol is making the parameters that are
> passed in separately match up correctly in type to what the server is
> expecting(or can cast implicitly)?
Yes, correct. What we have to choose is which Postgres oid to map to
each Python type.
Sometimes the mapping is trivial (e.g. `datetime.date` -> `date` in
Postgres, `uuid.UUID` -> `uuid`...)
Sometimes it might be ambiguous: is a `datetime.datetime` a
`timestamp` or a `timestamptz`? In some cases we don't care (here we
can say `timestamptz` no problem: if the Python datetime doesn't have
tzinfo, Postgres will use the `TimeZone` setting).
Sometimes it's messy: what Python type corresponds to a Postgres
`jsonb`? It might be a dict, or a list, or types that have other
representations too (numbers, strings, bools). In this case, as in
psycopg2, there can be a wrapper, e.g. `Json`, to tell psycopg that
this dict, or list, or whatever else, must be jsonified for the db.
When there are mismatches, sometimes the database cast rules help
(e.gi in the timestamp[tz] case). Sometimes not: if we say `text` to a
jsonb field, it will raise an error. Sometimes a cast is automatic on
inserting in a table but not on passing a function parameter.
Numbers are messy, as they usually are: Python has int, float,
Decimal, Postgres has int2, int4, int8, float4, float8, numeric. The
mappings float -> float8 and Decimal -> numeric are more or less
straightforward. `int` is not, as in Python it's unbounded. If you say
`select 10` in psql, the server understands "unknown type, but a
number", and can try if either int* or numeric fit the context. But we
don't have the help from the syntax that psql has: because 10 doesn't
have quotes, Postgres is sure that it is a number, and not a string,
but executing query/params separately we lose that expressivity: we
cannot quote the strings and not the number. So choices are:
1. If we specify `numeric` or `int8` as oid, inserting in an int field
in a table will work ok, but some functions/operators won't (e.g. "1
>> %s").
2. If we specify `int4` it would work for those few functions defined
as `integer`, but if we try to write a number that doesn't fit in 32
bits into a Postgres bigint field I assume something will overflow
along the way, even if both python and postgres can handle it.
3. If we specify `unknown` it might work more often, but
`cursor.execute("select %s", [10]) will return the string "10" instead
of a number.
So I wonder what's the best compromise to do here: the less bad seems
1. 3. might work in more contexts, but it's a very counterintuitive
behaviour, and roundtripping other objects (dates, uuid) works no
problem: they don't come back as strings.
-- Daniele
On 08/11/20 23:21, Daniele Varrazzo wrote: [snip] > 1. If we specify `numeric` or `int8` as oid, inserting in an int field > in a table will work ok, but some functions/operators won't (e.g. "1 >>> %s"). > 2. If we specify `int4` it would work for those few functions defined > as `integer`, but if we try to write a number that doesn't fit in 32 > bits into a Postgres bigint field I assume something will overflow > along the way, even if both python and postgres can handle it. > 3. If we specify `unknown` it might work more often, but > `cursor.execute("select %s", [10]) will return the string "10" instead > of a number. > > So I wonder what's the best compromise to do here: the less bad seems > 1. 3. might work in more contexts, but it's a very counterintuitive > behaviour, and roundtripping other objects (dates, uuid) works no > problem: they don't come back as strings. Looking at what the adapters in other languages/frameworks do the common solution is to choose the "best fitting" type and let the programmer add a cast when needed. This is easier in statically typed languages where we have an almost perfect overlap between PostgreSQL and platform types but a bit more difficult in dynamic typed languages like Python where the available types are abstracted over the platform ones (numbers are a good example). In your example I'd just go for int8 (the largest possible int in PostgreSQL). Decimal would probably be better (largest range) but it is not what the majority of people would expect. IMHO, oid is a bad idea because it has a very specific semantic and the error messages generated by PostgreSQL will be more confusing. federico
On Mon, 9 Nov 2020 at 02:19, Vladimir Ryabtsev <greatvovan@gmail.com> wrote: > BTW, may I ask another question regarding parameters? > Don't you want to step away from '%s' syntax and use '$1, $2, ...' which seems to be more traditional in the database world? > '%s' feels like old-school string formatting, new server-side parameter binding may want to give some new impression. > Moreover, it appears more convenient when you have parameters numbered and can reuse them a few times in a query. Hi Vladmir, I wouldn't want to step away from the %s placeholder, because that would mean that every query of every program written in psycopg2 would need to be rewritten, and that would be an impossibly steep adoption curve. Furthermore the %(named)s placeholders are a much better convenience over $n: you couldn't pass {name: param} mapping otherwise. The $n parameters are also cumbersome in requiring an explicit mapping, where an implicit one would have worked (so it has to be "$1 $2 $3" and counting, instead of uniform "%s, %s, %s") and slipping a parameter in the middle of a sequence of parameters requires to renumber all the following ones. Another feature added to psycopg3 is support for binary parameters: I added %b and %(name)b placeholders to mark the placeholders requiring a binary param, so that you can `execute("insert into image (name, data) values (%s, %b)", [filename, image_bytes])`: if you used $n placeholders you would need a new parameter to specify, of the list (or mapping) of parameters, which one do you want in text and in binary format, or some other mechanism, such as a `Binary(image_bytes)` wrapper. Said that, because we are using the server-side binding, we are actually passing $n parameters to the server: in psycopg3 there is a parser to convert %s and %(name)s placeholders to $n style and to reorder params mapping when needed. So the query above results in a call like `pgconn.exec_params(b"select $1, $2", [filename.encode(...), image_bytes], [TEXT, BINARY])`. If there is interest we can think about how to make this querying layer more accessible (e.g. using a `cur.execute(PgQuery("select $1, $2"), [...])` or some other wrapping mechanism. -- Daniele
On Mon, 9 Nov 2020 at 02:49, Vladimir Ryabtsev <greatvovan@gmail.com> wrote: > > A wild idea: support both client-side (like in psycopg2) and server-side binding. Keep old '%s' syntax and provide a separatemethod for client-side binding (not in 'cur.execute()'). This could alleviate cases like parameters roundtrippingand other cases of safe query composition. At the same time use '$N' for true server-side binding. > Is it an overcomplication or there are valid use-cases of that? Such a thing exists already: there is a client-side binding library, which is pretty much a direct porting of the psycopg2.sql module (https://www.psycopg.org/docs/sql.html). Using that facility, you can compose argument client-side with `query = sql.SQL("select {}, %s").format("hello")` and pass the above to `cursor.execute(query, [42]). It uses `{}` and `{name}`-style parameters to bind client-side so it leaves %s placeholders untouched for execute(). In psycopg3 I've made it slightly easier to use by letting `SQL.format()` to accept any Python object, and automatically wrapping it into a `Literal()` object, whereas in psycopg2 `format()` only accepted "`Composible`" objects, and you should have written the above as `sql.SQL("select {}, %s").format(Literal("hello"))`. -- Daniele
On Mon, 9 Nov 2020 at 06:57, Federico Di Gregorio <fog@dndg.it> wrote: > In your example I'd just go for int8 (the largest possible int in > PostgreSQL). Decimal would probably be better (largest range) but it is > not what the majority of people would expect. The problem with int8 is that it fails in all the cases where decimal fails, while not allowing to pass values larger than 64 bits: piro=# select '[10,20,30]'::jsonb -> 1::int8; ERROR: operator does not exist: jsonb -> bigint LINE 1: select '[10,20,30]'::jsonb -> 1::int8; ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. You are right though that, in terms of round-trippig, decimal might be unexpected too: >>> cur.execute("select %s", [10]).fetchone()[0] Decimal('10') So uhm... that's a +1 for that option too :D > IMHO, oid is a bad idea > because it has a very specific semantic and the error messages generated > by PostgreSQL will be more confusing. I'm not sure I understand this. At the moment, the oids are something that don't really surface to the end-users, who are not required to use them explicitly and shouldn't be seen in the error messages. For instance the query above might results in a call: >>> from psycopg3.oids import builtins >>> builtins["numeric"].oid 1700 >>> res = conn.pgconn.exec_params(b"select '[]'::jsonb -> $1", [b"1"], [1700]) >>> res.status <ExecStatus.FATAL_ERROR: 7> >>> print(res.error_message.decode("utf8")) ERROR: operator does not exist: jsonb -> numeric LINE 1: select '[]'::jsonb -> $1 ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. So the oid is only used internally, in the mapping python type -> exec_params() types array, the 1700 shouldn't surface anywhere. Maybe I'm misunderstanding your concern: can you tell me better? -- Daniele
On 09/11/20 13:00, Daniele Varrazzo wrote: > On Mon, 9 Nov 2020 at 06:57, Federico Di Gregorio <fog@dndg.it> wrote: [snip] >> IMHO, oid is a bad idea >> because it has a very specific semantic and the error messages generated >> by PostgreSQL will be more confusing. > > I'm not sure I understand this. At the moment, the oids are something > that don't really surface to the end-users, who are not required to > use them explicitly and shouldn't be seen in the error messages. For > instance the query above might results in a call: > > >>> from psycopg3.oids import builtins > >>> builtins["numeric"].oid > 1700 > > >>> res = conn.pgconn.exec_params(b"select '[]'::jsonb -> $1", > [b"1"], [1700]) > >>> res.status > <ExecStatus.FATAL_ERROR: 7> > > >>> print(res.error_message.decode("utf8")) > ERROR: operator does not exist: jsonb -> numeric > LINE 1: select '[]'::jsonb -> $1 > ^ > HINT: No operator matches the given name and argument types. You > might need to add explicit type casts. > > So the oid is only used internally, in the mapping python type -> > exec_params() types array, the 1700 shouldn't surface anywhere. > > Maybe I'm misunderstanding your concern: can you tell me better? My fault. I misread and though you wanted to use OID as *the* type to pass to PostggreSQL for numbers. federico
> I wouldn't want to step away from the %s placeholder ...
Thanks for the elaboration, I agree with your argumentation.
Thanks for the elaboration, I agree with your argumentation.
> If there is interest we can think about how to make this querying layer
> more accessible (e.g. using a `cur.execute(PgQuery("select $1, $2"), [...])`
It's always good if such customization is accessible. Having said that,
I can't think of a really good example that benefits from that rather than
from standard use, but basically the more possibilities the better.
> In psycopg3 I've made it slightly easier to use by letting
> `SQL.format()` to accept any Python objectGreat, thanks!
May I ask you again about using 'unknown' for numbers? Could you recap
all the downsides of this approach?
Vladimir
On Mon, 9 Nov 2020 at 04:05, Federico Di Gregorio <fog@dndg.it> wrote:
On 09/11/20 13:00, Daniele Varrazzo wrote:
> On Mon, 9 Nov 2020 at 06:57, Federico Di Gregorio <fog@dndg.it> wrote:
[snip]
>> IMHO, oid is a bad idea
>> because it has a very specific semantic and the error messages generated
>> by PostgreSQL will be more confusing.
>
> I'm not sure I understand this. At the moment, the oids are something
> that don't really surface to the end-users, who are not required to
> use them explicitly and shouldn't be seen in the error messages. For
> instance the query above might results in a call:
>
> >>> from psycopg3.oids import builtins
> >>> builtins["numeric"].oid
> 1700
>
> >>> res = conn.pgconn.exec_params(b"select '[]'::jsonb -> $1",
> [b"1"], [1700])
> >>> res.status
> <ExecStatus.FATAL_ERROR: 7>
>
> >>> print(res.error_message.decode("utf8"))
> ERROR: operator does not exist: jsonb -> numeric
> LINE 1: select '[]'::jsonb -> $1
> ^
> HINT: No operator matches the given name and argument types. You
> might need to add explicit type casts.
>
> So the oid is only used internally, in the mapping python type ->
> exec_params() types array, the 1700 shouldn't surface anywhere.
>
> Maybe I'm misunderstanding your concern: can you tell me better?
My fault. I misread and though you wanted to use OID as *the* type to
pass to PostggreSQL for numbers.
federico
On 11/8/20 2:21 PM, Daniele Varrazzo wrote: > On Sun, 8 Nov 2020 at 20:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> Alright I understand now. >> More below. >> >>> >>> In psycopg3 the idea is to use a more advanced protocol, which >>> separates query and parameters. It brings several benefits: can use >>> prepared statements (send a query once, several parameters later), >>> passing large data doesn't bloat the parser (the params don't hit the >>> lexer/parser), can use binary format (useful to pass large binary >>> blobs without escaping them in a textual form), the format of the data >>> is more homogeneous (no need to quoting), so we can use Python objects >>> in COPY instead of limiting the interface for the copy functions to >>> file-like objects only. >>> >>> Both in psycopg2 and 3 there is an adaptation from Python types to >>> Postgres string representation. In pg2 there is additional quoting, >>> because apart from numbers and bools you need to quote a literal >>> string to merge it to the query and make it syntactically valid. >> >> So the issue in the psycopg3 protocol is making the parameters that are >> passed in separately match up correctly in type to what the server is >> expecting(or can cast implicitly)? > > Yes, correct. What we have to choose is which Postgres oid to map to > each Python type. > > Sometimes the mapping is trivial (e.g. `datetime.date` -> `date` in > Postgres, `uuid.UUID` -> `uuid`...) > > Sometimes it might be ambiguous: is a `datetime.datetime` a > `timestamp` or a `timestamptz`? In some cases we don't care (here we > can say `timestamptz` no problem: if the Python datetime doesn't have > tzinfo, Postgres will use the `TimeZone` setting). > > Sometimes it's messy: what Python type corresponds to a Postgres > `jsonb`? It might be a dict, or a list, or types that have other > representations too (numbers, strings, bools). In this case, as in > psycopg2, there can be a wrapper, e.g. `Json`, to tell psycopg that > this dict, or list, or whatever else, must be jsonified for the db. > > When there are mismatches, sometimes the database cast rules help > (e.gi in the timestamp[tz] case). Sometimes not: if we say `text` to a > jsonb field, it will raise an error. Sometimes a cast is automatic on > inserting in a table but not on passing a function parameter. > > Numbers are messy, as they usually are: Python has int, float, > Decimal, Postgres has int2, int4, int8, float4, float8, numeric. The > mappings float -> float8 and Decimal -> numeric are more or less > straightforward. `int` is not, as in Python it's unbounded. If you say > `select 10` in psql, the server understands "unknown type, but a > number", and can try if either int* or numeric fit the context. But we > don't have the help from the syntax that psql has: because 10 doesn't > have quotes, Postgres is sure that it is a number, and not a string, > but executing query/params separately we lose that expressivity: we > cannot quote the strings and not the number. So choices are: > > 1. If we specify `numeric` or `int8` as oid, inserting in an int field > in a table will work ok, but some functions/operators won't (e.g. "1 What is not working here? >>> %s"). > 2. If we specify `int4` it would work for those few functions defined > as `integer`, but if we try to write a number that doesn't fit in 32 > bits into a Postgres bigint field I assume something will overflow > along the way, even if both python and postgres can handle it. > 3. If we specify `unknown` it might work more often, but > `cursor.execute("select %s", [10]) will return the string "10" instead > of a number. > > So I wonder what's the best compromise to do here: the less bad seems > 1. 3. might work in more contexts, but it's a very counterintuitive > behaviour, and roundtripping other objects (dates, uuid) works no > problem: they don't come back as strings. There is a lot to digest here. I'm going to have to do some thinking on this. > > -- Daniele > -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, 10 Nov 2020 at 01:19, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > > On 11/8/20 2:21 PM, Daniele Varrazzo wrote: > > 1. If we specify `numeric` or `int8` as oid, inserting in an int field > > in a table will work ok, but some functions/operators won't (e.g. "1 > > > %s"). > > What is not working here? Postgres has two different types of cast rules: "assignment" and "implicit". https://www.postgresql.org/docs/current/sql-createcast.html Assignment casts are only chosen by the parser on insert. Implicit casts are chosen in other contexts too. It appears that the cast rules from numeric/int8 to integer are of the first kind. So, while inserting into a table with a mismatching type mostly works: piro=# create table mytable (myint integer); CREATE TABLE piro=# insert into mytable (myint) values (42::decimal); INSERT 0 1 piro=# insert into mytable (myint) values (84::int8); INSERT 0 1 using the wrong type in more generic expression may fail: piro=# select 1 << 4::int; ?column? ---------- 16 piro=# select 1 << 4::int8; ERROR: operator does not exist: integer << bigint LINE 1: select 1 << 4::bigint; ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. So a psycopg statement such as `cur.execute("select 1 << %s", [n])`, which used to work if bound client-side, must be rewritten as "select 1 << %s::integer" for server-side binding, both if we choose int8 or numeric as Postgres types to adapt a Python int. -- Daniele
> On Nov 7, 2020, at 08:32, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > What do you think? I don't want to derail this, but a question: Can a connection be set to use the simple-query protocol instead of the extended-queryprotocol in 3? -- -- Christophe Pettus xof@thebuild.com
On Tue, 10 Nov 2020 at 01:06, Vladimir Ryabtsev <greatvovan@gmail.com> wrote: > May I ask you again about using 'unknown' for numbers? Could you recap > all the downsides of this approach? After this useful conversation, I've updated the article (https://www.varrazzo.com/blog/2020/11/07/psycopg3-adaptation/) and I've added comparison tables with the choices available. The integer one is the following: Choices to cast Python ``int`` type: To ``unknown``: * +1: can express the full range of the unbounded Python ``int`` * +2: can be passed without cast to most data types and functions * -2: it round-trips back to string, error on PostgreSQL < 10 To ``numeric``: * +1: can express the full range of the unbounded Python ``int`` * +1: can be passed without cast to most data types * -1: requires a cast for some functions * DELETED [-1: it round-trips to 'Decimal'`] * +0.5: **it can round-trip back to int**, with additional care - more to follow To ``int8``: * -0.5: can't express values not fitting in 64 bits (relatively limited use case: if the target column is ``numeric`` then it would be wise for the user to pass a ``Decimal``) * +1: can be passed without cast to most data types * -1: requires a cast for some functions * +1: it round-trips back to ``int`` To ``int4``: * -1: limited range: couldn't be used to pass a value not fitting into 32 bytes from Python to a ``bigint``. * +2: can be passed without cast to most data types and functions * +1: it round-trips back to ``int`` What about the DELETED entry on ``numeric``? If we dump ``int`` -> ``numeric`` to the db, and load back ``numeric`` -> ``Decimal`` from it, we end up with integers round-tripping to ``Decimal``, which could easily create errors in Python contexts which are not ready to deal with fixed-point arithmetic. However, upon receiving a ``numeric`` from the database, we can check what number it is: if it has no decimal digit it can be returned to Python as ``int``, if it has any decimal digit it must be returned as ``Decimal``. This mechanism can be as trivial as `looking if there is a '.'`__ in the data returned by the database; however in many cases the is job made simpler (or at least more efficient) by the presence of the ``numeric`` modifiers: ``numeric`` comes in three flavours: - ``numeric``: arbitrary precision (number of digits) and scale (number of digits after the decimal point), - ``numeric(n)``: limited precision, no digit after the decimal point (equivalent to ``numeric(n, 0)``), - ``numeric(n, m)``: limited precision, fixed number of digits after the decimal point. .. __: https://github.com/psycopg/psycopg3/commit/5ced659f4838cf72c1981518ae2804942ebbd07b The modifier information is returned in many contexts (among which the most important: selecting data from tables): if we know the scale `we can decide upfront`__ to load the entire column as ``int`` if the scale is 0, ``Decimal`` otherwise. If the modifier is unknown we can look at the presence of the dot. .. __: https://github.com/psycopg/psycopg3/commit/a9444144f7d0581284ccab198ad0355436e6822a Returning an ``int`` in a context where ``Decimal`` are expected doesn't seem a big problem: throwing an integer to a fixed-point calculation doesn't wreak havoc has it does throwing a ``Decimal`` in a context of ``flaot`` calculations. Running the entire Django test suite after this change caused no test to fail, which gives me some comfort.
> On Nov 9, 2020, at 19:20, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > > Choices to cast Python ``int`` type: Is it absurd to make the choice at execution time, based on the actual value of the Python int? -- -- Christophe Pettus xof@thebuild.com
On Tue, 10 Nov 2020 at 03:20, Christophe Pettus <xof@thebuild.com> wrote: > > I don't want to derail this, but a question: Can a connection be set to use the simple-query protocol instead of the extended-queryprotocol in 3? It won't be set explicitly to use simple/extended protocol, but if a query is passed with no parameter (because it just has no variable part, or because it's being composed client-side using functionalities similar to psycopg2.sql) then the simple-query protocol is used [1]. This is mostly useful to pass several queries at once, because the extended query protocol doesn't allow more than one query at time [2]. [1] https://github.com/psycopg/psycopg3/blob/bc02ae1c9c42892966ff079250c24ff5eae28b48/psycopg3/psycopg3/cursor.py#L243-L254 [2] https://www.postgresql.org/docs/current/libpq-exec.html#LIBPQ-PQEXECPARAMS -- Daniele
On Tue, 10 Nov 2020 at 03:22, Christophe Pettus <xof@thebuild.com> wrote: > > > On Nov 9, 2020, at 19:20, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > > > > Choices to cast Python ``int`` type: > > Is it absurd to make the choice at execution time, based on the actual value of the Python int? I've been thinking a lot about it. I haven't completely ruled it out, but there are a few cases in which having different oids for the same query gets in the way. One that comes to mind is with prepared statements, either explicit (which I haven't exposed yet, but it's like the #1 request for a new feature), or implicit (currently using them to implement 'executemany()'). However I might be overestimating these issues, yes. I guess I should give an overview of the whole adaptation system: I'll try and write its documentation in the next few days. I have to start with the documentation somewhere... -- Daniele
> I've added comparison tables with the choices available.
Nice article!
I see you have assigned weights to the upsides and downsides
and now these weights are going to determine the winner.
My particular question is regarding '-2' score for mapping numbers
to 'unknown'.
Why do you consider it such an important use-case? In my opinion,
such usage is just waste of resources and traffic, and if presenting
some constants in the result set is really required, other means,
such as 'SQL.format()`, might be a better option.
> Is it absurd to make the choice at execution time, based on the actual value of the Python int?
If I got the idea correctly, it may lead to some inconsistent behaviour
in a series of executions of a prepared statement. Imagine you tested
you query with small numbers that fit into int4, but in production rarely
blogger values can come that must be sent as int8, which will lead to
occasional cast errors.
Vladimir
On Mon, 9 Nov 2020 at 19:43, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
On Tue, 10 Nov 2020 at 03:22, Christophe Pettus <xof@thebuild.com> wrote:
>
> > On Nov 9, 2020, at 19:20, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> >
> > Choices to cast Python ``int`` type:
>
> Is it absurd to make the choice at execution time, based on the actual value of the Python int?
I've been thinking a lot about it. I haven't completely ruled it out,
but there are a few cases in which having different oids for the same
query gets in the way. One that comes to mind is with prepared
statements, either explicit (which I haven't exposed yet, but it's
like the #1 request for a new feature), or implicit (currently using
them to implement 'executemany()'). However I might be overestimating
these issues, yes.
I guess I should give an overview of the whole adaptation system: I'll
try and write its documentation in the next few days. I have to start
with the documentation somewhere...
-- Daniele
> My particular question is regarding '-2' score for mapping numbers > to 'unknown'. > > Why do you consider it such an important use-case? In my opinion, > such usage is just waste of resources and traffic, I am sometimes round-tripping magic constants to tell which part of a, say, UNION actually returned the result(s), based on which knowledge the EMR UI will behave slightly differently (say, one is retrieving medications, by matching either of product name or active ingredient, all in one query). I needn't roundtrip ints, sure, but... Karsten
On 10/11/20, Daniele Varrazzo (daniele.varrazzo@gmail.com) wrote: > On Tue, 10 Nov 2020 at 03:22, Christophe Pettus <xof@thebuild.com> wrote: > > > > > On Nov 9, 2020, at 19:20, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > > > > > > Choices to cast Python ``int`` type: > > > > Is it absurd to make the choice at execution time, based on the actual value of the Python int? > > I've been thinking a lot about it. I haven't completely ruled it out, > but there are a few cases in which having different oids for the same > query gets in the way. One that comes to mind is with prepared > statements, either explicit (which I haven't exposed yet, but it's > like the #1 request for a new feature), or implicit (currently using > them to implement 'executemany()'). However I might be overestimating > these issues, yes. > > I guess I should give an overview of the whole adaptation system: I'll > try and write its documentation in the next few days. I have to start > with the documentation somewhere... Apologies for a no-doubt naive suggestion, Daniele, but how about a postgresql type 'shim' of some sort that only accepts python types on input and translates output back to only python types. If such a shim, perhaps a type + C function pair were used, I assume it would not round-trip per-se, but could cycle through int types from most restrictive to most lenient on the basis (I'm guessing) that postgresql will coerce a postgres int4 to and int8 on insertion if necessary, on the principle that the following works ok: test=> create table a (b int8); CREATE TABLE test=> insert into a values (1::int4); INSERT 0 1 test=> create table b (c numeric); CREATE TABLE test=> insert into b values (4::int8); INSERT 0 1 I assume mapping native postgresql column types to output values would pass back through such a 'sieve' quite naturally into native python types. Whether such a filtering layer should work directly in postgresql or as a translation (or 'adaptation') layer in psycopg[3]*is perhaps a similar debate -- although at a lower level -- about Django's ORM. Where should the logic lie? By the way I believe this is this 'layer' for the golang pgx module, which may be of interest: https://github.com/jackc/pgx/blob/93c6b60429e13e0016665214dca2c6382982cf99/values.go#L28 although golang is of course is more strongly typed than python. I thought the type switch test for coercion, as Christophe suggests, might be doable through the 'shim' layer I'm imagining. Regards Rory