Thread: [psycopg] Solving the SQL composition problem

[psycopg] Solving the SQL composition problem

From
Daniele Varrazzo
Date:
Hello,

one recurring problem using psycopg is with the generation of dynamic
SQL strings, e.g. where the table or the field names change. Usually
the conversation goes:

User: this doesn't work: cur.execute("insert into %s values (%s)",
['mytable', 42])
Maintainer: yes, you can't pass tables to the arguments
User: what should I do?
Maintainer: normal string concatenation: cur.execute("insert into %s
values %%s" % 'mytable', [42])
User: what if the table name has special chars?
Maintainer: ehm...
User: what if the source of the table name is not secure?
Maintainer: ehm...

The problem is slightly improved by exposing `quote_ident()`  but not
solved altogether: people must remember to use `quote_ident()` every
time, which is dangerous, because forgetting to use it will usually
work anyway... until one table name contains a special char.
Furthermore sometimes you will want to include a SQL value in the
query, which should be done in a complete different way (calling
adapt, prepare, getquoted...).

[quote_ident()]
http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.quote_ident

My attempt to a solution: the `psycopg2.sql` module:
http://initd.org/psycopg/docs/sql.html

The design was sort of sketched in bugs #308 and #358. Unlike these
first sketches there is no new method on connections or cursors: the
new module exposes certain Composable objects (SQL, Literal,
Identifier...) which can be composed using operators and methods and
forming new Composables. Composables can be ultimately used in place
of strings in queries.

[bug #308]: https://github.com/psycopg/psycopg2/issues/308
[bug #358]: https://github.com/psycopg/psycopg2/issues/358

With these objects it is possible to call:

    from psycopg2 import sql

    cur.execute(
        sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')],
        [42])

If you could take a look at the documentation, or play with the
feature, and let me know what you think about the feature design, or
any other observation, it would be great. The code currently lives in
the sql-compose branch of the github repository.

[sql-compose] https://github.com/psycopg/psycopg2/tree/sql-compose

Thank you very much!


-- Daniele


Re: [psycopg] Solving the SQL composition problem

From
Adrian Klaver
Date:
On 01/01/2017 12:11 AM, Daniele Varrazzo wrote:
> Hello,
>
> one recurring problem using psycopg is with the generation of dynamic
> SQL strings, e.g. where the table or the field names change. Usually
> the conversation goes:
>
> User: this doesn't work: cur.execute("insert into %s values (%s)",
> ['mytable', 42])
> Maintainer: yes, you can't pass tables to the arguments
> User: what should I do?
> Maintainer: normal string concatenation: cur.execute("insert into %s
> values %%s" % 'mytable', [42])
> User: what if the table name has special chars?
> Maintainer: ehm...
> User: what if the source of the table name is not secure?
> Maintainer: ehm...
>
> The problem is slightly improved by exposing `quote_ident()`  but not
> solved altogether: people must remember to use `quote_ident()` every
> time, which is dangerous, because forgetting to use it will usually
> work anyway... until one table name contains a special char.
> Furthermore sometimes you will want to include a SQL value in the
> query, which should be done in a complete different way (calling
> adapt, prepare, getquoted...).
>
> [quote_ident()]
> http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.quote_ident
>
> My attempt to a solution: the `psycopg2.sql` module:
> http://initd.org/psycopg/docs/sql.html
>
> The design was sort of sketched in bugs #308 and #358. Unlike these
> first sketches there is no new method on connections or cursors: the
> new module exposes certain Composable objects (SQL, Literal,
> Identifier...) which can be composed using operators and methods and
> forming new Composables. Composables can be ultimately used in place
> of strings in queries.
>
> [bug #308]: https://github.com/psycopg/psycopg2/issues/308
> [bug #358]: https://github.com/psycopg/psycopg2/issues/358
>
> With these objects it is possible to call:
>
>     from psycopg2 import sql
>
>     cur.execute(
>         sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')],
>         [42])
>
> If you could take a look at the documentation, or play with the
> feature, and let me know what you think about the feature design, or
> any other observation, it would be great. The code currently lives in
> the sql-compose branch of the github repository.
>
> [sql-compose] https://github.com/psycopg/psycopg2/tree/sql-compose
>
> Thank you very much!

Wow, that is very cool. I have gotten as far reading the docs,
especially like Placeholder. Still need to check out the branch and
play. Will report back.

>
>
> -- Daniele
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [psycopg] Solving the SQL composition problem

From
Karsten Hilbert
Date:
On Sun, Jan 01, 2017 at 09:11:24AM +0100, Daniele Varrazzo wrote:

> one recurring problem using psycopg is with the generation of dynamic
> SQL strings, e.g. where the table or the field names change. Usually
> the conversation goes:

[...]

The concept immediately appeals to me and I would surely
apply it in GNUmed.

One word of caution: pretty-please don't let psycopg2 slide
into being some sort of mini/pseudo ORM.

Thanks !

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: [psycopg] Solving the SQL composition problem

From
Adrian Klaver
Date:
On 01/01/2017 12:11 AM, Daniele Varrazzo wrote:
> Hello,
>
> one recurring problem using psycopg is with the generation of dynamic
> SQL strings, e.g. where the table or the field names change. Usually
> the conversation goes:
>
> User: this doesn't work: cur.execute("insert into %s values (%s)",
> ['mytable', 42])
> Maintainer: yes, you can't pass tables to the arguments
> User: what should I do?
> Maintainer: normal string concatenation: cur.execute("insert into %s
> values %%s" % 'mytable', [42])
> User: what if the table name has special chars?
> Maintainer: ehm...
> User: what if the source of the table name is not secure?
> Maintainer: ehm...
>
> The problem is slightly improved by exposing `quote_ident()`  but not
> solved altogether: people must remember to use `quote_ident()` every
> time, which is dangerous, because forgetting to use it will usually
> work anyway... until one table name contains a special char.
> Furthermore sometimes you will want to include a SQL value in the
> query, which should be done in a complete different way (calling
> adapt, prepare, getquoted...).
>
> [quote_ident()]
> http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.quote_ident
>
> My attempt to a solution: the `psycopg2.sql` module:
> http://initd.org/psycopg/docs/sql.html
>
> The design was sort of sketched in bugs #308 and #358. Unlike these
> first sketches there is no new method on connections or cursors: the
> new module exposes certain Composable objects (SQL, Literal,
> Identifier...) which can be composed using operators and methods and
> forming new Composables. Composables can be ultimately used in place
> of strings in queries.
>
> [bug #308]: https://github.com/psycopg/psycopg2/issues/308
> [bug #358]: https://github.com/psycopg/psycopg2/issues/358
>
> With these objects it is possible to call:
>
>     from psycopg2 import sql
>
>     cur.execute(
>         sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')],
>         [42])
>
> If you could take a look at the documentation, or play with the
> feature, and let me know what you think about the feature design, or
> any other observation, it would be great. The code currently lives in
> the sql-compose branch of the github repository.
>
> [sql-compose] https://github.com/psycopg/psycopg2/tree/sql-compose

Yes this is going to make building dynamic SQL cleaner then the
hodgepodge of things I have created over the years. Only hiccup I had
was more me then anything else. You have this example:

from psycopg2 import sql

cur.execute(
     sql.SQL("insert into %s values (%%s, %%s)") %
[sql.Identifier('my_table')],
     [10, 20])

towards the top of the docs section. When I got down to the Placeholder
section it did not occur to me at first that I could eliminate
Placeholder instances by just putting %%s in the template. Going back
over the docs got me to the above example and clarity.


>
> Thank you very much!
>
>
> -- Daniele
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [psycopg] Solving the SQL composition problem

From
Oleksandr Shulgin
Date:
On Sun, Jan 1, 2017 at 9:11 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
Hello,

one recurring problem using psycopg is with the generation of dynamic
SQL strings, e.g. where the table or the field names change. Usually
the conversation goes:

User: this doesn't work: cur.execute("insert into %s values (%s)",
['mytable', 42])
Maintainer: yes, you can't pass tables to the arguments
User: what should I do?
Maintainer: normal string concatenation: cur.execute("insert into %s
values %%s" % 'mytable', [42])
User: what if the table name has special chars?
Maintainer: ehm...
User: what if the source of the table name is not secure?
Maintainer: ehm...

The problem is slightly improved by exposing `quote_ident()`  but not
solved altogether: people must remember to use `quote_ident()` every
time, which is dangerous, because forgetting to use it will usually
work anyway... until one table name contains a special char.
Furthermore sometimes you will want to include a SQL value in the
query, which should be done in a complete different way (calling
adapt, prepare, getquoted...).

[quote_ident()]
http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.quote_ident

My attempt to a solution: the `psycopg2.sql` module:
http://initd.org/psycopg/docs/sql.html

The design was sort of sketched in bugs #308 and #358. Unlike these
first sketches there is no new method on connections or cursors: the
new module exposes certain Composable objects (SQL, Literal,
Identifier...) which can be composed using operators and methods and
forming new Composables. Composables can be ultimately used in place
of strings in queries.

[bug #308]: https://github.com/psycopg/psycopg2/issues/308
[bug #358]: https://github.com/psycopg/psycopg2/issues/358

With these objects it is possible to call:

    from psycopg2 import sql

    cur.execute(
        sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')],
        [42])

If you could take a look at the documentation, or play with the
feature, and let me know what you think about the feature design, or
any other observation, it would be great. The code currently lives in
the sql-compose branch of the github repository.

[sql-compose] https://github.com/psycopg/psycopg2/tree/sql-compose

Thank you very much!

Hello Daniele,

I've only taken look at the docs and that totally makes sense to me.  Thank you!

--
Alex

Re: [psycopg] Solving the SQL composition problem

From
Jim Nasby
Date:
On 1/1/17 2:11 AM, Daniele Varrazzo wrote:
>         sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')],

Since %s isn't standard parameter replacement anyway, I'm wondering if
both considerations could just be handled by execute(), by using
different replacement syntax. IE:

execute('insert into %s values ($1)', [42], ['my table'])

Obviously this would be backwards incompatible, but I think that's
manageable.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


Re: [psycopg] Solving the SQL composition problem

From
Adrian Klaver
Date:
On 01/02/2017 07:05 AM, Jim Nasby wrote:
> On 1/1/17 2:11 AM, Daniele Varrazzo wrote:
>>         sql.SQL("insert into %s values (%%s)") %
>> [sql.Identifier('mytable')],
>
> Since %s isn't standard parameter replacement anyway, I'm wondering if
> both considerations could just be handled by execute(), by using
> different replacement syntax. IE:

I thought it was?:

https://www.python.org/dev/peps/pep-0249/#paramstyle

format     ANSI C printf format codes, e.g. ...WHERE name=%s

>
> execute('insert into %s values ($1)', [42], ['my table'])
>
> Obviously this would be backwards incompatible, but I think that's
> manageable.


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [psycopg] Solving the SQL composition problem

From
Daniele Varrazzo
Date:
On Mon, Jan 2, 2017 at 4:05 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 1/1/17 2:11 AM, Daniele Varrazzo wrote:
>>
>>         sql.SQL("insert into %s values (%%s)") %
>> [sql.Identifier('mytable')],
>
>
> Since %s isn't standard parameter replacement anyway, I'm wondering if both
> considerations could just be handled by execute(), by using different
> replacement syntax. IE:
>
> execute('insert into %s values ($1)', [42], ['my table'])
>
> Obviously this would be backwards incompatible, but I think that's
> manageable.

mmm... what I think is that if these objects' replacement rules were
different one could leave the %s and %(name)s placeholder untouched
for the query params.

A natural choice could be to use the str.format syntax for the query
composition, or a subset of it. Hence my example could be:

cur.execute(
    sql.SQL("insert into {} values (%s,
%s)").format(sql.Identifier('my_table')),
    [10, 20])

This would largely remove the need for double escaping.

-- Daniele


Re: [psycopg] Solving the SQL composition problem

From
Adrian Klaver
Date:
On 01/02/2017 08:21 AM, Daniele Varrazzo wrote:
> On Mon, Jan 2, 2017 at 4:05 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> On 1/1/17 2:11 AM, Daniele Varrazzo wrote:
>>>
>>>         sql.SQL("insert into %s values (%%s)") %
>>> [sql.Identifier('mytable')],
>>
>>
>> Since %s isn't standard parameter replacement anyway, I'm wondering if both
>> considerations could just be handled by execute(), by using different
>> replacement syntax. IE:
>>
>> execute('insert into %s values ($1)', [42], ['my table'])
>>
>> Obviously this would be backwards incompatible, but I think that's
>> manageable.
>
> mmm... what I think is that if these objects' replacement rules were
> different one could leave the %s and %(name)s placeholder untouched
> for the query params.
>
> A natural choice could be to use the str.format syntax for the query
> composition, or a subset of it. Hence my example could be:
>
> cur.execute(
>     sql.SQL("insert into {} values (%s,
> %s)").format(sql.Identifier('my_table')),
>     [10, 20])

I like this, makes it clearer what is an identifier versus a placeholder.

>
> This would largely remove the need for double escaping.
>
> -- Daniele
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [psycopg] Solving the SQL composition problem

From
Jim Nasby
Date:
On 1/2/17 10:21 AM, Daniele Varrazzo wrote:
>> execute('insert into %s values ($1)', [42], ['my table'])
>>
>> Obviously this would be backwards incompatible, but I think that's
>> manageable.
> mmm... what I think is that if these objects' replacement rules were
> different one could leave the %s and %(name)s placeholder untouched
> for the query params.
>
> A natural choice could be to use the str.format syntax for the query
> composition, or a subset of it. Hence my example could be:
>
> cur.execute(
>     sql.SQL("insert into {} values (%s,
> %s)").format(sql.Identifier('my_table')),
>     [10, 20])
>
> This would largely remove the need for double escaping.

What I'm suggesting is to let Postgres handle the replacement of normal
values, using the prepared statement syntax of $1, $2, etc[1], and only
do identifier replacement in python (using quote_ident). That means a
lot less time spent parsing, and opens the door for eventually doing
more efficient stuff over the wire, like using binary type formats.

1: https://www.postgresql.org/docs/current/static/sql-prepare.html
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


Re: [psycopg] Solving the SQL composition problem

From
Daniele Varrazzo
Date:
On Mon, Jan 2, 2017 at 5:21 PM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
>
> A natural choice could be to use the str.format syntax for the query
> composition, or a subset of it. Hence my example could be:
>
> cur.execute(
>     sql.SQL("insert into {} values (%s,
> %s)").format(sql.Identifier('my_table')),
>     [10, 20])

That's now implemented in the branch; docs updated too:
http://initd.org/psycopg/docs/sql.html. In particular see
http://initd.org/psycopg/docs/sql.html#psycopg2.sql.SQL.format which
replaces the previous SQL % operator (no more supported). It is no
more necessary to specify value placeholders as %%s so it's easier to
switch between constant and parametrized queries.

-- Daniele


Re: [psycopg] Solving the SQL composition problem

From
Adrian Klaver
Date:
On 01/03/2017 08:24 AM, Jim Nasby wrote:
> On 1/2/17 10:21 AM, Daniele Varrazzo wrote:
>>> execute('insert into %s values ($1)', [42], ['my table'])
>>>
>>> Obviously this would be backwards incompatible, but I think that's
>>> manageable.
>> mmm... what I think is that if these objects' replacement rules were
>> different one could leave the %s and %(name)s placeholder untouched
>> for the query params.
>>
>> A natural choice could be to use the str.format syntax for the query
>> composition, or a subset of it. Hence my example could be:
>>
>> cur.execute(
>>     sql.SQL("insert into {} values (%s,
>> %s)").format(sql.Identifier('my_table')),
>>     [10, 20])
>>
>> This would largely remove the need for double escaping.
>
> What I'm suggesting is to let Postgres handle the replacement of normal
> values, using the prepared statement syntax of $1, $2, etc[1], and only
> do identifier replacement in python (using quote_ident). That means a
> lot less time spent parsing, and opens the door for eventually doing
> more efficient stuff over the wire, like using binary type formats.

I believe a path to the above is explained here:

https://www.postgresql.org/message-id/AANLkTi=ym3SCQKCQBtp8RJHUswwAPOpjXYKTXS=aHWzp@mail.gmail.com

>
> 1: https://www.postgresql.org/docs/current/static/sql-prepare.html


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [psycopg] Solving the SQL composition problem

From
Christophe Pettus
Date:
> On Jan 3, 2017, at 08:24, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:
> What I'm suggesting is to let Postgres handle the replacement of normal values, using the prepared statement syntax
of$1, $2, etc[1], and only do identifier replacement in python (using quote_ident).  

If this is practical, I'd be very much in favor of it, for the reasons given.

--
-- Christophe Pettus
   xof@thebuild.com



Re: [psycopg] Solving the SQL composition problem

From
Daniele Varrazzo
Date:
On Tue, Jan 3, 2017 at 5:24 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> What I'm suggesting is to let Postgres handle the replacement of normal
> values, using the prepared statement syntax of $1, $2, etc[1], and only do
> identifier replacement in python (using quote_ident). That means a lot less
> time spent parsing, and opens the door for eventually doing more efficient
> stuff over the wire, like using binary type formats.
>
> 1: https://www.postgresql.org/docs/current/static/sql-prepare.html

Hi Jim,

this is a very desirable feature. However it would be a major source
of backward incompatibility. Even keeping the placeholders as %s or
%(name)s some query would stop working because of different postgresql
cast rules and because PQexecParams doesn't support passing several
semicolon-separated statements at once. We will eventually get there
in another major release of the adapter. Adrian has already posted a
link with further discussion about the topic.

-- Daniele


Re: [psycopg] Solving the SQL composition problem

From
Jim Nasby
Date:
On 1/3/17 10:35 AM, Adrian Klaver wrote:
>> What I'm suggesting is to let Postgres handle the replacement of normal
>> values, using the prepared statement syntax of $1, $2, etc[1], and only
>> do identifier replacement in python (using quote_ident). That means a
>> lot less time spent parsing, and opens the door for eventually doing
>> more efficient stuff over the wire, like using binary type formats.
>
> I believe a path to the above is explained here:
>
> https://www.postgresql.org/message-id/AANLkTi=ym3SCQKCQBtp8RJHUswwAPOpjXYKTXS=aHWzp@mail.gmail.com

Seems reasonable at first glance.

Speaking of type compatibility and what-not... I find it irritating that
we need to maintain that in two different places (psycopg2 and
plpython). It'd be damn nice if we could abstract that out somehow, as
well as fixing plpython's brain-dead behavior with nested types (arrays
of complex, or complex with arrays or more complex types).

I've also been mucking around with having the SPI functions in plpython
return something that looks more like a dataframe. I've got working code
but have yet to figure out a good interface. Part of that is trying to
figure out how to deal with a bunch of different possible arguments in
the C code. I'm thinking the most flexible API would be something like

plpy.execute(..., container=[], members={})

to match existing behavior. You could then do container={}, members=[]
to get something that looks like a dataframe. Or (theoretically) you
could pass in the desired Pandas types (I think that'd be
contairner=pd.DataFrame, members=pd.Series).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


Re: [psycopg] Solving the SQL composition problem

From
Adrian Klaver
Date:
On 01/03/2017 08:33 AM, Daniele Varrazzo wrote:
> On Mon, Jan 2, 2017 at 5:21 PM, Daniele Varrazzo
> <daniele.varrazzo@gmail.com> wrote:
>>
>> A natural choice could be to use the str.format syntax for the query
>> composition, or a subset of it. Hence my example could be:
>>
>> cur.execute(
>>     sql.SQL("insert into {} values (%s,
>> %s)").format(sql.Identifier('my_table')),
>>     [10, 20])
>
> That's now implemented in the branch; docs updated too:
> http://initd.org/psycopg/docs/sql.html. In particular see
> http://initd.org/psycopg/docs/sql.html#psycopg2.sql.SQL.format which
> replaces the previous SQL % operator (no more supported). It is no
> more necessary to specify value placeholders as %%s so it's easier to
> switch between constant and parametrized queries.

Not sure it applies here, but I just ran across a blog from Armin
Ronacher. I don't always understand what he says, in this case I think I
do and it might be worth a look:

http://lucumr.pocoo.org/2016/12/29/careful-with-str-format/



>
> -- Daniele
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [psycopg] Solving the SQL composition problem

From
Daniele Varrazzo
Date:
On Thu, Jan 5, 2017 at 6:59 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> Not sure it applies here, but I just ran across a blog from Armin Ronacher.
> I don't always understand what he says, in this case I think I do and it
> might be worth a look:
>
> http://lucumr.pocoo.org/2016/12/29/careful-with-str-format/

It's a reasonable concern, but no, it doesn't apply to us. From the
Python library I'm only using the parser to parse the format
micro-language, but not doing anything special with the field name, in
particular not applying attribute lookup: trying `{0.__class__}`
wouldn't try to extract the `__class__` attribute from the first
positional argument, but would look up for a keyword argument with
such name and fail with a KeyError. Also, we check and explicitly
forbid placeholder modifier.

https://github.com/psycopg/psycopg2/blob/a8a3a298/lib/sql.py#L227

-- Daniele


Re: [psycopg] Solving the SQL composition problem

From
Adrian Klaver
Date:
On 01/05/2017 11:26 AM, Daniele Varrazzo wrote:
> On Thu, Jan 5, 2017 at 6:59 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
>> Not sure it applies here, but I just ran across a blog from Armin Ronacher.
>> I don't always understand what he says, in this case I think I do and it
>> might be worth a look:
>>
>> http://lucumr.pocoo.org/2016/12/29/careful-with-str-format/
>
> It's a reasonable concern, but no, it doesn't apply to us. From the
> Python library I'm only using the parser to parse the format
> micro-language, but not doing anything special with the field name, in
> particular not applying attribute lookup: trying `{0.__class__}`
> wouldn't try to extract the `__class__` attribute from the first
> positional argument, but would look up for a keyword argument with
> such name and fail with a KeyError. Also, we check and explicitly
> forbid placeholder modifier.
>
> https://github.com/psycopg/psycopg2/blob/a8a3a298/lib/sql.py#L227

Alright, good to know.
>
> -- Daniele
>


--
Adrian Klaver
adrian.klaver@aklaver.com