Thread: Sanitize schema name
On 05/06/2015 01:56 PM, Ludovic Gasc wrote: > Hi, > > I want to sanitize the name of the schema in a SQL query, because the > schema name depends on the client. So you are talking about schema qualifying objects(tables, functions, etc) in a query, correct? Using search_path does not work? > > The issue is that I can't sanitize the name via the standard way of > psycopg2, because it adds quotes around schema name. What is the standard way? Not sure I understand what quotes have to do with it? > > I imagine it's the same issue with a table name. Do you have a > suggestion to bypass that ? Can you provide an code example of what you are trying to do? > > For now, the most secure way I've found is to test the presence of the > schema before launch each query, but not really efficient. > > Regards. > -- > Ludovic Gasc (GMLudo) > http://www.gmludo.eu/ -- Adrian Klaver adrian.klaver@aklaver.com
On 05/06/2015 01:56 PM, Ludovic Gasc wrote:Hi,
I want to sanitize the name of the schema in a SQL query, because the
schema name depends on the client.
So you are talking about schema qualifying objects(tables, functions, etc) in a query, correct?
Using search_path does not work?
The issue is that I can't sanitize the name via the standard way of
psycopg2, because it adds quotes around schema name.
What is the standard way?
Not sure I understand what quotes have to do with it?
I imagine it's the same issue with a table name. Do you have a
suggestion to bypass that ?
Can you provide an code example of what you are trying to do?--
For now, the most secure way I've found is to test the presence of the
schema before launch each query, but not really efficient.
Regards.
--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg
On Thu, May 7, 2015 at 12:35 PM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote: > He wants to dynamically pass the table name: > > cursor.execute("SELECT * FROM %s", (table,)) # won't work Looks like there is more and more the need of exposing a function like libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released so recently, I'm reluctant to add such a function to 2.6.1. Maybe releasing a small Python module exposing just that function, then add the functionality to psycopg 2.7? [1] http://www.postgresql.org/docs/9.4/static/libpq-exec.html#LIBPQ-PQESCAPEIDENTIFIER -- Daniele
On Thu, May 7, 2015 at 12:35 PM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
> He wants to dynamically pass the table name:
>
> cursor.execute("SELECT * FROM %s", (table,)) # won't work
Looks like there is more and more the need of exposing a function like
libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
so recently, I'm reluctant to add such a function to 2.6.1.
Maybe releasing a small Python module exposing just that function,
then add the functionality to psycopg 2.7?
[1] http://www.postgresql.org/docs/9.4/static/libpq-exec.html#LIBPQ-PQESCAPEIDENTIFIER
-- Daniele
On Thu, May 07, 2015 at 01:01:48PM +0100, Daniele Varrazzo wrote: > > He wants to dynamically pass the table name: > > > > cursor.execute("SELECT * FROM %s", (table,)) # won't work > > Looks like there is more and more the need of exposing a function like > libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released > so recently, I'm reluctant to add such a function to 2.6.1. Making it available for explicit use wouldn't hurt, would it ? Karsten -- GPG key ID E4071346 @ eu.pool.sks-keyservers.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On 05/07/2015 04:35 AM, Dorian Hoxha wrote: > He wants to dynamically pass the table name: > > cursor.execute("SELECT * FROM %s", (table,)) # won't work > How about using format()?: http://www.postgresql.org/docs/9.4/static/functions-string.html#FUNCTIONS-STRING-FORMAT Available 9.1+ cur.execute("select format('select * from %I', 'student_info')") sql_str = cur.fetchone()[0] sql_str 'select * from student_info' -- Adrian Klaver adrian.klaver@aklaver.com
On 05/07/2015 04:35 AM, Dorian Hoxha wrote:He wants to dynamically pass the table name:
cursor.execute("SELECT * FROM %s", (table,)) # won't work
How about using format()?:
http://www.postgresql.org/docs/9.4/static/functions-string.html#FUNCTIONS-STRING-FORMAT
Available 9.1+
cur.execute("select format('select * from %I', 'student_info')")
sql_str = cur.fetchone()[0]
sql_str
'select * from student_info'
--
Adrian Klaver
adrian.klaver@aklaver.com
On 05/07/2015 01:06 PM, Ludovic Gasc wrote: > Thanks all for your answers, you understand well my need. > > About PQescapeIdentifier: > 1. An idea of release date for the next version of psycopg2 ? > 2. Are you sure it's enough to protect against SQL injections, because > you can read in the documentation: *Tip:* As with string literals, to > prevent SQL injection attacks, SQL identifiers must be escaped when they > are received from an untrustworthy source. > > About format() it doesn't work for schema, example: > SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices') > => SELECT * FROM "lg.devices" WHERE id=1 > SELECT * FROM "lg.devices" WHERE id=1 > => ERROR: relation "lg.devices" does not exist > LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1 > ^ > > ********** Error ********** > > ERROR: relation "lg.devices" does not exist > Try: SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg', 'devices') Still not sure why you cannot use search_path and avoid the schema qualification altogether? -- Adrian Klaver adrian.klaver@aklaver.com
On 05/07/2015 01:06 PM, Ludovic Gasc wrote:Thanks all for your answers, you understand well my need.
About PQescapeIdentifier:
1. An idea of release date for the next version of psycopg2 ?
2. Are you sure it's enough to protect against SQL injections, because
you can read in the documentation: *Tip:* As with string literals, to
prevent SQL injection attacks, SQL identifiers must be escaped when they
are received from an untrustworthy source.
About format() it doesn't work for schema, example:
SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
=> SELECT * FROM "lg.devices" WHERE id=1
SELECT * FROM "lg.devices" WHERE id=1
=> ERROR: relation "lg.devices" does not exist
LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
^
********** Error **********
ERROR: relation "lg.devices" does not exist
Try:
SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg', 'devices')
Still not sure why you cannot use search_path and avoid the schema qualification altogether?
--
Adrian Klaver
adrian.klaver@aklaver.com
On 05/09/2015 01:03 PM, Ludovic Gasc wrote: > 2015-05-08 0:12 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>>: > > On 05/07/2015 01:06 PM, Ludovic Gasc wrote: > > Thanks all for your answers, you understand well my need. > > About PQescapeIdentifier: > 1. An idea of release date for the next version of psycopg2 ? > 2. Are you sure it's enough to protect against SQL injections, > because > you can read in the documentation: *Tip:* As with string > literals, to > prevent SQL injection attacks, SQL identifiers must be escaped > when they > are received from an untrustworthy source. > > About format() it doesn't work for schema, example: > SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices') > => SELECT * FROM "lg.devices" WHERE id=1 > SELECT * FROM "lg.devices" WHERE id=1 > => ERROR: relation "lg.devices" does not exist > LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1 > ^ > > ********** Error ********** > > ERROR: relation "lg.devices" does not exist > > > Try: > > SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg', 'devices') > > > Ok, now, it works, but, I need to launch the query two times: First time > with SELECT format(, a second time with the result of the first query. > It should be possible to execute that only in one pass ? As far as I know, only in plpgsql: http://www.postgresql.org/docs/9.4/static/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE Hence the previous suggestion about creating a psycopg2 function that you could use directly. > > > > Still not sure why you cannot use search_path and avoid the schema > qualification altogether? > > > Because I use a pool of pgsql sockets where no connexions are dedicated > to one particular client. So all the clients are connecting to a single database with many schemas, each schema unique to a client? > I could change that each time just before to execute each query, but it > shouldn't be very efficient. So is the login role for each client unique, where you could use ALTER ROLE SET search_path to have it preset: http://www.postgresql.org/docs/9.4/interactive/sql-alterrole.html > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > > -- Adrian Klaver adrian.klaver@aklaver.com
On 05/09/2015 01:03 PM, Ludovic Gasc wrote:2015-05-08 0:12 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>>:
On 05/07/2015 01:06 PM, Ludovic Gasc wrote:
Thanks all for your answers, you understand well my need.
About PQescapeIdentifier:
1. An idea of release date for the next version of psycopg2 ?
2. Are you sure it's enough to protect against SQL injections,
because
you can read in the documentation: *Tip:* As with string
literals, to
prevent SQL injection attacks, SQL identifiers must be escaped
when they
are received from an untrustworthy source.
About format() it doesn't work for schema, example:
SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
=> SELECT * FROM "lg.devices" WHERE id=1
SELECT * FROM "lg.devices" WHERE id=1
=> ERROR: relation "lg.devices" does not exist
LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
^
********** Error **********
ERROR: relation "lg.devices" does not exist
Try:
SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg', 'devices')
Ok, now, it works, but, I need to launch the query two times: First time
with SELECT format(, a second time with the result of the first query.
It should be possible to execute that only in one pass ?
As far as I know, only in plpgsql:
http://www.postgresql.org/docs/9.4/static/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE
Hence the previous suggestion about creating a psycopg2 function that you could use directly.
Still not sure why you cannot use search_path and avoid the schema
qualification altogether?
Because I use a pool of pgsql sockets where no connexions are dedicated
to one particular client.
So all the clients are connecting to a single database with many schemas, each schema unique to a client?
I could change that each time just before to execute each query, but it
shouldn't be very efficient.
So is the login role for each client unique, where you could use ALTER ROLE SET search_path to have it preset:
http://www.postgresql.org/docs/9.4/interactive/sql-alterrole.html
--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
--
Adrian Klaver
adrian.klaver@aklaver.com
On Thursday 07 of May 2015, Daniele Varrazzo wrote: > Looks like there is more and more the need of exposing a function like > libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released > so recently, I'm reluctant to add such a function to 2.6.1. > > Maybe releasing a small Python module exposing just that function, > then add the functionality to psycopg 2.7? I vote for a pre-release of 2.7, with this feature. Modifying the API, even if the new function wouldn't interfere with any existing ones, calls for a version bump. Just another idea, would it make sense to abuse the semantics of string formatting[1] and introduce another type, say "%t" [2] for implicit identifier escaping? This would make our queries look like: cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam')) [1] https://docs.python.org/2/library/stdtypes.html#string-formatting [2] I notice that "t" isn't used for anything else, so far.
On Thursday 07 of May 2015, Daniele Varrazzo wrote:
> Looks like there is more and more the need of exposing a function like
> libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
> so recently, I'm reluctant to add such a function to 2.6.1.
>
> Maybe releasing a small Python module exposing just that function,
> then add the functionality to psycopg 2.7?
I vote for a pre-release of 2.7, with this feature. Modifying the API, even if
the new function wouldn't interfere with any existing ones, calls for a
version bump.
Just another idea, would it make sense to abuse the semantics of string
formatting[1] and introduce another type, say "%t" [2] for implicit identifier
escaping?
This would make our queries look like:
cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam'))
[1] https://docs.python.org/2/library/stdtypes.html#string-formatting
[2] I notice that "t" isn't used for anything else, so far.
--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg
On 05/09/2015 11:55 PM, Ludovic Gasc wrote: > 2015-05-10 2:41 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>>: > > On 05/09/2015 01:03 PM, Ludovic Gasc wrote: > > 2015-05-08 0:12 GMT+02:00 Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > <mailto:adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>>>: > > > On 05/07/2015 01:06 PM, Ludovic Gasc wrote: > > Thanks all for your answers, you understand well my need. > > About PQescapeIdentifier: > 1. An idea of release date for the next version of > psycopg2 ? > 2. Are you sure it's enough to protect against SQL > injections, > because > you can read in the documentation: *Tip:* As with string > literals, to > prevent SQL injection attacks, SQL identifiers must be > escaped > when they > are received from an untrustworthy source. > > About format() it doesn't work for schema, example: > SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices') > => SELECT * FROM "lg.devices" WHERE id=1 > SELECT * FROM "lg.devices" WHERE id=1 > => ERROR: relation "lg.devices" does not exist > LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1 > ^ > > ********** Error ********** > > ERROR: relation "lg.devices" does not exist > > > Try: > > SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg', > 'devices') > > > Ok, now, it works, but, I need to launch the query two times: > First time > with SELECT format(, a second time with the result of the first > query. > It should be possible to execute that only in one pass ? > > > As far as I know, only in plpgsql: > > http://www.postgresql.org/docs/9.4/static/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE > > Hence the previous suggestion about creating a psycopg2 function > that you could use directly. > > > Ok, at least to me, it's the ideal situation. > > > > > > > Still not sure why you cannot use search_path and avoid the > schema > qualification altogether? > > > Because I use a pool of pgsql sockets where no connexions are > dedicated > to one particular client. > > > So all the clients are connecting to a single database with many > schemas, each schema unique to a client? > > > Exactly. With that, I can easily generate cross statistics between > clients for billing, because I've a hierarchy of clients. > > > > I could change that each time just before to execute each query, > but it > shouldn't be very efficient. > > > > So is the login role for each client unique, where you could use > ALTER ROLE SET search_path to have it preset: > > http://www.postgresql.org/docs/9.4/interactive/sql-alterrole.html > > > Thanks for your help, but this suggestion doesn't fit with my need: If I > do that, I need to have a dedicated connection for each client. > The idea is to mutualize pgsql connections for all clients like an > applicative "virtualization": for some big clients, it will use several > connections from the aiopg's pool, for some others, it uses no connections. You are braver then I, especially given this: https://docs.python.org/3.4/library/asyncio.html " Note The asyncio package has been included in the standard library on a provisional basis. Backwards incompatible changes (up to and including removal of the module) may occur if deemed necessary by the core developers. " -- Adrian Klaver adrian.klaver@aklaver.com
On 11 May 2015 02:17, "Adrian Klaver" <adrian.klaver@aklaver.com> wrote:
>
> On 05/09/2015 11:55 PM, Ludovic Gasc wrote:
>>
>> 2015-05-10 2:41 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com
>> <mailto:adrian.klaver@aklaver.com>>:
>>
>>
>> On 05/09/2015 01:03 PM, Ludovic Gasc wrote:
>>
>> 2015-05-08 0:12 GMT+02:00 Adrian Klaver
>> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>> <mailto:adrian.klaver@aklaver.com
>>
>> <mailto:adrian.klaver@aklaver.com>>>:
>>
>>
>> On 05/07/2015 01:06 PM, Ludovic Gasc wrote:
>>
>> Thanks all for your answers, you understand well my need.
>>
>> About PQescapeIdentifier:
>> 1. An idea of release date for the next version of
>> psycopg2 ?
>> 2. Are you sure it's enough to protect against SQL
>> injections,
>> because
>> you can read in the documentation: *Tip:* As with string
>> literals, to
>> prevent SQL injection attacks, SQL identifiers must be
>> escaped
>> when they
>> are received from an untrustworthy source.
>>
>> About format() it doesn't work for schema, example:
>> SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
>> => SELECT * FROM "lg.devices" WHERE id=1
>> SELECT * FROM "lg.devices" WHERE id=1
>> => ERROR: relation "lg.devices" does not exist
>> LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
>> ^
>>
>> ********** Error **********
>>
>> ERROR: relation "lg.devices" does not exist
>>
>>
>> Try:
>>
>> SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg',
>> 'devices')
>>
>>
>> Ok, now, it works, but, I need to launch the query two times:
>> First time
>> with SELECT format(, a second time with the result of the first
>> query.
>> It should be possible to execute that only in one pass ?
>>
>>
>> As far as I know, only in plpgsql:
>>
>> http://www.postgresql.org/docs/9.4/static/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE
>>
>> Hence the previous suggestion about creating a psycopg2 function
>> that you could use directly.
>>
>>
>> Ok, at least to me, it's the ideal situation.
>>
>>
>>
>>
>>
>>
>> Still not sure why you cannot use search_path and avoid the
>> schema
>> qualification altogether?
>>
>>
>> Because I use a pool of pgsql sockets where no connexions are
>> dedicated
>> to one particular client.
>>
>>
>> So all the clients are connecting to a single database with many
>> schemas, each schema unique to a client?
>>
>>
>> Exactly. With that, I can easily generate cross statistics between
>> clients for billing, because I've a hierarchy of clients.
>>
>>
>>
>> I could change that each time just before to execute each query,
>> but it
>> shouldn't be very efficient.
>>
>>
>>
>> So is the login role for each client unique, where you could use
>> ALTER ROLE SET search_path to have it preset:
>>
>> http://www.postgresql.org/docs/9.4/interactive/sql-alterrole.html
>>
>>
>> Thanks for your help, but this suggestion doesn't fit with my need: If I
>> do that, I need to have a dedicated connection for each client.
>> The idea is to mutualize pgsql connections for all clients like an
>> applicative "virtualization": for some big clients, it will use several
>> connections from the aiopg's pool, for some others, it uses no connections.
>
>
> You are braver then I, especially given this:
>
> https://docs.python.org/3.4/library/asyncio.html
> "
>
> Note
>
> The asyncio package has been included in the standard library on a provisional basis. Backwards incompatible changes (up to and including removal of the module) may occur if deemed necessary by the core developers. "
Brave certainly not, early adopter, yes.
In the same time, before to use AsyncIO, I had a mixin between Twisted for WebSockets and telephony + flask for http.
Now, I have all-in-one daemons for all protocols, aiohttp for http and WebSockets, panoramisk for telephony and API-Hour as a glue between that.
Even if I have the "risk" for a big change, but after my discussions with AsyncIO developers I don't think so, the time I save with a very simpler async implementation compare to Twisted but also now I can share easily my business logic source code between protocols justifies clearly this choice I've made one year ago.
My Dev team speed has clearly changed positively after this decision.
The critical part was to debug libraries because they are pretty young, but for a while, at least the AsyncIO libraries I use, are now enough stable and bugs free to build on top of.
>
>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
sql = """SELECT idFROM {schema}.{table}WHERE name = %(spam)s""".format(schema=my_schema, table=my_table)cursor.execute(sql, dict(spam=user_spam)
2015-05-10 11:00 GMT+02:00 P. Christeas <xrg@linux.gr>:On Thursday 07 of May 2015, Daniele Varrazzo wrote:
> Looks like there is more and more the need of exposing a function like
> libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
> so recently, I'm reluctant to add such a function to 2.6.1.
>
> Maybe releasing a small Python module exposing just that function,
> then add the functionality to psycopg 2.7?
I vote for a pre-release of 2.7, with this feature. Modifying the API, even if
the new function wouldn't interfere with any existing ones, calls for a
version bump.I'm in to be one of a beta-tester.
Just another idea, would it make sense to abuse the semantics of string
formatting[1] and introduce another type, say "%t" [2] for implicit identifier
escaping?
This would make our queries look like:
cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam'))Sincerely, it should be awesome, because it means it's more end-developer friendly.If you also support %(key)t syntax it should be wonderful, because we use dict to fill query values, easier to write.
[1] https://docs.python.org/2/library/stdtypes.html#string-formatting
[2] I notice that "t" isn't used for anything else, so far.
--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg
Lack of something like PQescapeIdentifier has always felt like a hole in the API. When I need to dynamically add a schema- or table name to a query from a trusted source, I'll do something likesql = """SELECT idFROM {schema}.{table}WHERE name = %(spam)s""".format(schema=my_schema, table=my_table)cursor.execute(sql, dict(spam=user_spam)to make it clear where I'm deliberately inserting an identifier. Having a %t or similar would be much simpler and would handle the untrusted case.DavidOn Sun, May 10, 2015 at 7:07 AM, Ludovic Gasc <gmludo@gmail.com> wrote:2015-05-10 11:00 GMT+02:00 P. Christeas <xrg@linux.gr>:On Thursday 07 of May 2015, Daniele Varrazzo wrote:
> Looks like there is more and more the need of exposing a function like
> libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
> so recently, I'm reluctant to add such a function to 2.6.1.
>
> Maybe releasing a small Python module exposing just that function,
> then add the functionality to psycopg 2.7?
I vote for a pre-release of 2.7, with this feature. Modifying the API, even if
the new function wouldn't interfere with any existing ones, calls for a
version bump.I'm in to be one of a beta-tester.
Just another idea, would it make sense to abuse the semantics of string
formatting[1] and introduce another type, say "%t" [2] for implicit identifier
escaping?
This would make our queries look like:
cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam'))Sincerely, it should be awesome, because it means it's more end-developer friendly.If you also support %(key)t syntax it should be wonderful, because we use dict to fill query values, easier to write.
[1] https://docs.python.org/2/library/stdtypes.html#string-formatting
[2] I notice that "t" isn't used for anything else, so far.
--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg
Looking for comments on this patch:
https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852
In short:
1. identifier quoting may use PQescapeIdentifier if it's available, otherwise the pure-psyco escaping is done
2. the %t format is now accepted, and its value must be either a string or bytes (no error handling is done yet if this isn't the case) - replacement for this calls out to the identifier quoting
On 05/10/2015 05:00 AM, P. Christeas wrote:
On Thursday 07 of May 2015, Daniele Varrazzo wrote:Looks like there is more and more the need of exposing a function like libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released so recently, I'm reluctant to add such a function to 2.6.1. Maybe releasing a small Python module exposing just that function, then add the functionality to psycopg 2.7?I vote for a pre-release of 2.7, with this feature. Modifying the API, even if the new function wouldn't interfere with any existing ones, calls for a version bump. Just another idea, would it make sense to abuse the semantics of string formatting[1] and introduce another type, say "%t" [2] for implicit identifier escaping? This would make our queries look like: cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam')) [1] https://docs.python.org/2/library/stdtypes.html#string-formatting [2] I notice that "t" isn't used for anything else, so far.
On 13/05/2015 16:13, Elliot S wrote: > I like this idea and drafted it up. > > Looking for comments on this patch: > > https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852 > > In short: > 1. identifier quoting may use PQescapeIdentifier if it's available, > otherwise the pure-psyco escaping is done > 2. the %t format is now accepted, and its value must be either a > string or bytes (no error handling is done yet if this isn't the case) - > replacement for this calls out to the identifier quoting The patch looks fine to me but your tests should cover all corner cases: 1) spaces in identifiers 2) double quotes in identifiers 3) a mix of upper- and lower-case characters I'd also like to see the tests compare the result with the result of a "SELECT quote_ident(...)" call, just to be future proof. Also, I'd expose the quoting function in psycopg.extensions to let the user build the query string separately from the .execute() call: this is useful if you want to stick to DBAPI in your .execute() call. I.e., to allow something like: from psycopg.extensions import quote_ident query = "SELECT %s FROM %s WHERE id = %%s" % ( quote_ident('table'), quote_ident('col')) curs.execute(query, (id_value,)) federico -- Federico Di Gregorio federico.digregorio@dndg.it Di Nunzio & Di Gregorio srl http://dndg.it One key. One input. One enter. All right. -- An american consultant (then the system crashed and took down the *entire* network)
Sounds good, thanks for the feedback. I should have time to work on this today and tomorrow. On 05/20/2015 04:14 AM, Federico Di Gregorio wrote: > On 13/05/2015 16:13, Elliot S wrote: >> I like this idea and drafted it up. >> >> Looking for comments on this patch: >> >> https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852 >> >> >> In short: >> 1. identifier quoting may use PQescapeIdentifier if it's available, >> otherwise the pure-psyco escaping is done >> 2. the %t format is now accepted, and its value must be either a >> string or bytes (no error handling is done yet if this isn't the case) - >> replacement for this calls out to the identifier quoting > > The patch looks fine to me but your tests should cover all corner cases: > > 1) spaces in identifiers > 2) double quotes in identifiers > 3) a mix of upper- and lower-case characters > > I'd also like to see the tests compare the result with the result of a > "SELECT quote_ident(...)" call, just to be future proof. > > Also, I'd expose the quoting function in psycopg.extensions to let the > user build the query string separately from the .execute() call: this > is useful if you want to stick to DBAPI in your .execute() call. I.e., > to allow something like: > > from psycopg.extensions import quote_ident > > query = "SELECT %s FROM %s WHERE id = %%s" % ( > quote_ident('table'), quote_ident('col')) > > curs.execute(query, (id_value,)) > > federico >
I'm sorry but I don't like this idea: On Wed, May 20, 2015 at 10:14 AM, Federico Di Gregorio <fog@dndg.it> wrote: > On 13/05/2015 16:13, Elliot S wrote: >> >> I like this idea and drafted it up. >> >> Looking for comments on this patch: >> >> >> https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852 >> >> In short: >> 1. identifier quoting may use PQescapeIdentifier if it's available, >> otherwise the pure-psyco escaping is done No: only the libpq version should be used. The psycopg one is not generic enough and only for internal use. The function is exposed in all the currently supported libpq versions so we don't really care about the older ones. >> 2. the %t format is now accepted, and its value must be either a >> string or bytes (no error handling is done yet if this isn't the case) - >> replacement for this calls out to the identifier quoting I don't like this idea: %t is nowhere standard. Psycopg has already powerful enough types-based adaptation: using it with identifiers has always been rejected because moving to libpq parameters would break. But the %t would break too so I don't see why to jump on this idea. Furthermore using a different parameter goes sideways to the entire type system: what if one passes a number to a %t? Looking at the patch it seems like it silently discards the value. That's a no for me. IMO we should just do: 1) expose the PQescape* libpq functions as pure bytes-bytes functions for people to use them as they wish. Under a good moon we should roundtrip unicode around. You know, python 3... 2) have a wrapper identifier object, whose adaptation result in identifier escaping. With current psycopg it could be used with: cur.execute("update table set %s = %s", (ident(field), value)) if we want this to be used in the future: sql = "update table set %s = %%s" % ident(field) # [note] cur.execute(sql, (value,)) [note]: this is a lie because the libpq functions take a connection as parameter too so we cannot just compose using the string % operator. We can have a cursor method doing that instead, or having "ident" being a cursor method. > Also, I'd expose the quoting function in psycopg.extensions to let the user > build the query string separately from the .execute() call: this is useful > if you want to stick to DBAPI in your .execute() call. I.e., to allow > something like: > > from psycopg.extensions import quote_ident > > query = "SELECT %s FROM %s WHERE id = %%s" % ( > quote_ident('table'), quote_ident('col')) > > curs.execute(query, (id_value,)) Agreed on that (point 1 above), with the note that we should handle the fact that the libpq function takes the connection too as argument. -- Daniele
Hi Daniele, some comments below: On 21/05/2015 01:51, Daniele Varrazzo wrote: > I'm sorry but I don't like this idea: > > On Wed, May 20, 2015 at 10:14 AM, Federico Di Gregorio <fog@dndg.it> wrote: >> On 13/05/2015 16:13, Elliot S wrote: >>> >>> I like this idea and drafted it up. >>> >>> Looking for comments on this patch: >>> >>> >>> https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852 >>> >>> In short: >>> 1. identifier quoting may use PQescapeIdentifier if it's available, >>> otherwise the pure-psyco escaping is done > > No: only the libpq version should be used. The psycopg one is not > generic enough and only for internal use. The function is exposed in > all the currently supported libpq versions so we don't really care > about the older ones. Agreed. >>> 2. the %t format is now accepted, and its value must be either a >>> string or bytes (no error handling is done yet if this isn't the case) - >>> replacement for this calls out to the identifier quoting > > I don't like this idea: %t is nowhere standard. Psycopg has already > powerful enough types-based adaptation: using it with identifiers has > always been rejected because moving to libpq parameters would break. > But the %t would break too so I don't see why to jump on this idea. > Furthermore using a different parameter goes sideways to the entire > type system: what if one passes a number to a %t? Looking at the patch > it seems like it silently discards the value. That's a no for me. > > IMO we should just do: > > 1) expose the PQescape* libpq functions as pure bytes-bytes functions > for people to use them as they wish. Under a good moon we should > roundtrip unicode around. You know, python 3... > 2) have a wrapper identifier object, whose adaptation result in > identifier escaping. With current psycopg it could be used with: > > cur.execute("update table set %s = %s", (ident(field), value)) No no no. Bound variables should be easily identifiable and should not be mixed with identifiers. If you don't like %t, that's OK but having an ident behave like a bound variable isn't good either. I'd say lets just stick to expose quoting functions in psycopg.extensions or as extra methods on the connection object. > if we want this to be used in the future: > > sql = "update table set %s = %%s" % ident(field) # [note] > cur.execute(sql, (value,)) > > [note]: this is a lie because the libpq functions take a connection as > parameter too so we cannot just compose using the string % operator. > We can have a cursor method doing that instead, or having "ident" > being a cursor method. If it requires a connection, lets stick it to the connection object. federico -- Federico Di Gregorio federico.digregorio@dndg.it Di Nunzio & Di Gregorio srl http://dndg.it When people say things are a lot more complicated than that, they means they're getting worried that they won't like the truth. -- Granny Weatherwax
>>>> In short: >>>> 1. identifier quoting may use PQescapeIdentifier if it's >>>> available, >>>> otherwise the pure-psyco escaping is done >> >> No: only the libpq version should be used. The psycopg one is not >> generic enough and only for internal use. The function is exposed in >> all the currently supported libpq versions so we don't really care >> about the older ones. > > Agreed. I've removed the call out to the psycopg quoting. >>>> 2. the %t format is now accepted, and its value must be either a >>>> string or bytes (no error handling is done yet if this isn't the >>>> case) - >>>> replacement for this calls out to the identifier quoting >> >> I don't like this idea: %t is nowhere standard. Psycopg has already >> powerful enough types-based adaptation: using it with identifiers has >> always been rejected because moving to libpq parameters would break. >> But the %t would break too so I don't see why to jump on this idea. >> Furthermore using a different parameter goes sideways to the entire >> type system: what if one passes a number to a %t? Looking at the patch >> it seems like it silently discards the value. That's a no for me. >> >> IMO we should just do: >> >> 1) expose the PQescape* libpq functions as pure bytes-bytes functions >> for people to use them as they wish. Under a good moon we should >> roundtrip unicode around. You know, python 3... >> 2) have a wrapper identifier object, whose adaptation result in >> identifier escaping. With current psycopg it could be used with: >> >> cur.execute("update table set %s = %s", (ident(field), value)) > > No no no. Bound variables should be easily identifiable and should not > be mixed with identifiers. If you don't like %t, that's OK but having > an ident behave like a bound variable isn't good either. I'd say lets > just stick to expose quoting functions in psycopg.extensions or as > extra methods on the connection object. %t formatting is also removed. >> if we want this to be used in the future: >> >> sql = "update table set %s = %%s" % ident(field) # [note] >> cur.execute(sql, (value,)) >> >> [note]: this is a lie because the libpq functions take a connection as >> parameter too so we cannot just compose using the string % operator. >> We can have a cursor method doing that instead, or having "ident" >> being a cursor method. > > If it requires a connection, lets stick it to the connection object. > > federico > I've added a quote_ident() to the connection object. I'm still working on my quote_ident branch https://github.com/psycopg/psycopg2/compare/master...yieldsfalsehood:quote_ident Return values are just bytestrings right now. bytestring input isn't accepted on python3.4 (only py3 I tested); unicode and bytestring inputs look ok for python2.7. The other PQescape* functions were mentioned earlier but I haven't touched them, yet.
On Wed, May 27, 2015 at 5:32 PM, Elliot S <yields.falsehood@gmail.com> wrote: > I've removed the call out to the psycopg quoting. > %t formatting is also removed. Thank you for these adjustments. > I've added a quote_ident() to the connection object. IMO we should have a function on psycopg2.extensions taking a connection or cursor object and a string, like the various extras.register_*. This method doesn't really belong to the connection (it does require a connection to work, but that's pretty much like everything else in libpq). It doesn't even belong to the cursor: it's just an utility function wrapping some libpq functionality. > I'm still working on my quote_ident branch > https://github.com/psycopg/psycopg2/compare/master...yieldsfalsehood:quote_ident > > Return values are just bytestrings right now. bytestring input isn't > accepted on python3.4 (only py3 I tested); unicode and bytestring inputs > look ok for python2.7. I'm not sure the trivial PyArg_ParseTuple(args, "s", &ident) is right here: in py3 you have an utf8 encoded string: maybe good enough but I'm not so sure. Looking at the libpq implementation it makes use of the connection encoding. I think the right thing to do is, in case of unicode input, encode it in the connection encoding. The full roundtrip, IMO, should be a function that returns a string on string input and a unicode on unicode input (with the respective meaning of these terms in Py2 and Py3), using the connection encoding for encoding/decoding in the latter case. > The other PQescape* functions were mentioned earlier but I haven't touched > them, yet. We can take a look at them and work out which one would be useful later, after we get this right. -- Daniele