Thread: Sanitize schema name

Sanitize schema name

From
Ludovic Gasc
Date:
Hi,

I want to sanitize the name of the schema in a SQL query, because the schema name depends on the client.

The issue is that I can't sanitize the name via the standard way of psycopg2, because it adds quotes around schema name.

I imagine it's the same issue with a table name. Do you have a suggestion to bypass that ?

For now, the most secure way I've found is to test the presence of the schema before launch each query, but not really efficient.

Regards.
--
Ludovic Gasc (GMLudo)

Re: Sanitize schema name

From
Adrian Klaver
Date:
On 05/06/2015 01:56 PM, Ludovic Gasc wrote:
> Hi,
>
> I want to sanitize the name of the schema in a SQL query, because the
> schema name depends on the client.

So you are talking about schema qualifying objects(tables, functions,
etc) in a query, correct?

Using search_path does not work?

>
> The issue is that I can't sanitize the name via the standard way of
> psycopg2, because it adds quotes around schema name.

What is the standard way?
Not sure I understand what quotes have to do with it?
>
> I imagine it's the same issue with a table name. Do you have a
> suggestion to bypass that ?

Can you provide an code example of what you are trying to do?

>
> For now, the most secure way I've found is to test the presence of the
> schema before launch each query, but not really efficient.
>
> Regards.
> --
> Ludovic Gasc (GMLudo)
> http://www.gmludo.eu/


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Sanitize schema name

From
Dorian Hoxha
Date:
He wants to dynamically pass the table name:

cursor.execute("SELECT * FROM %s", (table,)) # won't work




On Wed, May 6, 2015 at 11:08 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 05/06/2015 01:56 PM, Ludovic Gasc wrote:
Hi,

I want to sanitize the name of the schema in a SQL query, because the
schema name depends on the client.

So you are talking about schema qualifying objects(tables, functions, etc) in a query, correct?

Using search_path does not work?


The issue is that I can't sanitize the name via the standard way of
psycopg2, because it adds quotes around schema name.

What is the standard way?
Not sure I understand what quotes have to do with it?

I imagine it's the same issue with a table name. Do you have a
suggestion to bypass that ?

Can you provide an code example of what you are trying to do?



For now, the most secure way I've found is to test the presence of the
schema before launch each query, but not really efficient.

Regards.
--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/


--
Adrian Klaver
adrian.klaver@aklaver.com


--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg

Re: Sanitize schema name

From
Daniele Varrazzo
Date:
On Thu, May 7, 2015 at 12:35 PM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
> He wants to dynamically pass the table name:
>
> cursor.execute("SELECT * FROM %s", (table,)) # won't work

Looks like there is more and more the need of exposing a function like
libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
so recently, I'm reluctant to add such a function to 2.6.1.

Maybe releasing a small Python module exposing just that function,
then add the functionality to psycopg 2.7?

[1] http://www.postgresql.org/docs/9.4/static/libpq-exec.html#LIBPQ-PQESCAPEIDENTIFIER

-- Daniele


Re: Sanitize schema name

From
Dorian Hoxha
Date:
Since it will be an optional separate function, and won't interfere with old code, I don't see a problem with it being in 2.6.1

On Thu, May 7, 2015 at 2:01 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
On Thu, May 7, 2015 at 12:35 PM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
> He wants to dynamically pass the table name:
>
> cursor.execute("SELECT * FROM %s", (table,)) # won't work

Looks like there is more and more the need of exposing a function like
libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
so recently, I'm reluctant to add such a function to 2.6.1.

Maybe releasing a small Python module exposing just that function,
then add the functionality to psycopg 2.7?

[1] http://www.postgresql.org/docs/9.4/static/libpq-exec.html#LIBPQ-PQESCAPEIDENTIFIER

-- Daniele

Re: Sanitize schema name

From
Karsten Hilbert
Date:
On Thu, May 07, 2015 at 01:01:48PM +0100, Daniele Varrazzo wrote:

> > He wants to dynamically pass the table name:
> >
> > cursor.execute("SELECT * FROM %s", (table,)) # won't work
>
> Looks like there is more and more the need of exposing a function like
> libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
> so recently, I'm reluctant to add such a function to 2.6.1.

Making it available for explicit use wouldn't hurt, would it ?

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: Sanitize schema name

From
Adrian Klaver
Date:
On 05/07/2015 04:35 AM, Dorian Hoxha wrote:
> He wants to dynamically pass the table name:
>
> cursor.execute("SELECT * FROM %s", (table,)) # won't work
>

How about using format()?:
http://www.postgresql.org/docs/9.4/static/functions-string.html#FUNCTIONS-STRING-FORMAT

Available 9.1+

cur.execute("select format('select * from %I', 'student_info')")

sql_str = cur.fetchone()[0]

sql_str
'select * from student_info'





--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Sanitize schema name

From
Ludovic Gasc
Date:
Thanks all for your answers, you understand well my need.

About PQescapeIdentifier:
1. An idea of release date for the next version of psycopg2 ?
2. Are you sure it's enough to protect against SQL injections, because you can read in the documentation: Tip: As with string literals, to prevent SQL injection attacks, SQL identifiers must be escaped when they are received from an untrustworthy source.

About format()  it doesn't work for schema, example:
SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
=> SELECT * FROM "lg.devices" WHERE id=1
SELECT * FROM "lg.devices" WHERE id=1
=> ERROR:  relation "lg.devices" does not exist
LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
                        ^

********** Error **********

ERROR: relation "lg.devices" does not exist

An idea should be also to verify if "lg" schema exists, but it means I need to do two SQL queries each type, or keep a cache of schema list.

--
Ludovic Gasc (GMLudo)

2015-05-07 15:33 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 05/07/2015 04:35 AM, Dorian Hoxha wrote:
He wants to dynamically pass the table name:

cursor.execute("SELECT * FROM %s", (table,)) # won't work


How about using format()?:
http://www.postgresql.org/docs/9.4/static/functions-string.html#FUNCTIONS-STRING-FORMAT

Available 9.1+

cur.execute("select format('select * from %I', 'student_info')")

sql_str = cur.fetchone()[0]

sql_str
'select * from student_info'





--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Sanitize schema name

From
Adrian Klaver
Date:
On 05/07/2015 01:06 PM, Ludovic Gasc wrote:
> Thanks all for your answers, you understand well my need.
>
> About PQescapeIdentifier:
> 1. An idea of release date for the next version of psycopg2 ?
> 2. Are you sure it's enough to protect against SQL injections, because
> you can read in the documentation: *Tip:* As with string literals, to
> prevent SQL injection attacks, SQL identifiers must be escaped when they
> are received from an untrustworthy source.
>
> About format() it doesn't work for schema, example:
> SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
> => SELECT * FROM "lg.devices" WHERE id=1
> SELECT * FROM "lg.devices" WHERE id=1
> => ERROR:  relation "lg.devices" does not exist
> LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
>                          ^
>
> ********** Error **********
>
> ERROR: relation "lg.devices" does not exist
>

Try:

SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg', 'devices')


Still not sure why you cannot use search_path and avoid the schema
qualification altogether?

--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Sanitize schema name

From
Ludovic Gasc
Date:
2015-05-08 0:12 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 05/07/2015 01:06 PM, Ludovic Gasc wrote:
Thanks all for your answers, you understand well my need.

About PQescapeIdentifier:
1. An idea of release date for the next version of psycopg2 ?
2. Are you sure it's enough to protect against SQL injections, because
you can read in the documentation: *Tip:* As with string literals, to
prevent SQL injection attacks, SQL identifiers must be escaped when they
are received from an untrustworthy source.

About format() it doesn't work for schema, example:
SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
=> SELECT * FROM "lg.devices" WHERE id=1
SELECT * FROM "lg.devices" WHERE id=1
=> ERROR:  relation "lg.devices" does not exist
LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
                         ^

********** Error **********

ERROR: relation "lg.devices" does not exist


Try:

SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg', 'devices')

Ok, now, it works, but, I need to launch the query two times: First time with SELECT format(, a second time with the result of the first query.
It should be possible to execute that only in one pass ?



Still not sure why you cannot use search_path and avoid the schema qualification altogether?

Because I use a pool of pgsql sockets where no connexions are dedicated to one particular client.
I could change that each time just before to execute each query, but it shouldn't be very efficient.
 


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Sanitize schema name

From
Adrian Klaver
Date:
On 05/09/2015 01:03 PM, Ludovic Gasc wrote:
> 2015-05-08 0:12 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>>:
>
>     On 05/07/2015 01:06 PM, Ludovic Gasc wrote:
>
>         Thanks all for your answers, you understand well my need.
>
>         About PQescapeIdentifier:
>         1. An idea of release date for the next version of psycopg2 ?
>         2. Are you sure it's enough to protect against SQL injections,
>         because
>         you can read in the documentation: *Tip:* As with string
>         literals, to
>         prevent SQL injection attacks, SQL identifiers must be escaped
>         when they
>         are received from an untrustworthy source.
>
>         About format() it doesn't work for schema, example:
>         SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
>         => SELECT * FROM "lg.devices" WHERE id=1
>         SELECT * FROM "lg.devices" WHERE id=1
>         => ERROR:  relation "lg.devices" does not exist
>         LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
>                                   ^
>
>         ********** Error **********
>
>         ERROR: relation "lg.devices" does not exist
>
>
>     Try:
>
>     SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg', 'devices')
>
>
> Ok, now, it works, but, I need to launch the query two times: First time
> with SELECT format(, a second time with the result of the first query.
> It should be possible to execute that only in one pass ?

As far as I know, only in plpgsql:

http://www.postgresql.org/docs/9.4/static/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE

Hence the previous suggestion about creating a psycopg2 function that
you could use directly.

>
>
>
>     Still not sure why you cannot use search_path and avoid the schema
>     qualification altogether?
>
>
> Because I use a pool of pgsql sockets where no connexions are dedicated
> to one particular client.

So all the clients are connecting to a single database with many
schemas, each schema unique to a client?

> I could change that each time just before to execute each query, but it
> shouldn't be very efficient.


So is the login role for each client unique, where you could use ALTER
ROLE SET search_path to have it preset:

http://www.postgresql.org/docs/9.4/interactive/sql-alterrole.html

>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Sanitize schema name

From
Ludovic Gasc
Date:
2015-05-10 2:41 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 05/09/2015 01:03 PM, Ludovic Gasc wrote:
2015-05-08 0:12 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>>:


    On 05/07/2015 01:06 PM, Ludovic Gasc wrote:

        Thanks all for your answers, you understand well my need.

        About PQescapeIdentifier:
        1. An idea of release date for the next version of psycopg2 ?
        2. Are you sure it's enough to protect against SQL injections,
        because
        you can read in the documentation: *Tip:* As with string
        literals, to
        prevent SQL injection attacks, SQL identifiers must be escaped
        when they
        are received from an untrustworthy source.

        About format() it doesn't work for schema, example:
        SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
        => SELECT * FROM "lg.devices" WHERE id=1
        SELECT * FROM "lg.devices" WHERE id=1
        => ERROR:  relation "lg.devices" does not exist
        LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
                                  ^

        ********** Error **********

        ERROR: relation "lg.devices" does not exist


    Try:

    SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg', 'devices')


Ok, now, it works, but, I need to launch the query two times: First time
with SELECT format(, a second time with the result of the first query.
It should be possible to execute that only in one pass ?

As far as I know, only in plpgsql:

http://www.postgresql.org/docs/9.4/static/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE

Hence the previous suggestion about creating a psycopg2 function that you could use directly.

Ok, at least to me, it's the ideal situation.
 





    Still not sure why you cannot use search_path and avoid the schema
    qualification altogether?


Because I use a pool of pgsql sockets where no connexions are dedicated
to one particular client.

So all the clients are connecting to a single database with many schemas, each schema unique to a client?

Exactly. With that, I can easily generate cross statistics between clients for billing, because I've a hierarchy of clients.
 


I could change that each time just before to execute each query, but it
shouldn't be very efficient.


So is the login role for each client unique, where you could use ALTER ROLE SET search_path to have it preset:

http://www.postgresql.org/docs/9.4/interactive/sql-alterrole.html

Thanks for your help, but this suggestion doesn't fit with my need: If I do that, I need to have a dedicated connection for each client.
The idea is to mutualize pgsql connections for all clients like an applicative "virtualization": for some big clients, it will use several connections from the aiopg's pool, for some others, it uses no connections.
 





    --
    Adrian Klaver
    adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>




--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Sanitize schema name

From
"P. Christeas"
Date:
On Thursday 07 of May 2015, Daniele Varrazzo wrote:
> Looks like there is more and more the need of exposing a function like
> libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
> so recently, I'm reluctant to add such a function to 2.6.1.
>
> Maybe releasing a small Python module exposing just that function,
> then add the functionality to psycopg 2.7?

I vote for a pre-release of 2.7, with this feature. Modifying the API, even if
the new function wouldn't interfere with any existing ones, calls for a
version bump.

Just another idea, would it make sense to abuse the semantics of string
formatting[1] and introduce another type, say "%t" [2] for implicit identifier
escaping?

This would make our queries look like:
   cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam'))



[1] https://docs.python.org/2/library/stdtypes.html#string-formatting
[2] I notice that "t" isn't used for anything else, so far.



Re: Sanitize schema name

From
Ludovic Gasc
Date:
2015-05-10 11:00 GMT+02:00 P. Christeas <xrg@linux.gr>:
On Thursday 07 of May 2015, Daniele Varrazzo wrote:
> Looks like there is more and more the need of exposing a function like
> libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
> so recently, I'm reluctant to add such a function to 2.6.1.
>
> Maybe releasing a small Python module exposing just that function,
> then add the functionality to psycopg 2.7?

I vote for a pre-release of 2.7, with this feature. Modifying the API, even if
the new function wouldn't interfere with any existing ones, calls for a
version bump.

I'm in to be one of a beta-tester.
 

Just another idea, would it make sense to abuse the semantics of string
formatting[1] and introduce another type, say "%t" [2] for implicit identifier
escaping?

This would make our queries look like:
   cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam'))

Sincerely, it should be awesome, because it means it's more end-developer friendly.
If you also support %(key)t syntax it should be wonderful, because we use dict to fill query values, easier to write.
 



[1] https://docs.python.org/2/library/stdtypes.html#string-formatting
[2] I notice that "t" isn't used for anything else, so far.



--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg

Re: Sanitize schema name

From
Adrian Klaver
Date:
On 05/09/2015 11:55 PM, Ludovic Gasc wrote:
> 2015-05-10 2:41 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>>:
>
>     On 05/09/2015 01:03 PM, Ludovic Gasc wrote:
>
>         2015-05-08 0:12 GMT+02:00 Adrian Klaver
>         <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>         <mailto:adrian.klaver@aklaver.com
>         <mailto:adrian.klaver@aklaver.com>>>:
>
>
>              On 05/07/2015 01:06 PM, Ludovic Gasc wrote:
>
>                  Thanks all for your answers, you understand well my need.
>
>                  About PQescapeIdentifier:
>                  1. An idea of release date for the next version of
>         psycopg2 ?
>                  2. Are you sure it's enough to protect against SQL
>         injections,
>                  because
>                  you can read in the documentation: *Tip:* As with string
>                  literals, to
>                  prevent SQL injection attacks, SQL identifiers must be
>         escaped
>                  when they
>                  are received from an untrustworthy source.
>
>                  About format() it doesn't work for schema, example:
>                  SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
>                  => SELECT * FROM "lg.devices" WHERE id=1
>                  SELECT * FROM "lg.devices" WHERE id=1
>                  => ERROR:  relation "lg.devices" does not exist
>                  LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
>                                            ^
>
>                  ********** Error **********
>
>                  ERROR: relation "lg.devices" does not exist
>
>
>              Try:
>
>              SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg',
>         'devices')
>
>
>         Ok, now, it works, but, I need to launch the query two times:
>         First time
>         with SELECT format(, a second time with the result of the first
>         query.
>         It should be possible to execute that only in one pass ?
>
>
>     As far as I know, only in plpgsql:
>
>     http://www.postgresql.org/docs/9.4/static/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE
>
>     Hence the previous suggestion about creating a psycopg2 function
>     that you could use directly.
>
>
> Ok, at least to me, it's the ideal situation.
>
>
>
>
>
>
>              Still not sure why you cannot use search_path and avoid the
>         schema
>              qualification altogether?
>
>
>         Because I use a pool of pgsql sockets where no connexions are
>         dedicated
>         to one particular client.
>
>
>     So all the clients are connecting to a single database with many
>     schemas, each schema unique to a client?
>
>
> Exactly. With that, I can easily generate cross statistics between
> clients for billing, because I've a hierarchy of clients.
>
>
>
>         I could change that each time just before to execute each query,
>         but it
>         shouldn't be very efficient.
>
>
>
>     So is the login role for each client unique, where you could use
>     ALTER ROLE SET search_path to have it preset:
>
>     http://www.postgresql.org/docs/9.4/interactive/sql-alterrole.html
>
>
> Thanks for your help, but this suggestion doesn't fit with my need: If I
> do that, I need to have a dedicated connection for each client.
> The idea is to mutualize pgsql connections for all clients like an
> applicative "virtualization": for some big clients, it will use several
> connections from the aiopg's pool, for some others, it uses no connections.

You are braver then I, especially given this:

https://docs.python.org/3.4/library/asyncio.html
"

Note

The asyncio package has been included in the standard library on a
provisional basis. Backwards incompatible changes (up to and including
removal of the module) may occur if deemed necessary by the core
developers. "




--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Sanitize schema name

From
Ludovic Gasc
Date:

On 11 May 2015 02:17, "Adrian Klaver" <adrian.klaver@aklaver.com> wrote:
>
> On 05/09/2015 11:55 PM, Ludovic Gasc wrote:
>>
>> 2015-05-10 2:41 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com
>> <mailto:adrian.klaver@aklaver.com>>:
>>
>>
>>     On 05/09/2015 01:03 PM, Ludovic Gasc wrote:
>>
>>         2015-05-08 0:12 GMT+02:00 Adrian Klaver
>>         <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>>         <mailto:adrian.klaver@aklaver.com
>>
>>         <mailto:adrian.klaver@aklaver.com>>>:
>>
>>
>>              On 05/07/2015 01:06 PM, Ludovic Gasc wrote:
>>
>>                  Thanks all for your answers, you understand well my need.
>>
>>                  About PQescapeIdentifier:
>>                  1. An idea of release date for the next version of
>>         psycopg2 ?
>>                  2. Are you sure it's enough to protect against SQL
>>         injections,
>>                  because
>>                  you can read in the documentation: *Tip:* As with string
>>                  literals, to
>>                  prevent SQL injection attacks, SQL identifiers must be
>>         escaped
>>                  when they
>>                  are received from an untrustworthy source.
>>
>>                  About format() it doesn't work for schema, example:
>>                  SELECT format('SELECT * FROM %I WHERE id=1', 'lg.devices')
>>                  => SELECT * FROM "lg.devices" WHERE id=1
>>                  SELECT * FROM "lg.devices" WHERE id=1
>>                  => ERROR:  relation "lg.devices" does not exist
>>                  LIGNE 1 : SELECT * FROM "lg.devices" WHERE id=1
>>                                            ^
>>
>>                  ********** Error **********
>>
>>                  ERROR: relation "lg.devices" does not exist
>>
>>
>>              Try:
>>
>>              SELECT format('SELECT * FROM %I.%I WHERE id=1', 'lg',
>>         'devices')
>>
>>
>>         Ok, now, it works, but, I need to launch the query two times:
>>         First time
>>         with SELECT format(, a second time with the result of the first
>>         query.
>>         It should be possible to execute that only in one pass ?
>>
>>
>>     As far as I know, only in plpgsql:
>>
>>     http://www.postgresql.org/docs/9.4/static/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE
>>
>>     Hence the previous suggestion about creating a psycopg2 function
>>     that you could use directly.
>>
>>
>> Ok, at least to me, it's the ideal situation.
>>
>>
>>
>>
>>
>>
>>              Still not sure why you cannot use search_path and avoid the
>>         schema
>>              qualification altogether?
>>
>>
>>         Because I use a pool of pgsql sockets where no connexions are
>>         dedicated
>>         to one particular client.
>>
>>
>>     So all the clients are connecting to a single database with many
>>     schemas, each schema unique to a client?
>>
>>
>> Exactly. With that, I can easily generate cross statistics between
>> clients for billing, because I've a hierarchy of clients.
>>
>>
>>
>>         I could change that each time just before to execute each query,
>>         but it
>>         shouldn't be very efficient.
>>
>>
>>
>>     So is the login role for each client unique, where you could use
>>     ALTER ROLE SET search_path to have it preset:
>>
>>     http://www.postgresql.org/docs/9.4/interactive/sql-alterrole.html
>>
>>
>> Thanks for your help, but this suggestion doesn't fit with my need: If I
>> do that, I need to have a dedicated connection for each client.
>> The idea is to mutualize pgsql connections for all clients like an
>> applicative "virtualization": for some big clients, it will use several
>> connections from the aiopg's pool, for some others, it uses no connections.
>
>
> You are braver then I, especially given this:
>
> https://docs.python.org/3.4/library/asyncio.html
> "
>
> Note
>
> The asyncio package has been included in the standard library on a provisional basis. Backwards incompatible changes (up to and including removal of the module) may occur if deemed necessary by the core developers. "

Brave certainly not, early adopter, yes.
In the same time, before to use AsyncIO, I had a mixin between Twisted for WebSockets and telephony + flask for http.
Now, I have all-in-one daemons for all protocols, aiohttp for http and WebSockets, panoramisk for telephony and API-Hour as a glue between that.

Even if I have the "risk" for a big change, but after my discussions with AsyncIO developers I don't think so, the time I save with a very simpler async implementation compare to Twisted but also now I can share easily my business logic source code between protocols justifies clearly this choice I've made one year ago.
My Dev team speed has clearly changed positively after this decision.

The critical part was to debug libraries because they are pretty young, but for a while, at least the AsyncIO libraries I use, are now enough stable and bugs free to build on top of.

>
>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com

Re: Sanitize schema name

From
David Niergarth
Date:
Lack of something like PQescapeIdentifier has always felt like a hole in the API. When I need to dynamically add a schema- or table name to a query from a trusted source, I'll do something like

sql = """
SELECT id 
FROM {schema}.{table}
WHERE name = %(spam)s
""".format(schema=my_schema, table=my_table)

cursor.execute(sql, dict(spam=user_spam)

to make it clear where I'm deliberately inserting an identifier. Having a %t or similar would be much simpler and would handle the untrusted case.

David

On Sun, May 10, 2015 at 7:07 AM, Ludovic Gasc <gmludo@gmail.com> wrote:
2015-05-10 11:00 GMT+02:00 P. Christeas <xrg@linux.gr>:
On Thursday 07 of May 2015, Daniele Varrazzo wrote:
> Looks like there is more and more the need of exposing a function like
> libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
> so recently, I'm reluctant to add such a function to 2.6.1.
>
> Maybe releasing a small Python module exposing just that function,
> then add the functionality to psycopg 2.7?

I vote for a pre-release of 2.7, with this feature. Modifying the API, even if
the new function wouldn't interfere with any existing ones, calls for a
version bump.

I'm in to be one of a beta-tester.
 

Just another idea, would it make sense to abuse the semantics of string
formatting[1] and introduce another type, say "%t" [2] for implicit identifier
escaping?

This would make our queries look like:
   cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam'))

Sincerely, it should be awesome, because it means it's more end-developer friendly.
If you also support %(key)t syntax it should be wonderful, because we use dict to fill query values, easier to write.
 



[1] https://docs.python.org/2/library/stdtypes.html#string-formatting
[2] I notice that "t" isn't used for anything else, so far.



--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg


Re: Sanitize schema name

From
Ludovic Gasc
Date:
David, I like your suggestion, we will use that on production, while we're waiting the new release of pyscopg2.

Regards.

--
Ludovic Gasc (GMLudo)

2015-05-11 19:05 GMT+02:00 David Niergarth <dn@hddesign.com>:
Lack of something like PQescapeIdentifier has always felt like a hole in the API. When I need to dynamically add a schema- or table name to a query from a trusted source, I'll do something like

sql = """
SELECT id 
FROM {schema}.{table}
WHERE name = %(spam)s
""".format(schema=my_schema, table=my_table)

cursor.execute(sql, dict(spam=user_spam)

to make it clear where I'm deliberately inserting an identifier. Having a %t or similar would be much simpler and would handle the untrusted case.

David

On Sun, May 10, 2015 at 7:07 AM, Ludovic Gasc <gmludo@gmail.com> wrote:
2015-05-10 11:00 GMT+02:00 P. Christeas <xrg@linux.gr>:
On Thursday 07 of May 2015, Daniele Varrazzo wrote:
> Looks like there is more and more the need of exposing a function like
> libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
> so recently, I'm reluctant to add such a function to 2.6.1.
>
> Maybe releasing a small Python module exposing just that function,
> then add the functionality to psycopg 2.7?

I vote for a pre-release of 2.7, with this feature. Modifying the API, even if
the new function wouldn't interfere with any existing ones, calls for a
version bump.

I'm in to be one of a beta-tester.
 

Just another idea, would it make sense to abuse the semantics of string
formatting[1] and introduce another type, say "%t" [2] for implicit identifier
escaping?

This would make our queries look like:
   cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam'))

Sincerely, it should be awesome, because it means it's more end-developer friendly.
If you also support %(key)t syntax it should be wonderful, because we use dict to fill query values, easier to write.
 



[1] https://docs.python.org/2/library/stdtypes.html#string-formatting
[2] I notice that "t" isn't used for anything else, so far.



--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg



Re: Sanitize schema name

From
Elliot S
Date:
I like this idea and drafted it up.

Looking for comments on this patch:

https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852

In short:
  1. identifier quoting may use PQescapeIdentifier if it's available, otherwise the pure-psyco escaping is done
  2. the %t format is now accepted, and its value must be either a string or bytes (no error handling is done yet if this isn't the case) - replacement for this calls out to the identifier quoting

On 05/10/2015 05:00 AM, P. Christeas wrote:
On Thursday 07 of May 2015, Daniele Varrazzo wrote:
Looks like there is more and more the need of exposing a function like
libpq's PQescapeIdentifier [1]. Too bad psycopg 2.6 has been released
so recently, I'm reluctant to add such a function to 2.6.1.

Maybe releasing a small Python module exposing just that function,
then add the functionality to psycopg 2.7?
I vote for a pre-release of 2.7, with this feature. Modifying the API, even if 
the new function wouldn't interfere with any existing ones, calls for a 
version bump.

Just another idea, would it make sense to abuse the semantics of string 
formatting[1] and introduce another type, say "%t" [2] for implicit identifier 
escaping?

This would make our queries look like:  cr.execute("SELECT id FROM %t WHERE name = %s", ('some.tbl', 'spam'))



[1] https://docs.python.org/2/library/stdtypes.html#string-formatting
[2] I notice that "t" isn't used for anything else, so far.




Re: Sanitize schema name

From
Federico Di Gregorio
Date:
On 13/05/2015 16:13, Elliot S wrote:
> I like this idea and drafted it up.
>
> Looking for comments on this patch:
>
> https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852
>
> In short:
>    1. identifier quoting may use PQescapeIdentifier if it's available,
> otherwise the pure-psyco escaping is done
>    2. the %t format is now accepted, and its value must be either a
> string or bytes (no error handling is done yet if this isn't the case) -
> replacement for this calls out to the identifier quoting

The patch looks fine to me but your tests should cover all corner cases:

1) spaces in identifiers
2) double quotes in identifiers
3) a mix of upper- and lower-case characters

I'd also like to see the tests compare the result with the result of a
"SELECT quote_ident(...)" call, just to be future proof.

Also, I'd expose the quoting function in psycopg.extensions to let the
user build the query string separately from the .execute() call: this is
useful if you want to stick to DBAPI in your .execute() call. I.e., to
allow something like:

from psycopg.extensions import quote_ident

query = "SELECT %s FROM %s WHERE id = %%s" % (
    quote_ident('table'), quote_ident('col'))

curs.execute(query, (id_value,))

federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Di Nunzio & Di Gregorio srl                               http://dndg.it
      One key. One input. One enter. All right. -- An american consultant
            (then the system crashed and took down the *entire* network)


Re: Sanitize schema name

From
Elliot S
Date:
Sounds good, thanks for the feedback. I should have time to work on this
today and tomorrow.

On 05/20/2015 04:14 AM, Federico Di Gregorio wrote:
> On 13/05/2015 16:13, Elliot S wrote:
>> I like this idea and drafted it up.
>>
>> Looking for comments on this patch:
>>
>> https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852
>>
>>
>> In short:
>>    1. identifier quoting may use PQescapeIdentifier if it's available,
>> otherwise the pure-psyco escaping is done
>>    2. the %t format is now accepted, and its value must be either a
>> string or bytes (no error handling is done yet if this isn't the case) -
>> replacement for this calls out to the identifier quoting
>
> The patch looks fine to me but your tests should cover all corner cases:
>
> 1) spaces in identifiers
> 2) double quotes in identifiers
> 3) a mix of upper- and lower-case characters
>
> I'd also like to see the tests compare the result with the result of a
> "SELECT quote_ident(...)" call, just to be future proof.
>
> Also, I'd expose the quoting function in psycopg.extensions to let the
> user build the query string separately from the .execute() call: this
> is useful if you want to stick to DBAPI in your .execute() call. I.e.,
> to allow something like:
>
> from psycopg.extensions import quote_ident
>
> query = "SELECT %s FROM %s WHERE id = %%s" % (
>     quote_ident('table'), quote_ident('col'))
>
> curs.execute(query, (id_value,))
>
> federico
>



Re: Sanitize schema name

From
Daniele Varrazzo
Date:
I'm sorry but I don't like this idea:

On Wed, May 20, 2015 at 10:14 AM, Federico Di Gregorio <fog@dndg.it> wrote:
> On 13/05/2015 16:13, Elliot S wrote:
>>
>> I like this idea and drafted it up.
>>
>> Looking for comments on this patch:
>>
>>
>> https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852
>>
>> In short:
>>    1. identifier quoting may use PQescapeIdentifier if it's available,
>> otherwise the pure-psyco escaping is done

No: only the libpq version should be used. The psycopg one is not
generic enough and only for internal use. The function is exposed in
all the currently supported libpq versions so we don't really care
about the older ones.

>>    2. the %t format is now accepted, and its value must be either a
>> string or bytes (no error handling is done yet if this isn't the case) -
>> replacement for this calls out to the identifier quoting

I don't like this idea: %t is nowhere standard. Psycopg has already
powerful enough types-based adaptation: using it with identifiers has
always been rejected because moving to libpq parameters would break.
But the %t would break too so I don't see why to jump on this idea.
Furthermore using a different parameter goes sideways to the entire
type system: what if one passes a number to a %t? Looking at the patch
it seems like it silently discards the value. That's a no for me.

IMO we should just do:

1) expose the PQescape* libpq functions as pure bytes-bytes functions
for people to use them as they wish. Under a good moon we should
roundtrip unicode around. You know, python 3...
2) have a wrapper identifier object, whose adaptation result in
identifier escaping. With current psycopg it could be used with:

    cur.execute("update table set %s = %s", (ident(field), value))

if we want this to be used in the future:

    sql = "update table set %s = %%s" % ident(field) # [note]
    cur.execute(sql, (value,))

[note]: this is a lie because the libpq functions take a connection as
parameter too so we cannot just compose using the string % operator.
We can have a cursor method doing that instead, or having "ident"
being a cursor method.


> Also, I'd expose the quoting function in psycopg.extensions to let the user
> build the query string separately from the .execute() call: this is useful
> if you want to stick to DBAPI in your .execute() call. I.e., to allow
> something like:
>
> from psycopg.extensions import quote_ident
>
> query = "SELECT %s FROM %s WHERE id = %%s" % (
>         quote_ident('table'), quote_ident('col'))
>
> curs.execute(query, (id_value,))

Agreed on that (point 1 above), with the note that we should handle
the fact that the libpq function takes the connection too as argument.


-- Daniele


Re: Sanitize schema name

From
Federico Di Gregorio
Date:
Hi Daniele,

some comments below:

On 21/05/2015 01:51, Daniele Varrazzo wrote:
> I'm sorry but I don't like this idea:
>
> On Wed, May 20, 2015 at 10:14 AM, Federico Di Gregorio <fog@dndg.it> wrote:
>> On 13/05/2015 16:13, Elliot S wrote:
>>>
>>> I like this idea and drafted it up.
>>>
>>> Looking for comments on this patch:
>>>
>>>
>>> https://github.com/yieldsfalsehood/psycopg2/commit/f86f773de6ee99e2d7a2807136dcb458d97ba852
>>>
>>> In short:
>>>     1. identifier quoting may use PQescapeIdentifier if it's available,
>>> otherwise the pure-psyco escaping is done
>
> No: only the libpq version should be used. The psycopg one is not
> generic enough and only for internal use. The function is exposed in
> all the currently supported libpq versions so we don't really care
> about the older ones.

Agreed.

>>>     2. the %t format is now accepted, and its value must be either a
>>> string or bytes (no error handling is done yet if this isn't the case) -
>>> replacement for this calls out to the identifier quoting
>
> I don't like this idea: %t is nowhere standard. Psycopg has already
> powerful enough types-based adaptation: using it with identifiers has
> always been rejected because moving to libpq parameters would break.
> But the %t would break too so I don't see why to jump on this idea.
> Furthermore using a different parameter goes sideways to the entire
> type system: what if one passes a number to a %t? Looking at the patch
> it seems like it silently discards the value. That's a no for me.
>
> IMO we should just do:
>
> 1) expose the PQescape* libpq functions as pure bytes-bytes functions
> for people to use them as they wish. Under a good moon we should
> roundtrip unicode around. You know, python 3...
> 2) have a wrapper identifier object, whose adaptation result in
> identifier escaping. With current psycopg it could be used with:
>
>      cur.execute("update table set %s = %s", (ident(field), value))

No no no. Bound variables should be easily identifiable and should not
be mixed with identifiers. If you don't like %t, that's OK but having an
ident behave like a bound variable isn't good either. I'd say lets just
stick to expose quoting functions in psycopg.extensions or as extra
methods on the connection object.

> if we want this to be used in the future:
>
>      sql = "update table set %s = %%s" % ident(field) # [note]
>      cur.execute(sql, (value,))
>
> [note]: this is a lie because the libpq functions take a connection as
> parameter too so we cannot just compose using the string % operator.
> We can have a cursor method doing that instead, or having "ident"
> being a cursor method.

If it requires a connection, lets stick it to the connection object.

federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Di Nunzio & Di Gregorio srl                               http://dndg.it
  When people say things are a lot more complicated than that, they
   means they're getting worried that they won't like the truth.
                                                     -- Granny Weatherwax


Re: Sanitize schema name

From
Elliot S
Date:
>>>> In short:
>>>>     1. identifier quoting may use PQescapeIdentifier if it's
>>>> available,
>>>> otherwise the pure-psyco escaping is done
>>
>> No: only the libpq version should be used. The psycopg one is not
>> generic enough and only for internal use. The function is exposed in
>> all the currently supported libpq versions so we don't really care
>> about the older ones.
>
> Agreed.

I've removed the call out to the psycopg quoting.

>>>>     2. the %t format is now accepted, and its value must be either a
>>>> string or bytes (no error handling is done yet if this isn't the
>>>> case) -
>>>> replacement for this calls out to the identifier quoting
>>
>> I don't like this idea: %t is nowhere standard. Psycopg has already
>> powerful enough types-based adaptation: using it with identifiers has
>> always been rejected because moving to libpq parameters would break.
>> But the %t would break too so I don't see why to jump on this idea.
>> Furthermore using a different parameter goes sideways to the entire
>> type system: what if one passes a number to a %t? Looking at the patch
>> it seems like it silently discards the value. That's a no for me.
>>
>> IMO we should just do:
>>
>> 1) expose the PQescape* libpq functions as pure bytes-bytes functions
>> for people to use them as they wish. Under a good moon we should
>> roundtrip unicode around. You know, python 3...
>> 2) have a wrapper identifier object, whose adaptation result in
>> identifier escaping. With current psycopg it could be used with:
>>
>>      cur.execute("update table set %s = %s", (ident(field), value))
>
> No no no. Bound variables should be easily identifiable and should not
> be mixed with identifiers. If you don't like %t, that's OK but having
> an ident behave like a bound variable isn't good either. I'd say lets
> just stick to expose quoting functions in psycopg.extensions or as
> extra methods on the connection object.

%t formatting is also removed.

>> if we want this to be used in the future:
>>
>>      sql = "update table set %s = %%s" % ident(field) # [note]
>>      cur.execute(sql, (value,))
>>
>> [note]: this is a lie because the libpq functions take a connection as
>> parameter too so we cannot just compose using the string % operator.
>> We can have a cursor method doing that instead, or having "ident"
>> being a cursor method.
>
> If it requires a connection, lets stick it to the connection object.
>
> federico
>

I've added a quote_ident() to the connection object.

I'm still working on my quote_ident branch
https://github.com/psycopg/psycopg2/compare/master...yieldsfalsehood:quote_ident

Return values are just bytestrings right now. bytestring input isn't
accepted on python3.4 (only py3 I tested); unicode and bytestring inputs
look ok for python2.7.

The other PQescape* functions were mentioned earlier but I haven't
touched them, yet.


Re: Sanitize schema name

From
Daniele Varrazzo
Date:
On Wed, May 27, 2015 at 5:32 PM, Elliot S <yields.falsehood@gmail.com> wrote:

> I've removed the call out to the psycopg quoting.

> %t formatting is also removed.

Thank you for these adjustments.


> I've added a quote_ident() to the connection object.

IMO we should have a function on psycopg2.extensions taking a
connection or cursor object and a string, like the various
extras.register_*. This method doesn't really belong to the connection
(it does require a connection to work, but that's pretty much like
everything else in libpq). It doesn't even belong to the cursor: it's
just an utility function wrapping some libpq functionality.


> I'm still working on my quote_ident branch
> https://github.com/psycopg/psycopg2/compare/master...yieldsfalsehood:quote_ident
>
> Return values are just bytestrings right now. bytestring input isn't
> accepted on python3.4 (only py3 I tested); unicode and bytestring inputs
> look ok for python2.7.

I'm not sure the trivial PyArg_ParseTuple(args, "s", &ident) is right
here: in py3 you have an utf8 encoded string: maybe good enough but
I'm not so sure. Looking at the libpq implementation it makes use of
the connection encoding. I think the right thing to do is, in case of
unicode input, encode it in the connection encoding.

The full roundtrip, IMO, should be a function that returns a string on
string input and a unicode on unicode input (with the respective
meaning of these terms in Py2 and Py3), using the connection encoding
for encoding/decoding in the latter case.


> The other PQescape* functions were mentioned earlier but I haven't touched
> them, yet.

We can take a look at them and work out which one would be useful
later, after we get this right.


-- Daniele