Thread: Is passing a list as a bound variable safe from SQL injection?

Is passing a list as a bound variable safe from SQL injection?

From
"W. Matthew Wilson"
Date:
Here's the python code:

    cursor.execute(textwrap.dedent("""
        select *
        from bundles
        where bundle_id = any(%(list_of_bundle_ids))
        """), {'list_of_bundle_ids': [2,3,4,5,6,7]})

Is there more of a risk of SQL injection here by passing in a list of integers list this?

Also, is there some simple way I can pass in a list of strings that could be converted to integers, like this:

    ["2", "33", "444"]

When I tried passing a list containing a string, I got an "operator does not exist" error.

Finally, I feel like recently I read a blog post that described how using "= any(array[...])" was much slower than using some other approach.

But now I can't find that blog post, and I don't remember the other approach.  Any ideas?

Thanks for the help!

Matt



--
W. Matthew Wilson
matt@tplus1.com
http://tplus1.com

Re: Is passing a list as a bound variable safe from SQL injection?

From
Daniele Varrazzo
Date:
On Wed, Oct 2, 2013 at 4:48 PM, W. Matthew Wilson <matt@tplus1.com> wrote:
> Here's the python code:
>
>     cursor.execute(textwrap.dedent("""
>         select *
>         from bundles
>         where bundle_id = any(%(list_of_bundle_ids))
>         """), {'list_of_bundle_ids': [2,3,4,5,6,7]})
>
> Is there more of a risk of SQL injection here by passing in a list of
> integers list this?

Don't see any risk here.


> Also, is there some simple way I can pass in a list of strings that could be
> converted to integers, like this:
>
>     ["2", "33", "444"]
>
> When I tried passing a list containing a string, I got an "operator does not
> exist" error.

You are creating avoidable problems to the postgres parser. Just use
map(int, my_list) in python and you are on the safe side.


> Finally, I feel like recently I read a blog post that described how using "=
> any(array[...])" was much slower than using some other approach.
>
> But now I can't find that blog post, and I don't remember the other
> approach.  Any ideas?

What other approach? Not using IN: the IN operator is converted to "=
any(array)" by the postgres parser:

=# explain select * from x where id in (1,2,3,4,5);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on x  (cost=0.00..6.58 rows=5 width=51)
   Filter: (id = ANY ('{1,2,3,4,5}'::integer[]))

It is true that this can be very inefficient for long lists, I've
experimented it myself several times, but it's not something you can
change at driver level: creating a temp table to join on can be faster
even for not very long lists.


-- Daniele


Re: Is passing a list as a bound variable safe from SQL injection?

From
"W. Matthew Wilson"
Date:
On Wed, Oct 2, 2013 at 12:17 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
What other approach? Not using IN: the IN operator is converted to "=
any(array)" by the postgres parser:

=# explain select * from x where id in (1,2,3,4,5);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on x  (cost=0.00..6.58 rows=5 width=51)
   Filter: (id = ANY ('{1,2,3,4,5}'::integer[]))

It is true that this can be very inefficient for long lists, I've
experimented it myself several times, but it's not something you can
change at driver level: creating a temp table to join on can be faster
even for not very long lists.


-- Daniele

This is the approach (and it does involve very long lists): 

http://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/

Instead of writing = any(array[1,2,3,4]), they wrote = any(values (1), (2), (3), (4), )

and somehow that works more quickly.

Would it be possible to make a python list subclass that converts itself to "values ( ... )" rather than to "array[ ... ]"?  I imagine that this might be useful for these gigantic lists.

Matt

Re: Is passing a list as a bound variable safe from SQL injection?

From
Federico Di Gregorio
Date:
On 02/10/2013 18:31, W. Matthew Wilson wrote:
[snip]
> This is the approach (and it does involve very long lists):
>
> http://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
>
> Instead of writing = any(array[1,2,3,4]), they wrote = any(values (1),
> (2), (3), (4), )
>
> and somehow that works more quickly.

Hi Matthew,

you can override the list adapter and have it generate the "values"
expression instead of an array. See this example:

http://www.psycopg.org/psycopg/docs/advanced.html?highlight=adapt#adapting-new-python-types-to-sql-syntax


Btw, I am a little curious, what kind of query requires an array of
~11000 values? Surely there is a better way to write it, isn't it?

federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Di Nunzio & Di Gregorio srl                               http://dndg.it
 The only thing I see is if you are pumping so much data into the
  database all the time when do you expect to look at it?
                                                        -- Charlie Clark


Re: Is passing a list as a bound variable safe from SQL injection?

From
Daniele Varrazzo
Date:
On Thu, Oct 3, 2013 at 8:53 AM, Federico Di Gregorio <fog@dndg.it> wrote:
> On 02/10/2013 18:31, W. Matthew Wilson wrote:
> [snip]
>> This is the approach (and it does involve very long lists):
>>
>> http://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
>>
>> Instead of writing = any(array[1,2,3,4]), they wrote = any(values (1),
>> (2), (3), (4), )
>>
>> and somehow that works more quickly.
>
> Hi Matthew,
>
> you can override the list adapter and have it generate the "values"
> expression instead of an array. See this example:
>
> http://www.psycopg.org/psycopg/docs/advanced.html?highlight=adapt#adapting-new-python-types-to-sql-syntax

Interesting indeed. Too bad in the article there is no reference to
the discussion in -performance about the issue (or is there and I
haven't seen it?).

Here is an adapter that can be used to wrap a sequence and pass it to VALUES:

https://gist.github.com/dvarrazzo/6809387

Unfortunately it seems there is no way to create the gist under the
psycopg organization on github.

-- Daniele


Re: Is passing a list as a bound variable safe from SQL injection?

From
Daniele Varrazzo
Date:
On Thu, Oct 3, 2013 at 2:07 PM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
> Interesting indeed. Too bad in the article there is no reference to
> the discussion in -performance about the issue (or is there and I
> haven't seen it?).

Found it: there is a single answer:
http://www.postgresql.org/message-id/6581.1375891657@sss.pgh.pa.us

-- Daniele


Re: Is passing a list as a bound variable safe from SQL injection?

From
Stuart Bishop
Date:
On Thu, Oct 3, 2013 at 2:53 PM, Federico Di Gregorio <fog@dndg.it> wrote:

> Btw, I am a little curious, what kind of query requires an array of
> ~11000 values? Surely there is a better way to write it, isn't it?

I see it a lot when breaking up bulk operations into small chunks. If
you can't use a temporary table persisting across transactions (eg.
pgbouncer in transaction pooling mode), the fastest way is often a
script that first sucks down a huge list of ids and does the update in
lots of small transactions. It works because PG happily handles
queries like 'UPDATE foo SET bar='baz' WHERE id IN ([... 10,000 ids
...])' without breaking a sweat, even though on the surface it looks
like a really silly thing to do.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/