Thread: Is passing a list as a bound variable safe from SQL injection?
Here's the python code:
cursor.execute(textwrap.dedent("""
    select *
    from bundles
    where bundle_id = any(%(list_of_bundle_ids)s)
    """), {'list_of_bundle_ids': [2,3,4,5,6,7]})
Is there more of a risk of SQL injection here by passing in a list of integers like this?
Also, is there some simple way I can pass in a list of strings that could be converted to integers, like this:
["2", "33", "444"]
When I tried passing a list containing a string, I got an "operator does not exist" error.
Finally, I feel like I recently read a blog post that described how using "= any(array[...])" was much slower than using some other approach.
But now I can't find that blog post, and I don't remember the other approach. Any ideas?
Thanks for the help!
Matt
W. Matthew Wilson
matt@tplus1.com
http://tplus1.com
On Wed, Oct 2, 2013 at 4:48 PM, W. Matthew Wilson <matt@tplus1.com> wrote:

> Here's the python code:
>
> cursor.execute(textwrap.dedent("""
>     select *
>     from bundles
>     where bundle_id = any(%(list_of_bundle_ids)s)
>     """), {'list_of_bundle_ids': [2,3,4,5,6,7]})
>
> Is there more of a risk of SQL injection here by passing in a list of
> integers like this?

Don't see any risk here.

> Also, is there some simple way I can pass in a list of strings that could be
> converted to integers, like this:
>
> ["2", "33", "444"]
>
> When I tried passing a list containing a string, I got an "operator does not
> exist" error.

You are creating avoidable problems for the postgres parser. Just use
map(int, my_list) in python and you are on the safe side.

> Finally, I feel like I recently read a blog post that described how using
> "= any(array[...])" was much slower than using some other approach.
>
> But now I can't find that blog post, and I don't remember the other
> approach. Any ideas?

What other approach? Not using IN: the IN operator is converted to
"= any(array)" by the postgres parser:

=# explain select * from x where id in (1,2,3,4,5);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on x  (cost=0.00..6.58 rows=5 width=51)
   Filter: (id = ANY ('{1,2,3,4,5}'::integer[]))

It is true that this can be very inefficient for long lists, I've run
into it myself several times, but it's not something you can change at
driver level: creating a temp table to join on can be faster even for
not very long lists.

-- Daniele
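[A minimal sketch of the map(int, ...) suggestion above, reusing the query from the original snippet; it assumes an already-open psycopg2 cursor, and the variable names are illustrative.]

import textwrap

raw_ids = ["2", "33", "444"]
# int() raises ValueError on anything that is not a number, so nothing
# unexpected ever reaches the query; same effect as list(map(int, raw_ids))
bundle_ids = [int(i) for i in raw_ids]

cursor.execute(textwrap.dedent("""
    select *
    from bundles
    where bundle_id = any(%(list_of_bundle_ids)s)
    """), {'list_of_bundle_ids': bundle_ids})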
On Wed, Oct 2, 2013 at 12:17 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> What other approach? Not using IN: the IN operator is converted to
> "= any(array)" by the postgres parser:
>
> =# explain select * from x where id in (1,2,3,4,5);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on x  (cost=0.00..6.58 rows=5 width=51)
>    Filter: (id = ANY ('{1,2,3,4,5}'::integer[]))
>
> It is true that this can be very inefficient for long lists, I've run
> into it myself several times, but it's not something you can change at
> driver level: creating a temp table to join on can be faster even for
> not very long lists.
>
> -- Daniele
This is the approach (and it does involve very long lists):
http://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
Instead of writing = any(array[1,2,3,4]), they wrote = any(values (1), (2), (3), (4))
and somehow that works more quickly.
Would it be possible to make a python list subclass that converts itself to "values ( ... )" rather than to "array[ ... ]"? I imagine that this might be useful for these gigantic lists.
Matt
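[For reference, psycopg2's built-in list adapter renders a Python list as an ARRAY[...] literal, which is the "slow" form the article compares against. A small sketch, assuming a reachable database; cursor.mogrify shows the SQL that execute() would send without running it.]

import psycopg2

conn = psycopg2.connect("dbname=test")   # connection details are illustrative
cur = conn.cursor()

# mogrify returns the query after argument binding, without executing it;
# the bound list appears as an ARRAY[...] literal in the output
print(cur.mogrify("select * from bundles where bundle_id = any(%s)",
                  ([2, 3, 4],)))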
On 02/10/2013 18:31, W. Matthew Wilson wrote:
[snip]
> This is the approach (and it does involve very long lists):
>
> http://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
>
> Instead of writing = any(array[1,2,3,4]), they wrote = any(values (1),
> (2), (3), (4))
>
> and somehow that works more quickly.

Hi Matthew,

you can override the list adapter and have it generate the "values"
expression instead of an array. See this example:

http://www.psycopg.org/psycopg/docs/advanced.html?highlight=adapt#adapting-new-python-types-to-sql-syntax

Btw, I am a little curious: what kind of query requires an array of
~11000 values? Surely there is a better way to write it, isn't there?

federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Di Nunzio & Di Gregorio srl                               http://dndg.it

 The only thing I see is if you are pumping so much data into the
 database all the time when do you expect to look at it? -- Charlie Clark
On Thu, Oct 3, 2013 at 8:53 AM, Federico Di Gregorio <fog@dndg.it> wrote:
> On 02/10/2013 18:31, W. Matthew Wilson wrote:
> [snip]
>> This is the approach (and it does involve very long lists):
>>
>> http://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
>>
>> Instead of writing = any(array[1,2,3,4]), they wrote = any(values (1),
>> (2), (3), (4))
>>
>> and somehow that works more quickly.
>
> Hi Matthew,
>
> you can override the list adapter and have it generate the "values"
> expression instead of an array. See this example:
>
> http://www.psycopg.org/psycopg/docs/advanced.html?highlight=adapt#adapting-new-python-types-to-sql-syntax

Interesting indeed. Too bad the article doesn't reference the discussion
in -performance about the issue (or is there one and I haven't seen it?).

Here is an adapter that can be used to wrap a sequence and pass it to
VALUES:

https://gist.github.com/dvarrazzo/6809387

Unfortunately it seems there is no way to create the gist under the
psycopg organization on github.

-- Daniele
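[For readers who can't reach the gist, a minimal sketch of the idea, following the adapt/register_adapter pattern from the docs linked above. This is not the gist's code; the class names are made up. The wrapper marks a sequence to be rendered as a parenthesized VALUES list, which can then be used as the subquery after ANY.]

from psycopg2.extensions import adapt, register_adapter

class ValuesList(object):
    """Marker wrapper: render this sequence as (VALUES (...), (...))."""
    def __init__(self, seq):
        self.seq = seq

class ValuesListAdapter(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def getquoted(self):
        # adapt each element with psycopg2's own machinery, then join them
        # into a parenthesized VALUES list usable as a subquery after ANY
        # (an empty sequence would need special handling)
        parts = [b"(" + adapt(x).getquoted() + b")" for x in self.wrapped.seq]
        return b"(VALUES " + b", ".join(parts) + b")"

register_adapter(ValuesList, ValuesListAdapter)

# usage sketch:
# cursor.execute("select * from bundles where bundle_id = any(%s)",
#                (ValuesList([2, 3, 4, 5, 6, 7]),))
# which binds as: ... where bundle_id = any((VALUES (2), (3), (4), (5), (6), (7)))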
On Thu, Oct 3, 2013 at 2:07 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> Interesting indeed. Too bad the article doesn't reference the discussion
> in -performance about the issue (or is there one and I haven't seen it?).

Found it; there is a single answer:

http://www.postgresql.org/message-id/6581.1375891657@sss.pgh.pa.us

-- Daniele
On Thu, Oct 3, 2013 at 2:53 PM, Federico Di Gregorio <fog@dndg.it> wrote:
> Btw, I am a little curious: what kind of query requires an array of
> ~11000 values? Surely there is a better way to write it, isn't there?

I see it a lot when breaking up bulk operations into small chunks. If you
can't use a temporary table persisting across transactions (e.g. pgbouncer
in transaction pooling mode), the fastest way is often a script that first
sucks down a huge list of ids and then does the update in lots of small
transactions. It works because PG happily handles queries like
'UPDATE foo SET bar='baz' WHERE id IN ([... 10,000 ids ...])' without
breaking a sweat, even though on the surface it looks like a really silly
thing to do.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
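[For completeness, a rough sketch of the temp-table-and-join approach Daniele mentioned earlier in the thread as an alternative to a huge = ANY(ARRAY[...]) literal. The table and column names are made up, and it assumes an open psycopg2 connection in a single transaction.]

def update_by_ids(conn, ids):
    """Load the ids into a temp table and join against it (illustrative names)."""
    with conn.cursor() as cur:
        # temp table lives only for this transaction
        cur.execute("create temp table tmp_ids (id integer) on commit drop")
        # executemany is fine for a sketch; for very long lists,
        # cursor.copy_from into tmp_ids would be faster
        cur.executemany("insert into tmp_ids (id) values (%s)",
                        [(int(i),) for i in ids])
        # join against the temp table instead of binding a giant list
        cur.execute("""
            update foo set bar = 'baz'
            from tmp_ids
            where foo.id = tmp_ids.id
            """)
    conn.commit()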