Thread: Protection from SQL injection
Hi, As you know, "SQL injection" is the main security problem of databases today. I think I have a solution: 'disabling literals'. Or you may call it 'enforcing the use of parameterized statements'. This means that SQL statements with embedded user input are rejected at runtime. My solution goes beyond saying "developers ~should~ use parameterized statements". That is not a solution because developers are lazy. My solution is: "developers MUST use parameterized statements". It goes like this: Literals are disabled using the SQL statement: SET ALLOW_LITERALS NONE; Afterwards, SQL statements with text are not allowed any more. That means, SQL statement of the form "SELECT * FROM USERS WHERE PASSWORD='qerkllkj'" will fail with the exception 'Literals are not allowed, please use parameters'. It is like the database does not know what ='qerkllkj' means. Only statements of the secure form are allowed, for example "SELECT * FROM USERS WHERE PASSWORD=?". This solves the problem because SQL injection is almost impossible if user input is not directly embedded in SQL statements. The 'ALLOW_LITERALS NONE' mode is enabled by the developer itself, or by an administrator. It is still possible to generate SQL statements dynamically, and use the same APIs as before, as long as SQL statements don't include literals. Literals can still be used when using query tools, or in applications considered 'safe'. To ease converting the application to use parameterized queries, there should be a second mode where number literals are allowed: SET ALLOW_LITERALS NUMBERS. To allow all literals, execute SET ALLOW_LITERALS ALL (this is the default setting). So far this feature is implemented in my little database H2. More information about this feature is described here: http://www.h2database.com/html/advanced.html#sql_injection What do you think about it? Do you think it makes sense to implement this security feature in PostgreSQL as well? If not why not? Does PostgreSQL have another solution or plan to solve the SQL injection problem? Regards, Thomas
"Thomas Mueller" <thomas.tom.mueller@gmail.com> writes: > SET ALLOW_LITERALS NONE; I think you missed April Fool's Day... This is just silly, as it makes life impossibly painful for users (constants are hardly a useless part of SQL) and it doesn't really plug any holes. As an example: select * from tab where intcol = intcol; delete from tab; contains no literals and yet the delete is very probably injected. regards, tom lane
On Sat, Apr 26, 2008 at 11:32 AM, Thomas Mueller <thomas.tom.mueller@gmail.com> wrote: > > The 'ALLOW_LITERALS NONE' mode is enabled by the developer itself, or > by an administrator. then it solves nothing... what if the developer never SET ALLOW_LITERALS NONE or maybe i can inject "select * from tab where intcol = intcol; set allow_literals all; add any query you want" -- regards, Jaime Casanova Soporte de PostgreSQL Guayaquil - Ecuador Cel. (593) 087171157
Hi, > I think you missed April Fool's Day... This is absolutely not an April Fool idea :-) I must have made a bad job explaining my idea. > This is just silly, as it makes life impossibly painful for users You mean developers? No, developers should use parameterized queries anyway. > (constants are hardly a useless part of SQL) Your are right. But it depends how you define 'constant'. In other programming languages, constants are not always literals. Instead, constants are defined like this: #define ACTIVE 1 // C final static int ACTIVE=1; // Java As far as I know, this concept doesn't exist in the SQL. I suggest to add this concept as well to the database engine: CREATE CONSTANT [IF NOT EXISTS] newConstantName VALUE expression; DROP CONSTANT [IF EXISTS] constantName; Example: CREATE CONSTANT PI VALUE 3.1415926535; CREATE CONSTANT ACTIVE VALUE 1; CREATE CONSTANT INACTIVE VALUE 0; This also improves the program because you don't need to 'hard code' numbers in the application: SELECT * FROM USERS WHERE STATE=ACTIVE AND PASSWORD=? > it doesn't really plug any holes. Sure it does. > As an example: > select * from tab where intcol = intcol; delete from tab; How would the application that executed this statement would look like? In Java: ResultSet rs = stat.executeQuery("select * from tab where " + userInput); Such a program wouldn't make any sense, right? Do you mean this? ResultSet rs = stat.executeQuery("select * from tab where state = " + userInput); If literals are disabled, the database would reject any number. The program wouldn't work in the normal case any longer if literals are disabled. So the developer would have to change it to (otherwise his application doesn't work) to: PreparedStatement prep = conn.prepareStatement("select * from tab where state = ?"); prep.setInt(1, userInput);... This is save. There is no way to inject SQL here. Regards, Thomas
Hi, > > The 'ALLOW_LITERALS NONE' mode is enabled by the developer itself, or > > by an administrator. > then it solves nothing... > what if the developer never SET ALLOW_LITERALS NONE As I have said, the 'ALLOW_LITERALS NONE' mode is enabled by the developer itself, or by an administrator. The developer may be lazy, but the administrator can enforce this policy. > maybe i can inject "select * from tab where intcol = intcol; set > allow_literals all; add any query you want" How do you inject this? How would the application looks like where this can be injected? Regards, Thomas
On Sat, Apr 26, 2008 at 1:19 PM, Thomas Mueller <thomas.tom.mueller@gmail.com> wrote: > Hi, > > > > The 'ALLOW_LITERALS NONE' mode is enabled by the developer itself, or > > > by an administrator. > > then it solves nothing... > > what if the developer never SET ALLOW_LITERALS NONE > > As I have said, the 'ALLOW_LITERALS NONE' mode is enabled by the > developer itself, or by an administrator. The developer may be lazy, > but the administrator can enforce this policy. > but can't the developer allow literals again? > > maybe i can inject "select * from tab where intcol = intcol; set > > allow_literals all; add any query you want" > > How do you inject this? How would the application looks like where > this can be injected? > ok... point taken -- regards, Jaime Casanova Soporte de PostgreSQL Guayaquil - Ecuador Cel. (593) 087171157
Thomas Mueller wrote on 26.04.2008 18:32: > Literals can still be used when using query tools, or in applications considered 'safe'. I fail to see how the backend could distinguish between a query sent by a query tool and a query sent by an "application". Thomas
On Sat, Apr 26, 2008 at 3:32 PM, Thomas Kellerer <spam_eater@gmx.net> wrote: > Thomas Mueller wrote on 26.04.2008 18:32: > > > Literals can still be used when using query tools, or in applications > considered 'safe'. > > > I fail to see how the backend could distinguish between a query sent by a > query tool and a query sent by an "application". Wouldn't it be much simpler to have a version of the libpq client lib that only understands prepared queries?
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > Wouldn't it be much simpler to have a version of the libpq client lib > that only understands prepared queries? You could do that, but there's still no way for it to know exactly how the submitted query was constructed. This would block off the types of injections that want to add whole SQL commands, but not ones that just subvert the current query (eg adding OR TRUE to see data you shouldn't). This is really a client problem and only client-side solutions will provide meaningful traction for it. In perl, for instance, the "taint" mechanism is a good way to notice whether any insecure strings are getting into database queries. regards, tom lane
On Sat, Apr 26, 2008 at 9:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Scott Marlowe" <scott.marlowe@gmail.com> writes: > > Wouldn't it be much simpler to have a version of the libpq client lib > > that only understands prepared queries? > > You could do that, but there's still no way for it to know exactly how > the submitted query was constructed. This would block off the types of > injections that want to add whole SQL commands, but not ones that just > subvert the current query (eg adding OR TRUE to see data you shouldn't). > > This is really a client problem and only client-side solutions will > provide meaningful traction for it. In perl, for instance, the "taint" > mechanism is a good way to notice whether any insecure strings are > getting into database queries. Agreed. My point was that to do what the OP wants, wouldn't it make more sense to just lobotomize libpq so it doesn't understand anything but prepared queries. Doesn't obviate the need for a client side language based solution. Just seems to make WAY more sense than trying to make the change at the server level in pgsql.
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > Agreed. My point was that to do what the OP wants, wouldn't it make > more sense to just lobotomize libpq so it doesn't understand anything > but prepared queries. I doubt that that particular lobotomization accomplishes much in comparison to the penalties. IIRC there was some discussion recently of providing a mode in which the server would reject PQexec strings containing more than one query. I didn't care for it much at the time, but I think it would provide most of the benefit of these suggestions with far less compatibility or performance hit. regards, tom lane
On Sat, Apr 26, 2008 at 9:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > IIRC there was some discussion recently of providing a mode in which > the server would reject PQexec strings containing more than one query. > I didn't care for it much at the time, but I think it would provide > most of the benefit of these suggestions with far less compatibility > or performance hit. agreed. And I trust (SQL) code review more than tying the hands of the programmers. But I've always had the luxury of working with developers who liked me as a DBA and were willing to do things my way, as far as the DB was concerned anyway...
Hi, > but can't the developer allow literals again? Executing the statement SET ALLOW_LITERALS should be restricted. The application uses another user name / password and doesn't have to access rights to enable it. Maybe the user name / password is configured using JNDI, so the application developper has no influence on that. In any case, even if the developer can enable literals, I don't think he would, because he would be afraid to be caught cheating. Regards, Thomas
Hi, > I fail to see how the backend could distinguish between a query sent by a > query tool and a query sent by an "application". The backend could use a different client library (a client library that doesn't allow literals). But in this case two or three client libraries are required. Probably better is to restrict in the database. There would be a user (or role) for the query tool and one for the application. Maybe the SET ALLOW_LITERALS is not such a good idea. What about REVOKE LITERAL_TEXT FROM APP_ROLE. So LITERAL_TEXT and LITERAL_NUMBER would be rights (similar to REVOKE USAGE ON LANGUAGE ... FROM ...). It's an access rights problem. Let's say there is a development database (DEV_DB) and a production (PROD_DB). There are two users / roles on those systems: APP_USER (no literals) and QUERY_TOOL_USER (literals allowed). The passwords are different on each system. Developers know the password for QUERY_TOOL_USER@DEV_DB and APP_USER@DEV_DB, but only APP_USER@PROD_DB. Or developers know all passwords, but the application configuration is rewieved not to use QUERY_TOOL_USER. Regards, Thomas
Hi, > > Wouldn't it be much simpler to have a version of the libpq client lib > > that only understands prepared queries? That would be possible. However the problem is not 'prepared queries' versus 'direct queries'. It is possible to use literals in prepared queries: PreparedStatement prep = conn.prepareStatement( "SELECT * FROM USERS WHERE PASSWORD = '" + password + "'); prep.executeQuery(); This is unsafe. So you really need to filter on literals, and not on the API. If you mean a library where literals not are allowed: that would solve the problem, yes. However that would mean the client library has to parse each SQL statement. That would complicate the client library. Parsing is already done in the database engine, technically it's better (long term) to disable literals there. But it is possible to create a wrapper client library (for example a JDBC wrapper driver) for other databases where literals are not allowed. That would work for all databases. > You could do that, but there's still no way for it to know exactly how > the submitted query was constructed. This would block off the types of > injections that want to add whole SQL commands, but not ones that just > subvert the current query (eg adding OR TRUE to see data you shouldn't). In most cases you don't need to know how the query was constructed. If the query uses parameters for all user input then it is (almost always) safe. Only 'almost' because there is still a small gap: ORDER BY injection. Example: String order = input("order (id, name)?"); ResultSet rs = stat.executeQuery( "SELECT ID, NAME FROM ITEMS ORDER BY " + order); Sometimes the input comes from a web application (the user clicks on the column header to sort on it). See also http://code.google.com/p/h2database/source/browse/trunk/h2/src/test/org/h2/samples/SQLInjection.java > This is really a client problem and only client-side solutions will > provide meaningful traction for it. This is like data validation on client side javascript. Yes, it's good to have data validation there. But, the user could have Javascript disabled. In any case it's better to have data validation on the server side as well. > In perl, for instance, the "taint" mechanism is a good way to notice whether any insecure strings are > getting into database queries. There is no 'taint' mechanism in programming languages like Java. Also, it requires that the application is reviewed. In the companies I worked so far there were almost no code reviews. A lazy programmer can still make mistakes. Regards, Thomas
Hi, > providing a mode in which the server would reject PQexec strings containing more than one query. That wouldn't help a lot. The simple SQL injection is not detected: ResultSet rs = stat.executeQuery( "SELECT * FROM USERS WHERE PASSWORD='" + password + "'"); An attacker would only need to use the following password: ' OR 1=1 The the SQL statements is still only one query: SELECT * FROM USERS WHERE PASSWORD='' OR 1=1 Regards, Thomas
On Sat, 26 Apr 2008 23:24:59 -0600 "Scott Marlowe" <scott.marlowe@gmail.com> wrote: > On Sat, Apr 26, 2008 at 9:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > IIRC there was some discussion recently of providing a mode in > > which the server would reject PQexec strings containing more than > > one query. I didn't care for it much at the time, but I think it > > would provide most of the benefit of these suggestions with far > > less compatibility or performance hit. > > agreed. > And I trust (SQL) code review more than tying the hands of the > programmers. > But I've always had the luxury of working with developers who liked > me as a DBA and were willing to do things my way, as far as the DB > was concerned anyway... what if you're the DBA and the dev and you don't trust yourself even if you'd be willing to do the things your way ;) -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Sat, 26 Apr 2008 21:50:10 -0600 "Scott Marlowe" <scott.marlowe@gmail.com> wrote: > Agreed. My point was that to do what the OP wants, wouldn't it make > more sense to just lobotomize libpq so it doesn't understand > anything but prepared queries. Doesn't obviate the need for a > client side language based solution. Just seems to make WAY more > sense than trying to make the change at the server level in pgsql. The problem may be legacy code. You'd like to statically point out places where multiple statements could get injected. All calls to your "query" function get routed to a wrapper that actually call prepare/execute logic. You do a BIG search&replace and see where your code fail cos you actually needed more than one statement in a query. Now you just have to grep your code for direct call to "plain" queries during commit of your rcs. My proposal was to add a switch that force routing to prepared statement logic in libpq. I'm thinking about situation in which you're using a library that already wrap your query call. You don't want to change the wrapper, so you don't want to take the responsibility, sync troubles etc... of the library maintainer but still you'd like to add a safety net to your code. People dealing with your code would still see the familiar library wrapper (you're not wrapping the wrapper) but you'd be able to switch to "single statement mode". Still ALLOW_LITERALS is a nice feature even if I think it won't fix the notorious SQL injection problem forever. Since it is going to make dev nervous because it adds code bloat that's going to cause more bugs than the SQL injections it may prevent. Once you've developers that are so patient to write stuff like: "select a.id, b.name from a join b on b.id=a.id where a.status='pending' and b.id>7 and b.status='logged'" into "select a.id, b.name from a join b on b.id=a.id where a.status=? and b.id>? and b.status=?", 'pending', 7, 'logged' there are high chances they will prefer to spend some of their time actually thinking about what they are writing. I do know that thinking can't be taken for granted and that habits and automatic methods may be generally preferable to "thinking", but automatic methods works when they don't look painful. Prepared statements force you to match input with position and it is definitively error prone. It is a tool... you may have some section of your code where that parameter can't be changed, but most of the time you'll find it useful if its default is set to NONE and dev *can* change it. Now... let's think at the poor programmer... He is writing a SQL statement that is static. He has to disable ALLOW_LITERALS. He is writing dynamic SQL that DON'T take user input. ALLOW_LITERALS may still have some sense as a debugging tool but there are high chances he will disable it to avoid other errors and make coding simpler. He is writing dynamic SQL that does take user input. He should be forced to use ALLOW_LITTERALS NONE. But how can he be forced in the middle of a program? He is writing a "mixed" statement where some input is actually static but not all... he may think it is easier to allow literals. Everything is still in the hands of the dev. Such setting may help you in static code evaluation since you may spot easier the places where there could be breach of policy... but still unless you want to make your dev life a hell... it is not going to solve the SQL injection problem. "mixed" statements that use external input and static input are quite common and writing them avoiding literals may be a pain that your dev won't be willing to suffer. Queued statements in one query are far less common. Still I do think that ALLOW_LITERAL is a valuable tool. Same problems for legacy code apply. -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Sun, 27 Apr 2008 12:38:48 +0200 Ivan Sergio Borgonovo <mail@webthatworks.it> wrote: > Once you've developers that are so patient to write stuff like: > > "select a.id, b.name from a join b on b.id=a.id where > a.status='pending' and b.id>7 and b.status='logged'" > > into > > "select a.id, b.name from a join b on b.id=a.id where > a.status=? and b.id>? and b.status=?", 'pending', 7, 'logged' > > there are high chances they will prefer to spend some of their time > actually thinking about what they are writing. [snipped here and there] > Prepared statements force you to match input with position and it is > definitively error prone. It'd be nice to have a wrapper that let you write prepared statements this way: "select a.id, b.name from a join b on a.id=b.id where a.status=$variable1 and b.id>$variable2 etc... but that's a pretty good change to any language parser. Maybe it could be obtained by use of macro... That will turn new_query("select a.id, b.name from a join b on a.id=b.id where a.status=$variable1 and b.id>$variable2"); into pg_prepare('anonymous', 'select a.id, b.name from a join b on a.id=b.id where a.status=$1 and b.id>$2'); pg_execute('anonymous',array($variable1,$variable2)); but aren't macro evil Still it would be handy. -- Ivan Sergio Borgonovo http://www.webthatworks.it
Ivan Sergio Borgonovo wrote: > It'd be nice to have a wrapper that let you write prepared statements > this way: > > "select a.id, b.name from a join b on a.id=b.id where > a.status=$variable1 and b.id>$variable2 etc... but that's a pretty > good change to any language parser. > Python already supports something like that. See PEP 249 (http://www.python.org/dev/peps/pep-0249/), under Module Interface, the description of the paramstyle parameter. Psycopg2 supports both the "format" (C printf) and "pyformat" styles. See the last section on this page for an example using the pyformat style: http://www.devx.com/opensource/Article/29071/0/page/3. Joe
On Sun, 27 Apr 2008 11:55:18 -0400 Joe <dev@freedomcircle.net> wrote: > Ivan Sergio Borgonovo wrote: > > It'd be nice to have a wrapper that let you write prepared > > statements this way: > > > > "select a.id, b.name from a join b on a.id=b.id where > > a.status=$variable1 and b.id>$variable2 etc... but that's a pretty > > good change to any language parser. > Python already supports something like that. See PEP 249 > (http://www.python.org/dev/peps/pep-0249/), under Module Interface, > the description of the paramstyle parameter. Psycopg2 supports both > the "format" (C printf) and "pyformat" styles. See the last section > on this page for an example using the pyformat style: > http://www.devx.com/opensource/Article/29071/0/page/3. That's better than nothing but it is still a lot of code duplication. You've to write column names in the sql statement and in the array and... column values are not contextual to the statement. That's easy... while what I wrote above does look as requiring a really special parser. Furthermore from the example it looks as if all this is going to miss the scope to prevent sql injection since it doesn't support prepared statements. -- Ivan Sergio Borgonovo http://www.webthatworks.it
Hi, > The problem may be legacy code. Of course. There is a lot of legacy code that needs to be made secure ('hardened'). The best solution is to use parameterized queries. To find out what statements are still not doing that, first disable text literals only. If some places are incorrect, the relevant statements will be in the log file. After text literals are fixed, number literals can be disabled. > Since it is going to make dev nervous because it adds code bloat > that's going to cause more bugs than the SQL injections it may > prevent. I don't think it will really bloat the code. In fact, the code will be cleaner afterwards. > "select a.id, b.name from a join b on b.id=a.id where > a.status='pending' and b.id>7 and b.status='logged' I even saw code like this in a VIEW: ... and b.type = 10430001 ... The 10430001 had a special meaning. It was documented somewhere, but not in the code of course ;-) 'pending' and 'logged' are better, but in regular programming languages usually constants are defined. This also solves the 'disable literals' pain you have described. The code would look like this: select a.id, b.name from a join b on b.id=a.id where a.status=STATUS_PENDING and b.id>? and b.status=STATUS_LOGGED. So parameters only need to be used for the actual user input. > Prepared statements force you to match input with position and it is > definitively error prone. The correctness of the application in the normal case is usually tested. But I didn't so far see any SQL injection tests in regular application. > Everything is still in the hands of the dev. Not always, specially not in banking applications. > you want to make your dev life a hell Using parameterized queries is not 'hell', it's best practice. Regards, Thomas On Sun, Apr 27, 2008 at 12:38 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote: > On Sat, 26 Apr 2008 21:50:10 -0600 > "Scott Marlowe" <scott.marlowe@gmail.com> wrote: > > > Agreed. My point was that to do what the OP wants, wouldn't it make > > more sense to just lobotomize libpq so it doesn't understand > > anything but prepared queries. Doesn't obviate the need for a > > client side language based solution. Just seems to make WAY more > > sense than trying to make the change at the server level in pgsql. > > The problem may be legacy code. > > You'd like to statically point out places where multiple statements > could get injected. > All calls to your "query" function get routed to a wrapper that > actually call prepare/execute logic. > You do a BIG search&replace and see where your code fail cos you > actually needed more than one statement in a query. > Now you just have to grep your code for direct call to "plain" > queries during commit of your rcs. > > My proposal was to add a switch that force routing to prepared > statement logic in libpq. > > I'm thinking about situation in which you're using a library that > already wrap your query call. > You don't want to change the wrapper, so you don't want to take the > responsibility, sync troubles etc... of the library maintainer but > still you'd like to add a safety net to your code. > > People dealing with your code would still see the familiar library > wrapper (you're not wrapping the wrapper) but you'd be able to switch > to "single statement mode". > > > Still ALLOW_LITERALS is a nice feature even if I think it won't fix > the notorious SQL injection problem forever. > Since it is going to make dev nervous because it adds code bloat > that's going to cause more bugs than the SQL injections it may > prevent. > Once you've developers that are so patient to write stuff like: > > "select a.id, b.name from a join b on b.id=a.id where > a.status='pending' and b.id>7 and b.status='logged'" > > into > > "select a.id, b.name from a join b on b.id=a.id where > a.status=? and b.id>? and b.status=?", 'pending', 7, 'logged' > > there are high chances they will prefer to spend some of their time > actually thinking about what they are writing. > > I do know that thinking can't be taken for granted and that habits > and automatic methods may be generally preferable to "thinking", but > automatic methods works when they don't look painful. > Prepared statements force you to match input with position and it is > definitively error prone. > > It is a tool... you may have some section of your code where that > parameter can't be changed, but most of the time you'll find it > useful if its default is set to NONE and dev *can* change it. > > Now... let's think at the poor programmer... > > He is writing a SQL statement that is static. He has to disable > ALLOW_LITERALS. > He is writing dynamic SQL that DON'T take user input. > ALLOW_LITERALS may still have some sense as a debugging tool but > there are high chances he will disable it to avoid other errors and > make coding simpler. > He is writing dynamic SQL that does take user input. He should be > forced to use ALLOW_LITTERALS NONE. But how can he be forced in the > middle of a program? > He is writing a "mixed" statement where some input is actually static > but not all... he may think it is easier to allow literals. > > Everything is still in the hands of the dev. > Such setting may help you in static code evaluation since you may > spot easier the places where there could be breach of policy... but > still unless you want to make your dev life a hell... it is not going > to solve the SQL injection problem. > > "mixed" statements that use external input and static input are quite > common and writing them avoiding literals may be a pain that your dev > won't be willing to suffer. > > Queued statements in one query are far less common. > > Still I do think that ALLOW_LITERAL is a valuable tool. > Same problems for legacy code apply. > > > -- > Ivan Sergio Borgonovo > http://www.webthatworks.it > > > -- > > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql >
Ivan Sergio Borgonovo wrote: > That's better than nothing but it is still a lot of code duplication. > You've to write column names in the sql statement and in the array > and... column values are not contextual to the statement. > The apparent duplication in the example stems for its tutorial nature. In a real program, the namedict "array" (it's actually a Python tuple--an immutable array) would normally be constructed programmatically from user or other input. Note also that although Joshua chose to use dictionary keys named identical to the PG column names, they could be named differently, like "first" and "last". > That's easy... while what I wrote above does look as requiring a > really special parser. > > Furthermore from the example it looks as if all this is going to > miss the scope to prevent sql injection since it doesn't support > prepared statements. > I assume you didn't check the PEP 249 (http://www.python.org/dev/peps/pep-0249/). The execute() and executemany() Cursor object methods are precisely to prepare and execute database operations. Joe