Thread: Correct escaping of untrusted data
Hi folks, The thread on injection attacks was very instructive, but seemed to run out of steam at an interesting point. Now you guys have kindly educated me about the real nature of the issues, can I ask again what effective escaping really means? Are the standard escaping functions found in the PHP, Tcl etc APIs to Postgres bombproof? Are there any encodings that might slip through and be cast to malicious strings inside Postgres? What about functions like convert(): could they be used to slip something through the escaping function? I don't really have enough knowledge in this area to be confident in the results of my own experiments. Any advice from the more technically savvy would be much appreciated. ------------------ Geoff Caplan Vario Software Ltd (+44) 121-515 1154
Geoff Caplan <geoff@variosoft.com> writes: > Are the standard escaping functions found in the PHP, Tcl etc APIs to > Postgres bombproof? I dunno; you'd probably want to look at the source for each one you planned to use, anyway, if you're being paranoid. As long as they escape ' and \ they should be okay. If your source language allows embedded nulls (\0) in strings you might want to reject those as well. > Are there any encodings that might slip through > and be cast to malicious strings inside Postgres? All the supported encodings are supersets of ASCII, so I don't think there is any such risk. There is a risk in the opposite direction I think: if the escaping function doesn't know the encoding being used it might think that one byte of a multibyte character is ' or \ and try to escape it, thereby breaking the data. This could not happen in "sane" encodings like UTF-8, however, just in the one or two Far Eastern encodings that allow multibyte characters to contain bytes <= 0x7F. Since you as the application programmer can control what client-side encoding is used, the simplest answer here is just to be sure you're using a sane encoding, or at least that the escaping function knows the encoding you're using. > What about functions like convert(): could they be used to slip > something through the escaping function? Don't see how. The issue is to be sure that the query string traveling to the backend will be interpreted the way you expected. By the time any server-side function executes it is far too late to change that interpretation. regards, tom lane
Geoff Caplan wrote: > Are the standard escaping functions found in the PHP, Tcl etc APIs to > Postgres bombproof? Are there any encodings that might slip through > and be cast to malicious strings inside Postgres? What about functions > like convert(): could they be used to slip something through the > escaping function? What about writing nessus plugin(s) or a specific scanner for these escaping issues ? I don't know if a such thing already exists... -- Olivier
Tom, Belated thanks for the info (I've been away from my desk). Very helpful. ------------------ Geoff Caplan Vario Software Ltd (+44) 121-515 1154
At 11:09 AM 7/31/2004 -0400, Tom Lane wrote: >All the supported encodings are supersets of ASCII, so I don't think >there is any such risk. Is the 7.4.x multibyte support bombproof? How would we avoid problems like this: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&safe=off&edition=us&selm=20020502171830J.t-ishii%40sra.co.jp Summary of that problem: an invalid multibyte character "eats" the following character. I know it's fixed already, but is there a way to reduce exposure to such bugs? >There is a risk in the opposite direction I think: if the escaping >function doesn't know the encoding being used >it might think that one byte of a multibyte character is ' or \ and try to >escape it, thereby breaking the data. Is the escaping function always consistent with the backend's interpretation? Is it impossible for them to be inconsistent (e.g. they use the same code to interpret data). My concern is if the escaping function thinks one byte of a multibyte is \ but the rest of the backend doesn't then you can end up with an escaped backslash which does not escape a naughty '... Also: what is the proper/official way to deal with: update tablea set data=3-? where a=1; And the parameter is -1 Somehow ensure it's always like this? update tablea set data=3 - ? where a=1; This doesn't seem to be escaped safely for: DBD::Pg 1.22 (3 versions old) with Postgresql 7.3.4 Would it be best to do the 3-? part in the application and then do update tablea set data=? where a=1; Possibly result in less CPU usage at backend too. Regards, Link.
> Is the 7.4.x multibyte support bombproof? How would we avoid problems > like this: > http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&safe=off&edition=us&selm=20020502171830J.t-ishii%40sra.co.jp Well, maybe using UTF-8 encoding would fix this ? > update tablea set data=3-? where a=1; Add parentheses : > update tablea set data=3-(?) where a=1; Or do it in your program... but you can't do this if you have a db field or function instead of the 3.
Hi folks I'm designing a table to be used for web session management. If all goes well with the project, the table should have 100,000+ records and be getting hammered with SELECTS, INSERTS and UPDATES. The table will need a technical key. The question is, what is the most efficient way to do this? a) Generate a random 24 character string in the application. Very quick for the INSERTs, but will the longer key slow down the the SELECTs and UPDATES? b) Use a sequence. Faster for the SELECTS and UPDATES, I guess, but how much will the sequence slow down the INSERTS on a medium sized record-set? There will probably be 6-8 SELECTs & UPDATEs for each INSERT. I appreciate that I could set up some tests, but I am under the hammer time-wise. Some rule-of-thumb advice from the list would be most welcome. ------------------ Geoff Caplan Vario Software Ltd (+44) 121-515 1154
After a long battle with technology, geoff@variosoft.com (Geoff Caplan), an earthling, wrote: > b) Use a sequence. Faster for the SELECTS and UPDATES, I guess, but > how much will the sequence slow down the INSERTS on a medium sized > record-set? Why, in particular, would you expect the sequence to slow down inserts? They don't lock the table. Note that if you're really doing a lot of INSERTs in parallel, you might find it worthwhile to configure the sequence to cache some number of entries so that they are pre-allocated and stored in memory for each session (e.g. - for each connection) for quicker access. See the documentation for "create sequence" for more details... -- output = reverse("gro.gultn" "@" "enworbbc") http://www3.sympatico.ca/cbbrowne/x.html Think of C++ as an object-oriented assembly language.
On Thu, Aug 12, 2004 at 13:05:45 +0100, Geoff Caplan <geoff@variosoft.com> wrote: > > b) Use a sequence. Faster for the SELECTS and UPDATES, I guess, but > how much will the sequence slow down the INSERTS on a medium sized > record-set? Using a sequence shouldn't be slow. The main potential problem is that it will make the session IDs guessible if you don't take any other steps. That may or may not be a problem. One way around this is to encrypt the sequence number in the database with a key and use a combination of the encrypted string and an index for which key is used (this makes changing keys for new sessions while allowing continued use of an old key for old sessions) as the session id. You can change the keys as often as needed and practical for your application.
Bruno Wolff III wrote: >>> Using a sequence shouldn't be slow. Thanks - that's the main thing I need to know. >>> The main potential problem is that it will make the session IDs >>> guessible if you don't take any other steps. That may or may not >>> be a problem. Thanks for the warning, but I won't be using the sequence number as the session id: as you say, not a safe thing to do. The session record key persists from session to session: it is used to link sessions with browsers and with user accounts. The session key will be a random 32 character key generated for each session. Christopher Browne wrote: >>> Why, in particular, would you expect the sequence to slow down >>> inserts? They don't lock the table. I was assuming that generating the sequence number was expensive: it is some other DBs I have used. That was why I was thinking of providing a unique id via a random string. But a practical test shows that in PG it is pretty fast, so there is not need. >>> Note that if you're really doing a lot of INSERTs in parallel, you >>> might find it worthwhile to configure the sequence to cache some >>> number of entries so that they are pre-allocated and stored in memory >>> for each session (e.g. - for each connection) for quicker access. See >>> the documentation for "create sequence" for more details... I think that would be worthwhile. Thanks for the input, folks. ------------------ Geoff Caplan Vario Software Ltd (+44) 121-515 1154
You could use apache mod_auth_tkt : http://www.openfusion.com.au/labs/mod_auth_tkt/ Its main advantage is that it'll authentify a user, hence your script gets the user ID, which you can use as a key in your session table for instance. Cut & paste for the lazies : mod_auth_tkt is a lightweight cookie-based authentication module for Apache 1.3.x, written in C. It implements a single-signon framework that works across multiple apache instances and multiple machines. The actual authentication is done by a user-supplied CGI or script in whatever language you like (examples are provided in Perl), meaning you can authenticate against any kind of user repository you can access (password files, ldap, databases, etc.) mod_auth_tkt supports inactivity timeouts (including the ability to control how aggressively the ticket is refreshed), the ability to include arbitrary user data within the cookie, configurable cookie names and domains, and token-based access to subsections of a site. mod_auth_tkt works by checking incoming Apache requests for a (user- defined) cookie containing a valid authentication ticket. The ticket is checked by generating an MD5 checksum for the username and any (optional) user data from the ticket together with the requesting IP address and a shared secret available to the server. If the generated MD5 checksum matches the ticket's checksum, the ticket is valid and the request is authorised. Requests without a valid ticket are redirected to a configurable URL which is expected to validate the user and generate a ticket for them. This package includes both a sample C executable and a Perl module for generating the cookies; implementations for other environments should be relatively straightforward. > Hi folks > > I'm designing a table to be used for web session management. If all > goes well with the project, the table should have 100,000+ records and > be getting hammered with SELECTS, INSERTS and UPDATES. > > The table will need a technical key. The question is, what is the most > efficient way to do this? > > a) Generate a random 24 character string in the application. Very > quick for the INSERTs, but will the longer key slow down the the > SELECTs and UPDATES? > > b) Use a sequence. Faster for the SELECTS and UPDATES, I guess, but > how much will the sequence slow down the INSERTS on a medium sized > record-set? > > There will probably be 6-8 SELECTs & UPDATEs for each INSERT. > > I appreciate that I could set up some tests, but I am under the hammer > time-wise. Some rule-of-thumb advice from the list would be most > welcome. > > ------------------ > Geoff Caplan > Vario Software Ltd > (+44) 121-515 1154 > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
Pierre-Frédéric, PFC> You could use apache mod_auth_tkt : PFC> http://www.openfusion.com.au/labs/mod_auth_tkt/ I think their own description of "lightweight" is a fair summary of mod_auth. My own approach needs to be a more security conscious. Secure web sessions is an area that deserves more attention. The only good source I know is: http://cookies.lcs.mit.edu/pubs/webauth.html The ease with which the MIT team were able to compromise so many leading corporate sites is sobering. My own approach is mainly a blend of the MIT ideas, the Yahoo ideas reported on the the latest version of the MIT paper, and the OpenACS approach: http://openacs.org/doc/openacs-5-1/security-design.html But this is a bit OT here. If you want to carry on with this, perhaps you could contact me off list? ------------------ Geoff Caplan Vario Software Ltd (+44) 121-515 1154