Thread: text column constraint, newbie question
Hi, I'm new to both PostgreSQL and web-based application development; I read the FAQ at postgresql.org (perhaps this discussion group has another FAQ that I haven't found yet?) and didn't see this addressed. I'm creating a table with a column of type text, to be used in a php web application, where I'll be accepting user input for that text field. Are there characters, maybe non-printing characters, or perhaps even whole phrases, that could cause problems in my database or application if I were to allow users to enter them into that column? If so, does anyone happen to have a regular expression handy that you think is a good choice for text columns' CHECK constraint? Or maybe a link to a discussion of this topic? Thanks!
On Sat, Mar 21, 2009 at 11:13 PM, RebeccaJ <rebeccaj@gmail.com> wrote: > Hi, > > I'm new to both PostgreSQL and web-based application development; I > read the FAQ at postgresql.org (perhaps this discussion group has > another FAQ that I haven't found yet?) and didn't see this addressed. > > I'm creating a table with a column of type text, to be used in a php > web application, where I'll be accepting user input for that text > field. Are there characters, maybe non-printing characters, or perhaps > even whole phrases, that could cause problems in my database or > application if I were to allow users to enter them into that column? > > If so, does anyone happen to have a regular expression handy that you > think is a good choice for text columns' CHECK constraint? Or maybe a > link to a discussion of this topic? Nope, there's nothing you can put into a text to break pgsql. However, if you are using regular old queries, you'd be advised to use pg_escape_string() function in php to prevent SQL injection attacks.
> > Are there characters, maybe non-printing characters, or perhaps > > even whole phrases, that could cause problems in my database or > > application if I were to allow users to enter them into that column? > > > If so, does anyone happen to have a regular expression handy that you > > think is a good choice for text columns' CHECK constraint? Or maybe a > > link to a discussion of this topic? > > Nope, there's nothing you can put into a text to break pgsql. > However, if you are using regular old queries, you'd be advised to use > pg_escape_string() function in php to prevent SQL injection attacks. Thanks! I'll check out pg_escape_string() in php, and I see that PostgreSQL also has something called PQescapeStringConn... I wonder if I should use both... Also, I should have asked: what about char and varchar fields? Can those also handle any characters, as long as I consider SQL injection attacks?
On Sun, Mar 22, 2009 at 11:36 AM, RebeccaJ <rebeccaj@gmail.com> wrote: >> > Are there characters, maybe non-printing characters, or perhaps >> > even whole phrases, that could cause problems in my database or >> > application if I were to allow users to enter them into that column? >> >> > If so, does anyone happen to have a regular expression handy that you >> > think is a good choice for text columns' CHECK constraint? Or maybe a >> > link to a discussion of this topic? >> >> Nope, there's nothing you can put into a text to break pgsql. >> However, if you are using regular old queries, you'd be advised to use >> pg_escape_string() function in php to prevent SQL injection attacks. > > Thanks! I'll check out pg_escape_string() in php, and I see that > PostgreSQL also has something called PQescapeStringConn... I wonder if > I should use both... Isn't PGescapeStringConn a libpq function? I'm pretty sure that php's pg_escape_string is just calling that for you, so no need to use both. > Also, I should have asked: what about char and varchar fields? Can > those also handle any characters, as long as I consider SQL injection > attacks? ayup. As long as they're legal for your encoding, they'll go right in. If you wanna stuff in anything no matter the encoding, use a database initialized for SQL_ASCII encoding.
You should use pg_query_params() rather than build a SQL statement in your code, to prevent SQL injection attacks. Also, if you are going to read this data back out and show it on a web page you probably should make sure there is no rogue HTML or JavaScript or anything in there with htmlentities() or somesuch. RebeccaJ wrote: >>> Are there characters, maybe non-printing characters, or perhaps >>> even whole phrases, that could cause problems in my database or >>> application if I were to allow users to enter them into that column? >>> If so, does anyone happen to have a regular expression handy that you >>> think is a good choice for text columns' CHECK constraint? Or maybe a >>> link to a discussion of this topic? >> Nope, there's nothing you can put into a text to break pgsql. >> However, if you are using regular old queries, you'd be advised to use >> pg_escape_string() function in php to prevent SQL injection attacks. > > Thanks! I'll check out pg_escape_string() in php, and I see that > PostgreSQL also has something called PQescapeStringConn... I wonder if > I should use both... > > Also, I should have asked: what about char and varchar fields? Can > those also handle any characters, as long as I consider SQL injection > attacks? >
On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook <sclists@gmail.com> wrote: > You should use pg_query_params() rather than build a SQL statement in your > code, to prevent SQL injection attacks. Also, if you are going to read this > data back out and show it on a web page you probably should make sure there > is no rogue HTML or JavaScript or anything in there with htmlentities() or > somesuch. Are you saying pg_quer_params is MORE effective than pg_escape_string at deflecting SQL injection attacks?
On Mon, Mar 23, 2009 at 3:07 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > Are you saying pg_quer_params is MORE effective than pg_escape_string > at deflecting SQL injection attacks? pg_query_params() will protect non-strings. For instance, read a number in from user input and do something of the form " and foo=$my_number". Even if you escape the string, an attacker doesn't need a ' to close a string, so he can manage injection. If it's " and foo=$1" using pg_query_params(), however, that's not possible. -- - David T. Wilson david.t.wilson@gmail.com
On Mon, 23 Mar 2009 01:07:18 -0600 Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook <sclists@gmail.com> > wrote: > > You should use pg_query_params() rather than build a SQL > > statement in your code, to prevent SQL injection attacks. Also, > > if you are going to read this data back out and show it on a web > > page you probably should make sure there is no rogue HTML or > > JavaScript or anything in there with htmlentities() or somesuch. > > Are you saying pg_quer_params is MORE effective than > pg_escape_string at deflecting SQL injection attacks? I didn't follow the thread from the beginning but I'd say yes. It should avoid queueing multiple statements and it is a more "general" method that let you pass parameters in one shot in spite of building the string a bit at a time for every parameter you insert (string, float, integer...). Of course if you correctly escape/cast/whatever everything injecting 2 statements shouldn't be possible... but if you don't you give more freedom to the attacker. $sql='select * from '.$table.' where a=$1 and $b=$2'; //oops I made a mistake. $result=db_query_params($sql,array(1,'bonzo')); If $table is external input and an attacker pass existingtable; delete from othertable; -- The attack may just result in a DOS if existingtable is there but your othertable shouldn't be wiped. untested... but I recall pg_query and pg_query_params use different C calls PGexec vs. PGexecParams and the later "Unlike PQexec, PQexecParams allows at most one SQL command in the given string." http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html I think pg_query_params should make a difference between floats and integers and signal an error if you pass float where integers are expected... but I'm not sure. Not really a security concern, but an early warning for some mistake. -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Mon, Mar 23, 2009 at 2:33 AM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote: > On Mon, 23 Mar 2009 01:07:18 -0600 > Scott Marlowe <scott.marlowe@gmail.com> wrote: > >> On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook <sclists@gmail.com> >> wrote: >> > You should use pg_query_params() rather than build a SQL >> > statement in your code, to prevent SQL injection attacks. Also, >> > if you are going to read this data back out and show it on a web >> > page you probably should make sure there is no rogue HTML or >> > JavaScript or anything in there with htmlentities() or somesuch. >> >> Are you saying pg_quer_params is MORE effective than >> pg_escape_string at deflecting SQL injection attacks? > > I didn't follow the thread from the beginning but I'd say yes. > It should avoid queueing multiple statements and it is a more > "general" method that let you pass parameters in one shot in spite > of building the string a bit at a time for every parameter you > insert (string, float, integer...). > > Of course if you correctly escape/cast/whatever everything injecting > 2 statements shouldn't be possible... but if you don't you give more > freedom to the attacker. > > $sql='select * from '.$table.' where a=$1 and $b=$2'; //oops I made > a mistake. > $result=db_query_params($sql,array(1,'bonzo')); > > If $table is external input and an attacker pass > existingtable; delete from othertable; -- > > The attack may just result in a DOS if existingtable is there but > your othertable shouldn't be wiped. > > untested... but I recall pg_query and pg_query_params use different C > calls PGexec vs. PGexecParams and the later "Unlike PQexec, > PQexecParams allows at most one SQL command in the given string." > > http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html > > I think pg_query_params should make a difference between floats and > integers and signal an error if you pass float where integers are > expected... but I'm not sure. > Not really a security concern, but an early warning for some mistake. So, what are the performance implications? Do both methods get planned / perform the same on the db side?
On Mon, 23 Mar 2009 03:30:09 -0600 Scott Marlowe <scott.marlowe@gmail.com> wrote: > > I think pg_query_params should make a difference between floats > > and integers and signal an error if you pass float where > > integers are expected... but I'm not sure. > > Not really a security concern, but an early warning for some > > mistake. > So, what are the performance implications? Do both methods get > planned / perform the same on the db side? I don't think there is any appreciable advantage. Maybe all the stuff ala fprint perform better in C rather than building up a string concatenating and escaping in php. Still I wouldn't consider it a first source of slowdown. For making a difference in plan management you've to use another family of functions pg_prepare/pg_execute. I'm not an expert but not every time caching plans is what you'd like to do. -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Mon, Mar 23, 2009 at 03:30:09AM -0600, Scott Marlowe wrote: > On Mon, Mar 23, 2009 at 2:33 AM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote: > > On Mon, 23 Mar 2009 01:07:18 -0600 Scott Marlowe <scott.marlowe@gmail.com> wrote: > >> Are you saying pg_quer_params is MORE effective than > >> pg_escape_string at deflecting SQL injection attacks? > > > > I didn't follow the thread from the beginning but I'd say yes. > > It should avoid queueing multiple statements and it is a more > > "general" method that let you pass parameters in one shot in spite > > of building the string a bit at a time for every parameter you > > insert (string, float, integer...). > > > > Of course if you correctly escape/cast/whatever everything injecting > > 2 statements shouldn't be possible... but if you don't you give more > > freedom to the attacker. > > So, what are the performance implications? Do both methods get > planned / perform the same on the db side? Isn't the main point that it's just easier to get things right if you use something that ends up calling PGexecParams under the hood rather than doing your own string interpolation? The frequency of SQL injection attacks[1,2,3,4] of people who really should know better suggests that we're (i.e. developers en masse) not very reliable at doing things properly and hence APIs that default to safety are "a good thing". It's always easy as a developer to say "oops, didn't think about that" when you're debugging, but if that oops has just resulted in the compromise of details of a hundred thousand credit-cards then it becomes a somewhat more serious issue. Of course there are reasons for doing things differently, it's just that those should be special cases (i.e. performance hacks) and not the norm. Admittedly, using something like PGexecParams is a more awkward; but there are efforts to get decent string interpolation libraries going that "just work". For example, the caja project has developed something they call "Secure String Interpolation"[5] which looks very neat and tidy. It would be cool if things like this appeared in other languages. -- Sam http://samason.me.uk/ [1] http://www.theregister.co.uk/2009/03/09/telegraph_hack_attack/ [2] http://www.theregister.co.uk/2009/02/27/lottery_website_security_probed/ [3] http://www.theregister.co.uk/2009/02/16/bitdefender_website_breach/ [4] http://www.theregister.co.uk/2009/02/13/f_secure_hack_attack/ [5] http://google-caja.googlecode.com/svn/changes/mikesamuel/string-interpolation-29-Jan-2008/trunk/src/js/com/google/caja/interp/index.html
On Mon, Mar 23, 2009 at 01:07:18AM -0600, Scott Marlowe wrote: > On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook <sclists@gmail.com> wrote: > > You should use pg_query_params() rather than build a SQL statement > > in your code, to prevent SQL injection attacks. Also, if you are > > going to read this data back out and show it on a web page you > > probably should make sure there is no rogue HTML or JavaScript or > > anything in there with htmlentities() or somesuch. > > Are you saying pg_quer_params is MORE effective than > pg_escape_string at deflecting SQL injection attacks? Yes. Much more. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mar 22, 12:36 pm, scott.marl...@gmail.com (Scott Marlowe) wrote: > ayup. As long as they're legal for your encoding, they'll go right in. > If you wanna stuff in anything no matter the encoding, use a database > initialized for SQL_ASCII encoding. Thanks, everyone, for your contribution to this thread. I'm approaching the database design of my web application differently, now. Before, I was planning to have CHECK constraints in all of my text or char fields, to keep out all semicolons, single quotes, and anything else that looked dangerous. Now I'm thinking that I'll be using htmlentities(), pg_escape_string() and pg_query_params() as safety filters, and otherwise allowing users to store whatever they want to, in the fields where I store/retrieve user input. Scott, your comment above introduced some new concepts to me, and now I'm thinking about foreign language text and other ways to be more flexible. I found this page that talks about encoding: http://www.postgresql.org/docs/8.3/static/multibyte.html And I wonder why you like SQL_ASCII better than UTF8, and whether others have any opinions about those two. (My web server's LC_CTYPE is C, so I can use any character set.) Wouldn't UTF8 allow more characters than SQL_ASCII? Thanks again!
On Mon, Mar 23, 2009 at 3:11 PM, RebeccaJ <rebeccaj@gmail.com> wrote: > Scott, your comment above introduced some new concepts to me, and now > I'm thinking about foreign language text and other ways to be more > flexible. I found this page that talks about encoding: > http://www.postgresql.org/docs/8.3/static/multibyte.html > And I wonder why you like SQL_ASCII better than UTF8, and whether > others have any opinions about those two. (My web server's LC_CTYPE is > C, so I can use any character set.) Wouldn't UTF8 allow more > characters than SQL_ASCII? No, SQL_ASCII will allow anything you wanna put into the database to go in, with no checking. UTF8 will require properly formed and valud UTF characters. Which is better depends a lot on what you're doing. Note that SQL_ASCII is not 8 bit ASCII, it's a name for "anything goes" instead. (Now Cole Porter is running through my head.. :) )
On Mon, 23 Mar 2009 14:11:28 -0700 (PDT) RebeccaJ <rebeccaj@gmail.com> wrote: > now. Before, I was planning to have CHECK constraints in all of my > text or char fields, to keep out all semicolons, single quotes, and > anything else that looked dangerous. Now I'm thinking that I'll be > using htmlentities(), pg_escape_string() and pg_query_params() as check, htmlentities, pg_escape_string and pg_query_params really don't belong to the same family of "functions" and serve very different purposes. simplifying it very much: - check are used to control the quality of data that get stored in the db - htmlentities is about formatting for web output - pg_escape_string is to prepare input for sql and avoiding sql injection - pg_query_params is a relative of pg_escape_string but somehow used differently -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Mar 23, 2009, at 10:11 PM, RebeccaJ wrote: > On Mar 22, 12:36 pm, scott.marl...@gmail.com (Scott Marlowe) wrote: >> ayup. As long as they're legal for your encoding, they'll go right >> in. >> If you wanna stuff in anything no matter the encoding, use a database >> initialized for SQL_ASCII encoding. > > Thanks, everyone, for your contribution to this thread. I'm > approaching the database design of my web application differently, > now. Before, I was planning to have CHECK constraints in all of my > text or char fields, to keep out all semicolons, single quotes, and > anything else that looked dangerous. Now I'm thinking that I'll be > using htmlentities(), pg_escape_string() and pg_query_params() as > safety filters, and otherwise allowing users to store whatever they > want to, in the fields where I store/retrieve user input. Yes indeed. But don't use together: * use pg_escape_string() or pg_query_params() to escape data that goes INTO your database, and * use htmlentities() on data that comes OUT of it, and only once it gets printed to the page. Otherwise you'll end up with data in your database that is hard to search in (there is no collation on html entities in text fields AFAIK, so if someone searches for "é" in your forms it doesn't match "é" in your database) or data in your scripts that is hard to compare (the value from a GET or POST request does not contain entities while the value read and converted from the database does). Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,49c81dc9129743370417724!
RebeccaJ wrote: > And I wonder why you like SQL_ASCII better than UTF8, and whether > others have any opinions about those two. (My web server's LC_CTYPE is > C, so I can use any character set.) Wouldn't UTF8 allow more > characters than SQL_ASCII? I've had a LOT of experience dealing with apps that use 8-bit byte strings (like SQL_ASCII `text') to store data, and I've rarely seen one that *doesn't* have text encoding handling bugs. If you store your text as byte streams that don't know, check, or enforce their own encoding you must keep track of the encoding separately - either with another value stored alongside the string, or through your app logic. If you start storing data with multiple different text encodings in the DB, you're most likely to land up tracking down annoying "corrupt text" bugs sooner or later. If, on the other hand, you use UTF-8, you *know* that everything in the database is well-formed UTF-8. You ensure that it is UTF-8 before storing it in the DB and know it'll be UTF-8 coming out. The DB takes care of encoding conversion for you if you ask it to, by setting client_encoding - the only downside being that it'll refuse to return strings that can't be represented in your current client_encoding, like say Cyrillic (Russian etc) text if you're using ISO-8859-1 (latin-1) for your client encoding. Even with a UTF-8 database you must still get your I/O to/from libraries and the rest of the system right, converting UTF-8 text to whatever the system expects or vice versa. Alternately, if you set client_encoding, you must be prepared for cases where the DB can't send you what you ask for because your encoding can't represent it. All in all, I personally think a UTF-8 database is considerably better for most uses. There are certainly cases where I'd use SQL_ASCII, but not most. -- Craig Ringer
RebeccaJ wrote: > Thanks, everyone, for your contribution to this thread. I'm > approaching the database design of my web application differently, > now. Before, I was planning to have CHECK constraints in all of my > text or char fields, to keep out all semicolons, single quotes, and > anything else that looked dangerous. Now I'm thinking that I'll be > using htmlentities(), pg_escape_string() and pg_query_params() as > safety filters, and otherwise allowing users to store whatever they > want to, in the fields where I store/retrieve user input. Note that htmlentities() expects LATIN1-encoded strings and is thus unusable on UTF-8 contents. So if you end up talking UTF-8 with the database, you'll probably need to use htmlspecialchars() instead, and UTF-8 as your HTML charset. Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
Daniel Verite wrote: > Note that htmlentities() expects LATIN1-encoded strings and is thus > unusable on UTF-8 contents. > So if you end up talking UTF-8 with the database, you'll probably need > to use htmlspecialchars() instead, and UTF-8 as your HTML charset. I believe you are wrong, at least the PHP documentation says otherwise and it _seems_ to work for me (http://us2.php.net/htmlentities). Maybe you are thinking about an older version? Also the iconv() function can help you convert between (some) different character encodings (http://us2.php.net/htmlentities). -- Stephen
Stephen Cook wrote: > Daniel Verite wrote: > > Note that htmlentities() expects LATIN1-encoded strings and is thus > > unusable on UTF-8 contents. > > So if you end up talking UTF-8 with the database, you'll probably need > > to use htmlspecialchars() instead, and UTF-8 as your HTML charset. > > > I believe you are wrong, at least the PHP documentation says otherwise > and it _seems_ to work for me (http://us2.php.net/htmlentities). Maybe > you are thinking about an older version? You're right, I've missed the fact that they added support for other character sets at some point in php4. Now I know :) Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org