Thread: text column constraint, newbie question

text column constraint, newbie question

From
RebeccaJ
Date:
Hi,

I'm new to both PostgreSQL and web-based application development; I
read the FAQ at postgresql.org (perhaps this discussion group has
another FAQ that I haven't found yet?) and didn't see this addressed.

I'm creating a table with a column of type text, to be used in a php
web application, where I'll be accepting user input for that text
field. Are there characters, maybe non-printing characters, or perhaps
even whole phrases, that could cause problems in my database or
application if I were to allow users to enter them into that column?

If so, does anyone happen to have a regular expression handy that you
think is a good choice for text columns' CHECK constraint? Or maybe a
link to a discussion of this topic?

Thanks!

Re: text column constraint, newbie question

From
Scott Marlowe
Date:
On Sat, Mar 21, 2009 at 11:13 PM, RebeccaJ <rebeccaj@gmail.com> wrote:
> Hi,
>
> I'm new to both PostgreSQL and web-based application development; I
> read the FAQ at postgresql.org (perhaps this discussion group has
> another FAQ that I haven't found yet?) and didn't see this addressed.
>
> I'm creating a table with a column of type text, to be used in a php
> web application, where I'll be accepting user input for that text
> field. Are there characters, maybe non-printing characters, or perhaps
> even whole phrases, that could cause problems in my database or
> application if I were to allow users to enter them into that column?
>
> If so, does anyone happen to have a regular expression handy that you
> think is a good choice for text columns' CHECK constraint? Or maybe a
> link to a discussion of this topic?

Nope, there's nothing you can put into a text to break pgsql.
However, if you are using regular old queries, you'd be advised to use
pg_escape_string() function in php to prevent SQL injection attacks.

Re: text column constraint, newbie question

From
RebeccaJ
Date:
> >  Are there characters, maybe non-printing characters, or perhaps
> > even whole phrases, that could cause problems in my database or
> > application if I were to allow users to enter them into that column?
>
> > If so, does anyone happen to have a regular expression handy that you
> > think is a good choice for text columns' CHECK constraint? Or maybe a
> > link to a discussion of this topic?
>
> Nope, there's nothing you can put into a text to break pgsql.
> However, if you are using regular old queries, you'd be advised to use
> pg_escape_string() function in php to prevent SQL injection attacks.

Thanks! I'll check out pg_escape_string() in php, and I see that
PostgreSQL also has something called PQescapeStringConn... I wonder if
I should use both...

Also, I should have asked: what about char and varchar fields? Can
those also handle any characters, as long as I consider SQL injection
attacks?

Re: text column constraint, newbie question

From
Scott Marlowe
Date:
On Sun, Mar 22, 2009 at 11:36 AM, RebeccaJ <rebeccaj@gmail.com> wrote:
>> >  Are there characters, maybe non-printing characters, or perhaps
>> > even whole phrases, that could cause problems in my database or
>> > application if I were to allow users to enter them into that column?
>>
>> > If so, does anyone happen to have a regular expression handy that you
>> > think is a good choice for text columns' CHECK constraint? Or maybe a
>> > link to a discussion of this topic?
>>
>> Nope, there's nothing you can put into a text to break pgsql.
>> However, if you are using regular old queries, you'd be advised to use
>> pg_escape_string() function in php to prevent SQL injection attacks.
>
> Thanks! I'll check out pg_escape_string() in php, and I see that
> PostgreSQL also has something called PQescapeStringConn... I wonder if
> I should use both...

Isn't PGescapeStringConn a libpq function?  I'm pretty sure that php's
pg_escape_string is just calling that for you, so no need to use both.

> Also, I should have asked: what about char and varchar fields? Can
> those also handle any characters, as long as I consider SQL injection
> attacks?

ayup. As long as they're legal for your encoding, they'll go right in.
 If you wanna stuff in anything no matter the encoding, use a database
initialized for SQL_ASCII encoding.

Re: text column constraint, newbie question

From
Stephen Cook
Date:
You should use pg_query_params() rather than build a SQL statement in
your code, to prevent SQL injection attacks. Also, if you are going to
read this data back out and show it on a web page you probably should
make sure there is no rogue HTML or JavaScript or anything in there with
htmlentities() or somesuch.


RebeccaJ wrote:
>>>  Are there characters, maybe non-printing characters, or perhaps
>>> even whole phrases, that could cause problems in my database or
>>> application if I were to allow users to enter them into that column?
>>> If so, does anyone happen to have a regular expression handy that you
>>> think is a good choice for text columns' CHECK constraint? Or maybe a
>>> link to a discussion of this topic?
>> Nope, there's nothing you can put into a text to break pgsql.
>> However, if you are using regular old queries, you'd be advised to use
>> pg_escape_string() function in php to prevent SQL injection attacks.
>
> Thanks! I'll check out pg_escape_string() in php, and I see that
> PostgreSQL also has something called PQescapeStringConn... I wonder if
> I should use both...
>
> Also, I should have asked: what about char and varchar fields? Can
> those also handle any characters, as long as I consider SQL injection
> attacks?
>

Re: text column constraint, newbie question

From
Scott Marlowe
Date:
On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook <sclists@gmail.com> wrote:
> You should use pg_query_params() rather than build a SQL statement in your
> code, to prevent SQL injection attacks. Also, if you are going to read this
> data back out and show it on a web page you probably should make sure there
> is no rogue HTML or JavaScript or anything in there with htmlentities() or
> somesuch.

Are you saying pg_quer_params is MORE effective than pg_escape_string
at deflecting SQL injection attacks?

Re: text column constraint, newbie question

From
David Wilson
Date:
On Mon, Mar 23, 2009 at 3:07 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> Are you saying pg_quer_params is MORE effective than pg_escape_string
> at deflecting SQL injection attacks?

pg_query_params() will protect non-strings. For instance, read a
number in from user input and do something of the form " and
foo=$my_number". Even if you escape the string, an attacker doesn't
need a ' to close a string, so he can manage injection. If it's " and
foo=$1" using pg_query_params(), however, that's not possible.

--
- David T. Wilson
david.t.wilson@gmail.com

Re: text column constraint, newbie question

From
Ivan Sergio Borgonovo
Date:
On Mon, 23 Mar 2009 01:07:18 -0600
Scott Marlowe <scott.marlowe@gmail.com> wrote:

> On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook <sclists@gmail.com>
> wrote:
> > You should use pg_query_params() rather than build a SQL
> > statement in your code, to prevent SQL injection attacks. Also,
> > if you are going to read this data back out and show it on a web
> > page you probably should make sure there is no rogue HTML or
> > JavaScript or anything in there with htmlentities() or somesuch.
>
> Are you saying pg_quer_params is MORE effective than
> pg_escape_string at deflecting SQL injection attacks?

I didn't follow the thread from the beginning but I'd say yes.
It should avoid queueing multiple statements and it is a more
"general" method that let you pass parameters in one shot in spite
of building the string a bit at a time for every parameter you
insert (string, float, integer...).

Of course if you correctly escape/cast/whatever everything injecting
2 statements shouldn't be possible... but if you don't you give more
freedom to the attacker.

$sql='select * from '.$table.' where a=$1 and $b=$2'; //oops I made
a mistake.
$result=db_query_params($sql,array(1,'bonzo'));

If $table is external input and an attacker pass
existingtable; delete from othertable; --

The attack may just result in a DOS if existingtable is there but
your othertable shouldn't be wiped.

untested... but I recall pg_query and pg_query_params use different C
calls PGexec vs. PGexecParams and the later "Unlike PQexec,
PQexecParams allows at most one SQL command in the given string."

http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html

I think pg_query_params should make a difference between floats and
integers and signal an error if you pass float where integers are
expected... but I'm not sure.
Not really a security concern, but an early warning for some mistake.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: text column constraint, newbie question

From
Scott Marlowe
Date:
On Mon, Mar 23, 2009 at 2:33 AM, Ivan Sergio Borgonovo
<mail@webthatworks.it> wrote:
> On Mon, 23 Mar 2009 01:07:18 -0600
> Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
>> On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook <sclists@gmail.com>
>> wrote:
>> > You should use pg_query_params() rather than build a SQL
>> > statement in your code, to prevent SQL injection attacks. Also,
>> > if you are going to read this data back out and show it on a web
>> > page you probably should make sure there is no rogue HTML or
>> > JavaScript or anything in there with htmlentities() or somesuch.
>>
>> Are you saying pg_quer_params is MORE effective than
>> pg_escape_string at deflecting SQL injection attacks?
>
> I didn't follow the thread from the beginning but I'd say yes.
> It should avoid queueing multiple statements and it is a more
> "general" method that let you pass parameters in one shot in spite
> of building the string a bit at a time for every parameter you
> insert (string, float, integer...).
>
> Of course if you correctly escape/cast/whatever everything injecting
> 2 statements shouldn't be possible... but if you don't you give more
> freedom to the attacker.
>
> $sql='select * from '.$table.' where a=$1 and $b=$2'; //oops I made
> a mistake.
> $result=db_query_params($sql,array(1,'bonzo'));
>
> If $table is external input and an attacker pass
> existingtable; delete from othertable; --
>
> The attack may just result in a DOS if existingtable is there but
> your othertable shouldn't be wiped.
>
> untested... but I recall pg_query and pg_query_params use different C
> calls PGexec vs. PGexecParams and the later "Unlike PQexec,
> PQexecParams allows at most one SQL command in the given string."
>
> http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html
>
> I think pg_query_params should make a difference between floats and
> integers and signal an error if you pass float where integers are
> expected... but I'm not sure.
> Not really a security concern, but an early warning for some mistake.

So, what are the performance implications?  Do both methods get
planned / perform the same on the db side?

Re: text column constraint, newbie question

From
Ivan Sergio Borgonovo
Date:
On Mon, 23 Mar 2009 03:30:09 -0600
Scott Marlowe <scott.marlowe@gmail.com> wrote:

> > I think pg_query_params should make a difference between floats
> > and integers and signal an error if you pass float where
> > integers are expected... but I'm not sure.
> > Not really a security concern, but an early warning for some
> > mistake.

> So, what are the performance implications?  Do both methods get
> planned / perform the same on the db side?

I don't think there is any appreciable advantage. Maybe all the
stuff ala fprint perform better in C rather than building up a
string concatenating and escaping in php.
Still I wouldn't consider it a first source of slowdown.

For making a difference in plan management you've to use another
family of functions pg_prepare/pg_execute.

I'm not an expert but not every time caching plans is what you'd like
to do.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: text column constraint, newbie question

From
Sam Mason
Date:
On Mon, Mar 23, 2009 at 03:30:09AM -0600, Scott Marlowe wrote:
> On Mon, Mar 23, 2009 at 2:33 AM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
> > On Mon, 23 Mar 2009 01:07:18 -0600 Scott Marlowe <scott.marlowe@gmail.com> wrote:
> >> Are you saying pg_quer_params is MORE effective than
> >> pg_escape_string at deflecting SQL injection attacks?
> >
> > I didn't follow the thread from the beginning but I'd say yes.
> > It should avoid queueing multiple statements and it is a more
> > "general" method that let you pass parameters in one shot in spite
> > of building the string a bit at a time for every parameter you
> > insert (string, float, integer...).
> >
> > Of course if you correctly escape/cast/whatever everything injecting
> > 2 statements shouldn't be possible... but if you don't you give more
> > freedom to the attacker.
>
> So, what are the performance implications?  Do both methods get
> planned / perform the same on the db side?

Isn't the main point that it's just easier to get things right if you
use something that ends up calling PGexecParams under the hood rather
than doing your own string interpolation?

The frequency of SQL injection attacks[1,2,3,4] of people who really
should know better suggests that we're (i.e. developers en masse) not
very reliable at doing things properly and hence APIs that default to
safety are "a good thing".  It's always easy as a developer to say
"oops, didn't think about that" when you're debugging, but if that oops
has just resulted in the compromise of details of a hundred thousand
credit-cards then it becomes a somewhat more serious issue.

Of course there are reasons for doing things differently, it's just that
those should be special cases (i.e. performance hacks) and not the norm.

Admittedly, using something like PGexecParams is a more awkward; but
there are efforts to get decent string interpolation libraries going
that "just work".  For example, the caja project has developed something
they call "Secure String Interpolation"[5] which looks very neat and
tidy.  It would be cool if things like this appeared in other languages.

--
  Sam  http://samason.me.uk/

 [1] http://www.theregister.co.uk/2009/03/09/telegraph_hack_attack/
 [2] http://www.theregister.co.uk/2009/02/27/lottery_website_security_probed/
 [3] http://www.theregister.co.uk/2009/02/16/bitdefender_website_breach/
 [4] http://www.theregister.co.uk/2009/02/13/f_secure_hack_attack/
 [5]
http://google-caja.googlecode.com/svn/changes/mikesamuel/string-interpolation-29-Jan-2008/trunk/src/js/com/google/caja/interp/index.html

Re: text column constraint, newbie question

From
David Fetter
Date:
On Mon, Mar 23, 2009 at 01:07:18AM -0600, Scott Marlowe wrote:
> On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook <sclists@gmail.com> wrote:
> > You should use pg_query_params() rather than build a SQL statement
> > in your code, to prevent SQL injection attacks. Also, if you are
> > going to read this data back out and show it on a web page you
> > probably should make sure there is no rogue HTML or JavaScript or
> > anything in there with htmlentities() or somesuch.
>
> Are you saying pg_quer_params is MORE effective than
> pg_escape_string at deflecting SQL injection attacks?

Yes.  Much more.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: text column constraint, newbie question

From
RebeccaJ
Date:
On Mar 22, 12:36 pm, scott.marl...@gmail.com (Scott Marlowe) wrote:
> ayup. As long as they're legal for your encoding, they'll go right in.
>  If you wanna stuff in anything no matter the encoding, use a database
> initialized for SQL_ASCII encoding.

Thanks, everyone, for your contribution to this thread. I'm
approaching the database design of my web application differently,
now. Before, I was planning to have CHECK constraints in all of my
text or char fields, to keep out all semicolons, single quotes, and
anything else that looked dangerous. Now I'm thinking that I'll be
using htmlentities(), pg_escape_string() and pg_query_params() as
safety filters, and otherwise allowing users to store whatever they
want to, in the fields where I store/retrieve user input.

Scott, your comment above introduced some new concepts to me, and now
I'm thinking about foreign language text and other ways to be more
flexible. I found this page that talks about encoding:
http://www.postgresql.org/docs/8.3/static/multibyte.html
And I wonder why you like SQL_ASCII better than UTF8, and whether
others have any opinions about those two. (My web server's LC_CTYPE is
C, so I can use any character set.) Wouldn't UTF8 allow more
characters than SQL_ASCII?

Thanks again!

Re: text column constraint, newbie question

From
Scott Marlowe
Date:
On Mon, Mar 23, 2009 at 3:11 PM, RebeccaJ <rebeccaj@gmail.com> wrote:
> Scott, your comment above introduced some new concepts to me, and now
> I'm thinking about foreign language text and other ways to be more
> flexible. I found this page that talks about encoding:
> http://www.postgresql.org/docs/8.3/static/multibyte.html
> And I wonder why you like SQL_ASCII better than UTF8, and whether
> others have any opinions about those two. (My web server's LC_CTYPE is
> C, so I can use any character set.) Wouldn't UTF8 allow more
> characters than SQL_ASCII?

No, SQL_ASCII will allow anything you wanna put into the database to
go in, with no checking.  UTF8 will require properly formed and valud
UTF characters.  Which is better depends a lot on what you're doing.
Note that SQL_ASCII is not 8 bit ASCII, it's a name for "anything
goes" instead. (Now Cole Porter is running through my head.. :) )

Re: text column constraint, newbie question

From
Ivan Sergio Borgonovo
Date:
On Mon, 23 Mar 2009 14:11:28 -0700 (PDT)
RebeccaJ <rebeccaj@gmail.com> wrote:

> now. Before, I was planning to have CHECK constraints in all of my
> text or char fields, to keep out all semicolons, single quotes, and
> anything else that looked dangerous. Now I'm thinking that I'll be
> using htmlentities(), pg_escape_string() and pg_query_params() as

check, htmlentities, pg_escape_string and pg_query_params really
don't belong to the same family of "functions" and serve very
different purposes.

simplifying it very much:
- check are used to control the quality of data that get stored in
  the db
- htmlentities is about formatting for web output
- pg_escape_string is to prepare input for sql and avoiding sql
  injection
- pg_query_params is a relative of pg_escape_string but somehow used
  differently

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: text column constraint, newbie question

From
Alban Hertroys
Date:
On Mar 23, 2009, at 10:11 PM, RebeccaJ wrote:

> On Mar 22, 12:36 pm, scott.marl...@gmail.com (Scott Marlowe) wrote:
>> ayup. As long as they're legal for your encoding, they'll go right
>> in.
>> If you wanna stuff in anything no matter the encoding, use a database
>> initialized for SQL_ASCII encoding.
>
> Thanks, everyone, for your contribution to this thread. I'm
> approaching the database design of my web application differently,
> now. Before, I was planning to have CHECK constraints in all of my
> text or char fields, to keep out all semicolons, single quotes, and
> anything else that looked dangerous. Now I'm thinking that I'll be
> using htmlentities(), pg_escape_string() and pg_query_params() as
> safety filters, and otherwise allowing users to store whatever they
> want to, in the fields where I store/retrieve user input.


Yes indeed.

But don't use together:
* use pg_escape_string() or pg_query_params() to escape data that goes
INTO your database, and
* use htmlentities() on data that comes OUT of it, and only once it
gets printed to the page.

Otherwise you'll end up with data in your database that is hard to
search in (there is no collation on html entities in text fields
AFAIK, so if someone searches for "é" in your forms it doesn't match
"é" in your database) or data in your scripts that is hard to
compare (the value from a GET or POST request does not contain
entities while the value read and converted from the database does).

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,49c81dc9129743370417724!



Re: text column constraint, newbie question

From
Craig Ringer
Date:
RebeccaJ wrote:

> And I wonder why you like SQL_ASCII better than UTF8, and whether
> others have any opinions about those two. (My web server's LC_CTYPE is
> C, so I can use any character set.) Wouldn't UTF8 allow more
> characters than SQL_ASCII?

I've had a LOT of experience dealing with apps that use 8-bit byte
strings (like SQL_ASCII `text') to store data, and I've rarely seen one
that *doesn't* have text encoding handling bugs.

If you store your text as byte streams that don't know, check, or
enforce their own encoding you must keep track of the encoding
separately - either with another value stored alongside the string, or
through your app logic.

If you start storing data with multiple different text encodings in the
DB, you're most likely to land up tracking down annoying "corrupt text"
bugs sooner or later.

If, on the other hand, you use UTF-8, you *know* that everything in the
database is well-formed UTF-8. You ensure that it is UTF-8 before
storing it in the DB and know it'll be UTF-8 coming out. The DB takes
care of encoding conversion for you if you ask it to, by setting
client_encoding - the only downside being that it'll refuse to return
strings that can't be represented in your current client_encoding, like
say Cyrillic (Russian etc) text if you're using ISO-8859-1 (latin-1) for
your client encoding.

Even with a UTF-8 database you must still get your I/O to/from libraries
and the rest of the system right, converting UTF-8 text to whatever the
system expects or vice versa. Alternately, if you set client_encoding,
you must be prepared for cases where the DB can't send you what you ask
for because your encoding can't represent it.

All in all, I personally think a UTF-8 database is considerably better
for most uses. There are certainly cases where I'd use SQL_ASCII, but
not most.

--
Craig Ringer

Re: text column constraint, newbie question

From
"Daniel Verite"
Date:
    RebeccaJ wrote:

> Thanks, everyone, for your contribution to this thread. I'm
> approaching the database design of my web application differently,
> now. Before, I was planning to have CHECK constraints in all of my
> text or char fields, to keep out all semicolons, single quotes, and
> anything else that looked dangerous. Now I'm thinking that I'll be
> using htmlentities(), pg_escape_string() and pg_query_params() as
> safety filters, and otherwise allowing users to store whatever they
> want to, in the fields where I store/retrieve user input.

Note that htmlentities() expects LATIN1-encoded strings and is thus
unusable on UTF-8 contents.
So if you end up talking UTF-8 with the database, you'll probably need
to use htmlspecialchars() instead, and UTF-8 as your HTML charset.

Best regards,

--
 Daniel
 PostgreSQL-powered mail user agent and storage:
 http://www.manitou-mail.org

Re: text column constraint, newbie question

From
Stephen Cook
Date:
Daniel Verite wrote:
> Note that htmlentities() expects LATIN1-encoded strings and is thus
> unusable on UTF-8 contents.
> So if you end up talking UTF-8 with the database, you'll probably need
> to use htmlspecialchars() instead, and UTF-8 as your HTML charset.


I believe you are wrong, at least the PHP documentation says otherwise
and it _seems_ to work for me (http://us2.php.net/htmlentities). Maybe
you are thinking about an older version?

Also the iconv() function can help you convert between (some) different
character encodings (http://us2.php.net/htmlentities).

-- Stephen


Re: text column constraint, newbie question

From
"Daniel Verite"
Date:
    Stephen Cook wrote:

> Daniel Verite wrote:
> > Note that htmlentities() expects LATIN1-encoded strings and is thus

> > unusable on UTF-8 contents.
> > So if you end up talking UTF-8 with the database, you'll probably
need
> > to use htmlspecialchars() instead, and UTF-8 as your HTML charset.
>
>
> I believe you are wrong, at least the PHP documentation says
otherwise
> and it _seems_ to work for me (http://us2.php.net/htmlentities).
Maybe
> you are thinking about an older version?

You're right, I've missed the fact that they added support for other
character sets at some point in php4. Now I know :)

Best regards,

--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org