Thread: Correct escaping of untrusted data

Correct escaping of untrusted data

From
Geoff Caplan
Date:
Hi folks,

The thread on injection attacks was very instructive, but seemed to
run out of steam at an interesting point. Now that you've kindly
educated me about the real nature of the issues, can I ask again
what effective escaping really means?

Are the standard escaping functions found in the PHP, Tcl etc APIs to
Postgres bombproof? Are there any encodings that might slip through
and be cast to malicious strings inside Postgres? What about functions
like convert(): could they be used to slip something through the
escaping function?

I don't really have enough knowledge in this area to be confident in
the results of my own experiments. Any advice from the more
technically savvy would be much appreciated.

------------------
Geoff Caplan
Vario Software Ltd
(+44) 121-515 1154


Re: Correct escaping of untrusted data

From
Tom Lane
Date:
Geoff Caplan <geoff@variosoft.com> writes:
> Are the standard escaping functions found in the PHP, Tcl etc APIs to
> Postgres bombproof?

I dunno; you'd probably want to look at the source for each one you
planned to use, anyway, if you're being paranoid.  As long as they
escape ' and \ they should be okay.  If your source language allows
embedded nulls (\0) in strings you might want to reject those as well.
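[A minimal sketch, in Python, of the rules just described: reject embedded
NULs, then escape backslash and single-quote. The function name is
hypothetical; in real code, prefer the escaping function or parameter
support your driver already provides.]

```python
def quote_literal(value: str) -> str:
    """Naively quote a string as an old-style PostgreSQL literal.

    Sketch only: reject embedded NUL bytes, then double backslashes
    and single-quotes before wrapping in quotes.
    """
    if "\x00" in value:
        raise ValueError("embedded NUL byte not allowed")
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"

print(quote_literal("O'Reilly"))   # 'O''Reilly'
print(quote_literal("a\\b"))       # 'a\\b'
```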

> Are there any encodings that might slip through
> and be cast to malicious strings inside Postgres?

All the supported encodings are supersets of ASCII, so I don't think
there is any such risk.  There is a risk in the opposite direction
I think: if the escaping function doesn't know the encoding being used
it might think that one byte of a multibyte character is ' or \ and
try to escape it, thereby breaking the data.  This could not happen in
"sane" encodings like UTF-8, however, just in the one or two Far Eastern
encodings that allow multibyte characters to contain bytes <= 0x7F.

Since you as the application programmer can control what client-side
encoding is used, the simplest answer here is just to be sure you're
using a sane encoding, or at least that the escaping function knows
the encoding you're using.
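[The Far Eastern encodings referred to above include Shift_JIS, where the
second byte of a multibyte character can be 0x5C, the ASCII code for
backslash. A quick Python illustration of why a byte-oriented escaper
that doesn't know the encoding gets this wrong:]

```python
# U+8868 ('表') encodes in Shift_JIS as 0x95 0x5C: its second byte
# is an ASCII backslash, so an encoding-unaware escaper would try
# to escape the middle of the character and corrupt the data.
sjis = "表".encode("shift_jis")
print(sjis)                           # b'\x95\\'
print(b"\\" in sjis)                  # True

# In a "sane" encoding like UTF-8, multibyte sequences never
# contain bytes below 0x80, so this cannot happen.
utf8 = "表".encode("utf-8")
print(any(b < 0x80 for b in utf8))    # False
```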

> What about functions like convert(): could they be used to slip
> something through the escaping function?

Don't see how.  The issue is to be sure that the query string traveling
to the backend will be interpreted the way you expected.  By the time
any server-side function executes it is far too late to change that
interpretation.

            regards, tom lane

Re: Correct escaping of untrusted data

From
Olivier Guilyardi
Date:
Geoff Caplan wrote:

> Are the standard escaping functions found in the PHP, Tcl etc APIs to
> Postgres bombproof? Are there any encodings that might slip through
> and be cast to malicious strings inside Postgres? What about functions
> like convert(): could they be used to slip something through the
> escaping function?

What about writing Nessus plugin(s) or a specific scanner for these
escaping issues? I don't know if such a thing already exists...

--
     Olivier

Re: Correct escaping of untrusted data

From
Geoff Caplan
Date:
Tom,

Belated thanks for the info (I've been away from my desk).

Very helpful.

------------------
Geoff Caplan
Vario Software Ltd
(+44) 121-515 1154


Re: Correct escaping of untrusted data

From
Lincoln Yeoh
Date:
At 11:09 AM 7/31/2004 -0400, Tom Lane wrote:

>All the supported encodings are supersets of ASCII, so I don't think
>there is any such risk.

Is the 7.4.x multibyte support bombproof? How would we avoid problems like
this:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&safe=off&edition=us&selm=20020502171830J.t-ishii%40sra.co.jp

Summary of that problem: an invalid multibyte character "eats" the
following character.

I know it's fixed already, but is there a way to reduce exposure to such bugs?
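[One general way to reduce exposure to this class of bug, sketched here in
Python (a defensive suggestion, not something proposed by the thread's
participants): validate that untrusted input is well-formed in the client
encoding before it ever reaches the escaping layer, so a malformed
multibyte sequence is rejected rather than passed along.]

```python
def validated_utf8(raw: bytes) -> str:
    """Reject byte strings that are not valid UTF-8 instead of
    letting a malformed sequence reach the escaping code."""
    try:
        return raw.decode("utf-8")   # strict decoding by default
    except UnicodeDecodeError as err:
        raise ValueError(f"rejecting malformed input: {err}") from None

print(validated_utf8("héllo".encode("utf-8")))   # héllo
try:
    validated_utf8(b"\xe8\xa1")   # truncated 3-byte sequence
except ValueError:
    print("rejected")             # rejected
```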

>There is a risk in the opposite direction I think: if the escaping
>function doesn't know the encoding being used
>it might think that one byte of a multibyte character is ' or \ and try to
>escape it, thereby breaking the data.

Is the escaping function always consistent with the backend's
interpretation? Is it impossible for them to be inconsistent (e.g. do
they use the same code to interpret the data)?

My concern is that if the escaping function thinks one byte of a
multibyte character is \ but the rest of the backend doesn't, then you
can end up with an escaped backslash which does not escape a naughty '...

Also: what is the proper/official way to deal with:

update tablea set data=3-? where a=1;

when the parameter is -1?

Should we somehow ensure it's always written like this?
update tablea set data=3 - ? where a=1;

This doesn't seem to be escaped safely by DBD::Pg 1.22 (3 versions old)
with PostgreSQL 7.3.4.

Would it be best to compute the 3-? part in the application and then do
update tablea set data=? where a=1;

That might also result in less CPU usage on the backend.

Regards,

Link.


Re: Correct escaping of untrusted data

From
Pierre-Frédéric Caillaud
Date:

> Is the 7.4.x multibyte support bombproof? How would we avoid problems
> like this:
>
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&safe=off&edition=us&selm=20020502171830J.t-ishii%40sra.co.jp

    Well, maybe using UTF-8 encoding would fix this?

> update tablea set data=3-? where a=1;

    Add parentheses:

> update tablea set data=3-(?) where a=1;

    Or do it in your program... but you can't do that if you have a db field
or function instead of the 3.
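[A quick sketch in Python of why the bare 3-? form is dangerous with a
driver that splices literals into the SQL text (as old DBD::Pg versions
did; `naive_bind` below is a hypothetical stand-in, not DBD::Pg code):
with -1 substituted, `3-?` becomes `3--1`, and `--` starts an SQL
comment, silently discarding the rest of the statement.]

```python
def naive_bind(sql_template: str, value) -> str:
    # Mimics a driver that splices the literal into the SQL text.
    return sql_template.replace("?", str(value))

unsafe = naive_bind("update tablea set data=3-? where a=1;", -1)
print(unsafe)   # update tablea set data=3--1 where a=1;
                # "--" begins an SQL comment, so this executes as
                # "update tablea set data=3" -- against EVERY row!

safe = naive_bind("update tablea set data=3-(?) where a=1;", -1)
print(safe)     # update tablea set data=3-(-1) where a=1;
```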



Performance critical technical key

From
Geoff Caplan
Date:
Hi folks

I'm designing a table to be used for web session management. If all
goes well with the project, the table should have 100,000+ records and
be getting hammered with SELECTS, INSERTS and UPDATES.

The table will need a technical key. The question is, what is the most
efficient way to do this?

a) Generate a random 24 character string in the application. Very
quick for the INSERTs, but will the longer key slow down the
SELECTs and UPDATEs?

b) Use a sequence. Faster for the SELECTS and UPDATES, I guess, but
how much will the sequence slow down the INSERTS on a medium sized
record-set?

There will probably be 6-8 SELECTs & UPDATEs for each INSERT.

I appreciate that I could set up some tests, but I am under the hammer
time-wise. Some rule-of-thumb advice from the list would be most
welcome.

------------------
Geoff Caplan
Vario Software Ltd
(+44) 121-515 1154


Re: Performance critical technical key

From
Christopher Browne
Date:
After a long battle with technology, geoff@variosoft.com (Geoff Caplan), an earthling, wrote:
> b) Use a sequence. Faster for the SELECTS and UPDATES, I guess, but
> how much will the sequence slow down the INSERTS on a medium sized
> record-set?

Why, in particular, would you expect the sequence to slow down
inserts?  They don't lock the table.

Note that if you're really doing a lot of INSERTs in parallel, you
might find it worthwhile to configure the sequence to cache some
number of entries so that they are pre-allocated and stored in memory
for each session (e.g. - for each connection) for quicker access.  See
the documentation for "create sequence" for more details...
--
output = reverse("gro.gultn" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/x.html
Think of C++ as an object-oriented assembly language.

Re: Performance critical technical key

From
Bruno Wolff III
Date:
On Thu, Aug 12, 2004 at 13:05:45 +0100,
  Geoff Caplan <geoff@variosoft.com> wrote:
>
> b) Use a sequence. Faster for the SELECTS and UPDATES, I guess, but
> how much will the sequence slow down the INSERTS on a medium sized
> record-set?

Using a sequence shouldn't be slow. The main potential problem is that
it will make the session IDs guessable if you don't take any other
steps. That may or may not be a problem. One way around this is to
encrypt the sequence number with a key, and use as the session id a
combination of the encrypted string and an index saying which key was
used (this makes it possible to change keys for new sessions while
allowing continued use of an old key for old sessions). You can change
the keys as often as needed and practical for your application.
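[A sketch of the key-rotation idea above, in Python. One liberty taken:
since the standard library has no block cipher, this variant appends a
keyed HMAC tag instead of encrypting the number, which makes the id
unforgeable but, unlike true encryption, leaves the sequence number
itself visible. All names and the key ring are hypothetical.]

```python
import hashlib
import hmac
import secrets

# Hypothetical key ring: index -> secret key. Rotate by adding a new
# entry; old sessions stay verifiable under their old key.
KEYS = {0: b"retired-secret", 1: secrets.token_bytes(32)}
CURRENT_KEY = 1

def make_session_id(seq: int) -> str:
    """Combine a sequence number, a key index, and a keyed tag."""
    tag = hmac.new(KEYS[CURRENT_KEY], str(seq).encode(),
                   hashlib.sha256).hexdigest()
    return f"{seq}.{CURRENT_KEY}.{tag}"

def check_session_id(sid: str) -> int:
    """Return the sequence number, or raise if the tag is forged."""
    seq, key_idx, tag = sid.split(".")
    expect = hmac.new(KEYS[int(key_idx)], seq.encode(),
                      hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("forged or corrupted session id")
    return int(seq)

sid = make_session_id(42)
print(check_session_id(sid))   # 42
```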

Re: Performance critical technical key

From
Geoff Caplan
Date:
Bruno Wolff III wrote:

>>> Using a sequence shouldn't be slow.

Thanks - that's the main thing I need to know.

>>> The main potential problem is that it will make the session IDs
>>> guessible if you don't take any other steps. That may or may not
>>> be a problem.

Thanks for the warning, but I won't be using the sequence number as
the session id: as you say, not a safe thing to do. The session record
key persists from session to session: it is used to link sessions with
browsers and with user accounts. The session key will be a random 32
character key generated for each session.
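[In modern Python, a random 32-character key of the kind described can be
generated from the cryptographic random source in one line (an
illustration, not Geoff's actual code):]

```python
import secrets

# 16 random bytes, hex-encoded, give a 32-character session key.
session_key = secrets.token_hex(16)
print(session_key)
print(len(session_key))   # 32
```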

Christopher Browne wrote:

>>> Why, in particular, would you expect the sequence to slow down
>>> inserts?  They don't lock the table.

I was assuming that generating the sequence number was expensive: it
is in some other DBs I have used. That was why I was thinking of
providing a unique id via a random string. But a practical test shows
that in PG it is pretty fast, so there is no need.

>>> Note that if you're really doing a lot of INSERTs in parallel, you
>>> might find it worthwhile to configure the sequence to cache some
>>> number of entries so that they are pre-allocated and stored in memory
>>> for each session (e.g. - for each connection) for quicker access.  See
>>> the documentation for "create sequence" for more details...

I think that would be worthwhile.

Thanks for the input, folks.

------------------
Geoff Caplan
Vario Software Ltd
(+44) 121-515 1154


Re: Performance critical technical key

From
Pierre-Frédéric Caillaud
Date:
    You could use apache mod_auth_tkt:
http://www.openfusion.com.au/labs/mod_auth_tkt/

    Its main advantage is that it authenticates the user, so your script
gets the user ID, which you can use as a key in your session table, for
instance.


    Cut & paste for the lazy:

mod_auth_tkt is a lightweight cookie-based authentication module for
Apache 1.3.x, written in C. It implements a single-signon framework that
works across multiple apache instances and multiple machines. The actual
authentication is done by a user-supplied CGI or script in whatever
language you like (examples are provided in Perl), meaning you can
authenticate against any kind of user repository you can access (password
files, ldap, databases, etc.)

mod_auth_tkt supports inactivity timeouts (including the ability to
control how aggressively the ticket is refreshed), the ability to include
arbitrary user data within the cookie, configurable cookie names and
domains, and token-based access to subsections of a site.

mod_auth_tkt works by checking incoming Apache requests for a (user-
defined) cookie containing a valid authentication ticket. The ticket is
checked by generating an MD5 checksum for the username and any (optional)
user data from the ticket together with the requesting IP address and a
shared secret available to the server. If the generated MD5 checksum
matches the ticket's checksum, the ticket is valid and the request is
authorised. Requests without a valid ticket are redirected to a
configurable URL which is expected to validate the user and generate a
ticket for them. This package includes both a sample C executable and a
Perl module for generating the cookies; implementations for other
environments should be relatively straightforward.
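[The checking scheme described above can be sketched in a few lines of
Python. This shows the general idea only: mod_auth_tkt's real ticket
format packs these fields (plus a timestamp) in a specific binary layout,
so this sketch is not wire-compatible with it, and the secret is a
made-up placeholder.]

```python
import hashlib

SECRET = b"shared-server-secret"   # placeholder, not a real secret

def ticket_digest(username: str, user_data: str, ip: str) -> str:
    """MD5 checksum over the fields the description lists: the
    requesting IP, the username, optional user data, and the
    server's shared secret."""
    payload = ip.encode() + username.encode() + user_data.encode() + SECRET
    return hashlib.md5(payload).hexdigest()

def ticket_valid(ticket: str, username: str, user_data: str, ip: str) -> bool:
    # A request is authorised only if the presented checksum matches
    # the one regenerated on the server.
    return ticket == ticket_digest(username, user_data, ip)

t = ticket_digest("alice", "role=user", "192.0.2.10")
print(ticket_valid(t, "alice", "role=user", "192.0.2.10"))    # True
print(ticket_valid(t, "alice", "role=user", "198.51.100.7"))  # False
```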



> Hi folks
>
> I'm designing a table to be used for web session management. If all
> goes well with the project, the table should have 100,000+ records and
> be getting hammered with SELECTS, INSERTS and UPDATES.
>
> The table will need a technical key. The question is, what is the most
> efficient way to do this?
>
> a) Generate a random 24 character string in the application. Very
> quick for the INSERTs, but will the longer key slow down the the
> SELECTs and UPDATES?
>
> b) Use a sequence. Faster for the SELECTS and UPDATES, I guess, but
> how much will the sequence slow down the INSERTS on a medium sized
> record-set?
>
> There will probably be 6-8 SELECTs & UPDATEs for each INSERT.
>
> I appreciate that I could set up some tests, but I am under the hammer
> time-wise. Some rule-of-thumb advice from the list would be most
> welcome.
>
> ------------------
> Geoff Caplan
> Vario Software Ltd
> (+44) 121-515 1154
>
>



Re: Performance critical technical key

From
Geoff Caplan
Date:
Pierre-Frédéric,

PFC>         You could use apache mod_auth_tkt :
PFC> http://www.openfusion.com.au/labs/mod_auth_tkt/

I think their own description of "lightweight" is a fair summary of
mod_auth_tkt.

My own approach needs to be more security conscious. Secure web
sessions are an area that deserves more attention. The only good source
I know is:

http://cookies.lcs.mit.edu/pubs/webauth.html

The ease with which the MIT team were able to compromise so many
leading corporate sites is sobering.

My own approach is mainly a blend of the MIT ideas, the Yahoo ideas
reported in the latest version of the MIT paper, and the OpenACS
approach:

http://openacs.org/doc/openacs-5-1/security-design.html

But this is a bit OT here. If you want to carry on with this, perhaps
you could contact me off list?

------------------
Geoff Caplan
Vario Software Ltd
(+44) 121-515 1154