Thread: Automating access grants

Automating access grants

From
"Kynn Jones"
Date:
I need advice, and a lot of it, from those with more experience, since
I have practically none.  Please, let me know your opinion on the
following.

We have an in-house Postgres database that we would like to make
publicly accessible via a password-less login (user: anonymous).  (We
already have a web front-end for this database, but we have had a lot
of requests to allow programmatic access in a way that does not
require scraping web pages; FWIW, web scraping of this site is already
disallowed in our TOS.)

The problem is how to make this database publicly accessible in a way
that protects our server from abuse, and at the same time minimizes
the amount of time we have to spend policing it.

One possible solution (that I like) would be to require some sort of
host registration[1] from potential users, so that password-less
connection attempts from only these hosts would be allowed.  The
principal objective here would be to simplify the process of
identifying and neutralizing any abuse to the system.

My thought would be to set up a registration webpage page with
safeguards to prevent robots from registering, maybe perform other
checks (such as ensuring that the email address given is legit), and
spool the registration requests for processing.  (BTW, does anyone
know of free software to do this?)

Now, supposing we have a fresh batch of host registration requests
that have passed all the filters we may impose on them (i.e. they have
been "approved" somehow).  How best to automate the process of
granting access to these host?  I suppose that the script/program in
charge of this could, in principle, update the pg_hba.conf file, and
bounce the server with a suitable "kill -HUP", but I'm queasy about
such unsupervised bouncing of the server.  I could use some words of
wisdom on this topic.

More generally, are we even on the right track here?  Or is the whole
idea of making our database publicly accessible totally foolish?

Are there any examples that I may be able to learn from?

We are specifically trying to avoid, at this initial stage at least,
any solution that would require creating a proxy server for the sole
purpose of authenticating and/or validating requests (e.g. ensuring
that the request include a unique key, etc.)  Our hope is that we may
be able to craft a  solution using only PostgreSQL's standard security
facilities that would be adequate for at least the first several
months of operation, if not much longer.

Many thanks for your thoughts and opinions on this!

kj


[1]  I realize that it may inconvenient for the anonymous users to be
restricted to one or a few hosts, but the kinds of uses of our
database that we consider legitimate should be possible even with this
inconvenience.

Re: Automating access grants

From
Douglas McNaught
Date:
"Kynn Jones" <kynnjo@gmail.com> writes:

> We have an in-house Postgres database that we would like to make
> publicly accessible via a password-less login (user: anonymous).  (We
> already have a web front-end for this database, but we have had a lot
> of requests to allow programmatic access in a way that does not
> require scraping web pages; FWIW, web scraping of this site is already
> disallowed in our TOS.)

Honestly, I would consider writing a web (i.e. SOAP or XML-RPC)
service for this purpose rather than using allowing direct access.
That lets you control what kind of queries can be run.  It's more
work, but much cleaner and more secure.  There are too many ways even
a read-only user can perform a DOS attack.

-Doug

Re: Automating access grants

From
Stephen Frost
Date:
* Kynn Jones (kynnjo@gmail.com) wrote:
> Now, supposing we have a fresh batch of host registration requests
> that have passed all the filters we may impose on them (i.e. they have
> been "approved" somehow).  How best to automate the process of
> granting access to these host?  I suppose that the script/program in
> charge of this could, in principle, update the pg_hba.conf file, and
> bounce the server with a suitable "kill -HUP", but I'm queasy about
> such unsupervised bouncing of the server.  I could use some words of
> wisdom on this topic.

Call the init.d script with 'reload' or call pg_ctl directly with
'reload'.

> More generally, are we even on the right track here?  Or is the whole
> idea of making our database publicly accessible totally foolish?

In general I'd recommend against making the database publicly available
(as in, allowing psql/etc connections on port 5432).  It depends a great
deal on what you're doing too though.

> We are specifically trying to avoid, at this initial stage at least,
> any solution that would require creating a proxy server for the sole
> purpose of authenticating and/or validating requests (e.g. ensuring
> that the request include a unique key, etc.)  Our hope is that we may
> be able to craft a  solution using only PostgreSQL's standard security
> facilities that would be adequate for at least the first several
> months of operation, if not much longer.

One big question I have is, is this completely read-only?  Or is it
read-write?  Or what?  If it's something which is just read-only it
seems to me that it'd make more sense to write something in perl to pull
the data out of the database, put it into a portable format (ala csv or
similar) and provide that as a CGI.  You could then protect that using
alot of different ways- apache htaccess style, something in the perl
code which checks a table in the database, etc.  Things are more
complicated if it's actually read/write. :)  If by 'host' you mean 'IP
address', then you really should probably also get your firewall
involved so that connections aren't even allowed to the PG port unless
they're coming from an approved IP (note that, of course, that doesn't
solve all problems..  MITAs, viruses/hackers, naughty upstreams, etc).

> Many thanks for your thoughts and opinions on this!

Good luck..

    Stephen

Attachment

Re: Automating access grants

From
"Kynn Jones"
Date:
On 3/15/07, Stephen Frost <sfrost@snowman.net> wrote:
> * Kynn Jones (kynnjo@gmail.com) wrote:

> One big question I have is, is this completely read-only?

Sorry, I should have made this clear: the access we had in mind is
strictly read-only, and only a subset of the tables at that.

kj

Re: Automating access grants

From
David Fetter
Date:
On Thu, Mar 15, 2007 at 07:38:25AM -0400, Douglas McNaught wrote:
> "Kynn Jones" <kynnjo@gmail.com> writes:
>
> > We have an in-house Postgres database that we would like to make
> > publicly accessible via a password-less login (user: anonymous).
> > (We already have a web front-end for this database, but we have
> > had a lot of requests to allow programmatic access in a way that
> > does not require scraping web pages; FWIW, web scraping of this
> > site is already disallowed in our TOS.)
>
> Honestly, I would consider writing a web (i.e. SOAP or XML-RPC)
> service for this purpose rather than using allowing direct access.
> That lets you control what kind of queries can be run.  It's more
> work, but much cleaner and more secure.  There are too many ways
> even a read-only user can perform a DOS attack.

Simple example: you allow reads on table foo.  Attacker does:

SELECT * FROM foo f1, foo f2, foo f3, foo f4, foo f5, foo f6, foo f7 ,
foo f8, foo f9, foo f10, foo f11, foo f12, foo f13, foo f14, foo f15,
foo f16, foo f17, foo f18, foo f19, foo f20;

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

Re: Automating access grants

From
Stephen Frost
Date:
* Kynn Jones (kynnjo@gmail.com) wrote:
> On 3/15/07, Stephen Frost <sfrost@snowman.net> wrote:
> >* Kynn Jones (kynnjo@gmail.com) wrote:
>
> >One big question I have is, is this completely read-only?
>
> Sorry, I should have made this clear: the access we had in mind is
> strictly read-only, and only a subset of the tables at that.

Then I would definitely encourage setting up a webpage to provide the
information..  There's no need to grant access to the database directly,
and for that matter it'll probably be easier for your *users* to get the
data in a portable format directly rather than having to install
something which can talk the PG protocol.

    Enjoy,

        Stephen

Attachment

Re: Automating access grants

From
"Kynn Jones"
Date:
I realize that direct access gives an outside user the opportunity to overload the server.  In fact, I am far less worried about malicious DOS-type attacks than I am about plain old incompetence, such as having a buggy script hammer our server with an infinite loop.

BTW, is there a way to configure a PostgreSQL server to abort a query if it takes longer than a certain amount of time, and/or to limit the number of queries allowed per host per unit time (say, per hour)?

That's why registration of a host is mandatory for this access.  Any registered host that violates the TOS gets summarily removed from the allowed hosts list.  (They get a second chance if they convince us that it won't happen again.  No third chance.)

I should point out that the information that we will be serving is readily available from other sources; our service just provides it in a more convenient form.  The data in question is of academic interest only; it has little or no economic value.

At any rate, if we were to do this, we would announce it as an "experimental feature".  If server-overload (whether from malicious attacks, or from inept usage) becomes an intractable problem, we will just retire the service.

That said, for this experimental feature to work at all, it is necessary to have a solid way to automate the granting of access to those servers that request it and meet our conditions.

kj


On 3/15/07, Stephen Frost <sfrost@snowman.net> wrote:
* Kynn Jones (kynnjo@gmail.com) wrote:
> On 3/15/07, Stephen Frost <sfrost@snowman.net> wrote:
> >* Kynn Jones ( kynnjo@gmail.com) wrote:
>
> >One big question I have is, is this completely read-only?
>
> Sorry, I should have made this clear: the access we had in mind is
> strictly read-only, and only a subset of the tables at that.

Then I would definitely encourage setting up a webpage to provide the
information..  There's no need to grant access to the database directly,
and for that matter it'll probably be easier for your *users* to get the
data in a portable format directly rather than having to install
something which can talk the PG protocol.

        Enjoy,

                Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFF+TtHrzgMPqB3kigRAkRNAJ9JeWKQ6y2yjqpRxuHMOxRAtZgMwgCglkO7
KllW1Aa2hyYuIFG7tSspSZY=
=xqHu
-----END PGP SIGNATURE-----


Re: Automating access grants

From
Douglas McNaught
Date:
I would still recommend making it available as a web service rather
than giving direct access; besides the security/load issues, it lets
you change the representation of the data without necessarily
affecting customers (of course, you can do this also at the DB level
with views).  The web service is a little extra work, but it lets you
re-use the data access layer that already exists in your web
application, so it's just a matter of putting a SOAP layer on top of
that.

My general philosophy is to only allow trusted apps/users to hit the
database directly.

-Doug