Thread: How to guard PostgreSQL from overcharged and/or malicious (read-only) queries?

How to guard PostgreSQL from overcharged and/or malicious (read-only) queries?

From

Stefan Keller

Date:

20 December 2010, 08:06:06

I'd like to guard postgres from overcharged and/or malicious queries.

The queries are strinctly read-only (from a SQL users perspective).
For doing this I'd like to code two functions (preferrably pl/pgsql):

1. Filter out all SQL commands which are *not* read-only (no
DROP/DELETE/UPDATE/TRUNCATE).
2. Get the estimated time (units) from PostgreSQL planner in a
reliable way (if possible standard/ANSI).

Can someone give me some hints?

Yours, S.

Re: How to guard PostgreSQL from overcharged and/or malicious (read-only) queries?

From

Alban Hertroys

Date:

20 December 2010, 10:44:10

On 20 Dec 2010, at 10:05, Stefan Keller wrote:

> I'd like to guard postgres from overcharged and/or malicious queries.
>
> The queries are strinctly read-only (from a SQL users perspective).
> For doing this I'd like to code two functions (preferrably pl/pgsql):
>
> 1. Filter out all SQL commands which are *not* read-only (no
> DROP/DELETE/UPDATE/TRUNCATE).

Most people do this using permissions.

> 2. Get the estimated time (units) from PostgreSQL planner in a
> reliable way (if possible standard/ANSI).

I don't think there's a way to do that directly, not without hacking the source.

What you can do is to have all users go through a SECURITY DEFINER type of function that does this for them. That
functioncan then read the output of EXPLAIN <query> for its estimates. Those aren't exactly times, but cost estimates.
Theactual time taken depends on your hardware, you would need to do some measurements to see how planned costs and
actualtime relate. 

I'm not sure this is a good idea though.
Firstly, testing the query plan adds a little overhead to every query coming in. It's not a lot, but if someone fires a
lotof small fast queries it could become a problem. You would be hurting the people who're using your database
correctly,instead of the people who're "abusing" it. 

Secondly, you could achieve a similar effect by limiting the amount of time a query is allowed to run. I'm pretty sure
thereare configuration options that cause long-running queries to get killed after a set time. 

Thirdly... Reliable estimates??? Lol!
Seriously, as hard as Postgres tries to keep statistics that make sense, I don't think they can ever be considered
entirelyreliable. You may not be vacuuming frequently enough, your statistics target may be too small or your data
mightnot be suitable for statistical analysis (random numbers and very unbalanced distributions are good examples). 
Therefore, if you plan to rely on the estimated time a query takes, you're going to be wrong sometimes.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4d0f4181802653553761881!

libpq ASYNC with PQgetResult and PQisBusy

From

Raimon Fernandez

Date:

20 December 2010, 12:19:59

Hello,


I'm trying to create a plugin using the libpq.

Almost everything is working, and now I want to implememt the asynchronous issue.

I send the SQL using the PQsendQuery, and my interface is not blocking, great.

Now, everytime I check fot the PQgetResult my interface gets blocked.

So, now I'm using the PQisBusy to check if postgre is still busy and I can safely call the PQgetResult wihtout
blocking,or just wait *some time* before sending a new PQisBusy. 

Before every PQisBusy i call PQconsumeInput to update the status.

So, in pseudo code:


1. PQsendQuery (a really slow select just to check the asyncronous)

2. From a timer every 0.2 seconds, I call:
2.1 PQconsumeInput
2.2 PQisBusy
2.3 evaluate => if it's busy => sleep and start again from 2 ; if it's not busy, continue
2.4 call PQgetResult

Using PQisBusy it's not working, it's taking really longer to just send the 0 (non-busy) and at this moment the
PQgetResultis null. 

If I force to call the PQgetResult after just one second of the PQsendQuery I can get the PQgetResult, without testing
thePQisBusy. 


here is my montxPG_isBusy

static long montxPG_isBusy()

{    int execStatus;
    int consumeeVar;

    consumeeVar = PQconsumeInput(gPGconn);

    if (consumeeVar == 0) return (long) PGRES_FATAL_ERROR;

        execStatus = PQisBusy(gPGconn);

    return (long) execStatus;

}


thanks,

regards,

r.

Re: libpq ASYNC with PQgetResult and PQisBusy

From

Tom Lane

Date:

20 December 2010, 16:48:23

Raimon Fernandez <coder@montx.com> writes:
> Almost everything is working, and now I want to implememt the asynchronous issue.

> I send the SQL using the PQsendQuery, and my interface is not blocking, great.

> Now, everytime I check fot the PQgetResult my interface gets blocked.

Well, yes.  PQgetResult says wait for a result and return it.

> So, now I'm using the PQisBusy to check if postgre is still busy and I can safely call the PQgetResult wihtout
blocking,or just wait *some time* before sending a new PQisBusy. 

Your proposed code is still a busy-wait loop.  What you should be doing
is waiting for some data to arrive on the socket.  Once you see
read-ready on the socket, call PQconsumeInput, then check PQisBusy to
see if the query is complete or not.  If not, go back to waiting on the
socket.  Typically you'd use select() or poll() to watch for both data
on libpq's socket and whatever other events your app is interested in.

> here is my montxPG_isBusy

> static long montxPG_isBusy()

> {    int execStatus;
>     int consumeeVar;

>     consumeeVar = PQconsumeInput(gPGconn);

>     if (consumeeVar == 0) return (long) PGRES_FATAL_ERROR;

>         execStatus = PQisBusy(gPGconn);

>     return (long) execStatus;

> }

This code seems a bit confused.  PQisBusy returns a bool (1/0), not a
value of ExecStatusType.

            regards, tom lane

Re: libpq ASYNC with PQgetResult and PQisBusy

From

Raimon Fernandez

Date:

20 December 2010, 19:49:55

On 20dic, 2010, at 18:48 , Tom Lane wrote:

>> So, now I'm using the PQisBusy to check if postgre is still busy and I can safely call the PQgetResult wihtout
blocking,or just wait *some time* before sending a new PQisBusy. 
>
> Your proposed code is still a busy-wait loop.

This is how are made all the examples I've found.

Last year I was implementing the FE/BE protocol v3 and there I was using what you proposed, a TCP/Socket with events,
noloops and no waits, just events. 

> What you should be doing is waiting for some data to arrive on the socket.

where I have to wait, in a function inside my plugin or from the framework that uses my plugin ?

> Once you see
> read-ready on the socket, call PQconsumeInput, then check PQisBusy to
> see if the query is complete or not.  If not, go back to waiting on the
> socket.  Typically you'd use select() or poll() to watch for both data
> on libpq's socket and whatever other events your app is interested in.

Here is what I've found:

extern int    PQsocket(const PGconn *conn);

There are some examples in the postgreSQL documentation:

/*
* Sleep untilsomething happens on the connection. We use select(2)
* to wait for input, but you could also use poll() or similar
* facilities.
*/

int    sock;
fd_set    input_mask;

sock = PQsocket(conn);

if (sock < 0) break; /* shouldn’t happen */

FD_ZERO(&input_mask);
FD_SET(sock, &input_mask);

if (select(sock + 1, &input_mask, NULL, NULL, NULL) < 0) {

    fprintf(stderr, "select() failed: %s\n", strerror(errno));
    exit_nicely(conn);
}

/* Now check for input */
PQconsumeInput(conn);

while ((notify = PQnotifies(conn)) != NULL) {

    fprintf(stderr, "ASYNC NOTIFY of ’%s’ received from backend pid %d\n", notify->relname, notify->be_pid);
    PQfreemem(notify);
}

The select(2) that says that are using for wait is this line ? if (select(sock + 1, &input_mask, NULL, NULL, NULL) < 0)
{

I can't see where is 'sleeping'

and the approach you are refering, is the only way to non-block the plugin calls and postgreSQL ?

>> here is my montxPG_isBusy
>
>> static long montxPG_isBusy()
>
>> {    int execStatus;
>>     int consumeeVar;
>
>>     consumeeVar = PQconsumeInput(gPGconn);
>
>>     if (consumeeVar == 0) return (long) PGRES_FATAL_ERROR;
>
>>         execStatus = PQisBusy(gPGconn);
>
>>     return (long) execStatus;
>
>> }
>
> This code seems a bit confused.  PQisBusy returns a bool (1/0), not a
> value of ExecStatusType.

yes, here the execStatus is the name of the int, and yes, I know, a bad name ...

thanks again,

regards,

r.

Re: libpq ASYNC with PQgetResult and PQisBusy

From

Alban Hertroys

Date:

20 December 2010, 22:56:31

On 20 Dec 2010, at 21:49, Raimon Fernandez wrote:

> The select(2) that says that are using for wait is this line ? if (select(sock + 1, &input_mask, NULL, NULL, NULL) <
0){ 
>
> I can't see where is 'sleeping'

See man 2 select.
If you're on an OS without manual pages (Windows is pretty much the only exception I know of), there are plenty of
onlineversions of man pages available. I'd suggest looking at the FreeBSD ones, as in my experience they tend to be
prettykeen on proper documentation. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4d0fed20802654673819298!

Re: libpq ASYNC with PQgetResult and PQisBusy

From

Raimon Fernandez

Date:

21 December 2010, 06:12:06

On 21dic, 2010, at 00:56 , Alban Hertroys wrote:

> On 20 Dec 2010, at 21:49, Raimon Fernandez wrote:
>
>> The select(2) that says that are using for wait is this line ? if (select(sock + 1, &input_mask, NULL, NULL, NULL) <
0){ 
>>
>> I can't see where is 'sleeping'
>
>
> See man 2 select.
> If you're on an OS without manual pages (Windows is pretty much the only exception I know of), there are plenty of
onlineversions of man pages available. I'd suggest looking at the FreeBSD ones, as in my experience they tend to be
prettykeen on proper documentation. 

thanks, I didn't know this was from the OS level.

I'm reading the documentation now ...

regards,

r.

Re: libpq ASYNC with PQgetResult and PQisBusy

From

Raimon Fernandez

Date:

21 December 2010, 08:06:49

hi again,

On 20dic, 2010, at 18:48 , Tom Lane wrote:

>> So, now I'm using the PQisBusy to check if postgre is still busy and I can safely call the PQgetResult wihtout
blocking,or just wait *some time* before sending a new PQisBusy. 
>
> Your proposed code is still a busy-wait loop.  What you should be doing
> is waiting for some data to arrive on the socket.  Once you see
> read-ready on the socket, call PQconsumeInput, then check PQisBusy to
> see if the query is complete or not.  If not, go back to waiting on the
> socket.  Typically you'd use select() or poll() to watch for both data
> on libpq's socket and whatever other events your app is interested in.

As I'm doing this as an excercise and how knows, I'm still playing with my previous approach, the same question but in
adifferent way: 

Why the PQisBusy is telling me that it's still busy if I send the PQgetResult and obtain them at the same moment ?

Now I'm not in a loop, just send the PQsendQuery from a button, and then, I just click on another button that simply
checksfor the PQconsumeInput and PQisBusy, and I'm clickin on it each second, and always it's returning busy, but if I
sendthe PQgetResult I get it. 

So, why libpq it's not updating it's status ?

thanks again,

r.

Re: How to guard PostgreSQL from overcharged and/or malicious (read-only) queries?

From

Stefan Keller

Date:

21 December 2010, 08:57:41

Hi Alban

Many thanks for your answers.

You answered:
>> 1. Filter out all SQL commands which are *not* read-only (no DROP
> Most people do this using permissions.

Oh, yes: forgot to mention that; that's obvious. What I also looked
for was the PL/pgSQL's "EXECUTE command-string".

>> 2. Get the estimated time (units) from PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).

Ok; again keep in mind that I have a read-only database. Therefore the
statistics should be up-to-date (after a vacuum analyse). What I have
in mind is exposing the database to the 'public' for exercising and
testing in a way similar to the following (try a query like "SELECT
ST_AsText(ST_GeometryFromText('POINT(8.5 47.7)', 4326));"):
http://openlayers-buch.de/beispiele/chapter-09/postgis-terminal.html

See below my attempt to write such a function I called
"secure_execute(text)". It's still not functioning and I have
indicated two problems there. What do you think?
I like the idea letting abandon the query if it's obviously(!) wrong
or if the planner alerts me about very high costs?
Or should I rather abandon the idea of such a function and simply rely
on read-only privileges and a session statement_timeout?

Yours, S.

--
-- Executes a query. Aborts if it contains ";" or does take too long.
-- Returns: TABLE
--
DROP FUNCTION secure_execute(text);
--
CREATE OR REPLACE FUNCTION secure_execute(text)
RETURNS SETOF real  -- << PROBLEM 1: Want to return the resultset of
the query here as table (SETOF RECORD?) .
LANGUAGE 'plpgsql' STRICT
AS $$
DECLARE
  query text := $1;
  cost_estimate_txt text;
  max_cost_estimate integer;
  rec RECORD;
BEGIN
  -- Abort if ";" is in query
  -- tbd.

  -- Get max_cost_estimate:
  EXECUTE 'EXPLAIN' || ' ' || query INTO cost_estimate_txt;
  SET statement_timeout TO DEFAULT;
  -- txt example: 'Function Scan on generate_series id
(cost=0.00..12.50 rows=1000 width=0)'
  max_cost_estimate := round(CAST(substring(cost_estimate_txt,50,6) AS
numeric),0);

  -- Execute query (abort if too time consuming)!
  IF max_cost_estimate < 100 THEN  -- in units (production config.:
depends on machine)
    SET statement_timeout TO 10; -- in ms (production config.: set to
a minute = 60000ms)
    EXECUTE $1 INTO rec;
    SET statement_timeout TO DEFAULT;
  END IF;
  RETURN; -- << PROBLEM 2: want to return "rec" here.

  -- Error handling: Catch all
  EXCEPTION WHEN OTHERS THEN
    SET statement_timeout TO DEFAULT;
    RAISE NOTICE 'ERROR';
    RETURN;
END;
$$
-- Test (positive):
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS id');
-- Test (not ok):
SELECT secure_execute('SELECT random() FROM generate_series(1, 100000)
AS id');  -- timeout
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS
id;'SELECT * FROM generate_series(1, 100)); -- two commands
SELECT secure_execute('DROP TABLE IF EXISTS dummy');  -- malicious!
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS
id;DROP TABLE IF EXISTS dummy'); -- two commands, one malicious


2010/12/20 Alban Hertroys <dalroi@solfertje.student.utwente.nl>:
> On 20 Dec 2010, at 10:05, Stefan Keller wrote:
>
>> I'd like to guard postgres from overcharged and/or malicious queries.
>>
>> The queries are strinctly read-only (from a SQL users perspective).
>> For doing this I'd like to code two functions (preferrably pl/pgsql):
>>
>> 1. Filter out all SQL commands which are *not* read-only (no
>> DROP/DELETE/UPDATE/TRUNCATE).
>
> Most people do this using permissions.
>
>> 2. Get the estimated time (units) from PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).
>
>
> I don't think there's a way to do that directly, not without hacking the source.
>
> What you can do is to have all users go through a SECURITY DEFINER type of function that does this for them. That
functioncan then read the output of EXPLAIN <query> for its estimates. Those aren't exactly times, but cost estimates.
Theactual time taken depends on your hardware, you would need to do some measurements to see how planned costs and
actualtime relate. 
>
> I'm not sure this is a good idea though.
> Firstly, testing the query plan adds a little overhead to every query coming in. It's not a lot, but if someone fires
alot of small fast queries it could become a problem. You would be hurting the people who're using your database
correctly,instead of the people who're "abusing" it. 
>
> Secondly, you could achieve a similar effect by limiting the amount of time a query is allowed to run. I'm pretty
surethere are configuration options that cause long-running queries to get killed after a set time. 
>
> Thirdly... Reliable estimates??? Lol!
> Seriously, as hard as Postgres tries to keep statistics that make sense, I don't think they can ever be considered
entirelyreliable. You may not be vacuuming frequently enough, your statistics target may be too small or your data
mightnot be suitable for statistical analysis (random numbers and very unbalanced distributions are good examples). 
> Therefore, if you plan to rely on the estimated time a query takes, you're going to be wrong sometimes.
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
>
>
> !DSPAM:1205,4d0f4177802651300117526!
>
>
>

Re: How to guard PostgreSQL from overcharged and/or malicious (read-only) queries?

From

Alban Hertroys

Date:

21 December 2010, 09:57:42

On 21 Dec 2010, at 10:57, Stefan Keller wrote:

> You answered:
>>> 1. Filter out all SQL commands which are *not* read-only (no DROP
>> Most people do this using permissions.
>
> Oh, yes: forgot to mention that; that's obvious. What I also looked
> for was the PL/pgSQL's "EXECUTE command-string".

I'm not sure what you're getting at here, but if you're saying that you have to catch SQL commands called from EXECUTE
separatelyI think you're wrong. I wouldn't expect Postgres to not apply permissions in such cases. A simple test-case
canprove that. 

>>> 2. Get the estimated time (units) from PostgreSQL planner in a
>>> reliable way (if possible standard/ANSI).
>
> Ok; again keep in mind that I have a read-only database. Therefore the
> statistics should be up-to-date (after a vacuum analyse).

Up-to-date? Probably. Correct or adequate? That depends, as I pointed out in my previous message.

> See below my attempt to write such a function I called
> "secure_execute(text)". It's still not functioning and I have
> indicated two problems there. What do you think?

Your problems seem to stem from a lack of experience with set-returning functions. You best look up the documentation
forthose, it explains it better than I could. Make sure you look at the docs for the versions of Postgres that you're
usingor expect to use, as there's a relatively new feature in this domain returning a set as a table. 

> I like the idea letting abandon the query if it's obviously(!) wrong
> or if the planner alerts me about very high costs?
> Or should I rather abandon the idea of such a function and simply rely
> on read-only privileges and a session statement_timeout?

Although I don't think what you're attempting is wrong in any way, I'd first see whether it's necessary to do so. You
canuse the built-in features (permissions & statement_timeout) and see whether that's adequate for your use-case. If
it'snot, then it's time to look into tightening things up. 

Of course, having some experience with the solution through experimentation can't hurt if you can afford to.

What you're doing would by many on this list be pointed out as "premature optimisation", although that usually involves
queryperformance ;) 

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.

!DSPAM:737,4d10881c802651631920626!