Re: How to guard PostgreSQL from overcharged and/or malicious (read-only) queries? - Mailing list pgsql-general

From Stefan Keller
Subject Re: How to guard PostgreSQL from overcharged and/or malicious (read-only) queries?
Date
Msg-id AANLkTi=UfofqQ_j6bnG=YFE+C=88q_ck0z-H=Ekk8kXX@mail.gmail.com
Whole thread Raw
In response to Re: How to guard PostgreSQL from overcharged and/or malicious (read-only) queries?  (Alban Hertroys <dalroi@solfertje.student.utwente.nl>)
Responses Re: How to guard PostgreSQL from overcharged and/or malicious (read-only) queries?  (Alban Hertroys <dalroi@solfertje.student.utwente.nl>)
List pgsql-general
Hi Alban

Many thanks for your answers.

You answered:
>> 1. Filter out all SQL commands which are *not* read-only (no DROP
> Most people do this using permissions.

Oh, yes: forgot to mention that; that's obvious. What I also looked
for was the PL/pgSQL's "EXECUTE command-string".

>> 2. Get the estimated time (units) from PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).

Ok; again keep in mind that I have a read-only database. Therefore the
statistics should be up-to-date (after a vacuum analyse). What I have
in mind is exposing the database to the 'public' for exercising and
testing in a way similar to the following (try a query like "SELECT
ST_AsText(ST_GeometryFromText('POINT(8.5 47.7)', 4326));"):
http://openlayers-buch.de/beispiele/chapter-09/postgis-terminal.html

See below my attempt to write such a function I called
"secure_execute(text)". It's still not functioning and I have
indicated two problems there. What do you think?
I like the idea letting abandon the query if it's obviously(!) wrong
or if the planner alerts me about very high costs?
Or should I rather abandon the idea of such a function and simply rely
on read-only privileges and a session statement_timeout?

Yours, S.

--
-- Executes a query. Aborts if it contains ";" or does take too long.
-- Returns: TABLE
--
DROP FUNCTION secure_execute(text);
--
CREATE OR REPLACE FUNCTION secure_execute(text)
RETURNS SETOF real  -- << PROBLEM 1: Want to return the resultset of
the query here as table (SETOF RECORD?) .
LANGUAGE 'plpgsql' STRICT
AS $$
DECLARE
  query text := $1;
  cost_estimate_txt text;
  max_cost_estimate integer;
  rec RECORD;
BEGIN
  -- Abort if ";" is in query
  -- tbd.

  -- Get max_cost_estimate:
  EXECUTE 'EXPLAIN' || ' ' || query INTO cost_estimate_txt;
  SET statement_timeout TO DEFAULT;
  -- txt example: 'Function Scan on generate_series id
(cost=0.00..12.50 rows=1000 width=0)'
  max_cost_estimate := round(CAST(substring(cost_estimate_txt,50,6) AS
numeric),0);

  -- Execute query (abort if too time consuming)!
  IF max_cost_estimate < 100 THEN  -- in units (production config.:
depends on machine)
    SET statement_timeout TO 10; -- in ms (production config.: set to
a minute = 60000ms)
    EXECUTE $1 INTO rec;
    SET statement_timeout TO DEFAULT;
  END IF;
  RETURN; -- << PROBLEM 2: want to return "rec" here.

  -- Error handling: Catch all
  EXCEPTION WHEN OTHERS THEN
    SET statement_timeout TO DEFAULT;
    RAISE NOTICE 'ERROR';
    RETURN;
END;
$$
-- Test (positive):
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS id');
-- Test (not ok):
SELECT secure_execute('SELECT random() FROM generate_series(1, 100000)
AS id');  -- timeout
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS
id;'SELECT * FROM generate_series(1, 100)); -- two commands
SELECT secure_execute('DROP TABLE IF EXISTS dummy');  -- malicious!
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS
id;DROP TABLE IF EXISTS dummy'); -- two commands, one malicious


2010/12/20 Alban Hertroys <dalroi@solfertje.student.utwente.nl>:
> On 20 Dec 2010, at 10:05, Stefan Keller wrote:
>
>> I'd like to guard postgres from overcharged and/or malicious queries.
>>
>> The queries are strinctly read-only (from a SQL users perspective).
>> For doing this I'd like to code two functions (preferrably pl/pgsql):
>>
>> 1. Filter out all SQL commands which are *not* read-only (no
>> DROP/DELETE/UPDATE/TRUNCATE).
>
> Most people do this using permissions.
>
>> 2. Get the estimated time (units) from PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).
>
>
> I don't think there's a way to do that directly, not without hacking the source.
>
> What you can do is to have all users go through a SECURITY DEFINER type of function that does this for them. That
functioncan then read the output of EXPLAIN <query> for its estimates. Those aren't exactly times, but cost estimates.
Theactual time taken depends on your hardware, you would need to do some measurements to see how planned costs and
actualtime relate. 
>
> I'm not sure this is a good idea though.
> Firstly, testing the query plan adds a little overhead to every query coming in. It's not a lot, but if someone fires
alot of small fast queries it could become a problem. You would be hurting the people who're using your database
correctly,instead of the people who're "abusing" it. 
>
> Secondly, you could achieve a similar effect by limiting the amount of time a query is allowed to run. I'm pretty
surethere are configuration options that cause long-running queries to get killed after a set time. 
>
> Thirdly... Reliable estimates??? Lol!
> Seriously, as hard as Postgres tries to keep statistics that make sense, I don't think they can ever be considered
entirelyreliable. You may not be vacuuming frequently enough, your statistics target may be too small or your data
mightnot be suitable for statistical analysis (random numbers and very unbalanced distributions are good examples). 
> Therefore, if you plan to rely on the estimated time a query takes, you're going to be wrong sometimes.
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
>
>
> !DSPAM:1205,4d0f4177802651300117526!
>
>
>

pgsql-general by date:

Previous
From: Dario Beraldi
Date:
Subject: Can the query planner create indexes?
Next
From: Gabriele Bartolini
Date:
Subject: Re: Can the query planner create indexes?