Thread: Dynamically filtering a CTE?

Dynamically filtering a CTE?

From
"W. Trevor King"
Date:
I have a slow ‘WITH RECURSIVE’ CTE like:

  CREATE VIEW ancestors AS
  WITH RECURSIVE _ancestors(descendant, ancestors) AS (
      SELECT
        item.id AS id,
        ARRAY[item.ancestor_id] AS ancestors
      FROM items AS item
    UNION ALL
      SELECT
        child.id AS id
        child.ancestors || ancestor.ancestor_id AS ancestors
      FROM _ancestors AS child
      JOIN items as ancestor
        ON child.ancestors[array_length(child.ancestors, 1)] = ancestor.id
  )
  SELECT *
  FROM _ancestors
  WHERE child.ancestors[array_length(child.ancestors, 1)] IS NULL;

I'll usually only need a few rows, so I'm being bitten by PostgreSQL's
CTE optimization fence [1].  I'm looking to optimize it by limiting
the dynamically limiting the number of rows in the initial query, as
if it had been:

  WITH RECURSIVE _ancestors(id, ancestors) AS (
      SELECT
        item.id AS id,
        ARRAY[item.ancestor_id] AS ancestors
      FROM items AS item
      WHERE {your condition here}
    UNION ALL
      …
  )

My initial thought was to create a function which accepted a WITH
clause as an argument.  Something like:

  CREATE OR REPLACE FUNCTION ancestors(condition)
    RETURNS TABLE(id integer, ancestors integer[]) AS
  $$
    WITH RECURSIVE _ancestors(id, ancestors) AS (
        SELECT
          item.id AS id,
          ARRAY[item.ancestor_id] AS ancestors
        FROM items AS item
        WHERE condition
      UNION ALL
        …
    )
    …
  $$ LANGUAGE SQL;

or with ‘WHERE condition(item)’.  But I couldn't find a way to define
an argument that was a where condition [2] or a record→boolean
function [3].  I could probably use PREPARE/EXECUTE [4] to dynamically
construct the WHERE statement, but that looks like it may have its own
optimization issues and there's no way to stash it for use in
subsequent sessions.  Perhaps a function to run the PREPARE?  Is there
an idiomatic way to approach this problem?

Thanks,
Trevor

[1]: https://www.postgresql.org/message-id/201209191305.44674.db@kavod.com
[2]: https://www.postgresql.org/docs/10/static/sql-select.html#SQL-WHERE
[3]: https://www.postgresql.org/docs/10/static/sql-createfunction.html
[4]: https://www.postgresql.org/docs/10/static/sql-prepare.html

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

Attachment

Re: Dynamically filtering a CTE?

From
"David G. Johnston"
Date:
On Thursday, April 19, 2018, W. Trevor King <wking@tremily.us> wrote:
  Is there
an idiomatic way to approach this problem?


I would use pl/pgsql as the language and build a query using a combination of text literals and the format() function - invoking via pl/pgsql's EXECUTE command.

David J.

Re: Dynamically filtering a CTE?

From
"W. Trevor King"
Date:
On Thu, Apr 19, 2018 at 05:28:00PM -0700, David G. Johnston wrote:
> On Thursday, April 19, 2018, W. Trevor King wrote:
> > Is there an idiomatic way to approach this problem?
>
> I would use pl/pgsql as the language and build a query using a
> combination of text literals and the format() function - invoking
> via pl/pgsql's EXECUTE command.

That works.  I've ended up with:

  CREATE OR REPLACE FUNCTION ancestors(condition text)
    RETURNS TABLE(id integer, ancestors integer[]) AS
  $$
  BEGIN
  RETURN QUERY EXECUTE format('
    WITH RECURSIVE _ancestors(id, ancestors) AS (
        SELECT
          item.id AS id,
          ARRAY[item.ancestor_id] AS ancestors
        FROM items AS item
        %s
      UNION ALL
        SELECT
          descendant.id AS id,
          descendant.ancestors || ancestor.ancestor_id AS ancestors
        FROM _ancestors AS descendant
        JOIN items as ancestor
          ON descendant.ancestors[array_length(descendant.ancestors, 1)] = ancestor.id
    )
    SELECT
      id,
      ancestors[1:array_length(ancestors, 1) - 1] AS ancestors -- drop the trailing NULL
    FROM _ancestors
    WHERE ancestors[array_length(ancestors, 1)] IS NULL -- remove non-terminal recursion
    ', condition);
  END
  $$ LANGUAGE plpgsql STABLE;

which you can use like:

  SELECT * FROM ancestors('WHERE item.id = 62324721');

or (without filtering, for the full, slow CTE):

  SELECT * FROM ancestors('');

Thanks,
Trevor

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

Attachment

Re: Dynamically filtering a CTE?

From
"David G. Johnston"
Date:
On Fri, Apr 20, 2018 at 9:22 AM, W. Trevor King <wking@tremily.us> wrote:
format('
    WITH RECURSIVE _ancestors(id, ancestors) AS (
        SELECT
          item.id AS id,
          ARRAY[item.ancestor_id] AS ancestors
        FROM items AS item
        %s
​[...]​

    ', condition);

  SELECT * FROM ancestors('WHERE item.id = 62324721');

​Just keep in mind that this opens up a huge SQL-injection hole in your database.  Depending on how its called you might want to validation the input text for both whitelist and blacklist items before executing it.

David J.

Re: Dynamically filtering a CTE?

From
"W. Trevor King"
Date:
On Fri, Apr 20, 2018 at 09:33:22AM -0700, David G. Johnston wrote:
> On Fri, Apr 20, 2018 at 9:22 AM, W. Trevor King wrote:
> > format('
> >     WITH RECURSIVE _ancestors(id, ancestors) AS (
> >         SELECT
> >           item.id AS id,
> >           ARRAY[item.ancestor_id] AS ancestors
> >         FROM items AS item
> >         %s
> > ​[...]​
> >
> >     ', condition);
> >
> >   SELECT * FROM ancestors('WHERE item.id = 62324721');
>
> ​Just keep in mind that this opens up a huge SQL-injection hole in
> your database.  Depending on how its called you might want to
> validation the input text for both whitelist and blacklist items
> before executing it.

I'm not calling it on user-supplied conditions, but yeah, if I were it
would certainly need some guards.  Unfortunately, neither format [1]
nor USING [2,3] seem to have auto-quoting for “make sure this is just
a WHERE condition [4] without side-effects” ;).  I think we'd need a
WHERE-condition data type to support that, just like we'd need a
WHERE-condition data type (or a function data type) to support my
initial idea [5]:

  CREATE OR REPLACE FUNCTION ancestors(condition WHERE-condition-type)
      RETURNS TABLE(id integer, ancestors integer[]) AS
    $$
      WITH RECURSIVE _ancestors(id, ancestors) AS (
          SELECT
            item.id AS id,
            ARRAY[item.ancestor_id] AS ancestors
          FROM items AS item
          WHERE condition -- or, with a function type, condition(item)
        UNION ALL
          …
      )
      …
    $$ LANGUAGE SQL;

And even if you had a WHERE-condition data type, enforcing the
no-side-effects constraint would be tricky.

Things like blacklisting condition text with semicolons, etc. might
help against unintentional typos, although they seem too easily
avoided to be relied on against potentially malicious user input.
Parsing the condition text as WHERE-clause SQL to look for dangerous
constructs might be strong enough, but seems like a lot of work and
something I'm likely to get wrong if I tried ;).  For now, I'm just
making sure I'm not allowing untrusted users to provide condition
text.

Thanks,
Trevor

[1]: https://www.postgresql.org/docs/10/static/functions-string.html#FUNCTIONS-STRING-FORMAT
[2]: https://www.postgresql.org/docs/10/static/plpgsql-control-structures.html#id-1.8.8.8.3.4
[3]: https://www.postgresql.org/docs/10/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
[4]: https://www.postgresql.org/docs/10/static/sql-select.html#SQL-WHERE
[5]: https://www.postgresql.org/message-id/20180420000055.GL27577%40valgrind.us

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

Attachment