Thread: Dynamically filtering a CTE?
I have a slow ‘WITH RECURSIVE’ CTE like: CREATE VIEW ancestors AS WITH RECURSIVE _ancestors(descendant, ancestors) AS ( SELECT item.id AS id, ARRAY[item.ancestor_id] AS ancestors FROM items AS item UNION ALL SELECT child.id AS id child.ancestors || ancestor.ancestor_id AS ancestors FROM _ancestors AS child JOIN items as ancestor ON child.ancestors[array_length(child.ancestors, 1)] = ancestor.id ) SELECT * FROM _ancestors WHERE child.ancestors[array_length(child.ancestors, 1)] IS NULL; I'll usually only need a few rows, so I'm being bitten by PostgreSQL's CTE optimization fence [1]. I'm looking to optimize it by limiting the dynamically limiting the number of rows in the initial query, as if it had been: WITH RECURSIVE _ancestors(id, ancestors) AS ( SELECT item.id AS id, ARRAY[item.ancestor_id] AS ancestors FROM items AS item WHERE {your condition here} UNION ALL … ) My initial thought was to create a function which accepted a WITH clause as an argument. Something like: CREATE OR REPLACE FUNCTION ancestors(condition) RETURNS TABLE(id integer, ancestors integer[]) AS $$ WITH RECURSIVE _ancestors(id, ancestors) AS ( SELECT item.id AS id, ARRAY[item.ancestor_id] AS ancestors FROM items AS item WHERE condition UNION ALL … ) … $$ LANGUAGE SQL; or with ‘WHERE condition(item)’. But I couldn't find a way to define an argument that was a where condition [2] or a record→boolean function [3]. I could probably use PREPARE/EXECUTE [4] to dynamically construct the WHERE statement, but that looks like it may have its own optimization issues and there's no way to stash it for use in subsequent sessions. Perhaps a function to run the PREPARE? Is there an idiomatic way to approach this problem? Thanks, Trevor [1]: https://www.postgresql.org/message-id/201209191305.44674.db@kavod.com [2]: https://www.postgresql.org/docs/10/static/sql-select.html#SQL-WHERE [3]: https://www.postgresql.org/docs/10/static/sql-createfunction.html [4]: https://www.postgresql.org/docs/10/static/sql-prepare.html -- This email may be signed or encrypted with GnuPG (http://www.gnupg.org). For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
Attachment
On Thursday, April 19, 2018, W. Trevor King <wking@tremily.us> wrote:
Is there
an idiomatic way to approach this problem?
I would use pl/pgsql as the language and build a query using a combination of text literals and the format() function - invoking via pl/pgsql's EXECUTE command.
David J.
On Thu, Apr 19, 2018 at 05:28:00PM -0700, David G. Johnston wrote: > On Thursday, April 19, 2018, W. Trevor King wrote: > > Is there an idiomatic way to approach this problem? > > I would use pl/pgsql as the language and build a query using a > combination of text literals and the format() function - invoking > via pl/pgsql's EXECUTE command. That works. I've ended up with: CREATE OR REPLACE FUNCTION ancestors(condition text) RETURNS TABLE(id integer, ancestors integer[]) AS $$ BEGIN RETURN QUERY EXECUTE format(' WITH RECURSIVE _ancestors(id, ancestors) AS ( SELECT item.id AS id, ARRAY[item.ancestor_id] AS ancestors FROM items AS item %s UNION ALL SELECT descendant.id AS id, descendant.ancestors || ancestor.ancestor_id AS ancestors FROM _ancestors AS descendant JOIN items as ancestor ON descendant.ancestors[array_length(descendant.ancestors, 1)] = ancestor.id ) SELECT id, ancestors[1:array_length(ancestors, 1) - 1] AS ancestors -- drop the trailing NULL FROM _ancestors WHERE ancestors[array_length(ancestors, 1)] IS NULL -- remove non-terminal recursion ', condition); END $$ LANGUAGE plpgsql STABLE; which you can use like: SELECT * FROM ancestors('WHERE item.id = 62324721'); or (without filtering, for the full, slow CTE): SELECT * FROM ancestors(''); Thanks, Trevor -- This email may be signed or encrypted with GnuPG (http://www.gnupg.org). For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
Attachment
format('
WITH RECURSIVE _ancestors(id, ancestors) AS (
SELECT
item.id AS id,
ARRAY[item.ancestor_id] AS ancestors
FROM items AS item
%s[...]
', condition);
SELECT * FROM ancestors('WHERE item.id = 62324721');
Just keep in mind that this opens up a huge SQL-injection hole in your database. Depending on how its called you might want to validation the input text for both whitelist and blacklist items before executing it.
David J.
On Fri, Apr 20, 2018 at 09:33:22AM -0700, David G. Johnston wrote: > On Fri, Apr 20, 2018 at 9:22 AM, W. Trevor King wrote: > > format(' > > WITH RECURSIVE _ancestors(id, ancestors) AS ( > > SELECT > > item.id AS id, > > ARRAY[item.ancestor_id] AS ancestors > > FROM items AS item > > %s > > [...] > > > > ', condition); > > > > SELECT * FROM ancestors('WHERE item.id = 62324721'); > > Just keep in mind that this opens up a huge SQL-injection hole in > your database. Depending on how its called you might want to > validation the input text for both whitelist and blacklist items > before executing it. I'm not calling it on user-supplied conditions, but yeah, if I were it would certainly need some guards. Unfortunately, neither format [1] nor USING [2,3] seem to have auto-quoting for “make sure this is just a WHERE condition [4] without side-effects” ;). I think we'd need a WHERE-condition data type to support that, just like we'd need a WHERE-condition data type (or a function data type) to support my initial idea [5]: CREATE OR REPLACE FUNCTION ancestors(condition WHERE-condition-type) RETURNS TABLE(id integer, ancestors integer[]) AS $$ WITH RECURSIVE _ancestors(id, ancestors) AS ( SELECT item.id AS id, ARRAY[item.ancestor_id] AS ancestors FROM items AS item WHERE condition -- or, with a function type, condition(item) UNION ALL … ) … $$ LANGUAGE SQL; And even if you had a WHERE-condition data type, enforcing the no-side-effects constraint would be tricky. Things like blacklisting condition text with semicolons, etc. might help against unintentional typos, although they seem too easily avoided to be relied on against potentially malicious user input. Parsing the condition text as WHERE-clause SQL to look for dangerous constructs might be strong enough, but seems like a lot of work and something I'm likely to get wrong if I tried ;). For now, I'm just making sure I'm not allowing untrusted users to provide condition text. Thanks, Trevor [1]: https://www.postgresql.org/docs/10/static/functions-string.html#FUNCTIONS-STRING-FORMAT [2]: https://www.postgresql.org/docs/10/static/plpgsql-control-structures.html#id-1.8.8.8.3.4 [3]: https://www.postgresql.org/docs/10/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN [4]: https://www.postgresql.org/docs/10/static/sql-select.html#SQL-WHERE [5]: https://www.postgresql.org/message-id/20180420000055.GL27577%40valgrind.us -- This email may be signed or encrypted with GnuPG (http://www.gnupg.org). For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy