Thread: A modest proposal vis hierarchical queries: MINUS in the column list

A modest proposal vis hierarchical queries: MINUS in the column list

From

Mark Zellers

Date:

07 June 2021, 18:25:15

One of the friction points I have found in migrating from Oracle to PostgreSQL is in the conversion of hierarchical queries from the Oracle START WITH/CONNECT BY/ORDER SIBLINGS by pattern to using the ANSI recursive subquery form.

Once you wrap your head around it, the ANSI form is not so bad with one major exception. In order to achieve the equivalent of Oracle’s ORDER SIBLINGS BY clause, you need to add an additional column containing an array with the accumulated path back to the root of the hierarchy for each row. The problem with that is that it leaves you with an unfortunate choice: either accept the inefficiency of returning the array with the path back to the client (which the client doesn’t need or want), or requiring the application to explicitly list all of the columns that it wants just to exclude the hierarchy column, which can be hard to maintain, especially if your application needs to support both databases. If you have a ORM model where there could be multiple queries that share the same client code to read the result set, you might have to change multiple queries when new columns are added to a table or view even though you have centralized the processing of the result set.

The ideal solution for this would be for PostgreSQL to support the Oracle syntax and internally convert it to the ANSI form. Failing that, I have a modest suggestion that I would like to start a discussion around. What if you could use the MINUS keyword in the column list of a select statement to remove a column from the result set returned to the client? What I have in mind is something like this:

To achieve the equivalent of the following Oracle query:

      SELECT T.*
          FROM T
       START WITH T.ParentID IS NULL
       CONNECT BY T.ParentID = PRIOR T.ID
      ORDER SIBLINGS BY T.OrderVal

You could use

      WITH RECURSIVE TT AS (
              SELECT T0.*, ARRAY[]::INTEGER[] || T.OrderVal AS Sortable
                 FROM T T0
             UNION ALL
                SELECT T1.*, TT.Sortable || T1 AS Sortable
                   FROM TT
      INNER JOIN T T1 ON (T1.ParentID = TT.ID)
    )
   SELECT TT.* MINUS TT.Sortable
      FROM TT
ORDER BY TT.Sortable

Now the Sortable column can be used to order the result set but is not returned to the client.

Not knowing the internals of the parser, I’m assuming that the use of MINUS in this construct would be distinguishable from the set difference use case because the expression being subtracted is a column (or perhaps even a lst of columns) rather than a SELECT expression.

Re: A modest proposal vis hierarchical queries: MINUS in the column list

From

"David G. Johnston"

Date:

07 June 2021, 21:36:56

On Mon, Jun 7, 2021 at 1:54 PM Mark Zellers <mark.zellers@workday.com> wrote:

Failing that, I have a modest suggestion that I would like to start a discussion around. What if you could use the MINUS keyword in the column list of a select statement to remove a column from the result set returned to the client?

I asked this a decade ago and got no useful responses.

https://www.postgresql.org/message-id/flat/02e901cc2bb4%2476bc2090%24643461b0%24%40yahoo.com#3784fab26b0f946b3239266e3b70a6ce

I will say I've still had the itch to want it occasionally in the years since, though not frequently.

David J.

Re: A modest proposal vis hierarchical queries: MINUS in the column list

From

Tom Lane

Date:

07 June 2021, 22:10:58

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Jun 7, 2021 at 1:54 PM Mark Zellers <mark.zellers@workday.com>
> wrote:
>> What if you could use the MINUS keyword in the column
>> list of a select statement to remove a column from the result set returned
>> to the client?

> I asked this a decade ago and got no useful responses.
>
https://www.postgresql.org/message-id/flat/02e901cc2bb4%2476bc2090%24643461b0%24%40yahoo.com#3784fab26b0f946b3239266e3b70a6ce

I can recall more-recent requests for that too, though I'm too lazy
to go search the archives right now.

I'm fairly disinclined to do anything about it though, because I'm
afraid of the SQL committee standardizing some other syntax for the
same idea in future (or maybe worse, commandeering the same keyword
for some other feature).  It doesn't seem quite valuable enough to
take those risks for.

Note that it's not like SQL hasn't heard of projections before.
You can always do "SELECT a, b, d FROM subquery-yielding-a-b-c-d".
So the proposed syntax would save a small amount of typing, but
it's not adding any real new functionality.

            regards, tom lane

Re: A modest proposal vis hierarchical queries: MINUS in the column list

From

Julien Rouhaud

Date:

08 June 2021, 02:50:08

On Mon, Jun 07, 2021 at 06:10:58PM -0400, Tom Lane wrote:
> 
> I'm fairly disinclined to do anything about it though, because I'm
> afraid of the SQL committee standardizing some other syntax for the
> same idea in future (or maybe worse, commandeering the same keyword
> for some other feature).  It doesn't seem quite valuable enough to
> take those risks for.

Also, isn't the OP problem already solved by the SEARCH / CYCLE grammar
handling added in 3696a600e2292?

Re: A modest proposal vis hierarchical queries: MINUS in the column list

From

Andrew Dunstan

Date:

08 June 2021, 15:00:56

On 6/7/21 6:10 PM, Tom Lane wrote:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>> On Mon, Jun 7, 2021 at 1:54 PM Mark Zellers <mark.zellers@workday.com>
>> wrote:
>>> What if you could use the MINUS keyword in the column
>>> list of a select statement to remove a column from the result set returned
>>> to the client?
>> I asked this a decade ago and got no useful responses.
>>
https://www.postgresql.org/message-id/flat/02e901cc2bb4%2476bc2090%24643461b0%24%40yahoo.com#3784fab26b0f946b3239266e3b70a6ce
> I can recall more-recent requests for that too, though I'm too lazy
> to go search the archives right now.
>
> I'm fairly disinclined to do anything about it though, because I'm
> afraid of the SQL committee standardizing some other syntax for the
> same idea in future (or maybe worse, commandeering the same keyword
> for some other feature).  It doesn't seem quite valuable enough to
> take those risks for.
>
> Note that it's not like SQL hasn't heard of projections before.
> You can always do "SELECT a, b, d FROM subquery-yielding-a-b-c-d".
> So the proposed syntax would save a small amount of typing, but
> it's not adding any real new functionality.
>
>             



True, but the problem happens when you have 250 fields and you want to
skip 4 of them. Getting that right can be a pain.


I agree that inventing syntax for this has the dangers you identify.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: A modest proposal vis hierarchical queries: MINUS in the column list

From

Tom Lane

Date:

08 June 2021, 15:39:09

Andrew Dunstan <andrew@dunslane.net> writes:
> On 6/7/21 6:10 PM, Tom Lane wrote:
>> Note that it's not like SQL hasn't heard of projections before.
>> You can always do "SELECT a, b, d FROM subquery-yielding-a-b-c-d".
>> So the proposed syntax would save a small amount of typing, but
>> it's not adding any real new functionality.

> True, but the problem happens when you have 250 fields and you want to
> skip 4 of them. Getting that right can be a pain.

I'm slightly skeptical of that argument, because if you have that
sort of query, you're most likely generating the query programmatically
anyway.  Yeah, it'd be a pain to maintain such code by hand, but
I don't see it being much of a problem if the code is built by
a machine.

Note that I'm not saying the idea is useless.  I'm just opining
that I'd rather wait for the SQL committee to do something in
this area.

            regards, tom lane

Re: A modest proposal vis hierarchical queries: MINUS in the column list

From

Peter Eisentraut

Date:

08 June 2021, 16:28:47

On 08.06.21 04:50, Julien Rouhaud wrote:
> On Mon, Jun 07, 2021 at 06:10:58PM -0400, Tom Lane wrote:
>>
>> I'm fairly disinclined to do anything about it though, because I'm
>> afraid of the SQL committee standardizing some other syntax for the
>> same idea in future (or maybe worse, commandeering the same keyword
>> for some other feature).  It doesn't seem quite valuable enough to
>> take those risks for.
> 
> Also, isn't the OP problem already solved by the SEARCH / CYCLE grammar
> handling added in 3696a600e2292?

You still get the path column in the output, which is what the OP didn't 
want.  But optionally eliminating the path column from the output might 
be a more constrained problem to solve.  We actually already discussed 
this; we just need to do it somehow.

Re: [External Sender] Re: A modest proposal vis hierarchical queries: MINUS in the column list

From

Mark Zellers

Date:

09 June 2021, 20:51:54

Tom Lane writes:
>Andrew Dunstan <andrew@dunslane.net> writes:
>> On 6/7/21 6:10 PM, Tom Lane wrote:
>>> Note that it's not like SQL hasn't heard of projections before.
>>> You can always do "SELECT a, b, d FROM subquery-yielding-a-b-c-d".
>>> So the proposed syntax would save a small amount of typing, but
>>> it's not adding any real new functionality.
>> True, but the problem happens when you have 250 fields and you want to
>> skip 4 of them. Getting that right can be a pain.

>I'm slightly skeptical of that argument, because if you have that
>sort of query, you're most likely generating the query programmatically
>anyway. Yeah, it'd be a pain to maintain such code by hand, but
>I don't see it being much of a problem if the code is built by
>a machine.

Here is the pattern I’m concerned with: the application has an entity layer that for each relationship knows all the fields and can read them and convert them into Java objects.
Developers are typically writing queries that just `SELECT *` from a table or view to load the entity. There could be many different queries with different filter criteria, for example, that are all fed through the same Java code. If the query omits some fields, the Java code can handle that by examining the meta-data and not reading the missing fields.

When new fields are added to a table or view, it is generally only necessary to update the common Java component rather than modifying each individual query. As I said in my original post, that leaves us with the unhappy alternatives of returning the (potentially large) temporary arrays used for sorting or having to explicitly name each column just to omit the unwanted temporary array.

Note that the Oracle START WITH/CONNECT BY syntax avoids this issue entirely because it is not necessary to return the temporary structure used only for sorting and is not needed by the client.

There is a preference for static queries over dynamically generated ones, as those can be statically analyzed for correctness and security issue, so dynamically generating the query is not always an available option.

I expect that this sort of pattern drives database developers crazy (“surely you aren’t using *all* those fields, why don’t you just explicitly list the ones you want?”) but there are other constraints (static validation, provably avoiding SQL Injection attacks, ease of maintenance) that may take precedence. There is value in not needing to make a knight’s tour through the code base every time someone adds a field to a table to update the column lists in all the queries that refer to that table.

Regards,

Mark Z.