Re: The nested view from hell - Restricting a subquerry - Mailing list pgsql-sql

From Gregory Stark
Subject Re: The nested view from hell - Restricting a subquerry
Date
Msg-id 87d4yjmkiw.fsf@oxford.xeocode.com
Whole thread Raw
In response to Re: The nested view from hell - Restricting a subquerry  (Nis Jørgensen <nis@superlativ.dk>)
Responses Re: The nested view from hell - Restricting a subquerry  (Nis Jørgensen <nis@superlativ.dk>)
Re: The nested view from hell - Restricting a subquerry  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-sql
Nis Jørgensen <nis@superlativ.dk> writes:

>>> Well, the query can be satisfied by looking only at the rows with an
>>> order_id matching the invoice_id given. The condition that this is the
>>> largest invoice in the group then needs to be checked afterwards.
>>>
>>> I certainly did not expect the query planner to be able to deduce this,
>>> though.
>>
>> No, that's not true. If you had two records in eg_order with the same order_id
>> but different invoice_ids then the query would need both records to satisfy
>> the query.
>
> I assume you mean "... then both records are necessary in order to
> calculate the results of the query". This does not contradict what I wrote.

Sorry I meant, "the query as written can not be satisfied by looking only at
the rows with the specified invoice_id".

>     SELECT  order_id,
>     max(order_view.invoice_id),
>     sum(order_view.mileage)
>     FROM    (SELECT order_id,invoice_id, 0 as mileage FROM eg_order
>              UNION
>              SELECT order_id, 0, mileage FROM eg_order_line)
>              order_view GROUP BY order_view.order_id;
>
> This is then restricted on max(invoice_id)
>
> As far as I can tell, these steps produce the correct results (without
> the later information about primary keys provided by Bryce)
>
> INPUT: my_invoice_id
>
> 1. Look up all order_ids for which (order_id,my_invoice_id) appear in
> eg_orders
>
> 2. Find all rows (in both branches of the UNION) with these id_s
>
> 3. Group the rows, and calculate max(invoice_id)
>
> 4. Filter the result rows on max(invoice_id) = my_invoice_id.

So here's a hypothetical data set for which this algorithm fails:

order_id     invoice_id    mileage
--------------------------------------------
1        1        100
1        2        100

Your algorithm would produce

order_id    max(invoice_id)    sum(mileage)
--------------------------------------------
1        1        100

Whereas the correct output would be to output no records at all.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



pgsql-sql by date:

Previous
From: "Jyoti Seth"
Date:
Subject: Re: Database Synchronization
Next
From: Nis Jørgensen
Date:
Subject: Re: The nested view from hell - Restricting a subquerry