Nis Jørgensen <nis@superlativ.dk> writes:
>>> Well, the query can be satisfied by looking only at the rows with an
>>> order_id matching the invoice_id given. The condition that this is the
>>> largest invoice in the group then needs to be checked afterwards.
>>>
>>> I certainly did not expect the query planner to be able to deduce this,
>>> though.
>>
>> No, that's not true. If you had two records in eg_order with the same order_id
>> but different invoice_ids then the query would need both records to satisfy
>> the query.
>
> I assume you mean "... then both records are necessary in order to
> calculate the results of the query". This does not contradict what I wrote.
Sorry I meant, "the query as written can not be satisfied by looking only at
the rows with the specified invoice_id".
> SELECT order_id,
> max(order_view.invoice_id),
> sum(order_view.mileage)
> FROM (SELECT order_id,invoice_id, 0 as mileage FROM eg_order
> UNION
> SELECT order_id, 0, mileage FROM eg_order_line)
> order_view GROUP BY order_view.order_id;
>
> This is then restricted on max(invoice_id)
>
> As far as I can tell, these steps produce the correct results (without
> the later information about primary keys provided by Bryce)
>
> INPUT: my_invoice_id
>
> 1. Look up all order_ids for which (order_id,my_invoice_id) appear in
> eg_orders
>
> 2. Find all rows (in both branches of the UNION) with these id_s
>
> 3. Group the rows, and calculate max(invoice_id)
>
> 4. Filter the result rows on max(invoice_id) = my_invoice_id.
So here's a hypothetical data set for which this algorithm fails:
order_id invoice_id mileage
--------------------------------------------
1 1 100
1 2 100
Your algorithm would produce
order_id max(invoice_id) sum(mileage)
--------------------------------------------
1 1 100
Whereas the correct output would be to output no records at all.
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com