Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union - Mailing list pgsql-general

From Karsten P
Subject Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
Date
Msg-id DB7PR08MB30819C2279665D7AFBD247EFDA8BA@DB7PR08MB3081.eurprd08.prod.outlook.com
Whole thread Raw
In response to Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union  (Karsten P <mr.mister123@hotmail.com>)
Responses Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
List pgsql-general
Okay, forget what i've just written, sorry for that.

I've digged deeper and checked my generic example - it works perfectly using
a merge-append in combination with limit.

i'm sorry i didn't check that first. it just won't work in my real-life 
example.
though each part of the query is using an index-scan it is than using a 
'normal' append
instead of a merge-append, but i don't know why.

here is the expected query plan as used with my generic example:

Limit  (cost=0.86..0.90 rows=1 width=12)
   ->  Merge Append  (cost=0.86..104314.37 rows=3435900 width=12)
         Sort Key: kpkp_orders.buchungsdatum DESC
         ->  Index Only Scan Backward using kpkp_orders_1 on 
kpkp_orders  (cost=0.43..34977.68 rows=1717950 width=12)
         ->  Index Only Scan Backward using kpkp_invoices_1 on 
kpkp_invoices  (cost=0.43..34977.68 rows=1717950 width=12)

this is fast.

and here is the query plan used on my real-life example:


Limit  (cost=918.40..918.41 rows=1 width=8)
   ->  Sort  (cost=918.40..919.59 rows=473 width=8)
         Sort Key: s.belegdatum DESC
         ->  Subquery Scan on s  (cost=0.56..916.04 rows=473 width=8)
               ->  Append  (cost=0.56..911.31 rows=473 width=327)
                     ->  Subquery Scan on "*SELECT* 1" 
(cost=0.56..624.08 rows=318 width=187)
                           ->  Index Scan using 
soit_erpprozesshistorie_12 on erpprozesshistorie eph (cost=0.56..620.10 
rows=318 width=181)
                                 Index Cond: ((clientid = '0'::numeric) 
AND (activeflag = 't'::bpchar) AND (prozessuntertyp = 2) AND (CASE WHEN 
((typ = 1) OR (typ = 7)) THEN 1 ELSE CASE WHEN (typ = 3) THEN 2 ELSE 
CASE WHEN ((typ = 4) OR (typ = 9)) THEN 3 ELSE 4 END END END = 1) AND 
(CASE WHEN (prozesstyp = 1) THEN true ELSE false END = true) AND 
(artikelvariante_objid = '1064748816'::numeric))
                     ->  Subquery Scan on "*SELECT* 2" 
(cost=0.43..284.87 rows=155 width=214)
                           ->  Index Scan using soit_umsatz_alt_08 on 
umsatz_alt  (cost=0.43..282.93 rows=155 width=186)
                                 Index Cond: ((clientid = '0'::numeric) 
AND (activeflag = 't'::bpchar) AND (verkauf = true) AND 
(artikelvariante_objid = '1064748816'::numeric))


so my question is: under wich circumstance does the query-planner use or 
prefer the 'merge append' over 'append'?


Thanks in advance!


Am 08.05.25 um 11:57 schrieb Karsten P:
> Hi,
>
> i've already googled so far but didn't find anything regarding my 
> problem..
> I hope i'm here at the right place.
>
> Following situation (this is just an simplyfied example):
>
> suppose we have two tables, lets say
>
> orders
>   - column 'order_number' -> varchar
>   - column 'order_date' -> timestamp
>
> with index on order_date
>
> and
>
> invoices
>   - column 'invoice_number' -> varchar
>   - column 'invoice_date' -> timestamp
>
> with index on invoice_date
>
> and many records in both if them.
>
> now we have a view combining both of them as
>
> create view documents as
> (
>     select order_number as document_number, order_date as 
> document_date from orders
>     union all select invoice_number, invoice_date from invoices
> )
>
>
> finding the last order placed in the database ist really easy:
>
>   select order_number from orders order by order_date desc limit 1
>
> will result in an index scan backward on orders
>
> same with invoices only...
>
> but when querying the view
>
>   select document_number from documents order by document_date desc 
> limit 1
>
> seems to break down to
>   - collect all rows from orders
>   - combine it with all rows from invoices
>   - sort all rows (descending)
>   - limit to one row
>
> with many data this is quite slow.
>
> I've tested this with PG9.6 and PG14, it doesn't seem to make a 
> difference (correct me if i'm wrong).
>
>
> So my question is: What about optimizing the query-planner that if
>
> - a query with unions of selects is executed
> - and an 'order by' in combination with 'limit' is applied on the 
> complete query (not only on subselects)
> - and there is a matching index for each select
>
> the order by and limit - part of the sql is also beeing applied on 
> each sub-select ?
>
> actually
>     select document_number from documents order by document_date desc 
> limit 1
>
> is beeing processed as
>     select order_number from orders
>     union all select invoice_number from invoices
>     order by document_number desc
>     limit 1
>
> but would it be possible to let the query-optimizer expand the query to
>     select order_number from (
>         (select order_number, order_date from orders order by 
> order_date desc limit 1)
>         union all (select invoice_number, invoice_date from invoices 
> order by invoice_date desc limit 1)
>     ) as subselect
>    order by order_date desc
>    limit 1
>
> as this would use two (or number of unions) index-backward-scans
> and than only has to reorder at maximum two rows before limiting to 
> the first of it?
>
> this should be significantly faster.
>
> thanks a lot and greetz,
> Karsten
>
>
>




pgsql-general by date:

Previous
From: Karsten P
Date:
Subject: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
Next
From: איתי זרחוביץ
Date:
Subject: Issue Launching PostgreSQL on MacBook M4 – Error -10669‏