Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union - Mailing list pgsql-general
From | Karsten P |
---|---|
Subject | Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union |
Date | |
Msg-id | DB7PR08MB30819C2279665D7AFBD247EFDA8BA@DB7PR08MB3081.eurprd08.prod.outlook.com Whole thread Raw |
In response to | Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union (Karsten P <mr.mister123@hotmail.com>) |
Responses |
Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union |
List | pgsql-general |
Okay, forget what i've just written, sorry for that. I've digged deeper and checked my generic example - it works perfectly using a merge-append in combination with limit. i'm sorry i didn't check that first. it just won't work in my real-life example. though each part of the query is using an index-scan it is than using a 'normal' append instead of a merge-append, but i don't know why. here is the expected query plan as used with my generic example: Limit (cost=0.86..0.90 rows=1 width=12) -> Merge Append (cost=0.86..104314.37 rows=3435900 width=12) Sort Key: kpkp_orders.buchungsdatum DESC -> Index Only Scan Backward using kpkp_orders_1 on kpkp_orders (cost=0.43..34977.68 rows=1717950 width=12) -> Index Only Scan Backward using kpkp_invoices_1 on kpkp_invoices (cost=0.43..34977.68 rows=1717950 width=12) this is fast. and here is the query plan used on my real-life example: Limit (cost=918.40..918.41 rows=1 width=8) -> Sort (cost=918.40..919.59 rows=473 width=8) Sort Key: s.belegdatum DESC -> Subquery Scan on s (cost=0.56..916.04 rows=473 width=8) -> Append (cost=0.56..911.31 rows=473 width=327) -> Subquery Scan on "*SELECT* 1" (cost=0.56..624.08 rows=318 width=187) -> Index Scan using soit_erpprozesshistorie_12 on erpprozesshistorie eph (cost=0.56..620.10 rows=318 width=181) Index Cond: ((clientid = '0'::numeric) AND (activeflag = 't'::bpchar) AND (prozessuntertyp = 2) AND (CASE WHEN ((typ = 1) OR (typ = 7)) THEN 1 ELSE CASE WHEN (typ = 3) THEN 2 ELSE CASE WHEN ((typ = 4) OR (typ = 9)) THEN 3 ELSE 4 END END END = 1) AND (CASE WHEN (prozesstyp = 1) THEN true ELSE false END = true) AND (artikelvariante_objid = '1064748816'::numeric)) -> Subquery Scan on "*SELECT* 2" (cost=0.43..284.87 rows=155 width=214) -> Index Scan using soit_umsatz_alt_08 on umsatz_alt (cost=0.43..282.93 rows=155 width=186) Index Cond: ((clientid = '0'::numeric) AND (activeflag = 't'::bpchar) AND (verkauf = true) AND (artikelvariante_objid = '1064748816'::numeric)) so my question is: under wich circumstance does the query-planner use or prefer the 'merge append' over 'append'? Thanks in advance! Am 08.05.25 um 11:57 schrieb Karsten P: > Hi, > > i've already googled so far but didn't find anything regarding my > problem.. > I hope i'm here at the right place. > > Following situation (this is just an simplyfied example): > > suppose we have two tables, lets say > > orders > - column 'order_number' -> varchar > - column 'order_date' -> timestamp > > with index on order_date > > and > > invoices > - column 'invoice_number' -> varchar > - column 'invoice_date' -> timestamp > > with index on invoice_date > > and many records in both if them. > > now we have a view combining both of them as > > create view documents as > ( > select order_number as document_number, order_date as > document_date from orders > union all select invoice_number, invoice_date from invoices > ) > > > finding the last order placed in the database ist really easy: > > select order_number from orders order by order_date desc limit 1 > > will result in an index scan backward on orders > > same with invoices only... > > but when querying the view > > select document_number from documents order by document_date desc > limit 1 > > seems to break down to > - collect all rows from orders > - combine it with all rows from invoices > - sort all rows (descending) > - limit to one row > > with many data this is quite slow. > > I've tested this with PG9.6 and PG14, it doesn't seem to make a > difference (correct me if i'm wrong). > > > So my question is: What about optimizing the query-planner that if > > - a query with unions of selects is executed > - and an 'order by' in combination with 'limit' is applied on the > complete query (not only on subselects) > - and there is a matching index for each select > > the order by and limit - part of the sql is also beeing applied on > each sub-select ? > > actually > select document_number from documents order by document_date desc > limit 1 > > is beeing processed as > select order_number from orders > union all select invoice_number from invoices > order by document_number desc > limit 1 > > but would it be possible to let the query-optimizer expand the query to > select order_number from ( > (select order_number, order_date from orders order by > order_date desc limit 1) > union all (select invoice_number, invoice_date from invoices > order by invoice_date desc limit 1) > ) as subselect > order by order_date desc > limit 1 > > as this would use two (or number of unions) index-backward-scans > and than only has to reorder at maximum two rows before limiting to > the first of it? > > this should be significantly faster. > > thanks a lot and greetz, > Karsten > > >
pgsql-general by date: