Thread: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
From
Karsten P
Date:
Hi, i've already googled so far but didn't find anything regarding my problem.. I hope i'm here at the right place. Following situation (this is just an simplyfied example): suppose we have two tables, lets say orders - column 'order_number' -> varchar - column 'order_date' -> timestamp with index on order_date and invoices - column 'invoice_number' -> varchar - column 'invoice_date' -> timestamp with index on invoice_date and many records in both if them. now we have a view combining both of them as create view documents as ( select order_number as document_number, order_date as document_date from orders union all select invoice_number, invoice_date from invoices ) finding the last order placed in the database ist really easy: select order_number from orders order by order_date desc limit 1 will result in an index scan backward on orders same with invoices only... but when querying the view select document_number from documents order by document_date desc limit 1 seems to break down to - collect all rows from orders - combine it with all rows from invoices - sort all rows (descending) - limit to one row with many data this is quite slow. I've tested this with PG9.6 and PG14, it doesn't seem to make a difference (correct me if i'm wrong). So my question is: What about optimizing the query-planner that if - a query with unions of selects is executed - and an 'order by' in combination with 'limit' is applied on the complete query (not only on subselects) - and there is a matching index for each select the order by and limit - part of the sql is also beeing applied on each sub-select ? actually select document_number from documents order by document_date desc limit 1 is beeing processed as select order_number from orders union all select invoice_number from invoices order by document_number desc limit 1 but would it be possible to let the query-optimizer expand the query to select order_number from ( (select order_number, order_date from orders order by order_date desc limit 1) union all (select invoice_number, invoice_date from invoices order by invoice_date desc limit 1) ) as subselect order by order_date desc limit 1 as this would use two (or number of unions) index-backward-scans and than only has to reorder at maximum two rows before limiting to the first of it? this should be significantly faster. thanks a lot and greetz, Karsten
Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
From
Karsten P
Date:
Okay, forget what i've just written, sorry for that. I've digged deeper and checked my generic example - it works perfectly using a merge-append in combination with limit. i'm sorry i didn't check that first. it just won't work in my real-life example. though each part of the query is using an index-scan it is than using a 'normal' append instead of a merge-append, but i don't know why. here is the expected query plan as used with my generic example: Limit (cost=0.86..0.90 rows=1 width=12) -> Merge Append (cost=0.86..104314.37 rows=3435900 width=12) Sort Key: kpkp_orders.buchungsdatum DESC -> Index Only Scan Backward using kpkp_orders_1 on kpkp_orders (cost=0.43..34977.68 rows=1717950 width=12) -> Index Only Scan Backward using kpkp_invoices_1 on kpkp_invoices (cost=0.43..34977.68 rows=1717950 width=12) this is fast. and here is the query plan used on my real-life example: Limit (cost=918.40..918.41 rows=1 width=8) -> Sort (cost=918.40..919.59 rows=473 width=8) Sort Key: s.belegdatum DESC -> Subquery Scan on s (cost=0.56..916.04 rows=473 width=8) -> Append (cost=0.56..911.31 rows=473 width=327) -> Subquery Scan on "*SELECT* 1" (cost=0.56..624.08 rows=318 width=187) -> Index Scan using soit_erpprozesshistorie_12 on erpprozesshistorie eph (cost=0.56..620.10 rows=318 width=181) Index Cond: ((clientid = '0'::numeric) AND (activeflag = 't'::bpchar) AND (prozessuntertyp = 2) AND (CASE WHEN ((typ = 1) OR (typ = 7)) THEN 1 ELSE CASE WHEN (typ = 3) THEN 2 ELSE CASE WHEN ((typ = 4) OR (typ = 9)) THEN 3 ELSE 4 END END END = 1) AND (CASE WHEN (prozesstyp = 1) THEN true ELSE false END = true) AND (artikelvariante_objid = '1064748816'::numeric)) -> Subquery Scan on "*SELECT* 2" (cost=0.43..284.87 rows=155 width=214) -> Index Scan using soit_umsatz_alt_08 on umsatz_alt (cost=0.43..282.93 rows=155 width=186) Index Cond: ((clientid = '0'::numeric) AND (activeflag = 't'::bpchar) AND (verkauf = true) AND (artikelvariante_objid = '1064748816'::numeric)) so my question is: under wich circumstance does the query-planner use or prefer the 'merge append' over 'append'? Thanks in advance! Am 08.05.25 um 11:57 schrieb Karsten P: > Hi, > > i've already googled so far but didn't find anything regarding my > problem.. > I hope i'm here at the right place. > > Following situation (this is just an simplyfied example): > > suppose we have two tables, lets say > > orders > - column 'order_number' -> varchar > - column 'order_date' -> timestamp > > with index on order_date > > and > > invoices > - column 'invoice_number' -> varchar > - column 'invoice_date' -> timestamp > > with index on invoice_date > > and many records in both if them. > > now we have a view combining both of them as > > create view documents as > ( > select order_number as document_number, order_date as > document_date from orders > union all select invoice_number, invoice_date from invoices > ) > > > finding the last order placed in the database ist really easy: > > select order_number from orders order by order_date desc limit 1 > > will result in an index scan backward on orders > > same with invoices only... > > but when querying the view > > select document_number from documents order by document_date desc > limit 1 > > seems to break down to > - collect all rows from orders > - combine it with all rows from invoices > - sort all rows (descending) > - limit to one row > > with many data this is quite slow. > > I've tested this with PG9.6 and PG14, it doesn't seem to make a > difference (correct me if i'm wrong). > > > So my question is: What about optimizing the query-planner that if > > - a query with unions of selects is executed > - and an 'order by' in combination with 'limit' is applied on the > complete query (not only on subselects) > - and there is a matching index for each select > > the order by and limit - part of the sql is also beeing applied on > each sub-select ? > > actually > select document_number from documents order by document_date desc > limit 1 > > is beeing processed as > select order_number from orders > union all select invoice_number from invoices > order by document_number desc > limit 1 > > but would it be possible to let the query-optimizer expand the query to > select order_number from ( > (select order_number, order_date from orders order by > order_date desc limit 1) > union all (select invoice_number, invoice_date from invoices > order by invoice_date desc limit 1) > ) as subselect > order by order_date desc > limit 1 > > as this would use two (or number of unions) index-backward-scans > and than only has to reorder at maximum two rows before limiting to > the first of it? > > this should be significantly faster. > > thanks a lot and greetz, > Karsten > > >
Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
From
David Rowley
Date:
On Thu, 8 May 2025 at 22:57, Karsten P <mr.mister123@hotmail.com> wrote: > i'm sorry i didn't check that first. it just won't work in my real-life > example. > though each part of the query is using an index-scan it is than using a > 'normal' append > instead of a merge-append, but i don't know why. You could try: SET enable_sort = 0; ... to see what the costs come out to be and how it performs. Perhaps the planner thinks using the other indexes to more efficiently filter out the unrelated tuples for the WHERE clause is cheaper than using the index that provides the tuples sorted by date and filtering the unwanted tuples with a "Filter". > so my question is: under wich circumstance does the query-planner use or > prefer the 'merge append' over 'append'? It's all based on costs. Those are shown in the "cost=918.40..918.41" part that you're seeing in the EXPLAIN output. You could try adding an index that suits all your equality WHERE clause filters, or some subset of them and put the date column as the final indexed column and see what happens. David
Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
From
Tom Lane
Date:
Karsten P <mr.mister123@hotmail.com> writes: > i'm sorry i didn't check that first. it just won't work in my real-life > example. > though each part of the query is using an index-scan it is than using a > 'normal' append > instead of a merge-append, but i don't know why. The "Subquery Scan" nodes shown in your real-life example indicate that you're using views that the planner is unable to flatten completely, and those are preventing detection that the index you want to use would be helpful. The view you showed originally wouldn't be that, so there is something you're doing that you left out. It looks like your actual view contains some WHERE restrictions in the UNION arms, which I think are enough to cause this problem. Even then, though, the "Subquery Scan" nodes get optimized away in simple tests, which means there's an additional optimization blocker. I'd look closely at whether the output column types of the UNION arms match. regards, tom lane