Thread: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union

Hi,

i've already googled so far but didn't find anything regarding my problem..
I hope i'm here at the right place.

Following situation (this is just an simplyfied example):

suppose we have two tables, lets say

orders
   - column 'order_number' -> varchar
   - column 'order_date' -> timestamp

with index on order_date

and

invoices
   - column 'invoice_number' -> varchar
   - column 'invoice_date' -> timestamp

with index on invoice_date

and many records in both if them.

now we have a view combining both of them as

create view documents as
(
     select order_number as document_number, order_date as document_date 
from orders
     union all select invoice_number, invoice_date from invoices
)


finding the last order placed in the database ist really easy:

   select order_number from orders order by order_date desc limit 1

will result in an index scan backward on orders

same with invoices only...

but when querying the view

   select document_number from documents order by document_date desc limit 1

seems to break down to
   - collect all rows from orders
   - combine it with all rows from invoices
   - sort all rows (descending)
   - limit to one row

with many data this is quite slow.

I've tested this with PG9.6 and PG14, it doesn't seem to make a 
difference (correct me if i'm wrong).


So my question is: What about optimizing the query-planner that if

- a query with unions of selects is executed
- and an 'order by' in combination with 'limit' is applied on the 
complete query (not only on subselects)
- and there is a matching index for each select

the order by and limit - part of the sql is also beeing applied on each 
sub-select ?

actually
     select document_number from documents order by document_date desc 
limit 1

is beeing processed as
     select order_number from orders
     union all select invoice_number from invoices
     order by document_number desc
     limit 1

but would it be possible to let the query-optimizer expand the query to
     select order_number from (
         (select order_number, order_date from orders order by 
order_date desc limit 1)
         union all (select invoice_number, invoice_date from invoices 
order by invoice_date desc limit 1)
     ) as subselect
    order by order_date desc
    limit 1

as this would use two (or number of unions) index-backward-scans
and than only has to reorder at maximum two rows before limiting to the 
first of it?

this should be significantly faster.

thanks a lot and greetz,
Karsten




Okay, forget what i've just written, sorry for that.

I've digged deeper and checked my generic example - it works perfectly using
a merge-append in combination with limit.

i'm sorry i didn't check that first. it just won't work in my real-life 
example.
though each part of the query is using an index-scan it is than using a 
'normal' append
instead of a merge-append, but i don't know why.

here is the expected query plan as used with my generic example:

Limit  (cost=0.86..0.90 rows=1 width=12)
   ->  Merge Append  (cost=0.86..104314.37 rows=3435900 width=12)
         Sort Key: kpkp_orders.buchungsdatum DESC
         ->  Index Only Scan Backward using kpkp_orders_1 on 
kpkp_orders  (cost=0.43..34977.68 rows=1717950 width=12)
         ->  Index Only Scan Backward using kpkp_invoices_1 on 
kpkp_invoices  (cost=0.43..34977.68 rows=1717950 width=12)

this is fast.

and here is the query plan used on my real-life example:


Limit  (cost=918.40..918.41 rows=1 width=8)
   ->  Sort  (cost=918.40..919.59 rows=473 width=8)
         Sort Key: s.belegdatum DESC
         ->  Subquery Scan on s  (cost=0.56..916.04 rows=473 width=8)
               ->  Append  (cost=0.56..911.31 rows=473 width=327)
                     ->  Subquery Scan on "*SELECT* 1" 
(cost=0.56..624.08 rows=318 width=187)
                           ->  Index Scan using 
soit_erpprozesshistorie_12 on erpprozesshistorie eph (cost=0.56..620.10 
rows=318 width=181)
                                 Index Cond: ((clientid = '0'::numeric) 
AND (activeflag = 't'::bpchar) AND (prozessuntertyp = 2) AND (CASE WHEN 
((typ = 1) OR (typ = 7)) THEN 1 ELSE CASE WHEN (typ = 3) THEN 2 ELSE 
CASE WHEN ((typ = 4) OR (typ = 9)) THEN 3 ELSE 4 END END END = 1) AND 
(CASE WHEN (prozesstyp = 1) THEN true ELSE false END = true) AND 
(artikelvariante_objid = '1064748816'::numeric))
                     ->  Subquery Scan on "*SELECT* 2" 
(cost=0.43..284.87 rows=155 width=214)
                           ->  Index Scan using soit_umsatz_alt_08 on 
umsatz_alt  (cost=0.43..282.93 rows=155 width=186)
                                 Index Cond: ((clientid = '0'::numeric) 
AND (activeflag = 't'::bpchar) AND (verkauf = true) AND 
(artikelvariante_objid = '1064748816'::numeric))


so my question is: under wich circumstance does the query-planner use or 
prefer the 'merge append' over 'append'?


Thanks in advance!


Am 08.05.25 um 11:57 schrieb Karsten P:
> Hi,
>
> i've already googled so far but didn't find anything regarding my 
> problem..
> I hope i'm here at the right place.
>
> Following situation (this is just an simplyfied example):
>
> suppose we have two tables, lets say
>
> orders
>   - column 'order_number' -> varchar
>   - column 'order_date' -> timestamp
>
> with index on order_date
>
> and
>
> invoices
>   - column 'invoice_number' -> varchar
>   - column 'invoice_date' -> timestamp
>
> with index on invoice_date
>
> and many records in both if them.
>
> now we have a view combining both of them as
>
> create view documents as
> (
>     select order_number as document_number, order_date as 
> document_date from orders
>     union all select invoice_number, invoice_date from invoices
> )
>
>
> finding the last order placed in the database ist really easy:
>
>   select order_number from orders order by order_date desc limit 1
>
> will result in an index scan backward on orders
>
> same with invoices only...
>
> but when querying the view
>
>   select document_number from documents order by document_date desc 
> limit 1
>
> seems to break down to
>   - collect all rows from orders
>   - combine it with all rows from invoices
>   - sort all rows (descending)
>   - limit to one row
>
> with many data this is quite slow.
>
> I've tested this with PG9.6 and PG14, it doesn't seem to make a 
> difference (correct me if i'm wrong).
>
>
> So my question is: What about optimizing the query-planner that if
>
> - a query with unions of selects is executed
> - and an 'order by' in combination with 'limit' is applied on the 
> complete query (not only on subselects)
> - and there is a matching index for each select
>
> the order by and limit - part of the sql is also beeing applied on 
> each sub-select ?
>
> actually
>     select document_number from documents order by document_date desc 
> limit 1
>
> is beeing processed as
>     select order_number from orders
>     union all select invoice_number from invoices
>     order by document_number desc
>     limit 1
>
> but would it be possible to let the query-optimizer expand the query to
>     select order_number from (
>         (select order_number, order_date from orders order by 
> order_date desc limit 1)
>         union all (select invoice_number, invoice_date from invoices 
> order by invoice_date desc limit 1)
>     ) as subselect
>    order by order_date desc
>    limit 1
>
> as this would use two (or number of unions) index-backward-scans
> and than only has to reorder at maximum two rows before limiting to 
> the first of it?
>
> this should be significantly faster.
>
> thanks a lot and greetz,
> Karsten
>
>
>




On Thu, 8 May 2025 at 22:57, Karsten P <mr.mister123@hotmail.com> wrote:
> i'm sorry i didn't check that first. it just won't work in my real-life
> example.
> though each part of the query is using an index-scan it is than using a
> 'normal' append
> instead of a merge-append, but i don't know why.

You could try:

SET enable_sort = 0;

... to see what the costs come out to be and how it performs. Perhaps
the planner thinks using the other indexes to more efficiently filter
out the unrelated tuples for the WHERE clause is cheaper than using
the index that provides the tuples sorted by date and filtering the
unwanted tuples with a "Filter".

> so my question is: under wich circumstance does the query-planner use or
> prefer the 'merge append' over 'append'?

It's all based on costs. Those are shown in the "cost=918.40..918.41"
part that you're seeing in the EXPLAIN output.

You could try adding an index that suits all your equality WHERE
clause filters, or some subset of them and put the date column as the
final indexed column and see what happens.

David



Karsten P <mr.mister123@hotmail.com> writes:
> i'm sorry i didn't check that first. it just won't work in my real-life 
> example.
> though each part of the query is using an index-scan it is than using a 
> 'normal' append
> instead of a merge-append, but i don't know why.

The "Subquery Scan" nodes shown in your real-life example indicate
that you're using views that the planner is unable to flatten
completely, and those are preventing detection that the index you want
to use would be helpful.  The view you showed originally wouldn't be
that, so there is something you're doing that you left out.  It looks
like your actual view contains some WHERE restrictions in the UNION
arms, which I think are enough to cause this problem.  Even then,
though, the "Subquery Scan" nodes get optimized away in simple tests,
which means there's an additional optimization blocker.  I'd look
closely at whether the output column types of the UNION arms match.

            regards, tom lane