A cost issue in ORDER BY + LIMIT - Mailing list pgsql-hackers

From Paul Guo
Subject A cost issue in ORDER BY + LIMIT
Date
Msg-id CABQrizdnkP4zmd5gQoqCqN9s-VCVqUWyFja6M072QPLFnA5T0g@mail.gmail.com
Whole thread Raw
Responses Re: A cost issue in ORDER BY + LIMIT
Re: A cost issue in ORDER BY + LIMIT
List pgsql-hackers
Hello,

Postgres seems to always optimize ORDER BY + LIMIT as top-k sort.
Recently I happened to notice
that in this scenario the output tuple number of the sort node is not
the same as the LIMIT tuple number.

See below,

postgres=# explain analyze verbose select * from t1 order by a limit 10;
                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------
------------------------------
 Limit  (cost=69446.17..69446.20 rows=10 width=4) (actual
time=282.896..282.923 rows=10 loops=1)
   Output: a
   ->  Sort  (cost=69446.17..71925.83 rows=991862 width=4) (actual
time=282.894..282.896 rows=10 l
oops=1)
         Output: a
         Sort Key: t1.a
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on public.t1  (cost=0.00..44649.62 rows=991862
width=4) (actual time=0.026..
195.438 rows=1001000 loops=1)
               Output: a
 Planning Time: 0.553 ms
 Execution Time: 282.961 ms
(10 rows)

We can see from the output that the LIMIT cost is wrong also due to
this since it assumes the input_rows
from the sort node is 991862 (per gdb debugging).

This could be easily fixed by below patch,

diff --git a/src/backend/optimizer/path/costsize.c
b/src/backend/optimizer/path/costsize.c
index fb28e6411a..800cf0b256 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2429,7 +2429,11 @@ cost_sort(Path *path, PlannerInfo *root,

    startup_cost += input_cost;

-   path->rows = tuples;
+   if (limit_tuples > 0 && limit_tuples < tuples)
+       path->rows = limit_tuples;
+   else
+       path->rows = tuples;
+
    path->startup_cost = startup_cost;
    path->total_cost = startup_cost + run_cost;
 }

Withe the patch the explain output looks like this.

postgres=# explain analyze verbose select * from t1 order by a limit 10;
                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------
------------------------------
 Limit  (cost=69446.17..71925.83 rows=10 width=4) (actual
time=256.204..256.207 rows=10 loops=1)
   Output: a
   ->  Sort  (cost=69446.17..71925.83 rows=10 width=4) (actual
time=256.202..256.203 rows=10 loops
=1)
         Output: a
         Sort Key: t1.a
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on public.t1  (cost=0.00..44649.62 rows=991862
width=4) (actual time=1.014..
169.509 rows=1001000 loops=1)
               Output: a
 Planning Time: 0.076 ms
 Execution Time: 256.232 ms
(10 rows)

Regards,
Paul



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: `make check` doesn't pass on MacOS Catalina
Next
From: Zhang Mingli
Date:
Subject: Re: A cost issue in ORDER BY + LIMIT