Thread: [planner] Ignore "order by" in subselect if parrent do count(*)
Hello, my example query (and explain) is: $ explain SELECT count(*) from (select * from users_profile order by id) u_p; QUERY PLAN --------------------------------------------------------------------------- Aggregate (cost=1.06..1.07 rows=1 width=0) -> Sort (cost=1.03..1.03 rows=2 width=572) Sort Key: users_profile.id -> Seq Scan on users_profile (cost=0.00..1.02 rows=2 width=572) (4 rows) Meseems "order by id" can be ignored by planner. It should speed up query without side effect. I know the query should be fixed but this is real and simplified query from real application. Does postgresql team think ppostgres should be smarter than user and fix user queries? If answer is positive please treat this as "feature request". Thank you and regards, Marcin.
On 1 March 2012 12:45, Marcin Mirosław <marcin@mejor.pl> wrote:
If you have only 2 rows in the table, then the plan really doesn't matter too much. Sorting two rows would be really fast :)
Try to check it with 10k rows.
regards
Szymon
Hello,
my example query (and explain) is:
$ explain SELECT count(*) from (select * from users_profile order by id)
u_p;
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=1.06..1.07 rows=1 width=0)
-> Sort (cost=1.03..1.03 rows=2 width=572)
Sort Key: users_profile.id
-> Seq Scan on users_profile (cost=0.00..1.02 rows=2 width=572)
(4 rows)
Meseems "order by id" can be ignored by planner. It should speed up
query without side effect. I know the query should be fixed but this is
real and simplified query from real application.
Does postgresql team think ppostgres should be smarter than user and fix
user queries? If answer is positive please treat this as "feature request".
Thank you and regards,
Marcin.
If you have only 2 rows in the table, then the plan really doesn't matter too much. Sorting two rows would be really fast :)
Try to check it with 10k rows.
regards
Szymon
W dniu 01.03.2012 12:50, Szymon Guz pisze: Hi Szymon, > If you have only 2 rows in the table, then the plan really doesn't > matter too much. Sorting two rows would be really fast :) > > Try to check it with 10k rows. It doesn't matter (in this case) how many records is in user_profile table. Planner does sorting. Here is version with more rows: $ explain (analyze,verbose,buffers) SELECT count(*) from (select * from users_profile order by id) u_p; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1593639.92..1593639.93 rows=1 width=0) (actual time=11738.498..11738.498 rows=1 loops=1) Output: count(*) Buffers: shared hit=2499 read=41749 written=10595, temp read=17107 written=17107 -> Sort (cost=1443640.26..1468640.21 rows=9999977 width=4) (actual time=9804.461..10963.911 rows=10000000 loops=1) Output: users_profile.id Sort Key: users_profile.id Sort Method: external sort Disk: 136856kB Buffers: shared hit=2499 read=41749 written=10595, temp read=17107 written=17107 -> Seq Scan on public.users_profile (cost=0.00..144247.77 rows=9999977 width=4) (actual time=0.021..1192.202 rows=10000000 loops=1) Output: users_profile.id Buffers: shared hit=2499 read=41749 written=10595 Total runtime: 11768.199 ms (12 rows) And without "order by": $ explain (analyze,verbose,buffers) SELECT count(*) from (select * from users_profile ) u_p; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=169247.71..169247.72 rows=1 width=0) (actual time=1757.613..1757.613 rows=1 loops=1) Output: count(*) Buffers: shared hit=2522 read=41726 -> Seq Scan on public.users_profile (cost=0.00..144247.77 rows=9999977 width=0) (actual time=0.032..946.166 rows=10000000 loops=1) Output: users_profile.id Buffers: shared hit=2522 read=41726 Total runtime: 1757.656 ms (7 rows)
On 1 March 2012 13:02, Marcin Mirosław <marcin@mejor.pl> wrote:
W dniu 01.03.2012 12:50, Szymon Guz pisze:
Hi Szymon,> If you have only 2 rows in the table, then the plan really doesn'tIt doesn't matter (in this case) how many records is in user_profile
> matter too much. Sorting two rows would be really fast :)
>
> Try to check it with 10k rows.
table. Planner does sorting.
Here is version with more rows:
$ explain (analyze,verbose,buffers) SELECT count(*) from (select * fromusers_profile order by id) u_p;-----------------------------------------------------------------------------------------------------------------------------------------------
QUERY
PLAN
Aggregate (cost=1593639.92..1593639.93 rows=1 width=0) (actual
time=11738.498..11738.498 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=2499 read=41749 written=10595, temp read=17107
written=17107
-> Sort (cost=1443640.26..1468640.21 rows=9999977 width=4) (actual
time=9804.461..10963.911 rows=10000000 loops=1)
Output: users_profile.id
Sort Key: users_profile.id
Sort Method: external sort Disk: 136856kB
Buffers: shared hit=2499 read=41749 written=10595, temp
read=17107 written=17107
-> Seq Scan on public.users_profile (cost=0.00..144247.77
rows=9999977 width=4) (actual time=0.021..1192.202 rows=10000000 loops=1)
Output: users_profile.id
Buffers: shared hit=2499 read=41749 written=10595
Total runtime: 11768.199 ms
(12 rows)
And without "order by":
$ explain (analyze,verbose,buffers) SELECT count(*) from (select * from
users_profile ) u_p;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=169247.71..169247.72 rows=1 width=0) (actual
time=1757.613..1757.613 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=2522 read=41726
-> Seq Scan on public.users_profile (cost=0.00..144247.77
rows=9999977 width=0) (actual time=0.032..946.166 rows=10000000 loops=1)
Output: users_profile.id
Buffers: shared hit=2522 read=41726
Total runtime: 1757.656 ms
(7 rows)
Could you provide the postgres version and the structure of users_profile table (with indexes)?
- Szymon
W dniu 01.03.2012 13:09, Szymon Guz pisze: > Could you provide the postgres version and the structure of > users_profile table (with indexes)? Argh, i forgot about version. It's postgresql-9.1.3. I don't think structre of users_profile is important here. Me idea is let planner ignore sorting completly. I don't want to have sort quicker (in this case, surely;)), i'd like to skip sorting completly because it doesn't influence for query result. Table isn't "real life", it only demonstrates than planner sometimes can safely skip some steps. Regards, Marcin
Marcin Miros*aw<marcin@mejor.pl> wrote: > SELECT count(*) > from (select * from users_profile order by id) u_p; > "order by id" can be ignored by planner. This has been discussed before. Certainly not all ORDER BY clauses within query steps can be ignored, so there would need to be code to determine whether it was actually useful, which wouldn't be free, either in terms of planning time or code maintenance. It wasn't judged to be worth the cost. If you want to avoid the cost of the sort, don't specify ORDER BY where it doesn't matter. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Marcin Miros*aw<marcin@mejor.pl> wrote: >> SELECT count(*) >> from (select * from users_profile order by id) u_p; >> "order by id" can be ignored by planner. > This has been discussed before. Certainly not all ORDER BY clauses > within query steps can be ignored, so there would need to be code to > determine whether it was actually useful, which wouldn't be free, > either in terms of planning time or code maintenance. It wasn't > judged to be worth the cost. If you want to avoid the cost of the > sort, don't specify ORDER BY where it doesn't matter. Considering that ORDER BY in a subquery isn't even legal per spec, there does not seem to be any tenable argument for supposing that a user wrote it there "by accident". It's much more likely that he had some semantic reason for it (say, an order-sensitive function in a higher query level) and that we'd break his results by ignoring the ORDER BY. I doubt that very many of the possible reasons for needing ordered output are reliably detectable by the planner, either. regards, tom lane
On Thu, Mar 1, 2012 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Marcin Miros*aw<marcin@mejor.pl> wrote: >>> SELECT count(*) >>> from (select * from users_profile order by id) u_p; > >>> "order by id" can be ignored by planner. > >> This has been discussed before. Certainly not all ORDER BY clauses >> within query steps can be ignored, so there would need to be code to >> determine whether it was actually useful, which wouldn't be free, >> either in terms of planning time or code maintenance. It wasn't >> judged to be worth the cost. If you want to avoid the cost of the >> sort, don't specify ORDER BY where it doesn't matter. > > Considering that ORDER BY in a subquery isn't even legal per spec, That's surprising ... normally it won't affect the result, but with an offset or limit it would. Does the offset or limit change the "not even legal" part? Something like: select * from foo where foo_id in (select bar_id from bar order by bar_id offset 10 limit 10); Craig > there does not seem to be any tenable argument for supposing that > a user wrote it there "by accident". It's much more likely that > he had some semantic reason for it (say, an order-sensitive function > in a higher query level) and that we'd break his results by ignoring > the ORDER BY. I doubt that very many of the possible reasons for > needing ordered output are reliably detectable by the planner, either. > > regards, tom lane > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
Craig James <cjames@emolecules.com> writes: > On Thu, Mar 1, 2012 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Considering that ORDER BY in a subquery isn't even legal per spec, > That's surprising ... normally it won't affect the result, but with an > offset or limit it would. Does the offset or limit change the "not > even legal" part? Well, actually, the SQL standard didn't have anything comparable to offset/limit until SQL:2008, either. But I have to take back my statement above. It wasn't legal in SQL99, but evidently they added it in the 2003 or 2008 edition, presumably to go with the limit functionality. Anyway, the long and the short of it is that people depend on ORDER BY in subqueries to be honored, and we're not going to break that. regards, tom lane