Thread: [planner] Ignore "order by" in subselect if parrent do count(*)

[planner] Ignore "order by" in subselect if parrent do count(*)

From
Marcin Mirosław
Date:
Hello,
my example query (and explain) is:
$ explain SELECT count(*) from (select * from users_profile order by id)
u_p;
                                QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=1.06..1.07 rows=1 width=0)
   ->  Sort  (cost=1.03..1.03 rows=2 width=572)
         Sort Key: users_profile.id
         ->  Seq Scan on users_profile  (cost=0.00..1.02 rows=2 width=572)
(4 rows)

Meseems "order by id" can be ignored by planner. It should speed up
query without side effect. I know the query should be fixed but this is
real and simplified query from real application.
Does postgresql team think ppostgres should be smarter than user and fix
user queries? If answer is positive please treat this as "feature request".
Thank you and regards,
Marcin.

Re: [planner] Ignore "order by" in subselect if parrent do count(*)

From
Szymon Guz
Date:


On 1 March 2012 12:45, Marcin Mirosław <marcin@mejor.pl> wrote:
Hello,
my example query (and explain) is:
$ explain SELECT count(*) from (select * from users_profile order by id)
u_p;
                               QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=1.06..1.07 rows=1 width=0)
  ->  Sort  (cost=1.03..1.03 rows=2 width=572)
        Sort Key: users_profile.id
        ->  Seq Scan on users_profile  (cost=0.00..1.02 rows=2 width=572)
(4 rows)

Meseems "order by id" can be ignored by planner. It should speed up
query without side effect. I know the query should be fixed but this is
real and simplified query from real application.
Does postgresql team think ppostgres should be smarter than user and fix
user queries? If answer is positive please treat this as "feature request".
Thank you and regards,
Marcin.


If you have only 2 rows in the table, then the plan really doesn't matter too much. Sorting two rows would be really fast :)

Try to check it with 10k rows.

regards
Szymon

Re: [planner] Ignore "order by" in subselect if parrent do count(*)

From
Marcin Mirosław
Date:
W dniu 01.03.2012 12:50, Szymon Guz pisze:
Hi Szymon,
> If you have only 2 rows in the table, then the plan really doesn't
> matter too much. Sorting two rows would be really fast :)
>
> Try to check it with 10k rows.

It doesn't matter (in this case) how many records is in user_profile
table. Planner does sorting.
Here is version with more rows:
$ explain (analyze,verbose,buffers) SELECT count(*) from (select * from
users_profile order by id) u_p;
                                                                  QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1593639.92..1593639.93 rows=1 width=0) (actual
time=11738.498..11738.498 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=2499 read=41749 written=10595, temp read=17107
written=17107
   ->  Sort  (cost=1443640.26..1468640.21 rows=9999977 width=4) (actual
time=9804.461..10963.911 rows=10000000 loops=1)
         Output: users_profile.id
         Sort Key: users_profile.id
         Sort Method: external sort  Disk: 136856kB
         Buffers: shared hit=2499 read=41749 written=10595, temp
read=17107 written=17107
         ->  Seq Scan on public.users_profile  (cost=0.00..144247.77
rows=9999977 width=4) (actual time=0.021..1192.202 rows=10000000 loops=1)
               Output: users_profile.id
               Buffers: shared hit=2499 read=41749 written=10595
 Total runtime: 11768.199 ms
(12 rows)

And without "order by":
$ explain (analyze,verbose,buffers) SELECT count(*) from (select * from
users_profile ) u_p;
                                                               QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=169247.71..169247.72 rows=1 width=0) (actual
time=1757.613..1757.613 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=2522 read=41726
   ->  Seq Scan on public.users_profile  (cost=0.00..144247.77
rows=9999977 width=0) (actual time=0.032..946.166 rows=10000000 loops=1)
         Output: users_profile.id
         Buffers: shared hit=2522 read=41726
 Total runtime: 1757.656 ms
(7 rows)

Re: [planner] Ignore "order by" in subselect if parrent do count(*)

From
Szymon Guz
Date:


On 1 March 2012 13:02, Marcin Mirosław <marcin@mejor.pl> wrote:
W dniu 01.03.2012 12:50, Szymon Guz pisze:
Hi Szymon,
> If you have only 2 rows in the table, then the plan really doesn't
> matter too much. Sorting two rows would be really fast :)
>
> Try to check it with 10k rows.

It doesn't matter (in this case) how many records is in user_profile
table. Planner does sorting.
Here is version with more rows:
$ explain (analyze,verbose,buffers) SELECT count(*) from (select * from
users_profile order by id) u_p;
                                                                 QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1593639.92..1593639.93 rows=1 width=0) (actual
time=11738.498..11738.498 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=2499 read=41749 written=10595, temp read=17107
written=17107
  ->  Sort  (cost=1443640.26..1468640.21 rows=9999977 width=4) (actual
time=9804.461..10963.911 rows=10000000 loops=1)
        Output: users_profile.id
        Sort Key: users_profile.id
        Sort Method: external sort  Disk: 136856kB
        Buffers: shared hit=2499 read=41749 written=10595, temp
read=17107 written=17107
        ->  Seq Scan on public.users_profile  (cost=0.00..144247.77
rows=9999977 width=4) (actual time=0.021..1192.202 rows=10000000 loops=1)
              Output: users_profile.id
              Buffers: shared hit=2499 read=41749 written=10595
 Total runtime: 11768.199 ms
(12 rows)

And without "order by":
$ explain (analyze,verbose,buffers) SELECT count(*) from (select * from
users_profile ) u_p;
                                                              QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=169247.71..169247.72 rows=1 width=0) (actual
time=1757.613..1757.613 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=2522 read=41726
  ->  Seq Scan on public.users_profile  (cost=0.00..144247.77
rows=9999977 width=0) (actual time=0.032..946.166 rows=10000000 loops=1)
        Output: users_profile.id
        Buffers: shared hit=2522 read=41726
 Total runtime: 1757.656 ms
(7 rows)


Could you provide the postgres version and the structure of users_profile table (with indexes)?

- Szymon

Re: [planner] Ignore "order by" in subselect if parrent do count(*)

From
Marcin Mirosław
Date:
W dniu 01.03.2012 13:09, Szymon Guz pisze:
> Could you provide the postgres version and the structure of
> users_profile table (with indexes)?

Argh, i forgot about version. It's postgresql-9.1.3.
I don't think structre of users_profile is important here. Me idea is
let planner ignore sorting completly. I don't want to have sort quicker
(in this case, surely;)), i'd like to skip sorting completly because it
doesn't influence for query result.
Table isn't "real life", it only demonstrates than planner sometimes can
safely skip some steps.
Regards,
Marcin

Re: [planner] Ignore "order by" in subselect if parrent do count(*)

From
"Kevin Grittner"
Date:
Marcin Miros*aw<marcin@mejor.pl> wrote:

> SELECT count(*)
>   from (select * from users_profile order by id) u_p;

> "order by id" can be ignored by planner.

This has been discussed before.  Certainly not all ORDER BY clauses
within query steps can be ignored, so there would need to be code to
determine whether it was actually useful, which wouldn't be free,
either in terms of planning time or code maintenance.  It wasn't
judged to be worth the cost.  If you want to avoid the cost of the
sort, don't specify ORDER BY where it doesn't matter.

-Kevin

Re: [planner] Ignore "order by" in subselect if parrent do count(*)

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Marcin Miros*aw<marcin@mejor.pl> wrote:
>> SELECT count(*)
>> from (select * from users_profile order by id) u_p;

>> "order by id" can be ignored by planner.

> This has been discussed before.  Certainly not all ORDER BY clauses
> within query steps can be ignored, so there would need to be code to
> determine whether it was actually useful, which wouldn't be free,
> either in terms of planning time or code maintenance.  It wasn't
> judged to be worth the cost.  If you want to avoid the cost of the
> sort, don't specify ORDER BY where it doesn't matter.

Considering that ORDER BY in a subquery isn't even legal per spec,
there does not seem to be any tenable argument for supposing that
a user wrote it there "by accident".  It's much more likely that
he had some semantic reason for it (say, an order-sensitive function
in a higher query level) and that we'd break his results by ignoring
the ORDER BY.  I doubt that very many of the possible reasons for
needing ordered output are reliably detectable by the planner, either.

            regards, tom lane

Re: [planner] Ignore "order by" in subselect if parrent do count(*)

From
Craig James
Date:
On Thu, Mar 1, 2012 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Marcin Miros*aw<marcin@mejor.pl> wrote:
>>> SELECT count(*)
>>> from (select * from users_profile order by id) u_p;
>
>>> "order by id" can be ignored by planner.
>
>> This has been discussed before.  Certainly not all ORDER BY clauses
>> within query steps can be ignored, so there would need to be code to
>> determine whether it was actually useful, which wouldn't be free,
>> either in terms of planning time or code maintenance.  It wasn't
>> judged to be worth the cost.  If you want to avoid the cost of the
>> sort, don't specify ORDER BY where it doesn't matter.
>
> Considering that ORDER BY in a subquery isn't even legal per spec,

That's surprising ... normally it won't affect the result, but with an
offset or limit it would.  Does the offset or limit change the "not
even legal" part?  Something like:

  select * from foo where foo_id in (select bar_id from bar order by
bar_id offset 10 limit 10);

Craig

> there does not seem to be any tenable argument for supposing that
> a user wrote it there "by accident".  It's much more likely that
> he had some semantic reason for it (say, an order-sensitive function
> in a higher query level) and that we'd break his results by ignoring
> the ORDER BY.  I doubt that very many of the possible reasons for
> needing ordered output are reliably detectable by the planner, either.
>
>                        regards, tom lane
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: [planner] Ignore "order by" in subselect if parrent do count(*)

From
Tom Lane
Date:
Craig James <cjames@emolecules.com> writes:
> On Thu, Mar 1, 2012 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Considering that ORDER BY in a subquery isn't even legal per spec,

> That's surprising ... normally it won't affect the result, but with an
> offset or limit it would.  Does the offset or limit change the "not
> even legal" part?

Well, actually, the SQL standard didn't have anything comparable to
offset/limit until SQL:2008, either.  But I have to take back my
statement above.  It wasn't legal in SQL99, but evidently they added it
in the 2003 or 2008 edition, presumably to go with the limit
functionality.

Anyway, the long and the short of it is that people depend on ORDER BY
in subqueries to be honored, and we're not going to break that.

            regards, tom lane