Thread: Order by optimisations?

Order by optimisations?

From
Christopher Kings-Lynne
Date:
Hi,

Does PostgreSQL do the following optimisation:

SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date;

or in fact even better (for my situation)

SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01' 
ORDER BY date;

Does it know that the input to the sort routine is already sorted and 
hence is a no-op?

Chris



Re: Order by optimisations?

From
Hannu Krosing
Date:
On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote:
> Hi,
> 
> Does PostgreSQL do the following optimisation:
> 
> SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date;
> 
> or in fact even better (for my situation)
> 
> SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01' 
> ORDER BY date;
> 
> Does it know that the input to the sort routine is already sorted and 
> hence is a no-op?

Yes

try EXPLAIN ;)

-- 
Hannu Krosing <hannu@skype.net>



Re: Order by optimisations?

From
Christopher Kings-Lynne
Date:

Hannu Krosing wrote:
> On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote:
> 
>>Hi,
>>
>>Does PostgreSQL do the following optimisation:
>>
>>SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date;
>>
>>or in fact even better (for my situation)
>>
>>SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01' 
>>ORDER BY date;
>>
>>Does it know that the input to the sort routine is already sorted and 
>>hence is a no-op?
> 
> 
> Yes
> 
> try EXPLAIN ;)

Doesn't seem like it does:

usatest=# explain select * from users_myfoods_map where 
date='2004-11-21' order by date;                                QUERY PLAN
--------------------------------------------------------------------------- Sort  (cost=17.17..17.48 rows=123 width=22)
 Sort Key: date   ->  Seq Scan on users_myfoods_map  (cost=0.00..12.90 rows=123 width=22)         Filter: (date =
'2004-11-21'::date)
(4 rows)

The sort cost is non-zero.  Or am I not looking at the right thing...

Chris



Re: Order by optimisations?

From
"Michael Paesold"
Date:
Christopher Kings-Lynne wrote:
> Doesn't seem like it does:
>
> usatest=# explain select * from users_myfoods_map where date='2004-11-21' 
> order by date;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Sort  (cost=17.17..17.48 rows=123 width=22)
>    Sort Key: date
>    ->  Seq Scan on users_myfoods_map  (cost=0.00..12.90 rows=123 width=22)
>          Filter: (date = '2004-11-21'::date)
> (4 rows)
>
> The sort cost is non-zero.  Or am I not looking at the right thing...

You are looking at the right thing, AFAIK. Well, it seems the planner cannot 
reason that if a field should have only one value, sorting on that field is 
not needed.

I remember there are examples where the planner will know that the input to 
a sort is already sorted and will skip the sort. Tom will be able to explain 
if this here is a reasonable optimization. I *guess* it could be done, with 
some restrictions.

Best Regards,
Michael Paesold 



Re: Order by optimisations?

From
"Andrew Dunstan"
Date:
Christopher Kings-Lynne said:
>

> usatest=# explain select * from users_myfoods_map where
> date='2004-11-21' order by date;

I assume that this is program generated SQL, as I hope a human would know
better than to write this. In which case, isn't the answer to improve the
generator rather than expect postgres to make up for its defficiencies?

cheers

andrew






Re: Order by optimisations?

From
Jochem van Dieten
Date:
On 7/14/05, Michael Paesold wrote:
> Christopher Kings-Lynne wrote:
>>
>> usatest=# explain select * from users_myfoods_map where date='2004-11-21'
>> order by date;
>>                                 QUERY PLAN
>> ---------------------------------------------------------------------------
>>  Sort  (cost=17.17..17.48 rows=123 width=22)
>>    Sort Key: date
>>    ->  Seq Scan on users_myfoods_map  (cost=0.00..12.90 rows=123 width=22)
>>          Filter: (date = '2004-11-21'::date)
>> (4 rows)
>>
>> The sort cost is non-zero.  Or am I not looking at the right thing...
>
> You are looking at the right thing, AFAIK. Well, it seems the planner cannot
> reason that if a field should have only one value, sorting on that field is
> not needed.

For the planner to deduct that, it should first deduct that the field
should only have one value. Is that a deduction the planner can even
make for this query if we consider for instance implicit timestamp to
date casting?


> I remember there are examples where the planner will know that the input to
>a sort is already sorted and will skip the sort.

The planner knows the output of an indexscan is sorted. With a proper
index on the "date" field (I hope that is not really the name) and
favourable statistics the planner should switch to an indexscan and
the order node should disappear.

Jochem


Re: Order by optimisations?

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote:
>> Does it know that the input to the sort routine is already sorted and 
>> hence is a no-op?

> Yes

No, but in most cases this will use an index and hence will assume that
the index is responsible for ordering.
        regards, tom lane


Re: Order by optimisations?

From
Christopher Kings-Lynne
Date:
> I assume that this is program generated SQL, as I hope a human would know
> better than to write this. In which case, isn't the answer to improve the
> generator rather than expect postgres to make up for its defficiencies?

Well, the issue in my case is we have user food diaries.  Usually, 
99.9999% of the time we pull up a single date of their diary.  However, 
for printing purposes we need a range.

It's a large query, so it's implemented as a simple PL/PSQL stored 
procedure.

I was trying to avoid having to c&p the entire stored proc to make a 
'range version'.

If PostgreSQL was smart enough to deal with a range of 1 day and a sort 
on it efficiently, I'd just use the range stored proc exclusively....

Chris



Re: Order by optimisations?

From
Christopher Kings-Lynne
Date:
>>>Does it know that the input to the sort routine is already sorted and 
>>>hence is a no-op?
> 
>>Yes
> 
> No, but in most cases this will use an index and hence will assume that
> the index is responsible for ordering.

OK, so what's going on here?

usa=> explain select * from users_myfoods_map where user_id=1 and 
date='2003-11-03' order by date;                                                    QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------
Sort (cost=4.84..4.85 rows=2 width=22)   Sort Key: date   ->  Index Scan using users_myfoods_map_user_id_date_key on 
 
users_myfoods_map  (cost=0.00..4.83 rows=2 width=22)         Index Cond: ((user_id = 1) AND (date =
'2003-11-03'::date))
(4 rows)

(That's on our enormous live table)

Chris



Re: Order by optimisations?

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> OK, so what's going on here?

> usa=> explain select * from users_myfoods_map where user_id=1 and 
> date='2003-11-03' order by date;
>                                                      QUERY PLAN 

> -------------------------------------------------------------------------------------------------------------------
>   Sort  (cost=4.84..4.85 rows=2 width=22)
>     Sort Key: date
>     ->  Index Scan using users_myfoods_map_user_id_date_key on 
> users_myfoods_map  (cost=0.00..4.83 rows=2 width=22)
>           Index Cond: ((user_id = 1) AND (date = '2003-11-03'::date))
> (4 rows)

Well, date evidently isn't the high-order key of this index.  But why
exactly are you worried about a sort of 2 rows?
        regards, tom lane


Re: Order by optimisations?

From
Christopher Kings-Lynne
Date:
> Well, date evidently isn't the high-order key of this index.  But why
> exactly are you worried about a sort of 2 rows?

Aha that's nailed it:

usa=> explain select * from users_myfoods_map where user_id=1 and date 
between '2003-11-03' and '2003-11-03' order by user_id, date;                                                 QUERY
PLAN
 

------------------------------------------------------------------------------------------------------------- Index
Scanusing users_myfoods_map_user_id_date_key on 
 
users_myfoods_map  (cost=0.00..3.78 rows=1 width=22)   Index Cond: ((user_id = 1) AND (date >= '2003-11-03'::date) AND

(date <= '2003-11-03'::date))
(2 rows)


I don't care about this particular result.  But imagine it running 
thousands of times a minute, with result sets between 0 and 50 rows...

Chris



Re: Order by optimisations?

From
Christopher Kings-Lynne
Date:
>>>Does it know that the input to the sort routine is already sorted and 
>>>hence is a no-op?
> 
> 
>>Yes
> 
> 
> No, but in most cases this will use an index and hence will assume that
> the index is responsible for ordering.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly