Thread: Order by optimisations?
Hi, Does PostgreSQL do the following optimisation: SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date; or in fact even better (for my situation) SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01' ORDER BY date; Does it know that the input to the sort routine is already sorted and hence is a no-op? Chris
On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote: > Hi, > > Does PostgreSQL do the following optimisation: > > SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date; > > or in fact even better (for my situation) > > SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01' > ORDER BY date; > > Does it know that the input to the sort routine is already sorted and > hence is a no-op? Yes try EXPLAIN ;) -- Hannu Krosing <hannu@skype.net>
Hannu Krosing wrote: > On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote: > >>Hi, >> >>Does PostgreSQL do the following optimisation: >> >>SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date; >> >>or in fact even better (for my situation) >> >>SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01' >>ORDER BY date; >> >>Does it know that the input to the sort routine is already sorted and >>hence is a no-op? > > > Yes > > try EXPLAIN ;) Doesn't seem like it does: usatest=# explain select * from users_myfoods_map where date='2004-11-21' order by date; QUERY PLAN --------------------------------------------------------------------------- Sort (cost=17.17..17.48 rows=123 width=22) Sort Key: date -> Seq Scan on users_myfoods_map (cost=0.00..12.90 rows=123 width=22) Filter: (date = '2004-11-21'::date) (4 rows) The sort cost is non-zero. Or am I not looking at the right thing... Chris
Christopher Kings-Lynne wrote: > Doesn't seem like it does: > > usatest=# explain select * from users_myfoods_map where date='2004-11-21' > order by date; > QUERY PLAN > --------------------------------------------------------------------------- > Sort (cost=17.17..17.48 rows=123 width=22) > Sort Key: date > -> Seq Scan on users_myfoods_map (cost=0.00..12.90 rows=123 width=22) > Filter: (date = '2004-11-21'::date) > (4 rows) > > The sort cost is non-zero. Or am I not looking at the right thing... You are looking at the right thing, AFAIK. Well, it seems the planner cannot reason that if a field should have only one value, sorting on that field is not needed. I remember there are examples where the planner will know that the input to a sort is already sorted and will skip the sort. Tom will be able to explain if this here is a reasonable optimization. I *guess* it could be done, with some restrictions. Best Regards, Michael Paesold
Christopher Kings-Lynne said: > > usatest=# explain select * from users_myfoods_map where > date='2004-11-21' order by date; I assume that this is program generated SQL, as I hope a human would know better than to write this. In which case, isn't the answer to improve the generator rather than expect postgres to make up for its defficiencies? cheers andrew
On 7/14/05, Michael Paesold wrote: > Christopher Kings-Lynne wrote: >> >> usatest=# explain select * from users_myfoods_map where date='2004-11-21' >> order by date; >> QUERY PLAN >> --------------------------------------------------------------------------- >> Sort (cost=17.17..17.48 rows=123 width=22) >> Sort Key: date >> -> Seq Scan on users_myfoods_map (cost=0.00..12.90 rows=123 width=22) >> Filter: (date = '2004-11-21'::date) >> (4 rows) >> >> The sort cost is non-zero. Or am I not looking at the right thing... > > You are looking at the right thing, AFAIK. Well, it seems the planner cannot > reason that if a field should have only one value, sorting on that field is > not needed. For the planner to deduct that, it should first deduct that the field should only have one value. Is that a deduction the planner can even make for this query if we consider for instance implicit timestamp to date casting? > I remember there are examples where the planner will know that the input to >a sort is already sorted and will skip the sort. The planner knows the output of an indexscan is sorted. With a proper index on the "date" field (I hope that is not really the name) and favourable statistics the planner should switch to an indexscan and the order node should disappear. Jochem
Hannu Krosing <hannu@skype.net> writes: > On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote: >> Does it know that the input to the sort routine is already sorted and >> hence is a no-op? > Yes No, but in most cases this will use an index and hence will assume that the index is responsible for ordering. regards, tom lane
> I assume that this is program generated SQL, as I hope a human would know > better than to write this. In which case, isn't the answer to improve the > generator rather than expect postgres to make up for its defficiencies? Well, the issue in my case is we have user food diaries. Usually, 99.9999% of the time we pull up a single date of their diary. However, for printing purposes we need a range. It's a large query, so it's implemented as a simple PL/PSQL stored procedure. I was trying to avoid having to c&p the entire stored proc to make a 'range version'. If PostgreSQL was smart enough to deal with a range of 1 day and a sort on it efficiently, I'd just use the range stored proc exclusively.... Chris
>>>Does it know that the input to the sort routine is already sorted and >>>hence is a no-op? > >>Yes > > No, but in most cases this will use an index and hence will assume that > the index is responsible for ordering. OK, so what's going on here? usa=> explain select * from users_myfoods_map where user_id=1 and date='2003-11-03' order by date; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Sort (cost=4.84..4.85 rows=2 width=22) Sort Key: date -> Index Scan using users_myfoods_map_user_id_date_key on users_myfoods_map (cost=0.00..4.83 rows=2 width=22) Index Cond: ((user_id = 1) AND (date = '2003-11-03'::date)) (4 rows) (That's on our enormous live table) Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > OK, so what's going on here? > usa=> explain select * from users_myfoods_map where user_id=1 and > date='2003-11-03' order by date; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------- > Sort (cost=4.84..4.85 rows=2 width=22) > Sort Key: date > -> Index Scan using users_myfoods_map_user_id_date_key on > users_myfoods_map (cost=0.00..4.83 rows=2 width=22) > Index Cond: ((user_id = 1) AND (date = '2003-11-03'::date)) > (4 rows) Well, date evidently isn't the high-order key of this index. But why exactly are you worried about a sort of 2 rows? regards, tom lane
> Well, date evidently isn't the high-order key of this index. But why > exactly are you worried about a sort of 2 rows? Aha that's nailed it: usa=> explain select * from users_myfoods_map where user_id=1 and date between '2003-11-03' and '2003-11-03' order by user_id, date; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Index Scanusing users_myfoods_map_user_id_date_key on users_myfoods_map (cost=0.00..3.78 rows=1 width=22) Index Cond: ((user_id = 1) AND (date >= '2003-11-03'::date) AND (date <= '2003-11-03'::date)) (2 rows) I don't care about this particular result. But imagine it running thousands of times a minute, with result sets between 0 and 50 rows... Chris
>>>Does it know that the input to the sort routine is already sorted and >>>hence is a no-op? > > >>Yes > > > No, but in most cases this will use an index and hence will assume that > the index is responsible for ordering. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly