Thread: optimize query with a maximum(date) extraction
Hi all, I need to improve a query like : SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; Stupidly, I create a B-tree index on my_table(the_date), witch is logically not used in my query, because it's not with aconstant ? isn't it ? I know that I can't create a function index with an aggregative function. How I can do ? thanks, jsubei _____________________________________________________________________________ Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail
bad address kep his from going to the list on my first try ... apologies to the moderators.
-----Original Message-----
From: Gregory Williamson
Sent: Wed 9/5/2007 4:59 AM
To: JS Ubei; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] optimize query with a maximum(date) extraction
In order to help others help you, you might provide the following:
table description (columns, types, indexes) (\d tablename from psql does nicely)
the same query run as "EXPLAIN ANALYZE <your query here>;"
Greg Williamson
Senior DBA
GlobeXplorer LLC, a DigitalGlobe company
Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
(My corporate masters made me say this.)
-----Original Message-----
From: pgsql-performance-owner@postgresql.org on behalf of JS Ubei
Sent: Wed 9/5/2007 3:53 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] optimize query with a maximum(date) extraction
Hi all,
I need to improve a query like :
SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;
Stupidly, I create a B-tree index on my_table(the_date), witch is logically not used in my query, because it's not with a constant ? isn't it ?
I know that I can't create a function index with an aggregative function.
How I can do ?
thanks,
jsubei
_____________________________________________________________________________
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
"JS Ubei" <jsubei@yahoo.fr> writes: > Hi all, > > I need to improve a query like : > > SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; > > Stupidly, I create a B-tree index on my_table(the_date), witch is logically > not used in my query, because it's not with a constant ? isn't it ? That's not so stupid, it would be possible for a database to make use of such an index for this query. But it's not one of the plans Postgres knows how to execute. I don't think you'll find anything much faster for this particular query. You could profile running these two (non-standard) queries: SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC I think the first of these can actually use your index but the latter can't unless you create one for it specifically (which is not so easy -- it'll be easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the query you already have. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Gregory Stark" <stark@enterprisedb.com> writes: > "JS Ubei" <jsubei@yahoo.fr> writes: > >> I need to improve a query like : >> >> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; >... > I don't think you'll find anything much faster for this particular query. You > could profile running these two (non-standard) queries: > > SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC > SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC Something else you might try: select id, (select min(the_date) from my_table where id=x.id) as min_date, (select max(the_date) from my_table where id=x.id) as max_date from (select distinct id from my_table) Recent versions of Postgres do know how to use the index for a simple ungrouped min() or max() like these subqueries. This would be even better if you have a better source for the list of distinct ids you're interested in than my_table. If you have a source that just has one record for each id then you won't need an extra step to eliminate duplicates. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Gregory Stark" <stark@enterprisedb.com> writes:
> "JS Ubei" <jsubei@yahoo.fr> writes:
>
>> I need to improve a query like :
>>
>> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;
>...
> I don't think you'll find anything much faster for this particular query. You
> could profile running these two (non-standard) queries:
>
> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC
Something else you might try:
select id,
(select min(the_date) from my_table where id=x.id) as min_date,
(select max(the_date) from my_table where id= x.id) as max_date
from (select distinct id from my_table)
Recent versions of Postgres do know how to use the index for a simple
ungrouped min() or max() like these subqueries.
This would be even better if you have a better source for the list of distinct
ids you're interested in than my_table. If you have a source that just has one
record for each id then you won't need an extra step to eliminate duplicates.
My personal reaction is why are you using distinct at all?
why not
select id,
min(the_date) as min_date,
max(the_date) as max_date
from my_table group by id;
Since 8.0 or was it earlier this will use an index should a reasonable one exist.
Peter.
> > why not > > select id, > min(the_date) as min_date, > max(the_date) as max_date > from my_table group by id; > > Since 8.0 or was it earlier this will use an index should a reasonable one > exist. without any limits, seq scan is optimal. Regards Pavel Stehule
On Wed, Sep 05, 2007 at 12:30:21PM +0100, Gregory Stark wrote: > SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC > SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC > I think the first of these can actually use your index but the latter can't > unless you create one for it specifically (which is not so easy -- it'll be > easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the > query you already have. it's easy to fix the second query (fix to use index) - just change order by to: order by id desc, the_date desc. depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
"hubert depesz lubaczewski" <depesz@depesz.com> writes: > On Wed, Sep 05, 2007 at 12:30:21PM +0100, Gregory Stark wrote: >> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC >> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC >> I think the first of these can actually use your index but the latter can't >> unless you create one for it specifically (which is not so easy -- it'll be >> easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the >> query you already have. > > it's easy to fix the second query (fix to use index) - just change order > by to: > order by id desc, the_date desc. Cute. I didn't think of that, thanks -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Peter Childs" <peterachilds@gmail.com> writes: > My personal reaction is why are you using distinct at all? > > why not > > select id, > min(the_date) as min_date, > max(the_date) as max_date > from my_table group by id; > > Since 8.0 or was it earlier this will use an index should a reasonable one > exist. That's not true for this query. In fact that was precisely the original query he as looking to optimize. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Pavel Stehule" <pavel.stehule@gmail.com> writes: >> >> why not >> >> select id, >> min(the_date) as min_date, >> max(the_date) as max_date >> from my_table group by id; >> >> Since 8.0 or was it earlier this will use an index should a reasonable one >> exist. As I mentioned in the other post that's not true for this query. > without any limits, seq scan is optimal. That's not necessarily true either. You could have ten distinct ids and millions of dates for them. In that case a scan of the index which jumped around to scan from the beginning and end of each distinct id value would be faster. There's just no such plan type in Postgres currently. You can simulate such a plan with the subqueries I described but there's a bit more overhead than necessary and you need a reasonably efficient source of the distinct ids. Also it may or may not be faster than simply scanning the whole table like above and simulating it with subqueries makes it impossible to choose the best plan. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > You can simulate such a plan with the subqueries I described but > there's a bit more overhead than necessary and you need a reasonably > efficient source of the distinct ids. Yeah, that seems like the $64 question. If you have no better way of finding out the set of ID values than a seqscan, then there's no point in trying to optimize the min/max accumulation ... regards, tom lane
BTW, will it improve something if you change your index to "my_table( id, the_date )"? Rgds, -Dimitri On 9/5/07, JS Ubei <jsubei@yahoo.fr> wrote: > Hi all, > > I need to improve a query like : > > SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; > > Stupidly, I create a B-tree index on my_table(the_date), witch is logically > not used in my query, because it's not with a constant ? isn't it ? > > I know that I can't create a function index with an aggregative function. > > How I can do ? > > thanks, > > jsubei > > > > > > _____________________________________________________________________________ > Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >