Thread: optimize query with a maximum(date) extraction

optimize query with a maximum(date) extraction

From
JS Ubei
Date:
Hi all,

I need to improve a query like :

SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;

Stupidly, I create a B-tree index on my_table(the_date), witch is logically not used in my query, because it's not with
aconstant ? isn't it ? 

I know that I can't create a function index with an aggregative function.

How I can do ?

thanks,

jsubei




      _____________________________________________________________________________
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail

FW: optimize query with a maximum(date) extraction

From
"Gregory Williamson"
Date:

bad address kep his from going to the list on my first try ... apologies to the moderators.

-----Original Message-----
From: Gregory Williamson
Sent: Wed 9/5/2007 4:59 AM
To: JS Ubei; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] optimize query with a maximum(date) extraction

In order to help others help you, you might provide the following:

table description (columns, types, indexes) (\d tablename from psql does nicely)

the same query run as "EXPLAIN ANALYZE <your query here>;"

Greg Williamson
Senior DBA
GlobeXplorer LLC, a DigitalGlobe company

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)



-----Original Message-----
From: pgsql-performance-owner@postgresql.org on behalf of JS Ubei
Sent: Wed 9/5/2007 3:53 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] optimize query with a maximum(date) extraction

Hi all,

I need to improve a query like :

SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;

Stupidly, I create a B-tree index on my_table(the_date), witch is logically not used in my query, because it's not with a constant ? isn't it ?

I know that I can't create a function index with an aggregative function.

How I can do ?

thanks,

jsubei




      _____________________________________________________________________________
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


Re: optimize query with a maximum(date) extraction

From
Gregory Stark
Date:
"JS Ubei" <jsubei@yahoo.fr> writes:

> Hi all,
>
> I need to improve a query like :
>
> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;
>
> Stupidly, I create a B-tree index on my_table(the_date), witch is logically
> not used in my query, because it's not with a constant ? isn't it ?

That's not so stupid, it would be possible for a database to make use of such
an index for this query. But it's not one of the plans Postgres knows how to
execute.

I don't think you'll find anything much faster for this particular query. You
could profile running these two (non-standard) queries:

SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC

I think the first of these can actually use your index but the latter can't
unless you create one for it specifically (which is not so easy -- it'll be
easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the
query you already have.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: optimize query with a maximum(date) extraction

From
Gregory Stark
Date:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "JS Ubei" <jsubei@yahoo.fr> writes:
>
>> I need to improve a query like :
>>
>> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;
>...
> I don't think you'll find anything much faster for this particular query. You
> could profile running these two (non-standard) queries:
>
> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC

Something else you might try:

select id,
       (select min(the_date) from my_table where id=x.id) as min_date,
       (select max(the_date) from my_table where id=x.id) as max_date
  from (select distinct id from my_table)

Recent versions of Postgres do know how to use the index for a simple
ungrouped min() or max() like these subqueries.

This would be even better if you have a better source for the list of distinct
ids you're interested in than my_table. If you have a source that just has one
record for each id then you won't need an extra step to eliminate duplicates.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: optimize query with a maximum(date) extraction

From
"Peter Childs"
Date:


On 05/09/07, Gregory Stark <stark@enterprisedb.com> wrote:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "JS Ubei" <jsubei@yahoo.fr> writes:
>
>> I need to improve a query like :
>>
>> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;
>...
> I don't think you'll find anything much faster for this particular query. You
> could profile running these two (non-standard) queries:
>
> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC

Something else you might try:

select id,
       (select min(the_date) from my_table where id=x.id) as min_date,
       (select max(the_date) from my_table where id= x.id) as max_date
  from (select distinct id from my_table)

Recent versions of Postgres do know how to use the index for a simple
ungrouped min() or max() like these subqueries.

This would be even better if you have a better source for the list of distinct
ids you're interested in than my_table. If you have a source that just has one
record for each id then you won't need an extra step to eliminate duplicates.


My personal reaction is why are you using distinct at all?

why not

select id,
       min(the_date) as min_date,
       max(the_date) as max_date
  from my_table group by id;

Since 8.0 or was it earlier this will use an index should a reasonable one exist.

Peter.



Re: optimize query with a maximum(date) extraction

From
"Pavel Stehule"
Date:
>
> why not
>
> select id,
>        min(the_date) as min_date,
>        max(the_date) as max_date
>   from my_table group by id;
>
> Since 8.0 or was it earlier this will use an index should a reasonable one
> exist.

without any limits, seq scan is optimal.

Regards
Pavel Stehule

Re: optimize query with a maximum(date) extraction

From
hubert depesz lubaczewski
Date:
On Wed, Sep 05, 2007 at 12:30:21PM +0100, Gregory Stark wrote:
> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC
> I think the first of these can actually use your index but the latter can't
> unless you create one for it specifically (which is not so easy -- it'll be
> easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the
> query you already have.

it's easy to fix the second query (fix to use index) - just change order
by to:
order by id desc, the_date desc.

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: optimize query with a maximum(date) extraction

From
Gregory Stark
Date:
"hubert depesz lubaczewski" <depesz@depesz.com> writes:

> On Wed, Sep 05, 2007 at 12:30:21PM +0100, Gregory Stark wrote:
>> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
>> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC
>> I think the first of these can actually use your index but the latter can't
>> unless you create one for it specifically (which is not so easy -- it'll be
>> easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the
>> query you already have.
>
> it's easy to fix the second query (fix to use index) - just change order
> by to:
> order by id desc, the_date desc.

Cute. I didn't think of that, thanks

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: optimize query with a maximum(date) extraction

From
Gregory Stark
Date:
"Peter Childs" <peterachilds@gmail.com> writes:

> My personal reaction is why are you using distinct at all?
>
> why not
>
> select id,
>        min(the_date) as min_date,
>        max(the_date) as max_date
>   from my_table group by id;
>
> Since 8.0 or was it earlier this will use an index should a reasonable one
> exist.

That's not true for this query. In fact that was precisely the original query
he as looking to optimize.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: optimize query with a maximum(date) extraction

From
Gregory Stark
Date:
"Pavel Stehule" <pavel.stehule@gmail.com> writes:

>>
>> why not
>>
>> select id,
>>        min(the_date) as min_date,
>>        max(the_date) as max_date
>>   from my_table group by id;
>>
>> Since 8.0 or was it earlier this will use an index should a reasonable one
>> exist.

As I mentioned in the other post that's not true for this query.

> without any limits, seq scan is optimal.

That's not necessarily true either. You could have ten distinct ids and
millions of dates for them. In that case a scan of the index which jumped
around to scan from the beginning and end of each distinct id value would be
faster. There's just no such plan type in Postgres currently.

You can simulate such a plan with the subqueries I described but there's a bit
more overhead than necessary and you need a reasonably efficient source of the
distinct ids. Also it may or may not be faster than simply scanning the whole
table like above and simulating it with subqueries makes it impossible to
choose the best plan.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: optimize query with a maximum(date) extraction

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> You can simulate such a plan with the subqueries I described but
> there's a bit more overhead than necessary and you need a reasonably
> efficient source of the distinct ids.

Yeah, that seems like the $64 question.  If you have no better way of
finding out the set of ID values than a seqscan, then there's no point
in trying to optimize the min/max accumulation ...

            regards, tom lane

Re: optimize query with a maximum(date) extraction

From
Dimitri
Date:
BTW, will it improve something if you change your index to "my_table(
id, the_date )"?

Rgds,
-Dimitri

On 9/5/07, JS Ubei <jsubei@yahoo.fr> wrote:
> Hi all,
>
> I need to improve a query like :
>
> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;
>
> Stupidly, I create a B-tree index on my_table(the_date), witch is logically
> not used in my query, because it's not with a constant ? isn't it ?
>
> I know that I can't create a function index with an aggregative function.
>
> How I can do ?
>
> thanks,
>
> jsubei
>
>
>
>
>
> _____________________________________________________________________________
> Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>