Thread: Speeding up Aggregates

Speeding up Aggregates

From

Dror Matalon

Date:

03 October 2003, 17:40:42

Hi,

I have a query that ran quite well initially, but slowed down quite a
bit once I introduced an aggregate into the equation. The average
execution time went up from around 15 msec to around 300 msec.

The original query fetches a bunch of articles:

select articlenumber, channel, description, title, link, dtstamp  from
    items, my_channels where items.channel = '22222' and my_channels.id =
    '22222' and owner = 'drormata'  and dtstamp > last_viewed and
    articlenumber not in (select item from viewed_items where channel
    ='22222' and owner = 'drormata');


I then added a call to a function:

and (dtstamp = item_max_date(22222, link))


item_max_date() looks like this:
   select max(dtstamp) from items where channel = $1 and link = $2;

This should eliminate duplicate articles and only show the most recent
one.

resulting in the following query

select articlenumber, channel, description, title, link, dtstamp  from
    items, my_channels where items.channel = '22222' and my_channels.id =
    '22222' and owner = 'drormata'  and dtstamp > last_viewed and
    articlenumber not in (select item from viewed_items where channel
    ='22222' and owner = 'drormata') and (dtstamp = item_max_date(22222,
    link));



Any suggestions on optimizing the query/function? It makes sense that
it slowed down, but I wonder if I can do better.

I'm including index list as well as "explain analyze" of both versions.

Indexes:
    "item_channel_link" btree (channel, link)
    "item_created" btree (dtstamp)
    "item_signature" btree (signature)
    "items_channel_article" btree (channel, articlenumber)


explain analyze select articlenumber, channel, description, title, link, dtstamp  from items, my_channels where
items.channel= '22222' and my_channels.id = '22222' and owner = 'drormata'  and dtstamp > last_viewed and articlenumber
notin (select item from viewed_items where channel ='22222' and owner = 'drormata');
                                                          QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.19..6982.58 rows=302 width=259) (actual time=16.95..17.16 rows=8 loops=1)
   Join Filter: ("inner".dtstamp > "outer".last_viewed)
   ->  Seq Scan on my_channels  (cost=0.00..3.23 rows=1 width=8) (actual time=0.36..0.38 rows=1 loops=1)
         Filter: ((id = 22222) AND (("owner")::text = 'drormata'::text))
   ->  Index Scan using items_channel_article on items  (cost=8.19..6968.05 rows=904 width=259) (actual
time=0.68..13.94rows=899 loops=1) 
         Index Cond: (channel = 22222)
         Filter: (NOT (hashed subplan))
         SubPlan
           ->  Seq Scan on viewed_items  (cost=0.00..8.19 rows=2 width=4) (actual time=0.48..0.48 rows=0 loops=1)
                 Filter: ((channel = 22222) AND (("owner")::text = 'drormata'::text))
 Total runtime: 17.42 msec
(11 rows)


explain analyze select articlenumber, channel, description, title, link, dtstamp  from items, my_channels where
items.channel= '22222' and my_channels.id = '22222' and owner = 'drormata'  and dtstamp > last_viewed and articlenumber
notin (select item from viewed_items where channel ='22222' and owner = 'drormata') and (dtstamp = item_max_date(22222,
link));
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.19..6980.33 rows=1 width=259) (actual time=262.94..265.14 rows=7 loops=1)
   Join Filter: ("outer".dtstamp > "inner".last_viewed)
   ->  Index Scan using items_channel_article on items  (cost=8.19..6977.08 rows=1 width=259) (actual time=1.94..150.55
rows=683loops=1) 
         Index Cond: (channel = 22222)
         Filter: ((dtstamp = item_max_date(22222, link)) AND (NOT (hashed subplan)))
         SubPlan
           ->  Seq Scan on viewed_items  (cost=0.00..8.19 rows=2 width=4) (actual time=0.43..0.43 rows=0 loops=1)
                 Filter: ((channel = 22222) AND (("owner")::text = 'drormata'::text))
   ->  Seq Scan on my_channels  (cost=0.00..3.23 rows=1 width=8) (actual time=0.14..0.15 rows=1 loops=683)
         Filter: ((id = 22222) AND (("owner")::text = 'drormata'::text))
 Total runtime: 265.39 msec




--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Speeding up Aggregates

From

Josh Berkus

Date:

03 October 2003, 18:09:53

Dror,

> select articlenumber, channel, description, title, link, dtstamp  from
>     items, my_channels where items.channel = '22222' and my_channels.id =
>     '22222' and owner = 'drormata'  and dtstamp > last_viewed and
>     articlenumber not in (select item from viewed_items where channel
>     ='22222' and owner = 'drormata');

the NOT IN is a bad idea unless the subselect never returns more than a
handful of rows.  If viewed_items can grow to dozens  of rows, wyou should
use WHERE NOT EXISTS instead.  Unless you're using 7.4.

> item_max_date() looks like this:
>    select max(dtstamp) from items where channel = $1 and link = $2;

Change it to

SELECT dtstamp from iterm where channel = $1 and link = $2
ORDER BY dtstamp DESC LIMIT 1

and possibly build an index on channel, link, dtstamp

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco

Re: Speeding up Aggregates

From

Dror Matalon

Date:

03 October 2003, 18:29:38

Hi Josh,

On Fri, Oct 03, 2003 at 02:07:10PM -0700, Josh Berkus wrote:
> Dror,
>
> > select articlenumber, channel, description, title, link, dtstamp  from
> >     items, my_channels where items.channel = '22222' and my_channels.id =
> >     '22222' and owner = 'drormata'  and dtstamp > last_viewed and
> >     articlenumber not in (select item from viewed_items where channel
> >     ='22222' and owner = 'drormata');
>
> the NOT IN is a bad idea unless the subselect never returns more than a
> handful of rows.  If viewed_items can grow to dozens  of rows, wyou should
> use WHERE NOT EXISTS instead.  Unless you're using 7.4.
>

I am using 7.4, and had tried NOT EXISTS and didn't see any
improvements.

> > item_max_date() looks like this:
> >    select max(dtstamp) from items where channel = $1 and link = $2;
>
> Change it to
>
> SELECT dtstamp from iterm where channel = $1 and link = $2
> ORDER BY dtstamp DESC LIMIT 1
>

Didn't make a difference. And plugging real values into this query as
well as into  the original
  select max(dtstamp) from items where channel = $1 and link = $2;

and doing an explain analyze shows that the cost is the same. The
strange things is that when I run the above queries by hand they take
about .5 msec. Yet on a resultset that fetches 5 rows, I go up from 15
msec to 300 msec. It would seem like it should be something like 15 +
(0.5 * 5) + small overhead, = 30 msec or so rather than the 300 I'm
seeing.

> and possibly build an index on channel, link, dtstamp

Didn't make a difference either. Explain analyze shows that it didn't
use it.

>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Speeding up Aggregates

From

Josh Berkus

Date:

03 October 2003, 18:38:30

Dror,

> I am using 7.4, and had tried NOT EXISTS and didn't see any
> improvements.

It wouldn't if you're using 7.4, which has improved IN performance immensely.

What happens if you stop using a function and instead use a subselect?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco

Re: Speeding up Aggregates

From

Rod Taylor

Date:

03 October 2003, 18:45:32

> item_max_date() looks like this:
>    select max(dtstamp) from items where channel = $1 and link = $2;

It is too bad the (channel, link) index doesn't have dtstamp at the end
of it, otherwise the below query would be a gain (might be a small one
anyway).

  select dtstamp
    from items
   where channel = $1
     and link = $2
ORDER BY dtstamp DESC
   LIMIT 1;


Could you show us the exact specification of the function?  In
particular, did you mark it VOLATILE, IMMUTABLE, or STABLE?

I hope it isn't the first or second one ;)

Attachment

signature.asc

Re: Speeding up Aggregates

From

Dror Matalon

Date:

03 October 2003, 18:54:36

On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > item_max_date() looks like this:
> >    select max(dtstamp) from items where channel = $1 and link = $2;
>
> It is too bad the (channel, link) index doesn't have dtstamp at the end
> of it, otherwise the below query would be a gain (might be a small one
> anyway).
>
>   select dtstamp
>     from items
>    where channel = $1
>      and link = $2
> ORDER BY dtstamp DESC
>    LIMIT 1;

Similar idea to what Josh suggested. I did create an additional index
with dtstamp at the end and it doesn't look like the planner used it.
Using the above query instead of max() didn't improve things either.

>
>
> Could you show us the exact specification of the function?  In
> particular, did you mark it VOLATILE, IMMUTABLE, or STABLE?
>
> I hope it isn't the first or second one ;)

CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2;
' LANGUAGE 'sql';




--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Speeding up Aggregates

From

Dror Matalon

Date:

03 October 2003, 19:04:47

On Fri, Oct 03, 2003 at 02:35:46PM -0700, Josh Berkus wrote:
> Dror,
>
> > I am using 7.4, and had tried NOT EXISTS and didn't see any
> > improvements.
>
> It wouldn't if you're using 7.4, which has improved IN performance immensely.
>
> What happens if you stop using a function and instead use a subselect?

An improvement. Now I'm getting in the 200 msec response time.

And by the way, I tried "not exists" again and it actually runs slower
than "not in."

>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Speeding up Aggregates

From

Rod Taylor

Date:

03 October 2003, 19:10:35

On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
> On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > > item_max_date() looks like this:
> > >    select max(dtstamp) from items where channel = $1 and link = $2;
> >
> > It is too bad the (channel, link) index doesn't have dtstamp at the end
> > of it, otherwise the below query would be a gain (might be a small one
> > anyway).
> >
> >   select dtstamp
> >     from items
> >    where channel = $1
> >      and link = $2
> > ORDER BY dtstamp DESC
> >    LIMIT 1;

It didn't make a difference even with the 3 term index? I guess you
don't have very many common values for channel / link combination.



How about the below? Note the word STABLE on the end.

CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2;
' LANGUAGE 'sql' STABLE;

Attachment

signature.asc

Re: Speeding up Aggregates

From

Dror Matalon

Date:

03 October 2003, 19:17:44

On Fri, Oct 03, 2003 at 06:10:29PM -0400, Rod Taylor wrote:
> On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
> > On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > > > item_max_date() looks like this:
> > > >    select max(dtstamp) from items where channel = $1 and link = $2;
> > >
> > > It is too bad the (channel, link) index doesn't have dtstamp at the end
> > > of it, otherwise the below query would be a gain (might be a small one
> > > anyway).
> > >
> > >   select dtstamp
> > >     from items
> > >    where channel = $1
> > >      and link = $2
> > > ORDER BY dtstamp DESC
> > >    LIMIT 1;
>
> It didn't make a difference even with the 3 term index? I guess you
> don't have very many common values for channel / link combination.

There's no noticeable difference between two term and three term
indexes.

>
>
>
> How about the below? Note the word STABLE on the end.
>
> CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
> timestamptz AS '
> select max(dtstamp) from items where channel = $1 and link = $2;
> ' LANGUAGE 'sql' STABLE;

Made no difference.



--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Speeding up Aggregates

From

Rod Taylor

Date:

03 October 2003, 20:32:25

> > I hope it isn't the first or second one ;)
>
> CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
> timestamptz AS '
> select max(dtstamp) from items where channel = $1 and link = $2;
> ' LANGUAGE 'sql';


How about the below?

CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2;
' LANGUAGE 'sql' STABLE;

Attachment

signature.asc

Re: Speeding up Aggregates

From

Greg Stark

Date:

08 October 2003, 11:55:18

Rod Taylor <rbt@rbt.ca> writes:

> On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
> > On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > >
> > > It is too bad the (channel, link) index doesn't have dtstamp at the end
> > > of it, otherwise the below query would be a gain (might be a small one
> > > anyway).
> > >
> > >   select dtstamp
> > >     from items
> > >    where channel = $1
> > >      and link = $2
> > > ORDER BY dtstamp DESC
> > >    LIMIT 1;
>
> It didn't make a difference even with the 3 term index? I guess you
> don't have very many common values for channel / link combination.

You need to do:

 ORDER BY channel DESC, link DESC, dtstamp DESC

This is an optimizer nit. It doesn't notice that since it selected on channel
and link already the remaining tuples in the index will be ordered simply by
dtstamp.

(This is the thing i pointed out previously in
<87el6ckrlu.fsf@stark.dyndns.tv> on Feb 13th 2003 on pgsql-general)


--
greg

Re: Speeding up Aggregates

From

Dror Matalon

Date:

08 October 2003, 15:21:16

Actually what finally sovled the problem is repeating the
dtstamp > last_viewed
in the sub select

select articlenumber, channel, description, title, link, dtstamp  from items i1, my_channels where ((i1.channel =
'22222'and 
my_channels.id = '22222' and owner = 'drormata'  and (dtstamp > last_viewed)) ) and (dtstamp = (select max (dtstamp)
fromitems i2  
    where channel = '22222' and i1.link = i2.link));

to
explain analyze select articlenumber, channel, description, title, link, dtstamp  from items i1, my_channels where
((i1.channel= '22222' and 
my_channels.id = '22222' and owner = 'drormata'  and (dtstamp > last_viewed)) ) and (dtstamp = (select max (dtstamp)
fromitems i2 where 
channel = '22222' and i1.link = i2.link and dtstamp > last_viewed));

Which in the stored procedure looks like this:
CREATE or REPLACE FUNCTION item_max_date (int4, varchar, timestamptz)
RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2 and
dtstamp > $3;
' LANGUAGE 'sql';


Basically I have hundreds or thousands of items but only a few that
satisfy "dtstamp > last_viewed". Obviously I want to run the max() only on
on a few items. Repeating "dtstamp > last_viewed" did the trick, but it
seems like there should be a more elegant/clear way to tell the planner
which constraint to apply first.

Dror



On Wed, Oct 08, 2003 at 10:54:24AM -0400, Greg Stark wrote:
> Rod Taylor <rbt@rbt.ca> writes:
>
> > On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
> > > On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > > >
> > > > It is too bad the (channel, link) index doesn't have dtstamp at the end
> > > > of it, otherwise the below query would be a gain (might be a small one
> > > > anyway).
> > > >
> > > >   select dtstamp
> > > >     from items
> > > >    where channel = $1
> > > >      and link = $2
> > > > ORDER BY dtstamp DESC
> > > >    LIMIT 1;
> >
> > It didn't make a difference even with the 3 term index? I guess you
> > don't have very many common values for channel / link combination.
>
> You need to do:
>
>  ORDER BY channel DESC, link DESC, dtstamp DESC
>
> This is an optimizer nit. It doesn't notice that since it selected on channel
> and link already the remaining tuples in the index will be ordered simply by
> dtstamp.
>
> (This is the thing i pointed out previously in
> <87el6ckrlu.fsf@stark.dyndns.tv> on Feb 13th 2003 on pgsql-general)
>
>
> --
> greg
>

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Speeding up Aggregates

From

Greg Stark

Date:

09 October 2003, 20:07:55

Dror Matalon <dror@zapatec.com> writes:

> Actually what finally sovled the problem is repeating the
> dtstamp > last_viewed
> in the sub select

That will at least convince the optimizer to use an index range lookup. But it
still will have to scan every record that matches channel==$1, link==$2, and
dtstamp>$3.

The trick of using limit 1 will be faster still as it only has to retrieve a
single record using the index. But you have to be sure to convince it to use
the index and the way to do that is to list exactly the same columns in the
ORDER BY as are in the index definition.

Even if some of the leading columns are redundant because they'll be constant
for all of the records retrieved. The optimizer doesn't know to ignore those.

> > (This is the thing i pointed out previously in
> > <87el6ckrlu.fsf@stark.dyndns.tv> on Feb 13th 2003 on pgsql-general)

--
greg

Re: Speeding up Aggregates

From

Dror Matalon

Date:

09 October 2003, 21:45:46

On Thu, Oct 09, 2003 at 07:07:00PM -0400, Greg Stark wrote:
> Dror Matalon <dror@zapatec.com> writes:
>
> > Actually what finally sovled the problem is repeating the
> > dtstamp > last_viewed
> > in the sub select
>
> That will at least convince the optimizer to use an index range lookup. But it
> still will have to scan every record that matches channel==$1, link==$2, and
> dtstamp>$3.
>
> The trick of using limit 1 will be faster still as it only has to retrieve a
> single record using the index. But you have to be sure to convince it to use

How is doing order by limit 1 faster than doing max()? Seems like the
optimizer will need to sort or scan the data set either way. That part
didn't actually make a difference in my specific case.

> the index and the way to do that is to list exactly the same columns in the
> ORDER BY as are in the index definition.
>
> Even if some of the leading columns are redundant because they'll be constant
> for all of the records retrieved. The optimizer doesn't know to ignore those.

The main problem in my case was that the optimizer was doing the max()
on all 700 rows, rather than the filtered rows. It's not until I put the
"dtstamp> last_viewed" in the sub select as well as in the main query
that it realized that it can first filter the 696 rows out and then to
the max() on the 4 rows that satisfied this constraint.

That was the big saving.

Hope this all makes sense,

Dror
>
> > > (This is the thing i pointed out previously in
> > > <87el6ckrlu.fsf@stark.dyndns.tv> on Feb 13th 2003 on pgsql-general)
>
> --
> greg
>

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Speeding up Aggregates

From

Bruno Wolff III

Date:

09 October 2003, 22:28:38

On Thu, Oct 09, 2003 at 17:44:46 -0700,
  Dror Matalon <dror@zapatec.com> wrote:
>
> How is doing order by limit 1 faster than doing max()? Seems like the
> optimizer will need to sort or scan the data set either way. That part
> didn't actually make a difference in my specific case.

max() will never be evaluated by using an index to find the greatest value.
So in many cases using order by and limit 1 is faster.

Re: Speeding up Aggregates

From

Dror Matalon

Date:

10 October 2003, 00:56:27

On Thu, Oct 09, 2003 at 08:35:22PM -0500, Bruno Wolff III wrote:
> On Thu, Oct 09, 2003 at 17:44:46 -0700,
>   Dror Matalon <dror@zapatec.com> wrote:
> >
> > How is doing order by limit 1 faster than doing max()? Seems like the
> > optimizer will need to sort or scan the data set either way. That part
> > didn't actually make a difference in my specific case.
>
> max() will never be evaluated by using an index to find the greatest value.
> So in many cases using order by and limit 1 is faster.

Ouch. I just double checked and you're right. Is this considered a bug,
or just an implementation issue?

While I've seen this hint a few times in the lists, it seems like it's
one of those magic incantations that those in the know, know about, and
that people new to postgres are going to be surprised by the need to use
this idiom.

Regards,

Dror

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Speeding up Aggregates

From

Greg Stark

Date:

10 October 2003, 01:50:21

Dror Matalon <dror@zapatec.com> writes:

> Ouch. I just double checked and you're right. Is this considered a bug,
> or just an implementation issue?

Call it a wishlist bug. The problem is it would be a hard feature to implement
properly. And none of the people paid to work on postgres by various companies
seem to have this on their to-do lists. So don't expect it in the near future.

> While I've seen this hint a few times in the lists, it seems like it's
> one of those magic incantations that those in the know, know about, and
> that people new to postgres are going to be surprised by the need to use
> this idiom.

Yup. Though it's in the FAQ and comes up on the mailing list about once a week
or so, so it's hard to see how to document it any better. Perhaps a warning
specifically on the min/max functions in the documentation?

Say, what do people think about a comment board thing like php.net has
attached to the documentation. People can add comments that show up directly
on the bottom of the documentation for each function. I find it's mostly full
of junk but skimming the comments often turns up one or two relevant warnings,
especially when I'm wondering why something's not behaving the way I expect.

--
greg

Re: Speeding up Aggregates

From

Christopher Kings-Lynne

Date:

10 October 2003, 02:31:56

> Say, what do people think about a comment board thing like php.net has
> attached to the documentation. People can add comments that show up directly
> on the bottom of the documentation for each function. I find it's mostly full
> of junk but skimming the comments often turns up one or two relevant warnings,
> especially when I'm wondering why something's not behaving the way I expect.

I thought we had that:

http://www.postgresql.org/docs/7.3/interactive/functions-aggregate.html

...and someone has already made the comment.

Chris

Re: Speeding up Aggregates

From

Peter Childs

Date:

10 October 2003, 05:52:13

On Thu, 9 Oct 2003, Dror Matalon wrote:

> On Thu, Oct 09, 2003 at 07:07:00PM -0400, Greg Stark wrote:
> > Dror Matalon <dror@zapatec.com> writes:
> >
> > > Actually what finally sovled the problem is repeating the
> > > dtstamp > last_viewed
> > > in the sub select
> >
> > That will at least convince the optimizer to use an index range lookup. But it
> > still will have to scan every record that matches channel==$1, link==$2, and
> > dtstamp>$3.
> >
> > The trick of using limit 1 will be faster still as it only has to retrieve a
> > single record using the index. But you have to be sure to convince it to use
>
> How is doing order by limit 1 faster than doing max()? Seems like the
> optimizer will need to sort or scan the data set either way. That part
> didn't actually make a difference in my specific case.
>

max(field) = sequential scan looking for the hightest.

order by field desc limit 1 = index scan (if available), read first
record.
    else (if no index) sequential scan for highest.

    aggregates don't use indexes because its only appilicable for
max() and min() and can't be done for sum(), count(), etc writing an
alogorithim to use the index would be complex as you would need to tell
the optimized from the inside a function (you can write aggrate functions
your self if you wish) to do somthing slighly differently.

for my large table....
select max(field) from table; (5264.21 msec)
select field from table order by field limit 1; (54.88 msec)

Peter Childs

Re: Speeding up Aggregates

From

Rob Nagler

Date:

10 October 2003, 12:49:37

Greg Stark writes:
> Call it a wishlist bug. The problem is it would be a hard feature to
> implement properly. And none of the people paid to work on postgres
> by various companies seem to have this on their to-do lists. So
> don't expect it in the near future.

We are using Postgres heavily, and we should be able to find some time
and/or funding to help.

We're becoming more and more frustrated with the discontinuous
behaviour of the planner.  It seems every complex query we have these
days needs some "hint" like "ORDER BY foo DESC LIMIT 1" to make it run
on the order of seconds, not minutes.  We usually figure out a way to
write the query so the planner does the right thing, and pushes
the discontinuity out far enough that the user doesn't see it.
However, it takes a lot of work, and it seems to me that work would be
put to better use improving the planner than improving our knowledge
of how to get the planner to do the right thing by coding the SQL in
some unusual way.

Please allow me to go out on a limb here.  I know that Tom is
philosophically opposed to planner hints.  However, we do have a
problem that the planner is never going to be smart enough.  This
leaves the SQL coder the only option of collecting a bag of
(non-portable) SQL tricks.  I saw what the best SQL coders did at
Tandem, and frankly, it's scary.  Being at Tandem, the SQL coders also
had the option (if they could argue their case strong enough) of
adding a new rule to the optimizer.  This doesn't solve the problem
for outsiders no matter how good they are at SQL.

Would it be possible to extend the planner with a pattern matching
language?  It would formalize what it is doing already, and would
allow outsiders to teach the planner about idiosyncrasies without
changing the SQL.  It would be like a style sheet in Latex (or Scribe :-)
if you are familiar with these typesetting languages.

Comments?

Rob

Re: Speeding up Aggregates

From

Josh Berkus

Date:

10 October 2003, 14:35:41

Dror,

> Ouch. I just double checked and you're right. Is this considered a bug,
> or just an implementation issue?

It's an implementation issue, which may be fixed by 7.5 but not sooner.
Basically, the free ability of PostgreSQL users to define their own
aggregates limits our ability to define query planner optimization for
aggregates.   Only recently has anyone suggested a feasable way around this.

> While I've seen this hint a few times in the lists, it seems like it's
> one of those magic incantations that those in the know, know about, and
> that people new to postgres are going to be surprised by the need to use
> this idiom.

It IS in the FAQ.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Speeding up Aggregates

From

Dror Matalon

Date:

10 October 2003, 15:26:16

On Fri, Oct 10, 2003 at 10:32:32AM -0700, Josh Berkus wrote:
> Dror,
>
> > Ouch. I just double checked and you're right. Is this considered a bug,
> > or just an implementation issue?
>
> It's an implementation issue, which may be fixed by 7.5 but not sooner.
> Basically, the free ability of PostgreSQL users to define their own
> aggregates limits our ability to define query planner optimization for
> aggregates.   Only recently has anyone suggested a feasable way around this.
>
> > While I've seen this hint a few times in the lists, it seems like it's
> > one of those magic incantations that those in the know, know about, and
> > that people new to postgres are going to be surprised by the need to use
> > this idiom.
>
> It IS in the FAQ.

Might be a good idea to put it in its own section rather than under "My
queries are slow or don't make use of the indexes. Why?"

Also, you might want to take out for 7.4

4.22) Why are my subqueries using IN so slow?


>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com