Re: query optimization: aggregate and distinct - Mailing list pgsql-general

From Jeff Davis
Subject Re: query optimization: aggregate and distinct
Date
Msg-id 200308220337.14928.jdavis-pgsql@empires.org
Whole thread Raw
In response to Re: query optimization: aggregate and distinct  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Thursday 21 August 2003 06:36 am, Tom Lane wrote:
> Jeff Davis <jdavis-pgsql@empires.org> writes:
> > I had an idea about using aggregates: what if I made an aggregate
> > function called "first" that just returned the value in the first tuple
> > it encountered?
>
> You could make that work in 7.4, but not in any existing releases.
>
> The trouble is that you need something like
>
>     SELECT first(foo) FROM (SELECT ... ORDER BY col1,col2) ss
>     GROUP BY col1
>
> and before 7.4 the optimizer doesn't realize that it can skip re-sorting
> at the outer level.  So unless the sort is stable (which it won't be, on
> most platforms anyway) the needed ordering by col2 within each group is
> destroyed.
>

Interesting. It turns out I don't really need the hack because I was able to
optimize the query with some reworking and EXPLAIN ANALYZE. Now it takes
about 1 second as opposed to 5.

However, it still has me wondering what the most efficient algorithm would be.

Here is my plan:
- make a new complex type (say, most_recent_t) that's just an int and a
timestamp
- make a function to turn an int and a timestamp into a most_recent_t
- make an aggregate function that takes most_recent_t and finds the int with
the highest timestamp

I tried a preliminary version, but all the functions were in plpgsql, which I
think may have slowed it down (plus, I was using a text[] instead of a
complex type, meaning more converting). The performance was nothing great,
but it seemed like it should have been more efficient. After all, doesn't my
plan skip the sorting phase needed for DISTINCT? The main problem is that I
need to do a lot of extra aggregate calls.

Does that seem like a more efficient plan overall, or would I waste my time
writing all those functions?

    regards,
        jeff davis



pgsql-general by date:

Previous
From: "Shridhar Daithankar"
Date:
Subject: Re: Postgresql for Solaris on Sparc
Next
From: Manfred Koizar
Date:
Subject: Re: Buglist