Re: SQL challenge--top 10 for each key value? - Mailing list pgsql-sql

From Greg Stark
Subject Re: SQL challenge--top 10 for each key value?
Date
Msg-id 87r7uyf3ki.fsf@stark.xeocode.com
Whole thread Raw
In response to SQL challenge--top 10 for each key value?  (Jeff Boes <jboes@nexcerpt.com>)
Responses Re: SQL challenge--top 10 for each key value?  (Rod Taylor <pg@rbt.ca>)
List pgsql-sql
Jeff Boes <jboes@nexcerpt.com> writes:

> I headed off in the direction of groups of SELECTs and UNIONs, and quit when I
> got to something like four levels of "SELECT ... AS FOO" ...

four? wimp, that's nothing!

ok, seriously I think there's no way to do this directly with straight SQL.
You would have to define a non-immutable function that has some temporary
storage where it keeps track of how many it has seen. 

The generic function that would help here would be some kind of rank(value)
that would give you the equivalent of rownum except with a level break every
time value changes. I've been hoping to see something like this on the list
for a long time but haven't yet.

If the value of n is constant and small you could cheat with an aggregate
function with an array of the top n values.

db=> create function first3_accum(integer[],integer) returns integer[] as 'select case when array_upper($1,1) >= 3 then
$1else array_append($1,$2) end' language sql strict immutable;
 
CREATE FUNCTION
db=> create aggregate first3 (basetype = integer, sfunc = first3_accum, stype = integer[], initcond = '{}');
CREATE AGGREGATE

then something like:

SELECT first3(id) FROM (SELECT id         FROM my_table        ORDER BY query,                 CASE WHEN include THEN 1
ELSE2 END ASC,                 score DESC)GROUP BY query
 

But then you'll have to go back to the table to refetch the original records
that you've found. The best way I find to do that is with the int_array_enum()
function from the int_aggregate contrib module.

SELECT *  FROM my_table WHERE id IN (          SELECT int_array_enum(f3)            FROM (                  SELECT
first3(id)as f3                    FROM (SELECT id                            FROM my_table
ORDERBY query,                                    CASE WHEN include THEN 1 ELSE 2 END ASC,
     score DESC) as x                   GROUP BY query                ) as x          )
 


This last step is kind of annoying since you've already seen all those
records. And it requires writing a new aggregate function every time the value
of n changes though, which kind of sucks.

In theory if the new work in 7.5 handling structured datatypes is as cool as
it sounds you could have an array of complete records and when UNNEST is
eventually incorporated into the array code then you could expand those
instead of using the int_array_enum function. Neither of those things are
ready yet as far as I know though.

-- 
greg



pgsql-sql by date:

Previous
From: Jeff Boes
Date:
Subject: SQL challenge--top 10 for each key value?
Next
From: Rod Taylor
Date:
Subject: Re: SQL challenge--top 10 for each key value?