Thread: Is it possible to summarize uniqe values from an indexed column?

Is it possible to summarize uniqe values from an indexed column?

From
Mike Leahy
Date:
Hello list,

Following from a question I had yesterday, I'm wondering if there is
some way to summarize the unique values of an indexed column in
PostgreSQL without having the query scan the whole table.  For my
current work, I have many large tables, all of which have an indexed
column for the year from which each row of data was recorded.  This year
column contains a small number of unique values (e.g., where a large
table contains data from one, two or three years...so far).  I've been
getting the unique values by executing queries like 'select distinct
year from [table];', or 'select year from [table] group by year;'.

It would seem to me (as I mis-understand it) that if the column is
indexed, that the whole table shouldn't need to be queried when all I
want to get these values.  I figure I should somehow be able to benefit
from the index, which presumably knows something about the unique values
for this column.  Is there some way I can get a list of the unique
values directly out of an index?  So far, I haven't come across any
documentation that suggests this is possible, so I'm guessing there's no
straightforward way to do this...can anybody suggest some alternative
methods for summarizing the possible values from an indexed column that
has a small number of unique values?

Thanks in advance,
Mike



Re: Is it possible to summarize uniqe values from an indexed column?

From
Martijn van Oosterhout
Date:
On Fri, Apr 28, 2006 at 08:35:08AM -0400, Mike Leahy wrote:
> Hello list,
>
> Following from a question I had yesterday, I'm wondering if there is
> some way to summarize the unique values of an indexed column in
> PostgreSQL without having the query scan the whole table.  For my
> current work, I have many large tables, all of which have an indexed
> column for the year from which each row of data was recorded.  This year
> column contains a small number of unique values (e.g., where a large
> table contains data from one, two or three years...so far).  I've been
> getting the unique values by executing queries like 'select distinct
> year from [table];', or 'select year from [table] group by year;'.

I don't know if you can specify it in plain SQL, but you might be able
to code the following into a pl/pgsql function (this is pseudo-code):

function getunique
   $curr = "";
loop:
   select $next from table where field > $curr order by field limit 1;
   if found then
     return $next
     $curr = $next;
     goto loop;
   return;

The idea being that you ask the index lookup to find the next biggest
item. You end up doing lots of little queries but on big table with
lots of duplicates it might be a win.

Hope this helps,

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Is it possible to summarize uniqe values from an indexed

From
Mike Leahy
Date:
Martijn,

This works rather well - especially in my case, where I have thousands of rows in my tables with only a handful of unique values in the year column.  I'm not sure if I can get it to work in pl/pgsql as a function (I'll give it a shot), but it'll be no problem for me to add a routine in my php scripts that does essentially the same thing.

I really appreciate your help,
Mike

Martijn van Oosterhout wrote:
On Fri, Apr 28, 2006 at 08:35:08AM -0400, Mike Leahy wrote: 
Hello list,

Following from a question I had yesterday, I'm wondering if there is
some way to summarize the unique values of an indexed column in
PostgreSQL without having the query scan the whole table.  For my
current work, I have many large tables, all of which have an indexed
column for the year from which each row of data was recorded.  This year
column contains a small number of unique values (e.g., where a large
table contains data from one, two or three years...so far).  I've been
getting the unique values by executing queries like 'select distinct
year from [table];', or 'select year from [table] group by year;'.   
I don't know if you can specify it in plain SQL, but you might be able
to code the following into a pl/pgsql function (this is pseudo-code):

function getunique  $curr = "";
loop:  select $next from table where field > $curr order by field limit 1;  if found then    return $next    $curr = $next;    goto loop;  return;

The idea being that you ask the index lookup to find the next biggest
item. You end up doing lots of little queries but on big table with
lots of duplicates it might be a win.

Hope this helps,