Thread: Best way to get all different values in a column

Best way to get all different values in a column

From
Date:
Ok, since my question got no answer on the general list, I'm reposting
it here since this list seems in fact better suited to it.

Does anyone here know the most efficient way to list all the
different values of a given column with low cardinality? For instance,
I have a table with columns DAY, NAME, ID, etc. The table is updated
about once a week with thousands of records with the same (current) date.
Now I would like to list all values for DAY, if possible without
scanning the whole table each time I submit the request.

I can think of:

Solution 1: SELECT DAY FROM TABLE GROUP BY DAY;

Solution 2: SELECT DISTINCT DAY FROM TABLE;

(BTW, why do those two yield such different performance, the latter being
seemingly *much* slower than the former?)

Solution 3: Improve performance through an index scan by using DAY as
the first element of the PK,  (PRIMARY KEY (DAY, ID) ), although DAY has
a low cardinality ?

Solution 4: Create a separate index on column DAY ?

Solution 5: Use some kind of view / stored procedure that would be
precomputed when TABLE is updated or cached when called for the first
time ? Does something like that exist ?

Solution 6: Store the values in a separate table, recreated each time
TABLE is updated.

This looks to me like a very common problem. Is there an obvious / best /
standard solution? What would be the expected performance of the
different solutions above? (I guess some are probably nonsense.)

Thank you all !
Christian


Re: Best way to get all different values in a column

From
mark@mark.mielke.cc
Date:
On Fri, Oct 14, 2005 at 06:02:56PM +0200, ext-christian.roche@nokia.com wrote:
> Does anyone here know the most efficient way to list all the
> different values of a given column with low cardinality? For instance,
> I have a table with columns DAY, NAME, ID, etc. The table is updated
> about once a week with thousands of records with the same (current) date.
> Now I would like to list all values for DAY, if possible without
> scanning the whole table each time I submit the request.
> I can think of:
> ...
> Solution 6: Store the values in a separate table, recreated each time
> TABLE is updated.

I've found a variant on solution 6 to work well for this problem domain.

Why not insert into the separate table whenever you insert into the main
table, either with a trigger or in your application?
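
A minimal sketch of the trigger variant, assuming SQLite through Python's
sqlite3 module as a stand-in database so the example is runnable as-is. The
names records, record_days and records_track_day are hypothetical, and the
database discussed in the thread would need its own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical main table, modelled on the DAY, NAME, ID columns.
    CREATE TABLE records (id INTEGER PRIMARY KEY, day TEXT NOT NULL, name TEXT);
    -- Side table holding one row per distinct day.
    CREATE TABLE record_days (day TEXT PRIMARY KEY);

    -- On every insert into the main table, record the day if it is new.
    CREATE TRIGGER records_track_day AFTER INSERT ON records
    BEGIN
        INSERT INTO record_days (day)
            SELECT NEW.day
            WHERE NOT EXISTS (SELECT 1 FROM record_days WHERE day = NEW.day);
    END;
""")

rows = [("2005-10-07", "a"), ("2005-10-07", "b"), ("2005-10-14", "c")]
conn.executemany("INSERT INTO records (day, name) VALUES (?, ?)", rows)

# Listing the distinct days now reads only the small side table.
days = [d for (d,) in conn.execute("SELECT day FROM record_days ORDER BY day")]
print(days)
```

This insert-only version never removes a day, so it fits a table that only
grows; rows deleted from the main table would leave stale days behind.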

Cheers,
mark



Re: Best way to get all different values in a column

From
Steve Atkins
Date:
On Fri, Oct 14, 2005 at 06:02:56PM +0200, ext-christian.roche@nokia.com wrote:
> Ok, since my question got no answer on the general list, I'm reposting
> it here since this list seems in fact better suited to it.
>
> Does anyone here know the most efficient way to list all the
> different values of a given column with low cardinality? For instance,
> I have a table with columns DAY, NAME, ID, etc. The table is updated
> about once a week with thousands of records with the same (current) date.
> Now I would like to list all values for DAY, if possible without
> scanning the whole table each time I submit the request.
>
> I can think of:
>
> Solution 1: SELECT DAY FROM TABLE GROUP BY DAY;
>
> Solution 2: SELECT DISTINCT DAY FROM TABLE;
>
> (BTW, why do those two yield such different performance, the latter being
> seemingly *much* slower than the former?)
>
> Solution 3: Improve performance through an index scan by using DAY as
> the first element of the PK,  (PRIMARY KEY (DAY, ID) ), although DAY has
> a low cardinality ?
>
> Solution 4: Create a separate index on column DAY ?
>
> Solution 5: Use some kind of view / stored procedure that would be
> precomputed when TABLE is updated or cached when called for the first
> time ? Does something like that exist ?
>
> Solution 6: Store the values in a separate table, recreated each time
> TABLE is updated.
>
> This looks to me like a very common problem. Is there an obvious / best /
> standard solution? What would be the expected performance of the
> different solutions above? (I guess some are probably nonsense.)
>

There's not going to be a single "best" solution, as it'll depend on
your requirements, and on your application level constraints.

You say that the table is seldom updated (a few thousand a week is "almost
never"). If it's updated in a single batch you could simply generate
a table of the distinct values after each update pretty easily (solution
6).
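
As a rough illustration of regenerating the distinct values after each batch,
here is a sketch using SQLite through Python's sqlite3 module as a stand-in
database. The table names records and record_days and the helpers
rebuild_days and list_days are hypothetical.

```python
import sqlite3

def rebuild_days(conn):
    """Refresh the table of distinct days; meant to run once after each batch load."""
    with conn:  # the DELETE and the INSERT are committed together
        conn.execute("DELETE FROM record_days")
        conn.execute("INSERT INTO record_days (day) SELECT DISTINCT day FROM records")

def list_days(conn):
    """Read the distinct days from the small side table, not from records."""
    return [d for (d,) in conn.execute("SELECT day FROM record_days ORDER BY day")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (day TEXT NOT NULL, name TEXT)")
conn.execute("CREATE TABLE record_days (day TEXT PRIMARY KEY)")

# Two weekly batches, each with many rows sharing one date.
for day in ("2005-10-07", "2005-10-14"):
    conn.executemany("INSERT INTO records (day, name) VALUES (?, ?)",
                     [(day, f"row{i}") for i in range(1000)])
    rebuild_days(conn)

print(list_days(conn))
```

The full scan still happens, but only once per batch rather than once per
query.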

If you don't have such a well-defined update, you could use a trigger on
inserts, updates and deletes of the table to maintain a separate table
that keeps track of the count of each distinct value; then you can
just select any row with a non-zero count from that table (solution 5).
(You need the counts to be able to deal with deletes efficiently.) That
would increase the cost of updating the main table significantly, but
you're putting very little traffic through it, so that's unlikely to
be a problem.
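
The counting triggers described above could look roughly like this. It is a
sketch only, written against SQLite through Python's sqlite3 module so that it
runs as-is; the names records, day_counts, records_ins, records_del,
records_upd and distinct_days are hypothetical, and another database would
need its own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, day TEXT NOT NULL, name TEXT);
    -- One row per day ever seen, with the number of live rows for that day.
    CREATE TABLE day_counts (day TEXT PRIMARY KEY, n INTEGER NOT NULL);

    CREATE TRIGGER records_ins AFTER INSERT ON records
    BEGIN
        -- Create the counter row on first sight of a day, then bump it.
        INSERT INTO day_counts (day, n)
            SELECT NEW.day, 0
            WHERE NOT EXISTS (SELECT 1 FROM day_counts WHERE day = NEW.day);
        UPDATE day_counts SET n = n + 1 WHERE day = NEW.day;
    END;

    CREATE TRIGGER records_del AFTER DELETE ON records
    BEGIN
        UPDATE day_counts SET n = n - 1 WHERE day = OLD.day;
    END;

    CREATE TRIGGER records_upd AFTER UPDATE OF day ON records
    BEGIN
        -- Move one unit of count from the old day to the new day.
        UPDATE day_counts SET n = n - 1 WHERE day = OLD.day;
        INSERT INTO day_counts (day, n)
            SELECT NEW.day, 0
            WHERE NOT EXISTS (SELECT 1 FROM day_counts WHERE day = NEW.day);
        UPDATE day_counts SET n = n + 1 WHERE day = NEW.day;
    END;
""")

def distinct_days(conn):
    """Days that still have at least one row in records."""
    query = "SELECT day FROM day_counts WHERE n > 0 ORDER BY day"
    return [d for (d,) in conn.execute(query)]

conn.executemany("INSERT INTO records (day, name) VALUES (?, ?)",
                 [("2005-10-07", "a"), ("2005-10-07", "b"), ("2005-10-14", "c")])
conn.execute("DELETE FROM records WHERE day = '2005-10-14'")
conn.execute("UPDATE records SET day = '2005-10-21' WHERE name = 'b'")
print(distinct_days(conn))
```

Rows whose count drops to zero are kept and filtered out at query time, which
keeps the delete trigger to a single UPDATE.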

I doubt that solutions 3 or 4 are worth looking at at all, and the first
two are what they are and you know their performance already.

You could probably do this far more efficiently with some of the work
being done in the application layer, rather than in the database - for
instance you could update the counts table one time per transaction,
rather than one time per operation - but that would lose you the
convenience of maintaining the counts correctly when you futz with
the data manually or using tools not aware of the count table.
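
A sketch of the application-layer variant, assuming the application funnels
its inserts through one helper. It uses SQLite through Python's sqlite3 module
as a stand-in database; insert_batch, records and day_counts are hypothetical
names. The counts table is touched once per distinct day per transaction
instead of once per inserted row.

```python
import sqlite3
from collections import Counter

def insert_batch(conn, rows):
    """Insert (day, name) rows and adjust day_counts once per distinct day."""
    deltas = Counter(day for day, _name in rows)
    with conn:  # inserts and count updates commit together or not at all
        conn.executemany("INSERT INTO records (day, name) VALUES (?, ?)", rows)
        for day, n in deltas.items():
            cur = conn.execute(
                "UPDATE day_counts SET n = n + ? WHERE day = ?", (n, day))
            if cur.rowcount == 0:  # first time this day is seen
                conn.execute(
                    "INSERT INTO day_counts (day, n) VALUES (?, ?)", (day, n))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (day TEXT NOT NULL, name TEXT)")
conn.execute("CREATE TABLE day_counts (day TEXT PRIMARY KEY, n INTEGER NOT NULL)")

insert_batch(conn, [("2005-10-07", "a"), ("2005-10-07", "b")])
insert_batch(conn, [("2005-10-07", "c"), ("2005-10-14", "d")])

counts = dict(conn.execute("SELECT day, n FROM day_counts"))
print(counts)
```

As noted above, any change made outside insert_batch (manual edits, other
tools) would leave day_counts out of date.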

Cheers,
  Steve