Thread: question about index

question about index

From
Jerome Alet
Date:
Hi,

For a future databas, I plan to have got a table with a text field 
which can contain only three different values, say "VALUE1", 
"VALUE2", and "VALUE3" 

this table may have, over the course of one year, several million
rows for a size around 2 Gb or more.

I'd be interested in having :
       SELECT count(*) AS nbvalue1 FROM mytable WHERE myfield='VALUE1';       SELECT count(*) AS nbvalue2 FROM mytable
WHEREmyfield='VALUE2';       SELECT count(*) AS nbvalue3 FROM mytable WHERE myfield='VALUE3';       
 
be as fast as possible.        

considering that almost 70% of the rows will be with 'VALUE1', 20%
will be with 'VALUE2' and 10% will be with 'VALUE3' on the average.

should I create an index to speedup the counts or not ?

any idea of the impact of running these three queries every 5 minutes
on say 10 000 000 rows ?

thanks in advance

Jerome Alet


Re: [despammed] question about index

From
Andreas Kretschmer
Date:
am  16.12.2004, um 11:41:54 +0100 mailte Jerome Alet folgendes:
> Hi,
> 
> For a future databas, I plan to have got a table with a text field 
> which can contain only three different values, say "VALUE1", 
> "VALUE2", and "VALUE3" 

Why text-fields for this task? I would prefer a smallint for this with a
check-contraint to check the values.

> be as fast as possible.        
> 
> considering that almost 70% of the rows will be with 'VALUE1', 20%
> will be with 'VALUE2' and 10% will be with 'VALUE3' on the average.
> 
> should I create an index to speedup the counts or not ?

Yes.



Andreas
-- 
Andreas Kretschmer    (Kontakt: siehe Header)              Tel. NL Heynitz:  035242/47212
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net===    Schollglas Unternehmensgruppe    === 


Re: question about index

From
"D'Arcy J.M. Cain"
Date:
On Thu, 16 Dec 2004 11:41:54 +0100
Jerome Alet <alet@librelogiciel.com> wrote:
> For a future databas, I plan to have got a table with a text field 
> which can contain only three different values, say "VALUE1", 
> "VALUE2", and "VALUE3" 

Can it increase?  That is, can a "VALUE4" be added half way through the
year?

> this table may have, over the course of one year, several million
> rows for a size around 2 Gb or more.

Doing a COUNT(*) on that many rows could get expensive but...

> I'd be interested in having :
> 
>         SELECT count(*) AS nbvalue1 FROM mytable WHERE
>         myfield='VALUE1'; SELECT count(*) AS nbvalue2 FROM mytable
>         WHERE myfield='VALUE2'; SELECT count(*) AS nbvalue3 FROM
>         mytable WHERE myfield='VALUE3';

First of all, consider doing it in one statement so that you at least
are not running through it multiple times.
   SELECT myfield, COUNT(*) FROM mytable GROUP BY myfield;

Second, consider rules or triggers to keep a separate table up to date
at all times so that you simply have to dump a three row table instead
of going through a huge table counting as you go.  We did that in a
similar situation and it made a huge difference.  We were doing SUM()
instead of COUNT() and we were calculating an extremely small percentage
of the table - average probably 3 or 4 rows out of 20 million on average
- but it still was worth our while to calculate the sum (balance) on
every transaction rather than calculate it every time.  You need to
analyze your own data and usage but this may be a better solution for
you.

As usual, if not completely satisfied you get a full refund.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: [despammed] question about index

From
Bruno Wolff III
Date:
On Thu, Dec 16, 2004 at 12:01:39 +0100, Andreas Kretschmer <akretschmer@despammed.com> wrote:
> > 
> > considering that almost 70% of the rows will be with 'VALUE1', 20%
> > will be with 'VALUE2' and 10% will be with 'VALUE3' on the average.
> > 
> > should I create an index to speedup the counts or not ?
> 
> Yes.

Indexes will probably not help in this case. Even when looking at 10% of
the rows, a sequential scan is still going to be faster than an index
scan in most cases.


Re: [despammed] question about index

From
Andreas Kretschmer
Date:
am  16.12.2004, um  8:10:25 -0600 mailte Bruno Wolff III folgendes:
> On Thu, Dec 16, 2004 at 12:01:39 +0100,
>   Andreas Kretschmer <akretschmer@despammed.com> wrote:
> > > 
> > > considering that almost 70% of the rows will be with 'VALUE1', 20%
> > > will be with 'VALUE2' and 10% will be with 'VALUE3' on the average.
> > > 
> > > should I create an index to speedup the counts or not ?
> > 
> > Yes.
> 
> Indexes will probably not help in this case. Even when looking at 10% of
> the rows, a sequential scan is still going to be faster than an index
> scan in most cases.

Okay.


Andreas
-- 
Andreas Kretschmer    (Kontakt: siehe Header)              Tel. NL Heynitz:  035242/47212
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net===    Schollglas Unternehmensgruppe    ===