Thread: Disk usage question

Disk usage question

From

Franck Routier

Date:

12 November 2008, 18:28:36

Hi,

I have to manage a database that is getting way too big for us.
Currently db size is 304 GB.

One table is accounting for a third of this space.
The table itself has 68.800.000 tuples, taking 28GB.

There are 39 indices on the table, and many of them use multiple
columns. A lot of these indices share the same column(s).
The indices are taking 95GB.

So, here are my questions:

- do these figures seem normal or is there likely a bigger problem ?

- when indices share a column, is it worth creating several multi-column
indices (as we do now), or would we get the same result (from a
performance point of view) by creating several single column indices
(one for each column) ?

- does the order in which a multi-column index is created matter ? That
is, if I have a column A with less discriminating values and a column B
with more discriminating values, does it matter if I:
 'CREATE INDEX myindex ON mytable USING (A,B) '
or
'CREATE INDEX myindex ON mytable USING (A,B) '
Is the second solution likely to behave faster ?
Or is it simply better to:
CREATE INDEX myindexa ON mytable USING (A);
CREATE INDEX myindexb ON mytable USING (B);

- as we do many insert and very few update/delete, I thought REINDEX was
going to be superfluous. But REINDEXing is often needed to keep the size
of the db _relatively_ reasonable. Does it sound normal ?

Thanks for any tip,
Franck

Re: Disk usage question

From

"Scott Marlowe"

Date:

12 November 2008, 19:44:20

On Wed, Nov 12, 2008 at 10:02 AM, Franck Routier
<franck.routier@axege.com> wrote:
> Hi,
>
> I have to manage a database that is getting way too big for us.
> Currently db size is 304 GB.
>
> One table is accounting for a third of this space.
> The table itself has 68.800.000 tuples, taking 28GB.
>
> There are 39 indices on the table, and many of them use multiple
> columns. A lot of these indices share the same column(s).
> The indices are taking 95GB.
>
> So, here are my questions:
>
> - do these figures seem normal or is there likely a bigger problem ?

Can't really say.  Is this a table with a single integer column?  Then
it's way too big.  If it's got plenty of columns, some of which are
text or bytea then probably not.  What does vacuum verbose say about
your tables / db?

> - when indices share a column, is it worth creating several multi-column
> indices (as we do now), or would we get the same result (from a
> performance point of view) by creating several single column indices
> (one for each column) ?

No, single field indexes are not as fast as multi-field indexes when
the where clause hits the fields starting from the left of the index.
Note that an index on (a,b) will not help a where clause on only b.

> - does the order in which a multi-column index is created matter ? That
> is, if I have a column A with less discriminating values and a column B
> with more discriminating values, does it matter if I:
>  'CREATE INDEX myindex ON mytable USING (A,B) '
> or
> 'CREATE INDEX myindex ON mytable USING (A,B) '

Those look the same, I assume you meant USING (B,A) for one.

Assuming both fields are used by the query's where clause, the more
selective one should be first.  I think.  testing will tell for sure.

> Is the second solution likely to behave faster ?
> Or is it simply better to:
> CREATE INDEX myindexa ON mytable USING (A);
> CREATE INDEX myindexb ON mytable USING (B);

Maybe.  If one is very selective then there's no great need for the
other anyway.

> - as we do many insert and very few update/delete, I thought REINDEX was
> going to be superfluous. But REINDEXing is often needed to keep the size
> of the db _relatively_ reasonable. Does it sound normal ?

Yes, if you have a lot of failed inserts.  a failed insert = insert + delete.