Re: Disk usage question - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: Disk usage question
Date
Msg-id dcc563d10811121544j25210153y26ca277d1ce04ee7@mail.gmail.com
Whole thread Raw
In response to Disk usage question  (Franck Routier <franck.routier@axege.com>)
List pgsql-performance
On Wed, Nov 12, 2008 at 10:02 AM, Franck Routier
<franck.routier@axege.com> wrote:
> Hi,
>
> I have to manage a database that is getting way too big for us.
> Currently db size is 304 GB.
>
> One table is accounting for a third of this space.
> The table itself has 68.800.000 tuples, taking 28GB.
>
> There are 39 indices on the table, and many of them use multiple
> columns. A lot of these indices share the same column(s).
> The indices are taking 95GB.
>
> So, here are my questions:
>
> - do these figures seem normal or is there likely a bigger problem ?

Can't really say.  Is this a table with a single integer column?  Then
it's way too big.  If it's got plenty of columns, some of which are
text or bytea then probably not.  What does vacuum verbose say about
your tables / db?

> - when indices share a column, is it worth creating several multi-column
> indices (as we do now), or would we get the same result (from a
> performance point of view) by creating several single column indices
> (one for each column) ?

No, single field indexes are not as fast as multi-field indexes when
the where clause hits the fields starting from the left of the index.
Note that an index on (a,b) will not help a where clause on only b.

> - does the order in which a multi-column index is created matter ? That
> is, if I have a column A with less discriminating values and a column B
> with more discriminating values, does it matter if I:
>  'CREATE INDEX myindex ON mytable USING (A,B) '
> or
> 'CREATE INDEX myindex ON mytable USING (A,B) '

Those look the same, I assume you meant USING (B,A) for one.

Assuming both fields are used by the query's where clause, the more
selective one should be first.  I think.  testing will tell for sure.

> Is the second solution likely to behave faster ?
> Or is it simply better to:
> CREATE INDEX myindexa ON mytable USING (A);
> CREATE INDEX myindexb ON mytable USING (B);

Maybe.  If one is very selective then there's no great need for the
other anyway.

> - as we do many insert and very few update/delete, I thought REINDEX was
> going to be superfluous. But REINDEXing is often needed to keep the size
> of the db _relatively_ reasonable. Does it sound normal ?

Yes, if you have a lot of failed inserts.  a failed insert = insert + delete.

pgsql-performance by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: slow full table update
Next
From: Tomas Vondra
Date:
Subject: Re: slow full table update