Home > mailing lists

Re: Disk usage question - Mailing list pgsql-performance

From	Scott Marlowe
Subject	Re: Disk usage question
Date	November 12, 2008 19:44:20
Msg-id	dcc563d10811121544j25210153y26ca277d1ce04ee7@mail.gmail.com Whole thread Raw
In response to	Disk usage question (Franck Routier <franck.routier@axege.com>)
List	pgsql-performance

Tree view

On Wed, Nov 12, 2008 at 10:02 AM, Franck Routier
<franck.routier@axege.com> wrote:
> Hi,
>
> I have to manage a database that is getting way too big for us.
> Currently db size is 304 GB.
>
> One table is accounting for a third of this space.
> The table itself has 68.800.000 tuples, taking 28GB.
>
> There are 39 indices on the table, and many of them use multiple
> columns. A lot of these indices share the same column(s).
> The indices are taking 95GB.
>
> So, here are my questions:
>
> - do these figures seem normal or is there likely a bigger problem ?

Can't really say.  Is this a table with a single integer column?  Then
it's way too big.  If it's got plenty of columns, some of which are
text or bytea then probably not.  What does vacuum verbose say about
your tables / db?

> - when indices share a column, is it worth creating several multi-column
> indices (as we do now), or would we get the same result (from a
> performance point of view) by creating several single column indices
> (one for each column) ?

No, single field indexes are not as fast as multi-field indexes when
the where clause hits the fields starting from the left of the index.
Note that an index on (a,b) will not help a where clause on only b.

> - does the order in which a multi-column index is created matter ? That
> is, if I have a column A with less discriminating values and a column B
> with more discriminating values, does it matter if I:
>  'CREATE INDEX myindex ON mytable USING (A,B) '
> or
> 'CREATE INDEX myindex ON mytable USING (A,B) '

Those look the same, I assume you meant USING (B,A) for one.

Assuming both fields are used by the query's where clause, the more
selective one should be first.  I think.  testing will tell for sure.

> Is the second solution likely to behave faster ?
> Or is it simply better to:
> CREATE INDEX myindexa ON mytable USING (A);
> CREATE INDEX myindexb ON mytable USING (B);

Maybe.  If one is very selective then there's no great need for the
other anyway.

> - as we do many insert and very few update/delete, I thought REINDEX was
> going to be superfluous. But REINDEXing is often needed to keep the size
> of the db _relatively_ reasonable. Does it sound normal ?

Yes, if you have a lot of failed inserts.  a failed insert = insert + delete.

pgsql-performance by date:

From: "Scott Marlowe"
Date: 12 November 2008, 18:59:00
Subject: Re: slow full table update

From: Tomas Vondra
Date: 12 November 2008, 20:13:32
Subject: Re: slow full table update

Re: Disk usage question - Mailing list pgsql-performance

Previous

Next