Thread: Estimating Database Disk Space

Estimating Database Disk Space

From
Richard A Lough
Date:
Hi All,

I'm running postgres.7.1.3-7

I'm concerned at the steady growth of disk space taken up by my
postgres.
I have three databases, each of which take up about 2+ MB when dumped.
I have tried deleting one database and re-installing. This suggests that
about 38Mb of disk space is taken up by each database. I suspect that if
postgres were deleted entirely, about 250 MB would be freed.

The FAQ suggests that about 7MB might be taken up by one database when
copied from a file into postgres. Other posts suggest that xlog might
add a further 20MB as an overhead. Indexes are mentioned elsewhere, but
without any means of estimating what their size should be.

Do the above figures seem sensible? Is the required disk space likely
to grow in proportion to the data, or is there an overhead of about
60MB of disk space needed to run each database?

TIA

Richard A Lough

Re: Estimating Database Disk Space

From
Tom Lane
Date:
Richard A Lough <ralough.ced@dnet.co.uk> writes:
> I have three databases, each of which take up about 2+ MB when dumped.
> I have tried deleting one database and re-installing. This suggests that
> about 38Mb of disk space is taken up by each database. I suspect that if
> postgres were deleted entirely, about 250 MB would be freed.

A "du" on your $PGDATA directory would give you some facts instead of
guesses.

FWIW, the per-database overhead is a couple of megabytes in my
experience (basically the size of another set of system catalogs).

pg_xlog will occupy some small multiple of 16MB depending on your
WAL configuration settings.  This is independent of how many databases
you have in the installation --- but it is affected by your transaction
rate.

If your installation has been around for awhile then you might be
noticing growth of pg_log --- it grows at a steady rate of 2 bits per
transaction.  In 7.2 and later there are provisions to recycle that
space, but not in 7.1.

Or you might be seeing problems with index bloat.  Hard to tell with
no breakdown of the space usage to go on.

            regards, tom lane

Re: Estimating Database Disk Space

From
Richard A Lough
Date:
Tom Lane wrote:
>
> Richard A Lough <ralough.ced@dnet.co.uk> writes:
> > I have three databases, each of which take up about 2+ MB when dumped.
> > I have tried deleting one database and re-installing. This suggests that
> > about 38Mb of disk space is taken up by each database. I suspect that if
> > postgres were deleted entirely, about 250 MB would be freed.
>
> A "du" on your $PGDATA directory would give you some facts instead of
> guesses.
>
It seems to have grown a little, du says:
250540     base
32812     pg_xlog
472    global
4    postmaster.pid
4    postmaster.opts
4    PG_VERSION
0    postgresql.conf

Databases, dumped in text format are 3.3MB, 3.3MB and 1.6MB,
though there are be a couple of empty temporary databases
on the system as well.

Thanks for your reply

Richard A Lough

Re: Estimating Database Disk Space

From
Tom Lane
Date:
Richard A Lough <ralough.ced@dnet.co.uk> writes:
>> A "du" on your $PGDATA directory would give you some facts instead of
>> guesses.
>>
> It seems to have grown a little, du says:
> 250540     base
> 32812     pg_xlog
> 472    global
> 4    postmaster.pid
> 4    postmaster.opts
> 4    PG_VERSION
> 0    postgresql.conf

Okay, so your space usage is indeed mostly in the databases, and not the
overhead like pg_xlog.

> Databases, dumped in text format are 3.3MB, 3.3MB and 1.6MB,
> though there are be a couple of empty temporary databases

250MB of disk space does seem a tad high for that much data.  Have you
vacuumed lately?  If so, you could drill down to the next level of
detail by looking at the relpages column of pg_class in each database
to see which tables or indexes are using the most space.  (relpages is
measured in units of 8K disk blocks, btw, and it's only approximate
because it's generally only updated by vacuum.)  Something like this
would give you the top ten offenders:

select relname, relpages from pg_class order by relpages desc limit 10;

            regards, tom lane

Re: Estimating Database Disk Space

From
Richard A Lough
Date:
Tom Lane wrote:
>
SNIP
> > Databases, dumped in text format are 3.3MB, 3.3MB and 1.6MB,
> > though there are be a couple of empty temporary databases
>
> 250MB of disk space does seem a tad high for that much data.  Have you
> vacuumed lately?  If so, you could drill down to the next level of
> detail by looking at the relpages column of pg_class in each database
> to see which tables or indexes are using the most space.  (relpages is
> measured in units of 8K disk blocks, btw, and it's only approximate
> because it's generally only updated by vacuum.)  Something like this
> would give you the top ten offenders:
>
> select relname, relpages from pg_class order by relpages desc limit 10;
>
>                         regards, tom lane
>
Hmmm.. \l tells me I have nine databases, including postgres, lo_demo,
template0, and template1. Template0 refuses a connection, and three
of the other databases are empty. Only one of my active databases
shows significant figures (recently increased):

relpages | relname
17473    pg_attribute_relid_attnam_index
7267    pg_attribute_relid_attnum_index
3245    pg_attribute
1012    pg_statistic_relid_att_index
646    pg_class_oid_index
645    pg_type_oid_index
507    pg_class_relname_index
478    pg_type_typname_index
77    pg_class
76    pg_type

Of these pg_attribute and pg_statistic_relid_att_index have grown
significantly (3245/909, and 1012/549 respectively). At most I have
added an average of one tuple to the 3126 tables. I'm adding
frequently, so size is a moving target.

I had a look (I think) at the indexes for nasdaq01 using pgaccess.
No indexes were shown, I'm not sure I got that right. Size
dumped with pg_dump is 2.6MB for the relpages given above.

Richard A Lough

Re: Estimating Database Disk Space

From
Tom Lane
Date:
Richard A Lough <ralough.ced@dnet.co.uk> writes:
> Only one of my active databases
> shows significant figures (recently increased):

> relpages | relname
> 17473    pg_attribute_relid_attnam_index
> 7267    pg_attribute_relid_attnum_index
> 3245    pg_attribute

Yipes ... you must do a lot of table creation and deletion.

The index bloat here is a known result of inefficiency in vacuuming
indexes (there's no good way to reclaim index pages at the moment).
You could probably fix it by REINDEXing pg_attribute, but it might
be easier to just dump, dropdb, createdb, reload that database.

Yes, there's a TODO item to improve index vacuuming ...

            regards, tom lane

Re: Estimating Database Disk Space

From
Richard A Lough
Date:
Tom Lane wrote:
>
> Richard A Lough <ralough.ced@dnet.co.uk> writes:
> > Only one of my active databases
> > shows significant figures (recently increased):
>
> > relpages | relname
> > 17473 pg_attribute_relid_attnam_index
> > 7267  pg_attribute_relid_attnum_index
> > 3245  pg_attribute
>
> Yipes ... you must do a lot of table creation and deletion.
>
> The index bloat here is a known result of inefficiency in vacuuming
> indexes (there's no good way to reclaim index pages at the moment).
> You could probably fix it by REINDEXing pg_attribute, but it might
> be easier to just dump, dropdb, createdb, reload that database.
>

I kinda suspected something was not quite right. The input method
is a perl script which was "borrowed." It's probably not the most
efficient for this task, but it works and code reuse is good, no? :-)

> Yes, there's a TODO item to improve index vacuuming ...
>
Speaking of TODO's, I think I have enough material for a first draft
on disk storage use. If you can point me at your documentation people
I will forward it.

Thanks again for your help.

Richard A Lough

Re: Estimating Database Disk Space

From
Tom Lane
Date:
Richard A Lough <ralough.ced@dnet.co.uk> writes:
> Speaking of TODO's, I think I have enough material for a first draft
> on disk storage use. If you can point me at your documentation people
> I will forward it.

pgsql-docs mailing list is the right place for that sort of discussion.

            regards, tom lane