Thread: frustration with database size

frustration with database size

From
Andrew Gould
Date:
We purchase data from the state gevernment, change the
data model from flat file to relational, perform
additional analysis and put it all into a PostgreSQL
7.1.3 database running on FreeBSD 4.4.

With one year's worth of data, the database was
approximately 4GB.  I expected that adding a second
year would increase the size of the database to 9GB or
10GB.  After adding the second year's data, the size
was 15GB.  To reduce the size, I dropped 2 tables (one
from each year) that are used only for initial
processing.  This decreased the size, after vacuuming,
to 12GB.

I converted the database to MySQL for an industry peer
who purchased the data, but isn't allowed to run Unix.
 The resulting database was approximately 5GB.
Looking at the vast difference is size, I thought that
there might be a lot of space wasted in tables that
can't be cleaned out by vacuum.  Therefore, I:

1.  Used COPY TO to move the largest tables to text
files. (Nulls were recorded as ''.)
2.  Deleted all rows from these tables.
3.  Vacuumed the tables.
4.  Dropped the indexes for these tables.
5.  Used COPY FROM to repopulate the tables.
6.  Recreated the indexes.
7.  Used vacuumdb -d [database name] to vacuum the
database.

The process above resulted in an **increase** in
database size from 12GB to 14GB.

I'm both surprised and stumped.  Obviously, there's
much I'm not understanding about COPY, vacuumdb and
database size.

Does anyone have any guidance?

Thanks,

Andrew Gould

__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/

Re: frustration with database size

From
Einar Karttunen
Date:
On 20.01.02 05:12 -0800(+0000), Andrew Gould wrote:
> The process above resulted in an **increase** in
> database size from 12GB to 14GB.
>
> I'm both surprised and stumped.  Obviously, there's
> much I'm not understanding about COPY, vacuumdb and
> database size.
>
> Does anyone have any guidance?
>
Which files are taking up the space? Just look inside the
database directory. The files are named by oids. One
suspect is indexes, which may result in space loss.

- Einar Karttunen

Re: frustration with database size

From
Tom Lane
Date:
Andrew Gould <andrewgould@yahoo.com> writes:
> The process above resulted in an **increase** in
> database size from 12GB to 14GB.
> I'm both surprised and stumped.

Seems odd to me too.  Like Einar, I am wondering about index sizes.

An easy way to get some data is to do a VACUUM so that the relpages
statistics are up to date, and then do

select relname,relkind,relpages from pg_class order by relpages desc;

            regards, tom lane

Re: frustration with database size

From
Andrew Gould
Date:
Thanks!  I must have missed dropping/recreating one of
the primary key indexes.  The relpages for the index
exceeded the relpages for the table.

I dropped and recreated the index, and vacuumed the
database.  The overall database size is down to 11GB.
It's still large; but at least the numbers make more
sense.

Thanks again,

Andrew Gould

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Gould <andrewgould@yahoo.com> writes:
> > The process above resulted in an **increase** in
> > database size from 12GB to 14GB.
> > I'm both surprised and stumped.
>
> Seems odd to me too.  Like Einar, I am wondering
> about index sizes.
>
> An easy way to get some data is to do a VACUUM so
> that the relpages
> statistics are up to date, and then do
>
> select relname,relkind,relpages from pg_class order
> by relpages desc;
>
>             regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html


__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/

Re: frustration with database size

From
"Thomas F. O'Connell"
Date:
> An easy way to get some data is to do a VACUUM so that the relpages
> statistics are up to date, and then do
>
> select relname,relkind,relpages from pg_class order by relpages desc;


well, that certainly is data. but what does it mean?

what does this query actually spell out?

to what do relkind and relpages refere?

just curious.

thanks!

-tfo


Re: frustration with database size

From
Einar Karttunen
Date:
On 21.01.02 14:40 -0600(+0000), Thomas F. O'Connell wrote:
> >An easy way to get some data is to do a VACUUM so that the relpages
> >statistics are up to date, and then do
> >
> >select relname,relkind,relpages from pg_class order by relpages desc;
>
>
> well, that certainly is data. but what does it mean?
> what does this query actually spell out?
> to what do relkind and relpages refere?
> just curious.
>
It's documented in the developers guide:
http://www.postgresql.org/idocs/index.php?catalog-pg-class.html

relname         Name of the table, index, view, etc.
relkind         'r'=ordinary table,'i'=index,'S'=sequence,'v'=view,
                's'=special,'t'=secondary TOAST table
relpages        Size of the on-disk representation of this table in pages
                (size BLCKSZ).  This is only an approximate value
                   which is calculated during vacuum.

- Einar Karttunen

Re: frustration with database size

From
Vivek Khera
Date:
>>>>> "AG" == Andrew Gould <andrewgould@yahoo.com> writes:

AG> We purchase data from the state gevernment, change the
AG> data model from flat file to relational, perform
AG> additional analysis and put it all into a PostgreSQL
AG> 7.1.3 database running on FreeBSD 4.4.

In my previous business, we used to do the same thing (buy state data,
make it relational and sell it) with Postgres 6.5.  It kept on
crashing and losing data, so I moved it to MySQL.  That worked
ok... until I bought their isampack utility to compress the data
tables (making them read-only) at which point it worked extremely
well, and took up very little disk space.  If I were to do it today,
I'd definitely be using PG 7.1.3, regardless of disk space usage.

I think if you analyze your field types you might find some wasted
space there.  MySQL tends to convert all CHAR fields to VARCHAR
whenever there is any dynamic sized field in the table.  Perhaps doing
that for PG will help reduce your size.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: frustration with database size

From
Mark Rae
Date:
Einar Karttunen wrote:
> On 21.01.02 14:40 -0600(+0000), Thomas F. O'Connell wrote:
> > well, that certainly is data. but what does it mean?
> > what does this query actually spell out?
> > to what do relkind and relpages refere?
> > just curious.
> >
> It's documented in the developers guide:
> http://www.postgresql.org/idocs/index.php?catalog-pg-class.html

I have some CGI scripts which I use to produce web pages
for monitoring the size of my databases. which uses similar
queries.

I was going to be packaging this up nicely at some point, but if
you think it might be of some use you can get a copy of it as it
currently stands from

http://www.rahu.demon.co.uk/pgstat.tgz

You should be able to untar it into your cgi-bin directory, and then
change the user/password etc. in PGSConfig.pm

You can then access it as
http://machinename/cgi-bin/pgstat/dblist.pl

    -Mark

--
Mark Rae                                       Tel: +44(0)20 7074 4648
Inpharmatica                                   Fax: +44(0)20 7074 4700
m.rae@inpharmatica.co.uk                http://www.inpharmatica.co.uk/