Thread: frustration with database size
We purchase data from the state gevernment, change the data model from flat file to relational, perform additional analysis and put it all into a PostgreSQL 7.1.3 database running on FreeBSD 4.4. With one year's worth of data, the database was approximately 4GB. I expected that adding a second year would increase the size of the database to 9GB or 10GB. After adding the second year's data, the size was 15GB. To reduce the size, I dropped 2 tables (one from each year) that are used only for initial processing. This decreased the size, after vacuuming, to 12GB. I converted the database to MySQL for an industry peer who purchased the data, but isn't allowed to run Unix. The resulting database was approximately 5GB. Looking at the vast difference is size, I thought that there might be a lot of space wasted in tables that can't be cleaned out by vacuum. Therefore, I: 1. Used COPY TO to move the largest tables to text files. (Nulls were recorded as ''.) 2. Deleted all rows from these tables. 3. Vacuumed the tables. 4. Dropped the indexes for these tables. 5. Used COPY FROM to repopulate the tables. 6. Recreated the indexes. 7. Used vacuumdb -d [database name] to vacuum the database. The process above resulted in an **increase** in database size from 12GB to 14GB. I'm both surprised and stumped. Obviously, there's much I'm not understanding about COPY, vacuumdb and database size. Does anyone have any guidance? Thanks, Andrew Gould __________________________________________________ Do You Yahoo!? Send FREE video emails in Yahoo! Mail! http://promo.yahoo.com/videomail/
On 20.01.02 05:12 -0800(+0000), Andrew Gould wrote: > The process above resulted in an **increase** in > database size from 12GB to 14GB. > > I'm both surprised and stumped. Obviously, there's > much I'm not understanding about COPY, vacuumdb and > database size. > > Does anyone have any guidance? > Which files are taking up the space? Just look inside the database directory. The files are named by oids. One suspect is indexes, which may result in space loss. - Einar Karttunen
Andrew Gould <andrewgould@yahoo.com> writes: > The process above resulted in an **increase** in > database size from 12GB to 14GB. > I'm both surprised and stumped. Seems odd to me too. Like Einar, I am wondering about index sizes. An easy way to get some data is to do a VACUUM so that the relpages statistics are up to date, and then do select relname,relkind,relpages from pg_class order by relpages desc; regards, tom lane
Thanks! I must have missed dropping/recreating one of the primary key indexes. The relpages for the index exceeded the relpages for the table. I dropped and recreated the index, and vacuumed the database. The overall database size is down to 11GB. It's still large; but at least the numbers make more sense. Thanks again, Andrew Gould --- Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Gould <andrewgould@yahoo.com> writes: > > The process above resulted in an **increase** in > > database size from 12GB to 14GB. > > I'm both surprised and stumped. > > Seems odd to me too. Like Einar, I am wondering > about index sizes. > > An easy way to get some data is to do a VACUUM so > that the relpages > statistics are up to date, and then do > > select relname,relkind,relpages from pg_class order > by relpages desc; > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html __________________________________________________ Do You Yahoo!? Send FREE video emails in Yahoo! Mail! http://promo.yahoo.com/videomail/
> An easy way to get some data is to do a VACUUM so that the relpages > statistics are up to date, and then do > > select relname,relkind,relpages from pg_class order by relpages desc; well, that certainly is data. but what does it mean? what does this query actually spell out? to what do relkind and relpages refere? just curious. thanks! -tfo
On 21.01.02 14:40 -0600(+0000), Thomas F. O'Connell wrote: > >An easy way to get some data is to do a VACUUM so that the relpages > >statistics are up to date, and then do > > > >select relname,relkind,relpages from pg_class order by relpages desc; > > > well, that certainly is data. but what does it mean? > what does this query actually spell out? > to what do relkind and relpages refere? > just curious. > It's documented in the developers guide: http://www.postgresql.org/idocs/index.php?catalog-pg-class.html relname Name of the table, index, view, etc. relkind 'r'=ordinary table,'i'=index,'S'=sequence,'v'=view, 's'=special,'t'=secondary TOAST table relpages Size of the on-disk representation of this table in pages (size BLCKSZ). This is only an approximate value which is calculated during vacuum. - Einar Karttunen
>>>>> "AG" == Andrew Gould <andrewgould@yahoo.com> writes: AG> We purchase data from the state gevernment, change the AG> data model from flat file to relational, perform AG> additional analysis and put it all into a PostgreSQL AG> 7.1.3 database running on FreeBSD 4.4. In my previous business, we used to do the same thing (buy state data, make it relational and sell it) with Postgres 6.5. It kept on crashing and losing data, so I moved it to MySQL. That worked ok... until I bought their isampack utility to compress the data tables (making them read-only) at which point it worked extremely well, and took up very little disk space. If I were to do it today, I'd definitely be using PG 7.1.3, regardless of disk space usage. I think if you analyze your field types you might find some wasted space there. MySQL tends to convert all CHAR fields to VARCHAR whenever there is any dynamic sized field in the table. Perhaps doing that for PG will help reduce your size. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
Einar Karttunen wrote: > On 21.01.02 14:40 -0600(+0000), Thomas F. O'Connell wrote: > > well, that certainly is data. but what does it mean? > > what does this query actually spell out? > > to what do relkind and relpages refere? > > just curious. > > > It's documented in the developers guide: > http://www.postgresql.org/idocs/index.php?catalog-pg-class.html I have some CGI scripts which I use to produce web pages for monitoring the size of my databases. which uses similar queries. I was going to be packaging this up nicely at some point, but if you think it might be of some use you can get a copy of it as it currently stands from http://www.rahu.demon.co.uk/pgstat.tgz You should be able to untar it into your cgi-bin directory, and then change the user/password etc. in PGSConfig.pm You can then access it as http://machinename/cgi-bin/pgstat/dblist.pl -Mark -- Mark Rae Tel: +44(0)20 7074 4648 Inpharmatica Fax: +44(0)20 7074 4700 m.rae@inpharmatica.co.uk http://www.inpharmatica.co.uk/