Estimating Database Disk Space - Mailing list pgsql-docs
From: Richard A Lough
Subject: Estimating Database Disk Space
Date:
Msg-id: 3CA9ED80.40F2B87C@dnet.co.uk
Responses: Re: Estimating Database Disk Space
List: pgsql-docs
Hi All,

I have attached my attempt at documenting the use of hard disk storage by a postgres database. I'll point out that I am still on the learning curve for postgres, and my assumptions may well be completely wrong. The main point I think needs validation is whether it is best to create a model of a database in order to get an accurate estimate of the space needed.

TIA

Richard A Lough

Chapter 10. Disk Storage

Estimating disk storage requirements for a database.

The following advice is provided for a machine running Linux and PostgreSQL 7.1. The filesystem in use is ext2.

There is no easy way to estimate precisely the storage requirements for a given database. If an accurate size estimate is needed, it is suggested that a dummy database be set up with suitable fields, numbers of tables and queries. It should be filled with suitable random data and operated as a database for a short time before attempting to measure the disk storage needed to store and manage all the information. Even this method provides only an estimate, and the figures should be carefully reviewed.

Some general guidelines can be provided to assist the assessment. The following figures are taken from the FAQ. They suppose the database contains one table of 100000 lines, each with 20 characters and one integer. Including end-of-line bytes, a text file containing the same information would be 2.8 megabytes (MB).

The User's Guide, Chapter 3 "Data Types", gives the storage requirements for each piece of information. We find that an integer is held in four bytes. With a 20-byte text field, the data in each row is held in 24 bytes. Each row also has a header of approximately 36 bytes and a 4-byte pointer on its page, so the total storage required for each row is 64 bytes.

Each data page in postgres is 8192 bytes (8K), and our row size of 64 bytes allows a page to be completely filled, holding 128 rows. If, for example, the row size were 129 bytes, we could fit only 63 rows into the page, leaving 65 bytes unused. Similarly, our 100000 rows will try to fit into 781.25 pages, with the 782nd page being three-quarters empty.

As another example, the same 64-byte row could hold a date, an integer, and four real floating point numbers, together with their header and pointer; the User's Guide shows that each of those fields takes four bytes to store.

Our calculation tells us that we will use 782 pages of 8K each to store the data. That is about 6.4 MB. This is not the whole story, however. One table of 100000 rows is relatively inexpensive for postgres to manage. We might reasonably expect that a database of 10000 tables, each of 10 rows using 64 bytes, would also need 6.4 MB of storage. For comparison, we will assume that this second database has rows containing a date, an integer and four real numbers. We should suspect, however, that postgres will need more resources to manage a database with a large number of tables. If there are hidden overheads in managing tables, our database of 10000 tables should reveal them.

To discover these hidden costs we will need some tools to help us find what we are looking for. If our database is already in operation, disk usage (man du) is what we want to know. Try

$ du -s * | sort -r -n | head -10

in $PGDATA. That should give you the ten largest users of disk space in the postgres data directory. Further examination will reveal a directory for each database. Unfortunately, the directories are named with numbers that match the oid of each database instead of its name. There is a program in contrib (oid2name) that takes these numbers and gives you the database name.
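If oid2name is not to hand, the same mapping can be read straight out of the system catalogs. A minimal sketch (oid is a hidden system column, so it must be named explicitly in the select list):

template1> select oid, datname from pg_database;

Matching the oid values against the directory names under $PGDATA/base should identify each database's directory.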
The xlog segment can take up a lot of space. Each incremental increase in size is usually set at 16 MB. We can expect to be fortunate with our first database of one table, consuming "only" 16 MB of xlog for a total of about 22 MB. We can expect our database of 10000 tables to range in size from 22 MB to 38 MB, and should consider ourselves unfortunate if disk usage rises to 54 MB.

That, however, is not the complete picture. We begin our next search by logging in to template1 and asking for the list of databases:

template1> \l

This should show all the databases in the system; the list should include template0, template1, lo_demo and any other databases created on the system and still active. We are looking for those bits of information (parts of indexes) which remain behind even after

$ vacuumdb database

is run to clean "dead" information out of a database. Creating and deleting tables within a database is one activity known to cause this problem. To see the extent of the problem, access each database and look at relpages, like so:

select relname, relpages from pg_class order by relpages desc limit 10;

This should give the top ten offenders within each database. Each page counted in relpages is 8K in size.
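Since each of those pages is 8192 bytes, the same query can report approximate sizes directly in bytes; a variant of the query above, assuming the default 8K page size:

select relname, relpages * 8192 as bytes from pg_class order by relpages desc limit 10;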
You need to do a lot of creating and deleting of tables to make this problem noticeable. If your database seems to be growing exponentially and you are adding and deleting tables very frequently, this may be your problem. The only way at present (March 2002) to fix it is to dump, delete, create, and restore the database.

It is reasonable to omit this wastage from estimates of the disk space a database will use. If a problem with index bloat is suspected, regular checks and measures to control it should be put in place.

Summary

The crucial question to ask when preparing any estimate is: "How accurate does this need to be?" If your estimate needs to be closer than +100% / -50%, you will need some form of historical information to improve its accuracy. The most cost-effective way of getting this is to generate a model of the database, operate it for a short time, and measure the storage needed.
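As a concrete starting point, here is a minimal sketch of such a model, using the one-table example from earlier in this chapter. The table and column names are invented for illustration, and identical rows stand in for the representative random data a real model should use:

create table space_demo (t text, i integer);
insert into space_demo values ('aaaaaaaaaaaaaaaaaaaa', 1);
-- each repetition of this line doubles the row count;
-- 17 repetitions gives 131072 rows, a little over our 100000
insert into space_demo select * from space_demo;
-- ...repeat until the table reaches the intended size...
vacuum analyze space_demo;
-- vacuum refreshes relpages, so the measurement is current
select relpages from pg_class where relname = 'space_demo';

Multiplying the relpages figure by 8K gives a measured size to set against the 6.4 MB calculated above.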