Re: Help estimating database and WAL size - Mailing list pgsql-general

From: Daniel Serodio (lists)
Subject: Re: Help estimating database and WAL size
Date:
Msg-id: 507C7947.1090803@mandic.com.br
In response to: Re: Help estimating database and WAL size (Jasen Betts <jasen@xnet.co.nz>)
List: pgsql-general
Jasen Betts wrote:
On 2012-10-08, Daniel Serodio (lists) <daniel.lists@mandic.com.br> wrote:
We are preparing a PostgreSQL database for production usage and we need
to estimate the storage size for this database. We're a team of
developers with little expertise in database administration, so we are
doing research, reading manuals and using our general IT knowledge to
achieve this.

We have actual data to migrate to this database and some rough
estimations of growth. For the sake of the example, let's say we have an
estimated growth of 50% per year.

The point is: what's the generally accepted technique for making a good
size estimate?

We are estimating the storage usage by the following rules. Topics where 
we need advice are marked with ** asterisks **. Feedback on the whole 
process is more than welcome.

1) Estimate the size of each table
    1.1) Discover the actual size of each row.
        - For fields with a fixed size (like bigint, char, etc) we used
the sizes described in the documentation
        - For fields with a dynamic size (like text) we estimated the
string length and used the function select pg_column_size('expected text
here'::text)
Long text is subject to compression; pg_column_size doesn't seem to take
compression into account. The compression is some sort of LZ.
Interesting, I didn't know about automatic compression. I've just read the section on TOAST and haven't been able to answer this either: Is there any way to check for the compressed size?
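For reference, one rough way to see the on-disk (possibly compressed) size
seems to be comparing pg_column_size() on a literal value with
pg_column_size() on the same value after it has been stored; the table name
toast_test below is just an illustrative scratch table:

    -- Scratch table to observe TOAST compression (name is illustrative)
    CREATE TEMP TABLE toast_test (body text);
    INSERT INTO toast_test VALUES (repeat('some long text ', 10000));

    -- Size of the literal value, before any storage:
    SELECT pg_column_size(repeat('some long text ', 10000));

    -- Size as actually stored (reflects compression / TOAST):
    SELECT pg_column_size(body) FROM toast_test;

    -- Whole table including its TOAST table:
    SELECT pg_total_relation_size('toast_test');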
         - We added 4 more bytes for the OID that PostgreSQL uses internally
OID is optional, IIRC PGXID is not 
I hadn't heard of PGXID; I've just searched Google but found no reference to this term except for this e-mail thread and some source code. What is PGXID? Where can I learn more about it?
    1.2) Multiply the size of each row by the number of estimated rows
    ** Do I need to consider any overhead here, like row or table
metadata? **
page size is 8K
column overhead is 1 byte per not-NULL column, NULLs are free
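In addition to the per-row arithmetic, an empirical cross-check may help:
load a representative sample and divide the relation size by the row count.
The sample_rows table below is just an illustrative sketch:

    -- Load a sample of realistic-looking rows (names/columns are illustrative)
    CREATE TEMP TABLE sample_rows AS
        SELECT g AS id, md5(g::text) AS payload
        FROM generate_series(1, 100000) AS g;

    -- Average on-disk bytes per row, page and tuple overhead included:
    SELECT pg_relation_size('sample_rows')::numeric / count(*) AS bytes_per_row
    FROM sample_rows;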

2) Estimate the size of each table index
    ** Don't know how to estimate this, need advice here **
IIRC
( data being indexed + 8 bytes ) / fill factor
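A quick empirical check is also possible on a loaded sample, e.g.
continuing the sample_rows sketch above (index and table names are
illustrative):

    -- Build the index on the sample and measure it
    CREATE INDEX sample_rows_payload_idx ON sample_rows (payload);
    SELECT pg_size_pretty(pg_relation_size('sample_rows_payload_idx'));  -- one index
    SELECT pg_size_pretty(pg_indexes_size('sample_rows'));               -- all indexes on the table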

3) Estimate the size of the transaction log
    ** We've got no idea how to estimate this, need advice **
how big are your transactions?
Very short, a couple of statements each.
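One way to turn that into a number is to measure the WAL generated by a
representative batch of those transactions. A sketch, assuming 9.2-era
function names (newer releases call these pg_current_wal_lsn() and
pg_wal_lsn_diff()):

    -- Note the WAL position before the test workload
    SELECT pg_current_xlog_location();

    -- ... run a representative batch of transactions ...

    -- Bytes of WAL written since that position ('0/16B2D80' stands in
    -- for the value noted above)
    SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/16B2D80');

The WAL actually retained on disk is then roughly bounded by
checkpoint_segments (max_wal_size on newer releases), plus whatever
archiving or replication keeps around, rather than by the total WAL written.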
4) Estimate the size of the backups (full and incremental)
    ** Don't know how to estimate this, need advice here **
depends on the format you use, backups tend to compress well.
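As a rough reference point, the total on-disk size of the database can be
read directly; custom-format pg_dump output typically compresses to a
fraction of this:

    -- Total on-disk size of the current database
    SELECT pg_size_pretty(pg_database_size(current_database()));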

5) Sum all the estimates for the actual minimum size
no, you get estimated size.
Thanks a lot for the response.

Regards,
Daniel Serodio
6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates 
1, 2 and 4 for the minimum size after 1 year

7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5 
and 6 for a good safety margin

I know the rules got pretty extensive, please let me know if you need 
more data or examples for a better understanding.

We've also posted this question to 
http://dba.stackexchange.com/q/25617/10166
