Re: Storage consumption - Mailing list pgsql-general

From Bruce Momjian
Subject Re: Storage consumption
Date
Msg-id 200311141826.hAEIQCD23265@candle.pha.pa.us
Whole thread Raw
In response to Storage consumption  (Troels Arvin <troels@arvin.dk>)
List pgsql-general
Did you see the FAQ item on estimating disk space?

---------------------------------------------------------------------------

Troels Arvin wrote:
> Hello,
>
> For some very data-intensive projects it's interesting how much space the
> DBMS uses for the storage of data, so I'm investigating how space
> efficient different DBMSes are.
>
> In the PostgreSQL manual, it's written that values of the type INTEGER
> take op four bytes. I was curious about how close to real-World this
> number is, so I did a test: How much space does PostgreSQL use when
> storing 100000 rows where each row consists of a single INTEGER value?
>
> With help from http://random.org/ I created a file with 100000 random
> integer insertions. The SQL used to do that is available at
> http://troels.arvin.dk/db/tests/storage-sizes/randomints.zip
>
> About installation: PostgreSQL v. 7.3.4 on Red Hat Linux 9, file system
> ext3. PostgreSQL data-area in /var/lib/pgsql/data.
>
> For this test, PostgreSQL is being used for nothing else.
>
> Before test start:
> -----------------
> Access to a default database ('psql' brings you right into a working
> database) from psql.
> Access to do a 'du' (disk usage unix-command) on /var/lib/pgsql/data from
> the command line.
> No existing table 'inttab' in database. PostgreSQL stopped.
>
> Test starts.
> -----------
> Output of 'du -sb /var/lib/pgsql/data': 77946519.
> Start PostgreSQL.
> Do: "CREATE TABLE inttab (intval INT) WITHOUT OIDS;"
> psql -q -f random_ints.sql
> (Wait for a long time.)
> Do: "VACUUM FULL;"
> Shut down PostgreSQL.
> Output of 'du -sb /var/lib/pgsql/data': 81190551.
>
> Result:
> ------
> Real difference: 81190551-77946519 = 3244032
> Optimal difference: 100000*4       =  400000
> Storage consumption rate ((real/optimal)*100)% = 811%
>
> I'm surprised by an overhead _that_ high. Any comments on my methology?
> Does it need adjustments? If you think it's rotten: What methology would
> you use to measure space overhead for a DBMS? (Again: Space overhead is
> seldomly interesting, but sometimes it is.)
>
> I guess that transaction log files are a joker in this context, but then
> again: A number which reflects the DBMS' disk usage before and after an
> operation does have real-World meaning, I think.
>
> (Of course, I'll need another methology for DBMSes which preallocate a
> fixed amount of storage for a database.)
>
> --
> Greetings from Troels Arvin, Copenhagen, Denmark
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

pgsql-general by date:

Previous
From: "Stephen"
Date:
Subject: Re: embedded postgresql
Next
From:
Date:
Subject: Re: Conservation of OIDs