Estimating space required for indexes - Mailing list pgsql-general

From Shridhar Daithankar
Subject Estimating space required for indexes
Date
Msg-id 200304281308.58996.shridhar_daithankar@nospam.persistent.co.in
Whole thread Raw
Responses Re: Estimating space required for indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi all,

Yesterday I was trying to upload a medium size terrain data dump to a
postgresql database. (350 files, of sizes between 8-13MB each.. 3.3GB all in
all). The load was done in 45 minutes.(7.3.2 is real fast for data loading is
my feeling, anyway..). It took 4GB of disk space after upload.

A tuple consists of 3 float values, x,y,z. I had to create a compound index on
x and y. I started indexing it and killed it 1.5 hours later as it filled
rest of the 5GB free disk upto point of choking.

Since it wasn't much of time, I recreated the database, created the index
first and then started bulk loading. It took 3 hours and 10 minutes which is
not bad at all.

How can I predict reasonably how much disk space I am going to need for such
kind of indexing operation? This data is just a small sample of things and
more data is coming. Is there any other solution from inerting into indexed
table? BTW, oracle 8i chocked equally badly when we attemted to create index
on such a large table..

Some observations which I feel worth mentioning..

Machnie:- P-III/800MHZ-256MB-40GB IDE disk.
Postgresql settings: Shared buffers=800, fsync off

1) With 5 concurrent processes loading data, it took 45 minutes to insert
3.3GB of dump, with each file worth ~200K rows done in single transaction.
For indexed insertion, I didn't use more than one connection as disk was
already choking. The text dump were read from same disk and database was on
the same one, just on different partitions..

2) The floats are real i.e. float4. Table is without oids. So a tuple is 12
bytes of data. It shows 395320 pages i.e. more than 3 gigs of data. It
roughly has 8.1M rows. Should that have been bit less of disk usage? 8.1M*12
gives around 100MB of raw data. Amounting for all overhead, the disk usage
seems bit too much.

3) Index is taking 376840 pages i.e. another 3 gigs. I understand it is
compind index but could it be made bit more space savvy?

4) I got bitten by planner.. x=-81.45 reasults sequential scan and
x=-81.45::real results index scan.. Oops!..

I am really impressed with the way 7.3.2 is handling data on such a low spec
machine.. Will come back again if I have any more problems/comments..

 Shridhar


pgsql-general by date:

Previous
From: "Bruno Baguette"
Date:
Subject: Re: Cannot use the queries buffer of psql... :-(
Next
From: "Egyud Csaba"
Date:
Subject: Re: Returning with a userd defined type (PL/pgSQL)