Re: Estimating space required for indexes - Mailing list pgsql-general

From Tom Lane
Subject Re: Estimating space required for indexes
Date
Msg-id 18084.1051539021@sss.pgh.pa.us
Whole thread Raw
In response to Estimating space required for indexes  (Shridhar Daithankar <shridhar_daithankar@persistent.co.in>)
Responses Re: Estimating space required for indexes  (Shridhar Daithankar <shridhar_daithankar@persistent.co.in>)
Re: Estimating space required for indexes  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-general
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> Yesterday I was trying to upload a medium size terrain data dump to a
> postgresql database. (350 files, of sizes between 8-13MB each.. 3.3GB all in
> all). The load was done in 45 minutes.(7.3.2 is real fast for data loading is
> my feeling, anyway..). It took 4GB of disk space after upload.

> A tuple consists of 3 float values, x,y,z. I had to create a compound index on
> x and y. I started indexing it and killed it 1.5 hours later as it filled
> rest of the 5GB free disk upto point of choking.

AFAIK, a CREATE INDEX should require no more than twice the finished
index's size on disk.  I'm surprised that you were able to build the
index one way and not the other.

> How can I predict reasonably how much disk space I am going to need for such
> kind of indexing operation?

Assuming your "float"s were float4s, the heap tuple size is

    28 bytes overhead + 3 * 4 bytes data = 40 bytes/row

(assuming WITHOUT OIDS, no nulls, Intel-like alignment rules) while the
index tuple size is

    12 bytes overhead + 2 * 4 bytes data = 20 bytes/row

But this is not the whole story because heap pages are normally crammed
full while btree index pages are normally only filled 2/3rds full during
initial creation.  (Plus you have to allow for upper b-tree levels, but
with such small index entries that won't be much.)  So I'd have expected
the finished index to be about 3/4ths the size of the table proper.
I'm surprised you could fit it at all.

> This data is just a small sample of things and
> more data is coming.

Better buy more disk ...

            regards, tom lane


pgsql-general by date:

Previous
From: Jeremiah Jahn
Date:
Subject: Re: > 16TB worth of data question
Next
From: David Link
Date:
Subject: Re: pq_recvbuf: unexpected EOF