Re: A thought on Index Organized Tables - Mailing list pgsql-hackers

From Csaba Nagy
Subject Re: A thought on Index Organized Tables
Msg-id 1266926447.14231.29.camel@pcd12478
In response to Re: A thought on Index Organized Tables  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Hi all,

On Mon, 2010-02-22 at 10:29 +0000, Greg Stark wrote:
> On Mon, Feb 22, 2010 at 8:18 AM, Gokulakannan Somasundaram
> <gokul007@gmail.com> wrote:
> > a) IOT has both table and index in one structure. So no duplication of data
> > b) With visibility maps, we have three structures a) Table b) Index c)
> > Visibility map. So the disk footprint of the same data will be higher in
> > postgres ( 2x + size of the visibility map).
> 
> These sound like the same point to me. I don't think we're concerned
> with footprint -- only with how much of that footprint actually needs
> to be scanned. 

For some data the disk footprint actually is important: in our
databases we have one table which has exactly 2 fields, both of which
are part of its primary key, and there's no other index. The table is
insert-only: never updated and rarely deleted from.

The disk footprint of the table is 30%-50% of the total disk space used
by the DB (depending on the other data). This amounts to about 1.5-2TB
if I count it across all of our DBs, and it has to be on fast disk too,
as the table is heavily used... so disk space does matter for some.
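
To put a rough number on why a table like this pays for its keys twice,
here is a back-of-envelope sketch. It is only an estimate under stated
assumptions: the two key columns are taken to be 8-byte integers (the
real column types are not given here), the per-row overheads are the
usual 24-byte aligned heap tuple header, 8-byte index tuple header and
4-byte line pointer, and page headers, fill factor and the upper btree
levels are ignored. The function name is made up for illustration.

```python
# Rough per-row footprint of a table whose only columns are its primary key.
# Assumption (not from the original post): two 8-byte integer columns.
HEAP_TUPLE_HEADER = 24   # 23-byte heap tuple header, aligned to 8 bytes
INDEX_TUPLE_HEADER = 8   # index tuple header (heap pointer + info bits)
LINE_POINTER = 4         # per-tuple line pointer in the page


def bytes_per_row(key_bytes):
    """Return (heap_bytes, index_bytes) used per row, ignoring page overhead."""
    heap = HEAP_TUPLE_HEADER + key_bytes + LINE_POINTER
    index = INDEX_TUPLE_HEADER + key_bytes + LINE_POINTER
    return heap, index


heap, index = bytes_per_row(key_bytes=16)
total = heap + index
print(f"heap: {heap} B/row, index: {index} B/row, total: {total} B/row")
print(f"share of the total taken by the heap copy: {heap / total:.0%}")
```

Under these assumptions the heap copy is the larger part of the per-row
cost, which is why storing the keys only once would matter for a table
of this shape.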

And yes, I put the older entries in an archive partition on slower
disks, but that only halves the problem: the data is growing
exponentially, and about half of it is always in use. I guess our
developers are just about ready to move this table out of postgres and
into hadoop...

Cheers,
Csaba.



