Home > mailing lists

Re: A thought on Index Organized Tables - Mailing list pgsql-hackers

From	Gokulakannan Somasundaram
Subject	Re: A thought on Index Organized Tables
Date	February 23, 2010 12:47:57
Msg-id	9362e74e1002230547i39e232fdj8bd64474135d4af@mail.gmail.com Whole thread Raw
In response to	Re: A thought on Index Organized Tables (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: A thought on Index Organized Tables
List	pgsql-hackers

Tree view

On Tue, Feb 23, 2010 at 10:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> Instead, how about excluding columns in primary keys from table data?

How will you implement "select * from mytable" ? Or even
"select * from mytable where non_primary_key = something" ?
If you can't do either of those without great expense, I think
a savings on primary-key lookups is not going to be adequate
recompense.

Tom,
      I am talking things more from the perspective of how things have got implemented in Oracle/SQL Server. Both Oracle and SQL Server store the snapshot info with indexes and hence can do index-only scans with their indexes. But still they have the concept of Index Organized Tables / Clustered Indexes. Apart from the disk footprint, it will have an impact on the cache efficiency also.
       In Oracle IOT and SQL Server Clustered Indexes, you have an option to store some of the columns in the leaf pages( but not in the non-leaf pages) and hence the tuples won't get sorted based on them, but you don't require an extra i/o to access them. This optimization is again to reduce the size of IOT. Oracle IOT has a concept called overflow regions, which is more like a heap and will store a few columns. There will be a pointer from main b-tree structure to this secondary structure. Accessing these columns are costly, but the idea is that the database designer has taken this into account while deciding on the columns to be put in the overflow regions.
       We can design secondary indexes to make the access faster for non-primary key based searches. But since the Secondary indexes store primary key in the place of HeapTuple Pointer, the access will usually take 2-3 more i/os. But the idea is that the IOT is for those kind of data. which will be 99% queried based on primary keys. The database provides that extra performance for that kind of access patterns. So to answer your question, full table scans(if overflow regions are involved) and search based on non-primary keys will be slow in an IOT.
         I looked at the postgres nbtree code. From my analysis(which might be wrong!), we can implement IOTs, provided we make a decision on broken data types issue.

Thanks,
Gokul.

pgsql-hackers by date:

From: Magnus Hagander
Date: 23 February 2010, 10:04:55
Subject: Re: Recent vendor SSL renegotiation patches break PostgreSQL

From: Tom Lane
Date: 23 February 2010, 13:28:55
Subject: Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

Re: A thought on Index Organized Tables - Mailing list pgsql-hackers

Previous

Next