Re: A thought on Index Organized Tables - Mailing list pgsql-hackers

From Greg Stark
Subject Re: A thought on Index Organized Tables
Msg-id 407d949e1002241012p5882cef8j6122a6f56d5a8ac1@mail.gmail.com
In response to Re: A thought on Index Organized Tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: A thought on Index Organized Tables  (Gokulakannan Somasundaram <gokul007@gmail.com>)
List pgsql-hackers

On Wed, Feb 24, 2010 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Greg Stark <gsstark@mit.edu> wrote:
>>> That doesn't work because when you split an index page any
>>> sequential scan in progress will either see the same tuples twice
>>> or will miss some tuples depending on where the new page is
>>> allocated. Vacuum has a clever trick for solving this but it
>>> doesn't work for arbitrarily many concurrent scans.
>
>> It sounds like you're asserting that Index Scan nodes are inherently
>> unreliable, so I must be misunderstanding you.
>
> We handle splits in a manner that insures that concurrent index-order
> scans remain consistent.  I'm not sure that it's possible to scale that
> to ensure that both index-order and physical-order scans would remain
> consistent.  It might be soluble but it's certainly something to worry
> about.

It might be slightly easier given the assumption that you only want to
scan leaf tuples.

But there's an additional problem I didn't think of before. Currently
we optimize index scans by copying all relevant tuples to local memory
so we don't need to hold an index lock for an extended time or spend a
lot of time relocking and rechecking the index for changes. That
wouldn't be possible if we needed to get visibility info from the index
page, since we would need up-to-date information at the time each tuple
is returned.
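
To make the concern concrete, here is a rough toy model in Python, not
PostgreSQL code. LeafPage, scan_page and the per-entry "visible" flag are
all made up for illustration. It only shows that a copy taken under a
short lock goes stale once a concurrent writer changes the page.

```python
import threading


class LeafPage:
    """Toy stand-in for an index leaf page: (key, visible) pairs behind a lock."""

    def __init__(self, items):
        self.lock = threading.Lock()
        self.items = list(items)


def scan_page(page, lo, hi):
    """Copy matching entries to local memory under the lock, then release it."""
    with page.lock:
        local = [(k, vis) for (k, vis) in page.items if lo <= k <= hi]
    # The lock is released here; the caller works from its private copy.
    return local


page = LeafPage([(1, True), (2, True), (3, True)])
local = scan_page(page, 1, 3)

# A concurrent writer changes the (hypothetical) visibility flag of key 2
# after the scan has already taken its copy.
with page.lock:
    page.items[1] = (2, False)

# The private copy still reports key 2 as visible; a fresh read does not.
stale = [k for (k, vis) in local if vis]
fresh = [k for (k, vis) in scan_page(page, 1, 3) if vis]
```

In this model the only way to get the fresh answer is to go back to the
page and take the lock again for every tuple, which is exactly the cost
the local copy was meant to avoid.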


--
greg
