Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date
Msg-id 4AAF62B9.2000003@enterprisedb.com
Whole thread Raw
In response to Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
List pgsql-hackers
Robert Haas wrote:
> Hi, I'm reviewing this patch for the 2009-09 CommitFest.

Thank you!

> It doesn't seem to compile.

Looks like the patch has bitrotted, sorry about that. Attached is an
updated version. I've also pushed this to my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch "heapfetch".

> Actually, before I even tried compiling this, I was looking through
> the joinpath.c changes, since that is an area of the code with which I
> have some familiarity.  As I'm sure you're aware, the lack of
> commenting makes it quite difficult to understand what this is trying
> to do, and the functions are poorly named.  It isn't self-explanatory
> what "bubbling up" means, even in the limited context of joinpath.c.

Yeah, understood :-(. The bubbling up refers to moving HeapFetch nodes
above a join, which I explained earlier in this thread:
http://archives.postgresql.org/message-id/4A90448F.4060601@enterprisedb.com,

> Leaving that aside, I think that the approach here is likely wrong;
> the decision about when to perform a heap fetch doesn't seem to be
> based on cost, which I think it needs to be.

Right, cost estimation and making choices based on it still missing.

>  Consider A IJ B, with
> the scan over A implemented as an index scan.  It seems to me that if
> the join selectivity is < 1, then assuming there's a choice, we
> probably want to join A to B and then do the heap fetches against A
> afterwards.  But if the join selectivity is > 1 (consider, for
> example, a cross join), we probably want to do the heap fetches first.

Hmm, good point. I didn't consider that join selectivity can be > 1.

A more common scenario is that there's an additional filter condition on
the HeapFetch, with a selectivity < 1. It can then cheaper to perform
the heap fetches first and only join the remaining rows that satisfy the
filter condition.

It could also be cheaper to perform HeapFetch first if there's a lot of
dead tuples in the table, as the heap fetch weeds them out. I'm not sure
if we can estimate that in a meaningful way.

> Am I right to think that the heap fetch node is not optional?  i.e.
> even if only columns in the index are ever used, we need to make sure
> that a heap fetch node gets inserted to check visibility.  Is that
> right?

Correct. The plan is to eventually use the visibility map to skip the
heap access altogether, but we'll need to make the visibility map 100%
reliable first. We'll still need a HeapFetch node to perform the
visibility check, but it could perform it against the visibility map
instead of the heap.

> I also think that the use of the term index-only quals is misleading.
> It seemed good when you first posted to -hackers about it, but looking
> at the patch, it means we have both index quals and index-only quals,
> and it seems like that might not be the greatest naming.  Maybe we
> could call them index-tuple quals or index-evaluated quals or
> something.

Hmm, I'm not too fond of the naming either. Not sure I like those
alternatives any better, though.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b0a911e..205b3e1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1491,8 +1491,14 @@ heap_fetch(Relation relation,
  * On entry, *tid is the TID of a tuple (either a simple tuple, or the root
  * of a HOT chain), and buffer is the buffer holding this tuple.  We search
  * for the first chain member satisfying the given snapshot.  If one is
- * found, we update *tid to reference that tuple's offset number, and
- * return TRUE.  If no match, return FALSE without modifying *tid.
+ * found, we update *tid and *heapTuple to reference that tuple, and
+ * return TRUE.  If no match, return FALSE without modifying *tid (*heapTuple
+ * is undefined in that case).
+ *
+ * To fetch the next matching tuple in the chain, call again with 'firstCall'
+ * set to FALSE and *tid still pointing to the tuple returned by previous call.
+ * (normally only one tuple in a chain can be visible at a time, but that does
+ * not hold for special snapshots like SnapshotAny)
  *
  * If all_dead is not NULL, we check non-visible tuples to see if they are
  * globally dead; *all_dead is set TRUE if all members of the HOT chain
@@ -1504,27 +1510,29 @@ heap_fetch(Relation relation,
  */
 bool
 heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
-                       bool *all_dead)
+                       HeapTuple heapTuple, bool *all_dead, bool firstCall)
 {
     Page        dp = (Page) BufferGetPage(buffer);
     TransactionId prev_xmax = InvalidTransactionId;
     OffsetNumber offnum;
     bool        at_chain_start;
+    bool        skip;

-    if (all_dead)
+    if (all_dead && firstCall)
         *all_dead = true;

     Assert(TransactionIdIsValid(RecentGlobalXmin));

     Assert(ItemPointerGetBlockNumber(tid) == BufferGetBlockNumber(buffer));
     offnum = ItemPointerGetOffsetNumber(tid);
-    at_chain_start = true;
+
+    at_chain_start = firstCall;
+    skip = !firstCall;

     /* Scan through possible multiple members of HOT-chain */
     for (;;)
     {
         ItemId        lp;
-        HeapTupleData heapTuple;

         /* check for bogus TID */
         if (offnum < FirstOffsetNumber || offnum > PageGetMaxOffsetNumber(dp))
@@ -1547,13 +1555,13 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
             break;
         }

-        heapTuple.t_data = (HeapTupleHeader) PageGetItem(dp, lp);
-        heapTuple.t_len = ItemIdGetLength(lp);
+        heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
+        heapTuple->t_len = ItemIdGetLength(lp);

         /*
          * Shouldn't see a HEAP_ONLY tuple at chain start.
          */
-        if (at_chain_start && HeapTupleIsHeapOnly(&heapTuple))
+        if (at_chain_start && HeapTupleIsHeapOnly(heapTuple))
             break;

         /*
@@ -1562,17 +1570,18 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
          */
         if (TransactionIdIsValid(prev_xmax) &&
             !TransactionIdEquals(prev_xmax,
-                                 HeapTupleHeaderGetXmin(heapTuple.t_data)))
+                                 HeapTupleHeaderGetXmin(heapTuple->t_data)))
             break;

         /* If it's visible per the snapshot, we must return it */
-        if (HeapTupleSatisfiesVisibility(&heapTuple, snapshot, buffer))
+        if (!skip && HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer))
         {
             ItemPointerSetOffsetNumber(tid, offnum);
             if (all_dead)
                 *all_dead = false;
             return true;
         }
+        skip = false;

         /*
          * If we can't see it, maybe no one else can either.  At caller
@@ -1580,7 +1589,7 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
          * transactions.
          */
         if (all_dead && *all_dead &&
-            HeapTupleSatisfiesVacuum(heapTuple.t_data, RecentGlobalXmin,
+            HeapTupleSatisfiesVacuum(heapTuple->t_data, RecentGlobalXmin,
                                      buffer) != HEAPTUPLE_DEAD)
             *all_dead = false;

@@ -1588,13 +1597,13 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
          * Check to see if HOT chain continues past this tuple; if so fetch
          * the next offnum and loop around.
          */
-        if (HeapTupleIsHotUpdated(&heapTuple))
+        if (HeapTupleIsHotUpdated(heapTuple))
         {
-            Assert(ItemPointerGetBlockNumber(&heapTuple.t_data->t_ctid) ==
+            Assert(ItemPointerGetBlockNumber(&heapTuple->t_data->t_ctid) ==
                    ItemPointerGetBlockNumber(tid));
-            offnum = ItemPointerGetOffsetNumber(&heapTuple.t_data->t_ctid);
+            offnum = ItemPointerGetOffsetNumber(&heapTuple->t_data->t_ctid);
             at_chain_start = false;
-            prev_xmax = HeapTupleHeaderGetXmax(heapTuple.t_data);
+            prev_xmax = HeapTupleHeaderGetXmax(heapTuple->t_data);
         }
         else
             break;                /* end of chain */
@@ -1616,10 +1625,12 @@ heap_hot_search(ItemPointer tid, Relation relation, Snapshot snapshot,
 {
     bool        result;
     Buffer        buffer;
+    HeapTupleData heapTuple;

     buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
     LockBuffer(buffer, BUFFER_LOCK_SHARE);
-    result = heap_hot_search_buffer(tid, buffer, snapshot, all_dead);
+    result = heap_hot_search_buffer(tid, buffer, snapshot, &heapTuple,
+                                    all_dead, true);
     LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
     ReleaseBuffer(buffer);
     return result;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 1c1cd34..4398e8c 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -99,9 +99,6 @@ RelationGetIndexScan(Relation indexRelation,
     ItemPointerSetInvalid(&scan->xs_ctup.t_self);
     scan->xs_ctup.t_data = NULL;
     scan->xs_cbuf = InvalidBuffer;
-    scan->xs_hot_dead = false;
-    scan->xs_next_hot = InvalidOffsetNumber;
-    scan->xs_prev_xmax = InvalidTransactionId;

     /*
      * Let the AM fill in the key and any opaque data it wants.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index c86cd52..616edae 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -20,7 +20,8 @@
  *        index_insert    - insert an index tuple into a relation
  *        index_markpos    - mark a scan position
  *        index_restrpos    - restore a scan position
- *        index_getnext    - get the next tuple from a scan
+ *        index_next        - advance index scan to next match
+ *        index_fetch        - fetch heap tuple for current index scan position
  *        index_getbitmap - get all tuples from a scan
  *        index_bulk_delete    - bulk deletion of index tuples
  *        index_vacuum_cleanup    - post-deletion cleanup of an index
@@ -310,8 +311,6 @@ index_rescan(IndexScanDesc scan, ScanKey key)
         scan->xs_cbuf = InvalidBuffer;
     }

-    scan->xs_next_hot = InvalidOffsetNumber;
-
     scan->kill_prior_tuple = false;        /* for safety */

     FunctionCall2(procedure,
@@ -389,32 +388,28 @@ index_restrpos(IndexScanDesc scan)
     SCAN_CHECKS;
     GET_SCAN_PROCEDURE(amrestrpos);

-    scan->xs_next_hot = InvalidOffsetNumber;
-
     scan->kill_prior_tuple = false;        /* for safety */

     FunctionCall1(procedure, PointerGetDatum(scan));
 }

 /* ----------------
- *        index_getnext - get the next heap tuple from a scan
+ *        index_next - get the next index tuple from a scan
  *
- * The result is the next heap tuple satisfying the scan keys and the
- * snapshot, or NULL if no more matching tuples exist.    On success,
- * the buffer containing the heap tuple is pinned (the pin will be dropped
- * at the next index_getnext or index_endscan).
+ * Advances to the next index tuple satisfying the scan keys, returning TRUE
+ * if there was one, FALSE otherwise. The heap TID pointer of the next match
+ * is stored in scan->xs_ctup.self.
  *
  * Note: caller must check scan->xs_recheck, and perform rechecking of the
  * scan keys if required.  We do not do that here because we don't have
  * enough information to do it efficiently in the general case.
  * ----------------
  */
-HeapTuple
-index_getnext(IndexScanDesc scan, ScanDirection direction)
+bool
+index_next(IndexScanDesc scan, ScanDirection direction)
 {
-    HeapTuple    heapTuple = &scan->xs_ctup;
-    ItemPointer tid = &heapTuple->t_self;
     FmgrInfo   *procedure;
+    bool         found;

     SCAN_CHECKS;
     GET_SCAN_PROCEDURE(amgettuple);
@@ -422,220 +417,125 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
     Assert(TransactionIdIsValid(RecentGlobalXmin));

     /*
-     * We always reset xs_hot_dead; if we are here then either we are just
-     * starting the scan, or we previously returned a visible tuple, and in
-     * either case it's inappropriate to kill the prior index entry.
+     * The AM's gettuple proc finds the next index entry matching the scan
+     * keys, and puts the TID in xs_ctup.t_self. It should also set
+     * scan->xs_recheck, though we pay no attention to that here.
      */
-    scan->xs_hot_dead = false;
+    found = DatumGetBool(FunctionCall2(procedure,
+                                       PointerGetDatum(scan),
+                                       Int32GetDatum(direction)));

-    for (;;)
-    {
-        OffsetNumber offnum;
-        bool        at_chain_start;
-        Page        dp;
+    /* Reset kill flag immediately for safety */
+    scan->kill_prior_tuple = false;

-        if (scan->xs_next_hot != InvalidOffsetNumber)
-        {
-            /*
-             * We are resuming scan of a HOT chain after having returned an
-             * earlier member.    Must still hold pin on current heap page.
-             */
-            Assert(BufferIsValid(scan->xs_cbuf));
-            Assert(ItemPointerGetBlockNumber(tid) ==
-                   BufferGetBlockNumber(scan->xs_cbuf));
-            Assert(TransactionIdIsValid(scan->xs_prev_xmax));
-            offnum = scan->xs_next_hot;
-            at_chain_start = false;
-            scan->xs_next_hot = InvalidOffsetNumber;
-        }
-        else
+    /* If we're out of index entries, release any held pin on a heap page */
+    if (!found)
+    {
+        if (BufferIsValid(scan->xs_cbuf))
         {
-            bool        found;
-            Buffer        prev_buf;
-
-            /*
-             * If we scanned a whole HOT chain and found only dead tuples,
-             * tell index AM to kill its entry for that TID.
-             */
-            scan->kill_prior_tuple = scan->xs_hot_dead;
-
-            /*
-             * The AM's gettuple proc finds the next index entry matching the
-             * scan keys, and puts the TID in xs_ctup.t_self (ie, *tid). It
-             * should also set scan->xs_recheck, though we pay no attention to
-             * that here.
-             */
-            found = DatumGetBool(FunctionCall2(procedure,
-                                               PointerGetDatum(scan),
-                                               Int32GetDatum(direction)));
-
-            /* Reset kill flag immediately for safety */
-            scan->kill_prior_tuple = false;
-
-            /* If we're out of index entries, break out of outer loop */
-            if (!found)
-                break;
-
-            pgstat_count_index_tuples(scan->indexRelation, 1);
-
-            /* Switch to correct buffer if we don't have it already */
-            prev_buf = scan->xs_cbuf;
-            scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf,
-                                                 scan->heapRelation,
-                                             ItemPointerGetBlockNumber(tid));
-
-            /*
-             * Prune page, but only if we weren't already on this page
-             */
-            if (prev_buf != scan->xs_cbuf)
-                heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
-                                    RecentGlobalXmin);
-
-            /* Prepare to scan HOT chain starting at index-referenced offnum */
-            offnum = ItemPointerGetOffsetNumber(tid);
-            at_chain_start = true;
-
-            /* We don't know what the first tuple's xmin should be */
-            scan->xs_prev_xmax = InvalidTransactionId;
-
-            /* Initialize flag to detect if all entries are dead */
-            scan->xs_hot_dead = true;
+            ReleaseBuffer(scan->xs_cbuf);
+            scan->xs_cbuf = InvalidBuffer;
         }
+        return false;
+    }

-        /* Obtain share-lock on the buffer so we can examine visibility */
-        LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+    pgstat_count_index_tuples(scan->indexRelation, 1);

-        dp = (Page) BufferGetPage(scan->xs_cbuf);
+    return true;
+}

-        /* Scan through possible multiple members of HOT-chain */
-        for (;;)
-        {
-            ItemId        lp;
-            ItemPointer ctid;
-
-            /* check for bogus TID */
-            if (offnum < FirstOffsetNumber ||
-                offnum > PageGetMaxOffsetNumber(dp))
-                break;
-
-            lp = PageGetItemId(dp, offnum);
-
-            /* check for unused, dead, or redirected items */
-            if (!ItemIdIsNormal(lp))
-            {
-                /* We should only see a redirect at start of chain */
-                if (ItemIdIsRedirected(lp) && at_chain_start)
-                {
-                    /* Follow the redirect */
-                    offnum = ItemIdGetRedirect(lp);
-                    at_chain_start = false;
-                    continue;
-                }
-                /* else must be end of chain */
-                break;
-            }
-
-            /*
-             * We must initialize all of *heapTuple (ie, scan->xs_ctup) since
-             * it is returned to the executor on success.
-             */
-            heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
-            heapTuple->t_len = ItemIdGetLength(lp);
-            ItemPointerSetOffsetNumber(tid, offnum);
-            heapTuple->t_tableOid = RelationGetRelid(scan->heapRelation);
-            ctid = &heapTuple->t_data->t_ctid;
-
-            /*
-             * Shouldn't see a HEAP_ONLY tuple at chain start.  (This test
-             * should be unnecessary, since the chain root can't be removed
-             * while we have pin on the index entry, but let's make it
-             * anyway.)
-             */
-            if (at_chain_start && HeapTupleIsHeapOnly(heapTuple))
-                break;
-
-            /*
-             * The xmin should match the previous xmax value, else chain is
-             * broken.    (Note: this test is not optional because it protects
-             * us against the case where the prior chain member's xmax aborted
-             * since we looked at it.)
-             */
-            if (TransactionIdIsValid(scan->xs_prev_xmax) &&
-                !TransactionIdEquals(scan->xs_prev_xmax,
-                                  HeapTupleHeaderGetXmin(heapTuple->t_data)))
-                break;
-
-            /* If it's visible per the snapshot, we must return it */
-            if (HeapTupleSatisfiesVisibility(heapTuple, scan->xs_snapshot,
-                                             scan->xs_cbuf))
-            {
-                /*
-                 * If the snapshot is MVCC, we know that it could accept at
-                 * most one member of the HOT chain, so we can skip examining
-                 * any more members.  Otherwise, check for continuation of the
-                 * HOT-chain, and set state for next time.
-                 */
-                if (IsMVCCSnapshot(scan->xs_snapshot))
-                    scan->xs_next_hot = InvalidOffsetNumber;
-                else if (HeapTupleIsHotUpdated(heapTuple))
-                {
-                    Assert(ItemPointerGetBlockNumber(ctid) ==
-                           ItemPointerGetBlockNumber(tid));
-                    scan->xs_next_hot = ItemPointerGetOffsetNumber(ctid);
-                    scan->xs_prev_xmax = HeapTupleHeaderGetXmax(heapTuple->t_data);
-                }
-                else
-                    scan->xs_next_hot = InvalidOffsetNumber;
-
-                LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
-
-                pgstat_count_heap_fetch(scan->indexRelation);
-
-                return heapTuple;
-            }
-
-            /*
-             * If we can't see it, maybe no one else can either.  Check to see
-             * if the tuple is dead to all transactions.  If we find that all
-             * the tuples in the HOT chain are dead, we'll signal the index AM
-             * to not return that TID on future indexscans.
-             */
-            if (scan->xs_hot_dead &&
-                HeapTupleSatisfiesVacuum(heapTuple->t_data, RecentGlobalXmin,
-                                         scan->xs_cbuf) != HEAPTUPLE_DEAD)
-                scan->xs_hot_dead = false;
-
-            /*
-             * Check to see if HOT chain continues past this tuple; if so
-             * fetch the next offnum (we don't bother storing it into
-             * xs_next_hot, but must store xs_prev_xmax), and loop around.
-             */
-            if (HeapTupleIsHotUpdated(heapTuple))
-            {
-                Assert(ItemPointerGetBlockNumber(ctid) ==
-                       ItemPointerGetBlockNumber(tid));
-                offnum = ItemPointerGetOffsetNumber(ctid);
-                at_chain_start = false;
-                scan->xs_prev_xmax = HeapTupleHeaderGetXmax(heapTuple->t_data);
-            }
-            else
-                break;            /* end of chain */
-        }                        /* loop over a single HOT chain */
-
-        LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
-
-        /* Loop around to ask index AM for another TID */
-        scan->xs_next_hot = InvalidOffsetNumber;
-    }

-    /* Release any held pin on a heap page */
-    if (BufferIsValid(scan->xs_cbuf))
+/* ----------------
+ *        index_fetch - fetch the heap tuple the current index pointer points to
+ *
+ * The result is the heap tuple the current index tuple points to if it
+ * matches the snapshot, or NULL if it doesn't. On success, the buffer
+ * containing the heap tuple is pinned (the pin will be dropped at the next
+ * index_fetch or index_endscan).
+ * ----------------
+ */
+HeapTuple
+index_fetch(IndexScanDesc scan)
+{
+    ItemPointer tid = &scan->xs_ctup.t_self;
+    bool        found;
+    Buffer        prev_buf;
+
+    /*
+     * We only fetch the first matching tuple in a chain, so we can't support
+     * SnapshotAny. All other snapshots only consider one tuple in a HOT chain
+     * as visible at any given moment.
+     */
+    Assert(scan->xs_snapshot != SnapshotAny);
+
+    /* Switch to correct buffer if we don't have it already */
+    prev_buf = scan->xs_cbuf;
+    scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf,
+                                         scan->heapRelation,
+                                         ItemPointerGetBlockNumber(tid));
+
+    /*
+     * Prune page, but only if we weren't already on this page
+     */
+    if (prev_buf != scan->xs_cbuf)
+        heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+                            RecentGlobalXmin);
+
+    /* Obtain share-lock on the buffer so we can examine visibility */
+    LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+
+    found = heap_hot_search_buffer(tid, scan->xs_cbuf,
+                                   scan->xs_snapshot, &scan->xs_ctup,
+                                   &scan->kill_prior_tuple, true);
+    LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
+
+    if (found)
     {
-        ReleaseBuffer(scan->xs_cbuf);
-        scan->xs_cbuf = InvalidBuffer;
+        Assert(!scan->kill_prior_tuple);
+
+        /*
+         * We must initialize all of *heapTuple (ie, scan->xs_ctup) since
+         * it is returned to the executor on success.
+         */
+        scan->xs_ctup.t_tableOid = RelationGetRelid(scan->heapRelation);
+
+        pgstat_count_heap_fetch(scan->indexRelation);
+
+        return &scan->xs_ctup;
     }
+    else
+        return NULL;
+}

-    return NULL;                /* failure exit */
+/* ----------------
+ *        index_getnext - get the next heap tuple from a scan
+ *
+ * This is a shorthand for index_next() + index_fetch().
+ *
+ * The result is the next heap tuple satisfying the scan keys and the
+ * snapshot, or NULL if no more matching tuples exist.    On success,
+ * the buffer containing the heap tuple is pinned (the pin will be dropped
+ * at the next index_getnext or index_endscan).
+ *
+ * Note: caller must check scan->xs_recheck, and perform rechecking of the
+ * scan keys if required.  We do not do that here because we don't have
+ * enough information to do it efficiently in the general case.
+ * ----------------
+ */
+HeapTuple
+index_getnext(IndexScanDesc scan, ScanDirection direction)
+{
+    for(;;)
+    {
+        HeapTuple tup;
+
+        if (!index_next(scan, direction))
+            return NULL;
+
+        tup = index_fetch(scan);
+        if (tup != NULL)
+            return tup;
+    }
 }

 /* ----------------
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index a6ba2ec..6fb810e 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -772,6 +772,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex)
     TransactionId OldestXmin;
     TransactionId FreezeXid;
     RewriteState rwstate;
+    ItemPointerData tid;

     /*
      * Open the relations we need.
@@ -829,19 +830,56 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex)
     scan = index_beginscan(OldHeap, OldIndex,
                            SnapshotAny, 0, (ScanKey) NULL);

-    while ((tuple = index_getnext(scan, ForwardScanDirection)) != NULL)
+    tuple = NULL;
+    for(;;)
     {
         HeapTuple    copiedTuple;
         bool        isdead;
         int            i;
+        bool        found;

         CHECK_FOR_INTERRUPTS();

-        /* Since we used no scan keys, should never need to recheck */
-        if (scan->xs_recheck)
-            elog(ERROR, "CLUSTER does not support lossy index conditions");
+        /* Advance to next index tuple if we're done with current HOT chain */
+        if (tuple == NULL)
+        {
+            if (!index_next(scan, ForwardScanDirection))
+                break;
+
+            /* Since we used no scan keys, should never need to recheck */
+            if (scan->xs_recheck)
+                elog(ERROR, "CLUSTER does not support lossy index conditions");
+
+            tid = scan->xs_ctup.t_self;

-        LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+            scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf, OldHeap,
+                                                 ItemPointerGetBlockNumber(&tid));
+            /* Obtain share-lock on the buffer so we can examine visibility */
+            LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+            found = heap_hot_search_buffer(&tid, scan->xs_cbuf,
+                                           scan->xs_snapshot, &scan->xs_ctup,
+                                           &scan->kill_prior_tuple, true);
+            if (!found)
+            {
+                LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
+                break;
+            }
+        }
+        else
+        {
+            /* Fetch next tuple from current HOT chain */
+            LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+            found = heap_hot_search_buffer(&tid, scan->xs_cbuf,
+                                           scan->xs_snapshot, &scan->xs_ctup,
+                                           &scan->kill_prior_tuple, false);
+            if (!found)
+            {
+                LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
+                tuple = NULL;
+                continue;
+            }
+        }
+        tuple = &scan->xs_ctup;

         switch (HeapTupleSatisfiesVacuum(tuple->t_data, OldestXmin,
                                          scan->xs_cbuf))
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index eba7063..e511abd 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -382,9 +382,11 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
         {
             OffsetNumber offnum = tbmres->offsets[curslot];
             ItemPointerData tid;
+            HeapTupleData heapTuple;

             ItemPointerSet(&tid, page, offnum);
-            if (heap_hot_search_buffer(&tid, buffer, snapshot, NULL))
+            if (heap_hot_search_buffer(&tid, buffer, snapshot, &heapTuple,
+                                       NULL, true))
                 scan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
         }
     }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index ae23a19..cfe2b60 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -145,6 +145,9 @@ extern void index_endscan(IndexScanDesc scan);
 extern void index_markpos(IndexScanDesc scan);
 extern void index_restrpos(IndexScanDesc scan);
 extern HeapTuple index_getnext(IndexScanDesc scan, ScanDirection direction);
+extern bool index_next(IndexScanDesc scan, ScanDirection direction);
+extern HeapTuple index_fetch(IndexScanDesc scan);
+
 extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);

 extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f8395fe..a5781db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -83,7 +83,8 @@ extern bool heap_fetch(Relation relation, Snapshot snapshot,
            HeapTuple tuple, Buffer *userbuf, bool keep_buf,
            Relation stats_relation);
 extern bool heap_hot_search_buffer(ItemPointer tid, Buffer buffer,
-                       Snapshot snapshot, bool *all_dead);
+                       Snapshot snapshot, HeapTuple tuple,
+                       bool *all_dead, bool firstCall);
 extern bool heap_hot_search(ItemPointer tid, Relation relation,
                 Snapshot snapshot, bool *all_dead);

diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 47b95c2..3bac6d3 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -72,16 +72,11 @@ typedef struct IndexScanDescData
     /* index access method's private state */
     void       *opaque;            /* access-method-specific info */

-    /* xs_ctup/xs_cbuf/xs_recheck are valid after a successful index_getnext */
+    /* xs_ctup/xs_cbuf/xs_recheck are valid after a successful index_next */
     HeapTupleData xs_ctup;        /* current heap tuple, if any */
     Buffer        xs_cbuf;        /* current heap buffer in scan, if any */
     /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
     bool        xs_recheck;        /* T means scan keys must be rechecked */
-
-    /* state data for traversing HOT chains in index_getnext */
-    bool        xs_hot_dead;    /* T if all members of HOT chain are dead */
-    OffsetNumber xs_next_hot;    /* next member of HOT chain, if any */
-    TransactionId xs_prev_xmax; /* previous HOT chain member's XMAX, if any */
 } IndexScanDescData;

 /* Struct for heap-or-index scans of system tables */
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 4398e8c..6e7807e 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -93,12 +93,14 @@ RelationGetIndexScan(Relation indexRelation,

     scan->kill_prior_tuple = false;
     scan->ignore_killed_tuples = true;    /* default setting */
+    scan->needIndexTuple = false;

     scan->opaque = NULL;

     ItemPointerSetInvalid(&scan->xs_ctup.t_self);
     scan->xs_ctup.t_data = NULL;
     scan->xs_cbuf = InvalidBuffer;
+    scan->xs_itup = NULL;

     /*
      * Let the AM fill in the key and any opaque data it wants.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 616edae..3d6d404 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -398,7 +398,9 @@ index_restrpos(IndexScanDesc scan)
  *
  * Advances to the next index tuple satisfying the scan keys, returning TRUE
  * if there was one, FALSE otherwise. The heap TID pointer of the next match
- * is stored in scan->xs_ctup.self.
+ * is stored in scan->xs_ctup.self. If scan->needIndexTuple is TRUE, the
+ * index tuple itself is stored in scan->xs_itup, if the indexam succeeded in
+ * fetching it.
  *
  * Note: caller must check scan->xs_recheck, and perform rechecking of the
  * scan keys if required.  We do not do that here because we don't have
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 79d4c66..decc57b 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -281,6 +281,9 @@ btgettuple(PG_FUNCTION_ARGS)
     else
         res = _bt_first(scan, dir);

+    if (scan->needIndexTuple && res)
+        scan->xs_itup = (IndexTuple) &so->currPos.buffer[so->currPos.items[so->currPos.itemIndex].bufferOffset];
+
     PG_RETURN_BOOL(res);
 }

@@ -368,6 +371,8 @@ btrescan(PG_FUNCTION_ARGS)
             so->keyData = NULL;
         so->killedItems = NULL; /* until needed */
         so->numKilled = 0;
+        so->currPos.buffer = so->markPos.buffer = NULL;
+        so->currPos.bufferUsed = so->markPos.bufferUsed = 0;
         scan->opaque = so;
     }

@@ -379,12 +384,14 @@ btrescan(PG_FUNCTION_ARGS)
             _bt_killitems(scan, false);
         ReleaseBuffer(so->currPos.buf);
         so->currPos.buf = InvalidBuffer;
+        so->currPos.bufferUsed = 0;
     }

     if (BTScanPosIsValid(so->markPos))
     {
         ReleaseBuffer(so->markPos.buf);
         so->markPos.buf = InvalidBuffer;
+        so->markPos.bufferUsed = 0;
     }
     so->markItemIndex = -1;

@@ -418,12 +425,24 @@ btendscan(PG_FUNCTION_ARGS)
             _bt_killitems(scan, false);
         ReleaseBuffer(so->currPos.buf);
         so->currPos.buf = InvalidBuffer;
+
+        if (so->currPos.buffer != NULL)
+        {
+            pfree(so->currPos.buffer);
+            so->currPos.buffer = NULL;
+        }
     }

     if (BTScanPosIsValid(so->markPos))
     {
         ReleaseBuffer(so->markPos.buf);
         so->markPos.buf = InvalidBuffer;
+
+        if (so->markPos.buffer != NULL)
+        {
+            pfree(so->markPos.buffer);
+            so->markPos.buffer = NULL;
+        }
     }
     so->markItemIndex = -1;

@@ -450,6 +469,7 @@ btmarkpos(PG_FUNCTION_ARGS)
     {
         ReleaseBuffer(so->markPos.buf);
         so->markPos.buf = InvalidBuffer;
+        so->markPos.bufferUsed = 0;
     }

     /*
@@ -494,6 +514,7 @@ btrestrpos(PG_FUNCTION_ARGS)
                 _bt_killitems(scan, false);
             ReleaseBuffer(so->currPos.buf);
             so->currPos.buf = InvalidBuffer;
+            so->currPos.bufferUsed = 0;
         }

         if (BTScanPosIsValid(so->markPos))
@@ -503,6 +524,11 @@ btrestrpos(PG_FUNCTION_ARGS)
             memcpy(&so->currPos, &so->markPos,
                    offsetof(BTScanPosData, items[1]) +
                    so->markPos.lastItem * sizeof(BTScanPosItem));
+
+            if (so->markPos.bufferUsed > 0)
+                memcpy(so->currPos.buffer, so->markPos.buffer,
+                       so->markPos.bufferUsed);
+            so->currPos.bufferUsed = so->markPos.bufferUsed;
         }
     }

diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 12b4528..cde3a39 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -1004,6 +1004,10 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
      */
     so->currPos.nextPage = opaque->btpo_next;

+    if (scan->needIndexTuple && so->currPos.buffer == NULL)
+        so->currPos.buffer = palloc(BLCKSZ);
+    so->currPos.bufferUsed = 0;
+
     if (ScanDirectionIsForward(dir))
     {
         /* load items[] in ascending order */
@@ -1019,6 +1023,17 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
                 /* _bt_checkkeys put the heap ptr into scan->xs_ctup.t_self */
                 so->currPos.items[itemIndex].heapTid = scan->xs_ctup.t_self;
                 so->currPos.items[itemIndex].indexOffset = offnum;
+                if (scan->needIndexTuple)
+                {
+                    Item itup = PageGetItem(page, PageGetItemId(page, offnum));
+                    Size size = IndexTupleSize(itup);
+
+                    so->currPos.items[itemIndex].bufferOffset = so->currPos.bufferUsed;
+                    memcpy(&so->currPos.buffer[so->currPos.bufferUsed],
+                           itup, size);
+                    so->currPos.bufferUsed += size;
+
+                }
                 itemIndex++;
             }
             if (!continuescan)
@@ -1052,6 +1067,18 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
                 itemIndex--;
                 so->currPos.items[itemIndex].heapTid = scan->xs_ctup.t_self;
                 so->currPos.items[itemIndex].indexOffset = offnum;
+
+                if (scan->needIndexTuple)
+                {
+                    Item itup = PageGetItem(page, PageGetItemId(page, offnum));
+                    Size size = IndexTupleSize(itup);
+
+                    so->currPos.items[itemIndex].bufferOffset = so->currPos.bufferUsed;
+                    memcpy(&so->currPos.buffer[so->currPos.bufferUsed],
+                           itup, size);
+                    so->currPos.bufferUsed += size;
+
+                }
             }
             if (!continuescan)
             {
@@ -1111,6 +1138,16 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
                offsetof(BTScanPosData, items[1]) +
                so->currPos.lastItem * sizeof(BTScanPosItem));
         so->markPos.itemIndex = so->markItemIndex;
+
+        if (so->currPos.bufferUsed > 0)
+        {
+            if (so->markPos.buffer == NULL)
+                so->markPos.buffer = palloc(BLCKSZ);
+            memcpy(so->markPos.buffer, so->currPos.buffer,
+                   so->currPos.bufferUsed);
+        }
+        so->markPos.bufferUsed = so->currPos.bufferUsed;
+
         so->markItemIndex = -1;
     }

@@ -1130,6 +1167,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
             /* release the previous buffer */
             _bt_relbuf(rel, so->currPos.buf);
             so->currPos.buf = InvalidBuffer;
+            so->currPos.bufferUsed = 0;
             /* if we're at end of scan, give up */
             if (blkno == P_NONE || !so->currPos.moreRight)
                 return false;
@@ -1165,6 +1203,8 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
          */
         for (;;)
         {
+            so->currPos.bufferUsed = 0;
+
             /* Done if we know there are no matching keys to the left */
             if (!so->currPos.moreLeft)
             {
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c4fa17c..c884ddf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -626,6 +626,9 @@ ExplainNode(Plan *plan, PlanState *planstate,
         case T_BitmapHeapScan:
             pname = sname = "Bitmap Heap Scan";
             break;
+        case T_HeapFetch:
+            pname = sname = "Heap Fetch";
+            break;
         case T_TidScan:
             pname = sname = "Tid Scan";
             break;
@@ -782,6 +785,7 @@ ExplainNode(Plan *plan, PlanState *planstate,
             /* FALL THRU */
         case T_SeqScan:
         case T_BitmapHeapScan:
+        case T_HeapFetch:
         case T_TidScan:
         case T_SubqueryScan:
         case T_FunctionScan:
@@ -967,6 +971,7 @@ ExplainNode(Plan *plan, PlanState *planstate,
         case T_CteScan:
         case T_WorkTableScan:
         case T_SubqueryScan:
+        case T_HeapFetch:
             show_scan_qual(plan->qual, "Filter", plan, outer_plan, es);
             break;
         case T_TidScan:
@@ -1043,8 +1048,9 @@ ExplainNode(Plan *plan, PlanState *planstate,
          * nodes, but in bitmap scan trees we must, since the bottom
          * BitmapIndexScan nodes may have outer references.
          */
+        /* same with heap fetches */
         ExplainNode(outerPlan(plan), outerPlanState(planstate),
-                    IsA(plan, BitmapHeapScan) ? outer_plan : NULL,
+                    (IsA(plan, BitmapHeapScan) || IsA(plan, HeapFetch)) ? outer_plan : NULL,
                     "Outer", NULL, es);
     }

@@ -1329,6 +1335,7 @@ ExplainScanTarget(Scan *plan, ExplainState *es)
         case T_SeqScan:
         case T_IndexScan:
         case T_BitmapHeapScan:
+        case T_HeapFetch:
         case T_TidScan:
             /* Assert it's on a real relation */
             Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 4d70f98..9adcb53 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -22,6 +22,6 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
        nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
        nodeLimit.o nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
-       nodeWindowAgg.o tstoreReceiver.o spi.o
+       nodeWindowAgg.o nodeHeapfetch.o tstoreReceiver.o spi.o

 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 2fecb46..633db1f 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -26,6 +26,7 @@
 #include "executor/nodeGroup.h"
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
+#include "executor/nodeHeapfetch.h"
 #include "executor/nodeIndexscan.h"
 #include "executor/nodeLimit.h"
 #include "executor/nodeMaterial.h"
@@ -159,6 +160,10 @@ ExecReScan(PlanState *node, ExprContext *exprCtxt)
             ExecBitmapHeapReScan((BitmapHeapScanState *) node, exprCtxt);
             break;

+        case T_HeapFetchState:
+            ExecHeapFetchReScan((HeapFetchState *) node, exprCtxt);
+            break;
+
         case T_TidScanState:
             ExecTidReScan((TidScanState *) node, exprCtxt);
             break;
@@ -376,6 +381,13 @@ ExecSupportsMarkRestore(NodeTag plantype)
              */
             return false;

+        case T_HeapFetch:
+            /*
+             * T_HeapFetch has the same problem as T_Result above. But for
+             * HeapFetch it would actually be useful
+             */
+            return false;
+
         default:
             break;
     }
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 15af711..e9edb9d 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -90,6 +90,7 @@
 #include "executor/nodeGroup.h"
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
+#include "executor/nodeHeapfetch.h"
 #include "executor/nodeIndexscan.h"
 #include "executor/nodeLimit.h"
 #include "executor/nodeMaterial.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
                                                           estate, eflags);
             break;

+        case T_HeapFetch:
+            result = (PlanState *) ExecInitHeapFetch((HeapFetch *) node,
+                                                     estate, eflags);
+            break;
+
         case T_TidScan:
             result = (PlanState *) ExecInitTidScan((TidScan *) node,
                                                    estate, eflags);
@@ -373,6 +379,10 @@ ExecProcNode(PlanState *node)
             result = ExecBitmapHeapScan((BitmapHeapScanState *) node);
             break;

+        case T_HeapFetchState:
+            result = ExecHeapFetch((HeapFetchState *) node);
+            break;
+
         case T_TidScanState:
             result = ExecTidScan((TidScanState *) node);
             break;
@@ -566,6 +576,9 @@ ExecCountSlotsNode(Plan *node)
         case T_BitmapHeapScan:
             return ExecCountSlotsBitmapHeapScan((BitmapHeapScan *) node);

+        case T_HeapFetch:
+            return ExecCountSlotsHeapFetch((HeapFetch *) node);
+
         case T_TidScan:
             return ExecCountSlotsTidScan((TidScan *) node);

@@ -705,6 +718,10 @@ ExecEndNode(PlanState *node)
             ExecEndBitmapHeapScan((BitmapHeapScanState *) node);
             break;

+        case T_HeapFetchState:
+            ExecEndHeapFetch((HeapFetchState *) node);
+            break;
+
         case T_TidScanState:
             ExecEndTidScan((TidScanState *) node);
             break;
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 19fa4e6..ff937e3 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -211,7 +211,7 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
         if (!var || !IsA(var, Var))
             return false;        /* tlist item not a Var */
         /* if these Asserts fail, planner messed up */
-        Assert(var->varno == varno);
+        Assert(var->varno == varno || var->varno == OUTER );
         Assert(var->varlevelsup == 0);
         if (var->varattno != attrno)
             return false;        /* out of order */
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index c983066..b3c3f37 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -92,6 +92,7 @@
 #include "postgres.h"

 #include "funcapi.h"
+#include "access/itup.h"
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
@@ -578,6 +579,51 @@ ExecStoreVirtualTuple(TupleTableSlot *slot)
     return slot;
 }

+TupleTableSlot *
+ExecStoreIndexTuple(IndexTuple tuple, TupleTableSlot *slot, bool shouldFree)
+{
+    int natts;
+    int i;
+
+    /*
+     * sanity checks
+     */
+    Assert(tuple != NULL);
+    Assert(slot != NULL);
+    Assert(slot->tts_tupleDescriptor != NULL);
+
+    /*
+     * Free any old physical tuple belonging to the slot.
+     */
+    if (slot->tts_shouldFree)
+        heap_freetuple(slot->tts_tuple);
+    if (slot->tts_shouldFreeMin)
+        heap_free_minimal_tuple(slot->tts_mintuple);
+
+    /*
+     * Store the new tuple into the specified slot.
+     */
+    slot->tts_isempty = false;
+    slot->tts_shouldFree = shouldFree;
+    slot->tts_shouldFreeMin = false;
+    slot->tts_tuple = NULL;
+    slot->tts_mintuple = NULL;
+
+    natts = slot->tts_tupleDescriptor->natts;
+    for (i = 0; i < natts - 1; i++)
+    {
+        slot->tts_values[i] = index_getattr(tuple, i + 1,
+                                            slot->tts_tupleDescriptor,
+                                            &slot->tts_isnull[i]);
+    }
+    slot->tts_values[i] = PointerGetDatum(&tuple->t_tid);
+    slot->tts_isnull[i] = false;
+
+    slot->tts_nvalid = natts;
+
+    return slot;
+}
+
 /* --------------------------------
  *        ExecStoreAllNullTuple
  *            Set up the slot to contain a null in every column.
diff --git a/src/backend/executor/nodeHeapfetch.c b/src/backend/executor/nodeHeapfetch.c
new file mode 100644
index 0000000..bd08ac3
--- /dev/null
+++ b/src/backend/executor/nodeHeapfetch.c
@@ -0,0 +1,384 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeHeapfetch.c
+ *      Routines to fetch additional columns from heap and check visibility
+ *
+ * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *      $PostgreSQL$
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ *
+ *        ExecHeapFetch            scans a relation using tids
+ *        ExecInitHeapFetch        creates and initializes state info.
+ *        ExecHeapFetchReScan        rescans the tid relation.
+ *        ExecEndHeapFetch        releases all storage.
+ *        ExecHeapFetchMarkPos    marks scan position.
+ *        ExecHeapFetchRestrPos    restores scan position.
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "executor/execdebug.h"
+#include "executor/nodeHeapfetch.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/array.h"
+#include "utils/memutils.h"
+
+static TupleTableSlot *HeapFetchNext(HeapFetchState *node);
+
+#define DatumGetItemPointer(X)     ((ItemPointer) DatumGetPointer(X))
+#define ItemPointerGetDatum(X)     PointerGetDatum(X)
+
+/* ----------------------------------------------------------------
+ *        HeapFetchNext
+ *
+ *        Retrieve a tuple from the HeapFetch node's currentRelation
+ *        using the tid returned from the outer node.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+HeapFetchNext(HeapFetchState *node)
+{
+    EState       *estate;
+    Snapshot    snapshot;
+    Relation    heapRelation;
+    TupleTableSlot *slot;
+    Index        scanrelid;
+    Buffer        buffer = InvalidBuffer;
+    HeapFetch  *plan = (HeapFetch *) node->ss.ps.plan;
+    ExprContext *econtext;
+
+    /*
+     * extract necessary information from tid scan node
+     */
+    estate = node->ss.ps.state;
+    snapshot = estate->es_snapshot;
+    heapRelation = node->ss.ss_currentRelation;
+    slot = node->ss.ss_ScanTupleSlot;
+    scanrelid = ((HeapFetch *) node->ss.ps.plan)->scan.scanrelid;
+    econtext = node->ss.ps.ps_ExprContext;
+
+    /*
+     * Check if we are evaluating PlanQual for tuple of this relation.
+     * Additional checking is not good, but no other way for now. We could
+     * introduce new nodes for this case and handle HeapFetch --> NewNode
+     * switching in Init/ReScan plan...
+     */
+    if (estate->es_evTuple != NULL &&
+        estate->es_evTuple[scanrelid - 1] != NULL)
+    {
+        if (estate->es_evTupleNull[scanrelid - 1])
+            return ExecClearTuple(slot);
+
+        ExecStoreTuple(estate->es_evTuple[scanrelid - 1],
+                       slot, InvalidBuffer, false);
+
+        /* Flag for the next call that no more tuples */
+        estate->es_evTupleNull[scanrelid - 1] = true;
+        return slot;
+    }
+
+    for(;;)
+    {
+        PlanState  *outerNode;
+        TupleTableSlot *outerslot;
+        bool isnull;
+        ItemPointer tid;
+        HeapTuple tuple = &(node->hfs_htup);
+        bool visible;
+        bool recheck;
+
+        outerNode = outerPlanState(node);
+        outerslot = ExecProcNode(outerNode);
+
+        if (TupIsNull(outerslot))
+        {
+            if (BufferIsValid(buffer))
+                ReleaseBuffer(buffer);
+            return NULL;
+        }
+
+        /*
+         * place the current tuple into the expr context
+         */
+        econtext->ecxt_outertuple = outerslot;
+
+        /* fetch tid from outer plan */
+        tid = DatumGetItemPointer(slot_getattr(outerslot, plan->tidIdx, &isnull));
+        Assert(!isnull);
+
+        tuple->t_self = *tid;
+
+        /*
+         * XXX: As a quick hack, the IndexScan node sets the high bit in posid
+         * if the original index qual needs to be rechecked
+         */
+        if (tuple->t_self.ip_posid & (1<<15))
+        {
+            tuple->t_self.ip_posid &= ~(1<<15);
+            recheck = true;
+        }
+        else
+            recheck = false;
+
+        buffer = ReleaseAndReadBuffer(buffer, heapRelation,
+                                      ItemPointerGetBlockNumber(tid));
+        LockBuffer(buffer, BUFFER_LOCK_SHARE);
+        visible = heap_hot_search_buffer(&tuple->t_self, buffer, snapshot, tuple, NULL, true);
+        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+        if (visible)
+        {
+            /*
+             * store the scanned tuple in the scan tuple slot of the scan
+             * state.  Eventually we will only do this and not return a tuple.
+             * Note: we pass 'false' because tuples returned by amgetnext are
+             * pointers onto disk pages and were not created with palloc() and
+             * so should not be pfree()'d.
+             */
+            ExecStoreTuple(tuple,        /* tuple to store */
+                           slot,        /* slot to store in */
+                           buffer,        /* buffer associated with tuple  */
+                           false);        /* don't pfree */
+
+            if (recheck)
+            {
+                econtext->ecxt_scantuple = slot;
+                ResetExprContext(econtext);
+                if (!ExecQual(node->hfs_indexqualorig, econtext, false))
+                {
+                    continue;       /* nope, so ask index for another one */
+                }
+            }
+
+            /*
+             * At this point we have an extra pin on the buffer, because
+             * ExecStoreTuple incremented the pin count. Drop our local pin.
+             */
+            ReleaseBuffer(buffer);
+
+            return slot;
+        }
+    }
+
+    /*
+     * if we get here it means the tid scan failed so we are at the end of the
+     * scan..
+     */
+    return ExecClearTuple(slot);
+}
+
+/* ----------------------------------------------------------------
+ *        ExecHeapFetchScan(node)
+ *
+ *        Retrieves the next tid from the outer node, and fetches the
+ *        corresponding heap tuple.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecHeapFetch(HeapFetchState *node)
+{
+    /*
+     * use HeapFetchNext as access method
+     */
+    return ExecScan(&node->ss, (ExecScanAccessMtd) HeapFetchNext);
+}
+
+/* ----------------------------------------------------------------
+ *        ExecHeapFetchReScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecHeapFetchReScan(HeapFetchState *node, ExprContext *exprCtxt)
+{
+    EState       *estate;
+    Index        scanrelid;
+
+    estate = node->ss.ps.state;
+    scanrelid = ((HeapFetch *) node->ss.ps.plan)->scan.scanrelid;
+
+    node->ss.ps.ps_TupFromTlist = false;
+
+    /* If we are being passed an outer tuple, save it for runtime key calc */
+    if (exprCtxt != NULL)
+        node->ss.ps.ps_ExprContext->ecxt_outertuple =
+            exprCtxt->ecxt_outertuple;
+
+    /* If this is re-scanning of PlanQual ... */
+    if (estate->es_evTuple != NULL &&
+        estate->es_evTuple[scanrelid - 1] != NULL)
+    {
+        estate->es_evTupleNull[scanrelid - 1] = false;
+        return;
+    }
+
+    /*
+     * if chgParam of subnodes is not null then plans will be re-scanned by
+     * first ExecProcNode.
+     */
+    if (((PlanState *) node)->lefttree->chgParam == NULL)
+        ExecReScan(((PlanState *) node)->lefttree, exprCtxt);
+}
+
+/* ----------------------------------------------------------------
+ *        ExecEndHeapFetch
+ *
+ *        Releases any storage allocated through C routines.
+ *        Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndHeapFetch(HeapFetchState *node)
+{
+    /*
+     * Free the exprcontext
+     */
+    ExecFreeExprContext(&node->ss.ps);
+
+    /*
+     * clear out tuple table slots
+     */
+    ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+    ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+    /*
+     * close the heap relation.
+     */
+    ExecCloseScanRelation(node->ss.ss_currentRelation);
+
+    /*
+     * shut down the subplan
+     */
+    ExecEndNode(outerPlanState(node));
+}
+
+/* ----------------------------------------------------------------
+ *        ExecHeapFetchMarkPos
+ *
+ *        XXX This is dead code at the moment, for the lack of support
+ *        in ExecSupportsMarkRestore.
+ * ----------------------------------------------------------------
+ */
+void
+ExecHeapFetchMarkPos(HeapFetchState *node)
+{
+    ExecMarkPos(outerPlanState(node));
+}
+
+/* ----------------------------------------------------------------
+ *        ExecHeapFetchRestrPos
+ *
+ *        XXX This is dead code at the moment, for the lack of support
+ *        in ExecSupportsMarkRestore.
+ * ----------------------------------------------------------------
+ */
+void
+ExecHeapFetchRestrPos(HeapFetchState *node)
+{
+    ExecRestrPos(outerPlanState(node));
+}
+
+/* ----------------------------------------------------------------
+ *        ExecInitHeapFetch
+ *
+ *        Initializes the tid scan's state information, creates
+ *        scan keys, and opens the base and tid relations.
+ *
+ *        Parameters:
+ *          node: HeapFetch node produced by the planner.
+ *          estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+HeapFetchState *
+ExecInitHeapFetch(HeapFetch *node, EState *estate, int eflags)
+{
+    HeapFetchState *hfstate;
+    Relation    currentRelation;
+    Plan *outerPlan;
+
+    /*
+     * create state structure
+     */
+    hfstate = makeNode(HeapFetchState);
+    hfstate->ss.ps.plan = (Plan *) node;
+    hfstate->ss.ps.state = estate;
+
+    /*
+     * Miscellaneous initialization
+     *
+     * create expression context for node
+     */
+    ExecAssignExprContext(estate, &hfstate->ss.ps);
+
+    hfstate->ss.ps.ps_TupFromTlist = false;
+
+    /*
+     * Initialize child expressions. The indexqualorig expression is always
+     * initialized even though it will only be used in some uncommon cases
+     * --- would be nice to improve that.
+     */
+    hfstate->ss.ps.targetlist = (List *)
+        ExecInitExpr((Expr *) node->scan.plan.targetlist,
+                     (PlanState *) hfstate);
+    hfstate->ss.ps.qual = (List *)
+        ExecInitExpr((Expr *) node->scan.plan.qual,
+                     (PlanState *) hfstate);
+    hfstate->hfs_indexqualorig = (List *)
+        ExecInitExpr((Expr *) node->indexqualorig,
+                     (PlanState *) hfstate);
+
+#define HEAPFETCH_NSLOTS 2
+
+    /*
+     * tuple table initialization
+     */
+    ExecInitResultTupleSlot(estate, &hfstate->ss.ps);
+    ExecInitScanTupleSlot(estate, &hfstate->ss);
+
+    /*
+     * open the base relation and acquire appropriate lock on it.
+     */
+    currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid);
+
+    hfstate->ss.ss_currentRelation = currentRelation;
+    hfstate->ss.ss_currentScanDesc = NULL;        /* no heap scan here */
+
+    /*
+     * get the scan type from the relation descriptor.
+     */
+    ExecAssignScanType(&hfstate->ss, RelationGetDescr(currentRelation));
+
+    /*
+     * initialize child nodes
+     */
+    outerPlan = outerPlan(node);
+    outerPlanState(hfstate) = ExecInitNode(outerPlan, estate, eflags);
+
+    /*
+     * Initialize result tuple type and projection info.
+     */
+    ExecAssignResultTypeFromTL(&hfstate->ss.ps);
+    ExecAssignScanProjectionInfo(&hfstate->ss);
+
+    /*
+     * all done.
+     */
+    return hfstate;
+}
+
+int
+ExecCountSlotsHeapFetch(HeapFetch *node)
+{
+    return ExecCountSlotsNode(outerPlan((Plan *) node)) +
+        ExecCountSlotsNode(innerPlan((Plan *) node)) + HEAPFETCH_NSLOTS;
+}
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d8ec8b3..2035302 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -27,6 +27,7 @@
 #include "access/genam.h"
 #include "access/nbtree.h"
 #include "access/relscan.h"
+#include "catalog/pg_type.h"
 #include "executor/execdebug.h"
 #include "executor/nodeIndexscan.h"
 #include "optimizer/clauses.h"
@@ -53,8 +54,8 @@ IndexNext(IndexScanState *node)
     ScanDirection direction;
     IndexScanDesc scandesc;
     Index        scanrelid;
-    HeapTuple    tuple;
     TupleTableSlot *slot;
+    IndexScan *plan = (IndexScan *) node->ss.ps.plan;

     /*
      * extract necessary information from index scan node
@@ -62,7 +63,7 @@ IndexNext(IndexScanState *node)
     estate = node->ss.ps.state;
     direction = estate->es_direction;
     /* flip direction if this is an overall backward scan */
-    if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+    if (ScanDirectionIsBackward(plan->indexorderdir))
     {
         if (ScanDirectionIsForward(direction))
             direction = BackwardScanDirection;
@@ -72,7 +73,7 @@ IndexNext(IndexScanState *node)
     scandesc = node->iss_ScanDesc;
     econtext = node->ss.ps.ps_ExprContext;
     slot = node->ss.ss_ScanTupleSlot;
-    scanrelid = ((IndexScan *) node->ss.ps.plan)->scan.scanrelid;
+    scanrelid = plan->scan.scanrelid;

     /*
      * Check if we are evaluating PlanQual for tuple of this relation.
@@ -94,9 +95,6 @@ IndexNext(IndexScanState *node)

         ResetExprContext(econtext);

-        if (!ExecQual(node->indexqualorig, econtext, false))
-            ExecClearTuple(slot);        /* would not be returned by scan */
-
         /* Flag for the next call that no more tuples */
         estate->es_evTupleNull[scanrelid - 1] = true;

@@ -106,28 +104,39 @@ IndexNext(IndexScanState *node)
     /*
      * ok, now that we have what we need, fetch the next tuple.
      */
-    while ((tuple = index_getnext(scandesc, direction)) != NULL)
+    while (index_next(scandesc, direction))
     {
         /*
-         * Store the scanned tuple in the scan tuple slot of the scan state.
-         * Note: we pass 'false' because tuples returned by amgetnext are
-         * pointers onto disk pages and must not be pfree()'d.
+         * Before we fetch the heap tuple, see if we can evaluate quals
+         * using the index tuple.
          */
-        ExecStoreTuple(tuple,    /* tuple to store */
-                       slot,    /* slot to store in */
-                       scandesc->xs_cbuf,        /* buffer containing tuple */
-                       false);    /* don't pfree */
+        if (scandesc->xs_itup)
+        {
+            ExecStoreIndexTuple(scandesc->xs_itup, slot, false);
+        }
+        else
+        {
+            Assert(!scandesc->needIndexTuple);
+            Assert(slot->tts_tupleDescriptor->natts == 1);
+
+            ExecClearTuple(slot);
+            slot->tts_values[0] = PointerGetDatum(&scandesc->xs_ctup.t_self);
+            slot->tts_isnull[0] = false;
+            ExecStoreVirtualTuple(slot);
+        }

         /*
          * If the index was lossy, we have to recheck the index quals using
-         * the real tuple.
+         * the real tuple in the HeapFetch node above.
+         *
+         * XXX: As a quick hack to signal to the HeapFetch node that this
+         * tuple needs to be rechecked, set the high bit in the posid field of
+         * the tid.
          */
         if (scandesc->xs_recheck)
         {
-            econtext->ecxt_scantuple = slot;
-            ResetExprContext(econtext);
-            if (!ExecQual(node->indexqualorig, econtext, false))
-                continue;        /* nope, so ask index for another one */
+            scandesc->xs_ctup.t_self.ip_posid |= (1<<15);
+            slot->tts_values[0] = PointerGetDatum(&scandesc->xs_ctup.t_self);
         }

         return slot;
@@ -501,6 +510,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
     IndexScanState *indexstate;
     Relation    currentRelation;
     bool        relistarget;
+    bool        needIndexTuple;

     /*
      * create state structure
@@ -522,19 +532,20 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
      * initialize child expressions
      *
      * Note: we don't initialize all of the indexqual expression, only the
-     * sub-parts corresponding to runtime keys (see below).  The indexqualorig
-     * expression is always initialized even though it will only be used in
-     * some uncommon cases --- would be nice to improve that.  (Problem is
+     * sub-parts corresponding to runtime keys (see below).  (Problem is
      * that any SubPlans present in the expression must be found now...)
+     *
+     * The targetlist and qual expressions in the plan node refer to columns
+     * in the heap, for the convenience of EXPLAIN and consistency with all
+     * other node types. But we return an index tuple, so we use versions of
+     * those expressions that have been transformed to refer index columns
+     * instead.
      */
     indexstate->ss.ps.targetlist = (List *)
-        ExecInitExpr((Expr *) node->scan.plan.targetlist,
+        ExecInitExpr((Expr *) node->indexonlytargetlist,
                      (PlanState *) indexstate);
     indexstate->ss.ps.qual = (List *)
-        ExecInitExpr((Expr *) node->scan.plan.qual,
-                     (PlanState *) indexstate);
-    indexstate->indexqualorig = (List *)
-        ExecInitExpr((Expr *) node->indexqualorig,
+        ExecInitExpr((Expr *) node->indexonlyqual,
                      (PlanState *) indexstate);

 #define INDEXSCAN_NSLOTS 2
@@ -554,15 +565,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
     indexstate->ss.ss_currentScanDesc = NULL;    /* no heap scan here */

     /*
-     * get the scan type from the relation descriptor.
-     */
-    ExecAssignScanType(&indexstate->ss, RelationGetDescr(currentRelation));
-
-    /*
      * Initialize result tuple type and projection info.
      */
     ExecAssignResultTypeFromTL(&indexstate->ss.ps);
-    ExecAssignScanProjectionInfo(&indexstate->ss);

     /*
      * If we are just doing EXPLAIN (ie, aren't going to run the plan), stop
@@ -584,6 +589,44 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
                                      relistarget ? NoLock : AccessShareLock);

     /*
+     * Initialize slot to hold index tuple. The shape of the slot depends
+     * on whether we return the whole index tuple or just the tid.
+     */
+    {
+        TupleDesc td;
+        int natts;
+
+        if (node->indexonlyqual != NIL ||
+            list_length(node->indexonlytargetlist) > 1)
+        {
+            TupleDesc itupdesc = RelationGetDescr(indexstate->iss_RelationDesc);
+            int i;
+
+            natts = itupdesc->natts + 1;
+
+            td = CreateTemplateTupleDesc(natts, false);
+            for(i = 0; i < itupdesc->natts; i++)
+                memcpy(td->attrs[i], itupdesc->attrs[i],
+                       sizeof(FormData_pg_attribute));
+
+            needIndexTuple = true;
+        }
+        else
+        {
+            natts = 1;
+            td = CreateTemplateTupleDesc(natts, false);
+
+            needIndexTuple = false;
+        }
+
+        TupleDescInitEntry(td, natts, "tid", TIDOID, -1, 0);
+
+        ExecAssignScanType(&indexstate->ss, td);
+    }
+
+    ExecAssignScanProjectionInfo(&indexstate->ss);
+
+    /*
      * Initialize index-specific scan state
      */
     indexstate->iss_RuntimeKeysReady = false;
@@ -629,6 +672,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
                                                estate->es_snapshot,
                                                indexstate->iss_NumScanKeys,
                                                indexstate->iss_ScanKeys);
+    indexstate->iss_ScanDesc->needIndexTuple = needIndexTuple;

     /*
      * all done.
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 25961d1..15c91be 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -313,6 +313,9 @@ _copyIndexScan(IndexScan *from)
     COPY_NODE_FIELD(indexqual);
     COPY_NODE_FIELD(indexqualorig);
     COPY_SCALAR_FIELD(indexorderdir);
+    COPY_NODE_FIELD(indexonlyqual);
+    COPY_NODE_FIELD(indexonlyqualorig);
+    COPY_NODE_FIELD(indexonlytargetlist);

     return newnode;
 }
@@ -362,6 +365,24 @@ _copyBitmapHeapScan(BitmapHeapScan *from)
 }

 /*
+ * _copyHeapFetch
+ */
+static HeapFetch *
+_copyHeapFetch(HeapFetch *from)
+{
+    HeapFetch  *newnode = makeNode(HeapFetch);
+
+    /*
+     * copy node superclass fields
+     */
+    CopyScanFields((Scan *) from, (Scan *) newnode);
+    COPY_SCALAR_FIELD(tidIdx);
+    COPY_NODE_FIELD(indexqualorig);
+
+    return newnode;
+}
+
+/*
  * _copyTidScan
  */
 static TidScan *
@@ -879,6 +900,7 @@ _copyVar(Var *from)
     Var           *newnode = makeNode(Var);

     COPY_SCALAR_FIELD(varno);
+    COPY_SCALAR_FIELD(varindexno);
     COPY_SCALAR_FIELD(varattno);
     COPY_SCALAR_FIELD(vartype);
     COPY_SCALAR_FIELD(vartypmod);
@@ -3470,6 +3492,9 @@ copyObject(void *from)
         case T_BitmapHeapScan:
             retval = _copyBitmapHeapScan(from);
             break;
+        case T_HeapFetch:
+            retval = _copyHeapFetch(from);
+            break;
         case T_TidScan:
             retval = _copyTidScan(from);
             break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 142aa8f..00abb24 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -134,6 +134,7 @@ static bool
 _equalVar(Var *a, Var *b)
 {
     COMPARE_SCALAR_FIELD(varno);
+    COMPARE_SCALAR_FIELD(varindexno);
     COMPARE_SCALAR_FIELD(varattno);
     COMPARE_SCALAR_FIELD(vartype);
     COMPARE_SCALAR_FIELD(vartypmod);
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 9ed9018..33d03da 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -87,6 +87,8 @@ makeVar(Index varno,
     /* Likewise, we just set location to "unknown" here */
     var->location = -1;

+    var->varindexno = -1;
+
     return var;
 }

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index af801a4..d508329 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -399,6 +399,8 @@ _outIndexScan(StringInfo str, IndexScan *node)
     WRITE_NODE_FIELD(indexqual);
     WRITE_NODE_FIELD(indexqualorig);
     WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+    WRITE_NODE_FIELD(indexonlyqual);
+    WRITE_NODE_FIELD(indexonlyqualorig);
 }

 static void
@@ -424,6 +426,17 @@ _outBitmapHeapScan(StringInfo str, BitmapHeapScan *node)
 }

 static void
+_outHeapFetch(StringInfo str, HeapFetch *node)
+{
+    WRITE_NODE_TYPE("HEAPFETCH");
+
+    _outScanInfo(str, (Scan *) node);
+
+    WRITE_INT_FIELD(tidIdx);
+    WRITE_NODE_FIELD(indexqualorig);
+}
+
+static void
 _outTidScan(StringInfo str, TidScan *node)
 {
     WRITE_NODE_TYPE("TIDSCAN");
@@ -780,6 +793,7 @@ _outVar(StringInfo str, Var *node)
     WRITE_NODE_TYPE("VAR");

     WRITE_UINT_FIELD(varno);
+    WRITE_INT_FIELD(varindexno);
     WRITE_INT_FIELD(varattno);
     WRITE_OID_FIELD(vartype);
     WRITE_INT_FIELD(vartypmod);
@@ -2414,6 +2428,9 @@ _outNode(StringInfo str, void *obj)
             case T_BitmapHeapScan:
                 _outBitmapHeapScan(str, obj);
                 break;
+            case T_HeapFetch:
+                _outHeapFetch(str, obj);
+                break;
             case T_TidScan:
                 _outTidScan(str, obj);
                 break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4f7b740..7b5914b 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -398,6 +398,7 @@ _readVar(void)
     READ_LOCALS(Var);

     READ_UINT_FIELD(varno);
+    READ_INT_FIELD(varindexno);
     READ_INT_FIELD(varattno);
     READ_OID_FIELD(vartype);
     READ_INT_FIELD(vartypmod);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c9c5025..31740b9 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -195,6 +195,8 @@ cost_seqscan(Path *path, PlannerInfo *root,
  *
  * 'index' is the index to be used
  * 'indexQuals' is the list of applicable qual clauses (implicit AND semantics)
+ * 'indexOnlyQuals' is the list of qual clauses that are applied to tuples
+ *        fetched from index, before fetching the heap tuple
  * 'outer_rel' is the outer relation when we are considering using the index
  *        scan as the inside of a nestloop join (hence, some of the indexQuals
  *        are join clauses, and we should expect repeated scans of the index);
@@ -216,6 +218,7 @@ void
 cost_index(IndexPath *path, PlannerInfo *root,
            IndexOptInfo *index,
            List *indexQuals,
+           List *indexOnlyQuals,
            RelOptInfo *outer_rel)
 {
     RelOptInfo *baserel = index->rel;
@@ -224,6 +227,7 @@ cost_index(IndexPath *path, PlannerInfo *root,
     Cost        indexStartupCost;
     Cost        indexTotalCost;
     Selectivity indexSelectivity;
+    Selectivity indexOnlyQualSelectivity;
     double        indexCorrelation,
                 csquared;
     Cost        min_IO_cost,
@@ -257,6 +261,9 @@ cost_index(IndexPath *path, PlannerInfo *root,
                      PointerGetDatum(&indexSelectivity),
                      PointerGetDatum(&indexCorrelation));

+    indexOnlyQualSelectivity =
+        clauselist_selectivity(root, indexOnlyQuals, 0, JOIN_INNER, NULL);
+
     /*
      * Save amcostestimate's results for possible use in bitmap scan planning.
      * We don't bother to save indexStartupCost or indexCorrelation, because a
@@ -270,7 +277,8 @@ cost_index(IndexPath *path, PlannerInfo *root,
     run_cost += indexTotalCost - indexStartupCost;

     /* estimate number of main-table tuples fetched */
-    tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples);
+    tuples_fetched = clamp_row_est(
+        indexOnlyQualSelectivity * indexSelectivity * baserel->tuples);

     /*----------
      * Estimate number of main-table pages fetched, and compute I/O cost.
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 52caddd..87073f7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -27,9 +27,11 @@
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
+#include "optimizer/planmain.h"
 #include "optimizer/predtest.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/var.h"
+#include "parser/parsetree.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/lsyscache.h"
@@ -47,12 +49,10 @@


 /* Whether we are looking for plain indexscan, bitmap scan, or either */
-typedef enum
-{
-    ST_INDEXSCAN,                /* must support amgettuple */
-    ST_BITMAPSCAN,                /* must support amgetbitmap */
-    ST_ANYSCAN                    /* either is okay */
-} ScanTypeControl;
+#define ST_INDEXSCAN   1
+#define ST_BITMAPSCAN  2
+#define ST_ANYSCAN     (ST_INDEXSCAN | ST_BITMAPSCAN)
+typedef int ScanTypeControl;

 /* Per-path data used within choose_bitmap_and() */
 typedef struct
@@ -64,10 +64,11 @@ typedef struct
 } PathClauseUsage;


-static List *find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
+static void find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
                     List *clauses, List *outer_clauses,
                     bool istoplevel, RelOptInfo *outer_rel,
-                    SaOpControl saop_control, ScanTypeControl scantype);
+                    SaOpControl saop_control, ScanTypeControl scantype,
+                    List **indexpaths, List **bmpaths);
 static List *find_saop_paths(PlannerInfo *root, RelOptInfo *rel,
                 List *clauses, List *outer_clauses,
                 bool istoplevel, RelOptInfo *outer_rel);
@@ -175,9 +176,10 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
      * Find all the index paths that are directly usable for this relation
      * (ie, are valid without considering OR or JOIN clauses).
      */
-    indexpaths = find_usable_indexes(root, rel,
-                                     rel->baserestrictinfo, NIL,
-                                     true, NULL, SAOP_FORBID, ST_ANYSCAN);
+    find_usable_indexes(root, rel,
+                        rel->baserestrictinfo, NIL,
+                        true, NULL, SAOP_FORBID, ST_ANYSCAN,
+                        &indexpaths, &bitindexpaths);

     /*
      * Submit all the ones that can form plain IndexScan plans to add_path. (A
@@ -189,18 +191,11 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
      * selectivity; we should discard anything that was generated solely for
      * ordering purposes.
      */
-    bitindexpaths = NIL;
     foreach(l, indexpaths)
     {
-        IndexPath  *ipath = (IndexPath *) lfirst(l);
-
-        if (ipath->indexinfo->amhasgettuple)
-            add_path(rel, (Path *) ipath);
+        Path  *ipath = (Path *) lfirst(l);

-        if (ipath->indexinfo->amhasgetbitmap &&
-            ipath->indexselectivity < 1.0 &&
-            !ScanDirectionIsBackward(ipath->indexscandir))
-            bitindexpaths = lappend(bitindexpaths, ipath);
+        add_path(rel, ipath);
     }

     /*
@@ -271,26 +266,37 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
  * 'saop_control' indicates whether ScalarArrayOpExpr clauses can be used
  * 'scantype' indicates whether we need plain or bitmap scan support
  *
+ * Paths suitable for regular index scans and bitmap scans are returned in
+ * *indexpaths and *bmpaths, respectively. Same path can be returned in both
+ * lists.
+ *
  * Note: check_partial_indexes() must have been run previously.
  *----------
  */
-static List *
+static void
 find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
                     List *clauses, List *outer_clauses,
                     bool istoplevel, RelOptInfo *outer_rel,
-                    SaOpControl saop_control, ScanTypeControl scantype)
+                    SaOpControl saop_control, ScanTypeControl scantype,
+                    List **indexpaths, List **bmpaths)
 {
     Relids        outer_relids = outer_rel ? outer_rel->relids : NULL;
     bool        possibly_useful_pathkeys = has_useful_pathkeys(root, rel);
-    List       *result = NIL;
     List       *all_clauses = NIL;        /* not computed till needed */
     ListCell   *ilist;

+    if (indexpaths)
+        *indexpaths = NIL;
+    if (bmpaths)
+        *bmpaths = NIL;
+
     foreach(ilist, rel->indexlist)
     {
         IndexOptInfo *index = (IndexOptInfo *) lfirst(ilist);
         IndexPath  *ipath;
         List       *restrictclauses;
+        List       *indexonlyQuals;
+        List       *heapQuals;
         List       *index_pathkeys;
         List       *useful_pathkeys;
         bool        useful_predicate;
@@ -398,29 +404,70 @@ find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
             useful_pathkeys = NIL;

         /*
-         * 3. Generate an indexscan path if there are relevant restriction
-         * clauses in the current clauses, OR the index ordering is
-         * potentially useful for later merging or final output ordering, OR
-         * the index has a predicate that was proven by the current clauses.
+         * 3. The index might be useful if data from it can be used to
+         * evaluate any restrictions not used as index keys.
          */
-        if (found_clause || useful_pathkeys != NIL || useful_predicate)
+        indexonlyQuals = NIL;
+        heapQuals = NIL;
+        if ((scantype & ST_INDEXSCAN) && index->amhasgettuple)
+            extract_indexonly_quals(root, index, restrictclauses, outer_rel,
+                                    &heapQuals, &indexonlyQuals);
+        else
+            extract_indexonly_quals(root, index, restrictclauses, outer_rel,
+                                    &heapQuals, NULL);
+
+        /*
+         * 4. Generate an indexscan path if
+         *
+         * - there are relevant restriction clauses in the current clauses, OR
+         * - the index ordering is potentially useful for later merging or
+         *   final output ordering, OR
+         * - the index has a predicate that was proven by the current
+         *   clauses, OR
+         * - data from the index can be used to evaluate restriction clauses
+         *   before accessing the heap.
+         */
+        if (found_clause || useful_pathkeys != NIL || useful_predicate ||
+            indexonlyQuals != NIL)
         {
             ipath = create_index_path(root, index,
                                       restrictclauses,
+                                      indexonlyQuals,
                                       useful_pathkeys,
                                       index_is_ordered ?
                                       ForwardScanDirection :
                                       NoMovementScanDirection,
                                       outer_rel);
-            result = lappend(result, ipath);
+
+            /*
+             * If the only reason we're considering this path is index-only
+             * quals, this path is only usable for a regular index scan.
+             */
+            if (!found_clause && useful_pathkeys == NIL && !useful_predicate)
+            {
+                Path *p = (Path *) create_heapfetch_path(root, rel, (Path *)ipath, index, heapQuals,
get_actual_clauses(ipath->indexquals));
+                *indexpaths = lappend(*indexpaths, p);
+            }
+            else
+            {
+                if (index->amhasgettuple && indexpaths != NULL)
+                {
+                    Path *p = (Path *) create_heapfetch_path(root, rel, (Path *)ipath, index, heapQuals,
get_actual_clauses(ipath->indexquals));
+
+                    *indexpaths = lappend(*indexpaths, p);
+                }
+
+                if (index->amhasgetbitmap && bmpaths != NULL)
+                    *bmpaths = lappend(*bmpaths, ipath);
+            }
         }

         /*
-         * 4. If the index is ordered, a backwards scan might be interesting.
+         * 5. If the index is ordered, a backwards scan might be interesting.
          * Again, this is only interesting at top level.
          */
         if (index_is_ordered && possibly_useful_pathkeys &&
-            istoplevel && outer_rel == NULL)
+            istoplevel && outer_rel == NULL && indexpaths != NULL)
         {
             index_pathkeys = build_index_pathkeys(root, index,
                                                   BackwardScanDirection);
@@ -428,17 +475,19 @@ find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
                                                         index_pathkeys);
             if (useful_pathkeys != NIL)
             {
+                Path *p;
                 ipath = create_index_path(root, index,
                                           restrictclauses,
+                                          indexonlyQuals,
                                           useful_pathkeys,
                                           BackwardScanDirection,
                                           outer_rel);
-                result = lappend(result, ipath);
+                p = (Path *) create_heapfetch_path(root, rel, (Path *) ipath, index,
+                                                   heapQuals, get_actual_clauses(ipath->indexquals));
+                *indexpaths = lappend(*indexpaths, p);
             }
         }
     }
-
-    return result;
 }


@@ -457,6 +506,7 @@ find_saop_paths(PlannerInfo *root, RelOptInfo *rel,
 {
     bool        have_saop = false;
     ListCell   *l;
+    List       *paths;

     /*
      * Since find_usable_indexes is relatively expensive, don't bother to run
@@ -476,10 +526,11 @@ find_saop_paths(PlannerInfo *root, RelOptInfo *rel,
     if (!have_saop)
         return NIL;

-    return find_usable_indexes(root, rel,
-                               clauses, outer_clauses,
-                               istoplevel, outer_rel,
-                               SAOP_REQUIRE, ST_BITMAPSCAN);
+    find_usable_indexes(root, rel,
+                        clauses, outer_clauses,
+                        istoplevel, outer_rel,
+                        SAOP_REQUIRE, ST_BITMAPSCAN, NULL, &paths);
+    return paths;
 }


@@ -536,13 +587,15 @@ generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
             {
                 List       *andargs = ((BoolExpr *) orarg)->args;

-                indlist = find_usable_indexes(root, rel,
-                                              andargs,
-                                              all_clauses,
-                                              false,
-                                              outer_rel,
-                                              SAOP_ALLOW,
-                                              ST_BITMAPSCAN);
+                find_usable_indexes(root, rel,
+                                    andargs,
+                                    all_clauses,
+                                    false,
+                                    outer_rel,
+                                    SAOP_ALLOW,
+                                    ST_BITMAPSCAN,
+                                    NULL,
+                                    &indlist);
                 /* Recurse in case there are sub-ORs */
                 indlist = list_concat(indlist,
                                       generate_bitmap_or_paths(root, rel,
@@ -554,13 +607,15 @@ generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
             {
                 Assert(IsA(orarg, RestrictInfo));
                 Assert(!restriction_is_or_clause((RestrictInfo *) orarg));
-                indlist = find_usable_indexes(root, rel,
-                                              list_make1(orarg),
-                                              all_clauses,
-                                              false,
-                                              outer_rel,
-                                              SAOP_ALLOW,
-                                              ST_BITMAPSCAN);
+                find_usable_indexes(root, rel,
+                                    list_make1(orarg),
+                                    all_clauses,
+                                    false,
+                                    outer_rel,
+                                    SAOP_ALLOW,
+                                    ST_BITMAPSCAN,
+                                    NULL,
+                                    &indlist);
             }

             /*
@@ -1123,6 +1178,102 @@ group_clauses_by_indexkey(IndexOptInfo *index,
     return clausegroup_list;
 }

+void
+extract_indexonly_quals(PlannerInfo *root, IndexOptInfo *index,
+                        List *clause_groups,
+                        RelOptInfo *outer_rel,
+                        List **heapquals, List **indexonlyqualorig)
+{
+    List       *scan_clauses = index->rel->baserestrictinfo;
+    List       *indexquals;
+    List       *allclauses;
+    ListCell   *l;
+    RelOptInfo *rel = index->rel;
+
+    /* Convert clauses to indexquals the executor can handle */
+    indexquals = expand_indexqual_conditions(index, clause_groups);
+
+    /* Flatten the clause-groups list to produce indexclauses list */
+    allclauses = flatten_clausegroups_list(clause_groups);
+
+    /*
+     * If this is an innerjoin scan, the indexclauses will contain join
+     * clauses that are not present in scan_clauses (since the passed-in value
+     * is just the rel's baserestrictinfo list).  We must add these clauses to
+     * scan_clauses to ensure they get checked.  In most cases we will remove
+     * the join clauses again below, but if a join clause contains a special
+     * operator, we need to make sure it gets into the scan_clauses.
+     *
+     * Note: pointer comparison should be enough to determine RestrictInfo
+     * matches.
+     */
+    if (outer_rel != NULL)
+        scan_clauses = list_union_ptr(scan_clauses, allclauses);
+
+    /*
+     * The qpqual list must contain all restrictions not automatically handled
+     * by the index.  All the predicates in the indexquals will be checked
+     * (either by the index itself, or by nodeIndexscan.c), but if there are
+     * any "special" operators involved then they must be included in qpqual.
+     * The upshot is that qpqual must contain scan_clauses minus whatever
+     * appears in indexquals.
+     *
+     * In normal cases simple pointer equality checks will be enough to spot
+     * duplicate RestrictInfos, so we try that first.  In some situations
+     * (particularly with OR'd index conditions) we may have scan_clauses that
+     * are not equal to, but are logically implied by, the index quals; so we
+     * also try a predicate_implied_by() check to see if we can discard quals
+     * that way.  (predicate_implied_by assumes its first input contains only
+     * immutable functions, so we have to check that.)
+     *
+     * We can also discard quals that are implied by a partial index's
+     * predicate, but only in a plain SELECT; when scanning a target relation
+     * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the
+     * plan so that they'll be properly rechecked by EvalPlanQual testing.
+     */
+    *heapquals = NIL;
+    if (indexonlyqualorig)
+        *indexonlyqualorig = NIL;
+    foreach(l, scan_clauses)
+    {
+        RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+        bool is_indexonly;
+
+        Assert(IsA(rinfo, RestrictInfo));
+        if (rinfo->pseudoconstant)
+            continue;            /* we may drop pseudoconstants here */
+        if (list_member_ptr(indexquals, rinfo))
+            continue;
+        if (!contain_mutable_functions((Node *) rinfo->clause))
+        {
+            List       *clausel = list_make1(rinfo->clause);
+
+            if (predicate_implied_by(clausel, indexquals))
+                continue;
+            if (index->indpred)
+            {
+                if (rel->relid != root->parse->resultRelation &&
+                    get_rowmark(root->parse, rel->relid) == NULL)
+                    if (predicate_implied_by(clausel,
+                                             index->indpred))
+                        continue;
+            }
+        }
+
+        if (indexonlyqualorig)
+        {
+            make_indexonly_expr(root, index,
+                                (Expr *) rinfo->clause,
+                                &is_indexonly);
+            if (is_indexonly)
+                *indexonlyqualorig = lappend(*indexonlyqualorig, rinfo);
+            else
+                *heapquals = lappend(*heapquals, rinfo);
+        }
+        else
+            *heapquals = lappend(*heapquals, rinfo);
+    }
+}

 /*
  * match_clause_to_indexcol()
@@ -1677,7 +1828,6 @@ best_inner_indexscan(PlannerInfo *root, RelOptInfo *rel,
     List       *clause_list;
     List       *indexpaths;
     List       *bitindexpaths;
-    List       *allindexpaths;
     ListCell   *l;
     InnerIndexscanInfo *info;
     MemoryContext oldcontext;
@@ -1773,27 +1923,13 @@ best_inner_indexscan(PlannerInfo *root, RelOptInfo *rel,
      * Find all the index paths that are usable for this join, except for
      * stuff involving OR and ScalarArrayOpExpr clauses.
      */
-    allindexpaths = find_usable_indexes(root, rel,
-                                        clause_list, NIL,
-                                        false, outer_rel,
-                                        SAOP_FORBID,
-                                        ST_ANYSCAN);
-
-    /*
-     * Include the ones that are usable as plain indexscans in indexpaths, and
-     * include the ones that are usable as bitmap scans in bitindexpaths.
-     */
-    indexpaths = bitindexpaths = NIL;
-    foreach(l, allindexpaths)
-    {
-        IndexPath  *ipath = (IndexPath *) lfirst(l);
-
-        if (ipath->indexinfo->amhasgettuple)
-            indexpaths = lappend(indexpaths, ipath);
-
-        if (ipath->indexinfo->amhasgetbitmap)
-            bitindexpaths = lappend(bitindexpaths, ipath);
-    }
+    find_usable_indexes(root, rel,
+                        clause_list, NIL,
+                        false, outer_rel,
+                        SAOP_FORBID,
+                        ST_ANYSCAN,
+                        &indexpaths,
+                        &bitindexpaths);

     /*
      * Generate BitmapOrPaths for any suitable OR-clauses present in the
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index ef13110..18b8f85 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -16,10 +16,15 @@

 #include <math.h>

+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
 #include "executor/executor.h"
+#include "nodes/makefuncs.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/tlist.h"


 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
@@ -43,6 +48,8 @@ static List *select_mergejoin_clauses(PlannerInfo *root,
                          List *restrictlist,
                          JoinType jointype);

+static Path *bubbleup_heapfetches(PlannerInfo *root, JoinPath *path);
+

 /*
  * add_paths_to_joinrel
@@ -131,6 +138,159 @@ add_paths_to_joinrel(PlannerInfo *root,
     if (enable_hashjoin)
         hash_inner_and_outer(root, joinrel, outerrel, innerrel,
                              restrictlist, jointype, sjinfo);
+
+    {
+        ListCell *l;
+        List *newpaths = NIL;
+        foreach(l, joinrel->pathlist)
+        {
+            Path *path = (Path *) lfirst(l);
+            path = bubbleup_heapfetches(root, (JoinPath *)path);
+            newpaths = lappend(newpaths, path);
+        }
+        joinrel->pathlist = newpaths;
+    }
+}
+
+static bool
+can_bubbleup(PlannerInfo *root, JoinPath *join, HeapFetchPath *hfpath)
+{
+    bool indexonly;
+    ListCell *l;
+
+    /*
+     * XXX:
+     * Don't bubble up above a Merge Join, if a sort is required. This is
+     * just an implementation limitation - we'd have to check if the
+     * column we're sorting on is in the targetlist of the subpath, which
+     * is not trivial
+     */
+    if (IsA(join, MergePath) && ((MergePath *) join)->outersortkeys != NIL)
+        return false;
+
+    /* Check if we need any of the columns not in the index in the join clause */
+    foreach(l, join->joinrestrictinfo)
+    {
+        RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+        make_indexonly_expr(root, hfpath->indexinfo, rinfo->clause,
+                            &indexonly);
+        if (!indexonly)
+            return false;
+    }
+    return true;
+}
+
+static HeapFetchPath *
+bubbleup_step(PlannerInfo *root, Path *join, HeapFetchPath *hfpath)
+{
+    /* This can be bubbled up */
+    Var *tidvar;
+    ListCell *l;
+    List *origtlist;
+    HeapFetchPath *result;
+
+    tidvar = makeVar(hfpath->indexinfo->rel->relid,
+                     SelfItemPointerAttributeNumber,
+                     TIDOID, -1, 0);
+    tidvar->varindexno = hfpath->indexinfo->indexno;
+
+    origtlist = join->pathtargetlist;
+
+    /*
+     * Remove any entries that can't be calculated below the heap fetch.
+     */
+    join->pathtargetlist = NIL;
+    foreach(l, origtlist)
+    {
+        bool indexonly;
+
+        make_indexonly_expr(root, hfpath->indexinfo, lfirst(l), &indexonly);
+        if (indexonly)
+            join->pathtargetlist = lappend(join->pathtargetlist, copyObject(lfirst(l)));
+    }
+
+    /*
+     * Add the ctid col to the targetlist of the join
+     */
+    if (!list_member(join->pathtargetlist, tidvar))
+        join->pathtargetlist = lappend(join->pathtargetlist, tidvar);
+
+    result = create_heapfetch_path(root, hfpath->baserel,
+                                   join, hfpath->indexinfo,
+                                   hfpath->heapclauses, hfpath->indexqualorig);
+    result->path.pathtargetlist = origtlist;
+
+    return result;
+}
+
+static Path *
+bubbleup_heapfetches(PlannerInfo *root, JoinPath *path)
+{
+    Path *outerpath = path->outerjoinpath;
+    Path *innerpath = path->innerjoinpath;
+    Path *result = (Path *) path;
+    HeapFetchPath *lasthf = NULL;
+
+    if (IsA(path, HeapFetchPath))
+        return (Path *) path;
+
+    if (path->jointype == JOIN_INNER || path->jointype == JOIN_LEFT)
+    {
+        while(IsA(outerpath, HeapFetchPath))
+        {
+            HeapFetchPath *hfpath = (HeapFetchPath *) outerpath;
+
+            if (can_bubbleup(root, path, hfpath))
+            {
+                outerpath = hfpath->subpath;
+
+                hfpath = bubbleup_step(root, (Path *) path, hfpath);
+                if (lasthf)
+                    lasthf->subpath = (Path *) hfpath;
+                else
+                    result = (Path *) hfpath;
+                lasthf = hfpath;
+            }
+            else
+            {
+                /* XXX: we could see if there's more heap fetches below this
+                 * that could be pulled above this heap fetch and the join path.
+                 */
+                break;
+            }
+        }
+    }
+
+    if (path->jointype == JOIN_INNER || path->jointype == JOIN_RIGHT)
+    {
+        while(IsA(innerpath, HeapFetchPath))
+        {
+            HeapFetchPath *hfpath = (HeapFetchPath *) innerpath;
+
+            if (can_bubbleup(root, path, hfpath))
+            {
+                innerpath = hfpath->subpath;
+
+                hfpath = bubbleup_step(root, (Path *) path, hfpath);
+                if (lasthf)
+                    lasthf->subpath = (Path *) hfpath;
+                else
+                    result = (Path *) hfpath;
+                lasthf = hfpath;
+            }
+            else
+            {
+                /* XXX: we could see if there's more heap fetches below this
+                 * that could be pulled above this heap fetch and the join path.
+                 */
+                break;
+            }
+        }
+    }
+    path->outerjoinpath = outerpath;
+    path->innerjoinpath = innerpath;
+
+    return result;
 }

 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ccea2e8..a3a5d47 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -20,6 +20,8 @@
 #include <math.h>

 #include "access/skey.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
@@ -36,7 +38,7 @@


 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
-static List *build_relation_tlist(RelOptInfo *rel);
+static List *build_path_tlist(Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
 static void disuse_physical_tlist(Plan *plan, Path *path);
 static Plan *create_gating_plan(PlannerInfo *root, Plan *plan, List *quals);
@@ -49,6 +51,8 @@ static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
                     List *tlist, List *scan_clauses);
 static IndexScan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
                       List *tlist, List *scan_clauses);
+static HeapFetch *create_heapfetch_plan(PlannerInfo *root, HeapFetchPath *best_path,
+                                        List *tlist, List *scan_clauses);
 static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
                         BitmapHeapPath *best_path,
                         List *tlist, List *scan_clauses);
@@ -67,11 +71,11 @@ static CteScan *create_ctescan_plan(PlannerInfo *root, Path *best_path,
 static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_path,
                           List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
-                     Plan *outer_plan, Plan *inner_plan);
+                     Plan *outer_plan, Plan *inner_plan, List *tlist);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
-                      Plan *outer_plan, Plan *inner_plan);
+                      Plan *outer_plan, Plan *inner_plan, List *tlist);
 static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path,
-                     Plan *outer_plan, Plan *inner_plan);
+                     Plan *outer_plan, Plan *inner_plan, List *tlist);
 static List *fix_indexqual_references(List *indexquals, IndexPath *index_path);
 static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index);
 static List *get_switched_clauses(List *clauses, Relids outerrelids);
@@ -82,6 +86,7 @@ static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
 static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
                Oid indexid, List *indexqual, List *indexqualorig,
                ScanDirection indexscandir);
+static HeapFetch *make_heapfetch(List *qptlist, List *qpqual, Index scanrelid);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
                       List *indexqual,
                       List *indexqualorig);
@@ -157,6 +162,7 @@ create_plan(PlannerInfo *root, Path *best_path)
         case T_IndexScan:
         case T_BitmapHeapScan:
         case T_TidScan:
+        case T_HeapFetch: /* strictly speakin not a scan, but... */
         case T_SubqueryScan:
         case T_FunctionScan:
         case T_ValuesScan:
@@ -216,15 +222,15 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
      * planner.c may replace the tlist we generate here, forcing projection to
      * occur.)
      */
-    if (use_physical_tlist(root, rel))
+    if (use_physical_tlist(root, rel) && !IsA(best_path, IndexPath))
     {
         tlist = build_physical_tlist(root, rel);
         /* if fail because of dropped cols, use regular method */
         if (tlist == NIL)
-            tlist = build_relation_tlist(rel);
+            tlist = build_path_tlist(best_path);
     }
     else
-        tlist = build_relation_tlist(rel);
+        tlist = build_path_tlist(best_path);

     /*
      * Extract the relevant restriction clauses from the parent relation. The
@@ -249,6 +255,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
                                                   scan_clauses);
             break;

+        case T_HeapFetch:
+            plan = (Plan *) create_heapfetch_plan(root,
+                                                  (HeapFetchPath *) best_path,
+                                                  tlist,
+                                                  scan_clauses);
+            break;
+
         case T_BitmapHeapScan:
             plan = (Plan *) create_bitmap_scan_plan(root,
                                                 (BitmapHeapPath *) best_path,
@@ -320,13 +333,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
  * Build a target list (ie, a list of TargetEntry) for a relation.
  */
 static List *
-build_relation_tlist(RelOptInfo *rel)
+build_path_tlist(Path *path)
 {
     List       *tlist = NIL;
     int            resno = 1;
     ListCell   *v;

-    foreach(v, rel->reltargetlist)
+    foreach(v, path->pathtargetlist)
     {
         /* Do we really need to copy here?    Not sure */
         Node       *node = (Node *) copyObject(lfirst(v));
@@ -412,15 +425,22 @@ disuse_physical_tlist(Plan *plan, Path *path)
     switch (path->pathtype)
     {
         case T_SeqScan:
+#ifdef NOT_USED
+            /*
+             * IndexScans are special, we don't use the physical tlist
+             * optimization for them.
+             */
         case T_IndexScan:
+#endif
         case T_BitmapHeapScan:
         case T_TidScan:
+        case T_HeapFetch:
         case T_SubqueryScan:
         case T_FunctionScan:
         case T_ValuesScan:
         case T_CteScan:
         case T_WorkTableScan:
-            plan->targetlist = build_relation_tlist(path->parent);
+            plan->targetlist = build_path_tlist(path);
             break;
         default:
             break;
@@ -477,29 +497,35 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
     Plan       *outer_plan;
     Plan       *inner_plan;
     Plan       *plan;
+    List       *tlist;

     outer_plan = create_plan(root, best_path->outerjoinpath);
     inner_plan = create_plan(root, best_path->innerjoinpath);

+    tlist = build_path_tlist((Path *) best_path);
+
     switch (best_path->path.pathtype)
     {
         case T_MergeJoin:
             plan = (Plan *) create_mergejoin_plan(root,
                                                   (MergePath *) best_path,
                                                   outer_plan,
-                                                  inner_plan);
+                                                  inner_plan,
+                                                  tlist);
             break;
         case T_HashJoin:
             plan = (Plan *) create_hashjoin_plan(root,
                                                  (HashPath *) best_path,
                                                  outer_plan,
-                                                 inner_plan);
+                                                 inner_plan,
+                                                 tlist);
             break;
         case T_NestLoop:
             plan = (Plan *) create_nestloop_plan(root,
                                                  (NestPath *) best_path,
                                                  outer_plan,
-                                                 inner_plan);
+                                                 inner_plan,
+                                                 tlist);
             break;
         default:
             elog(ERROR, "unrecognized node type: %d",
@@ -543,7 +569,7 @@ static Plan *
 create_append_plan(PlannerInfo *root, AppendPath *best_path)
 {
     Append       *plan;
-    List       *tlist = build_relation_tlist(best_path->path.parent);
+    List       *tlist = build_path_tlist((Path *) best_path);
     List       *subplans = NIL;
     ListCell   *subpaths;

@@ -668,7 +694,7 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
      * should be left as-is if we don't need to add any expressions; but if we
      * do have to add expressions, then a projection step will be needed at
      * runtime anyway, so we may as well remove unneeded items. Therefore
-     * newtlist starts from build_relation_tlist() not just a copy of the
+     * newtlist starts from build_path_tlist() not just a copy of the
      * subplan's tlist; and we don't install it into the subplan unless we are
      * sorting or stuff has to be added.
      */
@@ -676,7 +702,7 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
     uniq_exprs = best_path->uniq_exprs;

     /* initialize modified subplan tlist as just the "required" vars */
-    newtlist = build_relation_tlist(best_path->path.parent);
+    newtlist = build_path_tlist((Path *) best_path);
     nextresno = list_length(newtlist) + 1;
     newitems = false;

@@ -764,7 +790,7 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
          * subplan tlist.
          */
         plan = (Plan *) make_agg(root,
-                                 build_relation_tlist(best_path->path.parent),
+                                 build_path_tlist((Path *) best_path),
                                  NIL,
                                  AGG_HASHED,
                                  numGroupCols,
@@ -870,16 +896,32 @@ create_indexscan_plan(PlannerInfo *root,
     List       *indexquals = best_path->indexquals;
     Index        baserelid = best_path->path.parent->relid;
     Oid            indexoid = best_path->indexinfo->indexoid;
-    List       *qpqual;
     List       *stripped_indexquals;
+    List       *indexonlyqualorig = best_path->indexonlyqualorig;
     List       *fixed_indexquals;
     ListCell   *l;
     IndexScan  *scan_plan;
+    RelOptInfo *reloptinfo;
+    IndexOptInfo *indexoptinfo;
+    bool is_indexonly;

     /* it should be a base rel... */
     Assert(baserelid > 0);
     Assert(best_path->path.parent->rtekind == RTE_RELATION);

+    /* Find the index that was used */
+    reloptinfo =  root->simple_rel_array[baserelid];
+    foreach(l, reloptinfo->indexlist)
+    {
+        indexoptinfo = (IndexOptInfo *) lfirst(l);
+        if (indexoptinfo->indexoid == indexoid)
+            break;
+        else
+            indexoptinfo = NULL;
+    }
+    if (indexoptinfo == NULL)
+        elog(ERROR, "can't find IndexOptInfo for index scan");
+
     /*
      * Build "stripped" indexquals structure (no RestrictInfos) to pass to
      * executor as indexqualorig
@@ -892,83 +934,40 @@ create_indexscan_plan(PlannerInfo *root,
      */
     fixed_indexquals = fix_indexqual_references(indexquals, best_path);

-    /*
-     * If this is an innerjoin scan, the indexclauses will contain join
-     * clauses that are not present in scan_clauses (since the passed-in value
-     * is just the rel's baserestrictinfo list).  We must add these clauses to
-     * scan_clauses to ensure they get checked.  In most cases we will remove
-     * the join clauses again below, but if a join clause contains a special
-     * operator, we need to make sure it gets into the scan_clauses.
-     *
-     * Note: pointer comparison should be enough to determine RestrictInfo
-     * matches.
-     */
-    if (best_path->isjoininner)
-        scan_clauses = list_union_ptr(scan_clauses, best_path->indexclauses);
-
-    /*
-     * The qpqual list must contain all restrictions not automatically handled
-     * by the index.  All the predicates in the indexquals will be checked
-     * (either by the index itself, or by nodeIndexscan.c), but if there are
-     * any "special" operators involved then they must be included in qpqual.
-     * The upshot is that qpqual must contain scan_clauses minus whatever
-     * appears in indexquals.
-     *
-     * In normal cases simple pointer equality checks will be enough to spot
-     * duplicate RestrictInfos, so we try that first.  In some situations
-     * (particularly with OR'd index conditions) we may have scan_clauses that
-     * are not equal to, but are logically implied by, the index quals; so we
-     * also try a predicate_implied_by() check to see if we can discard quals
-     * that way.  (predicate_implied_by assumes its first input contains only
-     * immutable functions, so we have to check that.)
-     *
-     * We can also discard quals that are implied by a partial index's
-     * predicate, but only in a plain SELECT; when scanning a target relation
-     * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the
-     * plan so that they'll be properly rechecked by EvalPlanQual testing.
-     */
-    qpqual = NIL;
-    foreach(l, scan_clauses)
-    {
-        RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
-
-        Assert(IsA(rinfo, RestrictInfo));
-        if (rinfo->pseudoconstant)
-            continue;            /* we may drop pseudoconstants here */
-        if (list_member_ptr(indexquals, rinfo))
-            continue;
-        if (!contain_mutable_functions((Node *) rinfo->clause))
-        {
-            List       *clausel = list_make1(rinfo->clause);
-
-            if (predicate_implied_by(clausel, indexquals))
-                continue;
-            if (best_path->indexinfo->indpred)
-            {
-                if (baserelid != root->parse->resultRelation &&
-                    get_rowmark(root->parse, baserelid) == NULL)
-                    if (predicate_implied_by(clausel,
-                                             best_path->indexinfo->indpred))
-                        continue;
-            }
-        }
-        qpqual = lappend(qpqual, rinfo);
-    }
-
     /* Sort clauses into best execution order */
-    qpqual = order_qual_clauses(root, qpqual);
+    indexonlyqualorig = order_qual_clauses(root, indexonlyqualorig);

     /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
-    qpqual = extract_actual_clauses(qpqual, false);
+    indexonlyqualorig = extract_actual_clauses(indexonlyqualorig, false);

-    /* Finally ready to build the plan node */
     scan_plan = make_indexscan(tlist,
-                               qpqual,
+                               indexonlyqualorig,
                                baserelid,
                                indexoid,
                                fixed_indexquals,
                                stripped_indexquals,
                                best_path->indexscandir);
+    scan_plan->indexonlyqualorig = indexonlyqualorig;
+    scan_plan->indexonlyqual = (List *)
+        make_indexonly_expr(root, indexoptinfo,
+                            (Expr *) indexonlyqualorig,
+                            &is_indexonly);
+    if (!is_indexonly)
+        elog(ERROR, "entry in index target list not found in index");
+    if (list_length(tlist) > 1)
+    {
+        scan_plan->indexonlytargetlist =
+            (List *) make_indexonly_expr(root, indexoptinfo,
+                                         (Expr *) tlist, &is_indexonly);
+
+        if (!is_indexonly)
+            elog(ERROR, "entry in index target list not found in index");
+    }
+    else
+    {
+        /* ctid only */
+        scan_plan->indexonlytargetlist = tlist;
+    }

     copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
     /* use the indexscan-specific rows estimate, not the parent rel's */
@@ -977,6 +976,48 @@ create_indexscan_plan(PlannerInfo *root,
     return scan_plan;
 }

+
+/*
+ * create_heapfetch_plan
+ *      Returns a heap fetch plan for the base relation scanned by 'best_path'
+ *      with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static HeapFetch *
+create_heapfetch_plan(PlannerInfo *root,
+                      HeapFetchPath *best_path,
+                      List *tlist,
+                      List *scan_clauses)
+{
+    Index        baserelid = best_path->baserel->relid;
+    List       *qpqual;
+    HeapFetch  *scan_plan;
+    Plan       *subplan;
+
+    /* it should be a base rel... */
+    Assert(baserelid > 0);
+    Assert(best_path->baserel->rtekind == RTE_RELATION);
+
+    subplan = create_plan(root, best_path->subpath);
+
+    qpqual = best_path->heapclauses;
+
+    /* Sort clauses into best execution order */
+    qpqual = order_qual_clauses(root, qpqual);
+
+    /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+    qpqual = extract_actual_clauses(qpqual, false);
+
+    /* Finally ready to build the plan node */
+    scan_plan = make_heapfetch(tlist, qpqual, baserelid);
+
+    copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+    scan_plan->scan.plan.lefttree = subplan;
+    scan_plan->indexqualorig = best_path->indexqualorig;
+
+    return scan_plan;
+}
+
 /*
  * create_bitmap_scan_plan
  *      Returns a bitmap scan plan for the base relation scanned by 'best_path'
@@ -1555,9 +1596,9 @@ static NestLoop *
 create_nestloop_plan(PlannerInfo *root,
                      NestPath *best_path,
                      Plan *outer_plan,
-                     Plan *inner_plan)
+                     Plan *inner_plan,
+                     List *tlist)
 {
-    List       *tlist = build_relation_tlist(best_path->path.parent);
     List       *joinrestrictclauses = best_path->joinrestrictinfo;
     List       *joinclauses;
     List       *otherclauses;
@@ -1606,9 +1647,9 @@ static MergeJoin *
 create_mergejoin_plan(PlannerInfo *root,
                       MergePath *best_path,
                       Plan *outer_plan,
-                      Plan *inner_plan)
+                      Plan *inner_plan,
+                      List *tlist)
 {
-    List       *tlist = build_relation_tlist(best_path->jpath.path.parent);
     List       *joinclauses;
     List       *otherclauses;
     List       *mergeclauses;
@@ -1890,9 +1931,9 @@ static HashJoin *
 create_hashjoin_plan(PlannerInfo *root,
                      HashPath *best_path,
                      Plan *outer_plan,
-                     Plan *inner_plan)
+                     Plan *inner_plan,
+                     List *tlist)
 {
-    List       *tlist = build_relation_tlist(best_path->jpath.path.parent);
     List       *joinclauses;
     List       *otherclauses;
     List       *hashclauses;
@@ -2133,9 +2174,8 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index)
     /*
      * We represent index keys by Var nodes having the varno of the base table
      * but varattno equal to the index's attribute number (index column
-     * position).  This is a bit hokey ... would be cleaner to use a
-     * special-purpose node type that could not be mistaken for a regular Var.
-     * But it will do for now.
+     * position), and varindexno set to index number (IndexOptInfo->indexno)
+     * of the index.
      */
     Var           *result;
     int            pos;
@@ -2160,6 +2200,7 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index)
                 if (index->indexkeys[pos] == varatt)
                 {
                     result = (Var *) copyObject(node);
+                    result->varindexno = index->indexno;
                     result->varattno = pos + 1;
                     return (Node *) result;
                 }
@@ -2186,6 +2227,7 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index)
                 result = makeVar(index->rel->relid, pos + 1,
                                  exprType(lfirst(indexpr_item)), -1,
                                  0);
+                result->varindexno = index->indexno;
                 return (Node *) result;
             }
             indexpr_item = lnext(indexpr_item);
@@ -2198,6 +2240,195 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index)
 }

 /*
+ * replace_heapvars_with_indexvars
+ *
+ * Replaces all Vars in an expression with Vars that refer to columns
+ * of an index. The resulting expression can be used in the executor to
+ * evaluate an expression using an IndexTuple from the index, instead
+ * of a heap tuple.
+ *
+ * Returns NULL and sets context->indexonly to FALSE if the expression can't
+ * be evaluated using columns from the index only.
+ */
+typedef struct
+{
+    IndexOptInfo *index;    /* index we're dealing with */
+    bool        indexonly;
+} replace_heapvars_context;
+
+static Node *
+replace_heapvars_with_indexvars(Node *node, replace_heapvars_context *context)
+{
+    /*
+     * We represent index keys by Var nodes having the varno of the base table
+     * but varattno equal to the index's attribute number (index column
+     * position), and varindexno set to index number (IndexOptInfo->indexno)
+     * of the index.
+     */
+    Var           *result;
+    int            pos;
+    IndexOptInfo *index = context->index;
+    ListCell   *indexpr_item;
+
+    if (node == NULL || !context->indexonly)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *var = (Var *) node;
+
+        if (var->varno == index->rel->relid && var->varindexno == -1)
+        {
+            /* Try to match against simple index columns */
+            int            varatt = var->varattno;
+
+            if (varatt != 0)
+            {
+                for (pos = 0; pos < index->ncolumns; pos++)
+                {
+                    /*
+                     * If the type stored in the index doesn't match that
+                     * in the heap, we can't use the value from the index.
+                     */
+                    if (index->atttype[pos] != var->vartype)
+                        continue;
+
+                    if (index->indexkeys[pos] == varatt)
+                    {
+
+                        result = (Var *) copyObject(node);
+                        result->varattno = pos + 1;
+                        result->varindexno = context->index->indexno;
+                        return (Node *) result;
+                    }
+                }
+                /* Not in index. The heap tuple is needed for this qual. */
+                context->indexonly = false;
+                return NULL;
+            }
+            else
+            {
+                /* TODO: we don't handle full-row references yet. */
+                context->indexonly = false;
+                return NULL;
+            }
+        }
+        else
+        {
+            /* XXX: Can this happen? */
+            /*
+            context->indexonly = false;
+            return NULL;
+            */
+        }
+    }
+
+    /* Try to match against index expressions */
+    if (index->indexprs != NIL)
+    {
+        indexpr_item = list_head(index->indexprs);
+        for (pos = 0; pos < index->ncolumns; pos++)
+        {
+            if (index->indexkeys[pos] == 0)
+            {
+                Node       *indexkey;
+                Oid            indexkey_type;
+
+                if (indexpr_item == NULL)
+                    elog(ERROR, "too few entries in indexprs list");
+
+                indexkey = (Node *) lfirst(indexpr_item);
+                if (indexkey && IsA(indexkey, RelabelType))
+                    indexkey = (Node *) ((RelabelType *) indexkey)->arg;
+
+                /*
+                 * If the type stored in the index doesn't match that of
+                 * the expression, we can't use the value from the index.
+                 */
+                indexkey_type = exprType(indexkey);
+                if (index->atttype[pos] != indexkey_type)
+                    continue;
+
+                if (equal(node, indexkey))
+                {
+                    /* Found a match */
+                    result = makeVar(index->rel->relid, pos + 1,
+                                     exprType(lfirst(indexpr_item)), -1,
+                                     0);
+                    result->varindexno = context->index->indexno;
+                    return (Node *) result;
+                }
+                indexpr_item = lnext(indexpr_item);
+            }
+        }
+    }
+
+    /* Not found */
+    return expression_tree_mutator(node, replace_heapvars_with_indexvars,
+                                   context);
+}
+
+/*
+ * make_indexonly_expr
+ *
+ * Given an expression, return an equivalent Expr tree that evaluates the
+ * expression using columns from an index. On success, *indexonly is set to
+ * TRUE. If the expression can't be evaluated using only index columns,
+ * *indexonly is set to FALSE and NULL is returned.
+ *
+ * This is similar to fix_indexonly_references, but we return NULL instead
+ * of throwing an error if the expression can't be evaluated using only index
+ * columns, and the input expression doesn't need to be in simple "A op B"
+ * format.
+ */
+Expr *
+make_indexonly_expr(PlannerInfo *root, IndexOptInfo *index,
+                    Expr *expr, bool *indexonly)
+{
+    replace_heapvars_context context;
+
+    if (expr == NULL)
+    {
+        if (indexonly)
+            *indexonly = true;
+        return NULL;
+    }
+
+    /*
+     * If the indexam can't return index tuples, we can forget about it.
+     */
+    if (!index->amreturntuple)
+    {
+        if (indexonly)
+            *indexonly = false;
+        return NULL;
+    }
+
+    /*
+     * Expression containing volatile functions can't be used as index-only
+     * quals, because the expression would be executed on dead rows as well.
+     */
+    if (contain_volatile_functions((Node *) expr))
+    {
+        if (indexonly)
+            *indexonly = false;
+        return NULL;
+    }
+
+    context.index = index;
+    context.indexonly = true;
+
+    expr = (Expr *) replace_heapvars_with_indexvars((Node *) expr, &context);
+
+    if (indexonly)
+        *indexonly = context.indexonly;
+    if (!context.indexonly)
+        return NULL;
+    else
+        return expr;
+}
+
+/*
  * get_switched_clauses
  *      Given a list of merge or hash joinclauses (as RestrictInfo nodes),
  *      extract the bare clauses, and rearrange the elements within the
@@ -2435,6 +2666,25 @@ make_indexscan(List *qptlist,
     return node;
 }

+
+static HeapFetch *
+make_heapfetch(List *qptlist,
+               List *qpqual,
+               Index scanrelid)
+{
+    HeapFetch  *node = makeNode(HeapFetch);
+    Plan       *plan = &node->scan.plan;
+
+    /* cost should be inserted by caller */
+    plan->targetlist = qptlist;
+    plan->qual = qpqual;
+    plan->lefttree = NULL;
+    plan->righttree = NULL;
+    node->scan.scanrelid = scanrelid;
+
+    return node;
+}
+
 static BitmapIndexScan *
 make_bitmap_indexscan(Index scanrelid,
                       Oid indexid,
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 34e9617..37cf67b 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -25,6 +25,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
 #include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
 #include "optimizer/subselect.h"
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
@@ -38,7 +39,7 @@ typedef struct
     Oid            aggsortop;        /* Oid of its sort operator */
     Expr       *target;            /* expression we are aggregating on */
     Expr       *notnulltest;    /* expression for "target IS NOT NULL" */
-    IndexPath  *path;            /* access path for index scan */
+    Path  *path;            /* access path for index scan */
     Cost        pathcost;        /* estimated cost to fetch first row */
     bool        nulls_first;    /* null ordering direction matching index */
     Param       *param;            /* param for subplan's output */
@@ -297,7 +298,7 @@ find_minmax_aggs_walker(Node *node, List **context)
 static bool
 build_minmax_path(PlannerInfo *root, RelOptInfo *rel, MinMaxAggInfo *info)
 {
-    IndexPath  *best_path = NULL;
+    Path       *best_path = NULL;
     Cost        best_cost = 0;
     bool        best_nulls_first = false;
     NullTest   *ntest;
@@ -327,6 +328,8 @@ build_minmax_path(PlannerInfo *root, RelOptInfo *rel, MinMaxAggInfo *info)
         IndexPath  *new_path;
         Cost        new_cost;
         bool        found_clause;
+        List       *heapquals;
+        List       *indexonlyquals;

         /* Ignore non-btree indexes */
         if (index->relam != BTREE_AM_OID)
@@ -399,8 +402,12 @@ build_minmax_path(PlannerInfo *root, RelOptInfo *rel, MinMaxAggInfo *info)
         /*
          * Build the access path.  We don't bother marking it with pathkeys.
          */
+        extract_indexonly_quals(root, index, restrictclauses, NULL,
+                                &heapquals, &indexonlyquals);
+
         new_path = create_index_path(root, index,
                                      restrictclauses,
+                                     indexonlyquals,
                                      NIL,
                                      indexscandir,
                                      NULL);
@@ -420,7 +427,10 @@ build_minmax_path(PlannerInfo *root, RelOptInfo *rel, MinMaxAggInfo *info)
          */
         if (best_path == NULL || new_cost < best_cost)
         {
-            best_path = new_path;
+            best_path = (Path *) create_heapfetch_path(root, rel, (Path *) new_path,
+                                                       index, heapquals, get_actual_clauses(new_path->indexquals));
+
+
             best_cost = new_cost;
             if (ScanDirectionIsForward(indexscandir))
                 best_nulls_first = index->nulls_first[indexcol];
@@ -478,6 +488,7 @@ make_agg_subplan(PlannerInfo *root, MinMaxAggInfo *info)
     Plan       *iplan;
     TargetEntry *tle;
     SortGroupClause *sortcl;
+    IndexOptInfo *indexinfo;

     /*
      * Generate a suitably modified query.    Much of the work here is probably
@@ -545,14 +556,20 @@ make_agg_subplan(PlannerInfo *root, MinMaxAggInfo *info)

     plan->targetlist = copyObject(subparse->targetList);

-    if (IsA(plan, Result))
-        iplan = plan->lefttree;
-    else
-        iplan = plan;
+    iplan = plan;
+    while(IsA(iplan, Result) || IsA(iplan, HeapFetch))
+        iplan = iplan->lefttree;
     Assert(IsA(iplan, IndexScan));

+    if (IsA(info->path, IndexPath))
+        indexinfo = ((IndexPath *) info->path)->indexinfo;
+    else if (IsA(info->path, HeapFetchPath))
+        indexinfo = ((HeapFetchPath *) info->path)->indexinfo;
+    else
+        elog(ERROR, "unexpected path type %d", info->path->type);
+
     if (!list_member(iplan->qual, info->notnulltest) &&
-        !list_member(info->path->indexinfo->indpred, info->notnulltest))
+        !list_member(indexinfo->indpred, info->notnulltest))
         iplan->qual = lcons(info->notnulltest, iplan->qual);

     plan = (Plan *) make_limit(plan,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index f3c50fe..6f02beb 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -15,7 +15,10 @@
  */
 #include "postgres.h"

+#include "access/sysattr.h"
 #include "access/transam.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_opfamily.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -30,6 +33,7 @@
 typedef struct
 {
     Index        varno;            /* RT index of Var */
+    int            varindexno;
     AttrNumber    varattno;        /* attr number of Var */
     AttrNumber    resno;            /* TLE position of Var */
 } tlist_vinfo;
@@ -92,6 +96,7 @@ static void set_join_references(PlannerGlobal *glob, Join *join, int rtoffset);
 static void set_inner_join_references(PlannerGlobal *glob, Plan *inner_plan,
                           indexed_tlist *outer_itlist);
 static void set_upper_references(PlannerGlobal *glob, Plan *plan, int rtoffset);
+static void set_heapfetch_references(PlannerGlobal *glob, Plan *plan, int rtoffset, int scanvar);
 static void set_dummy_tlist_references(Plan *plan, int rtoffset);
 static indexed_tlist *build_tlist_index(List *tlist);
 static Var *search_indexed_tlist_for_var(Var *var,
@@ -266,14 +271,37 @@ set_plan_refs(PlannerGlobal *glob, Plan *plan, int rtoffset)
                 IndexScan  *splan = (IndexScan *) plan;

                 splan->scan.scanrelid += rtoffset;
+
+                /*
+                 * Replace the target list (containing vars referring heap
+                 * cols), with straight references to index cols. XXX: this
+                 * depends on the fact that build_index_tlist() includes all
+                 * index cols in the target list.
+                 */
                 splan->scan.plan.targetlist =
                     fix_scan_list(glob, splan->scan.plan.targetlist, rtoffset);
+
                 splan->scan.plan.qual =
                     fix_scan_list(glob, splan->scan.plan.qual, rtoffset);
+
                 splan->indexqual =
                     fix_scan_list(glob, splan->indexqual, rtoffset);
                 splan->indexqualorig =
                     fix_scan_list(glob, splan->indexqualorig, rtoffset);
+                splan->indexonlyqual =
+                    fix_scan_list(glob, splan->indexonlyqual, rtoffset);
+                splan->indexonlyqualorig =
+                    fix_scan_list(glob, splan->indexonlyqualorig, rtoffset);
+                splan->indexonlytargetlist =
+                    fix_scan_list(glob, splan->indexonlytargetlist, rtoffset);
+                {
+                    TargetEntry *tid_te = (TargetEntry *) llast(splan->indexonlytargetlist);
+                    /* hack the last entry in target list to be ctid */
+                    Var *tidvar = (Var *) tid_te->expr;
+                    Assert(IsA(tidvar, Var));
+                    Assert(tidvar->varattno = SelfItemPointerAttributeNumber);
+                    tidvar->varattno = list_length(splan->scan.plan.targetlist);
+                }
             }
             break;
         case T_BitmapIndexScan:
@@ -303,6 +331,14 @@ set_plan_refs(PlannerGlobal *glob, Plan *plan, int rtoffset)
                     fix_scan_list(glob, splan->bitmapqualorig, rtoffset);
             }
             break;
+        case T_HeapFetch:
+            {
+                HeapFetch *splan = (HeapFetch *) plan;
+                int oldscanrelid = splan->scan.scanrelid;
+                splan->scan.scanrelid += rtoffset;
+                set_heapfetch_references(glob, plan, rtoffset, oldscanrelid);
+            }
+            break;
         case T_TidScan:
             {
                 TidScan    *splan = (TidScan *) plan;
@@ -925,12 +961,12 @@ set_inner_join_references(PlannerGlobal *glob, Plan *inner_plan,
          */
         IndexScan  *innerscan = (IndexScan *) inner_plan;
         List       *indexqualorig = innerscan->indexqualorig;
+        List       *indexonlyqualorig = innerscan->indexonlyqualorig;
+        Index        innerrel = innerscan->scan.scanrelid;

         /* No work needed if indexqual refers only to its own rel... */
         if (NumRelids((Node *) indexqualorig) > 1)
         {
-            Index        innerrel = innerscan->scan.scanrelid;
-
             /* only refs to outer vars get changed in the inner qual */
             innerscan->indexqualorig = fix_join_expr(glob,
                                                      indexqualorig,
@@ -944,20 +980,42 @@ set_inner_join_references(PlannerGlobal *glob, Plan *inner_plan,
                                                  NULL,
                                                  innerrel,
                                                  0);
+        }

-            /*
-             * We must fix the inner qpqual too, if it has join clauses (this
-             * could happen if special operators are involved: some indexquals
-             * may get rechecked as qpquals).
-             */
-            if (NumRelids((Node *) inner_plan->qual) > 1)
-                inner_plan->qual = fix_join_expr(glob,
-                                                 inner_plan->qual,
+        /* No work needed if indexqual refers only to its own rel... */
+        if (NumRelids((Node *) indexonlyqualorig) > 1)
+        {
+            /* only refs to outer vars get changed in the inner qual */
+            innerscan->indexonlyqualorig = fix_join_expr(glob,
+                                                     indexonlyqualorig,
+                                                     outer_itlist,
+                                                     NULL,
+                                                     innerrel,
+                                                     0);
+            innerscan->indexonlyqual = fix_join_expr(glob,
+                                                 innerscan->indexonlyqual,
                                                  outer_itlist,
                                                  NULL,
                                                  innerrel,
                                                  0);
         }
+
+        /*
+         * We must fix the inner qpqual too, if it has join clauses (this
+         * could happen if special operators are involved: some indexquals
+         * may get rechecked as qpquals).
+         */
+        if (NumRelids((Node *) inner_plan->qual) > 1)
+            inner_plan->qual = fix_join_expr(glob,
+                                             inner_plan->qual,
+                                             outer_itlist,
+                                             NULL,
+                                             innerrel,
+                                             0);
+    }
+    else if (IsA(inner_plan, HeapFetch))
+    {
+        set_inner_join_references(glob, inner_plan->lefttree, outer_itlist);
     }
     else if (IsA(inner_plan, BitmapIndexScan))
     {
@@ -1138,6 +1196,49 @@ set_upper_references(PlannerGlobal *glob, Plan *plan, int rtoffset)
     pfree(subplan_itlist);
 }

+static void
+set_heapfetch_references(PlannerGlobal *glob, Plan *plan, int rtoffset,
+                         int scanvar)
+{
+    Plan       *subplan = plan->lefttree;
+    indexed_tlist *subplan_itlist;
+    HeapFetch  *hfplan = (HeapFetch *) plan;
+    Index        varno = scanvar;
+    AttrNumber    varattno = SelfItemPointerAttributeNumber;
+    tlist_vinfo *vinfo;
+    int            i;
+
+    subplan_itlist = build_tlist_index(subplan->targetlist);
+
+    /* Find the TID column in the node below */
+    vinfo = subplan_itlist->vars;
+    i = subplan_itlist->num_vars;
+    while (i-- > 0)
+    {
+        if (vinfo->varno == varno && vinfo->varattno == varattno &&
+            vinfo->varindexno != -1)
+        {
+            /* Found a match */
+            hfplan->tidIdx = vinfo->resno;
+            break;
+        }
+        vinfo++;
+    }
+    if (i < 0)
+        elog(ERROR, "ctid not found in subplan target list");
+
+    plan->targetlist = fix_join_expr(glob, plan->targetlist, subplan_itlist,
+                                      NULL, scanvar, rtoffset);
+
+    plan->qual =
+        fix_join_expr(glob,
+                      plan->qual,
+                      subplan_itlist,
+                      NULL, scanvar, rtoffset);
+
+    pfree(subplan_itlist);
+}
+
 /*
  * set_dummy_tlist_references
  *      Replace the targetlist of an upper-level plan node with a simple
@@ -1230,6 +1331,7 @@ build_tlist_index(List *tlist)
             Var           *var = (Var *) tle->expr;

             vinfo->varno = var->varno;
+            vinfo->varindexno = var->varindexno;
             vinfo->varattno = var->varattno;
             vinfo->resno = tle->resno;
             vinfo++;
@@ -1282,6 +1384,7 @@ build_tlist_index_other_vars(List *tlist, Index ignore_rel)
             if (var->varno != ignore_rel)
             {
                 vinfo->varno = var->varno;
+                vinfo->varindexno = var->varindexno;
                 vinfo->varattno = var->varattno;
                 vinfo->resno = tle->resno;
                 vinfo++;
@@ -1317,7 +1420,8 @@ search_indexed_tlist_for_var(Var *var, indexed_tlist *itlist,
     i = itlist->num_vars;
     while (i-- > 0)
     {
-        if (vinfo->varno == varno && vinfo->varattno == varattno)
+        if (vinfo->varno == varno && vinfo->varattno == varattno &&
+            vinfo->varindexno == vinfo->varindexno)
         {
             /* Found a match */
             Var           *newvar = copyVar(var);
@@ -1426,6 +1530,15 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
     {
         Var           *var = (Var *) node;

+        /* If it's for acceptable_rel, adjust and return it */
+        if (var->varno == context->acceptable_rel)
+        {
+            var = copyVar(var);
+            var->varno += context->rtoffset;
+            var->varnoold += context->rtoffset;
+            return (Node *) var;
+        }
+
         /* First look for the var in the input tlists */
         newvar = search_indexed_tlist_for_var(var,
                                               context->outer_itlist,
@@ -1443,17 +1556,9 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
                 return (Node *) newvar;
         }

-        /* If it's for acceptable_rel, adjust and return it */
-        if (var->varno == context->acceptable_rel)
-        {
-            var = copyVar(var);
-            var->varno += context->rtoffset;
-            var->varnoold += context->rtoffset;
-            return (Node *) var;
-        }
-
         /* No referent found for Var */
-        elog(ERROR, "variable not found in subplan target lists");
+        elog(ERROR, "variable %d:%d not found in subplan target lists",
+             var->varno, var->varattno);
     }
     if (IsA(node, PlaceHolderVar))
     {
@@ -1555,12 +1660,13 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
     {
         Var           *var = (Var *) node;

+        /* If it's for acceptable_rel, adjust and return it */
         newvar = search_indexed_tlist_for_var(var,
                                               context->subplan_itlist,
                                               OUTER,
                                               context->rtoffset);
         if (!newvar)
-            elog(ERROR, "variable not found in subplan target list");
+            elog(ERROR, "variable %d:%d not found in subplan target list", var->varno, var->varattno);
         return (Node *) newvar;
     }
     if (IsA(node, PlaceHolderVar))
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 5ba2254..09eccbc 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1849,10 +1849,12 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params)
         case T_IndexScan:
             finalize_primnode((Node *) ((IndexScan *) plan)->indexqual,
                               &context);
+            finalize_primnode((Node *) ((IndexScan *) plan)->indexonlyqual,
+                              &context);

             /*
-             * we need not look at indexqualorig, since it will have the same
-             * param references as indexqual.
+             * we need not look at indexqualorig/indexonlyqualorig, since they
+             * will have the same param references as indexqual/indexonlyqual.
              */
             break;

@@ -2018,6 +2020,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params)
         case T_Unique:
         case T_SetOp:
         case T_Group:
+        case T_HeapFetch:
             break;

         default:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e80f4cf..f2140e9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -16,13 +16,19 @@

 #include <math.h>

+#include "access/sysattr.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
 #include "optimizer/var.h"
 #include "parser/parse_expr.h"
@@ -401,6 +407,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel)
     pathnode->pathtype = T_SeqScan;
     pathnode->parent = rel;
     pathnode->pathkeys = NIL;    /* seqscan has unordered result */
+    pathnode->pathtargetlist = list_copy(rel->reltargetlist);

     cost_seqscan(pathnode, root, rel);

@@ -427,6 +434,7 @@ IndexPath *
 create_index_path(PlannerInfo *root,
                   IndexOptInfo *index,
                   List *clause_groups,
+                  List *indexonlyqualorig,
                   List *pathkeys,
                   ScanDirection indexscandir,
                   RelOptInfo *outer_rel)
@@ -459,6 +467,8 @@ create_index_path(PlannerInfo *root,
     /* Flatten the clause-groups list to produce indexclauses list */
     allclauses = flatten_clausegroups_list(clause_groups);

+    pathnode->indexonlyqualorig = indexonlyqualorig;
+
     /* Fill in the pathnode */
     pathnode->indexinfo = index;
     pathnode->indexclauses = allclauses;
@@ -504,7 +514,45 @@ create_index_path(PlannerInfo *root,
         pathnode->rows = rel->rows;
     }

-    cost_index(pathnode, root, index, indexquals, outer_rel);
+    cost_index(pathnode, root, index, indexquals, indexonlyqualorig, outer_rel);
+
+    if (index->amreturntuple)
+    {
+        ListCell   *lexpr = list_head(index->indexprs);
+
+        int i;
+        for(i = 0; i < index->ncolumns; i++)
+        {
+            int keycol = index->indexkeys[i];
+            Expr *expr;
+
+            if (keycol != 0)
+            {
+                expr = (Expr *) makeVar(rel->relid,
+                                        keycol,
+                                        index->atttype[i],
+                                        -1,
+                                        0);
+            }
+            else
+            {
+                expr = (Expr *) lfirst(lexpr);
+                lexpr = lnext(lexpr);
+            }
+
+            pathnode->path.pathtargetlist =
+                lappend(pathnode->path.pathtargetlist, expr);
+        }
+    }
+
+    {
+        Var *tidvar = makeVar(rel->relid, SelfItemPointerAttributeNumber,
+                              TIDOID, -1, 0);
+        tidvar->varindexno = index->indexno;
+
+        pathnode->path.pathtargetlist =
+            lappend(pathnode->path.pathtargetlist, tidvar);
+    }

     return pathnode;
 }
@@ -529,6 +577,7 @@ create_bitmap_heap_path(PlannerInfo *root,
     pathnode->path.pathtype = T_BitmapHeapScan;
     pathnode->path.parent = rel;
     pathnode->path.pathkeys = NIL;        /* always unordered */
+    pathnode->path.pathtargetlist = list_copy(rel->reltargetlist);

     pathnode->bitmapqual = bitmapqual;
     pathnode->isjoininner = (outer_rel != NULL);
@@ -613,6 +662,41 @@ create_bitmap_or_path(PlannerInfo *root,
     return pathnode;
 }

+
+/*
+ * create_heapfetch_path
+ *      Creates a path corresponding to a scan by TID, returning the pathnode.
+ */
+HeapFetchPath *
+create_heapfetch_path(PlannerInfo *root, RelOptInfo *rel,
+                      Path *subpath, IndexOptInfo *index,
+                      List *heapQuals, List *indexqualorig)
+{
+    HeapFetchPath    *pathnode = makeNode(HeapFetchPath);
+
+    pathnode->path.pathtype = T_HeapFetch;
+    pathnode->baserel = rel;
+    pathnode->path.parent = subpath->parent;
+    pathnode->path.pathkeys = subpath->pathkeys;
+    pathnode->path.pathtargetlist = rel->reltargetlist;
+
+    pathnode->subpath = subpath;
+
+    pathnode->indexinfo = index;
+
+    pathnode->heapclauses = heapQuals;
+    pathnode->indexqualorig = indexqualorig;
+
+    pathnode->path.startup_cost = subpath->startup_cost;
+    pathnode->path.total_cost = subpath->total_cost;
+
+#ifdef NOT_IMPLEMENTED
+    cost_heapfetch(&pathnode->path, root, rel);
+#endif
+
+    return pathnode;
+}
+
 /*
  * create_tidscan_path
  *      Creates a path corresponding to a scan by TID, returning the pathnode.
@@ -625,6 +709,7 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals)
     pathnode->path.pathtype = T_TidScan;
     pathnode->path.parent = rel;
     pathnode->path.pathkeys = NIL;
+    pathnode->path.pathtargetlist = list_copy(rel->reltargetlist);

     pathnode->tidquals = tidquals;

@@ -648,6 +733,7 @@ create_append_path(RelOptInfo *rel, List *subpaths)
     pathnode->path.parent = rel;
     pathnode->path.pathkeys = NIL;        /* result is always considered
                                          * unsorted */
+    pathnode->path.pathtargetlist = rel->reltargetlist;
     pathnode->subpaths = subpaths;

     pathnode->path.startup_cost = 0;
@@ -707,6 +793,7 @@ create_material_path(RelOptInfo *rel, Path *subpath)
     pathnode->path.parent = rel;

     pathnode->path.pathkeys = subpath->pathkeys;
+    pathnode->path.pathtargetlist = rel->reltargetlist;

     pathnode->subpath = subpath;

@@ -912,6 +999,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,

     pathnode->path.pathtype = T_Unique;
     pathnode->path.parent = rel;
+    pathnode->path.pathtargetlist = rel->reltargetlist;

     /*
      * Treat the output as always unsorted, since we don't necessarily have
@@ -1228,6 +1316,7 @@ create_subqueryscan_path(RelOptInfo *rel, List *pathkeys)
     pathnode->pathtype = T_SubqueryScan;
     pathnode->parent = rel;
     pathnode->pathkeys = pathkeys;
+    pathnode->pathtargetlist = rel->reltargetlist;

     cost_subqueryscan(pathnode, rel);

@@ -1247,6 +1336,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel)
     pathnode->pathtype = T_FunctionScan;
     pathnode->parent = rel;
     pathnode->pathkeys = NIL;    /* for now, assume unordered result */
+    pathnode->pathtargetlist = rel->reltargetlist;

     cost_functionscan(pathnode, root, rel);

@@ -1266,6 +1356,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel)
     pathnode->pathtype = T_ValuesScan;
     pathnode->parent = rel;
     pathnode->pathkeys = NIL;    /* result is always unordered */
+    pathnode->pathtargetlist = rel->reltargetlist;

     cost_valuesscan(pathnode, root, rel);

@@ -1285,6 +1376,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel)
     pathnode->pathtype = T_CteScan;
     pathnode->parent = rel;
     pathnode->pathkeys = NIL;    /* XXX for now, result is always unordered */
+    pathnode->pathtargetlist = rel->reltargetlist;

     cost_ctescan(pathnode, root, rel);

@@ -1304,6 +1396,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel)
     pathnode->pathtype = T_WorkTableScan;
     pathnode->parent = rel;
     pathnode->pathkeys = NIL;    /* result is always unordered */
+    pathnode->pathtargetlist = rel->reltargetlist;

     /* Cost is the same as for a regular CTE scan */
     cost_ctescan(pathnode, root, rel);
@@ -1345,6 +1438,7 @@ create_nestloop_path(PlannerInfo *root,
     pathnode->innerjoinpath = inner_path;
     pathnode->joinrestrictinfo = restrict_clauses;
     pathnode->path.pathkeys = pathkeys;
+    pathnode->path.pathtargetlist = joinrel->reltargetlist;

     cost_nestloop(pathnode, root, sjinfo);

@@ -1445,6 +1539,7 @@ create_mergejoin_path(PlannerInfo *root,
     pathnode->path_mergeclauses = mergeclauses;
     pathnode->outersortkeys = outersortkeys;
     pathnode->innersortkeys = innersortkeys;
+    pathnode->jpath.path.pathtargetlist = joinrel->reltargetlist;

     cost_mergejoin(pathnode, root, sjinfo);

@@ -1482,6 +1577,7 @@ create_hashjoin_path(PlannerInfo *root,
     pathnode->jpath.outerjoinpath = outer_path;
     pathnode->jpath.innerjoinpath = inner_path;
     pathnode->jpath.joinrestrictinfo = restrict_clauses;
+    pathnode->jpath.path.pathtargetlist = joinrel->reltargetlist;

     /*
      * A hashjoin never has pathkeys, since its output ordering is
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9419178..5c0c342 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -193,9 +193,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
              * We pre-zero the ordering info in case the index is unordered.
              */
             info->indexkeys = (int *) palloc(sizeof(int) * ncolumns);
-            info->opfamily = (Oid *) palloc0(sizeof(Oid) * (4 * ncolumns + 1));
+            info->opfamily = (Oid *) palloc0(sizeof(Oid) * (5 * ncolumns + 1));
             info->opcintype = info->opfamily + (ncolumns + 1);
-            info->fwdsortop = info->opcintype + ncolumns;
+            info->atttype = info->opcintype + ncolumns;
+            info->fwdsortop = info->atttype + ncolumns;
             info->revsortop = info->fwdsortop + ncolumns;
             info->nulls_first = (bool *) palloc0(sizeof(bool) * ncolumns);

@@ -204,6 +205,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
                 info->indexkeys[i] = index->indkey.values[i];
                 info->opfamily[i] = indexRelation->rd_opfamily[i];
                 info->opcintype[i] = indexRelation->rd_opcintype[i];
+                info->atttype[i] = indexRelation->rd_att->attrs[i]->atttypid;
             }

             info->relam = indexRelation->rd_rel->relam;
@@ -212,6 +214,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
             info->amsearchnulls = indexRelation->rd_am->amsearchnulls;
             info->amhasgettuple = OidIsValid(indexRelation->rd_am->amgettuple);
             info->amhasgetbitmap = OidIsValid(indexRelation->rd_am->amgetbitmap);
+            info->amreturntuple = indexRelation->rd_am->amreturntuple;

             /*
              * Fetch the ordering operators associated with the index, if any.
@@ -295,7 +298,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,

             index_close(indexRelation, NoLock);

-            indexinfos = lcons(info, indexinfos);
+            info->indexno = list_length(indexinfos);
+            indexinfos = lappend(indexinfos, info);
         }

         list_free(indexoidlist);
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index c70240b..d299b19 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3375,7 +3375,10 @@ get_variable(Var *var, int levelsup, bool showstar, deparse_context *context)
     char       *schemaname;
     char       *refname;
     char       *attname;
-
+/*
+    appendStringInfo(buf, "%d:%d", var->varno, var->varattno);
+    return NULL;
+*/
     /* Find appropriate nesting depth */
     netlevelsup = var->varlevelsup + levelsup;
     if (netlevelsup >= list_length(context->namespaces))
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index cddf3e8..6d72e3e 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -425,6 +425,7 @@ typedef struct BTScanPosItem    /* what we remember about each match */
 {
     ItemPointerData heapTid;    /* TID of referenced heap item */
     OffsetNumber indexOffset;    /* index item's location within page */
+    uint16        bufferOffset;    /* location in buffer */
 } BTScanPosItem;

 typedef struct BTScanPosData
@@ -453,7 +454,11 @@ typedef struct BTScanPosData
     int            lastItem;        /* last valid index in items[] */
     int            itemIndex;        /* current index in items[] */

-    BTScanPosItem items[MaxIndexTuplesPerPage]; /* MUST BE LAST */
+    BTScanPosItem items[MaxIndexTuplesPerPage]; /* MUST BE LAST EXCEPT FOR THE STUFF BELOW! */
+
+    char        *buffer;        /* to hold index tuples */
+    int            bufferUsed;
+
 } BTScanPosData;

 typedef BTScanPosData *BTScanPos;
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 3bac6d3..748c6ce 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -16,6 +16,7 @@

 #include "access/genam.h"
 #include "access/heapam.h"
+#include "access/itup.h"


 typedef struct HeapScanDescData
@@ -64,6 +65,7 @@ typedef struct IndexScanDescData
     Snapshot    xs_snapshot;    /* snapshot to see */
     int            numberOfKeys;    /* number of scan keys */
     ScanKey        keyData;        /* array of scan key descriptors */
+    bool        needIndexTuple;    /* should scan store index tuple in xs_itup? */

     /* signaling to index AM about killing index tuples */
     bool        kill_prior_tuple;        /* last-returned tuple is dead */
@@ -77,6 +79,8 @@ typedef struct IndexScanDescData
     Buffer        xs_cbuf;        /* current heap buffer in scan, if any */
     /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
     bool        xs_recheck;        /* T means scan keys must be rechecked */
+
+    IndexTuple    xs_itup;        /* current index tuple, if any */
 } IndexScanDescData;

 /* Struct for heap-or-index scans of system tables */
diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
index f577e57..dd057cd 100644
--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -49,6 +49,7 @@ CATALOG(pg_am,2601)
     bool        amsearchnulls;    /* can AM search for NULL index entries? */
     bool        amstorage;        /* can storage type differ from column type? */
     bool        amclusterable;    /* does AM support cluster command? */
+    bool        amreturntuple;    /* can AM return tuples back from index? */
     Oid            amkeytype;        /* type of data in index, or InvalidOid */
     regproc        aminsert;        /* "insert this tuple" function */
     regproc        ambeginscan;    /* "start new scan" function */
@@ -76,7 +77,7 @@ typedef FormData_pg_am *Form_pg_am;
  *        compiler constants for pg_am
  * ----------------
  */
-#define Natts_pg_am                        26
+#define Natts_pg_am                        27
 #define Anum_pg_am_amname                1
 #define Anum_pg_am_amstrategies            2
 #define Anum_pg_am_amsupport            3
@@ -89,36 +90,37 @@ typedef FormData_pg_am *Form_pg_am;
 #define Anum_pg_am_amsearchnulls        10
 #define Anum_pg_am_amstorage            11
 #define Anum_pg_am_amclusterable        12
-#define Anum_pg_am_amkeytype            13
-#define Anum_pg_am_aminsert                14
-#define Anum_pg_am_ambeginscan            15
-#define Anum_pg_am_amgettuple            16
-#define Anum_pg_am_amgetbitmap            17
-#define Anum_pg_am_amrescan                18
-#define Anum_pg_am_amendscan            19
-#define Anum_pg_am_ammarkpos            20
-#define Anum_pg_am_amrestrpos            21
-#define Anum_pg_am_ambuild                22
-#define Anum_pg_am_ambulkdelete            23
-#define Anum_pg_am_amvacuumcleanup        24
-#define Anum_pg_am_amcostestimate        25
-#define Anum_pg_am_amoptions            26
+#define Anum_pg_am_amreturntuple        13
+#define Anum_pg_am_amkeytype            14
+#define Anum_pg_am_aminsert                15
+#define Anum_pg_am_ambeginscan            16
+#define Anum_pg_am_amgettuple            17
+#define Anum_pg_am_amgetbitmap            18
+#define Anum_pg_am_amrescan                19
+#define Anum_pg_am_amendscan            20
+#define Anum_pg_am_ammarkpos            21
+#define Anum_pg_am_amrestrpos            22
+#define Anum_pg_am_ambuild                23
+#define Anum_pg_am_ambulkdelete            24
+#define Anum_pg_am_amvacuumcleanup        25
+#define Anum_pg_am_amcostestimate        26
+#define Anum_pg_am_amoptions            27

 /* ----------------
  *        initial contents of pg_am
  * ----------------
  */

-DATA(insert OID = 403 (  btree    5 1 t t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan
btendscanbtmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions )); 
+DATA(insert OID = 403 (  btree    5 1 t t t t t t t f t t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan
btendscanbtmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions )); 
 DESCR("b-tree index access method");
 #define BTREE_AM_OID 403
-DATA(insert OID = 405 (  hash    1 1 f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap
hashrescanhashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions
));
+DATA(insert OID = 405 (  hash    1 1 f t f f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap
hashrescanhashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions
));
 DESCR("hash index access method");
 #define HASH_AM_OID 405
-DATA(insert OID = 783 (  gist    0 7 f f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap
gistrescangistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions
));
+DATA(insert OID = 783 (  gist    0 7 f f f t t t t t t f 0 gistinsert gistbeginscan gistgettuple gistgetbitmap
gistrescangistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions
));
 DESCR("GiST index access method");
 #define GIST_AM_OID 783
-DATA(insert OID = 2742 (  gin    0 5 f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan
ginmarkposginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions )); 
+DATA(insert OID = 2742 (  gin    0 5 f f f t t f f t f f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan
ginmarkposginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions )); 
 DESCR("GIN index access method");
 #define GIN_AM_OID 2742

diff --git a/src/include/executor/nodeHeapfetch.h b/src/include/executor/nodeHeapfetch.h
new file mode 100644
index 0000000..4f54ae3
--- /dev/null
+++ b/src/include/executor/nodeHeapfetch.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeHeapfetch.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * $PostgreSQL$
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODEHEAPFETCH_H
+#define NODEHEAPFETCH_H
+
+#include "nodes/execnodes.h"
+
+extern int    ExecCountSlotsHeapFetch(HeapFetch *node);
+extern HeapFetchState *ExecInitHeapFetch(HeapFetch *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecHeapFetch(HeapFetchState *node);
+extern void ExecEndHeapFetch(HeapFetchState *node);
+extern void ExecHeapFetchMarkPos(HeapFetchState *node);
+extern void ExecHeapFetchRestrPos(HeapFetchState *node);
+extern void ExecHeapFetchReScan(HeapFetchState *node, ExprContext *exprCtxt);
+
+#endif   /* NODEHEAPFETCH_H */
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index e40082d..b62dd71 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -15,6 +15,7 @@
 #define TUPTABLE_H

 #include "access/htup.h"
+#include "access/itup.h"
 #include "access/tupdesc.h"
 #include "storage/buf.h"

@@ -165,6 +166,9 @@ extern TupleTableSlot *ExecStoreTuple(HeapTuple tuple,
 extern TupleTableSlot *ExecStoreMinimalTuple(MinimalTuple mtup,
                       TupleTableSlot *slot,
                       bool shouldFree);
+extern TupleTableSlot *ExecStoreIndexTuple(IndexTuple itup,
+                      TupleTableSlot *slot,
+                      bool shouldFree);
 extern TupleTableSlot *ExecClearTuple(TupleTableSlot *slot);
 extern TupleTableSlot *ExecStoreVirtualTuple(TupleTableSlot *slot);
 extern TupleTableSlot *ExecStoreAllNullTuple(TupleTableSlot *slot);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 34fcf68..2acf414 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1076,6 +1076,17 @@ typedef struct ScanState
 typedef ScanState SeqScanState;

 /*
+ * HeapFetch uses a bare ScanState as its state node, since it needs
+ * no additional fields.
+ */
+typedef struct HeapFetchState
+{
+    ScanState ss;
+    HeapTupleData hfs_htup;
+    List     *hfs_indexqualorig;
+} HeapFetchState;
+
+/*
  * These structs store information about index quals that don't have simple
  * constant right-hand sides.  See comments for ExecIndexBuildScanKeys()
  * for discussion.
@@ -1109,12 +1120,12 @@ typedef struct
  *        RuntimeContext       expr context for evaling runtime Skeys
  *        RelationDesc       index relation descriptor
  *        ScanDesc           index scan descriptor
+ *        indexonlyqual       execution state for indexonlyqual expressions
  * ----------------
  */
 typedef struct IndexScanState
 {
     ScanState    ss;                /* its first field is NodeTag */
-    List       *indexqualorig;
     ScanKey        iss_ScanKeys;
     int            iss_NumScanKeys;
     IndexRuntimeKeyInfo *iss_RuntimeKeys;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 31135a3..8376158 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -71,6 +71,7 @@ typedef enum NodeTag
     T_Hash,
     T_SetOp,
     T_Limit,
+    T_HeapFetch,
     /* this one isn't a subclass of Plan: */
     T_PlanInvalItem,

@@ -109,6 +110,7 @@ typedef enum NodeTag
     T_HashState,
     T_SetOpState,
     T_LimitState,
+    T_HeapFetchState,

     /*
      * TAGS FOR PRIMITIVE NODES (primnodes.h)
@@ -221,6 +223,7 @@ typedef enum NodeTag
     T_AppendRelInfo,
     T_PlaceHolderInfo,
     T_PlannerParamItem,
+    T_HeapFetchPath,

     /*
      * TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 23a5117..3b58f6c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -264,6 +264,11 @@ typedef Scan SeqScan;
  * table).    This is a bit hokey ... would be cleaner to use a special-purpose
  * node type that could not be mistaken for a regular Var.    But it will do
  * for now.
+ *
+ * indexonlyqual and indexonlyqualorig represent expressions that can't be
+ * used as index keys, but can be evaluated without fetching the heap tuple,
+ * using data from the index. The index must be able to regurgitate index
+ * tuples stored in it.
  * ----------------
  */
 typedef struct IndexScan
@@ -273,8 +278,25 @@ typedef struct IndexScan
     List       *indexqual;        /* list of index quals (OpExprs) */
     List       *indexqualorig;    /* the same in original form */
     ScanDirection indexorderdir;    /* forward or backward or don't care */
+    List       *indexonlyqual;    /* quals that can be satisfied from index */
+    List       *indexonlyqualorig;    /* same in original form. Only needed for EXPLAIN */
+    List       *indexonlytargetlist;
 } IndexScan;

+
+/* ----------------
+ *        heap fetch node
+ *
+ * ----------------
+ */
+typedef struct HeapFetch
+{
+    Scan        scan;
+    AttrNumber    tidIdx;            /* index of TID column in outer node */
+    List       *indexqualorig;    /* for rechecking index quals */
+} HeapFetch;
+
+
 /* ----------------
  *        bitmap index scan node
  *
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 6e7f52b..fdd8195 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -135,6 +135,8 @@ typedef struct Var
     Expr        xpr;
     Index        varno;            /* index of this var's relation in the range
                                  * table (could also be INNER or OUTER) */
+    int            varindexno;        /* index into RelOptInfo.indexlist, or -1 if
+                                 * this Var doesn't refer an index */
     AttrNumber    varattno;        /* attribute number of this var, or zero for
                                  * all */
     Oid            vartype;        /* pg_type OID for the type of this var */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index bbce826..f907483 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -418,6 +418,8 @@ typedef struct IndexOptInfo
 {
     NodeTag        type;

+    int            indexno;
+
     Oid            indexoid;        /* OID of the index relation */
     RelOptInfo *rel;            /* back-link to index's table */

@@ -430,6 +432,7 @@ typedef struct IndexOptInfo
     Oid           *opfamily;        /* OIDs of operator families for columns */
     int           *indexkeys;        /* column numbers of index's keys, or 0 */
     Oid           *opcintype;        /* OIDs of opclass declared input data types */
+    Oid           *atttype;        /* OIDs of stored key data types */
     Oid           *fwdsortop;        /* OIDs of sort operators for each column */
     Oid           *revsortop;        /* OIDs of sort operators for backward scan */
     bool       *nulls_first;    /* do NULLs come first in the sort order? */
@@ -446,6 +449,7 @@ typedef struct IndexOptInfo
     bool        amsearchnulls;    /* can AM search for NULL index entries? */
     bool        amhasgettuple;    /* does AM have amgettuple interface? */
     bool        amhasgetbitmap; /* does AM have amgetbitmap interface? */
+    bool        amreturntuple;    /* can AM return tuples from index? */
 } IndexOptInfo;


@@ -588,6 +592,8 @@ typedef struct Path

     List       *pathkeys;        /* sort ordering of path's output */
     /* pathkeys is a List of PathKey nodes; see above */
+
+    List       *pathtargetlist;
 } Path;

 /*----------
@@ -637,6 +643,7 @@ typedef struct IndexPath
     IndexOptInfo *indexinfo;
     List       *indexclauses;
     List       *indexquals;
+    List       *indexonlyqualorig;
     bool        isjoininner;
     ScanDirection indexscandir;
     Cost        indextotalcost;
@@ -644,6 +651,20 @@ typedef struct IndexPath
     double        rows;            /* estimated number of result tuples */
 } IndexPath;

+
+typedef struct HeapFetchPath
+{
+    Path        path;
+    RelOptInfo *baserel;
+    Path       *subpath;
+
+    IndexOptInfo *indexinfo;
+
+    List       *heapclauses;
+
+    List       *indexqualorig;
+} HeapFetchPath;
+
 /*
  * BitmapHeapPath represents one or more indexscans that generate TID bitmaps
  * instead of directly accessing the heap, followed by AND/OR combinations
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 5111709..4956e86 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -66,7 +66,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
                     double index_pages, PlannerInfo *root);
 extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
 extern void cost_index(IndexPath *path, PlannerInfo *root, IndexOptInfo *index,
-           List *indexQuals, RelOptInfo *outer_rel);
+                List *indexQuals, List *indexOnlyQuals, RelOptInfo *outer_rel);
 extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
                       Path *bitmapqual, RelOptInfo *outer_rel);
 extern void cost_bitmap_and_node(BitmapAndPath *path, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 0f4c52e..632c535 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -31,6 +31,7 @@ extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel);
 extern IndexPath *create_index_path(PlannerInfo *root,
                   IndexOptInfo *index,
                   List *clause_groups,
+                  List *indexonlyquals,
                   List *pathkeys,
                   ScanDirection indexscandir,
                   RelOptInfo *outer_rel);
@@ -46,6 +47,9 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
                       List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
                     List *tidquals);
+extern HeapFetchPath *create_heapfetch_path(PlannerInfo *root, RelOptInfo *rel,
+                                            Path *outerpath, IndexOptInfo *index,
+                                            List *heapQuals, List *indexqualorig);
 extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths);
 extern ResultPath *create_result_path(List *quals);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 4f80edc..64d05c8 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,6 +71,10 @@ extern List *expand_indexqual_conditions(IndexOptInfo *index,
                             List *clausegroups);
 extern void check_partial_indexes(PlannerInfo *root, RelOptInfo *rel);
 extern List *flatten_clausegroups_list(List *clausegroups);
+extern void extract_indexonly_quals(PlannerInfo *root, IndexOptInfo *index,
+                        List *clause_groups,
+                        RelOptInfo *outer_rel,
+                        List **heapquals, List **indexonlyqualorig);

 /*
  * orindxpath.c
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0dd2bcc..e6973e9 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -75,6 +75,8 @@ extern SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy, Plan *lefttree,
 extern Result *make_result(PlannerInfo *root, List *tlist,
             Node *resconstantqual, Plan *subplan);
 extern bool is_projection_capable_plan(Plan *plan);
+extern Expr *make_indexonly_expr(PlannerInfo *root, IndexOptInfo *index,
+                                 Expr *expr, bool *indexonly);

 /*
  * prototypes for plan/initsplan.c

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: WIP: generalized index constraints
Next
From: Michael Meskes
Date:
Subject: Re: clang's static checker report.