Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000 - Mailing list pgsql-hackers

From:           Heikki Linnakangas
Subject:        Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:
Msg-id:         4AAE0FE3.40600@enterprisedb.com
In response to: Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000 (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses:      Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
                Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
List:           pgsql-hackers
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> It strikes me that in the cases where it wouldn't be necessary to
>> compute junk sort-key columns, it would be because we were scanning an
>> index that includes those values.  So if the plan were set up to pull
>> those values from the index and return them, then we'd not have to add
>> this extra complexity to grouping_planner --- the argument that it's not
>> worth it to get rid of the junk columns comes back into play.  Moreover,
>> such an ability would also mean that if the user *does* ask for the
>> sort column value as output (ie it's not resjunk), we can still satisfy
>> the query from the index without recomputing the expensive function.
>>
>> So this is where we come to the connection to Heikki's index-only-quals
>> patch.  As submitted, that code is only able to use an index column in
>> a scan qual, it's not able to return it as part of the scan result.
>> This example makes it clear that that definition is missing a large
>> part of the potential benefit of an index value extraction capability.
>>
>> To be able to do anything along that line would require some more work
>> in the executor and a *lot* more work in the planner, and I'm honestly
>> not sure what the planner part of it would look like.
>
> I think we should separate the Heap Fetch operation from the IndexScan.

I've been hacking on that approach. It's quite unfinished, but before I
spend any more time on it, I'd like to get some feedback on the overall
design.

The attached patch can create plans where quals are checked and joins
are performed using values from indexes only, and the heap tuples are
fetched only for matching rows. It passes the regression tests, but the
code is quite ugly in places, and cost estimation is bogus. The patch
builds on the indexam-api-changes patch I posted earlier, which is also
attached; I haven't yet made the changes to that patch that were
discussed.

I haven't done any performance testing. The overhead of an extra
executor node for each index scan could slow down simple queries; we
might need to compensate for that somehow, perhaps by reintroducing a
fast-path combined IndexScan+HeapFetch node. I'm also afraid that the
extra work I've pushed to the stage where Paths are constructed could
slow down planning quite a bit if you have a lot of indexes.

Path nodes now carry a targetlist. That's because when you have a path
like:

  HeapFetch
    -> Join
       ...

you won't have all the columns of the join rel available at the join
node yet, because they will be fetched in the HeapFetch node above it.
The targetlists in Path nodes reflect that, and the targetlists of the
final Plan nodes are created from the targetlists in the Path nodes
instead of the ones in RelOptInfos.

Per earlier discussion, I changed the way index tuple fetching works in
B-tree, so that it can now be relied on: matching index tuples are
copied to backend-local memory when the scan steps onto a page.

Var nodes that refer to index columns (indexquals and the new
index-only filters) now have a new field, varindexno, set. While we
could have continued with the old representation, now that we have more
expressions that refer to index vars instead of heap vars, this makes
debugging easier.
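As a rough illustration of the API split (a sketch modeled on the new
index_getnext() wrapper in the patch's indexam.c, not part of the patch
itself; the consume_scan() name is made up), a caller can now advance
the index scan and defer the heap visit separately:

#include "postgres.h"
#include "access/genam.h"
#include "access/relscan.h"

/*
 * Sketch only: drive the split API by hand.  index_next() advances the
 * index scan to the next match; index_fetch() visits the heap only when
 * the tuple is actually needed.
 */
static void
consume_scan(IndexScanDesc scan)
{
    while (index_next(scan, ForwardScanDirection))
    {
        HeapTuple   tup;

        /*
         * With scan->needIndexTuple set, index-only quals could be
         * evaluated against scan->xs_itup at this point, skipping the
         * heap visit for rows that fail.  (xs_recheck handling omitted.)
         */
        tup = index_fetch(scan);    /* NULL if no visible tuple at this TID */
        if (tup == NULL)
            continue;

        /* ... process the visible heap tuple ... */
    }
}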
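Similarly, here is a sketch of the extended heap_hot_search_buffer()
contract, modeled on the CLUSTER changes in the patch (the
walk_hot_chain() wrapper is hypothetical; relation, tid and snapshot are
assumed to be set up by the caller).  With firstCall = true it returns
the first chain member visible to the snapshot; calling again with
firstCall = false, with *tid still pointing at the previous result,
continues down the HOT chain, which matters for snapshots such as
SnapshotAny that can see more than one member:

#include "postgres.h"
#include "access/heapam.h"
#include "storage/bufmgr.h"

static void
walk_hot_chain(Relation relation, ItemPointerData tid, Snapshot snapshot)
{
    HeapTupleData htup;
    Buffer      buffer;
    bool        found;

    buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&tid));
    LockBuffer(buffer, BUFFER_LOCK_SHARE);

    found = heap_hot_search_buffer(&tid, buffer, snapshot, &htup,
                                   NULL, true);     /* first call */
    while (found)
    {
        /* ... process htup, whose TID is now in tid ... */

        found = heap_hot_search_buffer(&tid, buffer, snapshot, &htup,
                                       NULL, false);    /* next member */
    }

    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    ReleaseBuffer(buffer);
}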
--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b0a911e..205b3e1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1491,8 +1491,14 @@ heap_fetch(Relation relation,
  * On entry, *tid is the TID of a tuple (either a simple tuple, or the root
  * of a HOT chain), and buffer is the buffer holding this tuple.  We search
  * for the first chain member satisfying the given snapshot.  If one is
- * found, we update *tid to reference that tuple's offset number, and
- * return TRUE.  If no match, return FALSE without modifying *tid.
+ * found, we update *tid and *heapTuple to reference that tuple, and
+ * return TRUE.  If no match, return FALSE without modifying *tid (*heapTuple
+ * is undefined in that case).
+ *
+ * To fetch the next matching tuple in the chain, call again with 'firstCall'
+ * set to FALSE and *tid still pointing to the tuple returned by previous call.
+ * (normally only one tuple in a chain can be visible at a time, but that does
+ * not hold for special snapshots like SnapshotAny)
  *
  * If all_dead is not NULL, we check non-visible tuples to see if they are
  * globally dead; *all_dead is set TRUE if all members of the HOT chain
@@ -1504,27 +1510,29 @@
  */
 bool
 heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
-                       bool *all_dead)
+                       HeapTuple heapTuple, bool *all_dead, bool firstCall)
 {
     Page        dp = (Page) BufferGetPage(buffer);
     TransactionId prev_xmax = InvalidTransactionId;
     OffsetNumber offnum;
     bool        at_chain_start;
+    bool        skip;
 
-    if (all_dead)
+    if (all_dead && firstCall)
         *all_dead = true;
 
     Assert(TransactionIdIsValid(RecentGlobalXmin));
     Assert(ItemPointerGetBlockNumber(tid) == BufferGetBlockNumber(buffer));
     offnum = ItemPointerGetOffsetNumber(tid);
-    at_chain_start = true;
+
+    at_chain_start = firstCall;
+    skip = !firstCall;
 
     /* Scan through possible multiple members of HOT-chain */
     for (;;)
     {
         ItemId      lp;
-        HeapTupleData heapTuple;
 
         /* check for bogus TID */
         if (offnum < FirstOffsetNumber || offnum > PageGetMaxOffsetNumber(dp))
@@ -1547,13 +1555,13 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
             break;
         }
 
-        heapTuple.t_data = (HeapTupleHeader) PageGetItem(dp, lp);
-        heapTuple.t_len = ItemIdGetLength(lp);
+        heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
+        heapTuple->t_len = ItemIdGetLength(lp);
 
         /*
         * Shouldn't see a HEAP_ONLY tuple at chain start.
         */
-        if (at_chain_start && HeapTupleIsHeapOnly(&heapTuple))
+        if (at_chain_start && HeapTupleIsHeapOnly(heapTuple))
             break;
 
         /*
@@ -1562,17 +1570,18 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
         */
         if (TransactionIdIsValid(prev_xmax) &&
             !TransactionIdEquals(prev_xmax,
-                                 HeapTupleHeaderGetXmin(heapTuple.t_data)))
+                                 HeapTupleHeaderGetXmin(heapTuple->t_data)))
             break;
 
         /* If it's visible per the snapshot, we must return it */
-        if (HeapTupleSatisfiesVisibility(&heapTuple, snapshot, buffer))
+        if (!skip && HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer))
         {
             ItemPointerSetOffsetNumber(tid, offnum);
             if (all_dead)
                 *all_dead = false;
             return true;
         }
+        skip = false;
 
         /*
         * If we can't see it, maybe no one else can either.  At caller
@@ -1580,7 +1589,7 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
         * transactions.
         */
         if (all_dead && *all_dead &&
-            HeapTupleSatisfiesVacuum(heapTuple.t_data, RecentGlobalXmin,
+            HeapTupleSatisfiesVacuum(heapTuple->t_data, RecentGlobalXmin,
                                      buffer) != HEAPTUPLE_DEAD)
             *all_dead = false;
 
@@ -1588,13 +1597,13 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
         * Check to see if HOT chain continues past this tuple; if so fetch
         * the next offnum and loop around.
         */
-        if (HeapTupleIsHotUpdated(&heapTuple))
+        if (HeapTupleIsHotUpdated(heapTuple))
         {
-            Assert(ItemPointerGetBlockNumber(&heapTuple.t_data->t_ctid) ==
+            Assert(ItemPointerGetBlockNumber(&heapTuple->t_data->t_ctid) ==
                    ItemPointerGetBlockNumber(tid));
-            offnum = ItemPointerGetOffsetNumber(&heapTuple.t_data->t_ctid);
+            offnum = ItemPointerGetOffsetNumber(&heapTuple->t_data->t_ctid);
             at_chain_start = false;
-            prev_xmax = HeapTupleHeaderGetXmax(heapTuple.t_data);
+            prev_xmax = HeapTupleHeaderGetXmax(heapTuple->t_data);
         }
         else
             break;              /* end of chain */
@@ -1616,10 +1625,12 @@ heap_hot_search(ItemPointer tid, Relation relation, Snapshot snapshot,
 {
     bool        result;
     Buffer      buffer;
+    HeapTupleData heapTuple;
 
     buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
     LockBuffer(buffer, BUFFER_LOCK_SHARE);
-    result = heap_hot_search_buffer(tid, buffer, snapshot, all_dead);
+    result = heap_hot_search_buffer(tid, buffer, snapshot, &heapTuple,
+                                    all_dead, true);
     LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
     ReleaseBuffer(buffer);
     return result;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 1c1cd34..4398e8c 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -99,9 +99,6 @@ RelationGetIndexScan(Relation indexRelation,
     ItemPointerSetInvalid(&scan->xs_ctup.t_self);
     scan->xs_ctup.t_data = NULL;
     scan->xs_cbuf = InvalidBuffer;
-    scan->xs_hot_dead = false;
-    scan->xs_next_hot = InvalidOffsetNumber;
-    scan->xs_prev_xmax = InvalidTransactionId;
 
     /*
     * Let the AM fill in the key and any opaque data it wants.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index c86cd52..616edae 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -20,7 +20,8 @@
  *      index_insert    - insert an index tuple into a relation
  *      index_markpos   - mark a scan position
  *      index_restrpos  - restore a scan position
- *      index_getnext   - get the next tuple from a scan
+ *      index_next      - advance index scan to next match
+ *      index_fetch     - fetch heap tuple for current index scan position
  *      index_getbitmap - get all tuples from a scan
  *      index_bulk_delete   - bulk deletion of index tuples
  *      index_vacuum_cleanup    - post-deletion cleanup of an index
@@ -310,8 +311,6 @@ index_rescan(IndexScanDesc scan, ScanKey key)
         scan->xs_cbuf = InvalidBuffer;
     }
 
-    scan->xs_next_hot = InvalidOffsetNumber;
-
     scan->kill_prior_tuple = false;     /* for safety */
 
     FunctionCall2(procedure,
@@ -389,32 +388,28 @@ index_restrpos(IndexScanDesc scan)
     SCAN_CHECKS;
     GET_SCAN_PROCEDURE(amrestrpos);
 
-    scan->xs_next_hot = InvalidOffsetNumber;
-
     scan->kill_prior_tuple = false;     /* for safety */
 
     FunctionCall1(procedure, PointerGetDatum(scan));
 }
 
 /* ----------------
- *      index_getnext - get the next heap tuple from a scan
+ *      index_next - get the next index tuple from a scan
  *
- * The result is the next heap tuple satisfying the scan keys and the
- * snapshot, or NULL if no more matching tuples exist.  On success,
- * the buffer containing the heap tuple is pinned (the pin will be dropped
- * at the next index_getnext or index_endscan).
+ * Advances to the next index tuple satisfying the scan keys, returning TRUE
+ * if there was one, FALSE otherwise. The heap TID pointer of the next match
+ * is stored in scan->xs_ctup.self.
  *
  * Note: caller must check scan->xs_recheck, and perform rechecking of the
  * scan keys if required.  We do not do that here because we don't have
  * enough information to do it efficiently in the general case.
  * ----------------
  */
-HeapTuple
-index_getnext(IndexScanDesc scan, ScanDirection direction)
+bool
+index_next(IndexScanDesc scan, ScanDirection direction)
 {
-    HeapTuple   heapTuple = &scan->xs_ctup;
-    ItemPointer tid = &heapTuple->t_self;
     FmgrInfo   *procedure;
+    bool        found;
 
     SCAN_CHECKS;
     GET_SCAN_PROCEDURE(amgettuple);
@@ -422,220 +417,125 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
     Assert(TransactionIdIsValid(RecentGlobalXmin));
 
     /*
-     * We always reset xs_hot_dead; if we are here then either we are just
-     * starting the scan, or we previously returned a visible tuple, and in
-     * either case it's inappropriate to kill the prior index entry.
+     * The AM's gettuple proc finds the next index entry matching the scan
+     * keys, and puts the TID in xs_ctup.t_self. It should also set
+     * scan->xs_recheck, though we pay no attention to that here.
     */
-    scan->xs_hot_dead = false;
+    found = DatumGetBool(FunctionCall2(procedure,
+                                       PointerGetDatum(scan),
+                                       Int32GetDatum(direction)));
 
-    for (;;)
-    {
-        OffsetNumber offnum;
-        bool        at_chain_start;
-        Page        dp;
+    /* Reset kill flag immediately for safety */
+    scan->kill_prior_tuple = false;
 
-        if (scan->xs_next_hot != InvalidOffsetNumber)
-        {
-            /*
-             * We are resuming scan of a HOT chain after having returned an
-             * earlier member.  Must still hold pin on current heap page.
-             */
-            Assert(BufferIsValid(scan->xs_cbuf));
-            Assert(ItemPointerGetBlockNumber(tid) ==
-                   BufferGetBlockNumber(scan->xs_cbuf));
-            Assert(TransactionIdIsValid(scan->xs_prev_xmax));
-            offnum = scan->xs_next_hot;
-            at_chain_start = false;
-            scan->xs_next_hot = InvalidOffsetNumber;
-        }
-        else
+    /* If we're out of index entries, release any held pin on a heap page */
+    if (!found)
+    {
+        if (BufferIsValid(scan->xs_cbuf))
         {
-            bool        found;
-            Buffer      prev_buf;
-
-            /*
-             * If we scanned a whole HOT chain and found only dead tuples,
-             * tell index AM to kill its entry for that TID.
-             */
-            scan->kill_prior_tuple = scan->xs_hot_dead;
-
-            /*
-             * The AM's gettuple proc finds the next index entry matching the
-             * scan keys, and puts the TID in xs_ctup.t_self (ie, *tid). It
-             * should also set scan->xs_recheck, though we pay no attention to
-             * that here.
-             */
-            found = DatumGetBool(FunctionCall2(procedure,
-                                               PointerGetDatum(scan),
-                                               Int32GetDatum(direction)));
-
-            /* Reset kill flag immediately for safety */
-            scan->kill_prior_tuple = false;
-
-            /* If we're out of index entries, break out of outer loop */
-            if (!found)
-                break;
-
-            pgstat_count_index_tuples(scan->indexRelation, 1);
-
-            /* Switch to correct buffer if we don't have it already */
-            prev_buf = scan->xs_cbuf;
-            scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf,
-                                                 scan->heapRelation,
-                                                 ItemPointerGetBlockNumber(tid));
-
-            /*
-             * Prune page, but only if we weren't already on this page
-             */
-            if (prev_buf != scan->xs_cbuf)
-                heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
-                                    RecentGlobalXmin);
-
-            /* Prepare to scan HOT chain starting at index-referenced offnum */
-            offnum = ItemPointerGetOffsetNumber(tid);
-            at_chain_start = true;
-
-            /* We don't know what the first tuple's xmin should be */
-            scan->xs_prev_xmax = InvalidTransactionId;
-
-            /* Initialize flag to detect if all entries are dead */
-            scan->xs_hot_dead = true;
+            ReleaseBuffer(scan->xs_cbuf);
+            scan->xs_cbuf = InvalidBuffer;
         }
+        return false;
+    }
 
-        /* Obtain share-lock on the buffer so we can examine visibility */
-        LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+    pgstat_count_index_tuples(scan->indexRelation, 1);
 
-        dp = (Page) BufferGetPage(scan->xs_cbuf);
+    return true;
+}
 
-        /* Scan through possible multiple members of HOT-chain */
-        for (;;)
-        {
-            ItemId      lp;
-            ItemPointer ctid;
-
-            /* check for bogus TID */
-            if (offnum < FirstOffsetNumber ||
-                offnum > PageGetMaxOffsetNumber(dp))
-                break;
-
-            lp = PageGetItemId(dp, offnum);
-
-            /* check for unused, dead, or redirected items */
-            if (!ItemIdIsNormal(lp))
-            {
-                /* We should only see a redirect at start of chain */
-                if (ItemIdIsRedirected(lp) && at_chain_start)
-                {
-                    /* Follow the redirect */
-                    offnum = ItemIdGetRedirect(lp);
-                    at_chain_start = false;
-                    continue;
-                }
-                /* else must be end of chain */
-                break;
-            }
-
-            /*
-             * We must initialize all of *heapTuple (ie, scan->xs_ctup) since
-             * it is returned to the executor on success.
-             */
-            heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
-            heapTuple->t_len = ItemIdGetLength(lp);
-            ItemPointerSetOffsetNumber(tid, offnum);
-            heapTuple->t_tableOid = RelationGetRelid(scan->heapRelation);
-            ctid = &heapTuple->t_data->t_ctid;
-
-            /*
-             * Shouldn't see a HEAP_ONLY tuple at chain start. (This test
-             * should be unnecessary, since the chain root can't be removed
-             * while we have pin on the index entry, but let's make it
-             * anyway.)
-             */
-            if (at_chain_start && HeapTupleIsHeapOnly(heapTuple))
-                break;
-
-            /*
-             * The xmin should match the previous xmax value, else chain is
-             * broken.  (Note: this test is not optional because it protects
-             * us against the case where the prior chain member's xmax aborted
-             * since we looked at it.)
-             */
-            if (TransactionIdIsValid(scan->xs_prev_xmax) &&
-                !TransactionIdEquals(scan->xs_prev_xmax,
-                                     HeapTupleHeaderGetXmin(heapTuple->t_data)))
-                break;
-
-            /* If it's visible per the snapshot, we must return it */
-            if (HeapTupleSatisfiesVisibility(heapTuple, scan->xs_snapshot,
-                                             scan->xs_cbuf))
-            {
-                /*
-                 * If the snapshot is MVCC, we know that it could accept at
-                 * most one member of the HOT chain, so we can skip examining
-                 * any more members. Otherwise, check for continuation of the
-                 * HOT-chain, and set state for next time.
-                 */
-                if (IsMVCCSnapshot(scan->xs_snapshot))
-                    scan->xs_next_hot = InvalidOffsetNumber;
-                else if (HeapTupleIsHotUpdated(heapTuple))
-                {
-                    Assert(ItemPointerGetBlockNumber(ctid) ==
-                           ItemPointerGetBlockNumber(tid));
-                    scan->xs_next_hot = ItemPointerGetOffsetNumber(ctid);
-                    scan->xs_prev_xmax = HeapTupleHeaderGetXmax(heapTuple->t_data);
-                }
-                else
-                    scan->xs_next_hot = InvalidOffsetNumber;
-
-                LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
-
-                pgstat_count_heap_fetch(scan->indexRelation);
-
-                return heapTuple;
-            }
-
-            /*
-             * If we can't see it, maybe no one else can either. Check to see
-             * if the tuple is dead to all transactions. If we find that all
-             * the tuples in the HOT chain are dead, we'll signal the index AM
-             * to not return that TID on future indexscans.
-             */
-            if (scan->xs_hot_dead &&
-                HeapTupleSatisfiesVacuum(heapTuple->t_data, RecentGlobalXmin,
-                                         scan->xs_cbuf) != HEAPTUPLE_DEAD)
-                scan->xs_hot_dead = false;
-
-            /*
-             * Check to see if HOT chain continues past this tuple; if so
-             * fetch the next offnum (we don't bother storing it into
-             * xs_next_hot, but must store xs_prev_xmax), and loop around.
-             */
-            if (HeapTupleIsHotUpdated(heapTuple))
-            {
-                Assert(ItemPointerGetBlockNumber(ctid) ==
-                       ItemPointerGetBlockNumber(tid));
-                offnum = ItemPointerGetOffsetNumber(ctid);
-                at_chain_start = false;
-                scan->xs_prev_xmax = HeapTupleHeaderGetXmax(heapTuple->t_data);
-            }
-            else
-                break;          /* end of chain */
-        }                       /* loop over a single HOT chain */
-
-        LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
-
-        /* Loop around to ask index AM for another TID */
-        scan->xs_next_hot = InvalidOffsetNumber;
-    }
 
-    /* Release any held pin on a heap page */
-    if (BufferIsValid(scan->xs_cbuf))
+/* ----------------
+ *      index_fetch - fetch the heap tuple the current index pointer points to
+ *
+ * The result is the heap tuple the current index tuple points to if it
+ * matches the snapshot, or NULL if it doesn't. On success, the buffer
+ * containing the heap tuple is pinned (the pin will be dropped at the next
+ * index_fetch or index_endscan).
+ * ----------------
+ */
+HeapTuple
+index_fetch(IndexScanDesc scan)
+{
+    ItemPointer tid = &scan->xs_ctup.t_self;
+    bool        found;
+    Buffer      prev_buf;
+
+    /*
+     * We only fetch the first matching tuple in a chain, so we can't support
+     * SnapshotAny. All other snapshots only consider one tuple in a HOT chain
+     * as visible at any given moment.
+     */
+    Assert(scan->xs_snapshot != SnapshotAny);
+
+    /* Switch to correct buffer if we don't have it already */
+    prev_buf = scan->xs_cbuf;
+    scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf,
+                                         scan->heapRelation,
+                                         ItemPointerGetBlockNumber(tid));
+
+    /*
+     * Prune page, but only if we weren't already on this page
+     */
+    if (prev_buf != scan->xs_cbuf)
+        heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+                            RecentGlobalXmin);
+
+    /* Obtain share-lock on the buffer so we can examine visibility */
+    LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+
+    found = heap_hot_search_buffer(tid, scan->xs_cbuf,
+                                   scan->xs_snapshot, &scan->xs_ctup,
+                                   &scan->kill_prior_tuple, true);
+    LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
+
+    if (found)
     {
-        ReleaseBuffer(scan->xs_cbuf);
-        scan->xs_cbuf = InvalidBuffer;
+        Assert(!scan->kill_prior_tuple);
+
+        /*
+         * We must initialize all of *heapTuple (ie, scan->xs_ctup) since
+         * it is returned to the executor on success.
+         */
+        scan->xs_ctup.t_tableOid = RelationGetRelid(scan->heapRelation);
+
+        pgstat_count_heap_fetch(scan->indexRelation);
+
+        return &scan->xs_ctup;
     }
+    else
+        return NULL;
+}
 
-    return NULL;                /* failure exit */
+/* ----------------
+ *      index_getnext - get the next heap tuple from a scan
+ *
+ * This is a shorthand for index_next() + index_fetch().
+ *
+ * The result is the next heap tuple satisfying the scan keys and the
+ * snapshot, or NULL if no more matching tuples exist.  On success,
+ * the buffer containing the heap tuple is pinned (the pin will be dropped
+ * at the next index_getnext or index_endscan).
+ *
+ * Note: caller must check scan->xs_recheck, and perform rechecking of the
+ * scan keys if required.  We do not do that here because we don't have
+ * enough information to do it efficiently in the general case.
+ * ----------------
+ */
+HeapTuple
+index_getnext(IndexScanDesc scan, ScanDirection direction)
+{
+    for(;;)
+    {
+        HeapTuple tup;
+
+        if (!index_next(scan, direction))
+            return NULL;
+
+        tup = index_fetch(scan);
+        if (tup != NULL)
+            return tup;
+    }
 }
 
 /* ----------------
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index a6ba2ec..6fb810e 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -772,6 +772,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex)
     TransactionId OldestXmin;
     TransactionId FreezeXid;
     RewriteState rwstate;
+    ItemPointerData tid;
 
     /*
     * Open the relations we need.
@@ -829,19 +830,56 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex)
     scan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, (ScanKey) NULL);
 
-    while ((tuple = index_getnext(scan, ForwardScanDirection)) != NULL)
+    tuple = NULL;
+    for(;;)
     {
         HeapTuple   copiedTuple;
         bool        isdead;
         int         i;
+        bool        found;
 
         CHECK_FOR_INTERRUPTS();
 
-        /* Since we used no scan keys, should never need to recheck */
-        if (scan->xs_recheck)
-            elog(ERROR, "CLUSTER does not support lossy index conditions");
+        /* Advance to next index tuple if we're done with current HOT chain */
+        if (tuple == NULL)
+        {
+            if (!index_next(scan, ForwardScanDirection))
+                break;
+
+            /* Since we used no scan keys, should never need to recheck */
+            if (scan->xs_recheck)
+                elog(ERROR, "CLUSTER does not support lossy index conditions");
+
+            tid = scan->xs_ctup.t_self;
 
-        LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+            scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf, OldHeap,
+                                                 ItemPointerGetBlockNumber(&tid));
+            /* Obtain share-lock on the buffer so we can examine visibility */
+            LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+            found = heap_hot_search_buffer(&tid, scan->xs_cbuf,
+                                           scan->xs_snapshot, &scan->xs_ctup,
+                                           &scan->kill_prior_tuple, true);
+            if (!found)
+            {
+                LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
+                break;
+            }
+        }
+        else
+        {
+            /* Fetch next tuple from current HOT chain */
+            LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
+            found = heap_hot_search_buffer(&tid, scan->xs_cbuf,
+                                           scan->xs_snapshot, &scan->xs_ctup,
+                                           &scan->kill_prior_tuple, false);
+            if (!found)
+            {
+                LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
+                tuple = NULL;
+                continue;
+            }
+        }
+        tuple = &scan->xs_ctup;
 
         switch (HeapTupleSatisfiesVacuum(tuple->t_data, OldestXmin,
                                          scan->xs_cbuf))
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index eba7063..e511abd 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -382,9 +382,11 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
         {
             OffsetNumber offnum = tbmres->offsets[curslot];
             ItemPointerData tid;
+            HeapTupleData heapTuple;
 
             ItemPointerSet(&tid, page, offnum);
-            if (heap_hot_search_buffer(&tid, buffer, snapshot, NULL))
+            if (heap_hot_search_buffer(&tid, buffer, snapshot, &heapTuple,
+                                       NULL, true))
                 scan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
         }
     }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index ae23a19..cfe2b60 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -145,6 +145,9 @@ extern void index_endscan(IndexScanDesc scan);
 extern void index_markpos(IndexScanDesc scan);
 extern void index_restrpos(IndexScanDesc scan);
 extern HeapTuple index_getnext(IndexScanDesc scan, ScanDirection direction);
+extern bool index_next(IndexScanDesc scan, ScanDirection direction);
+extern HeapTuple index_fetch(IndexScanDesc scan);
+
 extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);
 
 extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f8395fe..a5781db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -83,7 +83,8 @@ extern bool heap_fetch(Relation relation, Snapshot snapshot,
            HeapTuple tuple, Buffer *userbuf, bool keep_buf,
            Relation stats_relation);
 extern bool heap_hot_search_buffer(ItemPointer tid, Buffer buffer,
-                       Snapshot snapshot, bool *all_dead);
+                       Snapshot snapshot, HeapTuple tuple,
+                       bool *all_dead, bool firstCall);
 extern bool heap_hot_search(ItemPointer tid, Relation relation,
                 Snapshot snapshot, bool *all_dead);
 
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 47b95c2..3bac6d3 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -72,16 +72,11 @@ typedef struct IndexScanDescData
     /* index access method's private state */
     void       *opaque;         /* access-method-specific info */
 
-    /* xs_ctup/xs_cbuf/xs_recheck are valid after a successful index_getnext */
+    /* xs_ctup/xs_cbuf/xs_recheck are valid after a successful index_next */
     HeapTupleData xs_ctup;      /* current heap tuple, if any */
     Buffer      xs_cbuf;        /* current heap buffer in scan, if any */
     /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
     bool        xs_recheck;     /* T means scan keys must be rechecked */
-
-    /* state data for traversing HOT chains in index_getnext */
-    bool        xs_hot_dead;    /* T if all members of HOT chain are dead */
-    OffsetNumber xs_next_hot;   /* next member of HOT chain, if any */
-    TransactionId xs_prev_xmax; /* previous HOT chain member's XMAX, if any */
 } IndexScanDescData;
 
 /* Struct for heap-or-index scans of system tables */

diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 4398e8c..6e7807e 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -93,12 +93,14 @@ RelationGetIndexScan(Relation indexRelation,
     scan->kill_prior_tuple = false;
     scan->ignore_killed_tuples = true;  /* default setting */
+    scan->needIndexTuple = false;
     scan->opaque = NULL;
 
     ItemPointerSetInvalid(&scan->xs_ctup.t_self);
     scan->xs_ctup.t_data = NULL;
     scan->xs_cbuf = InvalidBuffer;
+    scan->xs_itup = NULL;
 
     /*
     * Let the AM fill in the key and any opaque data it wants.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 616edae..3d6d404 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -398,7 +398,9 @@ index_restrpos(IndexScanDesc scan)
  *
  * Advances to the next index tuple satisfying the scan keys, returning TRUE
  * if there was one, FALSE otherwise. The heap TID pointer of the next match
- * is stored in scan->xs_ctup.self.
+ * is stored in scan->xs_ctup.self. If scan->needIndexTuple is TRUE, the
+ * index tuple itself is stored in scan->xs_itup, if the indexam succeeded in
+ * fetching it.
  *
  * Note: caller must check scan->xs_recheck, and perform rechecking of the
  * scan keys if required.  We do not do that here because we don't have
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 79d4c66..decc57b 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -281,6 +281,9 @@ btgettuple(PG_FUNCTION_ARGS)
     else
         res = _bt_first(scan, dir);
 
+    if (scan->needIndexTuple && res)
+        scan->xs_itup = (IndexTuple) &so->currPos.buffer[so->currPos.items[so->currPos.itemIndex].bufferOffset];
+
     PG_RETURN_BOOL(res);
 }
 
@@ -368,6 +371,8 @@ btrescan(PG_FUNCTION_ARGS)
         so->keyData = NULL;
         so->killedItems = NULL; /* until needed */
         so->numKilled = 0;
+        so->currPos.buffer = so->markPos.buffer = NULL;
+        so->currPos.bufferUsed = so->markPos.bufferUsed = 0;
         scan->opaque = so;
     }
 
@@ -379,12 +384,14 @@ btrescan(PG_FUNCTION_ARGS)
         _bt_killitems(scan, false);
         ReleaseBuffer(so->currPos.buf);
         so->currPos.buf = InvalidBuffer;
+        so->currPos.bufferUsed = 0;
     }
 
     if (BTScanPosIsValid(so->markPos))
     {
         ReleaseBuffer(so->markPos.buf);
         so->markPos.buf = InvalidBuffer;
+        so->markPos.bufferUsed = 0;
     }
     so->markItemIndex = -1;
 
@@ -418,12 +425,24 @@ btendscan(PG_FUNCTION_ARGS)
         _bt_killitems(scan, false);
         ReleaseBuffer(so->currPos.buf);
         so->currPos.buf = InvalidBuffer;
+
+        if (so->currPos.buffer != NULL)
+        {
+            pfree(so->currPos.buffer);
+            so->currPos.buffer = NULL;
+        }
     }
 
     if (BTScanPosIsValid(so->markPos))
     {
         ReleaseBuffer(so->markPos.buf);
         so->markPos.buf = InvalidBuffer;
+
+        if (so->markPos.buffer != NULL)
+        {
+            pfree(so->markPos.buffer);
+            so->markPos.buffer = NULL;
+        }
     }
     so->markItemIndex = -1;
 
@@ -450,6 +469,7 @@ btmarkpos(PG_FUNCTION_ARGS)
     {
         ReleaseBuffer(so->markPos.buf);
         so->markPos.buf = InvalidBuffer;
+        so->markPos.bufferUsed = 0;
     }
 
     /*
@@ -494,6 +514,7 @@ btrestrpos(PG_FUNCTION_ARGS)
         _bt_killitems(scan, false);
         ReleaseBuffer(so->currPos.buf);
         so->currPos.buf = InvalidBuffer;
+        so->currPos.bufferUsed = 0;
     }
 
     if (BTScanPosIsValid(so->markPos))
@@ -503,6 +524,11 @@ btrestrpos(PG_FUNCTION_ARGS)
         memcpy(&so->currPos, &so->markPos,
                offsetof(BTScanPosData, items[1]) +
                so->markPos.lastItem * sizeof(BTScanPosItem));
+
+        if (so->markPos.bufferUsed > 0)
+            memcpy(so->currPos.buffer, so->markPos.buffer,
+                   so->markPos.bufferUsed);
+        so->currPos.bufferUsed = so->markPos.bufferUsed;
     }
 }
 
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 12b4528..cde3a39 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -1004,6 +1004,10 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
     */
     so->currPos.nextPage = opaque->btpo_next;
 
+    if (scan->needIndexTuple && so->currPos.buffer == NULL)
+        so->currPos.buffer = palloc(BLCKSZ);
+    so->currPos.bufferUsed = 0;
+
     if (ScanDirectionIsForward(dir))
     {
         /* load items[] in ascending order */
@@ -1019,6 +1023,17 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
             /* _bt_checkkeys put the heap ptr into scan->xs_ctup.t_self */
             so->currPos.items[itemIndex].heapTid = scan->xs_ctup.t_self;
             so->currPos.items[itemIndex].indexOffset = offnum;
+            if (scan->needIndexTuple)
+            {
+                Item itup = PageGetItem(page, PageGetItemId(page, offnum));
+                Size size = IndexTupleSize(itup);
+
+                so->currPos.items[itemIndex].bufferOffset = so->currPos.bufferUsed;
+                memcpy(&so->currPos.buffer[so->currPos.bufferUsed],
+                       itup, size);
+                so->currPos.bufferUsed += size;
+
+            }
             itemIndex++;
         }
         if (!continuescan)
@@ -1052,6 +1067,18 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
             itemIndex--;
             so->currPos.items[itemIndex].heapTid = scan->xs_ctup.t_self;
             so->currPos.items[itemIndex].indexOffset = offnum;
+
+            if (scan->needIndexTuple)
+            {
+                Item itup = PageGetItem(page, PageGetItemId(page, offnum));
+                Size size = IndexTupleSize(itup);
+
+                so->currPos.items[itemIndex].bufferOffset = so->currPos.bufferUsed;
+                memcpy(&so->currPos.buffer[so->currPos.bufferUsed],
+                       itup, size);
+                so->currPos.bufferUsed += size;
+
+            }
         }
         if (!continuescan)
         {
@@ -1111,6 +1138,16 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
                offsetof(BTScanPosData, items[1]) +
                so->currPos.lastItem * sizeof(BTScanPosItem));
         so->markPos.itemIndex = so->markItemIndex;
+
+        if (so->currPos.bufferUsed > 0)
+        {
+            if (so->markPos.buffer == NULL)
+                so->markPos.buffer = palloc(BLCKSZ);
+            memcpy(so->markPos.buffer, so->currPos.buffer,
+                   so->currPos.bufferUsed);
+        }
+        so->markPos.bufferUsed = so->currPos.bufferUsed;
+
         so->markItemIndex = -1;
     }
 
@@ -1130,6 +1167,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
         /* release the previous buffer */
         _bt_relbuf(rel, so->currPos.buf);
         so->currPos.buf = InvalidBuffer;
+        so->currPos.bufferUsed = 0;
 
         /* if we're at end of scan, give up */
         if (blkno == P_NONE || !so->currPos.moreRight)
             return false;
@@ -1165,6 +1203,8 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
     */
     for (;;)
     {
+        so->currPos.bufferUsed = 0;
+
         /* Done if we know there are no matching keys to the left */
         if (!so->currPos.moreLeft)
         {
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c4fa17c..c884ddf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -626,6 +626,9 @@ ExplainNode(Plan *plan, PlanState *planstate,
         case T_BitmapHeapScan:
             pname = sname = "Bitmap Heap Scan";
             break;
+        case T_HeapFetch:
+            pname = sname = "Heap Fetch";
+            break;
         case T_TidScan:
             pname = sname = "Tid Scan";
             break;
@@ -782,6 +785,7 @@ ExplainNode(Plan *plan, PlanState *planstate,
             /* FALL THRU */
         case T_SeqScan:
         case T_BitmapHeapScan:
+        case T_HeapFetch:
         case T_TidScan:
         case T_SubqueryScan:
         case T_FunctionScan:
@@ -967,6 +971,7 @@ ExplainNode(Plan *plan, PlanState *planstate,
         case T_CteScan:
         case T_WorkTableScan:
         case T_SubqueryScan:
+        case T_HeapFetch:
             show_scan_qual(plan->qual, "Filter", plan, outer_plan, es);
             break;
         case T_TidScan:
@@ -1043,8 +1048,9 @@ ExplainNode(Plan *plan, PlanState *planstate,
         * nodes, but in bitmap scan trees we must, since the bottom
         * BitmapIndexScan nodes may have outer references.
         */
+        /* same with heap fetches */
         ExplainNode(outerPlan(plan), outerPlanState(planstate),
-                    IsA(plan, BitmapHeapScan) ? outer_plan : NULL,
+                    (IsA(plan, BitmapHeapScan) || IsA(plan, HeapFetch)) ? outer_plan : NULL,
                     "Outer", NULL, es);
     }
 
@@ -1329,6 +1335,7 @@ ExplainScanTarget(Scan *plan, ExplainState *es)
         case T_SeqScan:
         case T_IndexScan:
         case T_BitmapHeapScan:
+        case T_HeapFetch:
         case T_TidScan:
             /* Assert it's on a real relation */
             Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 4d70f98..9adcb53 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -22,6 +22,6 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
        nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
        nodeLimit.o nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
-       nodeWindowAgg.o tstoreReceiver.o spi.o
+       nodeWindowAgg.o nodeHeapfetch.o tstoreReceiver.o spi.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 25f350f..5d935f3 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -26,6 +26,7 @@
 #include "executor/nodeGroup.h"
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
+#include "executor/nodeHeapfetch.h"
 #include "executor/nodeIndexscan.h"
 #include "executor/nodeLimit.h"
 #include "executor/nodeMaterial.h"
@@ -159,6 +160,10 @@ ExecReScan(PlanState *node, ExprContext *exprCtxt)
             ExecBitmapHeapReScan((BitmapHeapScanState *) node, exprCtxt);
             break;
 
+        case T_HeapFetchState:
+            ExecHeapFetchReScan((HeapFetchState *) node, exprCtxt);
+            break;
+
         case T_TidScanState:
             ExecTidReScan((TidScanState *) node, exprCtxt);
             break;
@@ -376,6 +381,13 @@ ExecSupportsMarkRestore(NodeTag plantype)
             */
             return false;
 
+        case T_HeapFetch:
+            /*
+             * T_HeapFetch has the same problem as T_Result above. But for
+             * HeapFetch it would actually be useful
+             */
+            return false;
+
         default:
             break;
     }
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 15af711..e9edb9d 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -90,6 +90,7 @@
 #include "executor/nodeGroup.h"
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
+#include "executor/nodeHeapfetch.h"
 #include "executor/nodeIndexscan.h"
 #include "executor/nodeLimit.h"
 #include "executor/nodeMaterial.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
                                                    estate, eflags);
             break;
 
+        case T_HeapFetch:
+            result = (PlanState *) ExecInitHeapFetch((HeapFetch *) node,
+                                                     estate, eflags);
+            break;
+
         case T_TidScan:
             result = (PlanState *) ExecInitTidScan((TidScan *) node,
                                                    estate, eflags);
@@ -373,6 +379,10 @@ ExecProcNode(PlanState *node)
             result = ExecBitmapHeapScan((BitmapHeapScanState *) node);
             break;
 
+        case T_HeapFetchState:
+            result = ExecHeapFetch((HeapFetchState *) node);
+            break;
+
         case T_TidScanState:
             result = ExecTidScan((TidScanState *) node);
             break;
@@ -566,6 +576,9 @@ ExecCountSlotsNode(Plan *node)
         case T_BitmapHeapScan:
             return ExecCountSlotsBitmapHeapScan((BitmapHeapScan *) node);
 
+        case T_HeapFetch:
+            return ExecCountSlotsHeapFetch((HeapFetch *) node);
+
         case T_TidScan:
             return ExecCountSlotsTidScan((TidScan *) node);
 
@@ -705,6 +718,10 @@ ExecEndNode(PlanState *node)
             ExecEndBitmapHeapScan((BitmapHeapScanState *) node);
             break;
 
+        case T_HeapFetchState:
+            ExecEndHeapFetch((HeapFetchState *) node);
+            break;
+
         case T_TidScanState:
             ExecEndTidScan((TidScanState *) node);
             break;
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 19fa4e6..ff937e3 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -211,7 +211,7 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
         if (!var || !IsA(var, Var))
             return false;       /* tlist item not a Var */
         /* if these Asserts fail, planner messed up */
-        Assert(var->varno == varno);
+        Assert(var->varno == varno || var->varno == OUTER);
         Assert(var->varlevelsup == 0);
         if (var->varattno != attrno)
             return false;       /* out of order */
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index c983066..b3c3f37 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -92,6 +92,7 @@
 #include "postgres.h"
 
 #include "funcapi.h"
+#include "access/itup.h"
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
@@ -578,6 +579,51 @@ ExecStoreVirtualTuple(TupleTableSlot *slot)
     return slot;
 }
 
+TupleTableSlot *
+ExecStoreIndexTuple(IndexTuple tuple, TupleTableSlot *slot, bool shouldFree)
+{
+    int         natts;
+    int         i;
+
+    /*
+     * sanity checks
+     */
+    Assert(tuple != NULL);
+    Assert(slot != NULL);
+    Assert(slot->tts_tupleDescriptor != NULL);
+
+    /*
+     * Free any old physical tuple belonging to the slot.
+     */
+    if (slot->tts_shouldFree)
+        heap_freetuple(slot->tts_tuple);
+    if (slot->tts_shouldFreeMin)
+        heap_free_minimal_tuple(slot->tts_mintuple);
+
+    /*
+     * Store the new tuple into the specified slot.
+     */
+    slot->tts_isempty = false;
+    slot->tts_shouldFree = shouldFree;
+    slot->tts_shouldFreeMin = false;
+    slot->tts_tuple = NULL;
+    slot->tts_mintuple = NULL;
+
+    natts = slot->tts_tupleDescriptor->natts;
+    for (i = 0; i < natts - 1; i++)
+    {
+        slot->tts_values[i] = index_getattr(tuple, i + 1,
+                                            slot->tts_tupleDescriptor,
+                                            &slot->tts_isnull[i]);
+    }
+    slot->tts_values[i] = PointerGetDatum(&tuple->t_tid);
+    slot->tts_isnull[i] = false;
+
+    slot->tts_nvalid = natts;
+
+    return slot;
+}
+
 /* --------------------------------
  *      ExecStoreAllNullTuple
  *          Set up the slot to contain a null in every column.
diff --git a/src/backend/executor/nodeHeapfetch.c b/src/backend/executor/nodeHeapfetch.c
new file mode 100644
index 0000000..bd08ac3
--- /dev/null
+++ b/src/backend/executor/nodeHeapfetch.c
@@ -0,0 +1,384 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeHeapfetch.c
+ *    Routines to fetch additional columns from heap and check visibility
+ *
+ * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *    $PostgreSQL$
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ *
+ *      ExecHeapFetch           scans a relation using tids
+ *      ExecInitHeapFetch       creates and initializes state info.
+ *      ExecHeapFetchReScan     rescans the tid relation.
+ *      ExecEndHeapFetch        releases all storage.
+ *      ExecHeapFetchMarkPos    marks scan position.
+ *      ExecHeapFetchRestrPos   restores scan position.
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "executor/execdebug.h"
+#include "executor/nodeHeapfetch.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/array.h"
+#include "utils/memutils.h"
+
+static TupleTableSlot *HeapFetchNext(HeapFetchState *node);
+
+#define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X))
+#define ItemPointerGetDatum(X) PointerGetDatum(X)
+
+/* ----------------------------------------------------------------
+ *      HeapFetchNext
+ *
+ *      Retrieve a tuple from the HeapFetch node's currentRelation
+ *      using the tid returned from the outer node.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+HeapFetchNext(HeapFetchState *node)
+{
+    EState     *estate;
+    Snapshot    snapshot;
+    Relation    heapRelation;
+    TupleTableSlot *slot;
+    Index       scanrelid;
+    Buffer      buffer = InvalidBuffer;
+    HeapFetch  *plan = (HeapFetch *) node->ss.ps.plan;
+    ExprContext *econtext;
+
+    /*
+     * extract necessary information from tid scan node
+     */
+    estate = node->ss.ps.state;
+    snapshot = estate->es_snapshot;
+    heapRelation = node->ss.ss_currentRelation;
+    slot = node->ss.ss_ScanTupleSlot;
+    scanrelid = ((HeapFetch *) node->ss.ps.plan)->scan.scanrelid;
+    econtext = node->ss.ps.ps_ExprContext;
+
+    /*
+     * Check if we are evaluating PlanQual for tuple of this relation.
+     * Additional checking is not good, but no other way for now. We could
+     * introduce new nodes for this case and handle HeapFetch --> NewNode
+     * switching in Init/ReScan plan...
+     */
+    if (estate->es_evTuple != NULL &&
+        estate->es_evTuple[scanrelid - 1] != NULL)
+    {
+        if (estate->es_evTupleNull[scanrelid - 1])
+            return ExecClearTuple(slot);
+
+        ExecStoreTuple(estate->es_evTuple[scanrelid - 1],
+                       slot, InvalidBuffer, false);
+
+        /* Flag for the next call that no more tuples */
+        estate->es_evTupleNull[scanrelid - 1] = true;
+        return slot;
+    }
+
+    for(;;)
+    {
+        PlanState  *outerNode;
+        TupleTableSlot *outerslot;
+        bool        isnull;
+        ItemPointer tid;
+        HeapTuple   tuple = &(node->hfs_htup);
+        bool        visible;
+        bool        recheck;
+
+        outerNode = outerPlanState(node);
+        outerslot = ExecProcNode(outerNode);
+
+        if (TupIsNull(outerslot))
+        {
+            if (BufferIsValid(buffer))
+                ReleaseBuffer(buffer);
+            return NULL;
+        }
+
+        /*
+         * place the current tuple into the expr context
+         */
+        econtext->ecxt_outertuple = outerslot;
+
+        /* fetch tid from outer plan */
+        tid = DatumGetItemPointer(slot_getattr(outerslot, plan->tidIdx, &isnull));
+        Assert(!isnull);
+
+        tuple->t_self = *tid;
+
+        /*
+         * XXX: As a quick hack, the IndexScan node sets the high bit in posid
+         * if the original index qual needs to be rechecked
+         */
+        if (tuple->t_self.ip_posid & (1<<15))
+        {
+            tuple->t_self.ip_posid &= ~(1<<15);
+            recheck = true;
+        }
+        else
+            recheck = false;
+
+        buffer = ReleaseAndReadBuffer(buffer, heapRelation,
+                                      ItemPointerGetBlockNumber(tid));
+        LockBuffer(buffer, BUFFER_LOCK_SHARE);
+        visible = heap_hot_search_buffer(&tuple->t_self, buffer, snapshot,
+                                         tuple, NULL, true);
+        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+        if (visible)
+        {
+            /*
+             * store the scanned tuple in the scan tuple slot of the scan
+             * state. Eventually we will only do this and not return a tuple.
+             * Note: we pass 'false' because tuples returned by amgetnext are
+             * pointers onto disk pages and were not created with palloc() and
+             * so should not be pfree()'d.
+             */
+            ExecStoreTuple(tuple,   /* tuple to store */
+                           slot,    /* slot to store in */
+                           buffer,  /* buffer associated with tuple */
+                           false);  /* don't pfree */
+
+            if (recheck)
+            {
+                econtext->ecxt_scantuple = slot;
+                ResetExprContext(econtext);
+                if (!ExecQual(node->hfs_indexqualorig, econtext, false))
+                {
+                    continue;   /* nope, so ask index for another one */
+                }
+            }
+
+            /*
+             * At this point we have an extra pin on the buffer, because
+             * ExecStoreTuple incremented the pin count. Drop our local pin.
+             */
+            ReleaseBuffer(buffer);
+
+            return slot;
+        }
+    }
+
+    /*
+     * if we get here it means the tid scan failed so we are at the end of the
+     * scan..
+     */
+    return ExecClearTuple(slot);
+}
+
+/* ----------------------------------------------------------------
+ *      ExecHeapFetchScan(node)
+ *
+ *      Retrieves the next tid from the outer node, and fetches the
+ *      corresponding heap tuple.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecHeapFetch(HeapFetchState *node)
+{
+    /*
+     * use HeapFetchNext as access method
+     */
+    return ExecScan(&node->ss, (ExecScanAccessMtd) HeapFetchNext);
+}
+
+/* ----------------------------------------------------------------
+ *      ExecHeapFetchReScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecHeapFetchReScan(HeapFetchState *node, ExprContext *exprCtxt)
+{
+    EState     *estate;
+    Index       scanrelid;
+
+    estate = node->ss.ps.state;
+    scanrelid = ((HeapFetch *) node->ss.ps.plan)->scan.scanrelid;
+
+    node->ss.ps.ps_TupFromTlist = false;
+
+    /* If we are being passed an outer tuple, save it for runtime key calc */
+    if (exprCtxt != NULL)
+        node->ss.ps.ps_ExprContext->ecxt_outertuple =
+            exprCtxt->ecxt_outertuple;
+
+    /* If this is re-scanning of PlanQual ... */
+    if (estate->es_evTuple != NULL &&
+        estate->es_evTuple[scanrelid - 1] != NULL)
+    {
+        estate->es_evTupleNull[scanrelid - 1] = false;
+        return;
+    }
+
+    /*
+     * if chgParam of subnodes is not null then plans will be re-scanned by
+     * first ExecProcNode.
+     */
+    if (((PlanState *) node)->lefttree->chgParam == NULL)
+        ExecReScan(((PlanState *) node)->lefttree, exprCtxt);
+}
+
+/* ----------------------------------------------------------------
+ *      ExecEndHeapFetch
+ *
+ *      Releases any storage allocated through C routines.
+ *      Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndHeapFetch(HeapFetchState *node)
+{
+    /*
+     * Free the exprcontext
+     */
+    ExecFreeExprContext(&node->ss.ps);
+
+    /*
+     * clear out tuple table slots
+     */
+    ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+    ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+    /*
+     * close the heap relation.
+     */
+    ExecCloseScanRelation(node->ss.ss_currentRelation);
+
+    /*
+     * shut down the subplan
+     */
+    ExecEndNode(outerPlanState(node));
+}
+
+/* ----------------------------------------------------------------
+ *      ExecHeapFetchMarkPos
+ *
+ *      XXX This is dead code at the moment, for the lack of support
+ *      in ExecSupportsMarkRestore.
+ * ----------------------------------------------------------------
+ */
+void
+ExecHeapFetchMarkPos(HeapFetchState *node)
+{
+    ExecMarkPos(outerPlanState(node));
+}
+
+/* ----------------------------------------------------------------
+ *      ExecHeapFetchRestrPos
+ *
+ *      XXX This is dead code at the moment, for the lack of support
+ *      in ExecSupportsMarkRestore.
+ * ----------------------------------------------------------------
+ */
+void
+ExecHeapFetchRestrPos(HeapFetchState *node)
+{
+    ExecRestrPos(outerPlanState(node));
+}
+
+/* ----------------------------------------------------------------
+ *      ExecInitHeapFetch
+ *
+ *      Initializes the tid scan's state information, creates
+ *      scan keys, and opens the base and tid relations.
+ *
+ *      Parameters:
+ *        node: HeapFetch node produced by the planner.
+ *        estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+HeapFetchState *
+ExecInitHeapFetch(HeapFetch *node, EState *estate, int eflags)
+{
+    HeapFetchState *hfstate;
+    Relation    currentRelation;
+    Plan       *outerPlan;
+
+    /*
+     * create state structure
+     */
+    hfstate = makeNode(HeapFetchState);
+    hfstate->ss.ps.plan = (Plan *) node;
+    hfstate->ss.ps.state = estate;
+
+    /*
+     * Miscellaneous initialization
+     *
+     * create expression context for node
+     */
+    ExecAssignExprContext(estate, &hfstate->ss.ps);
+
+    hfstate->ss.ps.ps_TupFromTlist = false;
+
+    /*
+     * Initialize child expressions. The indexqualorig expression is always
+     * initialized even though it will only be used in some uncommon cases
+     * --- would be nice to improve that.
+     */
+    hfstate->ss.ps.targetlist = (List *)
+        ExecInitExpr((Expr *) node->scan.plan.targetlist,
+                     (PlanState *) hfstate);
+    hfstate->ss.ps.qual = (List *)
+        ExecInitExpr((Expr *) node->scan.plan.qual,
+                     (PlanState *) hfstate);
+    hfstate->hfs_indexqualorig = (List *)
+        ExecInitExpr((Expr *) node->indexqualorig,
+                     (PlanState *) hfstate);
+
+#define HEAPFETCH_NSLOTS 2
+
+    /*
+     * tuple table initialization
+     */
+    ExecInitResultTupleSlot(estate, &hfstate->ss.ps);
+    ExecInitScanTupleSlot(estate, &hfstate->ss);
+
+    /*
+     * open the base relation and acquire appropriate lock on it.
+     */
+    currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid);
+
+    hfstate->ss.ss_currentRelation = currentRelation;
+    hfstate->ss.ss_currentScanDesc = NULL;      /* no heap scan here */
+
+    /*
+     * get the scan type from the relation descriptor.
+     */
+    ExecAssignScanType(&hfstate->ss, RelationGetDescr(currentRelation));
+
+    /*
+     * initialize child nodes
+     */
+    outerPlan = outerPlan(node);
+    outerPlanState(hfstate) = ExecInitNode(outerPlan, estate, eflags);
+
+    /*
+     * Initialize result tuple type and projection info.
+     */
+    ExecAssignResultTypeFromTL(&hfstate->ss.ps);
+    ExecAssignScanProjectionInfo(&hfstate->ss);
+
+    /*
+     * all done.
+     */
+    return hfstate;
+}
+
+int
+ExecCountSlotsHeapFetch(HeapFetch *node)
+{
+    return ExecCountSlotsNode(outerPlan((Plan *) node)) +
+        ExecCountSlotsNode(innerPlan((Plan *) node)) + HEAPFETCH_NSLOTS;
+}
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d8ec8b3..2035302 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -27,6 +27,7 @@
 #include "access/genam.h"
 #include "access/nbtree.h"
 #include "access/relscan.h"
+#include "catalog/pg_type.h"
 #include "executor/execdebug.h"
 #include "executor/nodeIndexscan.h"
 #include "optimizer/clauses.h"
@@ -53,8 +54,8 @@ IndexNext(IndexScanState *node)
     ScanDirection direction;
     IndexScanDesc scandesc;
     Index       scanrelid;
-    HeapTuple   tuple;
     TupleTableSlot *slot;
+    IndexScan  *plan = (IndexScan *) node->ss.ps.plan;
 
     /*
     * extract necessary information from index scan node
@@ -62,7 +63,7 @@ IndexNext(IndexScanState *node)
     estate = node->ss.ps.state;
     direction = estate->es_direction;
     /* flip direction if this is an overall backward scan */
-    if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+    if (ScanDirectionIsBackward(plan->indexorderdir))
     {
         if (ScanDirectionIsForward(direction))
             direction = BackwardScanDirection;
@@ -72,7 +73,7 @@ IndexNext(IndexScanState *node)
     scandesc = node->iss_ScanDesc;
     econtext = node->ss.ps.ps_ExprContext;
     slot = node->ss.ss_ScanTupleSlot;
-    scanrelid = ((IndexScan *) node->ss.ps.plan)->scan.scanrelid;
+    scanrelid = plan->scan.scanrelid;
 
     /*
     * Check if we are evaluating PlanQual for tuple of this relation.
@@ -94,9 +95,6 @@ IndexNext(IndexScanState *node)
 
         ResetExprContext(econtext);
 
-        if (!ExecQual(node->indexqualorig, econtext, false))
-            ExecClearTuple(slot);       /* would not be returned by scan */
-
         /* Flag for the next call that no more tuples */
         estate->es_evTupleNull[scanrelid - 1] = true;
 
@@ -106,28 +104,39 @@ IndexNext(IndexScanState *node)
     /*
     * ok, now that we have what we need, fetch the next tuple.
     */
-    while ((tuple = index_getnext(scandesc, direction)) != NULL)
+    while (index_next(scandesc, direction))
     {
         /*
-         * Store the scanned tuple in the scan tuple slot of the scan state.
-         * Note: we pass 'false' because tuples returned by amgetnext are
-         * pointers onto disk pages and must not be pfree()'d.
+         * Before we fetch the heap tuple, see if we can evaluate quals
+         * using the index tuple.
         */
-        ExecStoreTuple(tuple,   /* tuple to store */
-                       slot,    /* slot to store in */
-                       scandesc->xs_cbuf,       /* buffer containing tuple */
-                       false);  /* don't pfree */
+        if (scandesc->xs_itup)
+        {
+            ExecStoreIndexTuple(scandesc->xs_itup, slot, false);
+        }
+        else
+        {
+            Assert(!scandesc->needIndexTuple);
+            Assert(slot->tts_tupleDescriptor->natts == 1);
+
+            ExecClearTuple(slot);
+            slot->tts_values[0] = PointerGetDatum(&scandesc->xs_ctup.t_self);
+            slot->tts_isnull[0] = false;
+            ExecStoreVirtualTuple(slot);
+        }
 
         /*
         * If the index was lossy, we have to recheck the index quals using
-        * the real tuple.
+        * the real tuple in the HeapFetch node above.
+        *
+        * XXX: As a quick hack to signal to the HeapFetch node that this
+        * tuple needs to be rechecked, set the high bit in the posid field of
+        * the tid.
        */
         if (scandesc->xs_recheck)
         {
-            econtext->ecxt_scantuple = slot;
-            ResetExprContext(econtext);
-            if (!ExecQual(node->indexqualorig, econtext, false))
-                continue;       /* nope, so ask index for another one */
+            scandesc->xs_ctup.t_self.ip_posid |= (1<<15);
+            slot->tts_values[0] = PointerGetDatum(&scandesc->xs_ctup.t_self);
         }
 
         return slot;
@@ -501,6 +510,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
     IndexScanState *indexstate;
     Relation    currentRelation;
     bool        relistarget;
+    bool        needIndexTuple;
 
     /*
     * create state structure
@@ -522,19 +532,20 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
     * initialize child expressions
     *
     * Note: we don't initialize all of the indexqual expression, only the
-    * sub-parts corresponding to runtime keys (see below). The indexqualorig
-    * expression is always initialized even though it will only be used in
-    * some uncommon cases --- would be nice to improve that. (Problem is
+    * sub-parts corresponding to runtime keys (see below). (Problem is
     * that any SubPlans present in the expression must be found now...)
+    *
+    * The targetlist and qual expressions in the plan node refer to columns
+    * in the heap, for the convenience of EXPLAIN and consistency with all
+    * other node types. But we return an index tuple, so we use versions of
+    * those expressions that have been transformed to refer index columns
+    * instead.
    */
     indexstate->ss.ps.targetlist = (List *)
-        ExecInitExpr((Expr *) node->scan.plan.targetlist,
+        ExecInitExpr((Expr *) node->indexonlytargetlist,
                      (PlanState *) indexstate);
     indexstate->ss.ps.qual = (List *)
-        ExecInitExpr((Expr *) node->scan.plan.qual,
-                     (PlanState *) indexstate);
-    indexstate->indexqualorig = (List *)
-        ExecInitExpr((Expr *) node->indexqualorig,
+        ExecInitExpr((Expr *) node->indexonlyqual,
                      (PlanState *) indexstate);
 
 #define INDEXSCAN_NSLOTS 2
@@ -554,15 +565,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
     indexstate->ss.ss_currentScanDesc = NULL;   /* no heap scan here */
 
     /*
-    * get the scan type from the relation descriptor.
-    */
-    ExecAssignScanType(&indexstate->ss, RelationGetDescr(currentRelation));
-
-    /*
    * Initialize result tuple type and projection info.
    */
     ExecAssignResultTypeFromTL(&indexstate->ss.ps);
-    ExecAssignScanProjectionInfo(&indexstate->ss);
 
     /*
     * If we are just doing EXPLAIN (ie, aren't going to run the plan), stop
@@ -584,6 +589,44 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
                              relistarget ? NoLock : AccessShareLock);
 
     /*
+    * Initialize slot to hold index tuple. The shape of the slot depends
+    * on whether we return the whole index tuple or just the tid.
+ */ + { + TupleDesc td; + int natts; + + if (node->indexonlyqual != NIL || + list_length(node->indexonlytargetlist) > 1) + { + TupleDesc itupdesc = RelationGetDescr(indexstate->iss_RelationDesc); + int i; + + natts = itupdesc->natts + 1; + + td = CreateTemplateTupleDesc(natts, false); + for(i = 0; i < itupdesc->natts; i++) + memcpy(td->attrs[i], itupdesc->attrs[i], + sizeof(FormData_pg_attribute)); + + needIndexTuple = true; + } + else + { + natts = 1; + td = CreateTemplateTupleDesc(natts, false); + + needIndexTuple = false; + } + + TupleDescInitEntry(td, natts, "tid", TIDOID, -1, 0); + + ExecAssignScanType(&indexstate->ss, td); + } + + ExecAssignScanProjectionInfo(&indexstate->ss); + + /* * Initialize index-specific scan state */ indexstate->iss_RuntimeKeysReady = false; @@ -629,6 +672,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags) estate->es_snapshot, indexstate->iss_NumScanKeys, indexstate->iss_ScanKeys); + indexstate->iss_ScanDesc->needIndexTuple = needIndexTuple; /* * all done. diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 25961d1..15c91be 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -313,6 +313,9 @@ _copyIndexScan(IndexScan *from) COPY_NODE_FIELD(indexqual); COPY_NODE_FIELD(indexqualorig); COPY_SCALAR_FIELD(indexorderdir); + COPY_NODE_FIELD(indexonlyqual); + COPY_NODE_FIELD(indexonlyqualorig); + COPY_NODE_FIELD(indexonlytargetlist); return newnode; } @@ -362,6 +365,24 @@ _copyBitmapHeapScan(BitmapHeapScan *from) } /* + * _copyHeapFetch + */ +static HeapFetch * +_copyHeapFetch(HeapFetch *from) +{ + HeapFetch *newnode = makeNode(HeapFetch); + + /* + * copy node superclass fields + */ + CopyScanFields((Scan *) from, (Scan *) newnode); + COPY_SCALAR_FIELD(tidIdx); + COPY_NODE_FIELD(indexqualorig); + + return newnode; +} + +/* * _copyTidScan */ static TidScan * @@ -879,6 +900,7 @@ _copyVar(Var *from) Var *newnode = makeNode(Var); COPY_SCALAR_FIELD(varno); + COPY_SCALAR_FIELD(varindexno); COPY_SCALAR_FIELD(varattno); COPY_SCALAR_FIELD(vartype); COPY_SCALAR_FIELD(vartypmod); @@ -3470,6 +3492,9 @@ copyObject(void *from) case T_BitmapHeapScan: retval = _copyBitmapHeapScan(from); break; + case T_HeapFetch: + retval = _copyHeapFetch(from); + break; case T_TidScan: retval = _copyTidScan(from); break; diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 142aa8f..00abb24 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -134,6 +134,7 @@ static bool _equalVar(Var *a, Var *b) { COMPARE_SCALAR_FIELD(varno); + COMPARE_SCALAR_FIELD(varindexno); COMPARE_SCALAR_FIELD(varattno); COMPARE_SCALAR_FIELD(vartype); COMPARE_SCALAR_FIELD(vartypmod); diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c index 9ed9018..33d03da 100644 --- a/src/backend/nodes/makefuncs.c +++ b/src/backend/nodes/makefuncs.c @@ -87,6 +87,8 @@ makeVar(Index varno, /* Likewise, we just set location to "unknown" here */ var->location = -1; + var->varindexno = -1; + return var; } diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index af801a4..d508329 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -399,6 +399,8 @@ _outIndexScan(StringInfo str, IndexScan *node) WRITE_NODE_FIELD(indexqual); WRITE_NODE_FIELD(indexqualorig); WRITE_ENUM_FIELD(indexorderdir, ScanDirection); + WRITE_NODE_FIELD(indexonlyqual); + WRITE_NODE_FIELD(indexonlyqualorig); } static void @@ -424,6 +426,17 @@ _outBitmapHeapScan(StringInfo str, 
BitmapHeapScan *node) } static void +_outHeapFetch(StringInfo str, HeapFetch *node) +{ + WRITE_NODE_TYPE("HEAPFETCH"); + + _outScanInfo(str, (Scan *) node); + + WRITE_INT_FIELD(tidIdx); + WRITE_NODE_FIELD(indexqualorig); +} + +static void _outTidScan(StringInfo str, TidScan *node) { WRITE_NODE_TYPE("TIDSCAN"); @@ -780,6 +793,7 @@ _outVar(StringInfo str, Var *node) WRITE_NODE_TYPE("VAR"); WRITE_UINT_FIELD(varno); + WRITE_INT_FIELD(varindexno); WRITE_INT_FIELD(varattno); WRITE_OID_FIELD(vartype); WRITE_INT_FIELD(vartypmod); @@ -2414,6 +2428,9 @@ _outNode(StringInfo str, void *obj) case T_BitmapHeapScan: _outBitmapHeapScan(str, obj); break; + case T_HeapFetch: + _outHeapFetch(str, obj); + break; case T_TidScan: _outTidScan(str, obj); break; diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index 4f7b740..7b5914b 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -398,6 +398,7 @@ _readVar(void) READ_LOCALS(Var); READ_UINT_FIELD(varno); + READ_INT_FIELD(varindexno); READ_INT_FIELD(varattno); READ_OID_FIELD(vartype); READ_INT_FIELD(vartypmod); diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index fcbb8ca..8b88871 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -192,6 +192,8 @@ cost_seqscan(Path *path, PlannerInfo *root, * * 'index' is the index to be used * 'indexQuals' is the list of applicable qual clauses (implicit AND semantics) + * 'indexOnlyQuals' is the list of qual clauses that are applied to tuples + * fetched from index, before fetching the heap tuple * 'outer_rel' is the outer relation when we are considering using the index * scan as the inside of a nestloop join (hence, some of the indexQuals * are join clauses, and we should expect repeated scans of the index); @@ -213,6 +215,7 @@ void cost_index(IndexPath *path, PlannerInfo *root, IndexOptInfo *index, List *indexQuals, + List *indexOnlyQuals, RelOptInfo *outer_rel) { RelOptInfo *baserel = index->rel; @@ -221,6 +224,7 @@ cost_index(IndexPath *path, PlannerInfo *root, Cost indexStartupCost; Cost indexTotalCost; Selectivity indexSelectivity; + Selectivity indexOnlyQualSelectivity; double indexCorrelation, csquared; Cost min_IO_cost, @@ -254,6 +258,9 @@ cost_index(IndexPath *path, PlannerInfo *root, PointerGetDatum(&indexSelectivity), PointerGetDatum(&indexCorrelation)); + indexOnlyQualSelectivity = + clauselist_selectivity(root, indexOnlyQuals, 0, JOIN_INNER, NULL); + /* * Save amcostestimate's results for possible use in bitmap scan planning. * We don't bother to save indexStartupCost or indexCorrelation, because a @@ -267,7 +274,8 @@ cost_index(IndexPath *path, PlannerInfo *root, run_cost += indexTotalCost - indexStartupCost; /* estimate number of main-table tuples fetched */ - tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples); + tuples_fetched = clamp_row_est( + indexOnlyQualSelectivity * indexSelectivity * baserel->tuples); /*---------- * Estimate number of main-table pages fetched, and compute I/O cost. 
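To make the cost change above concrete, a worked example with invented numbers: if baserel->tuples is 1,000,000, the regular indexquals have selectivity 0.01, and the index-only quals have selectivity 0.05, the estimate becomes

    tuples_fetched = clamp_row_est(0.05 * 0.01 * 1000000) = 500

expected heap fetches, versus 10,000 without the index-only quals. The index traversal cost returned by amcostestimate is unaffected; only the estimated number of heap tuples fetched shrinks.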
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 52caddd..87073f7 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -27,9 +27,11 @@ #include "optimizer/cost.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" +#include "optimizer/planmain.h" #include "optimizer/predtest.h" #include "optimizer/restrictinfo.h" #include "optimizer/var.h" +#include "parser/parsetree.h" #include "utils/builtins.h" #include "utils/bytea.h" #include "utils/lsyscache.h" @@ -47,12 +49,10 @@ /* Whether we are looking for plain indexscan, bitmap scan, or either */ -typedef enum -{ - ST_INDEXSCAN, /* must support amgettuple */ - ST_BITMAPSCAN, /* must support amgetbitmap */ - ST_ANYSCAN /* either is okay */ -} ScanTypeControl; +#define ST_INDEXSCAN 1 +#define ST_BITMAPSCAN 2 +#define ST_ANYSCAN (ST_INDEXSCAN | ST_BITMAPSCAN) +typedef int ScanTypeControl; /* Per-path data used within choose_bitmap_and() */ typedef struct @@ -64,10 +64,11 @@ typedef struct } PathClauseUsage; -static List *find_usable_indexes(PlannerInfo *root, RelOptInfo *rel, +static void find_usable_indexes(PlannerInfo *root, RelOptInfo *rel, List *clauses, List *outer_clauses, bool istoplevel, RelOptInfo *outer_rel, - SaOpControl saop_control, ScanTypeControl scantype); + SaOpControl saop_control, ScanTypeControl scantype, + List **indexpaths, List **bmpaths); static List *find_saop_paths(PlannerInfo *root, RelOptInfo *rel, List *clauses, List *outer_clauses, bool istoplevel, RelOptInfo *outer_rel); @@ -175,9 +176,10 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel) * Find all the index paths that are directly usable for this relation * (ie, are valid without considering OR or JOIN clauses). */ - indexpaths = find_usable_indexes(root, rel, - rel->baserestrictinfo, NIL, - true, NULL, SAOP_FORBID, ST_ANYSCAN); + find_usable_indexes(root, rel, + rel->baserestrictinfo, NIL, + true, NULL, SAOP_FORBID, ST_ANYSCAN, + &indexpaths, &bitindexpaths); /* * Submit all the ones that can form plain IndexScan plans to add_path. (A @@ -189,18 +191,11 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel) * selectivity; we should discard anything that was generated solely for * ordering purposes. */ - bitindexpaths = NIL; foreach(l, indexpaths) { - IndexPath *ipath = (IndexPath *) lfirst(l); - - if (ipath->indexinfo->amhasgettuple) - add_path(rel, (Path *) ipath); + Path *ipath = (Path *) lfirst(l); - if (ipath->indexinfo->amhasgetbitmap && - ipath->indexselectivity < 1.0 && - !ScanDirectionIsBackward(ipath->indexscandir)) - bitindexpaths = lappend(bitindexpaths, ipath); + add_path(rel, ipath); } /* @@ -271,26 +266,37 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel) * 'saop_control' indicates whether ScalarArrayOpExpr clauses can be used * 'scantype' indicates whether we need plain or bitmap scan support * + * Paths suitable for regular index scans and bitmap scans are returned in + * *indexpaths and *bmpaths, respectively. Same path can be returned in both + * lists. + * * Note: check_partial_indexes() must have been run previously. *---------- */ -static List * +static void find_usable_indexes(PlannerInfo *root, RelOptInfo *rel, List *clauses, List *outer_clauses, bool istoplevel, RelOptInfo *outer_rel, - SaOpControl saop_control, ScanTypeControl scantype) + SaOpControl saop_control, ScanTypeControl scantype, + List **indexpaths, List **bmpaths) { Relids outer_relids = outer_rel ? 
outer_rel->relids : NULL; bool possibly_useful_pathkeys = has_useful_pathkeys(root, rel); - List *result = NIL; List *all_clauses = NIL; /* not computed till needed */ ListCell *ilist; + if (indexpaths) + *indexpaths = NIL; + if (bmpaths) + *bmpaths = NIL; + foreach(ilist, rel->indexlist) { IndexOptInfo *index = (IndexOptInfo *) lfirst(ilist); IndexPath *ipath; List *restrictclauses; + List *indexonlyQuals; + List *heapQuals; List *index_pathkeys; List *useful_pathkeys; bool useful_predicate; @@ -398,29 +404,70 @@ find_usable_indexes(PlannerInfo *root, RelOptInfo *rel, useful_pathkeys = NIL; /* - * 3. Generate an indexscan path if there are relevant restriction - * clauses in the current clauses, OR the index ordering is - * potentially useful for later merging or final output ordering, OR - * the index has a predicate that was proven by the current clauses. + * 3. The index might be useful if data from it can be used to + * evaluate any restrictions not used as index keys. */ - if (found_clause || useful_pathkeys != NIL || useful_predicate) + indexonlyQuals = NIL; + heapQuals = NIL; + if ((scantype & ST_INDEXSCAN) && index->amhasgettuple) + extract_indexonly_quals(root, index, restrictclauses, outer_rel, + &heapQuals, &indexonlyQuals); + else + extract_indexonly_quals(root, index, restrictclauses, outer_rel, + &heapQuals, NULL); + + /* + * 4. Generate an indexscan path if + * + * - there are relevant restriction clauses in the current clauses, OR + * - the index ordering is potentially useful for later merging or + * final output ordering, OR + * - the index has a predicate that was proven by the current + * clauses, OR + * - data from the index can be used to evaluate restriction clauses + * before accessing the heap. + */ + if (found_clause || useful_pathkeys != NIL || useful_predicate || + indexonlyQuals != NIL) { ipath = create_index_path(root, index, restrictclauses, + indexonlyQuals, useful_pathkeys, index_is_ordered ? ForwardScanDirection : NoMovementScanDirection, outer_rel); - result = lappend(result, ipath); + + /* + * If the only reason we're considering this path is index-only + * quals, this path is only usable for a regular index scan. + */ + if (!found_clause && useful_pathkeys == NIL && !useful_predicate) + { + Path *p = (Path *) create_heapfetch_path(root, rel, (Path *)ipath, index, heapQuals, get_actual_clauses(ipath->indexquals)); + *indexpaths = lappend(*indexpaths, p); + } + else + { + if (index->amhasgettuple && indexpaths != NULL) + { + Path *p = (Path *) create_heapfetch_path(root, rel, (Path *)ipath, index, heapQuals, get_actual_clauses(ipath->indexquals)); + + *indexpaths = lappend(*indexpaths, p); + } + + if (index->amhasgetbitmap && bmpaths != NULL) + *bmpaths = lappend(*bmpaths, ipath); + } } /* - * 4. If the index is ordered, a backwards scan might be interesting. + * 5. If the index is ordered, a backwards scan might be interesting. * Again, this is only interesting at top level. 
*/ if (index_is_ordered && possibly_useful_pathkeys && - istoplevel && outer_rel == NULL) + istoplevel && outer_rel == NULL && indexpaths != NULL) { index_pathkeys = build_index_pathkeys(root, index, BackwardScanDirection); @@ -428,17 +475,19 @@ find_usable_indexes(PlannerInfo *root, RelOptInfo *rel, index_pathkeys); if (useful_pathkeys != NIL) { + Path *p; ipath = create_index_path(root, index, restrictclauses, + indexonlyQuals, useful_pathkeys, BackwardScanDirection, outer_rel); - result = lappend(result, ipath); + p = (Path *) create_heapfetch_path(root, rel, (Path *) ipath, index, + heapQuals, get_actual_clauses(ipath->indexquals)); + *indexpaths = lappend(*indexpaths, p); } } } - - return result; } @@ -457,6 +506,7 @@ find_saop_paths(PlannerInfo *root, RelOptInfo *rel, { bool have_saop = false; ListCell *l; + List *paths; /* * Since find_usable_indexes is relatively expensive, don't bother to run @@ -476,10 +526,11 @@ find_saop_paths(PlannerInfo *root, RelOptInfo *rel, if (!have_saop) return NIL; - return find_usable_indexes(root, rel, - clauses, outer_clauses, - istoplevel, outer_rel, - SAOP_REQUIRE, ST_BITMAPSCAN); + find_usable_indexes(root, rel, + clauses, outer_clauses, + istoplevel, outer_rel, + SAOP_REQUIRE, ST_BITMAPSCAN, NULL, &paths); + return paths; } @@ -536,13 +587,15 @@ generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel, { List *andargs = ((BoolExpr *) orarg)->args; - indlist = find_usable_indexes(root, rel, - andargs, - all_clauses, - false, - outer_rel, - SAOP_ALLOW, - ST_BITMAPSCAN); + find_usable_indexes(root, rel, + andargs, + all_clauses, + false, + outer_rel, + SAOP_ALLOW, + ST_BITMAPSCAN, + NULL, + &indlist); /* Recurse in case there are sub-ORs */ indlist = list_concat(indlist, generate_bitmap_or_paths(root, rel, @@ -554,13 +607,15 @@ generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel, { Assert(IsA(orarg, RestrictInfo)); Assert(!restriction_is_or_clause((RestrictInfo *) orarg)); - indlist = find_usable_indexes(root, rel, - list_make1(orarg), - all_clauses, - false, - outer_rel, - SAOP_ALLOW, - ST_BITMAPSCAN); + find_usable_indexes(root, rel, + list_make1(orarg), + all_clauses, + false, + outer_rel, + SAOP_ALLOW, + ST_BITMAPSCAN, + NULL, + &indlist); } /* @@ -1123,6 +1178,102 @@ group_clauses_by_indexkey(IndexOptInfo *index, return clausegroup_list; } +void +extract_indexonly_quals(PlannerInfo *root, IndexOptInfo *index, + List *clause_groups, + RelOptInfo *outer_rel, + List **heapquals, List **indexonlyqualorig) +{ + List *scan_clauses = index->rel->baserestrictinfo; + List *indexquals; + List *allclauses; + ListCell *l; + RelOptInfo *rel = index->rel; + + /* Convert clauses to indexquals the executor can handle */ + indexquals = expand_indexqual_conditions(index, clause_groups); + + /* Flatten the clause-groups list to produce indexclauses list */ + allclauses = flatten_clausegroups_list(clause_groups); + + /* + * If this is an innerjoin scan, the indexclauses will contain join + * clauses that are not present in scan_clauses (since the passed-in value + * is just the rel's baserestrictinfo list). We must add these clauses to + * scan_clauses to ensure they get checked. In most cases we will remove + * the join clauses again below, but if a join clause contains a special + * operator, we need to make sure it gets into the scan_clauses. + * + * Note: pointer comparison should be enough to determine RestrictInfo + * matches. 
+ */ + if (outer_rel != NULL) + scan_clauses = list_union_ptr(scan_clauses, allclauses); + + /* + * The qpqual list must contain all restrictions not automatically handled + * by the index. All the predicates in the indexquals will be checked + * (either by the index itself, or by nodeIndexscan.c), but if there are + * any "special" operators involved then they must be included in qpqual. + * The upshot is that qpqual must contain scan_clauses minus whatever + * appears in indexquals. + * + * In normal cases simple pointer equality checks will be enough to spot + * duplicate RestrictInfos, so we try that first. In some situations + * (particularly with OR'd index conditions) we may have scan_clauses that + * are not equal to, but are logically implied by, the index quals; so we + * also try a predicate_implied_by() check to see if we can discard quals + * that way. (predicate_implied_by assumes its first input contains only + * immutable functions, so we have to check that.) + * + * We can also discard quals that are implied by a partial index's + * predicate, but only in a plain SELECT; when scanning a target relation + * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the + * plan so that they'll be properly rechecked by EvalPlanQual testing. + */ + *heapquals = NIL; + if (indexonlyqualorig) + *indexonlyqualorig = NIL; + foreach(l, scan_clauses) + { + RestrictInfo *rinfo = (RestrictInfo *) lfirst(l); + bool is_indexonly; + + Assert(IsA(rinfo, RestrictInfo)); + if (rinfo->pseudoconstant) + continue; /* we may drop pseudoconstants here */ + if (list_member_ptr(indexquals, rinfo)) + continue; + if (!contain_mutable_functions((Node *) rinfo->clause)) + { + List *clausel = list_make1(rinfo->clause); + + if (predicate_implied_by(clausel, indexquals)) + continue; + if (index->indpred) + { + if (rel->relid != root->parse->resultRelation && + get_rowmark(root->parse, rel->relid) == NULL) + if (predicate_implied_by(clausel, + index->indpred)) + continue; + } + } + + if (indexonlyqualorig) + { + make_indexonly_expr(root, index, + (Expr *) rinfo->clause, + &is_indexonly); + if (is_indexonly) + *indexonlyqualorig = lappend(*indexonlyqualorig, rinfo); + else + *heapquals = lappend(*heapquals, rinfo); + } + else + *heapquals = lappend(*heapquals, rinfo); + } +} /* * match_clause_to_indexcol() @@ -1677,7 +1828,6 @@ best_inner_indexscan(PlannerInfo *root, RelOptInfo *rel, List *clause_list; List *indexpaths; List *bitindexpaths; - List *allindexpaths; ListCell *l; InnerIndexscanInfo *info; MemoryContext oldcontext; @@ -1773,27 +1923,13 @@ best_inner_indexscan(PlannerInfo *root, RelOptInfo *rel, * Find all the index paths that are usable for this join, except for * stuff involving OR and ScalarArrayOpExpr clauses. */ - allindexpaths = find_usable_indexes(root, rel, - clause_list, NIL, - false, outer_rel, - SAOP_FORBID, - ST_ANYSCAN); - - /* - * Include the ones that are usable as plain indexscans in indexpaths, and - * include the ones that are usable as bitmap scans in bitindexpaths. 
- */ - indexpaths = bitindexpaths = NIL; - foreach(l, allindexpaths) - { - IndexPath *ipath = (IndexPath *) lfirst(l); - - if (ipath->indexinfo->amhasgettuple) - indexpaths = lappend(indexpaths, ipath); - - if (ipath->indexinfo->amhasgetbitmap) - bitindexpaths = lappend(bitindexpaths, ipath); - } + find_usable_indexes(root, rel, + clause_list, NIL, + false, outer_rel, + SAOP_FORBID, + ST_ANYSCAN, + &indexpaths, + &bitindexpaths); /* * Generate BitmapOrPaths for any suitable OR-clauses present in the diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 8e5767d..77fb789 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -16,9 +16,14 @@ #include <math.h> +#include "access/sysattr.h" +#include "catalog/pg_type.h" +#include "nodes/makefuncs.h" #include "optimizer/cost.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" +#include "optimizer/planmain.h" +#include "optimizer/tlist.h" static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel, @@ -42,6 +47,8 @@ static List *select_mergejoin_clauses(PlannerInfo *root, List *restrictlist, JoinType jointype); +static Path *bubbleup_heapfetches(PlannerInfo *root, JoinPath *path); + /* * add_paths_to_joinrel @@ -130,6 +137,159 @@ add_paths_to_joinrel(PlannerInfo *root, if (enable_hashjoin) hash_inner_and_outer(root, joinrel, outerrel, innerrel, restrictlist, jointype, sjinfo); + + { + ListCell *l; + List *newpaths = NIL; + foreach(l, joinrel->pathlist) + { + Path *path = (Path *) lfirst(l); + path = bubbleup_heapfetches(root, (JoinPath *)path); + newpaths = lappend(newpaths, path); + } + joinrel->pathlist = newpaths; + } +} + +static bool +can_bubbleup(PlannerInfo *root, JoinPath *join, HeapFetchPath *hfpath) +{ + bool indexonly; + ListCell *l; + + /* + * XXX: + * Don't bubble up above a Merge Join, if a sort is required. This is + * just an implementation limitation - we'd have to check if the + * column we're sorting on is in the targetlist of the subpath, which + * is not trivial + */ + if (IsA(join, MergePath) && ((MergePath *) join)->outersortkeys != NIL) + return false; + + /* Check if we need any of the columns not in the index in the join clause */ + foreach(l, join->joinrestrictinfo) + { + RestrictInfo *rinfo = (RestrictInfo *) lfirst(l); + make_indexonly_expr(root, hfpath->indexinfo, rinfo->clause, + &indexonly); + if (!indexonly) + return false; + } + return true; +} + +static HeapFetchPath * +bubbleup_step(PlannerInfo *root, Path *join, HeapFetchPath *hfpath) +{ + /* This can be bubbled up */ + Var *tidvar; + ListCell *l; + List *origtlist; + HeapFetchPath *result; + + tidvar = makeVar(hfpath->indexinfo->rel->relid, + SelfItemPointerAttributeNumber, + TIDOID, -1, 0); + tidvar->varindexno = hfpath->indexinfo->indexno; + + origtlist = join->pathtargetlist; + + /* + * Remove any entries that can't be calculated below the heap fetch. 
+ */ + join->pathtargetlist = NIL; + foreach(l, origtlist) + { + bool indexonly; + + make_indexonly_expr(root, hfpath->indexinfo, lfirst(l), &indexonly); + if (indexonly) + join->pathtargetlist = lappend(join->pathtargetlist, copyObject(lfirst(l))); + } + + /* + * Add the ctid col to the targetlist of the join + */ + if (!list_member(join->pathtargetlist, tidvar)) + join->pathtargetlist = lappend(join->pathtargetlist, tidvar); + + result = create_heapfetch_path(root, hfpath->baserel, + join, hfpath->indexinfo, + hfpath->heapclauses, hfpath->indexqualorig); + result->path.pathtargetlist = origtlist; + + return result; +} + +static Path * +bubbleup_heapfetches(PlannerInfo *root, JoinPath *path) +{ + Path *outerpath = path->outerjoinpath; + Path *innerpath = path->innerjoinpath; + Path *result = (Path *) path; + HeapFetchPath *lasthf = NULL; + + if (IsA(path, HeapFetchPath)) + return (Path *) path; + + if (path->jointype == JOIN_INNER || path->jointype == JOIN_LEFT) + { + while(IsA(outerpath, HeapFetchPath)) + { + HeapFetchPath *hfpath = (HeapFetchPath *) outerpath; + + if (can_bubbleup(root, path, hfpath)) + { + outerpath = hfpath->subpath; + + hfpath = bubbleup_step(root, (Path *) path, hfpath); + if (lasthf) + lasthf->subpath = (Path *) hfpath; + else + result = (Path *) hfpath; + lasthf = hfpath; + } + else + { + /* XXX: we could see if there's more heap fetches below this + * that could be pulled above this heap fetch and the join path. + */ + break; + } + } + } + + if (path->jointype == JOIN_INNER || path->jointype == JOIN_RIGHT) + { + while(IsA(innerpath, HeapFetchPath)) + { + HeapFetchPath *hfpath = (HeapFetchPath *) innerpath; + + if (can_bubbleup(root, path, hfpath)) + { + innerpath = hfpath->subpath; + + hfpath = bubbleup_step(root, (Path *) path, hfpath); + if (lasthf) + lasthf->subpath = (Path *) hfpath; + else + result = (Path *) hfpath; + lasthf = hfpath; + } + else + { + /* XXX: we could see if there's more heap fetches below this + * that could be pulled above this heap fetch and the join path. 
+ */ + break; + } + } + } + path->outerjoinpath = outerpath; + path->innerjoinpath = innerpath; + + return result; } /* diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 6e5c251..9496ff0 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -20,6 +20,8 @@ #include <math.h> #include "access/skey.h" +#include "access/sysattr.h" +#include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" @@ -36,7 +38,7 @@ static Plan *create_scan_plan(PlannerInfo *root, Path *best_path); -static List *build_relation_tlist(RelOptInfo *rel); +static List *build_path_tlist(Path *path); static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel); static void disuse_physical_tlist(Plan *plan, Path *path); static Plan *create_gating_plan(PlannerInfo *root, Plan *plan, List *quals); @@ -49,6 +51,8 @@ static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path, List *tlist, List *scan_clauses); static IndexScan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path, List *tlist, List *scan_clauses); +static HeapFetch *create_heapfetch_plan(PlannerInfo *root, HeapFetchPath *best_path, + List *tlist, List *scan_clauses); static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root, BitmapHeapPath *best_path, List *tlist, List *scan_clauses); @@ -67,11 +71,11 @@ static CteScan *create_ctescan_plan(PlannerInfo *root, Path *best_path, static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_path, List *tlist, List *scan_clauses); static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path, - Plan *outer_plan, Plan *inner_plan); + Plan *outer_plan, Plan *inner_plan, List *tlist); static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path, - Plan *outer_plan, Plan *inner_plan); + Plan *outer_plan, Plan *inner_plan, List *tlist); static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path, - Plan *outer_plan, Plan *inner_plan); + Plan *outer_plan, Plan *inner_plan, List *tlist); static List *fix_indexqual_references(List *indexquals, IndexPath *index_path); static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index); static List *get_switched_clauses(List *clauses, Relids outerrelids); @@ -82,6 +86,7 @@ static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid); static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid, Oid indexid, List *indexqual, List *indexqualorig, ScanDirection indexscandir); +static HeapFetch *make_heapfetch(List *qptlist, List *qpqual, Index scanrelid); static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid, List *indexqual, List *indexqualorig); @@ -157,6 +162,7 @@ create_plan(PlannerInfo *root, Path *best_path) case T_IndexScan: case T_BitmapHeapScan: case T_TidScan: + case T_HeapFetch: /* strictly speaking not a scan, but... */ case T_SubqueryScan: case T_FunctionScan: case T_ValuesScan: @@ -216,15 +222,15 @@ create_scan_plan(PlannerInfo *root, Path *best_path) * planner.c may replace the tlist we generate here, forcing projection to * occur.)
*/ - if (use_physical_tlist(root, rel)) + if (use_physical_tlist(root, rel) && !IsA(best_path, IndexPath)) { tlist = build_physical_tlist(root, rel); /* if fail because of dropped cols, use regular method */ if (tlist == NIL) - tlist = build_relation_tlist(rel); + tlist = build_path_tlist(best_path); } else - tlist = build_relation_tlist(rel); + tlist = build_path_tlist(best_path); /* * Extract the relevant restriction clauses from the parent relation. The @@ -249,6 +255,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path) scan_clauses); break; + case T_HeapFetch: + plan = (Plan *) create_heapfetch_plan(root, + (HeapFetchPath *) best_path, + tlist, + scan_clauses); + break; + case T_BitmapHeapScan: plan = (Plan *) create_bitmap_scan_plan(root, (BitmapHeapPath *) best_path, @@ -320,13 +333,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path) * Build a target list (ie, a list of TargetEntry) for a relation. */ static List * -build_relation_tlist(RelOptInfo *rel) +build_path_tlist(Path *path) { List *tlist = NIL; int resno = 1; ListCell *v; - foreach(v, rel->reltargetlist) + foreach(v, path->pathtargetlist) { /* Do we really need to copy here? Not sure */ Node *node = (Node *) copyObject(lfirst(v)); @@ -412,15 +425,22 @@ disuse_physical_tlist(Plan *plan, Path *path) switch (path->pathtype) { case T_SeqScan: +#ifdef NOT_USED + /* + * IndexScans are special, we don't use the physical tlist + * optimization for them. + */ case T_IndexScan: +#endif case T_BitmapHeapScan: case T_TidScan: + case T_HeapFetch: case T_SubqueryScan: case T_FunctionScan: case T_ValuesScan: case T_CteScan: case T_WorkTableScan: - plan->targetlist = build_relation_tlist(path->parent); + plan->targetlist = build_path_tlist(path); break; default: break; @@ -477,29 +497,35 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path) Plan *outer_plan; Plan *inner_plan; Plan *plan; + List *tlist; outer_plan = create_plan(root, best_path->outerjoinpath); inner_plan = create_plan(root, best_path->innerjoinpath); + tlist = build_path_tlist((Path *) best_path); + switch (best_path->path.pathtype) { case T_MergeJoin: plan = (Plan *) create_mergejoin_plan(root, (MergePath *) best_path, outer_plan, - inner_plan); + inner_plan, + tlist); break; case T_HashJoin: plan = (Plan *) create_hashjoin_plan(root, (HashPath *) best_path, outer_plan, - inner_plan); + inner_plan, + tlist); break; case T_NestLoop: plan = (Plan *) create_nestloop_plan(root, (NestPath *) best_path, outer_plan, - inner_plan); + inner_plan, + tlist); break; default: elog(ERROR, "unrecognized node type: %d", @@ -543,7 +569,7 @@ static Plan * create_append_plan(PlannerInfo *root, AppendPath *best_path) { Append *plan; - List *tlist = build_relation_tlist(best_path->path.parent); + List *tlist = build_path_tlist((Path *) best_path); List *subplans = NIL; ListCell *subpaths; @@ -668,7 +694,7 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path) * should be left as-is if we don't need to add any expressions; but if we * do have to add expressions, then a projection step will be needed at * runtime anyway, so we may as well remove unneeded items. Therefore - * newtlist starts from build_relation_tlist() not just a copy of the + * newtlist starts from build_path_tlist() not just a copy of the * subplan's tlist; and we don't install it into the subplan unless we are * sorting or stuff has to be added. 
*/ @@ -676,7 +702,7 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path) uniq_exprs = best_path->uniq_exprs; /* initialize modified subplan tlist as just the "required" vars */ - newtlist = build_relation_tlist(best_path->path.parent); + newtlist = build_path_tlist((Path *) best_path); nextresno = list_length(newtlist) + 1; newitems = false; @@ -764,7 +790,7 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path) * subplan tlist. */ plan = (Plan *) make_agg(root, - build_relation_tlist(best_path->path.parent), + build_path_tlist((Path *) best_path), NIL, AGG_HASHED, numGroupCols, @@ -870,16 +896,32 @@ create_indexscan_plan(PlannerInfo *root, List *indexquals = best_path->indexquals; Index baserelid = best_path->path.parent->relid; Oid indexoid = best_path->indexinfo->indexoid; - List *qpqual; List *stripped_indexquals; + List *indexonlyqualorig = best_path->indexonlyqualorig; List *fixed_indexquals; ListCell *l; IndexScan *scan_plan; + RelOptInfo *reloptinfo; + IndexOptInfo *indexoptinfo; + bool is_indexonly; /* it should be a base rel... */ Assert(baserelid > 0); Assert(best_path->path.parent->rtekind == RTE_RELATION); + /* Find the index that was used */ + reloptinfo = root->simple_rel_array[baserelid]; + foreach(l, reloptinfo->indexlist) + { + indexoptinfo = (IndexOptInfo *) lfirst(l); + if (indexoptinfo->indexoid == indexoid) + break; + else + indexoptinfo = NULL; + } + if (indexoptinfo == NULL) + elog(ERROR, "can't find IndexOptInfo for index scan"); + /* * Build "stripped" indexquals structure (no RestrictInfos) to pass to * executor as indexqualorig @@ -892,83 +934,40 @@ create_indexscan_plan(PlannerInfo *root, */ fixed_indexquals = fix_indexqual_references(indexquals, best_path); - /* - * If this is an innerjoin scan, the indexclauses will contain join - * clauses that are not present in scan_clauses (since the passed-in value - * is just the rel's baserestrictinfo list). We must add these clauses to - * scan_clauses to ensure they get checked. In most cases we will remove - * the join clauses again below, but if a join clause contains a special - * operator, we need to make sure it gets into the scan_clauses. - * - * Note: pointer comparison should be enough to determine RestrictInfo - * matches. - */ - if (best_path->isjoininner) - scan_clauses = list_union_ptr(scan_clauses, best_path->indexclauses); - - /* - * The qpqual list must contain all restrictions not automatically handled - * by the index. All the predicates in the indexquals will be checked - * (either by the index itself, or by nodeIndexscan.c), but if there are - * any "special" operators involved then they must be included in qpqual. - * The upshot is that qpqual must contain scan_clauses minus whatever - * appears in indexquals. - * - * In normal cases simple pointer equality checks will be enough to spot - * duplicate RestrictInfos, so we try that first. In some situations - * (particularly with OR'd index conditions) we may have scan_clauses that - * are not equal to, but are logically implied by, the index quals; so we - * also try a predicate_implied_by() check to see if we can discard quals - * that way. (predicate_implied_by assumes its first input contains only - * immutable functions, so we have to check that.) 
- * - * We can also discard quals that are implied by a partial index's - * predicate, but only in a plain SELECT; when scanning a target relation - * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the - * plan so that they'll be properly rechecked by EvalPlanQual testing. - */ - qpqual = NIL; - foreach(l, scan_clauses) - { - RestrictInfo *rinfo = (RestrictInfo *) lfirst(l); - - Assert(IsA(rinfo, RestrictInfo)); - if (rinfo->pseudoconstant) - continue; /* we may drop pseudoconstants here */ - if (list_member_ptr(indexquals, rinfo)) - continue; - if (!contain_mutable_functions((Node *) rinfo->clause)) - { - List *clausel = list_make1(rinfo->clause); - - if (predicate_implied_by(clausel, indexquals)) - continue; - if (best_path->indexinfo->indpred) - { - if (baserelid != root->parse->resultRelation && - get_rowmark(root->parse, baserelid) == NULL) - if (predicate_implied_by(clausel, - best_path->indexinfo->indpred)) - continue; - } - } - qpqual = lappend(qpqual, rinfo); - } - /* Sort clauses into best execution order */ - qpqual = order_qual_clauses(root, qpqual); + indexonlyqualorig = order_qual_clauses(root, indexonlyqualorig); /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */ - qpqual = extract_actual_clauses(qpqual, false); + indexonlyqualorig = extract_actual_clauses(indexonlyqualorig, false); - /* Finally ready to build the plan node */ scan_plan = make_indexscan(tlist, - qpqual, + indexonlyqualorig, baserelid, indexoid, fixed_indexquals, stripped_indexquals, best_path->indexscandir); + scan_plan->indexonlyqualorig = indexonlyqualorig; + scan_plan->indexonlyqual = (List *) + make_indexonly_expr(root, indexoptinfo, + (Expr *) indexonlyqualorig, + &is_indexonly); + if (!is_indexonly) + elog(ERROR, "entry in index target list not found in index"); + if (list_length(tlist) > 1) + { + scan_plan->indexonlytargetlist = + (List *) make_indexonly_expr(root, indexoptinfo, + (Expr *) tlist, &is_indexonly); + + if (!is_indexonly) + elog(ERROR, "entry in index target list not found in index"); + } + else + { + /* ctid only */ + scan_plan->indexonlytargetlist = tlist; + } copy_path_costsize(&scan_plan->scan.plan, &best_path->path); /* use the indexscan-specific rows estimate, not the parent rel's */ @@ -977,6 +976,48 @@ create_indexscan_plan(PlannerInfo *root, return scan_plan; } + +/* + * create_heapfetch_plan + * Returns a heap fetch plan for the base relation scanned by 'best_path' + * with restriction clauses 'scan_clauses' and targetlist 'tlist'. + */ +static HeapFetch * +create_heapfetch_plan(PlannerInfo *root, + HeapFetchPath *best_path, + List *tlist, + List *scan_clauses) +{ + Index baserelid = best_path->baserel->relid; + List *qpqual; + HeapFetch *scan_plan; + Plan *subplan; + + /* it should be a base rel... 
*/ + Assert(baserelid > 0); + Assert(best_path->baserel->rtekind == RTE_RELATION); + + subplan = create_plan(root, best_path->subpath); + + qpqual = best_path->heapclauses; + + /* Sort clauses into best execution order */ + qpqual = order_qual_clauses(root, qpqual); + + /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */ + qpqual = extract_actual_clauses(qpqual, false); + + /* Finally ready to build the plan node */ + scan_plan = make_heapfetch(tlist, qpqual, baserelid); + + copy_path_costsize(&scan_plan->scan.plan, &best_path->path); + + scan_plan->scan.plan.lefttree = subplan; + scan_plan->indexqualorig = best_path->indexqualorig; + + return scan_plan; +} + /* * create_bitmap_scan_plan * Returns a bitmap scan plan for the base relation scanned by 'best_path' @@ -1555,9 +1596,9 @@ static NestLoop * create_nestloop_plan(PlannerInfo *root, NestPath *best_path, Plan *outer_plan, - Plan *inner_plan) + Plan *inner_plan, + List *tlist) { - List *tlist = build_relation_tlist(best_path->path.parent); List *joinrestrictclauses = best_path->joinrestrictinfo; List *joinclauses; List *otherclauses; @@ -1606,9 +1647,9 @@ static MergeJoin * create_mergejoin_plan(PlannerInfo *root, MergePath *best_path, Plan *outer_plan, - Plan *inner_plan) + Plan *inner_plan, + List *tlist) { - List *tlist = build_relation_tlist(best_path->jpath.path.parent); List *joinclauses; List *otherclauses; List *mergeclauses; @@ -1890,9 +1931,9 @@ static HashJoin * create_hashjoin_plan(PlannerInfo *root, HashPath *best_path, Plan *outer_plan, - Plan *inner_plan) + Plan *inner_plan, + List *tlist) { - List *tlist = build_relation_tlist(best_path->jpath.path.parent); List *joinclauses; List *otherclauses; List *hashclauses; @@ -2133,9 +2174,8 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index) /* * We represent index keys by Var nodes having the varno of the base table * but varattno equal to the index's attribute number (index column - * position). This is a bit hokey ... would be cleaner to use a - * special-purpose node type that could not be mistaken for a regular Var. - * But it will do for now. + * position), and varindexno set to index number (IndexOptInfo->indexno) + * of the index. */ Var *result; int pos; @@ -2160,6 +2200,7 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index) if (index->indexkeys[pos] == varatt) { result = (Var *) copyObject(node); + result->varindexno = index->indexno; result->varattno = pos + 1; return (Node *) result; } @@ -2186,6 +2227,7 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index) result = makeVar(index->rel->relid, pos + 1, exprType(lfirst(indexpr_item)), -1, 0); + result->varindexno = index->indexno; return (Node *) result; } indexpr_item = lnext(indexpr_item); @@ -2198,6 +2240,195 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index) } /* + * replace_heapvars_with_indexvars + * + * Replaces all Vars in an expression with Vars that refer to columns + * of an index. The resulting expression can be used in the executor to + * evaluate an expression using an IndexTuple from the index, instead + * of a heap tuple. + * + * Returns NULL and sets context->indexonly to FALSE if the expression can't + * be evaluated using columns from the index only. 
+ */ +typedef struct +{ + IndexOptInfo *index; /* index we're dealing with */ + bool indexonly; +} replace_heapvars_context; + +static Node * +replace_heapvars_with_indexvars(Node *node, replace_heapvars_context *context) +{ + /* + * We represent index keys by Var nodes having the varno of the base table + * but varattno equal to the index's attribute number (index column + * position), and varindexno set to index number (IndexOptInfo->indexno) + * of the index. + */ + Var *result; + int pos; + IndexOptInfo *index = context->index; + ListCell *indexpr_item; + + if (node == NULL || !context->indexonly) + return NULL; + + if (IsA(node, Var)) + { + Var *var = (Var *) node; + + if (var->varno == index->rel->relid && var->varindexno == -1) + { + /* Try to match against simple index columns */ + int varatt = var->varattno; + + if (varatt != 0) + { + for (pos = 0; pos < index->ncolumns; pos++) + { + /* + * If the type stored in the index doesn't match that + * in the heap, we can't use the value from the index. + */ + if (index->atttype[pos] != var->vartype) + continue; + + if (index->indexkeys[pos] == varatt) + { + + result = (Var *) copyObject(node); + result->varattno = pos + 1; + result->varindexno = context->index->indexno; + return (Node *) result; + } + } + /* Not in index. The heap tuple is needed for this qual. */ + context->indexonly = false; + return NULL; + } + else + { + /* TODO: we don't handle full-row references yet. */ + context->indexonly = false; + return NULL; + } + } + else + { + /* XXX: Can this happen? */ + /* + context->indexonly = false; + return NULL; + */ + } + } + + /* Try to match against index expressions */ + if (index->indexprs != NIL) + { + indexpr_item = list_head(index->indexprs); + for (pos = 0; pos < index->ncolumns; pos++) + { + if (index->indexkeys[pos] == 0) + { + Node *indexkey; + Oid indexkey_type; + + if (indexpr_item == NULL) + elog(ERROR, "too few entries in indexprs list"); + + indexkey = (Node *) lfirst(indexpr_item); + if (indexkey && IsA(indexkey, RelabelType)) + indexkey = (Node *) ((RelabelType *) indexkey)->arg; + + /* + * If the type stored in the index doesn't match that of + * the expression, we can't use the value from the index. + */ + indexkey_type = exprType(indexkey); + if (index->atttype[pos] != indexkey_type) + continue; + + if (equal(node, indexkey)) + { + /* Found a match */ + result = makeVar(index->rel->relid, pos + 1, + exprType(lfirst(indexpr_item)), -1, + 0); + result->varindexno = context->index->indexno; + return (Node *) result; + } + indexpr_item = lnext(indexpr_item); + } + } + } + + /* Not found */ + return expression_tree_mutator(node, replace_heapvars_with_indexvars, + context); +} + +/* + * make_indexonly_expr + * + * Given an expression, return an equivalent Expr tree that evaluates the + * expression using columns from an index. On success, *indexonly is set to + * TRUE. If the expression can't be evaluated using only index columns, + * *indexonly is set to FALSE and NULL is returned. + * + * This is similar to fix_indexonly_references, but we return NULL instead + * of throwing an error if the expression can't be evaluated using only index + * columns, and the input expression doesn't need to be in simple "A op B" + * format. 
+ */ +Expr * +make_indexonly_expr(PlannerInfo *root, IndexOptInfo *index, + Expr *expr, bool *indexonly) +{ + replace_heapvars_context context; + + if (expr == NULL) + { + if (indexonly) + *indexonly = true; + return NULL; + } + + /* + * If the indexam can't return index tuples, we can forget about it. + */ + if (!index->amreturntuple) + { + if (indexonly) + *indexonly = false; + return NULL; + } + + /* + * Expression containing volatile functions can't be used as index-only + * quals, because the expression would be executed on dead rows as well. + */ + if (contain_volatile_functions((Node *) expr)) + { + if (indexonly) + *indexonly = false; + return NULL; + } + + context.index = index; + context.indexonly = true; + + expr = (Expr *) replace_heapvars_with_indexvars((Node *) expr, &context); + + if (indexonly) + *indexonly = context.indexonly; + if (!context.indexonly) + return NULL; + else + return expr; +} + +/* * get_switched_clauses * Given a list of merge or hash joinclauses (as RestrictInfo nodes), * extract the bare clauses, and rearrange the elements within the @@ -2435,6 +2666,25 @@ make_indexscan(List *qptlist, return node; } + +static HeapFetch * +make_heapfetch(List *qptlist, + List *qpqual, + Index scanrelid) +{ + HeapFetch *node = makeNode(HeapFetch); + Plan *plan = &node->scan.plan; + + /* cost should be inserted by caller */ + plan->targetlist = qptlist; + plan->qual = qpqual; + plan->lefttree = NULL; + plan->righttree = NULL; + node->scan.scanrelid = scanrelid; + + return node; +} + static BitmapIndexScan * make_bitmap_indexscan(Index scanrelid, Oid indexid, diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c index 34e9617..37cf67b 100644 --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -25,6 +25,7 @@ #include "optimizer/paths.h" #include "optimizer/planmain.h" #include "optimizer/predtest.h" +#include "optimizer/restrictinfo.h" #include "optimizer/subselect.h" #include "parser/parse_clause.h" #include "parser/parsetree.h" @@ -38,7 +39,7 @@ typedef struct Oid aggsortop; /* Oid of its sort operator */ Expr *target; /* expression we are aggregating on */ Expr *notnulltest; /* expression for "target IS NOT NULL" */ - IndexPath *path; /* access path for index scan */ + Path *path; /* access path for index scan */ Cost pathcost; /* estimated cost to fetch first row */ bool nulls_first; /* null ordering direction matching index */ Param *param; /* param for subplan's output */ @@ -297,7 +298,7 @@ find_minmax_aggs_walker(Node *node, List **context) static bool build_minmax_path(PlannerInfo *root, RelOptInfo *rel, MinMaxAggInfo *info) { - IndexPath *best_path = NULL; + Path *best_path = NULL; Cost best_cost = 0; bool best_nulls_first = false; NullTest *ntest; @@ -327,6 +328,8 @@ build_minmax_path(PlannerInfo *root, RelOptInfo *rel, MinMaxAggInfo *info) IndexPath *new_path; Cost new_cost; bool found_clause; + List *heapquals; + List *indexonlyquals; /* Ignore non-btree indexes */ if (index->relam != BTREE_AM_OID) @@ -399,8 +402,12 @@ build_minmax_path(PlannerInfo *root, RelOptInfo *rel, MinMaxAggInfo *info) /* * Build the access path. We don't bother marking it with pathkeys. 
*/ + extract_indexonly_quals(root, index, restrictclauses, NULL, + &heapquals, &indexonlyquals); + new_path = create_index_path(root, index, restrictclauses, + indexonlyquals, NIL, indexscandir, NULL); @@ -420,7 +427,10 @@ build_minmax_path(PlannerInfo *root, RelOptInfo *rel, MinMaxAggInfo *info) */ if (best_path == NULL || new_cost < best_cost) { - best_path = new_path; + best_path = (Path *) create_heapfetch_path(root, rel, (Path *) new_path, + index, heapquals, get_actual_clauses(new_path->indexquals)); + + best_cost = new_cost; if (ScanDirectionIsForward(indexscandir)) best_nulls_first = index->nulls_first[indexcol]; @@ -478,6 +488,7 @@ make_agg_subplan(PlannerInfo *root, MinMaxAggInfo *info) Plan *iplan; TargetEntry *tle; SortGroupClause *sortcl; + IndexOptInfo *indexinfo; /* * Generate a suitably modified query. Much of the work here is probably @@ -545,14 +556,20 @@ make_agg_subplan(PlannerInfo *root, MinMaxAggInfo *info) plan->targetlist = copyObject(subparse->targetList); - if (IsA(plan, Result)) - iplan = plan->lefttree; - else - iplan = plan; + iplan = plan; + while(IsA(iplan, Result) || IsA(iplan, HeapFetch)) + iplan = iplan->lefttree; Assert(IsA(iplan, IndexScan)); + if (IsA(info->path, IndexPath)) + indexinfo = ((IndexPath *) info->path)->indexinfo; + else if (IsA(info->path, HeapFetchPath)) + indexinfo = ((HeapFetchPath *) info->path)->indexinfo; + else + elog(ERROR, "unexpected path type %d", info->path->type); + if (!list_member(iplan->qual, info->notnulltest) && - !list_member(info->path->indexinfo->indpred, info->notnulltest)) + !list_member(indexinfo->indpred, info->notnulltest)) iplan->qual = lcons(info->notnulltest, iplan->qual); plan = (Plan *) make_limit(plan, diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index f3c50fe..6f02beb 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -15,7 +15,10 @@ */ #include "postgres.h" +#include "access/sysattr.h" #include "access/transam.h" +#include "catalog/pg_am.h" +#include "catalog/pg_opfamily.h" #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -30,6 +33,7 @@ typedef struct { Index varno; /* RT index of Var */ + int varindexno; AttrNumber varattno; /* attr number of Var */ AttrNumber resno; /* TLE position of Var */ } tlist_vinfo; @@ -92,6 +96,7 @@ static void set_join_references(PlannerGlobal *glob, Join *join, int rtoffset); static void set_inner_join_references(PlannerGlobal *glob, Plan *inner_plan, indexed_tlist *outer_itlist); static void set_upper_references(PlannerGlobal *glob, Plan *plan, int rtoffset); +static void set_heapfetch_references(PlannerGlobal *glob, Plan *plan, int rtoffset, int scanvar); static void set_dummy_tlist_references(Plan *plan, int rtoffset); static indexed_tlist *build_tlist_index(List *tlist); static Var *search_indexed_tlist_for_var(Var *var, @@ -266,14 +271,37 @@ set_plan_refs(PlannerGlobal *glob, Plan *plan, int rtoffset) IndexScan *splan = (IndexScan *) plan; splan->scan.scanrelid += rtoffset; + + /* + * Replace the target list (containing vars referring heap + * cols), with straight references to index cols. XXX: this + * depends on the fact that build_index_tlist() includes all + * index cols in the target list. 
+ */ splan->scan.plan.targetlist = fix_scan_list(glob, splan->scan.plan.targetlist, rtoffset); + splan->scan.plan.qual = fix_scan_list(glob, splan->scan.plan.qual, rtoffset); + splan->indexqual = fix_scan_list(glob, splan->indexqual, rtoffset); splan->indexqualorig = fix_scan_list(glob, splan->indexqualorig, rtoffset); + splan->indexonlyqual = + fix_scan_list(glob, splan->indexonlyqual, rtoffset); + splan->indexonlyqualorig = + fix_scan_list(glob, splan->indexonlyqualorig, rtoffset); + splan->indexonlytargetlist = + fix_scan_list(glob, splan->indexonlytargetlist, rtoffset); + { + TargetEntry *tid_te = (TargetEntry *) llast(splan->indexonlytargetlist); + /* hack the last entry in target list to be ctid */ + Var *tidvar = (Var *) tid_te->expr; + Assert(IsA(tidvar, Var)); + Assert(tidvar->varattno == SelfItemPointerAttributeNumber); + tidvar->varattno = list_length(splan->scan.plan.targetlist); + } } break; case T_BitmapIndexScan: @@ -303,6 +331,14 @@ set_plan_refs(PlannerGlobal *glob, Plan *plan, int rtoffset) fix_scan_list(glob, splan->bitmapqualorig, rtoffset); } break; + case T_HeapFetch: + { + HeapFetch *splan = (HeapFetch *) plan; + int oldscanrelid = splan->scan.scanrelid; + splan->scan.scanrelid += rtoffset; + set_heapfetch_references(glob, plan, rtoffset, oldscanrelid); + } + break; case T_TidScan: { TidScan *splan = (TidScan *) plan; @@ -925,12 +961,12 @@ set_inner_join_references(PlannerGlobal *glob, Plan *inner_plan, */ IndexScan *innerscan = (IndexScan *) inner_plan; List *indexqualorig = innerscan->indexqualorig; + List *indexonlyqualorig = innerscan->indexonlyqualorig; + Index innerrel = innerscan->scan.scanrelid; /* No work needed if indexqual refers only to its own rel... */ if (NumRelids((Node *) indexqualorig) > 1) { - Index innerrel = innerscan->scan.scanrelid; - /* only refs to outer vars get changed in the inner qual */ innerscan->indexqualorig = fix_join_expr(glob, indexqualorig, @@ -944,20 +980,42 @@ set_inner_join_references(PlannerGlobal *glob, Plan *inner_plan, NULL, innerrel, 0); + } - /* - * We must fix the inner qpqual too, if it has join clauses (this - * could happen if special operators are involved: some indexquals - * may get rechecked as qpquals). - */ - if (NumRelids((Node *) inner_plan->qual) > 1) - inner_plan->qual = fix_join_expr(glob, - inner_plan->qual, + /* No work needed if indexqual refers only to its own rel... */ + if (NumRelids((Node *) indexonlyqualorig) > 1) + { + /* only refs to outer vars get changed in the inner qual */ + innerscan->indexonlyqualorig = fix_join_expr(glob, + indexonlyqualorig, + outer_itlist, + NULL, + innerrel, + 0); + innerscan->indexonlyqual = fix_join_expr(glob, + innerscan->indexonlyqual, outer_itlist, NULL, innerrel, 0); } + + /* + * We must fix the inner qpqual too, if it has join clauses (this + * could happen if special operators are involved: some indexquals + * may get rechecked as qpquals).
+ */ + if (NumRelids((Node *) inner_plan->qual) > 1) + inner_plan->qual = fix_join_expr(glob, + inner_plan->qual, + outer_itlist, + NULL, + innerrel, + 0); + } + else if (IsA(inner_plan, HeapFetch)) + { + set_inner_join_references(glob, inner_plan->lefttree, outer_itlist); } else if (IsA(inner_plan, BitmapIndexScan)) { @@ -1138,6 +1196,49 @@ set_upper_references(PlannerGlobal *glob, Plan *plan, int rtoffset) pfree(subplan_itlist); } +static void +set_heapfetch_references(PlannerGlobal *glob, Plan *plan, int rtoffset, + int scanvar) +{ + Plan *subplan = plan->lefttree; + indexed_tlist *subplan_itlist; + HeapFetch *hfplan = (HeapFetch *) plan; + Index varno = scanvar; + AttrNumber varattno = SelfItemPointerAttributeNumber; + tlist_vinfo *vinfo; + int i; + + subplan_itlist = build_tlist_index(subplan->targetlist); + + /* Find the TID column in the node below */ + vinfo = subplan_itlist->vars; + i = subplan_itlist->num_vars; + while (i-- > 0) + { + if (vinfo->varno == varno && vinfo->varattno == varattno && + vinfo->varindexno != -1) + { + /* Found a match */ + hfplan->tidIdx = vinfo->resno; + break; + } + vinfo++; + } + if (i < 0) + elog(ERROR, "ctid not found in subplan target list"); + + plan->targetlist = fix_join_expr(glob, plan->targetlist, subplan_itlist, + NULL, scanvar, rtoffset); + + plan->qual = + fix_join_expr(glob, + plan->qual, + subplan_itlist, + NULL, scanvar, rtoffset); + + pfree(subplan_itlist); +} + /* * set_dummy_tlist_references * Replace the targetlist of an upper-level plan node with a simple @@ -1230,6 +1331,7 @@ build_tlist_index(List *tlist) Var *var = (Var *) tle->expr; vinfo->varno = var->varno; + vinfo->varindexno = var->varindexno; vinfo->varattno = var->varattno; vinfo->resno = tle->resno; vinfo++; @@ -1282,6 +1384,7 @@ build_tlist_index_other_vars(List *tlist, Index ignore_rel) if (var->varno != ignore_rel) { vinfo->varno = var->varno; + vinfo->varindexno = var->varindexno; vinfo->varattno = var->varattno; vinfo->resno = tle->resno; vinfo++; @@ -1317,7 +1420,8 @@ search_indexed_tlist_for_var(Var *var, indexed_tlist *itlist, i = itlist->num_vars; while (i-- > 0) { - if (vinfo->varno == varno && vinfo->varattno == varattno) + if (vinfo->varno == varno && vinfo->varattno == varattno && + vinfo->varindexno == var->varindexno) { /* Found a match */ Var *newvar = copyVar(var); @@ -1426,6 +1530,15 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) { Var *var = (Var *) node; + /* If it's for acceptable_rel, adjust and return it */ + if (var->varno == context->acceptable_rel) + { + var = copyVar(var); + var->varno += context->rtoffset; + var->varnoold += context->rtoffset; + return (Node *) var; + } + /* First look for the var in the input tlists */ newvar = search_indexed_tlist_for_var(var, context->outer_itlist, @@ -1443,17 +1556,9 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) return (Node *) newvar; } - /* If it's for acceptable_rel, adjust and return it */ - if (var->varno == context->acceptable_rel) - { - var = copyVar(var); - var->varno += context->rtoffset; - var->varnoold += context->rtoffset; - return (Node *) var; - } - /* No referent found for Var */ - elog(ERROR, "variable not found in subplan target lists"); + elog(ERROR, "variable %d:%d not found in subplan target lists", + var->varno, var->varattno); } if (IsA(node, PlaceHolderVar)) { @@ -1555,12 +1660,13 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context) { Var *var = (Var *) node; + /* Look the var up in the subplan's targetlist */ newvar
= search_indexed_tlist_for_var(var, context->subplan_itlist, OUTER, context->rtoffset); if (!newvar) - elog(ERROR, "variable not found in subplan target list"); + elog(ERROR, "variable %d:%d not found in subplan target list", var->varno, var->varattno); return (Node *) newvar; } if (IsA(node, PlaceHolderVar)) diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 92e9d6e..328ce52 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -1865,10 +1865,12 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params) case T_IndexScan: finalize_primnode((Node *) ((IndexScan *) plan)->indexqual, &context); + finalize_primnode((Node *) ((IndexScan *) plan)->indexonlyqual, + &context); /* - * we need not look at indexqualorig, since it will have the same - * param references as indexqual. + * we need not look at indexqualorig/indexonlyqualorig, since they + * will have the same param references as indexqual/indexonlyqual. */ break; @@ -2034,6 +2036,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params) case T_Unique: case T_SetOp: case T_Group: + case T_HeapFetch: break; default: diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index b7c3d3c..ce9031e 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -16,13 +16,19 @@ #include <math.h> +#include "access/sysattr.h" #include "catalog/pg_operator.h" +#include "catalog/pg_type.h" #include "executor/executor.h" #include "miscadmin.h" +#include "nodes/makefuncs.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" +#include "optimizer/planmain.h" +#include "optimizer/predtest.h" +#include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" #include "optimizer/var.h" #include "parser/parse_expr.h" @@ -401,6 +407,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel) pathnode->pathtype = T_SeqScan; pathnode->parent = rel; pathnode->pathkeys = NIL; /* seqscan has unordered result */ + pathnode->pathtargetlist = list_copy(rel->reltargetlist); cost_seqscan(pathnode, root, rel); @@ -427,6 +434,7 @@ IndexPath * create_index_path(PlannerInfo *root, IndexOptInfo *index, List *clause_groups, + List *indexonlyqualorig, List *pathkeys, ScanDirection indexscandir, RelOptInfo *outer_rel) @@ -459,6 +467,8 @@ create_index_path(PlannerInfo *root, /* Flatten the clause-groups list to produce indexclauses list */ allclauses = flatten_clausegroups_list(clause_groups); + pathnode->indexonlyqualorig = indexonlyqualorig; + /* Fill in the pathnode */ pathnode->indexinfo = index; pathnode->indexclauses = allclauses; @@ -504,7 +514,45 @@ create_index_path(PlannerInfo *root, pathnode->rows = rel->rows; } - cost_index(pathnode, root, index, indexquals, outer_rel); + cost_index(pathnode, root, index, indexquals, indexonlyqualorig, outer_rel); + + if (index->amreturntuple) + { + ListCell *lexpr = list_head(index->indexprs); + + int i; + for(i = 0; i < index->ncolumns; i++) + { + int keycol = index->indexkeys[i]; + Expr *expr; + + if (keycol != 0) + { + expr = (Expr *) makeVar(rel->relid, + keycol, + index->atttype[i], + -1, + 0); + } + else + { + expr = (Expr *) lfirst(lexpr); + lexpr = lnext(lexpr); + } + + pathnode->path.pathtargetlist = + lappend(pathnode->path.pathtargetlist, expr); + } + } + + { + Var *tidvar = makeVar(rel->relid, SelfItemPointerAttributeNumber, + TIDOID, -1, 0); + 
tidvar->varindexno = index->indexno; + + pathnode->path.pathtargetlist = + lappend(pathnode->path.pathtargetlist, tidvar); + } return pathnode; } @@ -529,6 +577,7 @@ create_bitmap_heap_path(PlannerInfo *root, pathnode->path.pathtype = T_BitmapHeapScan; pathnode->path.parent = rel; pathnode->path.pathkeys = NIL; /* always unordered */ + pathnode->path.pathtargetlist = list_copy(rel->reltargetlist); pathnode->bitmapqual = bitmapqual; pathnode->isjoininner = (outer_rel != NULL); @@ -613,6 +662,41 @@ create_bitmap_or_path(PlannerInfo *root, return pathnode; } + +/* + * create_heapfetch_path + * Creates a path corresponding to a scan by TID, returning the pathnode. + */ +HeapFetchPath * +create_heapfetch_path(PlannerInfo *root, RelOptInfo *rel, + Path *subpath, IndexOptInfo *index, + List *heapQuals, List *indexqualorig) +{ + HeapFetchPath *pathnode = makeNode(HeapFetchPath); + + pathnode->path.pathtype = T_HeapFetch; + pathnode->baserel = rel; + pathnode->path.parent = subpath->parent; + pathnode->path.pathkeys = subpath->pathkeys; + pathnode->path.pathtargetlist = rel->reltargetlist; + + pathnode->subpath = subpath; + + pathnode->indexinfo = index; + + pathnode->heapclauses = heapQuals; + pathnode->indexqualorig = indexqualorig; + + pathnode->path.startup_cost = subpath->startup_cost; + pathnode->path.total_cost = subpath->total_cost; + +#ifdef NOT_IMPLEMENTED + cost_heapfetch(&pathnode->path, root, rel); +#endif + + return pathnode; +} + /* * create_tidscan_path * Creates a path corresponding to a scan by TID, returning the pathnode. @@ -625,6 +709,7 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals) pathnode->path.pathtype = T_TidScan; pathnode->path.parent = rel; pathnode->path.pathkeys = NIL; + pathnode->path.pathtargetlist = list_copy(rel->reltargetlist); pathnode->tidquals = tidquals; @@ -648,6 +733,7 @@ create_append_path(RelOptInfo *rel, List *subpaths) pathnode->path.parent = rel; pathnode->path.pathkeys = NIL; /* result is always considered * unsorted */ + pathnode->path.pathtargetlist = rel->reltargetlist; pathnode->subpaths = subpaths; pathnode->path.startup_cost = 0; @@ -707,6 +793,7 @@ create_material_path(RelOptInfo *rel, Path *subpath) pathnode->path.parent = rel; pathnode->path.pathkeys = subpath->pathkeys; + pathnode->path.pathtargetlist = rel->reltargetlist; pathnode->subpath = subpath; @@ -911,6 +998,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, pathnode->path.pathtype = T_Unique; pathnode->path.parent = rel; + pathnode->path.pathtargetlist = rel->reltargetlist; /* * Treat the output as always unsorted, since we don't necessarily have @@ -1227,6 +1315,7 @@ create_subqueryscan_path(RelOptInfo *rel, List *pathkeys) pathnode->pathtype = T_SubqueryScan; pathnode->parent = rel; pathnode->pathkeys = pathkeys; + pathnode->pathtargetlist = rel->reltargetlist; cost_subqueryscan(pathnode, rel); @@ -1246,6 +1335,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel) pathnode->pathtype = T_FunctionScan; pathnode->parent = rel; pathnode->pathkeys = NIL; /* for now, assume unordered result */ + pathnode->pathtargetlist = rel->reltargetlist; cost_functionscan(pathnode, root, rel); @@ -1265,6 +1355,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel) pathnode->pathtype = T_ValuesScan; pathnode->parent = rel; pathnode->pathkeys = NIL; /* result is always unordered */ + pathnode->pathtargetlist = rel->reltargetlist; cost_valuesscan(pathnode, root, rel); @@ -1284,6 +1375,7 @@ create_ctescan_path(PlannerInfo *root, 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9419178..5c0c342 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -193,9 +193,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			 * We pre-zero the ordering info in case the index is unordered.
 			 */
 			info->indexkeys = (int *) palloc(sizeof(int) * ncolumns);
-			info->opfamily = (Oid *) palloc0(sizeof(Oid) * (4 * ncolumns + 1));
+			info->opfamily = (Oid *) palloc0(sizeof(Oid) * (5 * ncolumns + 1));
 			info->opcintype = info->opfamily + (ncolumns + 1);
-			info->fwdsortop = info->opcintype + ncolumns;
+			info->atttype = info->opcintype + ncolumns;
+			info->fwdsortop = info->atttype + ncolumns;
 			info->revsortop = info->fwdsortop + ncolumns;
 			info->nulls_first = (bool *) palloc0(sizeof(bool) * ncolumns);
 
@@ -204,6 +205,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 				info->indexkeys[i] = index->indkey.values[i];
 				info->opfamily[i] = indexRelation->rd_opfamily[i];
 				info->opcintype[i] = indexRelation->rd_opcintype[i];
+				info->atttype[i] = indexRelation->rd_att->attrs[i]->atttypid;
 			}
 
 			info->relam = indexRelation->rd_rel->relam;
@@ -212,6 +214,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amsearchnulls = indexRelation->rd_am->amsearchnulls;
 			info->amhasgettuple = OidIsValid(indexRelation->rd_am->amgettuple);
 			info->amhasgetbitmap = OidIsValid(indexRelation->rd_am->amgetbitmap);
+			info->amreturntuple = indexRelation->rd_am->amreturntuple;
 
 			/*
 			 * Fetch the ordering operators associated with the index, if any.
@@ -295,7 +298,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 			index_close(indexRelation, NoLock);
 
-			indexinfos = lcons(info, indexinfos);
+			info->indexno = list_length(indexinfos);
+			indexinfos = lappend(indexinfos, info);
 		}
 
 		list_free(indexoidlist);
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index c70240b..d299b19 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3375,7 +3375,10 @@ get_variable(Var *var, int levelsup, bool showstar, deparse_context *context)
 	char	   *schemaname;
 	char	   *refname;
 	char	   *attname;
-
+/*
+	appendStringInfo(buf, "%d:%d", var->varno, var->varattno);
+	return NULL;
+*/
 	/* Find appropriate nesting depth */
 	netlevelsup = var->varlevelsup + levelsup;
 	if (netlevelsup >= list_length(context->namespaces))
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index cddf3e8..6d72e3e 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -425,6 +425,7 @@ typedef struct BTScanPosItem	/* what we remember about each match */
 {
 	ItemPointerData heapTid;	/* TID of referenced heap item */
 	OffsetNumber indexOffset;	/* index item's location within page */
+	uint16		bufferOffset;	/* location in buffer */
 } BTScanPosItem;
 
 typedef struct BTScanPosData
@@ -453,7 +454,11 @@ typedef struct BTScanPosData
 	int			lastItem;		/* last valid index in items[] */
 	int			itemIndex;		/* current index in items[] */
 
-	BTScanPosItem items[MaxIndexTuplesPerPage]; /* MUST BE LAST */
+	BTScanPosItem items[MaxIndexTuplesPerPage]; /* MUST BE LAST EXCEPT FOR THE STUFF BELOW! */
+
+	char	   *buffer;			/* to hold index tuples */
+	int			bufferUsed;
+
 } BTScanPosData;
 
 typedef BTScanPosData *BTScanPos;
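Given those two fields, copying the matching tuples when the scan steps onto a page reduces to bookkeeping of roughly this shape; a simplified sketch with a hypothetical helper name, not the actual patch code:

static void
_bt_saveitem_sketch(BTScanPos pos, BTScanPosItem *item, IndexTuple itup)
{
	Size		itupsz = IndexTupleSize(itup);

	/* copy the tuple into backend-local memory and remember where */
	memcpy(pos->buffer + pos->bufferUsed, itup, itupsz);
	item->bufferOffset = pos->bufferUsed;
	pos->bufferUsed += MAXALIGN(itupsz);
}

Re-reading a tuple is then just a pointer into pos->buffer at item->bufferOffset, with no buffer lock or pin held.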
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 3bac6d3..748c6ce 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -16,6 +16,7 @@
 
 #include "access/genam.h"
 #include "access/heapam.h"
+#include "access/itup.h"
 
 
 typedef struct HeapScanDescData
@@ -64,6 +65,7 @@ typedef struct IndexScanDescData
 	Snapshot	xs_snapshot;	/* snapshot to see */
 	int			numberOfKeys;	/* number of scan keys */
 	ScanKey		keyData;		/* array of scan key descriptors */
+	bool		needIndexTuple; /* should scan store index tuple in xs_itup? */
 
 	/* signaling to index AM about killing index tuples */
 	bool		kill_prior_tuple;		/* last-returned tuple is dead */
@@ -77,6 +79,8 @@ typedef struct IndexScanDescData
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
 	bool		xs_recheck;		/* T means scan keys must be rechecked */
+
+	IndexTuple	xs_itup;		/* current index tuple, if any */
 } IndexScanDescData;
 
 /* Struct for heap-or-index scans of system tables */
diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
index f577e57..dd057cd 100644
--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -49,6 +49,7 @@ CATALOG(pg_am,2601)
 	bool		amsearchnulls;	/* can AM search for NULL index entries? */
 	bool		amstorage;		/* can storage type differ from column type? */
 	bool		amclusterable;	/* does AM support cluster command? */
+	bool		amreturntuple;	/* can AM return tuples back from index? */
 	Oid			amkeytype;		/* type of data in index, or InvalidOid */
 	regproc		aminsert;		/* "insert this tuple" function */
 	regproc		ambeginscan;	/* "start new scan" function */
@@ -76,7 +77,7 @@ typedef FormData_pg_am *Form_pg_am;
  *		compiler constants for pg_am
  * ----------------
  */
-#define Natts_pg_am						26
+#define Natts_pg_am						27
 #define Anum_pg_am_amname				1
 #define Anum_pg_am_amstrategies			2
 #define Anum_pg_am_amsupport			3
@@ -89,36 +90,37 @@ typedef FormData_pg_am *Form_pg_am;
 #define Anum_pg_am_amsearchnulls		10
 #define Anum_pg_am_amstorage			11
 #define Anum_pg_am_amclusterable		12
-#define Anum_pg_am_amkeytype			13
-#define Anum_pg_am_aminsert				14
-#define Anum_pg_am_ambeginscan			15
-#define Anum_pg_am_amgettuple			16
-#define Anum_pg_am_amgetbitmap			17
-#define Anum_pg_am_amrescan				18
-#define Anum_pg_am_amendscan			19
-#define Anum_pg_am_ammarkpos			20
-#define Anum_pg_am_amrestrpos			21
-#define Anum_pg_am_ambuild				22
-#define Anum_pg_am_ambulkdelete			23
-#define Anum_pg_am_amvacuumcleanup		24
-#define Anum_pg_am_amcostestimate		25
-#define Anum_pg_am_amoptions			26
+#define Anum_pg_am_amreturntuple		13
+#define Anum_pg_am_amkeytype			14
+#define Anum_pg_am_aminsert				15
+#define Anum_pg_am_ambeginscan			16
+#define Anum_pg_am_amgettuple			17
+#define Anum_pg_am_amgetbitmap			18
+#define Anum_pg_am_amrescan				19
+#define Anum_pg_am_amendscan			20
+#define Anum_pg_am_ammarkpos			21
+#define Anum_pg_am_amrestrpos			22
+#define Anum_pg_am_ambuild				23
+#define Anum_pg_am_ambulkdelete			24
+#define Anum_pg_am_amvacuumcleanup		25
+#define Anum_pg_am_amcostestimate		26
+#define Anum_pg_am_amoptions			27
 
 /* ----------------
  *		initial contents of pg_am
  * ----------------
  */
-DATA(insert OID = 403 (  btree	5 1 t t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
+DATA(insert OID = 403 (  btree	5 1 t t t t t t t f t t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
 DESCR("b-tree index access method");
 #define BTREE_AM_OID 403
-DATA(insert OID = 405 (  hash	1 1 f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
+DATA(insert OID = 405 (  hash	1 1 f t f f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
 DESCR("hash index access method");
 #define HASH_AM_OID 405
-DATA(insert OID = 783 (  gist	0 7 f f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
+DATA(insert OID = 783 (  gist	0 7 f f f t t t t t t f 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
 DESCR("GiST index access method");
 #define GIST_AM_OID 783
-DATA(insert OID = 2742 (  gin	0 5 f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
+DATA(insert OID = 2742 (  gin	0 5 f f f t t f f t f f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
 DESCR("GIN index access method");
 #define GIN_AM_OID 2742
diff --git a/src/include/executor/nodeHeapfetch.h b/src/include/executor/nodeHeapfetch.h
new file mode 100644
index 0000000..4f54ae3
--- /dev/null
+++ b/src/include/executor/nodeHeapfetch.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeHeapfetch.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * $PostgreSQL$
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODEHEAPFETCH_H
+#define NODEHEAPFETCH_H
+
+#include "nodes/execnodes.h"
+
+extern int	ExecCountSlotsHeapFetch(HeapFetch *node);
+extern HeapFetchState *ExecInitHeapFetch(HeapFetch *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecHeapFetch(HeapFetchState *node);
+extern void ExecEndHeapFetch(HeapFetchState *node);
+extern void ExecHeapFetchMarkPos(HeapFetchState *node);
+extern void ExecHeapFetchRestrPos(HeapFetchState *node);
+extern void ExecHeapFetchReScan(HeapFetchState *node, ExprContext *exprCtxt);
+
+#endif   /* NODEHEAPFETCH_H */
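To make the division of labour concrete, the heart of ExecHeapFetch comes down to: pull a tuple from the subplan, extract the TID column the index scan appended to its targetlist, and fetch that heap tuple. The sketch below illustrates this; it's simplified (hypothetical name, no EvalPlanQual support, and rechecking of lossy quals via indexqualorig is left out), not the actual patch code:

TupleTableSlot *
ExecHeapFetchSketch(HeapFetchState *node)
{
	HeapFetch  *plan = (HeapFetch *) node->ss.ps.plan;
	EState	   *estate = node->ss.ps.state;
	TupleTableSlot *subslot;
	bool		isnull;
	Buffer		buffer;

	for (;;)
	{
		/* get the next TID-bearing tuple from the index scan below us */
		subslot = ExecProcNode(outerPlanState(node));
		if (TupIsNull(subslot))
			return NULL;

		node->hfs_htup.t_self =
			*((ItemPointer) DatumGetPointer(slot_getattr(subslot,
														 plan->tidIdx,
														 &isnull)));

		/* fetch the heap tuple; if it's not visible, try the next TID */
		if (heap_fetch(node->ss.ss_currentRelation, estate->es_snapshot,
					   &node->hfs_htup, &buffer, false, NULL))
		{
			ExecStoreTuple(&node->hfs_htup, node->ss.ss_ScanTupleSlot,
						   buffer, false);
			ReleaseBuffer(buffer);		/* slot now holds its own pin */
			return node->ss.ss_ScanTupleSlot;
		}
	}
}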
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index e40082d..b62dd71 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -15,6 +15,7 @@
 #define TUPTABLE_H
 
 #include "access/htup.h"
+#include "access/itup.h"
 #include "access/tupdesc.h"
 #include "storage/buf.h"
 
@@ -165,6 +166,9 @@ extern TupleTableSlot *ExecStoreTuple(HeapTuple tuple,
 extern TupleTableSlot *ExecStoreMinimalTuple(MinimalTuple mtup,
 					 TupleTableSlot *slot,
 					 bool shouldFree);
+extern TupleTableSlot *ExecStoreIndexTuple(IndexTuple itup,
+					 TupleTableSlot *slot,
+					 bool shouldFree);
 extern TupleTableSlot *ExecClearTuple(TupleTableSlot *slot);
 extern TupleTableSlot *ExecStoreVirtualTuple(TupleTableSlot *slot);
 extern TupleTableSlot *ExecStoreAllNullTuple(TupleTableSlot *slot);
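One plausible way to implement the declared ExecStoreIndexTuple is to deform the index tuple into a virtual tuple, assuming the slot's descriptor matches the index's tuple descriptor. A sketch only (shouldFree handling omitted), not necessarily what the patch does:

TupleTableSlot *
ExecStoreIndexTupleSketch(IndexTuple itup, TupleTableSlot *slot, bool shouldFree)
{
	TupleDesc	itupdesc = slot->tts_tupleDescriptor;
	int			attno;

	ExecClearTuple(slot);

	/* deform each index attribute into the slot's virtual-tuple arrays */
	for (attno = 1; attno <= itupdesc->natts; attno++)
		slot->tts_values[attno - 1] =
			index_getattr(itup, attno, itupdesc,
						  &slot->tts_isnull[attno - 1]);

	return ExecStoreVirtualTuple(slot);
}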
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 34fcf68..2acf414 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1076,6 +1076,17 @@ typedef struct ScanState
 typedef ScanState SeqScanState;
 
 /*
+ * HeapFetchState extends ScanState with the heap tuple currently being
+ * fetched and the original index quals for rechecking.
+ */
+typedef struct HeapFetchState
+{
+	ScanState	ss;
+	HeapTupleData hfs_htup;
+	List	   *hfs_indexqualorig;
+} HeapFetchState;
+
+/*
  * These structs store information about index quals that don't have simple
  * constant right-hand sides.  See comments for ExecIndexBuildScanKeys()
  * for discussion.
@@ -1109,12 +1120,12 @@ typedef struct
  *		RuntimeContext	   expr context for evaling runtime Skeys
  *		RelationDesc	   index relation descriptor
  *		ScanDesc		   index scan descriptor
+ *		indexonlyqual	   execution state for indexonlyqual expressions
  * ----------------
  */
 typedef struct IndexScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
-	List	   *indexqualorig;
 	ScanKey		iss_ScanKeys;
 	int			iss_NumScanKeys;
 	IndexRuntimeKeyInfo *iss_RuntimeKeys;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 31135a3..8376158 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -71,6 +71,7 @@ typedef enum NodeTag
 	T_Hash,
 	T_SetOp,
 	T_Limit,
+	T_HeapFetch,
 	/* this one isn't a subclass of Plan: */
 	T_PlanInvalItem,
 
@@ -109,6 +110,7 @@ typedef enum NodeTag
 	T_HashState,
 	T_SetOpState,
 	T_LimitState,
+	T_HeapFetchState,
 
 	/*
 	 * TAGS FOR PRIMITIVE NODES (primnodes.h)
@@ -221,6 +223,7 @@ typedef enum NodeTag
 	T_AppendRelInfo,
 	T_PlaceHolderInfo,
 	T_PlannerParamItem,
+	T_HeapFetchPath,
 
 	/*
 	 * TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 23a5117..3b58f6c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -264,6 +264,11 @@ typedef Scan SeqScan;
  * table).	This is a bit hokey ... would be cleaner to use a special-purpose
  * node type that could not be mistaken for a regular Var.	But it will do
  * for now.
+ *
+ * indexonlyqual and indexonlyqualorig represent expressions that can't be
+ * used as index keys, but can be evaluated without fetching the heap tuple,
+ * using data from the index.  The index must be able to regurgitate index
+ * tuples stored in it.
  * ----------------
  */
 typedef struct IndexScan
@@ -273,8 +278,25 @@ typedef struct IndexScan
 	List	   *indexqual;		/* list of index quals (OpExprs) */
 	List	   *indexqualorig;	/* the same in original form */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	List	   *indexonlyqual;	/* quals that can be satisfied from index */
+	List	   *indexonlyqualorig;		/* same in original form; only
+										 * needed for EXPLAIN */
+	List	   *indexonlytargetlist;
 } IndexScan;
 
+
+/* ----------------
+ *		heap fetch node
+ * ----------------
+ */
+typedef struct HeapFetch
+{
+	Scan		scan;
+	AttrNumber	tidIdx;			/* index of TID column in outer node */
+	List	   *indexqualorig;	/* for rechecking index quals */
+} HeapFetch;
+
+
 /* ----------------
  *		bitmap index scan node
  *
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 6e7f52b..fdd8195 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -135,6 +135,8 @@ typedef struct Var
 	Expr		xpr;
 	Index		varno;			/* index of this var's relation in the range
 								 * table (could also be INNER or OUTER) */
+	int			varindexno;		/* index into RelOptInfo.indexlist, or -1 if
+								 * this Var doesn't refer to an index */
 	AttrNumber	varattno;		/* attribute number of this var, or zero for
 								 * all */
 	Oid			vartype;		/* pg_type OID for the type of this var */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index bbce826..f907483 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -418,6 +418,8 @@ typedef struct IndexOptInfo
 {
 	NodeTag		type;
 
+	int			indexno;
+
 	Oid			indexoid;		/* OID of the index relation */
 	RelOptInfo *rel;			/* back-link to index's table */
 
@@ -430,6 +432,7 @@ typedef struct IndexOptInfo
 	Oid		   *opfamily;		/* OIDs of operator families for columns */
 	int		   *indexkeys;		/* column numbers of index's keys, or 0 */
 	Oid		   *opcintype;		/* OIDs of opclass declared input data types */
+	Oid		   *atttype;		/* OIDs of stored key data types */
 	Oid		   *fwdsortop;		/* OIDs of sort operators for each column */
 	Oid		   *revsortop;		/* OIDs of sort operators for backward scan */
 	bool	   *nulls_first;	/* do NULLs come first in the sort order? */
@@ -446,6 +449,7 @@ typedef struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL index entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amreturntuple;	/* can AM return tuples from index? */
 } IndexOptInfo;
 
@@ -588,6 +592,8 @@ typedef struct Path
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *pathtargetlist;
 } Path;
 
 /*----------
@@ -637,6 +643,7 @@ typedef struct IndexPath
 	IndexOptInfo *indexinfo;
 	List	   *indexclauses;
 	List	   *indexquals;
+	List	   *indexonlyqualorig;
 	bool		isjoininner;
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
@@ -644,6 +651,20 @@ typedef struct IndexPath
 	double		rows;			/* estimated number of result tuples */
 } IndexPath;
 
+
+typedef struct HeapFetchPath
+{
+	Path		path;
+	RelOptInfo *baserel;
+	Path	   *subpath;
+
+	IndexOptInfo *indexinfo;
+
+	List	   *heapclauses;
+
+	List	   *indexqualorig;
+} HeapFetchPath;
+
 /*
  * BitmapHeapPath represents one or more indexscans that generate TID bitmaps
  * instead of directly accessing the heap, followed by AND/OR combinations
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 11be226..f86e4dd 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -66,7 +66,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
 					double index_pages, PlannerInfo *root);
 extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
 extern void cost_index(IndexPath *path, PlannerInfo *root, IndexOptInfo *index,
-		   List *indexQuals, RelOptInfo *outer_rel);
+		   List *indexQuals, List *indexOnlyQuals, RelOptInfo *outer_rel);
 extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 					  Path *bitmapqual, RelOptInfo *outer_rel);
 extern void cost_bitmap_and_node(BitmapAndPath *path, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 0f4c52e..632c535 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -31,6 +31,7 @@ extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel);
 extern IndexPath *create_index_path(PlannerInfo *root,
 				  IndexOptInfo *index,
 				  List *clause_groups,
+				  List *indexonlyquals,
 				  List *pathkeys,
 				  ScanDirection indexscandir,
 				  RelOptInfo *outer_rel);
@@ -46,6 +47,9 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 					List *tidquals);
+extern HeapFetchPath *create_heapfetch_path(PlannerInfo *root, RelOptInfo *rel,
+					Path *outerpath, IndexOptInfo *index,
+					List *heapQuals, List *indexqualorig);
 extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths);
 extern ResultPath *create_result_path(List *quals);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 4f80edc..64d05c8 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,6 +71,10 @@ extern List *expand_indexqual_conditions(IndexOptInfo *index,
 							List *clausegroups);
 extern void check_partial_indexes(PlannerInfo *root, RelOptInfo *rel);
 extern List *flatten_clausegroups_list(List *clausegroups);
+extern void extract_indexonly_quals(PlannerInfo *root, IndexOptInfo *index,
+						List *clause_groups,
+						RelOptInfo *outer_rel,
+						List **heapquals, List **indexonlyqualorig);
 
 /*
  * orindxpath.c
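Putting the planner entry points together, the intended call pattern is roughly the following. This is a sketch using variable names from the surrounding indxpath.c context; exactly which qual list gets passed for rechecking (indexclauses here) is an assumption, and the real call sites look different:

	List	   *heapquals;
	List	   *indexonlyqualorig;
	IndexPath  *ipath;
	HeapFetchPath *hpath;

	/* split the quals: heapquals need the heap tuple, the rest don't */
	extract_indexonly_quals(root, index, clause_groups, outer_rel,
							&heapquals, &indexonlyqualorig);

	/* the index scan checks indexqual plus the index-only quals ... */
	ipath = create_index_path(root, index, clause_groups, indexonlyqualorig,
							  useful_pathkeys, ForwardScanDirection, outer_rel);

	/* ... and a HeapFetch on top fetches the tuple and applies the rest */
	hpath = create_heapfetch_path(root, rel, (Path *) ipath, index,
								  heapquals, ipath->indexclauses);
	add_path(rel, (Path *) hpath);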
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0dd2bcc..e6973e9 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -75,6 +75,8 @@ extern SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy, Plan *lefttree,
 extern Result *make_result(PlannerInfo *root, List *tlist,
 			Node *resconstantqual, Plan *subplan);
 extern bool is_projection_capable_plan(Plan *plan);
+extern Expr *make_indexonly_expr(PlannerInfo *root, IndexOptInfo *index,
+					 Expr *expr, bool *indexonly);
 
 /*
  * prototypes for plan/initsplan.c