Re: Index AM change proposals, redux - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Index AM change proposals, redux
Date
Msg-id 20498.1208031032@sss.pgh.pa.us
Whole thread Raw
In response to Re: Index AM change proposals, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I looked over the issue of making the regular-indexscan code path able
to handle runtime determination of operator lossiness.  Here's my
implementation plan:

* Extend amgettuple API to allow a boolean recheck flag to be passed back.

* Remove index_getnext_indexitem, which is dead code and has been for
awhile.  (It seems to have been put in to help gist and hash vacuuming,
which before that were using index_getnext loops and thus thrashing the
heap on every call... but they were both rewritten entirely since then.)

* Have index_getnext() just pass the recheck flag back to its caller,
without doing any extra work.  The existing callers arenodeIndexscan.cCLUSTERsystable_getnextGetNewOidWithIndex (should
usesystable_getnext)
 
and about ten places that use index_getnext directly, instead of
going through systable_getnext, because they rely on seeing ordered
results which systable_getnext doesn't promise.

* nodeIndexscan can execute rechecks using the indexqualorig expressions,
which it has set up and ready to go already, so there's basically no
added overhead for non-lossy cases.

* CLUSTER doesn't pass any scankeys so it should never see recheck=true,
and can/should error out.

* systable_getnext and the other direct callers are only used with
system catalogs, which should never have lossy operators anyway,
so for the moment we can just have them error out on recheck=true.
But I think it's important to have a plan for fixing that later.
We know that these functions are used only with "simple" scan keys
(no index expressions), so it would be easy enough to apply the recheck
using the scan keys ... if we only had the heap attnums at hand
rather than the index attnums.  In the case of systable_getnext
it'd be very simple to extend systable_beginscan and the SysScanDesc
data structure to preserve the heap attnums.  What I propose doing
about the other direct callers is to make a variant of the systable
scan code that *does* promise to preserve ordering, but otherwise
has a similar API to systable_beginscan/systable_getnext.  If they
all go through that then we'll have a single place to fix to support
lossy results.

* As discussed yesterday, eliminate fixed RECHECK markings on operators,
and change the API for GIST/GIN "consistent" functions to pass back
a recheck indicator.

* The planner will no longer care whether an index operator is lossy.
This means that as far as the planner is concerned there isn't any strong
reason for fix_indexqual_references to call get_op_opfamily_properties ---
the only reason it's doing that is to build the indexstrategy and
indexsubtype lists to attach to the generated indexscan plan node.
It's tempting to drop those lists from the plan tree altogether and
instead do the get_op_opfamily_properties lookup during
ExecIndexBuildScanKeys.  For a plan executed only once, this would be
a clear win --- it's the same total number of catcache lookups and we
avoid some palloc overhead to construct the lists.  For a plan that's
cached and reused many times, at some point the extra lookups would
start to cost more than the list storage costs.  But it's not like
executor startup doesn't involve a lot of catcache fetches anyway.
Any thoughts about that tradeoff?

BTW, it occurs to me that this'll make it trivially easy to experiment
with hash indexes that only store the hash values ... they'll just
always set the recheck flag to true.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Re: stack depth limit exceeded - patch possible?
Next
From: Alexander Wöhrer
Date:
Subject: Re: stack depth limit exceeded - patch possible?