Re: index-only scans - Mailing list pgsql-hackers

From Tom Lane
Subject Re: index-only scans
Date
Msg-id 15551.1318190582@sss.pgh.pa.us
Whole thread Raw
In response to Re: index-only scans  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: index-only scans
Re: index-only scans
List pgsql-hackers
I wrote:
> I believe that we should rejigger things so that when an index-only scan
> is selected, the executor *always* works from the data supplied by the
> index.  Even if it has to visit the heap --- it will do that but just to
> consult the tuple's visibility data, and then use what it got from the
> index anyway.  This means we'd build the plan node's filter quals and
> targetlist to reference the index tuple columns not the underlying
> table's.

I've been studying this a bit.  The key decision is how to represent
Vars that reference columns of the index.  We really have to have
varattno equal to the index column number, else ExecEvalVar will pull
the wrong column from the tuple.  However, varno is not so clear cut.
There are at least four things we could do:

1. Keep varno = table's rangetable index.  The trouble with this is that
a Var referencing index column N would look exactly like a Var
referencing table column N; so the same Var would mean something
different in an index-only scan node than it does in any other type of
scan node for the same table.  We could maybe make that work, but it
seems confusing and fragile as heck.  The executor isn't going to care
much, but inspection of the plan tree by e.g. EXPLAIN sure will.

2. Set varno = OUTER (or maybe INNER).  This is safe because there's no
other use for OUTER/INNER in a table scan node.  We would have to hack
things so that the index tuple gets put into econtext->ecxt_outertuple
(resp. ecxt_innertuple) at runtime, but that seems like no big problem.
In both setrefs.c and ruleutils.c, it would be desirable to have a
TargetEntry list somewhere representing the index columns, which setrefs
would want so it could set up the special Var nodes with fix_upper_expr,
and ruleutils would want so it could interpret the Vars using existing
machinery.  I'm not sure whether to hang that list on the index-only
plan node or expect EXPLAIN to regenerate it at need.

3. Invent another special varno value similar to OUTER/INNER but
representing an index reference.  This is just about like #2 except that
we could still put the index tuple into econtext->ecxt_scantuple, and
ExecEvalVar would do the right thing as it stands.

4. Create a rangetable entry specifically representing the index,
and set varno equal to that RTE's number.  This has some attractiveness
in terms of making the meaning of the Vars clear, but an RTE that
represents an index rather than a table seems kind of ugly otherwise.
It would likely require changes in unrelated parts of the code.


One point here is that we have historically used solution #1 to
represent the index keys in index qual expressions.  We avoid the
ambiguity issues by not asking EXPLAIN to try to interpret the indexqual
tree at all: it works from indexqualorig which contains ordinary Vars.
So one way to dodge the disadvantages of solution #1 would be to add
untransformed "targetlistorig" and "qualorig" fields to an index-only
plan node, and use those for EXPLAIN.  However, those fields would be
totally dead weight if the plan were never EXPLAINed, whereas
indexqualorig has a legitimate use for rechecking indexquals against the
heap tuple in case of a lossy index.  (BTW, if we go with any solution
other than #1, I'm strongly inclined to change the representation of
indexqual to match.  See the comments in fix_indexqual_operand.)

At the moment I'm leaning to approach #3, but I wonder if anyone has
a different opinion or another idea altogether.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: unite recovery.conf and postgresql.conf
Next
From: Greg Stark
Date:
Subject: Re: index-only scans