Re: index-only scans - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: index-only scans |
Date | |
Msg-id | 15551.1318190582@sss.pgh.pa.us |
In response to | Re: index-only scans (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: index-only scans
Re: index-only scans |
List | pgsql-hackers |
I wrote:
> I believe that we should rejigger things so that when an index-only scan
> is selected, the executor *always* works from the data supplied by the
> index.  Even if it has to visit the heap --- it will do that but just to
> consult the tuple's visibility data, and then use what it got from the
> index anyway.  This means we'd build the plan node's filter quals and
> targetlist to reference the index tuple columns not the underlying
> table's.

I've been studying this a bit.  The key decision is how to represent Vars
that reference columns of the index.  We really have to have varattno
equal to the index column number, else ExecEvalVar will pull the wrong
column from the tuple.  However, varno is not so clear cut.  There are at
least four things we could do:

1. Keep varno = table's rangetable index.  The trouble with this is that a
Var referencing index column N would look exactly like a Var referencing
table column N; so the same Var would mean something different in an
index-only scan node than it does in any other type of scan node for the
same table.  We could maybe make that work, but it seems confusing and
fragile as heck.  The executor isn't going to care much, but inspection of
the plan tree by e.g. EXPLAIN sure will.

2. Set varno = OUTER (or maybe INNER).  This is safe because there's no
other use for OUTER/INNER in a table scan node.  We would have to hack
things so that the index tuple gets put into econtext->ecxt_outertuple
(resp. ecxt_innertuple) at runtime, but that seems like no big problem.
In both setrefs.c and ruleutils.c, it would be desirable to have a
TargetEntry list somewhere representing the index columns, which setrefs
would want so it could set up the special Var nodes with fix_upper_expr,
and ruleutils would want so it could interpret the Vars using existing
machinery.  I'm not sure whether to hang that list on the index-only plan
node or expect EXPLAIN to regenerate it at need.

3. Invent another special varno value similar to OUTER/INNER but
representing an index reference.  This is just about like #2 except that
we could still put the index tuple into econtext->ecxt_scantuple, and
ExecEvalVar would do the right thing as it stands.

4. Create a rangetable entry specifically representing the index, and set
varno equal to that RTE's number.  This has some attractiveness in terms
of making the meaning of the Vars clear, but an RTE that represents an
index rather than a table seems kind of ugly otherwise.  It would likely
require changes in unrelated parts of the code.

One point here is that we have historically used solution #1 to represent
the index keys in index qual expressions.  We avoid the ambiguity issues
by not asking EXPLAIN to try to interpret the indexqual tree at all: it
works from indexqualorig which contains ordinary Vars.  So one way to
dodge the disadvantages of solution #1 would be to add untransformed
"targetlistorig" and "qualorig" fields to an index-only plan node, and use
those for EXPLAIN.  However, those fields would be totally dead weight if
the plan were never EXPLAINed, whereas indexqualorig has a legitimate use
for rechecking indexquals against the heap tuple in case of a lossy index.

(BTW, if we go with any solution other than #1, I'm strongly inclined to
change the representation of indexqual to match.  See the comments in
fix_indexqual_operand.)

At the moment I'm leaning to approach #3, but I wonder if anyone has a
different opinion or another idea altogether.

			regards, tom lane
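
To make the varno/varattno discussion concrete, here is a minimal toy model of approach #3. It is only a sketch under stated assumptions: every name below (ToyVar, ToyTuple, ToyExprContext, toy_eval_var, the TOY_* constants and their numeric values) is hypothetical and not taken from the PostgreSQL source. The point is just that a Var carrying a special "index" varno can fall through to the scan-tuple slot, so the evaluation routine needs no new case, while varattno must count index columns rather than table columns.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the real PostgreSQL definitions. */
#define TOY_INNER 65000
#define TOY_OUTER 65001
#define TOY_INDEX 65002   /* hypothetical special varno for approach #3 */

typedef struct ToyTuple {
    int natts;            /* number of columns in this tuple */
    const int *values;    /* column values, 0-based storage */
} ToyTuple;

typedef struct ToyVar {
    int varno;            /* which tuple to read from */
    int varattno;         /* 1-based column number within that tuple */
} ToyVar;

typedef struct ToyExprContext {
    const ToyTuple *scantuple;
    const ToyTuple *innertuple;
    const ToyTuple *outertuple;
} ToyExprContext;

/* Pick the tuple from varno, then fetch column varattno from it.
 * TOY_INDEX needs no case of its own: like an ordinary rangetable
 * index, it falls through to the scan tuple. */
int toy_eval_var(const ToyVar *var, const ToyExprContext *econtext)
{
    const ToyTuple *tuple;

    switch (var->varno) {
    case TOY_INNER:
        tuple = econtext->innertuple;
        break;
    case TOY_OUTER:
        tuple = econtext->outertuple;
        break;
    default:
        tuple = econtext->scantuple;
        break;
    }
    assert(tuple != NULL);
    assert(var->varattno >= 1 && var->varattno <= tuple->natts);
    return tuple->values[var->varattno - 1];
}

/* Table t(a, b, c) with an index on (c, a).  Returns 1 if the Var for t.c
 * yields the same value from the heap tuple and from the index tuple. */
int toy_demo(void)
{
    static const int heap_vals[]  = {10, 20, 30};   /* a, b, c */
    static const int index_vals[] = {30, 10};       /* c, a */
    ToyTuple heap  = {3, heap_vals};
    ToyTuple index = {2, index_vals};

    ToyVar heap_c  = {1, 3};           /* ordinary scan: RTE 1, table column 3 */
    ToyVar index_c = {TOY_INDEX, 1};   /* index-only scan: index column 1 */

    ToyExprContext heap_scan  = {&heap, NULL, NULL};
    ToyExprContext index_scan = {&index, NULL, NULL};

    return toy_eval_var(&heap_c, &heap_scan) ==
           toy_eval_var(&index_c, &index_scan);
}
```

In this toy setup, approach #1 would spell index_c as {1, 1}, which is indistinguishable from a Var for t.a in any other scan of the same table; that is the ambiguity described in item 1. Approach #2 would instead use TOY_OUTER and require the index tuple to be stored in the outertuple slot.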