Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date
Msg-id 4A90448F.4060601@enterprisedb.com
Whole thread Raw
In response to Resjunk sort columns, Heikki's index-only quals patch, and bug #5000  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
List pgsql-hackers
Tom Lane wrote:
> It strikes me that in the cases where it wouldn't be necessary to
> compute junk sort-key columns, it would be because we were scanning an
> index that includes those values.  So if the plan were set up to pull
> those values from the index and return them, then we'd not have to add
> this extra complexity to grouping_planner --- the argument that it's not
> worth it to get rid of the junk columns comes back into play.  Moreover,
> such an ability would also mean that if the user *does* ask for the
> sort column value as output (ie it's not resjunk), we can still satisfy
> the query from the index without recomputing the expensive function.
> 
> So this is where we come to the connection to Heikki's index-only-quals
> patch.  As submitted, that code is only able to use an index column in
> a scan qual, it's not able to return it as part of the scan result.
> This example makes it clear that that definition is missing a large
> part of the potential benefit of an index value extraction capability.
> 
> To be able to do anything along that line would require some more work
> in the executor and a *lot* more work in the planner, and I'm honestly
> not sure what the planner part of it would look like.

I think we should separate the Heap Fetch operation from the IndexScan.
So in your example, you'd get a plan like:

postgres=# explain verbose select unique1 from tenk1 order by foo(unique1);
                    QUERY
 
PLAN

--------------------------------------------------------------------------------
-----------------------------------------------------------------
Heap Fetch on public.tenk1  Output: unique1  ->  Index Scan using fooi on public.tenk1  (cost=0.00..7298.60
rows=290 width=244)        Output: foo(unique1), ctid

IndexScanPath would be correspondingly split into HeapFetchPath and
IndexPath. There would need to be magic to figure out how to decompose
each top target list entry to components fetched from the heap and the
index. In a more complex scenario, we might have
"foo(unique1)*foo(unique2)" in the SELECT list. That could be evaluated
by fetching "foo(unique1)" from the index, "unique2" from the heap, and
projecting those with "A*foo(B)" at the Heap Fetch node.

The cost estimates of IndexPath would no longer include the cost of
fetching from heap - that would be taken into account in HeapFetchPath.
The IndexScan executor node would be a lot simpler as well, as it would
only need to deal with the index, not the heap.

The really cool thing about this is that we could then "bubble up" the
HeapFetch nodes when we construct joins. For example, when we construct
a nested loop join path on a SeqScan on tenk2 and HeapFetch+IndexPath on
tenk1, we could move the HeapFetch above the join node:

postgres=# explain select tenk1.unique1, tenk2.unique2 from tenk1, tenk2
where tenk1.unique1 = tenk2.unique1

Heap Fetch on public.tenk1  Output: tenk1.unique1, tenk2.unique1  ->  Nested Loop (tenk1.unique1 = tenk2.unique1)
Output: tenk1.unique1, tenk2.unique1, tenk1.ctid        ->  Index Scan using foo_unique1 on public.tenk1
Output:tenk1.unique1, tenk1.ctid        ->  Seq Scan on tenk2              Output: tenk2.unique1
 

In this example, HeapFetch actually only needs to check the visibility -
it's easy to see how we could turn this into a true index-only scan if
we had a crash-safe visibility map.

>  The main point
> at the moment is that to do something like this, we'd want the indexscan
> to certainly extract the function value from the index.  Heikki's patch
> views that as an optional behavior with no real penalty for failure.
> I don't think that's good enough.

Yes, so it seems. One complication is the 'rechecks' that an Index scan
sometimes has to do. The heap tuple is required for that. If we split
the index scan to HeapFetch and IndexScan as I'm envisioning, the
rechecks could be done in the HeapFetch node, but then we need to
propagate the recheck flag from IndexScan to HeapFetch.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: WIP: generalized index constraints
Next
From: Kenneth Marshall
Date:
Subject: Re: Another try at reducing repeated detoast work for PostGIS