Re: Visibility map thoughts - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Visibility map thoughts
Date
Msg-id 473077AD.30101@enterprisedb.com
Whole thread Raw
In response to Re: Visibility map thoughts  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Visibility map thoughts  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
Simon Riggs wrote:
> I'm thinking that looking in the visibility map will have a cost also,
> so how will we know whether to bother looking? I'm assuming that we
> won't want to do that lookup in all cases, since it could easily just
> add pathlength and contention in the normal OLTP case.

Yeah, I believe that's true. There's a non-zero cost to return tuples 
from the index as well. Now that we do b-tree scans page at a time, if 
we're going to return data from the index, we need to copy all the 
matching data to backend-private memory when we step to a page.

> Presumably there
> would be a test in the planner to see if an index-only plan was
> possible?

Yes, you can check if all the columns you need are in the index. You 
won't know for sure how much of the visibility map is set until 
execution time, however.

I'm thinking that we're going to have an entry in pg_stats on the 
fraction of set bits in the map, and factor that in the cost estimate of 
index scans.

> ISTM that it would make most sense to do it during BitmapIndex scans.
> Specifically, as an intermediate step between BitmapIndex scan and
> BitmapHeap scan. That would be fairly likely to be a win in most cases
> because the bitmaps should compare easily and the amortised cost per row
> is likely to be very small, even if no heap lookups are avoided.

That only helps with COUNT(*), because until the BitmapHeap scan all you 
have is a bitmap of tuples. You can't return any values from just a 
bitmap. (Seems easy to do, so it's still probably worth doing just for 
COUNT(*), though)


A major design question is how where and how the executor for index-only 
scans should work. The required planner changes will follow from that. 
The simplest solution is to just modify nodeIndexScan.c a little bit to 
check the visibility map before fetching the heap tuple. If the bit in 
the visibility map is set, skip the heap fetch and return the required 
values from the index tuple instead. That would give most of the benefit 
already.

The indexam API needs to be modified as well, because there's currently 
no API to return index tuples from an index.

There's some more fancy stuff we could do if we separated the index scan 
and heap fetch to two different executor nodes. You could for example do 
an index-only scan and join, and check visibility only after the join, 
avoiding unnecessarily accessing the heap for tuples that don't satisfy 
the join condition. In fact we could do that without the visibility map 
as well.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Visibility map thoughts
Next
From: Mark Mielke
Date:
Subject: Re: Visibility map thoughts