Re: GIT patch - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: GIT patch
Date
Msg-id 46B18D53.5010700@enterprisedb.com
Whole thread Raw
In response to Re: GIT patch  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: GIT patch
Re: GIT patch
List pgsql-hackers
Alvaro Herrera wrote:
> Hmm, do say, doesn't it seem like the lack of feedback and the failed
> bitmap patch played against final development of this patch?  

Yes :(. That's a one reason why I tried to help with the review of that
patch.

> At this
> point I feel like the patch still needs some work and reshuffling before
> it is in an acceptable state.  The fact that there are some API changes
> for which the patch needs to be adjusted makes me feel like we should
> put this patch on hold for 8.4.  So we would first get the API changes
> discussed and done and then adapt this patch to them.

I hate to say it but I agree. Getting the API changes discussed and
committed was my plan in February/March. Unfortunately it didn't happen
back then.

There's a few capabilities we need from the new API:

1. Support for candidate matches. Because a clustered index doesn't
contain the key for every heap tuple, when you search for a value we
don't know exactly which ones match. Instead, you get a bunch of
candidate matches, which need to be rechecked after fetching the heap
tuple. Oleg & Teodor pointed out that GiST could use the capability as
well. I also proposed a while ago to change the hash index
implementation so that it doesn't store the index key in the index, but
just the hash of it. That again would need the support for candidate
matches. And there's range-encoded bitmap indexes, if we implement them
in a more distant future.

2. Support to sort the heap tuples represented by one index tuple, in
normal index scans, if we go with alternative 1. Or support to do binary
searches over them, if we go with alternative 2 or 3. As the patch
stands, the sorting is done within b-tree, but that's quite ugly.

> Of the three proposals you suggest, I think the first one
> 
>> 1. A grouped index tuple contains a bitmap of offsetnumbers,
>> representing a bunch of heap tuples stored on the same heap page, that
>> all have a key between the key stored on the index tuple and the next
>> index tuple. We don't keep track of the ordering of the heap tuples
>> represented by one group index tuple. When doing a normal, non-bitmap,
>> index scan, they need to be sorted. This is what the patch currently
>> implements.
> 
> makes the most sense -- the index is keep simple and fast, and doing the
> sorting during an indexscan seems a perfectly acceptable compromise,
> knowing that the amount of tuples possible returned for sort is limited
> by the heap blocksize.

The overhead is shown in the CPU test results, particularly in the
select_range* tests, I put up on the git web site:
http://community.enterprisedb.com/git/.

The other alternatives might be simpler, though.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: "Rohit Khare"
Date:
Subject: .NET driver
Next
From: Alexey Klyukin
Date:
Subject: Re: GIT patch