Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: GIN improvements part2: fast scan
Date
Msg-id CAPpHfduOvUFT+GgCrnBOzRR19eTb9mDsDJwU=S0h477fdgr=RA@mail.gmail.com
Whole thread Raw
In response to Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On Fri, Nov 15, 2013 at 12:34 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 14.11.2013 19:26, Alexander Korotkov wrote:
On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:

On 28.06.2013 22:31, Alexander Korotkov wrote:

Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. exact true and exact
false values will be passed in the case of current patch consistent; exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.


I'm going to mark this as "returned with feedback". For the next version,
I'd like to see the API changed per above. Also, I'd like us to do
something about the tidbitmap overhead, as a separate patch before this, so
that we can assess the actual benefit of this patch. And a new test case
that demonstrates the I/O benefits.


Revised version of patch is attached.
Changes are so:
1) Patch rebased against packed posting lists, not depends on additional
information now.
2) New API with tri-state logic is introduced.

Thanks! A couple of thoughts after a 5-minute glance:

* documentation

Will provide documented version this week.
 
* How about defining the tri-state consistent function to also return a tri-state? True would mean that the tuple definitely matches, false means the tuple definitely does not match, and Unknown means it might match. Or does return value true with recheck==true have the same effect? If I understood the patch, right, returning Unknown or True wouldn't actually make any difference, but it's conceivable that we might come up with more optimizations in the future that could take advantage of that. For example, for a query like "foo OR (bar AND baz)", you could immediately return any tuples that match foo, and not bother scanning for bar and baz at all.

The meaning of recheck flag when input contains unknown is undefined now. :)
For instance, we could define it in following ways:
1) Like returning Unknown meaning that consistent with true of false instead of input Unknown could return either true or false.
2) Consistent with true of false instead of input Unknown could return recheck. This meaning is probably logical, but I don't see any usage of it.

I'm not against idea of tri-state returning value for consisted, because it's logical continuation of its tri-state input. However, I don't see usage of distinguish True and Unknown in returning value for now :)

In example you give we can return foo immediately, but we have to create full bitmap. So we anyway will have to scan (bar AND baz). We could skip part of trees for bar and baz. But it's possible only when foo contains large amount of sequential TIDS so we can be sure that we didn't miss any TIDs. This seems to be very narrow use-case for me.

Another point is that one day we probably could immediately return tuples in gingettuple. And with LIMIT clause and no sorting we can don't search for other tuples. However, gingettuple was removed because of reasons of concurrency. And my patches for index-based ordering didn't return it in previous manner: it collects all the results and then returns them one-by-one.

------
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Using indices for UNION.
Next
From: Heikki Linnakangas
Date:
Subject: Re: [PATCH] pg_upgrade: support for btrfs copy-on-write clones