Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: GIN improvements part2: fast scan
Date
Msg-id 5305CF6E.1080508@vmware.com
Whole thread Raw
In response to Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On 02/09/2014 12:11 PM, Alexander Korotkov wrote:
> I've rebased catalog changes with last master. Patch is attached. I've
> rerun my test suite with both last master ('committed') and attached
> patch ('ternary-consistent').

Thanks!

>           method         |       sum
> ------------------------+------------------
>   committed              | 143491.715000001
>   fast-scan-11           | 126916.111999999
>   fast-scan-light        |       137321.211
>   fast-scan-light-heikki | 138168.028000001
>   master                 |       446976.288
>   ternary-consistent     |       125923.514
>
> I explain regression in last master by change of MAX_MAYBE_ENTRIES from 8
> to 4.

Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make 
sure we get similar behavior in Tomas' tests that used 6 search terms. 
But I always felt that it was too large for real queries, once we have 
the catalog changes, that's why I lowered to 4 when committing. If an 
opclass benefits greatly from fast scan, it should provide the ternary 
consistent function, and not rely on the shim implementation.

> I'm not sure about decision to reserve separate procedure number for
> ternary consistent. Probably, it would be better to add ginConfig method.
> It would be useful for post 9.4 improvements.

Hmm, it might be useful for an opclass to provide both, a boolean and 
ternary consistent function, if the boolean version is significantly 
more efficient when all the arguments are TRUE/FALSE. OTOH, you could 
also do a quick check through the array to see if there are any MAYBE 
arguments, within the consistent function. But I'm inclined to keep the 
possibility to provide both versions. As long as we support the boolean 
version at all, there's not much difference in terms of the amount of 
code to support having them both for the same opclass.

A ginConfig could be useful for many other things, but I don't think 
it's worth adding it now.


What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? 
We discussed that earlier, but didn't reach any conclusion. That needs 
to be clarified in the docs. One possibility is to document that they're 
equivalent. Another is to forbid one of them. Yet another is to assign a 
different meaning to each.

I've been thinking that it might be useful to define them so that a 
MAYBE result from the tri-consistent function means that it cannot 
decide if you have a match or not, because some of the inputs were 
MAYBE. And TRUE+recheck means that even if all the MAYBE inputs were 
passed as TRUE or FALSE, the result would be the same, TRUE+recheck. The 
practical difference would be that if the tri-consistent function 
returns TRUE+recheck, ginget.c wouldn't need to bother fetching the 
other entries, it could just return the entry with recheck=true 
immediately. While with MAYBE result, it would fetch the other entries 
and call tri-consistent again. ginget.c doesn't currently use the 
tri-consistent function that way - it always fetches all the entries for 
a potential match before calling tri-consistent, but it could. I had it 
do that in some of the patch versions, but Tomas' testing showed that it 
was a big loss on some queries, because the consistent function was 
called much more often. Still, something like that might be sensible in 
the future, so it might be good to distinguish those cases in the API 
now. Note that ginarrayproc is already using the return values like 
that: in GinContainedStrategy, it always returns TRUE+recheck regardless 
of the inputs, but in other cases it uses GIN_MAYBE.

- Heikki



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Selecting large tables gets killed
Next
From: Bernd Helmle
Date:
Subject: Re: Selecting large tables gets killed