Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: GIN improvements part2: fast scan
Date
Msg-id 52EED0FC.2060503@fuzzy.cz
In response to Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: GIN improvements part2: fast scan  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-hackers
On 2.2.2014 11:45, Heikki Linnakangas wrote:
> On 01/30/2014 01:53 AM, Tomas Vondra wrote:
>> (3) A file with explain plans for 4 queries suffering ~2x slowdown,
>>      and explain plans with 9.4 master and Heikki's patches is available
>>      here:
>>
>>        http://www.fuzzy.cz/tmp/gin/queries.txt
>>
>>      All the queries have 6 common words, and the explain plans look
>>      just fine to me - exactly like the plans for other queries.
>>
>>      Two things now caught my eye. First some of these queries actually
>>      have words repeated - either exactly like "database & database" or
>>      in negated form like "!anything & anything". Second, while
>>      generating the queries, I use "dumb" frequency, where only exact
>>      matches count. I.e. "write != written" etc. But the actual number
>>      of hits may be much higher - for example "write" matches exactly
>>      just 5% of documents, but using @@ it matches more than 20%.
>>
>>      I don't know if that's the actual cause though.
> 
> Ok, here's another variant of these patches. Compared to git master, it
> does three things:
> 
> 1. It adds the concept of ternary consistent function internally, but no
> catalog changes. It's implemented by calling the regular boolean
> consistent function "both ways".
> 
> 2. Use a binary heap to get the "next" item from the entries in a scan.
> I'm pretty sure this makes sense, because arguably it makes the code
> more readable, and reduces the number of item pointer comparisons
> significantly for queries with a lot of entries.
> 
> 3. Only perform the pre-consistent check to try skipping entries, if we
> don't already have the next item from the entry loaded in the array.
> This is a tradeoff, you will lose some of the performance gain you might
> get from pre-consistent checks, but it also limits the performance loss
> you might get from doing useless pre-consistent checks.
> 
> So taken together, I would expect this patch to make some of the
> performance gains less impressive, but also limit the loss we saw with
> some of the other patches.
> 
> Tomas, could you run your test suite with this patch, please?

Sure, will do. Am I right that this should be applied instead of the
four patches you posted earlier?
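
For readers following the thread, point 1 above (a ternary consistent
function built on top of the regular boolean one) can be illustrated
roughly as follows. This is only a sketch in Python, not code from the
patch: the names tri_consistent, bool_consistent and the TRUE/FALSE/MAYBE
constants are hypothetical, and trying every combination of the MAYBE
inputs is an assumption about what calling the boolean function "both
ways" generalizes to when more than one entry is unknown.

```python
from itertools import product

# Hypothetical ternary values; the names are illustrative only.
TRUE, FALSE, MAYBE = "true", "false", "maybe"


def tri_consistent(bool_consistent, check):
    """Derive a ternary answer from a purely boolean consistent function.

    check is a list of TRUE/FALSE/MAYBE, one value per query entry.
    bool_consistent takes a list of bools and returns a bool.
    """
    maybe_idx = [i for i, v in enumerate(check) if v == MAYBE]
    results = set()

    # Substitute every MAYBE with both True and False. The cost is
    # exponential in the number of MAYBE entries, so a real
    # implementation would have to cap that number.
    for combo in product([True, False], repeat=len(maybe_idx)):
        trial = [v == TRUE for v in check]
        for i, value in zip(maybe_idx, combo):
            trial[i] = value
        results.add(bool_consistent(trial))
        if len(results) == 2:
            # The answer depends on the unknown entries.
            return MAYBE

    return TRUE if results == {True} else FALSE
```

With a two-entry AND query, a FALSE first entry gives FALSE no matter
what the second entry is, which is the case where fetching the second
entry's next item can be skipped.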

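Point 2 above (a binary heap to find the "next" item across the entries
of a scan) can be sketched in the same spirit. Again, this is an
illustration and not the patch itself: scan_items is a hypothetical
name, item pointers are modelled as (block, offset) tuples, and each
posting list is assumed to be sorted and free of duplicates.

```python
import heapq


def scan_items(posting_lists):
    """Yield (item, matching_entries) in item pointer order.

    posting_lists holds one sorted list of (block, offset) tuples per
    entry. The heap keeps each entry's current item, so finding the
    smallest one costs O(log n) comparisons instead of a linear pass
    over all entries.
    """
    iters = [iter(pl) for pl in posting_lists]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heap.append((first, idx))
    heapq.heapify(heap)

    while heap:
        item = heap[0][0]
        matched = []
        # Pop every entry currently positioned on the smallest item
        # and advance each of them to its next item.
        while heap and heap[0][0] == item:
            _, idx = heapq.heappop(heap)
            matched.append(idx)
            nxt = next(iters[idx], None)
            if nxt is not None:
                heapq.heappush(heap, (nxt, idx))
        yield item, sorted(matched)
```

The list of matching entries for each item is what a consistent
function would then be evaluated against.
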
Tomas