Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: GIN improvements part2: fast scan
Date
Msg-id 52E9A274.6040806@fuzzy.cz
In response to Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: GIN improvements part2: fast scan
Re: GIN improvements part2: fast scan
List pgsql-hackers
On 28.1.2014 08:29, Heikki Linnakangas wrote:
> On 01/28/2014 05:54 AM, Tomas Vondra wrote:
>> Then I ran those scripts on:
>>
>>    * 9.3
>>    * 9.4 with Heikki's patches (9.4-heikki)
>>    * 9.4 with Heikki's and first patch (9.4-alex-1)
>>    * 9.4 with Heikki's and both patches (9.4-alex-2)
> 
> It would be good to also test with unpatched 9.4 (ie. git master). The
> packed posting lists patch might account for a large part of the
> differences between 9.3 and the patched 9.4 versions.
> 
> - Heikki
> 

Hi,

the e-mail I sent yesterday apparently did not make it into the list,
probably because of the attachments, so I'll just link them this time.

I added the results from 9.4 master to the spreadsheet:

https://docs.google.com/spreadsheet/ccc?key=0Alm8ruV3ChcgdHJfZTdOY2JBSlkwZjNuWGlIaGM0REE

It's a bit cumbersome to analyze though, so I've quickly hacked up a
simple jqplot page that allows comparing the results. It's available
here: http://www.fuzzy.cz/tmp/gin/

It's likely there are some quirks and issues - let me know about them.

The ODS with the data is available here:

      http://www.fuzzy.cz/tmp/gin/gin-scan-benchmarks.ods


Three quick basic observations:

(1) The current 9.4 master is consistently better than 9.3 by about 15%
    on rare words, and up to 30% on common words. See the chart for
    6-word queries:

      http://www.fuzzy.cz/tmp/gin/6-words-rare-94-vs-93.png

    With 3-word queries the effects are even stronger & clearer,
    especially with the common words.

(2) Heikki's patches seem to work OK, i.e. improve the performance, but
    only with rare words.

      http://www.fuzzy.cz/tmp/gin/heikki-vs-94-rare.png

    With 3 words the impact is much stronger than with 6 words,
    presumably because it depends on how frequent the combination of
    words is (~ multiplication of probabilities). See

      http://www.fuzzy.cz/tmp/gin/heikki-vs-94-3-common-words.png
      http://www.fuzzy.cz/tmp/gin/heikki-vs-94-6-common-words.png

    for comparison of 9.4 master vs. 9.4+heikki's patches.

(3) A file with 4 queries suffering ~2x slowdown, along with their
    explain plans on 9.4 master and with Heikki's patches, is
    available here:

      http://www.fuzzy.cz/tmp/gin/queries.txt

    All the queries have 6 common words, and the explain plans look
    just fine to me - exactly like the plans for other queries.

    Two things now caught my eye. First, some of these queries actually
    have words repeated - either exactly, like "database & database",
    or in negated form, like "!anything & anything". Second, while
    generating the queries, I use "dumb" frequency, where only exact
    matches count, i.e. "write" != "written" etc. But the actual number
    of hits may be much higher - for example "write" matches exactly
    just 5% of documents, but using @@ it matches more than 20%.

    I don't know if that's the actual cause though.
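
To put rough numbers on the "multiplication of probabilities" remark in
(2) and the exact-vs-@@ frequencies in (3), here is a minimal Python
sketch. It assumes the words occur independently and that every word in
a query has the same frequency as "write"; both are simplifications for
illustration, not properties of the benchmark data. The function name
is hypothetical.

```python
from math import prod


def expected_match_fraction(word_freqs):
    """Fraction of documents expected to contain ALL the words.

    Assumes the words occur independently of each other, which real
    text does not guarantee; treat the result as a rough estimate.
    """
    return prod(word_freqs)


# Per-word frequencies quoted in the mail for "write":
exact = 0.05    # exact-match ("dumb") frequency
stemmed = 0.20  # frequency via @@ (a lower bound, "more than 20%")

for n in (3, 6):
    e = expected_match_fraction([exact] * n)
    s = expected_match_fraction([stemmed] * n)
    print(f"{n} words: exact={e:.2e}  via @@={s:.2e}  ratio={s / e:.0f}x")
```

Under these assumptions a 4x underestimate per word compounds to 4^n
for an n-word query, so queries classified by exact-match frequency may
hit far more documents than the classification suggests.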

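The repeated words mentioned in (3) could be filtered out when
generating the queries. A minimal sketch, assuming the queries are flat
conjunctions of single words joined by "&" with an optional "!" prefix
(as in the two examples above); the helper name is hypothetical and it
does not handle parentheses, "|" or phrases:

```python
def repeated_terms(query):
    """Return the words that occur more than once in a flat '&' query.

    A leading '!' is ignored, so "!anything & anything" counts as a
    repetition of "anything".
    """
    seen = set()
    repeated = set()
    for part in query.split("&"):
        term = part.strip().lstrip("!").strip()
        if not term:
            continue  # skip empty fragments such as a trailing '&'
        if term in seen:
            repeated.add(term)
        seen.add(term)
    return repeated


for q in ("database & database", "!anything & anything", "write & read"):
    print(q, "->", sorted(repeated_terms(q)))
```

Queries for which this returns a non-empty set could be dropped or
regenerated before running the benchmark.
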
regards
Tomas



