Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: GIN improvements part2: fast scan |
Date | |
Msg-id | 52E9A274.6040806@fuzzy.cz |
In response to | Re: GIN improvements part2: fast scan (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: GIN improvements part2: fast scan
Re: GIN improvements part2: fast scan |
List | pgsql-hackers |
On 28.1.2014 08:29, Heikki Linnakangas wrote:
> On 01/28/2014 05:54 AM, Tomas Vondra wrote:
>> Then I ran those scripts on:
>>
>> * 9.3
>> * 9.4 with Heikki's patches (9.4-heikki)
>> * 9.4 with Heikki's and first patch (9.4-alex-1)
>> * 9.4 with Heikki's and both patches (9.4-alex-2)
>
> It would be good to also test with unpatched 9.4 (ie. git master). The
> packed posting lists patch might account for a large part of the
> differences between 9.3 and the patched 9.4 versions.
>
> - Heikki
>

Hi,

the e-mail I sent yesterday apparently did not make it to the list,
probably because of the attachments, so I'll just link them this time.

I added the results from 9.4 master to the spreadsheet:

https://docs.google.com/spreadsheet/ccc?key=0Alm8ruV3ChcgdHJfZTdOY2JBSlkwZjNuWGlIaGM0REE

It's a bit cumbersome to analyze though, so I've quickly hacked up a
simple jqplot page that allows comparing the results. It's available
here: http://www.fuzzy.cz/tmp/gin/

It's likely there are some quirks and issues - let me know about them.

The ODS with the data is available here:

http://www.fuzzy.cz/tmp/gin/gin-scan-benchmarks.ods

Three quick basic observations:

(1) The current 9.4 master is consistently better than 9.3 by about 15%
    on rare words, and up to 30% on common words. See the charts for
    6-word queries:

    http://www.fuzzy.cz/tmp/gin/6-words-rare-94-vs-93.png
    http://www.fuzzy.cz/tmp/gin/6-words-rare-94-vs-93.png

    With 3-word queries the effects are even stronger & clearer,
    especially with the common words.

(2) Heikki's patches seem to work OK, i.e. improve the performance, but
    only with rare words.

    http://www.fuzzy.cz/tmp/gin/heikki-vs-94-rare.png

    With 3 words the impact is much stronger than with 6 words,
    presumably because it depends on how frequent the combination of
    words is (~ multiplication of probabilities). See

    http://www.fuzzy.cz/tmp/gin/heikki-vs-94-3-common-words.png
    http://www.fuzzy.cz/tmp/gin/heikki-vs-94-6-common-words.png

    for comparison of 9.4 master vs. 9.4+heikki's patches.

(3) A file with 4 queries suffering ~2x slowdown, along with their
    explain plans on 9.4 master and with Heikki's patches, is available
    here:

    http://www.fuzzy.cz/tmp/gin/queries.txt

    All the queries have 6 common words, and the explain plans look
    just fine to me - exactly like the plans for other queries.

    Two things now caught my eye. First, some of these queries actually
    have words repeated - either exactly, like "database & database",
    or in negated form, like "!anything & anything".

    Second, while generating the queries, I use a "dumb" frequency,
    where only exact matches count, i.e. "write != written" etc. But
    the actual number of hits may be much higher - for example "write"
    matches exactly just 5% of documents, but using @@ it matches more
    than 20%.

    I don't know if that's the actual cause though.

regards
Tomas
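A rough illustration of the last two points above, not taken from the benchmark scripts: if the words in an AND query are treated as independent (the "multiplication of probabilities" assumption), an underestimated per-word frequency compounds with the number of words. The function name and the independence assumption are hypothetical; the 5% and 20% figures are the ones quoted for "write", with 20% used as a lower bound.

```python
def and_selectivity(word_freqs):
    """Estimated fraction of documents matching ALL words.

    Assumes the words occur independently, so the per-word
    document frequencies are simply multiplied together.
    """
    result = 1.0
    for freq in word_freqs:
        result *= freq
    return result


# Per-word document frequencies quoted in the message for "write":
# 5% with exact matching, more than 20% when matched via @@.
EXACT_FREQ = 0.05
ACTUAL_FREQ = 0.20

for n_words in (3, 6):
    estimated = and_selectivity([EXACT_FREQ] * n_words)
    actual = and_selectivity([ACTUAL_FREQ] * n_words)
    print(f"{n_words} words: estimated {estimated:.3g}, "
          f"actual {actual:.3g}, ratio {actual / estimated:.0f}x")
```

Under these assumptions a 4x error per word becomes 4^3 = 64x for 3-word queries and 4^6 = 4096x for 6-word queries, so queries classified as "rare" by exact frequency may match far more documents than expected. Real word occurrences are correlated, so this only indicates the direction of the effect.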