Re: GIN improvements part 1: additional information - Mailing list pgsql-hackers

From: Alexander Korotkov
Subject: Re: GIN improvements part 1: additional information
Msg-id: CAPpHfdskbPbjWYJhkd-FkZCKEUZd03PRAw05NyT0Hd-jxWOyfg@mail.gmail.com
In response to: Re: GIN improvements part 1: additional information (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: GIN improvements part 1: additional information (Tomas Vondra <tv@fuzzy.cz>)
           Re: GIN improvements part 1: additional information (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-hackers

On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 8.1.2014 22:58, Alexander Korotkov wrote:
> > Thanks for reporting. Fixed version is attached.
>
> I've tried to rerun the 'archie' benchmark with the current patch, and
> once again I got
>
>    PANIC:  could not split GIN page, didn't fit
>
> I reran it with '--enable-cassert' and with that I got
>
> TRAP: FailedAssertion("!(ginCompareItemPointers(&items[i - 1],
>                    &items[i]) < 0)", File: "gindatapage.c", Line: 149)
> LOG:  server process (PID 5364) was terminated by signal 6: Aborted
> DETAIL:  Failed process was running: INSERT INTO messages ...
>
> so the assert in GinDataLeafPageGetUncompressed fails for some reason.
>
> I can easily reproduce it, but my knowledge in this area is rather
> limited so I'm not entirely sure what to look for.

I've fixed this bug and many other bugs. The patch now passes the test suite that I used earlier. The results are as follows:

Operation times:
         event         |     period      
-----------------------+-----------------
 index_build           | 00:01:47.53915
 index_build_recovery  | 00:00:04
 index_update          | 00:05:24.388163
 index_update_recovery | 00:00:53
 search_new            | 00:24:02.289384
 search_updated        | 00:27:09.193343
(6 rows)

Index sizes:
     label     |   size    
---------------+-----------
 new           | 384761856
 after_updates | 667942912
(2 rows)

I also made the following changes to the algorithms:
  • There is now a limit on the number of uncompressed TIDs in a page. Once this limit is reached, they are encoded regardless of whether they would still fit in the page. This seems to me the more desirable behaviour, and it also speeds up search somewhat. Before this change the times were as follows:
         event         |     period      
-----------------------+-----------------
 index_build           | 00:01:51.467888
 index_build_recovery  | 00:00:04
 index_update          | 00:05:03.315155
 index_update_recovery | 00:00:51
 search_new            | 00:24:43.194882
 search_updated        | 00:28:36.316784
(6 rows)
  • Pages are not fully re-encoded if it is enough to re-encode just the last segment.
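
The two changes above can be illustrated with a small model. This is only a sketch under stated assumptions, not the code from the patch: TIDs are modelled as plain integers, segments are assumed to hold varbyte-encoded deltas, and the names LeafPage, MAX_UNCOMPRESSED and SEGMENT_ITEMS (and their values) are hypothetical. The rule used here for deciding that only the last segment needs re-encoding (every pending TID sorts after the last encoded TID) is likewise an assumption.

```python
from typing import List

MAX_UNCOMPRESSED = 4  # hypothetical limit; the real value is not given in the post
SEGMENT_ITEMS = 8     # hypothetical number of TIDs per encoded segment


def encode_segment(tids: List[int]) -> bytes:
    """Varbyte-encode a sorted run of TIDs as deltas (the first delta is from 0)."""
    out = bytearray()
    prev = 0
    for tid in tids:
        delta = tid - prev
        prev = tid
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)  # low 7 bits plus continuation flag
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_segment(data: bytes) -> List[int]:
    """Inverse of encode_segment."""
    tids: List[int] = []
    value, shift, prev = 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            tids.append(prev)
            value, shift = 0, 0
    return tids


class LeafPage:
    """Toy leaf page: encoded segments plus a short list of uncompressed TIDs."""

    def __init__(self) -> None:
        self.segments: List[bytes] = []
        self.uncompressed: List[int] = []
        self.tail_reencodes = 0  # counters exist only to make the behaviour visible
        self.full_reencodes = 0

    def insert(self, tid: int) -> None:
        self.uncompressed.append(tid)
        # Encode once the limit is reached, whether or not the page has room left.
        if len(self.uncompressed) >= MAX_UNCOMPRESSED:
            self._encode_pending()

    def _decode_all(self) -> List[int]:
        return [tid for seg in self.segments for tid in decode_segment(seg)]

    def _encode_pending(self) -> None:
        pending = sorted(self.uncompressed)
        self.uncompressed = []
        if not self.segments:
            start, items = 0, pending
        else:
            last = decode_segment(self.segments[-1])
            if pending[0] > last[-1]:
                # Append-only case: earlier segments stay untouched.
                start, items = len(self.segments) - 1, last + pending
                self.tail_reencodes += 1
            else:
                # New TIDs fall inside existing data: re-encode the whole page.
                start, items = 0, sorted(self._decode_all() + pending)
                self.full_reencodes += 1
        self.segments[start:] = [
            encode_segment(items[i:i + SEGMENT_ITEMS])
            for i in range(0, len(items), SEGMENT_ITEMS)
        ]

    def all_tids(self) -> List[int]:
        return sorted(self._decode_all() + self.uncompressed)
```

In this model, inserting increasing TIDs only ever rewrites the last segment, while inserting a TID that sorts before already-encoded data forces a full re-encode; the two counters make that difference observable.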

The README is updated.

------
With best regards,
Alexander Korotkov.
