Re: Handling GIN incomplete splits - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Handling GIN incomplete splits
Date
Msg-id CAB7nPqRQ0y8U4Kkx5DVsNZfz91CBkfAhZcrV66tjyvP=VkVYvw@mail.gmail.com
In response to Handling GIN incomplete splits  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Handling GIN incomplete splits  (Michael Paquier <michael.paquier@gmail.com>)
Re: Handling GIN incomplete splits  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
Hi,

Here is a review of the first three patches:
1) Further gin refactoring:
make check passes (core tests and contrib tests).
Code compiles without warnings.
About the patch itself: even though I have little experience with the GIN code, moving the search-mode flag out of the btree struct and removing the PostingTreeScan logic really makes the code lighter and easier to follow.
Just wondering, why not also simplify ginTraverseLock:ginbtree.c at the same time, to something like this?
    if (!GinPageIsLeaf(page) || searchMode == TRUE)
        return access;

    /* we should relock our page */
    LockBuffer(buffer, GIN_UNLOCK);
    LockBuffer(buffer, GIN_EXCLUSIVE);

    /* But root can become non-leaf during relock */
    if (!GinPageIsLeaf(page))
    {
        /* restore old lock type (very rare) */
        LockBuffer(buffer, GIN_UNLOCK);
        LockBuffer(buffer, GIN_SHARE);
    }
    else
        access = GIN_EXCLUSIVE;
    return access;
Feel free to discard this suggestion, as I can imagine that changing such code would make back-branch maintenance more difficult and would increase conflicts with patches currently in development.
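As a standalone illustration of the relock-and-recheck pattern in the suggestion above, here is a minimal sketch that compiles outside the PostgreSQL tree. It is not PostgreSQL code: ToyBuffer, toy_lock_buffer and toy_traverse_lock are hypothetical stand-ins for the buffer, LockBuffer and ginTraverseLock, and the initial share lock plus the simulated concurrent split are assumptions made so the rare path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock modes, named after GIN_UNLOCK / GIN_SHARE / GIN_EXCLUSIVE. */
typedef enum { TOY_UNLOCK, TOY_SHARE, TOY_EXCLUSIVE } ToyLockMode;

/* Hypothetical stand-in for a buffer and the page it holds. */
typedef struct
{
    bool        is_leaf;              /* what GinPageIsLeaf(page) would report */
    ToyLockMode held;                 /* lock mode currently held */
    bool        split_while_unlocked; /* simulate a concurrent root split */
} ToyBuffer;

/*
 * Stand-in for LockBuffer(): records the requested mode, and lets the
 * simulated concurrent split happen in the window where no lock is held.
 */
static void
toy_lock_buffer(ToyBuffer *buf, ToyLockMode mode)
{
    assert(buf != NULL);
    if (mode == TOY_UNLOCK && buf->split_while_unlocked)
        buf->is_leaf = false;
    buf->held = mode;
}

/* Returns the lock mode finally held on the buffer. */
static ToyLockMode
toy_traverse_lock(ToyBuffer *buf, bool search_mode)
{
    ToyLockMode access = TOY_SHARE;

    /* Assumption: the caller starts by taking a share lock. */
    toy_lock_buffer(buf, TOY_SHARE);

    /* Early exit: internal pages and searches keep the share lock. */
    if (!buf->is_leaf || search_mode)
        return access;

    /* Inserting into a leaf: trade the share lock for an exclusive one. */
    toy_lock_buffer(buf, TOY_UNLOCK);
    toy_lock_buffer(buf, TOY_EXCLUSIVE);

    /* The page may have stopped being a leaf while it was unlocked. */
    if (!buf->is_leaf)
    {
        /* restore the old lock type (the rare path) */
        toy_lock_buffer(buf, TOY_UNLOCK);
        toy_lock_buffer(buf, TOY_SHARE);
    }
    else
        access = TOY_EXCLUSIVE;

    return access;
}
```

Calling toy_traverse_lock() on a leaf with search_mode set to false ends with an exclusive lock, unless split_while_unlocked is set, in which case the recheck falls back to the share lock. The early return keeps the common cases at a single nesting level, which is the point of the suggestion.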
2) Refactoring of internal gin btree (needs patch 1 applied first):
make check passes (core tests and contrib tests).
Code compiles without warnings.
Yep, removing ginPageGetLinkItup makes sense. Just to be picky, I would have put the arguments of the GinFormInteriorTuple call that replaces ginPageGetLinkItup on 3 separate lines, just for readability.
In dataPrepareDownlink:gindatapage.c, isn't it inconsistent with the older code not to use GinDataPageGetItemPointer or GinDataPageGetPostingItem, depending on whether lpage is a leaf page or not, to set btree->pitem.key?
In ginContinueSplit:ginxlog.c, could this code be removed? It looks like its deletion was forgotten:
    /*
     * elog(NOTICE,"ginContinueSplit root:%u l:%u r:%u",  split->rootBlkno,
     * split->leftBlkno, split->rightBlkno);
     */
Except for my doubt about dataPrepareDownlink (which comes from my lack of knowledge of the code), the patch looks good.
3) More refactoring (needs patches 1 and 2):
make check passes (core tests and contrib tests).
Code compiles without warnings.
Perhaps this patch would have been easier to read with context diffs :) It just moves code around, so there is nothing to say.

Then I ran a small test with all 3 patches applied. The test uses pg_trgm and loads the text of the book "Les Miserables":
=# CREATE TABLE les_miserables (num serial, line text);
CREATE TABLE
=# \copy les_miserables (line) FROM '~/Desktop/pg135.txt';
=# select count(*) from les_miserables;
 count
-------
 68116
(1 row)
=# CREATE INDEX les_miserables_idx ON les_miserables USING gin (line gin_trgm_ops);
CREATE INDEX

And here is the result of this query (average over 5 runs):
=# explain analyse SELECT * FROM les_miserables where line ~~ '%Cosette%';
Vanilla server: 5.289 ms
With patch 1 only: 5.283 ms
With patches 1+2: 5.232 ms
With patches 1+2+3: 5.232 ms
Based on that, there is no performance degradation.
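For anyone who wants the relative differences rather than the raw timings, the arithmetic can be sketched in a few lines of C. The figures are the ones listed above; the names relative_change_pct and worst_change_pct are hypothetical helpers made up for this sketch.

```c
#include <stddef.h>

/* Timings reported above, in milliseconds. */
static const double vanilla_ms = 5.289;
static const double patched_ms[] = { 5.283, 5.232, 5.232 };

/*
 * Relative change of a patched timing against the baseline, in percent.
 * Negative values mean the patched build was faster.
 */
static double
relative_change_pct(double baseline_ms, double candidate_ms)
{
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0;
}

/* Largest (least favourable) relative change across the patched builds. */
static double
worst_change_pct(void)
{
    size_t n = sizeof(patched_ms) / sizeof(patched_ms[0]);
    double worst = relative_change_pct(vanilla_ms, patched_ms[0]);

    for (size_t i = 1; i < n; i++)
    {
        double change = relative_change_pct(vanilla_ms, patched_ms[i]);

        if (change > worst)
            worst = change;
    }
    return worst;
}
```

With these numbers every change is negative: roughly 0.1% faster with patch 1 only and roughly 1.1% faster with patches 1+2 and 1+2+3. Differences that small are most likely within run-to-run noise, which fits the conclusion that nothing got slower.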

I just began reading the 4th patch. As it is a bit more complex and needs more testing, I'll provide feedback later.
Regards,

On Thu, Nov 14, 2013 at 1:49 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
Here's another part of my crusade against xlog cleanup routines. This series of patches gets rid of the gin_cleanup() function, which is currently used to finish splits of GIN b-tree pages, if the system crashes (or an error occurs) between splitting a page and inserting its downlink to the parent.

The first three patches just move code around. IMHO they make the code more readable, so they should be committed in any case. The meat is in the fourth patch.

Thoughts, objections?


--
Michael
