Re: [HACKERS] Resurrecting per-page cleaner for btree - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: [HACKERS] Resurrecting per-page cleaner for btree
Date
Msg-id 200607260138.k6Q1cLH16823@momjian.us
Whole thread Raw
In response to Re: Resurrecting per-page cleaner for btree  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Resurrecting per-page cleaner for btree  (Bruce Momjian <bruce@momjian.us>)
Re: Resurrecting per-page cleaner for btree  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Re: [HACKERS] Resurrecting per-page cleaner for btree  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
The attached patch requires the new row to fit, and 10% to be free on
the page.  Would someone test that?

---------------------------------------------------------------------------

Tom Lane wrote:
> ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> > This is a revised patch originated by Junji TERAMOTO for HEAD.
> >   [BTree vacuum before page splitting]
> >   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> > I think we can resurrect his idea because we will scan btree pages
> > at-atime now; the missing-restarting-point problem went away.
>
> I've applied this but I'm now having some second thoughts about it,
> because I'm seeing an actual *decrease* in pgbench numbers from the
> immediately prior CVS HEAD code.  Using
>     pgbench -i -s 10 bench
>     pgbench -c 10 -t 1000 bench    (repeat this half a dozen times)
> with fsync off but all other settings factory-stock, what I'm seeing
> is that the first run looks really good but subsequent runs tail off in
> spectacular fashion :-(  Pre-patch there was only minor degradation in
> successive runs.
>
> What I think is happening is that because pgbench depends so heavily on
> updating existing records, we get into a state where an index page is
> about full and there's one dead tuple on it, and then for each insertion
> we have
>
>     * check for uniqueness marks one more tuple dead (the
>       next-to-last version of the tuple)
>     * newly added code removes one tuple and does a write
>     * now there's enough room to insert one tuple
>     * lather, rinse, repeat, never splitting the page.
>
> The problem is that we've traded splitting a page every few hundred
> inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
> record, on *every* insert.  This is not good.
>
> Had you done any performance testing on this patch, and if so what
> tests did you use?  I'm a bit hesitant to try to fix it on the basis
> of pgbench results alone.
>
> One possible fix that comes to mind is to only perform the cleanup
> if we are able to remove more than one dead tuple (perhaps about 10
> would be good).  Or do the deletion anyway, but then go ahead and
> split the page unless X amount of space has been freed (where X is
> more than just barely enough for the incoming tuple).
>
> After all the thought we've put into this, it seems a shame to
> just abandon it :-(.  But it definitely needs more tweaking.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/access/nbtree/nbtinsert.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/nbtree/nbtinsert.c,v
retrieving revision 1.142
diff -c -c -r1.142 nbtinsert.c
*** src/backend/access/nbtree/nbtinsert.c    25 Jul 2006 19:13:00 -0000    1.142
--- src/backend/access/nbtree/nbtinsert.c    26 Jul 2006 01:35:52 -0000
***************
*** 438,445 ****
              if (P_ISLEAF(lpageop) && P_HAS_GARBAGE(lpageop))
              {
                  _bt_vacuum_one_page(rel, buf);
!                 if (PageGetFreeSpace(page) >= itemsz)
!                     break;        /* OK, now we have enough space */
              }

              /*
--- 438,451 ----
              if (P_ISLEAF(lpageop) && P_HAS_GARBAGE(lpageop))
              {
                  _bt_vacuum_one_page(rel, buf);
!                 /*
!                  *    Free space should be large enough for the new tuple and
!                  *    should be >= 10% because scanning the page over and
!                  *    over again to get just a little free space is inefficient.
!                  */
!                 if (PageGetFreeSpace(page) >= itemsz &&
!                     PageGetFreeSpace(page) >= BLCKSZ / 10)
!                     break;
              }

              /*

pgsql-patches by date:

Previous
From: "William ZHANG"
Date:
Subject: Re: Patch for VS.Net 2005's strxfrm() bug
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Resurrecting per-page cleaner for btree