Re: [PATCHES] Resurrecting per-page cleaner for btree - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [PATCHES] Resurrecting per-page cleaner for btree
Date	July 25, 2006 16:38:41
Msg-id	23887.1153856278@sss.pgh.pa.us
In response to	Resurrecting per-page cleaner for btree (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses	Re: [PATCHES] Resurrecting per-page cleaner for btree Re: Resurrecting per-page cleaner for btree Re: [PATCHES] Resurrecting per-page cleaner for btree
List	pgsql-hackers

ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> This is a revised patch originated by Junji TERAMOTO for HEAD.
>   [BTree vacuum before page splitting]
>   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> I think we can resurrect his idea because we will scan btree pages
> at-a-time now; the missing-restarting-point problem went away.

I've applied this but I'm now having some second thoughts about it,
because I'm seeing an actual *decrease* in pgbench numbers from the
immediately prior CVS HEAD code.  Using
    pgbench -i -s 10 bench
    pgbench -c 10 -t 1000 bench    (repeat this half a dozen times)
with fsync off but all other settings factory-stock, what I'm seeing
is that the first run looks really good but subsequent runs tail off in
spectacular fashion :-(  Pre-patch there was only minor degradation in
successive runs.

What I think is happening is that because pgbench depends so heavily on
updating existing records, we get into a state where an index page is
about full and there's one dead tuple on it, and then for each insertion
we have

    * check for uniqueness marks one more tuple dead (the
      next-to-last version of the tuple)
    * newly added code removes one tuple and does a write
    * now there's enough room to insert one tuple
    * lather, rinse, repeat, never splitting the page.

The problem is that we've traded splitting a page every few hundred
inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
record, on *every* insert.  This is not good.
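
The cycle described above can be illustrated with a toy model. This is only
a sketch, not PostgreSQL code: the function name, the page capacity of 200
tuples, the assumption that every insert lands on the same leaf page, and
the "a split leaves half the tuples behind" rule are all hypothetical
simplifications. Each counted cleanup stands for one PageIndexMultiDelete
plus one extra WAL record.

```python
def simulate_hot_page(inserts, capacity=200, cleanup_when_full=True):
    """Toy model of one hot btree leaf page under an update-heavy load.

    Returns (cleanups, splits). All numbers are illustrative only.
    """
    # Start in the state described in the message: the page is full and
    # nothing on it is known to be dead yet.
    live, dead = capacity, 0
    cleanups = splits = 0
    for _ in range(inserts):
        # The uniqueness check marks the next-to-last tuple version dead.
        live -= 1
        dead += 1
        if live + dead >= capacity:  # no room for the incoming tuple
            if cleanup_when_full:
                # Patched behaviour: remove the known-dead tuples.
                dead = 0
                cleanups += 1
            else:
                # Pre-patch behaviour: split, keeping half on this page.
                live, dead = live // 2, dead // 2
                splits += 1
        live += 1  # insert the new tuple version
    return cleanups, splits


if __name__ == "__main__":
    print("with cleanup:", simulate_hot_page(1000, cleanup_when_full=True))
    print("pre-patch:   ", simulate_hot_page(1000, cleanup_when_full=False))
```

In this model the patched page is cleaned on every single insert and never
splits, while the pre-patch page splits only once per hundred or so inserts.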

Had you done any performance testing on this patch, and if so what
tests did you use?  I'm a bit hesitant to try to fix it on the basis
of pgbench results alone.

One possible fix that comes to mind is to only perform the cleanup
if we are able to remove more than one dead tuple (perhaps about 10
would be good).  Or do the deletion anyway, but then go ahead and
split the page unless X amount of space has been freed (where X is
more than just barely enough for the incoming tuple).
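
Both ideas can be tried out in the same kind of toy model. Again this is a
sketch under stated assumptions, not a proposed implementation: the
function name, the parameters min_dead and min_freed (standing in for
"about 10" and "X"), the capacity of 200 tuples, and counting space in
whole tuples rather than bytes are all hypothetical.

```python
def simulate_with_threshold(inserts, capacity=200, min_dead=1, min_freed=1):
    """Toy model of one hot leaf page with a cleanup threshold.

    min_dead:  only clean up if at least this many tuples are known dead.
    min_freed: after cleanup, still split unless this many were removed.
    Returns (cleanups, splits). All numbers are illustrative only.
    """
    live, dead = capacity, 0  # full page, nothing known dead yet
    cleanups = splits = 0
    for _ in range(inserts):
        # The uniqueness check marks one older tuple version dead.
        live -= 1
        dead += 1
        if live + dead >= capacity:  # no room for the incoming tuple
            freed = 0
            if dead >= min_dead:
                freed, dead = dead, 0  # remove all known-dead tuples
                cleanups += 1
            if freed < min_freed:
                # Not enough space gained: split, keeping half here.
                live, dead = live // 2, dead // 2
                splits += 1
        live += 1  # insert the new tuple version
    return cleanups, splits


if __name__ == "__main__":
    print("no threshold:", simulate_with_threshold(1000))
    print("min_dead=10: ", simulate_with_threshold(1000, min_dead=10))
    print("min_freed=10:", simulate_with_threshold(1000, min_freed=10))
```

With either threshold the model splits once early on and afterwards cleans
up only when a large batch of dead tuples has accumulated, instead of on
every insert. Whether real workloads behave this way would need measuring.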

After all the thought we've put into this, it seems a shame to
just abandon it :-(.  But it definitely needs more tweaking.

            regards, tom lane