Re: [PATCHES] Resurrecting per-page cleaner for btree - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCHES] Resurrecting per-page cleaner for btree
Msg-id 23887.1153856278@sss.pgh.pa.us
In response to Resurrecting per-page cleaner for btree  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses Re: [PATCHES] Resurrecting per-page cleaner for btree  (Bruce Momjian <bruce@momjian.us>)
Re: Resurrecting per-page cleaner for btree  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Re: [PATCHES] Resurrecting per-page cleaner for btree  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> This is a revised patch originated by Junji TERAMOTO for HEAD.
>   [BTree vacuum before page splitting]
>   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> I think we can resurrect his idea because we will scan btree pages
> at-a-time now; the missing-restarting-point problem went away.

I've applied this but I'm now having some second thoughts about it,
because I'm seeing an actual *decrease* in pgbench numbers from the
immediately prior CVS HEAD code.  Using
    pgbench -i -s 10 bench
    pgbench -c 10 -t 1000 bench    (repeat this half a dozen times)
with fsync off but all other settings factory-stock, what I'm seeing
is that the first run looks really good but subsequent runs tail off in
spectacular fashion :-(  Pre-patch there was only minor degradation in
successive runs.

What I think is happening is that because pgbench depends so heavily on
updating existing records, we get into a state where an index page is
about full and there's one dead tuple on it, and then for each insertion
we have

    * check for uniqueness marks one more tuple dead (the
      next-to-last version of the tuple)
    * newly added code removes one tuple and does a write
    * now there's enough room to insert one tuple
    * lather, rinse, repeat, never splitting the page.

The problem is that we've traded splitting a page every few hundred
inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
record, on *every* insert.  This is not good.
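
The cycle above can be sketched as a toy model. This is only an illustration, not the patch's code: it assumes a single hot page holding 200 index tuples, assumes the superseded tuple version always sits on that same page, and counts one cleanup as one PageIndexMultiDelete plus one WAL record. The function name and parameters are made up for the sketch.

```python
def simulate(inserts, capacity=200, cleanup_enabled=True):
    """Toy model of one hot btree page under an update-heavy load.

    Returns (cleanups, splits). All numbers are illustrative assumptions.
    """
    used = capacity   # page starts out full
    dead = 0          # tuples currently marked dead on the page
    cleanups = 0      # each one stands for a multi-delete plus a WAL record
    splits = 0

    for _ in range(inserts):
        # The uniqueness check marks the previous version of the row dead.
        dead += 1
        if used >= capacity:
            if cleanup_enabled and dead > 0:
                # Per-page cleaner: remove whatever is dead, even one tuple.
                used -= dead
                dead = 0
                cleanups += 1
            else:
                # No cleaner: split, keeping roughly half the tuples here.
                used //= 2
                dead //= 2
                splits += 1
        used += 1     # the new tuple now fits

    return cleanups, splits


print(simulate(1000, cleanup_enabled=True))   # (1000, 0)
print(simulate(1000, cleanup_enabled=False))  # (0, 10)
```

Under these assumptions the cleaner runs on every one of the 1000 inserts and the page never splits, whereas without it the page splits about once per hundred inserts.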

Had you done any performance testing on this patch, and if so what
tests did you use?  I'm a bit hesitant to try to fix it on the basis
of pgbench results alone.

One possible fix that comes to mind is to only perform the cleanup
if we are able to remove more than one dead tuple (perhaps about 10
would be good).  Or do the deletion anyway, but then go ahead and
split the page unless X amount of space has been freed (where X is
more than just barely enough for the incoming tuple).
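
A rough sketch of the first idea, under the same toy assumptions (one hot page of 200 tuples, the superseded version always on that page). The threshold parameter `min_dead` and the function name are hypothetical, and the value 10 is just the figure floated above, not a tuned setting.

```python
def simulate_threshold(inserts, capacity=200, min_dead=10):
    """Toy model: clean a full page only if at least min_dead tuples are dead.

    Returns (cleanups, splits). Illustrative only; not the patch's logic.
    """
    used = capacity   # page starts out full
    dead = 0
    cleanups = 0      # each one stands for a multi-delete plus a WAL record
    splits = 0

    for _ in range(inserts):
        dead += 1     # uniqueness check marks the previous version dead
        if used >= capacity:
            if dead >= min_dead:
                # Enough garbage to make the delete and WAL record worthwhile.
                used -= dead
                dead = 0
                cleanups += 1
            else:
                # Too little to reclaim: split instead of cleaning.
                used //= 2
                dead //= 2
                splits += 1
        used += 1

    return cleanups, splits


print(simulate_threshold(1000, min_dead=1))   # (1000, 0): clean on every insert
print(simulate_threshold(1000, min_dead=10))  # (9, 1): one split, then batched cleanups
```

In this model a threshold of 10 forces one initial split, after which dead tuples pile up between cleanups and each cleanup reclaims about a hundred of them. Whether real workloads behave this way would need actual measurement.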

After all the thought we've put into this, it seems a shame to
just abandon it :-(.  But it definitely needs more tweaking.

            regards, tom lane
