Re: limiting hint bit I/O - Mailing list pgsql-hackers

From Robert Haas
Subject Re: limiting hint bit I/O
Date
Msg-id AANLkTinD8SAC5uQetkH230YoU1A_kL6GkgO6t4=Op4Pv@mail.gmail.com
In response to Re: limiting hint bit I/O  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: limiting hint bit I/O  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Tue, Jan 18, 2011 at 1:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>>>> I think you may be confused about what the patch does - currently,
>>>> pages with hint bit changes are considered dirty, period.
>>>> Therefore, they are written whenever any other dirty page would be
>>>> written: by the background writer cleaning scan, at checkpoints,
>>>> and when a backend must write a dirty buffer before reallocating it
>>>> to hold a different page. The patch keeps the first of these and
>>>> changes the second two
>
> While I was trying to performance-test the texteq patch, it occurred to
> me that this proposed hint-bit change has got a serious drawback.  To
> wit, that it will totally destroy reproducibility of any performance
> test that involves table scans.  Right now, you know that you can take
> hint bits out of the equation by doing a vacuum analyze and checkpoint;
> after that, all hint bits in the table are known to be set and written
> to disk.  Then you can get on with comparing the effects of some patch
> or other.  With the proposed patch, it will never be clear whether
> all the hint bits are set, because the patch specifically removes the
> deterministic ways to get a hint bit written out.  So you'll never be
> very sure whether a performance difference you think you see is real,
> or whether one case or the other got affected by extra clog lookups.
> It's hard enough already to be sure about performance changes on the
> order of 1%, but this will make it impossible.

True.  You could perhaps fix that by adding a GUC, but that feels
awfully like making it the user's problem to fix our broken
implementation.  Maybe we could live with it if the GUC were only
something developers ever needed to use, but I expect different people
would have different ideas about the correct setting in production.

If I understand the situation correctly, the problem with the
first sequential scan after a bulk load is that we're cycling through
a ring of buffers that all have hint-bit changes and therefore all
have to be written.  The first pass through the ring is OK, but after
that every new buffer we bring in requires evicting a buffer that we
first have to write.  Of course, with the patch, this bottleneck is
removed by skipping all those writes, but that now causes a second
problem: the pages only get written if the background writer happens
to notice them before the backend gets all the way around the ring,
and that's pretty hit-or-miss, so we basically dribble hint bits out
to disk here and there but the steady state never really converges to
"all hint bits on disk".

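A toy model may make the ring behaviour concrete.  The sketch below is
an illustration only, not PostgreSQL code: the function name, the
32-buffer ring, and the idea of modelling the background writer as a
fixed per-buffer "hit rate" are all assumptions made for the example.
It counts how many writes the scanning backend performs and how many
pages end up with hint bits on disk after one scan of a freshly loaded
table.

```python
import random


def scan_once(on_disk_hinted, ring_size=32, backend_writes=True,
              bgwriter_hit_rate=0.0, rng=None):
    """Scan the table once through a ring of buffers (toy model).

    on_disk_hinted: list of bool, one entry per page; True means the
        on-disk copy already has its hint bits set.  Mutated in place.
    backend_writes: True models current behaviour (the backend writes a
        hint-dirty buffer before reusing it); False models the patch
        (the backend reuses the buffer without writing it).
    bgwriter_hit_rate: assumed chance that the background writer has
        already written a hint-dirty buffer by the time it is reused.
    Returns the number of writes performed by the backend itself.
    Buffers still in the ring when the scan ends are ignored.
    """
    rng = rng or random.Random(0)
    ring = [None] * ring_size  # page number if the slot is hint-dirty
    backend_write_count = 0
    for page, hinted in enumerate(on_disk_hinted):
        slot = page % ring_size
        victim = ring[slot]
        if victim is not None:
            if rng.random() < bgwriter_hit_rate:
                on_disk_hinted[victim] = True   # bgwriter got there first
            elif backend_writes:
                on_disk_hinted[victim] = True   # backend must write it
                backend_write_count += 1
            # else: patched behaviour, hint bits for this page are lost
        # Reading a page without hint bits sets them and dirties the buffer.
        ring[slot] = None if hinted else page
    return backend_write_count
```

In this model, with backend_writes=True every buffer reused after the
first pass around the ring costs the backend a write.  With
backend_writes=False the backend writes nothing, and only the pages the
background writer happens to catch reach disk with their hint bits set.
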
Maybe we could work around this by making the algorithm a little more
sophisticated.  Instead of the rather unilateral policy "backends
don't write pages that are only dirty due to hint bit changes!" we
could have some more nuanced rules.  For example, we might decree that
a backend will maintain a counter of the number of pages it has
allocated that are either clean or dirty-only-for-hint-bits.  Once
that counter reaches 20, it writes that (or the next)
dirty-only-for-hint-bits page it encounters and resets the counter.
That way, the effort of hint
bit setting would be spread out over the first 20 table scans, and
after that you converge to steady state.  We could also possibly
special-case vacuum to always write dirty-only-for-hint-bits pages, on
the theory that the work is going to have to be done at some point,
and we're better off doing it during a maintenance task than
elsewhere.
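
The counter rule can be sketched as a small simulation.  This is a
minimal sketch under stated assumptions, not a proposed implementation:
the Backend class, its method names, and the simplification that each
page is evicted immediately after being read (no ring) are all
hypothetical.  The threshold of 20 is the one suggested above, and the
sketch assumes the counter is reset after each write.

```python
WRITE_EVERY = 20  # threshold from the message; the exact value is tunable


class Backend:
    """Toy backend that counts allocations of non-dirty buffers."""

    def __init__(self):
        self.allocations = 0  # clean or hint-only buffers since last write
        self.writes = 0       # hint-only pages this backend has written

    def evict(self, hint_dirty):
        """Reuse a buffer that is clean or dirty only for hint bits.

        Returns True if the page was written out, False if it was dropped.
        """
        self.allocations += 1
        if hint_dirty and self.allocations >= WRITE_EVERY:
            self.allocations = 0
            self.writes += 1
            return True
        return False


def scan(backend, on_disk_hinted):
    """One sequential scan; mutates on_disk_hinted, returns pages written."""
    before = backend.writes
    for page, hinted in enumerate(on_disk_hinted):
        # A page read without hint bits gets them set, dirtying the buffer.
        if backend.evict(hint_dirty=not hinted):
            on_disk_hinted[page] = True
    return backend.writes - before


def scans_until_converged(n_pages, max_scans=2000):
    """Count scans until every page has hint bits on disk (None if never)."""
    backend = Backend()
    table = [False] * n_pages
    for n in range(1, max_scans + 1):
        scan(backend, table)
        if all(table):
            return n
    return None
```

In this model a scan can write at most one page in every 20 it
touches, so the extra write cost per scan is bounded, and a freshly
loaded table needs at least 20 scans before all of its hint bits are
on disk.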

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

