Thread: Visibility map and hint bits
There has been a lot of recent discussion about the visibility map (for
index-only scans) and hint bits (trying to avoid double-writing a
table).  I wonder if we could fix both of these at the same time.  Once
the visibility map is reliable, can we use that to avoid updating the
hint bits on all rows on a page?

For bulk loads, all the pages are going to have the same xid and all be
visible, so instead of writing the entire table, we just write the
visibility map.

I think the problem is that we have the PD_ALL_VISIBLE page flag, which
requires a write of the page as well.  Could we get by with only the
visibility bits and remove PD_ALL_VISIBLE?

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
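[For readers who want the mechanics: the visibility map stores one bit
per heap page, so locating the bit for a given heap block is pure
shift/mask arithmetic.  A minimal self-contained sketch, with names
modeled on (but not copied from) src/backend/access/heap/visibilitymap.c;
the header-size constant is an assumption:]

    /*
     * Sketch of visibility-map addressing: one bit per heap page.
     * Names are illustrative, modeled on the visibilitymap.c macros.
     */
    #include <stdint.h>

    #define BLCKSZ              8192              /* bytes per page */
    #define MAPSIZE             (BLCKSZ - 24)     /* usable VM bytes; 24-byte
                                                     page header is assumed */
    #define HEAPBLOCKS_PER_BYTE 8                 /* one bit per heap block */
    #define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

    /* Which VM page holds the bit for a given heap block? */
    static inline uint32_t heapblk_to_mapblock(uint32_t heapBlk)
    {
        return heapBlk / HEAPBLOCKS_PER_PAGE;
    }

    /* Which byte within that VM page? */
    static inline uint32_t heapblk_to_mapbyte(uint32_t heapBlk)
    {
        return (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    }

    /* Which bit within that byte? */
    static inline uint8_t heapblk_to_mapbit(uint32_t heapBlk)
    {
        return heapBlk % HEAPBLOCKS_PER_BYTE;
    }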
On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian <bruce@momjian.us> wrote:
> There has been a lot of recent discussion about the visibility map (for
> index-only scans) and hint bits (trying to avoid double-writing a
> table).

I still think a small tqual.c maintained cache of hint bits will
effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
fired a shot across the bow regarding the general worthiness of that
technique though (see:
http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
:(.  I can rig up a cleaned up version of the patch pretty
easily...it's a local change and fairly simple.

I don't think there is any way to remove the hint bits without
suffering some other problem.

merlin
Merlin Moncure wrote:
> On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > There has been a lot of recent discussion about the visibility map (for
> > index-only scans) and hint bits (trying to avoid double-writing a
> > table).
>
> I still think a small tqual.c maintained cache of hint bits will
> effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
> fired a shot across the bow regarding the general worthiness of that
> technique though (see:
> http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
> :(.  I can rig up a cleaned up version of the patch pretty
> easily...it's a local change and fairly simple.
>
> I don't think there is any way to remove the hint bits without
> suffering some other problem.

Was that the idea that the pages had to fit in the cache and be updated
with hint bits before being written to disk?  Restricting that to the
size of the buffer cache seemed very limiting.

One 8k visibility map page can hold bits for 1/2 gig of heap pages, so
I thought that would be a better all-visible indicator and avoid many
all-visible page writes in bulk-load cases.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
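[The arithmetic behind the "1/2 gig" figure checks out, assuming 8k
pages and one visibility bit per heap page, and ignoring the small VM
page header; a quick self-contained check:]

    /* Quick check of the "1/2 gig per VM page" figure (8k pages assumed). */
    #include <stdio.h>

    int main(void)
    {
        const long blcksz = 8192;                 /* bytes per page */
        const long bits   = blcksz * 8;           /* 65,536 heap pages covered
                                                     per VM page (header ignored) */
        const long bytes_covered = bits * blcksz; /* heap bytes covered */

        /* Prints: 65536 heap pages = 512 MB of heap */
        printf("%ld heap pages = %ld MB of heap\n",
               bits, bytes_covered / (1024 * 1024));
        return 0;
    }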
On Thu, May 5, 2011 at 12:59 PM, Bruce Momjian <bruce@momjian.us> wrote:
> I wonder if we could fix both of these at the same time.  Once the
> visibility map is reliable, can we use that to avoid updating the hint
> bits on all rows on a page?

I don't think so.  There are two problems:

1. If there is a long-running transaction on the system, it will not be
possible to set PD_ALL_VISIBLE, but hint bits can still be set.  So
there could be a significant performance regression if we don't set
hint bits in that case.

2. Making the visibility map crash-safe will mean that setting hint
bits has to emit XLOG records, so it can't be done on Hot Standby
servers at all, and it's much more expensive than just setting a hint
bit on the master.

> For bulk loads, all the pages are going to have the same xid and all be
> visible, so instead of writing the entire table, we just write the
> visibility map.
>
> I think the problem is that we have the PD_ALL_VISIBLE page flag, which
> requires a write of the page as well.  Could we get by with only the
> visibility bits and remove PD_ALL_VISIBLE?

In some ways, that would make things much simpler.  But to make that
work, every insert/update/delete to a page would have to pin the
visibility map page and clear PD_ALL_VISIBLE if appropriate, so it
might not be good from a performance standpoint, especially in
high-concurrency workloads.  Right now, if PD_ALL_VISIBLE isn't set, we
don't bother touching the visibility map page, which seems like a
possibly important optimization.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
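[The optimization Robert describes looks roughly like this in the
heap-modification paths.  A simplified sketch, not the actual backend
code: the page macros are real, but the visibilitymap_clear() signature
is abbreviated and the function name is illustrative:]

    /*
     * Simplified sketch: the page-level PD_ALL_VISIBLE flag lets heap
     * modifications skip the visibility map entirely for pages already
     * known not-all-visible -- the common case under heavy updates.
     */
    static void
    clear_all_visible_if_needed(Relation relation, Buffer buffer,
                                BlockNumber blkno)
    {
        Page page = BufferGetPage(buffer);

        if (PageIsAllVisible(page))
        {
            /* Rare case: pay for the VM page access and clear both bits. */
            PageClearAllVisible(page);
            visibilitymap_clear(relation, blkno);   /* signature simplified */
        }
        /* else: no VM page pin, no extra contention */
    }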
On Thu, May 5, 2011 at 1:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Merlin Moncure wrote:
>> On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > There has been a lot of recent discussion about the visibility map (for
>> > index-only scans) and hint bits (trying to avoid double-writing a
>> > table).
>>
>> I still think a small tqual.c maintained cache of hint bits will
>> effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
>> fired a shot across the bow regarding the general worthiness of that
>> technique though (see:
>> http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
>> :(.  I can rig up a cleaned up version of the patch pretty
>> easily...it's a local change and fairly simple.
>>
>> I don't think there is any way to remove the hint bits without
>> suffering some other problem.
>
> Was that the idea that the pages had to fit in the cache and be updated
> with hint bits before being written to disk?  Restricting that to the
> size of the buffer cache seemed very limiting.
>
> One 8k visibility map page can hold bits for 1/2 gig of heap pages, so
> I thought that would be a better all-visible indicator and avoid many
> all-visible page writes in bulk-load cases.

No, that was my first idea -- check visibility when you evict.  That
helps a different problem, but not bulk loads.  One way it could help
is for marking PD_ALL_VISIBLE.  This might also be a winner, but there
is some valid skepticism that adding more work for the bgwriter is
really a good idea.

The tqual cache idea is such that there is a small cache that remembers
the commit/cancel status of recently seen transactions.  If you scan a
tuple and its status is known via the cache, you set the bit but don't
mark the page dirty.  That way, if you are scanning a lot of unhinted
tuples with similar xids, you don't need to jam out i/o.

I think the general concept is clean, but it might need some buy-in
from Tom and some performance testing for justification.  The alternate
'cleaner' approach of maintaining a larger transam.c cache had some
downsides I saw no simple workaround for.

merlin
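[A minimal sketch of the kind of backend-local cache Merlin is
describing; all names and sizes here are illustrative, not taken from
his patch:]

    /*
     * Backend-local cache of recently seen transaction commit status.
     * Illustrative only: a tiny direct-mapped array, probed without any
     * lock, cheap enough to sit next to the hint bit check itself.
     */
    #include <stdint.h>
    #include <stdbool.h>

    typedef uint32_t TransactionId;     /* real xids are also 32 bits */

    #define XID_CACHE_SIZE 64           /* small: must stay cheap to probe */

    typedef struct XidStatusEntry
    {
        TransactionId xid;
        bool          committed;        /* known commit/cancel status */
    } XidStatusEntry;

    static XidStatusEntry xid_cache[XID_CACHE_SIZE];

    /* Probe: on a hit, set the hint bit but skip dirtying the page. */
    static inline bool
    xid_status_cached(TransactionId xid, bool *committed)
    {
        XidStatusEntry *e = &xid_cache[xid % XID_CACHE_SIZE];

        if (e->xid == xid)
        {
            *committed = e->committed;
            return true;
        }
        return false;
    }

    /* Remember a status once it is known good. */
    static inline void
    xid_status_remember(TransactionId xid, bool committed)
    {
        XidStatusEntry *e = &xid_cache[xid % XID_CACHE_SIZE];

        e->xid = xid;
        e->committed = committed;
    }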
Merlin Moncure <mmoncure@gmail.com> wrote:

> a small cache that remembers the commit/cancel status of recently
> seen transactions.

How is that different from the head of the clog SLRU?

-Kevin
On Thu, May 5, 2011 at 2:00 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> a small cache that remembers the commit/cancel status of recently
>> seen transactions.
>
> How is that different from the head of the clog SLRU?

Several things:

*) Any slru access requires a lock (besides the lock itself, you are
spending cycles in the critical path).

*) Cache access happens at a different stage of processing in
HeapTupleSatisfiesMVCC: both TransactionIdIsCurrentTransactionId and
TransactionIdIsInProgress have to be checked first.  Logically, it's an
extension of the hint bit check itself, not an expansion of the lower
levels of caching.

*) In tqual.c you can sneak in some small optimizations, like only
caching the bit if it's known good in the WAL (XLogNeedsFlush).  That
way you don't need to keep checking it over and over for the same
transaction.

*) slru-level accesses happen too late to give much benefit:

I can't stress enough how tight HeapTupleSatisfiesMVCC is.  On my
workstation VM, each non-inline function call shows up measurably in
profiling.  I think anything you do here has to be inline, hand-rolled,
and very tight (you can forget anything around dynahash).  Delegating
the cache management to the transam or (even worse) slru level
penalizes some workloads non-trivially.

merlin
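[To make that ordering concrete: a rough sketch of where such a probe
would sit in the xmin-visibility check, simplified from the real
HeapTupleSatisfiesMVCC and reusing the hypothetical cache helpers
sketched above.  Only the cache calls and the helper at the end are the
proposed additions; the other functions and flags are real:]

    /*
     * Simplified xmin check.  The cache probe runs only after the
     * current-transaction and in-progress checks, as an extension of
     * the hint bit test itself -- not a layer under transam/slru.
     */
    static bool
    tuple_xmin_committed(HeapTupleHeader tuple, TransactionId xmin)
    {
        bool committed;

        if (tuple->t_infomask & HEAP_XMIN_COMMITTED)
            return true;                    /* hint bit already set */

        if (TransactionIdIsCurrentTransactionId(xmin))
            return true;                    /* our own xact: no cache involved */

        if (TransactionIdIsInProgress(xmin))
            return false;                   /* still running: cannot cache */

        /* Hypothetical: probe the local cache before going to clog. */
        if (xid_status_cached(xmin, &committed))
        {
            if (committed)
                set_hint_bit_no_dirty(tuple);   /* illustrative helper */
            return committed;
        }

        /* Fall through to the real (locked, slru-backed) clog lookup. */
        committed = TransactionIdDidCommit(xmin);
        if (committed)
            xid_status_remember(xmin, true);
        return committed;
    }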
On Thu, May 5, 2011 at 2:20 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, May 5, 2011 at 2:00 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>>> a small cache that remembers the commit/cancel status of recently
>>> seen transactions.
>>
>> How is that different from the head of the clog SLRU?
>
> Several things:
>
> *) Any slru access requires a lock (besides the lock itself, you are
> spending cycles in the critical path).
>
> *) Cache access happens at a different stage of processing in
> HeapTupleSatisfiesMVCC: both TransactionIdIsCurrentTransactionId and
> TransactionIdIsInProgress have to be checked first.  Logically, it's an
> extension of the hint bit check itself, not an expansion of the lower
> levels of caching.
>
> *) In tqual.c you can sneak in some small optimizations, like only
> caching the bit if it's known good in the WAL (XLogNeedsFlush).  That
> way you don't need to keep checking it over and over for the same
> transaction.
>
> *) slru-level accesses happen too late to give much benefit.
>
> I can't stress enough how tight HeapTupleSatisfiesMVCC is.  On my
> workstation VM, each non-inline function call shows up measurably in
> profiling.  I think anything you do here has to be inline, hand-rolled,
> and very tight (you can forget anything around dynahash).  Delegating
> the cache management to the transam or (even worse) slru level
> penalizes some workloads non-trivially.

An updated patch is attached.  It's still WIP, but I need a little
guidance before going further.

What I did:

*) Added a lot of source-level comments that should explain better
what's happening and why.

*) Fixed a significant number of goofs in the earlier patch.

*) Reorganized the interaction with HeapTupleSatisfiesMVCC.  In
particular, SetHintBits() now returns whether it actually set the bit,
because I can use that information.

What's not done:

*) Only commit bits are cached, and caching action is only happening in
HeapTupleSatisfiesMVCC.  I'm not sure yet if it's better to store
invalid bits in the same cache or in a separate one.  I'm not sure if
the other satisfies routines should also be engaging the cache.
Translated from nerd speak, that means I haven't yet done the research
to see when they are fired and whether they are bottlenecks :-).

*) I'd like to reach some sort of consensus with Tom on whether there
is any point in going further in this direction -- not so much on the
mechanics of how the cache works, but on it being at the tqual.c level,
and on the changes to HeapTupleSatisfiesMVCC.  In particular, I think
caching at the transam.c level is a dead end on performance grounds,
regardless of how you implement the cache.

Some points of note:

*) Is it acceptable to use a static definition of memory like that?  If
not, should there be a more standard allocation under
CacheMemoryContext?

*) Testing for the benefit is simple: just create a bunch of records
and seqscan the table (select count(*)).  Without the patch the first
scan is slower and does a bunch of i/o.  With it, it does not.

*) The cache overhead is *almost* not measurable.  As best I can tell,
we are looking at maybe 1%-ish overhead in synthetic scan-heavy
workloads (I think this is a fair price to pay for all the i/o
savings).  The degenerate case of repeated 'rollups' is really
difficult to generate, even synthetically -- if the cache is performing
lousily, the regular hint bit action tends to protect it.  Performance
testing under real workloads is going to give better info here.

merlin
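[The SetHintBits() reorganization he describes might look roughly like
this.  A hedged sketch modeled loosely on the circa-9.1 tqual.c code,
not the actual patch; it reuses the hypothetical xid_status_remember()
from the earlier sketch:]

    /*
     * Sketch: SetHintBits() reports whether it actually set the bit, so
     * the caller knows whether the status is safe to cache.  Hinting is
     * skipped when the commit record might not yet be flushed to WAL.
     */
    static bool
    SetHintBits(HeapTupleHeader tuple, Buffer buffer,
                uint16 infomask, TransactionId xid)
    {
        if (TransactionIdIsValid(xid))
        {
            /* Don't hint unless the commit is safely on disk. */
            XLogRecPtr commitLSN = TransactionIdGetCommitLSN(xid);

            if (XLogNeedsFlush(commitLSN))
                return false;           /* caller must not cache either */
        }

        tuple->t_infomask |= infomask;
        SetBufferCommitInfoNeedsSave(buffer);
        return true;
    }

    /* Caller, inside HeapTupleSatisfiesMVCC: */
    if (SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED, xmin))
        xid_status_remember(xmin, true);    /* known-good: safe to cache */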