Thread: How much do the hint bits help?

How much do the hint bits help?

From

Merlin Moncure

Date:

21 December 2010, 18:42:47

I've been playing around with postgresql hint bits in order to teach
myself more about the internals of the MVCC system.  I noticed that
the hint bit system has been around forever (Vadim era) and predates
several backend improvements that might affect their usefulness.  So I
started playing around, trying to quantify the benefit they provide
with an eye toward optimizing clog lookups if it turned out to be
necessary, say by mmap-ing a big transaction status file just to see if
that helped.

Attached is an incomplete patch disabling hint bits based on compile
switch.  It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines.  However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around.  Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

So far, at least doing pgbench runs and another test designed to
exercise clog lookups, the performance loss of always doing full
lookup hasn't materialized.  Note that in these cases the clog lru
cache is pretty effective, and it's pretty likely I may have blown it
in some other way, so take the results with a grain of salt.  But,
here are the following questions/points:

*) relative to when the hint bits were implemented, the number of
transactions to map has shrunk, while hardware has improved by a
couple of orders of magnitude.  Also the postgres architecture has
changed considerably.  Are they still necessary?

*) what's a good way to stress the clog severely? I'd like to pick a
degenerate case to get a better idea of the way things stand without
them.

*) is there community interest in a full patch that fills in the
missing details not implemented here?

merlin

Re: How much do the hint bits help?

From

"Kevin Grittner"

Date:

21 December 2010, 19:42:04

Merlin Moncure <mmoncure@gmail.com> wrote:

> *) what's a good way to stress the clog severely? I'd like to pick
> a degenerate case to get a better idea of the way things stand
> without them.

The worst I can think of is a large database with a 90/10 mix of
reads to writes -- all short transactions.  Maybe someone else can
do better.  In particular, I'm not sure how savepoints might play
into a degenerate case.

Since we're always talking about how to do better with hint bits
during an unlogged bulk load, it would be interesting to benchmark
one of those followed by a `select count(*) from newtable;` with and
without the patch, on a data set too big to fit in RAM.

> *) is there community interest in a full patch that fills in the
> missing details not implemented here?

I'm certainly curious to see real numbers.

-Kevin

Re: How much do the hint bits help?

From

Mark Kirkwood

Date:

21 December 2010, 20:05:33

On 22/12/10 11:42, Merlin Moncure wrote:
> Attached is an incomplete patch disabling hint bits based on compile
> switch.  It's not complete, for example it's not reconciling some
> assumptions in heapam.c that hint bits have been set in various
> routines.  However, it mostly passes regression and I deemed it good
> enough to run some preliminary benchmarks and fool around.  Obviously,
> hint bits are an annoying impediment to a couple of other cool pending
> features, and it certainly would be nice to operate without them.
> Also, for particular workloads, the extra i/o hint bits can cause a
> fair amount of pain.

Looks like a great idea to test, however I don't seem to be able to 
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
src/include/pg_config_manual.h)

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing 
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in 
this function)
heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this 
function)
make[4]: *** [heapam.o] Error 1

Re: How much do the hint bits help?

From

Mark Kirkwood

Date:

21 December 2010, 20:06:46

On 22/12/10 13:05, Mark Kirkwood wrote:
> On 22/12/10 11:42, Merlin Moncure wrote:
>> Attached is an incomplete patch disabling hint bits based on compile
>> switch.  It's not complete, for example it's not reconciling some
>> assumptions in heapam.c that hint bits have been set in various
>> routines.  However, it mostly passes regression and I deemed it good
>> enough to run some preliminary benchmarks and fool around.  Obviously,
>> hint bits are an annoying impediment to a couple of other cool pending
>> features, and it certainly would be nice to operate without them.
>> Also, for particular workloads, the extra i/o hint bits can cause a
>> fair amount of pain.
>
> Looks like a great idea to test, however I don't seem to be able to 
> compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
> src/include/pg_config_manual.h)
>
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing 
> -fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o 
> heapam.c
> heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
> heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in 
> this function)
> heapam.c:3867: error: (Each undeclared identifier is reported only once
> heapam.c:3867: error: for each function it appears in.)
> heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in 
> this function)
> make[4]: *** [heapam.o] Error 1
>

Arrg, sorry - against git head on Ubuntu 10.04 (gcc 4.4.3)

Re: How much do the hint bits help?

From

Merlin Moncure

Date:

21 December 2010, 20:20:33

On Tue, Dec 21, 2010 at 7:06 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:
> On 22/12/10 13:05, Mark Kirkwood wrote:
>>
>> On 22/12/10 11:42, Merlin Moncure wrote:
>>>
>>> Attached is an incomplete patch disabling hint bits based on compile
>>> switch.  It's not complete, for example it's not reconciling some
>>> assumptions in heapam.c that hint bits have been set in various
>>> routines.  However, it mostly passes regression and I deemed it good
>>> enough to run some preliminary benchmarks and fool around.  Obviously,
>>> hint bits are an annoying impediment to a couple of other cool pending
>>> features, and it certainly would be nice to operate without them.
>>> Also, for particular workloads, the extra i/o hint bits can cause a
>>> fair amount of pain.
>>
>> Looks like a great idea to test, however I don't seem to be able to
>> compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
>> src/include/pg_config_manual.h)
>>
>> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
>> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
>> -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
>> heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
>> heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in this
>> function)
>> heapam.c:3867: error: (Each undeclared identifier is reported only once
>> heapam.c:3867: error: for each function it appears in.)
>> heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
>> function)
>> make[4]: *** [heapam.o] Error 1
>>
>
> Arrg, sorry - against git head on Ubuntu 10.04 (gcc 4.4.3)

did you check to see if the patch applied clean? btw I was working
against postgresql-9.0.1...

it looks like you are missing at least some of the changes to htup.h:

../postgresql-9.0.1_hb2/src/include/access/htup.h

#ifndef DISABLE_HINT_BITS
#define HEAP_XMIN_COMMITTED        0x0100    /* t_xmin committed */
#define HEAP_XMIN_INVALID        0x0200    /* t_xmin invalid/aborted */
#define HEAP_XMAX_COMMITTED        0x0400    /* t_xmax committed */
#define HEAP_XMAX_INVALID        0x0800    /* t_xmax invalid/aborted */
#endif

merlin

Re: How much do the hint bits help?

From

Merlin Moncure

Date:

21 December 2010, 20:23:58

On Tue, Dec 21, 2010 at 7:20 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Dec 21, 2010 at 7:06 PM, Mark Kirkwood
> <mark.kirkwood@catalyst.net.nz> wrote:
>> On 22/12/10 13:05, Mark Kirkwood wrote:
>>>
>>> On 22/12/10 11:42, Merlin Moncure wrote:
>>>>
>>>> Attached is an incomplete patch disabling hint bits based on compile
>>>> switch.  It's not complete, for example it's not reconciling some
>>>> assumptions in heapam.c that hint bits have been set in various
>>>> routines.  However, it mostly passes regression and I deemed it good
>>>> enough to run some preliminary benchmarks and fool around.  Obviously,
>>>> hint bits are an annoying impediment to a couple of other cool pending
>>>> features, and it certainly would be nice to operate without them.
>>>> Also, for particular workloads, the extra i/o hint bits can cause a
>>>> fair amount of pain.
>>>
>>> Looks like a great idea to test, however I don't seem to be able to
>>> compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
>>> src/include/pg_config_manual.h)
>>>
>>> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
>>> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
>>> -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
>>> heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
>>> heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in this
>>> function)
>>> heapam.c:3867: error: (Each undeclared identifier is reported only once
>>> heapam.c:3867: error: for each function it appears in.)
>>> heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
>>> function)
>>> make[4]: *** [heapam.o] Error 1
>>>
>>
>> Arrg, sorry - against git head on Ubuntu 10.04 (gcc 4.4.3)
>
> did you check to see if the patch applied clean? btw I was working
> against postgresql-9.0.1...

ah, this is the problem (9.0.1 vs head).  to work vs head it prob
needs a few more tweaks.  you can also try removing it yourself --
most of the changes follow a similar pattern.

merlin

Re: How much do the hint bits help?

From

Tom Lane

Date:

21 December 2010, 20:45:25

Merlin Moncure <mmoncure@gmail.com> writes:
> Attached is an incomplete patch disabling hint bits based on compile
> switch. ...
> So far, at least doing pgbench runs and another test designed to
> exercise clog lookups, the performance loss of always doing full
> lookup hasn't materialized.

The standard pgbench test would be just about 100% useless for stressing
this, because its net database activity is only about one row
touched/updated per query.  You need a test case that hits lots of rows
per query, else you're just measuring parse+plan+network overhead.
        regards, tom lane

Re: How much do the hint bits help?

From

Merlin Moncure

Date:

21 December 2010, 20:56:54

On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> Attached is an incomplete patch disabling hint bits based on compile
>> switch. ...
>> So far, at least doing pgbench runs and another test designed to
>> exercise clog lookups, the performance loss of always doing full
>> lookup hasn't materialized.
>
> The standard pgbench test would be just about 100% useless for stressing
> this, because its net database activity is only about one row
> touched/updated per query.  You need a test case that hits lots of rows
> per query, else you're just measuring parse+plan+network overhead.

right -- see the attached clog_stress.sql above.  It creates a script
that inserts records in blocks of 10000, deletes half of them, and
vacuums.  Neither the execution of the script nor a seq scan following
its execution showed an interesting performance difference (which I am
arbitrarily calling 5% in either direction).  Like I said though, I
don't trust the patch or the results yet.

@Mark: apparently the cvs server is behind git and there are some
recent changes to heapam.c that need more attention.  I need to get
git going on my box, but try changing this:

    if ((tuple->t_infomask & HEAP_XMIN_COMMITTED) ||
        (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
         !(tuple->t_infomask & HEAP_XMIN_INVALID) &&
         TransactionIdDidCommit(xmin)))

to this:

    if (TransactionIdDidCommit(xmin))

also, isn't the extra check vs HEAP_XMIN_COMMITTED redundant, and if
you do have to look up clog, why not set the hint bit?

merlin
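
On the redundancy question, a minimal sketch rather than PostgreSQL code: write a for "HEAP_XMIN_COMMITTED is set", b for "HEAP_XMIN_INVALID is set", and c for the result of TransactionIdDidCommit(xmin). The condition quoted above is then a || (!a && !b && c), and the !a term is already implied by the short-circuit of ||. The function names below are hypothetical and exist only to check all eight input combinations.

```c
#include <stdbool.h>

/* The condition as quoted from heapam.c, with
 *   a = HEAP_XMIN_COMMITTED set,
 *   b = HEAP_XMIN_INVALID set,
 *   c = result of TransactionIdDidCommit(xmin). */
bool original_test(bool a, bool b, bool c)
{
    return a || (!a && !b && c);
}

/* The same test with the redundant !a dropped. */
bool simplified_test(bool a, bool b, bool c)
{
    return a || (!b && c);
}

/* Counts the input combinations on which the two forms disagree. */
int count_mismatches(void)
{
    int mismatches = 0;

    for (int bits = 0; bits < 8; bits++) {
        bool a = (bits & 1) != 0;
        bool b = (bits & 2) != 0;
        bool c = (bits & 4) != 0;

        if (original_test(a, b, c) != simplified_test(a, b, c))
            mismatches++;
    }
    return mismatches;
}
```

count_mismatches() returns 0, so the extra HEAP_XMIN_COMMITTED check changes nothing. With the hint bits compiled out, a and b are always false and both forms collapse to c, which is the replacement suggested above.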

Re: How much do the hint bits help?

From

Mark Kirkwood

Date:

21 December 2010, 23:03:57

On 22/12/10 13:56, Merlin Moncure wrote:
> On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> @Mark: apparently the cvs server is behind git and there are some
> recent changes to heapam.c that need more attention.  I need to get
> git going on my box, but try changing this:
>
>     if ((tuple->t_infomask & HEAP_XMIN_COMMITTED) ||
>         (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
>          !(tuple->t_infomask & HEAP_XMIN_INVALID) &&
>          TransactionIdDidCommit(xmin)))
>
> to this:
>
>     if (TransactionIdDidCommit(xmin))
>
> also, isn't the extra check vs HEAP_XMIN_COMMITTED redundant, and if
> you do have to look up clog, why not set the hint bit?
>

That gets it compiling.

Re: How much do the hint bits help?

From

Heikki Linnakangas

Date:

22 December 2010, 03:21:02

On 22.12.2010 02:56, Merlin Moncure wrote:
> On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Merlin Moncure <mmoncure@gmail.com> writes:
>>> Attached is an incomplete patch disabling hint bits based on compile
>>> switch. ...
>>> So far, at least doing pgbench runs and another test designed to
>>> exercise clog lookups, the performance loss of always doing full
>>> lookup hasn't materialized.
>>
>> The standard pgbench test would be just about 100% useless for stressing
>> this, because its net database activity is only about one row
>> touched/updated per query.  You need a test case that hits lots of rows
>> per query, else you're just measuring parse+plan+network overhead.
>
> right -- see the attached clog_stress.sql above.  It creates a script
> that inserts records in blocks of 10000, deletes half of them, and
> vacuums.  Neither the execution of the script nor a seq scan following
> its execution showed an interesting performance difference (which I am
> arbitrarily calling 5% in either direction).  Like I said though, I
> don't trust the patch or the results yet.

Make sure you have a good mix of different xids in the table, 
TransactionLogFetch has a one-item cache so repeatedly checking the same 
xid is much faster than the general case.

Perhaps run pgbench for a while, and then do "SELECT COUNT(*)" on the 
resulting tables.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
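
To illustrate why the mix of xids matters, here is a rough sketch of a one-item cache sitting in front of an expensive status lookup. It is not the actual TransactionLogFetch code: every name below is hypothetical, and the "slow" lookup is a stand-in that only counts how often it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

static int slow_lookups = 0;   /* how often the "expensive" path ran */

/* Stand-in for a real clog read; here even xids count as committed. */
static bool slow_status_lookup(SketchXid xid)
{
    slow_lookups++;
    return (xid % 2) == 0;
}

/* One-item cache: remembers only the most recent xid and its status. */
static SketchXid cached_xid = 0;
static bool cached_valid = false;
static bool cached_status = false;

bool xid_did_commit(SketchXid xid)
{
    if (cached_valid && cached_xid == xid)
        return cached_status;              /* hit: no slow lookup */

    cached_status = slow_status_lookup(xid);
    cached_xid = xid;
    cached_valid = true;
    return cached_status;
}

/* Probe n times, cycling through `distinct` different xids, and
 * return how many slow lookups that cost. */
int count_slow_lookups(int n, int distinct)
{
    assert(distinct > 0);

    slow_lookups = 0;
    cached_valid = false;
    for (int i = 0; i < n; i++)
        (void) xid_did_commit((SketchXid) (100 + i % distinct));
    return slow_lookups;
}
```

In this sketch, 1000 probes of a single xid cost one slow lookup, while 1000 probes alternating between just two xids cost 1000. A table filled by a few large transactions therefore flatters the no-hint-bits case compared with one written by many small transactions.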

Re: How much do the hint bits help?

From

Simon Riggs

Date:

22 December 2010, 09:22:08

On Tue, 2010-12-21 at 17:42 -0500, Merlin Moncure wrote:

> *) is there community interest in a full patch that fills in the
> missing details not implemented here? 

Your thinking seems sound to me. We now have all-visible flags, fewer
xids, much better clog concurrency. Avoiding hint bits would also
noticeably reduce the number of dirty writes, especially at checkpoint.

Hot Standby already ignores hint bits and I've not heard a single
complaint, so we are already doing this in the code.

I don't see any reason to believe that there is not an equally effective
optimisation that we can apply to bring performance back up, if it is
shown to drop in particular use cases.

I would vote to put this into 9.1 as a non-default option at restart,
opening the door to other features which hint bits are frustrating.
People can then choose between those features and the "power of hint
bits". I think many people would choose db block checksums.

If you need support, or direct help with the code, just ask. Am happy to
be your committer also.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Re: How much do the hint bits help?

From

Heikki Linnakangas

Date:

22 December 2010, 09:30:36

On 22.12.2010 15:21, Simon Riggs wrote:
> On Tue, 2010-12-21 at 17:42 -0500, Merlin Moncure wrote:
>
>> *) is there community interest in a full patch that fills in the
>> missing details not implemented here?
>
> Your thinking seems sound to me. We now have all-visible flags, fewer
> xids, much better clog concurrency. Avoiding hint bits would also
> noticeably reduce the number of dirty writes, especially at checkpoint.

Yep.

> Hot Standby already ignores hint bits and I've not heard a single
> complaint, so we are already doing this in the code.

No, the XMIN/XMAX committed/invalid hint bits on each heap tuple are 
used during hot standby just like during normal operation. We ignore the
index tuples marked as dead during hot standby, but that's a different 
issue.

> I would vote to put this into 9.1 as a non-default option at restart,
> opening the door to other features which hint bits are frustrating.
> People can then choose between those features and the "power of hint
> bits". I think many people would choose db block checksums.

Making it optional would add some ifs in the critical paths, possibly 
making it slower.

My gut feeling is that a reasonable compromise is to set hint bits like 
we do today, but don't mark the page as dirty when only hint bits are 
set. That way you get the benefit of hint bits for tuples that are 
frequently accessed and stay in buffer cache. But you don't spend any 
extra I/O to set them. I'd really like to see a worst-case scenario 
benchmark of a patch that does that.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
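
The compromise described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not a patch: the struct and function names are hypothetical, the clog is a stand-in that reports every xid as committed, and in-progress transactions are ignored. The flag values are the ones from the htup.h excerpt quoted earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEAP_XMIN_COMMITTED 0x0100   /* t_xmin committed */
#define HEAP_XMIN_INVALID   0x0200   /* t_xmin invalid/aborted */

typedef struct {
    uint32_t xmin;
    uint16_t t_infomask;
} SketchTuple;

typedef struct {
    SketchTuple tuple;   /* one tuple is enough for the sketch */
    bool dirty;          /* would force a write at eviction or checkpoint */
} SketchBuffer;

static int clog_lookups = 0;

/* Stand-in for the clog: pretend every xid has committed. */
static bool clog_did_commit(uint32_t xid)
{
    (void) xid;
    clog_lookups++;
    return true;
}

/* mark_dirty = true models today's behaviour; false models the
 * proposed compromise (set the hint, but do not dirty the page). */
bool xmin_committed(SketchBuffer *buf, bool mark_dirty)
{
    SketchTuple *tup = &buf->tuple;

    if (tup->t_infomask & HEAP_XMIN_COMMITTED)
        return true;                 /* hint says committed */
    if (tup->t_infomask & HEAP_XMIN_INVALID)
        return false;                /* hint says aborted */

    bool committed = clog_did_commit(tup->xmin);

    /* Assumes the transaction has finished one way or the other. */
    tup->t_infomask |= committed ? HEAP_XMIN_COMMITTED : HEAP_XMIN_INVALID;
    if (mark_dirty)
        buf->dirty = true;
    return committed;
}

/* Run n checks against one cached tuple; report lookups and dirtiness. */
int lookups_for_repeated_checks(int n, bool mark_dirty, bool *dirty_out)
{
    SketchBuffer buf = { { 42, 0 }, false };

    clog_lookups = 0;
    for (int i = 0; i < n; i++)
        (void) xmin_committed(&buf, mark_dirty);
    *dirty_out = buf.dirty;
    return clog_lookups;
}
```

Either way, repeated checks of a tuple that stays in cache cost a single clog lookup. The only difference is whether the page ends up dirty, so the hint is lost on eviction in the compromise case and has to be recomputed after the next read from disk.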

Re: How much do the hint bits help?

From

Simon Riggs

Date:

22 December 2010, 09:59:38

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

> > I would vote to put this into 9.1 as a non-default option at restart,
> > opening the door to other features which hint bits are frustrating.
> > People can then choose between those features and the "power of hint
> > bits". I think many people would choose db block checksums.
> 
> Making it optional would add some ifs in the critical paths, possibly 
> making it slower.

Hardly. A server-start parameter is going to be constant during
execution and branch prediction will just snuff that away to nothing.

> My gut feeling is that a reasonable compromise is to set hint bits like 
> we do today, but don't mark the page as dirty when only hint bits are 
> set. That way you get the benefit of hint bits for tuples that are 
> frequently accessed and stay in buffer cache. But you don't spend any 
> extra I/O to set them. I'd really like to see a worst-case scenario 
> benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness. This isn't a discussion about hint
bits, it's a discussion about opening the way for other features.

ISTM there are other ways of optimising any clog issues that may remain,
so clutching to this ancient optimisation has no further benefit for me.

Merlin's idea seems to me to be original, useful *and* reasonable.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Re: How much do the hint bits help?

From

Heikki Linnakangas

Date:

22 December 2010, 10:22:21

On 22.12.2010 15:59, Simon Riggs wrote:
> On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:
>> My gut feeling is that a reasonable compromise is to set hint bits like
>> we do today, but don't mark the page as dirty when only hint bits are
>> set. That way you get the benefit of hint bits for tuples that are
>> frequently accessed and stay in buffer cache. But you don't spend any
>> extra I/O to set them. I'd really like to see a worst-case scenario
>> benchmark of a patch that does that.
>
> That sounds great, but still prevents block checksums and that is a very
> valuable feature for robustness.

It does? The problem with block checksums is that if you modify a page 
and don't have a corresponding WAL record for it, like a hint bit 
update, you can have a torn page so that the checksum doesn't match. 
Refraining from dirtying the page when a hint bit is updated avoids the 
problem. With that change, we only ever write pages to disk that have a 
WAL record associated with it, with full-page images as necessary to 
avoid torn pages.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: How much do the hint bits help?

From

Simon Riggs

Date:

22 December 2010, 10:53:09

On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 15:59, Simon Riggs wrote:
> > On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:
> >> My gut feeling is that a reasonable compromise is to set hint bits like
> >> we do today, but don't mark the page as dirty when only hint bits are
> >> set. That way you get the benefit of hint bits for tuples that are
> >> frequently accessed and stay in buffer cache. But you don't spend any
> >> extra I/O to set them. I'd really like to see a worst-case scenario
> >> benchmark of a patch that does that.
> >
> > That sounds great, but still prevents block checksums and that is a very
> > valuable feature for robustness.
> 
> It does? The problem with block checksums is that if you modify a page 
> and don't have a corresponding WAL record for it, like a hint bit 
> update, you can have a torn page so that the checksum doesn't match. 
> Refraining from dirtying the page when a hint bit is updated avoids the 
> problem. With that change, we only ever write pages to disk that have a 
> WAL record associated with it, with full-page images as necessary to 
> avoid torn pages.

Which then leads to a block CRC not matching the block in memory. Sure,
we can avoid CRC checking the hint bits, but that requires a much more
expensive and complex CRC check.

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

Postgres needs CRC checking more than it needs hint bits.

I think we should allow this as an option, and if it proves to be an
issue during beta then we can remove it before we go live, assuming we
cannot get a reasonable alternate optimisation.

I think it's important for Postgres to implement this in the same release
as sync rep. They complement each other: confirmed robustness. Exactly
the features we need to prove to the rest of the world to trust us with
their data.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Re: How much do the hint bits help?

From

Robert Haas

Date:

22 December 2010, 11:01:26

On Wed, Dec 22, 2010 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I think it's important for Postgres to implement this in the same release
> as sync rep.

i.e. never, at the rate sync rep has been progressing for the last few months?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: How much do the hint bits help?

From

Heikki Linnakangas

Date:

22 December 2010, 11:02:02

On 22.12.2010 16:52, Simon Riggs wrote:
> On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:
>> On 22.12.2010 15:59, Simon Riggs wrote:
>>> On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:
>>>> My gut feeling is that a reasonable compromise is to set hint bits like
>>>> we do today, but don't mark the page as dirty when only hint bits are
>>>> set. That way you get the benefit of hint bits for tuples that are
>>>> frequently accessed and stay in buffer cache. But you don't spend any
>>>> extra I/O to set them. I'd really like to see a worst-case scenario
>>>> benchmark of a patch that does that.
>>>
>>> That sounds great, but still prevents block checksums and that is a very
>>> valuable feature for robustness.
>>
>> It does? The problem with block checksums is that if you modify a page
>> and don't have a corresponding WAL record for it, like a hint bit
>> update, you can have a torn page so that the checksum doesn't match.
>> Refraining from dirtying the page when a hint bit is updated avoids the
>> problem. With that change, we only ever write pages to disk that have a
>> WAL record associated with it, with full-page images as necessary to
>> avoid torn pages.
>
> Which then leads to a block CRC not matching the block in memory.

What do you mean?

Do you envision that the CRC is calculated at every update, or only when 
a page is written out from the buffer cache? If the former, you could 
recalculate the CRC at a hint bit update too. If the latter, the hint 
bits are included in the page image that you checksum just like any 
other data.

> So what you suggest works only if we restrict CRC checking to blocks
> incoming to the buffer cache, but leaves us unable to do CRC checks on
> blocks once in the buffer cache. Since many blocks stay in cache almost
> constantly, we're left with the situation that the most heavily used
> parts of the database seldom get CRC checked.

There's plenty of stuff in memory that's not covered by an 
application-level CRC. That's what ECC RAM is for. Updating the CRC at 
every update to a page seems really expensive, but it's an orthogonal 
issue to hint bits.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: How much do the hint bits help?

From

Aidan Van Dyk

Date:

22 December 2010, 11:20:31

On Wed, Dec 22, 2010 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> So what you suggest works only if we restrict CRC checking to blocks
> incoming to the buffer cache, but leaves us unable to do CRC checks on
> blocks once in the buffer cache. Since many blocks stay in cache almost
> constantly, we're left with the situation that the most heavily used
> parts of the database seldom get CRC checked.

With this statement, you just moved the goal posts on the checksumming
ideas.  In fact, you didn't just move the goal posts, you picked the
ball up and teleported it to another stadium.

I believe that most of the people talking about and wanting checksums
so far have been wanting them to verify I/O, not to verify that PG has
no bugs, that RAM is staying charged correctly, and that no stray bits
have been flipped, and that nobody else happens to be scribbling over
our shared buffers.

Being able to arbitrarily (i.e. at any point in time) prove that the
shared buffers contents are exactly what they should be may be a
worthy goal, but that's many orders of magnitude more difficult than
verifying that the bytes we read from disk are the ones we wrote to
disk.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: How much do the hint bits help?

From

Simon Riggs

Date:

22 December 2010, 11:32:02

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 16:52, Simon Riggs wrote:
> > On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:
> >> On 22.12.2010 15:59, Simon Riggs wrote:
> >>> On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:
> >>>> My gut feeling is that a reasonable compromise is to set hint bits like
> >>>> we do today, but don't mark the page as dirty when only hint bits are
> >>>> set. That way you get the benefit of hint bits for tuples that are
> >>>> frequently accessed and stay in buffer cache. But you don't spend any
> >>>> extra I/O to set them. I'd really like to see a worst-case scenario
> >>>> benchmark of a patch that does that.
> >>>
> >>> That sounds great, but still prevents block checksums and that is a very
> >>> valuable feature for robustness.
> >>
> >> It does? The problem with block checksums is that if you modify a page
> >> and don't have a corresponding WAL record for it, like a hint bit
> >> update, you can have a torn page so that the checksum doesn't match.
> >> Refraining from dirtying the page when a hint bit is updated avoids the
> >> problem. With that change, we only ever write pages to disk that have a
> >> WAL record associated with it, with full-page images as necessary to
> >> avoid torn pages.
> >
> > Which then leads to a block CRC not matching the block in memory.

> Do you envision that the CRC is calculated at every update, or only when 
> a page is written out from the buffer cache? 

At every update, so there is a clear assertion that the CRC matches the
block.

> If the former, you could 
> recalculate the CRC at a hint bit update too. If the latter, the hint 
> bits are included in the page image that you checksum just like any 
> other data.

If we didn't have hint bits, we wouldn't need to recalculate the CRC
each time one was updated...

> > So what you suggest works only if we restrict CRC checking to blocks
> > incoming to the buffer cache, but leaves us unable to do CRC checks on
> > blocks once in the buffer cache. Since many blocks stay in cache almost
> > constantly, we're left with the situation that the most heavily used
> > parts of the database seldom get CRC checked.
> 
> There's plenty of stuff in memory that's not covered by an 
> application-level CRC. That's what ECC RAM is for. 

http://www.google.com/research/pubs/archive/35162.pdf

Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC.

If you have large RAM, like everybody now does, your incidence of this
type of error will be much higher than it was in previous years, so our
perception of what is necessary now to protect databases is out of date.

We have data under our care, and will be much more likely to receive
this kind of error because of the amount of RAM we use.

> Updating the CRC at 
> every update to a page seems really expensive, but it's an orthogonal 
> issue to hint bits.

Clearly, the frequency with which we set hint bits affects the frequency
we can sensibly update CRCs. It shouldn't be up to us to decide how much
protection a user wants to give their data.

There might be two or three settings that make sense, but clearly we
need to be able to limit hint-bit setting to allow us to have a usable
CRC check. So there is a very strong connection between turning this
optimisation off and gaining CRC checking as a feature.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Re: How much do the hint bits help?

From

Heikki Linnakangas

Date:

22 December 2010, 11:37:11

On 22.12.2010 17:31, Simon Riggs wrote:
> On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
>> Do you envision that the CRC is calculated at every update, or only when
>> a page is written out from the buffer cache?
>
> At every update, so there is a clear assertion that the CRC matches the
> block.

Umm, when do you check the CRC? Every time the page is locked? Every 
time it's updated? If you don't verify the CRC, what is it good for?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: How much do the hint bits help?

From

Heikki Linnakangas

Date:

22 December 2010, 11:42:36

On 22.12.2010 17:31, Simon Riggs wrote:
> On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
>> There's plenty of stuff in memory that's not covered by an
>> application-level CRC. That's what ECC RAM is for.
>
> http://www.google.com/research/pubs/archive/35162.pdf
>
> Google research shows that each DIMM has an 8% chance per annum of
> uncorrectable memory errors, even on ECC.

You misread that paper. From summary:

>   About a third of machines and over 8% of DIMMs in
> our fleet saw at least one *correctable* error per year.

Emphasis mine.

> Our per-DIMM rates of correctable errors translate to an average
> of 25,000–75,000 FIT (failures in time per billion hours of
> operation) per Mbit and a median FIT range of 778–25,000 per Mbit
> (median for DIMMs with errors), while previous studies report
> 200-5,000 FIT per Mbit. The number of correctable errors per DIMM
> is highly variable, with some DIMMs experiencing a huge number of
> errors, compared to others. The annual incidence of uncorrectable
> errors was 1.3% per machine and 0.22% per DIMM.

So the real figure of uncorrectable errors is 0.22% per DIMM.
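
For a sense of scale, the per-DIMM figure can be turned into a rough
per-machine figure. This is only a back-of-the-envelope sketch: it
assumes DIMMs fail independently, which the paper does not claim (it
reports 1.3% per machine directly), and the DIMM counts are arbitrary.

```python
# Chance that a machine sees at least one uncorrectable memory error in
# a year, given the per-DIMM rate quoted above. Independence between
# DIMMs is an assumption made here for illustration only.

PER_DIMM_ANNUAL_RATE = 0.0022  # 0.22% per DIMM per year, from the quoted summary


def machine_annual_rate(dimm_count: int,
                        per_dimm: float = PER_DIMM_ANNUAL_RATE) -> float:
    """P(at least one uncorrectable error per year) for dimm_count DIMMs."""
    return 1.0 - (1.0 - per_dimm) ** dimm_count


if __name__ == "__main__":
    for dimms in (1, 4, 8, 16):
        print(f"{dimms:2d} DIMMs: {machine_annual_rate(dimms):.2%} per year")
```

Under that assumption the rate grows roughly linearly with the number
of DIMMs while the per-DIMM rate stays small.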

Anyway, unreliable RAM calls for more ECC bits in DIMMs, not invasive 
architectural changes to every single application in the system.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: How much do the hint bits help?

From

Tom Lane

Date:

22 December 2010, 11:45:50

Aidan Van Dyk <aidan@highrise.ca> writes:
> With this statement, you just moved the goal posts on the checksumming
> ideas.  In fact, you didn't just move the goal posts, you picked the
> ball up and teleported it to another stadium.

What he said.  I can't imagine that anyone will be interested in any
case other than "set the CRC immediately before writing, and check it
upon first reading the page in".  Maintaining it continuously while the
page is in shared memory is completely insane from a cost-versus-benefit
perspective.
        regards, tom lane
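
A minimal sketch of the "set before write, check on first read" scheme
described above. The page layout here is hypothetical (a CRC in the
first 4 bytes of the block) and is not PostgreSQL's actual page
format; zlib's CRC-32 stands in for whatever checksum would really be
chosen.

```python
import struct
import zlib

PAGE_SIZE = 8192   # PostgreSQL's default block size
CRC_SIZE = 4       # hypothetical: CRC kept in the first 4 bytes of the page


def set_crc_before_write(page: bytearray) -> bytes:
    """Stamp the CRC of the page body into the page, just before I/O."""
    crc = zlib.crc32(bytes(page[CRC_SIZE:]))
    page[:CRC_SIZE] = struct.pack("<I", crc)
    return bytes(page)


def check_crc_on_read(raw: bytes) -> bytearray:
    """Verify the CRC once, when the page first enters the buffer cache."""
    (stored,) = struct.unpack("<I", raw[:CRC_SIZE])
    if zlib.crc32(raw[CRC_SIZE:]) != stored:
        raise ValueError("page CRC mismatch")
    return bytearray(raw)


if __name__ == "__main__":
    page = bytearray(PAGE_SIZE)
    page[100:105] = b"tuple"
    on_disk = set_crc_before_write(page)
    check_crc_on_read(on_disk)          # passes: nothing changed in transit
```

While the page sits in shared memory the CRC is simply stale; it is
only recomputed at the next write, which is what keeps the cost low.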

Re: How much do the hint bits help?

From

Simon Riggs

Date:

22 December 2010, 11:52:37

On Wed, 2010-12-22 at 10:45 -0500, Tom Lane wrote:
> Aidan Van Dyk <aidan@highrise.ca> writes:
> > With this statement, you just moved the goal posts on the checksumming
> > ideas.  In fact, you didn't just move the goal posts, you picked the
> > ball up and teleported it to another stadium.
> 
> What he said.  I can't imagine that anyone will be interested in any
> case other than "set the CRC immediately before writing, and check it
> upon first reading the page in".  Maintaining it continuously while the
> page is in shared memory is completely insane from a cost-versus-benefit
> perspective.

If you insist on setting hint-bits, then that is probably true.

Many people experience almost no I/O these days, and there's a strong
correlation between people caring about their data and also being
willing to spend big $s on cache. We need to protect our users, however
much money they spent on cache; I would argue the more money they spent
on cache the harder we should be trying to protect them.

I'm sure it will take a little while for everybody to understand why a
full CRC implementation is both necessary and now possible. Paradigm
shifts of thought do seem like teleports, but they can be beneficial.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Re: How much do the hint bits help?

From

Aidan Van Dyk

Date:

22 December 2010, 11:55:31

On Wed, Dec 22, 2010 at 10:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> I'm sure it will take a little while for everybody to understand why a
> full CRC implementation is both necessary and now possible. Paradigm
> shifts of thought do seem like teleports, but they can be beneficial.

But please don't deny the rest of us airbags while you keep working on
teleportation ;-)

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: How much do the hint bits help?

From

Tom Lane

Date:

22 December 2010, 12:00:19

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> My gut feeling is that a reasonable compromise is to set hint bits like 
> we do today, but don't mark the page as dirty when only hint bits are 
> set. That way you get the benefit of hint bits for tuples that are 
> frequently accessed and stay in buffer cache. But you don't spend any 
> extra I/O to set them.

I think it's far more likely that that could be acceptable than the
radical method of removing hint bits altogether.

I have not looked into what's wrong with Merlin's test case, but my
thinking about it goes like this: we know that contention for buffer
lookup is significant at high loads, despite the facts that the accesses
are distributed across a lot of independently-usable buffers and we've
done much work to partition the lookup locks.  If we remove hint bits
and thereby force an access to clog for every tuple touch, we can expect
that the contention for clog access will be comparable to the worst case
for buffer access contention ... except that in many cases, it will be
distributed across far fewer pages and so the actual interference rate
will be far higher.  This will make our past experiences with "context
swap storms" look like a day at the beach.
        regards, tom lane
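
The trade-off can be illustrated with a toy model, not backend code:
a hint bit is treated as a per-tuple cache of the clog answer, and the
quoted compromise corresponds to dirty_on_hint=False. All class and
function names below are made up for the sketch, and in-progress
transactions are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

COMMITTED = "committed"
ABORTED = "aborted"


@dataclass
class HeapTuple:
    xmin: int
    xmin_hint: Optional[str] = None   # cached clog answer, if any


@dataclass
class Page:
    tuples: List[HeapTuple]
    dirty: bool = False


class Clog:
    """Stand-in for the transaction status log; counts every lookup."""

    def __init__(self, status: Dict[int, str]) -> None:
        self.status = status
        self.lookups = 0

    def fetch(self, xid: int) -> str:
        self.lookups += 1
        return self.status[xid]


def xmin_committed(page: Page, tup: HeapTuple, clog: Clog,
                   dirty_on_hint: bool = False) -> bool:
    """Consult clog only when the tuple carries no hint yet."""
    if tup.xmin_hint is None:
        tup.xmin_hint = clog.fetch(tup.xmin)
        if dirty_on_hint:
            page.dirty = True   # hint-only change forces a later write
    return tup.xmin_hint == COMMITTED


def scan(page: Page, clog: Clog, dirty_on_hint: bool = False) -> int:
    """Count tuples whose inserting transaction committed."""
    return sum(xmin_committed(page, t, clog, dirty_on_hint)
               for t in page.tuples)
```

In this model the second scan of a page costs no clog lookups at all,
and with dirty_on_hint=False it also costs no extra write. Without any
hints, every scan would call fetch() once per tuple.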

Re: How much do the hint bits help?

From

Simon Riggs

Date:

22 December 2010, 12:01:12

On Wed, 2010-12-22 at 17:42 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 17:31, Simon Riggs wrote:
> > On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
> >> There's plenty of stuff in memory that's not covered by an
> >> application-level CRC. That's what ECC RAM is for.
> >
> > http://www.google.com/research/pubs/archive/35162.pdf
> >
> > Google research shows that each DIMM has an 8% chance per annum of
> > uncorrectable memory errors, even on ECC.
> 
> You misread that paper. From summary:

I read the paper in detail before I posted. If you think that finding an
error in my quote disproves anything, you should read the whole paper. I
see this:

Conclusion 1
"... Nonetheless, the remaining incidence of 0.22% per DIMM
per year makes a crash-tolerant application layer indispensable
for large-scale server farms."

What you are arguing for is a protection system that will reduce in
effectiveness as we add more cache.

What I am arguing in favour of is an option to allow people to protect
their data, whatever the size of their cache. I'm not forcing you or
anyone to use it, but I think it's an important option to be offering to
our users. 

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Re: How much do the hint bits help?

From

Merlin Moncure

Date:

22 December 2010, 12:01:18

On Wed, Dec 22, 2010 at 10:55 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> On Wed, Dec 22, 2010 at 10:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> I'm sure it will take a little while for everybody to understand why a
>> full CRC implementation is both necessary and now possible. Paradigm
>> shifts of thought do seem like teleports, but they can be beneficial.
>
> But please don't deny the rest of us airbags while you keep working on
> teleportation ;-)

well, simon's point that hint bits complicate checksum may or may not
be the case, but no hint bits = less i/o = less checksumming (unless
you checksum around the hint bits).  This lowers the expense of doing
it, which is nice.  Maybe that doesn't matter in the end, we'll see.

merlin

Re: How much do the hint bits help?

From

Tom Lane

Date:

22 December 2010, 12:06:35

Merlin Moncure <mmoncure@gmail.com> writes:
> well, simon's point that hint bits complicate checksum may nor may not
> be the case, but no hint bits = less i/o = less checksumming (unless
> you checksum around the hint bits).

I think you're optimistically assuming the extra clog accesses don't
cost any I/O.
        regards, tom lane

Re: How much do the hint bits help?

From

Merlin Moncure

Date:

22 December 2010, 12:06:55

On Wed, Dec 22, 2010 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> My gut feeling is that a reasonable compromise is to set hint bits like
>> we do today, but don't mark the page as dirty when only hint bits are
>> set. That way you get the benefit of hint bits for tuples that are
>> frequently accessed and stay in buffer cache. But you don't spend any
>> extra I/O to set them.
>
> I think it's far more likely that that could be acceptable than the
> radical method of removing hint bits altogether.
>
> I have not looked into what's wrong with Merlin's test case, but my
> thinking about it goes like this: we know that contention for buffer
> lookup is significant at high loads, despite the facts that the accesses
> are distributed across a lot of independently-usable buffers and we've
> done much work to partition the lookup locks.  If we remove hint bits
> and thereby force an access to clog for every tuple touch, we can expect
> that the contention for clog access will be comparable to the worst case
> for buffer access contention ... except that in many cases, it will be
> distributed across far fewer pages and so the actual interference rate
> will be far higher.  This will make our past experiences with "context
> swap storms" look like a day at the beach.

right.  note I'm not suggesting that they should actually be removed,
at least not yet.  I was just playing around and noticed that the cost
of not having them is not immediately obvious in highly synthetic
tests.  The cost of clog access in best case scenario appears to be
near zero, which I thought was interesting enough to point out.  What
I'm after here is the worst case scenario, how likely it is to happen,
and looking into possible remedies (if any).

I'm going to do lots more testing over the holidays.  I'm fishing for
ideas on good ways to flesh things out more.

merlin

Re: How much do the hint bits help?

From

Merlin Moncure

Date:

22 December 2010, 12:12:14

On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> well, simon's point that hint bits complicate checksum may nor may not
>> be the case, but no hint bits = less i/o = less checksumming (unless
>> you checksum around the hint bits).
>
> I think you're optimistically assuming the extra clog accesses don't
> cost any I/O.

right, but clog is much more highly packed which is both a good and a
bad thing.  my conjecture here is that jamming the clog files is
actually good, because that keeps them 'hot' and more than compensates
the extra heap i/o.  the extra lock of course is scary.

here's the thing, compared to the 90's when they were put in, the
transaction space has shrunk by half and we put gigabytes, not
megabytes of memory into servers.  what does this mean for the clog?
that's what i'm after.

merlin
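
For reference, the arithmetic behind "highly packed" can be sketched
as follows. It assumes 2 status bits per transaction and 8 kB pages,
and it reads "shrunk by half" as 2^31 of the 2^32 XIDs being usable at
any one time; both readings are assumptions of this sketch.

```python
# Rough size of the transaction status log under the assumptions above.

CLOG_BITS_PER_XACT = 2
BLOCK_SIZE = 8192
XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT        # 4 transactions per byte
XACTS_PER_PAGE = BLOCK_SIZE * XACTS_PER_BYTE    # 32768 transactions per page


def clog_bytes(xid_count: int) -> int:
    """Bytes needed to hold status for xid_count transactions (rounded up)."""
    return -(-xid_count // XACTS_PER_BYTE)


def clog_page_of(xid: int) -> int:
    """Which clog page holds the status bits for this XID."""
    return xid // XACTS_PER_PAGE


if __name__ == "__main__":
    full = clog_bytes(2 ** 31)
    print(f"status for 2^31 XIDs: {full // 2 ** 20} MiB")
    print(f"one page covers {XACTS_PER_PAGE} transactions")
```

So even the whole usable transaction space fits in about half a
gigabyte, which is small next to the RAM in current servers.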

Re: How much do the hint bits help?

From

Merlin Moncure

Date:

22 December 2010, 12:14:11

On Wed, Dec 22, 2010 at 11:12 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Merlin Moncure <mmoncure@gmail.com> writes:
>>> well, simon's point that hint bits complicate checksum may nor may not
>>> be the case, but no hint bits = less i/o = less checksumming (unless
>>> you checksum around the hint bits).
>>
>> I think you're optimistically assuming the extra clog accesses don't
>> cost any I/O.
>
> right, but clog is much more highly packed which is both a good and a
> bad thing.  my conjecture here is that jamming the clog files is
> actually good, because that keeps them 'hot' and more than compensates
> the extra heap i/o.  the extra lock of course is scary.

er, should have said, plus less heap i/o compensates the extra clog i/o.

merlin

Re: How much do the hint bits help?

From

Simon Riggs

Date:

22 December 2010, 12:24:08

On Wed, 2010-12-22 at 10:59 -0500, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > My gut feeling is that a reasonable compromise is to set hint bits like 
> > we do today, but don't mark the page as dirty when only hint bits are 
> > set. That way you get the benefit of hint bits for tuples that are 
> > frequently accessed and stay in buffer cache. But you don't spend any 
> > extra I/O to set them.
> 
> I think it's far more likely that that could be acceptable than the
> radical method of removing hint bits altogether.

I haven't argued to remove them, just have an option to not set them.

> I have not looked into what's wrong with Merlin's test case, but my
> thinking about it goes like this: we know that contention for buffer
> lookup is significant at high loads, despite the facts that the accesses
> are distributed across a lot of independently-usable buffers and we've
> done much work to partition the lookup locks.  If we remove hint bits
> and thereby force an access to clog for every tuple touch, we can expect
> that the contention for clog access will be comparable to the worst case
> for buffer access contention ... except that in many cases, it will be
> distributed across far fewer pages and so the actual interference rate
> will be far higher.  This will make our past experiences with "context
> swap storms" look like a day at the beach.

I think you're right, but I also think there are other ways we could
optimise that other than hint bits. 

For example, the single item cache might be changed, or we might
buffer/batch clog updates, or we might use a hash table of known aborted
transactions etc.

As Merlin points out, we don't have much evidence for their value or
lack of value, so we need a parameter to allow wide scale testing.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Re: How much do the hint bits help?

From

Tom Lane

Date:

22 December 2010, 12:43:09

Merlin Moncure <mmoncure@gmail.com> writes:
> I'm going to do lots more testing over the holidays.  I'm fishing for
> ideas on good ways to flesh things out more.

Based on the analogy to past bufmgr contention problems, I'd suggest
going back through the archives to look for the test cases associated
with context swap storm discussions.  The cases themselves might not
be quite right for this, but they'd at least show a structure for
stressing things at the tuple-access level.
        regards, tom lane

Re: How much do the hint bits help?

From

David Fetter

Date:

22 December 2010, 12:53:55

On Wed, Dec 22, 2010 at 04:00:30PM +0000, Simon Riggs wrote:
> On Wed, 2010-12-22 at 17:42 +0200, Heikki Linnakangas wrote:
> > On 22.12.2010 17:31, Simon Riggs wrote:
> > > On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
> > >> There's plenty of stuff in memory that's not covered by an
> > >> application-level CRC. That's what ECC RAM is for.
> > >
> > > http://www.google.com/research/pubs/archive/35162.pdf
> > >
> > > Google research shows that each DIMM has an 8% chance per annum of
> > > uncorrectable memory errors, even on ECC.
> > 
> > You misread that paper. From summary:
> 
> I read the paper in detail before I posted. If you think that finding an
> error in my quote disproves anything, you should read the whole paper. I
> see this:
> 
> Conclusion 1
> "... Nonetheless, the remaining incidence of 0.22% per DIMM
> per year makes a crash-tolerant application layer indispens-
> able for large-scale server farms."
> 
> What you are arguing for is a protection system that will reduce in
> effectiveness as we add more cache.
> 
> What I am arguing in favour of is an option to allow people to protect
> their data, whatever the size of their cache. I'm not forcing you or
> anyone to use it, but I think its an important option to be offering to
> our users. 

For what version of PostgreSQL are you proposing that we provide this
protection?  Let's assume that it's before 10.0 so we can get some
idea of how this will arise :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: How much do the hint bits help?

From

Heikki Linnakangas

Date:

22 December 2010, 16:08:14

On 22.12.2010 18:12, Merlin Moncure wrote:
> On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Merlin Moncure<mmoncure@gmail.com>  writes:
>>> well, simon's point that hint bits complicate checksum may nor may not
>>> be the case, but no hint bits = less i/o = less checksumming (unless
>>> you checksum around the hint bits).
>>
>> I think you're optimistically assuming the extra clog accesses don't
>> cost any I/O.
>
> right, but clog is much more highly packed which is both a good and a
> bad thing.

As a sidenote: note that the clog is not currently CRC'd.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: How much do the hint bits help?

From

Josh Berkus

Date:

22 December 2010, 17:18:49

> right -- see the attached clog_stress.sql above.  It creates a script
> that inserts records in blocks of 10000, deletes half of them, and
> vacuums.  Neither the execution of the script nor a seq scan following
> its execution showed an interesting performance difference (which I am
> arbitrarily calling 5% in either direction).  Like I said though, I
> don't trust the patch or the results yet.

Given that DBT2 stressed the bufmgr contention pretty well, it seems
like it'd be worth trying this for hint bits in the test servers.  We
should see if Mark Wong can do this in the new year.

I might be able to test on some client workloads.  We'll see; currently
I lack the harness to simulate a high level of client contention.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

Re: How much do the hint bits help?

From

Mark Kirkwood

Date:

22 December 2010, 17:21:42

On 23/12/10 05:06, Merlin Moncure wrote:
> On Wed, Dec 22, 2010 at 10:59 AM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>>> My gut feeling is that a reasonable compromise is to set hint bits like
>>> we do today, but don't mark the page as dirty when only hint bits are
>>> set. That way you get the benefit of hint bits for tuples that are
>>> frequently accessed and stay in buffer cache. But you don't spend any
>>> extra I/O to set them.
>> I think it's far more likely that that could be acceptable than the
>> radical method of removing hint bits altogether.
>>
>> I have not looked into what's wrong with Merlin's test case, but my
>> thinking about it goes like this: we know that contention for buffer
>> lookup is significant at high loads, despite the facts that the accesses
>> are distributed across a lot of independently-usable buffers and we've
>> done much work to partition the lookup locks.  If we remove hint bits
>> and thereby force an access to clog for every tuple touch, we can expect
>> that the contention for clog access will be comparable to the worst case
>> for buffer access contention ... except that in many cases, it will be
>> distributed across far fewer pages and so the actual interference rate
>> will be far higher.  This will make our past experiences with "context
>> swap storms" look like a day at the beach.
> right.  note I'm not suggesting they they should actually be removed,
> at least not yet.  I was just playing around and noticed that the cost
> of not having them is not immediately obvious in highly synthetic
> tests.  The cost of clog access in best case scenario appears to be
> near zero, which I thought was interesting enough to point out.  What
> I'm after here is the worst case scenario, how likely it is to happen,
> and looking into possible remedies (if any).
>
> I'm going to do lots more testing over the holidays.  I'm fishing for
> ideas on good ways to flesh things out more.
>
>

Certainly having a choice about configuring them would be a good
addition in itself, e.g. for data warehousing use the hint bits can be a
considerable impediment, so the *ability* to not have them would be a
huge advantage.

If I have time over the early new year I'll do some testing too.

Cheers

Mark

Re: CRC checks WAS: How much do the hint bits help?

From

Josh Berkus

Date:

22 December 2010, 17:31:47

> I believe that most of the people talking about and wanting checksums
> so far have been wanting them to verify I/O, not to verify that PG has
> no bugs, that RAM is staying charged correctly, and that no stray bits
> have been flipped, and that nobody else happens to be scribbling over
> our shared buffers.

I agree that this should be our first goal.  Yes, we want to protect
users against memory errors as well.  However, that's a much tougher
feature to implement; I've done some hashing out of this with engineers on
other DBMSes and nobody has good answers right now.  The overhead of
what Simon proposes would be enormous, and few users would be interested
in paying that cost.

Doing a CRC check-on-write, as well as checking for format corruption
before write would catch a majority of real-world problems.  Please
don't hold that up in pursuit of the bit-flipping problem, which
*nobody* has solved.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

Re: How much do the hint bits help?

From

Josh Berkus

Date:

22 December 2010, 17:33:26

> Certainly having a choice about configuring them would be a good
> addition in itself, e.g  for data warehousing use the hint bits can be a
> considerable impediment so the *ability* to not have them would be a
> huge advantage.

Would need to be a restart option, no?

Regarding the contention which Tom expects: the extra load on the CLOG
would be 100% reads, no?  If it's *all* reads, why would we have any
more contention than we have now?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

Re: How much do the hint bits help?

From

Tom Lane

Date:

22 December 2010, 17:54:39

Josh Berkus <josh@agliodbs.com> writes:
> Regarding the contention which Tom expects: the extra load on the CLOG
> would be 100% reads, no?  If it's *all* reads, why would we have any
> more contention than we have now?

Read involves sharelock which still causes contention.  Those bufmgr
contention storms we saw before were completely independent of whether
the pages were accessed for read or for write.

Another thing to keep in mind is that the current clog access code is
designed on the assumption that there's considerable locality of access
to pg_clog, ie, you usually only need to consult it for recent XIDs
because older ones have been hinted.  Turn off hint bits, that behavior
goes out the window.
        regards, tom lane
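
The locality point can be made concrete with a small simulation. It
is a sketch under stated assumptions: an LRU cache of 8 clog pages
(8 x 8 kB = 64 kB), 32768 transactions per page, and two synthetic
access patterns chosen for illustration. It models page residency
only, not locking.

```python
import random
from collections import OrderedDict

XACTS_PER_PAGE = 32768   # 8 kB page, 2 status bits per transaction
NUM_BUFFERS = 8          # assumed number of clog page buffers


def hit_rate(xids, num_buffers: int = NUM_BUFFERS) -> float:
    """Fraction of lookups served from an LRU cache of clog pages."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for xid in xids:
        page = xid // XACTS_PER_PAGE
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)          # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > num_buffers:
                cache.popitem(last=False)    # evict least recently used
    return hits / total if total else 0.0


def recent_xids(rng, newest: int, window: int, n: int):
    """Lookups confined to the newest `window` XIDs (older ones hinted)."""
    return [newest - rng.randrange(window) for _ in range(n)]


def uniform_xids(rng, newest: int, n: int):
    """Lookups spread over every XID ever assigned (no hints at all)."""
    return [rng.randrange(newest) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    newest = 10_000_000
    print("recent :", hit_rate(recent_xids(rng, newest, 100_000, 50_000)))
    print("uniform:", hit_rate(uniform_xids(rng, newest, 50_000)))
```

With lookups confined to recent XIDs nearly every access hits a
resident page; spread uniformly over 10 million XIDs (about 306
pages), most accesses miss the 8 buffers.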

Re: How much do the hint bits help?

From

Dimitri Fontaine

Date:

22 December 2010, 18:06:13

Josh Berkus <josh@agliodbs.com> writes:
> I might be able to test on some client workloads.  We'll see; currently
> I lack the harness to simulate a high level of client contention.

We're pretty successful in doing that with Tsung, even against large
clusters of plproxy nodes.

  http://tsung.erlang-projects.org/
  http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: How much do the hint bits help?

From

Mark Kirkwood

Date:

22 December 2010, 18:07:27

On 23/12/10 10:54, Tom Lane wrote:
> Josh Berkus<josh@agliodbs.com>  writes:
>> Regarding the contention which Tom expects: the extra load on the CLOG
>> would be 100% reads, no?  If it's *all* reads, why would we have any
>> more contention than we have now?
> Read involves sharelock which still causes contention.  Those bufmgr
> contention storms we saw before were completely independent of whether
> the pages were accessed for read or for write.
>
> Another thing to keep in mind is that the current clog access code is
> designed on the assumption that there's considerable locality of access
> to pg_clog, ie, you usually only need to consult it for recent XIDs
> because older ones have been hinted.  Turn off hint bits, that behavior
> goes out the window.

Would a larger (or configurable) clog cache help with this tho?

Cheers

Mark

Re: How much do the hint bits help?

From

Simon Riggs

Date:

22 December 2010, 19:01:12

On Wed, 2010-12-22 at 22:08 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 18:12, Merlin Moncure wrote:
> > On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
> >> Merlin Moncure<mmoncure@gmail.com>  writes:
> >>> well, simon's point that hint bits complicate checksum may nor may not
> >>> be the case, but no hint bits = less i/o = less checksumming (unless
> >>> you checksum around the hint bits).
> >>
> >> I think you're optimistically assuming the extra clog accesses don't
> >> cost any I/O.
> >
> > right, but clog is much more highly packed which is both a good and a
> > bad thing.
> 
> As a sidenote: note that the clog is not currently CRC'd.

Good point, thanks for mentioning it.

We have 64kB of clog buffers and potentially 8 GB of shared_buffers,
which is about 10^5 times more RAM for shared_buffers. So a protection
mechanism for shared_buffers will trap about 99.999% of RAM errors.
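
The arithmetic, spelled out as a sketch. It assumes memory errors land
uniformly across just these two regions and ignores all other RAM in
the machine.

```python
# Share of (clog buffers + shared_buffers) memory that lies in
# shared_buffers, using the sizes mentioned above.

KB = 1024
GB = 1024 ** 3

clog_buffers = 64 * KB
shared_buffers = 8 * GB

ratio = shared_buffers // clog_buffers                     # 131072, about 1.3 x 10^5
covered = shared_buffers / (shared_buffers + clog_buffers)

if __name__ == "__main__":
    print(f"shared_buffers is {ratio}x the clog buffers")
    print(f"fraction of errors landing in shared_buffers: {covered:.5%}")
```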

We might say that an error in clog could have a serious effect, and I
would agree. I don't see a way around that though, except for a CRC
check when we write to disk.

My understanding is that the context switch storms were because of the
I/O involved with thrashing the clog buffers. (Well, actually, I think
it was subtrans, but same difference). To solve that, we could just swap
them out to shared_buffers with usage = 5 rather than evict them.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Re: How much do the hint bits help?

From

Merlin Moncure

Date:

22 December 2010, 19:13:27

On Wed, Dec 22, 2010 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Regarding the contention which Tom expects: the extra load on the CLOG
>> would be 100% reads, no?  If it's *all* reads, why would we have any
>> more contention than we have now?
>
> Read involves sharelock which still causes contention.  Those bufmgr
> contention storms we saw before were completely independent of whether
> the pages were accessed for read or for write.
>
> Another thing to keep in mind is that the current clog access code is
> designed on the assumption that there's considerable locality of access
> to pg_clog, ie, you usually only need to consult it for recent XIDs
> because older ones have been hinted.  Turn off hint bits, that behavior
> goes out the window.

That's not always going to be the case though.  In olap-ish
environments you will see cases of scans over many records that come
from a single transaction.  This is also the case where hint bits can
really drill you -- you insert a bunch of records, log the bits,
delete, log the bits, and vacuum eventually.  I started investigating
this on behalf of a friend who is experiencing basically the worst
case with regularity.

merlin
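
The worst case described above can be counted out in a toy model. It
assumes each phase touches every page of the table and that dirty
pages are flushed before the next phase starts; the phase names are
illustrative, not backend terminology.

```python
# Count how many times the table's pages get rewritten over one
# insert / delete / vacuum cycle, with and without hint-bit-only
# changes marking pages dirty.


def page_write_passes(dirty_on_hint: bool) -> list:
    """Return the phases that force a full rewrite of the table's pages."""
    phases = [
        ("bulk insert", True),
        ("first scan sets xmin hints", dirty_on_hint),
        ("bulk delete", True),
        ("next scan sets xmax hints", dirty_on_hint),
        ("vacuum", True),
    ]
    return [name for name, rewrites in phases if rewrites]


if __name__ == "__main__":
    print(len(page_write_passes(True)), "passes with hint-bit writes")
    print(len(page_write_passes(False)), "passes without them")
```

In this model the hint bits account for two of the five write passes
over the data.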