Thread: On the usefulness of hint bits

On the usefulness of hint bits

From
Leonardo Francalanci
Date:
Hi,


I was wondering what the advantage of hint bits is for OLAP-style
workloads, that is, when the number of transactions is not that high.

If I got it right, in 10 pg_clog pages we can store the status for more
than 320000 transactions. That's a lot, in a very small space
(80KB?).

So I was wondering what we gain from hint bits in cases where pg_clog
is "small" (that is, where it will be cached by PostgreSQL or the OS).

Does anybody have numbers on the effect of hint bits on first and
second reads?

I mean:

create table mytable as .....

select * from mytable -> this one will update hint bits
select * from mytable -> this one will use them

To test it, I guess we would need to patch the code to build two variants:

a version where they are never updated (that is, it always looks at
pg_clog), so that you don't have to write them on the first read, and you
get the "true" reading time plus the pg_clog reading time;

a version that always sets them to COMMITTED, so that you don't have to
write them on the first read, and you get the "true" reading time for the
"second" read that would use them, regardless of any PostgreSQL/OS cache.


I'm asking because I don't like having all those writes on the first
scan, and I would like to know the real benefit for the reads that come
after the first one when there are "few" transactions per second (for
example, fewer than one transaction per second).





Re: On the usefulness of hint bits

From
Tom Lane
Date:
Leonardo Francalanci <m_lists@yahoo.it> writes:
> I was wondering what is the advantage of having hint bits for OLAP
> -style workloads, that is when the number of transactions is not
> that high.

> If I got it right, in 10 pg_clog pages we can store the status for more
> than 320000 transactions. That's a lot, in a very small space
> (80KB?).

> So I was wondering what's the gain we get from hint bits in cases
> where pg_clog is "small" (that is, will be cached by postgresql/the
> OS).

Reduction of contention for pg_clog access, for one thing.  If you read
the archives, you'll find that pg_clog access contention has been shown
to be one cause of "context swap storms".  Having to go to clog for
every single tuple access would make that orders of magnitude worse.

More generally, we're not going to give up hint bits even if there are
identifiable workloads where they don't buy much --- because there are
many others where they do.
        regards, tom lane


Re: On the usefulness of hint bits

From
Leonardo Francalanci
Date:
> Reduction of contention for pg_clog access, for one thing.  If you read
> the archives, you'll find that pg_clog access contention has been shown
> to be one cause of "context swap storms".  Having to go to clog for
> every single tuple access would make that orders of magnitude worse.


OK; is it the "Wierd context-switching issue on Xeon" thread, or does
that have nothing to do with it? I tried searching for "context swap
storms pg_clog" but didn't find anything...

> More generally, we're not going to give up hint bits even if there are
> identifiable workloads where they don't buy much --- because there are
> many others where they do.


Sure: I wasn't suggesting giving them up, just making their use
user-selectable (and "on" by default).





Re: On the usefulness of hint bits

From
Robert Haas
Date:
On Mon, Oct 11, 2010 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Reduction of contention for pg_clog access, for one thing.  If you read
> the archives, you'll find that pg_clog access contention has been shown
> to be one cause of "context swap storms".

I wonder if we could improve this with some sort of process-local
cache - not to get rid of hint bits, just to reduce pg_clog
contention.  We might easily end up testing the same XID many times
during the same table scan.

Another idea that's been discussed before is to avoid writing out
pages when only the hint bits have changed.  Or perhaps to write them
out only from the background writer, not from backends and not when
checkpointing: have a state BM_UNTIDY, which the background writer's
cleaning scan will treat as dirty, but which can otherwise be treated
as "not dirty", so that if we start to run short of free buffers we
don't hold things up writing out the hint bit updates.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: On the usefulness of hint bits

From
Leonardo Francalanci
Date:
> I wonder if we could improve this with some sort of process-local
> cache - not to get rid of hint bits, just to reduce pg_clog
> contention.  We might easily end up testing the same XID many times
> during the same table scan.

I guess that's my scenario: not that many transactions, so even
"copying" all of pg_clog into per-process memory would be doable...





Re: On the usefulness of hint bits

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Oct 11, 2010 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Reduction of contention for pg_clog access, for one thing.  If you read
>> the archives, you'll find that pg_clog access contention has been shown
>> to be one cause of "context swap storms".

> I wonder if we could improve this with some sort of process-local
> cache - not to get rid of hint bits, just to reduce pg_clog
> contention.  We might easily end up testing the same XID many times
> during the same table scan.

There already is a one-entry cache --- see TransactionLogFetch.  Not
sure if making it bigger would be a win in current usage, although
you'd likely have to if you were trying to not set hint bits.

> Another idea that's been discussed before is to avoid writing out
> pages when only the hint bits have changed.

Yeah.
        regards, tom lane