Re: I/O on select count(*) - Mailing list pgsql-performance

From Alvaro Herrera
Subject Re: I/O on select count(*)
Date
Msg-id 20080515020553.GN9838@alvh.no-ip.org
Whole thread Raw
In response to Re: I/O on select count(*)  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: I/O on select count(*)
Re: I/O on select count(*)
Re: I/O on select count(*)
List pgsql-performance
Greg Smith wrote:
> On Wed, 14 May 2008, Kevin Grittner wrote:
>
>> If this is the first time that the rows are being read since they were
>> inserted (or since the database was loaded, including from backup), it
>> may be rewriting the rows to set hint bits, which can make subsequent
>> access faster.
>
> This is the second time this has come up recently, and I know it used to
> puzzle me too.  This is a particularly relevant area to document better
> for people doing benchmarking.  As close I've found to a useful
> commentary on this subject is the thread at
> http://archives.postgresql.org/pgsql-patches/2005-07/msg00390.php
>
> I still don't completely understand this myself though, if I did I'd add
> a FAQ on it.  Anyone want to lecture for a minute on the birth and care
> of hint bits?  I'll make sure any comments here get onto the wiki.

Hint bits are used to mark tuples as created and/or deleted by
transactions that are know committed or aborted.  To determine the
visibility of a tuple without such bits set, you need to consult pg_clog
and possibly pg_subtrans, so it is an expensive check.  On the other
hand, if the tuple has the bits set, then it's state is known (or, at
worst, it can be calculated easily from your current snapshot, without
looking at pg_clog.)

There are four hint bits:
XMIN_COMMITTED -- creating transaction is known committed
XMIN_ABORTED   -- creating transaction is known aborted
XMAX_COMMITTED -- same, for the deleting transaction
XMAX_ABORTED   -- ditto

If neither of the bits is set, then the transaction is either in
progress (which you can check by examining the list of running
transactions in shared memory) or your process is the first one to check
(in which case, you need to consult pg_clog to know the status, and you
can update the hint bits if you find out a permanent state).


Regarding FAQs, I'm having trouble imagining putting this in the user
FAQ; I think it belongs into the developer's FAQ.  However, a
benchmarker is not going to look there.  Maybe we should start "a
benchmarker's FAQ"?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: I/O on select count(*)
Next
From: Greg Smith
Date:
Subject: Re: I/O on select count(*)