Re: Writing Commit Status hint bits (was Re: Constant - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Writing Commit Status hint bits (was Re: Constant
Date
Msg-id 1122018686.21502.26.camel@localhost.localdomain
Whole thread Raw
In response to Re: Writing Commit Status hint bits (was Re: Constant WAL replay)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Writing Commit Status hint bits (was Re: Constant WAL replay)
List pgsql-hackers
On Wed, 2005-07-20 at 13:20 -0400, Tom Lane wrote:

> > When VACUUM freezes the xid, it *does* make sense at that point to
> > update the hint bits as a performance optimization.
>
> The hint bits are not really relevant when xmin = FrozenTransactionId,
> since any examiner of the tuple would consider that XID committed anyway.
> Besides, the hint bit is guaranteed set in that scenario; note the
> Assert where vacuum is setting it:
>
>                     HeapTupleHeaderSetXmin(tuple.t_data, FrozenTransactionId);
>                     /* infomask should be okay already */
>                     Assert(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED);
> The scenario in which the hint bit *must* be set is where it is for
> an XID for which we have deleted the relevant section of pg_clog,
> which we are willing to do well before freeze occurs, if we know that
> all the relevant XIDs have been hinted.  See TruncateCLOG.  Your patch
> breaks that logic by not considering hint-bit updates as changes that
> must be flushed to disk by checkpoint.

OK, I see that now. Thank you for showing me that.

I will change the patch so that this does not prevent VACUUM from
setting hint bits.

All of this does raise other questions, but thats probably sensible to
raise those on other threads.

> > 1. Any block read may attempt to set hint bits, which dirties the block
> > and must be written out. So *reads* can result in heavier write activity
> > at checkpoint time.
>
> Sure, but the alternative is heavier activity in repeated checks of
> pg_clog to find out commit state that a previous examiner of the tuple
> already found out.  The patch supposes that one write is worse than N
> reads, which is clearly a loss at some not-exceedingly-large value of N.
> If we thought that was a good tradeoff, we might as well not have the
> hint bits at all.

That is not a decision we can make without knowledge of the application.
In general, the hint bits are good. In *some* cases, not. I still seek
control over that as a designer.

Specifically, the scenario I want to optimize is this:
- we load a table with lots of real time measurement data, as one child
out of a large number of similar child tables
- we then immediately create summary tables from the measurements
- after this the detailed data is only sporadically accessed, if ever
- detail data is dropped from the database after a few weeks
- the majority of the database is detail data, so those tables are never
vacuumed since no rows are ever deleted from those tables (the tables
are dropped) nor is access sufficiently frequent to make it sensible to
set hint bits - hence *no* complete database vacuum is run on a regular
basis

In this design, the second step causes the whole detailed data table to
be written out to the database (again) immediately after loading. I
would like to be able to prevent that.

That design is *not* uncommon, since we might describe it as a
"regulatory/monitoring data archiving" application. I have two current
clients with exactly this design, plus have worked on similar apps in
the past. The PostgreSQL licence cost makes it particularly suitable for
that kind of application.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: yuanjia lee
Date:
Subject: Re: recover corrupted pg_controldata from WAL
Next
From: ohp@pyrenet.fr
Date:
Subject: regressin failure on latest CVS