Do we need so many hint bits? - Mailing list pgsql-hackers

From Jeff Davis
Subject Do we need so many hint bits?
Msg-id 1353026577.14335.91.camel@sussancws0025
Responses Re: Do we need so many hint bits?  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Do we need so many hint bits?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Do we need so many hint bits?  (Andres Freund <andres@2ndquadrant.com>)
Re: Do we need so many hint bits?  (Robert Haas <robertmhaas@gmail.com>)
Re: Do we need so many hint bits?  (Jeff Janes <jeff.janes@gmail.com>)
Re: Do we need so many hint bits?  (Atri Sharma <atri.jiit@gmail.com>)
List pgsql-hackers
Related to discussion here:
http://archives.postgresql.org/message-id/CAHyXU0zn5emePLedoZUGrAQiF92F-YjvFr-P5vUh6n0WpKZ6PQ@mail.gmail.com

It occurred to me recently that many of the hint bits aren't terribly
important (or at least their importance isn't obvious to me).
HEAP_XMIN_COMMITTED clearly has a purpose, and we'd expect it to be used
many times following the initial CLOG lookup.

But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then the tuple is (hopefully) going to be
vacuumed soon, and gone completely. And if HEAP_XMAX_INVALID would be
set, then xmax itself should just be changed to InvalidTransactionId
instead.
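
As a rough illustration of the idea above, here is a minimal C sketch. The
four flag values match PostgreSQL's t_infomask definitions; everything
prefixed with "toy" or "Toy" (the tuple struct, the fake CLOG lookup, the
helper functions) is hypothetical and greatly simplified. It keeps only
HEAP_XMIN_COMMITTED as a cached answer and represents "xmax is invalid" by
storing InvalidTransactionId rather than by setting a flag bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined for t_infomask in PostgreSQL. */
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
#define HEAP_XMAX_COMMITTED 0x0400
#define HEAP_XMAX_INVALID   0x0800

#define InvalidTransactionId 0u

/* Hypothetical toy tuple header, far simpler than the real one. */
typedef struct ToyTuple {
    uint32_t xmin;
    uint32_t xmax;
    uint16_t infomask;
} ToyTuple;

/* Counts simulated CLOG lookups so the cost of a missing hint is visible. */
static int toy_clog_lookups = 0;

/* Hypothetical stand-in for a CLOG lookup: even xids count as committed. */
static bool toy_clog_committed(uint32_t xid)
{
    toy_clog_lookups++;
    return (xid % 2) == 0;
}

/*
 * Keep only the HEAP_XMIN_COMMITTED hint: the first call pays for the
 * lookup and caches a positive answer; a negative answer is not cached,
 * on the assumption that such a tuple will be vacuumed away soon.
 */
static bool toy_xmin_committed(ToyTuple *tup)
{
    assert(tup != NULL);
    if (tup->infomask & HEAP_XMIN_COMMITTED)
        return true;
    if (toy_clog_committed(tup->xmin)) {
        tup->infomask |= HEAP_XMIN_COMMITTED;
        return true;
    }
    return false;
}

/* Instead of setting HEAP_XMAX_INVALID, overwrite xmax itself. */
static void toy_mark_xmax_invalid(ToyTuple *tup)
{
    assert(tup != NULL);
    tup->xmax = InvalidTransactionId;
}

/* The check becomes a comparison on xmax; no flag bit is consulted. */
static bool toy_xmax_is_invalid(const ToyTuple *tup)
{
    assert(tup != NULL);
    return tup->xmax == InvalidTransactionId;
}
```

In this toy model, repeated checks of a committed xmin cost one lookup in
total, while repeated checks of a non-committed xmin cost one lookup each
time. That second case is the trade-off any real test would need to measure.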

Removing those 3 hints would give us 3 more flag bits (eventually, once
we are sure that any bits still set aren't just leftovers from before the
change), and it would also reduce the chance that a page is dirtied for
no other reason than to set them. It might even take a few cycles out of
the tqual.c routines, or at least reduce the code size. Not a huge win,
but I don't see much downside either.

Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibility map directly from those places? It can be used for the scan
because it is crash safe now (which was not possible before). And since
it's only one lookup per scanned page, I don't think it would be a
measurable performance loss there. Inserts/updates/deletes also do a
significant amount of work, so again, I doubt it's a big drop in
performance there -- maybe under a lot of concurrency or something.
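
To make the comparison concrete, here is a small C sketch of the two ways
to answer "is every tuple on this page visible?". PD_ALL_VISIBLE uses the
page-header flag value from PostgreSQL; the rest (ToyPage, toy_vm, the
fixed table size, and all toy_* functions) is a hypothetical in-memory
model with one visibility-map bit per heap page. It ignores buffer pinning,
locking, and WAL, which are exactly where the real cost would show up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PD_ALL_VISIBLE 0x0004  /* page-header flag value in PostgreSQL */
#define TOY_HEAP_PAGES 64      /* hypothetical table size for this sketch */

/* Hypothetical toy page header holding only the flags word. */
typedef struct ToyPage {
    uint16_t pd_flags;
} ToyPage;

/* Toy visibility map: one bit per heap page, zero-initialized. */
static uint8_t toy_vm[TOY_HEAP_PAGES / 8];

static void toy_vm_set(uint32_t blkno)
{
    assert(blkno < TOY_HEAP_PAGES);
    toy_vm[blkno / 8] |= (uint8_t) (1u << (blkno % 8));
}

static void toy_vm_clear(uint32_t blkno)
{
    assert(blkno < TOY_HEAP_PAGES);
    toy_vm[blkno / 8] &= (uint8_t) ~(1u << (blkno % 8));
}

static bool toy_vm_test(uint32_t blkno)
{
    assert(blkno < TOY_HEAP_PAGES);
    return ((toy_vm[blkno / 8] >> (blkno % 8)) & 1u) != 0;
}

/* Flag-based check: read the bit stored on the heap page itself. */
static bool toy_all_visible_by_flag(const ToyPage *page)
{
    return (page->pd_flags & PD_ALL_VISIBLE) != 0;
}

/* Map-based check: one visibility-map probe per scanned page. */
static bool toy_all_visible_by_vm(uint32_t blkno)
{
    return toy_vm_test(blkno);
}
```

With only the map, a modification would clear the bit through something
like toy_vm_clear(blkno) and there would be no second copy on the heap page
to keep consistent with it.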

The benefit of removing PD_ALL_VISIBLE would be significantly higher than
that of removing the tuple hint bits.
It's quite common to load a lot of data, and then do some reads for a
while (setting hint bits and flushing them to disk), and then do a
VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the
pages again. Also, if I remember correctly, Robert went to significant
effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM
bits consistent. Maybe this was all discussed before?

All of these hint bits will have a bit more of a performance impact
after checksums are introduced (for those who use them in conjunction
with large data loads), so I'm looking for some simple ways to mitigate
those effects. What kind of worst-case tests could I construct to see if
there are worrying performance effects of removing these hint bits?

Regards,
Jeff Davis



