On 2019-03-29 15:58:14 +0000, Simon Riggs wrote:
> On Fri, 29 Mar 2019 at 15:29, Andres Freund <andres@anarazel.de> wrote:
> > That's far from a trivial feature imo. It seems quite possible that we'd
> > end up with increased overhead, because the current logic can get away
> > with only doing hint bit style writes - but would that be true if we
> > started actually replacing the item pointers? Because I don't see any
> > guarantee they couldn't cross a page boundary etc? So I think we'd need
> > to do WAL logging during index searches, which seems prohibitively
> > expensive.
> >
>
> Don't see that.
>
> I was talking about reusing the first 4 bytes of an index tuple's
> ItemPointerData,
> which is the first field of an index tuple. Index tuples are MAXALIGNed, so
> I can't see how that would ever cross a page boundary.
They're 8 bytes, and MAXALIGN often is 4 bytes:
xids are 4 bytes, so we're good.
If MAXALIGN could ever be 2 bytes, we'd have a problem.
So as a whole they definitely can cross sector boundaries. You might be able to argue your way out of that by saying that the blkid is going to be aligned, but that's not that trivial, as t_info isn't guaranteed that.
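To make the alignment argument concrete, here is a rough sketch that mirrors the index tuple header layout with ctypes and checks which fields can straddle a sector boundary at a given alignment. The struct and field names follow PostgreSQL's itemptr.h and itup.h as I understand them, but the definitions are simplified copies; the 512-byte sector size, the 8192-byte page size and the helper names (crosses_sector, any_crossing) are assumptions made only for illustration.

```python
import ctypes

SECTOR_SIZE = 512   # assumed sector size, illustration only
PAGE_SIZE = 8192    # assumed page size (PostgreSQL's default BLCKSZ)


# Simplified mirrors of the PostgreSQL structs; not the real headers.
class BlockIdData(ctypes.Structure):
    _fields_ = [("bi_hi", ctypes.c_uint16), ("bi_lo", ctypes.c_uint16)]


class ItemPointerData(ctypes.Structure):
    _fields_ = [("ip_blkid", BlockIdData), ("ip_posid", ctypes.c_uint16)]


class IndexTupleData(ctypes.Structure):
    _fields_ = [("t_tid", ItemPointerData), ("t_info", ctypes.c_uint16)]


def crosses_sector(offset, length, sector=SECTOR_SIZE):
    """True if bytes [offset, offset + length) span two sectors."""
    return offset // sector != (offset + length - 1) // sector


def any_crossing(align, length, page_size=PAGE_SIZE):
    """True if some align-aligned offset in the page lets the span cross."""
    offsets = range(0, page_size - length + 1, align)
    return any(crosses_sector(off, length) for off in offsets)


header_size = ctypes.sizeof(IndexTupleData)   # whole tuple header
blkid_size = ctypes.sizeof(BlockIdData)       # first 4 bytes of t_tid

for align in (2, 4, 8):
    print(f"align={align}: "
          f"first {blkid_size} bytes can cross: {any_crossing(align, blkid_size)}, "
          f"{header_size}-byte header can cross: {any_crossing(align, header_size)}")
```

Under these assumptions, the first 4 bytes never straddle a sector when tuples are at least 4-byte aligned, the full 8-byte header can straddle one at 4-byte alignment (for example at offset 508), and at a hypothetical 2-byte alignment even the first 4 bytes could.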
But even so, you can't have unlogged changes that you then rely on. Even if there's no torn page issue. Currently BTP_HAS_GARBAGE and ItemIdMarkDead() are treated as hints - if we want to guarantee all these are accurate, I don't quite see how we'd get around WAL logging those.
You can have unlogged changes that you rely on - that is exactly how hints work.
If the hint is lost, we do the I/O. Worst case it would be the same as what you have now.
I'm talking about saving many I/Os - this doesn't need to provably avoid all I/Os to work, it's an incremental benefit all the way.
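As a rough illustration of the "lost hint just costs an I/O" argument, here is a toy model, not PostgreSQL code. HeapStub, IndexEntry and tuple_is_live are hypothetical names, and the sketch assumes the hint only ever records a fact that stays true once set (a dead tuple stays dead), so dropping it cannot change the answer.

```python
class HeapStub:
    """Stands in for the heap and counts simulated I/Os."""

    def __init__(self, dead_tids):
        self.dead = set(dead_tids)
        self.fetches = 0

    def is_dead(self, tid):
        self.fetches += 1          # every heap visit costs one I/O
        return tid in self.dead


class IndexEntry:
    def __init__(self, tid):
        self.tid = tid
        self.dead_hint = False     # unlogged hint; may be lost at any time


def tuple_is_live(entry, heap):
    """Return True if the tuple is live, using the hint to skip I/O."""
    if entry.dead_hint:
        return False               # hint present: no heap visit needed
    dead = heap.is_dead(entry.tid)
    if dead:
        entry.dead_hint = True     # remember the result as a hint
    return not dead


heap = HeapStub(dead_tids={1})
entry = IndexEntry(tid=1)

first = tuple_is_live(entry, heap)    # heap visit, sets the hint
second = tuple_is_live(entry, heap)   # answered from the hint
entry.dead_hint = False               # simulate the hint being lost
third = tuple_is_live(entry, heap)    # same answer, one extra heap visit

print(first, second, third, heap.fetches)
```

The answer is the same in all three calls; only the number of heap visits changes. Whether that property still holds once the hint is something other than a single bit is the point under dispute in this thread.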
--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services