Re: better page-level checksums - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: better page-level checksums |
Date | |
Msg-id | CAH2-Wznckc7EMLVZVR5eRWQVhP0VG-EGxG4UrBcPXAG17SuBeA@mail.gmail.com |
In response to | Re: better page-level checksums (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: better page-level checksums |
List | pgsql-hackers |
On Tue, Jun 14, 2022 at 8:48 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jun 13, 2022 at 6:26 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Anyway, I can see how it would be useful to be able to know the offset
> > of a nonce or of a hash digest on any given page, without access to a
> > running server. But why shouldn't that be possible with other designs,
> > including designs closer to what I've outlined?
>
> I don't know what you mean by this. As far as I'm aware, the only
> design you've outlined is one where the space wasn't at the same
> offset on every page.

I am skeptical of that particular aspect, yes. Though I would define it
the other way around (now the true special area struct isn't necessarily
at the same offset for a given AM, at least across data directories).

My main concern is maintaining the ability to interpret much about the
contents of a page without context, and not making it any harder to grow
the special area dynamically -- which is a broader concern. Your patch
isn't going to be the last one that wants to do something with the
special area. This needs to be carefully considered.

I see a huge amount of potential for adding new optimizations that use
subsidiary space on the page, presumably implemented via a special area
that can grow dynamically. For example, an ad-hoc compression technique
for heap pages that temporarily "absorbs" some extra versions in the
event of opportunistic pruning running and failing to free enough space.
Such a design would operate on similar principles to deduplication in
unique indexes, where the goal is to buy time rather than buy space.
When we fail to keep the contents of a heap page together today, we
often only barely fail, so I expect something like this to have an
outsized impact on some workloads.

> In general, I was imagining that you'd need to look at the control
> file to understand how much space had been reserved per page in this
> particular cluster.
> I agree that's a bit awkward, especially for
> pg_filedump. However, pg_filedump and I think also some code internal
> to PostgreSQL try to figure out what kind of page we've got by looking
> at the *size* of the special space. It's only good luck that we
> haven't had a collision there yet, and continuing to rely on that
> seems like a dead end. Perhaps we should start including a per-AM
> magic number at the beginning of the special space.

It's true that that approach is just a hack -- we probably can do
better. I don't think that it's okay to break it, though. At least not
without providing a comparable alternative, one that doesn't rely on
context from the control file.

--
Peter Geoghegan