preserving forensic information when we freeze - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | preserving forensic information when we freeze |
Date | |
Msg-id | CA+TgmoaEmnoLZmVbb8gvY69NA8zw9BWpiZ9+TLz-LnaBOZi7JA@mail.gmail.com |
List | pgsql-hackers |
Various people, including at least Heikki, Andres, and myself, have proposed various schemes for avoiding freezing that amount to doing it sooner, when we're already writing WAL anyway, or at least when the buffer is already dirty anyway, or at least while the buffer is already in shared_buffers anyway. Various people, including at least Tom and Andres, have raised the point that this would lose possibly-useful forensic information that they have in the past found to be of tangible value in previous debugging of databases that have somehow gotten messed up.

I don't know who originally proposed it, but I've had many conversations about how we could address this concern: instead of replacing xmin when we freeze, just set an infomask bit that means "xmin is frozen" and leave the old, literal xmin in place. FrozenTransactionId would still exist and still be understood, of course, but new freezing operations wouldn't use it.

I have attempted to implement this. Trouble is, we're out of infomask bits. Using an infomask2 bit might work but we don't have many of them left either, so it's worth casting about a bit for a better solution. Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for this purpose, but I think we're better off trying to reclaim those bits in a future release. Exactly how to do that is a topic for another email, but I believe it's very possible.

What I'd like to propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to indicate that xmin is frozen. This bit pattern isn't used for anything else, so there's no confusion possible with existing data already on disk, but it's necessary to audit all code that checks HEAP_XMIN_INVALID to make sure it doesn't get confused. I've done this, and there's little enough of it that it seems pretty easy to handle.
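To make the proposed encoding concrete, here is a minimal sketch in C. It is an illustration under stated assumptions, not the attached patch: the two bit values match the existing HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID definitions, and FrozenTransactionId is 2 as in PostgreSQL, but SketchTupleHeader, SKETCH_XMIN_FROZEN, and the sketch_* functions are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Existing t_infomask bits for the state of xmin. */
#define HEAP_XMIN_COMMITTED 0x0100	/* xmin known committed */
#define HEAP_XMIN_INVALID   0x0200	/* xmin known invalid/aborted */

/* Hypothetical name: both bits set together means "xmin is frozen". */
#define SKETCH_XMIN_FROZEN  (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

/* Old-style freezing overwrote xmin with this special XID. */
#define FrozenTransactionId ((uint32_t) 2)

/* Hypothetical, heavily simplified stand-in for a heap tuple header. */
typedef struct SketchTupleHeader
{
	uint32_t	xmin;		/* inserting transaction ID */
	uint16_t	infomask;	/* status flag bits */
} SketchTupleHeader;

/* True if xmin is frozen under either the old or the proposed scheme. */
bool
sketch_xmin_is_frozen(const SketchTupleHeader *tup)
{
	if (tup->xmin == FrozenTransactionId)
		return true;		/* frozen the old way; still understood */
	return (tup->infomask & SKETCH_XMIN_FROZEN) == SKETCH_XMIN_FROZEN;
}

/*
 * True only if xmin is really invalid.  Both bits are masked so that a
 * frozen tuple, which also has HEAP_XMIN_INVALID set, is not misread.
 */
bool
sketch_xmin_is_invalid(const SketchTupleHeader *tup)
{
	return (tup->infomask & SKETCH_XMIN_FROZEN) == HEAP_XMIN_INVALID;
}

/* Freeze by setting both bits; the literal xmin is left in place. */
void
sketch_freeze_xmin(SketchTupleHeader *tup)
{
	tup->infomask |= SKETCH_XMIN_FROZEN;
	assert(sketch_xmin_is_frozen(tup));
}
```

The sketch_xmin_is_invalid() check shows the kind of adjustment the audit is about: code that tests HEAP_XMIN_INVALID as a single bit would treat a frozen tuple as having an invalid xmin, whereas masking both bits keeps the two states apart. Calling sketch_freeze_xmin() on a tuple with xmin 1234 leaves xmin at 1234, which is the forensic information being preserved.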
A somewhat larger problem is that this requires auditing every place that looks at a tuple xmin and deciding whether any changes are needed to handle the possibility that the tuple may be frozen even though xmin != FrozenTransactionId. This is a somewhat more significant change, but it doesn't seem to be too bad. But there are a couple of cases that are tricky enough that they seem worth expounding upon:

- When we follow HOT chains, we determine where the HOT chain ends by matching the xmax of each tuple with the xmin of the next tuple. If they don't match, we conclude that the HOT chain has ended. I initially thought this logic might be buggy even as things stand today if the latest tuple in the chain is frozen, but as Andres pointed out to me, that's not so. If the newest tuple in the chain is all-visible (the earliest point at which we can theoretically freeze it), all earlier tuples are dead altogether, and heap_page_prune() is always called after locking the buffer and before freezing, so any tuple we freeze must be the first in its HOT chain. For the same reason, this logic doesn't need any adjustment for the new freezing system: it's never looking at anything old enough to be frozen in the first place.

- Various procedural languages use the combination of TID and XMIN to determine whether a function needs to be recompiled. Although the possibility of this doing the wrong thing seems quite remote, it's not obvious to me why it's theoretically correct even as things stand today. Suppose that a previously-frozen tuple is vacuumed away and another tuple is placed at the same TID and then frozen. Then, we check whether the cache is still valid and, behold, it is. This would actually get better with this patch, since it wouldn't be enough merely for the old and new tuples to both be frozen; they'd have to have had the same XID prior to freezing. I think that could only happen if a backend sticks around for at least 2^32 transactions, but I don't know what would prevent it in that case.

- heap_get_latest_tid() appears broken even without this patch. It's only used on user-specified TIDs, either in a TID scan, or due to the use of currtid_byreloid() and currtid_byrelname(). It attempts to find the latest version of the tuple referenced by the given TID by following the CTID links. Like HOT, it uses XMAX/XMIN matching to detect when the chain is broken. However, unlike HOT, update chains can in general span multiple blocks. It is not now nor has it ever been safe to assume that the next tuple in the chain can't be frozen before the previous one is vacuumed away. Andres came up with the best example: suppose the tuple to be frozen physically precedes its predecessor; then, an in-progress vacuum might reach the to-be-frozen tuple before it reaches (and removes) the previous row version. In newer releases, the same problem could be caused by vacuum's occasional page-skipping behavior. As with the previous point, the "don't actually change xmin when we freeze" approach actually makes it harder for a chain to get "broken" when it shouldn't, but I suspect it's just moving us from one set of extremely-obscure failure cases to another.

This patch probably needs some documentation updates. Suggestions as to where would be appreciated.

As a general statement, I view this work as something that is likely needed no matter which one of the "remove freezing" approaches that have been proposed we choose to adopt. It does not fix anything in and of itself, but it (hopefully) removes an objection to the entire line of inquiry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company