Re: preserving forensic information when we freeze - Mailing list pgsql-hackers

From           | Andres Freund
Subject        | Re: preserving forensic information when we freeze
Msg-id         | 20130529000042.GC818@awork2.anarazel.de
In response to | preserving forensic information when we freeze (Robert Haas <robertmhaas@gmail.com>)
Responses      | Re: preserving forensic information when we freeze
               | Re: preserving forensic information when we freeze
List           | pgsql-hackers
On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
> I have attempted to implement this. Trouble is, we're out of infomask
> bits. Using an infomask2 bit might work but we don't have many of
> them left either, so it's worth casting about a bit for a better
> solution. Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
> this purpose, but I think we're better off trying to reclaim those
> bits in a future release. Exactly how to do that is a topic for
> another email, but I believe it's very possible. What I'd like to
> propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
> indicate that xmin is frozen. This bit pattern isn't used for
> anything else, so there's no confusion possible with existing data
> already on disk, but it's necessary to audit all code that checks
> HEAP_XMIN_INVALID to make sure it doesn't get confused. I've done
> this, and there's little enough of it that it seems pretty easy to
> handle.

I only suggested MOVED_IN/OFF because I didn't remember that
HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
combination could have been produced in the past in some way I haven't
found so far, I am all for it. I don't see a problem with breaking
backward compatibility on that level, and I don't think there's any way
to get there without some low-level compatibility break.

> - When we follow HOT chains, we determine where the HOT chain ends by
> matching the xmax of each tuple with the xmin of the next tuple. If
> they don't match, we conclude that the HOT chain has ended. I
> initially thought this logic might be buggy even as things stand today
> if the latest tuple in the chain is frozen, but as Andres pointed out
> to me, that's not so.
> If the newest tuple in the chain is all-visible
> (the earliest point at which we can theoretically freeze it), all
> earlier tuples are dead altogether, and heap_page_prune() is always
> called after locking the buffer and before freezing, so any tuple we
> freeze must be the first in its HOT chain. For the same reason, this
> logic doesn't need any adjustment for the new freezing system: it's
> never looking at anything old enough to be frozen in the first place.

I am all for adding a comment explaining why this is safe, though. We
thought about it for some while before we were convinced...

> - Various procedural languages use the combination of TID and XMIN to
> determine whether a function needs to be recompiled. Although the
> possibility of this doing the wrong thing seems quite remote, it's not
> obvious to me why it's theoretically correct even as things stand
> today. Suppose that previously-frozen tuple is vacuumed away and
> another tuple is placed at the same TID and then frozen. Then, we
> check whether the cache is still valid and, behold, it is. This would
> actually get better with this patch, since it wouldn't be enough
> merely for the old and new tuples to both be frozen; they'd have to
> have had the same XID prior to freezing. I think that could only
> happen if a backend sticks around for at least 2^32 transactions, but
> I don't know what would prevent it in that case.

Hm. As previously said, I am less than convinced by those ad-hoc
mechanisms, and I think this should get properly integrated into the
normal cache invalidation mechanisms. But: I think this is safe, since
we compare the stored/cached xmin/tid with one gotten from the
SearchSysCache just before, which will point to the correct location as
of the last AcceptInvalidationMessages(). I can't think of a way where
this would allow the case you describe.

> - heap_get_latest_tid() appears broken even without this patch.
> It's only used on user-specified TIDs, either in a TID scan, or due
> to the use of currtid_byreloid() and currtid_byrelname(). It attempts
> to find the latest version of the tuple referenced by the given TID
> by following the CTID links. Like HOT, it uses XMAX/XMIN matching to
> detect when the chain is broken. However, unlike HOT, update chains
> can in general span multiple blocks. It is not now, nor has it ever
> been, safe to assume that the next tuple in the chain can't be frozen
> before the previous one is vacuumed away. Andres came up with the
> best example: suppose the tuple to be frozen physically precedes its
> predecessor; then, an in-progress vacuum might reach the to-be-frozen
> tuple before it reaches (and removes) the previous row version. In
> newer releases, the same problem could be caused by vacuum's
> occasional page-skipping behavior. As with the previous point, the
> "don't actually change xmin when we freeze" approach actually makes
> it harder for a chain to get "broken" when it shouldn't, but I
> suspect it's just moving us from one set of extremely-obscure failure
> cases to another.

I don't think this is especially problematic, though. If you do a TID
scan starting from a tid that is so old that it can be removed, you're
doing it wrong; the tid could have been reused for something else
anyway. I think the ctid chaining is only meaningful if the tuple got
updated very recently, i.e. you hold a snapshot that prevents the
removal of the root tuple.

Nice work!

Andres

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services